RE: Faceted search not working?
Hi, try http://localhost:8080/solr/select/?q=YOUR-QUERY&facet=true&facet.field=title I don't think the boolean fields are mapped to on and off :) -birger -Original Message- From: Ilya Sterin [mailto:ster...@gmail.com] Sent: 24. mai 2010 23:11 To: solr-user@lucene.apache.org Subject: Faceted search not working? I'm trying to perform a faceted search without any luck. The result set doesn't return any facet information... http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title I'm getting the result set, but no facet information is present. Is there something else that needs to happen to turn faceting on? I'm using the latest Solr 1.4 release. Data is indexed from the database using the dataimporter. Thanks. Ilya Sterin
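For reference, a minimal faceting request for the setup described above (host and field name taken from the thread; every parameter must be separated by &) looks like:

```
http://localhost:8080/solr/select/?q=title:*&facet=true&facet.field=title
```

Note that facet counts do not appear inside the returned documents; they come back in a separate facet_counts/facet_fields section of the response, which is easy to miss when checking whether faceting is on.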
Tagging and excluding Filters
Hi, I am using the following solution: http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters However, when I implemented this I found that I cannot combine different filter types: http://search.un-informed.org/search?q=&t[23]=malaria&tm=any&s=Search

The above request would generate the following Solr query:

facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&rows=21

Now when I deselect one of the checkboxes I add an fq parameter:

facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)&rows=21

{!tag=dt}organisation_id:(-8)

Now where I am at a loss is when I want to filter in multiple different sections (like filtering both organisations and clause information type). I tried various ways of constructing the fq parameter, but I always get a parse error:

{!tag=dt}(organisation_id:(-8) AND information_type_id:(-1))
{!tag=dt}organisation_id:(-8) AND {!tag=dt}information_type_id:(-1)

For example: Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'organisation_id:(-9) AND {!tag=dt}information_type_id:(-1)': Encountered "}" at line 1, column 35.
When running:

facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)+AND+{!tag%3Ddt}information_type_id:(-1)&rows=21

Can someone give me a hint? regards, Lukas Kahwe Smith m...@pooteeweet.org
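For what it's worth, the parse error above is consistent with how local params work: a {!tag=...} prefix is only recognized at the very beginning of a parameter value, so it cannot appear mid-string after AND. The usual workaround is one tagged fq parameter per filter (field names carried over from the thread):

```
...&fq={!tag=dt}organisation_id:(-9)&fq={!tag=dt}information_type_id:(-1)&...
```

Multiple fq parameters are intersected (ANDed) by Solr, which gives the intended conjunction without embedding AND between local-params queries, and each filter can still be excluded individually via {!ex=dt} on the facet.field parameters.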
How well does Solr scale over large number of facet values?
I want to facet over a field "group". Since group is created by users, there can potentially be a huge number of values for group.
- Would Solr be able to handle a use case like this? Or is Solr not really appropriate for facet fields with a large number of values?
- I understand that I can set facet.limit to restrict the number of values returned for a facet field. Would this help in my case? Say there are 100,000 matching values for group in a search and I set facet.limit to 50: would that speed up the query, or would the query still be slow because Solr still needs to process and sort through all the facet values to return the top 50?
- Any tips on how to tune Solr for a large number of facet values?
Thanks.
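As a concrete illustration of the limit question, the relevant parameters look like this (the field name group is from the question; the values are made up):

```
facet=true&facet.field=group&facet.limit=50&facet.mincount=1&facet.sort=count
```

Note that facet.limit bounds only what is returned: Solr still has to count across all matching values before it can pick the top 50, so the parameter caps response size more than it caps work.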
Re: Apache or Nginx In front of SOLR?
It depends on what kind of load you are talking about and what your expertise is. NGINX does perform better than Apache for most people, but fewer people know NGINX than Apache. If you have more than 100K searchers a day doing a few searches each, you will benefit from NGINX. If your traffic is lower and you know Apache better, Apache will do just fine. 2010/5/25 Kranti™ K K Parisa kranti.par...@gmail.com Dear All, Which is the best implementation in front of SOLR between Apache and NGINX? The main aspects would be 1. Ability to handle high loads -- They are both known to handle high loads just fine. 2. Resource utilization -- Apache uses more resources than NGINX under heavy load, but I am sure Apache can be tuned. 3. Caching (can we have caching implemented in front of solr? I did implement SOLR caching, but to the extent possible I would still reduce the calls to SOLR by having some caching in front of SOLR to serve the static pages whose data actually comes from SOLR) -- You probably want to look at a reverse proxy like Varnish or Squid. 4. Ability to record statistics like AWSTATS available for Apache. -- This shouldn't be a concern. You can even configure Tomcat or Jetty to log in Apache format. Please suggest your thoughts/ideas. Best Regards, Kranti K K Parisa Hope that helps, Paul Dhaliwal
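A minimal reverse-proxy sketch for the NGINX option discussed above (the port and path are assumptions, matching the localhost:8080 examples elsewhere in this digest):

```
location /solr/ {
    proxy_pass http://127.0.0.1:8080/solr/;
}
```

For the caching point, a dedicated HTTP cache such as Varnish or Squid in front of (or instead of) this proxy is the usual approach, as the reply suggests; NGINX itself mainly buys cheap connection handling and load balancing here.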
RE: Highlighting is not happening
Hey, I thought the highlights would happen in the fields of the documents returned from SOLR, but it gives a new list of highlighting below; sorry for the confusion. I was wondering, is there a way that the returned fields themselves contain bold characters? E.g., if searched for "query":

<doc>
  <str name="one">returned response which contains <b>query</b> should be bold</str>
</doc>

Regards Prakash -Original Message- From: Sascha Szott [mailto:sz...@zib.de] Sent: Monday, May 24, 2010 10:55 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Hi Prakash, can you provide 1. the definition of the relevant field 2. your query 3. the definition of the relevant request handler 4. a field value that is stored in your index and should be highlighted -Sascha Doddamani, Prakash wrote: Thanks Sascha, The fields I am searching are all of type "text", and I am using solr.TextField:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap'
         is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regards Prakash -Original Message- From: Sascha Szott [mailto:sz...@zib.de] Sent: Monday, May 24, 2010 10:29 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Hi Prakash, more importantly, check the field type and its associated analyzer. In case you use a non-tokenized type (e.g., string), highlighting will not appear if only a partial field match exists (only exact matches, i.e. where the query coincides with the field value, will be highlighted). If that's not your intent, you should at least define a tokenizer for the field type. Best, Sascha Doddamani, Prakash wrote: Hey Darren, Yes, the fields I am searching on are stored and indexed, and they are returned from the query. Also it is not coming even if the entire search keyword is part of the field.
Thanks Prakash -Original Message- From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] Sent: Monday, May 24, 2010 9:32 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Check that the field you are highlighting on is stored. It won't work otherwise. Note that this also means the field is returned from the query. For large text fields that are only highlighted, this means the entire text is returned for each result. There is a pending feature to address this, which allows you to tell Solr to NOT return a specific field (to avoid unnecessary transfer of large text fields in this scenario). Darren Hi, I am using the dismax request handler. I wanted to highlight the search field, so I added <str name="hl">true</str>. I was expecting that if I search for the keyword "Akon", the resultant docs would show "Akon" in bold wherever it is available. But I am not seeing them getting bold; could someone tell me where I should tune? Passing hl=true explicitly does not work either. I have added the request handler:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^20.0 coming^5 playing^4 keywords^0.1</str>
    <str name="bf">rord(isclassic)^0.5 ord(listeners)^0.3</str>
    <str
Re: Faceted search not working?
Hi Birger, Birger Lie wrote: I don't think the boolean fields are mapped to on and off :) You can use "true" and "on" interchangeably. -Sascha -birger -Original Message- From: Ilya Sterin [mailto:ster...@gmail.com] Sent: 24. mai 2010 23:11 To: solr-user@lucene.apache.org Subject: Faceted search not working? I'm trying to perform a faceted search without any luck. The result set doesn't return any facet information... http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title I'm getting the result set, but no facet information is present. Is there something else that needs to happen to turn faceting on? I'm using the latest Solr 1.4 release. Data is indexed from the database using the dataimporter. Thanks. Ilya Sterin
Re: sort by field length
Hi Erick, Erick Erickson wrote: Are you sure you want to recompute the length when sorting? It's the classic time/space tradeoff, but I'd suggest that when your index is big enough that taking up some more space is a problem, it's far too big to spend the cycles calculating each term length for sorting purposes, considering you may be sorting all the terms in your index in the worst case. Good point, thank you for the clarification. I thought that Lucene internally stores the field length (e.g., in order to compute the relevance) and that getting this information at query time requires only a simple lookup. -Sascha But you could consider payloads for storing the length, although that would still be redundant... Best Erick On Mon, May 24, 2010 at 8:30 AM, Sascha Szott sz...@zib.de wrote: Hi folks, is it possible to sort by field length without having to (redundantly) save the length information in a separate index field? At first, I thought to accomplish this using a function query, but I couldn't find an appropriate one. Thanks in advance, Sascha
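A sketch of the redundant-field approach the thread converges on: compute the length on the client when building each document, store it in its own sortable field, and sort on that. The field names here are hypothetical:

```
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_length" type="sint" indexed="true" stored="false"/>
```

Queries then sort with sort=title_length asc (sint being the sortable integer type in the Solr 1.4 example schema). This trades a little index space for avoiding any per-query length computation.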
Re: Highlighting is not happening
Hi, to accomplish that, use the highlighting parameters hl.simple.pre and hl.simple.post. By the way, there are plenty of other parameters that affect highlighting. Take a look at: http://wiki.apache.org/solr/HighlightingParameters -Sascha Doddamani, Prakash wrote: Hey, I thought the highlights would happen in the fields of the documents returned from SOLR, but it gives a new list of highlighting below; sorry for the confusion. I was wondering, is there a way that the returned fields themselves contain bold characters? E.g., if searched for "query":

<doc>
  <str name="one">returned response which contains <b>query</b> should be bold</str>
</doc>

Regards Prakash -Original Message- From: Sascha Szott [mailto:sz...@zib.de] Sent: Monday, May 24, 2010 10:55 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Hi Prakash, can you provide 1. the definition of the relevant field 2. your query 3. the definition of the relevant request handler 4. a field value that is stored in your index and should be highlighted -Sascha Doddamani, Prakash wrote: Thanks Sascha, The fields I am searching are all of type "text", and I am using solr.TextField:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap'
         is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regards Prakash -Original Message- From: Sascha Szott [mailto:sz...@zib.de] Sent: Monday, May 24, 2010 10:29 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Hi Prakash, more importantly, check the field type and its associated analyzer. In case you use a non-tokenized type (e.g., string), highlighting will not appear if only a partial field match exists (only exact matches, i.e. where the query coincides with the field value, will be highlighted). If that's not your intent, you should at least define a tokenizer for the field type. Best, Sascha Doddamani, Prakash wrote: Hey Darren, Yes, the fields I am searching on are stored and indexed, and they are returned from the query. Also it is not coming even if the entire search keyword is part of the field.
Thanks Prakash -Original Message- From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] Sent: Monday, May 24, 2010 9:32 PM To: solr-user@lucene.apache.org Subject: Re: Highlighting is not happening Check that the field you are highlighting on is stored. It won't work otherwise. Note that this also means the field is returned from the query. For large text fields that are only highlighted, this means the entire text is returned for each result. There is a pending feature to address this, which allows you to tell Solr to NOT return a specific field (to avoid unnecessary transfer of large text fields in this scenario). Darren Hi, I am using the dismax request handler. I wanted to highlight the search field, so I added <str name="hl">true</str>. I was expecting that if I search for the keyword "Akon", the resultant docs would show "Akon" in bold wherever it is available. But I am not seeing them getting bold; could someone tell me where I should tune? Passing hl=true explicitly does not work either. I have added the request handler:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
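A hedged example of the parameters Sascha mentions, which make the highlighted fragments come back wrapped in bold tags (the hl.fl value is an assumption based on the qf fields in the thread):

```
hl=true&hl.fl=name&hl.simple.pre=<b>&hl.simple.post=</b>
```

The fragments still arrive in the separate highlighting section of the response, keyed by document id; the pre/post strings only control the markup inside those fragments, so the client must merge them into the displayed fields itself.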
Re: Apache or Nginx In front of SOLR?
Thanks Paul, I shall continue doing some more R&D with your inputs. Best Regards, Kranti K K Parisa On Tue, May 25, 2010 at 12:54 PM, Paul Dhaliwal subp...@gmail.com wrote: It depends on what kind of load you are talking about and what your expertise is. NGINX does perform better than Apache for most people, but fewer people know NGINX than Apache. If you have more than 100K searchers a day doing a few searches each, you will benefit from NGINX. If your traffic is lower and you know Apache better, Apache will do just fine. 2010/5/25 Kranti™ K K Parisa kranti.par...@gmail.com Dear All, Which is the best implementation in front of SOLR between Apache and NGINX? The main aspects would be 1. Ability to handle high loads -- They are both known to handle high loads just fine. 2. Resource utilization -- Apache uses more resources than NGINX under heavy load, but I am sure Apache can be tuned. 3. Caching (can we have caching implemented in front of solr? I did implement SOLR caching, but to the extent possible I would still reduce the calls to SOLR by having some caching in front of SOLR to serve the static pages whose data actually comes from SOLR) -- You probably want to look at a reverse proxy like Varnish or Squid. 4. Ability to record statistics like AWSTATS available for Apache. -- This shouldn't be a concern. You can even configure Tomcat or Jetty to log in Apache format. Please suggest your thoughts/ideas. Best Regards, Kranti K K Parisa Hope that helps, Paul Dhaliwal
Re: How well does Solr scale over large number of facet values?
With the uninverted algorithm it will be very fast no matter how many unique terms there are. But be careful with memory, because it uses quite a lot. Using the older facet algorithm, if you have a lot of different terms it will be slow. -- View this message in context: http://lucene.472066.n3.nabble.com/How-well-does-Solr-scale-over-large-number-of-facet-values-tp841508p841613.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with extended dismax, minus prefix (to mean NOT) and interaction with mm?
This looks like a case where the extended dismax parser is creating a Lucene QueryParser parsed query rather than a disjunction maximum query. A case of too much magic, maybe? Looks like this one should be parsed quite differently. Try dismax and see what you get; it'll be quite different. Erik On May 24, 2010, at 11:33 AM, Bill Dueber wrote: I'm running edismax (on both a 1.4 with patch and a branch_3x version) and I'm seeing something I don't expect. We have our mm set such that 2/2 must match and 2/3 must match (mm=2<-1 5<67%). A query of dog cat ...gets interpreted as dog AND cat But a query of dog cat -mouse ...gets interpreted as (dog AND cat) OR (dog AND NOT mouse) OR (cat AND NOT mouse) In other words, the -mouse is being interpreted as a single token (NOT mouse) to be counted for mm. I would expect the query to be interpreted as: (dog AND cat) AND (NOT mouse) Are my expectations out of whack? Or is this unexpected behavior? [I've pasted the debugQuery info for a similar search below, though I freely admit to not knowing how to read it] Any thoughts on what I'm seeing here?
-Bill

<lst name="debug">
  <str name="rawquerystring">dog cat -trilogy</str>
  <str name="querystring">dog cat -trilogy</str>
  <str name="parsedquery">allfields:dog allfields:cat -allfields:trilogi</str>
  <str name="parsedquery_toString">allfields:dog allfields:cat -allfields:trilogi</str>
  <lst name="explain">
    <str name="000107098">
      2.1741915 = (MATCH) sum of:
        1.2620605 = (MATCH) weight(allfields:dog in 3187), product of:
          0.7618881 = queryWeight(allfields:dog), product of:
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.08713264 = queryNorm
          1.6564907 = (MATCH) fieldWeight(allfields:dog in 3187), product of:
            1.7320508 = tf(termFreq(allfields:dog)=3)
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.109375 = fieldNorm(field=allfields, doc=3187)
        0.912131 = (MATCH) weight(allfields:cat in 3187), product of:
          0.64770865 = queryWeight(allfields:cat), product of:
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.08713264 = queryNorm
          1.4082427 = (MATCH) fieldWeight(allfields:cat in 3187), product of:
            1.7320508 = tf(termFreq(allfields:cat)=3)
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.109375 = fieldNorm(field=allfields, doc=3187)
    </str>
    <str name="36695">
      2.1518915 = (MATCH) sum of:
        1.249116 = (MATCH) weight(allfields:dog in 36426), product of:
          0.7618881 = queryWeight(allfields:dog), product of:
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.08713264 = queryNorm
          1.6395006 = (MATCH) fieldWeight(allfields:dog in 36426), product of:
            2.0 = tf(termFreq(allfields:dog)=4)
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.09375 = fieldNorm(field=allfields, doc=36426)
        0.9027756 = (MATCH) weight(allfields:cat in 36426), product of:
          0.64770865 = queryWeight(allfields:cat), product of:
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.08713264 = queryNorm
          1.3937988 = (MATCH) fieldWeight(allfields:cat in 36426), product of:
            2.0 = tf(termFreq(allfields:cat)=4)
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.09375 = fieldNorm(field=allfields, doc=36426)
    </str>
    <str name="38137">
      1.4345944 = (MATCH) sum of:
        0.832744 = (MATCH) weight(allfields:dog in 37852), product of:
          0.7618881 = queryWeight(allfields:dog), product of:
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.08713264 = queryNorm
          1.0930004 = (MATCH) fieldWeight(allfields:dog in 37852), product of:
            1.0 = tf(termFreq(allfields:dog)=1)
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.125 = fieldNorm(field=allfields, doc=37852)
        0.6018504 = (MATCH) weight(allfields:cat in 37852), product of:
          0.64770865 = queryWeight(allfields:cat), product of:
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.08713264 = queryNorm
          0.9291992 = (MATCH) fieldWeight(allfields:cat in 37852), product of:
            1.0 = tf(termFreq(allfields:cat)=1)
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.125 = fieldNorm(field=allfields, doc=37852)
    </str>
    <str name="000134898">
      1.2629167 = (MATCH) sum of:
        0.624558 = (MATCH) weight(allfields:dog in 30673), product of:
          0.7618881 = queryWeight(allfields:dog), product of:
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.08713264 = queryNorm
          0.8197503 = (MATCH) fieldWeight(allfields:dog in 30673), product of:
            1.0 = tf(termFreq(allfields:dog)=1)
            8.744003 = idf(docFreq=64, maxDocs=15)
            0.09375 = fieldNorm(field=allfields, doc=30673)
        0.6383587 = (MATCH) weight(allfields:cat in 30673), product of:
          0.64770865 = queryWeight(allfields:cat), product of:
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.08713264 = queryNorm
          0.9855646 = (MATCH) fieldWeight(allfields:cat in 30673), product of:
            1.4142135 = tf(termFreq(allfields:cat)=2)
            7.4335938 = idf(docFreq=240, maxDocs=15)
            0.09375 = fieldNorm(field=allfields, doc=30673)
    </str>
    <str name="29964">
      1.25527 = (MATCH) sum
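The mm expression in the message above uses edismax's conditional syntax, which HTML stripping mangles easily because it contains < characters. Spelled out:

```
mm=2<-1 5<67%
```

reading: for one or two optional clauses, all are required; for three to five clauses, all but one are required; for more than five clauses, 67% are required. That matches the poster's "2/2 must match and 2/3 must match" description.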
Re: How well does Solr scale over large number of facet values?
Since Solr 1.4 I think the uninverted method is on by default. Anyway, you can choose which to use with the method param: facet.method=fc/enum (where fc is the uninverted one) http://wiki.apache.org/solr/SimpleFacetParameters -- View this message in context: http://lucene.472066.n3.nabble.com/How-well-does-Solr-scale-over-large-number-of-facet-values-tp841508p841683.html
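Selecting the algorithm per request, per the SimpleFacetParameters wiki page linked above (the field name group is carried over from the original question):

```
facet=true&facet.field=group&facet.method=fc
facet=true&facet.field=group&facet.method=enum
```

fc (the uninverted method) is the fast, memory-hungry choice for fields with many distinct values; enum iterates term by term and tends to suit fields with few distinct values.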
Using solrJ to get all fields in a particular schema/index
Hi, Is there any way to get all the fields (irrespective of whether they contain a value or null) in a SolrDocument? Or is there any way to get all the fields in the schema.xml of the URL ( http://localhost:8983/solr/core0/ )? Regards, Raakhi
Re: Using solrJ to get all fields in a particular schema/index
To retrieve all documents, you need to use the query/filter *FieldName:*:** Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Is there any way to get all the fields (irrespective of whether they contain a value or null) in a SolrDocument? Or is there any way to get all the fields in the schema.xml of the URL ( http://localhost:8983/solr/core0/ )? Regards, Raakhi
Re: How real-time are Solr/Lucene queries?
How many docs are in the batch you are pulling down? How many docs/second do you expect on the index side? How big are the docs? What do you expect in terms of queries per second? How fast do new documents need to be available on the local server? How much analysis do you have to do? Also, define Real Time. You'd be surprised at the number of people I talk to who think they need Real Time, but then when you ask them questions like I just did, they don't really need it. I've seen Solr turn around new docs in as little as 30 seconds on commodity hardware w/o any special engineering effort, and I've seen it faster than that with some engineering effort. That isn't necessarily possible for every application, but... Despite the other suggestions, what you describe still looks feasible to me in Solr, pending the questions above (and some followups). On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote: Thanks for the new information. It's really great to see so many options for Lucene. In my scenario there are the following pieces: 1 - A local Java client with an embedded Solr instance and its own local index/s. 2 - A remote server running Solr with index/s that are more like a repository that local clients query for extra goodies. 3 - The client is also a JXTA node so it can share indexes or documents too. 4 - There is no browser involved whatsoever. My music composing application is a local client that uses configurations which would become many different document types. A subset of these configurations will be bundled with the application, and then many more would be made available via a server/s running Solr. I would not expect the queries made from within the local client to be returned in real-time. I would only expect such queries to be made in reasonable time and returned to the client.
The client would have its local Lucene index system (embedded Solr using SolrJ) which would be updated with the results of the query made to the Solr instance running on the remote server. Then the user on the client would issue queries to the local Lucene index/s to obtain results which are used to setup contexts for different aspects of the client. For example: an activated context for musical scales and rhythms used for creating musical notes, an activated context for rendering with layout and style information for different music symbol renderer types. I'm not yet sure but it may be best to make queries against the local Lucene index/s and then convert the results into some context objects, maybe an array or map (I'd like to learn more about how query results can be returned as arrays or maps as well). Then the tools and renderers which require the information in the contexts would do any real-time lookup directly from the context objects not the local or remote Lucene or Solr index/s. The local client is also a JXTA node so it can share its own index/s with fellow peers. This is how I envision this happening with my limited knowledge of Lucene/Solr at this time. What are your thoughts on the feasibility of such a scenario? I'm just reading through the Solr reference PDF now and looking over the Solr admin application. Looking at the Schema.xml it seems to be field not document oriented. From my point of view I think in terms of configuration types which would be documents. In the schema it seems like only fields are defined and it does not matter which configuration/document they belong to? I guess this is fine as long as the indexing takes into account my unique document types and I can search for them as a whole as well, not only for specific values across a set of indexed documents. Also, does the schema allow me to index certain documents into specific indexes or are they all just bunched together? 
I'd rather have unique indexes for specific document types. I've just read about multiple cores running under one Solr instance, is this the only way to support multiple indexes? I'm thinking of ordering the Lucene in Action v2 book which is due this month and also the Solr 1.4 book. Before I do I just need to understand a few things which is why I'm writing such a long message :-) Thom On 2010-05-21, at 2:12 AM, Ben Eliott wrote: Further to earlier note re Lucandra. I note that Cassandra, which Lucandra backs onto, is 'eventually consistent', so given your real-time requirements, you may want to review this in the first instance, if Lucandra is of interest. On 21 May 2010, at 06:12, Walter Underwood wrote: Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time. I work at MarkLogic, and we have a real-time transactional search engine (and respository). If you are curious, contact me directly. I do like Solr for lots of applications -- I chose it when I was at
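On the multiple-indexes question: multicore is the standard way to keep separate indexes under one Solr instance. A minimal solr.xml sketch in the Solr 1.4 format (the core names are hypothetical, loosely matching the document types described above):

```
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="scales" instanceDir="scales"/>
    <core name="renderers" instanceDir="renderers"/>
  </cores>
</solr>
```

Each core then has its own schema.xml and index directory and is addressed independently, e.g. /solr/scales/select, so document types never get "bunched together" unless you want them to be.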
Re: Using solrJ to get all fields in a particular schema/index
Resending it as there is a typo error. To retrieve all documents, you need to use the query/filter FieldName:*:* . Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:29 PM, findbestopensource findbestopensou...@gmail.com wrote: To reterive all documents, You need to use the query/filter *FieldName:*:* * Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Is there any way to get all the fields (irrespective of whether they contain a value or null) in a SolrDocument? Or is there any way to get all the fields in the schema.xml of the URL ( http://localhost:8983/solr/core0/ )? Regards, Raakhi
Re: Machine utilization while indexing
Hi all, I did some further investigation and (after turning off some filters in YourKit) found that it was actually the machine sending the files to Solr that was slowing things down. At first I couldn't find this, as it turned out that YourKit hides org.apache.* classes. When I removed this filter, it turned out that at least 50% of the CPU time was taken by org.apache.solr.client.solrj.util.ClientUtils.writeXML(SolrInputDocument, Writer). This was taking so much time that the commit queues were filling up on the client side instead of the Solr server. I have now switched back to my custom BlockingQueue with multiple CommonsHttpSolrServers that use the BinaryRequestWriter, and I'm now able to index 80 documents in 8 minutes (including optimize), and 2.9 million documents in 32 minutes (incl. optimize). As the StreamingUpdateSolrServer only supports XML, I can't use that. So now I wonder why BinaryRequestWriter (and BinaryUpdateRequestHandler) aren't turned on by default (esp. considering some threads on the dev-list some time ago about setting a default schema for optimum performance). Also, finding out about this performance enhancement wasn't easy, as it's hardly mentioned on the wiki. I'll see if I can update this. Thanks for all the advice and esp. the great work on Solr/Lucene. Thijs On 20-5-2010 21:34, Chris Hostetter wrote: : StreamingUpdateSolrServer already has multiple threads and uses multiple : connections under the covers. At least the api says ' Uses an internal Hmmm... I think one of us misunderstands the point behind StreamingUpdateSolrServer and its internal threads/queues.
(it's very possible that it's me) my understanding is that this allows it to manage the batching of multiple operations for you, reusing connections as it goes -- so the queueSize is how many individual requests it buffers before sending the batch to Solr, and the threadCount controls how many batches it can send in parallel (in the event that one thread is still waiting for the response when the queue next fills up). But if you are only using a single thread to feed SolrRequests to a single instance of StreamingUpdateSolrServer, then there can still be lots of opportunities for Solr itself to be idle -- as I said, it's not clear to me if you are using multiple threads to write to your StreamingUpdateSolrServer ... even if you reuse the same StreamingUpdateSolrServer instance, multiple threads in your client code may increase the throughput (assuming that at the moment the threads in StreamingUpdateSolrServer are largely idle). But as I said ... this is all mostly a guess. I'm not intimately familiar with solrj. -Hoss
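The queue-and-batch pattern discussed in this thread can be sketched language-neutrally. The following is a minimal stand-in (not SolrJ), where send_batch plays the role of one HTTP update request and the bounded queue mirrors the queueSize buffering described above:

```python
# Sketch of the buffer-then-batch idea behind StreamingUpdateSolrServer:
# a bounded queue absorbs documents from the producer while a worker
# thread drains them and ships each full batch as one "request".
import queue
import threading

def run_indexer(docs, batch_size, send_batch):
    """Feed docs through a bounded queue; a worker groups them into
    batches of batch_size and hands each batch to send_batch."""
    q = queue.Queue(maxsize=batch_size * 2)  # back-pressure on the producer

    def worker():
        batch = []
        while True:
            doc = q.get()
            if doc is None:          # sentinel: no more documents
                break
            batch.append(doc)
            if len(batch) >= batch_size:
                send_batch(list(batch))  # one "update request" per batch
                batch.clear()
        if batch:                    # flush the final partial batch
            send_batch(list(batch))

    t = threading.Thread(target=worker)
    t.start()
    for d in docs:
        q.put(d)                     # blocks when the buffer is full
    q.put(None)
    t.join()
```

With more worker threads (the threadCount knob in the SolrJ class), several batches could be in flight at once; the single-worker version keeps document order, which the multi-threaded variant would not.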
Re: Using solrJ to get all fields in a particular schema/index
Hi Aditya, I can retrieve all documents, but cannot retrieve all the fields in a document (if it does not have any value). For example, I get a list of documents; some of the documents have some value for the title field, and others might not contain a value for the title field. In any case I need to get the entry for title in getFieldNames(). How do I go about that? Regards, Raakhi On Tue, May 25, 2010 at 5:07 PM, findbestopensource findbestopensou...@gmail.com wrote: Resending it as there is a typo error. To retrieve all documents, you need to use the query/filter FieldName:*:* . Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:29 PM, findbestopensource findbestopensou...@gmail.com wrote: To retrieve all documents, you need to use the query/filter *FieldName:*:* * Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Is there any way to get all the fields (irrespective of whether it contains a value or null) in SolrDocument? Or is there any way to get all the fields in schema.xml of the URL link ( http://localhost:8983/solr/core0/ )? Regards, Raakhi
Re: Using solrJ to get all fields in a particular schema/index
If a field doesn't have a value, you will get NULL on retrieving it. How could you expect a value for a field which is not provided? You have two options; choose either one: 1. If the field value returned is NULL, display a proper error / user-defined message, i.e. handle the error. 2. Add a dummy value, say NO_VALUE, to any title field which doesn't have a value. Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 5:20 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi Aditya, I can retrieve all documents, but cannot retrieve all the fields in a document (if it does not have any value). For example, I get a list of documents; some of the documents have some value for the title field, and others might not contain a value for the title field. In any case I need to get the entry for title in getFieldNames(). How do I go about that? Regards, Raakhi On Tue, May 25, 2010 at 5:07 PM, findbestopensource findbestopensou...@gmail.com wrote: Resending it as there is a typo error. To retrieve all documents, you need to use the query/filter FieldName:*:* . Regards Aditya www.findbestopensource.com On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, Is there any way to get all the fields (irrespective of whether it contains a value or null) in SolrDocument? Or is there any way to get all the fields in schema.xml of the URL link ( http://localhost:8983/solr/core0/ )? Regards, Raakhi
Re: Tagging and excluding Filters
On 25.05.2010, at 08:55, Lukas Kahwe Smith wrote: Now when I deselect one of the checkboxes I add an fq parameter: facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)&rows=21 {!tag=dt}organisation_id:(-8) Now where I am at a loss is when I want to filter in multiple different sections (like filtering both organisations as well as clause information type). I tried various ways of constructing the fq parameter but I always get a parse error: {!tag=dt}(organisation_id:(-8) AND information_type_id:(-1)) {!tag=dt}organisation_id:(-8) AND {!tag=dt}information_type_id:(-1) For example: Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'organisation_id:(-9) AND {!tag=dt}information_type_id:(-1)': Encountered "}" at line 1, column 35. When running: facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)+AND+{!tag%3Ddt}information_type_id:(-1)&rows=21 The following syntax seems to do what I want: {!tag=dt}!(organisation_id:(8) OR information_type_id:(2)) regards, Lukas Kahwe Smith m...@pooteeweet.org
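A side note on the parse error above: the {!tag=dt} local-params prefix is only recognized at the very start of a parameter value, which is why it fails mid-query. Since multiple fq parameters are intersected, sending one tagged negative fq per field (-A and -B) should be equivalent to the single !(A OR B) filter by De Morgan, and keeps local params at the front of each value. A small stdlib-only sketch of building such a request URL (the host, field names, and values are placeholders, not from any real deployment):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FqBuilder {
    // Each filter gets its own fq parameter; the {!tag=dt} local-params
    // prefix must be the very first thing in the parameter value.
    static String fqParam(String filter) throws UnsupportedEncodingException {
        return "fq=" + URLEncoder.encode("{!tag=dt}" + filter, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Two separate tagged negative filters instead of one combined query
        String url = "http://localhost:8080/solr/select?q="
                + URLEncoder.encode("tag_ids:(23)", "UTF-8")
                + "&" + fqParam("-organisation_id:8")
                + "&" + fqParam("-information_type_id:1");
        System.out.println(url);
    }
}
```

Both fq values carry the same tag, so a single facet.field={!ex=dt}... exclusion lifts both of them at facet-counting time.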
Re: sort by field length
Ah, I may have misunderstood, I somehow got it in my mind you were talking about the length of each term (as in string length). But if you're looking at the field length as the count of terms, that's another question, sorry for the confusion... I have to ask, though, why do you want to sort this way? The relevance calculations already factor in both term frequency and field length. What's the use-case for sorting by field length given the above? Best Erick On Tue, May 25, 2010 at 3:40 AM, Sascha Szott sz...@zib.de wrote: Hi Erick, Erick Erickson wrote: Are you sure you want to recompute the length when sorting? It's the classic time/space tradeoff, but I'd suggest that when your index is big enough to make taking up some more space a problem, it's far too big to spend the cycles calculating each term length for sorting purposes, considering you may be sorting all the terms in your index worst-case. Good point, thank you for the clarification. I thought that Lucene internally stores the field length (e.g., in order to compute the relevance) and getting this information at query time requires only a simple lookup. -Sascha But you could consider payloads for storing the length, although that would still be redundant... Best Erick On Mon, May 24, 2010 at 8:30 AM, Sascha Szott sz...@zib.de wrote: Hi folks, is it possible to sort by field length without having to (redundantly) save the length information in a separate index field? At first, I thought to accomplish this using a function query, but I couldn't find an appropriate one. Thanks in advance, Sascha
Re: Faceted search not working?
Is the FacetComponent loaded at all? <requestHandler name="standard" class="solr.SearchHandler" default="true"> <arr name="components"> <str>query</str> <str>facet</str> </arr> </requestHandler> On 2010-05-25, at 3:32 AM, Sascha Szott wrote: Hi Birger, Birger Lie wrote: I don't think the boolean fields are mapped to "on" and "off" :) You can use "true" and "on" interchangeably. -Sascha -birger -----Original Message----- From: Ilya Sterin [mailto:ster...@gmail.com] Sent: 24. mai 2010 23:11 To: solr-user@lucene.apache.org Subject: Faceted search not working? I'm trying to perform a faceted search without any luck. The result set doesn't return any facet information... http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title I'm getting the result set, but no facet information present? Is there something else that needs to happen to turn faceting on? I'm using the latest Solr 1.4 release. Data is indexed from the database using the dataimporter. Thanks. Ilya Sterin
question about indexing...
I have a task: I must index a lot of e-mails, so I will create a script to generate an XML file of the mails. Now the question is what happens when I create a field "body" and a lot of text like this ends up in that field: Confidentiality Caution: This message and all its included content and assets are confidential and for the individual use of the entity to whom it is sent only. If you, the reader of this message, have received this communication in error please notify me about this immediately, by return address, and delete the message and its assets. Thank you. By the way: your footer seems to be missing an "r" (Headqua_r_ter). Snapt Pty Ltd: Stephan Plesnik wrote: Headquarters: This e-mail, including all files transmitted with it, is confidential and is intended solely for the use of the person or company to whom it is addressed. If you have received this e-mail in error, please notify our system administrator (serv...@plesnik.de). This e-mail has been checked for the absence of computer viruses. --- Or: hallo Mr. xy thanks for greats dear Mr. xyz I think it doesn't work! How can I make it so that each of these content inputs goes into Solr?
Re: IndexSearcher and Caches
Chris, I am using SolrIndexSearcher to get a handle on the total number of records in the index. I am doing it like this: int num = Integer.parseInt(solrSearcher.getStatistics().get("numDocs").toString()); Please let me know if there is a better way to do this. Mark, I can tell you what I do in my application. We provide a tool to do the index update and assume that the user will always use it to create/update the index. Whenever an update happens, we notify the querying application and it creates a new instance of SolrCore, SolrServer etc. These continue to be shared across multiple users (as statics) till the next update happens. Thank you. Regards Rahul On Tue, May 25, 2010 at 4:18 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Thank you. I found the API to get the existing SolrIndexSearcher to be : present in SolrCore: : SolrCore.getSearcher().get() I think perhaps you need to take 5 big steps back and explain what your goal is. 99.999% of all solr users should never care about that method -- even the 99.9% of the folks writing java code and using EmbeddedSolr should never ever have a need to call those -- so what exactly is it you are doing, and how did you get along the path you find yourself on? this thread started with some fairly innocuous questions about how caches worked in regards to new searchers -- which is all fine and dandy, those are concepts that solr users should be aware of ... in the abstract. you should almost never be instantiating those IndexSearchers or Caches yourself. Stick with the SolrServer abstraction provided by SolrJ... http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html -Hoss
Re: Faceted search not working?
Hi, please note that the FacetComponent is one of the six search components that are automatically associated with solr.SearchHandler (this holds also for the QueryComponent). Another note: by using <arr name="components"> all default components will be replaced by the components you explicitly mention (i.e., QueryComponent and FacetComponent in your example). To avoid this, use <arr name="last-components"> instead. -Sascha Jean-Sebastien Vachon wrote: Is the FacetComponent loaded at all? <requestHandler name="standard" class="solr.SearchHandler" default="true"> <arr name="components"> <str>query</str> <str>facet</str> </arr> </requestHandler> On 2010-05-25, at 3:32 AM, Sascha Szott wrote: Hi Birger, Birger Lie wrote: I don't think the boolean fields are mapped to "on" and "off" :) You can use "true" and "on" interchangeably. -Sascha
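To illustrate Sascha's point, a handler that keeps all the default components and only appends extras would use last-components. A sketch — myCustomComponent is a hypothetical component name, not from the thread:

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- the default components (query, facet, mlt, highlight, stats, debug)
       stay registered; anything listed here runs after them -->
  <arr name="last-components">
    <str>myCustomComponent</str>
  </arr>
</requestHandler>
```

With this form, faceting keeps working because FacetComponent is never displaced from the default chain.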
Re: question about indexing...
Well, you'll just have to create valid XML, either encoding some characters or using CDATA sections. Erik On May 25, 2010, at 10:06 AM, Jörg Agatz wrote: I must index a lot of e-mails, so I will create a script to generate an XML file of the mails. Now the question is what happens when I create a field "body" and a lot of text like confidentiality footers ends up in that field. I think it doesn't work! How can I make it so that each of these content inputs goes into Solr?
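Erik's first option — encoding the special characters — can be done with a few string replacements. A minimal stdlib sketch (not any particular library's escaper; replacing & first matters so the other entities aren't double-escaped):

```java
public class XmlEscape {
    // Escape the five predefined XML entities; enough to embed an
    // arbitrary email body as text inside a <field> element.
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escape("Dear Mr. <xyz> & friends"));
        // → Dear Mr. &lt;xyz&gt; &amp; friends
    }
}
```

CDATA is the alternative when you'd rather not touch the body at all, with the one caveat that a literal "]]>" inside the body would still need special handling.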
Re: question about indexing...
Ok, done... But now I don't find any word in the CDATA field. I did: <field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field> It is a string field, multivalued. King
Re: question about indexing...
You have to provide more details than that. We need to know the field definition for that named field, the corresponding field type definition, and the exact request you're making to Solr that you think should find this document. And most importantly, did you commit? :) Erik On May 25, 2010, at 11:22 AM, Jörg Agatz wrote: Ok, done... But now I don't find any word in the CDATA field. I did: <field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field> It is a string field, multivalued. King
Re: question about indexing...
I created a new index, but nothing changed. <field name="COMMENT" type="string" indexed="true" stored="true" multiValued="true"/> <field name="COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field> I search for *:* and I find it; if I search for hallo, Hallo, hallo*, Hallo* or some other content from the CDATA field, I don't.
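The symptom above (*:* matches but individual words don't) is exactly what an unanalyzed field type produces: the string type indexes the whole value as one token, so only an exact match on the entire comment would hit. A likely fix, assuming the analyzed "text" fieldType from the example schema.xml is available in this schema, is to switch the field to a tokenized type and reindex:

```xml
<!-- schema.xml: use an analyzed type so individual words are indexed;
     "text" here is the tokenized fieldType shipped with the example schema -->
<field name="COMMENT" type="text" indexed="true" stored="true" multiValued="true"/>
```

After reindexing and a commit, a query like COMMENT:hallo should match (the example "text" type also lowercases, so Hallo and hallo behave the same).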
Re: caching on unique queries
: Pretty much every one of my queries is going to be unique. However, the : query is fairly complex and also contains both unique and non-unique : data. In the query, some fields will be unique (e.g description), but : other fields will be fairly common (e.g. category). If we could use : those common fields as filters, it would be easy to use the filter : cache. I could just separate the filters and let the filter cache do its : thing. Unfortunately, due to the nature of our application, pretty much : every field is just a boost. ... : Is there anyway to cache part of the query? Or basically cache : subqueries? I have my own request handler, so I am willing to write the : necessary code. I am fearful that the best performance may be to just : turn off caching. One thing a custom plugin could possibly do in cases like this is to use the filterCache to cache the DocSets corresponding to the re-used portions of your queries (on a granular level) and then wrap those DocSets in a Query facade to build them up into a big BooleanQuery -- which you should explicitly make sure Solr does not cache (because the Query objects will be *huge* if they contain all those DocSets wrapped up) Note: i did this once a *long* time ago and it worked out ok, but this may be a lot harder now that we have per segment searching/scoring -- i'm not sure that the Query/Scorer API gives you everything you need to be able to return segment based docIds from a global DocSet. -Hoss
Help me understand query syntax of subqueries
Any idea why this query returns 0 records: "sexual assault" AND (-obama) while this one returns 1400? "sexual assault" AND -(obama) Some debug info: "sexual assault" AND (-obama) translates to +text:"sexual assault" +(-text:obama) and returns 0 records; "sexual assault" AND -(obama) translates to +text:"sexual assault" -text:obama and returns 1400 records; "sexual assault" AND obama translates to +text:"sexual assault" +text:obama and returns 53 records; (-obama) translates to -text:obama and returns 716295 records; -(obama) translates to -text:obama and returns 716295 records. I am using Solr 1.4, qt=standard, QParser: LuceneQParser. Thanks, Mis
Re: How real-time are Solr/Lucene queries?
The main issue is if you're using facets, which are currently inefficient for the realtime use case because they're created on the entire set of segment/readers. Field caches in Lucene are per segment and so don't have this problem. On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll gsing...@apache.org wrote: How many docs are in the batch you are pulling down? How many docs/second do you expect on the index size? How big are the docs? What do you expect in terms of queries per second? How fast do new documents need to be available on the local server? How much analysis do you have to do? Also, define Real Time. You'd be surprised at the number of people I talk to who think they need Real Time, but then when you ask them questions like I just did, they don't really need it. I've seen Solr turn around new docs in as little as 30 seconds on commodity hardware w/o any special engineering effort and I've seen it faster than that with some engineering effort. That isn't necessarily possible for every application, but... Despite the other suggestions, what you describe still looks feasible to me in Solr, pending the questions above (and some followups). On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote: Thanks for the new information. It's really great to see so many options for Lucene. In my scenario there are the following pieces: 1 - A local Java client with an embedded Solr instance and its own local index/s. 2 - A remote server running Solr with index/s that are more like a repository that local clients query for extra goodies. 3 - The client is also a JXTA node so it can share indexes or documents too. 4 - There is no browser involved whatsoever. My music composing application is a local client that uses configurations which would become many different document types. A subset of these configurations will be bundled with the application and then many more would be made available via a server/s running Solr.
I would not expect the queries which would be made from within the local client to be returned in real-time. I would only expect such queries to be made in reasonable time and returned to the client. The client would have its local Lucene index system (embedded Solr using SolrJ) which would be updated with the results of the query made to the Solr instance running on the remote server. Then the user on the client would issue queries to the local Lucene index/s to obtain results which are used to setup contexts for different aspects of the client. For example: an activated context for musical scales and rhythms used for creating musical notes, an activated context for rendering with layout and style information for different music symbol renderer types. I'm not yet sure but it may be best to make queries against the local Lucene index/s and then convert the results into some context objects, maybe an array or map (I'd like to learn more about how query results can be returned as arrays or maps as well). Then the tools and renderers which require the information in the contexts would do any real-time lookup directly from the context objects not the local or remote Lucene or Solr index/s. The local client is also a JXTA node so it can share its own index/s with fellow peers. This is how I envision this happening with my limited knowledge of Lucene/Solr at this time. What are your thoughts on the feasibility of such a scenario? I'm just reading through the Solr reference PDF now and looking over the Solr admin application. Looking at the Schema.xml it seems to be field not document oriented. From my point of view I think in terms of configuration types which would be documents. In the schema it seems like only fields are defined and it does not matter which configuration/document they belong to? 
I guess this is fine as long as the indexing takes into account my unique document types and I can search for them as a whole as well, not only for specific values across a set of indexed documents. Also, does the schema allow me to index certain documents into specific indexes or are they all just bunched together? I'd rather have unique indexes for specific document types. I've just read about multiple cores running under one Solr instance, is this the only way to support multiple indexes? I'm thinking of ordering the Lucene in Action v2 book which is due this month and also the Solr 1.4 book. Before I do I just need to understand a few things which is why I'm writing such a long message :-) Thom On 2010-05-21, at 2:12 AM, Ben Eliott wrote: Further to earlier note re Lucandra. I note that Cassandra, which Lucandra backs onto, is 'eventually consistent', so given your real-time requirements, you may want to review this in the first instance, if Lucandra is of interest. On 21 May 2010, at 06:12, Walter Underwood wrote: Solr is a very good engine, but it is not
Re: Problem with extended dismax, minus prefix (to mean NOT) and interaction with mm?
: I'm running edismax (on both a 1.4 with patch and a branch_3x version) and : I'm seeing something I don't expect. ... : <str name="rawquerystring">dog cat -trilogy</str> : <str name="querystring">dog cat -trilogy</str> : <str name="parsedquery">allfields:dog allfields:cat : -allfields:trilogi</str> : <str name="parsedquery_toString">allfields:dog allfields:cat : -allfields:trilogi</str> Hmmm... something is really odd here -- are you sure you are using edismax as your query parser? ... because with Solr 1.4 and edismax you should still be seeing DisjunctionMaxQuery show up in your parsedquery output. that said: i can definitely confirm that i see a discrepancy between dismax and edismax in how they deal with the mm param (on trunk and in 1.4)... MM is ignored... http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2 MM is used... http://localhost:8983/solr/select?debugQuery=true&defType=dismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2 ...the negative clause definitely seems to be what triggers it. I've added this to SOLR-1553 (edismax is still considered an open issue, it was only experimental in Solr 1.4) -Hoss
Does SOLR provide a java class to perform url-encoding
I would like to leverage whatever SOLR provides to properly url-encode a search string. For example a user enters: "mr. bill" oh no The URL submitted by the admin page is: http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= Since the admin page uses it I would imagine that this functionality is there, but I'm having some trouble finding it. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842660.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Does SOLR provide a java class to perform url-encoding
Java provides one. You probably want to use utf-8 as the encoding scheme. http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html Note you also will want to strip or escape characters that are meaningful in the Solr/Lucene query syntax. http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters -Sean On 5/25/2010 1:20 PM, JohnRodey wrote: I would like to leverage whatever SOLR provides to properly url-encode a search string. For example a user enters: "mr. bill" oh no The URL submitted by the admin page is: http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= Since the admin page uses it I would imagine that this functionality is there, but I'm having some trouble finding it.
Re: How real-time are Solr/Lucene queries?
My documents are all quite small if not downright tiny, there is not much analysis to do. I plan to mainly use Solr for indexing application configuration data which there is a lot of and which I have all pre-formatted. Since it is a music application there are many score templates, scale and rhythm strings, notation symbol skins, etc. Then there are slightly more usual things to index like application help pages and tutorials. In terms of queries per second there will be a lot being fired by our painter. In our application data is flowing into a painter who in turn delegates specific painting tasks to renderer objects. These renderer objects then make many queries extremely fast to the embedded Solr indexes for data they need, such as layout and style values. Believe me there is a lot of detailed data involved in music notation and abstracting it into configurations in the form of index documents is a good way to manage such data. Further, the data in the form of documents works as a form of plugins so that alternate configurations for different notation types can be added to the index. Then via simple search it is possible to dial up a certain set of documents which contain all the details of a given notation. Meanwhile the renderer objects remain generic and are just reconfigured with the different indexed configuration documents. Will making many fast queries from renderers to an embedded local Solr index slow my painting down? Thom On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote: How many docs are in the batch you are pulling down? How many docs/second do you expect on the index size? How big are the docs? What do you expect in terms of queries per second? How fast do new documents need to be available on the local server? How much analysis do you have to do? Also, define Real Time. You'd be surprised at the number of people I talk to who think they need Real Time, but then when you ask them questions like I just did, they don't really need it.
I've seen Solr turn around new docs in as little as 30 seconds on commodity hardware w/o any special engineering effort and I've seen it faster than that with some engineering effort. That isn't necessarily possible for every application, but... Despite the other suggestions, what you describe still looks feasible to me in Solr, pending the questions above (and some followups). On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote: Thanks for the new information. Its really great to see so many options for Lucene. In my scenario there are the following pieces: 1 - A local Java client with an embedded Solr instance and its own local index/s. 2 - A remote server running Solr with index/s that are more like a repository that local clients query for extra goodies. 3 - The client is also a JXTA node so it can share indexes or documents too. 4 - There is no browser involved what so ever. My music composing application is a local client that uses configurations which would become many different document types. A subset of these configurations will be bundled with the application and then many more would be made available via a server/s running Solr. I would not expect the queries which would be made from within the local client to be returned in real-time. I would only expect such queries to be made in reasonable time and returned to the client. The client would have its local Lucene index system (embedded Solr using SolrJ) which would be updated with the results of the query made to the Solr instance running on the remote server. Then the user on the client would issue queries to the local Lucene index/s to obtain results which are used to setup contexts for different aspects of the client. For example: an activated context for musical scales and rhythms used for creating musical notes, an activated context for rendering with layout and style information for different music symbol renderer types. 
I'm not yet sure but it may be best to make queries against the local Lucene index/s and then convert the results into some context objects, maybe an array or map (I'd like to learn more about how query results can be returned as arrays or maps as well). Then the tools and renderers which require the information in the contexts would do any real-time lookup directly from the context objects not the local or remote Lucene or Solr index/s. The local client is also a JXTA node so it can share its own index/s with fellow peers. This is how I envision this happening with my limited knowledge of Lucene/Solr at this time. What are your thoughts on the feasibility of such a scenario? I'm just reading through the Solr reference PDF now and looking over the Solr admin application. Looking at the Schema.xml it seems to be field not document oriented. From my point of view I think in terms of configuration types which would be documents. In the schema
Re: Using solrJ to get all fields in a particular schema/index
: Is there any way to get all the fields (irrespective of whether : it contains a value or null) in SolrDocument? no. a document only has Field instances for the fields which it has values for. it's also not a feature that would even be theoretically possible to add, because of dynamicFields. If you have even one single dynamicField declaration, then there is an infinite number of possible fields. : Is there any way to get all the fields in schema.xml of the url link ( : http://localhost:8983/solr/core0/ )? Take a look at http://localhost:8983/solr/core0/admin/luke?show=schema ... it can programmatically return details about the schema (including all the fields and dynamicFields) to your application. -Hoss
Re: Faceted search not working?
Sascha, thanks for the response, here is the output... <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">0</int> <lst name="params"> <str name="wt">xml</str> <str name="q">title:*</str> <str name="fl">title</str> </lst> </lst> <result name="response" numFound="3" start="0"> <doc> <str name="title">Baseball game</str> </doc> <doc> <str name="title">Soccer game</str> </doc> <doc> <str name="title">Football game</str> </doc> </result> </response> On Mon, May 24, 2010 at 5:39 PM, Sascha Szott sz...@zib.de wrote: Hi Ilya, Ilya Sterin wrote: I'm trying to perform a faceted search without any luck. The result set doesn't return any facet information... http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title I'm getting the result set, but no facet information present? Is there something else that needs to happen to turn faceting on? No. What does http://localhost:8080/solr/select/?q=title:*&fl=title&wt=xml return? -Sascha
Re: Faceted search not working? (RESOLVED)
Ah, the issue was explicitly specifying components... <arr name="components"> <str>query</str> </arr> I don't remember changing this during the default install; commenting this out enabled the faceting search component. Thanks all for the help. Ilya On Tue, May 25, 2010 at 10:38 AM, Sascha Szott sz...@zib.de wrote: Hi, please note that the FacetComponent is one of the six search components that are automatically associated with solr.SearchHandler (this holds also for the QueryComponent). Another note: by using <arr name="components"> all default components will be replaced by the components you explicitly mention (i.e., QueryComponent and FacetComponent in your example). To avoid this, use <arr name="last-components"> instead. -Sascha
Re: Does SOLR provide a java class to perform url-encoding
Thanks Sean, that was exactly what I needed. One question though... how do I correctly retain the Solr-specific characters? I tried adding escape chars but URLEncoder doesn't seem to care about that. Example:

String s1 = "\"mr. bill\" oh n?";
String s2 = "\\\"mr. bill\\\" oh n\\?";
String encoded1 = URLEncoder.encode(s1, "UTF-8");
String encoded2 = URLEncoder.encode(s2, "UTF-8");
System.out.println(encoded1);
System.out.println(encoded2);

Output:

%22mr.+bill%22+oh+n%3F
%5C%22mr.+bill%5C%22+oh+n%5C%3F

Should I allow the URLEncoder to translate s1, then replace %22 with ", %3F with ?, and so on? Or is there a better way? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842744.html Sent from the Solr - User mailing list archive at Nabble.com.
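[Editor's note] The two outputs above are not in conflict: URL encoding is a transport-layer concern (the servlet container decodes %22 back to a quote before Solr's query parser sees it), while backslash-escaping is a Lucene query-syntax concern. A sketch separating the two layers — the character set in `luceneEscape` is an illustrative subset, and SolrJ's `ClientUtils.escapeQueryChars` is the supported way to do this:

```java
import java.net.URLEncoder;

public class EncodeDemo {
    // Escapes a subset of Lucene query-syntax characters with a backslash.
    // This is the QUERY layer: do it only when the character should match
    // literally rather than act as syntax.
    static String luceneEscape(String s) {
        return s.replaceAll("([\\\\+\\-!():^\\[\\]\"{}~*?])", "\\\\$1");
    }

    public static void main(String[] args) throws Exception {
        String phrase = "\"mr. bill\" oh n?";
        // TRANSPORT layer only: %22 is decoded back to '"' server-side,
        // so the query parser still sees a phrase query. No post-hoc
        // replacement of %22 etc. is needed.
        System.out.println(URLEncoder.encode(phrase, "UTF-8"));
        // Both layers: escape first if '?' should be a literal, then encode.
        System.out.println(URLEncoder.encode(luceneEscape("oh n?"), "UTF-8"));
    }
}
```

So the answer to the question above is: don't undo the encoder's work — the server undoes it for you; use backslash-escaping only to control how the query parser interprets characters.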
Re: SOLR-343 date facet mincount patch
Hoss, I was able to successfully apply the patch SOLR-343, but even after applying it, date facet mincount does not work. The relevant parts of the response are given below:

[responseHeader] => object(SolrObject)#107 (3) {
  [status] => int(0)
  [QTime] => int(4)
  [params] => object(SolrObject)#108 (18) {
    [facet.date.start] => string(17) "NOW/YEAR-200YEARS"
    [facet] => string(4) "true"
    [indent] => string(2) "on"
    [facet.date] => string(11) "r_take_date"
    [wt] => string(3) "xml"
    [f.Instrument.facet.mincount] => string(1) "1"
    [version] => string(3) "2.2"
    [rows] => string(2) "20"
    [f.r_take_date.facet.mincount] => string(1) "1"
    [f.Target_Audience.facet.mincount] => string(1) "1"
    [start] => string(1) "0"
    [q] => string(3) "*:*"
    [f.Language.facet.mincount] => string(1) "1"
    [f.Location.facet.mincount] => string(1) "1"
    [facet.field] => array(5) {
      [0] => string(4) "Type"
      [1] => string(10) "Instrument"
      [2] => string(8) "Language"
      [3] => string(8) "Location"
      [4] => string(15) "Target_Audience"
    }
    [facet.date.gap] => string(6) "+8YEAR"
    [f.Type.facet.mincount] => string(1) "1"
    [facet.date.end] => string(8) "NOW/YEAR"
  }
}

Facet info:

[facet_counts] => object(SolrObject)#130 (3) {
  [facet_queries] => object(SolrObject)#131 (0) { }
  [facet_fields] => object(SolrObject)#132 (5) {
    [Type] => object(SolrObject)#133 (26) { [Instrumental] => int(1673) [Vocal] => int(977) [Spoken] => int(38) [tenor vocal] => int(6) [baritone vocal] => int(4) [cornet] => int(4) [soprano vocal] => int(3) [bass vocal] => int(2) [flute] => int(2) [whistling] => int(2) [bagpipes] => int(1) [barrel piano] => int(1) [bass trombone] => int(1) [chimes] => int(1) [clarinet] => int(1) [contralto] => int(1) [euphonium] => int(1) [mandolin] => int(1) [piano] => int(1) [piccolo] => int(1) [saxophone] => int(1) [trombone] => int(1) [trumpet] => int(1) [violin] => int(1) [violoncello] => int(1) [xylophone] => int(1) }
    [Instrument] => object(SolrObject)#134 (13) { [ baritone horn] => int(54) [ bassoon] => int(54) [ cornets] => int(54) [ piccolo] => int(54) [ tubas] => int(54) [clarinets] => int(54) [ traps ] => int(39) [ trombones] => int(39) [ French horns] => int(33) [ horns] => int(21) [ oboe] => int(18) [ traps] => int(15) [ trombones ] => int(15) }
    [Language] => object(SolrObject)#135 (4) { [Italian] => int(43) [French] => int(13) [Polish] => int(8) [Spanish] => int(8) }
    [Location] => object(SolrObject)#136 (6) { [Camden, New Jersey [unconfirmed]] => int(1555) [Philadelphia, Pennsylvania [unconfirmed]] => int(979) [Camden, New Jersey] => int(101) [New York, New York] => int(29) [Camden, New Jersey. Church Bldg.] => int(19) [Philadelphia, Pennsylvania] => int(1) }
    [Target_Audience] => object(SolrObject)#137 (5) { [Scottish] => int(3) [Jewish] => int(2) [Bohemian (Czech)] => int(1) [Polish] => int(1) [Swedish] => int(1) }
  }
  [facet_dates] => object(SolrObject)#138 (1) {
    [r_take_date] => object(SolrObject)#139 (27) { [1810-01-01T00:00:00Z] => int(0) [1818-01-01T00:00:00Z] => int(0) [1826-01-01T00:00:00Z] => int(0) [1834-01-01T00:00:00Z] => int(0) [1842-01-01T00:00:00Z] => int(0) [1850-01-01T00:00:00Z] => int(0) [1858-01-01T00:00:00Z] => int(0) [1866-01-01T00:00:00Z] => int(0) [1874-01-01T00:00:00Z] => int(0) [1882-01-01T00:00:00Z] => int(0) [1890-01-01T00:00:00Z] => int(0) [1898-01-01T00:00:00Z] => int(0)
Re: SOLR-343 date facet mincount patch
Chris, please ignore the repeated response header, which was due to a typo in the previous message. ~Umesh -- View this message in context: http://lucene.472066.n3.nabble.com/Re-SOLR-343-date-facet-mincount-patch-tp789556p842863.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing stalls reads
This sounds like you have the same solrconfig for the slave and the master. You should turn off autoCommit on the slave; only the master should autoCommit. You should set up the ReplicationHandler, which moves index updates from the indexer to the query server. http://www.lucidimagination.com/search/document/CDRG_ch10_10.3.3.5?q=replication http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrReplication On Mon, May 24, 2010 at 2:33 AM, Manish N m1n...@live.com wrote: Hey, I'm using Solr 1.4. I have a master/slave setup and use the slave for all my read operations; commits are scheduled every 20 mins or every 1 docs. Now, I think the slave shouldn't build the index but fetch the ones created on the master, yet I see it creating indexes, during which all reads stall. I don't think that's common behavior -- or is there another way to stop this? Also, how do I stop the slave from removing the old indexes till autowarming is done? Is there a way to achieve this? Thnx n Regards, - Manish -- Lance Norskog goks...@gmail.com
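[Editor's note] The setup Lance describes might look like the following solrconfig.xml sketch, one stanza per box. The host name, port, and confFiles list are placeholders; the handler itself is the stock Solr 1.4 ReplicationHandler.

```xml
<!-- On the MASTER: publish a new index snapshot after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the SLAVE: pull completed snapshots instead of indexing locally;
     no autoCommit section should be present here. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:20:00</str>
  </lst>
</requestHandler>
```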
Solr read-only core
Is there a way to open a Solr index/core in read-only mode? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr read-only core
Hi, I'd guess there are two ways of doing this, but I've never seen any solrconfig.xml file with directives that explicitly disallow updates. You'd either put a proxy in front that won't allow any HTTP method other than GET and HEAD, or you could remove the update request handler from your solrconfig.xml file. I've never tried the latter, but I'd figure that without a request handler to accommodate updates, no updates can be made. Cheers, -----Original message----- From: Yao y...@ford.com Sent: Tue 25-05-2010 21:49 To: solr-user@lucene.apache.org; Subject: Solr read-only core Is there a way to open a Solr index/core in read-only mode? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html Sent from the Solr - User mailing list archive at Nabble.com.
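[Editor's note] The second option — removing the update handlers — would look like the following solrconfig.xml sketch. This is untested, as the poster notes; the handler class names are the ones shipped in the Solr 1.4 example config.

```xml
<!-- Commenting out the update handlers leaves no endpoint that can
     modify the index; reads via /select are unaffected. -->
<!--
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
-->
```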
Re: IndexSearcher and Caches
The stats.jsp page walks the internal JMX beans. It prints out the number of documents among other things. I would look at how that works instead of writing your own thing against the internal APIs. They may have changed from Solr 1.3 to 1.4 and will change further for 1.5 (4.0 is the new name?). On Tue, May 25, 2010 at 7:11 AM, Rahul R rahul.s...@gmail.com wrote: Chris, I am using SolrIndexSearcher to get a handle to the total number of records in the index. I am doing it like this:

int num = Integer.parseInt(solrSearcher.getStatistics().get("numDocs").toString());

Please let me know if there is a better way to do this. Mark, I can tell you what I do in my application. We provide a tool to do the index update and assume that the user will always use it to create/update the index. Whenever an update happens, we notify the querying application and it creates a new instance of SolrCore, SolrServer, etc. These continue to be shared across multiple users (as statics) till the next update happens. Thank you. Regards Rahul On Tue, May 25, 2010 at 4:18 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Thank you I found the API to get the existing SolrIndexSearcher to be : present in SolrCore: : SolrCore.getSearcher().get() I think perhaps you need to take 5 big steps back and explain what your goal is. 99.999% of all Solr users should never care about that method -- even the 99.9% of the folks writing Java code and using EmbeddedSolr should never ever have a need to call those -- so what exactly is it you are doing, and how did you get along the path you find yourself on? This thread started with some fairly innocuous questions about how caches work in regard to new searchers -- which is all fine and dandy; those are concepts that Solr users should be aware of ... in the abstract. You should almost never be instantiating those IndexSearchers or Caches yourself. Stick with the SolrServer abstraction provided by SolrJ...
http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html -Hoss -- Lance Norskog goks...@gmail.com
Re: question about indexing...
Change type="string" to type="text". This causes the field to be analyzed, and then searching on words finds the document. On Tue, May 25, 2010 at 8:34 AM, Jörg Agatz joerg.ag...@googlemail.com wrote: I created a new index, but nothing changed.

<field name="COMMENT" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field>

When I search for *:* I find it; when I search for hallo, Hallo, hallo*, Hallo*, or some other content from the CDATA field, it doesn't match. -- Lance Norskog goks...@gmail.com
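[Editor's note] Lance's fix, as a schema.xml sketch — "text" here refers to the analyzed field type shipped in the Solr 1.4 example schema:

```xml
<!-- "text" is tokenized and analyzed, so word-level queries such as
     q=COMMENT:hallo can match; "string" indexes the whole value as a
     single verbatim token, so only the exact full value matches. -->
<field name="COMMENT" type="text" indexed="true" stored="true" multiValued="true"/>
```

The documents must be re-indexed after this change for it to take effect.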
Enhancing Solr relevance functions through predefined constants
Hi all, I have a suggestion for improving relevance functions in Solr by providing access to a set of pre-defined constants in Solr queries. Specifically, the number of documents indexed, the number of unique terms in a field, the total number of terms in a field, etc. are some of the query-time constants that I believe can be made use of in function queries as well as boosted queries to aid in relevance calculations. One of the tips provided in the Solr 1.4 Enterprise Search Server book relating to using function queries is this: "If your data changes in ways causing you to alter the constants in your function queries, then consider implementing a periodic automated test of your Solr data to ensure that the data fits within expected bounds." I believe that having access to some of the constants mentioned above will help in coming up with dynamic boost values that adapt as the underlying data changes. I think this makes sense given that one of the basic relevancy scoring metrics - idf - is directly influenced by the number of documents indexed. I can imagine some of these constants being useful in function queries and boosted queries, but am not able to think of a neat little usage example. I request your feedback and comments on this idea to help evaluate whether it is worth creating an enhancement Jira item for it. Thanks, Prasanna
Re: Debugging - DIH Delta Queries-
: Subject: Debugging - DIH Delta Queries- : References: : 1659766275.5213.1274376509278.javamail.r...@vicenza.dmz.lexum.pri : In-Reply-To: : 1659766275.5213.1274376509278.javamail.r...@vicenza.dmz.lexum.pri http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Solr Cell and encrypted pdf files
: I can't seem to get Solr Cell to index password protected pdf files. : I can't figure out how to pass the password to tika and looking at : ExtractingDocumentLoader, : it doesn't seem to pass any pdf password related metadata to the tika parser. I suspect you are correct; I don't think anyone has ever submitted a patch to enable Solr to take advantage of this functionality in Tika. If you have suggestions on how to implement it, please open a Jira issue (even if you don't have a patch to contribute, suggestions about how it might make sense to implement it from someone who has looked at the code and has some ideas may help inspire someone else to work on a patch). -Hoss
Re: question about indexing...
Don't forget to re-index after you make the change Lance suggested... Erick On Tue, May 25, 2010 at 4:51 PM, Lance Norskog goks...@gmail.com wrote: Change type="string" to type="text". This causes the field to be analyzed, and then searching on words finds the document. On Tue, May 25, 2010 at 8:34 AM, Jörg Agatz joerg.ag...@googlemail.com wrote: I created a new index, but nothing changed.

<field name="COMMENT" type="string" indexed="true" stored="true" multiValued="true"/>

<field name="COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field>

When I search for *:* I find it; when I search for hallo, Hallo, hallo*, Hallo*, or some other content from the CDATA field, it doesn't match. -- Lance Norskog goks...@gmail.com
Re: Solr Delta Queries
: <field name="indexed_timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> : For some reason when doing delta indexing via DIH, this field is not being updated. : : Are timestamp fields updated during DELTA updates? Timestamp fields aren't treated any differently than any other field -- as far as Solr is concerned, this is just a date field that happens to have a default value specified in case the client adding documents doesn't specify a value for it -- in your case the client is DIH. One thing that isn't clear from the way you worded your question is whether you realize that when DIH does a delta-import, only documents matching your deltaQuery are updated in the index -- all other existing documents are left alone (with their old value for the indexed_timestamp field) ... however, you should be able to see that any *new* documents have a value for the indexed_timestamp field. Perhaps the documents you are looking at where this field is not being updated weren't actually updated as part of the deltaQuery? If you look at the output from loading DIH in your browser, it will tell you how many documents were processed as a result of your last delta-import, and the log files will show you the uniqueKey of each doc so you can see exactly what was updated. -Hoss
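[Editor's note] The interaction Hoss describes can be sketched with a hypothetical data-config.xml entity — the table and column names (item, last_modified, etc.) are made up for illustration; the `${dataimporter...}` variables are standard DIH syntax:

```xml
<!-- Hypothetical entity: deltaQuery selects only the changed rows, and
     each re-imported document gets a fresh indexed_timestamp from the
     schema default of NOW; untouched documents keep their old value. -->
<entity name="item" pk="id"
        query="SELECT id, title FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM item
                          WHERE id = '${dataimporter.delta.id}'">
</entity>
```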
Re: solr caches from external caching system like memcached
: Is it possible to use solr caches such as query cache, filter cache : and document cache from an external caching system like memcached, as it : has several advantages such as centralized caching and reducing the : pause time of the JVM's garbage collection, since we can assign less memory to : the JVM. No. The purpose of those Solr caches is to micro-cache, in memory, objects that are used at a very low level. In an external cache system, network overhead and object serialization become a factor in performance; using them in a micro-cache aspect doesn't make sense -- at that point you're probably better off using something like an HTTP proxy cache to do macro caching at the level of the entire HTTP request/response. -Hoss
Re: Solr highlighter and custom queries?
: Actually, it's not as much a Solr problem as a Lucene one, as it turns : out; the WeightedSpanTermExtractor is in Lucene and not Solr. : : Why they decided to only highlight queries that are in Lucene I don't : know, but what I did to solve this problem was simply to make my queries : extend a Lucene query instead of just Query. I am not very well informed on highlighting, but as I understand it, the Span-based highlighter is specifically designed to deal with position-based information, which depends on dealing either with SpanQueries or with well-known query types where that information can be faked. However, I believe the more traditional highlighter (using QueryTermExtractor) was able to deal with highlighting any query that implemented extractTerms(Set), so perhaps something about the way you are using the highlighter is triggering the use of WeightedSpanTermExtractor instead of QueryTermExtractor? -Hoss
Re: Full Import failed
: yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5 Solr 1.4 works just fine with Java 1.5 -- even when using the DataImportHandler. There are some features of DIH, like the ScriptTransformer, that require Java 1.6, but that's not your issue... : Last I encountered that exception was with the usage of String.isEmpty : which is a 1.6 novelty. ...the line in question in the stack trace provided has nothing to do with String.isEmpty:

Caused by: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391)

The object in question is a DocWrapper, which inherits from SolrInputDocument, which defines isEmpty. If you are getting this error, it suggests that something is wonky with your classpath, and you probably have multiple versions of some Solr jars getting included by mistake -- in particular, an old copy of the solr-common jar where SolrInputDocument is defined. -Hoss
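[Editor's note] One quick way to check Hoss's classpath theory is to ask the JVM which jar a class was actually loaded from. A sketch — `String.class` is used as a stand-in so the snippet runs anywhere; for the real diagnosis, substitute `Class.forName("org.apache.solr.common.SolrInputDocument")` inside the running container:

```java
import java.net.URL;

public class WhichJar {
    // Returns the URL of the loaded .class resource, which reveals the
    // containing jar (or the jrt module on Java 9+). If this points at an
    // unexpected jar, you have a duplicate/stale jar on the classpath.
    static URL locate(Class<?> c) {
        return c.getResource("/" + c.getName().replace('.', '/') + ".class");
    }

    public static void main(String[] args) {
        System.out.println(locate(String.class));
    }
}
```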
Re: Full Import failed
I am just using the solr.war file that came with the Solr 1.4 download, on WebLogic. I did not add or remove any jars. On Tue, May 25, 2010 at 9:54 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5 Solr 1.4 works just fine with Java 1.5 -- even when using the DataImportHandler. There are some features of DIH, like the ScriptTransformer, that require Java 1.6, but that's not your issue... : Last I encountered that exception was with the usage of String.isEmpty : which is a 1.6 novelty. ...the line in question in the stack trace provided has nothing to do with String.isEmpty: Caused by: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391) The object in question is a DocWrapper, which inherits from SolrInputDocument, which defines isEmpty. If you are getting this error, it suggests that something is wonky with your classpath, and you probably have multiple versions of some Solr jars getting included by mistake -- in particular, an old copy of the solr-common jar where SolrInputDocument is defined. -Hoss