Re: logical relation among filter queries
Right, I can combine that into one fq query. The only thing is that I want to reduce the cache size. I remember this is what I read from the wiki:

fq=rating:R (filter query cache A)
fq=rating:PG-13 (filter query cache B)
fq=rating:(R OR PG-13) -- won't be able to leverage filter query caches A and B above; instead it will create a whole new filter query cache C
fq=rating:R&fq=rating:PG-13 -- will be able to leverage filter query caches A and B

I will have a lot of queries with different combinations of values from the same field, rating. Therefore, I thought that if the logical relation among filter queries were OR, it would keep the number of distinct cache entries down to the number of distinct rating values. Does it matter?
Re: StreamingUpdateSolrServer
Yes. Each thread uses its own connection, and each becomes a new thread in the servlet container.

On Mon, Mar 7, 2011 at 2:54 AM, Isan Fulia isan.fu...@germinait.com wrote:
Hi all, I am using StreamingUpdateSolrServer with queueSize=5 and threadCount=4. The number of connections created is the same as the thread count. Does it create a new connection for every thread?
-- Thanks and Regards, Isan Fulia.

-- Lance Norskog goks...@gmail.com
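For the archives, a minimal SolrJ sketch of the setup being discussed (URL, document, and sizes are illustrative):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingUpdateExample {
  public static void main(String[] args) throws Exception {
    // queueSize=5 buffered docs, threadCount=4 background runners,
    // so expect up to 4 connections into the servlet container
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 5, 4);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    server.add(doc);
    server.blockUntilFinished(); // drain the background queue
    server.commit();
  }
}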
Synonyms question
Hi guys,
How do I put this in synonyms.txt?
US
USA
United States of America
Re: Synonyms question
http://lmgtfy.com/?q=solr+synonym (first hit gives many examples)

-- Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8 March 2011, at 10:06, Darx Oman wrote:
Hi guys, how do I put this in synonyms.txt? US / USA / United States of America
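For completeness, one way to express it (assuming the stock SynonymFilterFactory; with expand=true every variant is indexed for all the others):

  US, USA, United States of America

Multi-word synonyms such as "United States of America" are safest expanded at index time only; see the SynonymFilterFactory section of the wiki for the caveats.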
How to Index and query URLs as fields
Hi,

I've run into problems trying to achieve a seemingly simple thing. I'm indexing a bunch of files (local ones and potentially some accessible via other protocols like http or ftp) and have an index field with the URL of the file, e.g. file:/home/foo/bar.pdf. Now I want to perform two simple types of queries on this: retrieve all file records located under a certain path (e.g. file:/home/foo/*), or find the file record for an exact URL. What I naively tried was to index the file URL in a field (fileURL) of type string and simply perform queries like fileURL:file\:/home/foo/* and fileURL:file\:/home/foo/bar.pdf, and neither one returned results.

The type is defined as

  <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

and the field as

  <field name="fileURL" type="string" indexed="true" stored="true" multiValued="false"/>

I am using Solr 1.4.1 and use SolrJ to do the indexing and querying. This seems like a rather basic requirement and obviously I am doing something wrong. I didn't find anything in the docs or the mailing list archive so far. Any help, hints, or pointers would be appreciated.

Robert
Re: Use of multiple tomcat instance and shards.
Hi,

From my experience, when you have to scale in the number of documents it's a good idea to use shards (so one schema and N shards, each containing (1/N) * total#docs), while if the requirement is handling a high query volume you can get a significant boost from replicating the same index onto 2 or more machines and load balancing across them (in most cases a round-robin LB works pretty well). So I think you should look at the replication wiki page [1]. To check your Tomcat installation, the related wiki page may also be useful [2].

My 2 cents,
Tommaso

[1] : http://wiki.apache.org/solr/SolrReplication
[2] : http://wiki.apache.org/solr/SolrTomcat

2011/3/8 rajini maski rajinima...@gmail.com:
In order to increase the Java heap memory: I have only 2 GB RAM… so my default memory configuration is --JvmMs 128 --JvmMx 512. I have a single Solr data index of up to 6 GB. Now, if I fire searches on this index very often, after some time I get a "java heap space out of memory" error and the search does not return results. What are the possibilities to fix this error? (I cannot increase heap memory.) How about having another Tomcat instance running (how does this work?), or is it by configuring shards? What might help me fix this search failure?

Rajani
Re: Use of multiple tomcat instance and shards.
Having 2 GB physical memory on the box, I would allocate -Xmx1024m to Java as a starting point. The other thing you could do is try to trim your config to use less memory. Are you using many facets? String sorts? Wildcards? Fuzzy? Storing or returning more fields than needed?

http://wiki.apache.org/solr/SolrPerformanceFactors#RAM_Usage_Considerations

-- Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8 March 2011, at 07:40, rajini maski wrote:
[...]
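In launch terms, that starting point is just the standard heap flags (shown Jetty-style; on Tomcat set the same values via JAVA_OPTS, or via the Windows service settings that the --JvmMs/--JvmMx options quoted above come from):

  java -Xms128m -Xmx1024m -jar start.jar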
Difference between Faceting & Field collapsing
Hi,

Can anyone explain in which scenarios faceting and field collapsing are used? What is the difference between these two?

Best Regards!
Isha
Re: How to Index and query URLs as fields
My mistake. The error turned out to be somewhere else, and the described approach does work. Sorry for the wasted bandwidth.

On Mar 8, 2011, at 11:06 AM, Robert Krüger wrote:
[...]
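For anyone landing here from the archives, the working pattern in SolrJ form (field name from the thread; ClientUtils.escapeQueryChars, available in recent SolrJ versions, would also escape the trailing wildcard, hence the manual escaping for the prefix query):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

public class FileUrlQueries {
  public static void main(String[] args) {
    // exact match: escape the whole URL (':' is query syntax otherwise)
    SolrQuery exact = new SolrQuery(
        "fileURL:" + ClientUtils.escapeQueryChars("file:/home/foo/bar.pdf"));
    // prefix match: escape only the colon so the trailing * stays a wildcard
    SolrQuery prefix = new SolrQuery("fileURL:file\\:/home/foo/*");
    System.out.println(exact.getQuery());
    System.out.println(prefix.getQuery());
  }
}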
Re: Difference between Faceting & Field collapsing
Faceting is returned independently of your result set, telling you how many documents contain each facet value. Field collapsing / grouping modifies your result set to roll up multiple hits sharing the same collapse key, much like Google does to hide additional results from the same site. You may use a field both for faceting and collapsing, but for different reasons.

-- Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8 March 2011, at 12:50, Isha Garg wrote:
[...]
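Illustrative request parameters for the two features (field names invented; the grouping syntax shown is the trunk-style parameter set of the time and is not available in Solr 1.4):

  q=shoes&facet=true&facet.field=brand    -- counts per brand value, result set unchanged
  q=shoes&group=true&group.field=site    -- one top hit per site in the result set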
Re: logical relation among filter queries
The filter queries are interpreted as an intersection. That is, each fq clause is intersected with the result set. There's no way I know of to combine separate filter queries with an OR operator.

Best
Erick

On Tue, Mar 8, 2011 at 2:59 AM, cyang2010 ysxsu...@hotmail.com wrote:
[...]
Re: Use of multiple tomcat instance and shards.
I have considered the RAM usage points on the Solr wiki, and yes, I fire many facet queries every time; this might be one of the reasons. I did set -Xmx1024m and the error still occurred, though only 2-3 times after many search queries were fired. But then the system slows down, so I need an alternative.

Tommaso, can you please share any link that explains how to enable and do load balancing on the machines you mentioned above?

On Tue, Mar 8, 2011 at 4:11 PM, Jan Høydahl jan@cominvent.com wrote:
[...]
Re: Use of multiple tomcat instance and shards.
Have you looked at your cache usage statistics on the admin page? That should give you some sense of whether your caches are experiencing evictions, which would also lead to excessive garbage collection. That should give you some additional information to work with. Also, what version of Solr are you using? 1.4.1?

Best
Erick

On Tue, Mar 8, 2011 at 7:52 AM, rajini maski rajinima...@gmail.com wrote:
[...]
Re: Use of multiple tomcat instance and shards.
Hi Rajani,

2011/3/8 rajini maski rajinima...@gmail.com:
Tommaso, can you please share any link that explains how to enable and do load balancing on the machines you mentioned above?

If you're querying Solr via SolrJ [1] you could use the LBHttpSolrServer [2]; otherwise, if you still want Solr to be responsible for load balancing, implement a custom handler which wraps it (see [3]). Consider also that this load balancing often gets done using a VIP [4] or an Apache HTTP server in front of Solr.

Hope this helps,
Tommaso

[1] : http://wiki.apache.org/solr/Solrj
[2] : http://wiki.apache.org/solr/LBHttpSolrServer
[3] : http://markmail.org/thread/25jrko5s7wlmzjf7
[4] : http://en.wikipedia.org/wiki/Virtual_IP_address
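A minimal sketch of the SolrJ option (hosts and query are illustrative; per the wiki, LBHttpSolrServer is meant for queries, not indexing):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LbExample {
  public static void main(String[] args) throws Exception {
    // round-robins requests across the listed instances,
    // skipping dead ones and re-checking them periodically
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://solr1:8983/solr", "http://solr2:8983/solr");
    QueryResponse rsp = lb.query(new SolrQuery("*:*"));
    System.out.println(rsp.getResults().getNumFound());
  }
}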
getting much double-Values from solr -- timeout
Hello,

I have 34,000,000 documents in my index, and each doc has a field with a double value. I want the sum of these fields. I tested it with the StatsComponent, but it is not usable!! So I fetch all the values directly from Solr, from the index, and compute the sum with PHP's sum(). That works fine, but when a user searches over a really large number of documents (~30,000), my script needs longer than 30 seconds and PHP kills it. How can I tune Solr to fetch these double values from the index much faster?

---
System: one server, 12 GB RAM, 2 Solr instances, 7 cores; 1 core with 31 million documents, the other cores ~100,000.
- Solr1 for search requests - commit every minute - 4 GB Xmx
- Solr2 for update requests - delta every 2 minutes - 4 GB Xmx
Re: Use of multiple tomcat instance and shards.
Just one more hint. I didn't mention it in the previous email since I imagine the scenario you explained doesn't allow it, but anyway, you could also check SolrCloud and its distributed requests [1].

Cheers,
Tommaso

[1] : http://wiki.apache.org/solr/SolrCloud#Distributed_Requests

2011/3/8 Tommaso Teofili tommaso.teof...@gmail.com:
[...]
Re: logical relation among filter queries
Erick,

Thanks for the reply. Is there any way to instruct Solr to combine separate filter queries as a UNION of results, without creating the third filter query cache entry I described above? If not, shall I give up on using filter queries for such scenarios (where I query the same field with multiple values using OR) and use a normal Solr query instead? At least the query result cache is lighter weight than the filter cache. What do you think?

Thanks,
Carole
RE: How to handle searches across traditional and simplified Chinese?
This page discusses the reasons why it's not a simple one-to-one mapping: http://www.kanji.org/cjk/c2c/c2cbasis.htm

Tom

-----Original Message-----
I have documents that contain both simplified and traditional Chinese characters. Is there any way to search across them? For example, if someone searches for 类 (simplified Chinese), I'd like to be able to recognize that the equivalent character is 類 in traditional Chinese and search for 类 or 類 in the documents.
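If an approximate character-level folding is acceptable despite the caveats on that page, one hedged approach is Solr's MappingCharFilterFactory with a traditional-to-simplified mapping file, applied in both the index and query analyzers so both scripts fold to one form (file name and the single entry below are illustrative, not a complete mapping):

In the field type's analyzer in schema.xml:

  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-trad2simp.txt"/>

mapping-trad2simp.txt:

  "類" => "类"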
docBoost
Hi all,

I am using the DataImportHandler to create my index, and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost, but I haven't been able to find an example anywhere that shows how to implement it. Assume the following config file:

<document>
  <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
    <field column="id" name="id"/>
    <field column="genus" name="genus"/>
    <field column="species" name="species"/>
    <entity name="boosters" dataSource="boosts"
            query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score"/>
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate table, as shown above.
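One untested sketch, based on the "special commands" section of the DataImportHandler wiki: DIH treats a column named $docBoost as the document boost, so the score column can simply be aliased to it in the inner entity's query:

  <entity name="boosters" dataSource="boosts"
          query="SELECT boost_score AS '$docBoost' FROM boosts WHERE animal_id = ${animal.id}"/>

A ScriptTransformer that copies boost_score into the $docBoost key of the row map should work as well.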
Re: Problem adding new requesthandler to solr branch_3x
: 1. Why the problem occurs (has something changed between 1.4.1 and 3x)?

Various pieces of code dealing with config parsing have changed since 1.4.1 to be better about verifying that configs are meaningful, and reporting errors when unexpected things are encountered. I'm not sure of the specific change, but the underlying point is: if 1.4.1 wasn't giving you an error for that syntax, it's because it was completely ignoring it.

-Hoss
Smart Pagination queries
For example: there are 4,000 Solr documents that match a particular word search. My app applies entitlement rules to those 4,000 documents, and it's quite possible that a user is only eligible to view 3,000 results out of the 4K. This is achieved through post-filtering application logic.

My question related to Solr pagination is: in order to paint "Next" links, the app has to know the total number of records the user is eligible to read. getNumFound() tells me that Solr returned 4K records in total. If there weren't any entitlement rules, it would be easy to determine how many "Next" links to paint and, when the user clicks "Next", to pass the appropriate start position in the Solr query. Since I have to apply the post filter as results are fetched from Solr, is there a better way to achieve this? Because of post filtering, I don't know whether to paint a "Next" link until the results for the next pages are pre-fetched and filtered, and pre-fetching everything would kill performance and defeat the point of Solr pagination. Any better suggestions?

Thanks,
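A common compromise, sketched below: over-fetch each page from Solr, post-filter, and show "Next" when a full page plus at least one extra eligible hit came back. The over-fetch factor and the entitled() check are application-specific assumptions, not anything Solr provides:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class EntitledPaging {
  // hypothetical application entitlement check
  static boolean entitled(String user, SolrDocument d) { return true; }

  static List<SolrDocument> fetchPage(SolrServer solr, String user,
      int solrOffset, int pageSize) throws Exception {
    SolrQuery q = new SolrQuery("word");
    q.setStart(solrOffset);
    q.setRows(pageSize * 2); // over-fetch to absorb post-filter losses
    QueryResponse rsp = solr.query(q);
    List<SolrDocument> eligible = new ArrayList<SolrDocument>();
    for (SolrDocument d : rsp.getResults()) {
      if (entitled(user, d)) eligible.add(d);
    }
    // "Next" is certain if more than one page survived the filter; if not,
    // it is only *possible* while Solr still has hits beyond this window.
    boolean showNext = eligible.size() > pageSize
        || solrOffset + pageSize * 2 < rsp.getResults().getNumFound();
    System.out.println("show Next link: " + showNext);
    return eligible.size() > pageSize ? eligible.subList(0, pageSize) : eligible;
  }
}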
Re: -ignore words not working?
: AND ((-title:men) AND (-keywords:men) AND (-description:men))
...
: As soon as I put in -field:value it yields no results... even though there
: are a ton of results that match the criteria :/

you didn't add -field:value ... you added (-field:value). the parens are significant. the parens create a boolean query, and inside that boolean query you have one clause which is purely negative. a boolean query with all negative clauses by definition matches nothing. in your outer query, you have then made that boolean query mandatory (because of the AND), which means your outer query can't match anything either.

removing the parens would probably work, or using the (*:* -keywords:men) idiom would probably work.

(Solr does a good job of helping you with pure negative queries at the top level of your syntax (ie: fq=-field:value) but it doesn't traverse the entire query looking for things that are structurally valid but don't actually match anything ... that might have been your point when you wrote it)

-Hoss
Re: Error during auto-warming of key
Anyone here with some thoughts on this issue?

Hi,

Yesterday's error log contains something peculiar:

ERROR [solr.search.SolrCache] - [pool-29-thread-1] - : Error during auto-warming of key:+*:* (1.0/(7.71E-8*float(ms(const(1298682616680),date(sort_date)))+1.0))^20.0:java.lang.NullPointerException
  at org.apache.lucene.util.StringHelper.intern(StringHelper.java:36)
  at org.apache.lucene.search.FieldCacheImpl$Entry.<init>(FieldCacheImpl.java:275)
  at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:525)
  at org.apache.solr.search.function.LongFieldSource.getValues(LongFieldSource.java:57)
  at org.apache.solr.search.function.DualFloatFunction.getValues(DualFloatFunction.java:48)
  at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
  at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
  at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
  at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:246)
  at org.apache.lucene.search.Searcher.search(Searcher.java:171)
  at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:651)
  at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
  at org.apache.solr.search.SolrIndexSearcher.cacheDocSet(SolrIndexSearcher.java:520)
  at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:296)
  at org.apache.solr.search.FastLRUCache.warm(FastLRUCache.java:168)
  at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
  at org.apache.solr.core.SolrCore$2.call(SolrCore.java:1131)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)

Well, I use Dismax's bf parameter to boost very recent documents. I'm not using the queryResultCache or documentCache, only the filterCache and the Lucene FieldCache. I've checked LUCENE-1890 but am unsure if that's the issue. Any thoughts on this one? https://issues.apache.org/jira/browse/LUCENE-1890

Cheers,
Re: getting much double-Values from solr -- timeout
Are you using shards, or is everything in the same index? What problem did you experience with the StatsComponent? How did you use it? I think the right approach would be to optimize StatsComponent to do a quick sum().

-- Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 8 March 2011, at 16:52, stockii wrote:
[...]
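For reference, the StatsComponent request that should in principle return the sum in a single call (field name illustrative):

  http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=amount

The stats block in the response carries sum, min, max, count, missing, etc.; the performance question in this thread is whether it can compute those over tens of millions of values fast enough.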
Re: logical relation among filter queries
Can't really answer that question in the abstract. About all you can really do is monitor your caches (the admin stats page helps), note if/when you start getting cache evictions, and adjust then. I really wouldn't worry about this unless and until you start seeing query slowdowns; just go ahead and use combined filter queries instead (i.e. fq=(A OR B OR C)).

Best
Erick

On Tue, Mar 8, 2011 at 12:15 PM, cyang2010 ysxsu...@hotmail.com wrote:
[...]
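To summarize the trade-off from this thread in request form (values illustrative):

  fq=rating:R&fq=rating:PG-13    -- two filterCache entries, but the clauses are ANDed
  fq=rating:(R OR PG-13)         -- the OR you want, cached as one new entry

Each distinct fq string becomes one filterCache entry, and there is no mechanism to OR two separately cached fq results together at query time.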
NRT in Solr
Hi,

Is NRT in Solr 4.0 from trunk? I have checked it out from trunk, but could not find the configuration for NRT.

Regards,
Jae
two QueryHandler components in one schema?
hello list,

in my schema I have

  <searchComponent name="query" class="org.curriki.solr.handlers.CurrikiSolrQueryComponent"/>

which, as I understand it, allows all requestHandlers to use my query component. That is useful, but I wonder if there's a way for me to have one request handler that uses my query component and another that uses the default one. Formulated differently, my question is whether
- search components can be defined by name within the requestHandler element of the schema,
- or whether a differently named query search component would still be used as the query component.

thanks in advance
paul
Re: two QueryHandler components in one schema?
: in my schema I have

First off, a bit of terminology clarification: Search Components are declared in the solrconfig.xml file. schema.xml is where you define what, inherently, the data in your index *is*. solrconfig.xml is where you define how you want people to be able to interact with the data in your index.

: Formulated differently, my question is whether
: - search components can be defined by name within the requestHandler element of the schema,
: - or whether a differently named query search component would still be used as the query component.

yes, and yes. SearchHandler references Search Components by name, using the component list it is configured with. So you can leave the name "query" for the default instance of QueryComponent and then give your custom component its own name, and refer to it by name when configuring the SearchHandlers you want to use it...

http://wiki.apache.org/solr/SearchHandler
http://wiki.apache.org/solr/SearchComponent

-Hoss
Re: two QueryHandler components in one schema?
On 8 March 2011, at 23:03, Chris Hostetter wrote:

First off, a bit of terminology clarification: Search Components are declared in the solrconfig.xml file. schema.xml is where you define what, inherently, the data in your index *is*. solrconfig.xml is where you define how you want people to be able to interact with the data in your index.

Sorry, this is absolutely true. I should have said "in my config".

yes, and yes. SearchHandler references Search Components by name, using the component list it is configured with. So you can leave the name "query" for the default instance of QueryComponent and then give your custom component its own name, and refer to it by name when configuring the SearchHandlers you want to use it...

So how do I define, for a given request handler, a special query component? I did not find this in the schema.

paul
Re: two QueryHandler components in one schema?
A request handler can have first-components and last-components, and also just plain components. List all your stuff in components and voila. Don't forget to also add the debug, facet, and other default components if you need them.

On 8 March 2011, at 23:03, Chris Hostetter wrote:
[...]
Solr Hanging all of sudden with update/csv
Hi folks,

I've been using Solr for about 3 months. Our Solr install is a single node, and we have been injecting logging data into the Solr server every couple of minutes, with each update taking a few minutes. Everything was working fine until this morning, at which point it appeared that all updates were hung. Restarting the Solr server did not help, as all updaters immediately 'hung' again. Poking around in the threads and strace, I do in fact see stuff happening. The index size itself is about 270 GB (we are hoping to support up to 500 GB-1 TB), and we have supplied the system with ~3 TB of disk space. Any tips on what could be happening?

Notes: we have never run an optimize yet; we have never deleted from the system yet. The merge thread appears to be the one that 'never returns':

Lucene Merge Thread #0 - Thread t@41
  java.lang.Thread.State: RUNNABLE
  at sun.nio.ch.FileDispatcher.pread0(Native Method)
  at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31)
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:234)
  at sun.nio.ch.IOUtil.read(IOUtil.java:210)
  at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:622)
  at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
  at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:139)
  at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:94)
  at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:176)
  at org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:209)
  at org.apache.lucene.index.SegmentMerger.copyFieldsNoDeletions(SegmentMerger.java:424)
  at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4053)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3645)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:339)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:407)

Some ptrace output:

23178 pread(172, \270\316\276\2\245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2..., 4096, 98004192) = 4096 0.09
23178 pread(172, \245\371\274\2\271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2..., 4096, 98004196) = 4096 0.09
23178 pread(172, \271\316\276\2\272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2..., 4096, 98004200) = 4096 0.08
23178 pread(172, \272\316\276\2\273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2..., 4096, 98004204) = 4096 0.08
23178 pread(172, \273\316\276\2\274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2..., 4096, 98004208) = 4096 0.08
23178 pread(172, \274\316\276\2\275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2..., 4096, 98004212) = 4096 0.09
23178 pread(172, \275\316\276\2\276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2..., 4096, 98004216) = 4096 0.08
23178 pread(172, \276\316\276\2\277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2..., 4096, 98004220) = 4096 0.09
23178 pread(172, \277\316\276\2\300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2..., 4096, 98004224) = 4096 0.13
22688 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) 0.051276
23178 pread(172, \300\316\276\2\301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2..., 4096, 98004228) = 4096 0.10
22688 futex(0x464a9f28, FUTEX_WAKE_PRIVATE, 1
23178 pread(172, \301\316\276\2\302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2..., 4096, 98004232) = 4096 0.10
22688 <... futex resumed> ) = 0 0.51
23178 pread(172, \302\316\276\2\367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2..., 4096, 98004236) = 4096 0.10
22688 clock_gettime(CLOCK_MONOTONIC,
23178 pread(172, \367\343\274\2\246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2\310\316\276\2..., 4096, 98004240) = 4096 0.10
22688 <... clock_gettime resumed> {1900472, 454038316}) = 0 0.54
23178 pread(172, \246\371\274\2\303\316\276\2\304\316\276\2\305\316\276\2\306\316\276\2\307\316\276\2\310\316\276\2\311\316\276\2..., 4096, 98004244) = 4096 0.11
22688 clock_gettime(CLOCK_MONOTONIC,
23178 pread(172,
Re: two QueryHandler components in one schema?
: So how do I define, for a given request handler, a special query component?
: I did not find this in the schema.

you mean solrconfig.xml, again. Taken directly from the SearchHandler URL I sent you...

  If you want to have a custom list of components (either omitting defaults or adding custom components) you can specify the components for a handler directly:

  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
    <str>someothercomponent</str>
  </arr>

...so if you don't want to use "query" and you want to use "mySpecialQueryComponent" it would be ...

  <arr name="components">
    <str>mySpecialQueryComponent</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>

...the SearchComponent URL I sent, as well as the example/solr/conf/solrconfig.xml file that ships with Solr, also has examples of how/when you can specify an explicit components list.

-Hoss
Re: two QueryHandler components in one schema?
Erm, did you, Hoss, not say that components are referred to by name? How could the search result be read from the query "mySpecialQueryComponent" if it cannot be named? Simply through the pool of SolrParams? If yes, that's the great magic of Solr.

paul

On 8 March 2011, at 23:19, Chris Hostetter wrote:
...so if you don't want to use "query" and you want to use "mySpecialQueryComponent" it would be ...
[...]
How to intercept the http request made by solrj
Hi,

Does anyone know how to intercept the HTTP request made by SolrJ? I only see the URL being printed out when the request is invalid. But still, as part of the development/debugging process, I want to verify what HTTP request it sends out to the Solr server. Thanks.

CY
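Two low-tech options that work with the commons-httpclient underneath SolrJ's CommonsHttpSolrServer: enable httpclient's wire logging, e.g. in log4j.properties (these are the standard commons-httpclient 3.x logger names),

  log4j.logger.httpclient.wire.header=DEBUG
  log4j.logger.org.apache.commons.httpclient=DEBUG

or put a logging proxy (tcpmon, Wireshark, etc.) between the client and Solr and point SolrJ's base URL at it.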
Re: error in log INFO org.apache.solr.core.SolrCore - webapp=/solr path=/admin/ping params={} status=0 QTime=1
: I am using solr under jboss, so this might be more of a jboss config
: issue, not really sure. But my logs keep getting spammed, because
: solr sends it as ERROR [STDERR] INFO org.apache.solr.core.SolrCore -
: webapp=/solr path=/admin/ping params={} status=0 QTime=1
:
: Has anyone seen this and found a workaround to not send this as an Error?

that's not an error -- that's Solr logging a message at the INFO level, which some other code is then prepending "ERROR [STDERR]" in front of. My guess: your installation is set up so that Java Util Logging goes to System.err by default, and then something in JBoss has remapped System.err to an internal stream that it processes/redirects and lets you know that those lines were written to STDERR (and treats them as an error) ... most likely everything Solr ever logs is being written out that way (not just those INFO messages from SolrCore).

Solr uses the SLF4J abstraction to do its logging, and by default ships with the SLF4J-to-JUL bridge (because JUL logging is the one type of logging guaranteed to be supported by every servlet container without any external dependencies or risk of classpath collision). You should investigate how to configure JUL logging for your JBoss installation to get those messages somewhere more useful than STDERR, and/or change the SLF4J bindings that are in use in your Solr installation...

http://wiki.apache.org/solr/SolrLogging

-Hoss
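For instance, a minimal JUL configuration that sends everything to a file instead of the console (file path illustrative; how you point JBoss at it varies by version, e.g. -Djava.util.logging.config.file=/path/to/logging.properties):

  handlers = java.util.logging.FileHandler
  .level = INFO
  java.util.logging.FileHandler.pattern = /var/log/solr/solr%u.log
  java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter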
Re: two QueryHandler components in one schema?
: did you, Hoss, not say that components are referred to by name? How : could the search result be read from the query mySpecialQueryComponent : if it cannot be named? Simply through the pool of SolrParams? in the example i gave, mySpecialQueryComponent *is* the name of some component you have already defined -- instead of using the component named query which has also already been defined (either implicitly as a default or explicitly in the config) As i keep saying: if you look at the 1.4.1 example solrconfig.xml, there are several examples of this (and the example solrconfig.xml that will be in the Solr 3.1 is even better)... From 1.4.1... By default, the following components are avaliable: searchComponent name=query class=org.apache.solr.handler.component.QueryComponent / ... Default configuration in a requestHandler would look like: arr name=components strquery/str strfacet/str strmlt/str strhighlight/str strstats/str strdebug/str /arr If you register a searchComponent to one of the standard names, that will be used instead. ... !-- A component to return terms and document frequency of those terms. This component does not yet support distributed search. -- searchComponent name=termsComponent class=org.apache.solr.handler.component.TermsComponent/ requestHandler name=/terms class=org.apache.solr.handler.component.SearchHandler lst name=defaults bool name=termstrue/bool /lst arr name=components strtermsComponent/str /arr /requestHandler -Hoss
Re: dataimport
: INFO: Creating a connection for entity id with URL:
: jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull
: Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
: INFO: Time taken for getConnection(): 137
: Killed
:
: So it looks like for whatever reason, the server crashes trying to do a full
: import. When I add a LIMIT clause on the query, it works fine when the LIMIT
: is only 250 records but if I try to do 500 records, I get the same message.

...wow. that's ... weird. I've never seen a Java process just log "Killed" like that. The only time I've ever seen a process log "Killed" is if it was terminated by the OS (ie: kill -9 pid).

What OS are you using? How are you running Solr? (ie: are you using the simple Jetty example "java -jar start.jar" or are you using a different servlet container?) ... Are you absolutely certain your machine doesn't have some sort of monitoring in place that kills jobs if they take too long, or use too much CPU?

-Hoss
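If it is the Linux OOM killer (a plausible suspect for a bare "Killed"), the kernel usually records the kill; worth checking along these lines (log path varies by distro):

  dmesg | grep -i 'killed process'
  grep -i 'out of memory' /var/log/syslog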
Re: Help with explain query syntax
: <str name="parsedquery">
: +DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()
: </str>

you can see the crux of your problem in this query string. it seems you have a query-time synonym in place to *expand* "linguajob.pl" into [linguajob.pl] and [linguajob] [pl], but query-time synonym expansion of multi-word queries doesn't work -- what it is ultimately requiring is that a doc contain "linguajob.pl" and "linguajob" at the same term position, followed by "pl". this is not what you have indexed. This type of specific example is warned against on the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

-Hoss
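The usual fix, per that wiki section, is to move the expansion to index time; e.g. keep SynonymFilterFactory only in the index analyzer, with an entry along these lines (illustrative):

  linguajob.pl, linguajob pl

so documents containing either form are indexed under both, and the query analyzer performs no synonym expansion at all.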
Re: Solr Hanging all of sudden with update/csv
My guess is that you're running out of RAM. Actual Java profiling is beyond me, but I have seen issues on updating that were solved by more RAM. If you are updating every few minutes, and your new index takes more than a few minutes to warm, you could be running into overlapping warming-searcher issues. There is more info on what I mean by this in this FAQ, although the FAQ isn't actually targeted at this case exactly:

http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Overlapping warming searchers can result in excessive RAM and/or CPU usage. If you haven't given your JVM options to tune garbage collection, that can also help, using the options for concurrent-thread GC. But if your fundamental problem is overlapping warming searchers, you probably need to make that stop.

On 3/8/2011 5:17 PM, danomano wrote:
[...]
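The solrconfig.xml knobs relevant to Jonathan's suggestion (values illustrative):

  <maxWarmingSearchers>2</maxWarmingSearchers>
  <useColdSearcher>false</useColdSearcher>

plus concurrent-collector JVM flags such as -XX:+UseConcMarkSweepGC if GC pauses turn out to be part of the picture.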
Re: Solr Hanging all of sudden with update/csv
Actually, this is definitely not a RAM issue. I have VisualVM connected, and the max RAM available to the JVM is ~7 GB, but the system is only using ~5.5 GB, with a max so far of 6.5 GB consumed.

I think... well, I'm guessing the system hit a merge threshold, but I can't tell for sure. I have seen the index size grow rapidly today (much more than normal; in the last 3 hours the index size has increased by about 50%). From various posts I see that during an 'optimize' (which I have not called), or perhaps during the merging of segments, it is normal for the disk space requirements to temporarily increase by 2x to 3x. As such, my only assumption is that it must be conducting a merge.

Note: since I restarted the Solr server, I have had only 1 client thread pushing data in (it already transmitted its data, ~2 MB), and it has been held up for about 4 hours now; I believe it's stuck waiting for the merge thread to complete.

Is there a better way to handle merging, or at least to predict when it will occur? (I'm essentially using the defaults: mergeFactor 10, ramBuffer 32 MB.) I'm totally new to Solr/Lucene/indexing in general, so I'm somewhat clueless about all this. It should be noted we have millions of documents, each generally around 4 KB.
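For reference, those defaults live in solrconfig.xml (indexDefaults/mainIndex sections; the values shown are the ones mentioned above):

  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>

With mergeFactor=10, a large cascading merge kicks in roughly whenever ten similar-sized segments have accumulated, which is consistent with the sudden multi-hour merge described here; raising ramBufferSizeMB or lowering mergeFactor changes when the big merges happen, not whether they happen.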
Re: Solr Hanging all of sudden with update/csv
The index size itself is about 270 GB (we are hoping to support up to 500 GB-1 TB), and we have supplied the system with ~3 TB of disk space.

That's simply massive for a single node. When the system tries to merge the segments, the queries are probably not working? And the merges will take quite a while. How long is acceptable for a single query to return in?

On Tue, Mar 8, 2011 at 2:17 PM, danomano dshopk...@earthlink.net wrote:
[...]
Re: Help with explain query syntax
It's probably the WordDelimiterFilter:

org.apache.solr.analysis.WordDelimiterFilterFactory args: {preserveOriginal: 1, splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 0, generateWordParts: 1, catenateAll: 0, catenateNumbers: 0}

Get rid of the preserveOriginal=1 in the query analyzer.

-Yonik
http://lucidimagination.com

On Tue, Mar 1, 2011 at 9:01 AM, Glòria Martínez gloria.marti...@careesma.com wrote:
Hello, I can't understand why this query is not matching anything. Could someone help me please?

*Query*
http://localhost:8894/solr/select?q=linguajob.pl&qf=company_name&wt=xml&qt=dismax&debugQuery=on&explainOther=id%3A1

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="explainOther">id:1</str>
      <str name="debugQuery">on</str>
      <str name="q">linguajob.pl</str>
      <str name="qf">company_name</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">linguajob.pl</str>
    <str name="querystring">linguajob.pl</str>
    <str name="parsedquery">+DisjunctionMaxQuery((company_name:"(linguajob.pl linguajob) pl")~0.01) ()</str>
    <str name="parsedquery_toString">+(company_name:"(linguajob.pl linguajob) pl")~0.01 ()</str>
    <lst name="explain"/>
    <str name="otherQuery">id:1</str>
    <lst name="explainOther">
      <str name="1">
0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause (company_name:"(linguajob.pl linguajob) pl")

*- What does this syntax (field:"(token1 token2) token3") mean?*

  0.0 = (NON-MATCH) fieldWeight(company_name:"(linguajob.pl linguajob) pl" in 0), product of:
    0.0 = tf(phraseFreq=0.0)
    1.6137056 = idf(company_name:"(linguajob.pl linguajob) pl")
    0.4375 = fieldNorm(field=company_name, doc=0)
      </str>
    </lst>
    <str name="QParser">DisMaxQParser</str>
    <null name="altquerystring"/>
    <null name="boostfuncs"/>
    <lst name="timing">...</lst>
  </lst>
</response>

There's only one document indexed:

*Document*
http://localhost:8894/solr/select?q=1&qf=id&wt=xml&qt=dismax

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="qf">id</str>
      <str name="wt">xml</str>
      <str name="qt">dismax</str>
      <str name="q">1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="company_name">LinguaJob.pl</str>
      <str name="id">1</str>
      <int name="status">6</int>
      <date name="timestamp">2011-03-01T11:14:24.553Z</date>
    </doc>
  </result>
</response>

*Solr Admin Schema*
Field: company_name
Field Type: text
Properties: Indexed, Tokenized, Stored
Schema: Indexed, Tokenized, Stored
Index: Indexed, Tokenized, Stored
Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    schema.UnicodeNormalizationFilterFactory args: {composed: false, remove_modifiers: true, fold: true, version: java6, remove_diacritics: true}
    org.apache.solr.analysis.StopFilterFactory args: {words: stopwords.txt, ignoreCase: true, enablePositionIncrements: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory args: {preserveOriginal: 1, splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 1, generateWordParts: 1, catenateAll: 0, catenateNumbers: 1}
    org.apache.solr.analysis.LowerCaseFilterFactory args: {}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args: {}

Query Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    schema.UnicodeNormalizationFilterFactory args: {composed: false, remove_modifiers: true, fold: true, version: java6, remove_diacritics: true}
    org.apache.solr.analysis.SynonymFilterFactory args: {synonyms: synonyms.txt, expand: true, ignoreCase: true}
    org.apache.solr.analysis.StopFilterFactory args: {words: stopwords.txt, ignoreCase: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory args: {preserveOriginal: 1, splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 0, generateWordParts: 1, catenateAll: 0, catenateNumbers: 0}
    org.apache.solr.analysis.LowerCaseFilterFactory args: {}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args: {}

Docs: 1
Distinct: 5
Top 5 terms: lingua (1), linguajob.pl (1), linguajobpl (1), pl (1), job (1)

*Solr Analysis*
Field name: company_name
Field value (Index): LinguaJob.pl
Field value (Query): linguajob.pl

*Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position: 1, term text: LinguaJob.pl, term type: word, source start,end: 0,12
schema.UnicodeNormalizationFilterFactory {composed=false, remove_modifiers=true, fold=true, version=java6, remove_diacritics=true}
  term position: 1, term text: LinguaJob.pl, term type: word, source start,end: 0,12
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
  term position: 1, term text: LinguaJob.pl, term type: word, source start,end
Custom search filters
Hi all,

I am trying to use a custom search filter (org.apache.lucene.search.Filter), but I am unsure of where I should configure this. Would I have to create my own SearchHandler to wrap this logic in? Any examples/suggestions out there?

Thanks
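One common pattern (an untested sketch; the plugin and filter class names here are made up) is to avoid a custom SearchHandler entirely: wrap the Lucene Filter in a ConstantScoreQuery inside a custom QParserPlugin, register it in solrconfig.xml, and apply it as an fq:

import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyFilterQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() {
        // MyCustomFilter is your org.apache.lucene.search.Filter subclass
        return new ConstantScoreQuery(new MyCustomFilter());
      }
    };
  }
}

Registered with <queryParser name="myfilter" class="com.example.MyFilterQParserPlugin"/> and used as fq={!myfilter}, the filter's results then flow through the normal filterCache.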
Re: Use of multiple tomcat instance and shards.
Thank you all.

Tommaso, thanks, I will follow the links you suggested.
Erick, it is Solr 1.4.1.

Regards,
Rajani Maski

On Tue, Mar 8, 2011 at 10:16 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
[...]
True master-master fail-over without data gaps
Hello,

What are some common or good ways to handle indexing (master) fail-over? Imagine you have a continuous stream of incoming documents that you have to index without losing any of them (or losing as few as possible). How do you set up your masters? In other words, you can't just have 2 masters where the secondary is the Repeater (or Slave) of the primary and replicates the index periodically: you need 2 masters that are in sync at all times! How do you achieve that?

* Do you just put N masters behind an LB VIP, configure them both to point to the index on some shared storage (e.g. a SAN), and count on the LB to fail over to the secondary master when the primary becomes unreachable? If so, how do you deal with index locks? Do you use the Native lock and count on it disappearing when the primary master goes down? That means you count on the whole JVM process dying, which may not be the case...

* Or do you use tools like DRBD, Corosync, Pacemaker, etc. to keep 2 masters with 2 separate indices in sync, while making sure you write to only 1 of them via an LB VIP or otherwise?

* Or ...

This thread is on a similar topic, but is inconclusive: http://search-lucene.com/m/aOsyN15f1qd1
Here is another similar thread, but this one doesn't cover how 2 masters are kept in sync at all times: http://search-lucene.com/m/aOsyN15f1qd1

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: NRT in Solr
I think once this starts yielding matches:

  trunk/solr$ find . -name \*java | xargs grep IndexReader | grep IndexWriter

...we'll know NRT has landed. Until then: http://wiki.apache.org/solr/NearRealtimeSearch

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Jae Joo jaejo...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, March 8, 2011 4:27:41 PM
Subject: NRT in Solr

Hi, is NRT in Solr 4.0 from trunk? I have checked it out from trunk, but could not find the configuration for NRT.

Regards,
Jae
RE: True master-master fail-over without data gaps
I'd honestly think about buffering the incoming documents in some store that's actually made for fail-over persistence and reliability, maybe CouchDB or something. That takes care of not losing anything, and the problem becomes making sure our Solr master indexes are kept in sync with the actual persistent store; I'm still not sure how to do that, but I think it's a simpler problem. The right tool for the right job: that kind of fail-over persistence is not Solr's specialty.

From: Otis Gospodnetic [otis_gospodne...@yahoo.com]
Sent: Tuesday, March 08, 2011 11:45 PM
To: solr-user@lucene.apache.org
Subject: True master-master fail-over without data gaps

[...]
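A minimal sketch of that buffer-then-index loop (the Queue here is a stand-in for whatever durable store is chosen; the key property is that a document leaves the buffer only after Solr has acknowledged it):

import java.util.Queue;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BufferedIndexer {
  private final Queue<SolrInputDocument> buffer; // stand-in for CouchDB/JMS/etc.
  private final SolrServer master;               // e.g. reached via an LB VIP

  public BufferedIndexer(Queue<SolrInputDocument> buffer, SolrServer master) {
    this.buffer = buffer;
    this.master = master;
  }

  public void drain() {
    SolrInputDocument doc;
    while ((doc = buffer.peek()) != null) {
      try {
        master.add(doc); // may reach either master through the VIP
        buffer.poll();   // ack: remove only after Solr accepted the doc
      } catch (Exception e) {
        break;           // leave doc buffered; retry after fail-over
      }
    }
  }
}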