Re: Boosting documents by categorical preferences
Chris,

Sounds good! Thanks for the tips. I'll be glad to submit my talk to this, as I have a writeup pretty much ready to go.

Cheers
Amit

On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: The initial results seem to be kinda promising... of course there are many
: more optimizations I could do, like decaying user ratings over time so that
: a 5 rating a year ago doesn't count as much as a 5 rating today.
:
: Hope this helps others. I'll open source what I have soon and post back. If
: there is feedback or other thoughts, let me know!

Hey Amit,

Glad to hear your user-based boosting experiments are paying off. I would definitely love to see a more detailed writeup down the road showing how it affects your final user metrics -- or perhaps even give a session on your technique at ApacheCon?

http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp

-Hoss
http://www.lucidworks.com/
Re: Boosting documents by categorical preferences
Hi Chris (and others interested in this),

Sorry for dropping off. I got sidetracked with other work, came back to this, and finally got a V1 implemented. The final process is as follows:

1) Pre-compute the global per-category num_ratings/average/std-dev (so for Action the average rating may be 3.49 with a std-dev of 0.99).
2) For a given user, retrieve the last X ratings (X for me is 10) and compute the user's categorical affinities: take the average rating for all movies in that particular category (Action), subtract the global category average, and divide by the category std-dev. Furthermore, multiply this by the fraction of total user ratings in that category.
   - For example, if a user's last 10 ratings consisted of 9/10 Drama and 1/10 Thriller, the z-score of the Thriller should be discounted relative to that of the Drama, so that the user's preference (either positive or negative) for Drama is more prominent.
3) Sort by the absolute value of the z-score (thanks Hossman, great thought).
4) Return the top 3 (an arbitrary number).
5) Modify the query to look like the following:

    qq=tom hanks
    &q={!boost b=$b defType=edismax v=$qq}
    &cat1=category:Children
    &cat2=category:Fantasy
    &cat3=category:Animation
    &b=sum(1,sum(product(query($cat1),0.22267872),product(query($cat2),0.21630952),product(query($cat3),0.21120241)))

Basically, b = 1 + (pref1*query(category:something1) + pref2*query(category:something2) + pref3*query(category:something3)).

The initial results seem to be kinda promising... of course there are many more optimizations I could do, like decaying user ratings over time so a 5 rating a year ago doesn't count as much as a 5 rating today.

Hope this helps others. I'll open source what I have soon and post back. If there is feedback or other thoughts, let me know!

Cheers
Amit

On Fri, Nov 22, 2013 at 11:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I thought about that but my concern/question was how. If I used the pow
: function then I'm still boosting the bad categories by a small
: amount... alternatively I could multiply by a negative number but does that
: work as expected?

I'm not sure I understand your concern: negative powers would give you values less than 1, positive powers would give you values greater than 1, and then you'd use those values as multiplicative boosts -- so the values less than 1 would penalize the scores of existing matching docs in the categories the user dislikes.

Oh wait... I see: in your original email (and in my subsequent suggested tweak to use pow()) you were talking about sum()ing up these 3 category boosts (and I cut/pasted sum() in my example as well). Yeah, using multiplication there would make more sense if you wanted the negative preferences as well, because then the score of any matching doc will be reduced if it matches on an undesired category -- and the amount it will be reduced will be determined by how strongly it matches on that category (ie: the base score returned by the nested query() func) and how negative the undesired preference value (ie: the pow() exponent) is.

    qq=...
    q={!boost b=$b v=$qq}
    b=prod(pow(query($cat1),$cat1z),pow(query($cat2),$cat2z),pow(query($cat3),$cat3z))
    cat1=...action...
    cat1z=1.48
    cat2=...comedy...
    cat2z=1.33
    cat3=...kids...
    cat3z=-1.7

-Hoss
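To make step 2 concrete, here is a minimal sketch of the weighted z-score computation in Java. This is not Amit's actual code; the Rating and CatStats types and all names are invented for illustration.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class CatStats { double mean; double stdDev; }    // global per-category stats (step 1)
    class Rating { String category; double stars; }   // one of the user's last X ratings

    class CategoryAffinity {
      // weighted z-score: ((userCatAvg - globalAvg) / globalStdDev) * fractionOfUserRatings
      static Map<String, Double> affinities(List<Rating> last, Map<String, CatStats> global) {
        Map<String, List<Double>> byCat = new HashMap<String, List<Double>>();
        for (Rating r : last) {
          if (!byCat.containsKey(r.category)) byCat.put(r.category, new ArrayList<Double>());
          byCat.get(r.category).add(r.stars);
        }
        Map<String, Double> result = new HashMap<String, Double>();
        for (Map.Entry<String, List<Double>> e : byCat.entrySet()) {
          double sum = 0;
          for (double d : e.getValue()) sum += d;
          double userAvg = sum / e.getValue().size();
          CatStats g = global.get(e.getKey());
          double z = (userAvg - g.mean) / g.stdDev;
          double fraction = (double) e.getValue().size() / last.size();
          result.put(e.getKey(), z * fraction);  // step 3: sort by Math.abs(value), keep top 3
        }
        return result;
      }
    }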
Re: Boosting documents by categorical preferences
I thought about that but my concern/question was how. If I used the pow function then I'm still boosting the bad categories by a small amount... alternatively I could multiply by a negative number, but does that work as expected? I haven't done much with negative boosting except for the sledgehammer approach of category exclusion through filters.

Thanks
Amit

On Nov 19, 2013 8:51 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: My approach was something like:
: 1) Look at the categories that the user has preferred and compute the
: z-score
: 2) Pick the top 3 among those
: 3) Use those to boost search results.

I think that totally makes sense... the additional bit I was suggesting you consider is that instead of picking the highest 3 z-scores, pick the z-scores with the greatest absolute value. That way, if someone is a very boring person whose positive interests are all basically exactly the same as the mean for everyone else, but they have some very strong dis-interests, you don't bother boosting on those minuscule interests and instead you negatively boost on the things they are antagonistic against.

-Hoss
Re: Boosting documents by categorical preferences
Hey Chris,

Sorry for the delay and thanks for your response. This was inspired by your talk on boosting and biasing that you presented way back when at a meetup. I'm glad that my general approach seems to make sense. My approach was something like:
1) Look at the categories that the user has preferred and compute the z-score
2) Pick the top 3 among those
3) Use those to boost search results.

I'll look at using the boosts as an exponent instead of a multiplier, as I think that would make sense... also as it handles the 0 case. This is for a prototype I am doing, but I'll share the results one day in a meetup as I think it'll be kinda interesting.

Thanks again
Amit

On Thu, Nov 14, 2013 at 11:11 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I have a question around boosting. I wanted to use the boost= to write a
: nested query that will boost a document based on categorical preferences.

You have no idea how stoked I am to see you working on this in a real world application.

: Currently I have the weights set to the z-score equivalent of a user's
: preference for that category, which is simply how many standard deviations
: above the global average is this user's preference for that movie category.
:
: My question though is basically whether or not semantically the equation
: query(category:Drama)*some weight + query(category:Comedy)*some weight
: + query(category:Action)*some weight makes sense?

My gut says that your approach makes sense -- but if I'm understanding you correctly, I think you need to add 1 to all your weights: the boost is a multiplier, so if someone's rating for every category is 0 std devs above the average rating (ie: the most average person imaginable), you don't want to give every movie in every category a score of 0.

Are you picking the top 3 categories the user prefers as a cutoff, or are you arbitrarily using N category boosts for however many N categories the user is above the global average in their preference for that category?

Are your preferences coming from explicit user feedback on the categories (ie: rate how much you like comedies on a scale of 1-5) or are you inferring them from user ratings of the movies themselves? (ie: rate this movie, which happens to be a scifi/action/comedy, on a scale of 1-5) ... because if it's the latter, you probably want to be careful to also normalize based on how many categories the movie is in.

The other thing to consider is whether you want to include negative preferences (ie: weights less than 1) based on how many std devs the user's average is *below* the global average for a category ... in this case I *think* you'd want to divide the raw value into -1 to get a useful multiplier.

Alternatively: you could experiment with using the weights as exponents instead of multipliers...

    b=sum(pow(query($cat1),1.482),pow(query($cat2),0.1199),pow(query($cat3),1.448))

...that would simplify the math you'd have to worry about both for the totally boring average user (x**0 = 1) and for the categories users hate (x**-5 = some positive fraction that will act as a penalty) ... but you'd definitely need to run some tests to see if it over-boosts as the std-dev variations get really high (might want to take a root first before using them as the exponent).

-Hoss
Boosting documents by categorical preferences
Hi all,

I have a question around boosting. I wanted to use boost= to write a nested query that will boost a document based on categorical preferences. For a movie search, for example, say that a user likes drama, comedy, and action. I could use something like:

    qq=...
    &q={!boost b=$b defType=edismax v=$qq}
    &b=sum(product(query($cat1),1.482),product(query($cat2),0.1199),product(query($cat3),1.448))
    &cat1=category:Drama
    &cat2=category:Comedy
    &cat3=category:Action

where cat1=Drama, cat2=Comedy, cat3=Action.

Currently I have the weights set to the z-score equivalent of a user's preference for that category, which is simply how many standard deviations above the global average this user's preference for that movie category is.

My question, though, is basically whether or not, semantically, the equation query(category:Drama)*some weight + query(category:Comedy)*some weight + query(category:Action)*some weight makes sense? What are some techniques people use to boost documents based on discrete things like category, manufacturer, genre, etc.?

Thanks!
Amit
Re: When is/should qf different from pf?
Thanks Erick. Numeric fields make sense, as I guess would strictly-string fields too, since it's one term? In the normal text-searching case, though, does it make sense to have qf and pf differ?

Thanks
Amit

On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com wrote:

The facetious answer is: when phrases aren't important in the fields. If you're doing a simple boolean match, adding phrase fields will add expense to no good purpose, etc. Phrases on numeric fields seem wrong.

FWIW,
Erick

On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com wrote:

Hi all,

I have been using Solr for years but never really stopped to wonder: when using the dismax/edismax handler, when do you have the qf different from the pf? I have always set them to be the same (maybe with different weights), but I was wondering if there is a situation where you would have a field in the qf not in the pf, or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter.

Thanks!
Amit
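To make the distinction concrete, here is a hedged example of an edismax request where qf and pf deliberately differ (all field names and boosts are made up):

    q=apache solr cloud
    &defType=edismax
    &qf=title^5 body^2 sku
    &pf=title^10 body^3
    &ps=2

Here sku is a single-token string field: it belongs in qf so exact identifiers can match, but it is left out of pf, where a phrase boost would add cost and no value. ps is the phrase slop applied to the pf clauses.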
Re: How to configure solr to our java project in eclipse
Try this: http://hokiesuns.blogspot.com/2010/01/setting-up-apache-solr-in-eclipse.html

I use this today and it still works. If anything is outdated (as it's a relatively old post) let me know. I wrote this, so ping me if you have any questions.

Thanks
Amit

On Sun, Oct 27, 2013 at 7:33 PM, Amit Aggarwal amit.aggarwa...@gmail.com wrote:

How do you start your other project? If it is Maven or Ant, then you can use the antrun plugin to start Solr. Otherwise you can write a small shell script to start Solr.

On 27-Oct-2013 9:15 PM, giridhar girimc...@gmail.com wrote:

Hi friends, I am Giridhar. Please clarify my doubt. We are using Solr for our project. The problem is that Solr is outside of our project (in another folder), and we have to manually type java -jar start.jar to start Solr and use its services. What we need is: when we run the project, Solr should start automatically. Our project is a Java project with Tomcat in Eclipse. How can I achieve this? Please help me. Thank you.

Giridhar
When is/should qf different from pf?
Hi all,

I have been using Solr for years but never really stopped to wonder: when using the dismax/edismax handler, when do you have the qf different from the pf? I have always set them to be the same (maybe with different weights), but I was wondering if there is a situation where you would have a field in the qf not in the pf, or vice versa. My understanding from the docs is that qf is a term-wise hard filter while pf is a phrase-wise boost of documents that made it past the qf filter.

Thanks!
Amit
Re: Restaurant availability from database
Hossman did a presentation on something similar to this, using spatial data, at a Solr meetup some months ago: http://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

May be helpful to you.

On Thu, May 23, 2013 at 9:40 AM, rajh ron...@trimm.nl wrote:

Thank you for your answer. Do you mean I should index the availability data as documents in Solr? The availability data in our databases is around 6,509,972 records and contains the availability per number of seats and per 15 minutes. I also tried this method, and as far as I know it's only possible to join the availability documents and not to include that information per result document.

An example API response (created from the Solr response):

    {
      "restaurants": [
        {
          "id": 13906,
          "name": "Allerlei",
          "zipcode": "6511DP",
          "house_number": 59,
          "available": true
        },
        {
          "id": 13907,
          "name": "Voorbeeld",
          "zipcode": "6512DP",
          "house_number": 39,
          "available": false
        }
      ],
      "resultCount": 12156,
      "resultCountAvailable": 55
    }

I'm currently hacking around the problem by executing the search again with a very high value for the rows parameter and counting the number of available restaurants on the backend, but this causes a big performance impact (as expected).
Re: writing a custom Filter plugin?
At first I thought you were referring to Filters in Lucene at query time (i.e. bitset filters), but I think you are referring to token filters at indexing/text-analysis time? I have had success writing my own Filter as the link presents. The key is that you should write a custom class that extends TokenFilter (http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/TokenFilter.html) and put the implementation in your incrementToken() method.

My recollection is that instead of returning something like a Token, as you would have in earlier versions of Lucene, you set attribute values on a notional current token. One obvious attribute is the term text itself, and perhaps any positional information. The best place to start is to pick a fairly simple example from the Solr source (maybe LowerCaseFilter) and try to mimic that.

Cheers!
Amit

On Mon, May 13, 2013 at 1:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

Does anyone know of any tutorials, basic examples, and/or documentation on writing your own Filter plugin for Solr? For Solr 4.x/4.3?

I would like a Solr 4.3 version of the normalization filters found here for Solr 1.4: https://github.com/billdueber/lib.umich.edu-solr-stuff

But those are old, for Solr 1.4. Does anyone have any hints for writing a simple substitution Filter for Solr 4.x? Or does a simple sourcecode example exist anywhere?
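As a minimal sketch of the shape being described (for Lucene/Solr 4.x; the class and its trailing-possessive behavior are invented for illustration, not the umich filters):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Hypothetical example: strip a trailing apostrophe-s from each token.
    public final class TrailingPossessiveFilter extends TokenFilter {
      // the "notional current token" is exposed through attributes like this one
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

      public TrailingPossessiveFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false; // no more tokens from the upstream tokenizer/filters
        }
        int len = termAtt.length();
        if (len > 2 && termAtt.charAt(len - 1) == 's' && termAtt.charAt(len - 2) == '\'') {
          termAtt.setLength(len - 2); // mutate the term text in place
        }
        return true;
      }
    }

You would then wrap it in a small subclass of org.apache.lucene.analysis.util.TokenFilterFactory whose create(TokenStream) returns this filter, and reference that factory class in the field type's analyzer chain in schema.xml.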
Re: Need solr query help
Is it possible instead to store in your Solr index a bounding box of store location + delivery radius, and do a bounding-box intersection between your user's point + radius (as a bounding box) and the shop's delivery bounding box? If you want further precision, the frange may work, assuming it's a post-filter implementation, so that you are doing heavy computation on a presumably small set of data, only to filter out the corner cases around the resulting radius circle. I haven't looked at Solr's spatial querying in a while to know if this is possible or not.

Cheers
Amit

On Sat, May 11, 2013 at 10:42 AM, smsolr sms...@hotmail.com wrote:

Hi Abhishek,

I forgot to explain why it works. It uses the frange filter, which is mentioned here: http://wiki.apache.org/solr/CommonQueryParameters

It works because it filters in results where the geodist minus the shopMaxDeliveryDistance is less than zero (that's what the u=0 means: upper limit = 0), i.e.:

    geodist - shopMaxDeliveryDistance < 0  <=>  geodist < shopMaxDeliveryDistance

i.e. the geodist is less than the shopMaxDeliveryDistance, and so the shop is within delivery range of the location specified.

smsolr
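As a sketch of the filter smsolr describes, assuming a geodist-enabled field named store_location and a per-shop numeric field max_delivery_km (both names invented; pt is the user's location):

    &sfield=store_location
    &pt=52.37,4.89
    &fq={!frange u=0}sub(geodist(),max_delivery_km)

If your Solr version supports running frange as a post-filter (cache=false plus a cost of 100 or more, i.e. {!frange u=0 cache=false cost=200}sub(geodist(),max_delivery_km)), the per-document distance math is deferred until cheaper filters have already narrowed the candidate set.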
Re: Sharing index amongst multiple nodes
I don't understand why this would be more performant... it seems like it'd be more memory- and resource-intensive, as you'd have multiple classloaders and multiple cache spaces for no good reason. Just have a single core with sufficiently large caches to handle your response needs. If you want to load-balance reads, consider having multiple physical nodes with a master/slaves setup or SolrCloud.

On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com wrote:

Hi. What are the thoughts on having multiple Solr instances, i.e. multiple Solr war files, sharing the same index (i.e. sharing the same solr_home), where only one Solr instance is used for writing and the others for reading? Is this possible? Is it beneficial -- is it more performant than having just one Solr instance? How does it affect auto-commits, i.e. how would the read nodes know the index has changed and re-populate caches etc.?

Solr 3.6.1

Thanks.
Re: how to skip test while building
If you generate the Maven pom files, you can do this, I think, by running mvn <whatever goal> -DskipTests=true.

On Sat, Apr 6, 2013 at 7:25 AM, Erick Erickson erickerick...@gmail.com wrote:

Don't know a good way to skip compiling the tests, but there isn't any harm in compiling them... changing to the solr directory and just issuing "ant example dist" builds pretty much everything. You don't execute tests unless you specify "ant test". "ant -p" shows you all the targets. Note that you have different targets depending on whether you're executing it in solr_home, solr_home/solr, or solr_home/lucene. Since you mention Solr, you probably want to work in solr_home/solr to start.

Best
Erick

On Sat, Apr 6, 2013 at 5:36 AM, parnab kumar parnab.2...@gmail.com wrote:

Hi All, I am new to Solr. I am using Solr 3.4. I want to build without building the Lucene test files and skip the tests being fired. Can anyone please help with where to make the necessary changes?

Thanks,
Pom
Re: Solr 4.2 single server limitations
There's a whole heap of information missing, like what you plan on storing vs. indexing, and yes, QPS too. My short answer is: try with one server until it falls over, then start adding more.

When you say multiple-server setup, do you mean multiple servers where each server acts as a slave storing the entire index, so you have load balancing across multiple servers, OR do you mean multiple servers where each server stores a portion of the data? If it's the former, sometimes a simple master/slave setup in Solr 4.x works, but the latter may mean SolrCloud. Master/slave is easy, but I don't know much about SolrCloud.

Questions to think about (this is not exhaustive by any means):
1) When you say 5-10 pages per website (300+ websites) that you are crawling 2x per hour, are you *replacing* the old copy of the web page in your index or storing some form of history for some reason?
2) What are you planning on storing vs. indexing? This would dictate your memory requirements.
3) You mentioned you don't know QPS, but having some guess would help... is it mostly for storage and occasional lookup (where slow responses are probably tolerable), or is this powering a real user-facing website (where low latency is probably desired)?

Again, I like to start simple and use one server until it dies, then expand from there.

Cheers
Amit

On Thu, Apr 4, 2013 at 7:58 AM, imehesz imeh...@gmail.com wrote:

hello,

I'm using a single-server setup with Nutch (1.6) and Solr (4.2). I plan to trigger the Nutch crawling process every 30 minutes or so and add about 300+ websites a month (with ~5-10 pages each). At this point I'm not sure about the query requests/sec.

Can I run this on a single server (and for how long)? If not, what would be the best and most efficient way to set up multiple servers?

thanks,
--iM
Re: do SearchComponents have access to response contents
: We need to also track the size of the response (as the size in bytes of
: the whole xml response that is streamed, with stored fields and all). I was
: a bit worried cause I am wondering if a searchcomponent will actually have
: access to the response bytes...

Can't you get this from your container access logs after the fact? I may be misunderstanding something, but why wouldn't mining the Jetty/Tomcat logs for the response size suffice here?

Thanks!
Amit

On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav jmluc...@gmail.com wrote:

A custom QueryResponseWriter... this makes sense, thanks Jack

On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.com wrote:

The search components can see the response as a NamedList, but it is only when SolrDispatchFilter calls the QueryResponseWriter that XML or JSON or whatever other format (Javabin as well) is generated from the named list for final output in an HTTP response.

You probably want a custom query response writer that wraps the XML response writer. Then you can generate the XML and then do whatever you want with it. See the QueryResponseWriter class and queryResponseWriter in solrconfig.xml.

-- Jack Krupansky

-----Original Message----- From: xavier jmlucjav
Sent: Wednesday, April 03, 2013 4:22 PM
To: solr-user@lucene.apache.org
Subject: do SearchComponents have access to response contents

I need to implement some SearchComponent that will deal with metrics on the response. Some things I see will be easy to get, like number of hits for instance, but I am more worried about this:

We need to also track the size of the response (as the size in bytes of the whole xml response that is streamed, with stored fields and all). I was a bit worried cause I am wondering if a searchcomponent will actually have access to the response bytes...

Can someone confirm one way or the other? We are targeting Solr 4.0

thanks
xavier
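A minimal sketch of the wrapper Jack describes, for Solr 4.x; the class name and size-reporting are illustrative, and note this measures characters before encoding rather than wire bytes:

    import java.io.IOException;
    import java.io.StringWriter;
    import java.io.Writer;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.QueryResponseWriter;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.response.XMLResponseWriter;

    public class SizeTrackingXmlWriter implements QueryResponseWriter {
      private final XMLResponseWriter inner = new XMLResponseWriter();

      public void init(NamedList args) {}

      public String getContentType(SolrQueryRequest req, SolrQueryResponse rsp) {
        return inner.getContentType(req, rsp);
      }

      public void write(Writer writer, SolrQueryRequest req, SolrQueryResponse rsp)
          throws IOException {
        StringWriter buf = new StringWriter();
        inner.write(buf, req, rsp);   // let the real writer render the XML
        String xml = buf.toString();
        // record the size however you like: a log line, JMX counter, etc.
        System.err.println("response length (chars): " + xml.length());
        writer.write(xml);            // then stream it on unchanged
      }
    }

You would register it in solrconfig.xml with something like <queryResponseWriter name="xml" class="com.example.SizeTrackingXmlWriter"/> so it replaces the stock XML writer.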
Re: SOLR on hdfs
Why wouldn't SolrCloud help you here? You can set up shards and replicas etc. to have redundancy, because HDFS isn't designed to serve real-time queries as far as I understand. If you are using HDFS as a backup mechanism, to me you'd be better served having multiple slaves tethered to a master (in a non-cloud environment) or setting up SolrCloud; either option would give you more redundancy than copying an index to HDFS.

- Amit

On Wed, Mar 6, 2013 at 12:23 PM, Joseph Lim ysli...@gmail.com wrote:

Hi Upayavira, sure, let me explain. I am setting up Nutch and Solr in a Hadoop environment. Since I am using HDFS, in the event that there are any crashes on the localhost (running Solr), I will still have the shards of data being stored in HDFS. Thank you so much =)

On Thu, Mar 7, 2013 at 1:19 AM, Upayavira u...@odoko.co.uk wrote:

What are you actually trying to achieve? If you can share what you are trying to achieve, maybe folks can help you find the right way to do it.

Upayavira

On Wed, Mar 6, 2013, at 02:54 PM, Joseph Lim wrote:

Hello Otis, is there any configuration where it will index into HDFS instead? I tried Crawlzilla and Lily, but I hope to update specific packages such as Hadoop only or Nutch only when there are updates. That's why I would prefer to install them separately. Thanks so much. Looking forward to your reply.

On Wednesday, March 6, 2013, Otis Gospodnetic wrote:

Hello Joseph,

You can certainly put them there, as in: hadoop fs -copyFromLocal localsrc URI

But searching such an index will be slow. See also: http://katta.sourceforge.net/

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 6, 2013 at 7:50 AM, Joseph Lim ysli...@gmail.com wrote:

Hi, would like to know how I can put the indexed Solr shards into HDFS? Thanks..

Joseph

On Mar 6, 2013 7:28 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Joseph,

What exactly are you looking to do? See http://incubator.apache.org/blur/

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Mar 6, 2013 at 2:39 AM, Joseph Lim ysli...@gmail.com wrote:

Hi, I am running the Hadoop distributed file system; how do I put the output of the Solr dir into HDFS automatically? Thanks so much..

--
Best Regards,
*Joseph*
Re: SOLR on hdfs
Joseph,

Doing what Otis said will do literally what you want, which is copying the index to HDFS. It's no different than copying it to a different machine, which, by the way, is what Solr's master/slave replication scheme does. Alternatively, I think people are starting to set up new Solr instances with SolrCloud, which doesn't have the concept of master/slave but rather a series of nodes with the option of having replicas (what I believe to be backup nodes), so that you have the redundancy you want.

Honestly, HDFS in the way that you are looking for is probably no different than storing your Solr index in a RAIDed storage format, but I don't pretend to know much about RAID arrays. What exactly are you trying to achieve from a systems perspective? Why do you want Hadoop in the mix here, and how does copying the index to HDFS help you? If SolrCloud seems complicated, try just setting up a simple master/slave replication scheme, for that's really easy.

Cheers
Amit

On Wed, Mar 6, 2013 at 9:55 PM, Joseph Lim ysli...@gmail.com wrote:

Hi Amit, so you mean that if I just want to get redundancy for Solr in HDFS, the best way to do it is per what Otis suggested, using the following command:

hadoop fs -copyFromLocal localsrc URI

Ok, let me try out SolrCloud, as I will need to make sure it works well with Nutch too. Thanks for the help..

--
Best Regards,
*Joseph*
Re: ping query frequency
We too run a ping every 5 seconds, and I think the concurrent mark/sweep collector helps avoid the LB taking a box out of rotation due to long pauses. Either that, or I don't see pauses large enough for my LB to take it out (it'd have to fail 3 times in a row, or 15 seconds total, before it's gone).

The ping query does execute an actual query, so of course you want to make it as simple as possible (i.e. q=primary_key:value) so that there's limited to no scanning of the index. I think our query does an id:0, which would always return 0 docs, but any stupid-simple query is fine as long as it hits the caches on subsequent hits. The goal, to me at least, is not that the ping query yields actual docs but that it's a mechanism to remove a Solr server from rotation without having to log in to an ops-controlled device directly.

I'd definitely remove the ping per request (wouldn't the fact that you are doing /select serve as the ping, and hence defeat the purpose of the ping query?) and definitely do the frequent ping as we are describing if you want to have your Solr boxes behind some load balancer.

On Sun, Mar 3, 2013 at 8:21 AM, Shawn Heisey s...@elyograg.org wrote:

On 3/3/2013 2:15 AM, adm1n wrote:

: I'm wondering how frequently this query should be made. Currently it is
: done before each select request (some very old legacy). I googled a little
: and found out that it is bad practice and has a performance impact. So the
: question is: should I completely remove it, or just do it once in some
: period of time?

Can you point me at the place where it says that it's bad practice to do frequent pings?

I use the ping functionality in my haproxy load balancer that sits in front of Solr. It executes a ping request against all my Solr instances every five seconds. Most of the time, the ping request (which is distributed) finishes in single-digit milliseconds. If that is considered bad practice, I want to figure out why and submit issues to get the problem fixed. I can imagine that sending a ping before every query would be a bad idea, but I am hoping that the way I'm using it is OK.

The only problem with ping requests that I have ever noticed was caused by long garbage collection pauses on my 8GB Solr heap. Those pauses caused the load balancer to incorrectly mark the active Solr instance(s) as down and send requests to a backup. Through experimentation with -XX memory tuning options, I have now eliminated the GC pause problem. For machines running Solr 4.2-SNAPSHOT, I have reduced the heap to 6GB; the 3.5.0 machines are still running with 8GB.

Thanks,
Shawn
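For reference, this kind of cheap, fixed ping can be pinned down in solrconfig.xml; a minimal Solr 4.x sketch (pick whatever stupid-simple q fits your schema):

    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
      <lst name="invariants">
        <str name="q">id:0</str>
      </lst>
      <!-- optional: lets ops pull a node from rotation by removing a file -->
      <str name="healthcheckFile">server-enabled.txt</str>
    </requestHandler>

With healthcheckFile enabled, the handler returns an error when the file is absent, so the load balancer drops the node without anyone touching the LB itself.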
Re: Poll: SolrCloud vs. Master-Slave usage
But does that mean that in SolrCloud, slave nodes are busy indexing documents?

On Fri, Mar 1, 2013 at 5:37 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

Amit,

NRT is not possible in a master-slave setup because of the necessity of a hard commit and replication, both of which add considerable delay. SolrCloud sends each document for a given shard to each node hosting that shard, so there's no need for the hard commit and replication for visibility. You could conceivably get NRT on a single node without SolrCloud, but there would be no redundancy.

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn't a Game
Re: Poll: SolrCloud vs. Master-Slave usage
I don't know a ton about SolrCloud, but for our setup, my limited understanding of it is that you start to bleed operational and non-operational aspects together, which I am not comfortable doing (i.e. software load balancing). Also, adding ZooKeeper to the mix is yet another thing to install, set up, monitor, maintain, etc., which doesn't add any value above and beyond what we have set up already. For example, we have a hardware load balancer that can do the actual load balancing of requests among the slaves and take slaves in and out of rotation either on demand or when one is down. We've placed a virtual IP on top of our multiple masters so that we have redundancy there.

While we have multiple cores, the data volume is small enough to fit on one node, so we aren't at the data volume necessary for sharding our indices. I suspect that if we had a sufficiently large dataset that couldn't fit on one box, SolrCloud would be perfect, but when you can fit on one box, why add more complexity? Please correct me if I'm wrong, for I'd like to better understand this!

On Thu, Feb 28, 2013 at 12:53 AM, rulinma ruli...@gmail.com wrote:

I am doing research on SolrCloud.
Re: Poll: SolrCloud vs. Master-Slave usage
Erick,

Well put, and thanks for the clarification. One question: "And if you need NRT, you just can't get it with traditional M/S setups." == Can you explain how that works with SolrCloud?

I agree with what you said too, because there was an article or discussion I read that said having high-availability masters requires some fairly complicated setups, and I guess I am under-estimating how expensive/complicated our setup is relative to what you can get out of the box with SolrCloud.

Thanks!
Amit

On Thu, Feb 28, 2013 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote:

Amit:

It's a balancing act. If I was starting fresh, even with one shard, I'd probably use SolrCloud rather than deal with the issues around the "how do I recover if my master goes down" question. Additionally, SolrCloud allows one to monitor the health of the entire system by monitoring the state information kept in ZooKeeper, rather than build a monitoring system that understands the changing topology of your network. And if you need NRT, you just can't get it with traditional M/S setups.

In a mature production system where all the operational issues are figured out and you don't need NRT, it's easier just to plop 4.x in traditional M/S setups and not go to SolrCloud. And you're right, you have to understand ZooKeeper, which isn't all that difficult, but it is another moving part, and I'm a big fan of keeping the number of moving parts down if possible.

It's not a one-size-fits-all situation. From what you've described, I can't say there's a compelling reason to do the SolrCloud thing. If you find yourself spending lots of time building monitoring or High Availability/Disaster Recovery tools, then you might find the cost/benefit analysis changing.

Personally, I think it's ironic that the memory improvements that came along _with_ SolrCloud make it less necessary to shard. Which means that traditional M/S setups will suit more people longer <g>

Best
Erick
Re: numFound is not correct while using Result Grouping
I need to write some tests, which I hope to do tonight, and then I think it'll get into 4.2.

On Tue, Feb 26, 2013 at 6:24 AM, Nicholas Ding nicholas...@gmail.com wrote:

Thanks Amit, that's cool! So it will also be fixed in Solr 4.2, right?
Re: numFound is not correct while using Result Grouping
Yeah, I had a similar problem. I filed and submitted this patch: https://issues.apache.org/jira/browse/SOLR-4310

Let me know if this is what you are looking for!
Amit

On Mon, Feb 25, 2013 at 1:50 PM, Teun Duynstee t...@duynstee.com wrote:

Ah, I see. The docs say "Although this result format does not have as much information, it may be easier for existing solr clients to parse." I guess the ngroups value could be added to this format, but apparently it isn't. I do agree with you that to be useful (as in possible to read for a client that doesn't know of the grouped format), the number should be that of the groups, not of the documents. A quick glance at the code shows that it is indeed not calculated in this case -- but it's not completely trivial to fix. Could you use group.format=simple instead? That will work with ngroups.

Teun

2013/2/25 Nicholas Ding nicholas...@gmail.com

Thanks Teun and Carlos, I set group.ngroups=true, but I don't get this ngroups number when I'm using group.main=true.

On Mon, Feb 25, 2013 at 12:02 PM, Carlos Maroto cmar...@searchtechnologies.com wrote:

Use group.ngroups; check it in the Solr wiki under FieldCollapsing.

Carlos Maroto
Search Architect at Search Technologies (www.searchtechnologies.com)

Nicholas Ding nicholas...@gmail.com wrote:

Hello,

I grouped the result and set group.main=true. I was expecting numFound to equal the number of groups, but actually it was not. How do I get the number of groups?

Thanks
Nicholas
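For reference, the response shapes being discussed come from parameter combinations like these (the field name manu is illustrative):

    # grouped format, with the group count included:
    q=*:*&group=true&group.field=manu&group.ngroups=true

    # flat main result, easy for old clients, but numFound counts documents:
    q=*:*&group=true&group.field=manu&group.main=true

    # simple flat format, which does report ngroups:
    q=*:*&group=true&group.field=manu&group.format=simple&group.ngroups=true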
Re: [ANN] vifun: tool to help visually tweak Solr boosting
This is cool! I had done something similar, except changing values via JConsole/JMX: https://issues.apache.org/jira/browse/SOLR-2306

We had something not as nice at Zvents, but I wanted to expose these as MBean properties so you could change them via any JMX UI like JVisualVM.

Cheers!
Amit

On Mon, Feb 25, 2013 at 2:36 PM, jmlucjav jmluc...@gmail.com wrote:

Apologies... the instructions are wrong on the cd; these commands are to be run at the top level of the project. I fixed the doc to read:

    cd vifun
    griffon run-app

On Mon, Feb 25, 2013 at 10:45 PM, Jan Høydahl jan@cominvent.com wrote:

Hi,

I actually tried ../griffonw run-app, but it says "griffon-app does not appear to be part of a Griffon application". I installed griffon and tried again with griffon run-app inside of griffon-app, but got the same error.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

25. feb. 2013 kl. 19:51 skrev jmlucjav jmluc...@gmail.com:

Jan, thanks for looking at this!

- Running from source: would you care to send me the error you get (if any) when running from source? I assume you have griffon 1.1.0 installed, right?
- Binary dist: the distrib is created by griffon, so I'll check if the permission issue is known or can be fixed somehow (I develop on Windows, and tested on a clean Windows too, so I don't face the issue you mention). I'll update the doc anyway.
- wt param: I am already overriding the wt param (in order to use javabin). What I didn't allow is choosing the handler to be used when submitting the query. I guess any handler that does not have appends/invariants that would interfere would work fine; I just thought /select is available in most installations, and that is one thing less to configure. But yes, I could let the user configure it; I'll open an issue.

xavier

On Mon, Feb 25, 2013 at 3:10 PM, Jan Høydahl jan@cominvent.com wrote:

Cool. I tried running from source (using the bundled griffonw), but I think the instructions may be wrong; I had to download the binary dist. The file permissions for bin/vifun in the binary dist should have +x so you can execute it with ./vifun. What about the ability to override the wt param, so that you can point it to the /browse handler directly?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

23. feb. 2013 kl. 15:12 skrev jmlucjav jmluc...@gmail.com:

Hi,

I have built a small tool to help me tweak some params in Solr (typically qf, bf in edismax). As maybe others will find it useful, I am open sourcing it on github: https://github.com/jmlucjav/vifun

Check github for some more info and screenshots. I include part of the github page below.

regards

Description

Did you ever spend lots of time trying to tweak all the numbers in an *edismax* handler's *qf*, *bf*, etc. params so docs get scored to your liking? Imagine you have the params below; is 20 the right boost for *name*, or is it too much? Is *population* being boosted too much versus distance? What about new documents?

    <!-- fields, boost some -->
    <str name="qf">name^20 textsuggest^10 edge^5 ngram^2 phonetic^1</str>
    <str name="mm">33%</str>
    <!-- boost closest hits -->
    <str name="bf">recip(geodist(),1,500,0)</str>
    <!-- boost by population -->
    <str name="bf">product(log(sum(population,1)),100)</str>
    <!-- boost newest docs -->
    <str name="bf">recip(rord(moddate),1,1000,1000)</str>

This tool was developed in order to help me tweak the values of boosting functions etc. in Solr, typically when using the edismax handler. If you are fed up with: change a number a bit, restart Solr, run the same query to see how documents are scored now... then this tool is for you.

Features

- Can tweak numeric values in the following params: *qf, pf, bf, bq, boost, mm* (others can be easily added), even in *appends* or *invariants*
- View side by side a baseline query result and how it changes when you gradually change each value in the params
- Colorized values, where the color depends on how the document does relative to the baseline query
- Tooltips give you Explain info
- Works on remote Solr installations
- Tested with Solr 3.6, 4.0 and 4.1 (other versions should work too, as long as the wt=javabin format is compatible)
- Developed using Groovy/Griffon

Requirements

- The */select* handler should be available, and should not have any *appends* or *invariants*, as they could interfere with how vifun works.
- Java 6 is needed (maybe it runs on Java 5 too). A JRE should be enough.

Getting started
Re: Slaves always replicate entire index Index versions
A few others have posted about this too, apparently, and SOLR-4413 is the root problem. Basically, what I am seeing is that if your index directory is not index/ but rather index.timestamp (set in index.properties), a new index will be downloaded all the time, because the download expects your index to be in solr_data_dir/index. Sounds like a quick workaround might be to rename your index directory to just index and see if the problem goes away. To confirm, look at line 728 in SnapPuller.java (in downloadIndexFiles).

I am hoping that the patch and a more unified getIndexDir can be added to the next release of Solr, as this is a fairly significant bug to me.

Cheers
Amit

On Thu, Feb 21, 2013 at 12:56 AM, Amit Nithian anith...@gmail.com wrote:

So the diff in generation numbers is due to the commits, I believe, that Solr does when it has the new index files. But the fact that it's downloading a new index each time is baffling, and I just noticed that too (hit the replicate button and noticed a full index download). I'm going to pop into the source and see what's going on, unless there's a known bug filed about this?

On Tue, Feb 19, 2013 at 1:48 AM, Raúl Grande Durán raulgrand...@hotmail.com wrote:

Hello.

We have recently updated our Solr from 3.5 to 4.1, and everything is running perfectly except the replication between nodes. We have a master-repeater-2 slaves architecture, and we have seen some things that weren't happening before:

- When a slave (repeater or slaves) starts to replicate, it needs to download the entire index, even when only small changes have been made to the index at the master. This takes a long time, since our index is more than 20 GB.
- After a replication cycle we have different index generations on master, repeater and slaves. For example: Master: gen. 64590; Repeater: gen. 64591; Both slaves: gen. 64592.

My replicationHandler configuration is like this:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="enable">${enable.master:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="enable">${enable.slave:false}</str>
        <str name="masterUrl">${solr.master.url:http://localhost/solr}</str>
        <str name="pollInterval">00:03:00</str>
      </lst>
    </requestHandler>

Our problems are very similar to those explained here: http://lucene.472066.n3.nabble.com/Problem-with-replication-td2294313.html

Any ideas?

Thanks
Re: Slaves always replicate entire index Index versions
Thanks for the links... I have updated SOLR-4471 with a proposed solution that I hope can be incorporated or amended, so we can get a clean fix into the next version and our operations and network staff will be happier with not having gigs of data flying around the network :-)

On Thu, Feb 21, 2013 at 1:24 AM, raulgrande83 raulgrand...@hotmail.com wrote:

Hi Amit,

I have come across some JIRAs that may be useful for this issue:

https://issues.apache.org/jira/browse/SOLR-4471
https://issues.apache.org/jira/browse/SOLR-4354
https://issues.apache.org/jira/browse/SOLR-4303
https://issues.apache.org/jira/browse/SOLR-4413
https://issues.apache.org/jira/browse/SOLR-2326

Please let us know if you find any solution.

Regards.
Re: Slaves always replicate entire index Index versions
Sounds good. I am trying the combination of my patch and SOLR-4413 now to see how it works, and will have to see if I can put unit tests around them, as some of what I thought may not be true with respect to the commit generation numbers.

For the issue in your last post: is it possible that there was a commit on the master in that slight window after Solr checks for the latest generation of the master but before it downloads the actual files? How frequent are the commits on your master?

On Thu, Feb 21, 2013 at 2:00 AM, raulgrande83 raulgrand...@hotmail.com wrote:

Thanks for the patch; we'll try to install these fixes and post whether replication works or not. I renamed the 'index.timestamp' folders to just 'index', but it didn't work. These lines appeared in the log:

    INFO: Master's generation: 64594
    21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
    INFO: Slave's generation: 64593
    21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
    INFO: Starting replication process
    21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchFileList
    SEVERE: No files to download for index generation: 64594
Re: Anyone else see this error when running unit tests?
Okay, so I think I found a solution. If you are a Maven user and don't mind forcing the test codec to Lucene40, then do the following: add this to your pom.xml under the build > pluginManagement > plugins section:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.13</version>
      <configuration>
        <argLine>-Dtests.codec=Lucene40</argLine>
      </configuration>
    </plugin>

If you are running in Eclipse, simply add this as a VM argument.

The default test codec is set to random, which means there is a possibility of picking Lucene3x if some random variable is 2 and other conditions are met. For me, my test-framework jar must not be ahead of the lucene one (because I don't control the classpath order, and honestly this shouldn't be a requirement to run a test), so it periodically bombed. This little fix seems to have helped, provided that you don't care about Lucene3x vs Lucene40 for your tests (I am on Lucene40, so it's fine for me).

HTH!
Amit

On Mon, Feb 4, 2013 at 6:18 PM, Roman Chyla roman.ch...@gmail.com wrote:

Me too; it fails randomly with test classes. We use Solr 4.0 for testing -- no maven, only ant.

--roman

On 4 Feb 2013 20:48, Mike Schultz mike.schu...@gmail.com wrote:

Yes. Just today actually. I had some unit tests based on AbstractSolrTestCase which worked in 4.0, but in 4.1 they would fail intermittently with that error message. The key to this behavior is found by looking at the code in the Lucene class TestRuleSetupAndRestoreClassEnv. I don't understand it completely, but there are a number of random code paths through there. The following helped me get around the problem, at least in the short term:

    @org.apache.lucene.util.LuceneTestCase.SuppressCodecs({"Lucene3x","Lucene40"})
    public class CoreLevelTest extends AbstractSolrTestCase {

I also need to call this inside my setUp() method; in 4.0 this wasn't required:

    initCore("solrconfig.xml", "schema.xml", "/tmp/my-solr-home");
Re: replication problems with solr4.1
I may be missing something, but let me go back to your original statements: 1) you build the index once per week from scratch; 2) you replicate this from master to slave.

My understanding of the way replication works is that it's meant to only send along files that are new, and if any files named the same between the master and slave have different sizes, then this is a corruption of sorts: Solr creates an index.timestamp directory and sends the full thing down. This, I think, explains your index.timestamp issue, although why the old index/ directory isn't being deleted I'm not sure about. This is why I was asking about OS details, file system details, etc. (perhaps something else is locking that directory, preventing Java from deleting it?).

The second issue is the index generation, which is governed by commits and is represented by the last few characters of the segments_XX file name. When the slave downloads the index and copies the new files, it does a commit to force a new searcher; hence the slave's generation will be +1 from the master's. The index version is a timestamp, and it may be the case that the version represents the point in time when the index was downloaded to the slave. In general, these details shouldn't matter, because replication is only triggered if the master's version > the slave's version, and the clocks that all servers use are synched to some common clock.

Caveat to my answer: I have yet to try 4.1, as this is next on my TODO list, so maybe I'll run into the same problem :-) but I wanted to provide some info, as I just recently dug through the replication code to understand it better myself.

Cheers
Amit

On Wed, Feb 13, 2013 at 11:57 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

OK, then index generation and index version are out of count when it comes to verifying that master and slave index are in sync. What else is possible?

The strange thing is: if the master is 2 or more generations ahead of the slave, then it works!

With your logic, the slave must _always_ be one generation ahead of the master, because the slave replicates from the master and then does an additional commit to recognize the changes on the slave. This implies that the slave acts as follows:
- if the master is one generation ahead, then do an additional commit
- if the master is 2 or more generations ahead, then do _no_ commit
OR
- if the master is 2 or more generations ahead, then do a commit but don't change the generation and version of the index

Can this be true? I would say not really.

Regards
Bernd

Am 13.02.2013 20:38, schrieb Amit Nithian:

Okay, so then that should explain the generation difference of 1 between the master and slave.

On Wed, Feb 13, 2013 at 10:26 AM, Mark Miller markrmil...@gmail.com wrote:

On Feb 13, 2013, at 1:17 PM, Amit Nithian anith...@gmail.com wrote:

: doesn't it do a commit to force solr to recognize the changes?

yes.

- Mark
Re: Boost Specific Phrase
Have you looked at the pf parameter for dismax handlers? pf does, I think, what you are looking for, which is to boost documents where the query terms match exactly as a phrase in the various fields, with some phrase slop. On Wed, Feb 13, 2013 at 2:59 AM, Hemant Verma hemantverm...@gmail.com wrote: Hi All I have a use case with phrase search. Let's say I have a list of phrases in a file/dictionary which are important as per our search content. One entry in the dictionary is, let's say, 'project manager'. If a user's query contains any entry specified in the dictionary then I want to boost the score of documents which have an exact match of that entry. Let's take one example: now suppose a user searches for (project manager in India with 2 yrs experience). The words 'project manager' are in the query in the exact order specified in the dictionary, so I want to boost the score of documents having 'project manager' as an exact match. This can be done at the web application level, after processing the user query against the dictionary, by creating a query like: q=project manager in India with 2 yrs experience&qf=title&bq=title:"project manager"^5 I want to know if there is any better solution available for this use case at the Solr level. AFAIK there is something very similar available in FAST ESP known as Phrase Recognition. Thanks Hemant -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-Specific-Phrase-tp4040188.html Sent from the Solr - User mailing list archive at Nabble.com.
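For concreteness, a minimal SolrJ sketch of the pf suggestion; the field name comes from the example above, and the boost and slop values are arbitrary:

  import org.apache.solr.client.solrj.SolrQuery;

  // Boost docs where the query words also match title as a (sloppy) phrase.
  SolrQuery query = new SolrQuery("project manager in India with 2 yrs experience");
  query.set("defType", "edismax");
  query.set("qf", "title");
  query.set("pf", "title^5"); // phrase-field boost
  query.set("ps", "1");       // phrase slop: allow one position of movement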
Re: what do you use for testing relevance?
Ultimately this is dependent on what your metrics for success are. For some places it may be just raw CTR (did my click-through rate increase), but for other places it may be a function of money (gross revenue, profits, # items sold etc). I don't know if there is a generic answer to this question, which is what leads people to write their own frameworks, because it's very specific to your needs. A scoring change that leads to an increase in CTR may not necessarily lead to an increase in the metric that makes your business go. On Tue, Feb 12, 2013 at 10:31 PM, Steffen Elberg Godskesen steffen.godske...@gmail.com wrote: Hi Roman, If you're looking for regression testing then https://github.com/sul-dlss/rspec-solr might be worth looking at. If you're not a ruby shop, doing something similar in another language shouldn't be too hard. The basic idea is that you set up a set of tests like If the query is X, then the document with id Y should be in the first 10 results If the query is S, then a document with title T should be the first result If the query is P, then a document with author Q should not be in the first 10 results and that you run these whenever you tune your scoring formula to ensure that you haven't introduced unintended effects. New ideas/requirements for your relevance ranking should always result in writing new tests - tests that will probably fail until you tune your scoring formula. This is certainly no magic bullet, but it will give you some confidence that you didn't make things worse. And - in my humble opinion - it also gives you the benefit of discouraging you from tuning your scoring just for fun. To put it bluntly: if you cannot write up a requirement in the form of a test, you probably have no need to tune your scoring. Regards, -- Steffen On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote: Hi, I do realize this is a very broad question, but still I need to ask it. Suppose you make a change to the scoring formula. How do you test/know/see what impact it had? Any framework out there? It seems like people are writing their own tools to measure relevancy. Thanks for any pointers, roman
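A sketch of what one such regression test could look like in Java with SolrJ and JUnit; the URL, query string, and document id are placeholders for whatever your requirement says:

  import static org.junit.Assert.assertTrue;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.junit.Test;

  public class RelevanceRegressionTest {
    private final HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

    // "If the query is X, then the document with id Y should be in the first 10 results."
    @Test
    public void queryXHasDocYInTopTen() throws Exception {
      boolean found = false;
      for (SolrDocument doc : solr.query(new SolrQuery("X").setRows(10)).getResults()) {
        if ("Y".equals(doc.getFieldValue("id"))) {
          found = true;
        }
      }
      assertTrue("doc Y should be in the first 10 results for query X", found);
    }
  }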
Re: replication problems with solr4.1
So just a hunch... but when the slave downloads the data from the master, doesn't it do a commit to force Solr to recognize the changes? In so doing, wouldn't that increase the generation number? In theory it shouldn't matter, because the replication looks for files that are different to determine whether to do a full download or a partial replication. In the event of a full replication (an optimize would cause this), I think the replication handler considers this a corruption and forces a full download into the index.<timestamp> folder, with index.properties pointing at this folder to tell Solr this is the new index directory. Since you mentioned you rebuild the index from scratch once per week, I'd expect to see the behavior you are mentioning. I remember debugging the code to find out how replication works in 4.0 because of a bug that was fixed in 4.1, but I haven't read through the 4.1 code to see how much (if any) has changed from this logic. In short, I don't know why you'd have the old index/ directory there.. that seems like either a bug or something was locking that directory in the filesystem, preventing it from being removed. What OS are you using, and is the index/ directory stored on a local file system vs NFS? HTH Amit On Tue, Feb 12, 2013 at 2:26 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Now this is strange, the index generation and index version are changing with replication. e.g. master has index generation 118, index version 136059533234 and slave has index generation 118, index version 136059533234 - both are the same. Now add one doc to master with commit. Master has index generation 119, index version 1360595446556. Next, replicate master to slave. The result is: master has index generation 119, index version 1360595446556; slave has index generation 120, index version 1360595564333. I have not seen this before. I thought replication just takes over the index from master to slave, more like a sync? Am 11.02.2013 09:29, schrieb Bernd Fehling: Hi list, after upgrading from solr4.0 to solr4.1 and running it for two weeks now, it turns out that replication has problems and unpredictable results. My installation is a single index, 41 million docs / 115 GB index size / 1 master / 3 slaves. - the master builds a new index from scratch once a week - a replication is started manually with the Solr admin GUI. What I see is one of these cases: - after a replication a new searcher is opened on an index.<timestamp> directory, the old data/index/ directory is never deleted, and besides the file replication.properties there is also a file index.properties OR - the replication takes place and everything looks fine, but when opening the admin GUI the statistics report: Last Modified: a day ago, Num Docs: 42262349, Max Doc: 42262349, Deleted Docs: 0, Version: 45174, Segment Count: 1; Version / Gen / Size: Master 1360483635404 / 112 / 116.5 GB, Slave 1360483806741 / 113 / 116.5 GB. In the first case, why is the replication doing that??? It is an offline slave, no search activity, just there for backup! In the second case, why are the version and generation different right after a full replication? Any thoughts on this? - Bernd -- * Bernd Fehling, Bielefeld University Library; Dipl.-Inform. (FH), LibTec - Library Technology and Knowledge Management; Universitätsstr. 25, 33615 Bielefeld; Tel. +49 521 106-4060; bernd.fehling(at)uni-bielefeld.de; BASE - Bielefeld Academic Search Engine - www.base-search.net *
Re: replication problems with solr4.1
Okay so then that should explain the generation difference of 1 between the master and slave On Wed, Feb 13, 2013 at 10:26 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 13, 2013, at 1:17 PM, Amit Nithian anith...@gmail.com wrote: doesn't it do a commit to force solr to recognize the changes? yes. - Mark
Re: Boost Specific Phrase
Ah yes, sorry, misunderstood. Another option is to use word n-grams (shingles) so that 'projectmanager' becomes a single term; any query involving 'project manager in india with 2 years experience' would then match higher, because the analyzed query would contain 'projectmanager' as a term. On Wed, Feb 13, 2013 at 9:56 PM, Hemant Verma hemantverm...@gmail.com wrote: Thanks for the response. The pf parameter actually boosts the documents considering all search keywords mentioned in the main query, but I am looking for something which boosts the documents considering a few search keywords from the user query. As per the example, the user query is (project manager in India with 2 yrs experience) and my dictionary contains one entry, 'project manager', which specifies that if the user has 'project manager' in his query then we boost those documents which contain 'project manager' as an exact match. -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-Specific-Phrase-tp4040188p4040371.html Sent from the Solr - User mailing list archive at Nabble.com.
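To make the word n-gram idea concrete: shingles with an empty token separator glue adjacent word pairs into single terms. A small sketch against the Lucene 4.x analysis API (the version constant and input text are just examples):

  import java.io.StringReader;

  import org.apache.lucene.analysis.core.WhitespaceTokenizer;
  import org.apache.lucene.analysis.shingle.ShingleFilter;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
  import org.apache.lucene.util.Version;

  // Emits "project", "projectmanager", "manager", "managerin", ... for the input.
  public class ShingleDemo {
    public static void main(String[] args) throws Exception {
      WhitespaceTokenizer words =
          new WhitespaceTokenizer(Version.LUCENE_40, new StringReader("project manager in india"));
      ShingleFilter shingles = new ShingleFilter(words, 2, 2);
      shingles.setTokenSeparator(""); // glue the words together: "projectmanager"
      CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);
      shingles.reset();
      while (shingles.incrementToken()) {
        System.out.println(term.toString());
      }
      shingles.end();
      shingles.close();
    }
  }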
Re: Solr HTTP Replication Question
Okay, one last note... just for closure... it looks like it was addressed in Solr 4.1+ (I was looking at 4.0). On Thu, Jan 24, 2013 at 11:14 PM, Amit Nithian anith...@gmail.com wrote: Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I think a full copy should also be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't seem to see any reason why the generation matters, other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened so I'll go into details on that if anyone cares. Thanks! Amit
Re: Solr HTTP Replication Question
Okay so after some debugging I found the problem. The replication piece will download the index from the master server and move the files to the index directory, but during the commit phase these older generation files are deleted and the index is essentially left intact. I noticed that a full copy is needed if the index is stale (meaning that files in common between the master and slave have different sizes), but I think a full copy should also be needed if the slave's generation is higher than the master's. In short, to me it's not sufficient to simply say a full copy is needed if the slave's index version is >= the master's index version. I'll create a patch and file a bug along with a more thorough writeup of how I got into this state. Thanks! Amit On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian anith...@gmail.com wrote: Does Solr's replication look at the generation difference between master and slave when determining whether or not to replicate? To be more clear: what happens if a slave's generation is higher than the master's, yet the slave's index version is less than the master's index version? I looked at the source and didn't seem to see any reason why the generation matters, other than fetching the file list from the master for a given generation. It's too wordy to explain how this happened so I'll go into details on that if anyone cares. Thanks! Amit
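In rough pseudo-Java, the check being proposed amounts to the following; the variable and method names are mine, not the actual replication code:

  // Full copy if the slave's version has run ahead of the master's (the existing
  // check) OR if its generation has (the case described above).
  static boolean isFullCopyNeeded(long slaveVersion, long slaveGeneration,
                                  long masterVersion, long masterGeneration) {
    return slaveVersion >= masterVersion || slaveGeneration > masterGeneration;
  }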
Re: group.ngroups behavior in response
A new response attribute would be better, but it also complicates the patch in that it would require a new way to serialize DocSlices, I think (especially when group.main=true). I was looking to set group.main=true so that my existing clients don't have to change to parse the grouped result-set format. Secondly, while a new response attribute makes sense, the question is whether numFound should be the number of groups or the total number of matches. To me it should be the number of groups, because logically that is what the result set shows, and the new attribute should point to the total. Thanks Amit
group.ngroups behavior in response
Hi all, I recently discovered the group.main=true/false parameter, which really has made life simple in terms of ensuring that the format coming out of Solr for my clients (a RoR app) is backwards compatible with non-grouped results, so no special "handle grouped results" logic is needed. The only issue is that numFound is the total number of matches instead of the number of groups, which can seem odd (and incorrect if you rely on numFound to determine whether or not to display a "next page" link). I created a JIRA issue, SOLR-4310, and submitted a patch for this, and wanted to get feedback to see if this is an issue others have encountered and, if so, whether this would help. Thanks Amit
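For anyone reproducing the trade-off, the relevant parameters via SolrJ look roughly like this (the grouping field is a placeholder); with group.main=true the response keeps the classic flat format, which is exactly where the ambiguous numFound comes from:

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery query = new SolrQuery("*:*");
  query.set("group", "true");
  query.set("group.field", "someField"); // placeholder field
  query.set("group.ngroups", "true");    // also compute the number of groups
  query.set("group.main", "true");       // flatten into the non-grouped format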
Re: Grouping by a date field
Why not create a new field that just contains the day component? Then you can group by this field. On Thu, Nov 29, 2012 at 12:38 PM, sdanzig sdan...@gmail.com wrote: I'm trying to create a SOLR query that groups/field collapses by date. I have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, datetime, and I'm looking to group by just per day. When grouping on this field using group.field=datetime in the query, SOLR responds with a group for every second. I'm able to easily use this field to create day-based facets, but not groups. Advice please? - Scott -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grouping by a date field
What's the performance impact of doing this? On Thu, Nov 29, 2012 at 7:54 PM, Jack Krupansky j...@basetechnology.com wrote: Or group by a function query which is the date field converted to milliseconds divided by the number of milliseconds in a day. Such as: q=*:*&group=true&group.func=rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000))))) -- Jack Krupansky -----Original Message----- From: Amit Nithian Sent: Thursday, November 29, 2012 10:29 PM To: solr-user@lucene.apache.org Subject: Re: Grouping by a date field Why not create a new field that just contains the day component? Then you can group by this field. On Thu, Nov 29, 2012 at 12:38 PM, sdanzig sdan...@gmail.com wrote: I'm trying to create a SOLR query that groups/field collapses by date. I have a field in yyyy-MM-dd'T'HH:mm:ss'Z' format, datetime, and I'm looking to group by just per day. When grouping on this field using group.field=datetime in the query, SOLR responds with a group for every second. I'm able to easily use this field to create day-based facets, but not groups. Advice please? - Scott -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-by-a-date-field-tp4023318.html Sent from the Solr - User mailing list archive at Nabble.com.
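As a sketch, Jack's function grouping via SolrJ (field name as in the thread). On the performance question: the function is evaluated per document at query time, whereas a dedicated day field pays that cost once at index time, which is the trade-off between the two suggestions:

  import org.apache.solr.client.solrj.SolrQuery;

  // Group by day: round epoch milliseconds down to a whole number of days.
  SolrQuery query = new SolrQuery("*:*");
  query.set("group", "true");
  query.set("group.func", "rint(div(ms(date_dt),mul(24,mul(60,mul(60,1000)))))");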
Re: is there a way to prevent abusing rows parameter
If you're going to validate the rows parameter, you may as well validate the start parameter too.. I've run into problems where start and rows with ridiculously high values crashed our servers. On Thu, Nov 22, 2012 at 9:58 AM, solr-user solr-u...@hotmail.com wrote: Thanks guys. This is a problem with the front end not validating requests. I was hoping there might be a simple config value I could enter/change, rather than going through the long process of migrating a proper fix all the way up to our production servers. Looks like not, but thx. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467p4021892.html Sent from the Solr - User mailing list archive at Nabble.com.
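A minimal sketch of that validation as a component placed first in a handler's component chain; the caps are made-up numbers and the class is an illustration, not anything shipped with Solr:

  import java.io.IOException;

  import org.apache.solr.common.params.CommonParams;
  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  // Clamps start/rows before the query component ever sees them.
  public class ParamClampComponent extends SearchComponent {
    private static final int MAX_ROWS = 100;    // made-up cap
    private static final int MAX_START = 10000; // made-up cap

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
      params.set(CommonParams.ROWS, Math.min(params.getInt(CommonParams.ROWS, 10), MAX_ROWS));
      params.set(CommonParams.START, Math.min(params.getInt(CommonParams.START, 0), MAX_START));
      rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // nothing to do at process time
    }

    @Override
    public String getDescription() { return "clamps start/rows to sane maximums"; }

    @Override
    public String getSource() { return null; }
  }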
Re: Search among multiple cores
You can simplify your code by searching across cores in the SearchComponent: 1) public class YourComponent implements SolrCoreAware -- grab an instance of CoreContainer and store it (mCoreContainer = core.getCoreDescriptor().getCoreContainer();) 2) In the process method, grab the core requested (SolrCore core = mCoreContainer.getCore(sCoreName);) This way you can avoid having to implement the listener you mentioned and passing this in via the servlet config. On Mon, Nov 26, 2012 at 7:28 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Would http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer save you some work? Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html Search Analytics - http://sematext.com/search-analytics/index.html On Mon, Nov 26, 2012 at 7:18 PM, Nicholas Ding nicholas...@gmail.com wrote: Hi, I'm working on a search engine project based on Solr. Now I have three cores (Core A, B, C). I need to search Core A and Core B to get required parameters to search Core C. So far, I wrote a SearchComponent which uses SolrJ inside, because I can't access other cores directly in a SearchComponent. I was a bit worried about performance and scalability because SolrJ brings a little HTTP overhead. After digging into Solr's source code, I wrote a SolrContextListener to initialize the CoreContainer at server startup and put it into a ServletContext. Then I wrote another Servlet to get a reference from the ServletContext, and now I'm able to get all the Core references in Java. The good part is I can access all of Solr's internal structure in Java, but the bad part is I have to deal with internal types, which requires deep understanding of Solr's source code. I was wondering if anybody had done similar things before? What are the side effects of extending Solr at the code level? Thanks Nicholas
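A sketch of steps 1 and 2 put together; the component and core names are placeholders, and the important detail is that getCore() bumps a reference count that close() must release:

  import java.io.IOException;

  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.core.SolrCore;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.util.plugin.SolrCoreAware;

  public class CrossCoreComponent extends SearchComponent implements SolrCoreAware {
    private CoreContainer coreContainer;

    @Override
    public void inform(SolrCore core) {
      // Called once this component's own core is loaded; stash the container.
      coreContainer = core.getCoreDescriptor().getCoreContainer();
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      SolrCore other = coreContainer.getCore("coreB"); // increments the core's ref count
      try {
        // query 'other' here, e.g. via its searcher
      } finally {
        if (other != null) {
          other.close(); // decrements the ref count; skipping this leaks the core
        }
      }
    }

    @Override
    public String getDescription() { return "cross-core lookup example"; }

    @Override
    public String getSource() { return null; }
  }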
Re: custom request handler
Hi Lee, So the query component would be a subclass of SearchComponent, and you can define the list of components executed during a search handler. http://wiki.apache.org/solr/SearchComponent I *think* you can have a custom component do what you want as long as it's the first component in the list, so you can inspect and re-set the parameters before they go downstream to the other components. However, it's still not clear how you are going to prevent users from POSTing bad queries or looking at things they probably shouldn't, like the schema.xml, solrconfig.xml, or the admin console. Maybe there are ways in Solr to prevent this, but then you'd have to allow it for internal admins while excluding it for the public. If you are exposing your slaves to the actual world wide public then I'd strongly suggest an app layer between Solr and the public. I treat Solr like my database, meaning that I don't expose access to my database publicly but rather through some app layer (say some CMS tools or what not). HTH! Amit On Sun, Nov 11, 2012 at 5:23 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Only slaves are public facing and they are read only, with limited query request handlers defined. The above approach is to prevent abusive / inappropriate queries by clients. A query component sounds interesting; would this be implemented through an interface, so it could be separate from solr, or would it be subclassing a base component? cheers lee c On 9 November 2012 17:24, Amit Nithian anith...@gmail.com wrote: Lee, I guess my question was: if you are trying to prevent the big bad world from doing stuff they aren't supposed to in Solr, how are you going to prevent the big bad world from POSTing a delete-all query? Or restrict them from hitting the admin console, or looking at the schema.xml or solrconfig.xml? I guess the question here is who is the big bad world? The internet at large, or employees/colleagues in your organization? If it's the internet at large then I'd totally decouple this from Solr, because I want to be 100% sure that the *only* thing that the internet has access to is a GET on /select with some restrictions, and this could be done in many places, but it's not clear that coupling this to Solr is the place to do it. If the big bad world is just within your organization and you want some basic protections around what they can and can't see, then what you are saying is reasonable to me. Also, perhaps another option is to consider a query component rather than creating a subclass of the request handler, as a query component promotes more re-use and flexibility. You could make the necessary parameter changes in the prepare() method; just make sure that this safe-parameter component comes before the query component in the list of components for a handler and you should be fine. Cheers! Amit On Fri, Nov 9, 2012 at 5:39 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi Amit I did not do this via a servlet filter as I wanted the solr devs to be concerned with solr config and keep them out of any concerns of the container. By specifying declarative data in a request handler, that would be enough to produce a service uri for an application. Or have I missed a point? We have several cores with several apps, all with different data query needs. Maybe 20 request handlers are needed to support this, with active development ongoing. Basically I want it to be easy for devs to create a specific request handler suited to their needs. I thought a servlet filter developed and maintained every time would be overkill. 
Again though I may have missed a point / over-emphasised a difficulty? Are you saying my custom request handler is too tightly bound to solr, so the parameters my apps talk are not de-coupled enough from solr? Lee C On 7 November 2012 19:49, Amit Nithian anith...@gmail.com wrote: Why not do this in a ServletFilter? Alternatively, I'd just write a front end application servlet to do this so that you don't firewall your internal admins off from accessing the core Solr admin pages. I guess you could solve this using some form of security, but I don't know this well enough. If I were to restrict access to certain parts of Solr, I'd do this outside of Solr itself, in a servlet or a filter, inspecting the parameters. It's easy to create a modifiable parameters class and populate that with acceptable parameters before the Solr filter operates on it. HTH Amit
Re: Preventing accepting queries while custom QueryComponent starts up?
Jack, I think the issue is that the ping, which is used to determine whether or not the server is live, returns a seemingly false positive back to the load balancer (and indirectly the client) that this server is ready to go when in fact it's not. Reading this page (http://wiki.apache.org/solr/SolrConfigXml), it does seem to be documented to behave this way, but the need to hide your Solr behind a load balancer may not be fully stressed. I am more than happy to write up a post that, in my opinion at least, stresses some best practices on the use of Solr based on my experience, if others find this useful. What seems odd here is that the ping is a query, so maybe the ping query in the solrconfig (for Aaron and others having this) should be configured to hit the handler that is used by the front end app, so that while that handler is warming up the ping query will be blocked. Of course, using the load balancer means that the app layer knows nothing about servers in and out of rotation. Cheers! Amit On Sun, Nov 11, 2012 at 8:05 AM, Jack Krupansky j...@basetechnology.com wrote: Is the issue here that the Solr node is continuously live with the load balancer, so that the moment during startup that Solr can respond to anything, the load balancer will be sending it traffic, and that this can occur while Solr is still warming up? First, shouldn't we be encouraging people to have an app layer between Solr and the outside world? If so, the app layer should simply not respond to traffic until the app layer has verified that Solr has stabilized. If not, then maybe we do need to suggest a change to Solr so that the developer can control exactly when Solr becomes live and responsive to incoming traffic. At a minimum, we should document when that moment is today in terms of an explicit contract. It sounds like the problem is that the contract is either nonexistent, vague, ambiguous, non-deterministic, or whatever. -- Jack Krupansky -----Original Message----- From: Amit Nithian Sent: Saturday, November 10, 2012 4:24 PM To: solr-user@lucene.apache.org Subject: Re: Preventing accepting queries while custom QueryComponent starts up? Yeah, that's what I was suggesting in my response too. I don't think your load balancer should be doing this, but whatever script does the release (restarting the container) should, so that when the ping is enabled the warming has finished. On Sat, Nov 10, 2012 at 3:33 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, rather than hit the ping query, why not just send in a real query and only let the queued ones through after the response? Just a random thought Erick On Sat, Nov 10, 2012 at 2:53 PM, Amit Nithian anith...@gmail.com wrote: Yes, but the problem is that if user-facing queries are hitting a server that is warming up and aren't being serviced quickly, then you could potentially bring down your site if all the front end threads are blocked on Solr queries, because those queries are waiting (presumably at the container level, since the filter hasn't finished its init() sequence) for the warming to complete (this is especially notorious when your front end is Rails). This is why your ping to enable/disable a server from the load balancer has to be accurate with regards to whether or not a server is truly ready and warm. 
I think what I am gathering from this discussion is: the server is warming up, the ping goes through and tells the load balancer this server is ready, and user queries hit this server and are queued waiting for the firstSearcher to finish (say these initial user queries now take 500-1000ms to respond), and that's terrible for performance. Alternatively, if you have a bunch of servers behind a load balancer, you want this one server (or block of servers, depending on your deployment) to be reasonably sure that user queries will return in a decent time (whatever you define decent to be), hence why this matters. Let me know if I am missing anything. Thanks Amit On Sat, Nov 10, 2012 at 10:03 AM, Erick Erickson erickerick...@gmail.com wrote: Why does it matter? The whole idea of firstSearcher queries is to warm up your system as fast as possible. The theory is that upon restarting the server, let's get this stuff going immediately... They were never intended (as far as I know) to complete before any queries were handled. As an aside, I'm not quite sure I understand why pings during the warmup are a problem. But anyway. firstSearcher is particularly relevant because the autowarmCount settings on your caches are irrelevant when starting the server, there's no history to autowarm. But there's no good reason _not_ to let queries through while firstSearcher is doing its tricks, they just get into the queue and are served as quickly as they may. That might be some time since, as you say, they may not get serviced
Re: 4.0 query question
Why not group by cid using the grouping component, sort within the group by version descending, and return 1 result per group? http://wiki.apache.org/solr/FieldCollapsing Cheers Amit On Fri, Nov 9, 2012 at 2:56 PM, dm_tim dm_...@yahoo.com wrote: I think I may have found my answer but I'd like additional validation: I believe that I can add a function to my query to get only the highest values of 'file_version' like this - _val_:max(file_version, 1) I seem to be getting the results I want. Does this look correct? Regards, Tim -- View this message in context: http://lucene.472066.n3.nabble.com/4-0-query-question-tp4019397p4019426.html Sent from the Solr - User mailing list archive at Nabble.com.
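In SolrJ terms, a sketch of that suggestion; the field names are taken from Tim's question:

  import org.apache.solr.client.solrj.SolrQuery;

  // One result per cid, keeping the row with the highest file_version.
  SolrQuery query = new SolrQuery("*:*");
  query.set("group", "true");
  query.set("group.field", "cid");
  query.set("group.sort", "file_version desc");
  query.set("group.limit", "1");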
Re: Preventing accepting queries while custom QueryComponent starts up?
Yes, but the problem is that if user-facing queries are hitting a server that is warming up and aren't being serviced quickly, then you could potentially bring down your site if all the front end threads are blocked on Solr queries, because those queries are waiting (presumably at the container level, since the filter hasn't finished its init() sequence) for the warming to complete (this is especially notorious when your front end is Rails). This is why your ping to enable/disable a server from the load balancer has to be accurate with regards to whether or not a server is truly ready and warm. I think what I am gathering from this discussion is: the server is warming up, the ping goes through and tells the load balancer this server is ready, and user queries hit this server and are queued waiting for the firstSearcher to finish (say these initial user queries now take 500-1000ms to respond), and that's terrible for performance. Alternatively, if you have a bunch of servers behind a load balancer, you want this one server (or block of servers, depending on your deployment) to be reasonably sure that user queries will return in a decent time (whatever you define decent to be), hence why this matters. Let me know if I am missing anything. Thanks Amit On Sat, Nov 10, 2012 at 10:03 AM, Erick Erickson erickerick...@gmail.com wrote: Why does it matter? The whole idea of firstSearcher queries is to warm up your system as fast as possible. The theory is that upon restarting the server, let's get this stuff going immediately... They were never intended (as far as I know) to complete before any queries were handled. As an aside, I'm not quite sure I understand why pings during the warmup are a problem. But anyway. firstSearcher is particularly relevant because the autowarmCount settings on your caches are irrelevant when starting the server, there's no history to autowarm. But there's no good reason _not_ to let queries through while firstSearcher is doing its tricks, they just get into the queue and are served as quickly as they may. That might be some time since, as you say, they may not get serviced until the expensive parts get filled. But I don't think having them be serviced is doing any harm. Now, newSearcher and autowarming of the caches is a completely different beast, since having the old searchers continue serving requests until the warmups complete _does_ directly impact the user; they don't see random slowness because a searcher is being opened. So I guess my real question is whether you're seeing a measurable problem or if this is a red herring. FWIW, Erick On Thu, Nov 8, 2012 at 2:54 PM, Aaron Daubman daub...@gmail.com wrote: Greetings, I have several custom QueryComponents that have high one-time startup costs (hashing things in the index, caching things from an RDBMS, etc...) Is there a way to prevent Solr from accepting connections before all QueryComponents are ready? Especially since many of our instances are load-balanced (and added-in/removed automatically based on admin/ping responses), preventing ping from answering prior to all custom QueryComponents being ready would be ideal... Thanks, Aaron
Re: My latest solr blog post on Solr's PostFiltering
Oh weird. I'll post URLs on their own lines next time to clarify. Thanks guys and looking forward to any feedback! Cheers Amit On Fri, Nov 9, 2012 at 2:05 AM, Dmitry Kan dmitry@gmail.com wrote: I guess the url should have been: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html i.e. without 'and' in the end of it. -- Dmitry On Fri, Nov 9, 2012 at 12:03 PM, Erick Erickson erickerick...@gmail.com wrote: It's always good when someone writes up their experiences! But when I try to follow that link, I get to your Random Writings, but it tells me that the blog post doesn't exist... Erick On Thu, Nov 8, 2012 at 4:21 PM, Amit Nithian anith...@gmail.com wrote: Hey all, I wanted to thank those who have helped in answering some of my esoteric questions and especially the one about using Solr's post filtering feature to implement some score statistics gathering we had to do at Zvents. To show this appreciation and to help advance the knowledge of this space in a more codified fashion, I have written a blog post about this work and open sourced the work as well. Please take a read by visiting http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.htmland please let me know if there are any inaccuracies or points of contention so I can address/correct them. Thanks! Amit -- Regards, Dmitry Kan
Re: custom request handler
Lee, I guess my question was: if you are trying to prevent the big bad world from doing stuff they aren't supposed to in Solr, how are you going to prevent the big bad world from POSTing a delete-all query? Or restrict them from hitting the admin console, or looking at the schema.xml or solrconfig.xml? I guess the question here is who is the big bad world? The internet at large, or employees/colleagues in your organization? If it's the internet at large then I'd totally decouple this from Solr, because I want to be 100% sure that the *only* thing that the internet has access to is a GET on /select with some restrictions, and this could be done in many places, but it's not clear that coupling this to Solr is the place to do it. If the big bad world is just within your organization and you want some basic protections around what they can and can't see, then what you are saying is reasonable to me. Also, perhaps another option is to consider a query component rather than creating a subclass of the request handler, as a query component promotes more re-use and flexibility. You could make the necessary parameter changes in the prepare() method; just make sure that this safe-parameter component comes before the query component in the list of components for a handler and you should be fine. Cheers! Amit On Fri, Nov 9, 2012 at 5:39 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi Amit I did not do this via a servlet filter as I wanted the solr devs to be concerned with solr config and keep them out of any concerns of the container. By specifying declarative data in a request handler, that would be enough to produce a service uri for an application. Or have I missed a point? We have several cores with several apps, all with different data query needs. Maybe 20 request handlers are needed to support this, with active development ongoing. Basically I want it to be easy for devs to create a specific request handler suited to their needs. I thought a servlet filter developed and maintained every time would be overkill. Again though I may have missed a point / over-emphasised a difficulty? Are you saying my custom request handler is too tightly bound to solr, so the parameters my apps talk are not de-coupled enough from solr? Lee C On 7 November 2012 19:49, Amit Nithian anith...@gmail.com wrote: Why not do this in a ServletFilter? Alternatively, I'd just write a front end application servlet to do this so that you don't firewall your internal admins off from accessing the core Solr admin pages. I guess you could solve this using some form of security, but I don't know this well enough. If I were to restrict access to certain parts of Solr, I'd do this outside of Solr itself, in a servlet or a filter, inspecting the parameters. It's easy to create a modifiable parameters class and populate that with acceptable parameters before the Solr filter operates on it. HTH Amit
Re: is it possible to save the search query?
Are you trying to do this in real time or offline? Wouldn't mining your access logs help? It may help to have your front end application pass in some extra parameters that are not interpreted by Solr but are there for stamping purposes for log analysis. One example could be a user id or user cookie or something, in case you have to construct sessions. On Wed, Nov 7, 2012 at 10:01 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi, The following is the example; 1st query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data^2 id&start=0&rows=11&fl=data,id Next query: http://localhost:8983/solr/db/select/?defType=dismax&debugQuery=on&q=cashier2&qf=data id^2&start=0&rows=11&fl=data,id In the 1st query the field 'data' is boosted by 2. However, maybe the user was not satisfied with the response. Thus in the next query he boosted the field 'id' by 2. I want to record both queries and compare the two, meaning: what changes were made in the 2nd query that are not present in the previous one. Thanks and regards, Romita Saha From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org, Date: 11/08/2012 01:35 PM Subject: Re: is it possible to save the search query? Hi, Compare in what sense? An example will help. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 7, 2012 8:45 PM, Romita Saha romita.s...@sg.panasonic.com wrote: Hi All, Is it possible to record a search query in solr and then compare it with the previous search query? Thanks and regards, Romita Saha
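As a sketch of the extra-parameter idea: SolrJ will happily send along a parameter Solr itself ignores, and it then shows up in the request log line for later session analysis (the parameter name here is made up):

  import org.apache.solr.client.solrj.SolrQuery;

  SolrQuery query = new SolrQuery("cashier");
  query.set("sessionId", "abc123"); // not interpreted by Solr; only there for log mining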
Re: Searching for Partial Words
Look at the normal ngram tokenizer. 'Engine' with ngram size 3 would yield eng, ngi, gin, ine, so a search for 'engi' should match. You can play around with the min/max values. EdgeNGram is useful for prefix matching, but it sounds like you want intra-word matching too? (eng should match ResidentEngineer) On Tue, Nov 6, 2012 at 7:35 AM, Sohail Aboobaker sabooba...@gmail.com wrote: Thanks Jack. In the configuration below:

  <fieldType name="text_edgngrm" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.EdgeNGramTokenizerFactory" side="front" minGramSize="1" maxGramSize="1"/>
    </analyzer>
  </fieldType>

What are the possible values for side? If I understand it correctly, minGramSize=3 and side=front will include eng* but not en*. Is this correct? So the minGramSize is the number of characters allowed on the specified side. Does it allow side=both :) or something similar? Regards, Sohail
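To see the intra-word tokens for yourself, a small Lucene 4.x sketch of the plain (non-edge) n-gram tokenizer; the Reader-based constructor shown is the 4.x one:

  import java.io.StringReader;

  import org.apache.lucene.analysis.ngram.NGramTokenizer;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

  // Prints the trigrams of "engine": eng, ngi, gin, ine.
  public class NGramDemo {
    public static void main(String[] args) throws Exception {
      NGramTokenizer trigrams = new NGramTokenizer(new StringReader("engine"), 3, 3);
      CharTermAttribute term = trigrams.addAttribute(CharTermAttribute.class);
      trigrams.reset();
      while (trigrams.incrementToken()) {
        System.out.println(term.toString());
      }
      trigrams.end();
      trigrams.close();
    }
  }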
Re: Preventing accepting queries while custom QueryComponent starts up?
I think Solr does this by default. Are you executing warming queries in the firstSearcher so that these actions are done before Solr is ready to accept real queries? On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I have several custom QueryComponents that have high one-time startup costs (hashing things in the index, caching things from an RDBMS, etc...) Is there a way to prevent Solr from accepting connections before all QueryComponents are ready? Especially since many of our instances are load-balanced (and added-in/removed automatically based on admin/ping responses), preventing ping from answering prior to all custom QueryComponents being ready would be ideal... Thanks, Aaron
Re: Preventing accepting queries while custom QueryComponent starts up?
Sorry, I misunderstood. I am having difficulty finding this, but the exact load order is never clear. It seems odd that you'd be getting requests when the filter (DispatchFilter) hasn't 100% loaded yet. I didn't think that the admin handler would allow requests while the dispatch filter is still init'ing, but it sounds like it does? I'll have to play with this to see.. curious what the problem is, since we have a similar setup but not as bad of an init problem (plus when I deploy, my deploy script runs some actual simple test queries to ensure they return before enabling the ping handler to return 200s) to avoid this problem. Cheers Amit On Thu, Nov 8, 2012 at 1:33 PM, Aaron Daubman daub...@gmail.com wrote: Amit, I am using warming /firstSearcher queries to ensure this happens before any external queries are received; however, unless I am misinterpreting the logs, solr starts responding to admin/ping requests before firstSearcher completes, and the LB then puts the solr instance back in the pool, and it starts accepting connections... On Thu, Nov 8, 2012 at 4:24 PM, Amit Nithian anith...@gmail.com wrote: I think Solr does this by default. Are you executing warming queries in the firstSearcher so that these actions are done before Solr is ready to accept real queries? On Thu, Nov 8, 2012 at 11:54 AM, Aaron Daubman daub...@gmail.com wrote: Greetings, I have several custom QueryComponents that have high one-time startup costs (hashing things in the index, caching things from an RDBMS, etc...) Is there a way to prevent Solr from accepting connections before all QueryComponents are ready? Especially since many of our instances are load-balanced (and added-in/removed automatically based on admin/ping responses), preventing ping from answering prior to all custom QueryComponents being ready would be ideal... Thanks, Aaron
Re: Preventing accepting queries while custom QueryComponent starts up?
Hi Aaron, Check out http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/handler/PingRequestHandler.html You'll see the ?action=enable/disable. I have our load balancers remove the server from rotation when the response code != 200 some number of times in a row, which I suspect you are doing too. If I am doing a rolling release of our search code to production, the ping gets disabled, we sleep for some known number of seconds for the LB to yank the search server out of rotation, push the code, execute some queries using curl to ensure a response (the warming process should block the request until done), and then re-enable. HTH! Amit On Thu, Nov 8, 2012 at 2:01 PM, Aaron Daubman daub...@gmail.com wrote: (plus when I deploy, my deploy script runs some actual simple test queries to ensure they return before enabling the ping handler to return 200s) to avoid this problem. What are you doing to programmatically disable/enable the ping handler? This sounds like exactly what I should be doing as well...
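Programmatically, the toggle is just an HTTP GET against the ping handler. A sketch (the host and core path are placeholders; per the javadoc linked above, PingRequestHandler must be configured with a healthcheckFile for enable/disable to work):

  import java.net.HttpURLConnection;
  import java.net.URL;

  public class PingToggle {
    // action is "enable" or "disable"
    static void setPing(String baseUrl, String action) throws Exception {
      URL url = new URL(baseUrl + "/admin/ping?action=" + action);
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      System.out.println("ping " + action + " -> HTTP " + conn.getResponseCode());
      conn.disconnect();
    }

    public static void main(String[] args) throws Exception {
      setPing("http://localhost:8983/solr", "disable"); // take out of rotation
      // ... sleep past the LB failure window, deploy, warm, run smoke queries ...
      setPing("http://localhost:8983/solr", "enable");  // back into rotation
    }
  }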
Re: custom request handler
Why not do this in a ServletFilter? Alternatively, I'd just write a front end application servlet to do this so that you don't firewall your internal admins off from accessing the core Solr admin pages. I guess you could solve this using some form of security, but I don't know this well enough. If I were to restrict access to certain parts of Solr, I'd do this outside of Solr itself, in a servlet or a filter, inspecting the parameters. It's easy to create a modifiable parameters class and populate that with acceptable parameters before the Solr filter operates on it. HTH Amit On Tue, Nov 6, 2012 at 6:46 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi, we are extending SearchHandler to provide a custom search request handler. Basically we've added NamedLists called allowed, whiteList, maxMinList etc. These look like the default, append, and invariant namedLists in the standard search handler config. In handleRequestBody we then remove params not listed in the allowed named list, whitelist values as per the white list, and so on. The idea is to have a safe request handler which the big bad world could be exposed to. I'm worried. What have we missed that a front end app could give us? Also, removing params in SolrParams is a bit clunky. We are basically converting SolrParams into a NamedList, processing a new NamedList from this, and then calling .setParams(SolrParams.toSolrParams(nlNew)). Is there a better way? In particular, namedLists are not set up for key lookups... Anyway, basically: is having a custom request handler doing the above the way to go? Cheers
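The "modifiable parameters class" idea in a few lines, using Solr's own ModifiableSolrParams (the whitelist contents are placeholders); the same helper works from a servlet filter, from handleRequestBody, or from a component's prepare():

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Iterator;
  import java.util.Set;

  import org.apache.solr.common.params.ModifiableSolrParams;
  import org.apache.solr.common.params.SolrParams;

  // Copy only whitelisted parameters; everything else never reaches Solr.
  public class ParamWhitelist {
    private static final Set<String> ALLOWED =
        new HashSet<String>(Arrays.asList("q", "fq", "start", "rows", "fl", "sort"));

    public static SolrParams sanitize(SolrParams in) {
      ModifiableSolrParams out = new ModifiableSolrParams();
      Iterator<String> names = in.getParameterNamesIterator();
      while (names.hasNext()) {
        String name = names.next();
        if (ALLOWED.contains(name)) {
          out.set(name, in.getParams(name));
        }
      }
      return out;
    }
  }

From a request handler or component this would be wired in as req.setParams(ParamWhitelist.sanitize(req.getParams())), which avoids the NamedList round-trip described above.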
Re: Urgent Help Needed: Solr Data import problem
This error is typically because of a MySQL permissions problem. These are usually resolved by a GRANT statement on your DB to allow users to connect remotely to your database server. I don't know the full syntax, but a quick search on Google should yield what you are looking for. If you don't control access to this DB, talk to the sys admin who does maintain this access and s/he should be able to help resolve this. On Tue, Oct 30, 2012 at 7:13 AM, Travis Low t...@4centurion.com wrote: Like Amit said, this appears not to be a Solr problem. From the command line of your machine, try this: mysql -u'readonly' -p'readonly' -h'10.86.29.32' hpcms_db_new If that works, and 10.86.29.32 is the server referenced by the URL in your data-config.xml, then at least you know you have database connectivity, and to the right server. Also, if your unix server (presumably your mysql server) is 10.86.29.32, then the URL in your data-config.xml is pointing to the wrong machine. If the one in the data-config.xml is correct, you need to test for connectivity to that machine instead. cheers, Travis On Tue, Oct 30, 2012 at 5:15 AM, kunal sachdeva kunalsachde...@gmail.com wrote: Hi, This is my data-config file:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://172.16.37.160:3306/hpcms_db_new" user="readonly" password="readonly"/>
    <document>
      <entity name="package" query="select concat('pckg', id) as id,pkg_name,updated_time from hp_package_info;"></entity>
      <entity name="destination" query="select name,id from hp_city">
        <field column="name" name="dest_name"/>
      </entity>
      <!-- <entity name="theme" query="select id,name from hp_themes">
        <field column="name" name="theme_name"/>
      </entity> -->
    </document>
  </dataConfig>

and the password is not null. and 10.86.29.32 is my unix server ip. regards, kunal On Tue, Oct 30, 2012 at 2:42 PM, Dave Stuart d...@axistwelve.com wrote: It looks as though you have a password set on your unix server. You will need to either remove this or add the password into the connection string e.g. readonly:[yourpassword]@'10.86.29.32' 'readonly'@'10.86.29.32' (using password: NO) On 30 Oct 2012, at 09:08, kunal sachdeva wrote: Hi, I'm not getting this error while running on my local machine. Please Help Regards, Kunal On Tue, Oct 30, 2012 at 10:32 AM, Amit Nithian anith...@gmail.com wrote: This looks like a MySQL permissions problem and not a Solr problem. Caused by: java.sql.SQLException: Access denied for user 'readonly'@'10.86.29.32' (using password: NO) I'd advise reading your stack traces a bit more carefully. You should check your permissions or, if you don't own the DB, check with your DBA to find out what user you should use to access your DB. - Amit On Mon, Oct 29, 2012 at 9:38 PM, kunal sachdeva kunalsachde...@gmail.com wrote: Hi, I have tried using data-import in my local system. I was able to execute it properly. 
but when I tried to do it unix server I got following error:- INFO: Starting Full Import Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties WARNING: Unable to read: dataimport.properties Oct 30, 2012 9:40:49 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [core0] REMOVING ALL DOCUMENTS FROM INDEX Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/opt/testsolr/multicore/core0/data/index,segFN=segments_1,version=1351490646879,generation=1,filenames=[segments_1] Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1351490646879 Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity destination with URL: jdbc:mysql:// 172.16.37.160:3306/hpcms_db_new Oct 30, 2012 9:40:50 AM org.apache.solr.common.SolrException log SEVERE: Exception while processing: destination document : SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select name,id from hp_city Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426) Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select name,id from hp_city Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument
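A Java equivalent of Travis's command-line check, using the JDBC URL and credentials from the data-config above (assumes the MySQL driver jar is on the classpath):

  import java.sql.Connection;
  import java.sql.DriverManager;

  // If this throws "Access denied", the fix is a MySQL GRANT, not a Solr change.
  public class JdbcCheck {
    public static void main(String[] args) throws Exception {
      Class.forName("com.mysql.jdbc.Driver");
      Connection conn = DriverManager.getConnection(
          "jdbc:mysql://172.16.37.160:3306/hpcms_db_new", "readonly", "readonly");
      System.out.println("connected: " + !conn.isClosed());
      conn.close();
    }
  }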
Re: Any way to by pass the checking on QueryElevationComponent
Is the goal to have the elevation data read from somewhere else? In other words, why don't you want the elevate.xml to exist locally? If you want to read the data from somewhere else, could you put a dummy elevate.xml locally, subclass the QueryElevationComponent, and override loadElevationMap() to read this data from your own custom location? On Fri, Oct 26, 2012 at 6:47 PM, James Ji jiayu...@gmail.com wrote: Hi there We are currently working on having Solr files read from HDFS. We extended some of the classes so as to avoid modifying the original Solr code and to make it compatible with future releases. So here comes the question: I found that in QueryElevationComponent there is a piece of code checking whether elevate.xml exists on the local file system. I am wondering if there is a way to bypass this?

  QueryElevationComponent.inform() {
    File fC = new File(core.getResourceLoader().getConfigDir(), f);
    File fD = new File(core.getDataDir(), f);
    if (fC.exists() == fD.exists()) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "QueryElevationComponent missing config file: '" + f + "'\n" +
          "either: " + fC.getAbsolutePath() + " or " + fD.getAbsolutePath() +
          " must exist, but not both.");
    }
    if (fC.exists()) {
      exists = true;
      log.info("Loading QueryElevation from: " + fC.getAbsolutePath());
      Config cfg = new Config(core.getResourceLoader(), f);
      elevationCache.put(null, loadElevationMap(cfg));
    }
  }

-- Jiayu (James) Ji, *** Cell: (312)823-7393 Website: https://sites.google.com/site/jiayuji/ ***
Re: Urgent Help Needed: Solr Data import problem
This looks like a MySQL permissions problem and not a Solr problem. Caused by: java.sql.SQLException: Access denied for user 'readonly'@'10.86.29.32' (using password: NO) I'd advise reading your stack traces a bit more carefully. You should check your permissions or if you don't own the DB, check with your DBA to find out what user you should use to access your DB. - Amit On Mon, Oct 29, 2012 at 9:38 PM, kunal sachdeva kunalsachde...@gmail.com wrote: Hi, I have tried using data-import in my local system. I was able to execute it properly. but when I tried to do it unix server I got following error:- INFO: Starting Full Import Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties WARNING: Unable to read: dataimport.properties Oct 30, 2012 9:40:49 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [core0] REMOVING ALL DOCUMENTS FROM INDEX Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/opt/testsolr/multicore/core0/data/index,segFN=segments_1,version=1351490646879,generation=1,filenames=[segments_1] Oct 30, 2012 9:40:49 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1351490646879 Oct 30, 2012 9:40:49 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity destination with URL: jdbc:mysql:// 172.16.37.160:3306/hpcms_db_new Oct 30, 2012 9:40:50 AM org.apache.solr.common.SolrException log SEVERE: Exception while processing: destination document : SolrInputDocument[{}]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select name,id from hp_city Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426) Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select name,id from hp_city Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225) ... 
3 more Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select name,id from hp_city Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619) ... 5 more Caused by: java.sql.SQLException: Access denied for user 'readonly'@'10.86.29.32' (using password: NO) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:910) at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3923) at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1273) at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2031) at com.mysql.jdbc.ConnectionImpl.init(ConnectionImpl.java:718) at com.mysql.jdbc.JDBC4Connection.init(JDBC4Connection.java:46) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at
Re: Monitor Deleted Event
I'm not 100% sure about this, but it looks like update processors may help? http://wiki.apache.org/solr/UpdateRequestProcessor It looks like you can put in custom code to execute when certain actions happen, so this sounds like what you are looking for. Cheers Amit On Wed, Oct 24, 2012 at 8:43 AM, jefferyyuan yuanyun...@gmail.com wrote: When some docs are deleted from the Solr server, I want to execute some code - for example, add a record such as {contentid, deletedat} to another solr server or database. How can I do this through Solr or Lucene? Thanks for any reply and help :) -- View this message in context: http://lucene.472066.n3.nabble.com/Monitor-Deleted-Event-tp4015624.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Monitor Deleted Event
Since Lucene is a library, there isn't much support for this, since in theory the client application issuing the delete could also then do something else upon delete. Solr, on the other hand, being a (server) layer sitting on top of Lucene, makes more sense as the place for hooks to be configured. Since there you can intercept the delete event, you can do what you wish with it (i.e. in your case maybe send a notification event to another Solr server to add a record). On Wed, Oct 24, 2012 at 9:19 AM, jefferyyuan yuanyun...@gmail.com wrote: Thanks very much :) This is what I am looking for. And I also wonder whether there is something like a DeleteEvent in Solr or Lucene? Is there a way to do this in Lucene? - Not familiar with Lucene yet :) As I may choose to do this at a lower level... -- View this message in context: http://lucene.472066.n3.nabble.com/Monitor-Deleted-Event-tp4015624p4015641.html Sent from the Solr - User mailing list archive at Nabble.com.
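A sketch of such a hook as an update processor pair; the class names are made up, and the println stands in for whatever record you want to write (note a delete-by-query carries a query instead of an id):

  import java.io.IOException;

  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.DeleteUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class DeleteAuditProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
      return new DeleteAuditProcessor(next);
    }

    static class DeleteAuditProcessor extends UpdateRequestProcessor {
      DeleteAuditProcessor(UpdateRequestProcessor next) {
        super(next);
      }

      @Override
      public void processDelete(DeleteUpdateCommand cmd) throws IOException {
        // record {contentid, deletedat} somewhere before passing the delete along
        System.out.println("delete id=" + cmd.getId() + " at " + System.currentTimeMillis());
        super.processDelete(cmd);
      }
    }
  }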
Understanding Filter Queries
Hi all, Quick question. I've been reading up on the filter query and how it's implemented and the multiple articles I see keep referring to this notion of leap frogging and filter query execution in parallel with the main query. Question: Can someone point me to the code that does this so I can better understand? Thanks! Amit
Benchmarking/Performance Testing question
Hi all, I know there have been many posts about this already, and I have done my best to read through them, but one lingering question remains. When doing performance testing on a Solr instance (under normal production-like circumstances, not the ones where commits are happening more frequently than necessary), is there any value in performance testing against a server with caches *disabled*, with a profiler hooked up, to see where queries in the absence of a cache are spending the most time? The reason I am asking this is to tune things like field types: using tint vs regular int, different precision steps etc. Or maybe sorting is taking a long time and the profiler shows an inordinate amount of time spent there, so we find a different way to solve that particular problem. Or perhaps we are faceting on something bad. Then we can optimize those to at least not be as slow, and then ensure that caching is tuned properly so that cache misses don't yield these expensive spikes. I'm trying to devise proper performance testing for any new features/config changes and wanted to get some feedback on whether or not this approach makes sense. Of course, performance testing against a typical production setup *with* caching will also be done to make sure things behave as expected. Thanks! Amit
Re: Easy question ? docs with empty geodata field
What about querying on the dynamic lat/long field to see if there are documents that do not have the dynamic _latlon0 or whatever defined? On Fri, Oct 19, 2012 at 8:17 AM, darul daru...@gmail.com wrote: I have already tried but get a nice exception because of this field type : -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014763.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Easy question ? docs with empty geodata field
So here is my spec for lat/long (similar to yours except I explicitly define the sub-field names for clarity):

    <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
    <field name="location" type="latLon" indexed="true" stored="true"/>
    <!-- Could use dynamic fields here but prefer explicitly defining them so it's
         clear what's going on. The LatLonType looks to be a wrapper around these fields? -->
    <field name="location_0_latLon" type="tdouble" indexed="true" stored="true"/>
    <field name="location_1_latLon" type="tdouble" indexed="true" stored="true"/>

So then the query would be location_0_latLon:[* TO *]. Looking at your schema, my guess would be: location_0_coordinate:[* TO *] location_1_coordinate:[* TO *] Let me know if that helps Amit On Fri, Oct 19, 2012 at 9:37 AM, darul daru...@gmail.com wrote: Your idea looks great but with this schema info:

    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
    <fieldtype name="geohash" class="solr.GeoHashField"/>
    ...
    <field name="geodata" type="location" indexed="true" stored="true"/>
    <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

How can I use it? fq=location_coordinate:[1 to *] is not working, for instance -- View this message in context: http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014779.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: maven artifact for solr-solrj-4.0.0
I am not sure if this repository https://repository.apache.org/content/repositories/releases/ works but the modification dates seem reasonable given the timing of the release. I suspect it'll be on maven central soon (hopefully) On Wed, Oct 17, 2012 at 11:13 PM, Grzegorz Sobczyk grzegorz.sobc...@contium.pl wrote: Hello Is there maven artifact for solrj 4.0.0 release ? When it will be available to download from http://mvnrepository.com/ ?? version 4.0.0-BETA isn't compatibile with 4.0.0 (problems with zookeeper and clusterstate.json parsing) Best regards Grzegorz Sobczyk
With Grouping enabled, 0 results yields maxScore of -Infinity
I see that when there are 0 results with grouping enabled, the max score is -Infinity, which causes parsing problems on my client. Without grouping enabled the max score is 0.0. Is there any particular reason for this difference? If not, would there be any resistance to a patch that sets the score to 0 if numFound is 0 in the grouping component? I see code that sets the max score to -Infinity and then sets it to a different value when iterating over some set of scores; with 0 scores it stays -Infinity and serializes out as such. I'll be more than happy to work on this patch, but before I do I wanted to check that I am not missing something first. Thanks Amit
Re: Sum of scores for documents from a query.
Are you looking for the sum of the scores of each document in the result? In other words, if there were 1000 documents in numFound but you only show 10 (or 0, depending on the rows parameter), you want the sum of the scores of all 1000 documents in a separate section of the results? If so, I have some code and a blog post that I am going to write soon about it. Shoot me a private note and I'll zip and send it to you. I have it as a separate component. Thanks Amit On Sun, Oct 14, 2012 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote: bq: is there any way to get a sum of all the scores for a query Not that I know of. I'm not sure what value this would be anyway, what do you want to use it for? This seems like an XY problem... Best Erick On Sun, Oct 14, 2012 at 4:39 PM, Gilles Comeau gilles.com...@polecat.co wrote: Hi all, Very quick question: Score is created for each query, however is there any way to get a sum of all the scores for a query in the URL? I've tried stats and it didn't work, and also had no luck with function queries. Does anyone know a way to do this? Kind Regards, Gilles
Re: Auto Correction?
What's preventing you from using the spell checker, taking the #1 suggestion, and re-issuing the query from a subclass of the query component? It should be reasonably fast to re-execute the query from the server side since you are already within Solr. You can modify the response to indicate that the new query was used, so your client can display something like searched automatically for milky.. click here to search for mlky instead. On Tue, Oct 9, 2012 at 8:46 AM, Ahmet Arslan iori...@yahoo.com wrote: I would like to ask if there are any ways to correct user's queries automatically? I know there is a spellchecker which *suggests* possible correct words... The thing I want to do is to *automatically fix* those queries and run them instead of the original one. Not out of the box; you need to re-run suggestions at the client side. There is a commercial product though. http://sematext.com/products/dym-researcher/index.html
PostFilters, Grouping, Sorting Oh My!
Hi all, I've been working with using Solr's post filters/delegate collectors to collect some statistics about the scores of all the documents and had a few questions with regards to this when combined with grouping and sorting: 1) I noticed that if I don't include the score field as part of the sort spec with *no* grouping enabled, my custom delegate scorer gets called so I can then collect the stats I need. Same is true with score as part of the sort spec (this then leads me to focus on the grouping feature) 2) If I turn ON grouping: a) WITH score in the sort spec, my custom delegate scorer gets called b) WITHOUT score in the sort spec, my custom delegate scorer does NOT get called. What's interesting though is that there *are* scores generated so I'm not sure what all is going on. I traced through the code and saw that the scorer gets called as part of one of the comparators (RelevanceComparator) which is why with score in the sort spec it works but that is about as far as I could go. Since I am not too worried in my application about a sort spec without the score always being there it's not a huge concern; however, I do want to understand why with the grouping feature enabled, this doesn't work and whether or not it's a bug. Any help on this would be appreciated so that my solution to this problem is complete. Thanks! Amit
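For context, this is the shape of the delegating collector being discussed, sketched against the Solr 4.x PostFilter API (the class and the stats it keeps are illustrative, not the poster's actual code). The setScorer override is exactly the hook that, per the observation above, grouping only appears to invoke when score is part of the sort spec:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Scorer;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    // Streams every matching doc through collect(), accumulating score stats.
    public class ScoreStatsQuery extends ExtendedQueryBase implements PostFilter {

        public ScoreStatsQuery() {
            setCache(false); // post filters must not be cached
            setCost(101);    // cost > 100 is what makes Solr run this as a post filter
        }

        @Override
        public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
            return new DelegatingCollector() {
                private Scorer sc;
                private int count;
                private double sum;

                @Override
                public void setScorer(Scorer scorer) throws IOException {
                    this.sc = scorer; // not invoked under grouping without score in the sort
                    super.setScorer(scorer);
                }

                @Override
                public void collect(int doc) throws IOException {
                    sum += sc.score();
                    count++;
                    super.collect(doc); // pass the doc along to the delegate
                }
            };
        }
    }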
Solr 4.0 and Maven SNAPSHOT artifacts
Is there a maven repository location that contains the nightly build Maven artifacts of Solr? Are SNAPSHOT releases being generated by Jenkins or anything so that when I re-resolve the dependencies I'd get the latest snapshot jars? Thanks Amit
Re: Getting list of operators and terms for a query
I think you'd want to start by looking at the rb.getQuery() in the prepare (or process if you are trying to do post-results analysis). This returns a Query object that would contain everything in that and I'd then look at the Javadoc to see how to traverse it. I'm sure some runtime type-casting may be necessary to get at the sub-structures On Thu, Oct 4, 2012 at 9:23 AM, Davide Lorenzo Marino davide.mar...@gmail.com wrote: I don't need really start from the query String. What I need is obtain a list of terms and operators. So the real problem is: How can I access the Lucene Query structure to traverse it? Davide Marino 2012/10/4 Jack Krupansky j...@basetechnology.com I'm not quite following what the issue is here. I mean, the Solr QueryComponent generates a Lucene Query structure and you need to write code to recursively traverse that Lucene Query structure and generate your preferred form of output. There would be no need to look at the original query string. So, what exactly are you asking? Maybe you simply need to read up on Lucene Query and its subclasses to understand what that structure looks like. -- Jack Krupansky -Original Message- From: Davide Lorenzo Marino Sent: Thursday, October 04, 2012 11:36 AM To: solr-user@lucene.apache.org Subject: Re: Getting list of operators and terms for a query It's ok.. I did it and I took the query string. The problem is convert the java.lang.string (query) in a list of term and operators and doing it using the same parser used by Solr to execute the queries. 2012/10/4 Mikhail Khludnev mkhlud...@griddynamics.com you've got ResponseBuilder as process() or prepare() argument, check query field, but your component should be registered after QueryComponent in your requestHandler config. On Thu, Oct 4, 2012 at 6:03 PM, Davide Lorenzo Marino davide.mar...@gmail.com wrote: Hi All, i'm working in a new searchComponent that analyze the search queries. I need to know if given a query string is possible to get the list of operators and terms (better in polish notation)? I mean if the default field is country and the query is the String england OR (name:paul AND city:rome) to get the List [ Operator OR, Term country:england, OPERATOR AND, Term name:paul, Term city:rome ] Thanks in advance Davide Marino -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Getting list of operators and terms for a query
I'm not 100% sure but my guess is that you can get the list of boolean clauses and their occur (must, should, must not) and that would be your and, or, not equivalents. On Thu, Oct 4, 2012 at 10:39 AM, Davide Lorenzo Marino davide.mar...@gmail.com wrote: For what I saw in the documentation from the class org.apache.lucene.search.Query I can just iterate over the terms using the method extractTerms. How can I extract the operators? 2012/10/4 Amit Nithian anith...@gmail.com I think you'd want to start by looking at the rb.getQuery() in the prepare (or process if you are trying to do post-results analysis). This returns a Query object that would contain everything in that and I'd then look at the Javadoc to see how to traverse it. I'm sure some runtime type-casting may be necessary to get at the sub-structures On Thu, Oct 4, 2012 at 9:23 AM, Davide Lorenzo Marino davide.mar...@gmail.com wrote: I don't need really start from the query String. What I need is obtain a list of terms and operators. So the real problem is: How can I access the Lucene Query structure to traverse it? Davide Marino 2012/10/4 Jack Krupansky j...@basetechnology.com I'm not quite following what the issue is here. I mean, the Solr QueryComponent generates a Lucene Query structure and you need to write code to recursively traverse that Lucene Query structure and generate your preferred form of output. There would be no need to look at the original query string. So, what exactly are you asking? Maybe you simply need to read up on Lucene Query and its subclasses to understand what that structure looks like. -- Jack Krupansky -Original Message- From: Davide Lorenzo Marino Sent: Thursday, October 04, 2012 11:36 AM To: solr-user@lucene.apache.org Subject: Re: Getting list of operators and terms for a query It's ok.. I did it and I took the query string. The problem is convert the java.lang.string (query) in a list of term and operators and doing it using the same parser used by Solr to execute the queries. 2012/10/4 Mikhail Khludnev mkhlud...@griddynamics.com you've got ResponseBuilder as process() or prepare() argument, check query field, but your component should be registered after QueryComponent in your requestHandler config. On Thu, Oct 4, 2012 at 6:03 PM, Davide Lorenzo Marino davide.mar...@gmail.com wrote: Hi All, i'm working in a new searchComponent that analyze the search queries. I need to know if given a query string is possible to get the list of operators and terms (better in polish notation)? I mean if the default field is country and the query is the String england OR (name:paul AND city:rome) to get the List [ Operator OR, Term country:england, OPERATOR AND, Term name:paul, Term city:rome ] Thanks in advance Davide Marino -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
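A rough traversal along those lines (getClauses, getOccur, and getTerm are the real Lucene accessors; the class and the printing are just illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class QueryWalker {
        // MUST ~ AND, SHOULD ~ OR, MUST_NOT ~ NOT; start with walk(rb.getQuery(), "TOP")
        public static void walk(Query q, String occur) {
            if (q instanceof BooleanQuery) {
                for (BooleanClause clause : ((BooleanQuery) q).getClauses()) {
                    walk(clause.getQuery(), clause.getOccur().name());
                }
            } else if (q instanceof TermQuery) {
                Term t = ((TermQuery) q).getTerm();
                System.out.println(occur + " " + t.field() + ":" + t.text());
            } else {
                // phrase, range, wildcard, etc. queries need their own branches
                System.out.println(occur + " " + q.getClass().getSimpleName() + " " + q);
            }
        }
    }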
Re: Query filtering
I think one way to do this is to issue another query and set a bunch of filter queries to restrict interesting_facet to just those ten values returned in the first query: fq=interesting_facet:1 OR interesting_facet:2 (etc.) &q=context:whatever Does that help? Amit On Thu, Sep 27, 2012 at 6:33 AM, Finotti Simone tech...@yoox.com wrote: Hello, I'm doing this query to return the top 10 facets within a given context, specified via the fq parameter. http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10 Now, I should search for a term inside the context AND the previously identified top 10 facet values. Is there a way to do this with a single query? thank you in advance, S
Re: Getting the distribution information of scores from query
Thanks! That did the trick! It did require some more work at the component level, though: I had to generate the same query key as the index searcher, otherwise when you go to fetch scores for a cached query result you get a lot of NPEs, since the stats are computed at the collector level and a cache hit bypasses the Lucene level (so for me they never get set). I'll write up what I did and probably try to open source the work for others to see. The stuff with PostFiltering is nice but needs some examples and documentation.. hopefully mine will help the cause. Thanks again Amit On Wed, Sep 26, 2012 at 5:13 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: I suggest creating a component and putting it after QueryComponent. In prepare it should add its own PostFilter to the list of request filters; your post filter will then be able to inject its own DelegatingCollector, and you can add the collected histogram to the result named list http://searchhub.org/dev/2012/02/10/advanced-filter-caching-in-solr/ On Tue, Sep 25, 2012 at 10:03 PM, Amit Nithian anith...@gmail.com wrote: We have a federated search product that issues multiple parallel queries to Solr cores and fetches the results and blends them. The approach we were investigating was taking the scores, normalizing them based on some distribution (a normal distribution seems reasonable), and using that z-score to blend the results (else you'll be blending scores on different scales). To accomplish this, I was looking to get the distribution of the scores for the query as an analog to the stats component, but the only way I could see to accomplish this was to create a custom collector that accumulates and stores this information (mean, std-dev etc.), since the stats component only operates on indexed fields. Is there an easy way to tell Solr to use a custom collector without having to modify the SolrIndexSearcher class? Maybe is there an alternative way to get this information? Thanks Amit -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
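A side note for anyone implementing the same collector-level stats: the scores never need to be buffered, since a running (Welford-style) update keeps the mean and std-dev in O(1) state per query. A sketch:

    // Accumulates mean/std-dev incrementally; add() is called once per collected doc's score.
    public class RunningStats {
        private long n;
        private double mean;
        private double m2; // running sum of squared deviations from the mean

        public void add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        public double mean() { return mean; }

        public double stdDev() {
            return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
        }

        // z-score used to put scores from different cores on a comparable scale
        public double zScore(double x) {
            double sd = stdDev();
            return sd == 0.0 ? 0.0 : (x - mean) / sd;
        }
    }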
Re: AutoIndexing
There are a couple of ways to accomplish this, from easy to hard depending on your database schema: 1) Use a DB trigger - I don't like triggers too much b/c to me they couple your database layer with your application layer, which leads to untestable and sometimes unmaintainable code - Also it gets difficult when you want to re-index a document based on a change to an auxiliary table. Say you associate an image with the main entity: you're not touching the main entity table, so you then have triggers on a bunch of tables, which could get messy. 2) Use a database table as a queue of records to index, and write to it from your application when the main entity changes - This isn't too bad.. it's basically a replayable queue that you can purge when you want, query to find all the main entities that changed, and construct your SQL queries accordingly to submit documents for indexing (a sketch follows below). 3) Use a real message queue and some receiver that will index the document - This could be the best but also the most complicated solution.. when your application changes an entity, a message is sent on the queue either with the actual document itself or maybe an ID from which you can re-construct the document for indexing. There are probably other solutions too, but those are the 3 that come to my mind offhand, and where I work we use #2 with incremental index processes that check for changes since some last known time and index them. - Amit On Tue, Sep 25, 2012 at 3:37 AM, Tom Mortimer tom.m.f...@gmail.com wrote: I'm afraid I don't have any DIH experience myself, but some googling suggests that using a postgresql trigger to start a delta import might be one approach: http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command and http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport Tom On 25 Sep 2012, at 11:28, darshan dk...@dreamsoftech.com wrote: My Document is Database (yes, RDBMS) and the software for it is postgresql, where any change in its tables should be reflected, without re-indexing. I am indexing it via the DIH process Thanks, Darshan -Original Message- From: Tom Mortimer [mailto:tom.m.f...@gmail.com] Sent: Tuesday, September 25, 2012 3:31 PM To: solr-user@lucene.apache.org Subject: Re: AutoIndexing Hi Darshan, Can you give us some more details, e.g. what do you mean by database? A RDBMS? Which software? How are you indexing it (or intending to index it) to Solr? etc... cheers, Tom On 25 Sep 2012, at 09:55, darshan dk...@dreamsoftech.com wrote: Hi All, Is there any way where I can auto-index whenever there are changes in my database. Thanks, Darshan
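A sketch of what the option-2 queue table might look like (table and column names are invented; the syntax is generic SQL):

    -- The application inserts a row whenever an entity (or anything joined to it) changes;
    -- the primary key keeps one pending row per entity (re-queue with an upsert appropriate
    -- to your database). The indexer selects rows, rebuilds those documents, then deletes
    -- what it processed.
    CREATE TABLE index_queue (
        entity_id   BIGINT      NOT NULL,
        entity_type VARCHAR(32) NOT NULL,
        queued_at   TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (entity_id, entity_type)
    );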
Prevent Log and other math functions from returning Infinity and erroring out
Is there any reason why the log function shouldn't be modified to always take 1 + the number being logged? The reason I ask is that I am taking the log of the value output by another function, which could return 0. For testing, I modified it to return 1, which works, but I would rather have the log function simply add 1. Of course I could do something like log(sum(...)), but that seems a bit much, or I could just create my own modified log function in my code; still, I was wondering if there would be any objections to filing an issue and a patch to keep math functions like this from returning infinity? Thanks Amit
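For reference, the log(sum(...)) workaround in boost-function form, with popularity standing in for whatever field or function can yield 0:

    q={!boost b=$b v=$qq}&b=log(sum(1,popularity))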
Re: Is it possible to do an if statement in a Solr query?
If the fact that it's original vs. generic is captured in a 0/1 field like is_original, can you sort by is_original? Similarly, could you put a huge boost on is_original in the dismax so that documents matching on is_original score higher than those that aren't original? Or is your goal to not show generics *at all*? On Wed, Sep 12, 2012 at 2:47 PM, Walter Underwood wun...@wunderwood.org wrote: You may be able to do this with grouping. Group on the medicine family, and only show the Original if there are multiple items in the family. wunder On Sep 12, 2012, at 2:09 PM, Gustav wrote: Hello everyone, I'm working on an e-commerce website and using Solr as my Search Engine; I'm really enjoying its functionality and the search options/performance. But I am stuck in a kind of tricky scenario... Here's what happens: I have a medicine web-store, where I indexed all necessary products in my Solr index. But when I search for some medicine, following my business rules, I have to verify if the result of my search contains any Original medicine; if there is any, then I wouldn't show the generics of this respective medicine; on the other hand, if there wasn't any original product in the result I would have to return its generics. I'm currently returning the original and generics; is there a way to do this kind of checking in solr? Thanks! :)
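Concretely, assuming a 0/1 is_original field (the field name is just illustrative), either of these expresses that preference:

    sort=is_original desc, score desc

or, with (e)dismax, an additive boost large enough to dominate:

    bq=is_original:1^100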
Re: XInclude Multiple Elements
Way back when I opened an issue about using XML entity includes in Solr as a way to break up the config. I have found problems with XInclude having multiple elements to include because the file is not well formed. From what I have read, if you make this well formed, you end up with a document that's not what you expect. For example: my schema.xml has

    <fields>
      ...
      <xinclude href="more_fields.xml" .../>
    </fields>

more_fields.xml:

    <field name=".."/>

which isn't well formed. You could make it well formed:

    <fields>
      <field name=".."/>
    </fields>

but then I think you end up with a nested fields element which doesn't work (and btw I still keep getting the blasted failed to parse error which isn't very helpful). Looking at this made me wonder if entity includes work with Solr 4 and indeed they do! They aren't as flexible as XIncludes but for the purpose of breaking up an XML file into smaller pieces, it works beautifully and as you would expect. You can simply declare your entities at the top as shown in the earlier thread and then include them where you need. I've been using this for years and it works fairly well. Cheers! Amit On Thu, May 31, 2012 at 7:01 AM, Bogdan Nicolau bogdan@gmail.com wrote: I've also tried a lot of tricks to get xpointer working with multiple child elements, to no success. In the end, I've resorted to a less pretty, other-way-around solution. I do something like this: solrconfig_common.xml - no xml declaration, no root tag, no nothing:

    <etc></etc>
    <etc2></etc2>
    ...

For each file that I need the common stuff in, I'd do something like this (solrconfig_master.xml/solrconfig_slave.xml/etc.):

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE config [
      <!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml">
    ]>
    <config>
      &solrconfigcommon;
    </config>

Solr starts with 0 warnings, the configuration is properly loaded, etc. Property substitution also works, including inside the solrconfig_common.xml. Hope it helps anyone. -- View this message in context: http://lucene.472066.n3.nabble.com/XInclude-Multiple-Elements-tp3167658p3987029.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication policy
If I understand you right, replication of data has 0 downtime; it just works and the data flows through from master to slaves. If you want, you can configure the replication to replicate configuration files across the cluster (although for me my deploy script does this). I'd recommend tweaking the warmers so that you don't get latency spikes due to cold caches during the replications. Not being well versed in the latest Solr features (I'm a bit behind here), I don't know if you can reload the cores on demand to pick up the latest configurations or not, but in my environment I have a rolling restart script that bounces a set of servers when the schema/solrconfig changes. HTH Amit On Mon, Sep 10, 2012 at 11:10 PM, Abhishek tiwari abhishek.tiwari@gmail.com wrote: HI All, am having 1 master and 3 slave solr servers (version 3.6). What kind of replication policy should I adopt to get zero downtime and no data loss, 1) when we do some configuration and schema changes on the solr server?
Re: solr.StrField with stored=true useless or bad?
This is great, thanks for this post! I was curious about the same thing and was wondering why fl couldn't return the indexed representation of a field if that field were only indexed but not stored. My thought was to return something rather than nothing, but I didn't pay attention to the fact that getting even the indexed representation of a field given a document is not fast. Thanks Amit On Tue, Sep 11, 2012 at 4:03 PM, sy...@web.de wrote: Hi, I have a StrField to store a URL. The field definition looks like this:

    <field name="link" type="string" indexed="true" stored="true" required="true"/>

Type string is defined as usual:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

Then I realized that a StrField doesn't execute any analyzers and stores data verbatim. The data is just a single token. The purpose of stored=true is to store the raw string data alongside the analyzed/transformed data for display purposes. This is fine for an analyzed solr.TextField, but for a StrField both values are the same. So is there any reason to apply stored=true on a StrField as well? I ask because I found a lot of sites and tutorials applying stored=true on StrFields as well. Do they all do it wrong or am I missing something here?
Re: Solr - Lucene Debuging help
The wiki should probably be updated.. maybe I'll take a stab at it. I'll also try and update my article referenced there too. When you check out the project from SVN, do ant eclipse. Look at this bug (https://issues.apache.org/jira/browse/SOLR-3817) and either run the ruby program or download the patch and apply it; either way it should fix the classpath issues. Then import the project and you can follow the remainder of the steps in the http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse article. Cheers Amit On Mon, Sep 10, 2012 at 1:29 PM, BadalChhatbar badal...@yahoo.com wrote: Hi Steve, Thanks, I was able to create a new project using that url. :) one more thing... it's giving me about 32K errors (something like.. this type cannot be resolved). i tried rebuilding the project and running the ant command (build.xml) but it didn't help. any suggestions on this? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Lucene-Debuging-help-tp4006715p4006721.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: In-memory indexing
I have wondered about this too but instead why not just set your cache sizes large enough to house most/all of your documents and pre-warm the caches accordingly? My bet is that a large enough document cache may suffice but that's just a guess. - Amit On Mon, Sep 10, 2012 at 10:56 AM, Kiran Jayakumar kiranjuni...@gmail.com wrote: Hi, Does anyone have any experience in hosting the entire index in a RAM disk ? (I'm not thinking about Lucene's RAM directory). I have some small indexes (less than a Gb). Also, please recommend a good RAM disk application for Windows (I have used Gizmo, wondering if there's any better one out there). Thanks Kiran
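E.g., for a small index, a documentCache in solrconfig.xml sized to cover every document might look like this (the numbers are placeholders; note that document caches aren't usefully autowarmed, since internal Lucene doc ids change between searchers):

    <documentCache class="solr.LRUCache" size="1000000" initialSize="100000"/>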
Re: Trouble Setting Up Development Environment
Sorry I'm really late to this so not sure if this is even an issue: 1) I found that there is an ant eclipse task that makes it easy to set up the Eclipse .project and .classpath (I think I had done this by hand in the tutorial) 2) Yes, you can attach to a remote instance of Solr, but your JVM has to have the remote debug options and port set up. Eclipse can connect fairly easily to this in the debug configuration menu. Thanks Amit On Mon, Mar 26, 2012 at 4:13 AM, Erick Erickson erickerick...@gmail.com wrote: Depending upon what you actually need to do, you could consider just attaching to the running Solr instance remotely. I know it's easy in IntelliJ, and believe Eclipse makes this easy as well but I haven't used Eclipse in a while Best Erick On Sat, Mar 24, 2012 at 11:11 PM, Li Li fancye...@gmail.com wrote: I forgot to write that I am running it in tomcat 6, not jetty. you can right click the project - Debug As - Debug on Server - Manually define a new Server - Apache - Tomcat 6 if you should have configured a tomcat. On Sun, Mar 25, 2012 at 4:17 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: I followed your instructions. I got 8 Errors and a bunch of warnings, a few of them related to classpath. I also got the following exception when I tried to run with jetty (I have attached the full console output with this email). I figured the solr directory with config files might be missing and added that in WebContent. Would be of great help if someone can point me in the right direction. ls WebContent admin favicon.ico index.jsp solr WEB-INF

SEVERE: Error in solrconfig.xml: org.apache.solr.common.SolrException: No system property or default value specified for solr.test.sys.prop1
    at org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:331)
    at org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:290)
    at org.apache.solr.common.util.DOMUtil.substituteProperties(DOMUtil.java:292)
    at org.apache.solr.core.Config.<init>(Config.java:165)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:131)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:435)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:133)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at runjettyrun.Bootstrap.main(Bootstrap.java:97)

Here are the 8 errors I got (Description | Resource | Path | Location | Type):

    core cannot be resolved | dataimport.jsp | /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin | line 27 | JSP Problem
    End tag (</html>) not closed properly, expected >. | package.html | /solr3_5/ssrc/lucene/contrib/queryparser/src/java/org/apache/lucene/queryParser/core/config | line 64 | HTML Problem
    Fragment _info.jsp was not found at expected path /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin/_info.jsp | dataimport.jsp | /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin | line 21 | JSP Problem
    Fragment _info.jsp was not found at expected path /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin/_info.jsp | debug.jsp | /solr3_5/ssrc/solr/contrib/dataimporthandler/src/webapp/admin | line 19 | JSP Problem
    Named template dotdots is not available | tabutils.xsl | /solr3_5/ssrc/lucene/src/site/src/documentation/skins/common/xslt/html | line 41 | XSL Problem
    Named template dotdots is not available | tabutils.xsl | /solr3_5/ssrc/solr/site-src/src/documentation/skins/common/xslt/html | line 41 | XSL Problem
    Unhandled exception type Throwable | ping.jsp
Re: N-gram ranking based on term position
I think your thought about using the edge ngram as a field and boosting that field in the qf/pf sections of the dismax handler sounds reasonable. Why do you have qualms about it? On Fri, Sep 7, 2012 at 12:28 PM, Kiran Jayakumar kiranjuni...@gmail.com wrote: Hi, Is it possible to score documents with a match early in the text higher than later in the text ? I want to boost begin with matches higher than the contains matches. I can define a copy field and analyze it as edge n-gram and boost it. I was wondering if there was a better way to do it. Thanks
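For the copy-field approach, the index-side analysis might look something like this (field and type names are mine; the query side deliberately skips the n-gramming, so a query term can only match stored prefixes, i.e. begins-with matches):

    <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title_prefix" type="text_prefix" indexed="true" stored="false"/>
    <copyField source="title" dest="title_prefix"/>

Then weighting it in the dismax config, e.g. qf=title title_prefix^5, makes begins-with matches outscore plain contains matches.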
Re: Running out of memory
I am debugging an out of memory error myself, and I have a few suggestions: 1) Are you looking at your search logs around the time of the memory error? In my case, I found a few bad queries requesting a ton of rows (basically the whole index's worth, which I think is an error somewhere in our app, I just have to find it) which happened close to the OOM error being thrown. 2) Do you have Solr hooked up to something like NewRelic/AppDynamics to see the cache usage in real time? Maybe, as was suggested, tuning down or eliminating low-use caches could help. 3) Are you ensuring that you aren't setting stored=true on fields that don't need it? This will increase the index size and possibly the cache size if lazy loading isn't enabled (to be honest, this part I am a bit unclear on since I haven't had much experience with this myself). Thanks Amit On Mon, Aug 13, 2012 at 11:37 AM, Jon Drukman jdruk...@gmail.com wrote: On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote: It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart. -XX:OnOutOfMemoryError=cmd args; cmd args Run user-defined commands when an OutOfMemoryError is first thrown. Does Solr require the entire index to fit in memory at all times? No. But it's hard to say about your particular problem without additional information. How often do you commit? Do you use faceting? Do you sort by Solr fields and if yes what are those fields? And you should also check caches. I upgraded to solr-3.6.1 and an extra large amazon instance (15GB RAM) so we'll see if that helps. So far no out of memory errors.
Re: Nrt and caching
Thanks for the responses. I guess my specific question is: if I had something which depended on the mapping between Lucene document ids and some object primary key (so I could pull in external data from another data source without a constant reindex), how would this get affected by soft and hard commits? I'd prefer not to have to rebuild this mapping from scratch on each soft or even hard commit if possible, since those seem to happen frequently. Also, can you explain why and how per-segment caches are used, and how the client of the Lucene layer gets access to or knows about this? I always thought segments were an implementation detail, where they get merged on optimize etc., so wouldn't that affect clients depending on segment-level stuff? Or what am I missing? Thanks again! Amit On Jul 7, 2012 9:22 AM, Andy angelf...@yahoo.com wrote: So If I want to use multi-value facet with NRT I'd need to convert the cache to per-segment? How do I do that? Thanks. From: Jason Rutherglen jason.rutherg...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday, July 7, 2012 11:32 AM Subject: Re: Nrt and caching The field caches are per-segment, which are used for sorting and basic [slower] facets. The result set, document, filter, and multi-value facet caches are [in Solr] per-multi-segment. Of these, the document, filter, and multi-value facet caches could be converted to be [performant] per-segment, as with some other Apache licensed Lucene based search engines. On Sat, Jul 7, 2012 at 10:42 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Currently the caches are stored per-multiple-segments, meaning after each 'soft' commit, the cache(s) will be purged. Depends which caches. Some caches are per-segment, and some caches are top level. It's also a trade-off... for some things, per-segment data structures would indeed turn around quicker on a reopen, but every query would be slower for it. -Yonik http://lucidimagination.com
Nrt and caching
Sorry, I'm a bit new to the NRT stuff in Solr, but I'm trying to understand the implications of frequent commits for cache rebuilding and autowarming. What are the best practices surrounding NRT searching, caches, and query performance? Thanks! Amit
Re: How to improve this solr query?
Couple questions: 1) Why are you explicitly telling Solr to sort by score desc? Shouldn't it do that for you? Could this be a source of performance problems, since sorting requires loading the field caches? 2) Of the query parameters, q1 and q2, which one is actually doing text searching on your index? It looks like q1 is doing non-string related stuff; could this be better handled in either the bf or bq section of the edismax config? Looking at the sample, though, I don't understand how q1=apartment would hit non-string fields (but see #3). 3) Are the string fields literally of string type (i.e. no analysis on the field), or are you saying string loosely to mean text field? pf == phrase fields == given a multiple word query, will ensure that the specified phrase exists in the specified fields separated by some slop (hello my world may match hello world depending on this slop value). qf means that given a multi-term query, each term must exist in the specified fields (name, description, whatever text fields you want). Best Amit On Mon, Jul 2, 2012 at 9:35 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: Hi all, I'm using solr 3.5 with nested query on a 4 core cpu server + 17 Gb. The problem is that my query is so slow; the average response time is 12 secs against 13 million documents. What I am doing is to send a quoted string (q2) to string fields and a non-quoted string (q1) to other fields and combine the results together. facet=true&sort=score+desc&q2=apartment&facet.mincount=1&q1=apartment&tie=0.1&q.alt=*:*&wt=json&version=2.2&rows=20&fl=uuid&facet.query=has_map:+true&facet.query=has_image:+true&facet.query=has_website:+true&start=0&q=_query_:+{!dismax+qf='.'+fq='..'+v=$q1}+OR+_query_:+{!dismax+qf='..'+fq='...'+v=$q2}&facet.field={!ex%3Ddt}sub_category_uuids&facet.field={!ex%3Ddt}location_uuid I have done a Solr optimize already, but it's still slow. Any idea how to improve the speed? Have I done anything wrong? -- Chhorn Chamnap http://chamnap.github.com/
Use of Solr as primary store for search engine
Hello all, I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is given an architecture where the primary (transactional) data store is MySQL (Oracle, PostGres whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc to help render results? Basically I guess my question is whether or not you store enough in Solr so that when your front end renders the results page, it never has to hit the database. The other option is that your search engine only returns primary keys that your front end then uses to hit the DB to fetch data to display to your end user. With Solr 4.0 and Solr moving towards the NoSQL direction, I am curious what people are doing and what application architectures with Solr look like. Thanks! Amit
Re: Something like 'bf' or 'bq' with MoreLikeThis
No worries! What version of Solr are you using? One that you downloaded as a tarball or one that you checked out from SVN (trunk)? I'll take a bit of time and document the steps and respond. I'll review the patch to see that it fits the general case. Question for you with MLT: are your users doing a blank search (no text) for something, or are you returning results More Like results that were generated as a result of a user typing some text query? I may have built this patch assuming a blank query, but I can try to make it work for text-based queries. Thanks Amit On Wed, Jul 4, 2012 at 1:37 AM, nanshi nanshi.e...@gmail.com wrote: Thanks a lot, Amit! Please bear with me, I am a new Solr dev, could you please shed some light on how to use a patch? Pointing me to a wiki/doc is fine too. Thanks a lot! :) -- View this message in context: http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Use of Solr as primary store for search engine
Paul, Thanks for your response! Were you using the SQL database as an object store to pull XWiki objects or did you have to execute several queries to reconstruct these objects? I don't know much about them sorry.. Also for those responding, can you provide a few basic metrics for me? 1) Number of nodes receiving queries 2) Approximate queries per second 3) Approximate latency per query I know some of this may be sensitive depending on where you work so reasonable ranges would be nice (i.e. sub-second isn't hugely helpful since 50,100,200 ms have huge impacts depending on your site). Thanks again! Amit On Wed, Jul 4, 2012 at 1:09 AM, Paul Libbrecht p...@hoplahup.net wrote: Amit, not exactly a response to your question but doing this with a lucene index on i2geo.net has resulted in considerably performance boost (reading from stored-fields instead of reading from the xwiki objects which pull from the SQL database). However, it implied that we had to rewrite anything necessary for the rendering, hence the rendering has not re-used that many code. Paul Le 4 juil. 2012 à 09:54, Amit Nithian a écrit : Hello all, I am curious to know how people are using Solr in conjunction with other data stores when building search engines to power web sites (say an ecommerce site). The question I have for the group is given an architecture where the primary (transactional) data store is MySQL (Oracle, PostGres whatever) with periodic indexing into Solr, when your front end issues a search query to Solr and returns results, are there any joins with your primary Oracle/MySQL etc to help render results? Basically I guess my question is whether or not you store enough in Solr so that when your front end renders the results page, it never has to hit the database. The other option is that your search engine only returns primary keys that your front end then uses to hit the DB to fetch data to display to your end user. With Solr 4.0 and Solr moving towards the NoSQL direction, I am curious what people are doing and what application architectures with Solr look like. Thanks! Amit
Re: Something like 'bf' or 'bq' with MoreLikeThis
I had a similar problem so I submitted this patch: https://issues.apache.org/jira/browse/SOLR-2351 I haven't applied this to trunk in a while but my goal was to ensure that bf parameters were passed down and respected by the MLT handler. Let me know if this works for you or not. If there is sufficient interest, I'll re-apply this patch to trunk and try and devise some tests. Thanks! Amit On Tue, Jul 3, 2012 at 5:08 PM, nanshi nanshi.e...@gmail.com wrote: Jack, can you please explain this in some more detail? Such as how to write my own search component to modify request to add bq parameter and get customized result back? -- View this message in context: http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992888.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: difference between stored=false and stored=true ?
So a couple questions on this (comment first then question): 1) I guess you can't have four combinations b/c indexed=false/stored=false has no meaning? 2) If you set fewer fields stored=true, does this reduce the memory footprint of the document cache? Or better yet, can I store more documents in the cache, possibly increasing my cache efficiency? I read about the lazy loading of fields, which seems like a good way to maximize the cache and gain the advantage of storing data in Solr too. Thanks Amit On Sat, Jun 30, 2012 at 11:01 AM, Giovanni Gherdovich g.gherdov...@gmail.com wrote: Thank you François and Jack for those explanations. Cheers, GGhh 2012/6/30 François Schiettecatte: Giovanni stored=true means the data is stored in the index and [...] 2012/6/30 Jack Krupansky: indexed and stored are independent [...]
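The lazy loading mentioned above is a one-line setting in the <query> section of solrconfig.xml, so large stored fields are only read off disk when the fl actually asks for them:

    <enableLazyFieldLoading>true</enableLazyFieldLoading>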
Re: Editing long Solr URLs - Chrome Extension
All, I have placed a new version of the extension (suffixed _0.3) at https://github.com/ANithian/url_edit_extension/downloads. A few of the bugs resolved: 1) Switching to a tab in a new window and clicking on the extension wasn't loading the right URL 2) Complex SOLR URLs (ironic as this was the purpose) weren't being handled properly. I had to ditch the 3rd party URL parser in favor of my own which should better handle these complex parameters. 3) Replaced the edit box of the parameter value from a single line textbox to a multiple line textarea. This doesn't solve the tab to edit the next row but it helps a bit in that problem. Please keep submitting issues as you encounter them and I'll address them as best as possible. I hope that this helps everyone! Thanks! Amit On Tue, May 15, 2012 at 6:20 PM, Amit Nithian anith...@gmail.com wrote: Erick Yes thanks I did see that and am working on a solution to that already. Hope to post a new revision shortly and eventually migrate to the extension store. Cheers Amit On May 15, 2012 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I think I put one up already, but in case I messed up github, complex params like the fq here: http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=52.67,7.30 d=5} aren't properly handled. But I'm already using it occasionally Erick On Tue, May 15, 2012 at 10:02 AM, Amit Nithian anith...@gmail.com wrote: Jan Thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases. Thanks Amit On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote: I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straight forward. A few wishes: * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements * If you edit a value and click TAB, it is not updated :( * It should not be necessary to URLencode all non-ascii chars - why not leave colon, caret (^) etc as is, for better readability? * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 11. mai 2012, at 07:32, Amit Nithian wrote: Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough and don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Re: Editing long Solr URLs - Chrome Extension
Jan Thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases. Thanks Amit On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote: I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straight forward. A few wishes: * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements * If you edit a value and click TAB, it is not updated :( * It should not be necessary to URLencode all non-ascii chars - why not leave colon, caret (^) etc as is, for better readability? * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 11. mai 2012, at 07:32, Amit Nithian wrote: Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough and don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Re: Editing long Solr URLs - Chrome Extension
Erick Yes thanks I did see that and am working on a solution to that already. Hope to post a new revision shortly and eventually migrate to the extension store. Cheers Amit On May 15, 2012 9:20 AM, Erick Erickson erickerick...@gmail.com wrote: I think I put one up already, but in case I messed up github, complex params like the fq here: http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=52.67,7.30 d=5} aren't properly handled. But I'm already using it occasionally Erick On Tue, May 15, 2012 at 10:02 AM, Amit Nithian anith...@gmail.com wrote: Jan Thanks for your feedback! If possible can you file these requests on the github page for the extension so I can work on them? They sound like great ideas and I'll try to incorporate all of them in future releases. Thanks Amit On May 11, 2012 9:57 AM, Jan Høydahl j...@hoydahl.no wrote: I've been testing https://chrome.google.com/webstore/detail/mbnigpeabbgkmbcbhkkbnlidcobbapff?hl=en but I don't think it's great. Great work on this one. Simple and straight forward. A few wishes: * Sticky mode? This tool would make sense in a sidebar, to do rapid refinements * If you edit a value and click TAB, it is not updated :( * It should not be necessary to URLencode all non-ascii chars - why not leave colon, caret (^) etc as is, for better readability? * Some param values in Solr may be large, such as fl, qf or bf. Would be nice if the edit box was multi-line, or perhaps adjusts to the size of the content -- Jan Høydahl, search solution architect Cominvent AS - www.facebook.com/Cominvent Solr Training - www.solrtraining.com On 11. mai 2012, at 07:32, Amit Nithian wrote: Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough and don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Editing long Solr URLs - Chrome Extension
Hey all, I don't know about you but most of the Solr URLs I issue are fairly lengthy, full of parameters on the query string, and browser location bars aren't long enough and don't have multi-line capabilities. I tried to find something that does this but couldn't, so I wrote a chrome extension to help. Please check out my blog post on the subject and please let me know if something doesn't work or needs improvement. Of course this can work for any URL with a query string but my motivation was to help edit my long Solr URLs. http://hokiesuns.blogspot.com/2012/05/manipulating-urls-with-long-query.html Thanks! Amit
Re: Solr like for autocomplete field?
I implemented the edge ngrams solution and it's an awesome one compared to any other that I could think of, because I can index more than just text (other metadata) that can be used to *rank* the autocomplete results, eventually getting to ranking by the probability of selection, which is, after all, what you want to maximize with such systems. On Tue, Nov 2, 2010 at 6:30 PM, Lance Norskog goks...@gmail.com wrote: And the SpellingComponent. There's nothing to help you with phrases. On Tue, Nov 2, 2010 at 11:21 AM, Erick Erickson erickerick...@gmail.com wrote: Also, you might want to consider TermsComponent, see: http://wiki.apache.org/solr/TermsComponent Also, note that there's an autosuggest component that's recently been committed. Best Erick On Tue, Nov 2, 2010 at 1:56 PM, PeterKerk vettepa...@hotmail.com wrote: I have a city field. Now when a user starts typing in a city textbox I want to return found matches (like Google). So for example, the user types new, and I will return new york, new hampshire etc. my schema.xml:

    <field name="city" type="string" indexed="true" stored="true"/>

my current url: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id&facet.field=city&fq=city:new Basically 2 questions here: 1. is the url I'm using the best practice when implementing autocomplete? What I wanted to do is use the facets for found matches. 2. How can I match PART of the cityname, just like the SQL LIKE command: cityname LIKE '%userinput' Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-like-for-autocomplete-field-tp1829480p1829480.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
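For the record, the facet route in the quoted question can also be done directly with facet.prefix rather than an fq (facet.prefix matches indexed terms, so it is case-sensitive on a string field):

    http://localhost:8983/solr/db/select?q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=new&facet.limit=10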
Re: CoreContainer Usage
Hi sorry perhaps my question wasn't very clear. Basically I am trying to build a federated search where I blend the results of queries to multiple cores together. This is like distributed search but I believe the distributed search will issue network calls which I would like to avoid. I have read that someone will use a single core as the federated search handler and then run the searches across multiple cores and blend the results. This is great but I can't figure out how to easily get access to an instance of the CoreContainer that I hope has been initialized (so I am not having it re-parse the configuration files). Any help would be appreciated. Thanks! Amit On Thu, Oct 7, 2010 at 10:07 AM, Amit Nithian anith...@gmail.com wrote: I am trying to understand the multicore setup of Solr more and saw that SolrCore.getCore is deprecated in favor of CoreContainer.getCore(name). How can I get a reference to the CoreContainer for I assume it's been created somewhere in Solr and is it possible for one core to get access to another SolrCore via the CoreContainer? Thanks Amit
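For anyone after the same thing: from inside a handler or component, the container is reachable off the request without re-parsing any configuration. A sketch against the 1.4-era API (the core name is a placeholder; note getCore bumps a reference count that the caller must release):

    import org.apache.solr.core.CoreContainer;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.SolrQueryRequest;

    public class CrossCoreHelper {
        // Borrow a sibling core from the same container; caller must close() it when done.
        public static SolrCore sibling(SolrQueryRequest req, String coreName) {
            CoreContainer container =
                    req.getCore().getCoreDescriptor().getCoreContainer();
            return container.getCore(coreName); // refcounted
        }
    }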
CoreContainer Usage
I am trying to understand the multicore setup of Solr more and saw that SolrCore.getCore is deprecated in favor of CoreContainer.getCore(name). How can I get a reference to the CoreContainer for I assume it's been created somewhere in Solr and is it possible for one core to get access to another SolrCore via the CoreContainer? Thanks Amit
Re: Very slow queries
Try stopping replication and see if your query performance may improve. I think the caches get reset each time replication occurs. You can look at the cache performance using the admin console.. try and see if any of the caches are constantly being missed.. this could be due to your newSearcher/firstSearcher warming queries not doing an adequate job of warming your caches which can affect performance. Perhaps the answer could be to allocate more cache space and hence more VM Heap space. I hope that this helps some. - Amit On Thu, Oct 7, 2010 at 4:32 AM, Christos Constantinou ch...@simpleweb.co.uk wrote: Hello everyone, All of a sudden, I am experiencing some very slow queries with solr. I have 13GB of indexed documents, each averaging 50-100kb. They have an id key, so I expect to be getting results really fast if I execute id:7cd6cb99fd239c1d743a51bb85a48f790f4a6d3c as the query with no other parameters. Instead, the query may take up to 1 full second (the majority of time spent on org.apache.solr.handler.component.QueryComponent) whereas more complicated queries may take more than a full minute to complete. I am not sure where to start looking for the problem. I stopped all the scripts that add and commit the solr server, then restarted solr, but the queries still take just as long. Also there is a replication server that runs every 60 seconds, I don't know how that might affect performance. Any clues as to how I should investigate this would be appreciated. Thanks Christos
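The warming queries in question live in solrconfig.xml; a skeleton (the queries themselves are placeholders and should mirror your real traffic, e.g. common sorts, filters, and facets):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">*:*</str><str name="sort">price desc</str></lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">*:*</str></lst>
      </arr>
    </listener>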