Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load
[I posted this yesterday on the lucene-user mailing list, and was advised to post it here instead. Excuse me for spamming.]

Hi,

I'm currently involved in a project of migrating from Lucene 2.9.1 to Solr 1.4.0. During stress testing, I encountered this performance problem: while actual search times in our shards (which are now running Solr) have not changed, the total time a query takes has increased dramatically. During this performance test, we of course do not modify the indexes.

Our application is sending Solr select queries concurrently to the 8 shards, using CommonsHttpSolrServer. I added some timing debug messages, and found that CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time:

int statusCode = _httpClient.executeMethod(method);

Just to clarify: looking at the access logs of the Solr shards, TTLB for a query might be around 5 ms (on all shards), but httpClient.executeMethod() for this query can be much higher - say, 50 ms. On average, while under light load queries take 12 ms, under heavy load they take around 22 ms.

Another route we tried to pursue was to add the shards=shard1,shard2,… parameter to the query instead of doing the distribution ourselves, but this doesn't seem to work, due to an NPE caused by QueryComponent.returnFields(), line 553:

if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way around this. Note: we're using a custom query component which extends QueryComponent, but debugging this, I saw nothing wrong with the results at this point in the code.

Our previous code used HTTP in a different manner: for each request, we created a new sun.net.www.protocol.http.HttpURLConnection and called its getInputStream() method. Under the same load as the new application, the old application does not encounter the delays mentioned above.
Our current code initializes a CommonsHttpSolrServer for each shard this way:

MultiThreadedHttpConnectionManager httpConnectionManager = new MultiThreadedHttpConnectionManager();
httpConnectionManager.getParams().setTcpNoDelay(true);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
httpConnectionManager.getParams().setStaleCheckingEnabled(false);
HttpClient httpClient = new HttpClient();
HttpClientParams params = new HttpClientParams();
params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
params.setAuthenticationPreemptive(false);
params.setContentCharset(StringConstants.UTF8);
httpClient.setParams(params);
httpClient.setHttpConnectionManager(httpConnectionManager);

and passes the new HttpClient to the Solr server:

solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two different ways - one with a single MultiThreadedHttpConnectionManager and HttpClient for all the SolrServers, and the other with a new MultiThreadedHttpConnectionManager and HttpClient for each SolrServer. Both tries yielded similar performance results. We also tried giving setMaxTotalConnections() a much higher connection count (1,000,000) - it didn't have an effect.

One last thing - to answer Lance's question about this being an apples-to-apples comparison (in the lucene-user thread): yes, our main goal in this project is to do things as close to the previous version as possible. This way we can verify that behavior (both quality and performance) remains similar, release this version, and then move forward to improve things. Of course there are some changes, but I believe we are indeed measuring the complete flow on both apps, and that both apps are returning the same fields via HTTP.

Would love to hear what you think about this.
TIA,
Ophir
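For reference, the fan-out pattern described above (one concurrent select per shard, results merged client-side) can be sketched in plain Java. This is an illustrative sketch only - the queryShard() helper is a made-up stand-in for the real CommonsHttpSolrServer.query() call, the shard URLs are placeholders, and modern Java syntax is used for brevity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardFanOut {
    // Hypothetical stand-in for a single-shard call such as solrServer.query(...);
    // here it just echoes the request so the sketch stays self-contained.
    static String queryShard(String shardUrl, String q) {
        return shardUrl + "/select?q=" + q;
    }

    // Fan the same query out to all shards concurrently and collect the answers.
    public static List<String> fanOut(List<String> shardUrls, String q) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shardUrls.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String shard : shardUrls) {
                tasks.add(() -> queryShard(shard, q));
            }
            // invokeAll blocks until every shard has answered (or failed),
            // so total latency is roughly that of the slowest shard.
            List<String> merged = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                merged.add(f.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fanOut(Arrays.asList("http://shard1/solr", "http://shard2/solr"), "dog"));
    }
}
```

With this shape, any per-request slowdown in the HTTP layer (as Ophir measures around executeMethod) is paid once per shard but in parallel, so the client-side merge itself stays cheap.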
Setting up apache solr in eclipse with Tomcat
I would like to setup apache solr in eclipse using tomcat. It is easy to setup with jetty but with tomcat it doesn't run solr on runtime. Anyone has done this before? Hando -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1021673.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load
Ophir,

this sounds a bit strange:

> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time

Is this only for heavy load? Some other things:

* with lucene you accessed the indices with MultiSearcher in a LAN, right?
* did you look into the logs of the servers, is there something wrong/delayed?
* did you enable gzip compression for your servers, or even the binary writer/parser for your Solr clients?

CommonsHttpSolrServer server = ...
server.setRequestWriter(new BinaryRequestWriter());
server.setParser(new BinaryResponseParser());

Regards,
Peter.
-- http://karussell.wordpress.com/
RE: wildcard and proximity searches
Thanks for your idea. At this point I'm logging each query's time. My idea is to divide my queries into normal queries and heavy queries. I have some heavy queries that take 1 or 2 minutes to return results. But they contain, for instance, (*word1* AND *word2* AND word3*). I guess these will always be slower (they could be a little faster with ReversedWildcardFilterFactory), but they'll never be ready in a few seconds. For now, I just increased the timeout for those :) (using SolrNet). My priority at the moment is query phrases like word1* word2* word3. After this is working, I'll try to optimize the heavy queries.

Frederico

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, 4 August 2010 01:41
To: solr-user@lucene.apache.org
Subject: Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>> But it is unusual to use both leading and trailing * operator. Why are you doing this?
> Yes I know, but I have a few queries that need this. I'll try the ReversedWildcardFilterFactory.

ReversedWildcardFilter will help a leading wildcard, but will not help a query with BOTH leading and trailing wildcards - it'll still be slow. Solr/Lucene isn't good at that; I didn't even know Solr would do it at all, in fact. If you really needed to do that, the way to play to Solr/Lucene's way of doing things would be to have a field where you actually index each _character_ as a separate token. Then leading and trailing wildcard search is basically reduced to a phrase search, but where the "words" are actually characters. But then you're going to get an index where pretty much every token belongs to every document, which Solr isn't that great at either - though you can apply commongram stuff on top to help that out a lot too. Not quite sure what the end result will be; I've never tried it. I'd only use that special char-as-token field for queries that actually required leading and trailing wildcards.
Figuring out how to set up your analyzers, and what (if anything) you're going to have to do client-app-side to transform the user's query into something that'll end up searching like a phrase search where each "word" is a character, is left as an exercise for the reader. :)

Jonathan
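Jonathan's character-as-token idea can be illustrated outside Solr. The sketch below is not Solr analyzer code - it is a plain-Java model (class and method names made up for illustration) of the equivalence he describes: once each character is indexed as its own token, a query with both leading and trailing wildcards reduces to finding the query's characters as a contiguous run, i.e. a "phrase", in the field's token stream:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CharTokens {
    // Tokenize a string into one token per character, as the proposed field type would.
    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (char c : s.toLowerCase().toCharArray()) {
            tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // A *term* query (leading + trailing wildcard) becomes a "phrase" match:
    // the query's character tokens must appear contiguously in the field's tokens.
    public static boolean wildcardBothSides(String fieldValue, String term) {
        return Collections.indexOfSubList(tokenize(fieldValue), tokenize(term)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(wildcardBothSides("Bookkeeper", "okke")); // true
        System.out.println(wildcardBothSides("Bookkeeper", "xyz"));  // false
    }
}
```

In real Solr this tokenization would be done by the analyzer chain at index and query time; the sketch only shows why the resulting search behaves like a phrase query over characters.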
AW: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load
I'm not sure if I understand your problem, but basically it isn't Solr vs. Lucene but HttpURLConnection vs. SolrJ's CommonsHttpSolrServer, because the server query times haven't changed at all, from what you say? Why aren't you querying the server the same way you did before, when you want to compare Solr to Lucene only?

-----Original Message-----
From: Ophir Adiv [mailto:firt...@gmail.com]
Sent: Wednesday, 4 August 2010 09:11
To: solr-user@lucene.apache.org
Subject: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load
Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load
On Wed, Aug 4, 2010 at 10:50 AM, Peter Karich peat...@yahoo.de wrote:

> this sounds a bit strange: CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time. Is this only for heavy load?

I think this makes sense, since the hard work is done by Solr - once the application gets the search results from the shards, it does a bit of manipulation on them (combine, filter, ...), but these are easy tasks.

> * with lucene you accessed the indices with MultiSearcher in a LAN, right?

No, each shard was run under a different Tomcat instance, and each shard was accessed by HTTP calls (the same way we're trying to work now with Solr).

> * did you look into the logs of the servers, is there something wrong/delayed?

Everything seems peachy... logs are clean of errors/warnings and the like.

> * did you enable gzip compression for your servers or even the binary writer/parser for your solr clients?

We're running our application (and Solr) under Tomcat. We do not enable compression (the configuration remained similar to our old application's configuration). Tried using XMLResponseParser instead of BinaryResponseParser - it hardly affected run times.

Thanks for the ideas,
Ophir
Is there a better solution for Solr server-side load balancing?
The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing?
Support loading queries from external files in QuerySenderListener
Hi all! I can't load my custom queries from an external file, as written here: https://issues.apache.org/jira/browse/SOLR-784 This option seems not to be implemented in the current version 1.4.1 of Solr. Was it removed, or does it first come with a new version? regards, Stanislaw
Re: Date faceting
(10/08/04 19:42), Eric Grobler wrote:

> Hi Solr community,
> How do I facet on timestamp for example? I tried something like this - but I get no result.
> facet=true
> facet.date=timestamp
> f.facet.timestamp.date.start=2010-01-01T00:00:00Z
> f.facet.timestamp.date.end=2010-12-31T00:00:00Z
> f.facet.timestamp.date.gap=+1HOUR
> f.facet.timestamp.date.hardend=true
> Thanks
> ericz

Your parameters are not correct. Try:

facet=true
facet.date=timestamp
facet.date.start=2010-01-01T00:00:00Z
facet.date.end=2010-12-31T00:00:00Z
facet.date.gap=+1HOUR
facet.date.hardend=true

If you want to use the per-field override feature, you can set them as:

f.timestamp.facet.date.start=2010-01-01T00:00:00Z
f.timestamp.facet.date.end=2010-12-31T00:00:00Z
f.timestamp.facet.date.gap=+1HOUR
f.timestamp.facet.date.hardend=true

Koji
-- http://www.rondhuit.com/en/
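As a sketch of how Koji's corrected parameters come together on the wire, the snippet below assembles them into a select URL with plain Java. The host, core path, and build() helper are illustrative assumptions, not part of the thread; in a real request, values such as "+" and ":" would also need URL-encoding:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DateFacetUrl {
    // Assemble the corrected date-facet parameters into a query string.
    // NOTE: values are left unencoded for readability; a real client should
    // URL-encode them (e.g. "+1HOUR" -> "%2B1HOUR").
    public static String build(String baseUrl, String q) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("facet", "true");
        p.put("facet.date", "timestamp");
        // Global form: facet.date.start etc. (not f.facet.timestamp.date.start).
        p.put("facet.date.start", "2010-01-01T00:00:00Z");
        p.put("facet.date.end", "2010-12-31T00:00:00Z");
        p.put("facet.date.gap", "+1HOUR");
        p.put("facet.date.hardend", "true");
        return baseUrl + "/select?" + p.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "*:*"));
    }
}
```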
Re: Best solution to avoiding multiple query requests
Not sure the processing would be any faster than just querying again, but: in your original result set, the first doc whose field value matches a top-10 facet value will be the number 1 item if you fq on that facet value. So you don't need to query it again. You would only need to query the values that aren't in your result set. I.e.:

q=dog&facet=on&facet.field=foo

results in 10 docs:

id=1, foo=A
id=2, foo=A
id=3, foo=B
id=4, foo=C
id=5, foo=B
id=6, foo=A
id=7, foo=Z
id=8, foo=T
id=9, foo=B
id=10, foo=J

If your facet results' top 10 were (A, B, T, J, D, X, Q, O, P, I), you already have the number 1 for A (id 1), B (id 3), T (id 8) and J (id 10) in your very first query. You only need to query D, X, Q, O, P, I. If your first query returned 100 instead of 10, you might have even more of the top 10 represented. Again, the processing steps you would need to do may not be any faster than re-querying; it depends on the speed of your index and network etc. I would think that if your second query was q=dog&fq=(foo=A OR foo=B OR foo=T ...etc) then you'd have an even greater chance of having the number 1 result for each of the top 10 in just your second query.
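The "which facet values are already covered" step described above can be sketched as a small helper. The class and method names are made up for illustration; the inputs mirror the example in this message (the foo values of the ranked results, and the top-10 facet values):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FacetTopDocReuse {
    // A top facet value whose first matching doc already appears in the ranked
    // results needs no follow-up query; return only the values still missing.
    public static List<String> stillNeeded(List<String> fooInRankOrder, List<String> topFacetValues) {
        Set<String> present = new HashSet<>(fooInRankOrder);
        List<String> remaining = new ArrayList<>();
        for (String v : topFacetValues) {
            if (!present.contains(v)) {
                remaining.add(v);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // The example from the message: 10 ranked docs, top-10 facet values.
        List<String> docs = Arrays.asList("A", "A", "B", "C", "B", "A", "Z", "T", "B", "J");
        List<String> top10 = Arrays.asList("A", "B", "T", "J", "D", "X", "Q", "O", "P", "I");
        System.out.println(stillNeeded(docs, top10)); // prints [D, X, Q, O, P, I]
    }
}
```

So instead of 10 follow-up queries, this example only needs 6 - the trade-off the message describes between client-side bookkeeping and extra round-trips.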
Re: Best solution to avoiding multiple query requests
Field Collapsing (currently a patch) is exactly what you're looking for, imo.
http://wiki.apache.org/solr/FieldCollapsing

Geert-Jan

2010/8/4 Ken Krugler kkrugler_li...@transpac.com:

Hi all,

I've got a situation where the key result from an initial search request (let's say for dog) is the list of values from a faceted field, sorted by hit count. For the top 10 of these faceted field values, I need to get the top hit for the target request (dog) restricted to that value for the faceted field. Currently this is 11 total requests, of which the 10 requests following the initial query can be made in parallel. But that's still a lot of requests. So my questions are:

1. Is there any magic query to handle this with Solr as-is?
2. If not, is the best solution to create my own request handler?
3. And in that case, any input/tips on developing this type of custom request handler?

Thanks,

-- Ken

Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
Re: Date faceting
Thanks Koji, it works :-) Have a nice day.

regards
ericz

On Wed, Aug 4, 2010 at 12:08 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
Re: Multi word synomyms
It would be nice if you could configure some kind of filter to be processed before the query string is passed to the parser. The QueryComponent class seems a nice place for this; a filter could be run against the raw query, and ResponseBuilder's queryString value could be modified before the QParser is created.
Re: Setting up apache solr in eclipse with Tomcat
I have got Solr working in Eclipse and deployed on Tomcat through the Eclipse plugin. The crude approach was to:

1. Import the Solr WAR into Eclipse; it will be imported as a web project and can be deployed on Tomcat.
2. Add multiple source folders to the project, linked to the checked-out Solr source code. E.g., an entry in the .project file:

<linkedResources>
  <link>
    <name>common</name>
    <type>2</type>
    <location>D:/Solr/solr/src/common</location>
  </link>
  ...
</linkedResources>

3. Remove the Solr jars from WEB-INF/lib, so that changes to the project sources can be deployed and debugged.

Let me know if you find a better approach.

On Wed, Aug 4, 2010 at 3:49 AM, Hando420 hando...@gmail.com wrote:
> I would like to setup apache solr in eclipse using tomcat. It is easy to setup with jetty but with tomcat it doesn't run solr on runtime. Anyone has done this before?
> Hando
analysis tool vs. reality
Erik: Yes, I did re-index, if that means adding the document again. Here are the exact steps I took:

1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
2. changed schema.xml: WordDelimiterFilterFactory catenate-all
3. restarted Tomcat
4. deleted the document with title ABC12
5. added the document with title ABC12
6. query ABC12 does NOT result in the document with title ABC12
7. analysis.jsp: ABC12 DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections - using what's actually in the index.

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality

Did you reindex after changing the schema?

On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:
> Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed - is there a way to see how things are stored internally in the index? My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match. Thank you for any help, Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are, to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:
> Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted Solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index? Thanks, Justin
No group by? looking for an alternative.
Hello, I've been dealing with a problem for a few days: I want to index and search shoes; each shoe can have several sizes and colors, at different prices. So, what I want is: when I search for Converse, I want to retrieve one shoe per model, i.e. one color and one size, but with colors and sizes in facets. My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE text CONTAINS 'converse' GROUP BY model. But there is no GROUP BY in Solr :(. I tried FieldCollapsing, but hit many bugs (NullPointerException). Then I tried multivalued facets:

<field name="size" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="color" type="string" indexed="true" stored="true" multiValued="true"/>

It's nearly working, but I have a problem: when I filter on red shoes, in the size facet I also get sizes which are not available in red. I can't find any solution for filtering a multivalued facet by the value of another multivalued facet. So if anyone has an idea for solving this problem... Mickael.
Re: analysis tool vs. reality
I think I agree with Justin here. I think the way the analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores query parsing. It would be better if it put your text in a MemoryIndex and actually parsed the query with the query parser, ran it, and used the highlighter to try to show any matches.

On Wed, Aug 4, 2010 at 10:14 AM, Justin Lolofie jta...@gmail.com wrote:
Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match. Thank you for any help, Justin

-- Forwarded message -- From: Erik Hatcher erik.hatc...@gmail.com To: solr-user@lucene.apache.org Date: Tue, 3 Aug 2010 16:50:06 -0400 Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. It could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses). Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted Solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index? Thanks, Justin

-- Robert Muir rcm...@gmail.com
analysis tool vs. reality
Wow, I got to work this morning and my query results now include the 'ABC12' document. I'm not sure what that means. Either I made a mistake in the process I described in the last email (I don't think this is the case), or there is some kind of caching of query results going on that doesn't get flushed on a restart of Tomcat.

Erik: Yes, I did re-index, if that means adding the document again. Here are the exact steps I took:
1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
2. changed schema.xml: WordDelimiterFilterFactory catenate-all
3. restarted Tomcat
4. deleted the document with title ABC12
5. added the document with title ABC12
6. query ABC12 does NOT return the document with title ABC12
7. analysis.jsp: ABC12 DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections, using what's actually in the index.
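For readers following along, the "catenate-all" switch Justin mentions is the catenateAll attribute of WordDelimiterFilterFactory, which must be in effect when documents are (re)indexed for "ABC12" to survive as a single term. A minimal sketch of the kind of fieldType involved (the type name and surrounding filters are illustrative, not Justin's actual schema):

```xml
<!-- Illustrative only: with catenateAll="1", "ABC12" is indexed both as the
     split parts ("abc", "12") and as the catenated token ("abc12"). -->
<fieldType name="text_catenated" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After a schema change like this, documents must be re-added and a commit issued before queries see the new tokens.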
Re: Support loading queries from external files in QuerySenderListener
On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw solrgeschic...@googlemail.com wrote:

Hi all! I can't load my custom queries from an external file, as written here: https://issues.apache.org/jira/browse/SOLR-784 This option does not seem to be implemented in the current version (1.4.1) of Solr. Was it removed, or does it only come with a newer version?

That patch was never committed, so it is not available in any release. -- Regards, Shalin Shekhar Mangar.
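Since SOLR-784 never landed, warming queries have to be listed inline in solrconfig.xml rather than in an external file. A typical QuerySenderListener setup (query values are illustrative):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Each <lst> is one warming request sent when a new searcher opens. -->
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
    <lst><str name="q">rocks</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```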
Re: analysis tool vs. reality
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir rcm...@gmail.com wrote: I think I agree with Justin here: the way the analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores query parsing. It would be better if it put your text in a MemoryIndex, actually parsed the query with the query parser, ran it, and used the highlighter to try to show any matches. +1 -- Regards, Shalin Shekhar Mangar.
Re: Is there a better way for Solr server-side load balancing?
2010/8/4 Chengyang atreey...@163.com: The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing? No. Most of us stick an HTTP load balancer in front of multiple Solr servers. -- Regards, Shalin Shekhar Mangar.
DIH and Cassandra
Is it possible to use DIH with Cassandra either out of the box or with something more custom? Thanks
Re: enhancing auto complete
I preferred to answer this question privately earlier, but I have received innumerable requests to unveil the architecture. For the benefit of all, I am posting it here (after hiding as much info as I should, in my company's interest). The context: the auto-suggest feature on http://askme.in

*Solr setup*: Some of the salient features:
1. TermsComponent is NOT used.
2. The index is made up of 4 fields of the following types: autocomplete_full, autocomplete_token, string and text.
3. autocomplete_full uses KeywordTokenizerFactory and EdgeNGramFilterFactory; autocomplete_token uses WhitespaceTokenizerFactory and EdgeNGramFilterFactory. Both of these are Solr text fields with standard filters like LowerCaseFilterFactory applied during querying and indexing.
4. A standard DataImportHandler and a bunch of SQL procedures are used to derive all suggestable phrases from the system and index them in the above-mentioned fields.

*Controller setup*: The controller (to handle suggest queries) is a typical Java servlet using Solr as its backend (connecting via SolrJ). Based on the incoming query string, a Lucene query is created. It is a BooleanQuery comprising TermQueries across all the above-mentioned fields. The boost factor on each of these term queries determines (to an extent) what kind of matches show up first. JSON is used as the data-exchange format.

*Frontend setup*: It is a home-grown JS component to address some specific use cases of the project in question. One simple exercise with Firebug will spill all the beans. However, I strongly recommend using jQuery to build (and extend) the UI component. Any help beyond this is available, but off the list. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Whoops!
The table still does not look OK :( Trying to send it once again:

lorem      -> Lorem ipsum dolor sit amet; Hieyed ddi lorem ipsum dolor; test lorem ipsume; test xyz lorem ipslili
lorem ip   -> Lorem ipsum dolor sit amet; Hieyed ddi lorem ipsum dolor; test lorem ipsume; test xyz lorem ipslili
lorem ipsl -> test xyz lorem ipslili

On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote: Avlesh, thanks for responding. Yes, [http://askme.in] looks good! I would like to know its design/Solr configuration etc. Can you please provide me detailed views of it? In [http://askme.in] there is one thing to be noted: search text like [business c] populates [Business Centre], which looks OK, but [Consultant Business] looks a bit odd. But in general the pointer you suggested is great to start with.

On 8/2/2010 8:39 PM, Avlesh Singh wrote: From whatever I could read in your broken table of sample use cases, I think you are looking for something similar to what has been done here: http://askme.in; if this is what you are looking for, do let me know. Cheers Avlesh @avlesh http://twitter.com/avlesh | http://webklipper.com

On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar bhavnik.gaj...@gatewaynintec.com wrote: Hi, I'm looking for a solution related to an auto-complete feature for one application. Below is a list of texts from which auto-complete results would be populated:

Lorem ipsum dolor sit amet
tincidunt ut laoreet dolore eu feugiat nulla facilisis at vero eros et te feugait nulla facilisi
Claritas est etiam processus anteposuerit litterarum formas humanitatis fiant sollemnes in futurum
Hieyed ddi lorem ipsum dolor test lorem ipsume test xyz lorem ipslili

Consider the table below. The first column is the user-entered value and the second column is the expected result (the list of auto-complete terms that should be populated from Solr, with the matched prefix in *bold*):

lorem      -> *Lorem* ipsum dolor sit amet; Hieyed ddi *lorem* ipsum dolor; test *lorem* ipsume; test xyz *lorem* ipslili
lorem ip   -> *Lorem ip*sum dolor sit amet; Hieyed ddi *lorem ip*sum dolor; test *lorem ip*sume; test xyz *lorem ip*slili
lorem ipsl -> test xyz *lorem ipsl*ili

Can anyone share ideas on how this can be achieved?
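The prefix-matching behaviour asked for above lines up with the autocomplete_token field Avlesh describes earlier in this thread. A sketch of what such a fieldType might look like (type name and gram sizes are invented for illustration, not Avlesh's actual config):

```xml
<!-- Illustrative sketch: index-time edge n-grams turn "lorem" into
     "l", "lo", "lor", ... so any typed prefix matches a whole term. -->
<fieldType name="autocomplete_token" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the user's prefix is matched as-is. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```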
Re: Setting up apache solr in eclipse with Tomcat
Thanks man, I haven't tried this, but where do I put that XML configuration? Is it in the web.xml in Solr? Cheers, Hando -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setting up apache solr in eclipse with Tomcat
The Solr home is configured in the web.xml of the application, which points to the folder holding the conf files and the data directory:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>D:/multicore</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Regards, Jayendra

On Wed, Aug 4, 2010 at 12:21 PM, Hando420 hando...@gmail.com wrote: Thanks man, I haven't tried this, but where do I put that XML configuration? Is it in the web.xml in Solr? Cheers, Hando -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html Sent from the Solr - User mailing list archive at Nabble.com.
can't use strdist as functionquery?
I want to sort my results by how closely a given result-set field matches a given string. For example, say I am searching for a given product, and the product can be found in many cities, including Seattle. I want to sort the results so that results from the city of Seattle are at the top, and all other results below that. I thought that I could do so by using strdist as a function query (I am using Solr 1.4, so I can't directly sort on strdist), but I am having problems with the syntax of the query, because function queries require double quotes and so does strdist. My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:foo) _val_:strdist(seattle,city,edit)&sort=score%20asc&fl=product,city,score

I have tried various types of URL encoding (i.e. using %22 instead of double quotes in the strdist function), but no success. Any ideas? Is there a better way to accomplish this sorting? -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html Sent from the Solr - User mailing list archive at Nabble.com.
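As background on what strdist(..., edit) scores: the edit variant is Levenshtein edit distance normalized into a 0-1 similarity, where 1.0 means identical strings. A standalone sketch of that metric in plain Java (no Solr dependency; this illustrates the computation, it is not Solr's actual implementation):

```java
public class StrDistSketch {
    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    // Normalize into [0,1]: higher is better, as function-query scores expect.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(similarity("seattle", "seattle")); // 1.0
        System.out.println(similarity("seattle", "tacoma"));
    }
}
```

Sorting descending by this similarity puts exact city matches first, which is the ordering the post is after.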
Re: Setting up apache solr in eclipse with Tomcat
Thanks, now it's clear and works fine. Regards, Hando -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023404.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sharing index files between multiple JVMs and replication
Is anybody else encountering these same issues, if they have a similar setup? And is there a way to configure certain Solr web-apps as read-only (basically dummy instances) so that index changes are not allowed?

- Original Message - From: Kelly Taylor wired...@yahoo.com To: solr-user@lucene.apache.org Sent: Tue, August 3, 2010 5:48:11 PM Subject: Re: Sharing index files between multiple JVMs and replication

Yes, they are on a common file server, and I've been sharing the same index directory between the Solr JVMs. But I seem to be hitting a wall when attempting to use just one instance for changing the index. With Solr replication disabled, I stream updates to the one instance, and this process hangs whenever there are additional Solr JVMs started up with the same configuration in solrconfig.xml. So I then tried, to no avail, using a different configuration, solrconfig-readonly.xml, where the updateHandler was commented out, all /update* requestHandlers removed, mainIndex lockType set to none, etc. And with Solr replication enabled, the slave seems to hang, or at least report unusually long time estimates for the currently running replication process to complete. -Kelly

- Original Message - From: Lance Norskog goks...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, August 3, 2010 4:56:58 PM Subject: Re: Sharing index files between multiple JVMs and replication

Are these files on a common file server? If you want to share them that way, it actually does work just to give them all the same index directory, as long as only one of them changes it.

On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor wired...@yahoo.com wrote: Is there a way to share index files amongst my multiple Solr web-apps, by configuring only one of the JVMs as an indexer and the remaining ones as read-only searchers? I'd like to configure things in such a way that on startup of the read-only searchers, missing cores/indexes are not created, and updates are not handled.
If I can get around the files being locked by the read-only instances, I should be able to scale wider in a given environment, as well as keep fewer replicated copies of my master index (Solr 1.4 Java replication). Then once the commit is issued to the slave, I can fire off a RELOAD script for each of my read-only cores. -Kelly -- Lance Norskog goks...@gmail.com
Re: analysis tool vs. reality
: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

It really only attempts to identify when there is overlap between analysis at query time and at indexing time, so you can easily spot when one analyzer or the other breaks things so that they no longer line up (or when it fixes things so they start to line up). Even if we eliminated that highlighting as misleading, people would still do it in their minds; it would just be harder. It doesn't change the underlying fact that analysis is only part of the picture.

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of query explanation really only works if the user gives us a full document (all fields, not just one), a full query string, and all of the possible query params, because the query parser (either implicit because of config, or explicitly specified by the user) might change its behavior based on those other params. I agree with you: debugging functionality along the lines of what you are describing would be *VASTLY* more useful than what we've got right now, and is something I briefly looked into doing before as an extension of the existing DebugComponent... https://issues.apache.org/jira/browse/SOLR-1749 ...The problems I encountered trying to do it as a debug component on a real Solr request seem like they would also be problems for a MemoryIndex-based admin tool approach like what you suggest, but if you've got ideas on working around them I am 100% interested.

Independent of how we might create a better QueryParser + Analysis explanation tool / debug component is the question of what we can do to make it more clear what exactly the analysis.jsp page is doing and what people can infer from that page.
As I said, I don't think removing the match highlighting will actually reduce confusion, but perhaps there are verbiage/disclaimers that could be added to make it more clear? -Hoss
Re: analysis tool vs. reality
Furthermore, I would like to add that it's not just the highlight-matches functionality that is horribly broken here; the output of the analysis itself is misleading. Let's say I take 'textTight' from the example, and add the following synonym: this is broken = broke. The query-time analysis is wrong, as it clearly shows SynonymFilter collapsing this is broken to broke, but in reality, with the query parser for that field, you are gonna get 3 separate token streams and this will never actually happen (because the query parser will divide it up on whitespace first). So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir rcm...@gmail.com wrote: On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

it really only attempts to identify when there is overlap between analysis at query time and at indexing time so you can easily spot when one analyzer or the other breaks things so that they no longer line up (or when it fixes things so they start to line up)

It attempts badly, because it only works in the most trivial of cases (e.g. it doesn't reflect the interaction of the query parser with multi-word synonyms or WordDelimiterFilter). Since Solr includes these non-trivial analysis components *in the example*, this 'highlight matches' doesn't actually even really work at all. Someone is gonna use this thing when they don't understand why analysis isn't doing what they want, i.e. the cases like I outlined above. For the trivial cases where it does work, the 'highlight matches' isn't useful anyway, so in its current state it's completely unnecessary.

Even if we eliminated that highlighting as misleading, people would still do it in their minds, it would just be harder -- it doesn't change the underlying fact that analysis is only part of the picture.

I'm not suggesting that. I'm suggesting fixing the highlighting so it's not misleading. There are really only two choices: 1. remove the current highlighting, or 2. fix it.
In its current state it's completely useless and misleading, except for very trivial cases, in which you don't need it anyway.

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of query explanation really only works if the user gives us a full document (all fields, not just one) and a full query string, and all of the possible query params -- because the query parser (either implicit because of config, or explicitly specified by the user) might change its behavior based on those other params.

That's true, but I don't see why the user couldn't be allowed to provide just this. I'd bet money a lot of people are using this thing with a specific query/document in mind anyway!

people can infer from that page. As I said, I don't think removing the match highlighting will actually reduce confusion, but perhaps there are verbiage/disclaimers that could be added to make it more clear?

As I said before, I think I disagree with you. I think for stuff like this the technicals are less important; what's important is that this is a misleading checkbox that really confuses users. I suggest disabling it entirely: you are only going to remove confusion. -- Robert Muir rcm...@gmail.com -- Robert Muir rcm...@gmail.com
Re: Best solution to avoiding multiple query requests
Hi Geert-Jan, On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote: Field Collapsing (currently a patch) is exactly what you're looking for, imo. http://wiki.apache.org/solr/FieldCollapsing

Thanks for the ref, good stuff. I think it's close, but if I understand this correctly, then I could get (using just the top two, versus the top 10, for simplicity) results that looked like:

dog training (faceted field value A)
super dog (faceted field value B)

but if the actual faceted field value/hit counts were: C (10), D (8), A (2), B (1), then what I'd want is the top hit for dog AND facet field:C, followed by dog AND facet field:D. Using field collapsing would improve the probability that if I asked for the top 100 hits, I'd find entries for each of my top N faceted field values. Thanks again, -- Ken

I've got a situation where the key result from an initial search request (let's say for dog) is the list of values from a faceted field, sorted by hit count. For the top 10 of these faceted field values, I need to get the top hit for the target request (dog) restricted to that value for the faceted field. Currently this is 11 total requests, of which the 10 requests following the initial query can be made in parallel. But that's still a lot of requests. So my questions are: 1. Is there any magic query to handle this with Solr as-is? 2. If not, is the best solution to create my own request handler? 3. And in that case, any input/tips on developing this type of custom request handler? Thanks, -- Ken

Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Re: DIH and Cassandra
DIH only works with relational databases and XML files [1]; you need to write custom code in order to index data from Cassandra. It should be pretty easy to map documents from Cassandra to Solr, and there are a lot of client libraries available [2] for Cassandra. [1] http://wiki.apache.org/solr/DataImportHandler [2] http://wiki.apache.org/cassandra/ClientOptions On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com wrote: Is it possible to use DIH with Cassandra, either out of the box or with something more custom? Thanks -- Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr
Re: Is there a better way for Solr server-side load balancing?
Check this article [1], which explains how to set up haproxy to do load balancing. The steps are the same even if you are not using Drupal. By using this approach you can easily add more replicas without changing the application configuration files. You should also check SolrCloud [2], which does automatic load balancing and fail-over for queries; that branch is still under development. [1] http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal [2] http://wiki.apache.org/solr/SolrCloud 2010/8/4 Chengyang atreey...@163.com: The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing? -- Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr
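In the spirit of the linked article, a minimal haproxy front-end for two Solr replicas might look like the sketch below (hostnames, ports, and the ping handler path are assumptions for illustration; adapt to your deployment):

```
# Round-robin two Solr instances behind one address; haproxy removes a
# backend from rotation if its ping handler stops answering.
listen solr 0.0.0.0:8983
    balance roundrobin
    option httpchk GET /solr/admin/ping
    server solr1 10.0.0.1:8983 check
    server solr2 10.0.0.2:8983 check
```

The application then points at the load balancer's address, so replicas can be added or removed without touching application config.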
Solrj ContentStreamUpdateRequest Slow
I'm running a slight variation of the example code referenced below and it takes a really long time to finally execute. In fact, it hangs for a long time at solr.request(up) before finally executing. Is there anything I can look at or tweak to improve performance? I am indexing a local PDF file, there are no firewall issues, Solr is running on the same machine, and I tried the actual host name in addition to localhost, but nothing helps. Thanks - Tod http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
Re: Best solution to avoiding multiple query requests
If I understand correctly, you want to sort your collapsed results by 'number of collapsed results'/hits. It seems this can't be done out of the box using this patch. (I'm not entirely sure; at least it doesn't follow from the wiki page. Perhaps it is best to check the JIRA issues to make sure this isn't already available now and just not updated on the wiki.) Also, I found a blog post (from the patch creator, afaik) with, in the comments, someone with the same issue plus some pointers. http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/ Hope that helps, Geert-jan
Re: Best solution to avoiding multiple query requests
Hi Geert-jan, On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

If I understand correctly, you want to sort your collapsed results by 'number of collapsed results'/hits. It seems this can't be done out of the box using this patch. Also, I found a blog post (from the patch creator, afaik) with, in the comments, someone with the same issue plus some pointers. http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

Yup, that's the one: http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249 So with some modifications to that patch, it could work... thanks for the info! -- Ken

Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Indexing boolean value
I'm trying to index a boolean field, but for some reason it does not show up in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
  <field name="id" column="ID" />
  <field name="title" column="TITLE" />
  <field name="city" column="CITY" />
  <field name="official" column="OFFICIALLOCATION" />
</entity>

OFFICIALLOCATION is an MSSQL database field of type 'bit'.

schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line, but still without luck.) -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html Sent from the Solr - User mailing list archive at Nabble.com.
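One thing worth trying (an untested suggestion, not verified against this setup): make the bit column arrive as a literal 'true'/'false' string so Solr's boolean field parses it unambiguously, e.g. by rewriting the DIH query:

```xml
<!-- Illustrative sketch: the CASE expression converts MSSQL's bit (0/1)
     into the 'true'/'false' strings the Solr boolean type expects.
     Column and field names follow the original post. -->
<entity name="location"
        query="select ID, TITLE, CITY,
                      case OFFICIALLOCATION when 1 then 'true' else 'false' end as OFFICIALLOCATION
               from locations">
  <field name="id" column="ID" />
  <field name="title" column="TITLE" />
  <field name="city" column="CITY" />
  <field name="official" column="OFFICIALLOCATION" />
</entity>
```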
RE: Indexing fieldvalues with dashes and spaces
Your schema.xml setting for the field is probably tokenizing the punctuation. Change the field type to one that doesn't tokenize on punctuation; e.g. use text_ws and not text.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com]
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces

I'm having issues with indexing field values containing spaces and dashes. For example: I'm trying to index province names of the Netherlands. Some province names contain a "-": Zuid-Holland, Noord-Holland.

My data-config has this:

<entity name="location_province" query="select provinceid from locations where id=${location.id}">
  <entity name="provinces" query="select title from provinces where id = ${location_province.provinceid}">
    <field name="province" column="title" />
  </entity>
</entity>

When I check what has been indexed, I have this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="city">Nijmegen</str>
      <arr name="features"><str>Tuin</str><str>Cafe</str></arr>
      <str name="id">1</str>
      <str name="province">Gelderland</str>
      <arr name="services"><str>Fotoreportage</str></arr>
      <arr name="theme"><str>Gemeentehuis</str></arr>
      <date name="timestamp">2010-08-04T19:11:51.796Z</date>
      <str name="title">Gemeentehuis Nijmegen</str>
    </doc>
    <doc>
      <str name="city">Utrecht</str>
      <arr name="features"><str>Tuin</str><str>Cafe</str><str>Danszaal</str></arr>
      <str name="id">2</str>
      <str name="province">Utrecht</str>
      <arr name="services"><str>Fotoreportage</str><str>Exclusieve huur</str></arr>
      <arr name="theme"><str>Gemeentehuis</str></arr>
      <date name="timestamp">2010-08-04T19:11:51.796Z</date>
      <str name="title">Gemeentehuis Utrecht</str>
    </doc>
    <doc>
      <str name="city">Bloemendaal</str>
      <arr name="features"><str>Strand</str><str>Cafe</str><str>Danszaal</str></arr>
      <str name="id">3</str>
      <str name="province">Zuid-Holland</str>
      <arr name="services"><str>Exclusieve huur</str><str>Live muziek</str></arr>
      <arr name="theme"><str>Strand Zee</str></arr>
      <date name="timestamp">2010-08-04T19:11:51.812Z</date>
      <str name="title">Beachclub Vroeger</str>
    </doc>
  </result>
</response>

So we see that the full field has been indexed: <str name="province">Zuid-Holland</str>

BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services
I get this (snippet):

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "",1,               <=== a
      "Strand",1,
      "Zee",1],
    "features":[
      "cafe",3,
      "danszaal",2,
      "tuin",2,
      "strand",1],
    "province":[
      "gelderland",1,
      "holland",1,
      "utrecht",1,
      "zuid",1,           <=== b
      "zuidholland",1],
    "services":[
      "exclusiev",2,
      "fotoreportag",2,   <=== c
      "huur",2,
      "live",1,
      "muziek",1]},       <=== d

Several weird things happen here, which I have indicated with <===:

a. the full field value is "Strand Zee", but now one facet is empty
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet
c. the full field value is "fotoreportage", but somehow the last character has been truncated
d. the full field value is "live muziek", but now "live" and "muziek" have become separate facets

What can I do about this?
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023699.html
Sent from the Solr - User mailing list archive at Nabble.com.
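The effect of the two field types can be sketched outside Solr. This is an illustrative Python sketch, not Solr's actual analyzer code: a "text"-style analyzer splits on punctuation and lowercases (and in the example schema the "text" type also stems, which would explain facet values like "fotoreportag" and "exclusiev"), while a "text_ws"-style analyzer splits only on whitespace.

```python
import re

def analyze_text(value):
    # Rough stand-in for a "text"-style field type: split on any
    # non-alphanumeric character and lowercase each token.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def analyze_text_ws(value):
    # Rough stand-in for "text_ws": split on whitespace only,
    # leaving punctuation such as '-' inside the token.
    return value.split()

print(analyze_text("Zuid-Holland"))     # ['zuid', 'holland'] -> two facet values
print(analyze_text_ws("Zuid-Holland"))  # ['Zuid-Holland']    -> one facet value
print(analyze_text_ws("Strand Zee"))    # whitespace still splits multi-word values
```

Faceting counts the indexed tokens, so whatever the analyzer emits is exactly what shows up as facet values; this is why "Zuid-Holland" becomes the separate facets "zuid" and "holland" under a punctuation-splitting type.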
RE: Indexing boolean value
I could be wrong, but I thought bit was an integer. Try changing the field type to integer.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com]
Sent: Wednesday, August 04, 2010 3:42 PM
To: solr-user@lucene.apache.org
Subject: Indexing boolean value

I'm trying to index a boolean location field, but for some reason it does not show up in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
  <field name="id" column="ID" />
  <field name="title" column="TITLE" />
  <field name="city" column="CITY" />
  <field name="official" column="OFFICIALLOCATION" />

OFFICIALLOCATION is an MSSQL database field of type 'bit'.

schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line, but still without luck.)
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Indexing fieldvalues with dashes and spaces
I changed the field types to text_ws. Now I only seem to have problems with field values that hold spaces; see below:

<field name="city" type="text_ws" indexed="true" stored="true"/>
<field name="theme" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
<field name="features" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="services" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "",1,                <=== an empty facet is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets are created
      "muziek",1]},
  "facet_dates":{}}}
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Indexing fieldvalues with dashes and spaces
You shouldn't fetch faceting results from analyzed fields; it will mess with your results. Search on analyzed fields, but don't retrieve values from them.
RE: Indexing boolean value
Hi, I tried that already, so that would make this:

<field name="official" type="integer" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(Still not sure what copyField does, though.) But even that won't work. I also don't see the OFFICIALLOCATION column indexed in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Indexing fieldvalues with dashes and spaces
Sorry, but I'm a newbie to Solr... how would I change my schema.xml to match your requirements? And what do you mean by "it will mess with your results"? What will happen then?
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Indexing fieldvalues with dashes and spaces
Echoing Markus: use the tokenized field to return search results, but have a duplicate field of type "string" to show the untokenized values, e.g. facet on that field.

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl]
Sent: Wednesday, August 04, 2010 4:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

You shouldn't fetch faceting results from analyzed fields; it will mess with your results. Search on analyzed fields, but don't retrieve values from them.
RE: Indexing boolean value
copyField copies the field so you can have multiple versions. It's useful for dumping all fields into one "super field" you can search on, for performance reasons. If the column isn't being indexed, I'd suggest the problem is in DIH. No suggestions as to why, I'm afraid.
RE: Indexing fieldvalues with dashes and spaces
Hmm, you should first read a bit more on schema design on the wiki and learn about indexing and querying Solr. The copyField directive is what is commonly used in a faceted navigation system: search on analyzed fields, and show faceting results using the primitive string field type. With copyField you can, well, copy the field from one to another without it being analyzed by the first (so no chaining is possible, which is good). Let's say you have a city field you want to navigate with, but also search in; then you would have an analyzed field for search and a string field for displaying the navigation. But check the wiki on this subject.
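The copyField pattern described above (search on an analyzed field, facet on an untokenized string copy) can be sketched outside Solr. This is an illustrative Python sketch with made-up data; "city"/"city_raw" are hypothetical field names:

```python
import re
from collections import Counter

values = ["Zuid-Holland", "Noord-Holland", "Zuid-Holland"]

# "city" holds analyzed tokens for search; "city_raw" mimics a copyField
# into a string-type field that keeps the original value whole.
index = []
for value in values:
    index.append({
        "city": [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t],
        "city_raw": value,
    })

def search(term):
    # Match against the analyzed tokens (partial, case-insensitive terms work).
    return [d for d in index if term.lower() in d["city"]]

def facet():
    # Facet on the untokenized copy, so multi-part values stay intact.
    return Counter(d["city_raw"] for d in index)

print(len(search("zuid")))  # 2: both Zuid-Holland docs match via the analyzed field
print(dict(facet()))        # {'Zuid-Holland': 2, 'Noord-Holland': 1}
```

The design point is that analysis and faceting have opposite needs: search wants values broken into matchable tokens, faceting wants them whole, so you keep one copy of the data in each shape.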
Re: DIH and Cassandra
On Wed, Aug 4, 2010 at 9:11 PM, Mark static.void@gmail.com wrote:
> Is it possible to use DIH with Cassandra either out of the box or with something more custom? Thanks

It will take some modifications, but DIH is built to create denormalized documents, so it is possible. Also see https://issues.apache.org/jira/browse/SOLR-853
--
Regards,
Shalin Shekhar Mangar.
RE: Indexing fieldvalues with dashes and spaces
Well, the example you provided is 100% relevant to me :) I've read the wiki now (SchemaXml, SolrFacetingOverview, Query Syntax, SimpleFacetParameters), but still do not have an exact idea of what you mean.

My situation: a city field is something that I want users to search on via text input, so let's say "New Yo" would give the results for "New York". But also a facet "Cities" is available, in which "New York" is just one of the cities that is clickable. The other facet is theme, which in my example holds values like "Gemeentehuis" and "Strand Zee"; that would not be a thing which can be searched via manual input, but it IS clickable.

If you look at my schema.xml, do you see stuff I'm doing that is absolutely wrong for the purpose described above? Because as far as I can see, the documents are indexed correctly (BESIDES the spaces in the field values). Any help is greatly appreciated! :)
--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023992.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH and Cassandra
If data is stored in the index, isn't the index of Solr pretty much already a 'Big/Cassandra Table', except with tokenized columns to make searching easier? How are Cassandra/BigTable/Couch DBs doing text/weighted searching? It seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how many 'tables'/indexes one can make using Solr; I'm still a newbie.

Dennis Gearon

Signature Warning: EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'. Laugh at http://www.yert.com/film.php

--- On Wed, 8/4/10, Andrei Savu andrei.s...@indekspot.com wrote:

From: Andrei Savu andrei.s...@indekspot.com
Subject: Re: DIH and Cassandra
To: solr-user@lucene.apache.org
Date: Wednesday, August 4, 2010, 12:00 PM

DIH only works with relational databases and XML files [1]; you need to write custom code in order to index data from Cassandra. It should be pretty easy to map documents from Cassandra to Solr. There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions

On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com wrote:
> Is it possible to use DIH with Cassandra either out of the box or with something more custom? Thanks
--
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr
Re: Is there a better for solor server side loadbalance?
> The default Solr solution is client-side load balancing. Is there a solution that provides server-side load balancing?

No. Most of us stick an HTTP load balancer in front of multiple Solr servers. E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load balancer, but it also offers failover functionality. It is as simple as:

worker.loadbalancer.balance_workers=worker1,worker2,worker3,...

and the failover:

worker.worker1.redirect=worker2
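The balance/failover behaviour those two mod_jk settings describe can be sketched generically. This is an illustrative Python sketch of round-robin selection with a configured failover target, not mod_jk itself; the worker names mirror the config lines above:

```python
from itertools import cycle

workers = ["worker1", "worker2", "worker3"]
redirects = {"worker1": "worker2"}   # mirrors worker.worker1.redirect=worker2
down = {"worker1"}                   # pretend worker1 is currently unreachable

rotation = cycle(workers)

def pick_worker():
    # Round-robin over the balanced workers; if the chosen worker is
    # down, fail over to its configured redirect target.
    w = next(rotation)
    if w in down:
        w = redirects.get(w, w)
    return w

print([pick_worker() for _ in range(4)])  # ['worker2', 'worker2', 'worker3', 'worker2']
```

In practice the load balancer also health-checks the backends and re-adds a worker once it recovers; the sketch only shows the routing decision itself.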
Re: Some basic DataImportHandler questions
Thanks. I think part of my issue may be that I am misunderstanding how to use the entity and field tags to import data in a particular format, and I am looking for a few more examples.

Let's say I have a database table with 2 columns that contain metadata fields and values, and I would like to import this into Solr and keep the pairs together. An example table follows, consisting of two String columns, one containing metadata names and the other metadata values (column names: metadata_name, metadata_value). There may be multiple records for a name. The set of potential metadata_names is unknown; it could be anything.

metadata_name | metadata_value
--------------+----------------
title         | blah blah
subject       | some subject
subject       | another subject
name          | some name

What is the proper way to import these and keep the name/value pairs intact? I am seeing the following after import:

<arr name="metadata_name_s">
  <str>title</str>
  <str>subject</str>
  <str>name</str>
</arr>
<arr name="metadata_value_s">
  <str>blah blah</str>
  <str>some subject</str>
  <str>another subject</str>
  <str>some name</str>
</arr>

Ideally, the end goal would be something like below:

<arr name="title_s">
  <str>blah blah</str>
</arr>
<arr name="subject_s">
  <str>some subject</str>
  <str>another subject</str>
</arr>
<arr name="name_s">
  <str>some name</str>
</arr>

etc. It feels like I am missing something obvious; this seems like it would be a common structure for imports.

> Just starting with DataImportHandler and had a few simple questions. Is there a location for more in-depth documentation other than http://wiki.apache.org/solr/DataImportHandler?

Umm, no, but let us know what is not covered well and it can be added.
--
View this message in context: http://lucene.472066.n3.nabble.com/Some-basic-DataImportHandler-questions-tp1010291p1024205.html
Sent from the Solr - User mailing list archive at Nabble.com.
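The reshaping being asked for is a pivot: turn name/value rows into one multivalued field per metadata name. A hedged Python sketch of that pivot, using a hypothetical "*_s" dynamic-field naming convention (in DIH itself this kind of reshaping is typically done with a transformer):

```python
from collections import defaultdict

# Rows as they would come back from the metadata table
rows = [
    ("title", "blah blah"),
    ("subject", "some subject"),
    ("subject", "another subject"),
    ("name", "some name"),
]

# Pivot into one multivalued field per metadata_name, so values that
# share a name stay grouped instead of landing in two parallel arrays.
doc = defaultdict(list)
for name, value in rows:
    doc[name + "_s"].append(value)

print(dict(doc))
# {'title_s': ['blah blah'], 'subject_s': ['some subject', 'another subject'],
#  'name_s': ['some name']}
```

Because the set of metadata names is open-ended, a dynamic field pattern on the Solr side (rather than one declared field per name) is what makes this shape workable.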
Re: No group by? looking for an alternative.
Hello. A way to do this is to create one faceting field that includes both the size and the color. I assume you have a different shoe product document for each model. Each document would still include the color and size values ('red' and '14a') in their own fields, but you would add a field with 'red-14a'.

On Wed, Aug 4, 2010 at 7:17 AM, Mickael Magniez mickaelmagn...@gmail.com wrote:
> Hello,
> I've been dealing with a problem for a few days: I want to index and search shoes; each shoe can have several sizes and colors, at different prices. So what I want is: when I search for "Converse", I want to retrieve one shoe per model, i.e. one color and one size, but with colors and sizes in facets.
> My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE text CONTAINS 'converse' GROUP BY model. But there is no GROUP BY in Solr :(. I tried FieldCollapsing, but hit many bugs (NullPointerException). Then I tried multivalued facets:
> <field name="size" type="string" indexed="true" stored="true" multiValued="true"/>
> <field name="color" type="string" indexed="true" stored="true" multiValued="true"/>
> It's nearly working, but I have a problem: when I filter on red shoes, the size facet also shows sizes which are not available in red. I can't find any solution to filter a multivalued facet with the value of another multivalued facet. So if anyone has an idea for solving this problem...
> Mickael.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1022738.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Lance Norskog
goks...@gmail.com
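The combined-field suggestion can be sketched as: index not only the individual color and size values, but also a color-size pair, so color and size stay linked and only real combinations appear. An illustrative Python sketch with made-up data:

```python
from collections import Counter

# One document per (model, color, size) combination that actually exists
shoes = [
    {"model": "Converse All Star", "color": "red",  "size": "14a"},
    {"model": "Converse All Star", "color": "red",  "size": "9"},
    {"model": "Converse All Star", "color": "blue", "size": "9"},
]

# Add a combined facet value, e.g. 'red-14a', per document
for s in shoes:
    s["color_size"] = s["color"] + "-" + s["size"]

def facet(field, filters=None):
    # Count facet values over the documents matching the filters.
    docs = shoes
    if filters:
        docs = [d for d in docs if all(d[k] == v for k, v in filters.items())]
    return Counter(d[field] for d in docs)

print(dict(facet("color_size")))             # every real color-size combination
print(dict(facet("size", {"color": "red"}))) # only sizes that exist in red
```

With one document per combination, filtering on color and faceting on size already stays consistent; the combined field is what lets you facet directly on valid pairs when color and size would otherwise live in separate multivalued fields.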
Re: analysis tool vs. reality
> there is some kind of caching of query results going on that doesn't get flushed on a restart of tomcat.

Yes. Solr by default has HTTP caching on if there is no configuration, and the example solrconfig.xml has it configured on. You should edit solrconfig.xml to use the alternative described in the comments.

On Wed, Aug 4, 2010 at 7:55 AM, Justin Lolofie jta...@gmail.com wrote:

Wow, I got to work this morning and my query results now include the 'ABC12' document. I'm not sure what that means. Either I made a mistake in the process I described in the last email (I don't think this is the case), or there is some kind of caching of query results going on that doesn't get flushed on a restart of tomcat.

Erik: Yes, I did re-index, if that means adding the document again. Here are the exact steps I took:

1. analysis.jsp: "ABC12" does NOT match title "ABC12" (however, "ABC" or "12" does)
2. changed schema.xml: WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title "ABC12"
5. added the document with title "ABC12"
6. query "ABC12" does NOT result in the document with title "ABC12"
7. analysis.jsp: "ABC12" DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections, using what's actually in the index.

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality

Did you reindex after changing the schema?

On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed. Is there a way to see how things are stored internally in the index?
My query is "ABC12". There is a document whose title field is "ABC12". However, I can only get it to match if I search for "ABC" or "12". This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, "ABC12" matches "ABC12". However, when doing an actual query, it does not match. Thank you for any help, Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. It could be lots of things, from not querying the fields you think you are, to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted Solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index? Thanks, Justin
--
Lance Norskog goks...@gmail.com
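What catenate-all changes can be sketched as follows. This is an illustrative approximation of WordDelimiterFilter's letter/digit splitting, not the real implementation:

```python
import re

def word_delimiter(token, catenate_all=False):
    # Split on transitions between letters and digits, roughly as
    # WordDelimiterFilter does for "ABC12" -> "ABC", "12".
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = list(parts)
    if catenate_all and len(parts) > 1:
        # catenate-all adds the concatenation as an extra token,
        # so the original "ABC12" becomes searchable again.
        out.append("".join(parts))
    return out

print(word_delimiter("ABC12"))                    # ['ABC', '12']
print(word_delimiter("ABC12", catenate_all=True)) # ['ABC', '12', 'ABC12']
```

The key point in the steps above is that the filter runs at both index and query time: after changing the schema, the document must be re-indexed before the new "ABC12" token exists in the index, and any cached responses must expire before the change is visible in query results.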
Re: Indexing fieldvalues with dashes and spaces
I suspect you're running afoul of tokenizers and filters. The parts of your schema that you published aren't the ones that really count. What you probably need to look at is the FieldType definitions, i.e. what analysis is done for, say, text_ws (see <fieldType ...> in your schema). There you might find things like WordDelimiterFilter with several options, LowerCaseFilter, etc. Each of these changes what's placed in your index. Here's a good place to start, although it's not exhaustive:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

The general idea here is that the Tokenizers break up the incoming stream according to various rules, and the Filters then (potentially) modify each token in various ways. Until you have a firm handle on this process, facets are probably a distraction. You're better off looking at your index with the admin pages and/or Luke and/or the LukeRequestHandler. And do be aware that the fields you get back from a request (i.e. a search) are the stored fields, NOT what's indexed. This may trip you up too...

HTH
Erick
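The tokenizer-then-filters idea, and the stored-vs-indexed distinction, can be sketched in a few lines. This is an illustrative Python sketch of how an analysis chain composes, not Solr's code:

```python
def whitespace_tokenizer(text):
    # Tokenizer: break the incoming stream into tokens
    return text.split()

def lowercase_filter(tokens):
    # Filter: transform each token in the stream
    return [t.lower() for t in tokens]

def analyze(text, tokenizer, filters):
    # A field type is a tokenizer followed by a chain of filters,
    # each applied to the token stream in order.
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

stored = "Strand Zee"  # the stored value is returned verbatim in search results
indexed = analyze(stored, whitespace_tokenizer, [lowercase_filter])
print(indexed)  # ['strand', 'zee'] -- what matching and faceting actually see
print(stored)   # 'Strand Zee'      -- what a search result displays
```

This is why inspecting returned documents can be misleading when debugging analysis: the response shows the untouched stored value even when the indexed tokens look completely different.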
XML Format
<doc>
  <int name="AP_AUC_PHOTO_AVAIL">1</int>
  <double name="AUC_AD_PRICE">1.0</double>
  <int name="AUC_CLIENT_ID">27017</int>
  <str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector, panjang 60-90 cm Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
  <str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
  <int name="AUC_ID">607136</int>
  <str name="AUC_ISNEGO">Nego</str>
  <int name="AUC_LOCATION">7</int>
  <str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
  <str name="AUC_START">2010-05-19 17:56:45</str>
  <str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
  <int name="AUC_TYPE">21</int>
  <int name="PRO_BACKGROUND">0</int>
  <int name="PRO_BOLD">0</int>
  <int name="PRO_COLOR">0</int>
  <int name="PRO_GALLERY">0</int>
  <int name="PRO_LINK">0</int>
  <int name="PRO_SPONSOR">0</int>
  <int name="cat_id_sub">0</int>
  <int name="sectioncode">28</int>
</doc>

Above is my current XML listing. I can't search, for example, for the word "bracket"; it returns an empty list. After searching on the internet, I found out that there is a mistake in my schema: I should change it so the response looks like the listing below (note the AUC_DESCR_SHORT and AUC_TITLE fields, which have become <arr> elements):

<doc>
  <int name="AP_AUC_PHOTO_AVAIL">1</int>
  <double name="AUC_AD_PRICE">1.0</double>
  <int name="AUC_CLIENT_ID">27017</int>
  <arr name="AUC_DESCR_SHORT"><str>Bracket Ceiling untuk semua merk projector, panjang 60-90 cm Bahan Besi Cat Hitam = 325rb Bahan Sta</str></arr>
  <str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
  <int name="AUC_ID">607136</int>
  <str name="AUC_ISNEGO">Nego</str>
  <int name="AUC_LOCATION">7</int>
  <str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
  <str name="AUC_START">2010-05-19 17:56:45</str>
  <arr name="AUC_TITLE"><str>[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str></arr>
  <int name="AUC_TYPE">21</int>
  <int name="PRO_BACKGROUND">0</int>
  <int name="PRO_BOLD">0</int>
  <int name="PRO_COLOR">0</int>
  <int name="PRO_GALLERY">0</int>
  <int name="PRO_LINK">0</int>
  <int name="PRO_SPONSOR">0</int>
  <int name="cat_id_sub">0</int>
  <int name="sectioncode">28</int>
</doc>

My question is: how do I change my schema so it will return the listing like the second one above, with the <arr> elements? Thanks before.
--
View this message in context: http://lucene.472066.n3.nabble.com/XML-Format-tp1024608p1024608.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrj ContentStreamUpdateRequest Slow
ContentStreamUpdateRequest seems to read the file contents and transfer them over HTTP, which slows down the indexing. Try using StreamingUpdateSolrServer with the stream.file param, see http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post e.g.:

SolrServer server = new StreamingUpdateSolrServer("<Solr Server URL>", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{"<local file path>"});
params.set("literal.id", "<value>");
req.setParams(params);
server.request(req);
server.commit();

Regards,
Jayendra

On Wed, Aug 4, 2010 at 3:01 PM, Tod listac...@gmail.com wrote:
> I'm running a slight variation of the example code referenced below, and it takes a really long time to finally execute. In fact, it hangs for a long time at solr.request(up) before finally executing. Is there anything I can look at or tweak to improve performance? I am indexing a local PDF file, there are no firewall issues, Solr is running on the same machine, and I tried the actual host name in addition to localhost, but nothing helps.
> Thanks - Tod
> http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
how to take a value from the query result
This is my query in the browser navigation toolbar:

http://172.16.17.126:8983/search/select/?q=AUC_ID:607136

and this is the result in the browser page:

...
<doc>
  <int name="AP_AUC_PHOTO_AVAIL">1</int>
  <double name="AUC_AD_PRICE">1.0</double>
  <int name="AUC_CAT">576</int>
  <int name="AUC_CLIENT_ID">27017</int>
  <str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector, panjang 60-90 cm Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
  <str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
  <int name="AUC_ID">607136</int>
  <str name="AUC_ISNEGO">Nego</str>
  <int name="AUC_LOCATION">7</int>
  <str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
  <str name="AUC_START">2010-05-19 17:56:45</str>
  <str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
  <int name="AUC_TYPE">21</int>
  <int name="PRO_BACKGROUND">0</int>
  <int name="PRO_BOLD">0</int>
  <int name="PRO_COLOR">0</int>
  <int name="PRO_GALLERY">0</int>
  <int name="PRO_LINK">0</int>
  <int name="PRO_SPONSOR">0</int>
  <int name="cat_id_sub">0</int>
  <int name="sectioncode">28</int>
</doc>

I want to get the AUC_CAT value (576) and use it in my PHP; how can I get that value? Please help. Thanks before.
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-take-a-value-from-the-query-result-tp1025119p1025119.html
Sent from the Solr - User mailing list archive at Nabble.com.
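One common approach is to request the response as JSON (add wt=json to the select URL) and decode it. Here is a sketch in Python against a canned, trimmed response built from the values in the post; the same idea works in PHP with file_get_contents and json_decode:

```python
import json

# A trimmed, canned response shaped like what
# .../select/?q=AUC_ID:607136&wt=json would return
raw = json.dumps({
    "response": {
        "numFound": 1,
        "docs": [{"AUC_ID": 607136, "AUC_CAT": 576}],
    }
})

# Decode and index into the docs array to pull out the field
data = json.loads(raw)
auc_cat = data["response"]["docs"][0]["AUC_CAT"]
print(auc_cat)  # 576
```

In a real client you would fetch the URL over HTTP instead of using a canned string, and check numFound before indexing into docs[0].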