solrcloud and csv import hangs
Hi, This appears to happen in trunk too. It appears that the add command's request parameters get forwarded to the other nodes. If I comment these out for add and commit in core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java:

-    params = new ModifiableSolrParams(req.getParams());
+    //params = new ModifiableSolrParams(req.getParams());
+    params = new ModifiableSolrParams();

then things work as expected. Otherwise params like stream.url get sent to the replica nodes, which causes a failure if the file is missing there, or worse, repeatedly importing the same file if it does exist on a replica. This might not be the right thing to do, though ... what should be sent here for a streaming CSV import? Dan On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote: Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows: curl http://localhost:8080/solr/core/update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv I have 2 tomcat servers running on different machines and a separate zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine. It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded? Has anyone seen this, or been successful in using the CSV import with solrcloud? Cheers, Dan
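A middle ground between forwarding everything and forwarding nothing would be to copy the incoming parameters but strip the stream.* ones before the command is distributed, so replicas never try to re-fetch the local CSV file named by stream.url. A rough sketch of that idea only (this is not the fix that eventually went into Solr):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;

    public class DistribParamUtil {
        /** Copy the request params, dropping stream.* so replicas never re-read stream.url. */
        public static ModifiableSolrParams paramsForReplicas(SolrParams reqParams) {
            ModifiableSolrParams params = new ModifiableSolrParams(reqParams);
            List<String> toRemove = new ArrayList<String>();
            for (Iterator<String> it = params.getParameterNamesIterator(); it.hasNext(); ) {
                String name = it.next();
                if (name.startsWith("stream.")) {
                    toRemove.add(name);   // collect first, then remove, to avoid mutating while iterating
                }
            }
            for (String name : toRemove) {
                params.remove(name);
            }
            return params;
        }
    }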
solrcloud and csv import hangs
Hi, I'm using Solr 4.0-BETA and trying to import a CSV file as follows: curl http://localhost:8080/solr/core/update -d overwrite=false -d commit=true -d stream.contentType='text/csv;charset=utf-8' -d stream.url=file:///dir/file.csv I have 2 tomcat servers running on different machines and a separate zookeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine. It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded? Has anyone seen this, or been successful in using the CSV import with solrcloud? Cheers, Dan
Re: SOLR 4.0 / Jetty Security Set Up
Hi, If like most people you have application server(s) in front of solr, the simplest and most secure option is to bind solr to a private address (192.168.* or 10.0.0.*). The app server talks to solr via that private, non-routable ip address, which no one from outside can ever reach. Plus you then don't need to employ authentication, which can slow down responses, as you're ONLY employing access control. This is what we do for access to 5 solr servers. Cheers, Dan On Wed, Sep 5, 2012 at 10:51 AM, Paul Codman snoozes...@gmail.com wrote: First-time Solr user and I am loving it! I have a standard Solr 4 set up running under Jetty. The instructions in the Wiki do not seem to apply to Solr 4 (e.g. mortbay references / section to uncomment not present in the xml file / etc) - could someone please advise on the steps required to secure Solr 4, and can someone confirm that security operates in relation to the new Admin interface? Thanks in advance.
Solr Cloud partitioning
Hi, At the moment, partitioning with solrcloud is hash-based on the uniqueid. What I'd like to do is have custom partitioning, e.g. based on date (shard_MMYY). I'm aware of https://issues.apache.org/jira/browse/SOLR-2592, but after a cursory look it seems that with the latest patch one might end up with multiple partitions in the same shard, perhaps all of them (e.g. if 2 or more partition hash values end up in the same range), which I'd not want. Has anyone else implemented custom shard partitioning for solrcloud? I think the answer is to have the partition class itself pluggable (defaulting to a hash of the unique_key as now), but I'm not sure how to pass the pluggable partition class from solrConfig through to ClusterState (which is in solrj, not core)? Any advice? Cheers, Dan
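A rough sketch of the pluggable-partitioner idea described above. The interface and class names are purely hypothetical (nothing like this exists in Solr 4.0-BETA); it only illustrates routing by a date-derived shard name instead of a hash of the unique key:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.solr.common.SolrInputDocument;

    // Hypothetical plug point: decide which shard a document belongs to.
    interface ShardPartitioner {
        String shardFor(SolrInputDocument doc);
    }

    // Date-based routing: documents go to shard_MMYY buckets rather than hash ranges.
    class DatePartitioner implements ShardPartitioner {
        // note: SimpleDateFormat is not thread-safe; a real implementation would guard this
        private static final SimpleDateFormat FMT = new SimpleDateFormat("MMyy");

        public String shardFor(SolrInputDocument doc) {
            // assumes the document carries a Date value in a field called "date"
            Date d = (Date) doc.getFieldValue("date");
            return "shard_" + FMT.format(d);
        }
    }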
flashcache and solr/lucene
Hi, Just wondering if anyone has any experience with solr and flashcache [https://wiki.archlinux.org/index.php/Flashcache]; my guess is it might be particularly useful for indices that don't change often, and for large indices where an SSD of that size is prohibitive. Cheers, Dan
Solr Warm-up performance issues
Hi List, We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB. Every day we produce a new 40 GB dataset and have to switch one for the other. Once the index switch-over has taken place, it takes roughly 30 min for Solr to reach maximum performance. Are there any hardware or software solutions to reduce the warm-up time? We tried warm-up queries but it didn't change much. Our hardware specs are:
* Dell Poweredge 1950
* 2 x Quad-Core Xeon E5405 (2.00GHz)
* 48 GB RAM
* 2 x 146 GB SAS 3 Gb/s 15K RPM disks configured in a RAID mirror
One thing that does seem to take a long time is un-inverting a set of multivalued fields; are there any optimizations we might be able to use here? Thanks for your help. Dan
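The un-inverting cost is only paid when a query actually facets or sorts on the multivalued fields, so generic warm-up queries won't help unless they touch exactly those fields. One common way to pay the cost during warm-up rather than on the first user query is a static warming query that facets on them. A sketch of what that might look like in solrconfig.xml (the field name is only a placeholder):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*:*</str>
          <str name="rows">0</str>
          <str name="facet">true</str>
          <str name="facet.field">my_multivalued_field</str>
        </lst>
      </arr>
    </listener>

The same block can also be registered for the newSearcher event so the uninverted structures are rebuilt before each new searcher is exposed.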
Re: How to return exact set of multivalue field
-field_name:[* TO 385] +field_name:[386 TO 387] -field_name:[388 TO *]

i.e. require a value in the wanted range and exclude documents that have any value below or above it. On Thu, Oct 20, 2011 at 10:51 AM, Ellery Leung elleryle...@be-o.com wrote: Hi all I am using Solr 3.4 on Windows 7. Here is the example of a multivalue field:

<doc><arr name="field_name"><str>387</str><str>386</str></arr></doc>
<doc><arr name="field_name"><str>387</str><str>386</str></arr></doc>
<doc><arr name="field_name"><str>387</str><str>386</str><str>385</str><str>382</str><str>312</str><str>311</str></arr></doc>

I am doing a search on field_name and JUST want to return records that ARE 387 and 386 (the first and second record). Here is the query: field_name:(387 AND 386) But this query returns all 3 records, which is wrong. I have tried using a filter: field_name:(387 AND 386) but it still doesn't work. Therefore I would like to ask, is there any way to change this query so that it will ONLY return the first and second record? Thank you in advance for any help.
Distributed Search question/feedback
Hi, Does SolrCloud use Distributed Search as described at http://wiki.apache.org/solr/DistributedSearch or is it different entirely? Does SolrCloud suffer from the same limitations as Distributed Search (inefficient to use a high start parameter, and presumably high CPU from highlighting all those docs, etc., among other issues)? Our search mainly comprises searches within a country, and occasionally across a continent or worldwide, so I'm thinking it's probably simpler to have a pan index for worldwide and continent searches, and separate country indices (and these placed closer to each country, for example). Any pointers from those who've been down the distributed path appreciated! Cheers, Dan
logging client ip address
Hi, We're using log4j with solr which is working fine and I'm wondering how I might be able to log the client ip address? Has anyone else been able to do this? Cheers, Dan
Re: logging client ip address
Does anyone know how I would be able to include the client ip address for tomcat 6 with log4j? Cheers, Dan On Wed, Sep 7, 2011 at 11:03 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Sep 7, 2011 at 2:56 PM, dan sutton danbsut...@gmail.com wrote: Hi, We're using log4j with solr which is working fine and I'm wondering how I might be able to log the client ip address? Has anyone else been able to do this? Your application container should have an access log facility. That is the best way to record client IPs. Solr does not have that capability. -- Regards, Shalin Shekhar Mangar.
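If the container access log isn't enough and the client IP really has to appear in the log4j output, one commonly used approach (not something Solr provides) is a small servlet filter in front of the Solr webapp that puts the remote address into log4j's MDC, which a PatternLayout can then reference with %X{clientip}. A sketch, assuming log4j 1.2 and the standard servlet API:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import org.apache.log4j.MDC;

    public class ClientIpMdcFilter implements Filter {
        public void init(FilterConfig config) { }

        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            MDC.put("clientip", req.getRemoteAddr());   // visible to every log line on this thread
            try {
                chain.doFilter(req, resp);
            } finally {
                MDC.remove("clientip");                  // don't leak the value to pooled threads
            }
        }

        public void destroy() { }
    }

The filter would be mapped in web.xml before SolrDispatchFilter, and the log4j layout pattern changed to include %X{clientip}.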
replication/search on separate LANs
Hi All, I'm wondering if anyone has experience of replicating and searching over separate LANs? Currently we do both over the same one. Each slave would have 2 Ethernet cards, one per LAN, and the master just one. We're currently building and replicating a daily index; this is quite large, about 15M docs, and during replication we see a high CPU load and searching becomes slow, so we're trying to mitigate this. Has anyone set this up? Did it help? Cheers, Dan
custom highlighting
Hi, I'd like to make the highlighting work as follows: length(all snippets) approx. 200 chars, with hl.snippets = 2 (2 snippets). E.g. if there is only 1 snippet available, its length = 200 chars; if there are 2 snippets, each snippet's length == 100 chars, so I take the first 2 and get 200 chars. Is this possible with the regex fragmenter? Or does anyone know of any contrib fragmenter that might do this? Many thanks Dan
Suggester and query/index analysis
Hi All, I understand that I can use a custom queryConverter for the input to the suggester component (http://wiki.apache.org/solr/Suggester), however there doesn't seem to be anything equivalent on the indexing side: TST appears to take the input verbatim, and Jaspell seems to lowercase everything. The problem with this is that a suggest query like q=l would not bring up 'London, UK' due to case differences. Has anyone using the suggester component come up with a workaround? My initial thought is to override TSTLookup to pass the key through an analyzer, and do the same with my custom queryConverter. Any other options? E.g. I'd like the following to all return 'London, UK' as the display for the autocomplete: london, uk / london uk / London UK / London uk etc. Cheers, Dan
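For the query side, a minimal sketch of a normalizing converter (the class name is made up; it assumes Solr's org.apache.solr.spelling.QueryConverter API and that the same lower-case / strip-non-alphanumeric normalization is applied to the keys at index time):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.lucene.analysis.Token;
    import org.apache.solr.spelling.QueryConverter;

    public class NormalizingQueryConverter extends QueryConverter {
        @Override
        public Collection<Token> convert(String original) {
            // lower-case and strip everything that isn't a letter, digit or space,
            // so "London, UK", "london uk" and "London uk" all produce the same key
            String key = original.toLowerCase().replaceAll("[^a-z0-9 ]", "");
            return Collections.singletonList(new Token(key, 0, original.length()));
        }
    }

The lookup side would need the matching treatment, e.g. a TSTLookup subclass that runs each key through the same normalization before it is added to the tree.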
Enable/disable mainIndex component
Hi, Does anyone know if I can do the following:

<mainIndex enable="${enable.master:false}">
  <mergeFactor>10</mergeFactor>
  ...
</mainIndex>
<mainIndex enable="${enable.slave:true}">
  <mergeFactor>2</mergeFactor>
  ...
</mainIndex>

Cheers, Dan
Highlighting and custom fragmenting
Hi All, I'd like to make the highlighting work as follows: length(all snippets) approx. 200 chars, with hl.snippets = 2 (2 snippets). Is this possible with the regex fragmenter? Or does anyone know of any contrib fragmenter that might do this? Many thanks Dan
Re: Math-generated fields during query
As a workaround, can you not have a search component run after the QueryComponent? Have qty_ordered and unit_price as stored fields returned via the fl parameter, and have your custom component do the calculation - unless you need to sort by this value too? Dan On Wed, Mar 9, 2011 at 10:06 PM, Peter Sturge peter.stu...@gmail.com wrote: Hi, I was wondering if it is possible during a query to create a returned field 'on the fly' (like a function query, but for concrete values, not score). For example, if I input this query: q=_val_:product(15,3)&fl=*,score For every returned document, I get score = 45. If I change it slightly to add *:* like this: q=*:* _val_:product(15,3)&fl=*,score I get score = 32.526913. If I try my use case of _val_:product(qty_ordered,unit_price), I get varying scores depending on...well depending on something. I understand this is doing relevance scoring, but it doesn't seem to tally with the FunctionQuery Wiki [example at the bottom of the page]: q=boxname:findbox+_val_:product(product(x,y),z)&fl=*,score ...where score will contain the resultant volume. Is there a trick to getting not a score, but the actual value of quantity*price (e.g. product(5,2.21) == 11.05)? Many thanks
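A rough sketch of the workaround described above: a component registered under last-components that multiplies two stored fields for each hit and attaches the results to the response. The field, id and handler names are only placeholders, and depending on the Solr version a couple of extra SolrInfoMBean methods may need implementing:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.SolrIndexSearcher;

    public class LineTotalComponent extends SearchComponent {
        @Override
        public void prepare(ResponseBuilder rb) { /* nothing to do */ }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            SolrIndexSearcher searcher = rb.req.getSearcher();
            NamedList<Double> totals = new NamedList<Double>();
            DocIterator it = rb.getResults().docList.iterator();
            while (it.hasNext()) {
                Document doc = searcher.doc(it.nextDoc());
                double qty = Double.parseDouble(doc.get("qty_ordered"));   // stored field
                double price = Double.parseDouble(doc.get("unit_price"));  // stored field
                totals.add(doc.get("id"), qty * price);
            }
            rb.rsp.add("line_totals", totals);  // shows up as an extra section in the response
        }

        @Override
        public String getDescription() { return "qty_ordered * unit_price per hit"; }

        @Override
        public String getSource() { return ""; }
    }

As noted above, this only returns the values; it does not let you sort by them.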
Split analysis
Hi All, I have a requirement to analyze a field with a series of filters, calculate a 'signature', then concatenate it with the original input. E.g. input = 'this is the input'; tokenized and filtered, the input becomes, say, 'this input', whose signature is 12ef5e; so the final output indexed is: 12ef5ethis is the input I can calculate the signature easily, but how can I get access to the original input once it has been tokenized and filtered? Many thanks in advance, Dan
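One way to sidestep the analysis-chain problem is to do the work before the value reaches the field, e.g. in an update processor or in the indexing client: run the raw value through the field's analyzer yourself, hash the filtered terms, and prepend the hash to the raw value. A minimal sketch of that idea (assumes Lucene 3.1+ for CharTermAttribute; all names are illustrative):

    import java.io.StringReader;
    import java.security.MessageDigest;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class SignaturePrefixer {
        /** Returns e.g. "12ef5e..." followed by "this is the input". */
        public static String prefix(Analyzer analyzer, String field, String raw) throws Exception {
            TokenStream ts = analyzer.tokenStream(field, new StringReader(raw));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            StringBuilder filtered = new StringBuilder();
            ts.reset();
            while (ts.incrementToken()) {
                filtered.append(term.toString()).append(' ');   // e.g. "this input "
            }
            ts.end();
            ts.close();
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(filtered.toString().getBytes("UTF-8"));
            StringBuilder sig = new StringBuilder();
            for (byte b : digest) {
                sig.append(String.format("%02x", b));
            }
            return sig.toString() + raw;
        }
    }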
Re: Replication and newSearcher registerd poll interval
Hi, Keeping the thread alive: any thoughts on only doing replication if there is no warming currently going on? Cheers, Dan On Thu, Feb 10, 2011 at 11:09 AM, dan sutton danbsut...@gmail.com wrote: Hi, If the replication window is too small to allow a new searcher to warm and the current searcher to close before the next one needs to be in place, then the slave continuously has a high load, and potentially an OOM error. We've noticed this in our environment, where we have several facets on large multivalued fields. I was wondering what the list thought about modifying the replication process to skip polls (though warning to the logs) when there is a searcher in the process of warming? Else, as in our case, it brings the slave to its knees; our workaround was to extend the poll interval, though that's not ideal. Cheers, Dan
Replication and newSearcher registerd poll interval
Hi, If the replication window is too small to allow a new searcher to warm and the current searcher to close before the next one needs to be in place, then the slave continuously has a high load, and potentially an OOM error. We've noticed this in our environment, where we have several facets on large multivalued fields. I was wondering what the list thought about modifying the replication process to skip polls (though warning to the logs) when there is a searcher in the process of warming? Else, as in our case, it brings the slave to its knees; our workaround was to extend the poll interval, though that's not ideal. Cheers, Dan
Re: facet.mincount
I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after making facet.mincount=1, it is showing the results with count = 0. Does anyone know why this is happening. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: facet.mincount
facet.mincount is grouped only under field faceting parameters not date faceting parameters On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that the facet.mincount is not applicable on date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using solr1.4.1 release version I got the following error while using facet.mincount java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158) at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151) at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208) at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144) at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95) at org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at 
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all
EmbeddedSolrServer and junit
Hi, I have 2 cores CoreA and CoreB, when updating content on CoreB, I use solrj and EmbeddedSolrServer to query CoreA for information, however when I do this with my junit tests (which also use EmbeddedSolrServer to query) I get this error SEVERE: Previous SolrRequestInfo was not closed! junit.framework.AssertionFailedError [junit] at org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45) How should I write the junit tests to test a multi-core, with EmbeddedSolrServer used in a component during querying? Cheers, Dan
Re: EmbeddedSolrServer and junit
Hi, I think I've found the cause: in src/java/org/apache/solr/util/TestHarness.java, query(String handler, SolrQueryRequest req) calls SolrRequestInfo.setRequestInfo(new SolrRequestInfo(req, rsp)), which my component also calls in the same thread, hence the error. The fix was to override assertQ to call queryAndResponse(String handler, SolrQueryRequest req) instead, which does not set/clear SolrRequestInfo. Regards, Dan On Mon, Jan 31, 2011 at 2:32 PM, dan sutton danbsut...@gmail.com wrote: Hi, I have 2 cores, CoreA and CoreB. When updating content on CoreB, I use solrj and EmbeddedSolrServer to query CoreA for information; however, when I do this with my junit tests (which also use EmbeddedSolrServer to query) I get this error: SEVERE: Previous SolrRequestInfo was not closed! junit.framework.AssertionFailedError [junit] at org.apache.solr.request.SolrRequestInfo.setRequestInfo(SolrRequestInfo.java:45) How should I write the junit tests to test a multi-core setup, with EmbeddedSolrServer used in a component during querying? Cheers, Dan
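For anyone hitting the same assertion, a minimal sketch of the workaround described above, assuming a test that extends SolrTestCaseJ4 (whose static TestHarness field is h). queryAndResponse runs the request without setting SolrRequestInfo, so a component that sets its own request info on the same thread no longer trips the check:

    import org.apache.solr.SolrTestCaseJ4;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;

    public class MultiCoreComponentTest extends SolrTestCaseJ4 {
        // Helper used instead of assertQ: no SolrRequestInfo is set or cleared here.
        static SolrQueryResponse queryNoRequestInfo(String handler, SolrQueryRequest req) throws Exception {
            try {
                return h.queryAndResponse(handler, req);
            } finally {
                req.close();
            }
        }
    }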
solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
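One way to get this with plain faceting is to request every facet value for the field (facet.limit=-1, facet.mincount=1) under the same filters and count the returned entries client-side; note this transfers all the distinct values, so it gets expensive for high-cardinality fields. A SolrJ sketch of the idea (the field and filter strings are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DistinctCount {
        public static int countDistinct(SolrServer server, String field, String otherCriteria) throws Exception {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery(otherCriteria);          // the "other_criteria" part
            q.addFilterQuery(field + ":[* TO *]");    // only docs where the field has a value
            q.setRows(0);                             // we only need the facet section
            q.setFacet(true);
            q.addFacetField(field);
            q.setFacetLimit(-1);                      // return every distinct value
            q.setFacetMinCount(1);
            QueryResponse rsp = server.query(q);
            return rsp.getFacetField(field).getValueCount();
        }
    }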
JMX Cache values are wrong
Hi, I've used three different JMX clients to query the solr/core:id=org.apache.solr.search.FastLRUCache,type=queryResultCache and solr/core:id=org.apache.solr.search.FastLRUCache,type=documentCache beans, and they appear to return old cache information. As new searchers come online, the newer caches don't appear to be registered, perhaps? I can see this when I query JMX for the 'description' attribute: the regenerator JMX output shows a different org.apache.solr.search.SolrIndexSearcher to the one that appears in the stats.jsp page. Any ideas as to what's gone wrong ... has anyone else experienced this? From registry.jsp: Solr Specification Version: 1.4.0.2010.09.10.17.10.36 Solr Implementation Version: 1.4.1-dev exported Lucene Specification Version: 2.9.1 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25 Cheers, Dan
Re: spatial sorting
Hi All, This is more of an FYI for those wanting to filter and sort by distance, and have the values returned in the result set, after determining a way to do this with existing code. Using solr 4.0, an example query would contain the following parameters:

/select?
q=stevenage^0.0 +_val_:ghhsin(6371,geohash(52.0274,-0.4952),location)^1.0
  Make the boost on all parts of the query other than the ghhsin distance value function 0, and 1 on the function; this is so that the score is then equal to the distance. (52.0274,-0.4952) here is the query point and 'location' is the geohash field to search against.
sort=score asc
  Basically sort by distance asc (closest first).
fq={!sfilt fl=location}&pt=52.0274,-0.4952&d=30
  This is the spatial filter to limit the necessary distance calculations.
fl=*,score
  Return all fields (if required) but include the score (which contains the distance calculation).

Does anyone know if it's possible to return the distance and score separately? I know there has been a patch to sort by value function, but how can one return the values from this? Cheers, Dan On Fri, Sep 17, 2010 at 2:45 PM, dan sutton danbsut...@gmail.com wrote: Hi, I'm trying to filter and sort by distance with this URL: http://localhost:8080/solr/select/?q=*:*&fq={!sfilt%20fl=loc_lat_lon}&pt=52.02694,-0.49567&d=2&sort={!func}hsin(52.02694,-0.49567,loc_lat_lon_0_d,%20loc_lat_lon_1_d,3963.205) asc Filtering is fine but it's failing in parsing the sort with: The request sent by the client was syntactically incorrect (can not sort on undefined field or function: {!func}(52.02694,-0.49567,loc_lat_lon_0_d, loc_lat_lon_1_d, 3963.205)). I'm using the solr/lucene trunk to try this out ... does anyone know what is wrong with the syntax? Additionally, am I able to return the distance sort values e.g. with the fl param? ... else am I going to have to either write my own component (which would also look up the filtered cached values rather than re-calculating distance) or use an alternative like localsolr? Dan
multiple spatial values
Hi, I was looking at the LatLonType and how it might represent multiple lon/lat values ... it looks to me like the lat would go in {latlongfield}_0_LatLon and the long in {latlongfield}_1_LatLon ... how then, if we have multiple lat/long points for a doc, do we choose the correct points when filtering, for example? E.g. thinking in cartesian coords, if we have P1(3,4), P2(6,7) ... x is stored with 3,6 and y with 4,7 ... then how does it ensure we're not erroneously picking (3,7) or (6,4) whilst filtering with the spatial query? Don't we have to store both values together? What am I missing here? Cheers, Dan
Re: how to normalize a query
What I wanted was a way to determine that the query q=one two is simply equivalent to q=two one; by normalizing, I might have q=one two for both, for example, and then q.hashCode() would be the same. Simply using q.hashCode() returns different values for each query above, so this is not suitable. Cheers Dan On Thu, Sep 9, 2010 at 3:36 PM, Markus Jelsma markus.jel...@buyways.nl wrote: LuceneQParser http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Proximity%20Searches DismaxQParser http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29 On Thursday 09 September 2010 15:08:41 dan sutton wrote: Hi, Does anyone know how I might normalize a query so that e.g. q=one two equals q=two one Cheers, Dan Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
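A minimal sketch of that kind of term-order normalization: split the query on whitespace, sort the terms, and re-join, so 'one two' and 'two one' map to the same string and therefore the same hashCode. (This only handles simple term queries; phrases, operators and field prefixes would need more care.)

    import java.util.Arrays;

    public class QueryNormalizer {
        public static String normalize(String q) {
            String[] terms = q.trim().toLowerCase().split("\\s+");
            Arrays.sort(terms);                 // "two one" and "one two" now order identically
            StringBuilder sb = new StringBuilder();
            for (String t : terms) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
            return sb.toString();
        }
    }

With this, normalize("one two").hashCode() == normalize("two one").hashCode().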
Re: Auto Suggest
I set this up a few years ago with something like the following:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
  </analyzer>
</fieldType>

The PatternReplaceFilterFactory filter is the bit missing here, I think. This way the search is agnostic to case and any non-alphanum chars; this was to facilitate a location autocomplete for searching. So it was a basic search, returning the top N results along with additional info to show in the autocomplete to our mod_perl servers. Results were cached in the mod_perl servers. Regards, Dan On Thu, Sep 2, 2010 at 1:53 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'm having a different issue with the EdgeNGram technique described here: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ That is, one-word queries q=app on the query_text field work fine, however q=app mou do not. Why would this be, or is there a configuration that could be missing? On Wed, Sep 1, 2010 at 3:53 PM, Eric Grobler impalah...@googlemail.com wrote: Thanks for your feedback Robert, I will try that and see how Solr performs on my data - I think I will create a field that contains only important key/product terms from the text. Regards Johan On Wed, Sep 1, 2010 at 9:12 PM, Robert Petersen rober...@buy.com wrote: We don't have that many, just a hundred thousand, and solr response times (since the index's docs are small and not complex) are logged as typically 1 ms if not 0 ms. It's funny but sometimes it is so fast no milliseconds have elapsed. Incredible if you ask me... :) Once you get SOLR to consider the whole phrase as just one big term, the wildcard is very fast. -Original Message- From: Eric Grobler [mailto:impalah...@googlemail.com] Sent: Wednesday, September 01, 2010 12:35 PM To: solr-user@lucene.apache.org Subject: Re: Auto Suggest Hi Robert, Interesting approach, how many documents do you have in Solr? I have about 2 million and I just wonder if it might be a bit slow. Regards Johan On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen rober...@buy.com wrote: I do this by replacing the spaces with a '%' in a separate search field which is not parsed nor tokenized, and then you can wildcard across the whole phrase like you want and the spaces don't mess you up. Just store the original phrase with spaces in a separate field for returning to the front end for display. -Original Message- From: Jazz Globe [mailto:jazzgl...@hotmail.com] Sent: Wednesday, September 01, 2010 7:33 AM To: solr-user@lucene.apache.org Subject: Auto Suggest Hallo How would one implement a multiple-term auto-suggest feature in Solr that is filter sensitive? For example, a user enters: mp3 and solr might suggest: - mp3 player - mp3 nano - mp3 sony and then the user starts the second word: mp3 n and that narrows it down to: - mp3 nano I had a quick look at the Terms Component. I suppose it just returns term totals for the entire index and cannot be used with a filter or query? Thanks Johan
Re: Spellchecking and frequency
Hi Mark, Thanks for that info, it looks very interesting; it would be great to see your code. Out of interest, did you use the dictionary and the phonetic file? Did you see better results with both? Regarding the secondary part that checks the corpus for matching suggestions, would another way to do this be to have an event listener that listens for commits and then builds the dictionary from matching corpus words? That way you avoid the performance hit at query time. Cheers, Dan On Tue, Jul 27, 2010 at 7:04 PM, Mark Holland mark.holl...@zoopla.co.uk wrote: Hi, I found the suggestions returned from the standard solr spellcheck not to be that relevant. By contrast, aspell, given the same dictionary and misspelled words, gives much more accurate suggestions. I therefore wrote an implementation of SolrSpellChecker that wraps jazzy, the java aspell library. I also extended the SpellCheckComponent to take the matrix of suggested words and query the corpus to find the first combination of suggestions which returned a match. This works well for my use case, where term frequency is irrelevant to spelling or scoring. I'd like to publish the code in case someone finds it useful (although it's a bit crude at the moment and will need a decent tidy up). Would it be appropriate to open up a Jira issue for this? Cheers, ~mark On 27 July 2010 09:33, dan sutton danbsut...@gmail.com wrote: Hi, I've recently been looking into spellchecking in solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so 'spellcheck.onlyMorePopular' is not really that useful unless you click on it numerous times. I was thinking that since most of the time people spell words correctly, why is there no other frequency parameter that could enter into the score? i.e. something like: spell_score ~ edit_dist * freq I'm sure others have come across this issue and was wondering what steps/algorithms they have used to overcome these limitations? Cheers, Dan
Spellchecking and frequency
Hi, I've recently been looking into spellchecking in solr, and was struck by how limited the usefulness of the tool was. Like most corpora, ours contains lots of different spelling mistakes for the same word, so 'spellcheck.onlyMorePopular' is not really that useful unless you click on it numerous times. I was thinking that since most of the time people spell words correctly, why is there no other frequency parameter that could enter into the score? i.e. something like: spell_score ~ edit_dist * freq I'm sure others have come across this issue and was wondering what steps/algorithms they have used to overcome these limitations? Cheers, Dan
Re: why spellcheck and elevate search components can't work together?
It needs to be a single last-components list:

<arr name="last-components">
  <str>spellcheck</str>
  <str>elevateListings</str>
</arr>

or

<arr name="last-components">
  <str>elevateListings</str>
  <str>spellcheck</str>
</arr>

Dan On Mon, Jul 19, 2010 at 11:14 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: In my solrconfig.xml, I set it up this way, but it doesn't work at all. Can anyone help? It works with either one on its own, but not with both.

<searchComponent name="elevateListings" class="org.apache.solr.handler.component.QueryElevationComponent">
  <str name="queryFieldType">string_ci</str>
  <str name="config-file">elevateListings.xml</str>
  <str name="forceElevation">false</str>
</searchComponent>

<requestHandler name="mb_listings" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">dismax</str>
    <str name="qf">name^2 full_text^1</str>
    <str name="fl">uuid</str>
    <str name="version">2.2</str>
    <str name="indent">on</str>
    <str name="tie">0.1</str>
  </lst>
  <lst name="appends">
    <str name="fq">type:Listing</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
  <arr name="last-components">
    <str>elevateListings</str>
  </arr>
</requestHandler>

If I remove the spellcheck component, the elevate component works (the result also loads from elevateListings.xml). If I remove the elevate component, http://localhost:8081/solr/select/?q=redd&qt=mb_listings&spellcheck=true&spellcheck.collate=true does work. Any ideas? Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: Custom comparator
Apologies, I didn't make the requirement clear. I need to keep the best N documents - set A (chosen by some criteria - call them sponsored docs) - in front of the naturally scoring docs - set B - so that I return (A,B). The set A docs all need to score above 1% of the maxScore in B, else they join the B set, though I don't really know maxScore until I've looked at all the docs. I am looking at the QueryElevationComponent for some hints, but any other suggestions are appreciated. Many thanks, Dan On Fri, Jul 16, 2010 at 12:03 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, why do you need a custom collector? You can use the form of the search that returns a TopDocs, from which you can get the max score and the array of ScoreDoc, each of which has its score. So you can just let the underlying code get the top N documents, and throw out any that don't score above 1%. HTH Erick On Thu, Jul 15, 2010 at 10:02 AM, dan sutton danbsut...@gmail.com wrote: Hi, I have a requirement for a custom comparator that keeps the top N documents (chosen by some criteria), but only if their score is more than e.g. 1% of the maxScore. Looking at SolrIndexSearcher.java, I was hoping to have a custom TopFieldCollector.java to return these via TopFieldCollector.topDocs, but I can't see how to override that class to provide my own. I think I need to do this here (TopFieldCollector.topDocs) as I won't know what the maxScore is until all the docs have been collected and compared? Does anyone have any suggestions? I'd like to avoid having to do two searches. Many Thanks, Dan
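A minimal sketch of Erick's suggestion applied to the sponsored-docs case, post-processing a TopDocs rather than writing a custom collector. isSponsored() stands in for whatever criteria pick set A:

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class SponsoredPartitioner {
        public static void partition(IndexSearcher searcher, Query query,
                                     List<ScoreDoc> sponsored, List<ScoreDoc> natural) throws IOException {
            TopDocs top = searcher.search(query, 100);      // one search, no custom collector
            // 1% of the top score; strictly the post above wants the max of set B,
            // which a second pass over the natural hits could refine.
            float threshold = top.getMaxScore() * 0.01f;
            for (ScoreDoc sd : top.scoreDocs) {
                if (isSponsored(sd.doc) && sd.score >= threshold) {
                    sponsored.add(sd);                       // stays in front (set A)
                } else {
                    natural.add(sd);                         // falls back into set B
                }
            }
        }

        // Placeholder for however "sponsored" docs are identified in the real index.
        private static boolean isSponsored(int docId) {
            return false;
        }
    }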
Custom comparator
Hi, I have a requirement for a custom comparator that keeps the top N documents (chosen by some criteria), but only if their score is more than e.g. 1% of the maxScore. Looking at SolrIndexSearcher.java, I was hoping to have a custom TopFieldCollector.java to return these via TopFieldCollector.topDocs, but I can't see how to override that class to provide my own. I think I need to do this here (TopFieldCollector.topDocs) as I won't know what the maxScore is until all the docs have been collected and compared? Does anyone have any suggestions? I'd like to avoid having to do two searches. Many Thanks, Dan
Re: Help with highlighting
It looks to me like a tokenisation issue, all_text content and the query text will match, but the string fieldtype fields 'might not' and therefore will not be highlighted. On Wed, Jun 23, 2010 at 4:40 PM, n...@frameweld.com wrote: Here's my request: q=ASA+AND+minisite_id%3A36version=1.3json.nl =maprows=10start=0wt=jsonhl=truehl.fl=%2Ahl.simple.pre=%3Cspan+class%3D%22hl%22%3Ehl.simple.post=%3C%2Fspan%3Ehl.fragsize=0hl.mergeContiguous=false And here's what happened: It didn't return results, even when I applied an asterisk for which fields highlight. I tried other fields and that didn't work either, however all_text is the only one that works. Any other ideas why the other fields won't highlight? Thanks. -Original Message- From: Erik Hatcher erik.hatc...@gmail.com Sent: Tuesday, June 22, 2010 9:49pm To: solr-user@lucene.apache.org Subject: Re: Help with highlighting You need to share with us the Solr request you made, any any custom request handler settings that might map to. Chances are you just need to twiddle with the highlighter parameters (see wiki for docs) to get it to do what you want. Erik On Jun 22, 2010, at 4:42 PM, n...@frameweld.com wrote: Hi, I need help with highlighting fields that would match a query. So far, my results only highlight if the field is from all_text, and I would like it to use other fields. It simply isn't the case if I just turn highlighting on. Any ideas why it only applies to all_text? Here is my schema: ?xml version=1.0 ? schema name=Search version=1.1 types !-- Basic Solr Bundled Data Types -- !-- Rudimentary types -- fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true / fieldType name=boolean class=solr.BoolField sortMissingLast=true omitNorms=true / !-- Non-sortable numeric types -- fieldType name=integer class=solr.IntField omitNorms=true/ fieldType name=long class=solr.LongField omitNorms=true/ fieldType name=float class=solr.FloatField omitNorms=true/ fieldType name=double class=solr.DoubleField omitNorms=true/ !-- Sortable numeric types -- fieldType name=sint class=solr.SortableIntField sortMissingLast=true omitNorms=true/ fieldType name=slong class=solr.SortableLongField sortMissingLast=true omitNorms=true/ fieldType name=sfloat class=solr.SortableFloatField sortMissingLast=true omitNorms=true/ fieldType name=sdouble class=solr.SortableDoubleField sortMissingLast=true omitNorms=true/ !-- Date/Time types -- fieldType name=date class=solr.DateField sortMissingLast=true omitNorms=true/ !-- Pseudo types -- fieldType name=random class=solr.RandomSortField indexed=true / !-- Analyzing types -- fieldType name=text_ws class=solr.TextField positionIncrementGap=100 analyzer tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 
generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType fieldType name=textTight class=solr.TextField positionIncrementGap=100
fl and nulls
Hi, In Solr 1.3 it looks like null fields were returned if requested with the fl param, whereas with solr 1.4, nulls are omitted entirely. Is there a way to have the nulls returned with Solr 1.4? e.g. ... <doc> <field1/> <field2/> </doc> Cheers, Dan
Dynamic analyzers
Hi, I have a requirement to dynamically choose a fieldType to analyze text in multiple languages. I will know the language (in a separate field) at index and query time. I've tried implementing this with a custom UpdateRequestProcessorFactory and a custom DocumentBuilder.toDocument to change the FieldType, but this doesn't work. I realize I can have e.g. text_en, text_de, ... and dynamically populate these with a custom UpdateRequestProcessorFactory, but we are worried that with all the languages (let's say 50+), effectively doing an OR across 50 fields will be a performance issue. Is this true? Many thanks in advance, Dan
Custom sorting
Hi, I have a requirement to do the following: For up to the first 10 results (i.e. only on the first page) show sponsored category ads, in order of bid, but no more than 2 per category, and only if all sponsored category ads score more than min% of the highest score. E.g. if I had the following, with min% = 1:

doc  score  bid  cat_id  sponsored
1    100    x    x       0
2    55     x    x       0
3    50     2    2       1
4    20     2    2       1
5    5      2    2       1
6    80     1    1       1
7    70     1    1       1
8    60     1    1       1

(x = don't care) the sorted order would be: 3 4 6 7 1 8 2 5 I'm not sure if this can be implemented with a custom comparator, as I need access to the final score to enforce min%. I'm thinking I'm probably going to have to implement a subclass of QParserPlugin with a custom sort, but was wondering if there were alternatives? Many thanks in advance. Dan