Re: change base Scoring Algorithms
Before you do so, you should have a look at function queries. Try a search engine for examples; there are some quite good ones that influence scoring by creation date, so that newer documents score higher than older ones.

Kind regards,
Mitch
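For illustration, a recency boost of that sort might look roughly like this with dismax (a sketch only: the created_at field name is an assumption, ms() needs a trie-based date field, and the constant is roughly 1 divided by one year in milliseconds):

  q=video&defType=dismax&qf=titleMain&bf=recip(ms(NOW,created_at),3.16e-11,1,1)

Documents indexed more recently get a larger boost, decaying smoothly with age.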
[ANN] Eclipse GIT plugin beta version released
GIT is one of the most popular distributed version control systems. In the hope that more Java developers may want to explore the world of easy branching, merging and patch management, I'd like to inform you that a beta version of the upcoming Eclipse GIT plugin is available: http://www.infoq.com/news/2010/03/egit-released http://aniszczyk.org/2010/03/22/the-start-of-an-adventure-egitjgit-0-7-1/ Maybe, one day, some Apache / Hadoop projects will use GIT... :-) (Yes, I know git.apache.org.)

Best regards,
Thomas Koch, http://www.koch.ro
Re: question about synonyms and response
Reading the wiki, one can see that synonyms are added to the query when synonym expansion at query time is enabled. That means instead of searching only for nice, you search, for example, for nice | pretty. I suggest you read the wiki page on the SynonymFilter and consider the described use cases for your own schema.xml.

Kind regards,
Mitch
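For instance, query-time expansion is configured on the field type's query analyzer; a minimal sketch (the field type name and the synonyms.txt contents are assumptions, and a matching index analyzer is omitted):

  <fieldType name="text" class="solr.TextField">
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>

with a synonyms.txt line such as:

  nice, pretty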
Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge
Hi, sorry, I don't have much experience doing this with Solr, but my data-config.xml looks like:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/db"
                user="user" password="..." batchSize="-1"/>
    <document>
    </document>
  </dataConfig>

The db at the end of the URL stands for the database you want to use. Perhaps this helps a little bit.

Kind regards,
Mitch
Search across more than one field (dismax) ignored
Hello community, it seems like the query parser ignores every field to search on except the standard search field. My search URL:

  select/?q=video&qt=dismax&qf=titleMain^2.0+titleShort^5.3&debugQuery=on

The parsed query string etc.:

  <str name="rawquerystring">video</str>
  <str name="querystring">video</str>
  <str name="parsedquery">+DisjunctionMaxQuery((titleMain:video^2.0)~0.01) DisjunctionMaxQuery((titleMain:video^2.0)~0.01)</str>
  <str name="parsedquery_toString">+(titleMain:video^2.0)~0.01 (titleMain:video^2.0)~0.01</str>

My solrconfig for the dismax handler:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">titleMain^2.5 titleShort^1.9 descriptionMain^2.0 descriptionExcerpt^1.5</str>
      <str name="pf">titleMain^2.0 titleShort^1.2 descriptionMain^1.2 descriptionExcerpt^1.1</str>
      <str name="fl">ID,title,score</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <int name="ps">10</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Every field mentioned above is set to indexed="true" in the schema.xml. Even when I do not query against the dismax requestHandler, a search across more than one field seems to fail. Any suggestions on where to look for the error are welcome.

- Mitch
Re: Re: Multiple QParserPlugins, Single RequestHandler
Ahh, SearchComponent, that sounds like the one. I'll have a go with that and see how I get on. Hooking into log4j might be an option also. Many thanks for pointing out the right direction.

Peter

On Mar 30, 2010 9:54pm, Erik Hatcher erik.hatc...@gmail.com wrote:

On Mar 30, 2010, at 2:43 PM, Peter S wrote:

I have an existing QParserPlugin subclass that does some tagging functionality (kind of a group alias thing). This is currently registered with the default queryHandler. I want to add another, quite separate plugin that writes an audit of every query request that comes in.

Sounds like what you want is a SearchComponent, not a QParserPlugin. You'll have to plug it into each request handler in the config, though. or...

Being able to track what has happened on a Solr instance in a non-repudiable fashion would be [hopefully] useful for others as well (e.g. if you're storing/accessing secure documents and need to know every time someone accesses something). I know there is some logging that tracks requests etc., but log files are difficult to secure in a forensically-legal way. Maybe whatever generates the log entries can be plugged into, so that secure, 'tamper-proof' audit trails can be generated?

The logging is able to be hooked, so you could write your own log handler to write the events elsewhere. This is left as an exercise for the reader, since it will depend on which logging framework is employed.

Erik
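To make the SearchComponent idea concrete, here is a rough sketch of an auditing component. The AuditComponent class and its com.example package are hypothetical, the method signatures follow the Solr 1.4-era API, and the System.out sink stands in for a properly secured audit log:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class AuditComponent extends SearchComponent {

    // Runs before the query executes - record who asked for what, and when.
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // Placeholder sink; a real audit trail would write somewhere tamper-evident.
      System.out.println(System.currentTimeMillis() + " AUDIT " + rb.req.getParams());
    }

    // Nothing to do after the query runs for a pure audit component.
    @Override
    public void process(ResponseBuilder rb) throws IOException { }

    // SolrInfoMBean bookkeeping.
    @Override public String getDescription() { return "query audit trail"; }
    @Override public String getSource() { return "$URL$"; }
    @Override public String getSourceId() { return "$Id$"; }
    @Override public String getVersion() { return "1.0"; }
  }

It would then be registered in solrconfig.xml and listed in each handler's first-components, along these lines:

  <searchComponent name="audit" class="com.example.AuditComponent"/>
  <arr name="first-components"><str>audit</str></arr>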
Shard queries on EmbeddedSolrServer
In my application I need to create and destroy indexes via Java code, so to bypass the HTTP requests I'm using the EmbeddedSolrServer, and I am creating different SolrCore(s), one per index I need. Now the point is that a requirement of my application is the capability to perform a query on a specific index, on a subset of indexes, or on every index. I have been looking at the shards parameter:

  http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some query...

...and OK, but my Solr core instances don't expose an HTTP interface, so how can I shard a query between all my Solr cores?

Thanks in advance,
Claudio
Re: SOLR-1316 How To Implement this autosuggest component ???
On 2010-03-31 06:14, Andy wrote:

--- On Tue, 3/30/10, Andrzej Bialecki a...@getopt.org wrote:

From: Andrzej Bialecki a...@getopt.org
Subject: Re: SOLR-1316 How To Implement this autosuggest component ???
To: solr-user@lucene.apache.org
Date: Tuesday, March 30, 2010, 9:59 AM

On 2010-03-30 15:42, Robert Muir wrote:

On Mon, Mar 29, 2010 at 11:34 PM, Andy angelf...@yahoo.com wrote:

Reading through this thread and SOLR-1316, there seem to be a lot of different ways to implement auto-complete in Solr. I've seen mentions of: EdgeNGrams, TermsComponent, Faceting, TST, Patricia Tries, RadixTree, DAWG.

Another idea is that you can use the Automaton support in the Lucene flexible indexing branch to query the index directly with a DFA that represents whatever terms you want back. The idea is that there really isn't much gain in building a separate Patricia Trie, Radix Tree, or DFA to do this when you can efficiently intersect a DFA with the existing terms dictionary. I don't really understand what autosuggest needs to do, but if you are doing things like looking for misspellings you can easily build a DFA that recognizes terms within some short edit distance with the support that's there (the LevenshteinAutomata class), to quickly get back candidates. You can intersect/concatenate/union these DFAs with prefix or suffix DFAs too. I don't really understand what the algorithm should do, but I'm happy to try to help.

The problem is a bit more complicated. There are two issues:

* simple term-level completion often produces wrong results for multi-term queries (which are usually rewritten as weak phrase queries),
* the weights of suggestions should not correspond directly to IDF in the index - much better results can be obtained when they correspond to the frequency of terms/phrases in the query logs ...

TermsComponent and EdgeNGrams, while simple to use, suffer from both issues.

Thanks. I actually have 2 use cases for autosuggest:

1) The normal one - I want to suggest search terms to users after they've typed a few letters, just like Google Suggest. Looks like for this use case SOLR-1316 is the best option. Right?

Hopefully, yes - it depends on how you intend to populate the TST. If you populate it from the main index, then (unless you have indexed phrases) there won't be any benefit over the TermsComponent. It may be faster, but it will take more RAM. If you populate it from a list of top-N queries, then SOLR-1316 is the way to go.

2) I have a field city with values that are entered by users. When a user is entering his city, I want to make suggestions based on what cities have already been entered so far by other users - in order to reduce the chance of duplication. What method would you recommend for this use case?

If the city field is not analyzed, then TermsComponent is easiest to use. If it is analyzed, but the vast majority of cities are single terms, then TermsComponent is OK too. If you want to assign different priorities to suggestions (other than a simple IDF-based priority), or have many city names consisting of multiple tokens, then use SOLR-1316.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
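For the simple TermsComponent route, a prefix lookup looks roughly like this (assuming a /terms handler with the TermsComponent registered, as in the 1.4 example configs; the city field name comes from the question above):

  http://localhost:8983/solr/terms?terms=true&terms.fl=city&terms.prefix=San&terms.limit=10

This returns indexed terms in the city field starting with "San", which the UI can present as suggestions.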
Re: Open Lucene indexs dynamically
On Mar 31, 2010, at 5:05 AM, Pierre FILSTROFF wrote:

Hello! I have a question about a specific configuration of Solr: I want to configure Solr to open Lucene indexes only when it needs them (i.e. open indexes dynamically), and close them after a determined duration. The goal of this configuration is also to have a shorter initial loading of Solr (approximately 2 hours at this moment).

How big is your index? And what kind of hardware are you running? 2 hours seems really long.

Could Solr support this kind of behavior? And what could the configuration be?

Thank you!
++ Pierre

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
exclude words?
Hey there, I'm sure this is a pretty easy thing, but I can't find the solution: can I search for text with one particular word (e.g. books) not in it, so that Solr returns all documents that don't have books somewhere in them?

Thanks for the help,
Sebastian
Re: exclude words?
I think you can use something like q=hello world -books. That should do it.

On Wed, Mar 31, 2010 at 7:34 PM, Sebastian Funk qbasti.f...@googlemail.com wrote:

Hey there, I'm sure this is a pretty easy thing, but I can't find the solution: can I search for text with one particular word (e.g. books) not in it, so that Solr returns all documents that don't have books somewhere in them? Thanks for the help, Sebastian

--
- Siddhant
Persistently enable/disable replication remotely...
Hi, Would anyone know if it is possible to persistently enable/disable replication remotely using SolrJ (HTTP API)? I.e. if

  http://slave_host:port/solr/replication?command=disablepoll

is issued, and slave_host is subsequently restarted, is there a way to ensure it remembers it isn't supposed to be replicating?

Thanks,
Peter
Watch-words for the postmaster
Hi, Sorry for putting this on the user list directly - there seems to be no allowed response from the postmaster account. I couldn't figure out why there was so little on the subject of replic_tion on the user-list, until I posted a question about it. I found that the email server's spam blocker doesn't like the word replic_tion, as it thinks the sender is trying to sell dodgy wristware. Remplikeytion is such a great feature in Solr, it seems a shame for the forum to miss out on talking/learning more about it simply because of cheap bling. If the postmaster of this list could perhaps remove this and related stem words from its block list, that would be great.
Experience with Solr and JVM heap sizes over 2 GB
Hello all, We have been running a configuration in production with 3 Solr instances under one Tomcat with 16GB allocated to the JVM (java -Xmx16384m -Xms16384m). I just noticed the warning in the LucidWorks Certified Distribution Reference Guide that warns against using more than 2GB (see below). Are other people using systems with over 2GB allocated to the JVM? What steps can we take to determine if performance is being adversely affected by the large heap size?

“The larger the heap the longer it takes to do garbage collection. This can mean minor, random pauses or, in extreme cases, “freeze the world” pauses of a minute or more. As a practical matter, this can become a serious problem for heap sizes that exceed about two gigabytes, even if far more physical memory is available.”

http://www.lucidimagination.com/search/document/CDRG_ch08_8.4.1?q=memory%20caching

Tom Burton-West

--
  <lst name="jvm">
    <str name="version">14.2-b01</str>
    <str name="name">Java HotSpot(TM) 64-Bit Server VM</str>
    <int name="processors">16</int>
    <lst name="memory">
      <str name="free">2.3 GB</str>
      <str name="total">15.3 GB</str>
      <str name="max">15.3 GB</str>
      <str name="used">13.1 GB (%85.3)</str>
    </lst>
  </lst>
Re: Watch-words for the postmaster
Hmmm, I see plenty of stuff in the archives that mention replication. Testing: replica replication fake fake watch fake watch rolex rolex replica top quality replica watch brands authentic -Yonik http://www.lucidimagination.com On Wed, Mar 31, 2010 at 11:09 AM, Peter Sturge peter.stu...@googlemail.com wrote: Hi, Sorry for putting this on the user list directly - there seems to be no allowed response from the postmaster account. I couldn't figure out why there was so little on the subject of replic_tion on the user-list, until I posted a question about it. I found that the email server's spam blocker doesn't like the word replic_tion, as it thinks the sender is trying to sell dodgy wristware. Remplikeytion is such a great feature in Solr, it seems a shame for the forum to miss out on talking/learning more about it simply because of cheap bling. If the postmaster of this list could perhaps remove this and related stem words from its block list, that would be great.
Re: Experience with Solr and JVM heap sizes over 2 GB
I have used up to 27GB of heap with no issues, both SOLR and (just) Lucene.

-Glen Newton
http://zzzoot.blogspot.com/

On 31 March 2010 11:34, Burton-West, Tom tburt...@umich.edu wrote:

Hello all, We have been running a configuration in production with 3 Solr instances under one Tomcat with 16GB allocated to the JVM (java -Xmx16384m -Xms16384m). I just noticed the warning in the LucidWorks Certified Distribution Reference Guide that warns against using more than 2GB (see below). Are other people using systems with over 2GB allocated to the JVM? What steps can we take to determine if performance is being adversely affected by the large heap size?

“The larger the heap the longer it takes to do garbage collection. This can mean minor, random pauses or, in extreme cases, “freeze the world” pauses of a minute or more. As a practical matter, this can become a serious problem for heap sizes that exceed about two gigabytes, even if far more physical memory is available.”

http://www.lucidimagination.com/search/document/CDRG_ch08_8.4.1?q=memory%20caching

Tom Burton-West
Re: Experience with Solr and JVM heap sizes over 2 GB
On Wed, Mar 31, 2010 at 11:34 AM, Burton-West, Tom tburt...@umich.edu wrote:

Hello all, We have been running a configuration in production with 3 Solr instances under one Tomcat with 16GB allocated to the JVM (java -Xmx16384m -Xms16384m). I just noticed the warning in the LucidWorks Certified Distribution Reference Guide that warns against using more than 2GB (see below). Are other people using systems with over 2GB allocated to the JVM?

Plenty of people. People always want specific numbers for the general case (how many documents, how large a heap, etc.)... and those specific numbers are always wrong for a good percentage of the population and their specific setups and needs :-) In general, you don't want your heap larger than it needs to be - this leaves more free RAM for the OS to cache important parts of the Lucene index files.

What steps can we take to determine if performance is being adversely affected by the large heap size?

If your query response latencies are acceptable, I wouldn't worry about it. If they normally are, but sometimes aren't, then GC could be the issue. One way to investigate further is to use the -verbose:gc and -XX:+PrintGC* options: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp

-Yonik
http://www.lucidimagination.com
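For example, a launch line with GC logging enabled might look like this (a sketch for the HotSpot JVM of that era; the log path and the start.jar launcher are assumptions):

  java -Xmx16384m -Xms16384m \
       -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
       -Xloggc:/var/log/solr-gc.log \
       -jar start.jar

Long entries in the resulting log (full collections taking seconds or more) would point to GC as the source of latency spikes.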
Re: Delivery Status Notification (Failure)
Hi Yonik, Well, I tried to reply to your email with all those spambot favourites, and I got the mailer daemon error below. Maybe it likes 'private' domains like LucidImagination better than googlemail... To be honest, I'd much rather live with misspelling replucidate than put up with spambots. Thanks

On Wed, Mar 31, 2010 at 5:18 PM, Mail Delivery Subsystem mailer-dae...@googlemail.com wrote:

Delivery to the following recipient failed permanently: solr-user@lucene.apache.org

Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (8.8) exceeded threshold (state 18).

- Original message -
MIME-Version: 1.0
Received: by 10.216.176.72 with HTTP; Wed, 31 Mar 2010 09:18:11 -0700 (PDT)
In-Reply-To: y2gc68e39171003310838ja9ef7bf6g76b17634ec60b...@mail.gmail.com
References: r2s7cd732451003310809oa3915e87nb11383c7de363...@mail.gmail.com y2gc68e39171003310838ja9ef7bf6g76b17634ec60b...@mail.gmail.com
Date: Wed, 31 Mar 2010 17:18:11 +0100
Received: by 10.216.85.140 with SMTP id u12mr751266wee.78.1270052291422; Wed, 31 Mar 2010 09:18:11 -0700 (PDT)
Message-ID: i2g7cd732451003310918qb0d547c0gbcc00d73c332...@mail.gmail.com
Subject: Re: Watch-words for the postmaster
From: Peter Sturge peter.stu...@googlemail.com
To: solr-user@lucene.apache.org, yo...@lucidimagination.com
Content-Type: multipart/alternative; boundary=0016e6db2ad6ad296e04831b1736

Indeed. Must be me, then - and I don't even wear a watch...

On Wed, Mar 31, 2010 at 4:38 PM, Yonik Seeley yo...@lucidimagination.com wrote: (...content removed to allow send...)
Query time only Ranges
Hi All, I am working on a use case wherein I need to query just time ranges, without a date component - e.g. search for docs between 4pm and 6pm. Approaches:

- create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a fixed date component
- or create a field for the hour (hh) only
- or maybe create a custom field type for time only

Please suggest which would be a good approach, or any other approach if possible.

Ankit
Re: Some indexing requests to Solr fail
Hi there, Thanks for the reply! Our backend code is currently set to commit every time it sends over a batch of documents - so it depends on how big the batch is and how often edits occur - probably too often. I've looked at the code, and the SolrJ commit() method takes two parameters - one is called waitSearcher, and another waitFlush. They aren't really documented too well, but I assume that the waitSearcher bool (currently set to false) may be part of the problem. I am considering removing the code that calls the commit() method altogether and relying on the settings for DirectUpdateHandler2 to determine when commits actually get done. That way we can tweak it on the Solr side without having to recompile and redeploy our main app (or having to add new settings, and code to handle them, to our main app). Out of curiosity, how are people doing optimize() calls? Are you doing them immediately after every commit(), or periodically as part of a job?

Jon

On 31 Mar 2010, at 05:11, Lance Norskog wrote:

How often do you commit? New searchers are only created after a commit. You notice that handleCommit is in the stack trace :) This means that commits are happening too often for the amount of other traffic currently happening, and so it can't finish creating the searcher before the next commit starts the next searcher. The service unavailable messages are roughly the same problem: these commits might be timing out because the other end is too busy doing commits. You might try using autocommit instead: commits can happen every N documents, every T seconds, or both. This keeps the commit overhead to a controlled amount, and commits should stay behind warming up previous searchers.

On Tue, Mar 30, 2010 at 7:15 AM, Jon Poulton jon.poul...@vyre.com wrote:

Hi there, We have a setup in which our main application (running on a separate Tomcat instance on the same machine) uses SolrJ calls to an instance of Solr running on the same box. SolrJ is used both for indexing and searching Solr. Searching seems to be working fine, but quite frequently we see the following stack trace in our application logs:

org.apache.solr.common.SolrException: Service Unavailable Service Unavailable
request: http://localhost:8070/solr/unify/update/javabin
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
        at vyre.content.rabida.index.RemoteIndexingThread.sendIndexRequest(RemoteIndexingThread.java:283)
        at vyre.content.rabida.index.RemoteIndexingThread.commitBatch(RemoteIndexingThread.java:195)
        at vyre.util.thread.AbstractBatchProcessor.commit(AbstractBatchProcessor.java:93)
        at vyre.util.thread.AbstractBatchProcessor.run(AbstractBatchProcessor.java:117)
        at java.lang.Thread.run(Thread.java:619)

Looking in the Solr logs, there does not appear to be any problem. The host and port number are correct; it's just that sometimes our content gets indexed (visible in the Solr logs), and sometimes it doesn't (nothing visible in the Solr logs). I'm not sure what could be causing this problem, but I can hazard a couple of guesses; is there any upper limit on the size of a javabin request, or any point at which the service would decide that the POST was too large? Has anyone else encountered a similar problem?
On a final note, scrolling back through the Solr logs does reveal the following:

29-Mar-2010 17:05:25 org.apache.solr.core.SolrCore getSearcher
WARNING: [unify] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
29-Mar-2010 17:05:25 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 22
29-Mar-2010 17:05:25 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1029)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:48)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at
Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge
Hi Mitch, The configuration that you have seems to be perfectly fine. Could you please let us know what error you are seeing in the logs? Also, could you please confirm whether you have the mysql-connector-java-5.1.12-bin.jar under the lib folder? The following is the configuration that I used, and it works perfectly fine:

  <dataSource driver="com.mysql.jdbc.Driver" autoCommit="true"
              url="jdbc:mysql://localhost:3306/mysql"
              user="username" password="password"/>

Thanks,
sS

- Original Message -
From: MitchK mitc...@web.de
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:57:04 AM
Subject: Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

Hi, sorry, I don't have much experience doing this with Solr, but my data-config.xml looks like:

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/db"
                user="user" password="..." batchSize="-1"/>
    <document>
    </document>
  </dataConfig>

The db at the end of the URL stands for the database you want to use. Perhaps this helps a little bit. Kind regards, Mitch
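For completeness, a dataSource alone won't import anything; a minimal sketch of a full data-config.xml with an entity (the item table and its columns are hypothetical):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/mysql"
                user="username" password="password"/>
    <document>
      <entity name="item" query="SELECT id, name FROM item">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
      </entity>
    </document>
  </dataConfig>

The field elements map result-set columns onto schema.xml fields of the same names.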
Re: Query time only Ranges
Hi Ankit, Try the following approach: create a query like

  [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding down to the HOUR specified. For example, the query

  [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]

would be equivalent to

  [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Regards,
sS

- Original Message -
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges

Hi All, I am working on a use case wherein I need to query just time ranges, without a date component - e.g. search for docs between 4pm and 6pm. Approaches: create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a fixed date component; or create a field for the hour (hh) only; or maybe create a custom field type for time only. Please suggest which would be a good approach, or any other approach if possible. Ankit
Re: Query time only Ranges
Small typo... corrected and resending: the query

  [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR]

would be equivalent to

  [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z]

Thx,
Tiru

- Original Message -
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:36:22 PM
Subject: Re: Query time only Ranges

Hi Ankit, Try the following approach: create a query like [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]. Solr will automatically take care of rounding down to the HOUR specified. For example, the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR] would be equivalent to [1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]. Regards, sS

- Original Message -
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges

Hi All, I am working on a use case wherein I need to query just time ranges, without a date component - e.g. search for docs between 4pm and 6pm. Approaches: create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z] - a fixed date component; or create a field for the hour (hh) only; or maybe create a custom field type for time only. Please suggest which would be a good approach, or any other approach if possible. Ankit
Re: Spatial / Local Solr radius
Mauricio, I hooked up the spatial Solr plugin to the Eclipse debugger and narrowed the problem down to CartesianShapeFilter.getBoxShape(). The algorithm used in the method can produce values of startX that are greater than endX, depending on the tier level returned by CartesianTierPlotter.bestFit(). In this case, the for loop is skipped altogether and the method returns a CartesianShape object with an empty boxIds list. I notice the problem when I have small, geographically sparse datasets. I'm going to shoot the jteam an email regarding this.

Michael D.

On Tue, Mar 30, 2010 at 5:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote:

Hi Michael, I exchanged a few mails with jteam; ultimately I realized my longitudes' signs were inverted, so I was mapping to China instead of the U.S. Still a bug, but inverting those longitudes fixed the problem in my case, since I'm not running world-wide searches. Before that, I ran a test to determine which radii failed for a 3x3 grid of lat/long points with radii between 10 and 2500; if you're interested I can send you the results to compare. Also, I'm running RC3; I see RC4 is out but haven't tried it. It would be interesting to see if this happens with the new spatial functions in trunk.

-- Mauricio

On Tue, Mar 30, 2010 at 4:00 PM, Michael solrco...@gmail.com wrote:

Mauricio, I was wondering whether you had heard anything back from jteam regarding this issue. I have also noticed it and was wondering why it was happening. One thing I noticed is that this problem only appears for sparse datasets as compared to dense ones. For example, I have two datasets I've been testing with - one with 56 U.S. cities (the sparse set) and one with over 197000 towns and cities (the dense set). The dense set exhibited no problems with consistency searching at various radii, but the sparse set exhibited the same issues you experienced.

Michael D.

On Mon, Dec 28, 2009 at 7:39 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote:

It's jteam's plugin ( http://www.jteam.nl/news/spatialsolr ), which AFAIK is just the latest patch for SOLR-773 packaged as a stand-alone plugin. I'll try to contact jteam directly. Thanks, Mauricio

On Mon, Dec 28, 2009 at 8:02 PM, Grant Ingersoll gsing...@apache.org wrote:

On Dec 28, 2009, at 11:47 AM, Mauricio Scheffer wrote:

q={!spatial lat=43.705 long=116.3635 radius=100}*:*

What QParser is the spatial plugin? I don't know of any such QParser in Solr. Is this a third party tool? If so, I'd suggest asking on that list.

with no other parameters. When changing the radius to 250 I get no results. In my config I have startTier = 9 and endTier = 17 (default values).

On Mon, Dec 28, 2009 at 1:24 PM, Grant Ingersoll gsi...@gmail.com wrote:

What do your queries look like?

On Dec 28, 2009, at 9:30 AM, Mauricio Scheffer wrote:

Hi everyone, I'm getting inconsistent behavior from Spatial Solr when searching with different radii. For the same lat/long I get:

radius=1 - 1 result
radius=10 - 0 results
radius=25 - 2 results
radius=100 - 2 results
radius=250 - 0 results

I don't understand why radius=10 and 250 return no results. Is this a known bug? I'm using the default configuration as specified in the PDF. BTW I also tried LocalSolr with the same results. Thanks, Mauricio

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Spatial / Local Solr radius
Michael, This was a problem I encountered as well, sometime late summer last year. My memory is a bit hazy on the details, but as far as I remember the problem centered around the tier level being set incorrectly. Additionally, I think there's a JUnit test (perhaps CartesianShapeFilterTest?) that would indicate the source of the problem, but large sections of the test are invalidated/commented out for the spatial change(s). Again, I haven't touched this code in several months, but that's my recollection of the issue. Either way, it's certainly not an isolated problem, though my test datasets were also sparse and geographically distant.

-Sean

-- Forwarded Message
From: Michael solrco...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Wed, 31 Mar 2010 13:33:39 -0700
To: solr-user@lucene.apache.org
Subject: Re: Spatial / Local Solr radius
-- End of Forwarded Message
Re: Query time only Ranges
In that case, you could just calculate an offset from 00:00:00 in seconds (ignoring the date). Pretty simple.

On Wed, Mar 31, 2010 at 4:57 PM, abhatna...@vantage.com abhatna...@vantage.com wrote:

Hi Sashi, Could you elaborate on point no. 1 in light of the case where a field should hold just a time?

Ankit
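Spelled out, that approach stores seconds-since-midnight in an integer field and range-queries it. The secondsOfDay field name is an assumption, and in a 1.4 schema it would need a range-friendly type such as sint or tint. For 4pm to 6pm, 16*3600 = 57600 and 18*3600 = 64800, so:

  q=secondsOfDay:[57600 TO 64800]

The date part of each timestamp is simply dropped at index time when computing the offset.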
Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge
I can only speculate, since I am not sure why you are using { and } in your declarations. I don't really know what you expect {MetaMatrix ODBC} to do. The mysql-connector can be loaded because I have set a classpath to it (it is stored in my JRE's root directory). Hope this helps?

Mitch
Re: resetting stats
: reloading the core just to reset the stats definitely seems like throwing
: out the baby with the bathwater.

Agreed about throwing out the baby with the bath water - if stats need to be reset, though, then that's the only way today. A reset-stats button would be a nice way to avoid having to do this.

: Huh? ... how would having an extra core (with no data) help you with
: getting aggregate stats from your request handlers?

Say I have 3 cores named core0, core1, and core2, where only core1 and core2 have documents and caches. If all my searches hit core0, and core0 shards out to core1 and core2, then the stats from core0 would be accurate for errors, timeouts, totalTime, avgTimePerRequest, avgRequestsPerSecond, etc. Obviously this is based upon the following two assumptions: 1) the request handlers you are using/monitoring are distributed-aware, and 2) you are using distributed search and all your queries are going to an aggregating core. I'm not suggesting that anyone needs a setup like this, just pointing out that this type of setup somewhat avoids throwing the baby out with the bath water, by not putting a baby in the bath water that is going to be thrown out (core0).

On Wed, Mar 31, 2010 at 6:40 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: You can reload the core on which you want to reset the stats - this lets you
: keep the engine up and running without requiring you restart Solr. If you

reloading the core just to reset the stats definitely seems like throwing out the baby with the bathwater.

: have a separate core for aggregating (i.e. a core that contains no data and
: has no caches) then the overhead for reloading that core is negligible and
: the time to reload is essentially zero.

Huh? ... how would having an extra core (with no data) help you with getting aggregate stats from your request handlers? If you want to know the avgTimePerRequest from handlerA, that number isn't going to be useful if it comes from a core that isn't what your users are querying against.

: : Is there a way to reset the stats counters? For example, in the Query handler,
: : avgTimePerRequest is not much use after a while as it is an avg since the
: : server started.

-Hoss
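For reference, the reload itself can be done over the CoreAdmin interface; something like the following, where host, port, and core name are placeholders:

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

This swaps in a freshly initialized core - and freshly zeroed handler stats - without restarting the servlet container.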
Re: resetting stats
: Say I have 3 cores named core0, core1, and core2, where only core1 and core2
: have documents and caches. If all my searches hit core0, and core0 shards
: out to core1 and core2, then the stats from core0 would be accurate for
: errors, timeouts, totalTime, avgTimePerRequest, avgRequestsPerSecond, etc.

Ahhh yes. (I see what you mean by aggregating core now ... I thought you meant a core just for aggregating stats.)

*If* you are using distributed search, then you can gather stats from the core you use for collating/aggregating from the other shards, and reloading that core should be cheap. But if you aren't already using distributed search, it would be a bad idea from a performance standpoint to add it just to take advantage of being able to reload the coordinator core (the overhead of searching one distributed shard vs. doing the same query directly is usually very measurable, even if the shard is the same Solr instance as your coordinator).

-Hoss
Re: exclude words?
: I think you can use something like q=hello world -books. Should do.

or just q=-books ... finds all docs that do not have books (in the default search field).

: so solr returns all documents, that don't have books somewhere in them?

somewhere is kinda vague ... if you mean don't have the word 'books' in any field, then not unless you use copyField to create a catchall field you can query against containing all the text.

-Hoss
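A catchall of that sort is usually set up in schema.xml roughly like this (the text field name mirrors the example schema; the wildcard copy is one option among several):

  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="text"/>
  <defaultSearchField>text</defaultSearchField>

With that in place, q=-books excludes documents containing 'books' in any copied field.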
Re: Solr crashing while extracting from very simple text file
Does anyone have any thoughts or suggestions on this? I guess it's really a Tika problem. Should I try to report it to the Tika project? I wonder if someone could try it to see if it's a general problem or just me. I can reproduce it by firing up the nano editor and creating a file with XXBLE on one line and nothing else. Try indexing that, and Solr / Tika crashes. I can avoid it by editing the file slightly, but I haven't really been able to discover a consistent pattern. It works if I change the word to lower case. Also, a three-line file like this works:

a
a
XXBLE

but not:

x
x
XXBLE

It's a bit unfortunate, because a similar word (a person's name, ??BLE) with the same problem appears frequently in upper case near the top of my files.

Cheers
Ross

On Sun, Mar 21, 2010 at 12:58 PM, Ross tetr...@gmail.com wrote:

Hi all, I'm trying to import some text files. I'm mostly following Avi Rappoport's tutorial. Some of my files cause Solr to crash while indexing. I've narrowed it down to a very simple example. I have a file named test.txt with one line. That line is the word XXBLE and nothing else. This is the command I'm using:

  curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true" -F myfile=@test.txt

The result is pasted below. Other files work just fine. The problem seems to be related to the letters B and E. If I change them to something else or make them lower case, then it works. In my real files the XX is something else, but the result is the same. It's a common word in the files. I guess for this quick and dirty job I could do a bulk replace in the files to make it lower case. Is there any workaround for this?

Thanks
Ross

HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
        at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
        ... 18 more
Caused by: java.lang.NullPointerException
        at
Re: Shred queries on EmbeddedSolrServer
You can create and destroy cores over the HTTP interface: http://www.lucidimagination.com/search/document/CDRG_ch08_8.2.5

But you are right, the embedded Solr API does not support distributed search across multiple cores. See org.apache.solr.handler.component.SearchHandler.submit(), which very definitely only does HTTP requests. https://issues.apache.org/jira/browse/SOLR-1858 requests this feature.

On Wed, Mar 31, 2010 at 3:51 AM, Claudio Atzori claudio.atz...@isti.cnr.it wrote:

In my application I need to create and destroy indexes via Java code, so to bypass the HTTP requests I'm using the EmbeddedSolrServer, and I am creating different SolrCore(s), one per index I need. Now the point is that a requirement of my application is the capability to perform a query on a specific index, on a subset of indexes, or on every index. I have been looking at the shards parameter:

  http://localhost:8080/solr/core1/select?shards=localhost:8080/solr/core1,localhost:8080/solr/core2&q=some query...

...and OK, but my Solr core instances don't expose an HTTP interface, so how can I shard a query between all my Solr cores?

Thanks in advance,
Claudio

--
Lance Norskog
goks...@gmail.com
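For the create/destroy part over HTTP, the CoreAdmin handler calls look roughly like this (host, port, and the core/instanceDir names are placeholders):

  http://localhost:8080/solr/admin/cores?action=CREATE&name=core3&instanceDir=core3
  http://localhost:8080/solr/admin/cores?action=UNLOAD&core=core3

Distributed search across those cores, though, still goes through the shards parameter and therefore HTTP, as noted above.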
Re: Some indexing requests to Solr fail
'waitFlush' means 'wait until the data from this commit is completely written to disk'. 'waitSearcher' means 'wait until Solr has completely finished loading up the new index from what it wrote to disk'.

Optimize rearranges the entire on-disk footprint of the index. It needs a separate amount of free disk space in the same partition. Usually people run optimize overnight, not during active production hours. There is a way to limit the optimize pass so that it makes the index 'more optimized': the maxSegments parameter: http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22

On Wed, Mar 31, 2010 at 10:04 AM, Jon Poulton jon.poul...@vyre.com wrote:

Hi there, Thanks for the reply! Our backend code is currently set to commit every time it sends over a batch of documents - so it depends on how big the batch is and how often edits occur - probably too often. I've looked at the code, and the SolrJ commit() method takes two parameters - one is called waitSearcher, and another waitFlush. They aren't really documented too well, but I assume that the waitSearcher bool (currently set to false) may be part of the problem. I am considering removing the code that calls the commit() method altogether and relying on the settings for DirectUpdateHandler2 to determine when commits actually get done. That way we can tweak it on the Solr side without having to recompile and redeploy our main app (or having to add new settings, and code to handle them, to our main app). Out of curiosity, how are people doing optimize() calls? Are you doing them immediately after every commit(), or periodically as part of a job?

Jon

On 31 Mar 2010, at 05:11, Lance Norskog wrote:

How often do you commit? New searchers are only created after a commit. You notice that handleCommit is in the stack trace :) This means that commits are happening too often for the amount of other traffic currently happening, and so it can't finish creating the searcher before the next commit starts the next searcher. The service unavailable messages are roughly the same problem: these commits might be timing out because the other end is too busy doing commits. You might try using autocommit instead: commits can happen every N documents, every T seconds, or both. This keeps the commit overhead to a controlled amount, and commits should stay behind warming up previous searchers.

On Tue, Mar 30, 2010 at 7:15 AM, Jon Poulton jon.poul...@vyre.com wrote:

Hi there, We have a setup in which our main application (running on a separate Tomcat instance on the same machine) uses SolrJ calls to an instance of Solr running on the same box. SolrJ is used both for indexing and searching Solr. Searching seems to be working fine, but quite frequently we see the following stack trace in our application logs:

org.apache.solr.common.SolrException: Service Unavailable Service Unavailable
request: http://localhost:8070/solr/unify/update/javabin
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
        at vyre.content.rabida.index.RemoteIndexingThread.sendIndexRequest(RemoteIndexingThread.java:283)
        at vyre.content.rabida.index.RemoteIndexingThread.commitBatch(RemoteIndexingThread.java:195)
        at vyre.util.thread.AbstractBatchProcessor.commit(AbstractBatchProcessor.java:93)
        at vyre.util.thread.AbstractBatchProcessor.run(AbstractBatchProcessor.java:117)
        at java.lang.Thread.run(Thread.java:619)

Looking in the Solr logs, there does not appear to be any problem. The host and port number are correct; it's just that sometimes our content gets indexed (visible in the Solr logs), and sometimes it doesn't (nothing visible in the Solr logs). I'm not sure what could be causing this problem, but I can hazard a couple of guesses; is there any upper limit on the size of a javabin request, or any point at which the service would decide that the POST was too large? Has anyone else encountered a similar problem? On a final note, scrolling back through the Solr logs does reveal the following:

29-Mar-2010 17:05:25 org.apache.solr.core.SolrCore getSearcher
WARNING: [unify] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
29-Mar-2010 17:05:25 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 22
29-Mar-2010 17:05:25 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1029)
        at
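As a sketch of the autocommit route, the relevant solrconfig.xml block looks something like this (the thresholds are illustrative only):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>   <!-- commit after this many added docs -->
      <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>

and a size-limited optimize can be posted to /update as:

  <optimize maxSegments="16"/>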
backup command
Hi, I'm running the official Solr 1.4 release, and I'm encountering an exception, being told that a file does not exist, when using the Java replication command=backup. It looks very much like SOLR-1475, which was fixed for 1.4. I tried adding a deletionPolicy within solrconfig.xml to keep commit points for 30 minutes, but I still receive the error. Our index is about 25G. On occasion I have seen the backup finish, but unfortunately it fails more often. Does anyone have any pointers?

Thanks for your help,
Jake
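For reference, the backup is triggered against the master's replication handler, and the commit-point retention Jake describes is configured roughly like this (a sketch; the retention values are illustrative):

  http://master_host:port/solr/replication?command=backup

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
    <str name="maxCommitAge">30MINUTES</str>
  </deletionPolicy>

Keeping commit points around long enough for the backup to copy all referenced files is the usual workaround for the deleted-file race described in SOLR-1475.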