Re: Configure Collection Distribution in Solr 1.3
As some people have mentioned here on this mailing list, the Solr 1.3 distribution scripts (snappuller / snapshooter etc.) do not work on Windows. Some have indicated that it might be possible to use Cygwin, but I have my doubts. So unfortunately, Windows users suffer with regard to replication (although I would recommend everyone to use Unix for running servers ;) ). That being said, you can use Solr 1.4 (one of the nightly builds), where you get built-in replication that is easily configured through the Solr server configuration, and this works on Windows as well! So, if you don't have any real reason not to upgrade, I suggest that you try out Solr 1.4 (which also gives you lots of new features and major improvements!) Cheers, Aleksander On Tue, 09 Jun 2009 21:00:27 +0200, MaheshR mahesh.ray...@gmail.com wrote: Hi Aleksander, I went through the links below and successfully configured rsync using Cygwin on Windows XP. In the Solr documentation they mention many script files like rsync-enable, snapshooter, etc. These are all Unix-based script files. Where do I get these script files for Windows? Any help on this would be greatly appreciated. Thanks MaheshR. Aleksander M. Stensby wrote: You'll find everything you need in the wiki. http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline http://wiki.apache.org/solr/SolrCollectionDistributionScripts If things are still uncertain, I've written a guide from when we used the Solr distribution scripts on our Lucene index earlier. You can read that guide here: http://www.integrasco.no/index.php?option=com_content&view=article&id=51:lucene-index-replication&catid=35:blog&Itemid=53 Cheers, Aleksander On Mon, 08 Jun 2009 18:22:01 +0200, MaheshR mahesh.ray...@gmail.com wrote: Hi, we configured a multi-core Solr 1.3 server in a Tomcat 6.0.18 servlet container. It's working great. Now I need to configure collection distribution to replicate index data between a master and 2 slaves.
Please provide step-by-step instructions for configuring collection distribution between the master and the slaves. Thanks in advance. Thanks Mahesh. -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
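For reference, the 1.4 built-in replication that Aleksander mentions is configured with a ReplicationHandler entry in solrconfig.xml on each side; the host name, conf file list, and poll interval below are placeholders:

```xml
<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because it is all HTTP, no rsync daemon or Unix shell scripts are involved, which is why it works on Windows.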
Re: Query on date fields
Hello, for this you can simply use the nifty date functions supplied by Solr (given that you have indexed your fields with the Solr date field type). If I understand you correctly, you can achieve what you want with the following query: displayStartDate:[* TO NOW] AND displayEndDate:[NOW TO *] Cheers, Aleksander On Mon, 08 Jun 2009 09:17:26 +0200, prerna07 pkhandelw...@sapient.com wrote: Hi, I have two date attributes in my indexes: DisplayStartDate_dt DisplayEndDate_dt I need to fetch results where today's date lies between DisplayStartDate and DisplayEndDate. However, I cannot send hardcoded displayStartDate and displayEndDate values in the query, as there are 1000 different dates in the indexes. Please suggest the query. Thanks, Prerna -- Aleksander M. Stensby Lead software developer and system architect Integrasco A/S www.integrasco.no http://twitter.com/Integrasco Please consider the environment before printing all or any of this e-mail
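A small sketch of building that query string from the poster's field names (the /select path is the stock handler; URL escaping of the brackets and colons is the only subtlety worth showing):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DateRangeQuery {

    // Builds the "today lies between start and end" query described above:
    // start <= NOW is expressed as start:[* TO NOW], end >= NOW as end:[NOW TO *].
    static String buildQuery(String startField, String endField) {
        String q = startField + ":[* TO NOW] AND " + endField + ":[NOW TO *]";
        return "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("DisplayStartDate_dt", "DisplayEndDate_dt"));
    }
}
```

NOW is resolved by Solr at query time, so the same query string keeps working as the dates in the index change.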
highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..
Hi, I'm trying to highlight based on a (multivalued) field (prefix2) that has (among other things) an EdgeNGramFilterFactory defined. Highlighting doesn't increment the start position of the highlighted portion, so in other words the highlighted portion is always the beginning of the field. For example, for prefix2: Orlando Verenigde Staten the query: http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2 returns: <em>Orl</em>ando Verenigde Staten while it should be: Orlando <em>Ver</em>enigde Staten The field def: <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> I checked that removing the EdgeNGramFilterFactory results in correct positioning of highlighting. (But then I can't search for ngrams...) What am I missing? Thanks in advance, Britske -- View this message in context: http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: change data dir location
- It should be possible to specify dataDir directly for a core in solr.xml (over and above specifying it as a variable). It should also be possible to pass the dataDir as a request parameter while creating a core through the REST API.
- A simple scenario which requires this feature is when the location of the data directory depends on runtime parameters (such as free disk space or the number of directories inside a directory).
- You could accomplish this by using symlinks if you are running Solr under Unix.
-- View this message in context: http://www.nabble.com/change-data-dir-location-tp23992946p23996245.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search Phrase Wildcard?
Yes...!! You can search for phrases with wildcards. There is no direct support for it, but you can achieve it like the following. User input: Solr we Query should be: (name:Solr AND (name:we* OR name:we)) OR name:"Solr we" The query builder parses the original input and builds one that simulates a wildcard phrase query. It looks for all the words the user entered and adds a wildcard (*) to the last word. It also searches for the whole phrase the user entered using a phrase query, in case the whole phrase is found in the index. This should work! Let me know if you have any issues... -- View this message in context: http://www.nabble.com/Search-Phrase-Wildcard--tp23978330p23996409.html Sent from the Solr - User mailing list archive at Nabble.com.
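A sketch of such a query builder (the field name is whatever you search against; query-syntax escaping of the input is left out to keep the string construction visible):

```java
public class WildcardPhraseQuery {

    // Simulates a wildcard phrase query as described above: all words ANDed,
    // a wildcard (*) added to the last word, plus a phrase query on the whole input.
    static String build(String field, String input) {
        String phrase = input.trim();
        String[] words = phrase.split("\\s+");
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) sb.append(" AND ");
            if (i == words.length - 1) {
                // last word: match it either as a prefix or as a complete term
                sb.append('(').append(field).append(':').append(words[i]).append("* OR ")
                  .append(field).append(':').append(words[i]).append(')');
            } else {
                sb.append(field).append(':').append(words[i]);
            }
        }
        return sb.append(") OR ").append(field).append(":\"").append(phrase).append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(build("name", "Solr we"));
        // (name:Solr AND (name:we* OR name:we)) OR name:"Solr we"
    }
}
```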
Re: Getting details from delete
Anything matching the delete query will be deleted. It doesn't give you the details of the deleted records. For example, if you send a command like <delete><id>20070424150841</id></delete> it will delete the record with id 20070424150841, but it will not give you the record details, nor tell you whether it was already deleted. We would need to be able to send some query to Solr like http://localhost:8080/solr/delete?id:20070424150841&name:deleted_record. But I don't think that we have this option now. -- View this message in context: http://www.nabble.com/Getting-details-from-%3Cdelete%3E-tp23982798p23996672.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom Request handler Error:
hi, I am new to Apache Solr. I need to create a custom request handler class. So I created a new one and changed the solrconfig.xml file as: <requestHandler name="/select" class="solr.my.MyCustomHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <str name="q">tandem</str> <str name="debugQuery">true</str> </lst> </requestHandler> And in my Java class, the code is: public class MyCustomHandler extends RequestHandlerBase { public CoreContainer coreContainer; public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse response) throws Exception { SolrCore coreToRequest = coreContainer.getCore("core2"); ModifiableSolrParams params = new ModifiableSolrParams(); params.set("echoParams", "explicit"); params.set("q", "text"); params.set("debugQuery", "true"); request = new LocalSolrQueryRequest(coreToRequest, params); SolrRequestHandler reqHandler = coreToRequest.getRequestHandler("/select"); coreToRequest.execute(reqHandler, request, response); coreToRequest.close(); request.close(); } // the abstract methods - getDescription(), getSourceId(), getSource(), getVersion() are // overridden... but these methods don't have any implementations. } But if I search for any text in my webapp from the browser, I get an HTTP 500 error. I don't know how the CoreContainer is initialized. Please, anyone, give me the solution... thanks and regards, Mohamed
Re: DataImportHandler backwards compatibility
Thanks for the info. Just FYI, I've decided to retrofit the 1.3 DataImportHandler with the JDBC driver params functionality to get us around the OOM error problem with as few changes as possible. kevin On 11 Jun 2009, at 14:42, Shalin Shekhar Mangar wrote: On Thu, Jun 11, 2009 at 6:42 PM, Kevin Lloyd kll...@lulu.com wrote: I'm in the process of implementing a DataImportHandler config for Solr 1.3 and I've hit across the Postgresql/JDBC Out Of Memory problem. Whilst the solution is documented on the wiki FAQ page: http://wiki.apache.org/solr/DataImportHandlerFaq it appears that the JDBC driver parameters were implemented in DataImportHandler post the 1.3 release. Yes, those parameters are new in 1.4 (we should note that on the wiki). I was wondering if it would be safe to take a nightly build of just the DataImportHandler contrib and run it against a Solr 1.3 installation? Solr 1.4 has a rollback command which 1.3 did not have. So, you'd need to hack the DataImportHandler code to remove references to RollBackCommand. You can use the 1.4 dih jar with 1.3 if you comment out the code in SolrWriter.rollback method, remove the import of RollbackUpdateCommand and recompile. -- Regards, Shalin Shekhar Mangar.
Re: DataImportHandler backwards compatibility
you can just drop in the new JdbcDataSource.java into the 1.3 release (and build it) and it should be just fine. On Fri, Jun 12, 2009 at 5:55 PM, Kevin Lloyd kll...@lulu.com wrote: [original message snipped] -- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Custom Request handler Error:
is there any error on the console? On Fri, Jun 12, 2009 at 4:26 PM, Noor noo...@opentechindia.com wrote: [original message snipped] -- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Custom Request handler Error:
Yes, a NullPointerException, on the line: SolrCore coreToRequest = coreContainer.getCore("core2"); Noble Paul wrote: is there any error on the console? On Fri, Jun 12, 2009 at 4:26 PM, Noor noo...@opentechindia.com wrote: [original message snipped]
Identification of matching by field
Hi, Is it possible to identify the docId of a document where a match occurred on a specific Term or QueryTerm? For example: I have a document with some fields, and my query has a Query for each field. I need to know the docIds where QueryTermX matched. I know that I can verify the match in the method below, but I think it will not be performant. Searcher searcher = new IndexSearcher(indexReader); final BitSet bits = new BitSet(indexReader.maxDoc()); searcher.search(query, new HitCollector() { public void collect(int doc, float score) { if (indexReader.doc(doc).getField("Name").equals(search_word)) { bits.set(doc); } } }); Thanks
Re: Faceting on text fields
Hi, Sorry for being late to the party, let me try to clear some doubts about Carrot2. Do you know under what circumstances or application should we cluster the whole corpus of documents vs just the search results? I think it depends on what you're trying to achieve. If you'd like to give the users some alternative way of exploring the search results by organizing them into semantically related groups (search results clustering), Carrot2 would be the appropriate tool. Its algorithms are designed to work with small input (up to ~1000 results) and try to provide meaningful labels for each cluster. Currently, Carrot2 has two algorithms: an implementation of Suffix Tree Clustering (STC, a classic in search results clustering research, designed by O. Zamir, implemented by Dawid Weiss) and Lingo (designed and implemented by myself). STC is very fast compared to Lingo, but the latter will usually get you better clusters. Some comparison of the algorithms is here: http://project.carrot2.org/algorithms.html, but ultimately, I'd encourage you to experiment (e.g. using Clustering Workbench). For best results, I'd recommend feeding the algorithms with contextual snippets generated based on the user's query. If the summary could consist of complete sentence(s) containing the query (as opposed to individual words delimited by ...), you should be getting even nicer labels. One important thing for search results clustering is that it is done on-line, so it will add extra time to each search query your server handles. Plus, to get reasonable clusters, you'd need to fetch at least 50 documents from your index, which may put more load on the disks as well (sometimes clustering time may be only a fraction of the time required to get the documents from the index). Finally, to compare search results clustering with facets: UI-wise they may look similar, but I'd say they're two different things that complement each other.
While the list of facets and their values is fairly static (brand names etc.), clusters are less stable -- they're generated dynamically for each search and will vary across queries. Plus, as for any other unsupervised machine learning technique, your clusters will never be 100% correct (as opposed to facets). Almost always you'll be getting one or two clusters that don't make much sense. When it comes to clustering the whole collection, it might be useful in a couple of scenarios: a) if you wanted to get some high level overview of what's in your collection, b) if you'd wanted to e.g. use clusters to re-rank the search results presented to the user (implicit clustering: showing a few documents from each cluster), c) if you wanted to distribute your index based on the semantics of the documents (wild guess, I'm not sure if anyone tried that in practice). In general, I feel clustering the whole index is much harder than search results clustering not only because of the different scale, but also because you'd need to tune the algorithm for your specific needs and data. For example, in scenario a) and a collection of 1M documents: how many top level clusters do you generate? 10? 1? If it's 10, the clusters may end up too general / meaningless, it might be hard to describe them concisely. If it's 1, clusters are likely to be more focused, but hard to browse... I must admit I haven't followed Mahout too closely, maybe there is some nice way of resolving these problems. If you have any other questions about Carrot2, I'll try to answer them here. Alternatively, feel free to join Carrot2 mailing lists. Thanks, Staszek -- http://www.carrot2.org
Re: fq vs. q
Michael Ludwig wrote: Martin Davidsson wrote: I've tried to read up on how to decide, when writing a query, what criteria goes in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there [...] some kind of rule of thumb to help me decide how to split things up when querying against one or more fields. This is a good question. I don't know if there is any such rule. I'm going to sum up my understanding of filter queries hoping that the pros will point out any flaws in my assumptions. I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Michael Ludwig
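As a concrete illustration of the split (field and terms are made up), the same search phrased two ways:

```
# everything in q: the whole expression participates in scoring
# and is cached as one query
/select?q=ipod+AND+category:electronics

# static restriction moved to fq: cached in the filterCache
# independently of q, reusable across searches, and it does not
# influence the relevance scores
/select?q=ipod&fq=category:electronics
```

The second form pays off when the same restriction (a category, a language, a date window) recurs across many different user queries.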
Re: Configure Collection Distribution in Solr 1.3
Thank you very much. I will try using a Solr nightly build. Thanks Mahesh R Aleksander M. Stensby wrote: [original message snipped] -- View this message in context: http://www.nabble.com/Configure-Collection-Distribution-in-Solr-1.3-tp23927332p23999342.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Request handler Error:
I solved this NullPointerException by the following changes. In the Java code: public void handleRequestBody(SolrQueryRequest request, SolrQueryResponse response) throws Exception { SolrCore coreToRequest = request.getCore(); // instead of coreContainer.getCore("core2") ... } and in solrconfig.xml: <requestHandler name="/select" class="solr.my.MyCustomHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <str name="q">tandem</str> <str name="debugQuery">true</str> </lst> </requestHandler> Now my webapp runs fine at http://localhost:8983/mysearch and searching is also working. But these searches are not run through my custom handler, so I suspect the searching is being done by the wrong handler: in the Solr admin statistics page, my custom handler's request count (under QueryHandler) remains 0 and doesn't get incremented when I search for something; rather, the StandardRequestHandler's request count is incremented. And another thing: how do we debug Solr? Please, anybody, help me solve this... Thanks in advance. Noble Paul wrote: is there any error on the console? On Fri, Jun 12, 2009 at 4:26 PM, Noor noo...@opentechindia.com wrote: hi, I am new to Apache Solr. I need to create a custom request handler class.
[original message snipped]
Re: Getting details from delete
On Thu, Jun 11, 2009 at 10:46 AM, Jacob Elder jel...@locamoda.com wrote: Is there any way to get the number of deleted records from a delete request? Nope. I avoided adding it initially because I thought it might become difficult to calculate that data in the future. That's now come true - Lucene now handles the delete and even buffers it until later. So it's not really possible to get the number at the time you send in the delete. -Yonik http://www.lucidimagination.com
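If you need the count, one workaround (my suggestion, not a feature of the delete response) is to run the delete's query with rows=0 first and read numFound, accepting that concurrent updates can make the number approximate:

```
# 1) count the matches without fetching any documents
GET /solr/select?q=id:20070424150841&rows=0      -> read numFound

# 2) then delete them
POST /solr/update   body: <delete><query>id:20070424150841</query></delete>
```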
Re: fq vs. q
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. -- Regards, Shalin Shekhar Mangar.
Re: Custom Request handler Error:
On Fri, Jun 12, 2009 at 8:07 PM, noor noo...@opentechindia.com wrote: <requestHandler name="/select" class="solr.my.MyCustomHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <str name="q">tandem</str> <str name="debugQuery">true</str> </lst> </requestHandler> Now, my webapp runs fine at http://localhost:8983/mysearch and searching is also working fine. But these are not run through my custom handler. Specify the full package to your handler class. Package names starting with solr are loaded in a special way. -- Regards, Shalin Shekhar Mangar.
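In other words, the class attribute should carry the handler's real package name rather than the reserved solr prefix; the package and class below are placeholders for illustration:

```xml
<requestHandler name="/select" class="com.example.search.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>
```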
Stats for all documents and not current search
Hello, I need to retrieve the stats of my index (using StatsComponent). It's not a problem when my query is empty, but the stats are updated according to the current search... and I need the stats of the whole index every time. I'm currently doing two requests (one with an empty query to get the stats, one to get the results). Any idea which could save me one request? Thanks! Vincent -- View this message in context: http://www.nabble.com/Stats-for-all-documents-and-not-current-search-tp24001883p24001883.html Sent from the Solr - User mailing list archive at Nabble.com.
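For reference, the two-request workaround described above might look like this (the stats field name is made up):

```
# 1) whole-index stats, no documents returned
/select?q=*:*&rows=0&stats=true&stats.field=price

# 2) the actual search, without stats
/select?q=user+query&rows=10
```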
Re: highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..
Britske, I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA (the actual ngram code is part of Lucene) that have to do with ngram positions. I have a feeling that may be the problem. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Britske gbr...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, June 12, 2009 6:15:36 AM Subject: highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented.. [original message snipped]
Efficient Sharding with date sorted queries
I have a Solr index which is going to grow 3x in the near future. I'm considering using distributed search and was contemplating what would be the best approach to splitting the index. Since most of the searches performed on the index are sorted by date descending, I'm considering splitting the index based on the created date of the documents. From Yonik Seeley's blog post, http://yonik.wordpress.com/2008/02/27/distributed-search-for-solr/, I've read that there are two phases to sharding. The first phase collects the matching document ids across the shards. Then the second phase collects the stored fields for the documents to be returned. I'm assuming that this second phase's execution is limited by the number of rows requested and the number of results. So let's say I have 2 shards. The first shard has docs with creation dates of this year. The second shard contains documents from the previous year. I run a solr query requesting 10 rows sorted by date and get 11 from the first shard and 3 from the second. Will the initial query only execute the first phase on the second shard? If so, that should result in more optimal performance, right? Thanks, -Tim
Strange missing docs when reindexing with threads.
Hi all! I'm using Solr 1.3 and currently testing reindexing... In my client app, I am sending 17494 requests to add documents, in 3 different scenarios: a) not using threads b) using 1 thread c) using 2 threads In scenario a), everything seems to work fine... In my client log, I see 17494 requests sent to Solr; in Solr's log, I see the same number of 'add' requests received, and if I search the index, I can see the same number of documents. However, if I use 1 thread, I see the right number of requests in the logs, but I only find 15k or so documents (this varies a bit every time I run this scenario). It gets way worse if I use 2 threads... I can see the right number of requests in both logs, but I end up with ~600 docs in the index! In all scenarios, I don't see any errors in the logs... As you can imagine, I need to be able to use multiple threads to speed up the process... It is also very concerning that I don't get any errors anywhere... Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a tiny fraction of the documents can actually be found... Any clues? BTW, these indexers work fine if I use Lucene directly... Thanks in advance for all your help!
Stable release, trunk release - same Tomcat instance
If I want to run the stable 1.3 release and the nightly build under the same Tomcat instance, should that be configured as multiple solr applications, or is there a different configuration to follow?
Re: Efficient Sharding with date sorted queries
On Fri, Jun 12, 2009 at 10:28 PM, Garafola Timothy timgaraf...@gmail.com wrote: So let's say I have 2 shards. The first shard has docs with creation dates of this year. The second shard contains documents from the previous year. I run a solr query requesting 10 rows sorted by date and get 11 from the first shard and 3 from the second. No, you cannot request a specific number of results from a shard. That is something that Solr manages itself: it requests start+rows documents from each shard in order to find the top rows documents to be returned. If you really want to get a specific number of results from a shard, make a query to that shard alone. -- Regards, Shalin Shekhar Mangar.
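For illustration (host names are made up), a distributed query versus a single-shard query:

```
# across both shards: Solr asks each shard for start+rows candidates and merges them
http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=*:*&sort=date+desc&rows=10

# against one shard alone: no shards parameter
http://host1:8983/solr/select?q=*:*&sort=date+desc&rows=10
```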
Re: Strange missing docs when reindexing with threads.
On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace a...@rwmotloc.com wrote: [original message snipped] What is the uniqueKey in your schema.xml? Is it possible that those 17494 documents have a common uniqueKey and are therefore getting overwritten? -- Regards, Shalin Shekhar Mangar.
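For reference, the uniqueKey is declared in schema.xml (the stock example schema uses a field called id); adding a second document with the same key value replaces the first rather than creating a new one, which would explain seeing 17494 cumulative adds but far fewer searchable documents:

```xml
<field name="id" type="string" indexed="true" stored="true" required="true"/>
...
<uniqueKey>id</uniqueKey>
```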
Replication problems on 1.4
I'm trying out the replication features on 1.4 (trunk) with multiple indices, using a setup based on the example multicore config. The first time I tried it (replicating through the admin web interface), it worked fine. I was a little surprised that telling one core to replicate caused both to replicate, since the docs seem to imply that replication is done on a per-core basis, but I was happy to see that it worked. I wanted to replay my steps, so on the slave machine I deleted core0/data/* and core1/data/* and restarted the server. I restarted the server on master just to be sure. Now replication doesn't work at all. I've tried it both through the admin interface and with curl: curl http://localhost:8983/solr/core0/replication?command=snappull The response from curl indicates that the replication was successful, but nothing happened; my slave index is still empty. My only guess as to what's going wrong here is that deleting the coreN/data directory is not a good way to reset a core back to its initial condition. Maybe there's a bit of state somewhere that's making the slave think that it's already up-to-date with this master and so it doesn't need to do any replicating? But this is a wild conjecture; I'd appreciate any tips on where to look for what's going wrong. As to why the replication claims to be successful, I've no idea. Am I missing some crucial log file that explains what's going wrong? It's also possible that this stuff is still in a heavy state of development such that it shouldn't be expected to work by casual users; if that is the case, I can go back to the external-script-based replication features of 1.3. thanks, Phil Hagelberg http://technomancy.us
Re: fq vs. q
On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. -- Regards, Shalin Shekhar Mangar. A very useful article. If I could chip in with another stupid but related issue. The article could explain the difference between fq= and facet.query= and when you should use one in preference to the other. Regards Fergus. -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: Replication problems on 1.4
Phil Hagelberg p...@hagelb.org writes: My only guess as to what's going wrong here is that deleting the coreN/data directory is not a good way to reset a core back to its initial condition. Maybe there's a bit of state somewhere that's making the slave think that it's already up-to-date with this master and so it doesn't need to do any replicating? But this is a wild conjecture; I'd appreciate any tips on where to look for what's going wrong. OK, so I inserted some more documents into the master, and now replication works. I get the feeling it may be due to this line in the master's solrconfig.xml: <str name="replicateAfter">commit</str> Now this is confusing, since it seems that the timing of replication is not up to the master, it's up to the slave. The slave's config has settings for the interval at which to replicate, and you POST to the slave to force a replication. So why is there a setting on the master to control when replication happens? My only interpretation from the config files is that the master has some sort of "you may not replicate from me unless..." condition. This seems pretty undesirable, since you may have a slave that needs to be replicated from the master immediately; it shouldn't have to wait for a commit on the master. Am I misunderstanding what's going on here? It certainly isn't clear from the documents on the wiki, so I'm kind of grasping in the dark. Perhaps I'm missing something. thanks, Phil Hagelberg http://technomancy.us
Re: Stable release, trunk release - same Tomcat instance
Um, yes this works. On Fri, Jun 12, 2009 at 11:12 AM, Jeff Rodenburg jeff.rodenb...@gmail.comwrote: If I want to run the stable 1.3 release and the nightly build under the same Tomcat instance, should that be configured as multiple solr applications, or is there a different configuration to follow?
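For reference, one way to run the two versions side by side is to deploy each WAR as its own Tomcat context with its own solr/home, via per-context descriptors. A sketch of the two descriptors follows; all file names and paths here are examples, not taken from the thread:

```xml
<!-- conf/Catalina/localhost/solr13.xml -->
<Context docBase="/opt/solr/apache-solr-1.3.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home-1.3" override="true"/>
</Context>

<!-- conf/Catalina/localhost/solr14.xml -->
<Context docBase="/opt/solr/apache-solr-nightly.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home-1.4" override="true"/>
</Context>
```

The file name of each descriptor becomes the context path, so the two instances would be reachable at /solr13 and /solr14 without interfering with each other.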
RE: fq vs. q
-Original Message- From: Fergus McMenemie [mailto:fer...@twig.me.uk] Sent: Friday, June 12, 2009 3:41 PM To: solr-user@lucene.apache.org Subject: Re: fq vs. q On Fri, Jun 12, 2009 at 7:09 PM, Michael Ludwig m...@as-guides.com wrote: I've summarized what I've learnt about filter queries on this page: http://wiki.apache.org/solr/FilterQueryGuidance Wow! This is great! Thanks for taking the time to write this up Michael. I've added a section on analysis, scoring and faceting aspects. +1 definitely a great article. I ran into this very issue recently, as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second, and thus virtually every filter was unique. It's amazing how something this simple could bring Solr to its knees on a large data set. By simply changing the filter to date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes. It does bring up an interesting question though - how is NOW treated with respect to the cache key? Does Solr translate it to a date first? If so, how does it determine the granularity? If not, is there any mechanism to flush the cache when the corresponding result set changes? -Ken
Re: Strange missing docs when reindexing with threads.
Right after I sent the email I went on and checked for uniqueness of documents... In theory they were all supposed to be unique. But I've realized that the platform I'm using to reindex is delaying sending the requests; this, in combination with my reindexers reusing document fields (instead of creating new instances, to save on GC), led to the same document being sent many times with invalid data. I am fairly sure now that this is the source of my problem. My reindexers originally used the LuceneWriter directly, which blocks thread execution until the document is added to the index; the new framework I'm using uses messaging, which releases control back to the thread before the documents are actually sent to be indexed. My threads update the document fields in the meantime, so the data written to the index is in transition and invalid. I've made an adjustment to my reindexing threads to ensure new instances of everything are used; I will test it shortly. But you point out exactly why I have fewer documents than 'add' requests. Thanks! Shalin Shekhar Mangar wrote: What is the uniqueKey in your schema.xml? Is it possible that those 17494 documents have a common uniqueKey and are therefore getting overwritten?
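The failure mode Alexander describes — an asynchronous send path combined with a reused, mutable document object — can be reproduced in miniature. This is an illustrative Python sketch of the pattern, not his actual indexing code:

```python
import queue

send_queue = queue.Queue()

def enqueue_reusing(doc, records):
    """Buggy pattern: one mutable doc object is reused for every add.
    The queue ends up holding N references to the SAME dict, so by the
    time documents are actually sent, they all carry the last record."""
    for rec in records:
        doc["id"] = rec["id"]
        doc["body"] = rec["body"]
        send_queue.put(doc)  # stores a reference, not a copy

def enqueue_fresh(records):
    """Fixed pattern: build a brand-new document per record."""
    for rec in records:
        send_queue.put({"id": rec["id"], "body": rec["body"]})

records = [{"id": i, "body": "doc %d" % i} for i in range(3)]

shared = {}
enqueue_reusing(shared, records)
buggy = [send_queue.get() for _ in range(3)]
unique_buggy_ids = {d["id"] for d in buggy}   # collapses to a single id

enqueue_fresh(records)
fixed = [send_queue.get() for _ in range(3)]
unique_fixed_ids = {d["id"] for d in fixed}   # all three ids survive
```

With a blocking writer the mutation happens after the send completes, which is why the bug only surfaced once the messaging layer deferred the actual send.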
localsolr and collapse in Solr 1.4
Hi, Has anyone successfully used localsolr and collapse together in Solr 1.4. I am getting two result-sets one from localsolr and other from collapse. I need a merged result-set. Any pointers ???
Using The Tomcat Container
I am installing Solr 1.3.0, and currently have been trying to use Tomcat 5.5. This hasn't been working so far for me, and I have been told (unofficially) that my installation would go more smoothly if I were to use Tomcat 6. Does anyone have experience with Solr 1.3 and Tomcat 5.5?
Re: Replication problems on 1.4
On Sat, Jun 13, 2009 at 1:25 AM, Phil Hagelberg p...@hagelb.org wrote: OK, so I inserted some more documents into the master, and now replication works. I get the feeling it may be due to this line in the master's solrconfig.xml: <str name="replicateAfter">commit</str> Now this is confusing, since it seems that the timing of replication is not up to the master, it's up to the slave. The slave's config has settings for the interval at which to replicate, and you POST to the slave to force a replication. So why is there a setting on the master to control when replication happens? My only interpretation from the config files is that the master has some sort of "you may not replicate from me unless..." condition. This seems pretty undesirable, since you may have a slave that needs to be replicated from the master immediately; it shouldn't have to wait for a commit on the master. Am I misunderstanding what's going on here? It certainly isn't clear from the documents on the wiki, so I'm kind of grasping in the dark. Perhaps I'm missing something. You are right. In Solr/Lucene, a commit exposes updates to searchers, so you need to call commit on the master for the slave to pick up the changes. Replicating changes from the master and then not exposing new documents to searchers does not make sense. However, there is a lot of work going on in Lucene to enable near real-time search (exposing documents to searchers as soon as possible). Once those features are mature enough, Solr's replication will follow suit. -- Regards, Shalin Shekhar Mangar.
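For context, the replicateAfter line being discussed lives in the ReplicationHandler configuration in solrconfig.xml. A typical master/slave pair looks roughly like this; the host name, core name, and poll interval below are examples:

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish a new index version only after a commit -->
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave's pollInterval (and the snappull command) decides when the slave asks; replicateAfter decides which index versions the master is willing to publish when asked.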
Re: Replication problems on 1.4
Shalin Shekhar Mangar shalinman...@gmail.com writes: You are right. In Solr/Lucene, a commit exposes updates to searchers. So you need to call commit on the master for the slave to pick up the changes. Replicating changes from the master and then not exposing new documents to searchers does not make sense. However, there is a lot of work going on in Lucene to enable near real-time search (exposing documents to searchers as soon as possible). Once those features are mature enough, Solr's replication will follow suit. I understand that; it's totally reasonable. What it doesn't explain is what happened in my case: the master added a bunch of docs, committed, and then the slave replicated fine. Then the slave lost all its data (due to me issuing an rm -rf of the data directory, but let's say it happened due to a disk failure or something) and tried to replicate again, but got zero docs. Once the master had another commit issued, the slave could replicate properly again. I would expect that in this case the slave should be able to replicate after losing its data but before the second commit. I can see why the master would not expose uncommitted documents, but I can't see why it would refuse to allow _any_ of its index to be replicated. I feel like I'm missing a piece of the picture here. -Phil
Re: Using The Tomcat Container
On Sat, Jun 13, 2009 at 2:25 AM, Mukerjee, Neiloy (Neil) neil.muker...@alcatel-lucent.com wrote: I am installing Solr 1.3.0, and currently have been trying to use Tomcat 5.5. This hasn't been working so far for me, and I have been told (unofficially) that my installation would go more smoothly if I were to use Tomcat 6. Does anyone have experience with Solr 1.3 and Tomcat 5.5? Can you elaborate on what is not working for you? We use Solr with Tomcat 5.5 and it works fine. -- Regards, Shalin Shekhar Mangar.
Re: highlighting on edgeGramTokenized field -- hightlighting incorrect bc. position not incremented..
Thanks, I'll check it out. Otis Gospodnetic wrote: Britske, I'd have to dig, but there are a couple of JIRA issues in Lucene's JIRA (the actual ngram code is part of Lucene) that have to do with ngram positions. I have a feeling that may be the problem. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Britske gbr...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, June 12, 2009 6:15:36 AM Subject: highlighting on edgeGramTokenized field -- highlighting incorrect bc. position not incremented.. Hi, I'm trying to highlight based on a (multivalued) field (prefix2) that has (among other things) an EdgeNGramFilterFactory defined. Highlighting doesn't increment the start-position of the highlighted portion; in other words, the highlighted portion is always the beginning of the field. For example, for prefix2: Orlando Verenigde Staten the query http://localhost:8983/solr/autocompleteCore/select?fl=prefix2,id&q=prefix2:%22ver%22&wt=xml&hl=true&hl.fl=prefix2 returns "Orlando Verenigde Staten" with "Orlando" highlighted, while the highlighted term should be "Verenigde". The field definition uses positionIncrementGap="1" and an EdgeNGramFilterFactory with maxGramSize="20". I checked that removing the EdgeNGramFilterFactory results in correct positioning of highlighting. (But then I can't search for ngrams...) What am I missing? Thanks in advance, Britske -- View this message in context: http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p23996196.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/highlighting-on-edgeGramTokenized-field---%3E-hightlighting-incorrect-bc.-position-not-incremented..-tp23996196p24006375.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: fq vs. q
On Sat, Jun 13, 2009 at 1:36 AM, Ensdorf Ken ensd...@zoominfo.com wrote: I ran into this very issue recently, as we are using a freshness filter for our data that can be 6/12/18 months etc. I discovered that even though we were only indexing with day-level granularity, we were specifying the query by computing a date down to the second, and thus virtually every filter was unique. It's amazing how something this simple could bring Solr to its knees on a large data set. By simply changing the filter to date:[NOW-18MONTHS TO NOW] or equivalent, the problem vanishes. Since you are indexing with day-level granularity, you should query with the same granularity too. For example, date:[NOW/DAY-18MONTHS TO NOW/DAY]. The '/' operator is used for rounding off in DateMath syntax (http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html). Perhaps this is something we should document more clearly; we recently had high CPU issues with one of our webapps due to the same issue. It does bring up an interesting question though - how is NOW treated with respect to the cache key? Does Solr translate it to a date first? If so, how does it determine the granularity? If not, is there any mechanism to flush the cache when the corresponding result set changes? The date math syntax is translated to a date before a search is performed. NOW is always granular up to seconds (maybe milliseconds, not sure). -- Regards, Shalin Shekhar Mangar.
Re: Using The Tomcat Container
On Fri, Jun 12, 2009 at 4:55 PM, Mukerjee, Neiloy (Neil)neil.muker...@alcatel-lucent.com wrote: I am installing Solr 1.3.0, and currently have been trying to use Tomcat 5.5. This hasn't been working so far for me, and I have been told (unofficially) that my installation would go more smoothly if I were to use Tomcat 6. Does anyone have experiencing with Solr 1.3 and Tomcat 5.5? 5.5 should work OK. For nightlies, I just updated the Simple Example Install at http://wiki.apache.org/solr/SolrTomcat You might try stepping through those steps with your versions. -Yonik http://www.lucidimagination.com
Re: Strange missing docs when reindexing with threads.
That was exactly my issue... I changed my code to not reuse documents/fields and it is all good now! Thanks for your support! Shalin Shekhar Mangar wrote: What is the uniqueKey in your schema.xml? Is it possible that those 17494 documents have a common uniqueKey and are therefore getting overwritten?
Joins or subselects in solr
Hello, I am storing items in an index. Each item has a comma-separated list of related items. Is it possible to bring back an item and all of its related items in one query? If so, how? And how would you distinguish between the main item and the related ones? Any help is much appreciated. Thanks! Nasseam Solr-powered Ajax search+nav: http://factbook.bodukai.com/ Powered by Boutique: http://bodukai.com/boutique/
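Solr has no joins or subselects, so a common workaround is two queries from the client: fetch the main item, then fetch its relations with one OR query built from the ids in the related field. The main/related distinction then lives in the client code rather than the index. A hypothetical Python sketch; the field names ("id", "related") are assumptions:

```python
def related_items_query(main_doc):
    """Given a fetched item whose 'related' field is a comma-separated
    list of ids, build a second Solr query string that fetches all the
    related items in one round trip. Returns None if there are none."""
    ids = [i.strip() for i in main_doc.get("related", "").split(",") if i.strip()]
    if not ids:
        return None
    # e.g. id:(7 OR 13 OR 99) -- the caller already holds the main item,
    # so everything this query returns is, by construction, "related".
    return "id:(" + " OR ".join(ids) + ")"

doc = {"id": "42", "related": "7, 13,99"}
q = related_items_query(doc)
```

Alternatively, related items can be denormalized into the main document at index time if they only need to be displayed, not searched independently.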
Re: Replication problems on 1.4
On Sat, Jun 13, 2009 at 2:44 AM, Phil Hagelberg p...@hagelb.org wrote: Shalin Shekhar Mangar shalinman...@gmail.com writes: You are right. In Solr/Lucene, a commit exposes updates to searchers. So you need to call commit on the master for the slave to pick up the changes. Replicating changes from the master and then not exposing new documents to searchers does not make sense. However, there is a lot of work going on in Lucene to enable near real-time search (exposing documents to searchers as soon as possible). Once those features are mature enough, Solr's replication will follow suit. I understand that; it's totally reasonable. What it doesn't explain is what happened in my case: the master added a bunch of docs, committed, and then the slave replicated fine. Then the slave lost all its data (due to me issuing an rm -rf of the data directory, but let's say it happened due to a disk failure or something) and tried to replicate again, but got zero docs. Once the master had another commit issued, the slave could now replicate properly. If you removed the files while the slave is running, then the slave will not know that you removed the files (assuming it is a *nix box) and it will keep serving search requests. But if you restart the slave, it should have automatically picked up the current index; if it doesn't, it is a bug. I would expect in this case the slave should be able to replicate after losing its data but before the second commit. I can see why the master would not expose uncommitted documents, but I can't see why it would refuse to allow _any_ of its index to be replicated from. I feel like I'm missing a piece of the picture here. -Phil -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Custom Request handler Error:
Shalin Shekhar Mangar wrote: On Fri, Jun 12, 2009 at 8:07 PM, noor noo...@opentechindia.com wrote:

<requestHandler name="/select" class="solr.my.MyCustomHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="q">tandem</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

Now, my webapp runs fine at http://localhost:8983/mysearch and searching also works fine, but these requests are not run through my custom handler. Specify the full package to your handler class. Packages starting with "solr." are loaded in a special way. I specified it like <requestHandler name="/select" class="org.apache.solr.my.MyCustomHandler">, but I still get the same error.