Spellchecker index cannot be optimized
Hello, when I rebuild the spellchecker index (by optimizing the data index or by calling cmd=rebuild), the spellchecker index is not optimized. I cannot even delete the old index files on the filesystem, because they are locked by the Solr server. I have to stop the Solr server (Resin) to optimize the spellchecker index with Luke or by deleting the old files. How can I optimize the index without stopping the Solr server? Thanks, Lutz Pumpenmeier
RE: Copyfield multi valued to single value
Thanks for the update, I'll have to find another way then :s. Marc

Date: Mon, 14 Jun 2010 13:44:30 -0700
From: hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Subject: Re: Copyfield multi valued to single value

: Is there a way to copy a multivalued field to a single value by taking
: for example the first index of the multivalued field?

Unfortunately no. This would either need to be done with an UpdateProcessor, or on the client constructing the doc (either the remote client, or in your DIH config if that's how you are using Tika)

-Hoss
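The UpdateProcessor route Hoss mentions boils down to "keep only the first value" logic like the following. This is a standalone sketch of just that logic, not the Solr API: a real implementation would subclass Solr's UpdateRequestProcessor and apply it to the document in processAdd(), and the "category" field name here is made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FirstValueOnly {
    // Collapse a multivalued field down to its first value -- the kind of
    // transformation a custom UpdateRequestProcessor would apply before the
    // document reaches the index. (Illustrative stand-in; no Solr API used.)
    static Object firstValue(List<?> values) {
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        // hypothetical multivalued "category" field
        List<String> categories = Arrays.asList("books", "fiction", "paperback");
        System.out.println(firstValue(categories)); // prints books
        System.out.println(firstValue(Collections.emptyList())); // prints null
    }
}
```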
RE: custom scorer in Solr
Hello Hoss, So far we have been using the default SearchHandler. I also looked into a solution proposed on this mailing list by Geert-Jan Brits, using extra sort fields and functions to pick out the maximum. This, however, proved rather cumbersome to integrate in our SolrJ client, and I also have some concerns about performance. The actual data has about 2.5 million documents in it, with some popular categories of more than 200K docs. I did look into the dismax query, but the problem there was that name and category are not the only fields we search in. They are only the "what" fields, and we also have a "where" field. The code that actually came closest to the desired results was this:

    private String makeQuery(String what, String where) {
        StringBuilder sb = new StringBuilder();
        sb.append("category:");
        sb.append(what);
        sb.append("^32 OR ");
        sb.append("name:");
        sb.append(what);
        sb.append("^16 AND (");
        sb.append("locality2:");
        sb.append(where);
        sb.append("^8 OR locality3:");
        sb.append(where);
        sb.append("^4 OR locality1:");
        sb.append(where);
        sb.append("^2 OR locality4:");
        sb.append(where);
        sb.append(")");
        return sb.toString();
    }
    ...
    SolrQuery query = new SolrQuery();
    query.setQuery(makeQuery(what, where));
    QueryResponse rsp;
    query.addSortField("score", ORDER.desc);
    query.addSortField("producttier", ORDER.asc);
    query.addSortField("random_" + System.currentTimeMillis(), ORDER.asc);

So the actual query string was something like category:restaurant^32 OR name:restaurant^16 AND (locality2:Antwerp^8 OR locality3:Antwerp^4 OR locality1:Antwerp^2 OR locality4:Antwerp). I have no idea how this can be rewritten in SolrJ using a standard dismax query. So in conclusion, I think this client will probably need a custom QParser. Time to start reading and experimenting, I guess.
Regards, Tom

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, June 14, 2010 22:29
To: solr-user@lucene.apache.org
Subject: Re: custom scorer in Solr

: Problem is that they want scores that make results fall in buckets:
:
: * Bucket 1: exact match on category (score = 4)
: * Bucket 2: exact match on name (score = 3)
: * Bucket 3: partial match on category (score = 2)
: * Bucket 4: partial match on name (score = 1)
...
: First thing we did was develop a custom similarity class that would
: return the correct score depending on the field and an exact or partial
: match.
...
: The only problem now is that when a document matches on both the
: category and name the scores are added together.

what QParser are you using? what does the resulting Query data structure look like?

I think with your custom Similarity class you might be able to achieve your goal using the DisMaxQParser w/o any other custom code -- just set your qf=category name (i'm assuming your Similarity already handles the relative weighting) and set tie=0 ... that will ensure that the final score only comes from the max scoring field (ie: no tie breaking values from the other fields)

if that doesn't do what you want -- then your best bet is probably to write a custom QParser that generates *exactly* the query structure you want (likely using a DisjunctionMaxQuery) that will give you the scores you want in conjunction with your similarity class.

-Hoss
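For reference, the dismax setup Hoss describes would look roughly like this as request parameters. This is a sketch, not from the thread: the field names and boosts are lifted from Tom's hand-built query, and pushing the "where" clauses into an fq is an assumption (dismax applies its one qf list to the whole user query, so the locality fields can't simply be added to qf with per-clause terms).

```
q=restaurant
defType=dismax
qf=category^32 name^16
tie=0
fq=locality2:Antwerp OR locality3:Antwerp OR locality1:Antwerp OR locality4:Antwerp
```

With tie=0, a document matching on both category and name scores only from the higher-scoring field instead of summing the two, which is the bucketing behavior Tom was after.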
Indexing HTML files in SOLR
Hi, I am using SOLR with Apache Tomcat. I have some .html files (containing articles) stored at location XYZ. How can I index these .html files in SOLR? Regards, Siddharth -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p896530.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need help on Solr Cell usage with specific Tika parser
Thanks, moving it to a direct child worked. Olivier

2010/6/14 Chris Hostetter hossman_luc...@fucit.org

: In solrconfig, in the update/extract requestHandler I specified
: <str name="tika.config">./tika-config.xml</str>, where tika-config.xml is in
: the conf directory (same as solrconfig).

can you show us the full requestHandler declaration? ... tika.config needs to be a direct child of the requestHandler (not in the defaults). I also don't know if using a local path like that will work -- depends on how that file is loaded (if Solr loads it, then you might want to remove the ./; if Solr just gives the path to Tika, then you probably need an absolute path).

-Hoss
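The placement Hoss describes would look roughly like this in solrconfig.xml. A sketch only: the defaults content shown is illustrative, and the absolute path is a placeholder you would substitute.

```
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <!-- tika.config as a direct child of the requestHandler,
       NOT nested inside <lst name="defaults"> -->
  <str name="tika.config">/absolute/path/to/tika-config.xml</str>
  <lst name="defaults">
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```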
HOWTO get a working copy of SOLR?
Dear list, this sounds stupid, but how do I get a fully working copy of SOLR? What I have tried so far:

- started with LucidWorks SOLR. Installs fine, runs fine, but has an old Tika version and can only handle some PDFs.
- changed to SOLR trunk. Installs fine, runs fine, but Luke 1.0.1 complains about "Unknown format version: -10". I guess because Luke 1.0.1 compiles with lucene-core-3.0.1.jar but trunk has lucene-core-4.0-dev.jar??? Anyway, no luck with this version.
- changed to SOLR branch_3x. Installs fine, runs fine, Luke works fine, but extraction with /update/extract (ExtractingRequestHandler) only returns the metadata, not the content. No luck with this version.

Is there any fully working recent copy at all? Or a Luke that works with SOLR trunk? Regards, Bernd
Re: question about the fieldCollapseCache
Hi, I tried downloading Solr 1.4.1 from the site, but it shows an empty directory. Where did you get Solr 1.4.1 from? Regards, Raakhi

On Tue, Jun 8, 2010 at 10:35 PM, Jean-Sebastien Vachon js.vac...@videotron.ca wrote:

Hi All, I've been running some tests using 6 shards, each one containing about 1 million documents. Each shard is running in its own virtual machine with 7 GB of RAM (5 GB allocated to the JVM). After about 1100 unique queries the shards start to struggle and run out of memory. I've reduced all other caches without significant impact. When I remove the fieldCollapseCache completely, the server can keep up for hours and uses only 2 GB of RAM. (I'm even considering returning to a 32-bit JVM.) The size of the fieldCollapseCache was set to 5000 items. How can 5000 items eat 3 GB of RAM? Can someone tell me what is put in this cache? Has anyone experienced this kind of problem? I am running Solr 1.4.1 with patch 236. All requests are collapsing on a single field (pint) with collapse.maxdocs set to 200 000. Thanks for any hints...
Re: how to use q=string in solrconfig.xml `?
okay thx. good idea with mod_rewrite =) thx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-use-q-string-in-solrconfig-xml-tp861870p896902.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about the fieldCollapseCache
They used to be in the branches if I recall correctly, but you're right, they aren't there anymore. Maybe someone else can explain why... it looks like they restructured the repository for the Solr/Lucene merge.

On 2010-06-15, at 4:54 AM, Rakhi Khatwani wrote:

: Hi, I tried downloading Solr 1.4.1 from the site, but it shows an empty
: directory. Where did you get Solr 1.4.1 from? Regards, Raakhi
...
Re: HOWTO get a working copy of SOLR?
On Tue, Jun 15, 2010 at 12:58 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: - changed to SOLR branch_3x. Installs fine, runs fine, luke works fine but the extraction with /update/extract (ExtractingRequestHandler) only replies the metadata but not the content. Sounds like https://issues.apache.org/jira/browse/SOLR-1902 Sixten
how to get tf-idf values in solr
I am Sarfaraz, working on a search engine project based on Nutch + Solr. I am trying to implement a new search algorithm for this engine. Our search engine crawls the web and stores the documents as large strings in the database, indexed by their URLs. Now, to implement my algorithm I need tf-idf values (0 - 1) for each document given by the crawler, but I am unable to find any method in Solr or Lucene which can serve my purpose. For my algorithm I need to maintain a relevance matrix of the following type, e.g.:

            term1   term2   term3   term4 ...
    url1    0.7     0.8     0.3     0.1
    url2    0.4     0.1     0.4     0.5
    url3    .
    .
    .

and for this purpose I need a core Java method/function in Solr that returns the tf-idf values for all terms in all documents in the available document list. Please help, I will be highly grateful to you all. -Sarfaraz Masood
Re: how to get tf-idf values in solr
Have you taken a look at Solr's TermVectorComponent? It's probably what you want: http://wiki.apache.org/solr/TermVectorComponent

didier

On Tue, Jun 15, 2010 at 8:38 AM, sarfaraz masood sarfarazmasood2...@yahoo.com wrote:

: Now to implement my algorithm i need tf-idf values (0 - 1) for each
: document given by the crawler, but i am unable to find any method in
: solr or lucene which can serve my purpose.
...
Re: how to get tf-idf values in solr
The TermVectorComponent can return tf/idf: http://wiki.apache.org/solr/TermVectorComponent

On Jun 15, 2010, at 9:38 AM, sarfaraz masood wrote:

: and for this purpose i need a core java method/function in solr that
: returns me the tf-idf values for all terms in all documents for the
: available document list.
...
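A request exercising the TermVectorComponent might look like the following. This is a sketch against a Solr 1.4-style example setup: the tv.* parameters are the ones documented on the wiki page above, while the host, handler name, core layout, and the "url"/"text" field names are assumptions to adapt to your schema.

```
http://localhost:8983/solr/select?q=url:url1&qt=tvrh&tv=true&fl=url&tv.fl=text&tv.tf=true&tv.df=true&tv.tf_idf=true
```

The response then carries per-term tf, df, and tf-idf entries for the requested field of each matching document, which is the raw material for the relevance matrix described in the question.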
Re: Multiple location filters per search
Hoss, Thanks for the response. I was able to get multiple dist queries working; however, I've noticed another problem. When using

    fq=_query_:"{!frange l=0 u=25 v=$qa}"
    qa=dist(2,44.844833,-93.03528,latitude,longitude)

it returns 9,975 documents. When I change the upper limit to 250 it returns 33,241 documents. So the filter is doing something. But the lat/long on the documents returned puts them well beyond the limit - for example, the first document returned with an upper limit of 25 has the following values:

    [latitude] = 36.0275
    [longitude] = -80.2073

which is 907 miles away from the originating point. Currently, my lat/lon fields are indexed using

    <field name="latitude" type="tdouble" indexed="true" />
    <field name="longitude" type="tdouble" indexed="true" />

I have no doubt there is something I am missing, and any help would be greatly appreciated. Aaron

On Mon, Jun 14, 2010 at 7:43 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I am currently working with the following:
:
: {code}
: {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, longitude)
: {/code}
...
: {code}
: {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude,
: longitude) OR {!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152,
: latitude, longitude)
: {/code}
...
: I get an error. Hoping someone has an idea of how to work with
: multiple locations in a single search.

I think you are confused about how that query is getting parsed ... when Solr sees the {!frange at the beginning of the param, that tells it that the *entire* param value should be parsed by the frange parser. The frange parser doesn't know anything about keywords like OR.

What you probably want is to utilize the _query_ hack of the LuceneQParser so that you can parse some Lucene syntax (ie: A OR B) where the clauses are then generated by using another parser...

http://wiki.apache.org/solr/SolrQuerySyntax

fq=_query_:"{!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, longitude)" OR _query_:"{!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152, latitude, longitude)"

...or a little more readable...

fq=_query_:"{!frange l=0 u=1 unit=mi v=$qa}" OR _query_:"{!frange l=0 u=1 unit=mi v=$qb}"
qa=dist(2,32.6126, -86.3950, latitude, longitude)
qb=dist(2,44.1457, -73.8152, latitude, longitude)

-Hoss
Re: VelocityResponseWriter in Solr Core ?! configuration
Are you using Ubuntu by any chance? It's a somewhat common problem ... @http://stackoverflow.com/questions/2854356/java-classpath-problems-in-ubuntu I'm unsure if this has been resolved, but a similar thing happened to me on a recent VMware image in a dev environment. It worked everywhere else. - Jon

On Jun 14, 2010, at 9:12 AM, stockii wrote:

ah okay. i tried it with 1.4 and put the jars into lib of solr.home but it won't work. i get the same error ... i use 2 cores, and my solr.home is ...path/cores. in this folder i put another folder with the name lib and put all these jars into it:

    apache-solr-velocity-1.4-dev.jar
    velocity-1.6.1.jar
    velocity-tools-2.0-beta3.jar
    commons-beanutils-1.7.0.jar
    commons-collections-3.2.1.jar
    commons-lang-2.1.jar

and then in solrconfig.xml this line:

    <queryResponseWriter name="velocity" class="org.apache.solr.response.VelocityResponseWriter"/>

solr cannot find the jars =(

-- View this message in context: http://lucene.472066.n3.nabble.com/VelocityResponseWriter-in-Solr-Core-configuration-tp894262p894354.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom faceting question
Got it. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-faceting-question-tp868015p897390.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple location filters per search
From what I've seen so far, using separate fields for latitude and longitude, especially with multiple values of each, does not work correctly in all situations. The hole in my understanding is how Solr knows how to pair a latitude and a longitude field _back_ into a POINT. I can say that it doesn't work with multiple values and ranges, which is an approach floated on the wiki and various blogs. I can have qualifying longitude and latitude values that each match a range (not sure about the distance function, though) but that, taken together, should result in a true negative, not a false positive. I'm hoping to learn how it's done myself, but so far the wiki isn't clear about it.

On Tue, 15 Jun 2010 09:35:15 -0500, Aaron Chmelik aaron.chme...@gmail.com wrote:

: I was able to get multiple dist queries working, however, I've noticed
: another problem ... the lat/long on the documents returned puts them
: well beyond the limit - for example, the first document returned with
: an upper limit of 25 ... is 907 miles away from the originating point.
...
DataImportHandler and cron issue
Hi All, We are trying to implement Solr for our newspaper site search. To build out the index with all the articles published so far, we are running a script which sends requests to the dataimport handler with different dates. What we are seeing is that the request is dispatched to the Solr server, but it's not being processed. Just wanted to check if it's some kind of threading issue, and what's the best approach to achieve this. We are sleeping for 75 secs between the requests.

    while (($date += 86400) < $now) {
        $curdate = strftime("%D", localtime($date));
        print "Updating index for $curdate\n";
        $curdate = uri_escape($curdate);
        my $url = 'http://test.solr.ddtc.cmgdigital.com:8080/solr/npmetrosearch_statesman/dataimport?command=full-import&entity=initialLoad&clean=false&commit=true&forDate=' . $curdate . '&numArticles=-1&server=app5&site=statesman&articleTypes=story,slideshow,video,poll,specialArticle,list';
        print "Sending: $url\n";
        #if (system("wget -q -O - '$url' | egrep -q '$regex_pat'")) {
        if (system("curl -s '$url' | egrep -q '$regex_pat'")) {
            print "Failed to match expected regex reply: '$regex_pat'\n";
            exit 1;
        }
        sleep 75;
    }

This is what we are seeing in the server logs:

    2010-06-14 12:51:01,328 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr path=/dataimport params={site=statesman&forDate=03/24/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5} status=0 QTime=0
    2010-06-14 12:51:01,329 INFO [org.apache.solr.handler.dataimport.DataImporter] (Thread-378) Starting Full Import
    2010-06-14 12:51:01,332 INFO [org.apache.solr.handler.dataimport.SolrWriter] (Thread-378) Read dataimport.properties
    2010-06-14 12:51:01,425 INFO [org.apache.solr.handler.dataimport.DocBuilder] (Thread-378) Time taken = 0:0:0.93
    2010-06-14 12:51:16,338 INFO [org.apache.solr.core.SolrCore] (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr path=/dataimport params={site=statesman&forDate=03/25/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5} status=0 QTime=0
    2010-06-14 12:51:16,338 INFO [org.apache.solr.handler.dataimport.DataImporter] (Thread-379) Starting Full Import
    2010-06-14 12:51:16,338 INFO [org.apache.solr.handler.dataimport.SolrWriter] (Thread-379) Read dataimport.properties
    2010-06-14 12:51:16,465 INFO [org.apache.solr.handler.dataimport.DocBuilder] (Thread-379) Time taken = 0:0:0.126

Appreciate any thoughts on this. Thanks, Indrani

-- View this message in context: http://lucene.472066.n3.nabble.com/DatImportHandler-and-cron-issue-tp897698p897698.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Master master?
Don't think so. You probably want to look into this setup of distribution + sharding: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e410 It will get you high availability plus better scalability.

Wilson Man | Principal Consultant | Liferay, Inc. | Enterprise. Open Source. For Life.

On Jun 14, 2010, at 1:15 PM, Chris Hostetter wrote:

: Does Solr handle having two masters that are also slaves to each other (ie
: in a cycle)?

no.

-Hoss
Help patching Solr
Hey guys, Does anyone know how to patch stuff in Windows? I am trying to patch Solr with patch 238 but it keeps erroring out with this message:

    C:\solr\example\webapps>patch solr.war ..\..\SOLR-236-trunk.patch
    patching file solr.war
    Assertion failed: hunk, file ../patch-2.5.9-src/patch.c, line 354
    This application has requested the Runtime to terminate it in an unusual way.
    Please contact the application's support team for more information.

Thanks in advance, Moazzam
RE: Help patching Solr
I'm pretty sure you need to be running the patch against a checkout of the trunk sources, not a generated .war file. Once you've done that you can use the build scripts to make a new war.

-Kallin Nagelberg

-Original Message-
From: Moazzam Khan [mailto:moazz...@gmail.com]
Sent: Tuesday, June 15, 2010 1:53 PM
To: solr-user@lucene.apache.org
Subject: Help patching Solr

: Hey guys, Does anyone know how to patch stuff in Windows? I am trying to
: patch Solr with patch 238 but it keeps erroring out ...
...
Re: Help patching Solr
Thanks. I finally patched it (I think). I got the source from SVN and applied the patch using a Windows port of patch. A caveat for those who want to do this on Windows: open the patch file in WordPad and save it as a different file to replace Unix line breaks with DOS line breaks. Otherwise, the patch program gives an error:

    Assertion failed: hunk, file ../patch-2.5.9-src/patch.c, line 354

Now that I have patched it (as far as I can tell), how do I build the sources :D (sorry, I know it's a basic question but I have no idea how to do this)

- Moazzam

On Tue, Jun 15, 2010 at 1:14 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote:

: I'm pretty sure you need to be running the patch against a checkout of the
: trunk sources, not a generated .war file. Once you've done that you can use
: the build scripts to make a new war.
...
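For the build question at the end: with a patched trunk checkout of that era, the usual sequence was roughly the following. A sketch only; the exact path layout and ant target depend on the checkout, so treat the details as assumptions to verify against the checkout's own README and build.xml.

```
cd <your-trunk-checkout>\solr
ant dist
REM the rebuilt apache-solr-*.war should appear under the dist\ directory
```

The placeholder `<your-trunk-checkout>` stands for wherever the SVN working copy was checked out; ant itself (and a JDK) must already be on the PATH.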
Reindexing only occurs after bouncing app
Hi all, I wrote a small app using solrj and solr. The app has a small wrapper that handles the reindexing, written in Groovy. The Groovy script generates the Solr docs, and then the Java code deletes and recreates the data. In a singleton EJB, we do this in the post-construct phase:

    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    coreContainer = initializer.initialize();
    solrServer = new EmbeddedSolrServer(coreContainer, "");

A method that does this can be invoked over an HTTP service to force the reindexing:

    gse.run("search_indexer.groovy", b);
    logger.info("Solr docs size: " + solrDocs.size());
    solrServer.deleteByQuery("*:*");
    solrServer.add(solrDocs);
    solrServer.commit();

We've noticed that after executing this, we see appropriate log messages indicating that it ran; however, the search indexes do not repopulate. We're deployed on GlassFish v3. Any thoughts? Any ideas? Thanks, John
Solr / Solrj Wildcard Phrase Searches
Performing wildcard phrase searches can be tricky. I spent some time figuring this one out.

1. To perform a wildcard search on a phrase, it is very important to escape the SPACE, so that SOLR treats it as a single phrase. Ex: Citibank NA => Citibank\ NA. You can use org.apache.solr.client.solrj.util.ClientUtils (part of the solrj library) to perform the escapes.

-- Example --
A search for: CITIBANK\ N*
should produce the results:
CITIBANK NA
CITIBANK NATIONAL
CITIBANK N

2. Also make sure your field (I named it client_name_starts) is of a fieldType that is maintained as a single token during indexing.

-- Example --
<!-- lowercases the entire field value, keeping it as a single token. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<field name="client_name_starts" type="lowercase" indexed="true" stored="true" />

3. Make sure to lowercase/uppercase (depending on your setup) the user's search input string before sending it to SOLR - since wildcard queries are NOT analyzed - and send it AS IS.

Good luck!

Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc. Tel: (212) 552.5097

This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.
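The escaping in step 1 can be sketched without solrj on the classpath. This standalone class mirrors what solrj's ClientUtils.escapeQueryChars does (an approximation: the exact special-character set below is an assumption, so prefer the real helper when solrj is available); the class name and sample input are made up.

```java
public class WildcardEscape {
    // Backslash-escape whitespace and the characters Lucene query syntax
    // treats specially, so a multi-word value survives as one phrase token.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c) || "\\+-!():^[]\"{}~*?|&;".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // escape the user input first, then append the wildcard
        String q = escape("CITIBANK N") + "*";
        System.out.println(q); // prints CITIBANK\ N*
    }
}
```

Note the `*` is appended after escaping; escaping the whole string including the wildcard would neutralize it.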
using DataImport Dev Console: no errors, but no documents
I'm new to Solr so I expect that I'm making some newbie error. I run my data-config.xml file through the DataImportHandler Development Console and I see all the results of the xpath queries scroll past in the debug pane. It processes all the content without reporting an error in the terminal window that runs Jetty, or in the Dev Console itself. This is what appears at the end of the debug pane:

---snip---
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">5322</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2010-06-15 21:51:14</str>
  <str name="Total Documents Processed">0</str>
  <str name="Time taken ">0:0:0.71</str>
</lst>
---snip---

It fetches 5322 rows but doesn't process any documents and doesn't populate the index. Any suggestions would be appreciated. /peter

Here's my data-config.xml file:

<dataConfig>
  <dataSource name="myFileReader" type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/Users/pascal/tools/apache-solr-1.4.0/example/example-DIH/timetext"
            fileName=".*ttml" recursive="true" rootEntity="false" dataSource="null">
      <entity name="transcript" pk="tid" url="${f.fileAbsolutePath}"
              processor="XPathEntityProcessor" forEach="/tt/body/div/p"
              rootEntity="false" dataSource="myFileReader" onError="continue">
        <field column="begin" xpath="/tt/body/div/p/@begin" />
        <field column="dur" xpath="/tt/body/div/p/@dur" />
        <field column="end" xpath="/tt/body/div/p/@end" />
        <field column="phrase" xpath="/tt/body/div/p" />
        <field column="tid" xpath="/tt/body/div/p/@xml:id" />
      </entity>
    </entity>
  </document>
</dataConfig>
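One detail worth a second look (an observation, not from the thread): both entities in the config above carry rootEntity="false", so neither level is ever treated as the document root, which would explain rows being fetched while zero documents get processed. A sketch of the inner entity with that flag flipped, everything else unchanged:

```
<entity name="transcript" pk="tid" url="${f.fileAbsolutePath}"
        processor="XPathEntityProcessor" forEach="/tt/body/div/p"
        rootEntity="true" dataSource="myFileReader" onError="continue">
```

With the inner entity as root, DIH should emit one document per matching /tt/body/div/p row instead of none.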
DIH error documents' list
DIH skips the documents which have errors, and it also shows which field caused the error. But which document was skipped and which field caused the error is only shown in the server console. Is there a way to retrieve that info in the browser, or to read the info from the console itself? Thanks, Maddy -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-error-documents-list-tp899052p899052.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCoreAware
Can someone please explain what the inform method should accomplish? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCoreAware-tp899064p899064.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrEventListener
Can someone explain how to register a SolrEventListener? I am actually interested in using the spellchecker listener: it appears that it would build/rebuild a spellchecker index on commit and/or optimize, but according to the wiki the only events that can be listened for are firstSearcher and newSearcher (http://wiki.apache.org/solr/SolrPlugins#SolrEventListener). Is the wiki outdated or something? So how can I register this (or any other event listener) to execute on commit/optimize? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SolrEventListener-tp899074p899074.html Sent from the Solr - User mailing list archive at Nabble.com.
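For what it's worth, Solr 1.4's solrconfig.xml does accept commit-time listeners inside the updateHandler, and the SpellCheckComponent carries its own build flags, which may be the simpler route for spellchecking. A sketch, with the listener class name and the spellchecker field/name values as placeholder assumptions:

```
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- event may be "postCommit" or "postOptimize";
       my.custom.EventListener is a hypothetical SolrEventListener impl -->
  <listener event="postCommit" class="my.custom.EventListener" />
</updateHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```

So the firstSearcher/newSearcher list on the wiki covers searcher-side listeners; the commit/optimize-side hooks live under updateHandler instead.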