aliasing?
Does anyone know about aliasing in Lucene/Solr? I need to implement something, and there is a task on the list titled aliasing... I have a few ideas, but it is still not clear to me. Can anyone explain it or point me to some docs? -- View this message in context: http://lucene.472066.n3.nabble.com/aliasing-tp2917733p2917733.html Sent from the Solr - User mailing list archive at Nabble.com.
Total Documents Failed : How to find out why
Hi, I am running a Solr index, and after indexing I get these results. How can I find out which documents failed, and why?

<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">5170850</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-05-08 23:40:09</str>
<str name="">Indexing completed. Added/Updated: 2972300 documents. Deleted 0 documents.</str>
<str name="Committed">2011-05-09 00:13:48</str>
<str name="Optimized">2011-05-09 00:13:48</str>
<str name="Total Documents Processed">2972300</str>
<str name="Total Documents Failed">2198550</str>
<str name="Time taken">0:33:40.945</str>

I am running Solr on Jetty right now and the console shows no errors; also the \Solr\example\logs folder is empty. Thanks, Rohit
tomcat and multicore processors
Hi, Is it possible that Solr on Tomcat on Windows 2008 is using only one core of the processor? Do I need to configure something to use more cores? Best Regards, Solr_Beginner
Re: How to Update Value of One Field of a Document in Index?
Hello. You should be able to fetch the current document that you want to update, change its notes value to include the new ones added by the user, and then send an update request to Solr that deletes the old document (found by the id you include in the POST request) and adds the new document with the changes applied. Try developing a small Java application with SolrJ, for example. Depending on the number of update requests your system/application will make, I may or may not recommend including a commit after the update. You can also configure periodic auto-commit to update the index automatically.
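In Solr of this vintage there is no partial-document update, so "updating one field" always means the full cycle described above: read the stored document, modify the field, delete the old document by its unique id, and re-add the whole document. A stdlib-only sketch of that cycle, with a HashMap standing in for the index (the equivalent SolrJ calls are noted in comments; field names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// A toy stand-in for a Solr core, keyed by the unique id field.
class FakeIndex {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    void add(Map<String, String> doc) {          // ~ server.add(doc); server.commit();
        docs.put(doc.get("id"), doc);
    }

    void deleteById(String id) {                 // ~ server.deleteById(id);
        docs.remove(id);
    }

    Map<String, String> get(String id) {         // ~ query q=id:<id>, read stored fields
        return docs.get(id);
    }

    // The full "update one field" cycle: fetch, modify, delete, re-add.
    void updateField(String id, String field, String newValue) {
        Map<String, String> doc = new HashMap<>(get(id));
        doc.put(field, newValue);
        deleteById(id);
        add(doc);
    }
}
```

In real SolrJ code the same four steps apply; only the transport differs. Note that this only works if every field you care about is stored, since the re-added document is rebuilt from stored values.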
Searching across Solr-Multicore
Hello everyone, I'm using Solr multicore with 3 cores to index my web site. For testing I'm using the Solr admin GUI to get responses. The problem is that I get results only from one core, but not from the others. Each core has its own schema.xml. The cores are structured as follows:

/multicore/solr/
  solr.xml
  core1/
    config/
      schema_1.xml
    data/
  core2/
  core3/

Any idea what the problem could be? I would very much appreciate any help. Fahd
Re: tomcat and multicore processors
Yeah, you can use Solr on Tomcat, I am doing the same actually... but I have no idea about multiple processor cores, though...
Re: Searching across Solr-Multicore
On Mon, May 9, 2011 at 2:10 PM, Benyahya, Fahd fahd.benya...@netmoms.de wrote: Hallo everyone, i'm using solr-multicore with 3 cores to index my Web-Site. For testing i'm using the solr-admin GUI to get responses. The Problem is, that i get results only from one core, but not from the others also. [...] What do you mean by get results only from one core, but not from the others also? * Are you querying one core, and expecting to get results from all? This is not possible: You have to either query each, or merge them into a single core. * Or, is it that queries are working on one core, and not on the other? Regards, Gora
Re: uima fieldMappings and solr dynamicField
Thanks Koji for opening that; the dynamicField mapping is a commonly used feature, especially for named-entity mapping. Tommaso 2011/5/7 Koji Sekiguchi k...@r.email.ne.jp I've opened https://issues.apache.org/jira/browse/SOLR-2503 . Koji -- http://www.rondhuit.com/en/ (11/05/06 20:15), Koji Sekiguchi wrote: Hello, I'd like to use dynamicField in the feature-field mapping of the uima update processor. It doesn't seem to be acceptable currently. Is it a bad idea in terms of the use of uima? If it is not so bad, I'd like to try a patch. Background: Because my uima annotator can generate many types of named entity from a text, I don't want to implement so many types, but one type, NamedEntity:

<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.rondhuit.uima.next.NamedEntity</name>
      <description/>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>name</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>entity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

sample extracted named entities: name=PERSON, entity=Barack Obama; name=TITLE, entity=the President. Now, I'd like to map these named entities to Solr fields like this: PERSON_S:Barack Obama TITLE_S:the President Because the types of name (PERSON, TITLE, etc.) can be so many, I'd like to use dynamicField *_s, where * is replaced by the name feature of NamedEntity. I think this is a natural requirement from the Solr viewpoint, but I'm not sure whether my uima annotator implementation is correct or not. In other words, should I implement many types for each entity type? (e.g. PersonEntity, TitleEntity, ... instead of NamedEntity) Thank you! Koji
Re: Searching across Solr-Multicore
Hi, sorry that I did not explain my issue well. It is exactly as you described it (* Or, is it that queries are working on one core, and not on the other?) Regards, Fahd On 9 May 2011 10:58, Gora Mohanty g...@mimirtech.com wrote: [...]
Re: Searching across Solr-Multicore
If the schema is different across cores, you can query across the cores only for those fields that are common. Querying across all cores for some query parameter and getting the result set in one output XML can be achieved by shards: http://localhost:8090/solr1?indent=on&q=*:*&shards=localhost:8090/solr1,localhost:8090/solr2&rows=10&start=0 Regards, Rajani On Mon, May 9, 2011 at 2:36 PM, Benyahya, Fahd fahd.benya...@netmoms.de wrote: [...]
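A shards request is just an ordinary select URL with one extra comma-separated parameter naming the participating cores. A small helper that assembles such a URL (host names and core names here are hypothetical, and real code should URL-encode the query):

```java
import java.util.List;

class ShardsUrl {
    // Builds a distributed-search URL: a normal /select query plus a
    // comma-separated shards parameter naming each core to query.
    static String build(String host, String core, List<String> shards) {
        return "http://" + host + "/" + core + "/select?q=*:*"
                + "&shards=" + String.join(",", shards)
                + "&rows=10&start=0";
    }
}
```

Each shard entry is given as host:port/corename; the node receiving the request fans the query out to every listed shard and merges the results into one response.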
Solr 3.1 / Java 1.5: Exception regarding analyzer implementation
I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5 running on Java 1.5. It fails with the following exception on start-up: java.lang.AssertionError: Analyzer implementation classes or at least their tokenStream() and reusableTokenStream() implementations must be final at org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57) The exact same configuration works like a charm on another machine with Java 1.6, again using Tomcat 5.5. Has anyone else run into this issue? Is Solr 3.1 no longer compatible with Java 1.5? The analyzer that the exception seems to stem from looks like this:

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.TrimFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
          preserveOriginal="1" stemEnglishPossessive="0" splitOnNumerics="0"/>
  <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5" outputUnigrams="true"/>
</analyzer>

Best, - Martin
Solr 1.3 highlighting problem
Hi, I'm using the old 1.3 Solr version on one of my sites and I decided to add a highlighting feature. Unfortunately I cannot get it to work. I'm doing some testing in the Solr admin interface without much luck. Below is some information that describes the problem. I would like to highlight text in the field text. schema.xml config of text:

<field name="text" type="string" indexed="true" stored="true"/>

Query in the Solr admin interface:

http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=text

I get back two results; both of the text fields contain the query solr. In the highlighting tag I get only the IDs:

<lst name="highlighting"><lst name="54807"/><lst name="105235"/></lst>

Any ideas what may be causing this and how I can debug it? Thanks. Kind regards, Nick
Re: Searching across Solr-Multicore
Thanks to all those who have answered my questions. But I still do not understand why I cannot send queries to each core on its own and get results only from the core that was queried. For now I am not interested in getting results from all cores in one XML output; to do that I would need distributed searching. Regards, Fahd On 9 May 2011 11:09, rajini maski rajinima...@gmail.com wrote: [...]
Re: Solr 3.1 / Java 1.5: Exception regarding analyzer implementation
On 09.05.11 11:04, Martin Jansen wrote: I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5 running on Java 1.5. It fails with the following exception on start-up: java.lang.AssertionError: Analyzer implementation classes or at least their tokenStream() and reusableTokenStream() implementations must be final at org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57) In the meantime I solved the issue by installing Java 1.6. It works without a problem now, but I'm wondering whether Solr 3.1 is intentionally incompatible with Java 1.5 or if it happened by mistake. Martin
Faceting with MoreLikeThis
Hi All! Can anybody tell me how to exclude the count of similar results (obtained from MoreLikeThis) from the total facet count? Thanks in advance! Isha Garg
Re: Solr 1.3 highlighting problem
Is your field text stored or not? Highlighting works only with stored fields in the schema. - Thanx: Grijesh www.gettinhahead.co.in
Re: aliasing?
Can you provide more detail about the aliasing you require? - Thanx: Grijesh www.gettinhahead.co.in
Re: Whole unfiltered content in response document field
I understand now. I get the raw content of the field because it is stored. The filtered content is not visible in the response; I can only see it in the analysis view. OK now :) I will try to move the StopFilter below the WordDelimiter filter. Thanks!
Re: Total Documents Failed : How to find out why
First you need to find your logs. That folder should not be empty, regardless of whether DIH is working correctly or not. I'm assuming here that you're just doing java -jar start.jar in the example directory; if this isn't the case, how are you starting Solr/Jetty? Best Erick On Mon, May 9, 2011 at 3:26 AM, Rohit ro...@in-rev.com wrote: [...]
Re: Searching across Solr-Multicore
There's not much information to go on here. Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, May 9, 2011 at 5:26 AM, Benyahya, Fahd fahd.benya...@netmoms.de wrote: [...]
RE: Total Documents Failed : How to find out why
Hi Erick, That's exactly how I am starting Solr. Regards, Rohit -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 09 May 2011 16:57 To: solr-user@lucene.apache.org Subject: Re: Total Documents Failed : How to find out why [...]
Custom filter development
Hi, I would like to write my own filter. I am trying to use the following class:

public class MyFilter extends TokenFilter {
  private String myField;

  public MyFilter(TokenStream input, String myField) {
    super(input);
    this.myField = myField;
  }

  @SuppressWarnings("deprecation")
  public Token next() throws IOException {
    return parseToken(this.input.next());
  }

  @SuppressWarnings("deprecation")
  public Token next(Token result) throws IOException {
    return parseToken(this.input.next());
  }

  protected Token parseToken(Token input) {
    /* do magic stuff with input.termBuffer() here (a char[] which can be manipulated) */
    /* set the changed length of the new term with input.setTermLength() before returning it */
    return input;
  }
}

The factory and deployment are no problem, but I have a different question. I want to run my filter at the last position, after I have a clean set of tokens; this I can configure in my analyzer XML configuration. My MyFilter object receives an input TokenStream in its constructor, which I assume is a stream of tokens. The next methods use the parseToken method: the next token is taken from the input and a modified token is returned. But this one-to-one mapping is a problem for me. I want to map a given token, for example a, to three tokens a1, a2, a3. I also want to do a one-to-one mapping b - c, and I want the possibility to remove a token (d - nothing). How can I do this, when the next methods return only one token, not a collection? Thanks!
Re: Solr 1.3 highlighting problem
Hi Grijesh, The field text is stored and yet it is not working. Kind regards, Nick
Re: Use Solr / Lucene to search in a Logfile
Hello Robert, At my company, we are working on a generic log collector that uses Solr to provide search capabilities. What the collector does basically is this (this is greatly dumbed down!):

* collect a log line (read it from a file, receive it from the network, ...)
* parse it through a set of regular expressions, searching for known log formats (Apache CLF, ...)
* if there is a match, store the results as a set of keys/values (url: http://www.apache.org, source: XXX, raw_log: xx, ...)
* insert the set as a document in the Solr backend, using the REST interface.

Therefore I would advise you to adapt this workflow to suit your own needs: have a script looking for new lines in your log file, parse them in order to extract the relevant information you need, store the results as key/value sets, then insert them into Solr via an HTTP call. My company's product is probably overkill for what you need to do, and we'd probably need to develop a specific log parser for your log format, but if you are willing to give it a try feel free to contact me! Greetings, Matthieu HUIN On 06/05/2011 21:40, Robert Naczinski wrote: Hi, thanks for the reply. I did not know that. Is there still a way to use Solr or Lucene? Or Apache Nutch would not be bad. Could I maybe write a customized DIH? Greetings, Robert 2011/5/6 Otis Gospodnetic otis_gospodne...@yahoo.com: Loggly.com
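The parse step of the workflow described above can be sketched with a single regular expression for Apache common log format. Everything here (the field names, the regex) is an illustrative assumption, not the product's actual parser:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClfParser {
    // Apache CLF: host ident user [timestamp] "METHOD url PROTO" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\S+)");

    // Turns one raw log line into the key/value set that would become a Solr document.
    static Map<String, String> parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) return null;              // unknown format: skip, or store raw only
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("host", m.group(1));
        doc.put("timestamp", m.group(2));
        doc.put("method", m.group(3));
        doc.put("url", m.group(4));
        doc.put("status", m.group(5));
        doc.put("bytes", m.group(6));
        doc.put("raw_log", line);                // keep the original line as a field too
        return doc;
    }
}
```

The resulting map is exactly what you would POST to Solr as one document, one field per key.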
Re: uima fieldMappings and solr dynamicField
Thanks Tommaso! I'm glad to hear from the person with experience. I'll commit shortly. Koji (11/05/09 17:57), Tommaso Teofili wrote: Thanks Koji for opening that, the dynamicField mapping is a commonly used feature especially for named entities mapping. Tommaso [...] -- http://www.rondhuit.com/en/
Can ExtractingRequestHandler ignore documents metadata
I'm indexing content from a CMS' database of metadata. The client would prefer that Solr exclude the properties (metadata) of any documents being indexed. Is there a way to tell Tika to only index a document's text and not its properties? Thanks - Tod
Solr Range Facets
Hi Chris, I did try what you suggested, but I am not getting the expected results. The code is given below:

SolrQuery query = new SolrQuery();
query.set("q", "apple");
query.set("facet", true);
query.set("facet.range", "createdOnGMTDate");
query.set("facet.range.start", "2010-01-01T00:00:00Z");
query.set("facet.range.gap", "+1DAY");
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();
System.out.println("Found: " + sdl.getNumFound());
System.out.println("Start: " + sdl.getStart());
System.out.println("---");
List<FacetField> facets = qr.getFacetFields();
for (FacetField facet : facets) {
  List<FacetField.Count> facetEntries = facet.getValues();
  for (FacetField.Count fcount : facetEntries) {
    System.out.println(fcount.getName() + ": " + fcount.getCount());
  }
}

Regards, Rohit -----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: 07 May 2011 04:36 To: solr-user@lucene.apache.org Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String: : Thanks for the response, actually what we need to achieve is to see group-by : results based on dates like: : : 2011-01-01 23 : 2011-01-02 14 : 2011-01-03 40 : 2011-01-04 10 : : Now the records in my table run into millions; grouping the result based on : UTC date would not produce the right result, since the result should be : grouped on the user's timezone. Is there any way we can achieve this in Solr? Date faceting is entirely driven by query params, so if you index your events using the true time that they happened at (formatted as a string in UTC) you can then select your date ranges using whatever timezone offset is specified by your user at query time as a UTC offset.

facet.range = dateField
facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
facet.range.gap = +1DAY

etc... -Hoss
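Hoss's facet.range.start suggestion relies on Solr date math: appending a per-user UTC offset to the base date so range buckets align with the user's local midnight rather than UTC midnight. The ${useroffset} substitution happens client-side, so in SolrJ you would build the string yourself. A minimal sketch (MINUTES is a real Solr date-math unit; the class and method names are made up):

```java
class FacetRangeStart {
    // Appends a signed user offset in minutes, e.g. 330 -> "...Z+330MINUTES"
    // (UTC+5:30) and -300 -> "...Z-300MINUTES" (UTC-5:00).
    static String withUserOffset(String utcStart, int offsetMinutes) {
        String sign = offsetMinutes < 0 ? "-" : "+";
        return utcStart + sign + Math.abs(offsetMinutes) + "MINUTES";
    }
}
```

Usage, assuming the SolrQuery from the message above: query.set("facet.range.start", FacetRangeStart.withUserOffset("2010-01-01T00:00:00Z", 330)); the same offset would be applied to facet.range.end.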
Re: Replication Clarification Please
Hello Mr. Bell, Thank you very much for patiently responding to my questions. We optimize once every 2 days. Can you kindly rephrase your answer? I could not understand: if the amount of time if 10 segments, I believe that might also trigger a whole index, since you cycled all the segments. In that case I think you might want to increase the mergeFactor. The current index folder details and sizes are given below:

MASTER
------
5K    search-data/spellchecker2
480M  search-data/index
5K    search-data/spellchecker1
5K    search-data/spellcheckerFile
480M  search-data

SLAVE
-----
2K    search-data/index.20110509103950
419M  search-data/index
2.3G  search-data/index.20110429042508  (SLAVE is pointing to this directory)
5K    search-data/spellchecker1
5K    search-data/spellchecker2
5K    search-data/spellcheckerFile
2.7G  search-data

Thanks, Ravi Kiran Bhaskar On Sat, May 7, 2011 at 11:49 PM, Bill Bell billnb...@gmail.com wrote: I did not see answers... I am not an authority, but will tell you what I think. Did you get some answers? On 5/6/11 2:52 PM, Ravi Solr ravis...@gmail.com wrote: Hello, Pardon me if this has been already answered somewhere, and I apologize for a lengthy post. I was wondering if anybody could help me understand Replication internals a bit more. We have a single master-slave setup (Solr 1.4.1) with the configuration shown below. Our environment is quite commit heavy (almost 100s of docs every 5 minutes), and all indexing is done on the master while all searches go to the slave. We are seeing that the slave replication performance gradually decreases, the speed decreases to 1kbps, and it ultimately gets backed up. Once we reload the core on the slave it will work fine for some time and then it gets backed up again. We have mergeFactor set to 10 and ramBufferSizeMB set to 32MB, and Solr itself is running with 2GB memory; the lockType is simple on both master and slave. How big is your index? How many rows and GB? Every time you replicate, there are several resets on caching.
So if you are constantly indexing, you need to be careful about how that performance impact will apply. I am hoping that the following questions might help me understand the replication performance issue better (the replication configuration is given at the end of the email).

1. Does the slave get the whole index every time during replication, or just the delta since the last replication happened?

It depends. If you do an OPTIMIZE every time you index, then you will be sending the whole index down. If the amount of time if 10 segments, I believe that might also trigger a whole index, since you cycled all the segments. In that case I think you might want to increase the mergeFactor.

2. If there are a huge number of queries being done on the slave, will it affect the replication? How can I improve the performance? (see the replication details at the bottom of the page)

It seems that might be one way that you get the index.* directories. At least I see it more frequently when there is huge load and you are trying to replicate. You could replicate less frequently.

3. Will the segment names be the same on master and slave after replication? I see that they are different. Is this correct? If it is correct, how does the slave know what to fetch the next time, i.e. the delta?

Yes, they had better be. In the old days you could just rsync the data directory from master to slave and reload the core; that worked fine.

4. When and why does the index.TIMESTAMP folder get created? I see this type of folder getting created only on the slave, and the slave instance is pointing to it. I would love to know all the conditions...

I believe it is supposed to replicate to index.*, then reload to point to it. But sometimes it gets stuck in index.* land and never goes back to a straight index. There are several bug fixes for this in 3.1.

5. Does the replication process copy both the index and index.TIMESTAMP folders?

I believe it is supposed to copy the segment or whole index/ from master to index.* on slave.

6.
What happens if the replication kicks off even before the previous invocation has completed? Will the second invocation block, or will it go through, causing more confusion?

That is not supposed to happen; if a replication is in process, it should not copy again until that one is complete. Try it: just delete the data/*, restart Solr, and force a replication; while it is syncing, force it again. Does not seem to work for me.

7. If I have to prep a new master-slave combination, is it OK to copy the respective contents into the new master and slave and start Solr? Or do I have to wipe the new slave and let it replicate from its new master?

If you shut down the slave, copy the data/* directory and restart, you should be fine. That is how we fix the data/ dir when there is corruption.

8. Doing an 'ls | wc -l' on index folder of
Solr 3.1 Upgrade - Reindex necessary ?
Hello All, I am planning to upgrade from Solr 1.4.1 to Solr 3.1. I saw some deprecation warnings in the log, as shown below:

[#|2011-05-09T12:37:18.762-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|StopFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.765-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|WordDelimiterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.767-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|EnglishPorterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

so I would love the experts' advice on the following questions: 1. Do we have to reindex all content again to use Solr 3.1? 2. If we don't reindex all content, are there any potential issues? (I read somewhere that the first commit would change the 1.4.1 format to 3.1. Has the analyzers' behavior changed in a way that warrants reindexing?) 3. Apart from deploying the new Solr 3.1 war, is it enough to set <luceneMatchVersion>LUCENE_31</luceneMatchVersion> to get all the goodies and bug fixes of Lucene/Solr 3.1? Thank You, Ravi Kiran Bhaskar
Re: Custom filter development
On Mon, May 9, 2011 at 5:07 AM, solrfan a2701...@jnxjn.com wrote:

Hi, I would like to write my own filter. I try to use the following class: But this is a problem for me: the one-to-one mapping. I want to map a given token, for example 'a', to three tokens 'a1', 'a2', 'a3'. I also want a one-to-one mapping 'b' to 'c', and I want the possibility to remove a token 'd' entirely. How can I do this when the next() method returns only one token, not a collection?

Buffer them internally. Look at SynonymFilter.java, it does exactly this.

Tom

Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-filter-development-tp2918459p2918459.html Sent from the Solr - User mailing list archive at Nabble.com.
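The buffering pattern Tom describes can be sketched outside of Lucene with plain Java: keep a queue of pending outputs, and when asked for the next token, drain the queue before pulling from the input. This is a toy illustration of the technique, not Lucene's actual TokenFilter API; the rules 'a' to a1/a2/a3, 'b' to 'c', and dropping 'd' mirror the example in the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy one-to-many token expander: buffers extra outputs so that a
// next()-style method can still hand back exactly one token per call.
class ExpandingFilter {
    private final Iterator<String> input;
    private final Deque<String> pending = new ArrayDeque<>();

    ExpandingFilter(Iterator<String> input) { this.input = input; }

    // Returns the next output token, or null when the stream is exhausted.
    String next() {
        while (pending.isEmpty()) {
            if (!input.hasNext()) return null;
            String tok = input.next();
            if (tok.equals("a")) {          // one-to-many: a -> a1, a2, a3
                pending.add("a1"); pending.add("a2"); pending.add("a3");
            } else if (tok.equals("b")) {   // one-to-one rewrite: b -> c
                pending.add("c");
            } else if (tok.equals("d")) {   // removal: d -> (nothing)
                // emit nothing; the loop pulls the next input token
            } else {
                pending.add(tok);           // pass everything else through
            }
        }
        return pending.poll();
    }
}
```

A real Lucene TokenFilter does the same thing inside incrementToken(): set the term attribute from the internal buffer while it is non-empty, and only then advance the wrapped stream.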
Synonym Filter disable at query time
I would like to be able to disable the synonym filter at query time based on a query parameter, say 'synonyms=true' or 'synonyms=false'. Is there a way within the AnalyzerQueryNodeProcessor or QParser that I can remove the SynonymFilter from the analyzer attributes? It seems that the Analyzer has a hashmap for its 'analyzers', but I cannot find the declaration of this item. Whether I am going about this the wrong way is another question I had... -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-tp2919876p2919876.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr security
Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost? Right now, my application and my solr installation are on different servers so any requests are formatted http://domain:8983 instead of http://localhost:8983. I am concerned that when I launch my application, there will be the potential for abuse. Is the best solution to have everything reside on the same server? What are some other solutions? Thanks, Brian Lamb
Re: Solr security
Solr does not provide security (I believe LucidWorks Enterprise has something there). You should keep Solr itself secure behind a firewall, and pass all requests through some intermediary that only allows sensible stuff through to Solr itself. That way, the DataImportHandler is accessible inside your firewall, and your search functionality is available outside. Upayavira On Mon, 09 May 2011 14:57 -0400, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost? Right now, my application and my solr installation are on different servers so any requests are formatted http://domain:8983 instead of http://localhost:8983. I am concerned that when I launch my application, there will be the potential for abuse. Is the best solution to have everything reside on the same server? What are some other solutions? Thanks, Brian Lamb --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Solr is not working for few document
Hi, I am using Solr with Liferay. Until Friday everything was good, but today I added a few documents and I am unable to search for some of them. It shows 0 result(s) found. Waiting for your help. Thanks, anil misra 248-880-4948
Re: Solr 4.0
REPOST as a more general question about ivy dependencies: http://stackoverflow.com/questions/5941789/do-ivy-dependency-revisions-have-anything-to-do-with-svns

On Mon, May 9, 2011 at 11:31 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

I think you are talking about this dependency: <dependency org="org.apache.solr" name="solr-solrj" rev="1.4.1" conf="*->default"/> I've checked out Solr 4 svn revision 1099940 [1]. What value should I use for rev?

[1] http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2905051.html

On Tue, Apr 19, 2011 at 2:48 PM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

You need to change the version of Solr in ivy/ivy.xml and then rebuild, unless you change the jars straight into nutch-1.3/runtime/local/lib, assuming that you're running Nutch locally only.

On 19 April 2011 07:09, Haspadar haspa...@gmail.com wrote:

Yes, it occurred after removing the SolrJ 1.4 jar and copying in the 4.0 version. Before that I upgraded Nutch for Solr 3.1 the same way and all worked fine. Thanks

2011/4/19 Markus Jelsma markus.jel...@openindex.io

Hi, Hello. I'm using Nutch 1.3. I decided to upgrade Solr to version 4.0 and I replaced the Nutch libs (Snapshot and SolrJ) from the Solr dist.
After that I got the error at SolrIndexer on the Reduce stage:

11/04/19 01:47:19 INFO mapred.JobClient: map 100% reduce 27%
11/04/19 01:47:21 INFO mapred.JobClient: Task Id : attempt_201104190142_0009_r_00_0, Status : FAILED
org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/] Error adding field 'tstamp'='2011-04-18T22:45:17.404Z'
request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:50)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

If you are using Solr 1.4.x then you must upgrade the SolrJ jars in Nutch. Solr 1.4.x and higher are not compatible. Just remove the 1.4.x jars and copy over the new.

I tried to remove tstamp from solrindex-mapping.xml and Solr's schema.xml. But this field is required in schema.xml and I got the error:

11/04/19 01:58:03 INFO mapred.JobClient: Task Id : attempt_201104190142_0010_r_00_0, Status : FAILED
org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/] unknown field 'tstamp'

Removing a mapping doesn't mean the field isn't copied over. All unmapped fields are copied as is. The example mapping seems rather useless as it copies exact field names.
It's only useful if your source fields and destination fields are actually different, which is usually not the case if you dedicate a Solr core to a Nutch crawl. You must either not create the field by some plugin, or add the field to your Solr index. I'm surprised this error actually showed up, considering the incompatible javabin versions. Perhaps you already upgraded the SolrJ API?

How can I upgrade Solr to version 4? Thank you.

-- Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com

-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
RE: Synonym Filter disable at query time
Just make another field using copyField, give that field a type that does not apply synonyms to the text, and then search either the field with synonyms or the one without from the front end... that will be your selector. :) -Original Message- From: mtraynham [mailto:mtrayn...@digitalsmiths.com] Sent: Monday, May 09, 2011 11:17 AM To: solr-user@lucene.apache.org Subject: Synonym Filter disable at query time I would like to be able to disable the synonym filter at query time based on a query parameter, say 'synonyms=true' or 'synonyms=false'. Is there a way within the AnalyzerQueryNodeProcessor or QParser that I can remove the SynonymFilter from the analyzer attributes? It seems that the Analyzer has a hashmap for its 'analyzers', but I cannot find the declaration of this item. Whether I am going about this the wrong way is another question I had... -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-tp2919876p2919876.html Sent from the Solr - User mailing list archive at Nabble.com.
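A sketch of that schema setup, with made-up field and type names: the indexed text is copied into a parallel field whose type omits SynonymFilterFactory, and the front end picks which field to query.

```xml
<!-- schema.xml sketch: "text_syn" applies synonyms at query time, "text_nosyn" does not -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_nosyn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body"       type="text_syn"   indexed="true" stored="true"/>
<field name="body_nosyn" type="text_nosyn" indexed="true" stored="false"/>
<copyField source="body" dest="body_nosyn"/>
```

With 'synonyms=false' the front end queries body_nosyn instead of body; no analyzer changes at runtime are needed.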
RE: Synonym Filter disable at query time
Awesome thanks! Also, you wouldn't happen to have any insight on boosting synonyms lower than the original query after they were stemmed, would you? Say if I had synonyms turned on: The TokenStream is setup to do Synonyms - StopFilter - LowerCaseFilter - SnowballPorter. Say I search for Thomas, synonyms produces Thomas, Tom, Tommy. The SnowballPorter produces Tom, Tommi, Thoma. Is there a way to know Thoma would match the original term, so it could be boosted higher? -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-tp2919876p2920342.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Synonym Filter disable at query time
Actually now that I think about it, with copy fields I can just single out the Synonym reader and boost from an earlier processor. Thanks again though, that solved a lot of headache! -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-tp2919876p2920510.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr security
Hi, You can simply configure a firewall on your Solr server to only allow access from your frontend server. Whether you use the built-in software firewall of Linux/Windows/Whatever or use some other FW utility is a choice you need to make. This is by design - you should never ever expose your backend services, whether it's a search server or a database server, to the public. Read more about Solr security on the WIKI: http://wiki.apache.org/solr/SolrSecurity -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 9. mai 2011, at 20.57, Brian Lamb wrote: Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost? Right now, my application and my solr installation are on different servers so any requests are formatted http://domain:8983 instead of http://localhost:8983. I am concerned that when I launch my application, there will be the potential for abuse. Is the best solution to have everything reside on the same server? What are some other solutions? Thanks, Brian Lamb
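As a concrete sketch of the firewall approach, a pair of iptables rules that admit only the frontend server (the address 10.0.0.5 here is hypothetical; adjust to your setup):

```
# iptables rules sketch: allow only the frontend host to reach Solr's port,
# and drop everything else.
-A INPUT -p tcp -s 10.0.0.5 --dport 8983 -j ACCEPT
-A INPUT -p tcp --dport 8983 -j DROP
```

The same effect can be had with any host firewall; the point is that port 8983 is never reachable from the public internet.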
RE: Synonym Filter disable at query time
I was thinking search both and boost non-synonym field perhaps? -Original Message- From: mtraynham [mailto:mtrayn...@digitalsmiths.com] Sent: Monday, May 09, 2011 1:20 PM To: solr-user@lucene.apache.org Subject: RE: Synonym Filter disable at query time Awesome thanks! Also, you wouldn't happen to have any insight on boosting synonyms lower than the original query after they were stemmed, would you? Say if I had synonyms turned on: The TokenStream is setup to do Synonyms - StopFilter - LowerCaseFilter - SnowballPorter. Say I search for Thomas, synonyms produces Thomas, Tom, Tommy. The SnowballPorter produces Tom, Tommi, Thoma. Is there a way to know Thoma would match the original term, so it could be boosted higher? -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time- tp2919876p2920342.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Synonym Filter disable at query time
Yay! :) -Original Message- From: mtraynham [mailto:mtrayn...@digitalsmiths.com] Sent: Monday, May 09, 2011 1:59 PM To: solr-user@lucene.apache.org Subject: RE: Synonym Filter disable at query time Actually now that I think about it, with copy fields I can just single out the Synonym reader and boost from an earlier processor. Thanks again though, that solved a lot of headache! -- View this message in context: http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time- tp2919876p2920510.html Sent from the Solr - User mailing list archive at Nabble.com.
Slow, CPU-bound commit
Hello, I am using the new Solr 3.1 for a 2.6 GB index of 1M documents. Reading the forums and the archive, I learned that Solr and Lucene now manage commits and transactions a bit differently than in previous versions, and indeed I feel the behavior has changed. Here's the thing: committing a few hundred documents consistently takes about 12 minutes of pure CPU fury on a 6168 AMD Opteron processor. Here is a log of the commit (taken from INFOSTREAM.txt): http://pastebin.com/1rFK3Fs1 These numbers improved significantly after we increased ramBufferSizeMB to 1024; here is the full solrconfig.xml: http://pastebin.com/M1Tw0ATe Do these numbers look normal to you? The index is being used for searching while the commit takes place (about 1 search per second). Thanks in advance, Santiago
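For readers tuning the same knobs, the relevant settings live in solrconfig.xml. A sketch with the buffer size from the post and a typical mergeFactor (treat these as a starting point, not a recommendation):

```xml
<!-- solrconfig.xml sketch: indexing buffers and merge policy -->
<indexDefaults>
  <!-- larger RAM buffer = fewer segment flushes while indexing -->
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <!-- higher values defer merges (faster commits, more segments);
       lower values merge eagerly (slower commits, faster searches) -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```

Long CPU-bound commits are often segment merges; the INFOSTREAM log linked above is the place to confirm whether merging dominates the commit time.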
Re: edismax available in solr 3.1?
Is edismax a formally supported feature in Solr 3.1, or is it still experimental? If it is an experimental feature, I would hesitate to use it. -- View this message in context: http://lucene.472066.n3.nabble.com/edismax-available-in-solr-3-1-tp2910613p2920975.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: aliasing?
well... if I knew what to do about aliasing, I wouldn't post my question here, Grijesh :) My idea is this: for some search queries, I need to provide some synonyms... But it is just an idea... -- View this message in context: http://lucene.472066.n3.nabble.com/aliasing-tp2917733p2921305.html Sent from the Solr - User mailing list archive at Nabble.com.
A DB dataSource and a URL Data source for Solr
I am trying to use two different data sources, as in the title. The problem is that it fails each time I try... I tried the examples on the Solr wiki, but they failed. Does anyone know how to configure Solr to use two different types of sources, DB and URL? -- View this message in context: http://lucene.472066.n3.nabble.com/A-DB-dataSource-and-a-URL-Data-source-for-Solr-tp2921328p2921328.html Sent from the Solr - User mailing list archive at Nabble.com.
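In DataImportHandler, multiple data sources are declared with distinct name attributes, and each entity selects one by name. A sketch of data-config.xml, where the driver, URLs, table, and field mappings are placeholders:

```xml
<dataConfig>
  <!-- two named sources; entities pick one via dataSource="..." -->
  <dataSource name="db"  type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <dataSource name="web" type="URLDataSource"/>
  <document>
    <entity name="item" dataSource="db"
            query="SELECT id, title, feed_url FROM item">
      <!-- nested entity fetches a URL found in each DB row -->
      <entity name="feed" dataSource="web" url="${item.feed_url}"
              processor="XPathEntityProcessor" forEach="/rss/channel/item">
        <field column="description" xpath="/rss/channel/item/description"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key detail is the name/dataSource pairing; without explicit names, DIH cannot tell which entity should use which source.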
Re: A DB dataSource and a URL Data source for Solr
well, for the second time I have fixed the issue on my own after posting here, but I don't understand why indexing time increased to 16 mins when it was only 2 mins with the DB source alone... confused -- View this message in context: http://lucene.472066.n3.nabble.com/A-DB-dataSource-and-a-URL-Data-source-for-Solr-tp2921328p2921444.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr approaches to re-indexing large document corpus
We are looking for some recommendations on systematically re-indexing an ever-growing corpus of documents in Solr (tens of millions now, hundreds of millions in less than a year) without taking the currently running index down. Re-indexing is needed on a periodic basis because:

- New features are introduced around searching the existing corpus that require additional schema fields, which we can't always anticipate in advance
- The corpus is indexed across multiple shards. When it grows past a certain threshold, we need to create more shards and re-balance documents evenly across all of them (which SolrCloud does not yet seem to support).

The current index receives very frequent updates and additions, which need to be available for search within minutes. Approaches where the corpus is re-indexed in batch offline therefore don't really work: by the time the batch is finished, new documents will have been made available. The approaches we are looking into at the moment are:

- Create a new cluster of shards and batch re-index there while the old cluster is still available for searching. New documents that are not part of the re-indexed batch are sent to both the old cluster and the new cluster. When ready to switch, point the load balancer to the new cluster.
- Use CoreAdmin: spawn a new core per shard and send the re-indexed batch to the new cores. New documents that are not part of the re-indexed batch are sent to both the old cores and the new cores. When ready to switch, use CoreAdmin to dynamically swap cores.

We'd appreciate it if folks could either confirm or poke holes in either or all of these approaches. Is one more appropriate than the other? Or are we completely off? Thank you in advance.
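For reference, the core swap in the second approach is a single CoreAdmin request; a sketch where the host, port, and core names are placeholders:

```
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild
```

After the swap, requests addressed to the "live" core name are served by the rebuilt index, while the old index remains available under the other name in case a rollback is needed.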
SolrQuery API for adding group filter
There doesn't seem to be an API for adding grouping parameters (like group.field or group=true). I'm very new to this, so I'm wondering how I'd go about adding a group query, much like how I use 'addFilterQuery' to add an fq. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrQuery-API-for-adding-group-filter-tp2921539p2921539.html Sent from the Solr - User mailing list archive at Nabble.com.
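SolrQuery extends ModifiableSolrParams, so even without a dedicated grouping method you can set arbitrary parameters, e.g. query.set("group", true) and query.set("group.field", "myfield"). Those calls just add key=value pairs to the request's query string. A plain-Java sketch of assembling such a parameter string, with no SolrJ dependency and "manu" as a made-up field name:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the query-string portion of a Solr request from a parameter map,
// mirroring what SolrQuery.set(...) contributes to the final URL.
class ParamStringBuilder {
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Whether the server honors group=true depends on the Solr version in use; grouping was not part of every release at the time of this thread, so check that your build actually supports it.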
Re: aliasing?
a lot of this stuff is covered in the tutorial, and expanded in the wiki. still the best places to start in figuring out the fundamentals: http://lucene.apache.org/solr/tutorial.html http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory hth, rc On Mon, May 9, 2011 at 9:09 PM, deniz denizdurmu...@gmail.com wrote: well... if i knew what to do about aliasing, i wouldnt post my question here Grijesh :) My idea is this: for some search queries, I need to provide some synonyms... But it is just an idea... -- View this message in context: http://lucene.472066.n3.nabble.com/aliasing-tp2917733p2921305.html Sent from the Solr - User mailing list archive at Nabble.com.
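The SynonymFilterFactory linked above reads a synonyms.txt file; a small sketch of its format (the terms themselves are made up):

```
# synonyms.txt sketch
# comma-separated groups are mutual synonyms;
# "=>" maps the left-hand terms to the right-hand replacements
laptop, notebook
television, tv, telly
i-pod, i pod => ipod
```

For the aliasing use case described earlier in the thread, a mapping like the "=>" form is usually what's wanted: the alias is rewritten to the canonical term at analysis time.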
Re: A DB dataSource and a URL Data source for Solr
On Tue, May 10, 2011 at 7:24 AM, deniz denizdurmu...@gmail.com wrote: well second time i have fixed the issue on my own after posting here but i dont understand why indexing time increased to 16 mins, while it was only 2 mins with only db source... confused [...] Try timing just the URLDataSource. We had the same experience, and for us it had to do with the need to read many small files from the filesystem, something which was slower than a SELECT from a database. Regards, Gora
Re: Solr is not working for few document
On Tue, May 10, 2011 at 12:38 AM, Misra, Anil extern.anil.mi...@vw.com wrote: [...] I am using solr with liferay till Friday everything was good but today I added few documents but I am unable to search some of them L. [...] We are not mind-readers, so it is hard to tell what went wrong without any details from your side. Try going through this document: http://wiki.apache.org/solr/UsingMailingLists Looking into my crystal ball, one possibility is that you did not do a commit after indexing. However, please go through the above document before haring off after this solution. Regards