Re: search query text field with Comma
Dear Ravi,

this is most likely a consequence of the analyzer configuration: if you tokenize your text without removing commas (and other punctuation), the comma right after the word "Series" becomes part of the resulting token. You should check the configuration and make sure you use an appropriate analyzer (e.g. the standard analyzer).

Best, Sven

On 06.10.2014 at 20:56, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi users, this may be a basic question, but I am facing some trouble. The scenario is: I have the text "Truck Series, 12V and 15V". If a user searches for "Truck Series", the row is not returned, but "Truck Series," (with the comma) works. How can I make the search for "Truck Series" match? Thanks, Ravi
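A minimal fieldType sketch in the schema.xml style of that era (the type name is illustrative, not from the thread): StandardTokenizerFactory discards trailing punctuation, so "Series," and "Series" produce the same token.

```xml
<!-- Illustrative sketch: a type whose tokenizer strips punctuation such as the
     trailing comma, so "Series," and "Series" index to the same token -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```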
Re: solr finds allways all documents
Dear Robert,

could you give me a little more information about your setup? For example, the complete solrconfig.xml and the complete schema.xml would definitely help.

Best, Sven

-- kippdata informationstechnologie GmbH Sven Maurmann Tel: 0228 98549 -12 Bornheimer Str. 33a Fax: 0228 98549 -50 D-53111 Bonn sven.maurm...@kippdata.de HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417 Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann

On 20.08.2012 at 16:39, robert rottermann wrote: Hi there, I am new to Solr and friends, and I am a Java noob besides. What I am doing: I want to do full-text retrieval on office documents. The metadata of these documents is maintained in PostgreSQL, so the only information I need to get out of Solr is a document ID. My problem now is that my index seems to be built badly: (nearly) whatever I look up returns all documents. I would be very glad if somebody could give me an idea what I should change. Thanks, Robert

What I am using is the sample configuration that comes with Solr 3.6. I removed all the fields and added the following:

<fields>
  <field name="docid" type="string" indexed="true" stored="true" required="true"/>
  <field name="docnum" type="text" indexed="true" stored="true" required="false"/>
  <field name="titel" type="text" indexed="true" stored="true" required="false"/>
  <field name="fsname" type="text" indexed="true" stored="true" required="false"/>
  <field name="directory" type="text" indexed="true" stored="true" required="false"/>
  <field name="fulltext" type="text" indexed="true" stored="false" required="false"/>
  <dynamicField name="*" type="ignored"/>
</fields>
<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>docid</uniqueKey>
Re: Language analyzers
Hi!

Could you explain this in a little more detail?

Thanks, Sven

On 16.05.2012 at 16:17, anarchos78 wrote: Hello, is it possible to use two language analyzers for one field type? Let's say Greek and English (for indexing and querying). Thanks
Re: analyzers in schema
Dear Gary,

yes, you are right.

Best, Sven

On 07.05.2012 at 17:08, G.Long wrote: Hi :) In the schema.xml file, if an analyzer is specified for a fieldType without the attribute type="index" or type="query", does that mean the analyzer is used by default for both cases? Gary
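A sketch of the two equivalent spellings (type names and analyzer contents are illustrative): a single analyzer without a type attribute applies at both index and query time, the same as declaring identical index and query analyzers.

```xml
<!-- One analyzer used for both indexing and querying ... -->
<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- ... behaves like this explicit pair -->
<fieldType name="text_split" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```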
Re: solr connection question
Hi,

Solr runs as a web application. The requests you most probably mean are just HTTP requests to the underlying container. Internally each request is processed against the Lucene index, usually a file-based one. Therefore there are no connections like in a database application, where you have a pool of connections to a remote database server.

Best, Sven

--On Thursday, 8 July 2010 15:46 +0300 ZAROGKIKAS, GIORGOS g.zarogki...@multirama.gr wrote: Hi Solr users, I need to know how Solr manages connections when we make a request (select, update, commit). Is there any connection pooling, or an article where I can learn about its connection management? How can I log the Solr server's connections to a file? I have set up Solr 1.4 with Tomcat. Thanks in advance
Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ
Hi,

there are reasons for both options. Usually it is a good idea to put the default configuration into solrconfig.xml (and even to fix some of the configuration) in order to keep the client-side code simple. But sometimes it is necessary to have some flexibility for the actual query; in that situation one would use the client-side approach. Done right, this does not mean putting the parameters into the servlet code.

Cheers, Sven

--On Tuesday, 22 June 2010 17:52 +0200 Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, sometimes I do both. I put the defaults in solrconfig.xml and thus have one place to define all kinds of low-level default settings, but then I provide a possibility in the application space to add or override any parameters as well. This gives you great flexibility: it lets server administrators (with access to solrconfig.xml) tune low-level stuff, but also gives programmers a middle layer for domain-space configuration instead of locking it down on the search node or up in the web interfaces. -- Jan Høydahl, search solution architect, Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 21 June 2010, at 22:29, Saïd Radhouani wrote: I completely agree. Thanks a lot! -S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote: Why would someone port the Solr config into servlet code? IMO the first option is the best choice; one obvious reason is that when you alter the Solr config you only need to restart the server, whereas changing the source forces you to redeploy your app and restart the server.

On 6/21/10, Saïd Radhouani r.steve@gmail.com wrote: Hello, I'm developing a web application that communicates with Solr using SolrJ. I have three search interfaces, and I'm facing two options: 1. configure one SearchHandler per search interface in solrconfig.xml, or 2. write the configuration in the Java servlet code that uses SolrJ. Is there any significant difference between these two options? If yes, what's the best choice? Thanks, -Saïd

-- Abdelhamid ABID Software Engineer - J2EE / WEB
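A sketch of the server-side half of this split (handler name, fields and values are invented for illustration): parameters under "defaults" can be overridden per request from the client, while "invariants" stay locked down by the administrator.

```xml
<!-- solrconfig.xml sketch: "defaults" may be overridden by request parameters,
     "invariants" may not; handler and field names are hypothetical -->
<requestHandler name="/interface1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 body</str>
    <int name="rows">10</int>
  </lst>
  <lst name="invariants">
    <str name="fl">id,title,score</str>
  </lst>
</requestHandler>
```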
Re: Build query programmatically with lucene, but issue to solr?
Hi Phillip,

could you give me some more information about your environment? A first idea that comes to mind is to use SearchComponents to solve your problem. You could either replace the whole QueryComponent (not recommended) or write a (probably small) SearchComponent that creates the Lucene query and puts it into the appropriate place in the ResponseBuilder. If you add such a component to the first-components of your handler definition, it runs before the standard components execute the query.

Regards, Sven

--On Friday, 28 May 2010 12:23 -0400 Phillip Rhodes rhodebumpl...@gmail.com wrote: Hi. I am building up a query with quite a bit of logic such as parentheses, plus signs, etc., and it's a little tedious dealing with it all at the string level. I was wondering if anyone has any thoughts on constructing the query in Lucene and sending the string representation of the query to Solr. Thanks, Phillip
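The wiring Sven describes might look like this in solrconfig.xml (the component class and all names are hypothetical; the component itself would build the Lucene query and set it on the ResponseBuilder before QueryComponent runs):

```xml
<!-- solrconfig.xml sketch: register a custom component and run it before the
     standard query component; class and handler names are invented -->
<searchComponent name="luceneQueryBuilder" class="com.example.LuceneQueryComponent"/>

<requestHandler name="/programmatic" class="solr.SearchHandler">
  <arr name="first-components">
    <str>luceneQueryBuilder</str>
  </arr>
</requestHandler>
```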
Re: is solr ignored my filters ?
Hi,

could you provide at least some information? Usually you can be 100% sure that Solr uses the configuration it is provided with.

Cheers, Sven

--On Monday, 19 April 2010 05:53 -0800 stockii st...@shopgate.com wrote: Hey, sorry for this stupid question ;) When I perform an import of my data, I use some filters. How can I really be sure that Solr used my configured filters and analyzers? When I search in Solr, the results look 100% like they did before the import. Thanks =)
Re: HTMLStripCharFilterFactory configuration problem
Hi,

please note that you get the stored value of the field as a result, not the indexed one.

Cheers, Sven

--On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar ranveer.s...@gmail.com wrote: Hi all, I am facing a problem configuring HTMLStripCharFilterFactory. The following is the schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/> <!-- escapedTags="lt;,gt;" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.LengthFilterFactory" min="2" max="50"/> -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

When I check with analysis.jsp it gives the correct result, but in my query results I still get HTML tags. I am using the solrj client. Please help me.
Re: Need a bit of help, Solr 1.4: type text.
Hi,

the parameter for WordDelimiterFilterFactory is catenateAll; you should set it to 1.

Cheers, Sven

--On Wednesday, 10 February 2010 16:37 -0800 Yu-Shan Fung ambivale...@gmail.com wrote: Check out the configuration of WordDelimiterFilterFactory in your schema.xml. Depending on your settings, it's probably tokenizing "13th" into "13" and "th". You can also have them concatenated back into a single token, but I can't remember the exact parameter. I think it could be catenateAll.

On Wed, Feb 10, 2010 at 4:32 PM, Dickey, Dan dan.dic...@savvis.net wrote: I'm using the standard text type for a field, and part of the data being indexed is "13th", as in "Friday the 13th". I can't seem to get it to match when I'm querying for "Friday the 13th", either quoted or not. One thing that does match is "13 th" if I send the search query with a space in between. Any suggestions? I know this is short on detail, but it's been a long day... time to get outta here. Thanks for any and all help. -Dan
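A sketch of the relevant filter line (the surrounding attributes follow the stock "text" type; only catenateAll changes): with catenateAll="1" the indexed tokens for "13th" include "13", "th" and the rejoined "13th", so both the quoted phrase and the single token match.

```xml
<!-- index-side analyzer fragment; catenateAll="1" re-joins the split
     word/number parts ("13" + "th") into an additional token "13th" -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="1"/>
```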
Re: dismax and multi-language corpus
Hi,

this is correct. Usually one does not know how a stemmer - or other language-specific filters - behaves in the context of a foreign language. But there is an exception that sometimes comes to the rescue: if one has a stable dictionary of terms in all the languages of interest, then one can put these terms into a synonym list and also into a list of protected words for the stemmers. Then a search for one of those terms in any language will return the documents regardless of their own language. Of course this does not solve the general problem of cross-language search, but it helps in certain circumstances.

Cheers, Sven

--On Thursday, 11 February 2010 13:45 -0800 Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Claudio, ah, through multilingual indexing/search work (with http://www.sematext.com/products/multilingual-indexer/index.html ) I learned that cross-language search often doesn't really make sense unless the search involves universal terms (e.g. Fiat, BMW, Mercedes, Olivetti, Tomi de Paola, Alberto Tomba...). If the search involves natural-language-specific terms, then searching in the foreign language doesn't work so well and doesn't make a ton of sense. Imagine a search for "ciao ragazzi". I have no idea what the Italian stemmer does with that, but say it turns it into "cia raga" (it doesn't, but just imagine). If this was done with Italian docs at index time, you will find the matching docs. But what happens if "ciao ragazzi" is analyzed by some German analyzer? Different tokens will be created and indexed, so a "ciao ragazzi" search won't work. And which analyzer would you use to analyze that query anyway? Italian or German? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message From: Claudio Martella claudio.marte...@tis.bz.it To: solr-user@lucene.apache.org Sent: Thu, February 11, 2010 3:21:32 AM Subject: Re: dismax and multi-language corpus I'll try removing the '-'.
I do need to search all languages at once; the other option would be to ask the user which language to query, but in my region we use Italian and German in equal measure, so it would end up querying both languages all the time anyway. Or did you mean a more performant way of querying both languages all the time? :)

Otis Gospodnetic wrote: Claudio - fields with '-' in them can be problematic. Side comment: do you really want to search across all languages at once? If not, maybe 3 different dismax configs would make your searches better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message From: Claudio Martella To: solr-user@lucene.apache.org Sent: Wed, February 10, 2010 3:15:40 PM Subject: dismax and multi-language corpus Hello list, I have a corpus with 3 languages, so I set up a text content field (with no stemming) and 3 text-[en|it|de] fields with language-specific Snowball stemmers. I copyField the text to my language-aware fields. So I set up this dismax searchHandler: dismax title^1.2 content-en^0.8 content-it^0.8 content-de^0.8 title^1.2 content-en^0.8 content-it^0.8 content-de^0.8 title^1.2 content-en^0.8 content-it^0.8 content-de^0.8 0.1 But I get this error: HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Expected ',' at position 7 in 'content-en' (the request sent by the client was syntactically incorrect). Any idea? TIA, Claudio

-- Claudio Martella Digital Technologies Unit Research Development - Analyst TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
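The protected-words/synonym trick for such "universal" terms can be sketched like this (the entries are invented; in the stock schema these files are referenced as synonyms.txt and protwords.txt):

```
# synonyms.txt (hypothetical entries): equate a stable term across languages
castello, schloss, castle

# protwords.txt (hypothetical entries): keep the same terms away from the stemmers
castello
schloss
castle
```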
Re: How to reindex data without restarting server
Hi,

restarting the Solr server wouldn't help. If you want to re-index your data, you have to pipe it through the whole process again. In your case it might be a good idea to consider having several cores holding the different schema definitions. This will not save you from fetching the original data and doing the analysis once again, but at least you do not end up with a schema that is inconsistent with the data in the index. If you have a way to find and access the original data from the unique id in your index, you could write a small program that reads the data belonging to each id and sends it to the new core for indexing (just rough thoughts, depending on the nature of your situation).

Cheers, Sven

--On Friday, 12 February 2010 03:40 +0500 Emad Mushtaq emad.mush...@sigmatec.com.pk wrote: Hi, I would like to know if there is a way of reindexing data without restarting the server. Let's say I make a change in the schema file; that would require me to reindex the data. Is there a solution to this? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
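The multi-core layout could be declared like this (a sketch in the pre-4.x solr.xml style; core and directory names are invented):

```xml
<!-- solr.xml sketch: one core per schema version, so a changed schema gets a
     fresh core and the old index stays consistent until the switch-over -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="schema_v1" instanceDir="core_v1"/>
    <core name="schema_v2" instanceDir="core_v2"/>
  </cores>
</solr>
```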
Re: Embedded Solr problem
Hi Ranveer,

I assume that you have enough knowledge of Java. You should essentially run your server-instantiation code only once (depending on what you intend to do, this may be done in a separate class or in a method of the class doing the queries). Then you use this instance to handle all queries, for example via the query method of SolrServer. For further information you may want to consult the API documentation or http://wiki.apache.org/solr/Solrj from the wiki.

Cheers, Sven

--On Monday, 8 February 2010 08:53 +0530 Ranveer Kumar ranveer.s...@gmail.com wrote: Hi Sven, thanks for the reply. Yes, I noticed that a new instance of the Solr server is created on every request. Could you please guide me on how to do the initialization so that an instance of SolrServer is created once, during the first request?

On Mon, Feb 8, 2010 at 2:11 AM, Sven Maurmann sven.maurm...@kippdata.de wrote: Hi, could it be that you instantiate a new instance of your SolrServer every time you do a query? You should use the code you quoted in your mail once during initialization to create an instance of SolrServer (the interface implemented by EmbeddedSolrServer) and subsequently use the query method of SolrServer to do the queries.

Cheers, Sven

--On Sunday, 7 February 2010 21:54 +0530 Ranveer Kumar ranveer.s...@gmail.com wrote: Hi all, I am still very new to Solr. Currently I am facing a problem using EmbeddedSolrServer.
The following is my code:

File home = new File("D:/ranveer/java/solr_home/solr/first");
CoreContainer coreContainer = new CoreContainer();
SolrConfig config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
CoreDescriptor descriptor = new CoreDescriptor(coreContainer, "core1", home + "/core1");
SolrCore core = new SolrCore("core1", home + "/core1/data", config, new IndexSchema(config, "schema.xml", null), descriptor);
coreContainer.register(core.getName(), core, true);
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

Now my problem is that every time I make a search request, SolrCore initializes the core. I want the previously started core to be reused if it has already been started. Because of this, searching currently takes too much time. I tried closing the core after the search, but the same thing happens: on a fresh search Solr starts from scratch. Please help. Thanks
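One way to get the single-instance behaviour Sven describes is a lazily initialized holder (a sketch against the Solr 1.4 embedded API used in the code above; the class name SolrHolder is invented, the paths mirror the thread, and the Solr jars must be on the classpath):

```java
// Sketch: create the embedded server once and hand the same instance to
// every request; names and paths below mirror the code quoted in the thread.
public final class SolrHolder {
    private static EmbeddedSolrServer server;

    public static synchronized SolrServer getServer() throws Exception {
        if (server == null) {  // only the first caller pays the start-up cost
            File home = new File("D:/ranveer/java/solr_home/solr/first");
            CoreContainer container = new CoreContainer();
            SolrConfig config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
            CoreDescriptor descriptor = new CoreDescriptor(container, "core1", home + "/core1");
            SolrCore core = new SolrCore("core1", home + "/core1/data", config,
                    new IndexSchema(config, "schema.xml", null), descriptor);
            container.register(core.getName(), core, true);
            server = new EmbeddedSolrServer(container, "core1");
        }
        return server;
    }
}
```

Each servlet or query method would then call SolrHolder.getServer().query(...) instead of constructing a new EmbeddedSolrServer per request.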
Re: Indexing / querying multiple data types
Hi,

could you be a little more precise about your configuration? It may be much easier to answer your question then.

Cheers, Sven

--On Monday, 8 February 2010 17:39 + stefan.ma...@bt.com wrote: OK - so I've now got my data-config.xml sorted so that I'm pulling in the expected number of indexed documents for my two data sets. I've defined two entities (name1 and name2), and they both make use of the same fields - I'm not sure if this is a good thing to have done. When I run a query I include qt=name1 (or qt=name2) and expect to only get results from the appropriate data set; in fact I'm getting the sum total from both. Does the entity name="name1" equate to the query parameter qt=name1? In my solrconfig.xml I have defined two requestHandlers (name1 and name2) using the common set of fields. So how do I ensure that my query http://localhost:7001/solr/select/?q=food&qt=name1 or http://localhost:7001/solr/select/?q=food&qt=name2 will operate on the correct data set as loaded via the data import (entity name="name1" or entity name="name2")? Thanks, Stefan Maric BT Innovate Design | Collaboration Platform - Customer Innovation Solutions
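One caveat worth noting: the DIH entity name is not stored anywhere in the indexed documents, so qt alone cannot separate the two data sets. A common approach (a sketch; the source field, the column alias and the SELECT list are invented) is to emit a marker column per entity and filter on it in each handler:

```xml
<!-- data-config.xml sketch: each entity emits a constant "source" column -->
<entity name="name1" query="SELECT id, ..., 'name1' AS source FROM table1"/>

<!-- solrconfig.xml sketch: the per-dataset handler appends a filter query -->
<requestHandler name="name1" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">source:name1</str>
  </lst>
</requestHandler>
```

This assumes a corresponding source field is declared in schema.xml.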
Re: Use of solr.ASCIIFoldingFilterFactory
Hi,

you might have run into an encoding problem. If you use Tomcat as the container for Solr, you should probably consult http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Cheers, Sven

--On Friday, 5 February 2010 15:41 +0100 Yann PICHOT ypic...@gmail.com wrote: Hi, I have defined this type in my schema.xml file:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Field definitions:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="idProd" type="string" indexed="false" stored="false" required="false"/>
  <field name="description" type="text" indexed="true" stored="true" required="false"/>
  <field name="artiste" type="text" indexed="true" stored="true" required="false"/>
  <field name="collection" type="text" indexed="true" stored="true" required="false"/>
  <field name="titre" type="text" indexed="true" stored="true" required="false"/>
  <field name="all" type="text" indexed="true" stored="true" required="false"/>
</fields>
<copyField source="description" dest="all"/>
<copyField source="collection" dest="all"/>
<copyField source="artiste" dest="all"/>
<copyField source="titre" dest="all"/>

I imported my documents with DataImportHandler (my original documents are in an RDBMS). I tested this query string on the Solr web application: all:chateau.
Results (content of the field "all"): CHATEAU D'AMBOISE [CHATEAU EN FRANCE, BABELON] ope dvd rene chateau CHATEAU DE LA LOIRE DE CHATEAU EN CHATEAU ENTRE LA LOIRE ET LE CHER [LE CHATEAU AMBULANT, HAYAO MIYAZAKI] [Chambres d'hôtes au château, Moreau] [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL] [NEUF, NAISSANCE D UN CHATEAU FORT, MACAULAY] [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL]

Now I try this query string: all:château. No results :( I don't understand; I would expect the second query to return the same results as the first, but that is not the case. I use SOLR 1.4 (Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40), 32-bit Java (Java(TM) SE Runtime Environment, build 1.6.0_17-b04), OS: Windows Seven 64-bit. Regards, -- Yann
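If Tomcat decodes query-string bytes with its default ISO-8859-1, the "â" never reaches the analyzer intact. The wiki page above boils down to setting the connector's URI encoding (a server.xml sketch; port and protocol are the usual defaults, adjust to your installation):

```xml
<!-- Tomcat conf/server.xml sketch: decode request URIs as UTF-8 so
     accented query terms like "château" survive URL decoding -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>
```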
Re: Embedded Solr problem
Hi,

could it be that you instantiate a new instance of your SolrServer every time you do a query? You should use the code you quoted in your mail once during initialization to create an instance of SolrServer (the interface implemented by EmbeddedSolrServer) and subsequently use the query method of SolrServer to do the queries.

Cheers, Sven

--On Sunday, 7 February 2010 21:54 +0530 Ranveer Kumar ranveer.s...@gmail.com wrote: Hi all, I am still very new to Solr. Currently I am facing a problem using EmbeddedSolrServer. The following is my code:

File home = new File("D:/ranveer/java/solr_home/solr/first");
CoreContainer coreContainer = new CoreContainer();
SolrConfig config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
CoreDescriptor descriptor = new CoreDescriptor(coreContainer, "core1", home + "/core1");
SolrCore core = new SolrCore("core1", home + "/core1/data", config, new IndexSchema(config, "schema.xml", null), descriptor);
coreContainer.register(core.getName(), core, true);
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

Now my problem is that every time I make a search request, SolrCore initializes the core. I want the previously started core to be reused if it has already been started. Because of this, searching currently takes too much time. I tried closing the core after the search, but the same thing happens: on a fresh search Solr starts from scratch. Please help. Thanks
Re: Basic questions about Solr cost in programming time
Hi!

Of course the answer depends (as usual) very much on the features you want to realize, but Solr can be set up very fast. When we created our first prototype it took us about a week to get it running with phonetic search, spell checking, faceting - and even collapsing (using the famous SOLR-236 patch). It is definitely very nice that you can do a lot of things using the available components, only configuring them in solrconfig.xml and schema.xml. And you may well start with the standard distribution.

Cheers, Sven

--On Tuesday, 26 January 2010 12:00 -0800 Jeff Crump jcr...@hq.mercycorps.org wrote: Hi, I hope this message is OK for this list. I'm looking into search solutions for an intranet site built with Drupal. Eventually we'd like to scale to enterprise search, which would include the Drupal site, a document repository, and Jive SBS (collaboration software). I'm interested in Lucene/Solr because of its scalability, faceted search and optimization features, and because it is free. Our problem is that we are a non-profit organization with only three very busy programmers/sysadmins supporting our employees around the world. To help me argue for Solr in terms of total cost, I'm hoping that members of this list can share their insights on the following: * About how many hours of programming did it take you to set up your instance of Lucene/Solr (not counting time spent on optimization)? * Are there any disadvantages to going with a certified distribution rather than the standard distribution? Thanks and best regards, Jeff
Re: Solr wiki link broken
Hi,

you might want to try the link called FrontPage on the generic wiki page. But this seems to be kind of broken for some locales.

Regards, Sven

--On Tuesday, 26 January 2010 01:23 -0500 Teruhiko Kurosaka k...@basistech.com wrote: In http://lucene.apache.org/solr/ the wiki tab and the "Docs (wiki)" hypertext in the sidebar link to http://wiki.apache.org/solr but the wiki site seems to be broken; the above link took me to a generic help page of the wiki system. What's going on? Did I just hit the site during maintenance? Kuro
Re: Solr wiki link broken
Hi Erik,

one observation from me, using the wiki from a browser in a non-US locale: I usually get the standard wiki front page (in German) and not (!) the Solr FrontPage I get if I use a US locale (or click on the link FrontPage). By the way, I know that this does not strictly belong on this list.

Cheers, Sven

--On Tuesday, 26 January 2010 04:05 -0500 Erik Hatcher erik.hatc...@gmail.com wrote: All seems well now. The wiki does have its flaky moments, though. Erik

On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and the "Docs (wiki)" hypertext in the sidebar link to http://wiki.apache.org/solr but the wiki site seems to be broken; the above link took me to a generic help page of the wiki system. What's going on? Did I just hit the site during maintenance? Kuro
Re: Index gets deleted after commit?
DIH is the DataImportHandler. Please consult the two URLs http://wiki.apache.org/solr/DataImportHandler and http://wiki.apache.org/solr/DataImportHandlerFaq for further information.

Cheers, Sven

--On Monday, January 25, 2010 11:33:59 AM +0200 Bogdan Vatkov bogdan.vat...@gmail.com wrote: Hi Amit, what is DIH? (I am a Solr newbie.) In the meantime I resolved my issue - it was a very stupid one: one of the files in my folder with XMLs (which I send to Solr with the SimplePostTool), and actually the latest-created one (so it got executed last each time I ran the folder), contained <delete><query>*:*</query></delete> :) Best regards, Bogdan

On Sun, Jan 24, 2010 at 6:25 AM, Amit Nithian anith...@gmail.com wrote: Are you using the DIH? If so, did you try setting clean=false in the URL? That prevents wiping out the index on load.

On Jan 23, 2010 4:06 PM, Bogdan Vatkov bogdan.vat...@gmail.com wrote: After a mass upload of docs into Solr I get some "REMOVING ALL DOCUMENTS FROM INDEX" without any explanation. I had been indexing with Solr for several weeks and everything was OK - I indexed 22K+ docs using the SimplePostTool. I was first sending <delete><query>*:*</query></delete> and <commit waitFlush="true" waitSearcher="true"/>, then some 22K+ adds with a finishing <commit waitFlush="true" waitSearcher="true"/>. But as you can see from the log, right after the last commit I get this strange "REMOVING ALL...". I do not remember what I changed last, but now I have this issue that after the mass upload the index gets completely deleted. Why is this happening?
Log after the last commit:

INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fr,version=1260734716752,generation=567,filenames=[segments_fr]
commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fs,version=1260734716753,generation=568,filenames=[_gv.nrm, segments_fs, _gv.fdx, _gw.nrm, _gv.tii, _gv.prx, _gv.tvf, _gv.tis, _gv.tvd, _gv.fdt, _gw.fnm, _gw.tis, _gw.frq, _gv.fnm, _gw.prx, _gv.tvx, _gw.tii, _gv.frq]
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1260734716753
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening searc...@de26e52 main
Jan 24, 2010 1:48:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for
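For completeness, the clean parameter Amit mentions is passed on the DIH request URL; a hypothetical invocation against the standard example port (URL and handler path assumed, not taken from the thread):

```
# full-import normally deletes the whole index first; clean=false suppresses that
curl 'http://localhost:8983/solr/dataimport?command=full-import&clean=false'
```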
Re: multi field search
Hi,

you might want to use the DisMax handler.

Sven

--On Monday, January 18, 2010 02:58:09 PM +0100 Lukas Kahwe Smith m...@pooteeweet.org wrote:

Hi,

I realize that I can copy all fields together into one multiValued field and set that as the defaultSearchField. However, in that case I cannot leverage the various custom analyzers I want to apply to the fields separately (name should use doublemetaphone, street should use the word splitter etc.). I can of course also do an OR query. But it would be nice to be able to do q=*:foo and have that simply search all fields against the query foo.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
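As a sketch of what Sven suggests (the localhost URL and field names are illustrative assumptions matching the fields Lukas mentions, not from the original thread), the DisMax handler searches several fields at once through the qf parameter, each with its own analyzer chain:

```python
from urllib.parse import urlencode

# Hypothetical Solr select URL; "name" and "street" are the example
# fields from the question, each keeping its own analyzer.
params = {
    "q": "foo",              # the user's raw query, no field prefix needed
    "defType": "dismax",     # use the DisMax query parser
    "qf": "name^2 street",   # query both fields, boosting matches on name
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

This avoids both the copyField workaround and a hand-built OR query, since DisMax expands the single query string across all fields listed in qf.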
Re: Fundamental questions of how to build up solr for huge portals
Hi!

Your question is quite general in nature, therefore here are only a few initial remarks on how to get started: If you want to have a global search over all of your portals, it might be best to start with one Solr instance and access it from all the portals. If you plan to build collections that are special to one or another portal, you can do so at index time: just mark the indexed object in a dedicated field of the index. If you provide query handlers for each of the portals, you can control the behaviour of the search based on the respective portal. You may then use query filters to restrict results to the portal. So much for the server side.

For your question about which client (language) to use: since Solr is able to generate responses for a number of client platforms, you may want to consult http://wiki.apache.org/solr/IntegratingSolr for additional information. I like to use a very lightweight solution in JavaScript, with the query responses from Solr delivered via JSON. Since you can do this for PHP clients as well, you might want to give it a try.

Regards,
Sven

--On Saturday, 16 January 2010 15:16 +0100 Peter zarato...@gmx.net wrote:

Hello!

Our team wants to use Solr for a community portal built up out of 3 or more sub-portals. We are unsure how we should build up the whole architecture, because we have more than one portal and we want to make them all connected and searchable by Solr. Could some experts help us with these questions?

- What is the best way to use Solr to get the best performance for a huge portal with 5000 users that might grow fast?
- Which client should we use (Java, PHP, ...)? Right now the portal is almost entirely PHP/MySQL based. But we want to make Solr as good as it can be in all ways (performance, accessibility, good programming practice, using all the features of Lucene - like tagging, faceting and so on ...)
We are thankful for every suggestion :)

Thanks,
Peter
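Sven's suggestion — tag each document with its portal at index time, then filter at query time — can be sketched as follows. The field name "portal" and the portal values are assumptions for illustration only:

```python
from urllib.parse import urlencode

# Hypothetical: each document was indexed with a dedicated "portal" field.
# A per-portal query handler would then attach a filter query (fq) so that
# a global Solr instance only returns results for that portal.
def portal_query(user_query: str, portal: str) -> str:
    params = {
        "q": user_query,            # the user's search terms
        "fq": f"portal:{portal}",   # restrict to one portal's collection
        "wt": "json",               # lightweight response for JS/PHP clients
    }
    return "http://localhost:8983/solr/select?" + urlencode(params)

print(portal_query("foo", "portal_a"))
```

Filter queries are also cached by Solr's filterCache, so the per-portal restriction adds little overhead once warmed.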
Re: Problem with text field in Solr
Hi,

from a first glance at your configuration it appears that you run into the following: you use a wildcard query to query a stemmed term (aviation becomes aviat) in the index. Now if you provide a wildcard query with the trailing asterisk as the only wildcard, this wildcard query is rewritten as a prefix query, which is not (!) stemmed. Therefore everything seems to be fine for your first two examples (as avia and aviat are both prefixes of the stemmed aviation), but the remaining three queries try to match the prefixes aviati, aviatio and aviation against the stem aviat of aviation - and fail.

You may want to consult either the Lucene documentation (on the QueryParser, for example) or the appropriate chapters in the excellent book Lucene in Action (LIA) by Hatcher and Gospodnetic.

Hope that helps.
Sven

--On Friday, January 15, 2010 04:15:40 PM +0530 deepak agrawal dk.a...@gmail.com wrote:

Hi,

I am using Solr, in which I have a BODY field of type text. When I search for documents whose BODY contains the word aviation:

when I am searching BODY:avia* (aviation is coming)
when I am searching BODY:aviat* (aviation is coming)
when I am searching BODY:aviati* (aviation is not coming)
when I am searching BODY:aviatio* (aviation is not coming)
when I am searching BODY:aviation* (aviation is not coming)

Please help me: how can I search for these kinds of words (aviati*, aviatio*, aviation*)? Below is the detail of how we are using BODY with text.

<field name="BODY" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true
         ensures that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

--
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.
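Sven's explanation can be illustrated outside Solr with a few lines of plain Python. This is not Solr code - it only assumes, as the thread states, that the Porter-style stem of "aviation" stored in the index is "aviat", and that a trailing-wildcard query is matched as an unstemmed prefix against that stored token:

```python
# The index stores only the stemmed token (EnglishPorterFilterFactory at
# index time turns "aviation" into "aviat").
indexed_token = "aviat"

# A trailing-* wildcard is rewritten to a prefix query; the prefix itself
# is NOT passed through the stemmer, so it must literally be a prefix of
# the stored token for the document to match.
for prefix in ["avia", "aviat", "aviati", "aviatio", "aviation"]:
    matched = indexed_token.startswith(prefix)
    print(f"BODY:{prefix}*  ->  {'match' if matched else 'no match'}")
```

The first two prefixes are prefixes of "aviat" and match; the last three are longer than the stored stem and cannot match, exactly as observed in the question.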
Re: Need help Migrating to Solr
Hi,

since we did a similar kind of migration in the recent past, I might add some (hopefully helpful) remarks: If you use a Lucene-based application right now, you might already have an idea of which fields you want to store in Solr. Since you already do analysis of fields, it should be easy to identify the necessary analyzers and filter chains to be configured in the field-type part of the schema.

Once you have the basic definition of the schema, you can start loading content into Solr. You can inspect the results using the admin web GUI. I've found the ad hoc query interface and the analysis facility very helpful for getting an idea of the inner workings.

Of course that is only the very beginning. You should realize that Solr offers a very powerful mechanism to configure the way queries are handled (using query handlers, ...). The book Solr 1.4 Enterprise Search Server is a very good first step to understanding what you can do with Solr (refer to Solr's home page for the complete citation).

Sven

--On Thursday, January 14, 2010 08:38:12 AM -0500 Grant Ingersoll gsing...@apache.org wrote:

I've done a fair number of migrations, but it's kind of hard to give generic advice on it. Specific questions as you dig in would be best. I'd probably, at least, just start with a simple schema that models most of your data and get Solr up and ingesting it. Then run some queries against it in your browser (no need to write client-side code yet) and go from there.

-Grant

On Jan 12, 2010, at 11:42 PM, Abin Mathew wrote:

Hi, I am new to the Solr technology. We have been using Lucene to handle searching in our web application www.toostep.com, which is a knowledge-sharing platform developed in Java using the Spring MVC architecture and iBatis as the persistence framework. Now that the application is getting very complex, we have decided to implement Solr on top of Lucene.
Anyone having expertise in this area, please give me some guidelines on where to start and how to form the schema for Solr.

Thanks and Regards,
Abin Mathew

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
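The "simple schema that models most of your data" Grant recommends might look like the following sketch. The field names here are illustrative assumptions, not taken from the thread; field types such as string and text would be defined in the fieldType section of the same schema.xml:

```xml
<!-- Hypothetical minimal schema.xml fragment for a first migration step:
     one unique key plus a couple of searchable fields -->
<fields>
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>
  <field name="body"  type="text"   indexed="true" stored="false"/>
</fields>
<uniqueKey>id</uniqueKey>
```

Starting this small lets you ingest real data quickly and iterate on analyzers and additional fields once queries in the admin GUI reveal what is missing.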
RE: Problem committing on 40GB index
Hi!

Garbage collection is an issue of the underlying JVM. You may use -XX:+PrintGCDetails as an argument to your JVM in order to collect details of the garbage collection. If you also use the parameter -XX:+PrintGCTimeStamps, you get the time stamps of the garbage collections. For further information you may want to refer to the paper http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf which points you to a few other utilities related to GC.

Best,
Sven Maurmann

--On Wednesday, 13 January 2010 18:03 + Frederico Azeiteiro frederico.azeite...@cision.com wrote:

The hanging hasn't happened again since yesterday, and I never ran out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky... Where can I see the garbage collection info?

-----Original Message-----
From: Marc Des Garets [mailto:marc.desgar...@192.com]
Sent: Wednesday, 13 January 2010 17:20
To: solr-user@lucene.apache.org
Subject: RE: Problem committing on 40GB index

Just curious, have you checked whether the hanging you are experiencing is garbage collection related?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem committing on 40GB index

That's my understanding. But fortunately disk space is cheap <G>

On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote:

Sorry, my bad... I replied to a current mailing list message, only changing the subject... Didn't know about this hijacking problem. Will not happen again. Just to close this issue: if I understand correctly, for an index of 40GB, running an optimize will need:

- 40GB if all activity on the index is stopped
- 80GB if the index is being searched
- 120GB if the index is being searched and a commit is performed

Is this correct? Thanks.
Frederico

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 12 January 2010 19:18
To: solr-user@lucene.apache.org
Subject: Re: Problem committing on 40GB index

Huh?

On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Subject: Problem committing on 40GB index
: In-Reply-To: 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, your question is hidden in that thread, and it gets less attention. It also makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
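The GC flags Sven mentions can be combined in one launch command. This is a hypothetical invocation (the start.jar launcher is the stock Jetty example shipped with Solr 1.4; the gc.log file name is an assumption):

```
# Enable GC detail and timestamp logging for the Solr example launcher,
# redirecting the JVM's GC output to a file for later inspection
java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar start.jar > gc.log 2>&1
```

Long pauses visible in the timestamps of full collections would confirm Marc's suspicion that the hanging during commits is GC-related rather than an index problem.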