Solr Admin Schema Browser and field named keywords
I have a field named keywords in my index. The schema browser page is not able to deal with this, so I have trouble getting statistical information on this field. When I click on the field, Firefox hangs for a minute and then gives the unresponsive-script warning. I assume (without actually checking) that this is because keywords is already used for something in the javascript code. Is this a known problem, or should I create a Jira issue?

Related to this: would it be difficult to make this feature display something like a status bar while it is first gathering information, indicating how many fields there are and which one it's working on at the moment? It takes a few minutes to load on my indexes, so some indication of how far along it is would be very nice.

Shawn
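[A workaround worth noting here, my suggestion rather than something from the thread: the schema browser page builds its view from the LukeRequestHandler, and you can query that handler directly to get term statistics for a single field without going through the javascript. The path and parameters below are the stock Solr 1.4 defaults; adjust them if your solrconfig.xml maps the handler elsewhere:

http://localhost:8983/solr/admin/luke?fl=keywords&numTerms=10

This returns index statistics and the top terms for just the keywords field.]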
How to do Spatial Search with Solr?
Hi, I am using nutch to do the crawling and solr to do the searching. The index has City and State. I want to be able to get all nearby cities by entering a city name. e.g. when I type New York, I want to get the following as facets:

New York, NY (1905)
Brooklyn, NY (89)
Jersey City, NJ (55)
New York City, NY (34)
Montclair, NJ (25)

How do I do that? More importantly, where do I get all the latitude and longitude data for all cities? Thanks.
Re: How to do Spatial Search with Solr?
Hi Savannah, Check out the patches I just threw up for SOLR-2073, SOLR-2074, SOLR-2075, SOLR-2076 and SOLR-2077. There's code in there to deal with Geonames.org data. There are more patches coming, so hopefully it will get clearer as I add them. Thanks to W. Quach for leading the charge on these patches! Cheers, Chris

On 8/22/10 11:21 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: [original question quoted in full; see above]

++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
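[For readers landing on this thread later: once latitude/longitude values are indexed, a radius-style filter plus a city facet is roughly the query shape this use case needs. The sketch below uses the {!geofilt} syntax that arrived after 1.4 (Solr 3.1+), with hypothetical field names store (a latlon-type field holding "lat,lon") and city, so treat it as an illustration of the idea rather than something the SOLR-2073..2077 patches provide verbatim:

http://localhost:8983/solr/select?q=*:*&fq={!geofilt sfield=store pt=40.7142,-74.0064 d=50}&facet=true&facet.field=city

Here pt is the center point (e.g. New York) and d is the radius in kilometers; the city facet then yields counts like the ones Savannah lists.]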
Re: Solr Admin Schema Browser and field named keywords
On 8/23/2010 12:07 AM, Shawn Heisey wrote: I have a field named keywords in my index. The schema browser page is not able to deal with this ... I assume (without actually checking) that this is due to keywords being already used for something in the javascript code.

I've just looked over the javascript sent to my browser, and do not see anything even close to keywords in it. I also did not see it in the old version of jQuery that it loads. I also looked through the branch_3x source code with the following command, and did not find anything that actually looked like a problem:

grep -irl keywords * | grep -v svn

This has been a problem for me the entire time I've used Solr. I started with a 1.5-dev version, my production is now completely stock 1.4.1, and I've been doing all my tests today on a 3.1 build from 2010-06-29. The code tree that I grepped is from 2010-08-13. If there's any troubleshooting that someone needs done on my end, let me know. Thanks, Shawn
help refactoring from 3.x to 4.x
I have a function that works well in 3.x, but when I tried to re-implement it in 4.x it runs very, very slow (~20ms vs 45s on an index w/ ~100K items).

Big picture, I am trying to calculate a bounding box for items that match the query. To calculate this, I have two fields, bboxNS and bboxEW, that get filled with the min and max values for that doc. To get the bounding box, I just need the first matching term in the index and the last matching term. In 3.x the code looked like this:

public class FirstLastMatchingTerm {
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher, String field, DocSet docs) throws IOException {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
    if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();
      TermEnum te = reader.terms(new Term(field, ""));
      do {
        Term t = te.term();
        if( null == t || !t.field().equals(field) ) {
          break;
        }
        if( searcher.numDocs(new TermQuery(t), docs) > 0 ) {
          firstLast.last = t.text();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
      } while( te.next() );
    }
    return firstLast;
  }
}

In 4.x, I tried:

public class FirstLastMatchingTerm {
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher, String field, DocSet docs) throws IOException {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
    if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();
      Terms terms = MultiFields.getTerms(reader, field);
      TermsEnum te = terms.iterator();
      BytesRef term = te.next();
      while( term != null ) {
        if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 ) {
          firstLast.last = term.utf8ToString();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
        term = te.next();
      }
    }
    return firstLast;
  }
}

but the results are slow (and incorrect). I tried some variations of using ReaderUtil.Gather(), but the real hit seems to come from:

if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 )

Any ideas? I'm not tied to the approach or indexing strategy, so if anyone has other suggestions that would be great. Looking at it again, it seems crazy that you have to run a query for each term, but the same approach was fast in 3.x...

thanks
ryan
possible to have multiple elevation file?
Hi, I need a separate elevation file for each site (around 200 of them). I think one big elevation file would be difficult to manage. How could I manage each elevation file separately? Thanks -- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
Re: Autosuggest on PART of cityname
On 8/20/2010 7:04 PM, PeterKerk wrote: @Markus: thanks, will try to work with that. @Gijs: I've looked at the site and the search function on your homepage is EXACTLY what I need! Do you have some Solr code samples for me to study perhaps? (I just need the relevant fields in the schema.xml and the query url) It would help me a lot! :) Thanks to you both!

The fields in our schema are:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
- Just an id based on type, depth and a number, not important
<field name="type" type="string" indexed="true" stored="true" required="true"/>
- This is either buy or rent, as our sections have separate autocompleters
<field name="depth" type="string" indexed="true" stored="true"/>
- Since you can search by country, region or city, this stores the type of this document (well, since we use geonames.org geographical data we actually have 4 regions)
<field name="name" type="text" indexed="true" stored="true"/>
- The canonical name of the country/region/city
<dynamicField name="name_*" type="text" indexed="true" stored="true"/>
- The name of the country/region/city in various languages
<field name="parent" type="text" indexed="true" stored="true"/>
- The name of the country/region/city with any of its parents, comma separated. This is used for phrase searches, so if you enter Amsterdam, Netherlands the Dutch Amsterdam will match before any of the Amsterdams in other countries.
<dynamicField name="parent_*" type="text" indexed="true" stored="true"/>
- The same as parent but in different languages
<field name="data" type="string" indexed="false" stored="true"/>
- This is some internal data used to create the correct filters when this particular suggestion is selected
<dynamicField name="data_*" type="text" indexed="true" stored="true"/>
- The same as data but in different languages, as our filters are on the actual name of countries/regions/cities
<field name="count" type="tint" indexed="true" stored="true"/>
- The number of documents, i.e. the number on the right of the suggestions
<field name="names" type="text" indexed="true" multiValued="true"/>
- Multivalued field which is copyfield-ed from name and name_*
<field name="parents" type="text" indexed="true" multiValued="true"/>
- Multivalued field which is copyfield-ed from parent and parent_*

Where text is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Our autocompletion requests are dismax requests where the most important parameters are:
- q = the text the user has entered into the searchbox so far
- fq = type:sale (or rent)
- qf = name_lang^4 name^4 names (where lang is the currently selected language on the website)
- pf = name_lang^4 name^4 names parents

Honestly, those parameters were basically just tweaked, without quite understanding their meaning, until I got something that worked adequately. Hope this helps. Regards, gwk
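[To make the request shape concrete, here is what one such autocompletion call could look like as a URL - a sketch with a hypothetical host and an _en language suffix; the field names and dismax parameters come from gwk's description above:

http://localhost:8983/solr/select?qt=dismax&q=amst&fq=type:sale&qf=name_en^4+name^4+names&pf=name_en^4+name^4+names+parents&fl=name,parent,count&rows=10

With the EdgeNGram index analyzer above, the partial input "amst" already matches Amsterdam, and the pf phrase boost pushes the best-known Amsterdam to the top.]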
Re: possible to have multiple elevation file?
Hi, Here I am talking about the QueryElevationComponent (http://wiki.apache.org/solr/QueryElevationComponent). Anyone have an idea? Thanks

On Mon, Aug 23, 2010 at 3:10 PM, Chamnap Chhorn chamnapchh...@gmail.com wrote: [original question quoted in full; see above]

-- Chhorn Chamnap http://chamnapchhorn.blogspot.com/
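[One possible direction, my sketch rather than a confirmed answer from this thread: solrconfig.xml lets you declare several QueryElevationComponent instances, each pointing at its own config-file, and wire each one into a site-specific request handler. Whether 200 of them stays manageable is another question, but roughly:

<searchComponent name="elevator-site1" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate-site1.xml</str>
</searchComponent>

<requestHandler name="/search-site1" class="solr.SearchHandler">
  <arr name="last-components">
    <str>elevator-site1</str>
  </arr>
</requestHandler>

Each site then queries its own handler (/search-site1, /search-site2, ...) and picks up only its own elevation rules. The component and handler names here are hypothetical.]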
Re: Proper Escaping of Ampersands
Hi Yonik, I got it working, but I think the Stopword Filter is not behaving as expected - the document could be found when I disabled the stopword filter, details later in this mail...

On 20.08.2010 16:57, Yonik Seeley wrote:
On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn nik_s...@livinglogic.de wrote:
But when I search for q=at%26s (= at&s), I get nothing.

That's the correct encoding if you're typing it directly into a browser address box. http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true But you should be able to verify that solr is getting the correct query string by checking out params in the response (in the example server, by default they are echoed back). And adding debugQuery=true to the request should show you exactly what query is being generated. But the real issue likely lies with your fieldType definition. Can you show that?

As I (normally) query multiple fields, I changed my request URL to

http://127.0.0.1:8983/solr/select?q=at%26s&fl=titel&qt=dismax&qf=titel&debugQuery=true&fl=*&qt=dismax&qf=titel&debugQuery=true

in order to narrow it down, and got this response (cut down to, as I think, the relevant stuff):

<str name="rawquerystring">at&s</str>
<str name="querystring">at&s</str>
<str name="parsedquery">+DisjunctionMaxQuery((titel:"(at&s at) s")~0.1) ()</str>
<str name="parsedquery_toString">+(titel:"(at&s at) s")~0.1 ()</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>

on my local debugging instance, using the standard dismax config (from the examples directory of solr). The titel field is configured like this:

<field name="titel" type="textgen" indexed="true" stored="true"/>

and textgen is configured like this:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The document is indexed correctly; a search for "at s" finds it and all fields look great ("at&s" and not, for example, "at&amp;s"). As my stopword list does not contain "at", "&" or "&amp;", I don't quite understand why my result is only found when I disable the stopword list. My stopword list can be found here:

http://pastebin.com/RfLuBHqd

Do you happen to see bad things for a string like "at&s" here? The analysis page in the admin panel shows these steps for the Index Analyzer:

(HTMLStripStandardTokenizer) at&s => at&s
(SynonymFilter) at&s => at&s
(WordDelimiterFilter) at&s => term position 1: at&s, at; term position 2: s, ats
(LowerCaseFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: s, ats
(StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats

So, according to this, it should be found even with my stopwords enabled...

best regards and thanks for your response, Nikolas Tautenhahn
Re: help refactoring from 3.x to 4.x
Spooky that you see incorrect results! The code looks correct. What are the specifics on when it produces an invalid result?

Also spooky that you see it running slower -- how much slower? Did you rebuild the index in 4.x (if not, you are using the preflex codec)? And is the index otherwise identical?

You could improve perf by not using SolrIndexSearcher.numDocs. Ie, you don't need the count; you just need to know if it's > 0. So you could make your own loop that breaks out on the first docID in common. You could also stick w/ BytesRef the whole time (only do .utf8ToString() at the end on the first/last), though this is presumably net/net a tiny cost.

But, we should still dig down on why numDocs is slower in 4.x; that's unexpected. Yonik, any ideas? I'm not familiar with this part of Solr...

Mike

On Mon, Aug 23, 2010 at 2:38 AM, Ryan McKinley ryan...@gmail.com wrote: [original message, including both code samples, quoted in full; see above]
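[A rough sketch of Mike's first suggestion, replacing the numDocs call in Ryan's 4.x loop with a scan that stops at the first docID the term shares with the DocSet. The variables (te, term, reader, docs, firstLast) are from Ryan's code; the TermsEnum.docs signature is from the 4.x trunk of that era, so treat the exact API as an assumption:

// inside the while loop over terms, instead of searcher.numDocs(...):
DocsEnum de = te.docs(MultiFields.getDeletedDocs(reader), null); // skip deleted docs
int docID;
while( (docID = de.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS ) {
  if( docs.exists(docID) ) {
    // first docID in common is enough -- no need to count the rest
    firstLast.last = term.utf8ToString();
    if( firstLast.first == null ) firstLast.first = firstLast.last;
    break;
  }
}

This avoids building and counting a full TermQuery result per term, which is where the 45s seems to be going.]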
Re: How to Debug Sol-Code in Eclipse ?!
can nobody help me, or does nobody want to? :D
Re: How to get most indexed keyword from SOLR
Hi Pawan, If u r using solr 1.4 or a later version, then u can see terms info by using the terms request handler, like: http://localhost:8080/solr/terms/?terms.fl=text&terms.sort=count
Re: How to Debug Sol-Code in Eclipse ?!
can nobody help me or want :D

As someone already said:

- install Eclipse
- add the Jetty Webapp plugin to Eclipse
- add an svn plugin to Eclipse
- download the repository from trunk with svn
- change to the lucene dir and run ant package
- change to the solr dir and run ant dist
- set up a Jetty webapp for solr with Run configure...
- start debugging :-)

If you are debugging below the solr level into the lucene level, just add the lucene src path to the debugging sources. Maybe you should read: http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse Regards, Bernd
Re: How to Debug Sol-Code in Eclipse ?!
On Sun, Aug 22, 2010 at 8:29 PM, stockii st...@shopgate.com wrote: okay, thx. but it wont work =( i checked out solr 1.4.1 as a dynamic web project into eclipse. started jetty with XDebug. In eclipse i added WebLogic exactly how the tutorial shows, but eclipse cannot connect =( any idea what im doing wrong?

No idea. Check your arguments and verify that the port is the same on both the command-line you're using to start jetty and in the eclipse remote debugger configuration. Make sure you don't have a firewall running on your machine, because that might block the connection from eclipse to jetty.
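[For reference, the usual way to wire this up - standard JVM remote-debugging flags, not specific to this thread's setup - is to start the example server with the JDWP agent and then point an Eclipse "Remote Java Application" debug configuration at the same port:

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 -jar start.jar

In Eclipse: Run > Debug Configurations... > Remote Java Application, host = localhost, port = 8000, then Debug. If Eclipse cannot connect, the port number on the two sides is the first thing to compare.]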
Re: Doing Shingle but also keep special single word
1. We have over ten million news articles to build into a Solr index.
2. We copy several fields, such as title, author, body, and captions of attached photos, into a new field for default search.
3. We then want to use the shingle filter on this new field.
4. We can't predict what new single-word nouns our users may be interested in, because it's news, you know. For example, the word ECFA has only recently become a very popular word in the news here, so I wish users could type in 'ECFA' to search and Solr would output the relevant news articles.
5. I wish to keep the index as small as possible.
6. I also wish to do the same thing described in 5 when I search by explicitly specifying the field names of those fields.

Can I ask why you need/use the shingle filter?
Re: SolrException log
Hi Bastian, this seems to be related to IO and file deletion (optimization compacts and removes index files). Are you running Solr on NFS or a distributed file system? You could set a proper IndexDeletionPolicy (SolrDeletionPolicy) in solrconfig.xml to handle this. My 2 cents, Tommaso

2010/8/11 Bastian Spitzer bspit...@magix.net
Hi, we are using solr 1.4.1 in a master-slave setup with replication; requests are loadbalanced to both instances. This is just working fine, but the slave sometimes behaves strangely, with a SolrException log (trace below). We have been using 1.4.1 for weeks now, and this has happened only a few times so far, and it only occurred on the slave. The problem seemed to be gone when we added a cron job to send a periodic <optimize/> (once a day) to the master, but today it happened again. The index contains 55 files right now; after optimize there are only 10. So it seems it's a problem when the index is spread among a lot of files. The slave won't ever recover once this exception shows up; the only thing that helps is a restart. Is this a known issue? The only workaround would be to track the commit counts and send additional <optimize/> requests after a certain number of commits, but I'd prefer solving this problem rather than building a workaround. Any hints/thoughts on this issue are very much appreciated; thanks in advance for your help. cheers Bastian.

Aug 11, 2010 4:51:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=media_id,keyword_1004&sort=priority_1000+desc,+score+desc&indent=off&start=0&q=mandant_id:1000+AND+partner_id:1000+AND+active_1000:true+AND+cat_id_path_1000:7231/7258*+AND+language_id:1004&rows=24&version=2.2} status=500 QTime=2
Aug 11, 2010 4:51:58 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:461)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:445)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
        at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
        at org.mortbay.http.HttpServer.service(HttpServer.java:909)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
        at org.mortbay.http.ajp.AJP13Connection.handleNext(AJP13Connection.java:295)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
        at org.mortbay.http.ajp.AJP13Listener.handleConnection(AJP13Listener.java:212)
        at
Re: Tokenising on Each Letter
Probably a good idea to post the relevant information! I guess I thought it would be a really obvious answer, but it seems it's a bit more complex ;)

<field name="productsModel" type="textTight" indexed="true" stored="true" omitNorms="true"/>

<!-- Less flexible matching, but less false matches. Probably not ideal for product names, but may be good for SKUs. Can insert dashes in the wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

It seems you may be correct about the catenateAll option, but I'm not sure adding a wildcard at the end of every search would be a great idea? This is meant to be applied to a general search box, but still retain flexibility for model numbers. Right now we are using MySQL '%...%' wildcards, so it matches pretty much anything on the model number, whether you cut off the start or the end etc., and I wanted to retain that.

Could you elaborate on N-grams for me, based on my schema? The main reason I picked textTight was for model numbers like EQW-500DBE-1AVER etc.; I thought it would produce better results?

Thanks a lot for the detailed reply. Scott
Re: How to Debug Sol-Code in Eclipse ?!
ant package

BUILD FAILED
run program perl ...

Is it necessary to install perl on my computer?!
Re: Tokenising on Each Letter
Hi Scottie,

Could you elaborate about N gram for me, based on my schema?

just a quick reply:

<fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" side="front" minGramSize="2" maxGramSize="30"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This will produce n-grams from 2 up to 30 characters; for info check http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that maxGramSize is big enough to keep the whole original serial number/model number and minGramSize is not so small that you fill your index with useless information.

Best regards, Nikolas Tautenhahn
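[To make that concrete with Scottie's example model number: after the whitespace tokenizer, WordDelimiterFilter (with preserveOriginal=1) and lowercasing, one of the tokens for EQW-500DBE-1AVER is eqw-500dbe-1aver, and the side="front" edge n-grams of that token are:

eq, eqw, eqw-, eqw-5, eqw-50, eqw-500, ... , eqw-500dbe-1aver

so a user's prefix of any length >= 2 (up to maxGramSize) matches at query time without appending a wildcard. Note this only covers prefixes of each token; matching arbitrary substrings the way MySQL '%...%' does would need NGramFilterFactory instead of the edge variant, at a noticeably larger index-size cost.]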
Re: SolrException log
Hi Tommaso, Thanks for your reply. The Solr files are on a local disk, on a reiserfs. I'll try to set a deletion policy and report back if that solves the problem. Thank you for the hint. cheers, Bastian

-----Original Message-----
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Monday, August 23, 2010 15:31
To: solr-user@lucene.apache.org
Subject: Re: SolrException log

[Tommaso's reply, the quoted original report and the stack trace snipped; see above]
Re: SolrException log
I can't seem to find decent documentation on how those parameters actually work. This is the default, example block:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- The number of commit points to be kept -->
  <str name="maxCommitsToKeep">1</str>
  <!-- The number of optimized commit points to be kept -->
  <str name="maxOptimizedCommitsToKeep">0</str>
  <!-- Delete all commit points once they have reached the given age.
       Supports DateMathParser syntax e.g.
       <str name="maxCommitAge">30MINUTES</str>
       <str name="maxCommitAge">1DAY</str>
  -->
</deletionPolicy>

So do I have to increase maxCommitsToKeep to a value of 2 when I add a maxCommitAge parameter? Or will 1 still be enough? Do I have to call optimize more than once a day when I add maxOptimizedCommitsToKeep with a value of 1? Can someone please explain how this is supposed to work?

-----Original Message-----
From: Bastian Spitzer [mailto:bspit...@magix.net]
Sent: Monday, August 23, 2010 16:40
To: solr-user@lucene.apache.org
Subject: Re: SolrException log

[earlier messages in this thread and the stack trace snipped; see above]
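[My reading of the 1.4 SolrDeletionPolicy defaults, offered as an interpretation to verify against your version rather than an authoritative answer: the newest commit point is always kept; maxCommitsToKeep is the number of recent commit points retained, maxOptimizedCommitsToKeep additionally reserves that many optimized commit points, and maxCommitAge removes qualifying commit points early rather than keeping extra ones. So to hold an older commit open long enough for in-flight searches, raising maxCommitsToKeep is the lever, e.g.:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">2</str>
  <str name="maxOptimizedCommitsToKeep">1</str>
  <str name="maxCommitAge">1DAY</str>
</deletionPolicy>

The values here are illustrative, not tested against this reporter's workload.]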
Re: Tokenising on Each Letter
Nikolas, thanks a lot for that. I've just given it a quick test and it definitely seems to work for the examples I gave. Thanks again, Scott

From: Nikolas Tautenhahn [via Lucene]
Sent: Monday, August 23, 2010 3:14 PM
To: Scottie
Subject: Re: Tokenising on Each Letter

[Nikolas's reply, including the textNGram fieldType, quoted in full; see above]
Re: How to Debug Sol-Code in Eclipse ?!
thx for your help. now it works fine. it's very simple when you know how :D haha. i'll try bernd's suggestion =)
Re: Proper Escaping of Ampersands
: The document is indexed correctly, a search for "at s" found it and all
: fields looked great ("at&s" and not, for example, "at&amp;s").
:
: As my stopword list does not contain "at", "&" or "&amp;", I don't
: quite understand why my result is found when I disable the
: stopword-list. My stopwordlist can be found here
:
: http://pastebin.com/RfLuBHqd
:
: Do you happen to see bad things for a string like "at&s" here?

s is in your stopwords file, which may be part of the problem (but i didn't look hard at your query string to verify that)

: The analysis page in the admin panel tells me, these steps for the Index
: Analyzer:
	...
: (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats
:
: So, according to this, it should be found even with my stopwords enabled...

Strange, based on the stopwords file you posted, the s should definitely be getting removed at index time -- it would also get removed at query time, but because you have it *before* WDF at query time, that wouldn't affect this query (even though it did affect the index).

There was a bug with analysis.jsp and stopwords recently, but that shouldn't have affected 1.4 (you are definitely using 1.4, correct?)

https://issues.apache.org/jira/browse/SOLR-2051

-Hoss
Re: Proper Escaping of Ampersands
I'd recommend going back to the textgen field type as defined in the example schema. Your move of the StopFilter is what is causing the problem.

At index time, the s gets removed (because the StopFilter is now after the WDF). But a query of at&s is transformed into "at s" (the s isn't removed because StopFilter is before WDF for the query analyzer). Since s isn't in the index, no docs are found.

Also, I notice you're using preserveOriginal=1 - make sure you really need that... it's normally only useful if you are doing wildcard searches (for example at*).

-Yonik
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8

On Mon, Aug 23, 2010 at 5:43 AM, Nikolas Tautenhahn nik_s...@livinglogic.de wrote: [Nikolas's message, including the textgen fieldType and analysis steps, quoted in full; see above]
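[For anyone applying Yonik's advice to a customized type: the key change is just the filter order in the index analyzer - StopFilter before WordDelimiterFilter, mirroring the query side. A minimal sketch, keeping Nikolas's HTML-stripping tokenizer (the rest follows the stock textgen type):

<analyzer type="index">
  <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  <!-- stopwords removed BEFORE word-delimiter splitting, same order as the query analyzer -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

With matching order on both sides, the s produced by splitting at&s survives (or is dropped) consistently at index and query time.]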
Re: Problem in setting the request writer in SolrJ (wiki page wrong?)
Note that 'setRequestWriter' is not part of the SolrServer API; it is on CommonsHttpSolrServer: http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html#setRequestWriter%28org.apache.solr.client.solrj.request.RequestWriter%29

If you are using EmbeddedSolrServer, the params are not serialized via a RequestWriter, so you don't have any options there.

ryan

On Mon, Aug 23, 2010 at 9:24 AM, Constantijn Visinescu baeli...@gmail.com wrote: Hello, I'm using an embedded solrserver in my Java webapp, but as far as I can tell it's defaulting to sending updates in XML, which seems like a huge waste compared to sending them in the Java binary format. According to this page: http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter I'm supposed to be able to set the requestwriter like so:

server.setRequestWriter(new BinaryRequestWriter());

However this method doesn't seem to exist in the SolrServer class of SolrJ 1.4.1? How do I set it to process updates in the Java binary format? Thanks in advance, Constantijn Visinescu

P.S. I'm creating my SolrServer instance like this:

private SolrServer solrServer;
CoreContainer container = new CoreContainer.Initializer().initialize();
solrServer = new EmbeddedSolrServer(container, "");

this solrServer won't let me set a request writer.
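[For comparison, this is roughly what the wiki snippet assumes - an HTTP client rather than the embedded server (the URL is a placeholder). With EmbeddedSolrServer the request never crosses a wire, so there is no XML-vs-binary serialization step to configure in the first place:

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // throws MalformedURLException
server.setRequestWriter(new BinaryRequestWriter()); // send updates in the javabin format
server.setParser(new BinaryResponseParser());       // optionally read responses in javabin too]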
ANNOUNCE: Stump Hoss @ Lucene Revolution
Hey everybody,

As you (hopefully) have heard by now, Lucid Imagination is sponsoring a Lucene/Solr conference in Boston about 6 weeks from now. We've got a lot of really great speakers lined up to give some really interesting technical talks, so I offered to do something a little bit different. I'm going to be in the hot seat for a Stump The Chump style session, where I'll be answering Solr questions live and unrehearsed... http://bit.ly/stump-hoss

The goal is to really make me sweat and work hard to think of creative solutions to non-trivial problems on the spot -- like when I answer questions on the solr-user mailing list, except in a crowded room with hundreds of people staring at me and laughing.

But in order to be a success, we need your questions/problems/challenges! If you had a tough situation with Solr that you managed to solve with a creative solution (or haven't solved yet) and are interested to see what type of solution I might come up with under pressure, please email a description of your problem to st...@lucenerevolution.org -- More details online... http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

Even if you won't be able to make it to Boston, please send in any challenging problems you would be interested to see me tackle under the gun. The session will be recorded, and the video will be posted online shortly after the conference has ended. And if you can make it to Boston: all the more fun to watch live and in person (and maybe answer follow-up questions).

In any case, it should be a very interesting session: folks will either get to learn a lot, or laugh at me a lot, or both. (win/win/win)

-Hoss

-- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Re: Doing Shingle but also keep special single word
No, I mean that you use an additional field (indexed) for searching, i.e. whitespace-tokenized, so every word - separated by a whitespace - becomes a token. So you have got two fields (a shingle-token field and a single-token field) and you can search across both fields. This provides several benefits: e.g. you can boost the shingle field at query time, since a match in the shingle field means that an exact phrase matched. Additionally, you can search with single-word queries as well as multi-word queries. Furthermore, you can apply synonyms to your single-token field. (A sketch of this setup follows at the end of this message.)

If you want to keep your index as small as possible but as large as needed, try to understand Lucene's Similarity implementation to consider whether you can set the field options omitNorms=true or omitTermFreqAndPositions=true.

http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html

Keep in mind what happens if you omit one of those options. A small example of the consequences of setting omitNorms=true:

doc1: this is a short example doc
doc2: this is a longer example doc for presenting the effect of omitNorms

If you are searching for doc while omitNorms=false, your response will look like this: doc1, doc2. This is because the norm value for doc1 is higher than the norm value for doc2, because doc1 is shorter than doc2 (have a look at the provided link). If omitNorms=true, the scores for both docs will be equal.

Kind regards, - Mitch

scott chu wrote: I don't quite understand the additional-field way? Do you mean making another field that stores special words particularly, but with no indexing for that field? Scott

- Original Message - From: MitchK mitc...@web.de To: solr-user@lucene.apache.org Sent: Sunday, August 22, 2010 11:48 PM Subject: Re: Doing Shingle but also keep special single word

Hi, the keepword filter is no solution for this problem, since it would mean that one has to manage a word dictionary. As explained, this would be too much effort. You can easily add outputUnigrams=true and check out analysis.jsp for this field, so you can see how much bigger a single field will become with this option. However, I am quite sure that the difference between using outputUnigrams=true and indexing in a separate field is not noteworthy. I would suggest you do it the additional-field way, since this leads to more flexibility in boosting the different fields. Unfortunately, I haven't understood your explanation about the use-case. But it sounds a little bit like tagging? Kind regards, - Mitch

iorixxx wrote:

Isn't setting outputUnigrams=true going to make the index size about twice what it is when set to false?

Sure, the index will be bigger. I didn't know that this is a problem for you. But if you have a list of special single words that you want to keep, KeepWordFilter can eliminate the other tokens, so the index size will be okay.

Scott

- Original Message - From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org Sent: Saturday, August 21, 2010 1:15 AM Subject: Re: Doing Shingle but also keep special single word

I am building an index with the shingle filter. We know it's minimum 2-gram, but I also want to keep some special single words, e.g. IBM, Microsoft, etc. i.e. I want to do a minimum 2-gram but also want to have these single words in my index. Is it possible?

The outputUnigrams=true parameter does not work for you? After that you can add <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/> with keepwords.txt = IBM, Microsoft.
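[A sketch of the two-field setup Mitch describes - the field and type names here are mine, not from the thread; the ShingleFilterFactory parameters are the stock ones. The shingled copy carries phrase-like matches, while the plain copy keeps single words like ECFA searchable:

<field name="text_single" type="text_ws" indexed="true" stored="false"/>
<field name="text_shingle" type="text_shingle" indexed="true" stored="false"/>
<copyField source="text_single" dest="text_shingle"/>

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- bigrams only; the unigrams live in text_single instead -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>

Querying both, e.g. with dismax qf=text_single text_shingle^2, boosts documents where adjacent words matched as a unit while single-word queries still hit text_single.]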
Re: Solrj ContentStreamUpdateRequest Slow
: ContentStreamUpdateRequest req = new
:     ContentStreamUpdateRequest("/update/extract");
:
: System.out.println("setting params...");
: req.setParam("stream.url", fileName);
: req.setParam("literal.content_id", solrId);

ContentStreamUpdateRequest exists so that you can stream content directly from the client to the server -- you aren't doing that, you are asking the server to go fetch the stream.url itself.

The NullPointerException happens because you've never called ContentStreamUpdateRequest.addFile or ContentStreamUpdateRequest.addContentStream, so it gets into a state where it doesn't know what it's doing (admittedly the error message is less than ideal).

If you just use a plain old regular UpdateRequest (or even a QueryRequest) instead, your code works as written.

-Hoss

-- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
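[A sketch of the two options Hoss describes, written against the SolrJ 1.4 API as I understand it (server, fileName and solrId are placeholders from the original snippet; verify the signatures against your SolrJ version):

// Option 1: actually stream the file from the client
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File(fileName));                 // client reads and streams the content
req.setParam("literal.content_id", solrId);
server.request(req);

// Option 2: keep stream.url, but use a plain UpdateRequest --
// the server fetches the URL itself, so no client-side content stream is needed
UpdateRequest req2 = new UpdateRequest("/update/extract");
req2.setParam("stream.url", fileName);
req2.setParam("literal.content_id", solrId);
server.request(req2);]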
Re: How to use synonms on a faceted field with multiple words
: A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with
: some other character/pattern. I used ## (i.e. video => digital##media). Then add
: solr.PatternReplaceFilterFactory after the synonym filter to replace the pattern with a space.
: This works, but I'd love to know if there is a better way.

A feature was added a little while back (i think by Koji) to let you specify a tokenizerFactory attribute when you declare a SynonymFilterFactory -- it's then used to parse the synonyms file. i think it was included in 1.4, but i may be wrong.

-Hoss

-- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
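[For illustration, the attribute Hoss mentions looks like this - a sketch, and per his caveat you should verify it is available in your 1.4 build. With KeywordTokenizerFactory each synonym entry is parsed as a single token, so multi-word synonyms keep their spaces without the ## trick:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>]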
minMergeDocs supported ?
Heya: is minMergeDocs supported in Solr?
Re: ANNOUNCE: Stump Hoss @ Lucene Revolution
Chris, I have a couple of questions I would like to throw your way. Is there a place where one can sign up for this? It sounds very interesting.

On Mon, Aug 23, 2010 at 4:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: [announcement quoted in full; see above]

-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Solr jams after all my JVM thread-pool threads hang in a blocked state
Hi, I've been running Solr 1.3 in production for a year now and never had any problem with it until 2 weeks ago. It now happens 6-7 times a day: all of my threads but one are in a blocked state. All the blocked threads are waiting on the Console monitor owned by the one Runnable thread. We did not change anything on the application/server. I have monitored the thread count and there's no accumulation of threads during the periods when Solr is OK. The problem doesn't seem to be related to a high query load, since it also happens during low-load periods. Anyone got a clue what is going on?
lucene + solr: corrupt index
Hi, I am using Lucene 3.0 jars and built a Lucene index with 200 documents. The index files were then copied over to my Solr 1.4.1 installation. I get the following error every time I start Solr. What could I be doing wrong?

SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.lucene.index.CorruptIndexException: Incompatible format version: 2 expected 1 or lower
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
        at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:291)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:654)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
        at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
        ... 27 more
Re: lucene + solr: corrupt index
(10/08/24 10:02), ANurag wrote:

Hi, I am using Lucene 3.0 jars and built a Lucene index with 200 documents. The index files were then copied over to my Solr 1.4.1 installation. I get the following error every time I start Solr. What could I be doing wrong?

Solr 1.4 can read a Lucene 2.9 index or older.

Koji
--
http://www.rondhuit.com/en/
Re: lucene + solr: corrupt index
Thx Koji, I tried 2.9.3 and it works :-)

On Mon, Aug 23, 2010 at 6:15 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

(10/08/24 10:02), ANurag wrote: Hi, I am using Lucene 3.0 jars and built a Lucene index with 200 documents. The index files were then copied over to my Solr 1.4.1 installation. I get the following error every time I start Solr. What could I be doing wrong?

Solr 1.4 can read a Lucene 2.9 index or older. Koji -- http://www.rondhuit.com/en/
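For anyone hitting the same mismatch, a minimal sketch of writing the index with the Lucene 2.9.x jars on the classpath; the path and field names here are invented for illustration:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Build29Index {
      public static void main(String[] args) throws Exception {
        // Lucene 2.9.x writes the older segment format that Solr 1.4.1
        // can open; an index written with the 3.0 jars cannot be read.
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("/path/to/index")),
            new StandardAnalyzer(Version.LUCENE_29),
            true, // create a new index
            IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("body", "hello solr", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();
      }
    }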
about readercycle script
I'm working on SOLR-2046 and realized that the readercycle script might be expecting the old(?) Solr response format; as a result, today it always fails: https://issues.apache.org/jira/browse/SOLR-2046

I've looked for issues regarding readercycle in Jira and the mailing list archives, and since nobody has complained about it so far, I think it has no users. Would anybody be troubled if the readercycle script were deleted?

Thanks, Koji
--
http://www.rondhuit.com/en/
Re: ANNOUNCE: Stump Hoss @ Lucene Revolution
: I have a couple of questions I would like to throw your way.
:
: Is there a place where one can sign up for this?

Heh sure, all the details were in my email...

: http://bit.ly/stump-hoss

...and...

: type of solution I might come up with under pressure, please email a
: description of your problem to st...@lucenerevolution.org -- More details
: online...
:
: http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
Re: Doing Shingle but also keep special single word
The request is from our business team; they wish users of our product could type in a partial string of a word that exists in the title or body field. But now I also doubt whether this request is really necessary? Scott

----- Original Message ----- From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org Sent: Monday, August 23, 2010 8:35 PM Subject: Re: Doing Shingle but also keep special single word

1. We have over ten million news articles to build into a Solr index.
2. We copy several fields, such as title, author, body, and captions of attached photos, into a new field for default search.
3. We then want to use the shingle filter on this new field.
4. We can't predict what new single-word nouns our users may be interested in, because it's news, you know. For example, the word ECFA has only recently become a very popular word in news here, so I wish users could type in 'ECFA' to search and Solr would return the relevant news articles.
5. I wish to keep the index as small as possible.
6. I also wish to do the same thing described in 5 when I search by explicitly specifying the field names of those fields, too.

Can I ask why you need/use the shingle filter?
Re: Doing Shingle but also keep special single word
Thanks! I'll put more effort into understanding your suggestion about that norm thing.

Scott

----- Original Message ----- From: MitchK mitc...@web.de To: solr-user@lucene.apache.org Sent: Tuesday, August 24, 2010 5:28 AM Subject: Re: Doing Shingle but also keep special single word

No, I mean that you use an additional field (indexed) for searching, i.e. whitespace-tokenized, so every word separated by a whitespace becomes a token. So you have got two fields (a shingle-token field and a single-token field) and you can search across both. This provides several benefits: e.g. you can boost the shingle field at query time, since a match in the shingle field means an exact phrase matched. Additionally, you can search with single-word queries as well as multi-word queries. Furthermore, you can apply synonyms to your single-token field.

If you want to keep your index as small as possible but as large as needed, try to understand Lucene's Similarity implementation to decide whether you can set the field options omitNorms=true or omitTermFreqAndPositions=true. http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html Keep in mind what happens if you omit one of those options.

A small example of the consequences of setting omitNorms=true:

doc1: this is a short example doc
doc2: this is a longer example doc for presenting the effect of omitNorms

If you search for doc while omitNorms=false, your response will look like this: doc1, doc2. This is because the norm value for doc1 is greater than the norm value for doc2, since doc1 is shorter than doc2 (have a look at the provided link). If omitNorms=true, the scores for both docs will be equal.

Kind regards, - Mitch

scott chu wrote: I don't quite understand the additional-field way? Do you mean making another field that stores the special words in particular but is not indexed? Scott

----- Original Message ----- From: MitchK mitc...@web.de To: solr-user@lucene.apache.org Sent: Sunday, August 22, 2010 11:48 PM Subject: Re: Doing Shingle but also keep special single word

Hi, the keepword filter is no solution for this problem, since it would mean one has to manage a word dictionary. As explained, this would be too much effort. You can easily add outputUnigrams=true and check out analysis.jsp for this field, so you can see how much bigger a single field will become with this option. However, I am quite sure that the difference between using outputUnigrams=true and indexing in a separate field is not noteworthy. I would suggest you do it the additional-field way, since that gives more flexibility in boosting the different fields. Unfortunately, I haven't understood your explanation about the use case. But it sounds a little bit like tagging? Kind regards, - Mitch

iorixxx wrote: Won't setting outputUnigrams=true make the index size about twice what it is when set to false? Sure, the index will be bigger. I didn't know that this was a problem for you. But if you have a list of special single words that you want to keep, the keepword filter can eliminate the other tokens, so the index size will be okay.

Scott

----- Original Message ----- From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org Sent: Saturday, August 21, 2010 1:15 AM Subject: Re: Doing Shingle but also keep special single word

I am building an index with the shingle filter. We know its minimum is 2-gram, but I also want to keep some special single words, e.g. IBM, Microsoft, etc. That is, I want to do a minimum 2-gram but also want to have these single words in my index. Is it possible?

The outputUnigrams=true parameter does not work for you? After that you can add <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/> with keepwords.txt containing IBM, Microsoft.
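To make the two-field suggestion concrete, a schema.xml sketch; all field and type names here are invented, and the filter attributes should be checked against your Solr version:

    <!-- Shingled field: bigrams plus unigrams, so single words still match. -->
    <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
      </analyzer>
    </fieldType>

    <!-- Plain single-token field for single-word queries and synonyms. -->
    <fieldType name="text_single" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="search_shingled" type="text_shingle" indexed="true" stored="false"/>
    <field name="search_single" type="text_single" indexed="true" stored="false" omitNorms="true"/>
    <copyField source="body" dest="search_shingled"/>
    <copyField source="body" dest="search_single"/>

Boosting search_shingled at query time (e.g. via dismax qf/pf) then rewards exact-phrase matches, as Mitch describes.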
Why is it boosted up?
In Lucene's web page, there's a paragraph:

Indexing time boosts are preprocessed for storage efficiency and written to the directory (when writing the document) in a single byte (!) as follows: For each field of a document, all boosts of that field (i.e. all boosts under the same field name in that doc) are multiplied. The result is multiplied by the boost of the document, and also multiplied by a field length norm value that represents the length of that field in that doc (so shorter fields are automatically boosted up).

I thought the greater the value, the stronger the boost. Then why are short fields boosted up? Isn't the norm value for short fields smaller?
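For what it's worth, DefaultSimilarity's length norm is 1/sqrt(numTerms) (an observation from the Lucene source, not from this thread), which suggests the premise is backwards; the norm value is larger, not smaller, for shorter fields:

    // lengthNorm(field) = 1 / sqrt(number of terms in the field)
    double shortField = 1.0 / Math.sqrt(4);   // 4-term field   -> 0.50
    double longField  = 1.0 / Math.sqrt(100); // 100-term field -> 0.10

With equal boosts, the 4-term field carries a norm five times larger, so shorter fields really are boosted up.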
Solr hangs up after a couple of hours
Hi all, I am facing a peculiar problem with Solr querying. During our indexing process we analyze the existing index, and for this we query the index. We found that the Solr server just hangs on an arbitrary query. If we access admin/stats.jsp, it again resumes executing the queries. The thread count and memory utilization look very normal. Any clues on what's going on would be very helpful. Thanks, Kalyan
SolrJ addField with Reader
I am using SolrJ with an embedded Solr server, and some documents have a lot of text. Solr will be running on a small device with very limited memory. In my tests I cannot process more than 3MB of text (in a body) with a 64MB heap. According to Java there is about 30MB of free memory before I call server.add, and with 5MB of text it runs out of memory. Is there a way around this? Is there a plan to enhance SolrJ to allow a Reader to be passed in instead of a String? thx! b
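One possible workaround, offered as an untested sketch rather than an answer from the thread: let Solr pull the body from a file-backed content stream via the extracting handler, so the client never materializes the text as one String. This assumes /update/extract (Solr Cell) is configured, the field names are hypothetical, and Solr may still buffer internally:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class StreamBigDoc {
      public static void sendBigDoc(SolrServer server, File textFile) throws Exception {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(textFile);                // streamed from disk, not held as one String
        req.setParam("literal.id", "doc-1");  // hypothetical unique key
        req.setParam("fmap.content", "body"); // map extracted text to the body field
        req.process(server);
      }
    }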
Re: Solr jams after all my JVM thread-pool threads hang in a blocked state
It would be helpful if you could attach a thread dump. Bill

On Mon, Aug 23, 2010 at 6:00 PM, AlexxelA alexandre.boudrea...@canoe.ca wrote:

Hi, I've been running Solr 1.3 in production for a year now and never had any problem with it until 2 weeks ago. It now happens 6-7 times a day: all of my threads but one are in a blocked state. All the blocked threads are waiting on the Console monitor owned by the one Runnable thread. We did not change anything on the application/server. I have monitored the thread count and there's no accumulation of threads during the periods when Solr is OK. The problem doesn't seem to be related to a high query load, since it also happens during low-load periods. Anyone got a clue what is going on?
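For reference, two common ways to capture a thread dump, assuming a HotSpot JVM (replace <pid> with the Solr process id):

    jstack <pid> > solr-threads.txt

    kill -QUIT <pid>   # dump is written to the JVM's stdout/console log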
Re: Solr hangs up after a couple of hours
It would be very useful if you could take a thread dump while Solr is hanging; that will give an indication of where and why Solr is hanging. Bill

On Mon, Aug 23, 2010 at 9:32 PM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote:

Hi all, I am facing a peculiar problem with Solr querying. During our indexing process we analyze the existing index, and for this we query the index. We found that the Solr server just hangs on an arbitrary query. If we access admin/stats.jsp, it again resumes executing the queries. The thread count and memory utilization look very normal. Any clues on what's going on would be very helpful. Thanks, Kalyan