Re: Indexing from a DB, corrupt Lucene index
The fact that there is nothing in the data dir suggests that you are looking at the wrong directory. Just fire a query for *:* and it will tell you whether there are indeed documents in the index. The statistics admin page can tell you where the index is created. On Thu, Apr 23, 2009 at 12:25 AM, ahammad ahmed.ham...@gmail.com wrote: Excuse the error in the title. It should say "missing Lucene index". Cheers ahammad wrote: Hello, I finally was able to run a full import on an Oracle database. According to the statistics, it looks like it fetched all the rows from the table. However, when I go into solrhome/data, there is nothing in there. This is my data-config.xml file: <dataConfig> <dataSource driver="oracle.jdbc.driver.OracleDriver" url="url" user="" password=""/> <document name="article"> <entity name="akb" query="select * from akb"> <field column="TITLE" name="title"/> <field column="STATUS" name="status"/> <field column="BODY" name="body"/> <field column="ID" name="id"/> <entity name="akbr" query="select USER from AKBR where AID='${akb.ID}'"> <field column="USER" name="user"/> </entity> </entity> </document> </dataConfig> I added all the relevant fields in the schema.xml file. From the interface, when I do dataimport?command=full-import, it says that n rows were fetched, where n is the actual number of rows in the DB table. Everything looks great from there, but there is nothing in my data folder. In solrconfig.xml, the line that defines the location where data is stored is: <dataDir>${solr.data.dir:./solr/data}</dataDir> What am I missing exactly? BTW, the Tomcat logs don't show errors or anything like that. Cheers and thank you. -- View this message in context: http://www.nabble.com/Indexing-from-a-DB%2C-corrupt-Lucene-index-tp23175796p23175805.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: replicated index files have incorrect timestamp
Let me assume that you are using the in-built replication. The replication tries to set the timestamp of all the files to match the files on the master. Just cross-check. On Thu, Apr 23, 2009 at 6:57 AM, Jian Han Guo jian...@gmail.com wrote: Hi, I am using the nightly build of 4/22/2009. Replication works fine, but the files inside the index directory on the slave side all have an old timestamp: Dec 31 1969. Is this a known issue? Thanks, Jianhan -- --Noble Paul
Re: replicated index files have incorrect timestamp
That's right. The timestamps of the files on the slave side are all Dec 31 1969, so it looks like the timestamp was not set (and is therefore zero). The ones on the master side are all correct. Nevertheless, Solr seems to be able to recognize that master and slave are in sync after replication; I don't know how it does that. I haven't checked whether the two machines' clocks are in sync, but even if they are not, the timestamp should not be Dec 31 1969, I think. Thanks, Jianhan 2009/4/22 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Let me assume that you are using the in-built replication. The replication tries to set the timestamp of all the files to match the files on the master. Just cross-check. -- --Noble Paul
Re: Sorting dates with reduced precision
On 04/22/2009 03:20 PM, Ensdorf Ken wrote: Yes, but dates are fairly specific, say 06:45 Nov. 2, 2009. What if I want to say: sort so that within entries for Nov. 2, you sort by relevance, for example? Append /DAY to the date value you index; for example, 1995-12-31T23:59:59Z/DAY will yield 1995-12-31, so that all documents with the same date will then be sorted by relevance or whatever you specify as the next criterion in the sort parameter. Thanks, this happens at indexing time? kind regards, Tarjei
Re: replicated index files have incorrect timestamp
Which OS are you using? It does not look at the timestamps to decide if the index is in sync; it looks at the index version only. BTW, can you hit the master with this URL and paste the response here: http://masterhost:port/solr/replication?command=filelist On Thu, Apr 23, 2009 at 11:53 AM, Jian Han Guo jian...@gmail.com wrote: That's right. The timestamps of the files on the slave side are all Dec 31 1969, so it looks like the timestamp was not set (and is therefore zero). The ones on the master side are all correct. Nevertheless, Solr seems to be able to recognize that master and slave are in sync after replication; I don't know how it does that. I haven't checked whether the two machines' clocks are in sync, but even if they are not, the timestamp should not be Dec 31 1969, I think. Thanks, Jianhan -- --Noble Paul
Re: replicated index files have incorrect timestamp
I am using Mac OS 10.5. I can't access the box right now or this week; I'll do it next week and post the result then. Thanks, Jianhan 2009/4/22 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Which OS are you using? It does not look at the timestamps to decide if the index is in sync; it looks at the index version only. BTW, can you hit the master with this URL and paste the response here: http://masterhost:port/solr/replication?command=filelist -- --Noble Paul
Re: autowarmcount how to check if cache has been warmed up
It looks like it doesn't warm up, no?

sunnyfr wrote: Still the same? Seems done:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 5
warmupTime : 20973
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0

Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=3055,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=20973,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@48b6c333 main
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={start=0&q=solr&rows=100} hits=164 status=0 QTime=0
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={start=0&q=rocks&rows=100} hits=167581 status=0 QTime=51
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={sort=id+desc&q=anything} hits=8419 status=0 QTime=50
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done.
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.SolrCore registerSearcher INFO: [video] Registered new searcher searc...@48b6c333 main
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.search.SolrIndexSearcher close INFO: Closing searc...@79e79d96 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=3055,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.handler.dataimport.SolrWriter persist INFO: Wrote last indexed time to dataimport.properties
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time
Some characters are searchable
Hi, I am trying to search for the following characters through Solr: `, @, #, $, %, _. But I am not getting any results back, even if those characters are present in the document. So my question is: are these characters getting indexed? Thanks, Koushik
Re: Access HTTP headers from custom request handler
Hello Hoss, thank you for your reply. I have no problems subclassing the SolrDispatchFilter... but where shall I configure it? :-) I cannot find any doc/wiki explaining how to configure a custom dispatch filter. I believe it should be in solrconfig.xml: <requestDispatcher> ... </requestDispatcher> Any idea? Is there a schema for solrconfig.xml? It would make my life easier... ;-) Thanks, Giovanni On Wed, Apr 15, 2009 at 12:48 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Solr cannot assume that the request would always come from http (think : of EmbeddedSolrServer). So it assumes that there are only parameters exactly. : Your best bet is to modify SolrDispatchFilter and read the params and : set them in the SolrRequest Object SolrDispatchFilter is designed to be subclassed to make this easy by overriding the execute method... protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { sreq.getContext().put( "HttpServletRequest", req ); super.execute( req, handler, sreq, rsp ); } -Hoss
Re: Some characters are searchable
Not with the default analyzers, but certainly with a whitespace analyzer. paul On 23 Apr 2009, at 11:57, Koushik Mitra wrote: Hi, I am trying to search for the following characters through Solr: `, @, #, $, %, _. But I am not getting any results back, even if those characters are present in the document. So my question is: are these characters getting indexed?
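A minimal sketch of what Paul suggests (the field and type names here are made up, not from the original schema): a whitespace-based analyzer splits only on whitespace, so characters such as `, @, #, $, %, and _ survive into the indexed tokens.

```xml
<!-- Hypothetical field type: WhitespaceTokenizer does not strip punctuation -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <!-- splits on whitespace only; `, @, #, $, %, _ remain inside the tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="body_ws" type="text_ws" indexed="true" stored="true"/>
```

Note that with this analyzer the special characters are only findable as part of the exact whitespace-delimited token they appear in (a query would have to match the whole token, e.g. user@host).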
Re: MLT for sorting results?
That is true *only if* you combine those 2 clauses with AND. It's not true with OR. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Shrutipriya shrutipr...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, April 22, 2009 11:45:30 PM Subject: Re: MLT for sorting results? true. but in the normal process of search Solr uses parametric fields as filters. so if i do the following search keyword = java team lead ; location (parametric)=delhi, i will not get docs that match the keywords exactly with a different location. -shruti On Thu, Apr 23, 2009 at 3:29 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: What you describe is what normal Solr search does already - you can think of the query as a very small document and the search as a process that tries to find documents in the index that are the most similar to that query document. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Shrutipriya To: solr-user@lucene.apache.org Sent: Wednesday, April 22, 2009 1:50:25 PM Subject: MLT for sorting results? hi, i was wondering if anyone has used solr MLT (more like this) for sorting search results i.e. documents that are most like the query appear on top and so on. so the query is itself treated like a document and one tries finding docs similar to it from the corpus. is there a way to set precision in the MLT handler to help with sorting? (documents that match 99.9% on top, then 99% down to 0.1% or whatever) thanks, shruti
Synonym file in a different location
Hi All, I am trying to use synonyms in my project. I would like to know whether it is possible to pick the synonyms.txt file from a configurable location. Ideally I would like to specify the location in a properties file and make solr read it to load the synonyms file. Could any one please let me know how we can achieve this? Thanks in advance. Regards, Raja -- View this message in context: http://www.nabble.com/Synonym-file-in-a-different-location-tp23195669p23195669.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Synonym file in a different location
Hi Raja, Try putting the absolute path to the synonyms file in the schema.xml. If that doesn't work you can always just use 'ln': http://unixhelp.ed.ac.uk/CGI/man-cgi?ln Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: rajam r...@portaltech.net To: solr-user@lucene.apache.org Sent: Thursday, April 23, 2009 8:14:08 AM Subject: Synonym file in a different location Hi All, I am trying to use synonyms in my project. I would like to know whether it is possible to pick the synonyms.txt file from a configurable location. Ideally I would like to specify the location in a properties file and make solr read it to load the synonyms file. Could any one please let me know how we can achieve this? Thanks in advance. Regards, Raja -- View this message in context: http://www.nabble.com/Synonym-file-in-a-different-location-tp23195669p23195669.html Sent from the Solr - User mailing list archive at Nabble.com.
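A sketch of the first suggestion, assuming SynonymFilterFactory's synonyms attribute resolves an absolute path on your Solr version (worth verifying; the path below is made up):

```xml
<!-- hypothetical absolute path instead of the usual synonyms="synonyms.txt" -->
<filter class="solr.SynonymFilterFactory"
        synonyms="/opt/config/synonyms.txt"
        ignoreCase="true" expand="true"/>
```

If the resource loader only resolves paths relative to the conf directory, the symlink approach Otis mentions is the fallback.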
prefix matching
Hi all, I'm trying to use prefixes to match similar strings to a query string. I have the following field type: <fieldtype name="prefix" stored="true" indexed="true" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/> </analyzer> </fieldtype> field: <field name="wordPrefix" type="prefix" indexed="true" stored="true"/> copyField: <copyField source="word" dest="wordPrefix"/> If I apply this to an indexed string "ipod shuffle" and the query string "shufle" (missing an f), the analysis page shows matching terms for sh, shu, and shuf (index-side grams include ip, ipo, ipod, sh, shu, shuf, shuff, shuffl, shuffle; query-side grams include sh, shu, shuf, shufl, shufle). However, when I query with shufle I get no results: http://localhost:8983/solr/select?q=wordPrefix%3Ashufle&fl=wordPrefix&qt=standard&debugQuery=on <lst name="debug"> <str name="rawquerystring">wordPrefix:shufle</str> <str name="querystring">wordPrefix:shufle</str> <str name="parsedquery">PhraseQuery(wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle")</str> <str name="parsedquery_toString">wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle"</str> </lst> This post suggests that I need to set the position increment for my token filter, but I'm not sure how to do that or if it's possible: http://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4 Thoughts? Thanks...Tom
Re: Access HTTP headers from custom request handler
Nope, you must edit the web.xml and register the filter there. On Thu, Apr 23, 2009 at 3:45 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello Hoss, thank you for your reply. I have no problems subclassing the SolrDispatchFilter... but where shall I configure it? :-) I cannot find any doc/wiki explaining how to configure a custom dispatch filter. I believe it should be in solrconfig.xml: <requestDispatcher> ... </requestDispatcher> Any idea? Is there a schema for solrconfig.xml? It would make my life easier... ;-) Thanks, Giovanni -- --Noble Paul
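For illustration, registering a custom dispatch filter in the web.xml might look like the sketch below; the class name is hypothetical, and the entry mirrors the stock SolrRequestFilter mapping that Solr's bundled web.xml already contains:

```xml
<!-- replaces the stock SolrDispatchFilter registration; class name is made up -->
<filter>
  <filter-name>SolrRequestFilter</filter-name>
  <filter-class>com.example.MyDispatchFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>SolrRequestFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
```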
RE: Sorting dates with reduced precision
Yes, but dates are fairly specific, say 06:45 Nov. 2, 2009. What if I want to say: sort so that within entries for Nov. 2, you sort by relevance, for example? Append /DAY to the date value you index; for example, 1995-12-31T23:59:59Z/DAY will yield 1995-12-31, so that all documents with the same date will then be sorted by relevance or whatever you specify as the next criterion in the sort parameter. Thanks, this happens at indexing time? Yes
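To make the rounding concrete, here is a sketch of an update document using /DAY date math at index time (the field names are hypothetical):

```xml
<add>
  <doc>
    <field name="id">42</field>
    <!-- /DAY truncates the value, so it is indexed as 2009-11-02T00:00:00Z -->
    <field name="pubdate">2009-11-02T06:45:00Z/DAY</field>
  </doc>
</add>
```

With all same-day documents sharing one indexed value, a sort such as sort=pubdate desc, score desc then orders entries within each day by relevance.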
Custom score for a id field
I have a requirement. I index a field id and a calculated score for that field named fieldScore. Note: I have many other fields which are also indexed, but only for this id field do I want a custom calculated score. So when I search for that id (q=id:1234), what I want is that in the results, if I use result.getScore(), I should get the indexed score (field name: fieldScore) for the id instead of the default Solr score. Please let me know how to do this. Thanks, Raju -- View this message in context: http://www.nabble.com/Custom-score-for-a-id-field-tp23197465p23197465.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Control segment size
Hi, You are looking for maxMergeDocs, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 23, 2009 1:08:20 PM Subject: Control segment size Hi, Is there any configuration to control the segment file size in Solr? Currently, I've an index (70G) with 80 segment files, and one of the files is 24G. We noticed that in some cases commit takes over 2 hours to complete (committing 50K records), whereas usually it finishes in 20 seconds. After further investigation it turns out the system was doing a lot of paging - the file system buffer was trying to write the big segment back to disk. I've got 20G of memory on the system, with 6G assigned to Solr instances (running 2 instances). It seems if I can control the segment size to a max of 4-5 GB I'll be ok. Is there any way to do so? I've got a merge factor of 100 - does that impact the size too? Why do different segments have different sizes? Thanks, -vivek
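A hedged sketch of where those knobs live in solrconfig.xml (inside the <mainIndex> or <indexDefaults> section on Solr 1.x); the numbers are illustrative guesses for the setup described, not recommendations:

```xml
<!-- caps how many documents a single segment may accumulate through merging -->
<maxMergeDocs>5000000</maxMergeDocs>
<!-- a lower mergeFactor keeps fewer live segments, at the cost of more frequent merges -->
<mergeFactor>10</mergeFactor>
```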
storing xml - how to highlight hits in response?
Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? Matt
RE: storing xml - how to highlight hits in response?
Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? You could create a custom Analyzer or Tokenizer that strips everything but the text content. -Ken
RE: Highlight question
Thanks a lot for your answer; I'm going to test and I will reply. Bertrand Ensdorf Ken wrote: Add the following parameters to the URL: hl=true&hl.fl=xhtml http://wiki.apache.org/solr/HighlightingParameters -Original Message- From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr] Sent: Wednesday, April 22, 2009 4:43 PM To: solr-user@lucene.apache.org Subject: Highlight question Hi everybody, I have a schema that looks like this in SOLR: title, type:string, indexed not stored body, type:string, stemmed, indexed not stored xhtml, type:string, not indexed, stored When a user makes a search on field title, body, or both, I want to highlight the matched string in the xhtml field only. How can I do this? Thanks, and sorry for my English. -- View this message in context: http://www.nabble.com/Highlight-question-tp23175851p23175851.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Highlight-question-tp23175851p23198244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: replicated index files have incorrect timestamp
We see the exact same thing. Additionally, that URL returns a 404 on a multicore setup, and gives an error when I add the core: <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">0</int> </lst> <str name="status">no indexversion specified</str> </response> -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Jian Han Guo jian...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Wed, 22 Apr 2009 23:43:02 -0700 To: solr-user@lucene.apache.org Subject: Re: replicated index files have incorrect timestamp I am using Mac OS 10.5. I can't access the box right now or this week; I'll do it next week and post the result then. Thanks, Jianhan
modify SOLR scoring
Hi everybody, I'm using SOLR with a schema (for example) like this: parutiondate, date, indexed, not stored fulltext, stemmed, indexed, not stored I know it's possible to order by a field or more, but I want to order by score and modify the score formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents. What is the best way to do this? Thanks. Excuse my English. -- View this message in context: http://www.nabble.com/modify-SOLR-scoring-tp23198326p23198326.html Sent from the Solr - User mailing list archive at Nabble.com.
Change boost of documents / single fields / external scoring ?
Hi. Confusing subject, eh? I'll try to be a little clearer in a few sentences. We have a Solr/Lucene index where each document is a blog entry. We have just implemented the PageRank algorithm for blogs and are about to add a column to the index called score, and perhaps adjust the document boost. We have also decided that it is the blog itself, and not the individual pages, that is to be ranked, so all entries belonging to one blog will receive the same score. I have not found a way to apply a document score without actually re-indexing all fields in the affected entries (which could very well be 100% of them at every PageRank recalculation), and this will of course take a hell of a long time, which effectively renders the process useless: it would take a week or more of reindexing as things stand, and will take more and more time (100M blog entries currently, and rapidly increasing). I guess we have run into the issue where we have some static data which we do not want to touch at all, but we want to update certain dynamic fields. Lucene is not a database, I know, but is there a way to implement external search-time scoring, or to update individual fields? Would there be a possibility to do some kind of join (parallel searches across separate index types)? Or to send the result to a separate sorting algorithm? Hmmm, perhaps a subclass of Sort? Grasping at straws here, folks... I hope one of the core experts can help us. Cheers //Marcus Herou -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
RE: modify SOLR scoring
I believe you can use a function query to do this: http://wiki.apache.org/solr/FunctionQuery If you embed the following in your query, you should get a boost for more recent date values: _val_:ord(dateField) Where dateField is the field name of the date you want to use. -Original Message- From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr] Sent: Thursday, April 23, 2009 3:44 PM To: solr-user@lucene.apache.org Subject: modify SOLR scoring Hi everybody, I'm using SOLR with a schema (for example) like this: parutiondate, date, indexed, not stored fulltext, stemmed, indexed, not stored I know it's possible to order by a field or more, but I want to order by score and modify the score formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents. What is the best way to do this? Thanks. Excuse my English. -- View this message in context: http://www.nabble.com/modify-SOLR-scoring-tp23198326p23198326.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: replicated index files have incorrect timestamp
You need to specify the index version number for which list of files is to be shown. The URL should be like this:http://masterhost:port/solr/replication?command=filelistindexversion=index version number You can get the index version number from the URL: http://masterhost:port/solr/replication?command=indexversion On Fri, Apr 24, 2009 at 1:10 AM, Jeff Newburn jnewb...@zappos.com wrote: We see the exact same thing. Additionally, that url returns 404 on a multicore and gives an error when I add the core. − response − lst name=responseHeader int name=status0/int int name=QTime0/int /lst str name=statusno indexversion specified/str /response -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Jian Han Guo jian...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Wed, 22 Apr 2009 23:43:02 -0700 To: solr-user@lucene.apache.org Subject: Re: replicated index files have incorrect timestamp I am using Mac OS 10.5. I can't access the box right now and this week. I'll do it next week and post the result then. Thanks, Jianhan 2009/4/22 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com which OS are you using? it does not look at the timestamps to decide if the index is in sync . It looks at the index version only. BTW can you just hit the master withe url and paste the response here http://masterhost:port/solr/replication?command=filelist On Thu, Apr 23, 2009 at 11:53 AM, Jian Han Guo jian...@gmail.com wrote: That's right. The timestamp of files on the slave side are all Dec 31 1969, so it looks the timestamp was not set (and therefore it is zero). The ones on the master side are all correct. Nevertheless, solr seems being able to recognize that master and slave are in sync after replication. Don't know how it does that. I haven't check if the two machines are in sync, but even if they are not, the timestamp should not be Dec 31, 1969, I think. 
Thanks, Jianhan 2009/4/22 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Let me assume that you are using the built-in replication. The replication tries to set the timestamp of all the files to the same as that of the files on the master. Just cross check. On Thu, Apr 23, 2009 at 6:57 AM, Jian Han Guo jian...@gmail.com wrote: Hi, I am using the nightly build of 4/22/2009. Replication works fine, but the files inside the index directory on the slave side all have an old timestamp: Dec 31 1969. Is this a known issue? Thanks, Jianhan -- --Noble Paul -- --Noble Paul -- Regards, Akshay K. Ukey.
Re: storing xml - how to highlight hits in response?
Yeah great idea, thanks. Does anyone know if there is code out there that will do this sort of thing? Matt On Thu, Apr 23, 2009 at 3:23 PM, Ensdorf Ken ensd...@zoominfo.com wrote: Hi, I'm storing some raw xml in solr (stored and non-tokenized). I'd like to highlight hits in the response, obviously this is problematic as the highlighting elements are also xml. So if I match an attribute value or tag name, the xml response is messed up. Is there a way to highlight only text, that is not part of an xml element? As in, only the text content? You could create a custom Analyzer or Tokenizer that strips everything but the text content. -Ken
Re: modify SOLR scoring
Hi. I am interested in a topic very similar to yours. I want to modify the field named score and the document boost without reindexing all the fields, since that would take too much power. Please let me know if you find a solution to this. Kindly //Marcus On Thu, Apr 23, 2009 at 10:02 PM, Ensdorf Ken ensd...@zoominfo.com wrote: I believe you can use a function query to do this: http://wiki.apache.org/solr/FunctionQuery If you embed the following in your query, you should get a boost for more recent date values: _val_:ord(dateField) where dateField is the name of the date field you want to use. -Original Message- From: Bertrand DUMAS-PILHOU [mailto:bdum...@eurocortex.fr] Sent: Thursday, April 23, 2009 3:44 PM To: solr-user@lucene.apache.org Subject: modify SOLR scoring Hi everybody, I'm using SOLR with a schema (for example) like this: parutiondate, date, indexed, not stored; fulltext, stemmed, indexed, not stored. I know it's possible to order by one field or more, but I want to order by score and modify the score formula. I want to keep the SOLR score but add a new parameter to the formula to boost the score of the most recent documents. What is the best way to do this? Thanks. Excuse my English. -- View this message in context: http://www.nabble.com/modify-SOLR-scoring-tp23198326p23198326.html Sent from the Solr - User mailing list archive at Nabble.com. -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: Custom score for a id field
Did you find an answer to this? On Thu, Apr 23, 2009 at 7:19 PM, Raju444us gudipal...@gmail.com wrote: I have a requirement. I index a field id and a calculated score for that field, named fieldScore. Note: I have many other fields which are also indexed, but only for this id field do I want a custom calculated score. So when I search for that id, q=id:1234, what I want is: in the results, if I use result.getScore(), I should get the indexed score (field name: fieldScore) for the id instead of the default solr score. Please let me know the solution to do this. Thanks, Raju -- View this message in context: http://www.nabble.com/Custom-score-for-a-id-field-tp23197465p23197465.html Sent from the Solr - User mailing list archive at Nabble.com. -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: Synonym file in a different location or loading synonyms from database
Thanks Otis. I tried putting the absolute path and it worked. But I wanted something configurable so that it can be changed if required (maybe through an admin interface?). In the meantime, another idea struck me: maintain all the synonyms in the database. I tried writing a FilterFactory of my own for creating the SynonymMap and SynonymFilter. I couldn't get this working and am getting a NullPointerException as in the stack trace below.

21:16:10,921 ERROR [STDERR] 23-Apr-2009 21:16:10 org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.analysis.SynonymFilter.next(SynonymFilter.java:79)
    at org.apache.lucene.analysis.StopFilter.next(StopFilter.java:120)
    at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:272)
    at org.apache.lucene.analysis.LowerCaseFilter.next(LowerCaseFilter.java:34)
    at org.apache.solr.analysis.EnglishPorterFilter.next(EnglishPorterFilterFactory.java:106)
    at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:47)
    at org.apache.solr.analysis.BufferedTokenStream.read(BufferedTokenStream.java:94)
    at org.apache.solr.analysis.BufferedTokenStream.next(BufferedTokenStream.java:80)
    at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:91)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:519)
    at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:116)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1324)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1139)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
    at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:119)
    at org.apache.solr.search.QParser.getQuery(QParser.java:88)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)

It looks like I am missing some configuration and solr is not updating the SynonymMap or SynonymFilter when my factory class is invoked. But I am sure that my factory class is invoked. In the schema.xml, I updated the index analyzer as <filter class="custom class" synonyms="synonyms.txt" ignoreCase="true" expand="false"/> The query analyzer is updated as <filter class="custom class" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> Is there anything obvious that I've missed? Regards, Raja -- View this message in context: http://www.nabble.com/Synonym-file-in-a-different-location-tp23195669p23200257.html Sent from the Solr - User mailing list archive at Nabble.com.
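The NPE originating inside SynonymFilter.next suggests the filter was handed an empty or null SynonymMap, i.e. the map was not fully populated from the database before the filter was constructed. As a rough, Solr-independent sketch of the expansion such a map provides (the dict stands in for rows loaded from the database; the table is invented):

```python
# Hypothetical synonym rows as they might come back from a database query.
# In a real custom factory these would populate the SynonymMap in inform()
# or init(), before any SynonymFilter is created from it.
SYNONYMS = {
    "tv": ["tv", "television"],
    "usa": ["usa", "united states"],
}

def expand(tokens):
    """Expand each token to itself plus its synonyms (expand=true behavior)."""
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))
    return out

print(expand(["tv", "show"]))  # ['tv', 'television', 'show']
```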
RE: storing xml - how to highlight hits in response?
Yeah great idea, thanks. Does anyone know if there is code out there that will do this sort of thing? Perhaps a much simpler option would be to use this: http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternReplaceFilterFactory.html with a regex of <[^>]*> or something like that - I'm no regex expert. Of course it could get tricky to handle escaped characters and the like, but it may be a good enough poor man's solution. -Ken
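A rough sketch of what such a pattern-replace step would do with a tag-matching regex like `<[^>]*>` (sample input is hypothetical; note the escaped-character caveat from the email: entities such as `&lt;` survive untouched):

```python
import re

# Match anything that looks like a tag: '<', any non-'>' run, '>'.
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags(xml):
    """Remove tag markup, keeping only the text content."""
    return TAG_RE.sub("", xml)

print(strip_tags('<doc id="1"><title>iPod shuffle</title></doc>'))
# iPod shuffle
print(strip_tags("a &lt;b&gt; c"))
# a &lt;b&gt; c  -- entities are left alone, which may or may not be wanted
```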
Re: Change boost of documents / single fields / external scoring ?
Could an ExternalFileField help me? http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html On Thu, Apr 23, 2009 at 10:01 PM, Marcus Herou marcus.he...@tailsweep.com wrote: Hi. Confusing subject eh? Trying to become a little clearer in a few sentences. We have a Solr/Lucene index where each document is a Blog Entry. We have just implemented the PageRank algorithm for blogs and are about to add a column to the index called score, and perhaps adjust the document boost. We have as well decided that it is the blog itself, and not the individual pages, that is to be ranked, so all entries belonging to one blog will receive the same score. I have not found a way to apply a document score without actually re-indexing all fields in the affected entries (which could very well be 100% at every PageRank recalculation), and this will of course take a hell of a long time to reindex, which effectively renders the process useless, since it would take a week or more of reindexing as of now and will take more and more time (100M blog entries currently, and rapidly increasing). Guess we have run into the issue where we have some static data which we do not want to touch at all, but we want to update certain dynamic fields. Lucene is not a database, I know, but is there a way to implement external search-time scoring or update individual fields? Would there be a possibility to do some kind of join (parallel searches across separate index types)? Or send the result to a separate sorting algorithm? Hmmm, perhaps a subclass of Sort? Grasping at straws here folks... Hope one of the core experts can help us. Cheers //Marcus Herou -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/ -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: prefix matching
Hmm, did some poking around and this conversation rang a bell from the Lucene list; see http://www.lucidimagination.com/search/document/3e4ce083206664d2/ngrams_and_positions#3e4ce083206664d2 Looks like Lucene would need to solve LUCENE-1224 and LUCENE-1225. https://issues.apache.org/jira/browse/LUCENE-1224 https://issues.apache.org/jira/browse/LUCENE-1225 -Grant On Apr 23, 2009, at 10:52 AM, Tom Morton wrote: Hi all, I'm trying to use prefixes to match similar strings to a query string. I have the following field type: <fieldtype name="prefix" stored="true" indexed="true" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/> </analyzer> </fieldtype> field: <field name="wordPrefix" type="prefix" indexed="true" stored="true"/> copyField: <copyField source="word" dest="wordPrefix"/> If I apply this to an indexed string "ipod shuffle" and query string "shufle" (missing f), I get matching terms for sh, shu, shuf. Index analyzer output: ip, ipo, ipod, sh, shu, shuf, shuff, shuffl, shuffle. Query analyzer output: sh, shu, shuf, shufl, shufle. However, when I query with shufle I get no results: http://localhost:8983/solr/select?q=wordPrefix%3Ashufle&fl=wordPrefix&qt=standard&debugQuery=on <lst name="debug"> <str name="rawquerystring">wordPrefix:shufle</str> <str name="querystring">wordPrefix:shufle</str> <str name="parsedquery">PhraseQuery(wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle")</str> <str name="parsedquery_toString">wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle"</str> </lst> This post suggests that I need to set the position increment for my token filter, but I'm not sure how to do that or if it's possible. http://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4 Thoughts? 
Thanks...Tom -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
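The overlap (and the failure) can be seen by generating the edge n-grams by hand. A sketch with the same min/max gram sizes as the field type above; the shared prefixes match through "shuf", but because the query parser builds a PhraseQuery over all the query-side grams, the grams absent from the index ("shufl", "shufle") make the whole phrase fail — the position-increment issue LUCENE-1224/1225 describe:

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Front-anchored grams, as EdgeNGramFilterFactory emits them."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(edge_ngrams("shuffle"))  # ['sh', 'shu', 'shuf', 'shuff', 'shuffl', 'shuffle']
print(edge_ngrams("shufle"))   # ['sh', 'shu', 'shuf', 'shufl', 'shufle']
# Only the first three query grams exist in the index, but the PhraseQuery
# requires all of them, so the search returns nothing.
```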
Re: replicated index files have incorrect timestamp
I have attached the output from our filelist below. The slaves are on the same version using the replication internal to solr 1.4. All replicated files are set to the date Dec 31 1969 response ? lst name=responseHeader int name=status0/int int name=QTime1/int /lst ? arr name=filelist ? lst str name=name_b7t.fdx/str long name=lastmodified1240473795000/long long name=size1248940/long /lst ? lst str name=name_b7t.nrm/str long name=lastmodified1240473844000/long long name=size27164362/long /lst ? lst str name=name_b7u.tii/str long name=lastmodified1240502374000/long long name=size1293/long /lst ? lst str name=name_b7t.fdt/str long name=lastmodified1240473795000/long long name=size507673107/long /lst ? lst str name=name_b7t.prx/str long name=lastmodified1240473843000/long long name=size157383562/long /lst ? lst str name=name_b7t.tvx/str long name=lastmodified1240473845000/long long name=size2497876/long /lst ? lst str name=name_b7u.nrm/str long name=lastmodified1240502374000/long long name=size10697/long /lst ? lst str name=name_b7t.frq/str long name=lastmodified1240473843000/long long name=size87254863/long /lst ? lst str name=name_b7u.fdt/str long name=lastmodified1240502374000/long long name=size2221854/long /lst ? lst str name=name_b7u.tis/str long name=lastmodified1240502374000/long long name=size96085/long /lst ? lst str name=name_b7u.fdx/str long name=lastmodified1240502374000/long long name=size2316/long /lst ? lst str name=name_b7u.tvx/str long name=lastmodified1240502374000/long long name=size4628/long /lst ? lst str name=name_b7t.tvf/str long name=lastmodified1240473845000/long long name=size17981946/long /lst ? lst str name=name_b7t.fnm/str long name=lastmodified1240473717000/long long name=size7401/long /lst ? lst str name=name_b7t.tvd/str long name=lastmodified1240473845000/long long name=size1851683/long /lst ? lst str name=name_b7t.tii/str long name=lastmodified1240473843000/long long name=size157438/long /lst ? 
lst str name=name_b7u.frq/str long name=lastmodified1240502374000/long long name=size270339/long /lst ? lst str name=name_b7u.prx/str long name=lastmodified1240502374000/long long name=size779156/long /lst ? lst str name=name_b7t.tis/str long name=lastmodified1240473843000/long long name=size11609437/long /lst ? lst str name=name_b7u.fnm/str long name=lastmodified1240502374000/long long name=size1525/long /lst ? lst str name=name_b7t_1.del/str long name=lastmodified1240502374000/long long name=size176/long /lst ? lst str name=namesegments_9yc/str long name=lastmodified1240502374000/long long name=size93/long /lst ? lst str name=name_b7u.tvf/str long name=lastmodified1240502374000/long long name=size38262/long /lst ? lst str name=name_b7u.tvd/str long name=lastmodified1240502374000/long long name=size3474/long /lst /arr ? arr name=confFiles ? lst str name=aliassolrconfig.xml/str str name=nameslave_solrconfig.xml/str long name=lastmodified1239712292000/long long name=checksum876307977/long long name=size33857/long /lst ? lst str name=nameschema.xml/str long name=lastmodified1237313545000/long long name=checksum1878024973/long long name=size24008/long /lst ? lst str name=namestopwords.txt/str long name=lastmodified123621333/long long name=checksum2619507454/long long name=size1168/long /lst ? lst str name=nameelevate.xml/str long name=lastmodified123621333/long long name=checksum790732532/long long name=size1274/long /lst ? lst str name=namesynonyms.txt/str long name=lastmodified1237990595000/long long name=checksum816919275/long long name=size68713/long /lst /arr /response -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Akshay akshay.u...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Fri, 24 Apr 2009 01:37:54 +0530 To: solr-user@lucene.apache.org Subject: Re: replicated index files have incorrect timestamp /solr/replication?command=indexversion
newbie question about indexing RSS feeds with SOLR
Hi, I've just downloaded solr and got it working, it seems pretty cool. I have a project which needs to maintain an index of articles that were published on the web via rss feed. Basically I need to watch some rss feeds, and index the items so they can be searched. Additionally, I need to run jobs based on particular keywords or events during parsing. Is this something that I can do with SOLR? Are there any related projects using SOLR that are better suited to indexing specific xml types like RSS? I had a look at the project enormo, which appears to be a property lettings and sales listing aggregator. I can see that they must have solved some of the problems I am thinking of, such as scheduled indexing of remote resources, and writing a parser to get data fields from other sites' templates. Any advice would be welcome... Many Thanks, Tom
Re: Access HTTP headers from custom request handler
Right, you will have to build a new war with your own subclass of SolrDispatchFilter *rather* than using the packaged one. On Apr 23, 2009, at 12:34 PM, Noble Paul നോബിള് नोब्ळ् wrote: nope. you must edit the web.xml and register the filter there On Thu, Apr 23, 2009 at 3:45 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello Hoss, thank you for your reply. I have no problems subclassing the SolrDispatchFilter...but where shall I configure it? :-) I cannot find any doc/wiki explaining how to configure a custom dispatch filter. I believe it should be in solrconfig.xml: <requestDispatcher> ... </requestDispatcher> Any idea? Is there a schema for solrconfig.xml? It would make my life easier... ;-) Thanks, Giovanni On Wed, Apr 15, 2009 at 12:48 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Solr cannot assume that the request would always come from http (think : of EmbeddedSolrServer). So it assumes that there are only parameters exactly. : Your best bet is to modify SolrDispatchFilter and read the params and : set them in the SolrRequest Object SolrDispatchFilter is designed to be subclassed to make this easy by overriding the execute method... protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { sreq.getContext().put( "HttpServletRequest", req ); super.execute( req, handler, sreq, rsp ); } -Hoss -- --Noble Paul
Solr Performance bottleneck
Hi all, I am trying to solve a serious performance problem with our Solr search index. We're running under Solr 1.3. We've sharded our index into 4 shards. Index data is stored on a network mount that is accessed over Fibre Channel. Each document's text is indexed, but not stored. Each day, roughly 10K - 20K new documents are added. After a document is submitted, it is compared, sentence by sentence, against every document we have indexed in its category. It's a requirement that we keep our index as up-to-date as possible. We reload our indexes once a minute in order to miss as few matches as possible. We are not expecting to find matches, so our document cache hit rates are abysmal. We also don't expect many repeated sentences across documents, so cached query hit rates are also practically zero. After running fine for over 9 months, the system broke down this week. The queries per second are around 17 to 18, and our paper backlog is well north of 14,000. The number of papers in the index has hit 3.7 million, and each shard is 2.3GB in size (roughly 925K papers in each index). In order to increase throughput, we tried to stand up additional read-only Solr instances pointed at the shared indexes, but got I/O errors from the secondary Solr instances when the reload time came. We tried switching the locking mechanism from single to simple, but the I/O errors continued. We're running on 64-bit Linux with a 64-bit JVM (Java 1.6.something), with 4GB of RAM assigned to each Solr instance. Has anyone else seen a problem like this before? Can anyone suggest any solutions? Will Solr 1.4 help (and is Solr 1.4 ready for production use)? Any answers would be greatly appreciated. Thanks, Jon -- View this message in context: http://www.nabble.com/Solr-Performance-bottleneck-tp23209595p23209595.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: OutofMemory on Highlightling
I am not sure whether lazy loading should help solve this problem. I have set enableLazyFieldLoading to true but it is not helping. I went through the code and observed that DefaultSolrHighlighter.doHighlighting is reading all the documents and the fields for highlighting (In my case, 1 MB stored field is read for all documents). Also I am confused over the following code in SolrIndexSearcher.doc() method if(!enableLazyFieldLoading || fields == null) { d = searcher.getIndexReader().document(i); } else { d = searcher.getIndexReader().document(i, new SetNonLazyFieldSelector(fields)); } Are we setting the fields as NonLazy even if lazy loading is enabled? Thanks, Siddharth -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Wednesday, April 22, 2009 11:12 AM To: solr-user@lucene.apache.org Subject: RE: OutofMemory on Highlightling Here is the stack trace SEVERE: java.lang.OutOfMemoryError: Java heap space at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133) at java.lang.StringCoding.decode(StringCoding.java:173) at java.lang.String.init(String.java:444) at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125) at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390) at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892) at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.j ava:277) at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176 ) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457) at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java :482) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultS olrHighlighter.java:253) at org.apache.solr.handler.component.HighlightComponent.process(HighlightCo mponent.java:84) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(Search Handler.java:195) 
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:232) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica tionFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt erChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv e.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv e.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java :128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java :102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:2 86) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:84 5) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) -Original Message- From: Gargate, Siddharth [mailto:sgarg...@ptc.com] Sent: Wednesday, April 22, 2009 9:29 AM To: solr-user@lucene.apache.org Subject: RE: OutofMemory on Highlightling I tried disabling the documentCache but still the same issue. documentCache class=solr.LRUCache size=0 initialSize=0 autowarmCount=0/ -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Monday, April 20, 2009 4:38 PM To: solr-user@lucene.apache.org Subject: Re: OutofMemory on Highlightling Gargate, Siddharth wrote: Anybody facing the same issue? Following is my configuration ... 
field name=content type=text indexed=true stored=false multiValued=true/ field name=teaser type=text indexed=false stored=true/ copyField source=content dest=teaser maxChars=100 / ... ... requestHandler name=standard class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str int name=rows500/int str name=hltrue/str str name=flid,score/str str name=hl.flteaser/str str name=hl.alternateFieldteaser/str int name=hl.fragsize200/int int name=hl.maxAlternateFieldLength200/int int name=hl.maxAnalyzedChars500/int /lst /requestHandler ... Search works fine if I disable
Re: replicated index files have incorrect timestamp
looks like a bug. https://issues.apache.org/jira/browse/SOLR-1126 On Fri, Apr 24, 2009 at 3:26 AM, Jeff Newburn jnewb...@zappos.com wrote: I have attached the output from our filelist below. The slaves are on the same version using the replication internal to solr 1.4. All replicated files are set to the date Dec 31 1969 response ? lst name=responseHeader int name=status0/int int name=QTime1/int /lst ? arr name=filelist ? lst str name=name_b7t.fdx/str long name=lastmodified1240473795000/long long name=size1248940/long /lst ? lst str name=name_b7t.nrm/str long name=lastmodified1240473844000/long long name=size27164362/long /lst ? lst str name=name_b7u.tii/str long name=lastmodified1240502374000/long long name=size1293/long /lst ? lst str name=name_b7t.fdt/str long name=lastmodified1240473795000/long long name=size507673107/long /lst ? lst str name=name_b7t.prx/str long name=lastmodified1240473843000/long long name=size157383562/long /lst ? lst str name=name_b7t.tvx/str long name=lastmodified1240473845000/long long name=size2497876/long /lst ? lst str name=name_b7u.nrm/str long name=lastmodified1240502374000/long long name=size10697/long /lst ? lst str name=name_b7t.frq/str long name=lastmodified1240473843000/long long name=size87254863/long /lst ? lst str name=name_b7u.fdt/str long name=lastmodified1240502374000/long long name=size2221854/long /lst ? lst str name=name_b7u.tis/str long name=lastmodified1240502374000/long long name=size96085/long /lst ? lst str name=name_b7u.fdx/str long name=lastmodified1240502374000/long long name=size2316/long /lst ? lst str name=name_b7u.tvx/str long name=lastmodified1240502374000/long long name=size4628/long /lst ? lst str name=name_b7t.tvf/str long name=lastmodified1240473845000/long long name=size17981946/long /lst ? lst str name=name_b7t.fnm/str long name=lastmodified1240473717000/long long name=size7401/long /lst ? lst str name=name_b7t.tvd/str long name=lastmodified1240473845000/long long name=size1851683/long /lst ? 
lst str name=name_b7t.tii/str long name=lastmodified1240473843000/long long name=size157438/long /lst ? lst str name=name_b7u.frq/str long name=lastmodified1240502374000/long long name=size270339/long /lst ? lst str name=name_b7u.prx/str long name=lastmodified1240502374000/long long name=size779156/long /lst ? lst str name=name_b7t.tis/str long name=lastmodified1240473843000/long long name=size11609437/long /lst ? lst str name=name_b7u.fnm/str long name=lastmodified1240502374000/long long name=size1525/long /lst ? lst str name=name_b7t_1.del/str long name=lastmodified1240502374000/long long name=size176/long /lst ? lst str name=namesegments_9yc/str long name=lastmodified1240502374000/long long name=size93/long /lst ? lst str name=name_b7u.tvf/str long name=lastmodified1240502374000/long long name=size38262/long /lst ? lst str name=name_b7u.tvd/str long name=lastmodified1240502374000/long long name=size3474/long /lst /arr ? arr name=confFiles ? lst str name=aliassolrconfig.xml/str str name=nameslave_solrconfig.xml/str long name=lastmodified1239712292000/long long name=checksum876307977/long long name=size33857/long /lst ? lst str name=nameschema.xml/str long name=lastmodified1237313545000/long long name=checksum1878024973/long long name=size24008/long /lst ? lst str name=namestopwords.txt/str long name=lastmodified123621333/long long name=checksum2619507454/long long name=size1168/long /lst ? lst str name=nameelevate.xml/str long name=lastmodified123621333/long long name=checksum790732532/long long name=size1274/long /lst ? 
lst str name=namesynonyms.txt/str long name=lastmodified1237990595000/long long name=checksum816919275/long long name=size68713/long /lst /arr /response -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Akshay akshay.u...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Fri, 24 Apr 2009 01:37:54 +0530 To: solr-user@lucene.apache.org Subject: Re: replicated index files have incorrect timestamp /solr/replication?command=indexversion -- --Noble Paul
Re: Get date facet counts per month
On Thu, Apr 23, 2009 at 2:36 AM, Raju444us ngudipa...@cormineid.com wrote: The example on the wiki gives the date facet counts per day. What should the query look like to get date facets by month? http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd Here is the sample query for day-level facet counts: http://localhost:8983/solr/select/?q=*:*&rows=0&facet=true&facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY You can use facet.date.gap=+1MONTH -- Regards, Shalin Shekhar Mangar.
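Adapting the wiki's day-level example to months, a sketch of the full parameter set (host, port, and the timestamp field name come from the example, not your schema; the six-month window is an arbitrary choice):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.date": "timestamp",
    "facet.date.start": "NOW/MONTH-6MONTHS",
    "facet.date.end": "NOW/MONTH+1MONTH",
    "facet.date.gap": "+1MONTH",
}
# urlencode escapes '+' as %2B, which the raw URL requires -- an unescaped
# '+' would be decoded as a space by the servlet container.
print("http://localhost:8983/solr/select/?" + urlencode(params))
```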
Re: Delete from Solr index...
On Thu, Apr 23, 2009 at 9:25 AM, lupiss lupitaga...@hotmail.com wrote: [translated from Spanish] hello again! It's true, that command is the one that deletes an index. I already tried it and yes, that's how I'll delete my project's test records. It would be nice to know how to delete them from the application using solrj. Greetings, thanks :) You can use solrServer.deleteByQuery("*:*") and then call commit with solrServer.commit(true, true); This will erase the index. -- Regards, Shalin Shekhar Mangar.
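The same two steps can be done over HTTP against Solr's XML update handler instead of through SolrJ. A sketch that only builds the requests (the URL assumes the default example port and a single core; `urlopen()` the two requests in order against a live server):

```python
from urllib import request

SOLR_UPDATE = "http://localhost:8983/solr/update"  # assumed deployment URL

def update_request(body):
    """Build (but do not send) a POST to Solr's XML update handler."""
    return request.Request(
        SOLR_UPDATE,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
    )

# Delete everything, then commit -- equivalent to the SolrJ calls above.
delete_all = update_request("<delete><query>*:*</query></delete>")
commit = update_request("<commit/>")
print(delete_all.data)  # b'<delete><query>*:*</query></delete>'
```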
PageRank sort
Hi. I've posted before but here it goes again: I have BlogData data which is more or less 100% static, but one field is not - the PageRank. I would like to sort on that field, and on the Lucene list I got these answers: 1. Use two indexes and a ParallelReader 2. Use a FieldScoreQuery containing the PageRank field. 3. Use a CustomScoreQuery which uses the FieldScoreQuery combined with other Queries (the actual search). I think I could use this pattern as well: 1. Use two indexes and a ParallelReader 2. Normal search and Sort on the PageRank column (perhaps consuming more memory) Anyone have an idea of how to implement these patterns in SOLR? I have never extended SOLR but am not afraid of doing so if someone pushes me in the right direction. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
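The second pattern's core idea - keep the volatile PageRank outside the static index and apply it at query time - can be prototyped independently of Lucene: hold the rank in a side map keyed by blog id and re-sort the search hits against it. A toy sketch (all data invented); refreshing the rank is then a map update, not a reindex:

```python
# PageRank held outside the index, keyed by blog id. Every entry of a blog
# shares its blog's rank, matching the "rank the blog, not the page" decision.
PAGERANK = {"blogA": 0.9, "blogB": 0.2, "blogC": 0.55}

def rerank(hits):
    """Sort (doc_id, blog_id) hits by the external PageRank, descending.
    Unknown blogs default to 0.0 and sink to the bottom."""
    return sorted(hits, key=lambda h: PAGERANK.get(h[1], 0.0), reverse=True)

hits = [("doc1", "blogB"), ("doc2", "blogA"), ("doc3", "blogC")]
print(rerank(hits))
# [('doc2', 'blogA'), ('doc3', 'blogC'), ('doc1', 'blogB')]
```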
Re: autowarmcount how to check if cache has been warmed up
OK, let's try this:
1. Before a commit, check the stats page and see if the size is more than 5.
2. Then call commit, and verify that the size is more than 5.
If the original size was 5, then you should have size 5 after autowarming too.

On Wed, Apr 22, 2009 at 2:57 PM, sunnyfr johanna...@gmail.com wrote:
still the same ? Seems done :
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 5
warmupTime : 20973
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:29 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:29 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming searc...@48b6c333 main from searc...@79e79d96 main ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=3055,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for searc...@48b6c333 main ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=20973,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener sending requests to searc...@48b6c333 main
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={start=0&q=solr&rows=100} hits=164 status=0 QTime=0
Apr 22 11:09:50 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:50 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={start=0&q=rocks&rows=100} hits=167581 status=0 QTime=51
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.SolrCore execute INFO: [video] webapp=null path=null params={sort=id+desc&q=anything} hits=8419 status=0 QTime=50
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done.
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.core.SolrCore registerSearcher INFO: [video] Registered new searcher searc...@48b6c333 main
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.search.SolrIndexSearcher close INFO: Closing searc...@79e79d96 main ^IfieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} ^IfilterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} ^IqueryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=5,warmupTime=3055,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties
Apr 22 11:09:51 search-01 jsvc.exec[31908]: Apr 22, 2009 11:09:51
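For context, autowarmCount is configured per cache in solrconfig.xml. A minimal sketch consistent with the size-5 / warmupTime numbers in the log above (the size values here are illustrative, not a recommendation):

```xml
<!-- In solrconfig.xml: autowarmCount controls how many entries from the
     old searcher's cache are regenerated when a new searcher opens. -->
<queryResultCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="5"/>
```

With `autowarmCount="5"`, the new searcher's queryResultCache shows `size=5` and a nonzero `warmupTime` immediately after registration, which matches the log output in this thread.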
Re: how to reset the index in solr
Thanks for your valuable suggestions. Can I get the rake task for clearing the Solr index (I mean rake index::rebuild)? It would be very helpful, and would also avoid having to delete ids manually.
regards, Sg..

Otis Gospodnetic wrote:
You can also delete it with delete-by-query, using the following query in the delete command: *:*
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: sagi4 gee...@angleritech.com To: solr-user@lucene.apache.org Sent: Wednesday, April 22, 2009 7:57:50 AM Subject: Re: how to reset the index in solr
Thanks for your response. Basically I want to clear it, meaning clearing the index. Thank you Sg..

What do you mean? To delete it or to reload it? If you want to delete it, just delete the ./data/index folder. If you want to reload, just reload your server if you can. In case you are using cores, you can reload a core with all its configuration: http://localhost:8080/solr/admin/cores?action=RELOAD&core=core_name

I need to clear the index in Solr. I would appreciate anyone's help. -- Best Regards, Geetha S
-- View this message in context: http://www.nabble.com/how-to-reset-the-index-in-solr-tp23170853p23174806.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://www.nabble.com/how-to-reset-the-index-in-solr-tp23170853p23210349.html Sent from the Solr - User mailing list archive at Nabble.com.
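To spell out Otis's delete-by-query suggestion, this is the update message you would POST to the /update handler, followed by a commit (the standard Solr XML update format; host, port, and handler path depend on your deployment):

```xml
<!-- Delete every document in the index. -->
<delete><query>*:*</query></delete>

<!-- Then, in a second request, make the deletion visible. -->
<commit/>
```

Unlike deleting the data/index directory on disk, this works while the server is running, with no restart required.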
Re: Query | Solr conf and data (index) distribution using master slave configuration
On Thu, Apr 23, 2009 at 10:10 AM, Vicky_Dev vikrantv_shirbh...@yahoo.co.in wrote:
1. Please confirm whether the tag entry dataDir/datadir in solrconfig.xml should match for the slave Solr server / master Solr server in accordance with the scripts.conf configuration settings.

Yes, dataDir in solrconfig.xml and scripts.conf should be the same.

2. Also let us know whether some specific handling has to be done in case of using multiple cores during replication.

You'd need to set up replication separately for each core.
-- Regards, Shalin Shekhar Mangar.
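As an aside: if you are on a recent build that ships the Java-based ReplicationHandler (rather than the rsync/scripts.conf setup discussed above), "separately for each core" means a block like this in each core's solrconfig.xml on the slave; the host, port, and core name below are placeholders:

```xml
<!-- Per-core slave replication config (Java-based replication). -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Each core polls its own counterpart core on the master. -->
    <str name="masterUrl">http://master-host:8983/solr/core_name/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The master side gets a matching `<lst name="master">` section in each core, so replication for one core is fully independent of the others.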