RE: Replacing existing documents
Hello,

> Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id. AFAIK, this is not possible. Lucene has an update call, but internally it just does a delete/add operation. We have a few use cases in this area and I'm researching whether it is effective to check for a document via Solr queries, or whether it is worthwhile to add this to the Solr implementation.

What are the use cases? I do not see what you mean.

> Does anyone have an estimate for the difference between querying, say, 100 documents by unique ID over the network vs. fetching them directly from the index?

That depends on the network, of course; fetching them from the index is normally fast.

> One use case is that we would like to use the index as our one database for documents, and if we delete a document we want it to stay deleted. Thus we would mark it deleted and check for its existence.

I suppose you mark it deleted by setting some flag (like a Lucene field isDeleted set to true). I am not sure whether using the Lucene index as your database is really smart; it might get corrupt. I would at least suggest backing it up frequently.

Regards,
Ard

PS: sorry for my annoying "..", I am using a web mail client.

> Another use case is that we are re-adding the same document a few times a day, and the commit times are ballooning. Where would I implement this? Thanks, Lance
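The mark-it-deleted use case discussed above amounts to filtering flagged documents out of every search rather than physically removing them. A minimal sketch in Python (the isDeleted field name and this URL-building helper are illustrative assumptions, not part of Solr itself):

```python
from urllib.parse import urlencode

def build_search_url(base_url, user_query):
    """Build a Solr select URL that hides documents flagged as deleted.

    The filter query (fq) excludes any document whose isDeleted field
    is true; such documents stay in the index but never match searches.
    """
    params = urlencode({
        "q": user_query,
        "fq": "-isDeleted:true",  # the minus prefix negates the clause
        "wt": "xml",
    })
    return f"{base_url}/select?{params}"
```

Because the filter query is the same on every request, Solr can serve it from its filter cache, so the exclusion is cheap after the first search.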
Major update to Solrsharp
A big update was just posted to the Solrsharp project. This update provides first-class support for highlighting in the library. The implementation is robust and provides the following features:
- Structured highlight parameter assignment based on the SolrField object
- Full access to all highlight parameters, on both an aggregate and per-field basis
- Incorporation of highlighted values into the base search result records
All of the supplied documentation has been updated, as has the example application, to demonstrate use of the highlighting classes. Please report any issues through JIRA. Be sure to associate any issues with the C# client component. cheers, jeff r.
Re: Replacing existing documents
On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:

> Recently someone mentioned that it would be possible to have a 'replace existing document' feature rather than just dropping and adding documents with the same unique id.

There is such a patch: https://issues.apache.org/jira/browse/SOLR-139 I'm experimenting with it right now and it works well for my cases. However, under the covers it is still a delete/add, and...

> One use case is that we would like to use the index as our one database for documents, and if we delete a document we want it to stay deleted. Thus we would mark it deleted and check for its existence. Another use case is that we are re-adding the same document a few times a day, and the commit times are ballooning.

...you still have to commit for changes to be visible.

Erik
Indexing HTML content... (Embed HTML into XML?)
Hello, Sorry for the stupid question. I'm trying to index an HTML file as one of the fields in Solr. I've set up an appropriate analyzer in the schema, but I'm not sure how to add the HTML content to Solr. Encapsulating HTML content within a field tag is obviously not valid. How do I add HTML content? Hope the query is clear. Thanks, Ravi
Re: Indexing HTML content... (Embed HTML into XML?)
You need to encode your HTML content so it can be included as a normal 'string' value in your XML element. As far as I remember, the only unsafe characters you have to encode as entities are &lt;, &gt;, &quot; and &amp; (Google "xml entities" to be sure). I don't know what language you use, but in Perl, for instance, you can use something like:

use HTML::Entities;
my $xmlString = encode_entities($rawHTML, '<>&"');

Also you need to make sure your HTML is encoded in UTF-8, to comply with Solr's requirement for UTF-8 encoded XML. I hope it helps. J. On 8/22/07, Ravish Bhagdev [EMAIL PROTECTED] wrote: Hello, Sorry for the stupid question. I'm trying to index an HTML file as one of the fields in Solr. I've set up an appropriate analyzer in the schema, but I'm not sure how to add the HTML content to Solr. Encapsulating HTML content within a field tag is obviously not valid. How do I add HTML content? Hope the query is clear. Thanks, Ravi -- Jerome Eteve. [EMAIL PROTECTED] http://jerome.eteve.free.fr/
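For readers not using Perl, the same escaping is a one-liner with Python's standard library (shown purely as an illustration; strictly, only &, < and > must be escaped in element content, while quotes only matter inside attribute values):

```python
from xml.sax.saxutils import escape

def html_to_field_value(raw_html):
    """Escape raw HTML so it can sit inside a Solr <field> element.

    escape() converts &, < and > to entities; the XML parser on the
    Solr side decodes them back to the original markup when the
    update message is read.
    """
    return escape(raw_html)
```

For example, html_to_field_value("<b>hi</b>") yields "&lt;b&gt;hi&lt;/b&gt;", which is safe to embed in the update XML.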
RE: Query optimisation - multiple filter caches?
I understand - thanks, Yonik. I notice that LuceneQueryOptimizer is still used in SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this method is deprecated, or that the config parameter query/boolTofilterOptimizer is no longer to be used? As for the other search() methods, they just delegate directly to org.apache.lucene.search.IndexSearcher, so no use of caches there. Jon -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: 16 August 2007 01:40 To: solr-user@lucene.apache.org Subject: Re: Query optimisation - multiple filter caches? On 8/15/07, Jonathan Woods [EMAIL PROTECTED] wrote: I'm trying to understand how best to integrate directly with Solr (Java-to-Java in the same JVM) to make the most of its query optimisation - chiefly, its caching of queries which merely filter rather than rank results. I notice that SolrIndexSearcher maintains a filter cache and so does LuceneQueryOptimiser. Shouldn't they be contributing to/using the same cache, or are they used for different things? LuceneQueryOptimiser is no longer used since one can directly specify filters via fq parameters. -Yonik
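Yonik's point above, that filters are now specified directly via fq parameters rather than through the query optimizer, can be illustrated with a small request builder. A hedged sketch in Python (this helper is an assumption for illustration, not part of any Solr client library):

```python
from urllib.parse import urlencode

def build_filtered_query(base_url, q, filters):
    """Send ranking clauses in q and cacheable restrictions as fq params.

    Each fq value is cached independently in Solr's filterCache, so a
    filter repeated across many requests (e.g. a category restriction)
    only pays its full cost once.
    """
    params = [("q", q)] + [("fq", f) for f in filters]
    return f"{base_url}/select?{urlencode(params)}"
```

The design point is that fq clauses restrict the result set without affecting scoring, which is exactly the "filter rather than rank" case the old optimizer tried to detect automatically.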
Apache web server logs in solr
Hello, I was thinking that solr - with its built-in faceting - would make for a great apache log file storage system. I was wondering if anyone knows of any module or library for apache to write log files directly to solr or to a lucene index? Thanks Andrew
RE: SolJava --- which attachments are valid?
Sorry for revisiting this 3-week-old thread. I downloaded the nightly yesterday. I noticed that some classes have API docs (.html) but no source code (.java). For example, there is a javadoc for org.apache.solr.client.solrj.util.ClientUtils but no ClientUtils.java: bash-3.00$ find . -type f | grep Client ./docs/api-solrj/org/apache/solr/client/solrj/util/class-use/ClientUtils.html ./docs/api-solrj/org/apache/solr/client/solrj/util/ClientUtils.html Is this a packaging problem, or is it intentional? -kuro -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Friday, August 03, 2007 12:50 PM To: solr-user@lucene.apache.org Subject: Re: SolJava --- which attachments are valid? Teruhiko Kurosaka wrote: or you can get it from the nightly builds in: http://people.apache.org/builds/lucene/solr/nightly/ For those of you who are interested... As far as I can tell by inspecting the source code in trunk, solrj.jar from the nightly doesn't seem to work with Solr 1.2. For one thing, there is a new layer org.apache.solr.common, and org.apache.util has become a subcomponent under common. Things like SolrInputDocument do not exist in Solr 1.2 at all. To run solrj, you need: apache-solr-1.3-dev-common.jar apache-solr-1.3-dev-solrj.jar and all the files in: solrj-lib You *should* be able to use the client against a server that is running 1.2, but I don't make any promises there. ryan
Solr and terracotta
Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?
Solr scoring: relative or absolute?
Are the score values generated in Solr relative to the index, or are they against an absolute standard? Is it possible to create a scoring algorithm with this property? Are there parts of the score inputs that are absolute? My use case is this: I would like to do a parallel search against two Solr indexes, and combine the results. The two indexes are built with the same data sources; we just can't handle one giant index. If the score values are against a common 'scale', then scores from the two search indexes can be compared. I could combine the result sets with a simple merge by score. This is a difficult concept to explain. I hope I have succeeded. Thanks, Lance
Re: Solr scoring: relative or absolute?
Indexes cannot be directly compared unless they have similar collection statistics. That is, the same terms occur with the same frequency across all indexes, and the average document lengths are about the same (though the default similarity in Lucene may not care about average document length--I'm not sure). SOLR-303 is an attempt to solve the partitioning issue from the search side of things. -Sean Lance Norskog wrote: Are the score values generated in Solr relative to the index or are they against an absolute standard? Is it possible to create a scoring algorithm with this property? Are there parts of the score inputs that are absolute? My use case is this: I would like to do a parallel search against two Solr indexes, and combine the results. The two indexes are built with the same data sources, we just can't handle one giant index. If the score values are against a common 'scale', then scores from the two search indexes can be compared. I could combine the result sets with a simple merge by score. This is a difficult concept to explain. I hope I have succeeded. Thanks, Lance
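For what it's worth, if one merges two result lists by raw score despite the caveat above, the merge step itself is just a two-pointer pass over lists already sorted by descending score. A sketch in Python (the (doc_id, score) tuple layout is an assumption for illustration):

```python
def merge_by_score(results_a, results_b):
    """Merge two result lists, each sorted by descending score.

    Each result is a (doc_id, score) tuple.  Note that raw scores are
    only meaningfully comparable if both indexes have similar
    collection statistics, as discussed above.
    """
    merged, i, j = [], 0, 0
    while i < len(results_a) and j < len(results_b):
        if results_a[i][1] >= results_b[j][1]:
            merged.append(results_a[i])
            i += 1
        else:
            merged.append(results_b[j])
            j += 1
    # One of the lists is exhausted; append the remainder of the other.
    merged.extend(results_a[i:])
    merged.extend(results_b[j:])
    return merged
```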
RE: Solr and terracotta
I tried it, and it didn't work that well... so I ended up making my own little faceted search engine directly using RAMDirectory and clustering it via Terracotta. Not as good as Solr (smile), but it worked. I actually posted some questions a while back in trying to get it to work. So that Terracotta can hook the RAMDirectory, it may be good to submit this in JIRA for Terracotta support! Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?
Re: Solr and terracotta
How come it didn't work? How did you add RAMDir support to solr? On 8/22/07, Jeryl Cook [EMAIL PROTECTED] wrote: tried it, didn't work that well...so I ended up making my own little faceted Search engine directly using RAMDirectory and clustering it via Terracotta...not as good as SOLR(smile), but it worked. i actually posted some questions awhile back in trying to get it to work. so terracotta can hook the RAMDirectory, maybe be good to submit this in JIRA for terrocotta support! Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?
Running into problems with distributed index and search
Hi All, This is the scenario: I have two SOLR instances running on two different partitions. I am treating one of the servers as strictly read-only for search (the search server) and the other instance (the index server) for indexing. The index data directory resides on an NFS partition. I am running into the following problems:

1) The index dir is /indexdata/data. When I index using the index server, it honors the data dir mentioned in solrconfig.xml, writes the index files to that location, and is able to read them (I am able to do queries using the SOLR admin).

2) The search server respects the NFS directory but does not read the index files; the SOLR admin returns no search results. I had to create a symlink under $SOLRHOME pointing to the NFS partition to make it work.

3) I had to bounce the Tomcat search SOLR webapp instance for it to read the index files. Is that mandatory? In a distributed environment, do we always have to bounce the SOLR webapp instances to pick up changes in the index files?

Any help/suggestions would be greatly appreciated. Thanks, kasi
How to extract constrained fields from query
Hello, in my custom request handler, I want to determine which fields are constrained by the user. E.g. the query (q) might be ipod AND brand:apple, and there might be a filter query (fq) like color:white (or more). What I want to know is that brand and color are constrained. AFAICS I could use SolrPluginUtils.parseFilterQueries and test whether the resulting queries are TermQueries and read their fields. Then should I also test which kinds of queries I get when parsing the query (q) and look for all TermQueries in the parsed query? Or is there a more elegant way of doing this? Thanks a lot, cheers, Martin
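As a rough illustration of the idea, a naive string-level extraction of field:value terms can be sketched in Python. This is only a sketch: it ignores quoted phrases, ranges, and nested clauses, so the query-parser approach described above is the robust route:

```python
import re

# Matches a bare field:value term.  Deliberately simplistic: it does
# not handle quoted phrases, range queries, or nested sub-queries.
_FIELD_TERM = re.compile(r'(?<![\w:])([A-Za-z_]\w*):(?!\s)')

def constrained_fields(q, filter_queries=()):
    """Return the set of field names appearing as field:value terms
    in the main query string and any filter query strings."""
    fields = set()
    for clause in (q, *filter_queries):
        fields.update(_FIELD_TERM.findall(clause))
    return fields
```

For the example above, constrained_fields("ipod AND brand:apple", ["color:white"]) returns the set {"brand", "color"}.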
RE: Solr and terracotta
Jeryl, I remember you asking about how to hook in the RAMDirectory a while back. It seemed like there was maybe some support within Solr that you needed. I assume you're suggesting adding an issue in the Solr JIRA, right? Is there something that the Terracotta team can do to help? Cheers, Orion Jeryl Cook wrote: tried it, didn't work that well...so I ended up making my own little faceted Search engine directly using RAMDirectory and clustering it via Terracotta...not as good as SOLR(smile), but it worked. i actually posted some questions awhile back in trying to get it to work. so terracotta can hook the RAMDirectory, maybe be good to submit this in JIRA for terrocotta support! Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out? -- View this message in context: http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537 Sent from the Solr - User mailing list archive at Nabble.com.
Web statistics for solr?
Hello! I was wondering if anyone has written a script that displays any stats from SOLR.. queries per second, number of docs added.. this sort of thing. Sort of a general dashboard for SOLR. I'd rather not write it myself if I don't need to, and I didn't see anything conclusive in the archives for the email list. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: defining fields to be returned when using mlt
Hi Stefan, Currently there is no way to specify the list of fields to be returned by the MoreLikeThis handler. I've been looking to address this issue in https://issues.apache.org/jira/browse/SOLR-295 (point 3); however, in the broader scheme of things, it seems logical to wait until https://issues.apache.org/jira/browse/SOLR-281 is resolved before making changes to MLT. cheers, Piete On 22/08/07, Stefan Rinner [EMAIL PROTECTED] wrote: Hi Is there any way to define the number/type of fields of the documents returned in the moreLikeThis part of the response, when mlt is set to true? Currently I'm using morelikethis to show the number and sources of similar documents - therefore I'd need only the source field of these similar documents and not everything. - stefan
Re: Web statistics for solr?
Matthew, Maybe the SOLR Statistics page would suit your purpose? (click on statistics from the main solr page or use the following url) http://localhost:8983/solr/admin/stats.jsp cheers, Piete On 23/08/07, Matthew Runo [EMAIL PROTECTED] wrote: Hello! I was wondering if anyone has written a script that displays any stats from SOLR.. queries per second, number of docs added.. this sort of thing. Sort of a general dashboard for SOLR. I'd rather not write it myself if I don't need to, and I didn't see anything conclusive in the archives for the email list. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: almost realtime updates with replication
At Infoseek, we ran a separate search index with today's updates and merged that in once each day. It requires a little bit of federated search to prefer the new content over the big index, but the daily index can be very nimble for update. wunder On 8/22/07 7:58 AM, mike topper [EMAIL PROTECTED] wrote: Hello, Currently in our application we are using the master/slave setup and have a batch update/commit about every 5 minutes. There are a couple queries that we would like to run almost realtime so I would like to have it so our client sends an update on every new document and then have solr configured to do an autocommit every 5-10 seconds. reading the Wiki, it seems like this isn't possible because of the strain of snapshotting and pulling to the slaves at such a high rate. What I was thinking was for these few queries to just query the master and the rest can query the slave with the not realtime data, although I'm assuming this wouldn't work either because since a snapshot is created on every commit, we would still impact the performance too much? anyone have any suggestions? If I set autowarmingCount=0 would I be able to to pull to the slave faster than every couple of minutes (say, every 10 seconds)? what if I take out the postcommit hook on the master and just have the snapshooter run on a cron every 5 minutes? -Mike
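For reference, the autocommit idea from the question above maps onto solrconfig.xml roughly like this (the threshold values are illustrative, not recommendations; check the exact syntax against your Solr version):

```xml
<!-- solrconfig.xml (inside <updateHandler>): commit pending documents
     after at most 500 adds or 10 seconds, whichever comes first.
     Every commit opens a new searcher, so very low maxTime values
     trade query-cache warmth for freshness. -->
<autoCommit>
  <maxDocs>500</maxDocs>
  <maxTime>10000</maxTime> <!-- milliseconds -->
</autoCommit>
```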
Re: Solr and terracotta
If I'm not wrong, once the RAMDir feature is in place, mounting Terracotta should be transparent and fast, right? On 8/22/07, Orion Letizi [EMAIL PROTECTED] wrote: Jeryl, I remember you asking about how to hook in the RAMDirectory a while back. It seemed like there was maybe some support within Solr that you needed. I assume you're suggesting adding an issue in the Solr JIRA, right? Is there something that the Terracotta team can do to help? Cheers, Orion Jeryl Cook wrote: tried it, didn't work that well...so I ended up making my own little faceted Search engine directly using RAMDirectory and clustering it via Terracotta...not as good as SOLR(smile), but it worked. i actually posted some questions awhile back in trying to get it to work. so terracotta can hook the RAMDirectory, maybe be good to submit this in JIRA for terrocotta support! Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Wed, 22 Aug 2007 16:18:24 -0300 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Solr and terracotta Recently I ran into this topic. I googled it a little and didn't find much information. It would be great to have solr working with RAMDirectory and Terracotta. We could stop using crons for rsync, right? Has anyone tried that out?