RE: Lock obtain timeout
Thanks for that - I have made the following changes:

- optimize more often
- omitNorms on all non-fulltext fields
- useCompoundFile=true (will keep an eye on performance)

And that seems to have solved the problem.

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 13 January 2007 01:37
To: solr-user@lucene.apache.org
Subject: Re: Lock obtain timeout

: Are the two problems related? Looking through the mailing list it seems
: that changing the setting for useCompoundFile from false to true could
: help, but before I do that I would like to understand if there are
: undesirable side effects. Why isn't this param set to true by default?

"Too Many Open Files" can result from lots of different possible causes. One is that you have so many indexed fields with norms that the number of files in your index is too big -- that's the use case where useCompoundFile=true can help you -- but it's not set that way by default because it can make searching slower.

The other reason you can have too many open files is if you are getting more concurrent requests than you can handle -- or if the clients initiating those requests aren't closing them properly (sockets count as files too).

Understanding why you are getting these errors requires that you look at what your hard and soft file limits are (ulimit -aH and ulimit -aS on my system) and what files are in use by Solr when these errors occur (lsof -p _solrpid_).

To answer your earlier question, i *think* you may be getting the lock timeout errors because it can't access the lock file, because it can't open any more files ... i'm not 100% sure.

-Hoss
Re: Does Solr support integration with the Compass framework?
Hi Marios,

It can store the index in a database, but I wouldn't want to use that route myself. Here is a quick link to the docs which provides an overview of the transactional features:
http://www.opensymphony.com/compass/versions/1.1M3/html/core-searchengine.html#core-searchengine-transaction

HTH,
Graham

Marios Skounakis wrote:

Does compass store the lucene index in a database? If this is the case, it is fairly straightforward to understand how this happens. If the index is still in disk files, how does it provide transactional semantics? Would you care to give a high-level overview?

TIA
Marios

On 1/15/07, Graham O'Regan [EMAIL PROTECTED] wrote:

compass provides a transaction manager for lucene indexes so you can incorporate an index update and database update in a single transaction, or roll back if either fails. thats why it would be interesting to see the two working together.

Marios Skounakis wrote:

Hi all,

I am working on a hibernate-solr bridge that will behave like the compass Hibernate3GpsDevice. It gets a callback from hibernate when an object is stored, checks if it is 'SolrDocumentable' and sends it to solr using the client library from:
http://issues.apache.org/jira/browse/SOLR-20 (solr-client.zip)

If you're interested, i can send you my initial version... when i'm further along, i'll try to post it to solr/client/java

That would be great - we're also facing the same issue of rolling our own code to keep a Solr index in sync with a MySQL DB that we access via Hibernate.

I wonder whether people who try to keep a Solr (or Lucene) index in sync with a database are at all worried about index update failures. Propagating the update from the DB to the index is one thing, and relatively easy to implement. But how do you handle failures to update either the index or the DB, since you cannot enforce transactional semantics over both updates? Or do index update failures occur so infrequently that you do not worry about it?

Marios
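For context on what such a hibernate-solr bridge actually sends over the wire: it ultimately just POSTs Solr's XML update message to the /update handler. Below is a rough, self-contained sketch of building that message; the class name, field names, and the idea of assembling the XML by hand are illustrative only -- the real bridge would use the SOLR-20 client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrAddMessage {

    // Escape the characters that are special in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Build the <add><doc>...</doc></add> body that a bridge would POST
     * to Solr's /update handler. Field names are assumed to be XML-safe.
     */
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "book-42");
        fields.put("title", "War & Peace");
        System.out.println(toAddXml(fields));
    }
}
```

A listener hooked into Hibernate's post-insert/post-update events would build this message from the saved entity and POST it; on failure you are back to the non-transactional problem Marios describes.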
Re: Does Solr support integration with the Compass framework?
On 1/15/07, Lukas Vlcek [EMAIL PROTECTED] wrote:

Ryan,
Could you be more specific on your statement?

On 1/12/07, Ryan McKinley [EMAIL PROTECTED] wrote:

I started using compass a few months back. It is an amazing system: with almost no effort, it just works. BUT the showstopper (for me) was that you could not easily update the index from multiple machines. Compass lets you put the lucene indexes in SQL with JDBC, but this felt wrong. Then i found solr, and it solves most things.

What exactly do you mean by "you could not easily update the index from multiple machines"?

Consider a standard load-balanced web setup with three machines:

DB1 - running mysql
WEB1 - webapp talking to DB1
WEB2 - webapp talking to DB1
...

In compass, the lucene index is stored on disk - WEB1 writes its lucene index on WEB1. So for changes that WEB1 makes, WEB2 does not see them (without adding more logic).

To solve this problem, compass is able to write its index into SQL, so WEB1 and WEB2 can write the lucene index in DB1. But the performance is not great and it seems to be something people discourage (tho i have not tried it). Also, take a look at:
http://forums.opensymphony.com/thread.jspa?messageID=100071

Could you describe your problem in more detail (and possible workaround if you found any) please?

workaround? I'm now using solr :)

Otherwise, consider:

* Try the JDBC store: http://www.opensymphony.com/compass/versions/1.1M3/html/core-connection.html#core-connection-jdbc
* If you are ok with WEB1 and WEB2 being slightly out of sync for new content, you could use them normally and periodically call index() on the hibernate GPS device. This will synchronize whatever is stored in hibernate with the lucene index.
Re: XML querying
On Mon, 2007-01-15 at 12:23 +, Luis Neves wrote:

Hello.
What I do now to index XML documents is to use a Filter to strip the markup. This works, but it's impossible to know where in the document the match is located. What would it take to make it possible to specify a filter query that accepts xpath expressions?... something like:

fq=xmlField:/book/content/text()

This way only the /book/content/ element would be searched. Did I make sense? Is this possible?

AFAIK short answer: no. The field is ALWAYS plain text. There is no xmlField type.

...but why don't you just add your text to multiple fields when indexing? Instead of plain stripping the markup, apply the above xpath to your document and create different fields. Like:

<field name="content"><xsl:value-of select="/book/content/text()"/></field>
<field name="more"><xsl:value-of select="/book/more/text()"/></field>

Makes sense? HTH

salu2
--
Luis Neves
Re: XML querying
Hi!

Thorsten Scherler wrote:

On Mon, 2007-01-15 at 12:23 +, Luis Neves wrote:

Hello.
What I do now to index XML documents is to use a Filter to strip the markup. This works, but it's impossible to know where in the document the match is located. What would it take to make it possible to specify a filter query that accepts xpath expressions?... something like:

fq=xmlField:/book/content/text()

This way only the /book/content/ element would be searched. Did I make sense? Is this possible?

AFAIK short answer: no. The field is ALWAYS plain text. There is no xmlField type.

...but why don't you just add your text to multiple fields when indexing? Instead of plain stripping the markup, apply the above xpath to your document and create different fields. Like:

<field name="content"><xsl:value-of select="/book/content/text()"/></field>
<field name="more"><xsl:value-of select="/book/more/text()"/></field>

Makes sense?

Yes, but I have documents with different schemas in the same xml field; also, that way I would have to know the schema of the documents being indexed (which I don't).

The schema I use is something like:

<field name="DocumentType" type="string" indexed="true" stored="true"/>
<field name="Document" type="text" indexed="true" stored="true"/>

Where each distinct DocumentType has its own schema. I could revise this approach to use a Solr instance for each DocumentType, but I would have to find a way to merge results from the different instances because I also need to search across different DocumentTypes... I guess I'm SOL :-(

--
Luis Neves
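For what it's worth, the extract-one-field-per-xpath idea from this thread can be done at indexing time with the JDK's built-in XPath support. This is only an illustrative sketch mirroring the /book example above; the class name and sample document are made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathFieldExtractor {

    /** Evaluate an XPath expression against an XML string and return the text result. */
    static String extract(String xml, String xpathExpr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(xpathExpr, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><content>full text here</content><more>extra</more></book>";
        // Each XPath result becomes one Solr field instead of indexing stripped markup.
        System.out.println("content = " + extract(xml, "/book/content/text()"));
        System.out.println("more    = " + extract(xml, "/book/more/text()"));
    }
}
```

This only helps when you know the schema of the incoming documents, which is exactly the limitation Luis runs into above.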
Re: One item, multiple fields, and range queries
Thanks Hoss. Interesting approach, but the N bound could be well into the hundreds, and the N bound would be variable (some maximum number, but different across events.)

I've not yet used dynamic fields in this manner. With that number range, what limitations could I encounter? Given the size of that, I would need the solr engine to formulate that query, correct? I can't imagine I could pass that entire subquery statement in the http request, as the character limit would likely be exceeded.

Some of my comments may not make sense, so I'll check into dynamic fields and such in the meantime.

thanks,
j

On 1/14/07, Chris Hostetter [EMAIL PROTECTED] wrote:

: 2) use multivalued fields as correlated vectors, so the first start
: date corresponds to the first end date corresponds to the first lat and
: long value. You get them all back in a query though, so your app would
: need to do extra work to sort out which matched.

if you expect a bounded number of correlated events per item, you can use dynamic fields, and build up N correlated subqueries where N is the upper bound on the number of events you expect any item to have, ie...

(+lat1:[x TO y] +lon1:[w TO z] +time1:[a TO b]) OR
(+lat2:[x TO y] +lon2:[w TO z] +time2:[a TO b]) OR
(+lat3:[x TO y] +lon3:[w TO z] +time3:[a TO b]) OR ...

-Hoss
Re: One item, multiple fields, and range queries
: I've not yet used dynamic fields in this manner. With that number range,
: what limitations could I encounter? Given the size of that, I would need

very little. yonik recently listed the costs of dynamic fields...

http://www.nabble.com/Searching-multiple-indices-%28solr-newbie%29-tf2903899.html#a8245621

...as he points out, with omitNorms=true you can have thousands of dynamic fields and not even notice.

: the solr engine to formulate that query, correct? I can't imagine I could
: pass that entire subquery statement in the http request, as the character
: limit would likely be exceeded.

yeah ... if you wanted to try the approach i described, and your N wasn't a single digit number, i would recommend putting the query building code into a custom RequestHandler ... it could even inspect the list of field names from the IndexReader and know exactly how big N is at any given moment. i have no idea how efficient this approach would be if N really does get up into the hundreds.

A completely different approach you could take, if you want to get into Lucene Query internals, would be to take advantage of something Doug mentioned once that has stayed in the back of my mind for almost a year now: PhraseQuery artificially enforces that the Terms you add to it are in the same field ... you could easily write a PhraseQuery-ish query that takes Terms from different fields and ensures that they appear near each other in terms of their token sequence -- the context of that comment was searching for instances of words with specific usage (ie: "house" used as a noun) by putting the usage type of each term in a separate parallel field, but with identical token positions.
if you forget for a moment about the ranges you need to do, and imagine instead that you store the quadrant number and hour of day for each event, where e1q is the quadrant of event1 for an item, and e1h is the hour of the day that event1 happened at, then for an item with multiple events you could index the field/terms lists...

quadrant: e1q e2q e3q
hour: e1h e2h e3h

...and query for your input quadrant at a term position equal to the term position of your input hour.

if you got *that* working, you could conceivably change the query to take in a range for each field -- using TermEnum to get the list of all latitude Terms in your latitude range, then for each of those Terms get the list of documents and the term position within that document, and then look for the longitude terms in the same relative term position which are in your longitude range, and time terms in the same relative term position in your time range.

does that make any sense? this is all purely theoretical, it just seems like it *should* be possible, but i haven't thought through how it would be implemented. if you actually wanted to tackle it, i would start a discussion on [EMAIL PROTECTED] first, so people smarter than me can tell you if i'm smoking crack or not.

-Hoss
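The first part of Hoss's suggestion -- a custom RequestHandler that generates the N correlated subqueries server-side -- boils down to string assembly along these lines. This is only a sketch of the query-building step (the lat*/lon*/time* dynamic field names follow his example; the RequestHandler wiring itself is omitted and assumed):

```java
public class CorrelatedRangeQuery {

    /**
     * Build (+lat1:R +lon1:R +time1:R) OR (+lat2:R ...) ... for slots 1..n,
     * matching the OR-of-conjunctions pattern from Hoss's earlier message.
     */
    static String build(int n, String latRange, String lonRange, String timeRange) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            if (i > 1) sb.append(" OR ");
            sb.append("(+lat").append(i).append(":").append(latRange)
              .append(" +lon").append(i).append(":").append(lonRange)
              .append(" +time").append(i).append(":").append(timeRange)
              .append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // N would come from inspecting the IndexReader's field names at runtime.
        System.out.println(build(3, "[10 TO 20]", "[30 TO 40]", "[100 TO 200]"));
    }
}
```

Building the string inside the handler also sidesteps the HTTP request-length concern raised earlier in the thread, since the client only sends the three ranges.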
Re: Faceting question...
: <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
: omitNorms="true"/>
: <field name="child_catname" type="text" indexed="true" stored="true"/>

your child_catname isn't using "string" as its field type -- it's using "text" (which is most likely using TextField and being tokenized)

-Hoss
Re: Faceting question...
Thanks Chris. DUMB of me not to have noticed.

Chris Hostetter wrote:

: <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
: omitNorms="true"/>
: <field name="child_catname" type="text" indexed="true" stored="true"/>

your child_catname isn't using "string" as its field type -- it's using "text" (which is most likely using TextField and being tokenized)

-Hoss

--
View this message in context: http://www.nabble.com/Faceting-question...-tf3016974.html#a8379221
Sent from the Solr - User mailing list archive at Nabble.com.
Apostrophes in fields
Hi,

This is probably more of a lucene question, but: I have an author field. If I query author:Shelley Ohara - no results are returned. If I query author:Shelley O'hara - many results are returned. Is it possible to get solr to ignore apostrophes in queries like the one above? e.g.

<doc>
  <arr name="author"><str>Shelley O'Hara</str></arr>
  <bool name="available">true</bool>
  <str name="description">long description</str>
  <str name="ean">9780764559747</str>
  <str name="format">Paperback</str>
  <str name="publisher">IDGP</str>
  <str name="title">Kierkegaard Within Your Grasp</str>
  <str name="year">2004</str>
</doc>

Thanks
--
- Nick
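One common way to handle this is to index the author field with an analyzer that splits on intra-word punctuation and also indexes the catenated form, so "O'Hara" produces "o", "hara", and "ohara" as terms and author:Ohara matches. The fieldtype below is only a sketch using Solr's WordDelimiterFilterFactory; the fieldtype name is made up, and the exact filter options should be checked against the documentation for the Solr version in use:

<fieldtype name="text_noapos" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits "O'hara" into "O","hara" and also emits the catenated "Ohara" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

The same analyzer has to apply at both index and query time (or a matching query analyzer has to be configured) for the two forms to meet in the index.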
Re: document support for file system crawling
: In that respect I agree with the original posting that Solr lacks
: functionality with respect to desired functionality. One can argue that
: more or less random data should be structured by the user writing a
: decent application. However, an easier-to-use and configurable plugin
: architecture for different filtering and document parsing could make
: Solr more attractive. I think that many potential users would welcome
: such additions.

i don't think you'll get any argument about the benefits of supporting more plugins to handle updates - both in terms of how the data is expressed, and how the data is fetched; in fact you'll find some rather involved discussions on that very topic going on on the solr-dev list right now.

the thread you cite was specifically asking about: a) crawling a filesystem, and b) detecting document types and indexing text portions accordingly. I honestly can't imagine either of those things being supported out of the box by Solr -- there's just no reason for Solr to duplicate what Nutch already does very well. What i see being far more likely are:

1) more documentation (and possibly some locking configuration options) on how you can use Solr to access an index generated by the nutch crawler (i think Thorsten has already done this) or by Compass, or any other system that builds a Lucene index.

2) contrib code that runs as its own process to crawl documents and send them to a Solr server. (maybe it parses them, or maybe it relies on the next item...)

3) stock update plugins that can each read a raw inputstream of some widely used file format (PDF, RDF, HTML, XML of any schema) and have configuration options telling them what fields in the schema each part of their document type should go in.

4) easy hooks for people to write their own update plugins for non-widely-used file formats.

-Hoss
Re: Trouble with data type in schema
On 1/15/07, Phil Rosen [EMAIL PROTECTED] wrote:

I am trying to construct a data type that, given the content "ID-111", would match on either "ID" or "111". Text and string won't do this; any suggestions?

The "text" field as defined by the Solr example's schema.xml should achieve this effect. Have you looked at the analysis portion of the solr admin ui (with 'verbose' checked) to investigate how your strings are being tokenized?

regards,
-Mike
separate log files
Hi Solr users,

I'm running multiple instances of Solr, which are all using the same war file to load from. Below is an example of the servlet context file used for each application:

<Context path="/app1-solr" docBase="/var/usr/solr/solr-1.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/var/local/app1" override="true"/>
</Context>

Hence each application is using the same WEB-INF/classes/logging.properties file to configure logging. I would like each instance to log to separate log files such as:

app1-solr.yyyy-mm-dd.log
app2-solr.yyyy-mm-dd.log
...

Is there an easy way to append the context path to org.apache.juli.FileHandler.prefix? E.g.

org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.

Or would this require a code change?

Regards,
-Ben