Stable release, trunk release - same Tomcat instance
If I want to run the stable 1.3 release and the nightly build under the same Tomcat instance, should that be configured as multiple solr applications, or is there a different configuration to follow?
Re: Stable release, trunk release - same Tomcat instance
Um, yes this works. On Fri, Jun 12, 2009 at 11:12 AM, Jeff Rodenburg <jeff.rodenb...@gmail.com> wrote: If I want to run the stable 1.3 release and the nightly build under the same Tomcat instance, should that be configured as multiple solr applications, or is there a different configuration to follow?
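One way to set this up is to deploy each build as its own Tomcat web application context, each pointing at its own solr home via JNDI. A minimal sketch, assuming the standard Tomcat per-context descriptor layout; all paths and context names below are illustrative, not taken from any particular install:

```xml
<!-- $CATALINA_HOME/conf/Catalina/localhost/solr13.xml -->
<Context docBase="/opt/solr/releases/apache-solr-1.3.0.war" debug="0" crossContext="true">
  <!-- each context gets its own index/config directory -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/homes/solr13" override="true"/>
</Context>
```

```xml
<!-- $CATALINA_HOME/conf/Catalina/localhost/solr-nightly.xml -->
<Context docBase="/opt/solr/releases/apache-solr-nightly.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/homes/nightly" override="true"/>
</Context>
```

With this arrangement the two builds run side by side at /solr13 and /solr-nightly under one Tomcat instance, each with an independent schema, config, and index.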
Re: Getting SolrSharp to work, Part 2
Great, thanks Peter. And yes, I think it would be good to concentrate the conversation over on CodePlex. I know the Solr team has no problem with solrsharp conversations here on the solr mailing list, but the conversation is highly focused on the server. Putting the solrsharp conversation on CodePlex would keep the messages from drowning out this list. I'll check out the patch when I get a chance, thanks for the contribution. Hope things are working better for you now. :-/ -- j On Jan 25, 2008 4:53 AM, Peter Thygesen [EMAIL PROTECTED] wrote: Oops, forgot to mention that the patch was uploaded on CodePlex: http://www.codeplex.com/solrsharp/SourceControl/PatchList.aspx \peter -Original Message- From: Peter Thygesen Sent: 25. januar 2008 13:17 To: solr-user@lucene.apache.org Subject: RE: Getting SolrSharp to work, Part 2 This patch covers the issues I wrote about in my previous mails, "How to get SolrSharp to work" and "How to get SolrSharp to work, part 2". By the way, should I post on this thread or on CodePlex when the topic is SolrSharp? I don't mind adding a few more comments to the discussion I already started on CodePlex. \peter -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: 24. januar 2008 20:59 To: solr-user@lucene.apache.org Subject: Re: Getting SolrSharp to work, Part 2 Hey Peter - if you could submit your changes as an svn patch, we could apply the update much faster. thanks, jeff On Jan 23, 2008 2:42 AM, Peter Thygesen [EMAIL PROTECTED] wrote: I wrote a small client in .Net which queries Solr and dumps the result on screen.. fantastic low-tech.. ;) However I ran into new SolrSharp problems. My schema allows a particular field to be multiValued, but if it only has one value, it will cause SolrSharp to fail in line 88 of the IndexFieldAttribute class. My SearchRecord property is an array (List) and line 88 tries to set my property as if it was a string.
The code should be corrected by checking whether the property is an array, not whether the node list has one value or more. E.g. change line 85 to:

085     if (!this.PropertyInfo.PropertyType.IsArray)

Original code (from class IndexFieldAttribute):

082 public void SetValue(SearchRecord searchRecord)
083 {
084     XmlNodeList xnlvalues = searchRecord.XNodeRecord.SelectNodes(this.XnodeExpression);
085     if (xnlvalues.Count == 1) //single value
086     {
087         XmlNode xnodevalue = xnlvalues[0];
088         this.PropertyInfo.SetValue(searchRecord, Convert.ChangeType(xnodevalue.InnerText, this.PropertyInfo.PropertyType), null);
089     }
090     else if (xnlvalues.Count > 1) //array
091     {
092         Type basetype = this.PropertyInfo.PropertyType.GetElementType();
093         Array valueArray = Array.CreateInstance(basetype, xnlvalues.Count);
094         for (int i = 0; i < xnlvalues.Count; i++)
095         {
096             valueArray.SetValue(Convert.ChangeType(xnlvalues[i].InnerText, basetype), i);
097         }
098         this.PropertyInfo.SetValue(searchRecord, valueArray, null);
099     }
100 }

My code (replacements):

085     if (!this.PropertyInfo.PropertyType.IsArray) // single value
090     else // array

Cheers, Peter Thygesen -- hope to see you all at ApacheCon in Amsterdam :)
Re: Updating and Appending
On Jan 23, 2008 1:29 PM, Chris Harris [EMAIL PROTECTED] wrote: And then if you're using a client such as solrsharp, there's the question of whether *it* will slurp the whole stream into memory. Solrsharp reads the XML stream from Solr using standard .NET framework XML objects, which by default read the entirety of the stream into memory before returning control back to your code. There are facilities in the .NET framework that provide for reading XML data in chunks vs. the full stream, but solrsharp at present uses the framework defaults. -- jeff
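The chunk-at-a-time facilities referred to here are pull-style parsers. A minimal sketch in Java terms (the .NET `XmlReader` plays the analogous role on the client side) of walking a response one event at a time rather than materializing the whole tree first:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullParse {
    // Walks the XML one event at a time, counting <doc> elements,
    // instead of loading the entire document into memory first as a
    // DOM-style load would.
    static int countDocs(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int docs = 0;
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "doc".equals(r.getLocalName())) {
                    docs++;
                }
            }
            return docs;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countDocs("<response><doc/><doc/></response>"));
    }
}
```

The trade-off is the one described in the message: the pull approach bounds memory for large result sets, at the cost of more bookkeeping in the client code.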
Re: Solr, operating systems and globalization
OK, this simplifies things greatly. For C#, the proper culture setting for interaction with Solr should be Invariant. Basically, the primary requirement for Solrsharp is to be culturally consistent with the targeted Solr server to ensure proper data-type formatting. Since Solr is culturally agnostic, Solrsharp should be so as well. Thanks for the clarification. On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote: : This is exactly the scenario. Ideally what I'd like to achieve is for : Solrsharp to discover the culture settings from the targeted Solr instance : and set the client in appropriate position. well ... my point is there shouldn't be any cultural settings on the targeted Solr server that the client needs to know about. the communication between the server and any clients should always be in a fixed format independent of culture. Any (hypothetical) culture specific settings the server has to have might affect the functionality, but shouldn't affect the communication (ie: for the purposes of date rounding/faceting the Solr server might be configured to know what timezone to use for rounding to the nearest day, or what Locale to use to compute the first day of the week, but when returning that info to clients it should still be stringified in an absolute format (UTC)). : multi-lingual systems across different JVM and OS platforms. If it *were* : the case that different underlying system stacks affected solr in such a : way, Solrsharp should follow the server's lead. if that were the case, the server would be buggy and should be fixed :) i don't know much about C#, but i can't really think of a lot of cases where client APIs really need to be very multi-cultural aware ... typically culture/locale type settings relate to parsing and formatting of datatypes (ie: how to stringify a number, how to convert a date to/from a string, etc...).
when client code is taking input and sending it to solr, it's dealing with native objects and stringifying them into the canonical format Solr wants -- independent of culture. when client code is reading data back from Solr and returning it, it needs to parse those strings from the canonical form and return them as native objects. The only culture that SolrSharp should need to worry about is the InvariantCulture you described ... right? -Hoss
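The canonical-format contract can be seen directly in Java, the server's side of the conversation: locale-sensitive formatting varies per machine, while root-locale formatting is the fixed form that locale-independent parsers accept. A small illustrative sketch, not taken from Solr's code:

```java
import java.util.Locale;

public class CanonicalFloats {
    // Locale-independent formatting: always a period decimal separator,
    // the canonical form a float parser accepts regardless of machine locale.
    static String toCanonical(double v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }

    public static void main(String[] args) {
        // A French locale formats the same value with a comma separator...
        String localized = String.format(Locale.FRANCE, "%.3f", 1.234);
        // ...while the canonical form round-trips through Float.parseFloat,
        // which is itself locale-independent.
        String canonical = toCanonical(1.234);
        System.out.println(localized + " / " + canonical
                + " / " + Float.parseFloat(canonical));
    }
}
```

This is exactly the division of labor Hoss describes: the client stringifies into the canonical form on the way in and parses from it on the way out, so no culture setting on the server ever needs to be discovered.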
Re: Solr, operating systems and globalization
Thanks for the comments Hoss. More notes embedded below... On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote: : However, SolrSharp culture settings should be reflective of and consistent with : the solr server instance's culture. This leads to my question: does Solr : control its culture language settings through the various language : components that can be incorporated, or does the underlying OS have a say in : how that data is treated? As a general rule: 1) Solr (the server) should operate as culturally and locale agnostic as possible. 2) Solr clients that want to act culturally appropriate should explicitly translate from local formats to absolute concepts that they send to the server (ala: the absolute unambiguous date format). Ideally you should be able to take a Solr install from one box, move it to another JVM on a different OS in a different timezone with different Locale settings and everything will keep working the same. I fully understand that approach. Going back to C#/Windows, this is known as an Invariant culture setting, which we're incorporating into Solrsharp (along with configurable culture settings as appropriate). (I think once upon a time i argued that Solr should assume the char encoding of the local JVM, and wiser people than me pointed out that was bad). There may be exceptions to this -- but those exceptions should be in cases where: a) the person configuring Solr is in complete control; and b) the exception is prudent because doing the work in the client would require more complexity. Analysis is a good example of this: we don't make the clients analyze the text according to the native language customs -- we let the person creating the schema.xml specify what the Analysis should be. As i recall, the issue that prompted this email had to do with C# and the various cultural ways to specify a floating point number: 1,234 vs 1.234 (comma vs period).
this is the kind of thing that should be translated in clients to the canonical floating point representation ... by which i mean: the one the solr server uses :) This is exactly the scenario. Ideally what I'd like to achieve is for Solrsharp to discover the culture settings from the targeted Solr instance and configure the client appropriately. *IF* Solr has the behavior where setting the JVM locale to something random makes Solr assume floats should be in the comma format, then i would consider that a bug in Solr ... Solr should always be consistent. This would be an interesting discovery exercise for those who deal with multi-lingual systems across different JVM and OS platforms. If it *were* the case that different underlying system stacks affected solr in such a way, Solrsharp should follow the server's lead. -Hoss
Solr, operating systems and globalization
We discovered and verified an issue in SolrSharp whereby indexing and searching can be disrupted if Windows globalization culture settings are not taken into consideration. For example, European cultures format numeric and date values differently from US/English cultures. The resolution for this type of issue is to explicitly control the culture settings so that index data is formatted correctly. However, SolrSharp culture settings should be reflective of, and consistent with, the solr server instance's culture. This leads to my question: does Solr control its culture language settings through the various language components that can be incorporated, or does the underlying OS have a say in how that data is treated? Some education on this would be greatly appreciated. cheers, jeff r.
Re: WebException (ServerProtocolViolation) with SolrSharp
Good to know. I think this needs to be a configurable value in the library (overridable, at a minimum). What's outstanding for me on this is understanding the Solr side of the equation, and whether culture variance comes into play. What makes this even more interesting/confusing is how culture scenarios may differ across platforms. I do most of my production work against a solr farm running on RHEL4, but often do side development work against Win2K3. Thanks for confirming the culture issue, this will make its way into the source as a fix in the future. cheers, jeff On 10/11/07, Filipe Correia [EMAIL PROTECTED] wrote: Jeff, Thanks! Your suggestion worked. Instead of invoking ToString() on float values I've used ToString's other signature, which takes an IFormatProvider:

CultureInfo MyCulture = CultureInfo.InvariantCulture;
this.Add(new IndexFieldValue("weight", weight.ToString(MyCulture.NumberFormat)));
this.Add(new IndexFieldValue("price", price.ToString(MyCulture.NumberFormat)));

This made me think of a related issue though. In this case it was the client that was using a non-invariant number format, but can this also happen on Solr's side? If so, I guess I may need to configure it somewhere... Cheers, Filipe Correia On 10/10/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: Hi Filipe - The issue you're encountering is a problem with the data format being passed to the solr server. If you follow the stack trace that you posted, you'll notice that the solr field is looking for a value that's a float, but the passed value is '1,234'.
I'm guessing this is caused by one of two possibilities: (1) there's a typo in your example code, where 1,234 should actually be 1.234, or (2) there's a culture settings difference on your server that's converting 1.234 to 1,234. Assuming it's the latter, add this line in the ExampleIndexDocument constructor:

CultureInfo MyCulture = new CultureInfo("en-US");

Please let me know if this fixes the issue; I've been looking at this previously and would like to confirm it. thanks, jeff r. On 10/10/07, Filipe Correia [EMAIL PROTECTED] wrote: Hello, I am trying to run SolrSharp's example application but am getting a WebException with a ServerProtocolViolation status message. After some debugging I found out this is happening with a call to: http://localhost:8080/solr/update/ And using fiddler[1] found out that solr is actually throwing the following exception:

org.apache.solr.core.SolrException: Error while creating field 'weight{type=sfloat,properties=indexed,stored,omitNorms,sortMissingLast}' from value '1,234'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:173)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
    at org.apache.solr.update.DocumentBuilder.addSingleField(DocumentBuilder.java:57)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:73)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:83)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:77)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:339)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
Re: WebException (ServerProtocolViolation) with SolrSharp
Hi Filipe - The issue you're encountering is a problem with the data format being passed to the solr server. If you follow the stack trace that you posted, you'll notice that the solr field is looking for a value that's a float, but the passed value is '1,234'. I'm guessing this is caused by one of two possibilities: (1) there's a typo in your example code, where 1,234 should actually be 1.234, or (2) there's a culture settings difference on your server that's converting 1.234 to 1,234. Assuming it's the latter, add this line in the ExampleIndexDocument constructor:

CultureInfo MyCulture = new CultureInfo("en-US");

Please let me know if this fixes the issue; I've been looking at this previously and would like to confirm it. thanks, jeff r. On 10/10/07, Filipe Correia [EMAIL PROTECTED] wrote: Hello, I am trying to run SolrSharp's example application but am getting a WebException with a ServerProtocolViolation status message. After some debugging I found out this is happening with a call to: http://localhost:8080/solr/update/ And using fiddler[1] found out that solr is actually throwing the following exception:

org.apache.solr.core.SolrException: Error while creating field 'weight{type=sfloat,properties=indexed,stored,omitNorms,sortMissingLast}' from value '1,234'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:173)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
    at org.apache.solr.update.DocumentBuilder.addSingleField(DocumentBuilder.java:57)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:73)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:83)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:77)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:339)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NumberFormatException: For input string: "1,234"
    at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
    at java.lang.Float.parseFloat(Unknown Source)
    at org.apache.solr.util.NumberUtils.float2sortableStr(NumberUtils.java:80)
    at org.apache.solr.schema.SortableFloatField.toInternal(SortableFloatField.java:50)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:171)
    ... 24 more

type: Status report, message: Error while creating field 'weight{type=sfloat,properties=indexed,stored,omitNorms,sortMissingLast}' from value '1,234' I am just starting to try Solr, and might be missing some configurations, but I have no clue where to begin to investigate this further without digging into Solr's source, which I would really like to avoid for now. Any thoughts? thank you in advance, Filipe Correia [1] http://www.fiddlertool.com/
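The root cause in the trace above can be reproduced in isolation: java.lang.Float.parseFloat accepts only the period decimal separator no matter what locale the JVM runs under, so a comma-formatted value from a client fails exactly as shown. A minimal sketch:

```java
public class CommaFloat {
    // Returns true if Float.parseFloat accepts the string.
    // parseFloat is locale-independent, so "1,234" is always rejected
    // with the NumberFormatException seen in the stack trace above.
    static boolean parses(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("1.234")); // true  -- canonical form
        System.out.println(parses("1,234")); // false -- what the client sent
    }
}
```

This is why the fix belongs on the client side: there is no server setting to loosen, since the parser itself only speaks the canonical form.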
Re: Solrsharp culture problems
Yes, that would be the right solution. I'm not sure if, in order to use French culture settings on XP, you would require corresponding changes in culture settings for the solr instance. Hope this helps. -- j On 9/24/07, JP Genty - LibertySurf [EMAIL PROTECTED] wrote: I use solrsharp on a French XP and I have problems with the float conversion to text. I modified the ExampleIndexDocument constructor to force the en-US culture:

CultureInfo MyCulture = new CultureInfo("en-US");
...
this.Add(new IndexFieldValue("weight", weight.ToString(MyCulture)));
this.Add(new IndexFieldValue("price", price.ToString(MyCulture)));

And I modified the IndexFieldAttribute SetValue method:

CultureInfo MyCulture = new CultureInfo("en-US");
this.PropertyInfo.SetValue(searchRecord, Convert.ChangeType(xnodevalue.InnerText, this.PropertyInfo.PropertyType, MyCulture), null);
valueArray.SetValue(Convert.ChangeType(xnlvalues[i].InnerText, basetype, MyCulture), i);

Now the example runs smoothly on a French Windows XP. Is it the right solution?? Thanks Jean-Paul
Dilbert (off-topic)
It may be off-topic, but it's Friday and I thought all the java coders would appreciate today's Dilbert. (I'm not primarily a java dev, but I know the feeling.) http://www.dilbert.com/comics/dilbert/archive/dilbert-20070907.html cheers, jeff r.
Solrsharp now supports debugQuery
Solrsharp now supports query debugging. This is enabled through the debugQuery and explainOther parameters. A DebugResults object is referenced by a SearchResults instance and provides all the debugging information that is available through these parameters, such as: - QueryString and ParsedQuery string values - Array of ExplanationRecord objects - OtherQuery value (if provided) - Array of ExplanationRecord objects supporting the OtherQuery value The ExplanationRecord object provides the details of the debug results, specifically including the ExplainInfo string (the debug analysis payload) and a reference to the UniqueRecordKey of the evaluated record. The UniqueRecordKey, though returned as a string, could then be cast appropriately to reference the matching SearchRecord referenced by the same SearchResults instance. The example program with the source code has been updated to show how to make use of these properties. If any issues are found, please log them to JIRA and associate them with the C# client component. cheers, jeff r.
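On the wire, these features map onto Solr's standard debug parameters; a request along the following lines (default /select handler and field names assumed for illustration) returns the explain payload that the DebugResults object wraps:

```text
http://localhost:8983/solr/select?q=title:solr&debugQuery=on&explainOther=id:1234
```

debugQuery=on populates the parsed-query and per-document explanation sections of the response, and explainOther adds explanations for documents matching its query even when they fall outside the main result set.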
Major update to Solrsharp
A big update was just posted to the Solrsharp project. This update provides first-class support for highlighting in the library. The implementation is really robust and provides the following features: - Structured highlight parameter assignment based on the SolrField object - Full access to all highlight parameters, on both an aggregate and per-field basis - Incorporation of highlighted values into the base search result records All of the supplied documentation has been updated, as has the example application, to demonstrate use of the highlighting classes. Please report any issues through JIRA. Be sure to associate any issues with the C# client component. cheers, jeff r.
Re: Solrsharp highlighting
I've been working on the highlighting component, and it's a little odd how it works. For myself, if I want terms highlighted, I'd like those in the return results. Solr, on the other hand, returns a separate xml node that represents the portions of the results that are highlighted. I know that it's incorporated that way for other reasons, but it makes patching the highlighted portions together with the doc results in Solrsharp an out-of-band experience. Nonetheless, the approach I'm trying is one where the highlighted nodes are associated with the SearchResults object, and will have their highlighted text bits incorporated into the associated SearchRecord objects. At least that's what I'm initially trying to accomplish. -- j On 8/15/07, Charlie Jackson [EMAIL PROTECTED] wrote: Thanks for adding in those facet examples. That should help me out a great deal. As for the highlighting, did you have any ideas about a good way to go about it? I was thinking about taking a stab at it, but I want to get your input first. Thanks, Charlie -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 14, 2007 1:08 AM To: solr-user@lucene.apache.org Subject: Re: Solrsharp highlighting Pull down the latest example code from http://solrstuff.org/svn/solrsharp which includes adding facets to search results. It's really short and simple to add facets; the example application implements one form of it. The nice thing about the facet support is that it utilizes generics to allow you to have strongly typed name/value pairs for the fieldname/count data. Hope this helps. -- jeff r. On 8/10/07, Charlie Jackson [EMAIL PROTECTED] wrote: Also, are there any examples out there of how to use Solrsharp's faceting capabilities?
Charlie Jackson 312-873-6537 [EMAIL PROTECTED] -Original Message- From: Charlie Jackson [mailto:[EMAIL PROTECTED] Sent: Friday, August 10, 2007 3:51 PM To: solr-user@lucene.apache.org Subject: Solrsharp highlighting Trying to use Solrsharp (which is a great tool, BTW) to get some results in a C# application. I see the HighlightFields method of the QueryBuilder object and I've set it to my highlight field, but how do I get at the results? I don't see anything in the SearchResults code that does anything with the highlight results XML. Did I miss something? Thanks, Charlie
Re: Solrsharp highlighting
Pull down the latest example code from http://solrstuff.org/svn/solrsharp which includes adding facets to search results. It's really short and simple to add facets; the example application implements one form of it. The nice thing about the facet support is that it utilizes generics to allow you to have strongly typed name/value pairs for the fieldname/count data. Hope this helps. -- jeff r. On 8/10/07, Charlie Jackson [EMAIL PROTECTED] wrote: Also, are there any examples out there of how to use Solrsharp's faceting capabilities? Charlie Jackson 312-873-6537 [EMAIL PROTECTED] -Original Message- From: Charlie Jackson [mailto:[EMAIL PROTECTED] Sent: Friday, August 10, 2007 3:51 PM To: solr-user@lucene.apache.org Subject: Solrsharp highlighting Trying to use Solrsharp (which is a great tool, BTW) to get some results in a C# application. I see the HighlightFields method of the QueryBuilder object and I've set it to my highlight field, but how do I get at the results? I don't see anything in the SearchResults code that does anything with the highlight results XML. Did I miss something? Thanks, Charlie
Re: Solrsharp highlighting
Thanks for the comments, Charlie. No, you didn't miss anything with the highlight results. It hasn't been implemented yet. :-/ The first implementation was quite janky, and was consequently removed. I'm adding an issue in JIRA about implementing highlighted fields. ( https://issues.apache.org/jira/browse/SOLR-338) On 8/10/07, Charlie Jackson [EMAIL PROTECTED] wrote: Trying to use Solrsharp (which is a great tool, BTW) to get some results in a C# application. I see the HighlightFields method of the QueryBuilder object and I've set it to my highlight field, but how do I get at the results? I don't see anything in the SearchResults code that does anything with the highlight results XML. Did I miss something? Thanks, Charlie
Re: Please help! Solr 1.1 HTTP server stops responding
Not sure if this would help you, but we encountered java heap OOM issues with 1.1 earlier this year. We patched solr with the latest bits at the time, which included a lucene memory fix for java heap OOM issues (http://issues.apache.org/jira/browse/LUCENE-754). Different servlet container (Tomcat 5.5), and we're running JRE 5 v9. After applying the update to the solr bits that included the patch mentioned above, OOM has never re-appeared. -- j On 7/30/07, Mike Klaas [EMAIL PROTECTED] wrote: On 30-Jul-07, at 11:35 AM, David Whalen wrote: Hi Yonik! I'm glad to finally get to talk to you. We're all very impressed with solr and when it's running it's really great. We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting solr just to get a response. How much memory is on the system, and is anything else running? How large is the resulting index? If you're willing for some queries to take longer after a commit, reducing/eliminating the autoWarmCount for your queryCache and facetCache should decrease peak memory usage (as Solr has two copies of the cache open at that point). Setting it to zero could halve the peak memory usage (at the cost of a loss of performance after commits). As yonik suggested, check for PERFORMANCE warnings too -- you may have more than two Searchers open at once. -Mike
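Mike's autowarm suggestion translates to the cache declarations in solrconfig.xml. A sketch of what zeroing the autowarm counts looks like in the 1.x config format; cache names and sizes below are illustrative defaults, not a tuning recommendation:

```xml
<!-- solrconfig.xml: autowarmCount="0" means no entries are copied from the
     old cache into the new searcher's cache on commit, so only one fully
     populated copy of each cache exists at a time -->
<filterCache class="solr.LRUCache" size="512"
             initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="0"/>
```

The trade-off is the one Mike states: peak memory during warming drops, while the first queries after each commit run against cold caches.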
Acceptable schema def?
As an example, consider the following:

<dynamicField name="*_field" type="text_ws" indexed="true" stored="true"/>
<copyField source="yadayada_field" dest="all_fields"/>
<field name="all_fields" type="text_ws" indexed="true" stored="false"/>

Two questions: 1) Is the definition of the source attribute for a copyField node that would work as a dynamicField node valid? 2) Is the dest attribute for a copyField node required to be implemented as a field node? Could it be a dynamic field? For example, could the dest attribute in the above example be set to mega_field (since that would match the dynamicField definition)? I'll test these myself later, but don't have access to a solr instance to play with this stuff right now. thanks, j
Re: Acceptable schema def?
As an example, consider the following:

<dynamicField name="*_field" type="text_ws" indexed="true" stored="true"/>
<copyField source="yadayada_field" dest="all_fields"/>
<field name="all_fields" type="text_ws" indexed="true" stored="false"/>

Two questions: 1) Is the definition of the source attribute for a copyField node that would work as a dynamicField node valid? 2) Is the dest attribute for a copyField node required to be implemented as a field node? Could it be a dynamic field? For example, could the dest attribute in the above example be set to mega_field (since that would match the dynamicField definition)? I'll test these myself later, but don't have access to a solr instance to play with this stuff right now. Another funky example to ponder:

<dynamicField name="*_field" type="string" indexed="true" stored="true"/>
<copyField source="*_field" dest="all_fields"/>
<field name="all_fields" type="text_ws" indexed="true" stored="false"/>
Solrsharp: direction
I've been asked a few questions of late that all have a familiar theme: what's going on with solrsharp development? Well, I've been working on the next iteration of the Solrsharp client library, attempting to bring it more in line with the capabilities of Solr, at least as of the 1.2 release. The goal of the Solrsharp project is to enable C# applications to take full advantage of Solr. Here's what's happening: the main feature in development right now is the creation of RequestHandler objects. Solrsharp uses default handlers for queries and updates (/select and /update); the RequestHandler objects will enable assignable solr requesthandlers for any query. While assigning a request handler for a specific query is an active step, loading the solr-configured request handlers will be passive. The default handlers will still apply, in case you don't require any different handlers. If anyone has suggestions or comments around this, please pass them along. Ideally, we would begin thinking about Solr 1.3 features and how Solrsharp would be extended to utilize those as well. Any comments about future capabilities and what clients need to do to take advantage of those are welcome. cheers, jeff r.
Re: SolrSharp boost - int vs. float
Nope, no reason other than oversight. I just modified the QueryParameter class to change the _boost field and Boost property to type float, and all works well. I'll log an issue in JIRA and update the source. thanks otis, jeff On 7/5/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, Here is a quick one for Jeff R. about his SolrSharp client. Looking at http://solrstuff.org/svn/solrsharp/src/Query/Parameters/QueryParameter.cs, I see boost defined as an int(eger): private int _boost = 1; Lucene's boosts are floats (see http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Query.html#getBoost()). Is there a reason boosts are ints in SolrSharp? Thanks, Otis
Re: solrsharp thoughts
Thanks Ryan. Comments below. On 7/5/07, Ryan McKinley [EMAIL PROTECTED] wrote: I just took a quick look at solrsharp. I don't really have to use it yet, so this is not an in depth review. I like the templated SearchResults -- that seems useful. That has proven to be extremely useful in our implementation. The template gives you the base stuff, and the implementation allows us to strongly type our results, which makes programmatic usage easier to deal with. I don't quite follow the need to parse the SolrSchema on the client side? Is that to know what fields are available? Could the same thing be achieved by reading the response from the luke request handler? I only worry about it as something to keep in sync with the java impl. There's no real need to parse SolrSchema in order to execute searches or to add/update docs in the search index. The SolrSchema object was just a way of gathering the schema definition and using it for whatever purpose it might make sense. It would be good to be able to change the request paths. While /select and /update will usually work, it is possible to put stuff elsewhere. This is a TODO item. The original library was constructed around the default paths, prior to the 1.1 release. The TODO item is actually one where named request handlers should be accessible objects that can be assigned to both searches and index updates. nitpick: FacetParameter.Zeros - should use facet.mincount instead. (facet.mincount=0 is the same behavior) Yep, another TODO item. I actually have this in place in development, pending a review of all facet parameters against the 1.2 release for accuracy. ryan
SolrRequestHandler question
I have a search use case that requires that I use the results of a search against IndexA and apply them as a query component of a second search against IndexB. (The nature of the data doesn't allow me to combine these indexes). At present, this is handled at the client level: search one index, get the results, apply them to a search against another index. I can't change the two-query fundamentals, but I'd like to hide the implementation from the client. If I wanted to concentrate this logic at the server, should I be considering a custom request handler? The request handler would: - accept the query parameters - use a subset of parameters to build a query against another search index - execute that query, gather the results - use those results as new parameters in another query - execute the second query I'm sure this isn't atypical; how are others accomplishing this? thanks, j
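For reference, the client-level flow described above can be sketched like this (illustrative Python; search_a, search_b, and the join field are assumptions standing in for whatever actually executes the Solr queries):

```python
# Two-stage search sketch: run the user's query against index A, then
# feed the matching key values into a second query against index B.
# search_a / search_b are callables that take a query string and return
# docs as dicts; "key" is an assumed join field, not a real schema name.
def two_stage_search(search_a, search_b, user_query, join_field="key"):
    keys = [doc[join_field] for doc in search_a(user_query)]
    if not keys:
        # Nothing matched in index A, so there is nothing to join against.
        return []
    # Build an OR clause from the collected keys for the second query.
    second_query = " OR ".join("%s:%s" % (join_field, k) for k in keys)
    return search_b(second_query)
```

A custom request handler would just move this same composition server-side, hiding the two-query nature from the client.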
Re: Recent updates to Solrsharp
great, thanks Yonik. On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/21/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: As an aside, it would be nice to record these issues more granularly in JIRA. Could we get a component created for our client library, similar to java/php/ruby? Done. -Yonik
Re: SolrSharp example
Hi Michael - Moving this conversation to the general solr mailing list... 1. The SolrSharp example solution works with schema.xml from apache-solr-1.1.0-incubating. If I'm using schema.xml from apache-solr-1.2.0, the example program doesn't update the index... I didn't realize the solr 1.2 release code sample schema.xml was different from the solr 1.1 version. In my implementation, I had solr 1.1 already installed and upgraded to 1.2 by replacing the war file (per the instructions in solr.) So, the example code is geared to go against the 1.1 schema. For the example code, adding the timestamp field in the ExampleIndexDocument public constructor, such as: this.Add(new IndexFieldValue("timestamp", DateTime.Now.ToString("s") + "Z")); will take care of the solr 1.2 schema invalidation issue. The addition of the @default attribute on this field in the schema is not presently accommodated in the validation routine. If I'm not mistaken, the default attribute value will be applied for all documents without that field present in the xml payload. This would imply that any field with a default attribute is not required for any implemented UpdateIndexDocument. I'll look into this further. 2. When I run the example with schema.xml from apache-solr-1.1.0-incubating the program throws an Exception Hmmm, can't really help you with this one. It sounds as if solr is incurring an error when the xml is posted to the server. Try the standard step-through troubleshooting routines to see what messages are being passed back from the server. -- j On 6/19/07, Michael Plax [EMAIL PROTECTED] wrote: Hello Jeff, thank you again for updating the files. I just ran into some problems. I don't know what is the best way to report them: the solr mailing list or the SolrSharp JIRA. 1. The SolrSharp example solution works with schema.xml from apache-solr-1.1.0-incubating. If I'm using schema.xml from apache-solr-1.2.0, the example program doesn't update the index because line 33: if (solrSearcher.SolrSchema.IsValidUpdateIndexDocument(iDoc)) returns false.
The update fails because of the configuration file schema.xml: line 265: <field name="word" type="string" indexed="true" stored="true"/> ... line 279: <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> Those fields (word, timestamp) don't pass validation in SolrSchema.cs line 217. 2. When I run the example with schema.xml from apache-solr-1.1.0-incubating the program throws an Exception: System.Exception was unhandled Message=Http error in request/response to http://localhost:8983/solr/update/ Source=SolrSharp StackTrace: at org.apache.solr.SolrSharp.Configuration.SolrSearcher.WebPost(String url, Byte[] bytesToPost, String statusDescription) in E:\SOLR-CSharp\src\Configuration\SolrSearcher.cs:line 229 at org.apache.solr.SolrSharp.Update.SolrUpdater.PostToIndex(IndexDocument oDoc, Boolean bCommit) in E:\SOLR-CSharp\src\Update\SolrUpdater.cs:line 70 at SolrSharpExample.Program.Main(String[] args) in E:\SOLR-CSharp\example\Program.cs:line 35 at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args) at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args) at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly() at System.Threading.ThreadHelper.ThreadStart_Context(Object state) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Threading.ThreadHelper.ThreadStart() The xmlstring value from oDoc.SerializeToString(): <?xml version="1.0" encoding="utf-8"?><add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><doc><field name="id">101</field><field name="name">One oh one</field><field name="manu">Sony</field><field name="cat">Electronics</field><field name="cat">Computer</field><field name="features">Good</field><field name="features">Fast</field><field name="features">Cheap</field><field name="includes">USB cable</field><field name="weight">1.234</field><field name="price">99.99</field><field name="popularity">1</field><field name="inStock">True</field></doc></add> I checked all the features from the Solr tutorial; they are working. I'm running solr on Windows XP Pro without a firewall. Do you know how to solve those problems? Do you recommend handling all communication by mailing list or JIRA? Regards Michael
Re: SolrSharp example
On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote: This is a log that I got after running the SolrSharp example. I think the example program posts improperly formatted xml. I'm running Solr on Windows XP, Java 1.5. Could those settings be the problem? Solr 1.2 is pickier about the Content-type in the HTTP headers. I bet it's being set incorrectly. Ahh, good point. Within SolrSearcher.cs, the WebPost method contains this setting: oRequest.ContentType = "application/x-www-form-urlencoded"; Looking through the CHANGES.txt file in the 1.2 tagged release on svn: 9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using the new request dispatcher (SOLR-104). This requires posted content to have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'. The response format matches that of /select and returns standard error codes. To enable solr1.1 style /update, do not map /update to any handler in solrconfig.xml (ryan) For SolrSearcher.cs, it sounds as though changing the ContentType setting to "text/xml" may fix this issue. I don't have a 1.2 instance available to test against right now, but can check this later. Michael, try updating your SolrSearcher.cs file for this content-type setting to see if that resolves your issue. thanks, jeff r.
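To illustrate the fix outside of C#, here is a minimal Python sketch of an update POST built the way Solr 1.2's request dispatcher expects (the host, path, and document are placeholders; this is not SolrSharp code):

```python
import urllib.request

# Build (but don't send) an update POST: the body is XML, so the
# Content-Type header must say so, or Solr 1.2's dispatcher rejects it.
doc_xml = '<add><doc><field name="id">101</field></doc></add>'
req = urllib.request.Request(
    "http://localhost:8983/solr/update",   # placeholder host/path
    data=doc_xml.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)
print(req.get_header("Content-type"))  # text/xml; charset=utf-8
```

The form-urlencoded default that SolrSharp was sending worked against 1.1 only because the old /update servlet never checked the header.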
Re: SolrSharp example
Thanks for checking, Michael -- great find. I'm in the process of readying this same fix for inclusion in the source code (I'm verifying against a full 1.2 install.) The SolrField class is now also being extended to incorporate an IsDefaulted property, which will permit SolrSchema.IsValidUpdateIndexDocument to yield true when default-value fields aren't present in the update request. thanks, jeff r. On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote: Hello, Yonik and Jeff, thank you for your help. You are right, this was a content-type issue. In order to run the example, the following things need to be done: 1. Code (SolrSharp) should be changed from: src\Configuration\SolrSearcher.cs(217): oRequest.ContentType = "application/x-www-form-urlencoded"; to: src\Configuration\SolrSearcher.cs(217): oRequest.ContentType = "text/xml"; 2. In order to take care of the solr 1.2 schema invalidation issue, in schema.xml comment out line 265: <!-- <field name="word" type="string" indexed="true" stored="true"/> --> and line 279: <!-- <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> --> or, as Jeff suggested: For the example code, adding the timestamp field in the ExampleIndexDocument public constructor such as: this.Add(new IndexFieldValue("timestamp", DateTime.Now.ToString("s") + "Z")); Regards Michael - Original Message - From: Jeff Rodenburg [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, June 20, 2007 1:56 PM Subject: Re: SolrSharp example On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote: This is a log that I got after running the SolrSharp example. I think the example program posts improperly formatted xml. I'm running Solr on Windows XP, Java 1.5. Could those settings be the problem? Solr 1.2 is pickier about the Content-type in the HTTP headers. I bet it's being set incorrectly. Ahh, good point.
Recent updates to Solrsharp
Thanks to Yonik, Michael, Ryan, (and others) for some recent help on various issues discovered with Solrsharp. We were able to discover a few issues with the library relative to the Solr 1.2 release. Those issues have been remedied and have been pushed into source control. The Solrsharp source code can be obtained at: http://solrstuff.org/svn/solrsharp. Recent fixes include: - Fix for broken DeleteIndexDocument xml serialization - Update to correct document posting content-type to solr 1.2 instance - Identifying schema fields with new IsDefaulted property - Updates to the example application to incorporate these fixes and the solr 1.2 sample schema - Updated documentation consistent with these changes As an aside, it would be nice to record these issues more granularly in JIRA. Could we get a component created for our client library, similar to java/php/ruby? cheers, j
Update to SolrSharp
Solrsharp has been validated against the Solr 1.2 release. Validation was made using the example application that's available with the Solrsharp code against a default example index with the Solr 1.2 released bits. - The source code for Solrsharp is now accessible via subversion. Many thanks to Ryan McKinley for hosting the codebase. You can find it at: http://solrstuff.org/svn/solrsharp - A new folder has been added: docs/api. We have MSDN-style documentation to help explain the full library. When you update from the repository, just point your browser to the local file at /docs/api/index.html. As always, send your praise or complaints this direction. cheers, jeff r.
Re: solr+hadoop = next solr
On 6/7/07, Rafael Rossini [EMAIL PROTECTED] wrote: Hi, Jeff and Mike. Would you mind telling us about the architecture of your solutions a little bit? Mike, you said that you implemented a highly-distributed search engine using Solr as indexing nodes. What does that mean? You guys implemented a master, multi-slave solution for replication? Or the whole index shards for high availability and fail over? Our solution doesn't use solr, but goes directly to lucene. It's built on windows, so the interop communication service is built on .net remoting (tcp based). Microsoft has deprecated ongoing development with .net remoting, in favor of other more standard mechanisms, i.e. http. So, we're looking to migrate our solution to a more community-supported model. The underlying structure sounds similar to what others have done: index shards distributed to various servers, each responsible for a subset of the index. A merging server handles coordination of concurrent thread requests and synchronizes the results as they're returned. The thread coordination and search results interleaving process is functional but not really scalable. It works for our user model, where users tend not to page deeply through results. We want to change that so we can use solr as our primary data source read mechanism for our site. -- j
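The results-interleaving step the merging server performs can be sketched as follows (illustrative Python, assuming each shard returns hits pre-sorted by descending score; not the actual .NET remoting service):

```python
import heapq
import itertools

# Merge per-shard result lists, each already sorted by descending score,
# into a single globally ranked page of `rows` hits. Each hit is a
# (score, doc_id) tuple; doc_id values are illustrative.
def merge_shard_results(shard_results, rows):
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))

page = merge_shard_results(
    [[(0.9, "a1"), (0.4, "a2")], [(0.8, "b1"), (0.7, "b2")]],
    rows=3,
)
print(page)  # [(0.9, 'a1'), (0.8, 'b1'), (0.7, 'b2')]
```

This also shows why deep paging hurts in such a design: to serve page N of merged results, every shard must return roughly N pages of candidates to the merger.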
Re: solr+hadoop = next solr
Mike - thanks for the comments. Some responses added below. On 6/7/07, Mike Klaas [EMAIL PROTECTED] wrote: I've implemented a highly-distributed search engine using Solr (200m docs and growing, 60+ servers). It is not a Solr-based solution in the vein of FederatedSearch--it is a higher-level architecture that uses Solr as indexing nodes. I'll note that it is a lot of work and would be even more work to develop in the generic extensible philosophy that Solr espouses. Yeah, we've done the same thing in the .Net world, and it's a tough slog. We're in the same situation -- making our solution generically extensible is pretty much a non-starter. In terms of the FederatedSearch wiki entry (updated last year), has there been any progress made this year on this topic, at least something worthy of being added or updated to the wiki page? Not to splinter efforts here, but maybe a working group that was focused on that topic could help to move things forward a bit. I don't believe that absence of organization has been the cause of lack of forward progress on this issue, but simply that there has been no-one sufficiently interested and committed to prioritizing this huge task to work on it. There is no need to form a working group (not when there are only a handful of active committers to begin with)--all interested people could just use solr-dev@ for discussion. That makes sense, just didn't want to bombard the list with the subject if it was a detractor from the core project, i.e. keep lucene messages on lucene, solr messages on solr, etc. The good-community-participant approach, if you will. Solr is an open-source project, so huge features will get implemented when there is a person or group of people devoted to leading the charge on the issue. If you're interested in being that person, that's great! Glad to jump in, not sure I qualify as such for that, but certainly a big cheerleader nonetheless.
Re: solr+hadoop = next solr
I've been exploring distributed search, as of late. I don't know about the next solr but I could certainly see a distributed solr grow out of such an expansion. In terms of the FederatedSearch wiki entry (updated last year), has there been any progress made this year on this topic, at least something worthy of being added or updated to the wiki page? Not to splinter efforts here, but maybe a working group that was focused on that topic could help to move things forward a bit. - j On 6/6/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/6/07, James liu [EMAIL PROTECTED] wrote: anyone agree? No ;-) At least not if you mean using map-reduce for queries. When I started looking at distributed search, I immediately went and read the map-reduce paper (easier concept than it first appeared), and realized it's really more for the indexing side of things (big batch jobs, making data from data, etc). Nutch uses map reduce for crawling/indexing, but not for querying. -Yonik
Re: distributed search
David - It depends on what distributed means in your question. If you're looking for high availability, that can be accomplished through typical load balancing schemes for the servlet container that's running solr. Solr helps out in this respect with a replication scheme using rsync that keeps the indexes updated on all load-balanced nodes. If you're looking for support for bigger indexes that don't fit inside one solr instance (multiple solr instances = one search index), it's presently not available (as far as I know.) Work has progressed in the area of federated search (http://wiki.apache.org/solr/FederatedSearch). There are many challenges to accomplishing this; the wiki outlines where progress has been made. -- j On 6/3/07, David Xiao [EMAIL PROTECTED] wrote: Hello all, Is there distributed support in Solr search engine? For example install solr instance on different server and have them load balanced. Anyway, any suggestion/experience about Solr distributed search topic is appreciated. Regards, David
Re: read only indexes?
We're controlling this with Tomcat configuration on our end. I'm not a servlet-container guru, but I would imagine similar capabilities exist on Jetty, et al. -- j On 5/24/07, Ryan McKinley [EMAIL PROTECTED] wrote: Is there a good way to force an index to be read-only? I could configure a dummy handler to sit on top of /update and throw an error, but i'd like a stronger assurance that nothing can call UpdateHandler.addDoc()
Solrsharp feedback
I sent a few messages to the list about Solrsharp, the C# library for working with Solr, a couple of weeks ago. This was the first iteration of the library and something I expected to see modified as others got a chance to review it. I've not heard any feedback since then, though. For those that have checked out the code, is it working for you? Does it make sense? thanks, jeff r.
Re: Requests per second/minute monitor?
Not yet from us, but I'm thinking about a nagios plugin for Solr. It would be tomcat-based for the http stuff, however. On 4/18/07, Walter Underwood [EMAIL PROTECTED] wrote: Is there a good spot to track request rate in Solr? Has anyone built a monitor? wunder -- Search Guru Netflix
Re: SolrSharp - a C# client API for Solr
It will be extremely helpful to get this in the hands of others. Like most packages, this was built out of need. As we get more eyes on it, I hope to see it improve at the same rate as change in Solr. I promised a few other additions to this set. Here's what I'm working on: - More content within the documentation about how to use the api. It's strongly object-oriented and usage requires you to put together your own set of classes that inherit from abstract classes in the library. The example code does it, but it's not clear how or why you do it, so some guidance is needed. I should probably add a wiki entry on the Solr site as well. - NUnit tests need to be added. These always get complex when involving distributed systems, but such is life. -- jeff On 4/10/07, JimS [EMAIL PROTECTED] wrote: Thanx for the great contribution Jeff! A hand clap to the Solr team too. I am looking forward to using Solr and Solr# in the coming months. Your client is going to be a great help. regards, -jim On 4/9/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: All - I'm proud to announce a release to a new client API for Solr -- SolrSharp. SolrSharp is a C# library that abstracts the interoperation of a solr search server. This is an initial release that covers the basics of working with Solr. The library is very fleshed out, but the example has only implemented simple keyword search. I really like the library (I'm a dogfood user, for sure) because I can strongly type different types of objects to search results. There's more forthcoming, i.e. more examples, but the basics are in place. Feedback always appreciated, suggestions for improvement are nice, and helping hands are the best. Until there's a better home for it, you can download the bits from JIRA at: https://issues.apache.org/jira/browse/SOLR-205 cheers, jeff r.
Re: Question about code contribution
Perfect, thanks Otis. Nice to hear from you, btw. cheers, j On 4/6/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Yes, each file needs to contain the license. Look at any .java file to see what should go there and where. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Jeff Rodenburg [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, April 6, 2007 11:16:28 AM Subject: Re: Question about code contribution Whoops, typo: ...do the source code files need to contain the boilerplate Apache license. On 4/6/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: If I'm contributing new source files (separate project entirely) through JIRA, so the source code files need to contain the boilerplate Apache license/disclaimers and the like? This is new code and a new project (C#), and the wiki page on contributions ( http://wiki.apache.org/solr/HowToContribute) is mostly concerned with core Solr code. If there's a checklist of items that should be included, please forward or send me the link. cheers, j
SolrSharp - a C# client API for Solr
All - I'm proud to announce a release to a new client API for Solr -- SolrSharp. SolrSharp is a C# library that abstracts the interoperation of a solr search server. This is an initial release that covers the basics of working with Solr. The library is very fleshed out, but the example has only implemented simple keyword search. I really like the library (I'm a dogfood user, for sure) because I can strongly type different types of objects to search results. There's more forthcoming, i.e. more examples, but the basics are in place. Feedback always appreciated, suggestions for improvement are nice, and helping hands are the best. Until there's a better home for it, you can download the bits from JIRA at: https://issues.apache.org/jira/browse/SOLR-205 cheers, jeff r.
Question about code contribution
If I'm contributing new source files (separate project entirely) through JIRA, so the source code files need to contain the boilerplate Apache license/disclaimers and the like? This is new code and a new project (C#), and the wiki page on contributions ( http://wiki.apache.org/solr/HowToContribute) is mostly concerned with core Solr code. If there's a checklist of items that should be included, please forward or send me the link. cheers, j
Re: Question about code contribution
Whoops, typo: ...do the source code files need to contain the boilerplate Apache license. On 4/6/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: If I'm contributing new source files (separate project entirely) through JIRA, so the source code files need to contain the boilerplate Apache license/disclaimers and the like? This is new code and a new project (C#), and the wiki page on contributions ( http://wiki.apache.org/solr/HowToContribute) is mostly concerned with core Solr code. If there's a checklist of items that should be included, please forward or send me the link. cheers, j
Re: C# API for Solr
I'm working on it right now. The library is largely done, but I need to add some documentation and a few examples for usage. No promises, but I hope to have something available in the next few days. -- j On 4/5/07, Mike Austin [EMAIL PROTECTED] wrote: I would be very interested in this. Any idea on when this will be available? Thanks -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, April 02, 2007 1:44 AM To: solr-user@lucene.apache.org Subject: Re: C# API for Solr Well, I think there will be a lot of people who will be very happy with this C# client. grts,m Jeff Rodenburg [EMAIL PROTECTED] 31/03/2007 18:00 Please respond to solr-user@lucene.apache.org To solr-user@lucene.apache.org cc Subject C# API for Solr We built our first search system architecture around Lucene.Net back in 2005 and continued to make modifications through 2006. We quickly learned that search management is so much more than query algorithms and indexing choices. We were not readily prepared for the operational overhead that our Lucene-based search required: always-on availability, fast response times, batch and real-time updates, etc. Fast forward to 2007. Our front-end is Microsoft-based, but we needed to support parallel development on non-Microsoft architecture, and thus needed a cross-platform search system. Hello Solr! We've transitioned our search system to Solr with a Linux/Tomcat back-end, and it's been a champ. We now use solr not only for standard keyword search, but also to drive queries for lots of different content sections on our site. Solr has moved beyond mission critical in our operation. As we've proceeded, we've built out a nice C# client library to abstract the interaction from C# to Solr. It's mostly generic and designed for extensibility. With a few modifications, this could be a stand-alone library that works for others. I have clearance from the organization to contribute our library to the community if there's interest.
I'd first like to gauge the interest of everyone before doing so; please reply if you do. cheers, jeff r.
Re: problems finding negative values
This one caught us as well. Refer to http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters for understanding what characters need to be escaped for your queries. On 4/4/07, galo [EMAIL PROTECTED] wrote: Hi, I have an index consisting of the following fields: <field name="id" type="long" indexed="true" stored="true"/> <field name="length" type="integer" indexed="true" stored="true"/> <field name="key" type="integer" indexed="true" stored="true" multiValued="true"/> Each doc has a few key values, some of which are negative. Ok, I know there's a document that has both 826606443 and -1861807411. If I search with http://localhost:8080/solr/select/?stylesheet=&version=2.1&start=0&rows=50&indent=on&q=-1861807411&fl=id,length,key I get no results, but if I do http://localhost:8080/solr/select/?stylesheet=&version=2.1&start=0&rows=50&indent=on&q=826606443&fl=id,length,key I get the document as expected. Obviously the key field is configured as a search field, indexed, etc. but somehow solr doesn't like negatives. I'm assuming this might have something to do with analysers but can't tell how to fix it.. any ideas?? Thanks galo
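As a workaround sketch (Python; the character list follows the Lucene query parser syntax page linked above, and this helper is an illustration, not part of Solr or its clients):

```python
# Backslash-escape Lucene query syntax characters so a literal term like
# "-1861807411" is not parsed as a prohibited (NOT) clause. The list
# approximates the Lucene docs' set: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
# (escaping '&' and '|' individually also covers the two-character forms).
SPECIAL_CHARS = '\\+-!():^[]"{}~*?|&'

def escape_query_term(term):
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)

print(escape_query_term("-1861807411"))  # \-1861807411
```

With the leading minus escaped, the query is treated as a literal term rather than a negated empty clause.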
Re: org.apache.jasper.JasperException: Exception in JSP: /admin/_info.jsp:27
Whenever I've encountered this, the cause has nearly always been not starting tomcat from the proper current working directory. I went through the example install a few weeks ago, line by line, from the wiki page for Tomcat, and it ran fine. I'm running 5.5.17, and have done this on both FC5 and FC6. Other things of importance: proper chmod settings on /bin under apache-tomcat. Hope this helps. -- j On 4/3/07, Karen Loughran [EMAIL PROTECTED] wrote: Hi all, I'm trying to install Solr in a Tomcat 5.5.17 container on Linux Fedora Core 5. I receive org.apache.jasper.JasperException: Exception in JSP: /admin/_info.jsp:27. The full error is given below. I'm following the instructions on the WIKI. I have copied the solr.war (from apache-solr-1.1.0) to $CATALINA_HOME/webapps. I have copied the example solr home example/solr as a template for my solr home. I then start tomcat from the same directory which contains this solr directory, as instructed in the wiki. Any help would be much appreciated, Thanks Karen Full Error: type Exception report INFO: Deploying web application archive solr.war Apr 3, 2007 2:52:47 PM org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init() Apr 3, 2007 2:52:47 PM org.apache.solr.servlet.SolrServlet init INFO: No /solr/home in JNDI Apr 3, 2007 2:52:47 PM org.apache.solr.servlet.SolrServlet init message description The server encountered an internal error () that prevented it from fulfilling this request.
exception org.apache.jasper.JasperException: Exception in JSP: /admin/_info.jsp:27 24: 25: <%-- <jsp:include page="header.jsp"/> --%> 26: <%-- do a verbatim include so we can use the local vars --%> 27: <%@ include file="header.jsp" %> 28: 29: <br clear="all"> 30: <table> Stacktrace: org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:504) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:375) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264) javax.servlet.http.HttpServlet.service(HttpServlet.java:802) root cause javax.servlet.ServletException org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:858) org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:791) org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:313) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97) javax.servlet.http.HttpServlet.service(HttpServlet.java:802) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264) javax.servlet.http.HttpServlet.service(HttpServlet.java:802) root cause java.lang.NoClassDefFoundError org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:80) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97) javax.servlet.http.HttpServlet.service(HttpServlet.java:802) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264) javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
Re: Troubleshooting java heap out-of-memory
Hoping I can get a better response with a more directed question: With facet queries and the fields used, what qualifies as a large number of values? The wiki uses U.S. states as an example, so the number of unique values = 50. More to the point, is there an algorithm that I can use to estimate the cache consumption rate for facet queries? -- j On 4/1/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: I've read through the list entries here, the Lucene list, and the wiki docs, and am not resolving a major pain point for us. We've been trying to determine what could possibly cause us to hit this in our given environment, and am hoping more eyes on this issue can help. Our scenario: 150MB index, ~144K documents, read/write servers in place using standard replication. Running Tomcat 5.5.17 on Redhat Enterprise Linux 4. Java configured to start with -Xmx1024m. We encounter java heap out-of-memory issues on the read server at staggered times, but usually once every 48 hours. Search request load is roughly 2 searches every 3 seconds, with some spikes here or there. We are using facets: 3 are based on type integer, one is based on type string. We are using sorts: 1 is based on type sint, 2 are based on type date. Caching is disabled. Solr bits are also from September 2006. Is there anything in that configuration that we should interrogate? thanks, j
Re: Troubleshooting java heap out-of-memory
On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/1/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: Our scenario: 150MB index, ~144K documents, read/write servers in place using standard replication. Running Tomcat 5.5.17 on Redhat Enterprise Linux 4. Java configured to start with -Xmx1024m. We encounter java heap out-of-memory issues on the read server at staggered times, but usually once every 48 hours. Could you do a grep through your server logs for WARNING, to eliminate the possibility of multiple overlapping searchers causing the OOM issue? We're not seeing warnings for overlapping searchers prior to the OOM events. Only SEVERE -- java.lang.OutOfMemoryError: Java heap space. Are you doing incremental updates? If so, try lowering your mergeFactor for the index, or optimize more frequently. As an index is incrementally updated, old docs are marked as deleted and new docs are added. This leaves holes in the document id space which can increase memory usage. Both BitSet filters and FieldCache entry sizes are proportionally related to maxDoc (the maximum internal docid in the index). You can see maxDoc from the statistics page... there might be a correlation. We are doing incremental updates, and we optimize quite a bit. mergeFactor is presently set to 10. maxDoc count = 144156 numDocs count = 144145
Re: Troubleshooting java heap out-of-memory
Thanks for the pointers, Mike. I'm trying to determine the math to resolve some strange numbers we're seeing. Here's the top dozen lines from a jmap analysis on a heap dump:

Size       Count    Class description
---------  -------  -----------------
428246064  1792204  int[]
93175176   3213131  char[]
77195040   3216460  java.lang.String
67479112   3945     long[]
53073888   1658559  java.util.LinkedHashMap$Entry
39668352   1652848  org.apache.solr.search.HashDocSet
28195280   27131    byte[]
27165456   1697841  org.apache.lucene.index.Term
27024016   1689001  org.apache.lucene.search.TermQuery
22265920   695810   org.apache.lucene.document.Field
4931568    5974     java.lang.Object[]
4366768    77978    org.apache.lucene.store.FSIndexInput

I see the HashDocSet numbers (count=1.65 million), assume they have references to the int arrays (count=1.79 million) and wonder how I could have so many of those in memory. A few more data tidbits:
- Facet field Id1 = type int, unique values = 2710
- Facet field Id2 = type int, unique values = 65
- Facet field Id3 = type string, unique values = 15179
Thanks for the extra eyes on this, much appreciated. -- j

On 4/2/07, Mike Klaas [EMAIL PROTECTED] wrote: On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: With facet queries and the fields used, what qualifies as a large number of values? The wiki uses U.S. states as an example, so the number of unique values = 50. More to the point, is there an algorithm that I can use to estimate the cache consumption rate for facet queries? The cache consumption rate is one entry per unique value in all faceted fields, excluding fields that have faceting satisfied via FieldCache (single-valued fields with exactly one token per document). The size of each cached filter is num docs / 8 bytes, unless the number of matching docs is less than the useHashSet threshold in solrconfig.xml. Sorting requires FieldCache population, which consists of an integer per document plus the sum of the lengths of the unique values in the field (less for pure int/float fields, but I'm not sure if Solr's sint qualifies).
Both faceting and sorting shouldn't consume more memory after their data structures have been built, so it would be odd to see OOM after 48 hours if they were the cause. -Mike
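Mike's rule of thumb lends itself to a quick back-of-the-envelope calculation. A hedged sketch (the doc and unique-value counts come from this thread; the helper names are mine, and the estimate is a worst-case upper bound that ignores the useHashSet cutoff):

```python
# Back-of-the-envelope facet filter-cache sizing, per Mike's description above:
# one cached filter per unique value across all faceted fields, and each full
# bitset filter costs roughly maxDoc / 8 bytes. Worst case only -- small sets
# below the useHashSet threshold in solrconfig.xml are stored more compactly.

MAX_DOC = 144156  # maxDoc from the statistics page, quoted in this thread

def bitset_bytes(max_doc):
    """Approximate size of one full bitset filter."""
    return max_doc // 8

def facet_cache_upper_bound(unique_value_counts, max_doc):
    """Assume every unique facet value ends up holding a full bitset."""
    return sum(unique_value_counts) * bitset_bytes(max_doc)

# Unique values per facet field from the thread: Id1=2710, Id2=65, Id3=15179
est = facet_cache_upper_bound([2710, 65, 15179], MAX_DOC)
print(f"~{est / 1024 / 1024:.0f} MB worst case")  # roughly 300 MB
```

A worst-case figure in that neighborhood, against a 1024 MB heap, would at least be consistent with the 1.65 million HashDocSet instances in the jmap dump.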
Re: Troubleshooting java heap out-of-memory
Sorry for the confusion. We do have caching disabled. I was asking the question because I wasn't certain if the configurable cache settings applied throughout, or if the FieldCache in lucene still came in play. The two integer-based facets are single valued per document. The string-based facet is multiValued. On 4/2/07, Chris Hostetter [EMAIL PROTECTED] wrote: : values = 50. More to the point, is there an algorithm that I can use to : estimate the cache consumption rate for facet queries? I'm confused ... i thought you said in your original mail that you had all the caching disabled? (except for FieldCache which is so low level in Lucene it's always used) are the fields you are faceting on multiValued or single valued? -Hoss
Re: Troubleshooting java heap out-of-memory
Major version is 1.0. The bits are from a nightly build from early September 2006. We do have plans to upgrade solr soon. On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: We are doing incremental updates, and we optimize quite a bit. mergeFactor presently set to 10. maxDoc count = 144156 numDocs count = 144145 What version of Solr are you using? Another potential OOM (multiple threads generating the same FieldCache entry) was fixed in later versions of Lucene included with Solr. -Yonik
Re: Troubleshooting java heap out-of-memory
Yonik - is this the JIRA entry you're referring to? http://issues.apache.org/jira/browse/LUCENE-754 On 4/2/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/2/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: We are doing incremental updates, and we optimize quite a bit. mergeFactor presently set to 10. maxDoc count = 144156 numDocs count = 144145 What version of Solr are you using? Another potential OOM (multiple threads generating the same FieldCache entry) was fixed in later versions of Lucene included with Solr. -Yonik
Re: C# API for Solr
What would make things consistent for the client APIs is a prescribed set of implementations for a solr release. For example, executing searches with these parameters, support for facets requires those parameters, updates should be called in this manner, etc. For lack of a better term, a loosely-coupled interface definition. Those requirements could then be versioned, and the various APIs could advertise themselves as solr 1.0 compliant, solr 1.1 compliant, and so on. The solr release dictates the requirements for compliance; the API maintainer is responsible for meeting those requirements. This would also be handy when certain features are deprecated, i.e. when the /update url is changed. Regarding C#, this would be easy enough to implement. There are common community methods for building/compilation, test libraries, and help documentation, so doing things consistently with Erik and the solrb library works for C# as well (and I assume most other languages.) -- j On 3/31/07, Chris Hostetter [EMAIL PROTECTED] wrote: On a related note: We've still never really figured out how to deal with integrating compilation or testing for client code into our main ant build system -- or for that matter how we should distribute them when we do our next release, so if you have any suggestions regarding your C# client by all means speak up ... in the mean time we can do the same thing Erik started with solrb and flare: an isolated build system that makes sense to the people who understand that language and rely on community to catch any changes to Solr that might break clients. -Hoss
Re: C# API for Solr
Ryan - I'm working on cleanup to release this thing for the world to enjoy. -- j On 3/31/07, Ryan McKinley [EMAIL PROTECTED] wrote: Yes yes! On 3/31/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: We built our first search system architecture around Lucene.Net back in 2005 and continued to make modifications through 2006. We quickly learned that search management is so much more than query algorithms and indexing choices. We were not readily prepared for the operational overhead that our Lucene-based search required: always-on availability, fast response times, batch and real-time updates, etc. Fast forward to 2007. Our front-end is Microsoft-based, but we needed to support parallel development on non-Microsoft architecture, and thus needed a cross-platform search system. Hello Solr! We've transitioned our search system to Solr with a Linux/Tomcat back-end, and it's been a champ. We now use solr not only for standard keyword search, but also to drive queries for lots of different content sections on our site. Solr has moved beyond mission critical in our operation. As we've proceeded, we've built out a nice C# client library to abstract the interaction from C# to Solr. It's mostly generic and designed for extensibility. With a few modifications, this could be a stand-alone library that works for others. I have clearance from the organization to contribute our library to the community if there's interest. I'd first like to gauge the interest of everyone before doing so; please reply if you do. cheers, jeff r.
C# API for Solr
We built our first search system architecture around Lucene.Net back in 2005 and continued to make modifications through 2006. We quickly learned that search management is so much more than query algorithms and indexing choices. We were not readily prepared for the operational overhead that our Lucene-based search required: always-on availability, fast response times, batch and real-time updates, etc. Fast forward to 2007. Our front-end is Microsoft-based, but we needed to support parallel development on non-Microsoft architecture, and thus needed a cross-platform search system. Hello Solr! We've transitioned our search system to Solr with a Linux/Tomcat back-end, and it's been a champ. We now use solr not only for standard keyword search, but also to drive queries for lots of different content sections on our site. Solr has moved beyond mission critical in our operation. As we've proceeded, we've built out a nice C# client library to abstract the interaction from C# to Solr. It's mostly generic and designed for extensibility. With a few modifications, this could be a stand-alone library that works for others. I have clearance from the organization to contribute our library to the community if there's interest. I'd first like to gauge the interest of everyone before doing so; please reply if you do. cheers, jeff r.
Re: C# API for Solr
Good thought, Yonik. I haven't looked at the Java client, would certainly be worthwhile. I'll move to prepping the files so they're completely generic and can work for anyone. One administrative question: can I contribute these files to be stored under /lucene/solr/trunk/client? I don't have a handy place for making these publicly accessible at the moment. thanks, jeff On 3/31/07, Yonik Seeley [EMAIL PROTECTED] wrote: C# and Java are so similar, perhaps the Java client in SOLR-20 could learn something from yours (or vice-versa). -Yonik
Controlling read/write access for replicated indexes
I'm curious what mechanisms everyone is using to control read/write access for distributed replicated indexes. We're moving to a replication environment very soon, and our client applications (quite a few) all have configuration pointers to the URLs for solr instances. As a precaution, I don't want errant configuration values to inadvertently send write requests to read servers, as an example. As an aside, we're running solr under tomcat 5.5.x which has its own control aspects as well. Any best practices, i.e. something that's not a maintenance headache later, from those who have done this would be greatly appreciated. thanks, j.r.
Re: Error with bin/optimize and multiple solr webapps
This issue has been logged as: https://issues.apache.org/jira/browse/SOLR-188 A patch file is included for those who are interested. I've unit tested in my environment, please validate it for your own environment. cheers, j On 3/5/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: Thanks Hoss. I'll add an issue in JIRA and attach the patch. On 3/5/07, Chris Hostetter [EMAIL PROTECTED] wrote: : This line assumes a single solr installation under Tomcat, whereas the : multiple webapp scenario runs from a different location (the /solr part). : I'm sure this applies elsewhere. good catch ... it looks like all of our scripts assume /solr/update is the correct path to POST commit/optimize messages to. : I would submit a patch for JIRA, but couldn't find these files under version : control. Any recommendations? They live in src/scripts ... a patch would certainly be appreciated. FYI: there is an evolution underway to allow XML based update messages to be sent to any path (and the fixed path /update is being deprecated) so it would be handy if the entire URL path was configurable (not just the webapp name) -Hoss
Re: Error with bin/optimize and multiple solr webapps
Oops, my bad; I didn't see either 186 or 187 before entering 188. :-) -- j On 3/6/07, Graham Stead [EMAIL PROTECTED] wrote: Apologies in advance if SOLR-187 and SOLR-188 look the same -- they are the same issue. I have been using adjusted scripts locally but hadn't used Jira before and wasn't sure of the process. I decided to figure it out after answering Gola's question this morning...then saw that Jeff had mentioned a similar issue last night. I apologize again for confusion over the double entry. Thanks, -Graham -Original Message- From: Jeff Rodenburg [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 06, 2007 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Error with bin/optimize and multiple solr webapps This issue has been logged as: https://issues.apache.org/jira/browse/SOLR-188 A patch file is included for those who are interested. I've unit tested in my environment, please validate it for your own environment. cheers, j On 3/5/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: Thanks Hoss. I'll add an issue in JIRA and attach the patch. On 3/5/07, Chris Hostetter [EMAIL PROTECTED] wrote: : This line assumes a single solr installation under Tomcat, whereas the : multiple webapp scenario runs from a different location (the /solr part). : I'm sure this applies elsewhere. good catch ... it looks like all of our scripts assume /solr/update is the correct path to POST commit/optimize messages to. : I would submit a patch for JIRA, but couldn't find these files under version : control. Any recommendations? They live in src/scripts ... a patch would certainly be appreciated. FYI: there is an evolution underway to allow XML based update messages to be sent to any path (and the fixed path /update is being deprecated) so it would be handy if the entire URL path was configurable (not just the webapp name) -Hoss
Error with bin/optimize and multiple solr webapps
I noticed an issue with the optimize bash script in /bin. Per the line: rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<optimize/>"` This line assumes a single solr installation under Tomcat, whereas the multiple webapp scenario runs from a different location (the /solr part). I'm sure this applies elsewhere. I would submit a patch for JIRA, but couldn't find these files under version control. Any recommendations? -- j
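For illustration, here is a minimal sketch of the configurability this thread (and SOLR-188) is after: the webapp name, and per Hoss's note ideally the whole update path, parameterized instead of hardcoded. The names are mine, not the bin-script's actual variables:

```python
# Sketch of a configurable update URL, mirroring the fix discussed in this
# thread: don't hardcode /solr/update; let the webapp name and update path
# vary per deployment. Parameter names here are illustrative.

def update_url(hostname, port, webapp="solr", update_path="update"):
    return f"http://{hostname}:{port}/{webapp}/{update_path}"

print(update_url("localhost", 8080, webapp="solr-nightly"))
# -> http://localhost:8080/solr-nightly/update
```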
Re: Solr graduates and joins Lucene as sub-project
Congrats to all involved committers on the project as well. Solr is an invaluable system in my operation. Great job. On 1/17/07, Yonik Seeley [EMAIL PROTECTED] wrote: Solr has just graduated from the Incubator, and has been accepted as a Lucene sub-project! Thanks to all the Lucene and Solr users, contributors, and developers who helped make this happen! I have a feeling we're just getting started :-) -Yonik
Re: One item, multiple fields, and range queries
Now I follow. I was misreading the first comments, thinking that the field content would be deconstructed to smaller components or pieces. Too much (or not enough) coffee. I'm expecting the index doc needs to be constructed with lat/long/dates in sequential order, i.e.:

<add>
  <doc>
    <field name="event_id">123</field>
    <field name="latitude">32.123456</field>
    <field name="longitude">-88.987654</field>
    <field name="when">01/31/2007</field>
    <field name="latitude">42.123456</field>
    <field name="longitude">-98.987654</field>
    <field name="when">01/31/2007</field>
    <field name="latitude">40.123456</field>
    <field name="longitude">-108.987654</field>
    <field name="when">01/30/2007</field>
    ...etc.
  </doc>
</add>

Assuming slop count of 0, while the intention is to match lat/long/when in that order, could it possibly match long/when/lat, or when/lat/long? Does PhraseQuery enforce order and starting point as well? Assuming all of this, how does range query come into play? Or could the PhraseQuery portion be applied as a filter? On 1/17/07, Chris Hostetter [EMAIL PROTECTED] wrote: : OK, you lost me. It sounds as if this PhraseQuery-ish approach involves : breaking datetime and lat/long values into pieces, and evaluation occurs : with positioning. Is that accurate? i'm not sure what you mean by pieces ... the idea is that you would have a single latitude field and a single longitude field and a single when field, and if an item had a single event, you would store a single value in each field ...
but if the item has multiple events, you would store them in the same relative ordering, and then use the same kind of logic PhraseQuery uses to verify that if the latitude field has a value in the right range, and the longitude field has a value in the right range, and the when field has a value in the right range, that all of those values have the same position (specifically: are within a set amount of slop from each other, which you would always set to 0) : It seems like this could even be done in the same field if one had a : query type that allowed querying for tokens at the same position. : Just index _noun at the same position as house (and make sure : there can't be collisions between real terms and markers via escaping, : or use \0 instead of _, etc). true ... but the point doug made way back when is that with a generalized multi-field phrase query you wouldn't have to do that escaping ... the hard part in this case is the numeric ranges. -Hoss
Re: One item, multiple fields, and range queries
Yonik/Hoss - OK, you lost me. It sounds as if this PhraseQuery-ish approach involves breaking datetime and lat/long values into pieces, and evaluation occurs with positioning. Is that accurate? On 1/16/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 1/15/07, Chris Hostetter [EMAIL PROTECTED] wrote: PhraseQuery artificially enforces that the Terms you add to it are in the same field ... you could easily write a PhraseQuery-ish query that takes Terms from different fields, and ensures that they appear near each other in terms of their token sequence -- the context of that comment was searching for instances of words with specific usage (ie: house used as a noun) by putting the usage type of each term in a different term in a separate parallel field, but with identical token positions. It seems like this could even be done in the same field if one had a query type that allowed querying for tokens at the same position. Just index _noun at the same position as house (and make sure there can't be collisions between real terms and markers via escaping, or use \0 instead of _, etc). -Yonik
Re: One item, multiple fields, and range queries
Thanks Hoss. Interesting approach, but the N bound could be well into the hundreds, and would be variable (some maximum number, but different across events.) I've not yet used dynamic fields in this manner. With that number range, what limitations could I encounter? Given the size of that, I would need the solr engine to formulate that query, correct? I can't imagine I could pass that entire subquery statement in the http request, as the character limit would likely be exceeded. Some of my comments may not make sense, so I'll check into dynamic fields and such in the meantime. thanks, j On 1/14/07, Chris Hostetter [EMAIL PROTECTED] wrote: : 2) use multivalued fields as correlated vectors, so the first start : date corresponds to the first end date corresponds to the first lat and long value. : You get them all back in a query though, so your app would need to do extra work to sort : out which matched. if you expect a bounded number of correlated events per item, you can use dynamic fields, and build up N correlated subqueries where N is the upper bound on the number of events you expect any item to have, ie... (+lat1:[x TO y] +lon1:[w TO z] +time1:[a TO b]) OR (+lat2:[x TO y] +lon2:[w TO z] +time2:[a TO b]) OR (+lat3:[x TO y] +lon3:[w TO z] +time3:[a TO b]) ... -Hoss
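Hoss's bounded-N pattern is mechanical enough to generate client-side rather than in the solr engine. A sketch, assuming the dynamic-field naming (lat1/lon1/time1, ...) from his example; the field names are an assumption about the schema, not anything Solr provides out of the box:

```python
# Generate Hoss's N correlated subqueries from the thread above. The
# latN/lonN/timeN field names follow his dynamic-field example and are
# assumptions about the schema, not a Solr built-in.

def build_event_query(n, lat, lon, time):
    """lat/lon/time are (low, high) range tuples; n is the event upper bound."""
    clauses = [
        f"(+lat{i}:[{lat[0]} TO {lat[1]}]"
        f" +lon{i}:[{lon[0]} TO {lon[1]}]"
        f" +time{i}:[{time[0]} TO {time[1]}])"
        for i in range(1, n + 1)
    ]
    return " OR ".join(clauses)

print(build_event_query(3, ("x", "y"), ("w", "z"), ("a", "b")))
```

On the URL-length worry: with hundreds of clauses, a query like this belongs in a POST body rather than the GET query string, which sidesteps the character-limit concern (worth verifying that your Solr version accepts POSTed queries).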
Re: One item, multiple fields, and range queries
Thanks Yonik. 1) model a single document as a single event at a single place with a start and end date. This was my first approach, but at presentation time I need to display the event once -- with multiple start/end dates and locations beneath it. Is treating the given event uniqueId as a facet the way to go? thanks, jeff On 1/12/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 1/12/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: I'm stuck with a query issue that at present seems unresolvable. Hoping the community has some insight to this. My index contains events that have multiple beginning/ending date ranges and multiple locations. For example, event A (uniqueId = 123) occurs every weekend, sometimes in one location, sometimes in many locations. Dates have a beginning and ending date, and locations have a latitude/longitude. I need to query for the set of events for a given area, where area = bounding box. So, a single event has multiple beginning and ending dates and multiple locations. So, the beginning date, ending date, latitude and longitude values only apply collectively as a unit. However, I need to do range queries on both the dates and the lat/long values. 1) model a single document as a single event at a single place with a start and end date. OR 2) use multivalued fields as correlated vectors, so the first start date corresponds to the first end date corresponds to the first lat and long value. You get them all back in a query though, so your app would need to do extra work to sort out which matched. I'd do (1) if you can... it's simpler. -Yonik
One item, multiple fields, and range queries
I'm stuck with a query issue that at present seems unresolvable. Hoping the community has some insight to this. My index contains events that have multiple beginning/ending date ranges and multiple locations. For example, event A (uniqueId = 123) occurs every weekend, sometimes in one location, sometimes in many locations. Dates have a beginning and ending date, and locations have a latitude/longitude. I need to query for the set of events for a given area, where area = bounding box. So, a single event has multiple beginning and ending dates and multiple locations. So, the beginning date, ending date, latitude and longitude values only apply collectively as a unit. However, I need to do range queries on both the dates and the lat/long values. Any suggested strategies for indexing and query formulation? thanks, j
WordDelimiterFilter usage
I'm trying to determine how to index/query for a certain use case, and the WordDelimiterFilterFactory appears to be what I need to use. Here's the scenario:
- Text field being indexed
- Field exists as a full name
- Data might be cold play
- This should match against searches for cold play and coldplay (just cold and just play are OK as well)
I'm not able to match cold play against searches for coldplay at present. I'm certain this is a common scenario and I'm missing something obvious. Any suggestions of how/where to look/fix this issue? thanks, j
Re: WordDelimiterFilter usage
Thanks Hoss - it is a finite list, but in the tens of thousands. I'm going the easy route -- adding another field that indexes the terms with no included whitespace. This is used in an ajax-style lookup, so it works for this scenario. Not something I'd normally do in a typical index, for sure. thanks, jeff On 1/11/07, Chris Hostetter [EMAIL PROTECTED] wrote: WordDelimiterFilter won't really help you in this situation ... but it would help if you find a lot of users are searching for ColdPlay or cold-play. if you have a finite list of popular terms like this that you need to deal with, the SynonymFilter can help you out. : Date: Thu, 11 Jan 2007 13:30:39 -0800 : From: Jeff Rodenburg [EMAIL PROTECTED] : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: WordDelimiterFilter usage : : I'm trying to determine how to index/query for a certain use case, and the : WordDelimiterFilterFactory appears to be what I need to use. Here's the : scenario: : : - Text field being indexed : - Field exists as a full name : - Data might be cold play : - This should match against searches for cold play and coldplay (just : cold and just play are OK as well) : : I'm not able to match cold play against searches for coldplay at : present. I'm certain this is a common scenario and I'm missing something : obvious. Any suggestions of how/where to look/fix this issue? : : thanks, : j : -Hoss
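The "easy route" above is simple to do at feed time: derive a second, whitespace-free value for each name when building the update document. A sketch; the field name "name_squashed" is my own invention, not from the thread:

```python
# Sketch of the easy route described above: at index time, produce a second,
# whitespace-free copy of the name so "cold play" is also findable as
# "coldplay". The "name_squashed" field name is illustrative only.

def squash(name):
    """Lowercase and remove all whitespace: 'Cold Play' -> 'coldplay'."""
    return "".join(name.lower().split())

doc = {"name": "cold play", "name_squashed": squash("cold play")}
print(doc["name_squashed"])  # -> coldplay
```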
Re: Multiple indexes
This is good information, thanks Chris. My preference was to keep things separate, just needed some external info from others to back me up. thanks, jeff On 1/7/07, Chris Hostetter [EMAIL PROTECTED] wrote: I don't know if there really are any general purpose best practices ... it really depends on use cases -- the main motivation for allowing JNDI context specification of the solr.home location so that multiple instances of Solr can run in a single instance of a servlet container was so that if you *wanted* to run multiple instances in a single JVM, they could share one heap space, and you wouldn't have to guess how much memory to allocate to multiple instances -- but whether or not you *want* to have a single instance or not is really up to you. the plus side (as i mentioned) is that you can throw all of your available memory at that single JVM instance, and not worry about how much ram each solr instance really needs. the down side is that if any one solr instance really gets hammered to hell by its users and rolls over and dies, it could bring down your other solr instances as well -- which may not be a big deal if in your use cases all solr instances get hit equally (via a meta searcher) but might be quite a big problem if those separate instances are completely independent (ie: each paid for by separate clients) personally: if you've got the resources (money/boxes/RAM) i would recommend keeping everything isolated. (the nice thing about my job is that while i frequently walk out of meetings with the directive to make it faster, I've never been asked to make it use less RAM) -Hoss
Multiple indexes
I've followed a host of past threads on this subject and am trying to determine what's best for our implementation. For those who've chimed in on this, I think I'm just looking for a good summary (as Hoss recently mentioned, perhaps a FAQ). We presently have one index running under Solr/Tomcat55/Linux, which is continually growing in size. I have a need to add two other separate indexes (or is it indices?), which would all carry separate configs. One will be small and won't change, the other will grow in size. For redundancy, I expect to get into the Solr distribution model. Collectively, all three indexes will venture into the 2GB range, so nothing too extensive. All things considered -- jvm memory management, availability, other things I've left off the list -- are there any determinations of best practice for deployment under the topic of multiple index/multiple instance? Any specific recommendations for the given details I've provided here? thanks, j
Replacing a nightly build
What is the recommended path to deployment for replacing a solr nightly build with another? In our scenario, our current build is roughly 3 months old and we're updating to the latest. Aside from replacing the bits and restarting, are there any steps that everyone is following in maintaining the code stack under deployment? thanks.
Re: Error in faceted browsing
Thanks Chris. I bumped the facet.limit to 10 and it works like a charm. Thanks for the heads up on the merchant_name. I would probably just keep a dictionary in memory, but if I wanted to pull the stored merchant_name back, how would/can I do that? thanks, j

On 9/13/06, Chris Hostetter [EMAIL PROTECTED] wrote:
: I just pulled down the nightly solr build from 9/12 and have it up and
: running. I copied an index created in a solr version that's about 3 months
: old.
it looks like my changes to have a sensible default (which is when facet.limit=-1 became legal) didn't make it into solr-2006-09-12.zip, but it is in solr-2006-09-13.zip. with the version you are using, leaving out the facet.limit should achieve what you want ... but based on your schema, using merchant_name as a facet field may not work like you expect -- you'll probably want an exact String version of the merchant_name field (or just use merchant_id and look up the name in a handy Map)
:
: I have a query formulated like this:
: http://solrbox:8080/solr/select?q=description:dell&rows=0&facet=true&facet.limit=-1&facet.field=merchant_name
:
: The fields definition from schema.xml:
:
: <field name="item_id" type="long" indexed="true" stored="true"/>
: <field name="title" type="text" indexed="true" stored="true"/>
: <field name="description" type="text" indexed="true" stored="true"/>
: <field name="merchant_id" type="integer" indexed="true" stored="true" />
: <field name="merchant_name" type="text" indexed="true" stored="true" />
:
: The result:
: <response>
: <responseHeader>
: <status>0</status>
: <QTime>2</QTime>
: </responseHeader>
: <result numFound="52" start="0"/>
: <lst name="facet_counts">
: <lst name="facet_queries"/>
: <str name="exception">
: java.util.NoSuchElementException
: at java.util.TreeMap.key(TreeMap.java:433)
: at java.util.TreeMap.lastKey(TreeMap.java:297)
: at java.util.TreeSet.last(TreeSet.java:417)
: at org.apache.solr.util.BoundedTreeSet.adjust(BoundedTreeSet.java:54)
: at org.apache.solr.util.BoundedTreeSet.setMaxSize(BoundedTreeSet.java:50)
: at org.apache.solr.util.BoundedTreeSet.<init>(BoundedTreeSet.java:31)
: at org.apache.solr.request.SimpleFacets.getFacetTermEnumCounts(SimpleFacets.java:187)
: at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:137)
: at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:84)
: at org.apache.solr.request.StandardRequestHandler.getFacetInfo(StandardRequestHandler.java:180)
: at org.apache.solr.request.StandardRequestHandler.handleRequest(StandardRequestHandler.java:120)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:586)
: at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:91)
: at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
: at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
: at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
: at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
: at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
: at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
: at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
: at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
: at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
: at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
: at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
: at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
: at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
: at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
: at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
: at java.lang.Thread.run(Thread.java:595)
: </str>
: </lst>
: </response>
:
: What am I missing?
:
: -- j
-Hoss
Re: Error in faceted browsing
Outstanding, thanks. - j On 9/13/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/13/06, Jeff Rodenburg [EMAIL PROTECTED] wrote: Thanks for the heads up on the merchant_name. I would probably just keep a dictionary in memory, but if I wanted to pull the stored merchant_name back, how would/can I do that? If you don't want merchant_name tokenized at all, just change the type to string. If you want an additional field for faceting on with merchant_name untokenized, then use copyField in schema.xml to copy merchant_name to merchant_name_exact and define <field name="merchant_name_exact" type="string" indexed="true" stored="false" /> -Yonik
Re: Re: IIS web server and Solr integration
Tim - If you can help it, I would suggest running Solr under Tomcat under Linux. Speaking from experience in a mixed mode environment, the Linux/Tomcat/Solr implementation just works. We're not newbies under Linux, but we're also a native Windows shop. The memory management and system availability is just outstanding in that stack. If you must run Windows, Tomcat does integrate with IIS, but be prepared to jump through a few hoops. Spend time on making that combination work, and you'll be 90% there. Hope this helps. -- j On 9/10/06, Tim Archambault [EMAIL PROTECTED] wrote: Good news. The rookie did just that. Thanks Chris. Just having a difficult time figuring out how to send my query parameters to the engine from Coldfusion [intelligently]. I'm going to download the PHP app and see if I can figure it out. Having lots of fun with this for sure. Tim On 9/10/06, Chris Hostetter [EMAIL PROTECTED] wrote: : Should it run on a separate port than IIS or integrated using ISAPI plug-in? I can't make any specific recommendations about Windows or IIS, but i personally wouldn't run Solr in the same webserver/appserver that your users hit -- from a security standpoint, i would protect your solr instance the same way you would protect a database, let the applications running in your webserver connect to it and run queries against it, but don't expose it to the outside world directly. -Hoss
Faceted browsing: status
From the Tasklist wiki:
- Simple faceted browsing (grouping) support in the standard query handler
  - group by field (provide counts for each distinct value in that field)
  - group by (query1, query2, query3, query4, query5)
How far/close is this task to completion? (I'm trying to gauge time/effort here.) -- j
Re: Documentation?
Thanks Chris/Yonik, don't know why I didn't see those yet. -- j On 5/15/06, Chris Hostetter [EMAIL PROTECTED] wrote: : I was checking around the solr site and pages at apache.org and wasn't : finding much. Before jumping into the code, I'd like to get as familiar : with solr as I could from existing docs or the like. Can someone point me : in the direction? The best documentation about using Solr is the tutorial... http://incubator.apache.org/solr/tutorial.html The documentation on Solr's internals and developing Query plugins is pretty sparse at the moment. It's on my todo list (hopefully this week) If you want a good chunk of code to sink your teeth into as a starting point, take a look at StandardRequestHandler, and the APIs it uses from other classes. -Hoss
Documentation?
I was checking around the solr site and pages at apache.org and wasn't finding much. Before jumping into the code, I'd like to get as familiar with solr as I could from existing docs or the like. Can someone point me in the direction? thanks, jeff r.