Re: Issue with 2WD and 4WD in query
Brendan, pull up your Solr Admin Analysis page and try running your queries through that. The output will tell you precisely how each analyzer affects your tokens on either the index or query side.

In my own quick test, WordDelimiterFilterFactory seems inclined to break 2WD into (2, WD)

(using org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1})

--matt

On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:

Hi,

I hope you can help me. I'm having an odd problem with Solr. I have a field that could represent a car. A car could have a name like Silverado, or could be something like Silverado 2WD to denote the 2 wheel drive version of the car. Anyway, all is well when I search over the field for Silverado, but when I try searching for 2WD (doesn't matter what case) nothing is returned. The same applies for Silverado 2WD etc.

I currently have the field defined as text, i.e.:

  <field name="car_name" type="text" indexed="true" stored="true" />

But I've also tried defining my own (simpler) field with no luck.

FYI my text field is defined like this:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- This is supposed to remove HTML tags before indexing -->
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      <!-- <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Any help?

Thanks!
Brendan

--
Matt Kangas / [EMAIL PROTECTED]
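To make the (2, WD) split from the quick test concrete, here is a rough Python approximation. It is not Solr's code: the function name split_on_delimiters is made up for illustration, and the sketch only models splitting on non-alphanumeric characters and on letter/digit boundaries. The real filter has more rules and options (case-change splitting, catenation) that are left out here.

```python
import re

def split_on_delimiters(token):
    """Hypothetical helper: split a token into runs of letters and runs of digits.

    Anything that is neither a letter nor a digit acts as a delimiter, and a
    switch between letters and digits also starts a new part.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

for token in ("Silverado", "2WD", "4WD"):
    print(token, "->", split_on_delimiters(token))
# Silverado -> ['Silverado']
# 2WD -> ['2', 'WD']
# 4WD -> ['4', 'WD']
```

The Analysis page remains the authoritative view of what your own chain does; this only shows why a token like 2WD may not survive as a single term.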
Re: Issue with 2WD and 4WD in query
I suppose you'll have to take WordDelimiterFilter out of your analysis chain, at least for that field. Or perhaps toggling the generateNumberParts argument will have some effect? The API documentation should be your best resource here...

--matt

On Dec 10, 2007, at 11:48 AM, Brendan Grainger wrote:

Hi Matt,

Thanks for the reply. I've done what you said and I get exactly what you're saying as a result. Any ideas about how to make 2WD and 4WD be terms on their own?

Thanks

On Dec 10, 2007, at 11:41 AM, Matt Kangas wrote:

Brendan, pull up your Solr Admin Analysis page and try running your queries through that. The output will tell you precisely how each analyzer affects your tokens on either the index or query side.

In my own quick test, WordDelimiterFilterFactory seems inclined to break 2WD into (2, WD)

(using org.apache.solr.analysis.WordDelimiterFilterFactory {catenateWords=1, catenateNumbers=1, catenateAll=0, generateNumberParts=1, generateWordParts=1})

--matt

On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:

Hi,

I hope you can help me. I'm having an odd problem with Solr. I have a field that could represent a car. A car could have a name like Silverado, or could be something like Silverado 2WD to denote the 2 wheel drive version of the car. Anyway, all is well when I search over the field for Silverado, but when I try searching for 2WD (doesn't matter what case) nothing is returned. The same applies for Silverado 2WD etc.

I currently have the field defined as text, i.e.:

  <field name="car_name" type="text" indexed="true" stored="true" />

But I've also tried defining my own (simpler) field with no luck.
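A small sketch of what "take WordDelimiterFilter out of the chain" would mean for the indexed terms. This is a toy model in Python, not Solr: index_terms is a hypothetical function that assumes whitespace tokenization and lowercasing only, with an optional step that splits on letter/digit boundaries. Stop words, stemming and synonyms are ignored.

```python
import re

def index_terms(value, word_delimiter):
    """Toy analysis chain: whitespace tokenizer, optional split step, lowercase."""
    terms = set()
    for raw in value.split():
        # With the split step on, "2WD" becomes "2" and "WD"; off, it stays whole.
        parts = re.findall(r"[A-Za-z]+|[0-9]+", raw) if word_delimiter else [raw]
        terms.update(part.lower() for part in parts)
    return terms

for word_delimiter in (True, False):
    terms = index_terms("Silverado 2WD", word_delimiter)
    print(word_delimiter, sorted(terms), "2wd" in terms)
# True ['2', 'silverado', 'wd'] False
# False ['2wd', 'silverado'] True
```

Under these assumptions, 2wd only exists as a term of its own when the split step is removed. Whether toggling generateNumberParts has the same effect is something to confirm on the Analysis page.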
Re: Distribution without SSH?
Your company's network policies seem to be a good thing. I've worked at places with this same policy, for good reason. But it does tend to complicate operations sometimes.

Some options you might pursue:

* Set up ssh-agent on the clients and use passphrase-protected keys. The downside is that someone on your ops team will inevitably be woken at 4am to type in the passphrase.

* Try to get an exception to the policy by running Solr under a new user account inside a jail. Use a restricted login shell to make sure it can do only what you intend. So if the key is compromised, the damage is contained.

* Or, write a custom server/client running on a different port. In this case you lose over-the-wire encryption, and if your server is buggy, you get pwn3d anyway.

--Matt

On Nov 29, 2007, at 7:48 PM, Justin Knoll wrote:

Hello,

I recently set up Solr with distribution on a couple of servers. I just learned that our network policies do not permit us to use SSH with passphraseless keys, and the snappuller script uses SSH to examine the master Solr instance's state before it pulls the newest index via rsync.

We plan to attempt to rewrite the snappuller (and possibly other distribution scripts, as required) to eliminate this dependency on SSH. I thought I'd ask the list in case anyone has experience with this same situation or any insights into the reasoning behind requiring SSH access to the master instance.

Thanks,
Justin Knoll

--
Matt Kangas / [EMAIL PROTECTED]
Re: anyone can send me jetty-plus
If you're using Jetty 6, there's no need for a separate Jetty Plus download. The plus jarfiles come in the standard distribution.

--matt

On Sep 27, 2007, at 12:10 AM, James liu wrote:

I can't download it from http://jetty.mortbay.org/jetty5/plus/index.html

--
regards
jl

--
Matt Kangas / [EMAIL PROTECTED]
Re: Re[2]: multiple indices
Jack, the JNDI-enabling jarfiles now ship as part of the main .zip distribution. There is no need for a separate JettyPlus download as of Jetty 6.

I used Jetty 6.1.3 (http://dist.codehaus.org/jetty/jetty-6.1.x/jetty-6.1.3.zip) at the time, and I am using only these jarfiles from the main distribution. I stripped everything else out that seemed unnecessary for running Solr.

  lib/jetty-6.1.3.jar
  lib/jetty-util-6.1.3.jar
  lib/jsp-2.1/ant-1.6.5.jar
  lib/jsp-2.1/core-3.1.1.jar
  lib/jsp-2.1/jsp-2.1.jar
  lib/jsp-2.1/jsp-api-2.1.jar
  lib/naming/jetty-naming-6.1.3.jar
  lib/plus/jetty-plus-6.1.3.jar
  lib/servlet-api-2.5-6.1.3.jar

--Matt

On Sep 13, 2007, at 11:44 AM, Jack L wrote:

Thanks Matt, I'll give it a try! So this requires JettyPlus?

--
Best regards,
Jack

Wednesday, September 12, 2007, 5:14:32 AM, you wrote:

Jack, I've posted a complete recipe for running two Solr indices within one Jetty 6 container:

http://wiki.apache.org/solr/SolrJetty

Scroll down to the part that says:

(7/2007 MattKangas) The recipe above didn't work for me with Jetty 6.1.3. ...

I'm glossing over a lot of details, so attached is a tarball with a known-good configuration that runs two Solr instances inside one Jetty container. I'm using Solr 1.2.0 and Jetty 6.1.3 respectively.

Hope this helps,
--matt

On Sep 11, 2007, at 11:52 AM, Jack L wrote:

I was going through some old emails on this topic. Rafael Rossini figured out how to run multiple indices on single instance of jetty but it has to be jetty plus. I guess jetty doesn't allow this? I suppose I can add additional jars and make it work but I haven't tried that. It'll always be much safer/simpler/less playing around if a feature is available out of box.

I'm mentioning this again because I really think it's a desirable feature, especially because each JVM uses a lot of memory and sometimes it's not possible to start a new jetty for each index due to memory limitation.

I understand I can use a type field and mix doc types but this is not ideal for two reasons:

1. it's easier to maintain separate indices. I can just wipe out all the files and re-post an individual index. Much less posting work to do as opposed to re-posting all docs. Or I can move one index to another partition, or even to another server to run separately in order to scale up. It'll be a problem (although solvable by deleting and re-posting) with a mixed index.

2. my understanding is that mixed index means larger index files and slower performance

JettyPlus's download links seem to be broken so I wasn't able to check its download size. If not too big, maybe JettyPlus is an option? If not, there should be a way to have this feature implemented on solr side? Maybe by prefixing the REST URLs with index names...

--
Thanks,
Jack

--
Matt Kangas / [EMAIL PROTECTED]
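If you trim a Jetty 6.1.3 install down to the jarfiles Matt lists in his reply, a quick check that none went missing might look like the sketch below. The paths come from the list in this thread; the function name missing_jars and the idea of passing in a Jetty home directory are assumptions made for illustration.

```python
from pathlib import Path

# Jarfiles from the Jetty 6.1.3 list in this thread, relative to the Jetty home.
REQUIRED_JARS = [
    "lib/jetty-6.1.3.jar",
    "lib/jetty-util-6.1.3.jar",
    "lib/jsp-2.1/ant-1.6.5.jar",
    "lib/jsp-2.1/core-3.1.1.jar",
    "lib/jsp-2.1/jsp-2.1.jar",
    "lib/jsp-2.1/jsp-api-2.1.jar",
    "lib/naming/jetty-naming-6.1.3.jar",
    "lib/plus/jetty-plus-6.1.3.jar",
    "lib/servlet-api-2.5-6.1.3.jar",
]

def missing_jars(jetty_home):
    """Return the listed jar paths that are not present under jetty_home."""
    home = Path(jetty_home)
    return [rel for rel in REQUIRED_JARS if not (home / rel).is_file()]
```

Calling missing_jars("/path/to/jetty-6.1.3") should give an empty list for a complete install. The path here is only a placeholder.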
Recipe: multiple webapps in Jetty 6
For anyone who's been watching SOLR-215 (Multiple Solr Cores), or otherwise has wanted to run multiple Solr instances in a single Jetty instance...

I've posted a new, improved recipe to http://wiki.apache.org/solr/SolrJetty (scroll to bottom).

I've also attached a tarball with a known-good config for Solr 1.2.0 and Jetty 6.1.3. It should be straightforward to define as many webapps as you need with this recipe.

Note: I'm pretty sure there is an even cleaner way to accomplish this too, without the need to fetch additional .jars and mess with JNDI, but I haven't fleshed out the details yet... will update the wiki if I get it working. :)

Cheers,
--Matt

--
Matt Kangas / [EMAIL PROTECTED]
Re: Please help! Solr 1.1 HTTP server stops responding
David,

If nothing on port 8983 responds, your servlet container is certainly the first thing that should be checked, because that is what's listening on port 8983.

First, we need to figure out what version of Jetty you're using and how it is started -- which will lead you to the log files, if it is producing any.

When Jetty/Solr is running correctly, try fetching any page from that host using curl -I. Example: here's what I see on my laptop, with Solr running inside Jetty:

  curl -I http://localhost:8983/
  HTTP/1.1 404 Not Found
  Content-Type: text/html; charset=iso-8859-1
  Content-Length: 1287
  Server: Jetty(6.1.3)

From this, I know it's Jetty 6.1.3.

Next, how is Jetty being started? Where is its jetty.xml configuration file? What does that file specify for RequestLog?

On my laptop, I'm manually starting it via java -jar start.jar. On my work hosts, java -jar start.jar is being run from daemontools (unlikely in your case). Or, Jetty can be invoked without the start.jar shortcut. That's just the default way of starting Solr.

My point is that I can't predict how it's started on your machine. You need to find out yourself. On Linux:

- ps -ef | grep java
- look at that list, see which java process is the relevant one
- take the parent PID of that process and run ps -p <value> to see what process started it
- repeat until you find the script or program that started Jetty, and the path to jetty.xml

If the process actually was java -jar start.jar, then look for an etc subdir in the current working directory for that process.

HTH,
--Matt Kangas
(stepping in to help with what seems to be a panicked-newbie question...)

On Jul 30, 2007, at 2:35 PM, David Whalen wrote:

Hi Yonik!

I'm glad to finally get to talk to you. We're all very impressed with solr and when it's running it's really great.

We increased the heap size to 1500M and that didn't seem to help. In fact, the crashes seem to occur more now than ever. We're constantly restarting solr just to get a response.
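For anyone who wants to script the curl -I check from Matt's reply, a minimal Python sketch follows. The function names are made up for illustration, and the host and port defaults simply mirror the localhost:8983 example in the thread. probe sends a HEAD request the way curl -I does; server_header pulls the Server line out of header text you already have.

```python
import http.client

def server_header(raw_headers):
    """Return the value of the Server header from raw header text, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None

def probe(host="localhost", port=8983, timeout=5):
    """Send a HEAD request for "/" and return (status code, Server header)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Server")
    finally:
        conn.close()

# Header text copied from the curl -I output quoted in this thread.
sample = """HTTP/1.1 404 Not Found
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1287
Server: Jetty(6.1.3)"""
print(server_header(sample))  # Jetty(6.1.3)
```

If the container is hung or down, probe will raise an exception (connection refused or a timeout) instead of returning, which is itself a useful signal that nothing is answering on the port.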
I don't know enough to know where the log files are to answer your question (again, I'm filling in for the guy that set us up with all this).

Can I ask for your patience so we can figure this out?

Thanks!

Dave W

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, July 30, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Please help! Solr 1.1 HTTP server stops responding

It may be related to the out-of-memory errors you were seeing. Severe errors like that should never be ignored. Do you see any other warning or severe errors in your logs?

-Yonik

On 7/30/07, David Whalen [EMAIL PROTECTED] wrote:

Guys:

Can anyone help me? Things are getting serious at my company and heads are going to roll. I need to figure out why solr just suddenly stops responding without any warning.

DW

-----Original Message-----
From: David Whalen [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:49 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr 1.1 HTTP server stops responding

We're using Jetty. I don't know what version though. To my knowledge, Solr is the only thing running inside it.

Yes, we cannot get to the admin pages either. Nothing on port 8983 responds.

So maybe it's actually Jetty that's messing me up? How can I make sure of that?

Thanks for the help!

DW

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.1 HTTP server stops responding

Solr runs as a webapp (think .war file) inside a servlet container (e.g. Tomcat, Jetty, Resin...). It could be that the servlet container itself has a bug that prevents it from responding properly after a while. If you have other webapps in the same container, do they still respond? Can you get to *any* of Solr's pages (e.g. admin page)? Anything in container or Solr logs?
Otis
--
Lucene Consulting - http://lucene-consulting.com/

----- Original Message -----
From: David Whalen [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, July 27, 2007 4:21:18 PM
Subject: RE: Solr 1.1 HTTP server stops responding

Hi Otis.

I'm filling-in for the guy that installed the software for us (now he's long gone), so I'm just getting familiar with all of this.

Can you elaborate on what you mean?

DW

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.1 HTTP server stops responding

Hi David,

Have you ruled out your servlet container as the source of this bug?

Otis

----- Original Message -----
From: David Whalen [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, July 27, 2007 3:06:42 PM
Subject: Solr 1.1 HTTP server stops responding

Hi All.

We're running Solr 1.1 and we're seeing intermittent cases where Solr