Re: Issue with 2WD and 4WD in query

2007-12-10 Thread Matt Kangas
Brendan, pull up your Solr Admin Analysis page and try running your  
queries through that. The output will tell you precisely how each  
analyzer affects your tokens on either the index or query side.


In my own quick test, WordDelimiterFilterFactory seems inclined to  
break 2WD into (2, WD).


(using org.apache.solr.analysis.WordDelimiterFilterFactory  
{catenateWords=1, catenateNumbers=1, catenateAll=0,  
generateNumberParts=1, generateWordParts=1})
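A deliberately simplified Python model of this splitting and catenation makes the behaviour easier to see. It is a sketch under stated assumptions, not Lucene's implementation. It splits only on non-alphanumerics, letter/digit boundaries and lowercase-to-uppercase boundaries. It handles ASCII only and ignores token positions and offsets. With the index-side settings, 2WD yields 2 and WD but no joined 2WD term, because each run contains a single part and catenateAll is 0.

```python
import re
from itertools import groupby
from typing import List

# Simplified split rule (an approximation of WordDelimiterFilter, ASCII only):
# break on non-alphanumerics, letter<->digit transitions, and
# lowercase->uppercase transitions ("PowerShot" -> "Power", "Shot").
_PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def word_delimit(token: str,
                 generate_word_parts: bool = True,
                 generate_number_parts: bool = True,
                 catenate_words: bool = False,
                 catenate_numbers: bool = False,
                 catenate_all: bool = False) -> List[str]:
    """Return the terms this simplified model emits for a single token."""
    parts = _PART.findall(token)
    out: List[str] = []
    for part in parts:
        is_number = part.isdigit()
        if (is_number and generate_number_parts) or (not is_number and generate_word_parts):
            out.append(part)
    # Join maximal runs of consecutive word parts, or of number parts.
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        if len(run) > 1 and ((is_number and catenate_numbers) or
                             (not is_number and catenate_words)):
            out.append("".join(run))
    # catenateAll joins every part into one extra term.
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    return out


print(word_delimit("2WD", catenate_words=True, catenate_numbers=True))  # ['2', 'WD']
print(word_delimit("2WD"))                                              # ['2', 'WD']
print(word_delimit("2WD", catenate_all=True))                           # ['2', 'WD', '2WD']
```

In this model, only catenateAll=1 (or not splitting at all) keeps a 2WD term. Whether the real filter behaves identically in every case is worth confirming on the Analysis page.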


--matt

On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:


Hi,

I hope you can help me. I'm having an odd problem with Solr. I have  
a field that represents a car. A car could have a name like  
Silverado, or something like Silverado 2WD to denote the  
2-wheel-drive version of the car. Anyway, all is well when I search  
the field for Silverado, but when I search for 2WD (regardless of  
case) nothing is returned. The same applies to Silverado 2WD, etc.  
I currently have the field defined as text,  
i.e.:


<field name="car_name" type="text" indexed="true" stored="true" />

But I've also tried defining my own (simpler) field with no luck.  
FYI my text field is defined like this:


   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <!-- This is supposed to remove HTML tags before indexing -->
       <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
       <!--
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       -->
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory"
               synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>

Any help?

Thanks!
Brendan


--
Matt Kangas / [EMAIL PROTECTED]




Re: Issue with 2WD and 4WD in query

2007-12-10 Thread Matt Kangas
I suppose you'll have to take WordDelimiterFilter out of your analysis  
chain, at least for that field. Or, perhaps toggling the  
generateNumberParts argument will have some effect? The API  
documentation should be your best resource here...
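The first suggestion, dropping WordDelimiterFilter for this field, can be illustrated with a minimal sketch of the reduced chain. The sketch assumes a hypothetical field type whose index and query analyzers are just a whitespace tokenizer followed by lowercasing, with no stop words, stemming or synonyms. It also treats a query as matching when all of its terms were indexed, which simplifies Solr's real query parsing and scoring.

```python
from typing import List, Set


def analyze(text: str) -> List[str]:
    """Reduced analysis chain (assumed): whitespace tokenizer, then lowercasing."""
    return [token.lower() for token in text.split()]


def matches(indexed_text: str, query: str) -> bool:
    """Simplified match: every query term must appear among the indexed terms."""
    indexed: Set[str] = set(analyze(indexed_text))
    return all(term in indexed for term in analyze(query))


print(analyze("Silverado 2WD"))         # ['silverado', '2wd']
print(matches("Silverado 2WD", "2wd"))  # True
print(matches("Silverado 2WD", "4WD"))  # False
```

The trade-off is that such a field no longer splits tokens like Wi-Fi. That makes limiting the change to car_name the more cautious option.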


--matt

On Dec 10, 2007, at 11:48 AM, Brendan Grainger wrote:


Hi Matt,

Thanks for the reply. I did what you suggested and got exactly the  
result you describe. Any ideas about how to make 2WD and 4WD terms  
on their own?


Thanks


--
Matt Kangas / [EMAIL PROTECTED]




Re: Distribution without SSH?

2007-11-29 Thread Matt Kangas
Your company's network policies seem to be a good thing. I've worked  
at places with this same policy, for good reason. But it does tend to  
complicate operations sometimes. Some options you might pursue:


* Set up ssh-agent on the clients and use passphrase-protected keys.  
The downside: someone on your ops team will inevitably be woken at  
4am to type in the passphrase.
* Try to get an exception to the policy by running Solr under a new  
user account inside a jail. Use a restricted login shell to make sure  
it can do only what you intend. That way, if the key is compromised,  
the damage is contained.


Or, write a custom server/client running on a different port. In this  
case you lose over-the-wire encryption, and if your server is buggy,  
you get pwn3d anyway.
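The restricted-login-shell option can be approximated with an OpenSSH forced command. A command="..." option on the key's authorized_keys line makes sshd run a fixed wrapper. That wrapper can read the client's requested command from the SSH_ORIGINAL_COMMAND environment variable. This swaps in a different technique (a forced-command wrapper rather than a restricted shell). The wrapper path and the allowlist below are hypothetical; fill them in with whatever your distribution scripts actually run on the master.

```python
#!/usr/bin/env python3
"""HYPOTHETICAL forced-command wrapper for a restricted distribution key.

Installed (path is an assumption) and referenced from the master's
~/.ssh/authorized_keys, for example:

  command="/usr/local/bin/solr-dist-wrapper",no-pty,no-port-forwarding ssh-rsa AAAA... solr-dist
"""
import os
import shlex
import sys
from typing import List, Optional

# HYPOTHETICAL allowlist of program names this key may run. Allowing a
# program allows any arguments to it, so check arguments too if needed.
ALLOWED_PROGRAMS = {"ls", "rsync"}


def parse_allowed(command: str) -> Optional[List[str]]:
    """Return argv for an allowed command, or None to refuse it."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return None
    return argv


def main() -> None:
    # sshd stores the command the client asked for in SSH_ORIGINAL_COMMAND;
    # it is absent for an interactive login, which is refused below.
    argv = parse_allowed(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        sys.stderr.write("command not permitted for this key\n")
        sys.exit(1)
    # Exec directly, without a shell, so ';', '|' or '&&' inside the request
    # reach the program as plain arguments instead of starting new commands.
    os.execvp(argv[0], argv)


# Only act when launched by sshd (SSH_CONNECTION is set for ssh sessions).
if __name__ == "__main__" and "SSH_CONNECTION" in os.environ:
    main()
```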


--Matt

On Nov 29, 2007, at 7:48 PM, Justin Knoll wrote:


Hello,
I recently set up Solr with distribution on a couple of servers. I  
just learned that our network policies do not permit us to use SSH  
with passphraseless keys, and the snappuller script uses SSH to  
examine the master Solr instance's state before it pulls the newest  
index via rsync.


We plan to attempt to rewrite the snappuller (and possibly other  
distribution scripts, as required) to eliminate this dependency on  
SSH. I thought I'd ask the list in case anyone has experience with  
this same situation or any insights into the reasoning behind  
requiring SSH access to the master instance.


Thanks,
Justin Knoll


--
Matt Kangas / [EMAIL PROTECTED]




Re: anyone can send me jetty-plus

2007-09-27 Thread Matt Kangas
If you're using Jetty 6, there's no need for a separate Jetty Plus  
download. The plus jarfiles come in the standard distribution.


--matt

On Sep 27, 2007, at 12:10 AM, James liu wrote:

I can't download it from http://jetty.mortbay.org/jetty5/plus/index.html


--
regards
jl


--
Matt Kangas / [EMAIL PROTECTED]




Re: Re[2]: multiple indices

2007-09-17 Thread Matt Kangas
Jack, the JNDI-enabling jarfiles now ship as part of the main .zip  
distribution. There is no need for a separate JettyPlus download as  
of Jetty 6.


I used Jetty 6.1.3 (http://dist.codehaus.org/jetty/jetty-6.1.x/jetty-6.1.3.zip)  
at the time, and I am using only these jarfiles from the main  
distribution. I stripped out everything else that seemed unnecessary  
for running Solr.


lib/jetty-6.1.3.jar
lib/jetty-util-6.1.3.jar
lib/jsp-2.1/ant-1.6.5.jar
lib/jsp-2.1/core-3.1.1.jar
lib/jsp-2.1/jsp-2.1.jar
lib/jsp-2.1/jsp-api-2.1.jar
lib/naming/jetty-naming-6.1.3.jar
lib/plus/jetty-plus-6.1.3.jar
lib/servlet-api-2.5-6.1.3.jar
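A small check can confirm that a stripped-down Jetty tree still contains everything in the list above. It assumes the paths are relative to the Jetty home directory, meaning the one holding start.jar. The version numbers are those of Jetty 6.1.3 as listed; another release would need them adjusted.

```python
import os
from typing import Iterable, List

# Jarfiles from the list above (Jetty 6.1.3), relative to the Jetty home
# directory, i.e. the directory that contains start.jar (an assumption).
REQUIRED_JARS = [
    "lib/jetty-6.1.3.jar",
    "lib/jetty-util-6.1.3.jar",
    "lib/jsp-2.1/ant-1.6.5.jar",
    "lib/jsp-2.1/core-3.1.1.jar",
    "lib/jsp-2.1/jsp-2.1.jar",
    "lib/jsp-2.1/jsp-api-2.1.jar",
    "lib/naming/jetty-naming-6.1.3.jar",
    "lib/plus/jetty-plus-6.1.3.jar",
    "lib/servlet-api-2.5-6.1.3.jar",
]


def missing_jars(jetty_home: str, required: Iterable[str] = REQUIRED_JARS) -> List[str]:
    """Return the entries of `required` that are not files under jetty_home."""
    return [rel for rel in required
            if not os.path.isfile(os.path.join(jetty_home, *rel.split("/")))]
```

For example, missing_jars("/opt/jetty-6.1.3") (a hypothetical install path) returns an empty list when the trimmed tree is complete.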

--Matt

On Sep 13, 2007, at 11:44 AM, Jack L wrote:


Thanks Matt, I'll give it a try! So this requires JettyPlus?

--
Best regards,
Jack

Wednesday, September 12, 2007, 5:14:32 AM, you wrote:


Jack, I've posted a complete recipe for running two Solr indices
within one Jetty 6 container:



http://wiki.apache.org/solr/SolrJetty



Scroll down to the part that says:

(7/2007 MattKangas) The recipe above didn't work for me with Jetty
6.1.3.

...

I'm glossing over a lot of details, so attached is a tarball with a
known-good configuration that runs two Solr instances inside one
Jetty container. I'm using Solr 1.2.0 and Jetty 6.1.3 respectively.





Hope this helps,
--matt



On Sep 11, 2007, at 11:52 AM, Jack L wrote:



I was going through some old emails on this topic. Rafael Rossini
figured out how to run multiple indices on a single instance of Jetty,
but it has to be Jetty Plus. I guess plain Jetty doesn't allow this?
I suppose I could add the additional jars and make it work, but I
haven't tried that. It'll always be much safer/simpler/less playing
around if a feature is available out of the box.

I'm mentioning this again because I really think it's a desirable
feature, especially because each JVM uses a lot of memory and
sometimes it's not possible to start a new Jetty for each index due
to memory limitations.

I understand I can use a type field and mix doc types, but this is
not ideal for two reasons:

1. It's easier to maintain separate indices. I can just wipe out all
the files and re-post an individual index, which is much less posting
work than re-posting all docs. Or I can move one index to another
partition, or even to another server to run separately in order to
scale up. That would be a problem (although solvable by deleting and
re-posting) with a mixed index.

2. My understanding is that a mixed index means larger index files and
slower performance.

JettyPlus's download links seem to be broken, so I wasn't able to
check its download size. If it's not too big, maybe JettyPlus is an
option? If not, maybe there should be a way to implement this feature
on the Solr side? Perhaps by prefixing the REST URLs with index names...

--
Thanks,
Jack









--
Matt Kangas / [EMAIL PROTECTED]




Recipe: multiple webapps in Jetty 6

2007-07-31 Thread Matt Kangas
For anyone who's been watching SOLR-215 (Multiple Solr Cores), or  
otherwise has wanted to run multiple Solr instances in a single Jetty  
instance... I've posted a new, improved recipe to  
http://wiki.apache.org/solr/SolrJetty (scroll to bottom)


I've also attached a tarball with a known-good config for Solr 1.2.0  
and Jetty 6.1.3. It should be straightforward to define as many webapps  
as you need with this recipe.


Note: I'm pretty sure there is an even cleaner way to accomplish this  
too, without needing to fetch additional .jars or mess with JNDI, but  
I haven't fleshed out the details yet... will update the wiki if I  
get it working. :)


Cheers,
--Matt

--
Matt Kangas / [EMAIL PROTECTED]




Re: Please help! Solr 1.1 HTTP server stops responding

2007-07-30 Thread Matt Kangas

David,

If nothing on port 8983 responds, your servlet container is  
certainly the first thing that should be checked, because that is  
what's listening on port 8983.


First, we need to figure out what version of Jetty you're using  
and how it is started, which will lead you to its log files, if it  
is producing any. When Jetty/Solr is running correctly, try fetching  
any page from that host using curl -I.


Example: here's what I see on my laptop, with Solr running inside Jetty


shaft:R curl -I http://localhost:8983/
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1287
Server: Jetty(6.1.3)


From this, I know it's Jetty 6.1.3.
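The same check can be scripted. This sketch uses Python's standard http.client to send a HEAD request (the equivalent of curl -I) and report the status and Server header. The host and port are assumptions: use whatever your container listens on, with 8983 being the Solr example default.

```python
import http.client
from typing import Optional, Tuple


def probe(host: str = "localhost", port: int = 8983, path: str = "/",
          timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request; return (status code, Server header or None).

    Raises OSError (e.g. ConnectionRefusedError or a timeout) if nothing
    answers, which itself suggests the container is down or hung.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Server")
    finally:
        conn.close()
```

Against an instance like the one above, probe() should report status 404 with a Server header of Jetty(6.1.3). A refused connection or timeout instead points first at the container rather than at Solr.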

Next, how is Jetty being started? Where is its jetty.xml  
configuration file? What does that file specify for RequestLog?


On my laptop, I'm manually starting it via java -jar start.jar.
On my work hosts, java -jar start.jar is run from daemontools  
(unlikely in your case).
Or, Jetty can be invoked without the start.jar shortcut. That's  
just the default way of starting Solr.


My point is that I can't predict how it's started on your machine.  
You need to find out yourself.

On Linux:
- ps -ef | grep java
- look at that list, see which java process is the relevant one
- take the parent PID (the PPID column) of that process and run  
ps -p <PPID> to see what process started it
- repeat until you find the script or program that started Jetty, and  
the path to jetty.xml


If the process actually was java -jar start.jar, then look for an  
etc subdir in the current working directory for that process.
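The ps-based steps above can also be scripted on Linux by reading /proc directly. This sketch assumes a Linux /proc filesystem. The function names are illustrative, and reading another user's working directory usually requires root.

```python
import os
from typing import List, Optional, Tuple

# Linux-only: everything here reads the /proc filesystem.


def read_cmdline(pid: int) -> str:
    """Command line of pid, with its NUL-separated arguments joined by spaces."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        args = f.read().split(b"\0")
    return " ".join(a.decode(errors="replace") for a in args if a)


def parent_pid(pid: int) -> int:
    """Parent PID, parsed from /proc/<pid>/stat ("pid (comm) state ppid ...")."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        stat = f.read()
    # comm may contain spaces or ')', so split after the LAST ')'.
    return int(stat[stat.rindex(")") + 1:].split()[1])


def working_dir(pid: int) -> Optional[str]:
    """Current working directory of pid, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/cwd")
    except OSError:  # typically PermissionError for another user's process
        return None


def ancestry(pid: int) -> List[Tuple[int, str, Optional[str]]]:
    """(pid, command line, cwd) for pid and each ancestor, up to init."""
    chain: List[Tuple[int, str, Optional[str]]] = []
    while pid > 0:
        try:
            chain.append((pid, read_cmdline(pid), working_dir(pid)))
            pid = parent_pid(pid)
        except (FileNotFoundError, ProcessLookupError):  # process exited meanwhile
            break
    return chain


def start_jar_pids() -> List[int]:
    """PIDs whose command line mentions both java and start.jar."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            cmd = read_cmdline(int(entry))
        except OSError:
            continue
        if "java" in cmd and "start.jar" in cmd:
            pids.append(int(entry))
    return pids
```

Calling start_jar_pids() finds candidate JVMs, and ancestry() on each one shows what launched it. If the first entry's cwd is readable, its etc subdirectory is the place to look; in a stock Jetty 6 layout it holds jetty.xml.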


HTH,
--Matt Kangas

(stepping in to help with what seems to be a panicked-newbie  
question...)




On Jul 30, 2007, at 2:35 PM, David Whalen wrote:


Hi Yonik!

I'm glad to finally get to talk to you.  We're all very impressed
with solr and when it's running it's really great.

We increased the heap size to 1500M and that didn't seem to help.
In fact, the crashes seem to occur more now than ever.  We're
constantly restarting solr just to get a response.

I don't know enough to know where the log files are to answer
your question (again, I'm filling in for the guy that set us
up with all this).  Can I ask for your patience so we can figure
this out?

Thanks!

Dave W



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Yonik Seeley
Sent: Monday, July 30, 2007 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Please help! Solr 1.1 HTTP server stops responding

It may be related to the out-of-memory errors you were seeing.
Severe errors like that should never be ignored.
Do you see any other warnings or severe errors in your logs?
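One quick way to answer that question is to scan the container's log for severe entries. This minimal sketch assumes java.util.logging-style level prefixes such as SEVERE: and WARNING:, and also flags any OutOfMemoryError. The exact log format depends on how the container's logging is configured, so the pattern may need adjusting.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers; adjust to match your container's log format.
_PATTERN = re.compile(r"\b(SEVERE|WARNING)\b|OutOfMemoryError")


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line matching _PATTERN."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, start=1)
            if _PATTERN.search(line)]
```

Passing an open log file, as in scan_log(open(path)), lists every flagged line with its line number.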

-Yonik

On 7/30/07, David Whalen [EMAIL PROTECTED] wrote:

Guys:

Can anyone help me?  Things are getting serious at my company and
heads are going to roll.

I need to figure out why Solr just suddenly stops responding
without any warning.

DW



-----Original Message-----
From: David Whalen [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:49 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr 1.1 HTTP server stops responding

We're using Jetty.  I don't know what version though.  To my
knowledge, Solr is the only thing running inside it.

Yes, we cannot get to the admin pages either.  Nothing on port
8983 responds.

So maybe it's actually Jetty that's messing me up?  How can I make
sure of that?

Thanks for the help!

DW



-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.1 HTTP server stops responding

Solr runs as a webapp (think .war file) inside a servlet container
(e.g. Tomcat, Jetty, Resin...).  It could be that the servlet container
itself has a bug that prevents it from responding properly after a
while.  If you have other webapps in the same container, do they still
respond?  Can you get to *any* of Solr's pages (e.g. the admin page)?
Anything in the container or Solr logs?

Otis
--
Lucene Consulting - http://lucene-consulting.com/



----- Original Message ----
From: David Whalen [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, July 27, 2007 4:21:18 PM
Subject: RE: Solr 1.1 HTTP server stops responding

Hi Otis.

I'm filling in for the guy who installed the software for us (now
he's long gone), so I'm just getting familiar with all of this.  Can
you elaborate on what you mean?

DW



-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Friday, July 27, 2007 10:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.1 HTTP server stops responding

Hi David,

Have you ruled out your servlet container as the source of this bug?


Otis


----- Original Message ----
From: David Whalen [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, July 27, 2007 3:06:42 PM
Subject: Solr 1.1 HTTP server stops responding

Hi All.

We're running Solr 1.1 and we're seeing intermittent cases where Solr