RE: Indexing process causes Tomcat to stop working

2004-10-28 Thread James Tyrrell
>From: "Armbrust, Daniel C." <[EMAIL PROTECTED]>

Right, I got back to work with a newly created index to try these ideas.

>So, are you creating the indexes from inside the tomcat runtime, or are you
>creating them on the command line (which would be in a different runtime
>than tomcat)?

I'm creating them on the command line using a variation on the standard one
shown in the demo (it has some additional optimisation input that is set to
default until I can fix this bug).

>What happens to tomcat?  Does it hang - still running but not responsive?
>Or does it crash?
>If it hangs, maybe you are running out of memory.  By default, Tomcat's
>limit is set pretty low...

It definitely hangs: once shut down you can't access it, and when restarted it
just sits there trying to access port 8080.
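
A restarted Tomcat that sits waiting on port 8080 is consistent with an old JVM still holding that port. The following is only a minimal sketch for checking this from Java; the class name PortCheck and the method isPortFree are made up for illustration and are not part of Tomcat or Lucene.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {
    /** Returns true if a server socket can be bound to the port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically a BindException: some process is already listening here.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8080 is the port mentioned in the thread; pass another one as an argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " is in use"));
    }
}
```

If the port is reported as in use after shutdown.sh has run, a leftover java process is the first thing to look for.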

>There is no reason at all you should have to reboot... If you stop and
>start tomcat, (make sure it actually stopped - sometimes it requires a
>kill -9 when it really gets hung) it should start working again.
>Depending on your setup of Tomcat + apache, you may have to restart apache
>as well to get them linked to each other again...

Good news: this did work. However, I never see tomcat in top or even using ps
-A | grep tomcat; the only way I've found tomcat is using ps -auwx | grep
tomcat. The output is

*after tomcat shutdown.sh run*
---
root  2266  0.0  3.8 243740 4860 pts/0   S    Oct26   0:36 
/opt/jdk1.4/bin/java -Djava.endorsed.dirs=/opt/tomcat/common/endorsed 
-classpath 
/opt/jdk1.4/lib/tools.jar:/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/commons-logging-api.jar 
-Dcatalina.base=/opt/tomcat -Dcatalina.home=/opt/tomcat 
-Djava.io.tmpdir=/opt/to
root 16050  0.0  0.4  3576  620 pts/0   S    08:41   0:00 grep tomcat
--

I did however find two java processes running, so I dutifully used kill -9 
on both PIDs and, hey presto, when I restarted Tomcat it ran perfectly. So while 
I think I can work around this, I guess the question now becomes: does 
anyone have any advice as to what could be causing this? Bear in mind I 
can still run java processes (even create new indexes) on the same machine, 
so it is just Tomcat that's affected.

Meanwhile, I will try as Dan suggested to raise the default memory of Tomcat 
significantly and run another index (it seems a likely culprit).
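
When raising the memory limit, it helps to confirm that the new value actually reached the JVM. This is a minimal sketch that uses only the standard Runtime API; the class name HeapInfo is made up for illustration. Run from a JSP or servlet inside Tomcat, the same calls report the limit Tomcat is really running with.

```java
class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("max heap   : " + maxHeapMegabytes() + " MB");
        System.out.println("total heap : " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap  : " + rt.freeMemory() / mb + " MB");
    }
}
```

Starting it with, for example, java -Xmx256m HeapInfo should show a correspondingly larger maximum than a run without the flag.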

Thanks for all the help thus far, it's more than appreciated. Regards,
JT

>-----Original Message-----
>From: James Tyrrell [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, October 27, 2004 10:49 AM
>To: [EMAIL PROTECTED]
>Subject: RE: Indexing process causes Tomcat to stop working
>
>Aad,
>   D'oh forgot to mention that mildly important info. Rather than
>re-index I am just creating a new index each time, this makes things easier
>to roll-back etc (which is what my boss wants). the command line is
>something like  I
>have wondered about whether sessions could be a problem, but I don't think
>so, otherwise wouldn't a restart of Tomcat be sufficient rather than a
>reboot? I even tried the killall command on java & tomcat then started
>everything again to no avail.
>
>cheers,
>
>JT

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Locks and Readers and Writers

2004-10-28 Thread Christoph Kiehl
[EMAIL PROTECTED] wrote:
> I'm getting:
> java.io.IOException: Lock obtain timed out
> I have a writer service that opens the index to delete and add docs.  I have a
> reader service that opens the index for searching only.

AFAIK you should never open an IndexWriter and an IndexReader at the 
same time. You should use only one of them at a time but you may open as 
many IndexSearchers as you like for searching.

Regards,
Christoph
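
To make the "Lock obtain timed out" message more concrete, here is a simplified model of a lock file with a timeout. It is only a sketch of the general technique and is not Lucene's actual locking code; the class SimpleLock, its methods, and the file name demo-write.lock are all made up for illustration.

```java
import java.io.File;
import java.io.IOException;

class SimpleLock {
    private final File lockFile;

    SimpleLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Polls until the lock file can be created or the timeout expires. */
    boolean obtain(long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        // createNewFile is atomic: it returns true only for the caller that created the file.
        while (!lockFile.createNewFile()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // a caller would report a lock timeout at this point
            }
            Thread.sleep(50);
        }
        return true;
    }

    void release() {
        lockFile.delete();
    }

    public static void main(String[] args) throws Exception {
        File f = new File(System.getProperty("java.io.tmpdir"), "demo-write.lock");
        f.delete(); // clear any stale lock left by an earlier run
        SimpleLock first = new SimpleLock(f);
        SimpleLock second = new SimpleLock(f);
        System.out.println("first obtains : " + first.obtain(200));
        System.out.println("second obtains: " + second.obtain(200));
        first.release();
        System.out.println("second retries: " + second.obtain(200));
        second.release();
    }
}
```

In this model a second modifier can only proceed once the first has released the lock, and a lock file left behind by a killed process blocks everyone until it is deleted by hand.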


RE: Indexing process causes Tomcat to stop working

2004-10-28 Thread iouli . golovatyi
before screwing tomcat too much...

1. make sure both indexing and reading processes use the same locking 
directory (i.e. set it explicitly; take a look in the wiki for how to)
2. try to execute queries from the command line and see what happens
3. in case your queries use sorting, there is a memory leak in 1.4.1 -> 
upgrade to 1.4.2

Regards,
J.

Re: Locks and Readers and Writers

2004-10-28 Thread Morus Walter
Christoph Kiehl writes:
> 
> AFAIK you should never open an IndexWriter and an IndexReader at the 
> same time. You should use only one of them at a time but you may open as 
> many IndexSearchers as you like for searching.
> 
You cannot open an IndexSearcher without opening an IndexReader (explicitly
or implicitly).

Morus




Search.jhtml ?

2004-10-28 Thread Willy De Waele

Hi,

I'm new to Lucene.
I downloaded Lucene 1.4.2 and added the 2 jar files to the
classpath.
Executing the demos as a bat file (Windows) works fine, but
using Lucene as a web 'application' is not working ...
Since I'm using NetBeans, I start Tomcat 5.0.28 with the following
statements in the conf/Catalina/localhost/lucene.xml:



where the 'docBase' is the main path to the lucene dirs.
Then I call the 'Search.html' in the src/demo dir, with the action modified
because of the path:
action=http://localhost:8080/lucene/src/demo/Search.jhtml
After entering a search argument (e.g. Java) I get the contents of the
jhtml file.
After replacing the 'queryString' with 'java', I get the following error
message:
"The requested resource (/lucene/src/demo/servletPath) is
not available"

I'm sure that I did something wrong, but where?
TIA
Willy






RE: Indexing process causes Tomcat to stop working

2004-10-28 Thread James Tyrrell

>From: [EMAIL PROTECTED]

Hello!

>before screwing tomcat too much...

A little late, but probably good advice; thankfully it hasn't gone wrong.

>1. make sure both indexing and reading processes use the same locking
>directory (i.e. set it explicitly; take a look in the wiki for how to)

I'm working on this, but I'm not so good at Java yet (until recently I mostly
worked in PHP). I looked at the wiki how-tos; could you be more specific, as I
couldn't find much on locking directories? But I will struggle on.

>2. try to execute queries from the command line and see what happens

I only execute from the command line, so all the info in previous posts
is what happens.

>3. in case your queries use sorting, there is a memory leak in 1.4.1 ->
>upgrade to 1.4.2

My queries do use sorting! So I have placed the 1.4 final jar onto my
classpath and have started 'another' index. As the company I work for is
moving home tomorrow, I may not be able to tell you if that worked till next
week, mind.

To Dan: the increased memory allocation for Tomcat didn't work, unfortunately,
but I do know a lot more about CATALINA_OPTS and Tomcat now, which has proved
handy for other things.

Cheers for all the advice, people. I will keep you posted if I make a
breakthrough.
Thanks for your patience. Regards,

JT

RE: Indexing process causes Tomcat to stop working

2004-10-28 Thread Armbrust, Daniel C.
You want version 1.4.2, not version 1.4.

The website makes it hard to find 1.4.2, because the mirrors have not been updated yet.

Get 1.4.2 here:  http://cvs.apache.org/dist/jakarta/lucene/v1.4.2/
 

>My queries do use sorting! So I have placed the 1.4 final jar onto my 
>classpath and have started 'another' index, as the company I work for is 
>moving home tomorrow may not be able to tell you if that worked till next 
>week mind.





RE: Indexing process causes Tomcat to stop working

2004-10-28 Thread iouli . golovatyi
>From: [EMAIL PROTECTED]
>Hello!
>
>>before screwing tomcat too much...
>
>A little late, but probably good advice; thankfully it hasn't gone wrong.
>
>>1. make sure both indexing and reading processes use the same locking
>>directory (i.e. set it explicitly; take a look in the wiki for how to)
>
>I'm working on this, but I'm not so good at Java yet (until recently I mostly
>worked in PHP). I looked at the wiki how-tos; could you be more specific, as I
>couldn't find much on locking directories? But I will struggle on.

Here is the simplest way to do it:
System.setProperty("java.io.tmpdir",tmpdir);
Just drop this line into the indexing and querying modules. After that you'll
see something like lucene-bla-bla.lock files. In case the process is stumbling,
you can delete them manually.
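
The one-liner above can be wrapped so that both programs share it. This is a minimal sketch, assuming the lock directory is taken from java.io.tmpdir as the advice implies; the class LockDirSetup, the method useLockDir, and the directory name lucene-locks are made up for illustration. The call is placed before any index is opened, on the assumption that the property is read when the index classes are first used.

```java
import java.io.File;

class LockDirSetup {
    /** Points java.io.tmpdir at the given directory and returns the resolved path. */
    static String useLockDir(String dir) {
        File d = new File(dir);
        d.mkdirs(); // make sure the directory exists
        System.setProperty("java.io.tmpdir", d.getAbsolutePath());
        return System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        // Hypothetical path: use the same one in the indexer and in the web application.
        String dir = args.length > 0 ? args[0] : "lucene-locks";
        System.out.println("lock files expected under: " + useLockDir(dir));
        // ... open the index for writing or searching after this point ...
    }
}
```

Note that the ps output earlier in the thread shows Tomcat being started with its own -Djava.io.tmpdir setting, so a command-line indexer that relies on the JVM default may well be looking in a different directory.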

>>2. try to execute queries from the command line and see what happens
>
>I only execute from the command line, so all the info in previous posts
>is what happens

Wait a minute, indexing is from the command line but querying is from Tomcat, or
what?!

>>3. in case your queries use sorting, there is a memory leak in 1.4.1 ->
>>upgrade to 1.4.2
>
>My queries do use sorting!

Congratulations! This is definitely an issue - the server will be out of memory
and probably out of open file descriptors pretty quickly, so you are halfway
to success. :)

>So I have placed the 1.4 final jar onto my
>classpath and have started 'another' index, as the company I work for is
>moving home tomorrow may not be able to tell you if that worked till next
>week mind.
>
>To Dan, the increased memory allocation for Tomcat didn't work unfortunately
>but I do know a lot more about catalina_opt and Tomcat now which has proved
>handy for other things.
>
>cheers for all the advice people will keep you posted if I make a
>breakthrough
>thanks for your patience, regards,
>
>JT

RE: Searchable Solutions Please

2004-10-28 Thread gwithers
A quick pointer..

What you want to look at is using a stemming implementation.  Look, for
example, at the FAQ and docs related to the PorterStemFilter and writing
a custom analyzer
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q17).

There is a lot of information regarding this, but you'll need the same
analyzer for index and query, and this would be more or less English only.

-George
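
To illustrate why the same analysis has to run at index time and at query time, here is a toy suffix stripper. It is deliberately crude and is NOT the Porter algorithm or any Lucene class; ToyStemmer, stem and analyze are made-up names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ToyStemmer {
    /** Crude English suffix stripping; a stand-in for a real stemmer. */
    static String stem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("'s")) {
            w = w.substring(0, w.length() - 2);   // kid's -> kid
        } else if (w.endsWith("ches") || w.endsWith("shes")
                || w.endsWith("sses") || w.endsWith("xes")) {
            w = w.substring(0, w.length() - 2);   // watches -> watch
        } else if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);   // kids -> kid
        }
        return w;
    }

    /** Splits on whitespace and stems every token. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(stem(token));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("kids watches"));
        System.out.println(analyze("kid's watch"));
    }
}
```

Both inputs reduce to the same two terms, which is what lets a query for one form hit documents containing the other. A match on "junior watches" is a different problem: stemming alone does not relate "junior" to "kid", so that would need some form of synonym handling.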

> -----Original Message-----
> From: Karthik N S [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 28, 2004 1:47 AM
> To: LUCENE
> Subject: Searchable Solutions Please
> 
> Hi Guys
> 
> Apologies
> 
> When using the Lucene search, how can hits like the following be
> acquired?
> 
> Search Word =' kids watches '
> Hits on docs  returned should have =kid's , kid watch , junior watches
> 
> Solutions please
> 
> Thx in advance
> 
>   WITH WARM REGARDS
>   HAVE A NICE DAY
>   [ N.S.KARTHIK]


Searching for a phrase that contains quote character

2004-10-28 Thread Will Allen

I am having this same problem, but cannot find any help!

I have a keyword field that sometimes includes double quotes, but I am unable to 
search for that field because the escape for a quote doesn't work!

I have tried a number of things:

myfield:"lucene is \"cool\""

AND

myfield:"lucene is \\"cool\\""


http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-[EMAIL PROTECTED]&msgNo=7351

From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Subject: Searching for a phrase that contains quote character
Date: Wed, 24 Mar 2004 21:25:16 +

I'd like to search for a phrase that contains the quote character. I've tried 
escaping the quote character, but am receiving a ParseException from the 
QueryParser:

For example to search for the phrase:

 this is a "test"

I'm trying the following

 QueryParser.parse("field:\"This is a \\\"test\\\"\"", "field", new 
StandardAnalyzer());

This results in:

org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 31.  
Encountered:  after : ""
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:111)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
...

What is the proper way to accomplish this?

--Dan
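
Two layers of escaping are involved in the call above: the Java compiler's and the query syntax's. This small sketch prints the text that the parser actually receives for the literal quoted in the message; it makes no claim about how the query parser then treats the backslashes. QueryEscapes and count are made-up names for illustration.

```java
class QueryEscapes {
    // The literal from the quoted message, exactly as typed in Java source.
    static final String QUERY = "field:\"This is a \\\"test\\\"\"";

    /** Counts occurrences of a character in a string. */
    static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(QUERY); // the text handed to the parser
        System.out.println("quotes: " + count(QUERY, '"')
                + ", backslashes: " + count(QUERY, '\\'));
    }
}
```

After Java has processed the literal, the parser sees four double quotes and two backslashes, so whether the search works depends entirely on how the query syntax handles a backslash inside a quoted phrase.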




Re: Searching for a phrase that contains quote character

2004-10-28 Thread Justin Swanhart
Have you tried making a term query by hand and testing to see if it works?  

Term t = new Term("field", "this is a \"test\"");
PhraseQuery pq = new PhraseQuery(t);
...






Re: Negative boosting?

2004-10-28 Thread Jason Haruska
Hi Terry,

I know this is an old message on the list but it does not look like
anyone responded to your request. I had to do negative boosting for my
search functionality as well so I'd like to share the modification to
QueryParser.jj to make it work. Find your <Boost> TOKEN and change it
to:

<Boost> TOKEN : {
<NUMBER: ("-")? (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? > : DEFAULT
}


This simply allows an optional negative sign in front of a boost
value. After stepping through the program I've seen the
negative value carry through to the weight calculation.
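
The effect of allowing an optional minus sign in front of a boost value can be tried out with a plain regular expression. This is only an approximation of the grammar change, assuming that a number is one or more digits with an optional fractional part; BoostNumber and isBoostValue are made-up names and are not part of the query parser.

```java
import java.util.regex.Pattern;

class BoostNumber {
    // Optional "-", one or more digits, optional "." followed by digits.
    static final Pattern NUMBER = Pattern.compile("-?[0-9]+(\\.[0-9]+)?");

    /** True if the whole string is an acceptable boost value under this pattern. */
    static boolean isBoostValue(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2", "0.5", "-0.5", "-", "1."}) {
            System.out.println(s + " -> " + isBoostValue(s));
        }
    }
}
```

Values that were valid before the change still match, which fits the expectation that existing queries without negative numbers behave as they did.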

Hope this helps.

--- Start Quote 
I've often found the use of query-based boosting to be very
beneficial.  This is particularly so when it's easy to identify the
term that I want to stand out as a primary selector.

However, I've come across quite a few other cases where it would be
easier (and more logical) to apply a negative boost - to de-emphasize
the match when the term is present.

Is it possible to apply a negative boost (It doesn't seem to work),
and if not, would it break anything significant [sic] if that were added?

Regards,

Terry




Re: Searching for a phrase that contains quote character

2004-10-28 Thread Erik Hatcher
On Oct 28, 2004, at 1:03 PM, Justin Swanhart wrote:
> Have you tried making a term query by hand and testing to see if it
> works?
>
> Term t = new Term("field", "this is a \"test\"");
> PhraseQuery pq = new PhraseQuery(t);

That's not accurate API, but even had you used pq.add(t), it still would
presume that text is all a single term.

Chances are, though, that even getting the query to have the quotes is  
not going to work as you've probably lost the quotes during indexing.   
Check out the AnalysisParalysis page on the wiki and "analyze" your  
Analyzer and make sure you are indexing the text with the quotes (no  
built-in analyzer besides WhitespaceAnalyzer would do that for you).

Erik



RE: Searching for a phrase that contains quote character

2004-10-28 Thread Will Allen
I am using a NullAnalyzer for this field.  




Re: Searching for a phrase that contains quote character

2004-10-28 Thread Daniel Naber
On Thursday 28 October 2004 19:03, Justin Swanhart wrote:

> Have you tried making a term query by hand and testing to see if it
> works?
>
> Term t = new Term("field", "this is a \"test\"");
> PhraseQuery pq = new PhraseQuery(t);

That's not a proper PhraseQuery; it searches for *one* 
term >this is a "test"<, which is probably not what one wants. You 
have to add the terms one by one to a PhraseQuery.
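
Adding the terms one by one starts with splitting the phrase into the pieces that will become individual terms. The sketch below does only that splitting, on whitespace, and keeps the Lucene call in a comment so that it runs without the library; PhraseTerms and termsOf are made-up names, and the split is assumed to mirror how the field was tokenized at index time.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseTerms {
    /** Splits a phrase on whitespace; each piece would become one Term. */
    static List<String> termsOf(String phrase) {
        List<String> terms = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = termsOf("this is a \"test\"");
        // With Lucene on the classpath, the loop body would be roughly:
        //   pq.add(new Term("field", word));   // pq being a PhraseQuery
        for (String word : terms) {
            System.out.println("term: " + word);
        }
    }
}
```

The last term keeps its quote characters here. Whether the index contains a matching term depends on the analyzer used when the document was added, which is the point raised elsewhere in this thread.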

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: Search.jhtml ?

2004-10-28 Thread Daniel Naber
On Thursday 28 October 2004 15:01, Willy De Waele wrote:

> Executing the demos as a bat file (Windows) is working fine, but
> using lucene as a web 'application' is not working ...

I think that Search.jhtml is totally outdated, please try src/jsp instead.

Regards
 Daniel

-- 
http://www.danielnaber.de




Searching against index in memory

2004-10-28 Thread Ravi
If I have a document set of 10,000 docs and my merge factor is 1000, for
every 1000 documents, Lucene creates a new segment. By the time Lucene
indexes 4500 documents, index will have 4000 documents on the disk and
index for 500 documents is stored in memory. How can I search against
this index at the same time from a different JVM? I can access the 4000
docs on the disk. But what about those in the memory on the indexing
box? Is there a way to do this? 

Thanks
Ravi. 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Lots Of Interest in Lucene Desktop

2004-10-28 Thread Kevin A. Burton
I've made a few passive mentions of my Lucene Desktop prototype here on 
PeerFear in the last few days and I'm amazed how much feedback I've had. 
People really want to start work on an Open Source desktop search based 
on Lucene.


http://www.peerfear.org/rss/permalink/2004/10/28/LotsOfInterestInLuceneDesktop/
--
Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
   
Kevin A. Burton, Location - San Francisco, CA
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412



Re: Negative boosting?

2004-10-28 Thread Jason Haruska
You'll have to run tests but it shouldn't. All it does is change the
NUMBER token to accept an optional "-" in front of a number. So,
existing queries with no negative numbers should not be impacted.
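
As a rough illustration of why existing queries should be unaffected, the
sketch below models the boost number as a regular expression, with and
without the optional minus sign. The patterns are an approximation written
with java.util.regex, not the JavaCC grammar itself, and the class name
BoostNumber is made up for this example.

```java
import java.util.regex.Pattern;

// Approximation of the NUMBER token as a regex; not the JavaCC grammar.
final class BoostNumber {
    // Original form: digits with an optional fractional part.
    static final Pattern ORIGINAL = Pattern.compile("[0-9]+(\\.[0-9]+)?");
    // Modified form: the same, preceded by an optional "-".
    static final Pattern MODIFIED = Pattern.compile("-?[0-9]+(\\.[0-9]+)?");

    static boolean acceptedBefore(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean acceptedAfter(String s) {
        return MODIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2", "0.5", "-2", "-0.5", "abc"}) {
            System.out.println(s + " before=" + acceptedBefore(s)
                    + " after=" + acceptedAfter(s));
        }
    }
}
```

Every string the original pattern accepts is still accepted by the modified
one; only inputs with a leading "-" change from rejected to accepted.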


On Thu, 28 Oct 2004 13:50:47 -0400, Terry Steichen <[EMAIL PROTECTED]> wrote:
>  
> Jason, 
>   
> Thanks for the info.  I'll try it. 
>   
> Do you know if the change you describe has any impact on existing queries
> (other than to support negative boosting, if that's specified)?  In other
> words, if I apply this change to existing queries, will I likely see any
> changes in results because of the change? 
>   
> Regards, 
>   
> Terry
> 
>  
>  
> - Original Message - 
> From: Jason Haruska 
> To: Lucene Users List 
> Sent: Thursday, October 28, 2004 1:23 PM 
> Subject: Re: Negative boosting? 
> 
> Hi Terry,
> 
> I know this is an old message on the list but it does not look like
> anyone responded to your request. I had to do negative boosting for my
> search functionality as well so I'd like to share the modification to
> QueryParser.jj to make it work. Find your <Boost> TOKEN and change it
> to:
> 
> <Boost> TOKEN : {
> <NUMBER:    ("-")? (<_NUM_CHAR>)+ ( "." (<_NUM_CHAR>)+ )? > : DEFAULT
> }
> 
> 
> This simply allows an optional negative sign in front of a boost
> value. After stepping through the program I've seen the
> negative value carry through to the weight calculation.
> 
> Hope this helps.
> 
> --- Start Quote 
> I've often found the use of query-based boosting to be very
> beneficial.  This is particularly so when it's easy to identify the
> term that I want to stand out as a primary selector.
> 
> However, I've come across quite a few other cases where it would be
> easier (and more logical) to apply a negative boost - to de-emphasize
> the match when the term is present.
> 
> Is it possible to apply a negative boost (It doesn't seem to work),
> and if not, would it break anything significant [sic] if that were added?
> 
> Regards,
> 
> Terry
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for a phrase that contains quote character

2004-10-28 Thread Justin Swanhart
absolutely correct.  sorry about that.  shouldn't code before coffee :)


On Thu, 28 Oct 2004 20:16:16 +0200, Daniel Naber
<[EMAIL PROTECTED]> wrote:
> On Thursday 28 October 2004 19:03, Justin Swanhart wrote:
> 
> > Have you tried making a term query by hand and testing to see if it
> > works?  
> >
> > Term t = new Term("field", "this is a \"test\"");
> > PhraseQuery pq = new PhraseQuery(t);
> 
> That's not a proper PhraseQuery, it searches for *one*
> term >this is a "test"< which is probably not what one wants. You
> have to add the terms one by one to a PhraseQuery.
> 
> Regards
>  Daniel
> 
> --
> http://www.danielnaber.de
> 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for a phrase that contains quote character

2004-10-28 Thread Erik Hatcher
On Oct 28, 2004, at 2:02 PM, Will Allen wrote:
> I am using a NullAnalyzer for this field.

Which means that each field is added exactly as-is as a single term?

Then trying the PhraseQuery directly is a good first step - if you can 
get that to work then you can move on to making QueryParser work with 
escaping.  But don't complicate things with QueryParser at first.  
Start with the queries constructed directly first.

	Erik

> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 28, 2004 2:00 PM
> To: Lucene Users List
> Subject: Re: Searching for a phrase that contains quote character
>
> On Oct 28, 2004, at 1:03 PM, Justin Swanhart wrote:
>> Have you tried making a term query by hand and testing to see if it
>> works?
>>
>> Term t = new Term("field", "this is a \"test\"");
>> PhraseQuery pq = new PhraseQuery(t);
>
> That's not accurate API, but had you used pq.add(t), it still would
> presume that text is all a single term.
>
> Chances are, though, that even getting the query to have the quotes is
> not going to work as you've probably lost the quotes during indexing.
> Check out the AnalysisParalysis page on the wiki and "analyze" your
> Analyzer and make sure you are indexing the text with the quotes (no
> built-in analyzer besides WhitespaceAnalyzer would do that for you).
>
> 	Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Searching for a phrase that contains quote character

2004-10-28 Thread Will Allen
The NullAnalyzer overrides the isTokenChar method in the tokenizer class to simply 
return true (http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1703655).

The situation is that Lucene does not seem to expect you to escape characters that 
exist inside of a quoted string.  So my search [ authorkeyword:"MariaMy*" ] works, but 
[ authorkeyword:"MariaMy\*" ] does not, even though the * character should be escaped 
(http://jakarta.apache.org/lucene/docs/queryparsersyntax.html#Terms).

So, if this is true, then the rule might be: reserved characters must be escaped 
EXCEPT when they are within double quotes as a phrase.  When double quotes are needed 
within a phrase, they should be escaped with a .. ?
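
For reference, a minimal sketch of backslash-escaping the characters that the 
query syntax page lists as reserved.  QueryEscaper is a hypothetical helper written 
for this example, not a Lucene class, and whether the parser honors such escapes 
inside a quoted phrase is exactly the open question above.

```java
// Hypothetical helper (not Lucene API): backslash-escapes characters that
// the query syntax documentation lists as reserved.
final class QueryEscaper {
    private static final String RESERVED = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                out.append('\\'); // prefix each reserved character
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("MariaMy*"));
        System.out.println(escape("lucene is \"cool\""));
    }
}
```

Applied to a bare term this produces the escaped form the documentation describes; 
applied to text that will sit inside double quotes, it produces the form that 
reportedly fails to match, so the two cases may need different handling.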

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 28, 2004 3:05 PM
To: Lucene Users List
Subject: Re: Searching for a phrase that contains quote character


On Oct 28, 2004, at 2:02 PM, Will Allen wrote:
> I am using a NullAnalyzer for this field.

Which means that each field is added exactly as-is as a single term?

Then trying the PhraseQuery directly is a good first step  - if you can 
get that to work then you can move on to making QueryParser work with 
escaping.  But don't complicate things with QueryParser at first.  
Start with the queries constructed directly first.

Erik

>
> -Original Message-
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 28, 2004 2:00 PM
> To: Lucene Users List
> Subject: Re: Searching for a phrase that contains quote character
>
>
>
> On Oct 28, 2004, at 1:03 PM, Justin Swanhart wrote:
>> Have you tried making a term query by hand and testing to see if it
>> works?
>>
>> Term t = new Term("field", "this is a \"test\"");
>> PhraseQuery pq = new PhraseQuery(t);
>
> That's not accurate API, but had you used pq.add(t), it still would
> presume that text is all a single term.
>
> Chances are, though, that even getting the query to have the quotes is
> not going to work as you've probably lost the quotes during indexing.
> Check out the AnalysisParalysis page on the wiki and "analyze" your
> Analyzer and make sure you are indexing the text with the quotes (no
> built-in analyzer besides WhitespaceAnalyzer would do that for you).
>
>   Erik
>
>
>> ...
>>
>>
>>
>> On Thu, 28 Oct 2004 12:02:48 -0400, Will Allen
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> I am having this same problem, but cannot find any help!
>>>
>>> I have a keyword field that sometimes includes double quotes, but I
>>> am unable to search for that field because the escape for a quote
>>> doesnt work!
>>>
>>> I have tried a number of things:
>>>
>>> myfield:"lucene is \"cool\""
>>>
>>> AND
>>>
>>> myfield:"lucene is \\"cool\\""
>>>
>>> http://issues.apache.org/eyebrowse/ReadMsg?listName=lucene-
>>> [EMAIL PROTECTED]&msgNo=7351
>>>
>>> From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
>>> Subject: Searching for a phrase that contains quote character
>>> Date: Wed, 24 Mar 2004 21:25:16 +
>>>
>>> I'd like to search for a phrase that contains the quote character.
>>> I've tried
>>> escaping the quote character, but am receiving a ParseException from
>>> the
>>> QueryParser:
>>>
>>> For example to search for the phrase:
>>>
>>>  this is a "test"
>>>
>>> I'm trying the following
>>>
>>>  QueryParser.parse("field:\"This is a \\\"test\\\"\"", "field",
>>> new StandardAnalyzer());
>>>
>>> This results in:
>>>
>>> org.apache.lucene.queryParser.ParseException: Lexical error at line
>>> 1, column 31.  Encountered:  after : ""
>>> at
>>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:111)
>>> at
>>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:87)
>>> ...
>>>
>>> What is the proper way to accomplish this?
>>>
>>> --Dan
>>>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Searching for a path

2004-10-28 Thread Bill Tschumy
I need to search an index for documents that were taken from 
particular files in the filesystem.

Each document in the index has a field named "url" that is created 
using:

	doc.add(Field.Text("url", urlStr));

I understand this is both stored and indexed.

My search works if I do something like:

	String queryStr = "\"file:///someDir/someOtherDir/File.txt\"";
	query = MultiFieldQueryParser.parse("url:" + queryStr, 
		searchedFields, new StandardAnalyzer());
	hits = searcher.search(query);

It is important for me to quote the path for the search to succeed.

I was hoping to speed the search up a bit by bypassing the QueryParser. 
However, if I do something like

	String queryStr = "\"file:///someDir/someOtherDir/File.txt\"";
	Query query = new TermQuery(new Term("url", queryStr));
	hits = searcher.search(query);

I get zero hits.  Why are these not equivalent?  I think it has 
something to do with the fact that the url needs to be quoted so I 
search for an exact match.  It does work if I have stored the url as a 
"Field.Keyword" rather than as "Field.Text" and then don't need to 
quote the string.  However I would prefer not to have to change the 
format of the index.

Thanks for any help.
--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Searching for a path

2004-10-28 Thread Daniel Naber
On Friday 29 October 2004 00:22, Bill Tschumy wrote:

> I get zero hits.  Why are these not equivalent?  I think it has
> something to do with the fact that the url needs to be quoted so I
> search for an exact match.

When you manually build the query there's no need to have quotes around it. 
Can you try without the quotes?

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for a path

2004-10-28 Thread Bill Tschumy
I have tried that and it doesn't work either.  I have also tried using 
a PhraseQuery rather than TermQuery.

On Oct 28, 2004, at 5:29 PM, Daniel Naber wrote:
> On Friday 29 October 2004 00:22, Bill Tschumy wrote:
>> I get zero hits.  Why are these not equivalent?  I think it has
>> something to do with the fact that the url needs to be quoted so I
>> search for an exact match.
>
> When you manually build the query there's no need to have quotes 
> around it.
> Can you try without the quotes?
>
> Regards
>  Daniel
> --
> http://www.danielnaber.de

--
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Faster highlighting with TermPositionVectors

2004-10-28 Thread markharw00d
Thanks to the recent changes (see CVS) in TermFreqVector support we can now 
make use of term offset information held in the Lucene index rather than 
incurring the cost of re-analyzing text to highlight it.

I have created a class ( see 
http://www.inperspective.com/lucene/TokenSources.java ) which handles 
creating a TokenStream from the TermPositionVector stored in the database 
which can then be passed to the highlighter.
This approach is significantly faster than re-parsing the original text.
If people are happy with this class I'll add it to the Highlighter sandbox 
but it may sit better elsewhere in the Lucene code base as a more general 
purpose utility.

BTW as part of putting this together I found that the TermFreq code throws 
a null pointer when indexing fields that produce no tokens (i.e. empty or 
all stopwords). Otherwise things work very well.
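
To make the idea concrete, here is a small sketch of marking up text from 
stored start/end offsets, so the text never has to be re-analyzed. 
OffsetHighlighter is a hypothetical class written for this example; it is not 
the Highlighter or TokenSources API, and it assumes the offsets are sorted 
and do not overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not the Highlighter or TokenSources API):
// wraps matched spans in <B> tags using stored character offsets.
final class OffsetHighlighter {
    // Each int[] is {startOffset, endOffset}; assumed sorted, non-overlapping.
    static String highlight(String text, List<int[]> offsets) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int[] span : offsets) {
            out.append(text, last, span[0]);           // text before the hit
            out.append("<B>").append(text, span[0], span[1]).append("</B>");
            last = span[1];
        }
        out.append(text, last, text.length());         // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        List<int[]> hits = new ArrayList<>();
        hits.add(new int[] {4, 9});
        System.out.println(highlight("the quick fox", hits));
    }
}
```

The saving comes from skipping the analyzer pass: the offsets are looked up 
rather than recomputed from the original text.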


Cheers
Mark



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: new version of NewMultiFieldQueryParser

2004-10-28 Thread Bill Janssen
> Try to see the behavior if you want to have a single term query 
> juat something like: "robust" .. and print out the query string ...

Sure, that works fine.  For instance, if you have the three default
fields "title", "authors", and "contents", the one-word search
"robust" expands to

   title:foobar authors:foobar contents:foobar

just as it should.
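
The expansion itself amounts to repeating the term once per default field. 
A minimal sketch of that string-level result follows; FieldExpansion is a 
hypothetical class for illustration only and is not the 
NewMultiFieldQueryParser code.

```java
// Hypothetical illustration of the expansion only; this is not the
// NewMultiFieldQueryParser code.
final class FieldExpansion {
    static String expand(String term, String[] fields) {
        StringBuilder out = new StringBuilder();
        for (String field : fields) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(field).append(':').append(term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "authors", "contents"};
        System.out.println(expand("foobar", fields));
    }
}
```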

>  Try to see what is happening with Prefix, Wild, and Fuzzy searches ...

Good point.  My older version (see below) found these, but the new one
doesn't.  Oh, well, back to the working version.  I knew there was some
reason getFieldQuery wasn't sufficient.

The working version is in the file SearchTest.java, which you can find
at ftp://ftp.parc.xerox.com/transient/janssen/SearchTest.java.  It's a
test program which runs the query through the NewMultiFieldQueryParser,
and then prints it out, so that you can see what the expansion is.

Bill

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Ability to apply document age with the score?

2004-10-28 Thread Kevin A. Burton
Let's say I have an index with two documents.  They both have the same 
score but one was added 6 months ago and the other was added 2 minutes ago.

I want the score adjusted based on the age so that older documents have 
a lower score.

I don't want to sort by document age (date) because if one document is 
older but has a HIGHER score it would be better to have it rise above 
newer documents that have a lower score.

Is this possible?  The only way I could think of doing it would be to 
have a DateFilter and then apply a dampening after the query.
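
One possible shape for such a dampening step is an exponential half-life 
applied to each score after the query has run. This is only a sketch under 
assumptions: the class AgeDampening, the method dampen, and the 180-day 
half-life are all invented for illustration and are not part of Lucene.

```java
// Hypothetical post-query adjustment; not a Lucene API.
final class AgeDampening {
    // Halves the score for every halfLifeDays of document age.
    static double dampen(double score, double ageDays, double halfLifeDays) {
        return score * Math.pow(0.5, ageDays / halfLifeDays);
    }

    public static void main(String[] args) {
        double halfLife = 180.0; // assumed value; tune to taste
        System.out.println(dampen(0.9, 180.0, halfLife)); // older, strong match
        System.out.println(dampen(0.3, 0.0, halfLife));   // new, weak match
    }
}
```

Because the age only scales the score, an older document with a much higher 
raw score can still rank above a newer one with a lower score.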

Kevin

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Faster highlighting with TermPositionVectors

2004-10-28 Thread Fred Toth
Hi,

We are very interested in highlighting, but haven't gotten around
to reviewing the state of the highlighting mechanisms.

Could someone possibly give me the "big picture" on highlighting?

What code is available?
How does it work?
What are the current issues?

Many thanks,

Fred

At 07:16 PM 10/28/2004, you wrote:
> Thanks to the recent changes (see CVS) in TermFreqVector support we can 
> now make use of term offset information held
> in the Lucene index rather than incurring the cost of re-analyzing text to 
> highlight it.
>
> I have created a class ( see 
> http://www.inperspective.com/lucene/TokenSources.java ) which handles creating
> a TokenStream from the TermPositionVector stored in the database which can 
> then be passed to the highlighter.
> This approach is significantly faster than re-parsing the original text.
> If people are happy with this class I'll add it to the Highlighter sandbox 
> but it may sit better elsewhere in the Lucene code base
> as a more general purpose utility.
>
> BTW as part of putting this together I found that the TermFreq code throws 
> a null pointer when indexing fields
> that produce no tokens (ie empty or all stopwords). Otherwise things work 
> very well.
>
> Cheers
> Mark

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


LUCENE INDEX STATISTICS

2004-10-28 Thread Karthik N S


Hi Guys

Apologies.


Can somebody provide approximate statistics about the following factors for
development and deployment of Lucene   [ it may be useful for professional
developers ]

a)  Creation Indexing

1) X  [ Say 100 Million ] of  number of documents  Y  [ Kilobytes ]
with  Z no of Fields
 Hardware requirement [ RAM / Os / Processor / HardDisk Space  /
Other  Specific Details  ]
 Software [ Jdk Version / Lucene Version / Appserver Version ]


 2) X [Say 100 Million]  number  to create  Merged Indexes
  Hardware requirement [ RAM / Os / Processor / HardDisk Space  /
Other  Specific Details  ]
  Software [ Jdk Version / Lucene Version / Appserver Version ]


b)Searching  on Indexes   [ 2  number of Persons Searching  per  Sec  ]

1) X  [ Say 100 Million ] of  number of documents  Y  [ Kilobytes ]
with  Z no of Fields
Hardware requirement [ RAM / Os / Processor / HardDisk Space  /
Other  Specific Details  ]
Software [ Jdk Version / Lucene Version / Appserver Version ]


 2)X [Say 100 Million]  number of Merged Indexes
 Hardware requirement [ RAM / Os / Processor / HardDisk Space  /
Other  Specific Details  ]
 Software [ Jdk Version / Lucene Version / Appserver Version ]



Thx in Advance
Karthik


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Faster highlighting with TermPositionVectors

2004-10-28 Thread Bruce Ritchie
Mark,

> Thanks to the recent changes (see CVS) in TermFreqVector 
> support we can now make use of term offset information held 
> in the Lucene index rather than incurring the cost of 
> re-analyzing text to highlight it.
> 
> I have created a  class ( see 
> http://www.inperspective.com/lucene/TokenSources.java ) which 
> handles creating a TokenStream from the TermPositionVector 
> stored in the database which can then be passed to the highlighter.
> This approach is significantly faster than re-parsing the 
> original text.
> If people are happy with this class I'll add it to the 
> Highlighter sandbox but it may sit better elsewhere in the 
> Lucene code base as a more general purpose utility.
> 
> BTW as part of putting this together I found that the 
> TermFreq code throws a null pointer when indexing fields that 
> produce no tokens (ie empty or all stopwords). Otherwise 
> things work very well.

This is great news! While I won't have the time to test this until probably mid 
November, I do look forward to the speed improvements, as the current highlighting 
mechanisms (reparsing the text) were just not performant enough under heavy loads.


Regards,

Bruce Ritchie
http://www.jivesoftware.com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for a path

2004-10-28 Thread sergiu gordea
Bill Tschumy wrote:

> I have a need to search an index for documents that were taken from 
> particular files in the filesystem.
>
> Each document in the index has a field named "url" that is created using:
>
> doc.add(Field.Text("url", urlStr));
>
> I understand this is both stored and indexed.
>
> My search works if I do something like:
>
> String queryStr = "\"file:///someDir/someOtherDir/File.txt\"";
> query = MultiFieldQueryParser.parse("url:" + queryStr, 
>     searchedFields, new StandardAnalyzer());
> hits = searcher.search(query);
>
> It is important for me to quote the path for the search to succeed.
> I was hoping to speed the search up a bit by bypassing the 
> QueryParser.  However, if I do something like
>
> String queryStr = "\"file:///someDir/someOtherDir/File.txt\"";
> Query query = new TermQuery(new Term("url", queryStr));
> hits = searcher.search(query);

To begin with, I suggest you add a System.out.println(query);
and see what the difference is between the two queries.

 Sergiu

Ahh, I see now:
you must construct a PhraseQuery instead of a TermQuery ...
The first one is a PhraseQuery; the second one, which you construct with the 
term, is a TermQuery.

I suggest you use QueryParser. The difference in performance compared with 
your hand-constructed query is just the interpretation of the regular 
expression to find the type of the query. 
Using the QueryParser will ensure that you won't face problems like this 
one anymore.

   All the best,
 Sergiu

> I get zero hits.  Why are these not equivalent?  I think it has 
> something to do with the fact that the url needs to be quoted so I 
> search for an exact match.  It does work if I have stored the url as a 
> "Field.Keyword" rather than as "Field.Text" and then don't need to 
> quote the string.  However I would prefer not to have to change the 
> format of the index.
>
> Thanks for any help.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Searching for a phrase that contains quote character

2004-10-28 Thread Morus Walter
Daniel Naber writes:
> On Thursday 28 October 2004 19:03, Justin Swanhart wrote:
> 
> > Have you tried making a term query by hand and testing to see if it
> > works?  
> >
> > Term t = new Term("field", "this is a \"test\"");
> > PhraseQuery pq = new PhraseQuery(t);
> 
> That's not a proper PhraseQuery, it searches for *one* 
> term >this is a "test"< which is probably not what one wants. You 
> have to add the terms one by one to a PhraseQuery.
> 
Will spoke of a keyword field, in which case he would want to search
for one term.
Using a TermQuery makes more sense, though.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Searching for a path

2004-10-28 Thread Morus Walter
Bill Tschumy writes:
> I have tried that and it doesn't work either.  I have also tried using 
> a PhraseQuery rather than TermQuery.
> 
How did you create the phrase query?
You have to analyze the string with the same analyzer you used during
indexing and add all created tokens. Given that the analyzer creates
more than one token, a TermQuery won't work.
The fact that you needed the quotes in the query parser indicates that this
is the case.

Did you look at the serialized form of the query created by the query parser
and of your attempts? That is, did you use query.toString("some field") to see
what's going on?
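
To see why a single TermQuery on the whole URL finds nothing, it helps to
look at what an analyzer might have put in the index. The sketch below uses
a deliberately simple stand-in tokenizer (lowercase, split on anything that
is not a letter or digit); SimpleTokens is hypothetical and StandardAnalyzer's
real rules differ in detail, but the effect is the same: several terms are
indexed, and none of them equals the full URL string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in tokenizer for illustration only; not StandardAnalyzer.
final class SimpleTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String url = "file:///someDir/someOtherDir/File.txt";
        List<String> indexed = tokenize(url);
        System.out.println(indexed);
        // A TermQuery on the whole URL looks for one term equal to url,
        // which is not among the indexed terms:
        System.out.println(indexed.contains(url));
    }
}
```

A PhraseQuery built from the analyzed tokens, added in order, is what
corresponds to the quoted form that the query parser accepts.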

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: new version of NewMultiFieldQueryParser

2004-10-28 Thread sergiu gordea
Bill Janssen wrote:

>> Try to see the behavior if you want to have a single term query 
>> just something like: "robust" .. and print out the query string ...
>
> Sure, that works fine.  For instance, if you have the three default
> fields "title", "authors", and "contents", the one-word search
> "robust" expands to
>
>    title:foobar authors:foobar contents:foobar
>
> just as it should.

Strange .. on my computer it created just something like

default:foobar

... and I think it should work like that on your computer too ... I've 
taken a look at the Lucene code ... and I understood why ...

all the best ... Sergiu

>> Try to see what is happening with Prefix, Wild, and Fuzzy searches ...
>
> Good point.  My older version (see below) found these, but the new one
> doesn't.  Oh, well, back to the working version.  I knew there was some
> reason getFieldQuery wasn't sufficient.
>
> The working version is in the file SearchTest.java, which you can find
> at ftp://ftp.parc.xerox.com/transient/janssen/SearchTest.java.  It's a
> test program which runs the query through the NewMultiFieldQueryParser,
> and then prints it out, so that you can see what the expansion is.
>
> Bill

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: new version of NewMultiFieldQueryParser

2004-10-28 Thread Morus Walter
Bill Janssen writes:
> > Try to see the behavior if you want to have a single term query 
> > just something like: "robust" .. and print out the query string ...
> 
> Sure, that works fine.  For instance, if you have the three default
> fields "title", "authors", and "contents", the one-word search
> "robust" expands to
> 
>    title:foobar authors:foobar contents:foobar
> 
> just as it should.
> 
> >  Try to see what is happening with Prefix, Wild, and Fuzzy searches ...
> 
> Good point.  My older version (see below) found these, but the new one
> doesn't.  Oh, well, back to the working version.  I knew there was some
> reason getFieldQuery wasn't sufficient.
> 
Wouldn't it be better to go on and override the methods creating these 
types of queries too?

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: new version of NewMultiFieldQueryParser

2004-10-28 Thread sergiu gordea
Morus Walter wrote:

> Bill Janssen writes:
>
>>> Try to see the behavior if you want to have a single term query 
>>> just something like: "robust" .. and print out the query string ...
>>
>> Sure, that works fine.  For instance, if you have the three default
>> fields "title", "authors", and "contents", the one-word search
>> "robust" expands to
>>
>>    title:foobar authors:foobar contents:foobar
>>
>> just as it should.
>>
>>> Try to see what is happening with Prefix, Wild, and Fuzzy searches ...
>>
>> Good point.  My older version (see below) found these, but the new one
>> doesn't.  Oh, well, back to the working version.  I knew there was some
>> reason getFieldQuery wasn't sufficient.
>
> wouldn't it be better to go on and overwrite the methods creating these 
> types of queries too?
>
> Morus

Yes, that's what I wanted to suggest ...
The query parser works fine if you add all types of query parser ... but 
it was not working correctly in the case of a single term.
Therefore I test for this first and create a Query by using the normal 
MultiFieldQueryParser.
Maybe it is not the best solution, but it works perfectly ... and I had to 
write just a few lines of code 

Sergiu

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]