RE: Search Performance

2005-02-18 Thread David Townsend
Are you creating new IndexSearchers or IndexReaders on each search?  Caching 
your IndexSearchers has a dramatic effect on speed.

David Townsend

-Original Message-
From: Michael Celona [mailto:[EMAIL PROTECTED]
Sent: 18 February 2005 15:55
To: Lucene Users List
Subject: Search Performance


What is, single-handedly, the best way to improve search performance?  I have
an index in the 2G range stored on the local file system of the searcher.
Under a load test of 5 simultaneous users my average search time is ~4700
ms.  Under a load test of 10 simultaneous users my average search time is
~1 ms.  I have given the JVM 2G of memory and am using dual 3GHz
Xeons.  Any ideas?

 

Michael


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Search Performance

2005-02-18 Thread David Townsend
IndexSearchers are thread safe, so you can use the same object across multiple 
requests.  If the index is static and not constantly updated, just keep one 
IndexSearcher for the life of the app.  If the index changes and you need that 
instantly reflected in the results, you need to check whether the index has changed; 
if it has, create a new cached IndexSearcher.  To check for changes you'll 
need to monitor the version number of the index, obtained via

IndexReader.getCurrentVersion(indexDirectory)
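
A minimal sketch of such a cache, assuming the Lucene 1.4-era API (the class and field names here are illustrative, not from the thread):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherCache {
    private final String indexDir;
    private IndexSearcher searcher;
    private long version = -1;

    public SearcherCache(String indexDir) {
        this.indexDir = indexDir;
    }

    // Reuse one IndexSearcher across requests; reopen it only when the
    // version reported by IndexReader changes.
    public synchronized IndexSearcher getSearcher() throws IOException {
        long current = IndexReader.getCurrentVersion(indexDir);
        if (searcher == null || current != version) {
            if (searcher != null) {
                searcher.close();
            }
            searcher = new IndexSearcher(indexDir);
            version = current;
        }
        return searcher;
    }
}
```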

David

-Original Message-
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: 18 February 2005 16:15
To: Lucene Users List
Subject: Re: Search Performance


Try a singleton pattern or a static field.

Stefan

Michael Celona wrote:

I am creating new IndexSearchers... how do I cache my IndexSearcher...

Michael



RE: Multi-threading problem: couldn't delete segments

2005-01-13 Thread David Townsend
The problem could be that you're writing to an index from multiple processes. This 
can happen if you're using a shared file system (NFS?).  We saw this problem 
when we had two IndexWriters getting access to a single index at the same time. 
Usually, if you're working on a single machine, the file locks prevent this from 
happening.



-Original Message-
From: Luke Francl [mailto:[EMAIL PROTECTED]
Sent: 13 January 2005 18:13
To: Lucene Users List
Subject: Re: Multi-threading problem: couldn't delete segments


I didn't get any response to this post so I wanted to follow up (you can
read the full description of my problem in the archives:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=11986).

Here's an additional piece of information: 

I wrote a small program to confirm that on Windows, you can't rename a
file while another thread has it open.

If I am performing a search, is it possible that the IndexReader is
holding open the segments file when there is an attempt by my indexing
code to overwrite it with File.renameTo()?

Thanks,
Luke Francl

On Thu, 2005-01-06 at 17:43, Luke Francl wrote:
 We are having a problem with Lucene in a high concurrency
 create/delete/search situation. I thought I fixed all these problems,
 but I guess not.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Lucene Book in UK

2005-01-06 Thread David Townsend
Sorry if this is the wrong forum, but I wondered what's happened to 'Lucene In 
Action' in the UK.  Looking forward to reading it, but amazon.co.uk reports it as 
a 'hard to find' item and is now quoting a 4-6 week delivery time and tacking 
on a rare-book charge.  Amazon.com is quoting shipping in 24hrs.  Is this a 
new 'Boston Tea Party'? 

cheers

David




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



hits.length() changes during delete process.

2004-12-03 Thread David Townsend
I have a delete script

IndexSearcher searcher = new IndexSearcher(reader);

Hits hits = searcher.search(query);
log.info("there are " + hits.length() + " hits");

for (int i = 0; i < hits.length(); i++) {
  log.info(hits.length() + " " + i + " " + hits.id(i));
  reader.delete(hits.id(i));
}

which iterates through the results of a search and deletes the returned documents.  
I keep getting an ArrayIndexOutOfBoundsException.  I've found the reason is that 
hits.length() actually changes during the iteration, in large regular steps, i.e.

The hits length is initially 10003

after 100 deletions hits.length() changes to 9903
after 200 deletions hits.length() changes to 9803

then changes after
200 deletions
400
800
1600
3200

So the short question is: should the hits object be changing, and what is the 
best way to delete all the results of a search (it's a range query, so I can't 
use delete(Term term))? 

cheers.

David
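
The Hits object caches only an initial batch of results and silently re-runs the query as iteration moves past what it has cached, which is why the length shrinks in those regular, doubling steps once deletions start. A safer pattern, sketched against the Lucene 1.x API with the variable names from the snippet above, is to snapshot the ids first and only then delete:

```java
// Snapshot the matching document ids before touching the index, so the
// query re-executed behind Hits can no longer shrink the result set
// in the middle of the loop.
Hits hits = searcher.search(query);
int[] ids = new int[hits.length()];
for (int i = 0; i < hits.length(); i++) {
    ids[i] = hits.id(i);
}
for (int i = 0; i < ids.length; i++) {
    reader.delete(ids[i]);
}
```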

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Making lucene work in weblogic cluster

2004-10-08 Thread David Townsend
No I didn't.  If you look for NFS in the archives, there is an alternate solution out 
there.  I suppose I should get around to submitting the patch.

-Original Message-
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: 08 October 2004 16:10
To: lucenelist
Subject: Making lucene work in weblogic cluster


While I was going through the mailing list looking into the Lucene cluster problem, I 
came across this thread. Does anyone know if David Townsend has submitted the patch 
he was talking about?
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06252.html

I am interested in looking at the NFS solution (mounting the shared drive on each 
server in cluster). I don't know if anyone has used this solution in cluster but this 
seems to be a better approach than RemoteSearchable interface and DB based index 
(SQLDirectory).


I am currently looking at 2 options:
Index on Shared drive: Use single index dir on a shared drive (NFS, etc.), which is 
mounted on each app server. All the servers in the cluster write to this shared drive 
when objects are modified.
Problems:
1) Known problems like file locking etc. (The above thread talks about moving locking 
mechanism to DB but I have no idea how).
2) Performance.

Index Per Server: Create copies of the index dir for each machine. Requires regular 
updates, etc. Each server maintains its own index and searches on its own index.
Problems:
1) Modifying the index is complex. When objects are modified on a server (server1) that does 
not run the search system, server1 needs to notify all servers in the cluster about 
these modifications so that each server can update its own index. This may involve 
some kind of remote communication mechanism, which will perform badly since our index 
is modified a lot.

So I am still reviewing both options and trying to figure out which one is the best 
and how to solve the above problems.

If you guys have any ideas, please shoot them over. I would appreciate any help regarding 
making Lucene clusterable (both indexing and searching).

Praveen

** 
Praveen Peddi
Sr Software Engg, Context Media, Inc. 
email:[EMAIL PROTECTED] 
Tel:  401.854.3475 
Fax:  401.861.3596 
web: http://www.contextmedia.com 
** 
Context Media- The Leader in Enterprise Content Integration 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Making lucene work in weblogic cluster

2004-10-08 Thread David Townsend
Doug discusses the locking issue, with a potential solution

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1619988






RE: Moving from a single server to a cluster

2004-09-08 Thread David Townsend
Would it be cheeky to ask you to post the docs to the group?  It would be interesting 
to read how you've tackled this.

-Original Message-
From: Nader Henein [mailto:[EMAIL PROTECTED]
Sent: 08 September 2004 13:57
To: Lucene Users List
Subject: Re: Moving from a single server to a cluster


Hey Ben,

We've been using a distributed environment with three servers and three 
separate indices for the past 2 years, since the first stable Lucene 
release, and it has been great. Recently, for the past two months, I've 
been working on a redesign of our Lucene app, and I've shared my 
findings and plans with Otis, Doug and Erik. They pointed out a few 
faults in my logic, which you will probably come across soon enough, 
mainly to do with keeping your updates atomic (not too hard) and 
your deletes atomic (a little more tricky). Give me a few days and I'll 
send you both the early document and the newer version that deals 
squarely with Lucene in a distributed environment with a high-volume index.

Regards.

Nader Henein

Ben Sinclair wrote:

My application currently uses Lucene with an index living on the
filesystem, and it works fine. I'm moving to a clustered environment
soon and need to figure out how to keep my indexes together. Since the
index is on the filesystem, each machine in the cluster will end up
with a different index.

I looked into JDBC Directory, but it's not tested under Oracle and
doesn't seem like a very mature project.

What are other people doing to solve this problem?

  


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: worddoucments search

2004-08-24 Thread David Townsend
Is this a wind-up?

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 24 August 2004 13:16
To: Lucene Users List
Subject: worddoucments search


Can Lucene search Word documents? If so, please give me information about it.

regards
Santosh kumar


---SOFTPRO DISCLAIMER--

Information contained in this E-MAIL and any attachments is confidential, being proprietary to SOFTPRO SYSTEMS, and is 'privileged' and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or have received it in error, you are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manner whatsoever is strictly prohibited. Please delete it immediately and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSTEMS does not REPRESENT or WARRANT that an attachment hereto is free from computer viruses or other defects.

The opinions expressed in this E-MAIL and any ATTACHMENTS may be those of the author and are not necessarily those of SOFTPRO SYSTEMS.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: pdf search

2004-08-20 Thread David Townsend
Hi Santosh,

Lucene doesn't search PDFs per se.  To make anything searchable you have to first 
extract the content and then put it into Lucene in a form it understands (i.e. Document 
objects).  So in order to search your PDFs you first need to extract the text from the 
PDFs using something like PDFBox.  Your battle plan should be: forget Lucene for a 
while and get the raw data out of all the items you want to search, then look at the 
Lucene articles about creating simple searchable indices.
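
A rough sketch of that pipeline, assuming PDFBox's PDDocument/PDFTextStripper API and an already-open IndexWriter named `writer` (the file name and field names are placeholders):

```java
// Extract the raw text from the PDF with PDFBox...
PDDocument pdf = PDDocument.load(new File("manual.pdf"));
String text = new PDFTextStripper().getText(pdf);
pdf.close();

// ...then hand it to Lucene as a Document.
Document doc = new Document();
doc.add(Field.Keyword("path", "manual.pdf")); // stored, for display
doc.add(Field.UnStored("content", text));     // indexed, searchable
writer.addDocument(doc);
```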

DT

If we didn't train to fight, who'd fight the wars? :)

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 20 August 2004 13:30
To: Lucene Users List
Subject: Fw: pdf search


How can I search through PDF?
- Original Message - 
From: Santosh 
To: Lucene Users List 
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search


Hi,

I am a new bee to Lucene.

I have downloaded the zip file. Now how can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index if we run the Java 
program, but I want to give my own search words. How is that possible? 


regards
Santosh kumar
SoftPro Systems
Hyderabad


The harder you train in peace, the lesser you bleed in war





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: searchhelp

2004-08-19 Thread David Townsend
JGURU FAQ
http://www.jguru.com/faq/Lucene

OFFICIAL FAQ
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi

MAIL ARCHIVE
http://www.mail-archive.com/[EMAIL PROTECTED]/

hope this helps.


-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 19 August 2004 11:25
To: Lucene Users List
Subject: Re: searchhelp


I recently joined the list and haven't gone through any previous mails. If
you have any mails or related code, please forward them to me.
- Original Message -
From: Chandan Tamrakar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 3:47 PM
Subject: Re: searchhelp


 For PDF you need to extract text from PDF files using the PDFBox library,
 and for Word documents you can use the Apache POI APIs. There are messages
 posted on the Lucene list related to your queries. About databases, I guess
 someone must have done it. :)

 - Original Message -
 From: Santosh [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Thursday, August 19, 2004 3:58 PM
 Subject: searchhelp


 Hi,

 I am using the Lucene search engine for my application.

 I am able to search through the text files and HTML files as specified by Lucene.

 Can you please clarify my doubts:

 1. Can Lucene search through PDFs and Word documents? If yes, then how?

 2. Can Lucene search through a database? If yes, then how?

 thank you

 santosh








RE: Memo: RE: Query parser and minus signs

2004-05-21 Thread David Townsend
Doesn't "en UK" as a phrase query work?

You're probably indexing it as a text field, so it's being tokenised.
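
If escaping keeps fighting the analyzer, another option (sketched against the Lucene 1.4-era API; field and variable names are illustrative) is to index the language code untokenised and query it with a TermQuery, bypassing QueryParser altogether:

```java
// At index time: Field.Keyword stores the value untokenised,
// so "en-UK" survives as a single term.
doc.add(Field.Keyword("language", "en-UK"));

// At search time: build the query directly; no analyzer ever
// sees the hyphen.
Query q = new TermQuery(new Term("language", "en-UK"));
Hits hits = searcher.search(q);
```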

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 21 May 2004 16:47
To: Lucene Users List
Subject: Memo: RE: Query parser and minus signs






Hmm, we may have to if there is no workaround. We're not using Java
locales, but we were trying to stick to the ISO standard, which uses hyphens.




Ryan Sonnek [EMAIL PROTECTED] on 21 May 2004 16:38

Please respond to Lucene Users List [EMAIL PROTECTED]

To:Lucene Users List [EMAIL PROTECTED]
cc:
bcc:

Subject:RE: Query parser and minus signs


if you're dealing with locales, why not use Java's built-in locale syntax
(e.g. en_UK, zh_HK)?

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Friday, May 21, 2004 10:36 AM
 To: [EMAIL PROTECTED]
 Subject: Query parser and minus signs






 Hi All,

 I'm using Lucene on a site that has split content with a
 branch containing
 pages in English and a separate branch in Chinese.  Some of
 the chinese
 pages include some (untranslatable) English words, so when a search is
 carried out in either language you can get pages from the
 wrong branch. To
 combat this we introduced a language field into the index
 which contains
 the standard language codes: en-UK and zh-HK.

 When you parse a query, e.g. "language:en\-UK", you could reasonably expect
 the search to recover all pages with the language field set to en-UK (the
 minus symbol should be escaped by the backslash according to the FAQ).
 Unfortunately the parser seems to return "en UK" as the parsed query and
 hence returns no documents.

 Has anyone else had this problem, or could suggest a
 workaround ?? as I
 have
 yet to find a solution in the mailing archives or elsewhere.

 Many thanks in advance,

 Alex Bourne



 _

 This transmission has been issued by a member of the HSBC Group
 (HSBC) for the information of the addressee only and should not be
 reproduced and / or distributed to any other person. Each page
 attached hereto must be read in conjunction with any disclaimer which
 forms part of it. This transmission is neither an offer nor
 the solicitation
 of an offer to sell or purchase any investment. Its contents
 are based
 on information obtained from sources believed to be reliable but HSBC
 makes no representation and accepts no responsibility or
 liability as to
 its completeness or accuracy.















-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: about search and update one index simultaneously

2004-05-19 Thread David Townsend
There is no problem with updating and searching simultaneously.  Two threads updating 
the same index simultaneously on NFS can be a problem, as the locking does not work 
reliably.  Have a look through the archives for NFS; there are some solutions 
scattered about.

David

-Original Message-
From: xuemei li [mailto:[EMAIL PROTECTED]
Sent: 18 May 2004 23:01
To: [EMAIL PROTECTED]
Subject: about search and update one index simultaneously


Hi,all,

Can we do search and update on one index simultaneously? Does someone know 
something about this? I have done some experiments. Currently the search gets 
blocked when the index is being updated. The error on the search node is like this:
caught a class java.io.IOException
with message: Stale NFS file handle
Thanks

Xuemei Li




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Getting a field value from a large indexed document is slow.

2004-05-14 Thread David Townsend
You say the content is indexed; is it stored?  If so, index the content of the 
document, but don't store it.

e.g. 

doc.add(Field.UnStored("content", content));
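
Put together, the indexing side might look like this (a sketch against the Lucene 1.x API; `documentId` and `contentText` are placeholders, and `Field10` follows the thread):

```java
Document doc = new Document();
// Small ID field: stored and indexed untokenised, cheap to retrieve.
doc.add(Field.Keyword("Field10", documentId));
// Large body: indexed for search but not stored, so hits.doc(i) no
// longer drags ~500K of text off disk per hit.
doc.add(Field.UnStored("content", contentText));
writer.addDocument(doc);
```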



-Original Message-
From: Paul Williams [mailto:[EMAIL PROTECTED]
Sent: 14 May 2004 16:22
To: 'Lucene Users List'
Subject: Getting a field value from a large indexed document is slow.


Hi,

I hope someone can help!
I am using Lucene to make a searching repository of electronic documents.
(MS Office, PDF's etc.). Some of these document can contain a large amount
of text (about 500K of text in some cases) which is indexed to make it
searchable. 

Doing the search and getting the hits found is not effected by the size of
the document found.

But when I try and access a field (my document id) in the document

 i.e.

// Create Lucene Doc with value
Document doc = hits.doc(i);

String number = doc.get(Field10);


The creation of the Lucene document can take up to a second per hit. I don't
actually use any of the other fields apart from getting my ID value from
field10.

So my question is:-

Is there a smarter way of getting out the 'Field10' value without it
populating all the rest of the fields in the Lucene document and therefore
reduce the time taken for this action.


Paul

DISCLAIMER:
 The information in this message is confidential and may be legally
privileged. It is intended solely for the addressee.  Access to this message
by anyone else is unauthorised.  If you are not the intended recipient, any
disclosure, copying, or distribution of the message,  or any action or
omission taken by you in reliance on it, is prohibited and may be unlawful.
Please immediately contact the sender if you have received this message in
error.
 Thank you.
 Valid Information Systems Limited.   Address:  Morline House,  160 London
Road,  Barking, Essex, IG11 8BB. 
http://www.valinf.com Tel: +44 (0) 20 8215 1414 Fax: +44 (0) 20 8215 2040 
-

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Returning Separate Hits from Multisearcher

2004-05-10 Thread David Townsend

We have a number of small indices and also an uber-index made up of all the smaller 
indices.  We need to do a search across a number of the sub-indices and get back a 
hit count from each.  Currently we search each index separately; we've also tried running 
multiple queries against the uber-index, with a field denoting which sub-index we are 
interested in.  Obviously this approach is very slow.  Is there any way to use 
MultiSearcher to do this?  The problem we currently have with MultiSearcher is that there 
seems to be no way to tell how many hits came from each index.  Is there a recommended way 
to do this, or should we modify MultiSearcher to return information about the hits on 
each index?

any ideas?

David Townsend




RE: weblogic cluster, index on NFS and locking problem

2004-02-04 Thread David Townsend
We work on NFS and have had major problems with locking, which often leads to the 
indices becoming corrupt.  Our solution was to replace file locking with a database 
system.  I can release the code but I'm not sure of the process or where to put it.  
It is basically two classes: one that extends the Directory class and one that deals 
with the database interaction.

David Townsend

-Original Message-
From: Dmitri Ilyin [mailto:[EMAIL PROTECTED]
Sent: 04 February 2004 09:49
To: [EMAIL PROTECTED]
Subject: Re: weblogic cluster, index on NFS and locking problem


What is that good for?
Unfortunately I don't have any access to the NFS server. It runs at the 
customer's site in a production environment.
 Suggestion: make sure the NFS lock daemon (lockd) is running on the NFS
 server.
 
 Peter
 
 - Original Message - 
 From: Dmitri Ilyin [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Tuesday, February 03, 2004 1:09 PM
 Subject: weblogic cluster, index on NFS and locking problem
 
 
 
Hi,

We run our application on a WebLogic cluster. The Lucene index service
runs on both servers in the cluster and they both write to one index
directory, shared via NFS. We have experienced a problem with the commit.lock
file, which seems not to be deleted and stays in the index directory, so
we could not start indexing any more because Lucene could not
create/read the commit.lock file.

I'm not sure what exactly our problem is. It could be an NFS problem or it
could be a usage problem on our side. We are just starting to use Lucene and
could have done something wrong.

We use Lucene to index and to search documents. Writes/reads could be
concurrent.

I saw in the list some messages about problems with lock files on NFS
file systems, but I could not really understand what the problem is.

How can we improve our solution? What exactly do we have to do to avoid
the problem with the leftover commit.lock file?

thanks for any advice

regards

Dmitri



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: use Lucene LOCAL (looking for a frontend)

2004-01-28 Thread David Townsend
Why don't you take a look at Luke? That way you can play with the index you built and 
work from there.  If you're looking to replicate something like Luke, I'd get studying 
now ;).

http://www.getopt.org/luke/



-Original Message-
From: Sebastian Fey [mailto:[EMAIL PROTECTED]
Sent: 28 January 2004 14:23
To: Lucene Users List
Subject: AW: use Lucene LOCAL (looking for a frontend)


Not being funny, but if you have no experience in Java, then why are you using a Java 
API for index building/text searching ?

I'm just testing some possibilities.
Though I can't write a Java application, I can read it and, if someone gives me 
something to start with, I'm sure I'll make it. If Lucene seems to be the best solution, 
I'll spend some time learning Java.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Multiple Creation of Writers

2004-01-14 Thread David Townsend
In my system indices are created and updated by multiple threads.  I need to check whether 
an index exists to decide whether to pass true or false to the IndexWriter constructor.

new IndexWriter(FSDirectory, Analyzer, boolean);

The problem arises when two threads attempt to create the same index after 
simultaneously finding that the index does not exist.  This problem can be reproduced 
in a single thread by

writerA = new IndexWriter(new File("c:/import/test"), new StandardAnalyzer(), true);
writerB = new IndexWriter(new File("c:/import/test"), new StandardAnalyzer(), true);
add1000Docs(writerA);
add1000Docs(writerB);

this will throw an IOException

C:\import\test\_a.fnm (The system cannot find the file specified)

The only solution I can think of is to create a database/file lock to get around this, 
or change the Lucene code to obtain a lock before creating an index.  Any ideas?
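
One way to serialise the check-then-create step within a single JVM (a sketch; the class, lock object, and method names are illustrative, and this does not protect against other processes):

```java
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class WriterFactory {
    private static final Object CREATE_LOCK = new Object();

    // Make the existence check and the create atomic, so two threads
    // cannot both decide to create the same index.
    public static IndexWriter open(File dir) throws IOException {
        synchronized (CREATE_LOCK) {
            boolean create = !IndexReader.indexExists(dir);
            return new IndexWriter(dir, new StandardAnalyzer(), create);
        }
    }
}
```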

David








-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Lock obtain timed out

2003-12-16 Thread David Townsend
Does this mean that if you can ensure that an IndexWriter and an IndexReader (doing 
deletion) are never open at the same time (e.g. by using a database instead of Lucene's 
locking), there will be no problem with removing locking?   And if you do not use an 
IndexReader to do deletion, can you open and close it at any time?

David
-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: 16 December 2003 11:08
To: Lucene Users List
Subject: Re: Lock obtain timed out


Hohwiller, Joerg writes:
 
 Am I safe disabling the locking???

No.

 Can anybody tell me where to get documentation about the Locking
 strategy (I still would like to know why I have that problem) ???
 
I guess -- but given your input I really have to guess; the source you
wanted to attach didn't make it to the list -- your problem is, that
you cannot have a writing (deleting) IndexReader and an IndexWriter open
at the same time.
There can only be one instance that writes to an index at one time.

Disabling locking disables the checks, but then you have to take care
yourself. So in practice disabling locking is useful for readonly access
to static indices only.

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: indexing/searching a website

2003-11-27 Thread David Townsend
I would advise you to use the excellent articles listed here.  

http://jakarta.apache.org/lucene/docs/resources.html

Some good examples and by the end of it you should have a good understanding of the 
major
classes and their use.

-Original Message-
From: Michal S [mailto:[EMAIL PROTECTED]
Sent: 27 November 2003 10:52
To: Lucene Users List
Subject: Re: indexing/searching a website



 Another option is to deploy your site and crawl it from the outside 
 (have a look at Nutch at sourceforge - or write your own using 
 HttpClient and some HTML parsing for hyperlinks).

I realize that it will be necessary to write or use an existing HTML 
parser. I know that I need one, but I don't know how the whole framework 
would look (how to translate pages on the web server to Lucene 
documents, how to index them, how to search them).

The example on the Lucene home page is very simple and doesn't give me 
much answers.


 I would argue that content within the JSP is a bad thing given that you 
 want to index it - perhaps it makes more sense to put the content 
 somewhere easier to get at like a database?

You are absolutely right. But my client wants to edit the content as 
easily as possible (via Notepad or another text editor). If the content were 
in a database, it would be necessary to provide my client with some kind 
of application to let him update the content. The budget of the 
project is strongly limited, so I can't afford to allocate more 
developers to build a content editor.

Thanks for the reply.
Michal.





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]