indexreader throwing IOException with lock obtain timed out

2004-05-31 Thread Sebastian Ho
Hi,

I am updating the index and therefore need to delete documents before
adding the updated version.

This is how I delete the document, which works fine:

-
int deleteDoc = 0;
deleteDoc = IndexReader.open(dstDir).delete(new Term(url, url));
IndexReader.open(dstDir).close();
-

The writer created after that throws an IOException: Lock obtain timed out.

-
Analyzer analyzer = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(dstDir, analyzer, overwrite);
-

Am I missing anything? I have already closed the IndexReader before
calling the writer.

Thanks


Sebastian 




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: indexreader throwing IOException with lock obtain timed out

2004-05-31 Thread Sebastian Ho
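Sorry guys, I have solved it. I should have done this instead:

int deleteDoc = 0;
// Keep a reference so the reader that deletes is the one that gets closed.
IndexReader reader = IndexReader.open(dstDir);
deleteDoc = reader.delete(new Term(url, url));
// Closing this same reader releases the lock it took for the delete.
reader.close();

I just needed to call delete() and close() on the same reader instance.

Anyway, Lucene should just overwrite the old document when updating
instead.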

sebastian
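
A toy model of why the first version fails and the second works, assuming (as in Lucene releases of this period) that a reader takes the index write lock on its first delete and releases it only when that same reader is closed. ToyIndex, ToyReader and ToyWriter are hypothetical stand-ins written for this sketch, not Lucene classes.

```java
import java.io.IOException;

final class ToyIndexLockDemo {
    /** Hypothetical stand-in for an index directory and its single write lock. */
    static final class ToyIndex {
        private ToyReader lockOwner; // null while the index is unlocked
    }

    /** Hypothetical reader: takes the lock on delete, releases it on close. */
    static final class ToyReader {
        private final ToyIndex index;

        ToyReader(ToyIndex index) {
            this.index = index;
        }

        void delete() throws IOException {
            if (index.lockOwner == null) {
                index.lockOwner = this;
            } else if (index.lockOwner != this) {
                throw new IOException("Lock obtain timed out");
            }
        }

        void close() {
            // Only the reader that holds the lock can release it.
            if (index.lockOwner == this) {
                index.lockOwner = null;
            }
        }
    }

    /** Hypothetical writer: refuses to open while a reader holds the lock. */
    static final class ToyWriter {
        ToyWriter(ToyIndex index) throws IOException {
            if (index.lockOwner != null) {
                throw new IOException("Lock obtain timed out");
            }
        }
    }

    /** Mirrors the original post: one reader deletes, a different one is closed. */
    static boolean writerOpensAfterTwoReaders() {
        ToyIndex index = new ToyIndex();
        try {
            new ToyReader(index).delete(); // takes the lock, never closed
            new ToyReader(index).close();  // a second reader, holds no lock
            new ToyWriter(index);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Mirrors the fix: the reader that deletes is the reader that is closed. */
    static boolean writerOpensAfterOneReader() {
        ToyIndex index = new ToyIndex();
        try {
            ToyReader reader = new ToyReader(index);
            reader.delete();
            reader.close();
            new ToyWriter(index);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this model the two-reader sequence leaves the lock held by a reader nobody can reach any more, which is consistent with the writer timing out in the original post.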

On Mon, 2004-05-31 at 18:02, Sebastian Ho wrote:
> hi
>
> i am updating the index and therefore need to delete documents before
> adding the updated version.
>
> This is how I delete the document which is working fine.
>
> -
> int deleteDoc = 0;
> deleteDoc = IndexReader.open(dstDir).delete(new Term(url, url));
> IndexReader.open(dstDir).close();
> -
>
> The writer after that throws an IOException : Lock obtain timed out.
>
> -
> Analyzer analyzer = new StandardAnalyzer();
> IndexWriter writer = new IndexWriter(dstDir, analyzer, overwrite);
> -
>
> Am I missing anything? I have already closed the IndexReader before
> calling the writer.
>
> Thanks
>
> Sebastian



clean up html before indexing or add tags to ignore list

2004-05-12 Thread Sebastian Ho
Hi,

This is a typical web crawler, indexing and search application. I have
written my crawler and am planning to add Lucene next. One question
comes to mind: in terms of performance, should I clean up the HTML by
removing all tags before indexing, or add all the tags to the ignore
list at the indexing/search stage?

Which is better?

Thanks

Sebastian Ho
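
For reference, a minimal sketch of the first option (removing tags before the text reaches the indexer), using only java.util.regex. The class and method names are made up for this example. Regular expressions are a crude way to handle HTML: this version does not decode entities such as &amp;amp; and will not cope with malformed markup, so it only illustrates where the clean-up step would sit.

```java
import java.util.regex.Pattern;

final class HtmlCleaner {
    // Whole <script>...</script> and <style>...</style> elements, contents included.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, including ones that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the page text with tags removed and whitespace collapsed. */
    static String stripTags(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

The returned string is what would be handed to the indexer as the document body, so no tag names ever become index terms.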





potential synchronization problem

2004-04-30 Thread Sebastian Ho
Hi

I foresee the following scenario in my project and hope to get a reply
before I start coding:

I have a standalone application which runs Lucene indexing in the
background at a user-specified interval (e.g. every 2 days). In the
meantime, the user can force an indexing operation at any time. I
assume this will result in two processes writing to the same index
files (one from the background indexing and the other started by the
user). Will this cause any race condition or synchronization problems?

Thanks

Sebastian Ho
BII
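
One way to sidestep the question, sketched under the assumption that the scheduled run and the user-forced run both live in the same JVM: route both through a single guard so that only one indexing run is active at a time. IndexingCoordinator and runIfIdle are hypothetical names for this sketch; nothing here is Lucene API. If the two runs were separate processes this guard would not help, and the index's own lock file would be the only protection.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class IndexingCoordinator {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the indexing job unless another run is already in progress.
     * Returns true if the job ran, false if the request was skipped.
     */
    boolean runIfIdle(Runnable indexingJob) {
        // Atomically flip idle -> running; lose the race and we skip.
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            indexingJob.run();
            return true;
        } finally {
            // Always mark the coordinator idle again, even if the job throws.
            running.set(false);
        }
    }
}
```

Both the background timer and the "index now" action would call runIfIdle with the same job; a request that arrives while indexing is under way is reported as skipped instead of opening a second writer.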





Java Parser or Lucene

2004-04-25 Thread Sebastian Ho
Hi

I am deciding between the parsing API provided by Java and Lucene for
searching for keywords in an HTML page. Is Lucene overkill for a single
webpage (which is quite small) in this case? I don't see an obvious
choice between them because both can do the same job.

Any advice?

Thanks

Sebastian Ho
Bioinformatics Institute
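
A minimal sketch of the parser-only route for one small page, assuming "the parsing API provided by Java" means the Swing HTML parser in javax.swing.text.html. PageKeywordSearch and its methods are hypothetical names for this example. It collects the text nodes of the page and does a case-insensitive substring check, with no index involved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class PageKeywordSearch {
    /** Returns the text content of the page, one space after each text node. */
    static String visibleText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Case-insensitive check for a keyword in the page text (not in tag names). */
    static boolean containsKeyword(String html, String keyword) {
        return visibleText(html).toLowerCase(Locale.ROOT)
                .contains(keyword.toLowerCase(Locale.ROOT));
    }
}
```

For a one-off check on a single page this is probably enough; an index starts to pay off once many pages are searched repeatedly.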





Re: suitability of lucene for project

2004-04-14 Thread Sebastian Ho
I will be searching webpages (URLs given by the user) for keywords (in
clinical records). Would that count as structured or unstructured? The
records might be in a table, or in a list of URLs pointing to individual
record webpages.

Thanks

sebastian


On Tue, 2004-04-13 at 11:15, Stephane James Vaucher wrote:
> It could be part of your solution, but I don't think so. Let me explain:
>
> I've done something similar to what you describe a few times. I often
> use HttpUnit to get the information. How you process it is up to you.
> If you want it to be indexed (searchable), you can use Lucene. If you
> want to extract structured (or semi-structured) information, use
> wrapper induction techniques (not Lucene).
>
> cheers,
> sv
>
> On 13 Apr 2004, Sebastian Ho wrote:
>
>> hi all
>>
>> i am investigating technologies to use for a project which basically
>> retrieves html pages on a regular basis(or whenever there are changes)
>> and allow html parsing to extract specific information, and presenting
>> them as links in a webpage. Note that this is not a general search
>> engine kind of project but we are extracting clinical information from
>> various website and consolidating them.
>>
>> Pls advise me whether Lucene can do the above and in areas where it
>> cannot, suggestions to solutions will be appreciated.
>>
>> Thanks
>>
>> Sebastian Ho
>> Bioinformatics Institute
  



suitability of lucene for project

2004-04-12 Thread Sebastian Ho
Hi all,

I am investigating technologies for a project that retrieves HTML pages
on a regular basis (or whenever they change), parses them to extract
specific information, and presents the results as links in a webpage.
Note that this is not a general search engine project; we are extracting
clinical information from various websites and consolidating it.

Please advise whether Lucene can do the above; where it cannot,
suggestions for alternatives would be appreciated.

Thanks

Sebastian Ho
Bioinformatics Institute

