Re: Lucene : avoiding locking (incremental indexing)

2004-11-16 Thread jeichels

I am interested in pursuing experienced people's understanding, as I already
have half the queue approach developed.

I am not following why you don't like the queue approach, Sergiu.  From what I
gathered from this board, if you do lots of updates, opening the IndexWriter
is very intensive and should be used in a batch orientation rather
than in a one-at-a-time incremental approach.  In some cases on this board they
talk about it being so overwhelming that people are putting in forced delays so
the Java engine can catch up.  Using a queueing approach, you may get a hit
every 30 seconds or a minute or whatever you choose as your timeframe, but it
should be enough of a delay to keep the Java engine from being overwhelmed.  I
would like this not to happen with Lucene and would like to be able to update
every time a change occurs, but this does not seem the right approach right
now.  As I said before, this seems like a wish item for Lucene.  I don't really
know if the wish is feasible.

So far the biggest problem I was facing with this approach, however, was getting
feedback from the archiving process to the main database that the archiving
change actually happened, and happened correctly, even if the server goes down.

JohnE





 Personally I don't like the Queue approach... because I already
 implemented multithreading in our application to improve its
 performance. In our application indexing is not a high priority,
 but it's happening quite often. Search is a priority.

 Lucene allows more than one search at a time. When you have a big
 index and many users, the Queue approach can slow down your
 application too much. I think it will be a bottleneck.

 I know that the lock problem is annoying, but I also think that the
 right way is to identify the source of locking. Our application is a
 web-based application based on Turbine, and when we want to restart
 Tomcat, we just kill the process (otherwise we need to restart twice
 because of a log4j initialization problem), so the index is locked
 after the Tomcat restart. In my case it makes sense to check once at
 startup whether the index is locked. I'm also logging all errors that
 I get in the system; this helps me find their source more easily.

 All the best,

 Sergiu
 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene : avoiding locking (incremental indexing)

2004-11-16 Thread Sergiu Gordea
[EMAIL PROTECTED] wrote:
I am interested in pursuing experienced peoples' understanding as I have half the queue approach developed already.
 

well, I think that experienced people developed Lucene :)  They offered us
the possibility to use multithreading and concurrent searching.
Of course, it depends on the requirements whether to use them or not. I chose
to use them because I'm developing a web application.

I am not following why you don't like the queue approach Sergiu.  From what I gathered from this board, if you do lots of updates, the opening of the IndexWriter is very intensive and should be used in a batch orientation rather than in a one-at-a-time incremental approach.

That's not my case. I have to reindex the information that is changed
in our system. We are developing a knowledge management platform and
reindex the objects each time they are changed.

In some cases on this board they talk about it being so overwhelming that people are putting in forced delays so the Java engine can catch up.

I haven't had this kind of problem, and I use multithreading when I
reindex the whole index; the searches still work correctly without any
locking problems. I think that the locking problems come from outside,
and these locking sources should be identified.
But again, this is just my case.

Using a queueing approach, you may get a hit every 30 seconds or minute 
or...whatever you choose as your timeframe, but it should be enough of a delay 
to allow the java engine to not be overwhelmed.
No, I cannot accept this, because our users should be able to change
information in the system and to run searches at the same time, without
having to wait too long for a server response.

 I would like this not to happen with Lucene and would like to be able to update every time an update occurs, but this does not seem the right approach right now.  As I said before, this seems like a wish item for Lucene.  I don't really know if the wish is feasible.
 

I agree that maybe a built-in function for identifying false locking
would be very useful, but it might also be a little bit bad for the
users, because they will be tempted to unlock the index instead of
closing readers/writers correctly.

So far the biggest problem I was facing with this approach, however, was having feedback from the archiving process to the main database that the archiving change actually has happened and correctly even if the server goes down.
 

... so it may work correctly if we use Lucene (and the servers and
the OS) correctly :)

Maybe it would be a good idea to create some JUnit/JMeter tests to
identify the source of unexpected locks.
This also depends on your availability, but I think it will be worth
the effort.

Sergiu
JohnE


 

Personally I don't like the Queue approach... because I already
implemented multithreading in our application to improve its
performance. In our application indexing is not a high priority,
but it's happening quite often. Search is a priority.

Lucene allows more than one search at a time. When you have a big
index and many users, the Queue approach can slow down your
application too much. I think it will be a bottleneck.

I know that the lock problem is annoying, but I also think that the
right way is to identify the source of locking. Our application is a
web-based application based on Turbine, and when we want to restart
Tomcat, we just kill the process (otherwise we need to restart twice
because of a log4j initialization problem), so the index is locked
after the Tomcat restart. In my case it makes sense to check once at
startup whether the index is locked. I'm also logging all errors that
I get in the system; this helps me find their source more easily.

All the best,
Sergiu
   




Re: Lucene : avoiding locking

2004-11-15 Thread jeichels

I am new to Lucene, but have a large project in production on the web using 
other apache software including Tomcat, Struts, OJB, and others.

The database I need to support will hopefully grow to millions of records.  
Right now it only has thousands, but it is growing.   These documents get
updated by users regularly, but not frequently.   When you have 100k users,
though, infrequently still means you have to deal with lock types of issues.

When they update their record, their search criteria will have to be updated 
and they will expect to see results somewhat immediately.

In moving from exact matching, which is very poor for searches, to Lucene, this
locking is the only thing that has me nervous.   I would really like a well
thought out scheme for incremental changes, as I won't generally need batch
indexing unless I have to delete and recreate the database for some reason.

Thinking about most online forums, I think incremental is the way they would 
like to be able to go for searching.

I have lots to learn about this project, but I really like what I see besides 
that locking issue.   If I get into this more and understand details maybe I 
will have something to offer later.   Lots to learn first though.

Thank you for your hard work,

JohnE





I am curious, though, how many people on this list are using Lucene in
the incremental update case. Most examples I've seen all assume batch
indexing.

Regards,

Luke Francl







Re: Lucene : avoiding locking (incremental indexing)

2004-11-15 Thread Luke Francl
This is how I implemented incremental indexing. If anyone sees anything
wrong, please let me know.

Our motivation is similar to John Eichel's. We have a digital asset
management system and when users update, delete or create a new asset,
they need to see their results immediately.

The most important thing to know about incremental indexing is that
multiple threads cannot share the same IndexWriter, and only one
IndexWriter can be open on an index at a time.

Therefore, what I did was control access to the IndexWriter through a
singleton wrapper class that synchronizes access to the IndexWriter and
IndexReader (for deletes). After finishing writing to the index, you
must close the IndexWriter to flush the changes to the index.

If you do this you will be fine.

However, opening and closing the index takes time so we had to look for
some ways to speed up the indexing.

The most obvious thing is that you should do as much work as possible
outside of the synchronized block. For example, in my application, the
creation of Lucene Document objects is not synchronized. Only the part
of the code that is between your IndexWriter.open() and
IndexWriter.close() needs to be synchronized.

The other easy thing I did to improve performance was to batch the changes in
a transaction together for indexing. If a user changes 50 assets, they
will all be indexed using one Lucene IndexWriter.

So far, we haven't had to explore further performance enhancements, but
if we do the next thing I will do is create a thread that gathers assets
that need to be indexed and performs a batch job every five minutes or
so.
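Luke's singleton pattern can be sketched without Lucene itself. In the sketch below all names are illustrative, and a simple counter stands in for the real IndexWriter open/addDocument/close calls; the point is that only the work between "open" and "close" sits inside the synchronized method:

```java
import java.util.List;

// Hypothetical sketch of a singleton wrapper that serializes index writes.
// The documentsIndexed counter is a placeholder for real IndexWriter work.
public class IndexManager {
    private static final IndexManager INSTANCE = new IndexManager();
    private int documentsIndexed = 0;

    private IndexManager() {}

    public static IndexManager getInstance() {
        return INSTANCE;
    }

    // Stands in for: open IndexWriter -> addDocument for each doc -> close.
    // Synchronization guarantees only one thread writes at a time.
    public synchronized void indexBatch(List<String> docs) {
        for (String doc : docs) {
            documentsIndexed++;  // placeholder for writer.addDocument(doc)
        }
        // closing the writer here is what flushes changes in real Lucene
    }

    public synchronized int getDocumentsIndexed() {
        return documentsIndexed;
    }
}
```

Building the Document objects before calling indexBatch keeps that work outside the lock, as Luke suggests.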

Hope this is helpful,
Luke





Re: Lucene : avoiding locking (incremental indexing)

2004-11-15 Thread jeichels
It really seems like I am not the only person having this issue.

So far I am seeing 2 solutions and honestly I don't totally love either.  I am
thinking that without changes to Lucene itself, the best general way to
implement this might be to have a queue of changes and have Lucene work off
this queue in a single thread using a time-settable batch method.   This is
similar to what you are using below, but I don't like that you forcibly unlock
Lucene if it shows itself locked.   Using the Queue approach, only that one
thread could be accessing Lucene for writes/deletes anyway, so there should be
no unknown locking.

I can imagine this being a very good addition to Lucene - creating a high level
interface to Lucene that manages incremental updates in such a manner.  If
anybody has such a general piece of code, please post it!!!   I would use it
tonight rather than create my own.

I am not sure if there is anything that can be done to Lucene itself to help
with this need people seem to be having.  I realize the likely reasons why
Lucene might need to have only one IndexWriter, and the additional load that
might be caused by locking off pieces of the database rather than the whole
database.  I think I need to look in the developer archives.
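The queue John describes, a single thread draining accumulated changes as one batch, might look roughly like this (a sketch only; the class and method names are made up, and doBatch is a placeholder for opening one IndexWriter, applying the changes, and closing it):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-writer queue: any thread may submit changes, but
// only the thread calling drainOnce ever touches the index.
public class IndexQueue {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> applied = new ArrayList<>();

    // called from web/application threads
    public void submit(String change) {
        pending.add(change);
    }

    // called periodically (e.g. every 30 seconds) from one indexing thread
    public void drainOnce() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            doBatch(batch);  // one IndexWriter open/close per batch
        }
    }

    private void doBatch(List<String> batch) {
        applied.addAll(batch);  // placeholder for the real Lucene writes
    }

    public List<String> getApplied() {
        return applied;
    }
}
```

Because one thread owns all writes, no other thread can ever hold the write lock unexpectedly, which is exactly the "no unknown locking" property described above.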

JohnE



- Original Message -
From: Luke Shannon [EMAIL PROTECTED]
Date: Monday, November 15, 2004 5:14 pm
Subject: Re: Lucene : avoiding locking (incremental indexing)

 Hi Luke;
 
 I have a similar system (except people don't need to see results
 immediately). The approach I took is a little different.
 
 I made my Indexer a thread, with the indexing operations occurring
 in the run method. When the IndexWriter is to be created or the
 IndexReader needs to execute a delete, I call the following method:
 
 private void manageIndexLock() {
   try {
     // check if the index is locked and deal with it if it is
     if (index.exists() && IndexReader.isLocked(indexFileLocation)) {
       System.out.println("INDEXING INFO: There is more than one process trying"
           + " to write to the index folder. Will wait for index to become available.");
       // perform this loop until the lock is released or 3 mins has expired
       int indexChecks = 0;
       while (IndexReader.isLocked(indexFileLocation) && indexChecks < 6) {
         // increment the number of times we check the index files
         indexChecks++;
         try {
           // sleep for 30 seconds
           Thread.sleep(30000L);
         } catch (InterruptedException e2) {
           System.out.println("INDEX ERROR: There was a problem waiting for the"
               + " lock to release. " + e2.getMessage());
         }
       } // closes the while loop for checking on the index directory
       // if we are still locked we need to do something about it
       if (IndexReader.isLocked(indexFileLocation)) {
         System.out.println("INDEXING INFO: Index locked after 3 minutes of"
             + " waiting. Forcefully releasing lock.");
         IndexReader.unlock(FSDirectory.getDirectory(index, false));
         System.out.println("INDEXING INFO: Index lock released");
       } // closes the if that actually releases the lock
     } // closes the if that ensures the file exists
   } // closes the try for all the above operations
   catch (IOException e1) {
     System.out.println("INDEX ERROR: There was a problem waiting for the lock"
         + " to release. " + e1.getMessage());
   }
 } // closes the manageIndexLock method
 
 Do you think this is a bad approach?
 
 Luke
 
 - Original Message - 
 From: Luke Francl [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Monday, November 15, 2004 5:01 PM
 Subject: Re: Lucene : avoiding locking (incremental indexing)
 
 
  This is how I implemented incremental indexing. If anyone sees 
 anything wrong, please let me know.
 
  Our motivation is similar to John Eichel's. We have a digital asset
  management system and when users update, delete or create a new 
 asset, they need to see their results immediately.
 
  The most important thing to know about incremental indexing is that
  multiple threads cannot share the same IndexWriter, and only one
  IndexWriter can be open on an index at a time.
 
  Therefore, what I did was control access to the IndexWriter 
 through a
  singleton wrapper class that synchronizes access to the 
 IndexWriter and
  IndexReader (for deletes). After finishing writing to the index, you
  must close the IndexWriter to flush the changes to the index.
 
  If you do this you will be fine.
 
  However, opening and closing the index takes time so we had to 
 look for
  some ways to speed up the indexing.
 
  The most obvious thing is that you should do as much work as 
 possible outside of the synchronized block. For example, in my 
 application, the
  creation of Lucene Document objects is not synchronized. Only 
 the part
  of the code that is between your IndexWriter.open() and
  IndexWriter.close() needs to be synchronized.
 
  The other easy thing I did to improve performance was batch 
 changes in a
  transaction together

Re: Lucene : avoiding locking (incremental indexing)

2004-11-15 Thread sergiu gordea
Luke Shannon wrote:
I like the sound of the Queue approach.  I also don't like that I have to
forcefully unlock the index.
 

Personally I don't like the Queue approach... because I already
implemented multithreading in our application to improve its
performance. In our application indexing is not a high priority,
but it's happening quite often. Search is a priority.

Lucene allows more than one search at a time. When you have a big
index and many users, the Queue approach can slow down your
application too much. I think it will be a bottleneck.

I know that the lock problem is annoying, but I also think that the
right way is to identify the source of locking. Our application is a
web-based application based on Turbine, and when we want to restart
Tomcat, we just kill the process (otherwise we need to restart twice
because of a log4j initialization problem), so the index is locked
after the Tomcat restart. In my case it makes sense to check once at
startup whether the index is locked. I'm also logging all errors that
I get in the system; this helps me find their source more easily.

All the best,
Sergiu
I'm not the most experienced programmer and am on a tight deadline. The
approach I ended up with was the best I could do with the experience I've
got and the time I had.
My indexer works so far and doesn't have to forcefully release the lock on
the index too often (the case is most likely to occur when someone removes
content files and the reader needs to delete from the existing index for
the first time). We will see what happens as more people use the system with
large content directories.
As I learn more I plan to expand the functionality of my class.
Luke S
- Original Message - 
From: [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, November 15, 2004 5:50 PM
Subject: Re: Lucene : avoiding locking (incremental indexing)

 

It really seems like I am not the only person having this issue.
So far I am seeing 2 solutions and honestly I don't love either totally.
   

I am thinking that without changes to Lucene itself, the best general way
to implement this might be to have a queue of changes and have Lucene work
off this queue in a single thread using a time-settable batch method.   This
is similar to what you are using below, but I don't like that you forcibly
unlock Lucene if it shows itself locked.   Using the Queue approach, only
that one thread could be accessing Lucene for writes/deletes anyway so there
should be no unknown locking.
 

I can imagine this being a very good addition to Lucene - creating a high
   

level interface to Lucene that manages incremental updates in such a manner.
If anybody has such a general piece of code, please post it!!!   I would use
it tonight rather than create my own.
 

I am not sure if there is anything that can be done to Lucene itself to
   

help with this need people seem to be having.  I realize the likely reasons
why Lucene might need to only have one Index writer and the additional load
that might be caused by locking off pieces of the database rather than the
whole database.  I think I need to look in the developer archives.
 

JohnE

- Original Message -
From: Luke Shannon [EMAIL PROTECTED]
Date: Monday, November 15, 2004 5:14 pm
Subject: Re: Lucene : avoiding locking (incremental indexing)
   

Hi Luke;
I have a similar system (except people don't need to see results
immediately). The approach I took is a little different.
I made my Indexer a thread, with the indexing operations occurring
in the run method. When the IndexWriter is to be created or the
IndexReader needs to execute a delete, I call the following method:
private void manageIndexLock() {
  try {
    // check if the index is locked and deal with it if it is
    if (index.exists() && IndexReader.isLocked(indexFileLocation)) {
      System.out.println("INDEXING INFO: There is more than one process trying"
          + " to write to the index folder. Will wait for index to become available.");
      // perform this loop until the lock is released or 3 mins has expired
      int indexChecks = 0;
      while (IndexReader.isLocked(indexFileLocation) && indexChecks < 6) {
        // increment the number of times we check the index files
        indexChecks++;
        try {
          // sleep for 30 seconds
          Thread.sleep(30000L);
        } catch (InterruptedException e2) {
          System.out.println("INDEX ERROR: There was a problem waiting for the"
              + " lock to release. " + e2.getMessage());
        }
      } // closes the while loop for checking on the index directory
      // if we are still locked we need to do something about it
      if (IndexReader.isLocked(indexFileLocation)) {
        System.out.println("INDEXING INFO: Index locked after 3 minutes of"
            + " waiting. Forcefully releasing lock.");
        IndexReader.unlock(FSDirectory.getDirectory(index, false));
        System.out.println("INDEXING INFO: Index lock released");
      } // closes the if that actually releases the lock
    } // closes the if that ensures the file exists
  } // closes

Re: Lucene : avoiding locking

2004-11-12 Thread Luke Francl
Luke,

I also integrated Lucene into a content management application with
incremental updates and ran into the same problem you did.

You need to make sure only one process (which means, no multiple copies
of the application writing to the index simultaneously) or thread ever
writes to the index. That includes deletes as in your code below, so
make sure that is synchronized, too.

Also, you will find that opening and closing the index for writing is
very costly, especially on a large index, so it pays to batch up all
changes in a transaction (inserts and deletes) together in one go at the
Lucene index. If this still isn't enough, you can batch up 5 minutes
worth of changes and apply them at once. We haven't got to that point
yet.

I am curious, though, how many people on this list are using Lucene in
the incremental update case. Most examples I've seen all assume batch
indexing.

Regards,

Luke Francl



On Thu, 2004-11-11 at 18:33, Luke Shannon wrote:
 Synchronizing the method didn't seem to help. The lock is being detected
 right here in the code:
 
 while (uidIter.term() != null
        && uidIter.term().field() == "uid"
        && uidIter.term().text().compareTo(uid) < 0) {
   //delete stale docs
   if (deleting) {
     reader.delete(uidIter.term());
   }
   uidIter.next();
 }
 
 This runs fine on my own site so I am confused. For now I think I am going
 to remove the deleting of stale files etc and just rebuild the index each
 time to see what happens.
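The UID walk from the demo is essentially a merge of two sorted sequences. As a Lucene-free illustration (names here are invented; in the real demo the UID encodes path plus modification time, so a changed file shows up as one delete plus one add):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the sorted-UID walk: compare the UIDs already in the index
// against the UIDs currently on disk, both in sorted order, and collect
// the indexed entries that no longer exist and must be deleted.
public class UidMerge {
    public static List<String> staleUids(List<String> indexed, List<String> current) {
        List<String> stale = new ArrayList<>();
        int i = 0, j = 0;
        while (i < indexed.size() && j < current.size()) {
            int cmp = indexed.get(i).compareTo(current.get(j));
            if (cmp < 0) {
                stale.add(indexed.get(i++));  // in index but gone from disk
            } else if (cmp > 0) {
                j++;                          // new on disk, needs indexing
            } else {
                i++;                          // unchanged, nothing to do
                j++;
            }
        }
        while (i < indexed.size()) {
            stale.add(indexed.get(i++));      // everything left over is stale
        }
        return stale;
    }
}
```

This is why the incremental pass is cheap: one linear scan decides every delete and re-add.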
 
 - Original Message - 
 From: [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Thursday, November 11, 2004 6:56 PM
 Subject: Re: Lucene : avoiding locking
 
 
  I'm working on a similar project...
Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
 
  --- Luke Shannon [EMAIL PROTECTED] wrote:
 
   Hi All;
  
   I have hit a snag in my Lucene integration and don't know what
   to do.
  
 My company has a content management product. Each time someone changes
 the directory structure or a file within it, that portion of the site
 needs to be re-indexed so the changes are reflected in future searches
 (indexing must happen during run time).

 I have written an Indexer class with a static Index() method. The idea
 is to call the method every time something changes and the index needs
 to be re-examined. I am hoping the logic put in by Doug Cutting
 surrounding the UID will make indexing efficient enough to be called so
 frequently.

 This class works great when I tested it on my own little site (I have
 about 2000 files). But when I drop the functionality into the QA
 environment I get a locking error.

 I can't access the stack trace; all I can get at is a log file the
 application writes to. Here is the section my class wrote. It was right
 in the middle of indexing and bang, a lock issue.

 I don't know if the problem is in my code or something in the existing
 application.
  
Error Message:
ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
|INFO|INDEXING INFO: Start Indexing new content.
|INFO|INDEXING INFO: Index Folder Did Not Exist. Start
   Creation Of New
   Index
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 [... the previous line repeats 17 times ...]
|INFO|INDEXING ERROR: Unable to index new content Lock obtain
   timed out:
  
  
 
 Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
10f7fe8-write.lock
  
   |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
  
 Here is my code. You will recognize it pretty much as the IndexHTML class
 from the Lucene demo written by Doug Cutting. I have put a ton of comments
 in an attempt to understand what is going on.
  
Any help would

Re: Lucene : avoiding locking

2004-11-12 Thread Luke Shannon
Hi Luke;

Currently I am experimenting with checking whether the index is locked, using
IndexReader.isLocked, before creating a writer. If this turns out to be the
case I was thinking of just unlocking the file.

Do you think this is a good strategy?

Thanks,

Luke

- Original Message - 
From: Luke Francl [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, November 12, 2004 10:38 AM
Subject: Re: Lucene : avoiding locking


 Luke,

 I also integrated Lucene into a content management application with
 incremental updates and ran into the same problem you did.

 You need to make sure only one process (which means, no multiple copies
 of the application writing to the index simultaneously) or thread ever
 writes to the index. That includes deletes as in your code below, so
 make sure that is synchronized, too.

 Also, you will find that opening and closing the index for writing is
 very costly, especially on a large index, so it pays to batch up all
 changes in a transaction (inserts and deletes) together in one go at the
 Lucene index. If this still isn't enough, you can batch up 5 minutes
 worth of changes and apply them at once. We haven't got to that point
 yet.

 I am curious, though, how many people on this list are using Lucene in
 the incremental update case. Most examples I've seen all assume batch
 indexing.

 Regards,

 Luke Francl



 On Thu, 2004-11-11 at 18:33, Luke Shannon wrote:
  Synchronizing the method didn't seem to help. The lock is being detected
  right here in the code:
 
  while (uidIter.term() != null
         && uidIter.term().field() == "uid"
         && uidIter.term().text().compareTo(uid) < 0) {
    //delete stale docs
    if (deleting) {
      reader.delete(uidIter.term());
    }
    uidIter.next();
  }
 
  This runs fine on my own site so I am confused. For now I think I am
going
  to remove the deleting of stale files etc and just rebuild the index
each
  time to see what happens.
 
  - Original Message - 
  From: [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Sent: Thursday, November 11, 2004 6:56 PM
  Subject: Re: Lucene : avoiding locking
 
 
   I'm working on a similar project...
Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
  
   --- Luke Shannon [EMAIL PROTECTED] wrote:
  
Hi All;
   
I have hit a snag in my Lucene integration and don't know what
to do.
   
 My company has a content management product. Each time someone changes
 the directory structure or a file within it, that portion of the site
 needs to be re-indexed so the changes are reflected in future searches
 (indexing must happen during run time).

 I have written an Indexer class with a static Index() method. The idea
 is to call the method every time something changes and the index needs
 to be re-examined. I am hoping the logic put in by Doug Cutting
 surrounding the UID will make indexing efficient enough to be called so
 frequently.

 This class works great when I tested it on my own little site (I have
 about 2000 files). But when I drop the functionality into the QA
 environment I get a locking error.

 I can't access the stack trace; all I can get at is a log file the
 application writes to. Here is the section my class wrote. It was right
 in the middle of indexing and bang, a lock issue.

 I don't know if the problem is in my code or something in the existing
 application.
   
 Error Message:
 ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
 |INFO|INDEXING INFO: Start Indexing new content.
 |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
Creation Of New
Index
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 [... the previous line repeats 16 times ...]

Re: Lucene : avoiding locking

2004-11-12 Thread Otis Gospodnetic
Hello,

--- Luke Shannon [EMAIL PROTECTED] wrote:

 Currently I am experimenting with checking whether the index is locked,
 using IndexReader.isLocked, before creating a writer. If this turns out
 to be the case I was thinking of just unlocking the file.
 
 Do you think this is a good strategy?

Only if you synchronize well and only if all index-modifying accesses
are contained in the same JVM.  Alternatively, you could add 'sleep
and retry' logic around the lock check, and perhaps 'give up or force
unlock if you got too much sleep'.
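Otis's sleep-and-retry idea, taken in isolation, might be sketched like this (everything here is illustrative; the BooleanSupplier stands in for a real call such as IndexReader.isLocked):

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry loop around a lock check: poll, sleep, and give up
// after a bounded number of attempts, leaving force-unlock to the caller.
public class LockWait {
    // Returns true once the lock is observed released, false if we gave up
    // (or were interrupted) after maxRetries checks.
    public static boolean waitForUnlock(BooleanSupplier isLocked,
                                        int maxRetries, long sleepMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (!isLocked.getAsBoolean()) {
                return true;  // safe to open the IndexWriter now
            }
            try {
                Thread.sleep(sleepMillis);  // got some sleep; retry
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return !isLocked.getAsBoolean();  // one last look before giving up
    }
}
```

If this returns false, the caller can decide whether "too much sleep" justifies a forced unlock.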

Otis

 
 - Original Message - 
 From: Luke Francl [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Friday, November 12, 2004 10:38 AM
 Subject: Re: Lucene : avoiding locking
 
 
  Luke,
 
  I also integrated Lucene into a content management application with
  incremental updates and ran into the same problem you did.
 
  You need to make sure only one process (which means, no multiple
 copies
  of the application writing to the index simultaneously) or thread
 ever
  writes to the index. That includes deletes as in your code below,
 so
  make sure that is synchronized, too.
 
  Also, you will find that opening and closing the index for writing
 is
  very costly, especially on a large index, so it pays to batch up
 all
  changes in a transaction (inserts and deletes) together in one go
 at the
  Lucene index. If this still isn't enough, you can batch up 5
 minutes
  worth of changes and apply them at once. We haven't got to that
 point
  yet.
 
  I am curious, though, how many people on this list are using Lucene
 in
  the incremental update case. Most examples I've seen all assume
 batch
  indexing.
 
  Regards,
 
  Luke Francl
 
 
 
  On Thu, 2004-11-11 at 18:33, Luke Shannon wrote:
   Synchronizing the method didn't seem to help. The lock is being
   detected right here in the code:
  
   while (uidIter.term() != null
          && uidIter.term().field() == "uid"
          && uidIter.term().text().compareTo(uid) < 0) {
     //delete stale docs
     if (deleting) {
       reader.delete(uidIter.term());
     }
     uidIter.next();
   }
  
   This runs fine on my own site so I am confused. For now I think I
 am
 going
   to remove the deleting of stale files etc and just rebuild the
 index
 each
   time to see what happens.
  
   - Original Message - 
   From: [EMAIL PROTECTED]
   To: Lucene Users List [EMAIL PROTECTED]
   Sent: Thursday, November 11, 2004 6:56 PM
   Subject: Re: Lucene : avoiding locking
  
  
I'm working on a similar project...
 Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
   
--- Luke Shannon [EMAIL PROTECTED] wrote:
   
 Hi All;

 I have hit a snag in my Lucene integration and don't know
 what
 to do.

  My company has a content management product. Each time
 someone changes the
  directory structure or a file within it, that portion of the
 site needs to
  be re-indexed so the changes are reflected in future
 searches
 (indexing
 must
  happen during run time).

  I have written an Indexer class with a static Index() method.
 The idea is
 to
  call the method every time something changes and the index
 needs to be
  re-examined. I am hoping the logic put in by Doug Cutting
 surrounding the
  UID will make indexing efficient enough to be called so
 frequently.

  This class works great when I tested it on my own little
 site
 (I have about
  2000 files). But when I drop the functionality into the QA
 environment I get
  a locking error.

  I can't access the stack trace, all I can get at is a log
 file the
  application writes to. Here is the section my class wrote.
 It was right in
  the middle of indexing and bang lock issue.

  I don't know if the problem is in my code or something in
 the
 existing
  application.

  Error Message:
  ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
  |INFO|INDEXING INFO: Start Indexing new content.
  |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
 Creation Of New
 Index
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions

Re: Lucene : avoiding locking

2004-11-12 Thread Luke Francl
On Fri, 2004-11-12 at 09:51, Luke Shannon wrote:
 Hi Luke;
 
 Currently I am experimenting with checking if the index is locked using
 IndexReader.isLocked before creating a writer. If this turns out to be the
 case I was thinking of just unlocking the file.
 
 Do you think this is a good strategy?

No, because if the index is locked, that means another thread or process
is writing to it.

If you're getting spurious locks, stop your application and clean out
the /tmp/ directory (you should see files named *lucene* -- these are
the lock files).

Luke


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene : avoiding locking

2004-11-12 Thread Otis Gospodnetic
 I am curious, though, how many people on this list are using Lucene
 in the incremental update case. Most examples I've seen all assume
 batch indexing.

I do both for Simpy (simpy.com).  To ensure no duplicates, I try to
delete (by some unique ID) before I add a new Document.

Otis
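The delete-then-add pattern Otis describes can be modeled with a plain-JDK sketch. The Doc and DedupUpdate classes below are purely illustrative; with the Lucene 1.x API the two steps would be IndexReader.delete(new Term("uid", uid)) followed by IndexWriter.addDocument(...).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Models the "delete by unique ID, then add" update: the pattern that
 * keeps an incrementally updated index free of duplicate documents.
 */
public class DedupUpdate {
    static class Doc {
        final String uid, body;
        Doc(String uid, String body) { this.uid = uid; this.body = body; }
    }

    final List<Doc> index = new ArrayList<>();

    /** Delete any stale copy first, then add the fresh document. */
    void update(String uid, String body) {
        index.removeIf(d -> d.uid.equals(uid)); // the "delete by unique ID" pass
        index.add(new Doc(uid, body));          // the add pass
    }

    public static void main(String[] args) {
        DedupUpdate idx = new DedupUpdate();
        idx.update("42", "first draft");
        idx.update("42", "second draft"); // replaces, never duplicates
        System.out.println(idx.index.size()); // prints "1"
    }
}
```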


 On Thu, 2004-11-11 at 18:33, Luke Shannon wrote:
  Synchronizing the method didn't seem to help. The lock is being
 detected
  right here in the code:
  
  while (uidIter.term() != null
         && uidIter.term().field() == "uid"
         && uidIter.term().text().compareTo(uid) < 0) {
    //delete stale docs
    if (deleting) {
      reader.delete(uidIter.term());
    }
    uidIter.next();
  }
  
  This runs fine on my own site so I am confused. For now I think I
 am going
  to remove the deleting of stale files etc and just rebuild the
 index each
  time to see what happens.
  
  - Original Message - 
  From: [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Sent: Thursday, November 11, 2004 6:56 PM
  Subject: Re: Lucene : avoiding locking
  
  
   I'm working on a similar project...
   Make sure that only one call to the index method is occurring at
   a time.  Synchronizing that method should do it.
  
   --- Luke Shannon [EMAIL PROTECTED] wrote:
  
Hi All;
   
I have hit a snag in my Lucene integration and don't know what
to do.
   
 My company has a content management product. Each time
someone changes the
 directory structure or a file within it, that portion of the
site needs to
 be re-indexed so the changes are reflected in future searches
(indexing
must
 happen during run time).
   
 I have written an Indexer class with a static Index() method.
The idea is
to
 call the method every time something changes and the index
needs to be
 re-examined. I am hoping the logic put in by Doug Cutting
surrounding the
 UID will make indexing efficient enough to be called so
frequently.
   
 This class works great when I tested it on my own little site
(I have about
  2000 files). But when I drop the functionality into the QA
environment I get
 a locking error.
   
 I can't access the stack trace, all I can get at is a log
file the
  application writes to. Here is the section my class wrote.
It was right in
 the middle of indexing and bang lock issue.
   
 I don't know if the problem is in my code or something in the
existing
 application.
   
 Error Message:
 ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
 |INFO|INDEXING INFO: Start Indexing new content.
 |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
Creation Of New
Index
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update
comparisions
 |INFO|INDEXING ERROR: Unable to index new content Lock obtain
timed out:
   
   
  
 

Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
 10f7fe8-write.lock
   
|ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
   
 Here is my code. You will recognize it pretty much as the
IndexHTML class
 from the Lucene demo written by Doug Cutting. I have put a
ton of comments
  in an attempt to understand what is going on.
   
 Any help would be appreciated.
   
 Luke
   
 package com.fbhm.bolt.search;
   
 /*
  * Created on Nov 11, 2004
  *
  * This class will create a single index file for the Content
  * Management System (CMS). It contains logic to ensure
  * indexing is done intelligently. Based on IndexHTML.java
  * from the demo folder that ships with Lucene
  */
   
 import java.io.File;
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.Date

Re: Lucene : avoiding locking

2004-11-12 Thread Luke Shannon
.
indexDocs(1 arg));
  //look out for reader/writer conflicts
  if (IndexReader.isLocked(index.getPath())) {
   try {
    System.out
      .println("Waiting 1 minute for the reader to release the lock on the index.");
    Thread.sleep(60000L);
    //if we are still locked we need to do
    // something about it
    if (IndexReader.isLocked(index.getPath())) {
     System.out
       .println("Index Locked After 1 minute waiting. Forcefully releasing lock.");
     IndexReader.unlock(FSDirectory
       .getDirectory(index, false));
     System.out.println("Index lock released");
    }

   } catch (InterruptedException e2) {
    System.out
      .println("INDEX ERROR: There was a problem waiting for the lock to release. "
        + e2.getMessage());
   }
  }
  System.out
    .println("INDEX INFO: Data has been deleted from the index.");
  reader.delete(uidIter.term());
 }
 uidIter.next();
}
//if the terms are equal there is no change with this document
//we keep it as is
if (uidIter.term() != null && uidIter.term().field() == "uid"
    && uidIter.term().text().compareTo(uid) == 0) {
 uidIter.next();
}
//if we are not deleting and the document was not there
//it means we didn't have this document on the last index
//and we should add it
else if (!deleting) {
 System.out
   .println("INDEXING INFO: Adding a new Document to the existing index: "
     + file.getPath());
 //pdf files
 if (file.getPath().endsWith(".pdf")) {
  try {
   Document doc = LucenePDFDocument.getDocument(file);
   writer.addDocument(doc);
  } catch (Exception e) {
   System.out
     .println("INDEXING ERROR: Unable to index pdf document: "
       + file.getPath()
       + " "
       + e.getMessage());
  }
 }
 //xml documents
 else if (file.getPath().endsWith(".xml")) {
  try {
   Document doc = XMLDocument.Document(file);
   writer.addDocument(doc);
  } catch (Exception e) {
   System.out
     .println("INDEXING ERROR: Was unable to index XML document: "
       + file.getPath()
       + " "
       + e.getMessage());
  }
 }
 //html and txt documents
 else {
  try {
   Document doc = HTMLDocument.Document(file);
   writer.addDocument(doc);
  } catch (Exception e) {
   System.out
     .println("INDEXING ERROR: Was unable to index HTML/TXT file: "
       + file.getPath()
       + " "
       + e.getMessage());
  }
 }
}
   }//end the if for an incremental update
   //we are creating a new index, add all document types
   else {
    System.out
      .println("INDEXING INFO: Adding a new Document to a new index: "
        + file.getPath());
    //pdf documents
    if (file.getPath().endsWith(".pdf")) {
     try {
      Document doc = LucenePDFDocument.getDocument(file);
      writer.addDocument(doc);
     } catch (Exception e) {
      System.out
        .println("INDEXING ERROR: Unable to index pdf document: "
          + file.getPath() + " " + e.getMessage());
     }
    }
    //xml documents
    else if (file.getPath().endsWith(".xml")) {
     try {
      Document doc = XMLDocument.Document(file);
      writer.addDocument(doc);
     } catch (Exception e) {
      System.out
        .println("INDEXING ERROR: Was unable to index XML document: "
          + file.getPath() + " " + e.getMessage());
     }
    }
    //html and txt documents
    else {
     try {
      Document doc = HTMLDocument.Document(file);
      writer.addDocument(doc);
     } catch (Exception e) {
      System.out
        .println("INDEXING ERROR: Was unable to index HTML/TXT file: "
          + file.getPath() + " " + e.getMessage());
     }
    }//close the else
   }//close the else for a new index
  }//close the else if to handle file types
 }//close the indexDocs method

 /*
  * Close any open objects.
  */
 protected void finalize() throws Throwable {
  if (reader != null) {
   reader.close();
  }
  if (writer != null) {
   writer.close();
  }
 }
}
- Original Message - 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, November 12, 2004 11:03 AM
Subject: Re: Lucene : avoiding locking


 Hello,

 --- Luke Shannon [EMAIL PROTECTED] wrote:

  Currently I am experimenting with checking if the index is locked using
  IndexReader.isLocked before creating a writer. If this turns out to be
  the
  case I was thinking of just unlocking the file.
 
  Do you think this is a good strategy?

 Only if you synchronize well and only if all index-modifying accesses
 are contained in the same JVM.  Alternatively, you could add a 'sleep
 and retry' logic around the lock check, and perhaps 'give up or force
 unlock if you got too much sleep'.

 Otis


  - Original Message - 
  From: Luke Francl [EMAIL PROTECTED]
  To: Lucene Users List [EMAIL PROTECTED]
  Sent: Friday, November 12

Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
Hi All;

I have hit a snag in my Lucene integration and don't know what to do.

 My company has a content management product. Each time someone changes the
 directory structure or a file within it, that portion of the site needs to
 be re-indexed so the changes are reflected in future searches (indexing
must
 happen during run time).

 I have written an Indexer class with a static Index() method. The idea is
to
 call the method every time something changes and the index needs to be
 re-examined. I am hoping the logic put in by Doug Cutting surrounding the
 UID will make indexing efficient enough to be called so frequently.

 This class works great when I tested it on my own little site (I have about
 2000 files). But when I drop the functionality into the QA environment I get
 a locking error.

 I can't access the stack trace, all I can get at is a log file the
 application writes to. Here is the section my class wrote. It was right in
 the middle of indexing and bang, lock issue.

 I don't know if the problem is in my code or something in the existing
 application.

 Error Message:
 ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
 |INFO|INDEXING INFO: Start Indexing new content.
 |INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New
Index
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING INFO: Beginnging Incremental update comparisions
 |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:

Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
 10f7fe8-write.lock
 |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)

 Here is my code. You will recognize it pretty much as the IndexHTML class
 from the Lucene demo written by Doug Cutting. I have put a ton of comments
 in an attempt to understand what is going on.

 Any help would be appreciated.

 Luke

 package com.fbhm.bolt.search;

 /*
  * Created on Nov 11, 2004
  *
  * This class will create a single index file for the Content
  * Management System (CMS). It contains logic to ensure
  * indexing is done intelligently. Based on IndexHTML.java
  * from the demo folder that ships with Lucene
  */

 import java.io.File;
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.Date;

 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexWriter;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermEnum;
 import org.pdfbox.searchengine.lucene.LucenePDFDocument;
 import org.apache.lucene.demo.HTMLDocument;

 import com.alaia.common.debug.Trace;
 import com.alaia.common.util.AppProperties;

 /**
   * @author lshannon Description: <br>
   *   This class is used to index a content folder. It contains logic to
   *   ensure that only new documents, or documents that have been modified
   *   since the last run, are indexed. <br>
   *   Based on code written by Doug Cutting in the IndexHTML class found in
   *   the Lucene demo
  */
 public class Indexer {
  //true during deletion pass, this is when the index already exists
  private static boolean deleting = false;

  //object to read existing indexes
  private static IndexReader reader;

  //object to write to the index folder
  private static IndexWriter writer;

  //this will be used to write the index file
  private static TermEnum uidIter;

  /*
   * This static method does all the work, the end result is an up-to-date
 index folder
  */
  public static void Index() {
   //we will assume to start the index has been created
   boolean create = true;
   //set the name of the index file
   String indexFileLocation =
  AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
   //set the name of the content folder
   String contentFolderLocation =
  AppProperties.getPropertyAsString("site.root");
   //manage whether the index needs to be created or not
   File index = new File(indexFileLocation);
   

Re: Lucene : avoiding locking

2004-11-11 Thread yahootintin-lucene
I'm working on a similar project...
Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
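A sketch of the single-writer discipline this advice implies, assuming all writers run in one JVM. The counter is only instrumentation for the example; in real code the body of index() would open, use, and close the IndexWriter:

```java
/**
 * Routes every index request through one synchronized static method,
 * so at most one thread is ever inside the writing critical section.
 */
public class SingleWriter {
    private static int active = 0;        // writers inside index() right now
    private static int maxObserved = 0;   // worst-case concurrency seen

    public static synchronized void index(Runnable work) {
        active++;
        maxObserved = Math.max(maxObserved, active);
        try {
            work.run();                   // open writer, add/delete docs, close
        } finally {
            active--;
        }
    }

    public static int maxConcurrentWriters() { return maxObserved; }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> index(() -> { /* pretend to index */ }));
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(maxConcurrentWriters()); // prints "1"
    }
}
```

Note this only serializes writers inside one JVM; a second process writing to the same index directory is not covered, which is exactly the caveat raised later in the thread.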

--- Luke Shannon [EMAIL PROTECTED] wrote:

 Hi All;
 
 I have hit a snag in my Lucene integration and don't know what
 to do.
 
  My company has a content management product. Each time
 someone changes the
  directory structure or a file within it, that portion of the
 site needs to
  be re-indexed so the changes are reflected in future searches
 (indexing
 must
  happen during run time).
 
   I have written an Indexer class with a static Index() method.
  The idea is
  to
  call the method every time something changes and the index
 needs to be
  re-examined. I am hoping the logic put in by Doug Cutting
 surrounding the
  UID will make indexing efficient enough to be called so
 frequently.
 
  This class works great when I tested it on my own little site
 (I have about
   2000 files). But when I drop the functionality into the QA
 environment I get
  a locking error.
 
  I can't access the stack trace, all I can get at is a log
 file the
   application writes to. Here is the section my class wrote.
 It was right in
  the middle of indexing and bang lock issue.
 
  I don't know if the problem is in my code or something in the
 existing
  application.
 
  Error Message:
  ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
  |INFO|INDEXING INFO: Start Indexing new content.
  |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
 Creation Of New
 Index
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING INFO: Beginnging Incremental update
 comparisions
  |INFO|INDEXING ERROR: Unable to index new content Lock obtain
 timed out:
 

Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
  10f7fe8-write.lock
 
 |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
 
  Here is my code. You will recognize it pretty much as the
 IndexHTML class
  from the Lucene demo written by Doug Cutting. I have put a
 ton of comments
  in an attempt to understand what is going on.
 
  Any help would be appreciated.
 
  Luke
 
  package com.fbhm.bolt.search;
 
  /*
   * Created on Nov 11, 2004
   *
   * This class will create a single index file for the Content
   * Management System (CMS). It contains logic to ensure
   * indexing is done intelligently. Based on IndexHTML.java
   * from the demo folder that ships with Lucene
   */
 
  import java.io.File;
  import java.io.IOException;
  import java.util.Arrays;
  import java.util.Date;
 
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;
  import org.pdfbox.searchengine.lucene.LucenePDFDocument;
  import org.apache.lucene.demo.HTMLDocument;
 
  import com.alaia.common.debug.Trace;
  import com.alaia.common.util.AppProperties;
 
  /**
   * @author lshannon Description: br
   *   This class is used to index a content folder. It
 contains logic to
   *   ensure only new or documents that have been modified
 since the last
   *   search are indexed. br
   *   Based on code written by Doug Cutting in the IndexHTML
 class found in
   *   the Lucene demo
   */
  public class Indexer {
   //true during deletion pass, this is when the index already
 exists
   private static boolean deleting = false;
 
   //object to read existing indexes
   private static IndexReader reader;
 
   //object to write to the index folder
   private static IndexWriter writer;
 
   //this will be used to write the index file
   private static TermEnum uidIter;
 
   /*
* This static method does all the work, the end result is
 an up-to-date
  index folder
   */
   public static void Index() {
//we will assume to start the index has been created
boolean create = true;
//set 

Re: Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
I will try that now.
Thank you.

- Original Message - 
From: [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, November 11, 2004 6:56 PM
Subject: Re: Lucene : avoiding locking


 I'm working on a similar project...
 Make sure that only one call to the index method is occurring at
 a time.  Synchronizing that method should do it.

 --- Luke Shannon [EMAIL PROTECTED] wrote:

  Hi All;
 
  I have hit a snag in my Lucene integration and don't know what
  to do.
 
   My company has a content management product. Each time
  someone changes the
   directory structure or a file within it, that portion of the
  site needs to
   be re-indexed so the changes are reflected in future searches
  (indexing
  must
   happen during run time).
 
    I have written an Indexer class with a static Index() method.
   The idea is
   to
   call the method every time something changes and the index
  needs to be
   re-examined. I am hoping the logic put in by Doug Cutting
  surrounding the
   UID will make indexing efficient enough to be called so
  frequently.
 
   This class works great when I tested it on my own little site
  (I have about
    2000 files). But when I drop the functionality into the QA
  environment I get
   a locking error.
 
   I can't access the stack trace, all I can get at is a log
  file the
    application writes to. Here is the section my class wrote.
  It was right in
   the middle of indexing and bang lock issue.
 
   I don't know if the problem is in my code or something in the
  existing
   application.
 
   Error Message:
   ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
   |INFO|INDEXING INFO: Start Indexing new content.
   |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
  Creation Of New
  Index
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING ERROR: Unable to index new content Lock obtain
  timed out:
 
 

Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
   10f7fe8-write.lock
 
  |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
 
   Here is my code. You will recognize it pretty much as the
  IndexHTML class
   from the Lucene demo written by Doug Cutting. I have put a
  ton of comments
   in an attempt to understand what is going on.
 
   Any help would be appreciated.
 
   Luke
 
   package com.fbhm.bolt.search;
 
   /*
* Created on Nov 11, 2004
*
* This class will create a single index file for the Content
* Management System (CMS). It contains logic to ensure
* indexing is done intelligently. Based on IndexHTML.java
* from the demo folder that ships with Lucene
*/
 
   import java.io.File;
   import java.io.IOException;
   import java.util.Arrays;
   import java.util.Date;
 
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.Term;
   import org.apache.lucene.index.TermEnum;
   import org.pdfbox.searchengine.lucene.LucenePDFDocument;
   import org.apache.lucene.demo.HTMLDocument;
 
   import com.alaia.common.debug.Trace;
   import com.alaia.common.util.AppProperties;
 
   /**
* @author lshannon Description: br
*   This class is used to index a content folder. It
  contains logic to
*   ensure only new or documents that have been modified
  since the last
*   search are indexed. br
 *   Based on code written by Doug Cutting in the IndexHTML
  class found in
*   the Lucene demo
*/
   public class Indexer {
//true during deletion pass, this is when the index already
  exists
private static boolean deleting = false;
 
//object to read existing indexes
private static IndexReader reader;
 
//object to write to the index folder
private

Re: Lucene : avoiding locking

2004-11-11 Thread Luke Shannon
Synchronizing the method didn't seem to help. The lock is being detected
right here in the code:

while (uidIter.term() != null
       && uidIter.term().field() == "uid"
       && uidIter.term().text().compareTo(uid) < 0) {
  //delete stale docs
  if (deleting) {
    reader.delete(uidIter.term());
  }
  uidIter.next();
}

This runs fine on my own site so I am confused. For now I think I am going
to remove the deleting of stale files etc and just rebuild the index each
time to see what happens.
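For reference, the loop quoted above is the sorted merge walk from the IndexHTML demo: indexed uids and on-disk files are both visited in sorted order, and any indexed uid that sorts before the current file uid must be stale. A self-contained sketch of the same idea, with sorted lists standing in for the TermEnum and the directory scan (names and data here are illustrative only):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class UidMergeWalk {
    /** Returns the uids present in the index but no longer on disk. Both inputs sorted. */
    static List<String> staleUids(List<String> indexedUids, List<String> fileUids) {
        List<String> stale = new ArrayList<>();
        Iterator<String> indexed = indexedUids.iterator();
        String current = indexed.hasNext() ? indexed.next() : null;
        for (String fileUid : fileUids) {
            // uidIter.term().text().compareTo(uid) < 0  ->  stale document
            while (current != null && current.compareTo(fileUid) < 0) {
                stale.add(current);
                current = indexed.hasNext() ? indexed.next() : null;
            }
            if (current != null && current.compareTo(fileUid) == 0) {
                current = indexed.hasNext() ? indexed.next() : null; // unchanged doc
            }
            // a fileUid with no matching indexed uid is a new document (added elsewhere)
        }
        while (current != null) { // anything left in the index is stale too
            stale.add(current);
            current = indexed.hasNext() ? indexed.next() : null;
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> indexed = List.of("a", "b", "c", "e");
        List<String> onDisk  = List.of("b", "c", "d");
        System.out.println(staleUids(indexed, onDisk)); // prints "[a, e]"
    }
}
```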

- Original Message - 
From: [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, November 11, 2004 6:56 PM
Subject: Re: Lucene : avoiding locking


 I'm working on a similar project...
 Make sure that only one call to the index method is occurring at
 a time.  Synchronizing that method should do it.

 --- Luke Shannon [EMAIL PROTECTED] wrote:

  Hi All;
 
  I have hit a snag in my Lucene integration and don't know what
  to do.
 
   My company has a content management product. Each time
  someone changes the
   directory structure or a file within it, that portion of the
  site needs to
   be re-indexed so the changes are reflected in future searches
  (indexing
  must
   happen during run time).
 
    I have written an Indexer class with a static Index() method.
   The idea is
   to
   call the method every time something changes and the index
  needs to be
   re-examined. I am hoping the logic put in by Doug Cutting
  surrounding the
   UID will make indexing efficient enough to be called so
  frequently.
 
   This class works great when I tested it on my own little site
  (I have about
    2000 files). But when I drop the functionality into the QA
  environment I get
   a locking error.
 
   I can't access the stack trace, all I can get at is a log
  file the
    application writes to. Here is the section my class wrote.
  It was right in
   the middle of indexing and bang lock issue.
 
   I don't know if the problem is in my code or something in the
  existing
   application.
 
   Error Message:
   ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
   |INFO|INDEXING INFO: Start Indexing new content.
   |INFO|INDEXING INFO: Index Folder Did Not Exist. Start
  Creation Of New
  Index
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING INFO: Beginnging Incremental update
  comparisions
   |INFO|INDEXING ERROR: Unable to index new content Lock obtain
  timed out:
 
 

Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d432
   10f7fe8-write.lock
 
  |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
 
   Here is my code. You will recognize it pretty much as the
  IndexHTML class
   from the Lucene demo written by Doug Cutting. I have put a
  ton of comments
   in an attempt to understand what is going on.
 
   Any help would be appreciated.
 
   Luke
 
   package com.fbhm.bolt.search;
 
   /*
* Created on Nov 11, 2004
*
* This class will create a single index file for the Content
* Management System (CMS). It contains logic to ensure
* indexing is done intelligently. Based on IndexHTML.java
* from the demo folder that ships with Lucene
*/
 
   import java.io.File;
   import java.io.IOException;
   import java.util.Arrays;
   import java.util.Date;
 
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.index.IndexReader;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.Term;
   import org.apache.lucene.index.TermEnum;
   import org.pdfbox.searchengine.lucene.LucenePDFDocument;
   import org.apache.lucene.demo.HTMLDocument;
 
   import com.alaia.common.debug.Trace;
   import com.alaia.common.util.AppProperties;
 
   /**
* @author lshannon Description: br
*   This class is used to index a content folder. It
  contains logic to
*   ensure only new