Re: Lock files in a read-only application

2007-04-01 Thread Michael McCandless
"Chris Hostetter" <[EMAIL PROTECTED]> wrote:
>
> : locks without upgrading to 2.1. Our application uses its own custom
> : locking mechanism, so that lucene locking is actually redundant. We
> : are currently using Lucene version 2.0.
> 
> Since before the 2.0.0 release there has been a static
> FSDirectory.setDisableLocks that can be called before opening any indexes
> to prevent locking.  It's only intended to be used on indexes on read-only
> disk -- which is not the case in your situation, since a separate process
> is in fact modifying the index -- but if you are confident in your own
> locking mechanism you can use it.

You need to be really certain your own locking protects Lucene
properly.  Specifically, no IndexReader can be created (restarted)
while a writer is open against the index, and, only one writer can be
open on the index at once (it sounds like you already have that).  If
you're sure about that then disabling the locks as Hoss describes
above is OK.
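
A minimal sketch of what disabling the locks looks like (assuming the
Lucene 2.0 static API; the index path is hypothetical):

 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.store.FSDirectory;

 // Must be called before any index is opened, and only when external
 // locking (or read-only media) makes Lucene's own lock files redundant.
 FSDirectory.setDisableLocks(true);

 IndexReader reader = IndexReader.open("/path/to/index"); // hypothetical path
 try {
   System.out.println("docs: " + reader.numDocs());
 } finally {
   reader.close();
 }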

> : The application has multiple threads (different web requests) reading
> : the same index simultaneously (say 20 concurrent threads). Can that be
> : a reason of this problem. Sometimes the lockfiles remain there for
> : long periods of time (more than a few minutes, which is bad).
> 
> multiple reader threads should not cause the commit lock to stay around
> that long, even if each thread is opening its own IndexReader (which they
> should not do; it's better to open one and reuse it among many threads)

This part (the commit lock staying around for so long) is definitely odd
and I'd like to get to the root cause.  Multiple threads are fine
(though you should share one IndexReader).  The only way I know of
for this to happen is if the JVM crashes while an IndexReader or IndexWriter
is being initialized.  Even then it's quite unlikely, because the JVM has
to crash right when the segments file is being read or written.
 
> : Yes, the JVM sometimes crashes when it runs out of memory. There should
> : be some way that the lock files are removed after such a crash (any
> : fixes in 2.1?).
> 
> As Michael said, in 2.1 the commit lock doesn't even exist, and in general
> there is a much more robust lock management system that lets you decide
> what type of lock mechanism to use.

In fact with 2.1 we have a new optional locking implementation called
NativeFSLockFactory.  One of its big benefits over the default Lucene
locking (SimpleFSLockFactory) is that if the JVM crashes then the lock
file(s) are correctly released (ie, no more "stale lock files" left in
the filesystem).  This way if the JVM of the writer crashes then the
write.lock that it held is properly freed by the OS.
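
A minimal sketch of opening a directory with the native lock factory
(assuming the Lucene 2.1 LockFactory API; the paths are hypothetical):

 import java.io.File;
 import org.apache.lucene.store.FSDirectory;
 import org.apache.lucene.store.NativeFSLockFactory;

 File indexDir = new File("/path/to/index"); // hypothetical path

 // Native OS locks (via java.nio FileChannel) are released by the OS
 // when the JVM exits or crashes, so no stale lock files are left behind.
 NativeFSLockFactory lockFactory = new NativeFSLockFactory(indexDir);
 FSDirectory dir = FSDirectory.getDirectory(indexDir, lockFactory);

 // ... open IndexWriter / IndexReader against dir as usual ...
 dir.close();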

Mike




Re: Emulating Pages Search

2007-04-01 Thread Mohammad Norouzi

Mohsen,
In order to support pagination, I wrapped the Hits in a class, just like
java.sql.ResultSet.
You can create a wrapper class, put the Hits in it, and implement methods
like next() and prev() to move forward and backward through the documents.

Hope this helps you.
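
A minimal sketch of such a wrapper (hypothetical class name, assuming the
Lucene 2.x Hits API):

 import java.io.IOException;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.search.Hits;

 // ResultSet-style cursor over Lucene Hits.
 public class HitsCursor {
   private final Hits hits;
   private int position = -1; // starts before the first document, like ResultSet

   public HitsCursor(Hits hits) { this.hits = hits; }

   public boolean next() {
     if (position + 1 >= hits.length()) return false;
     position++;
     return true;
   }

   public boolean prev() {
     if (position <= 0) return false;
     position--;
     return true;
   }

   // Hits loads the document lazily on each call.
   public Document current() throws IOException {
     return hits.doc(position);
   }
 }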

--
Regards,
Mohammad


Re: Emulating Pages Search

2007-04-01 Thread Mohsen Saboorian

This is possible, but the problem here is performance. Why is it not possible
to support pagination in a more efficient way? Suppose a Searcher looks
through Documents and finds the matching ones. Theoretically, it could stop
searching once the number of hits exceeds a threshold. The Searcher
could save its state (a reference to the last matched document) within the
searcher instance, making incremental search possible.

What restriction in the structure of Lucene indices prevents us
from having this kind of search?


is_maximum wrote:
> 
> Mohsen,
> In order to support pagination, I wrapped the Hits in a class, just like
> java.sql.ResultSet.
> You can create a wrapper class, put the Hits in it, and implement methods
> like next() and prev() to move forward and backward through the documents.
> 
> Hope this helps you.
> 
> -- 
> Regards,
> Mohammad
> 




Re: Emulating Pages Search

2007-04-01 Thread Mohammad Norouzi

It has no performance problem and works fine:
whenever you access a document, the searcher loads just that
document from the index.

On 4/1/07, Mohsen Saboorian <[EMAIL PROTECTED]> wrote:



This is possible, but the problem here is performance. Why is it not possible
to support pagination in a more efficient way? Suppose a Searcher looks
through Documents and finds the matching ones. Theoretically, it could stop
searching once the number of hits exceeds a threshold. The Searcher
could save its state (a reference to the last matched document) within the
searcher instance, making incremental search possible.

What restriction in the structure of Lucene indices prevents us
from having this kind of search?


is_maximum wrote:
>
> Mohsen,
> In order to support pagination, I wrapped the Hits in a class, just like
> java.sql.ResultSet.
> You can create a wrapper class, put the Hits in it, and implement methods
> like next() and prev() to move forward and backward through the documents.
>
> Hope this helps you.
>
> --
> Regards,
> Mohammad
>






--
Regards,
Mohammad


Re: Emulating Pages Search

2007-04-01 Thread Erick Erickson

Efficient in your situation, maybe. Good for everybody? Probably not.
The key is exactly your use of the word "state". Personally, I do
NOT want the core search engine to be stateful, that brings a whole
raft of problems with it. And Lucene is a search engine, not a search
application.

I really don't want my underlying search engine to keep track of the
states of many thousands of, say, web users. And that's without even
asking the question of how to keep track of the state for each
user in a complex web application, or how I would somehow indicate
to Lucene which user's state to use.

And I'm not even going to go down the path of how to accomplish
the bookkeeping for dropped sessions, timeouts, coordinating
underlying index changes with all these states, etc. etc. etc. I think
that if you consider the larger community, asking Lucene to
"save its state" is much more complex that you think.

That said, I can certainly imagine that there are situations where
making the search process stateful is a good thing. But do you
have any evidence that the current architecture actually is hurting
you other than "theoretically"? I certainly wouldn't go down the
stateful path until I'd demonstrated this in my situation.

If, however, you'd like to make a stateful way to do things and
submit it to the contrib section, I'm sure the guys would be
thrilled.

Erick

On 4/1/07, Mohsen Saboorian <[EMAIL PROTECTED]> wrote:



This is possible, but the problem here is performance. Why is it not possible
to support pagination in a more efficient way? Suppose a Searcher looks
through Documents and finds the matching ones. Theoretically, it could stop
searching once the number of hits exceeds a threshold. The Searcher
could save its state (a reference to the last matched document) within the
searcher instance, making incremental search possible.

What restriction in the structure of Lucene indices prevents us
from having this kind of search?


is_maximum wrote:
>
> Mohsen,
> In order to support pagination, I wrapped the Hits in a class, just like
> java.sql.ResultSet.
> You can create a wrapper class, put the Hits in it, and implement methods
> like next() and prev() to move forward and backward through the documents.
>
> Hope this helps you.
>
> --
> Regards,
> Mohammad
>





Re: Emulating Pages Search

2007-04-01 Thread Xiaocheng Luan
Just to add to the thoughtful responses from the others, it isn't really that
bad to do a new search each time. First, the later searches will likely be
"warm" searches and thus won't take as long as the first search; second, it's
the searcher.doc(docId) part that will likely hurt the most, but hopefully
you'll only need to do that for one "page" at a time (for each request).
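
A minimal sketch of this approach (hypothetical helper, assuming the Lucene
2.x TopFieldDocs API): re-run the query on each request, asking for enough
top hits to cover the page, and call doc() only for the hits on that page.

 import org.apache.lucene.document.Document;
 import org.apache.lucene.search.IndexSearcher;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.ScoreDoc;
 import org.apache.lucene.search.Sort;
 import org.apache.lucene.search.TopFieldDocs;

 // Hypothetical helper: prints one page of results (page is zero-based).
 public static void printPage(IndexSearcher searcher, Query query,
                              int page, int pageSize) throws Exception {
   // Ask for enough top hits to cover the requested page.
   int n = (page + 1) * pageSize;
   TopFieldDocs tops = searcher.search(query, null /*filter*/, n, Sort.RELEVANCE);

   // Fetch documents only for this page -- doc() is the expensive part.
   int start = page * pageSize;
   int end = Math.min(tops.scoreDocs.length, start + pageSize);
   for (int i = start; i < end; i++) {
     ScoreDoc scoreDoc = tops.scoreDocs[i];
     Document doc = searcher.doc(scoreDoc.doc);
     System.out.println(doc);
   }
 }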

Thanks,
Xiaocheng

Mohsen Saboorian <[EMAIL PROTECTED]> wrote: 
Hi,
Is there a way to emulate paged search in Lucene? I can use the following
piece of code for returning the first page (10 items in each page), but
I don't know how to navigate to the next page :-)

 IndexSearcher is = new IndexSearcher("/path/to/index"); // hypothetical index path
 // ...
 // Top 10 hits, sorted by relevance (one "page").
 TopFieldDocs tops = is.search(query, null /*filter*/, 10, Sort.RELEVANCE);
 for (int i = 0; i < tops.scoreDocs.length; i++) {
   ScoreDoc scoreDoc = tops.scoreDocs[i];
   System.out.println(is.doc(scoreDoc.doc));
 }

I can see that tops.totalHits reports the total number of matching documents.
So is this really "paged search", or am I just doing a complete search and
putting a window on the returned result each time?

Thanks.



 

Re: Help - FileNotFoundException during IndexWriter.init()

2007-04-01 Thread Antony Bowesman

Michael McCandless wrote:

>> Yes, I've disabled it currently while the new test runs.  Let's see.
>> I'll re-run the test a few more times and see if I can re-create the problem.
>
> OK let's see if that makes it go away!  Hopefully :)

I ran the tests several times over the weekend with no virus checker in the DB
directory and haven't managed to reproduce the problem.

Thanks for the help Mike.  Nothing like an exception never seen before, two days
before the product is due to go live, to induce mild panic ;)


Antony





Re: Help - FileNotFoundException during IndexWriter.init()

2007-04-01 Thread Michael McCandless
"Antony Bowesman" <[EMAIL PROTECTED]> wrote:
> Michael McCandless wrote:
> 
> >> Yes, I've disabled it currently while the new test runs.  Let's see. 
> >> I'll re-run the test a few more times and see if I can re-create the 
> >> problem.
> > 
> > OK let's see if that makes it go away!  Hopefully :)
> 
> I ran the tests several times over the weekend with no virus checker in the 
> DB 
> directory and haven't managed to reproduce the problem.
> 
> Thanks for the help Mike.  Nothing like an exception never seen before, two 
> days 
> before the product is due to go live, to induce mild panic ;)

Phew, I'm glad to hear that!  Thanks for bringing closure to this.

Mike




Re: Lock files in a read-only application

2007-04-01 Thread Nilesh Bansal

Thanks for your replies. I have two more questions.

> You need to be really certain your own locking protects Lucene
> properly.  Specifically, no IndexReader can be created (restarted)
> while a writer is open against the index, and, only one writer can be
> open on the index at once (it sounds like you already have that).  If
> you're sure about that then disabling the locks as Hoss describes
> above is OK.

1. If our locking fails, what will happen in the worst case, i.e., an
IndexSearcher tries to read while an IndexWriter is updating the
index? Can it lead to index corruption, or will the searcher just
return garbage results (or fail with an exception) for that query?

2. Currently we are not using any IndexReader directly. When a request
arrives, we create a new IndexSearcher, and destroy it when it finishes
searching. Is it more efficient to create just one IndexSearcher and
share it with all threads? Or to create one IndexReader and use it to
create all IndexSearchers?

thanks again,
Nilesh

--
Nilesh Bansal.
http://queens.db.toronto.edu/~nilesh/
