Re: Concurrent read and write

2005-01-23 Thread Cheolgoo Kang
Lucene's write lock is simple. Four places require the write lock:
IndexReader.delete(int), IndexReader.setNorm(int,String,byte),
IndexReader.undeleteAll(), and the IndexWriter constructor
(IndexWriter.<init>). Once you've used one of the methods above, you
should close that IndexReader or IndexWriter instance to release the
write lock (and to avoid the "Lock obtain timed out" exception).

For example, the following sequence should be okay.
 IndexReader reader = IndexReader.open( DIR );
 reader.delete( new Term( A, B ) );
 reader.close();
 IndexWriter writer = new IndexWriter( DIR, a, b );
 writer.addDocument( oneDocument );
 writer.close();

But the following sequence causes a "Lock obtain timed out" exception.
 IndexReader reader = IndexReader.open( DIR );
 reader.delete( new Term( A, B ) );
 IndexWriter writer = new IndexWriter( DIR, a, b );

This is because the write lock obtained by IndexReader.delete() is not
released until IndexReader.close() is called, and the new IndexWriter()
statement also requires the write lock.
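
The two sequences above can be modelled with a toy lock, purely as an
illustration. Everything in this sketch (ToyIndex, ToyReader, ToyWriter,
WriteLockSketch) is a hypothetical stand-in and not Lucene's API. It also
fails immediately with an unchecked exception, whereas Lucene waits for a
timeout before giving up.

```java
// Toy model only: these classes are hypothetical stand-ins, not Lucene's API.
class ToyIndex {
    private Object lockOwner = null; // whoever currently holds the single write lock

    // Grants the lock to owner, or fails the way a second modifier would.
    void obtainWriteLock(Object owner) {
        if (lockOwner != null && lockOwner != owner) {
            throw new IllegalStateException("Lock obtain timed out");
        }
        lockOwner = owner;
    }

    void releaseWriteLock(Object owner) {
        if (lockOwner == owner) {
            lockOwner = null;
        }
    }
}

class ToyReader {
    private final ToyIndex index;
    ToyReader(ToyIndex index) { this.index = index; } // opening takes no write lock
    void delete() { index.obtainWriteLock(this); }     // first modification takes it
    void close() { index.releaseWriteLock(this); }     // closing gives it back
}

class ToyWriter {
    private final ToyIndex index;
    ToyWriter(ToyIndex index) { this.index = index; index.obtainWriteLock(this); }
    void close() { index.releaseWriteLock(this); }
}

class WriteLockSketch {
    // Sequence 1: close the reader before opening the writer.
    static boolean closeThenWrite() {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        reader.delete();
        reader.close();
        ToyWriter writer = new ToyWriter(index);
        writer.close();
        return true;
    }

    // Sequence 2: open the writer while the reader still holds the lock.
    static boolean writeWithoutClose() {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        reader.delete();
        try {
            new ToyWriter(index);
            return true;
        } catch (IllegalStateException lockHeld) {
            return false;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("close, then write: " + closeThenWrite());
        System.out.println("write without close: " + writeWithoutClose());
    }
}
```

In real code, closing the reader or writer in a finally block is one way
to make sure the lock is released even when an exception is thrown.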


On Fri, 21 Jan 2005 08:20:12 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Hello Ashley,
 
 You can read/search while modifying the index, but you have to ensure
 only one thread or only one process is modifying an index at any given
 time.  Both IndexReader and IndexWriter can be used to modify an index.
  The former to delete Documents and the latter to add them.  You have
 to ensure these two operations don't overlap.
 c.f. http://www.lucenebook.com/search?query=concurrent
 
 Otis
 
 
 --- Ashley Steigerwalt [EMAIL PROTECTED] wrote:
 
  I am a little fuzzy on the thread-safeness of Lucene, or maybe just
  java.
  From what I understand, and correct me if I'm wrong, Lucene takes
  care of
  concurrency issues and it is ok to run a query while writing to an
  index.
 
  My question is, does this still hold true if the reader and writer
  are being
  executed as separate programs?  I have a cron job that will update
  the index
  periodically.  I also have a search application on a web form.  Is
  this going
  to cause trouble if someone runs a query while the indexer is
  updating?
 
  Ashley
 
  -
  To unsubscribe, e-mail: [EMAIL PROTECTED]
  For additional commands, e-mail: [EMAIL PROTECTED]
 
 
 
 
 


-- 
Cheolgoo, Kang




Re: retrieving added document

2004-11-23 Thread Cheolgoo Kang
On Tue, 23 Nov 2004 22:47:21 +0100, Paul [EMAIL PROTECTED] wrote:
 Hi,
 I'm creating a document and adding it with a writer to the index. For
 some reason I need to add data to this specific document later on
 (minutes, not hours or days). Is it possible to retrieve it and add
 additional data?

No, you cannot add data to (or otherwise modify) a previously added document.
Instead, delete the old one from the index and add a new document with
the additional data included.
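
As a rough picture of what "delete and add again" means for document
numbers, here is a toy in-memory model. ToyDocIndex and its methods are
hypothetical and are not Lucene's API; the sketch only assumes that a
delete marks the old entry, that the new version is appended under a new
number, and that marked entries disappear (and numbers shift) on optimize.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical in-memory stand-in, not Lucene's API.
class ToyDocIndex {
    private final List<String> docs = new ArrayList<>();      // position = doc number
    private final List<Boolean> deleted = new ArrayList<>();  // deletion marks

    int add(String content) {
        docs.add(content);
        deleted.add(Boolean.FALSE);
        return docs.size() - 1;
    }

    void delete(int docNum) {
        deleted.set(docNum, Boolean.TRUE);
    }

    // "Updating" is delete plus add: the new version gets a new number.
    int update(int docNum, String newContent) {
        delete(docNum);
        return add(newContent);
    }

    // Drops marked documents and renumbers the survivors.
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) {
            deleted.add(Boolean.FALSE);
        }
    }

    int maxDoc() { return docs.size(); }
    boolean isDeleted(int n) { return deleted.get(n); }
    String document(int n) { return docs.get(n); }

    public static void main(String[] args) {
        ToyDocIndex index = new ToyDocIndex();
        int first = index.add("first draft");
        int updated = index.update(first, "first draft plus more data");
        index.optimize();
        System.out.println("number before optimize: " + updated + ", documents after: " + index.maxDoc());
    }
}
```

The point of the sketch is that a document number is not a stable
identifier; a stored key field of your own is the usual way to find the
document again later.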

 I found the document(int n) method within the IndexReader (btw: the
 description makes no sense to me: "Returns the stored fields of the
 nth Document in this index." - but it returns a Document and not a
 list of fields..) but where do I get that number from? (and the
 numbers change, I know..)

Usually you search using IndexSearcher, and its resulting Hits gives you
the doc-id (the number) of each hit in that index. The Document itself
contains the list of (stored) fields.

 
 thanks for any help
 
 Paul
 
 
 


-- 
Cheolgoo, Kang




Re: modifying existing index

2004-11-23 Thread Cheolgoo Kang
On Wed, 24 Nov 2004 13:04:20 +0530, Santosh [EMAIL PROTECTED] wrote:
 I have gone through IndexReader, and I found the method delete(int docNum),
 but where do I get the document number from? Is this predefined, or do we
 have to assign a number prior to indexing?

The number (aka doc-id) is assigned by Lucene; it's an internal sequential
integer. This number is usually retrieved from Hits.id(int) of your search.

Hits myHits = myIndexSearcher.search( myQuery );
for ( int i = 0; i < myHits.length(); i++ ) {
  int docId = myHits.id( i );
  Document doc = myHits.doc( i );
  // myHits.id( i ) retrieves the doc-id of the i-th hit (the number that
  // IndexReader.delete(int) expects), and myHits.doc( i ) returns the
  // i-th document in the result myHits. Note that doc() takes the
  // position in the result, not the doc-id.
}

HTH

 
 
 - Original Message -
 From: Luke Francl [EMAIL PROTECTED]
 To: Lucene Users List [EMAIL PROTECTED]
 Sent: Wednesday, November 24, 2004 1:26 AM
 Subject: Re: modifying existing index
 
  On Tue, 2004-11-23 at 13:59, Santosh wrote:
   I am using lucene for indexing, when I am creating Index the documents
 are added. but when I want to modify the single existing document and
 reIndex again, it is taking as new document and adding one more time, so
 that I am getting same document twice in the results.
   To overcome this I am deleting existing Index and again recreating whole
 Index. but is it possible to index the modified document again and overwrite
 existing document without deleting and recreation. can I do this? If so how?
 
  You do not need to recreate the whole index. Just mark the document as
  deleted using the IndexReader and then add it again with the
  IndexWriter. Remember to close your IndexReader and IndexWriter after
  doing this.
 
  The deleted document will be removed the next time you optimize your
  index.
 
  Luke Francl
 
 
 
 


-- 
Cheolgoo, Kang




Re: Encrypted indexes

2004-10-13 Thread Cheolgoo Kang
I think it's possible to encrypt a field with a symmetric encryption
algorithm, in the same way a field can be compressed; an algorithm such
as DES can be used with little performance loss.

If the ability to block reverse engineering is critical, you should use PKI,
though it would cost considerably more performance than the symmetric
methods.
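
A minimal sketch of the symmetric approach, using only the JDK's
javax.crypto classes. FieldCrypto is a hypothetical helper and not part
of Lucene; it assumes you encrypt the value yourself before storing it in
a field and decrypt it after retrieval. It uses AES in GCM mode rather
than the DES mentioned above, since DES is considered weak today. Key
storage is left out entirely.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper, not part of Lucene: turns a field value into an
// opaque Base64 string before it is stored, and back after retrieval.
class FieldCrypto {
    private static final int IV_BYTES = 12;  // nonce size commonly used with GCM
    private static final int TAG_BITS = 128; // authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Output layout: random IV followed by ciphertext and tag, Base64-encoded.
    static String encrypt(String plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(String stored, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt("confidential body text", key);
        System.out.println(decrypt(stored, key));
    }
}
```

This only protects the stored value. The indexed terms used for searching
would still be readable in the index unless they are handled separately.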


On Wed, 13 Oct 2004 15:33:53 +0200, petite_abeille
[EMAIL PROTECTED] wrote:
 
 On Oct 13, 2004, at 15:26, Nader Henein wrote:
 
  Well, are you storing any data for retrieval from the index, because
  you could encrypt the actual data and then encrypt the search string
  public key style.
 
 Alternatively, write your index to an encrypted volume... something
 along the line of FileVault and PGP Disk [1] [2].
 
 PA.
 
 [1] http://www.apple.com/macosx/features/filevault/
 [2] http://www.pgp.com/products/desktop/index.html
 
 
 
 
 
 


Cheolgoo, Kang




Re: View lucene index file

2004-09-09 Thread Cheolgoo Kang
Check out this page:

http://jakarta.apache.org/lucene/docs/contributions.html

There are tools like LIMO and Luke.

Cheo

On Thu, 9 Sep 2004 23:38:17 -0400, Anne Y. Zhang [EMAIL PROTECTED] wrote:
 I am using Nutch. Is there any way I can view the Lucene index file?
 It seems that Lucene writes the index as a binary file. Could anybody explain
 how Lucene does the indexing and where the index file is located?
 Thank you very much!
 
 Ya
 
 
 



-- 
Cheolgoo, Kang




Re: Problems with special characters

2004-07-02 Thread Cheolgoo Kang
How about creating a special-char-converting-reader like this?

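import java.io.IOException;
import java.io.Reader;

public class LuceneReader extends Reader {
 private final Reader source;
 private int buffer = -1; // held-back special character, or -1 if none
 public LuceneReader( Reader sourceReader ) {
  this.source = sourceReader;
 }
 public int read() throws IOException {
  if ( buffer != -1 ) {
   // the backslash went out on the previous call; now emit the character
   int result = buffer;
   buffer = -1;
   return result;
  }
  int result = source.read();
  if ( result != -1 && isSpecialCharacter( (char) result ) ) {
   buffer = result;
   return '\\';
  }
  return result; // ordinary character, or -1 at end of stream
 }
 // Reader is abstract, so bulk reads are routed through read() as well
 public int read( char[] cbuf, int off, int len ) throws IOException {
  int n = 0;
  for ( int c; n < len && ( c = read() ) != -1; n++ ) {
   cbuf[off + n] = (char) c;
  }
  return ( n == 0 && len > 0 ) ? -1 : n;
 }
 public void close() throws IOException {
  source.close();
 }
 private boolean isSpecialCharacter( char c ) {
  return ( c == '+' /* all special characters */ );
 }
}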

The LuceneReader.read() above checks each character before returning it.
If it's one of the special characters, it buffers the character and returns
'\'; the buffered character is then returned by the next call.

I wrote it on the spot, so it's certainly not complete, but it can
be your starting point.
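
If the text is already in a String, the same escaping idea can be
sketched without a Reader. QueryEscaper is a hypothetical name; the
character list is taken from the special characters of Lucene's query
syntax, so treat it as an assumption to check against the version you use.

```java
// Hypothetical helper: prefixes every query-syntax special character
// with a backslash so the query parser treats it literally.
class QueryEscaper {
    // Single-character specials; "&&" and "||" are covered because each
    // '&' and '|' is escaped individually.
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (draft)"));
    }
}
```

Escaping only affects how the query string is parsed. Whether a character
such as '+' is searchable at all still depends on the analyzer keeping it
at indexing time.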

Cheolgoo


On Fri, 2 Jul 2004 12:44:48 +0200, Marten Senkel
[EMAIL PROTECTED] wrote:
 
 
 I had a similar problem.
 I don't know whether there is a more intelligent solution, but the quickest
 I had in mind was to convert the special characters I needed to look up into
 a fixed random character string. For example: prior to indexing I replace
 all occurrences of '+' by 'PLUSsdfaEGsgfAE'.
 
 When searching I intercept the terms the user entered, replace '+' by the
 same random character string and search for it instead of the original
 special character. This works, of course, only if one constructs the query
 oneself, giving the user only some basic checkbox options to specify 'AND'
 or 'OR' queries, for example.
 
 If you use something like this, users wouldn't be able to write 'advanced'
 searches like +foo +bar themselves, as the command sign '+' would be
 converted as well. A fix for that problem could be to convert 'C+' to a
 random string and replace only 'C+' by the random string when searching ...
 this would leave the command '+' intact.
 
 It's a very basic and quick & dirty solution, I know, but it worked well for me.
 
 Marten
 