Re: locking problems

2004-10-08 Thread Doug Cutting
Aad Nales wrote:
1. Can I have one or more searchers open when I open a writer?
2. Can I have one or more readers open when I open a writer?
Yes, with one caveat: if you've called the IndexReader methods delete(), 
undelete() or setNorm() then you may not open an IndexWriter until 
you've closed that IndexReader instance.

In general, only a single object may modify an index at once, but many 
may access it simultaneously in a read-only manner, including while it 
is modified.  Indexes are modified by either an IndexWriter or by the 
IndexReader methods delete(), undelete() and setNorm().

Typically an application which modifies and searches simultaneously 
should keep the following open:

  1. A single IndexReader instance used for all searches, perhaps 
opened via an IndexSearcher.  Periodically, as the index changes, this 
is discarded, and replaced with a new instance.

  2. Either:
 a. an IndexReader to delete documents; or
 b. an IndexWriter to add documents.
So an updating thread might open (2a), delete old documents, and close it, 
then open (2b), add new documents, perhaps optimize, and close.  At that 
point, once the index has been updated, (1) can be discarded and replaced 
with a new instance.  Typically the old instance of (1) is not explicitly 
closed; rather, the garbage collector closes it when the last thread 
searching it completes.
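
For example, here is a minimal sketch of that cycle (the index path, the 
"id" field and the method signature are only illustrative, not code from a 
real application):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;

public class UpdateCycle {
  // (1) the single searcher shared by all searching threads
  private static volatile IndexSearcher searcher;

  public static IndexSearcher getSearcher() { return searcher; }

  public static void update(String index, String[] staleIds, Document[] newDocs)
      throws Exception {
    // (2a) open a reader, delete old documents, close it
    IndexReader deleter = IndexReader.open(index);
    for (int i = 0; i < staleIds.length; i++) {
      deleter.delete(new Term("id", staleIds[i]));
    }
    deleter.close();

    // (2b) open a writer, add new documents, perhaps optimize, close it
    IndexWriter writer = new IndexWriter(index, new StandardAnalyzer(), false);
    for (int i = 0; i < newDocs.length; i++) {
      writer.addDocument(newDocs[i]);
    }
    writer.optimize();
    writer.close();

    // replace (1); the old instance is not closed explicitly -- the garbage
    // collector reclaims it once the last search using it completes
    searcher = new IndexSearcher(index);
  }
}

Note that the IndexReader doing the deletes is closed, as described above, 
before the IndexWriter is opened.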

Doug


Re: Sort regeneration in multithreaded server

2004-10-08 Thread Doug Cutting
Stephen Halsey wrote:
I was wondering if anyone could help with a problem (or should that be
"challenge"?) I'm having using Sort in Lucene over a large number of records
in a multi-threaded server program on a continually updated index.
I am using lucene-1.4-rc3.
A number of bugs with the sorting code have been fixed since that 
release.  Can you please try with 1.4.2 and see if you still have the 
problem?  Thanks.

Doug


Demo lucene

2004-10-08 Thread Miguel Angel
I use Debian Sarge and work with Lucene for a project.  How do I use the
Lucene demo in a web page?

-- 
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277




Re: Indexing Strategy for 20 million documents

2004-10-08 Thread Justin Swanhart
It depends on a lot of factors.  I myself use multiple indexes for about
10M documents.

My documents are transient.  Each day I get about 400K and remove about
400K, and I always remove an entire day's documents at once.  It is much
faster and easier to delete the Lucene index for the day I am removing
than to loop through one big index removing the entries with an
IndexReader.  Since my data is also partitioned by day in my database, I
essentially do the same thing there with "truncate table."

I use a ParallelMultiSearcher object to search the indexes.  I store my
indexes on a 14-disk, 15k RPM fibre channel RAID 1+0 array (striped mirrors).

I get very good performance in both updating and searching indexes.
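
Roughly, the per-day scheme looks like the sketch below (the directory
layout, the "day" field and the method names are just illustrations, not
code from my system):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class DailyExpiry {
  // With one index per day, dropping an expired day is just deleting that
  // day's index directory.
  public static void dropDay(File indexRoot, String day) {
    File dayDir = new File(indexRoot, day);          // e.g. indexes/2004-10-01
    File[] files = dayDir.listFiles();
    for (int i = 0; files != null && i < files.length; i++) {
      files[i].delete();
    }
    dayDir.delete();
  }

  // The alternative this avoids: per-document deletes against one big index.
  public static void dropDaySlow(String bigIndex, String day) throws Exception {
    IndexReader reader = IndexReader.open(bigIndex);
    reader.delete(new Term("day", day));             // assumes a keyword "day" field
    reader.close();
  }
}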





Re: first demo with lucene

2004-10-08 Thread Miguel Angel
> Hi, I'm Miguel Angel.  I am using the Lucene demo from the official web site;
> I use the demo in the console.  Now, how do I use the demo on the web with JSP?
> --
> Miguel Angel Angeles R.
> Asesoria en Conectividad y Servidores
> Telf. 97451277
>




first demo with lucene

2004-10-08 Thread Miguel Angel
Hi, I'm Miguel Angel.  I am using the Lucene demo from the official web site;
I use the demo in the console.  Now, how do I use the demo on the web with JSP?
-- 
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277




RE: Making lucene work in weblogic cluster

2004-10-08 Thread David Townsend
Doug discusses the locking issue, with a potential solution:

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1619988



-Original Message-
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: 08 October 2004 16:10
To: lucenelist
Subject: Making lucene work in weblogic cluster






RE: Making lucene work in weblogic cluster

2004-10-08 Thread David Townsend
No, I didn't.  If you search for NFS in the archives, there is an alternative
solution out there.  I suppose I should get around to submitting the patch.

-Original Message-
From: Praveen Peddi [mailto:[EMAIL PROTECTED]
Sent: 08 October 2004 16:10
To: lucenelist
Subject: Making lucene work in weblogic cluster






Making lucene work in weblogic cluster

2004-10-08 Thread Praveen Peddi
While going through the mailing list looking for a solution to the Lucene
cluster problem, I came across this thread.  Does anyone know whether David
Townsend ever submitted the patch he was talking about?
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06252.html

I am interested in looking at the NFS solution (mounting the shared drive on
each server in the cluster).  I don't know if anyone has used this solution
in a cluster, but it seems a better approach than the RemoteSearchable
interface or a DB-based index (SQLDirectory).

I am currently looking at 2 options:

Index on shared drive: use a single index directory on a shared drive (NFS,
etc.) that is mounted on each app server.  All servers in the cluster write
to this shared drive when objects are modified.
Problems:
1) Known issues such as file locking (the thread above talks about moving
the locking mechanism to the DB, but I have no idea how).
2) Performance.

Index per server: create a copy of the index directory for each machine;
requires regular updates, etc.  Each server maintains and searches its own
index.
Problems:
1) Modifying the index is complex.  When objects are modified on a server
(server1) that does not run the search system, server1 needs to notify all
servers in the cluster about the modifications so that each can update its
own index.  This involves some kind of remote communication mechanism, which
will perform badly since our index is modified frequently.

So I am still reviewing both options and trying to figure out which one is
best and how to solve the above problems.

If you have any ideas, please share them.  I would appreciate any help with
making Lucene clusterable (for both indexing and searching).
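
To make option 1 concrete, here is a bare-bones sketch of how I picture it
(the mount path is made up, and I am assuming only one designated node ever
opens an IndexWriter, which would sidestep the cross-server lock contention):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

public class SharedIndexNode {
  // NFS mount visible to every app server in the cluster (illustrative path)
  private static final String SHARED_INDEX = "/mnt/shared/lucene-index";

  // Any node can open a searcher on the shared directory.
  public static IndexSearcher openSearcher() throws Exception {
    return new IndexSearcher(SHARED_INDEX);
  }

  // Only the node elected as the writer modifies the index, so the servers
  // never compete for the write lock over NFS.
  public static void addDocuments(Document[] docs, boolean isWriterNode)
      throws Exception {
    if (!isWriterNode) {
      throw new IllegalStateException("only the writer node may modify the index");
    }
    IndexWriter writer = new IndexWriter(SHARED_INDEX, new StandardAnalyzer(), false);
    for (int i = 0; i < docs.length; i++) {
      writer.addDocument(docs[i]);
    }
    writer.close();
  }
}

Everything else (which node is the writer, and when the searchers reopen)
would have to be coordinated outside Lucene.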

Praveen

** 
Praveen Peddi
Sr Software Engg, Context Media, Inc. 
email:[EMAIL PROTECTED] 
Tel:  401.854.3475 
Fax:  401.861.3596 
web: http://www.contextmedia.com 
** 
Context Media- "The Leader in Enterprise Content Integration" 


locking problems

2004-10-08 Thread Aad Nales
Based on discussions in this group I figured that I should 'cache'
IndexSearchers and IndexReaders, which I did.  I have built an
IndexSearcherPool and an IndexReaderPool.  Both seem to work fine
(although I am still testing).  However, whenever I use these I cannot
create an IndexWriter: the thread fails with a timeout in
org.apache.lucene.store.Lock.obtain (1.3.1, line 97).

Can somebody help me figure out which operations obtain these locks?  I
have been reading all the FAQs on the subject but failed to understand
the following:

1. Can I have one or more searchers open when I open a writer?
2. Can I have one or more readers open when I open a writer?

And if not: I am writing an application that does regular updates on the
index, so what kind of strategy would you advise?  Should I use resource
pooling at all?

TIA,
Aad Nales


--
Aad Nales
[EMAIL PROTECTED], +31-(0)6 54 207 340 






Re: Indexing Strategy for 20 million documents

2004-10-08 Thread Otis Gospodnetic
Jeff,

These questions are difficult to answer, because the answer depends on
a number of factors, such as:
- hardware (memory, disk speed, number of disks...)
- index complexity and size (number of fields and their size)
- number of queries/second
- complexity of queries
etc.

I would try putting everything in a single index first, and split it up
only if I see performance issues.  Going from 1 index to N indices is
not a lot of work (not a lot of Lucene-related code).  If searching 1
big index is too slow, split your index, put each index on a separate
disk, and use ParallelMultiSearcher
(http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/ParallelMultiSearcher.html)
to search your indices.
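
For example, a rough sketch of the split case (the disk paths and class
name are made up):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class SplitIndexSearch {
  public static Hits search(Query query) throws Exception {
    // one sub-index per physical disk, searched in parallel
    Searchable[] parts = new Searchable[] {
      new IndexSearcher("/disk1/index"),
      new IndexSearcher("/disk2/index"),
      new IndexSearcher("/disk3/index")
    };
    return new ParallelMultiSearcher(parts).search(query);
  }
}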

Otis


--- Jeff Munson <[EMAIL PROTECTED]> wrote:

> I am a new user of Lucene.  I am looking to index over 20 million
> documents (and a lot more someday) and am looking for ideas on the
> best
> indexing/search strategy.  
> 
> Which will optimize the Lucene search, one index or multiple indexes?
> Do I create multiple indexes and merge them all together?  Or do I
> create multiple indexes and search on the multiple indexes?  
> 
> Any helpful ideas would be appreciated!
> 





Indexing Strategy for 20 million documents

2004-10-08 Thread Jeff Munson
I am a new user of Lucene.  I am looking to index over 20 million
documents (and a lot more someday) and am looking for ideas on the best
indexing/search strategy.  

Which will optimize the Lucene search, one index or multiple indexes?
Do I create multiple indexes and merge them all together?  Or do I
create multiple indexes and search on the multiple indexes?  

Any helpful ideas would be appreciated!




Sort regeneration in multithreaded server

2004-10-08 Thread Stephen Halsey
Hi,

I was wondering if anyone could help with a problem (or should that be
"challenge"?) I'm having using Sort in Lucene over a large number of records
in a multi-threaded server program on a continually updated index.

I am using lucene-1.4-rc3.

Question in more general terms: is it possible to write a multithreaded
search program which uses a Sort object that is updated at regular intervals
(e.g. every 5 minutes, taking 5 seconds to regenerate) while the searching
threads continue to do their sorted searching without any 5-second
interruption?

Question in quick specific format: Can I generate a new updated Sort object
in a separate Thread of my search server program while the original Sort
object continues to be used in the other Threads of the program and then
switch the searching Threads to the new Sort object?

More details: We are using Lucene to index about one million news articles
and the index size is about 3Gb and needs to be continually updated with new
news records.  I have written a search server which performs sorted searches
on the index.  The "challenge" is that the Sort object does not update in
memory as the index is updated on disk and so has to be regenerated.  This
takes about 5 seconds and so cannot be done for every single search.  I
thought I would be able to regenerate the Sort and Searcher objects in a
separate Thread and then pass them to the searcher Threads for searching,
but have found that there seems to be some kind of memory locking that stops
this from being possible.
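
Roughly, the hand-off I have in mind looks like the sketch below (the sort
field, index path and warm-up query are placeholders, and refresh() would
have to be called once before any searches):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

public class SortedSearchHolder {
  private static class Pair {
    final IndexSearcher searcher;
    final Sort sort;
    Pair(IndexSearcher searcher, Sort sort) { this.searcher = searcher; this.sort = sort; }
  }

  private volatile Pair current;

  // Run from a background thread every few minutes.
  public void refresh(String indexPath, String sortField, Query warmup) throws Exception {
    IndexSearcher fresh = new IndexSearcher(indexPath);
    Sort sort = new Sort(sortField);
    // The field cache behind the Sort is built lazily on first use, so do
    // one sorted search here to pay the ~5 second cost in this thread.
    fresh.search(warmup, sort);
    current = new Pair(fresh, sort);  // publish the new pair
  }

  // Called by the searching threads; they never block on the rebuild.
  public Hits search(Query query) throws Exception {
    Pair p = current;
    return p.searcher.search(query, p.sort);
  }
}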

I have written a simple test program (attached, with output) that
demonstrates this problem by running a sorted search in one or two threads.
If you run it with one thread it runs fine, with the searches that
regenerate the Sort object taking about 5 seconds and the searches
themselves taking only 0.25 seconds.  But if you run it with two threads
then every search takes about 10 seconds, which implies that the Sort object
is being regenerated for every single search.  I am guessing that this is
because Lucene has been written in a Thread safe way and so to be safe the
Sort object is being regenerated every time?

If it turns out that what I am trying to do is not possible, then I will
probably just restart the search server program every 5 minutes and load
balance the searches across a number of servers, but that seems a bit messy
compared to regenerating everything in memory in a continually running
program.  Thanks in advance, and don't worry - it's not urgent, and if I
don't get an answer I think it should be OK(ish) doing it the messy
restart-the-server way.

ta


Steve


testDoTwoSeparateThreadsWithSorts.java:-

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.queryParser.QueryParser;

import java.net.*;
import java.io.*;
import java.util.*;
import java.lang.*;

//************************************************************************//
//
// This program tests running two separate threads, each running searches
// and then refreshing the Sort object every so often.  This is needed in
// our search server since it runs continuously in multiple threads and
// never dies, so as the Lucene index is updated the Sort and Searcher
// objects in each thread have to be updated.
// I find with this program that when two threads are running, the Sort
// object seems to be regenerated every time, which causes each search to
// take about 10 seconds.  With only one thread the regeneration of the
// Sort object takes about 5 seconds and then each search only takes 200
// milliseconds or so.
//
// cd /home1/moreover/lucene/test_programs/; javac testDoTwoSeparateThreadsWithSorts.java;
//   java -ms100m -mx200m testDoTwoSeparateThreadsWithSorts
//     /home1/moreover/lucene_indexes/testKeepSortInMemoryIndex/ news dontRunSecondThread
//
// cd /home1/moreover/lucene/test_programs/; javac testDoTwoSeparateThreadsWithSorts.java;
//   java -ms100m -mx200m testDoTwoSeparateThreadsWithSorts
//     /home1/moreover/lucene_indexes/testKeepSortInMemoryIndex/ news doRunSecondThread
//
//************************************************************************//


class testDoTwoSeparateThreadsWithSorts {

public static void main(String[] args) {

 try {
 // initialise variables
 String indexDirectory = args[0];
 String query = args[1];
 String runSecondThread = args[2];

 System.out.println(": Starting first thread to do s

Re: WebLucene 0.5 released: with a SAX based indexing sample Re: XML Indexing

2004-10-08 Thread Sumathi

  Hi,

   As of now, WebLucene works from the command line as a standalone
application (I can both index and search), but when I try it as a web
application under the Tomcat server I get a blank page :(.  Can you
please tell me what the problem could be, and also what the purpose of
the various XSLs is?

  Expecting some help from you,
  Thanks in advance!
  - Original Message - 
  From: "Che Dong" <[EMAIL PROTECTED]>
  To: "Lucene Users List" <[EMAIL PROTECTED]>
  Sent: Wednesday, October 06, 2004 8:02 PM
  Subject: Re: WebLucene 0.5 released: with a SAX based indexing sample Re:
XML Indexing


  > You can find an INSTALL.txt in the gzipped package and a sample XML data
  > source in the dump/ directory; run the command-line IndexRunner to
  > build the index.
  >
  > Good luck
  >
  > Che Dong
  >
  >
  >
  > Sumathi wrote:
  > >   Can you please tell me where I can find complete
  > > documentation/tutorial help on using this API?
  > >
  > >   - Original Message - 
  > >   From: "Che Dong" <[EMAIL PROTECTED]>
  > >   To: "Lucene Users List" <[EMAIL PROTECTED]>
  > >   Sent: Tuesday, October 05, 2004 11:20 PM
  > >   Subject: WebLucene 0.5 released: with a SAX based indexing sample
Re: XML
  > > Indexing
  > >
  > >
  > >   > http://sourceforge.net/projects/weblucene/
  > >   >
  > >   > Regards
  > >   >
  > >   > Che Dong
  > >   > http://www.chedong.com/tech/weblucene.html
  > >   >
  > >   > Sumathi wrote:
  > >   > >   Can any one give me a demo for indexing XML files ?
  > >   > >
  > >   >
  > >   >
  > >





searching using the CJKAnalyzer

2004-10-08 Thread Daan Hoogland
LS,

In http://issues.apache.org/eyebrowse/ReadMsg?listId=30&msgNo=8980
Jon Schuster explains how to get a Japanese search system working.  I
followed his advice and got an index that Luke shows as what I expected
it to be.
What I don't know is how to enter a search query so that it gets passed to
the engine properly.  It works in Luke but not in WebLucene or in my own app.
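
What I am trying, roughly, is to parse the query with the same analyzer
that built the index; a sketch (the "contents" field name, the index path
and the sandbox CJKAnalyzer being on the classpath are my assumptions):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;   // sandbox analyzer, assumed on the classpath
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CJKSearch {
  public static Hits search(String indexPath, String userInput) throws Exception {
    Analyzer analyzer = new CJKAnalyzer();
    // Parse the user's input with the same analyzer used at index time,
    // otherwise the bigram terms in the index never match.
    Query query = QueryParser.parse(userInput, "contents", analyzer);
    return new IndexSearcher(indexPath).search(query);
  }
}

In a web app the query string also has to reach this code decoded with the
right character encoding (e.g. UTF-8), so perhaps that is another place for
me to look.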



