Re: problems with lucene in multithreaded environment

2004-06-08 Thread Jayant Kumar
 --- Doug Cutting [EMAIL PROTECTED] wrote:
Jayant Kumar wrote:
  Thanks for the patch. It helped in increasing the
  search speed to a good extent.
 
 Good.  I'll commit it.  Thanks for testing it.
 
  But when we tried to give about 100 queries in 10
  seconds, then again we found that after about 15
  seconds, the response time per query increased.
 
 This still sounds very slow to me.  Is your index
 optimized?  What JVM 
 are you using?

Yes, our index is optimized and we are using Java
version 1.4.2. What would be the difference between
a search on an optimized index and one on an
unoptimized index?

 
 You might also consider ramping up your benchmark
 more slowly, to warm 
 the filesystem's cache.  So, when you first launch
 the server, give it a 
 few queries at a lower rate, then, after those have
 completed, try a 
 higher rate.
 

We have tried this out and since we have lots of ram,
once the indexes go into the os cache, results come
out fast.

  We were able to simplify the searches further by
  consolidating the fields in the index but that
  resulted in increasing the index size to 2.5 GB as we
  required fields 2-5 and fields 1-7 in different
  searches.
 
 That will slow updates a bit, but searching should
 be faster.
 
 How about your range searches?  Do you know how many
 terms they match? 
 The easiest way to determine this might be to insert
 a print statement 
 in RangeQuery.rewrite() that shows the query before
 it is returned.
 
  Our indexes are on the local disk therefore
  there is no network i/o involved.
 
 It sounds like file i/o is now your bottleneck.  The
 traces below show that you're using the compound file
 format, which combines many files into one.  When two
 threads try to read two logically different files
 (.prx and .frq below) they must synchronize when the
 compound format is used.  But if your application did
 not use the compound format this synchronization
 would not be required.  So you should try rebuilding
 your index with the compound format turned off.
 (The fastest way to do this is simply to add and/or
 delete a single document, then re-optimize the index
 with compound format turned off.  This will cause the
 index to be re-written in non-compound format.)

We have changed the indexwriter to write index in
non-compound format and have noticed a little
difference in the response time.

 
 Is this on linux?  If so, please try running 'iostat
 -x 1' while you 
 perform your benchmark (iostat is installed by the
 'sysstat' package). 
 What percentage is the disk utilized (%util)?  What
 is the percentage of 
 idle CPU (%idle)?  What is the rate of data that is
 read (rkB/s)?  If 
 things really are i/o bound then you might consider
 spreading the data 
 over multiple disks, e.g., with lvm striping or a
 RAID controller.
 
 If you have a lot of RAM, then you could also
 consider moving certain 
 files of the index onto a ramfs-based drive.  For
 example, moving the 
 .tis, .frq and .prx can greatly improve performance.
  Also, having these 
 files in RAM means that the cache does not need to
 be warmed.
 
 Hope this helps!

Thanks for the help. We will continue interacting with
you as and when we face problems. We would also like
you to know that lucene is an excellent search engine
compared to any other.

Jayant
  


Yahoo! India Matrimony: Find your partner online. 
http://yahoo.shaadi.com/india-matrimony/

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: problems with lucene in multithreaded environment

2004-06-07 Thread Doug Cutting
Jayant Kumar wrote:
Thanks for the patch. It helped in increasing the
search speed to a good extent.
Good.  I'll commit it.  Thanks for testing it.
But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased.
This still sounds very slow to me.  Is your index optimized?  What JVM 
are you using?

You might also consider ramping up your benchmark more slowly, to warm 
the filesystem's cache.  So, when you first launch the server, give it a 
few queries at a lower rate, then, after those have completed, try a 
higher rate.
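Doug's ramp-up suggestion amounts to a two-phase load generator: a slow warm-up pass to populate the OS file cache, then the measured run. A minimal sketch (searchOnce is a stand-in for a real IndexSearcher.search() call, not Lucene API):

```java
// Two-phase benchmark: warm-up at a low rate, then the measured run.
// searchOnce() is a placeholder for a real IndexSearcher.search() call.
public class RampedBenchmark {
    static int queriesRun = 0;

    static void searchOnce() {          // stand-in for the real query
        queriesRun++;
    }

    // Issue 'count' queries with a fixed delay between them.
    static void runPhase(int count, long delayMillis) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            searchOnce();
            Thread.sleep(delayMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        runPhase(20, 200);   // warm-up: ~5 queries/sec, fills the file cache
        runPhase(100, 10);   // measured run: ~100 queries/sec
        System.out.println(queriesRun + " queries run");
    }
}
```

Only the second phase should be timed; the first exists purely to warm the cache.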

We were able to simplify the searches further by
consolidating the fields in the index but that
resulted in increasing the index size to 2.5 GB as we
required fields 2-5 and fields 1-7 in different
searches.
That will slow updates a bit, but searching should be faster.
How about your range searches?  Do you know how many terms they match? 
The easiest way to determine this might be to insert a print statement 
in RangeQuery.rewrite() that shows the query before it is returned.

Our indexes are on the local disk therefore
there is no network i/o involved.
It sounds like file i/o is now your bottleneck.  The traces below show 
that you're using the compound file format, which combines many files 
into one.  When two threads try to read two logically different files 
(.prx and .frq below) they must synchronize when the compound format is 
used.  But if your application did not use the compound format this 
synchronization would not be required.  So you should try rebuilding 
your index with the compound format turned off.  (The fastest way to do 
this is simply to add and/or delete a single document, then re-optimize 
the index with compound format turned off.  This will cause the index to 
be re-written in non-compound format.)

Is this on linux?  If so, please try running 'iostat -x 1' while you 
perform your benchmark (iostat is installed by the 'sysstat' package). 
What percentage is the disk utilized (%util)?  What is the percentage of 
idle CPU (%idle)?  What is the rate of data that is read (rkB/s)?  If 
things really are i/o bound then you might consider spreading the data 
over multiple disks, e.g., with lvm striping or a RAID controller.

If you have a lot of RAM, then you could also consider moving certain 
files of the index onto a ramfs-based drive.  For example, moving the 
.tis, .frq and .prx can greatly improve performance.  Also, having these 
files in RAM means that the cache does not need to be warmed.

Hope this helps!
Doug
  Thread-23 prio=1 tid=0x08169f38 nid=0x2867 waiting for monitor 
entry [69bd4000..69bd48c8]
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
- waiting to lock 0x46f1b828 (a org.apache.lucene.store.FSInputStream)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at 
org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:58)
Thread-22 prio=1 tid=0x08159f78 nid=0x2866 waiting for monitor entry 
[69b53000..69b538c8]
at 
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:217)
- waiting to lock 0x46f1b828 (a org.apache.lucene.store.FSInputStream)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:86)
at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)


Re: problems with lucene in multithreaded environment

2004-06-05 Thread Jayant Kumar
Thanks for the patch. It helped in increasing the
search speed to a good extent. But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased. Enclosed is the dump which we
took after about 30 seconds of starting the search.
The maximum query time has reduced from 200-300
seconds to about 50 seconds.

We were able to simplify the searches further by
consolidating the fields in the index but that
resulted in increasing the index size to 2.5 GB as we
required fields 2-5 and fields 1-7 in different
searches. Our indexes are on the local disk therefore
there is no network i/o involved.

Thanks
Jayant

 --- Doug Cutting [EMAIL PROTECTED] wrote:
Doug Cutting wrote:
  Please tell me if you are able to simplify your
 queries and if that 
  speeds things.  I'll look into a ThreadLocal-based
 solution too.
 
 I've attached a patch that should help with the
 thread contention, 
 although I've not tested it extensively.
 
 I still don't fully understand why your searches are
 so slow, though. 
 Are the indexes stored on the local disk of the
 machine?  Indexes 
 accessed over the network can be very slow.
 
 Anyway, give this patch a try.  Also, if anyone else
 can try this and 
 report back whether it makes multi-threaded
 searching faster, or 
 anything else slower, or is buggy, that would be
 great.
 
 Thanks,
 
 Doug

Re: problems with lucene in multithreaded environment

2004-06-04 Thread Doug Cutting
Jayant Kumar wrote:
Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.
Also enclosed is the file queries.txt which contains
few sample search queries.
Thanks for the data.  This is exactly what I was looking for.
Thread-14 prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry 
[4d61a000..4d61ac18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)
Thread-12 prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry 
[4d51a000..4d51ad18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)
These are all stuck looking terms up in the dictionary (TermInfos). 
Things would be much faster if your queries didn't have so many terms.

Query : (  (  (  (  (  FIELD1: proof OR  FIELD2: proof OR  FIELD3: proof OR  FIELD4: proof OR  FIELD5: proof OR  FIELD6: proof OR  FIELD7: proof ) AND (  FIELD1: george bush OR  FIELD2: george bush OR  FIELD3: george bush OR  FIELD4: george bush OR  FIELD5: george bush OR  FIELD6: george bush OR  FIELD7: george bush )  ) AND (  FIELD1: script OR  FIELD2: script OR  FIELD3: script OR  FIELD4: script OR  FIELD5: script OR  FIELD6: script OR  FIELD7: script )  ) AND (  (  FIELD1: san OR  FIELD2: san OR  FIELD3: san OR  FIELD4: san OR  FIELD5: san OR  FIELD6: san OR  FIELD7: san ) OR (  (  FIELD1: war OR  FIELD2: war OR  FIELD3: war OR  FIELD4: war OR  FIELD5: war OR  FIELD6: war OR  FIELD7: war ) OR (  (  FIELD1: gulf OR  FIELD2: gulf OR  FIELD3: gulf OR  FIELD4: gulf OR  FIELD5: gulf OR  FIELD6: gulf OR  FIELD7: gulf ) OR (  (  FIELD1: laden OR  FIELD2: laden OR  FIELD3: laden OR  FIELD4: laden OR  FIELD5: laden OR  FIELD6: laden OR  FIELD7: laden ) OR (  (  
FIELD1: ttouristeat OR  FIELD2: ttouristeat OR  FIELD3: ttouristeat OR  FIELD4: 
ttouristeat OR  FIELD5: ttouristeat OR  FIELD6: ttouristeat OR  FIELD7: ttouristeat ) 
OR (  (  FIELD1: pow OR  FIELD2: pow OR  FIELD3: pow OR  FIELD4: pow OR  FIELD5: pow 
OR  FIELD6: pow OR  FIELD7: pow ) OR (  FIELD1: bin OR  FIELD2: bin OR  FIELD3: bin OR 
 FIELD4: bin OR  FIELD5: bin OR  FIELD6: bin OR  FIELD7: bin )  )  )  )  )  )  )  )  ) 
AND  RANGE: ([ 0800 TO 1100 ]) AND  (  S_IDa: (7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 
14 OR 15 OR 16 OR 17 )  or  S_IDb: (2 )  )
All your queries look for terms in fields 1-7.  If you instead combined 
the contents of fields 1-7 in a single field, and searched that field, 
then your searches would contain far fewer terms and be much faster.
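The arithmetic is easy to check: a query of t words expanded across 7 fields yields 7·t term lookups, versus t for a single combined field. A toy illustration (field names, including the combined CONTENTS field, are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class TermCount {
    // Expand each word into one term per field, as the slow queries above do.
    static List<String> expand(String[] words, String[] fields) {
        List<String> terms = new ArrayList<String>();
        for (String word : words)
            for (String field : fields)
                terms.add(field + ":" + word);
        return terms;
    }

    public static void main(String[] args) {
        String[] words = { "proof", "script", "laden" };
        String[] seven = { "FIELD1", "FIELD2", "FIELD3", "FIELD4",
                           "FIELD5", "FIELD6", "FIELD7" };
        String[] combined = { "CONTENTS" };  // hypothetical merged field
        System.out.println(expand(words, seven).size());    // prints 21
        System.out.println(expand(words, combined).size()); // prints 3
    }
}
```

Every one of those term lookups goes through the synchronized TermInfosReader, so cutting the term count cuts contention directly.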

Also, I don't know how many terms your RANGE queries match, but that 
could also be introducing large numbers of terms which would slow things 
down too.

But, still, you have identified a bottleneck: TermInfosReader caches a 
TermEnum and hence access to it must be synchronized.  Caching the enum 
greatly speeds sequential access to terms, e.g., when merging, 
performing range or prefix queries, etc.  Perhaps however the cache 
should be done through a ThreadLocal, giving each thread its own cache 
and obviating the need for synchronization...
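The ThreadLocal approach Doug sketches can be illustrated in isolation: give each thread its own lazily-created copy of the stateful object, and the shared method no longer needs to be synchronized. The names below (Cursor, PerThreadCache) are invented for the example and are not Lucene classes:

```java
import java.util.concurrent.CountDownLatch;

public class PerThreadCache {
    // A stateful, non-thread-safe cursor (standing in for something like a
    // cached SegmentTermEnum). Names here are illustrative, not Lucene's.
    static class Cursor {
        int position = 0;
        int next() { return position++; }
    }

    // One private Cursor per thread, created on first use.
    private final ThreadLocal<Cursor> cursors = new ThreadLocal<Cursor>() {
        @Override protected Cursor initialValue() { return new Cursor(); }
    };

    // No 'synchronized' needed: each thread only touches its own cursor.
    public int advance() { return cursors.get().next(); }

    public static void main(String[] args) throws InterruptedException {
        final PerThreadCache cache = new PerThreadCache();
        final CountDownLatch done = new CountDownLatch(2);
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) cache.advance();
                done.countDown();
            }
        };
        new Thread(task).start();
        new Thread(task).start();
        done.await();
        // The main thread's cursor is untouched by the workers.
        System.out.println(cache.advance()); // prints 0
    }
}
```

The trade-off is memory: each searching thread now holds its own cursor, which is exactly what the TermInfosReader patch accepts in exchange for lock-free term lookups.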

Please tell me if you are able to simplify your queries and if that 
speeds things.  I'll look into a ThreadLocal-based solution too.

Doug


Re: problems with lucene in multithreaded environment

2004-06-04 Thread Doug Cutting
Doug Cutting wrote:
Please tell me if you are able to simplify your queries and if that 
speeds things.  I'll look into a ThreadLocal-based solution too.
I've attached a patch that should help with the thread contention, 
although I've not tested it extensively.

I still don't fully understand why your searches are so slow, though. 
Are the indexes stored on the local disk of the machine?  Indexes 
accessed over the network can be very slow.

Anyway, give this patch a try.  Also, if anyone else can try this and 
report back whether it makes multi-threaded searching faster, or 
anything else slower, or is buggy, that would be great.

Thanks,
Doug
Index: src/java/org/apache/lucene/index/TermInfosReader.java
===
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.6
diff -u -u -r1.6 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java	20 May 2004 11:23:53 -	1.6
+++ src/java/org/apache/lucene/index/TermInfosReader.java	4 Jun 2004 21:45:15 -
@@ -29,7 +29,8 @@
   private String segment;
   private FieldInfos fieldInfos;
 
-  private SegmentTermEnum enumerator;
+  private ThreadLocal enumerators = new ThreadLocal();
+  private SegmentTermEnum origEnum;
   private long size;
 
   TermInfosReader(Directory dir, String seg, FieldInfos fis)
@@ -38,19 +39,19 @@
 segment = seg;
 fieldInfos = fis;
 
-enumerator = new SegmentTermEnum(directory.openFile(segment + ".tis"),
-			   fieldInfos, false);
-size = enumerator.size;
+origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
+   fieldInfos, false);
+size = origEnum.size;
 readIndex();
   }
 
   public int getSkipInterval() {
-return enumerator.skipInterval;
+return origEnum.skipInterval;
   }
 
   final void close() throws IOException {
-if (enumerator != null)
-  enumerator.close();
+if (origEnum != null)
+  origEnum.close();
   }
 
   /** Returns the number of term/value pairs in the set. */
@@ -58,6 +59,15 @@
 return size;
   }
 
+  private SegmentTermEnum getEnum() {
+SegmentTermEnum enum = (SegmentTermEnum)enumerators.get();
+if (enum == null) {
+  enum = terms();
+  enumerators.set(enum);
+}
+return enum;
+  }
+
   Term[] indexTerms = null;
   TermInfo[] indexInfos;
   long[] indexPointers;
@@ -102,16 +112,17 @@
   }
 
   private final void seekEnum(int indexOffset) throws IOException {
-enumerator.seek(indexPointers[indexOffset],
-	  (indexOffset * enumerator.indexInterval) - 1,
+getEnum().seek(indexPointers[indexOffset],
+	  (indexOffset * getEnum().indexInterval) - 1,
 	  indexTerms[indexOffset], indexInfos[indexOffset]);
   }
 
   /** Returns the TermInfo for a Term in the set, or null. */
-  final synchronized TermInfo get(Term term) throws IOException {
+  TermInfo get(Term term) throws IOException {
 if (size == 0) return null;
 
-// optimize sequential access: first try scanning cached enumerator w/o seeking
+// optimize sequential access: first try scanning cached enum w/o seeking
+SegmentTermEnum enumerator = getEnum();
 if (enumerator.term() != null // term is at or past current
	&& ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
	|| term.compareTo(enumerator.term()) >= 0)) {
@@ -128,6 +139,7 @@
 
   /** Scans within block for matching term. */
   private final TermInfo scanEnum(Term term) throws IOException {
+SegmentTermEnum enumerator = getEnum();
 while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
 if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
   return enumerator.termInfo();
@@ -136,10 +148,12 @@
   }
 
   /** Returns the nth term in the set. */
-  final synchronized Term get(int position) throws IOException {
+  final Term get(int position) throws IOException {
 if (size == 0) return null;
 
-if (enumerator != null && enumerator.term() != null && position >= enumerator.position &&
+SegmentTermEnum enumerator = getEnum();
+if (enumerator != null && enumerator.term() != null &&
+position >= enumerator.position &&
	position < (enumerator.position + enumerator.indexInterval))
   return scanEnum(position);		  // can avoid seek
 
@@ -148,6 +162,7 @@
   }
 
   private final Term scanEnum(int position) throws IOException {
+SegmentTermEnum enumerator = getEnum();
 while(enumerator.position < position)
   if (!enumerator.next())
 	return null;
@@ -156,12 +171,13 @@
   }
 
   /** Returns the position of a Term in the set or -1. */
-  final synchronized long getPosition(Term term) throws IOException {
+  final long getPosition(Term term) throws IOException {
 if (size == 0) return -1;
 
 int indexOffset = getIndexOffset(term);
 seekEnum(indexOffset);
 
+SegmentTermEnum enumerator = getEnum();
 

Re: problems with lucene in multithreaded environment

2004-06-03 Thread Supun Edirisinghe
I noticed delays when concurrent threads query an IndexSearcher too.
our index is about 550MB with about 850,000 docs. each doc with 20-30 
fields of which only 3 are indexed. Our queries are not very complex -- 
just 3 required term queries.

this is what my test did:
initialize an array of terms that are known to appear in the index
initialize an IndexSearcher
start a number of threads that query the IndexSearcher:
	each thread picks random terms that are known to appear in the indexed 
Keyword fields and builds a boolean query,
	then extracts all 20-30 fields from the 1st 10 hits,
	and waits .5 seconds. Each thread does this 30 times.

typical queries returned 20 - 100 hits
with just one thread: 30 queries ran over a span about 20 seconds. 
search time for each query generally took 40ms to 75ms. The longest 
search time was 445ms but searches that took more than 100ms were rare.

with 5 threads: 150 queries ran over a span of 62 seconds. search time 
for each query for the most part increased to 120ms to 300ms. big 
delays were more prevalent and took 3 or 4 seconds.

with 10 or more threads things got bad. and I didn't run enough tests. 
but most searches took 1 to 2 seconds and some searches did take 20 to 
30 seconds.

when I ran the test with 5 concurrent threads each doing one query, 
search times were like 100ms to 200ms with a max of 700ms.

I have not looked into the Lucene code much and I didn't think queries 
were queued.

I ran my test with the -DdisableLuceneLocks in the command line. But I 
wasn't sure it did anything.

I ran the test on Lucene 1.3 final on my PowerBook G4 and tests ran with 
a lot of other processes going on.

I was interested in this discussion because I could not figure out the 
delay if queries are run in parallel.

On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search.
That sounds slow, unless your queries are very complex.  What are your 
queries like?

What we observed was that lucene queues
the queries and does not release them until the
results are found. so the queries that have come in
later take up about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances.
Multiple queries executed from different threads using a single 
searcher should not queue, but should run in parallel.  A technique to 
find out where threads are queueing is to get a thread dump and see 
where all of the threads are stuck.  In Solaris and Linux, sending the 
JVM a SIGQUIT will give a thread dump.  On Windows, use Control-Break.

Doug


Re: problems with lucene in multithreaded environment

2004-06-03 Thread Jayant Kumar
We conducted a test on our search for 500 requests
given in 27 seconds. We noticed that in the first 5
seconds, the results were coming in 100 to 500 ms. But
as the queue size kept increasing, the response time
of the search increased drastically to approx 80-100
seconds. 

Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.

Also enclosed is the file queries.txt which contains
few sample search queries.

Please note that this is done on a sample of 400,000
documents (450MB) on P4 having 1GB RAM.

Kindly let us know if this helps to identify the cause
of slow response.

Jayant

 --- Doug Cutting [EMAIL PROTECTED] wrote:
 Jayant Kumar wrote:
  We recently tested lucene with an index size of 2 GB
  which has about 1,500,000 documents, each document
  having about 25 fields. The frequency of search was
  about 20 queries per second. This resulted in an
  average response time of about 20 seconds approx
  per search.
 
 That sounds slow, unless your queries are very
 complex.  What are your queries like?
 
  What we observed was that lucene queues
  the queries and does not release them until the
  results are found. so the queries that have come in
  later take up about 500 seconds. Please let us know
  whether there is a technique to optimize lucene in
  such circumstances.
 
 Multiple queries executed from different threads
 using a single searcher should not queue, but should
 run in parallel.  A technique to find out where
 threads are queueing is to get a thread dump and see
 where all of the threads are stuck.  In Solaris and
 Linux, sending the JVM a SIGQUIT will give a thread
 dump.  On Windows, use Control-Break.
 
 Doug
 



Thread-14 prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry 
[4d61a000..4d61ac18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.init(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)
at resdex.searchinc.getHits(searchinc.java:752)
at resdex.searchinc.Search(searchinc.java:943)
at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
at resdex.Server$Handler.run(Server.java:64)
at java.lang.Thread.run(Thread.java:534)

Thread-12 prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry 
[4d51a000..4d51ad18]
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
- waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:51)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:364)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:59)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:164)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
at org.apache.lucene.search.Hits.init(Hits.java:43)
at org.apache.lucene.search.Searcher.search(Searcher.java:33)
at org.apache.lucene.search.Searcher.search(Searcher.java:27)
at resdex.searchinc.getHits(searchinc.java:752)
at resdex.searchinc.Search(searchinc.java:943)
at resdex.searchinctest.conductTestSearch(searchinctest.java:99)
at 

problems with lucene in multithreaded environment

2004-06-02 Thread Jayant Kumar
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search. What we observed was that lucene queues
the queries and does not release them until the
results are found. so the queries that have come in
later take up about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances. 

Please note that we have created a single object for
the searcher (IndexSearcher) and all queries are
passed to this searcher only. We are using a P4 dual
processor machine with 6 GB of RAM. We need results at
the rate of about 60 queries/second at peak load. Is
there a way to optimize lucene to get this performance
from this machine? What other ways can I optimize
lucene for this output?

Regards
Jayant






Re: problems with lucene in multithreaded environment

2004-06-02 Thread Doug Cutting
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search.
That sounds slow, unless your queries are very complex.  What are your 
queries like?

What we observed was that lucene queues
the queries and does not release them until the
results are found. so the queries that have come in
later take up about 500 seconds. Please let us know
whether there is a technique to optimize lucene in
such circumstances. 
Multiple queries executed from different threads using a single searcher 
should not queue, but should run in parallel.  A technique to find out 
where threads are queueing is to get a thread dump and see where all of 
the threads are stuck.  In Solaris and Linux, sending the JVM a SIGQUIT 
will give a thread dump.  On Windows, use Control-Break.
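As an aside beyond the SIGQUIT/Control-Break route Doug describes: JVMs from Java 5 onward (not the 1.4 JVMs in this thread) can also capture a dump programmatically with the standard Thread.getAllStackTraces() API, which is handy when the server can't easily be signalled:

```java
import java.util.Map;

public class DumpThreads {
    // Print one stanza per live thread, similar in spirit to a SIGQUIT dump.
    public static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            System.out.println("\"" + t.getName() + "\" state=" + t.getState());
            for (StackTraceElement frame : entry.getValue())
                System.out.println("\tat " + frame);
        }
    }

    public static void main(String[] args) {
        dump();
    }
}
```

Threads blocked on a monitor show up in the BLOCKED state, which is exactly the "waiting to lock" pattern visible in the dumps quoted in this thread.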

Doug