date:20040820

The bottleneck seems to be disk I/O.
Since this is a read-only index, why not spread some of the frequently
scanned index files over multiple disks, or put the index on SCSI disks
in a RAID array?  Maybe this is already the case, but you didn't
mention it.

Oh, I already answered a similar question once before:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05103.html

Otis
http://www.simpy.com/ -- Index, Search and Share your bookmarks


--- Yonik Seeley <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I'm trying to figure out how to speed up queries to a
> large index.
> I'm currently getting 133 req/sec, which isn't bad,
> but isn't too close
> to MySQL, which is getting 500 req/sec on the same
> hardware with the
> same set of documents.
> 
> Setup info & Stats:
> - 4.3M documents, 12 keyword fields per document, 11
> unindexed fields per document.
> - lucene index size on disk=1.3G
> - Hardware: dual opteron w/ 16GB memory, running 64
> bit JVM (Sun 1.5 beta)
> - Lucene version 1.4.1
> - Hitting multithreaded server w/ 10 clients at once
> - This is a read-only index... no updating is done
> - Single IndexSearcher that is reused for all requests
>  
> 
> Q1)  while hitting it with multiple queries at once,
> lucene is pegged at 50% CPU usage (meaning it is
> only using 1 out of 2 CPUs on average).  I took a
> thread dump
> and all of the lucene threads except one are blocked
> on
> reading a file (see trace below).  I could create two
> index
> readers, but that seems like it might be a waste, and
> fixing
> a symptom instead of the root problem.  Would multiple
> IndexSearchers or IndexReaders share internal caches?
> Is there a way to cache more info at a higher level
> such that
> it would get rid of this bottleneck?  The JVM isn't
> taking up
> much space (125M or so), and I have 16GB to work with!
> The OS (linux) is obviously caching the index file,
> but
> that doesn't get rid of the synchronization issues,
> and the
> overhead of re-reading.
> How is caching in lucene configured?
> Does it internally use FieldCache, or do I have to use
> that
> somehow myself?
>  
> "tcpConnection-8080-72" daemon prio=1 tid=0x002b24412490 nid=0x34a4 waiting for monitor entry [0x45aba000..0x45abb2d0]
>     at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:215)
>     - waiting to lock <0x002ae153fa00> (a org.apache.lucene.store.FSInputStream)
>     at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>     at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>     at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>     at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:176)
>     at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:88)
>     at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:53)
>     at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:48)
>     at org.apache.lucene.search.Scorer.score(Scorer.java:37)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
>     at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:43)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:33)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:27)
> 
> 
> Even using only 1 cpu though, MySQL is faster. Here is
> what
> the queries look like:
> 
> "field1:4 AND field2:188453 AND field3:1"
> 
> field1:4  done alone selects around 4.2M records
> field2:188453 done alone selects around 1.6M records
> field3:1  done alone selects around 1K records
> The whole query normally selects less than 50 records
> Only the first 10 are returned (or whatever range
> the client selects).
> 
> The fields are all keywords checked for exact matches
> (no
> fulltext search is done).  Is there anything I can do
> to
> speed these queries up, or is the structure just more
> suited
> to MySQL (and not an inverted index)?
> 
> How is a query like this carried out?
> 
> Any help would be greatly appreciated.  There's not a
> lot of info
> on searching (much more on updating). I'm looking
> forward
> to "Lucene in Action"!  too bad it's not out till
> October.
> 
> -Yonik
> 
> 
>   
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Custom filter

On Aug 20, 2004, at 6:48 PM, [EMAIL PROTECTED] wrote:
> We're currently in lucene 1.2... haven't moved to 1.3 yet.

Skip 1.3 and go straight to 1.4.1 :)
Upgrade - why not?

Erik

RE: memory leak in lucene?

2004-08-20 Thread Terence Lai

Are you calling ParallelMultiSearcher.search(Query query, Sort sort) to do your 
search? If so, I am currently having a similar problem.

Terence

> 
> When querying Lucene I run into a memory problem, i.e. it looks like
> memory is not given back after the
> query has been executed.
> 
> I use ParallelMultiSearcher and call the close method after the results are
> displayed.
> 
> hits=null; // Hits class
> if (ms!=null) ms.close(); //ParallelMultiSearcher
> 
> This doesn't help. The memory is not freed. On queries like "No*" I see
> memory consumption grow by c. 20-70 MB per query.
> Imagine what happens with my web server...
> 
> I also tried from the command line and got a similar result.
> 
> Am I doing something wrong or missing something?
> 
> Please help, I use 1.4.1 on a Linux box.
> Joel
> 
> 
> 
> 
> 
> 





Re: Custom filter

2004-08-20 Thread roy-lucene-user

We're currently in lucene 1.2... haven't moved to 1.3 yet.

Roy.

On Fri, 20 Aug 2004 18:46:29 -0400, Erik Hatcher wrote
> Have you considered using the built-in QueryFilter for this?   Why 
> isn't it sufficient for your needs?



Re: Debian build problem with 1.4.1


> It's easy enough for folks to compile Lucene this way

I'm having trouble, warnings and error messages appended. This is for
Lucene 1.4.1. One of the few Debian specific changes was to call the
jarball 1.4 instead of the default 1.5-rc1-dev designation in
build.xml.

rode:~> gcj --version
gcj (GCC) 3.3.4 (Debian 1:3.3.4-9)

rode:~> gcj build/lucene-1.4.jar build/lucene-demos-1.4.jar -o indexer \
   --main=org.apache.lucene.demo.IndexHTML >& /tmp/log.txt

> and applications built this way are pretty small.  The big thing to
> install is libgcj.

I'm potentially interested in C applications calling a Lucene gcj
compiled native library. But that would be in the distant future if at
all. Right now just compiling a working Lucene app with gcj would be
pretty cool.

Cheers,
Jeff



=


org/apache/lucene/analysis/de/WordlistLoader.java: In class 
`org.apache.lucene.analysis.de.WordlistLoader':
org/apache/lucene/analysis/de/WordlistLoader.java: In method 
`org.apache.lucene.analysis.de.WordlistLoader.getWordSet(java.io.File)':
org/apache/lucene/analysis/de/WordlistLoader.java:47: warning: exception handler 
inside code that is being protected
CompoundFileReader.java: In class 
`org.apache.lucene.index.CompoundFileReader$CSInputStream':
CompoundFileReader.java: In method 
`org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(byte[],int,int)':
CompoundFileReader.java:215: warning: exception handler inside code that is being 
protected
org/apache/lucene/index/CompoundFileReader.java: In class 
`org.apache.lucene.index.CompoundFileReader':
org/apache/lucene/index/CompoundFileReader.java: In constructor 
`(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/CompoundFileReader.java:51: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In class 
`org.apache.lucene.index.CompoundFileWriter':
org/apache/lucene/index/CompoundFileWriter.java: In method 
`org.apache.lucene.index.CompoundFileWriter.close()':
org/apache/lucene/index/CompoundFileWriter.java:127: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In method 
`org.apache.lucene.index.CompoundFileWriter.copyFile(org.apache.lucene.index.CompoundFileWriter$FileEntry,org.apache.lucene.store.OutputStream,byte[])':
org/apache/lucene/index/CompoundFileWriter.java:194: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/DocumentWriter.java: In class 
`org.apache.lucene.index.DocumentWriter':
org/apache/lucene/index/DocumentWriter.java: In method 
`org.apache.lucene.index.DocumentWriter.addDocument(java.lang.String,org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:60: warning: exception handler inside code 
that is being protected
org/apache/lucene/index/DocumentWriter.java: In method 
`org.apache.lucene.index.DocumentWriter.invertDocument(org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:117: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method 
`org.apache.lucene.index.DocumentWriter.writePostings(org.apache.lucene.index.Posting[],java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:250: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method 
`org.apache.lucene.index.DocumentWriter.writeNorms(org.apache.lucene.document.Document,java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:320: warning: exception handler inside 
code that is being protected
org/apache/lucene/index/FieldInfos.java: In class `org.apache.lucene.index.FieldInfos':
org/apache/lucene/index/FieldInfos.java: In constructor 
`(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:36: warning: exception handler inside code 
that is being protected
org/apache/lucene/index/FieldInfos.java: In method 
`org.apache.lucene.index.FieldInfos.write(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:172: warning: exception handler inside code 
that is being protected
org/apache/lucene/index/IndexReader.java: In class 
`org.apache.lucene.index.IndexReader':
org/apache/lucene/index/IndexReader.java: In method 
`org.apache.lucene.index.IndexReader.open(org.apache.lucene.store.Directory,boolean)':
org/apache/lucene/index/IndexReader.java:110: warning: exception handler inside code 
that is being protected
org/apache/lucene/index/IndexReader.java: In method 
`org.apache.lucene.index.IndexReader.delete(org.apache.lucene.index.Term)':
org/apache/lucene/index/IndexReader.java:449: warning: exception handler inside code 
that is being protected
org/apache/lucene/index/IndexReader.java: In method 
`org.apache.lucene.index.IndexReader.commit()':
org/apache/lucene/index/IndexReader.java:480: warning: exception h

Re: Custom filter

Have you considered using the built-in QueryFilter for this?   Why 
isn't it sufficient for your needs?

Erik

On Aug 20, 2004, at 6:32 PM, [EMAIL PROTECTED] wrote:
> Hi guys!
>
> I was hoping someone here could help me out with a custom filter.
>
> We have an index of emails and do some searches on the text of an 
> email message and also searches based on the email addresses in a To, 
> From or CC.
>
> Since we also do searches on a bunch of emails, we created a custom 
> filter for searches on an array of fields for an array of values.  
> [code included below]
>
> The problem we're having is that creating a query string like so:
> "Message:viagra AND (From:(email1 OR email2) OR To:(email1 OR email2) 
> OR CC:(email1 OR email2))"
> would return results, but our filter combined with a query string of 
> "Message:viagra" sometimes wouldn't.
>
> One thing I noticed is that when the results do return with the 
> filter, the email has the format of [EMAIL PROTECTED], but the 
> one that doesn't has something like [EMAIL PROTECTED]
>
> Also it might have something to do with the storage of the From or To 
> or CC.  We don't parse out the email addresses before storing them.  
> So sometimes the value of a From/To/CC field might be 
> "[EMAIL PROTECTED]" or "local <[EMAIL PROTECTED]>" or even 
> "<[EMAIL PROTECTED]>".  Could the angle brackets be throwing off my filter?
>
> I also wouldn't mind any suggestions for improving this filter.
> Here is the bits method from our custom filter:
> -
> final public BitSet bits( IndexReader reader ) throws IOException {
>     BitSet bits = new BitSet( reader.maxDoc() );
>     for ( int x = 0; x < fields.length; x++ ) {
>         for ( int y = 0; y < values.length; y++ ) {
>             TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
>             try {
>                 while ( termDocs.next() ) {
>                     bits.set( termDocs.doc() );
>                 }
>             }
>             finally {
>                 termDocs.close();
>             }
>         }
>     }
>     return bits;
> }
> -
>
> Thanks in advance,
> Roy.

Custom filter

2004-08-20 Thread roy-lucene-user

Hi guys!

I was hoping someone here could help me out with a custom filter.

We have an index of emails and do some searches on the text of an email message and 
also searches based on the email addresses in a To, From or CC.

Since we also do searches on a bunch of emails, we created a custom filter for 
searches on an array of fields for an array of values.  [code included below]

The problem we're having is that creating a query string like so:
"Message:viagra AND (From:(email1 OR email2) OR To:(email1 OR email2) OR CC:(email1 OR 
email2))"
would return results, but our filter combined with a query string of "Message:viagra" 
sometimes wouldn't.

One thing I noticed is that when the results do return with the filter, the email has 
the format of [EMAIL PROTECTED], but the one that doesn't has something like 
[EMAIL PROTECTED]

Also it might have something to do with the storage of the From or To or CC.  We don't 
parse out the email addresses before storing them.  So sometimes the value of a 
From/To/CC field might be "[EMAIL PROTECTED]" or "local <[EMAIL PROTECTED]>" or even 
"<[EMAIL PROTECTED]>".  Could the angle brackets be throwing off my filter?

I also wouldn't mind any suggestions for improving this filter.

Here is the bits method from our custom filter:
-
final public BitSet bits( IndexReader reader ) throws IOException {
    BitSet bits = new BitSet( reader.maxDoc() );

    // Set a bit for every document that matches any (field, value) pair.
    for ( int x = 0; x < fields.length; x++ ) {
        for ( int y = 0; y < values.length; y++ ) {
            TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
            try {
                while ( termDocs.next() ) {
                    bits.set( termDocs.doc() );
                }
            }
            finally {
                // Always release the term enumeration, even on IOException.
                termDocs.close();
            }
        }
    }
    return bits;
}
-

Thanks in advance,

Roy.
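
An editorial sketch related to the question above: if the From/To/CC values are indexed as single unparsed terms (the thread does not confirm how they are indexed), a filter that looks up new Term(field, "user@host") would not match a stored value such as "local <user@host>". One possible approach is to extract bare addresses before indexing and before building the filter values. The class name AddressNormalizer, the loose pattern, and the lowercasing step are all assumptions made for illustration, not anything from Lucene or from Roy's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls bare addresses out of a raw From/To/CC header value. */
public class AddressNormalizer {

    // Deliberately loose pattern for illustration only; real address syntax
    // is considerably more permissive than this.
    private static final Pattern ADDRESS =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns every address found in the header value, lowercased, in order of appearance. */
    public static List<String> extract(String headerValue) {
        List<String> addresses = new ArrayList<>();
        if (headerValue == null) {
            return addresses;
        }
        Matcher m = ADDRESS.matcher(headerValue);
        while (m.find()) {
            // Lowercasing is an assumption: it makes lookups case-insensitive.
            addresses.add(m.group().toLowerCase(Locale.ROOT));
        }
        return addresses;
    }
}
```

Each extracted address could then be stored as its own keyword term, and the same extract() call applied to the values handed to the filter, so both sides agree on the exact term text.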


speeding up queries (MySQL faster)

2004-08-20 Thread Yonik Seeley

Hi,

I'm trying to figure out how to speed up queries to a large index.
I'm currently getting 133 req/sec, which isn't bad, but isn't too close
to MySQL, which is getting 500 req/sec on the same hardware with the
same set of documents.

Setup info & Stats:
- 4.3M documents, 12 keyword fields per document, 11 unindexed fields per document.
- lucene index size on disk=1.3G
- Hardware: dual opteron w/ 16GB memory, running 64 bit JVM (Sun 1.5 beta)
- Lucene version 1.4.1
- Hitting multithreaded server w/ 10 clients at once
- This is a read-only index... no updating is done
- Single IndexSearcher that is reused for all requests
 

Q1)  While hitting it with multiple queries at once, lucene is pegged at
50% CPU usage (meaning it is only using 1 out of 2 CPUs on average).  I took
a thread dump and all of the lucene threads except one are blocked on
reading a file (see trace below).  I could create two index readers, but
that seems like it might be a waste, and fixing a symptom instead of the
root problem.  Would multiple IndexSearchers or IndexReaders share internal
caches?  Is there a way to cache more info at a higher level such that it
would get rid of this bottleneck?  The JVM isn't taking up much space (125M
or so), and I have 16GB to work with!  The OS (linux) is obviously caching
the index file, but that doesn't get rid of the synchronization issues, and
the overhead of re-reading.
How is caching in lucene configured?
Does it internally use FieldCache, or do I have to use that somehow myself?
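
An editorial sketch of the "cache more info at a higher level" idea: keep one bit set per frequently used term in JVM heap, so repeated conjunctions never touch the shared index file. With 4.3M documents a bit set costs roughly 4.3M / 8 = about 0.5 MB per cached term. This uses only java.util classes; TermBitSetCache and its postingsLoader parameter are hypothetical names, and the loader stands in for whatever reads a term's document ids from the index.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-heap cache of per-term document bit sets; not a Lucene class. */
public class TermBitSetCache {

    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    // Assumed to return the ids of all documents containing the given term.
    private final Function<String, int[]> postingsLoader;

    public TermBitSetCache(Function<String, int[]> postingsLoader) {
        this.postingsLoader = postingsLoader;
    }

    /** Returns the cached bit set for a term, building it on first use. Callers must not modify it. */
    public BitSet bitsFor(String term) {
        return cache.computeIfAbsent(term, t -> {
            BitSet bits = new BitSet();
            for (int doc : postingsLoader.apply(t)) {
                bits.set(doc);
            }
            return bits;
        });
    }

    /** ANDs the bit sets of all given terms into a fresh BitSet owned by the caller. */
    public BitSet intersect(String... terms) {
        if (terms.length == 0) {
            return new BitSet();
        }
        // Clone so the shared cached bit sets are never mutated.
        BitSet result = (BitSet) bitsFor(terms[0]).clone();
        for (int i = 1; i < terms.length; i++) {
            result.and(bitsFor(terms[i]));
        }
        return result;
    }
}
```

Because the cached bit sets are only read after construction, many request threads can intersect them concurrently without contending on a single lock. This suits a read-only index; an index that changes would need the cache rebuilt.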
 
"tcpConnection-8080-72" daemon prio=1 tid=0x002b24412490 nid=0x34a4 waiting for monitor entry [0x45aba000..0x45abb2d0]
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:215)
    - waiting to lock <0x002ae153fa00> (a org.apache.lucene.store.FSInputStream)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:176)
    at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:88)
    at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:53)
    at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:48)
    at org.apache.lucene.search.Scorer.score(Scorer.java:37)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)


Even using only 1 CPU, though, MySQL is faster. Here is what the queries
look like:

"field1:4 AND field2:188453 AND field3:1"

field1:4  done alone selects around 4.2M records
field2:188453 done alone selects around 1.6M records
field3:1  done alone selects around 1K records
The whole query normally selects less than 50 records
Only the first 10 are returned (or whatever range the client selects).

The fields are all keywords checked for exact matches (no fulltext search
is done).  Is there anything I can do to speed these queries up, or is the
structure just more suited to MySQL (and not an inverted index)?

How is a query like this carried out?
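
An editorial sketch for readers following the thread: the thread dump above shows ConjunctionScorer.doNext calling TermScorer.skipTo, which suggests the AND is evaluated by stepping through sorted per-term document lists and skipping each one forward to the current candidate. The code below illustrates that general technique only; it is not Lucene's implementation, and the names ConjunctionSketch and Postings are hypothetical. Driving the loop from the rarest term (field3:1 with about 1K documents in the example) keeps the number of candidates small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative AND of sorted doc-id lists; not Lucene code. */
public class ConjunctionSketch {

    /** Cursor over one term's sorted, duplicate-free doc-id list. */
    static final class Postings {
        private final int[] docs;
        private int pos = 0;

        Postings(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        int size() {
            return docs.length;
        }

        /** Moves to the first doc id >= target and returns it, or Integer.MAX_VALUE when exhausted. */
        int skipTo(int target) {
            int idx = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE;
        }
    }

    /** Returns the doc ids present in every list. Each list must be sorted ascending. */
    public static List<Integer> intersect(int[]... lists) {
        List<Integer> hits = new ArrayList<>();
        if (lists.length == 0) {
            return hits;
        }
        Postings[] p = new Postings[lists.length];
        for (int i = 0; i < lists.length; i++) {
            p[i] = new Postings(lists[i]);
        }
        // Rarest term first: it proposes candidates, the others confirm them.
        Arrays.sort(p, Comparator.comparingInt(Postings::size));

        int candidate = p[0].skipTo(0);
        while (candidate != Integer.MAX_VALUE) {
            boolean inAll = true;
            for (int i = 1; i < p.length; i++) {
                int doc = p[i].skipTo(candidate);
                if (doc != candidate) {
                    // Mismatch: jump the rarest list ahead to the larger doc id.
                    candidate = (doc == Integer.MAX_VALUE) ? doc : p[0].skipTo(doc);
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                hits.add(candidate);
                candidate = p[0].skipTo(candidate + 1);
            }
        }
        return hits;
    }
}
```

In this sketch the skip is an in-memory binary search. An on-disk index has to read the postings to skip, which is where the blocked file reads in the thread dump come from.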

Any help would be greatly appreciated.  There's not a lot of info
on searching (much more on updating). I'm looking forward
to "Lucene in Action"!  Too bad it's not out till October.

-Yonik




Re: Debian build problem with 1.4.1

2004-08-20 Thread Doug Cutting

I can successfully use gcc 3.4.0 with Lucene as follows:

   ant jar jar-demo
   gcj -O3 build/lucene-1.5-rc1-dev.jar build/lucene-demos-1.5-rc1-dev.jar \
      -o indexer --main=org.apache.lucene.demo.IndexHTML
   ./indexer -create docs

It runs pretty snappy too!  However I don't know if there's much mileage 
in packaging Lucene as a native library.  It's easy enough for folks to 
compile Lucene this way, and applications built this way are pretty 
small.  The big thing to install is libgcj.

Doug

Jeff Breidenbach wrote:
> Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have
> enough time to percolate before the sarge release.
>
> Now that that is taken care of, I'm curious about the status of gcj
> compilation. Packaging Lucene as a native library might be useful for
> projects such as PyLucene, and it is also advantageous for license
> reasons i.e. avoiding the non-free JVM dependency. What's the current
> gcj compilation recipe? The best I could find on Google (below) seems
> a little bit stale.
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html
>
> Cheers,
> Jeff


Lucene with English and Spanish Best Practice?

2004-08-20 Thread Chad Small

Hello,

I'm interested in any feedback from anyone who has worked through implementing 
Internationalization (I18N) search with Lucene or has ideas for this requirement.  
Currently, we're using Lucene with straight English and are looking to add Spanish to 
the mix (with maybe more languages to follow).  

This is our current IndexWriter setup utilizing the PerFieldAnalyzerWrapper:

   PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
   analyzer.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
   analyzer.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
   IndexWriter writer = new IndexWriter(indexDir, analyzer, create);

Would people suggest we switch this over to Snowball so there are English and Spanish 
Analyzers and IndexWriters?  Something like this:

   PerFieldAnalyzerWrapper analyzerEnglish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
   analyzerEnglish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
   analyzerEnglish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
   IndexWriter writerEnglish = new IndexWriter(indexDir, analyzerEnglish, create);

   PerFieldAnalyzerWrapper analyzerSpanish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("Spanish"));
   analyzerSpanish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
   analyzerSpanish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
   IndexWriter writerSpanish = new IndexWriter(indexDir, analyzerSpanish, create);


Are multiple indexes or mirrors of each index then usually created for every language? 
 We currently have 4 indexes that are all English.  Would we then create 4 more that 
are Spanish?  Then at search time we would determine the language and which set of 
indexes to search against, English or Spanish.

Or another approach could be to add a Spanish field to the existing 4 indexes since 
most of the indexes have only one field that will be translated from English to 
Spanish.
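
An editorial sketch of the second approach (one extra field per language in the existing indexes): each translated field gets a language-suffixed name, and the search side picks the field from the user's locale. Whichever analyzer is used for a field at index time should also be used when querying it. The class LanguageFields, the "_en"/"_es" suffix convention, and the English fallback are assumptions for illustration, not anything prescribed by Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that maps a base field name and a locale to a per-language field name. */
public class LanguageFields {

    // Languages that have their own field; extend as more translations are added.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList("en", "es"));
    private static final String DEFAULT_LANGUAGE = "en";

    /** Returns e.g. "title_es" for base field "title" and a Spanish locale; falls back to English. */
    public static String fieldFor(String baseField, Locale locale) {
        String lang = (locale == null) ? DEFAULT_LANGUAGE : locale.getLanguage();
        if (!SUPPORTED.contains(lang)) {
            lang = DEFAULT_LANGUAGE;
        }
        return baseField + "_" + lang;
    }
}
```

With names like these, a single PerFieldAnalyzerWrapper could register the Spanish analyzer for the "_es" fields via addAnalyzer, as in the snippets above, instead of maintaining a second set of indexes. Which layout performs or ranks better is not something this thread settles.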


thanks a bunch,
chad.



Re: NegativeArraySizeException when creating a new IndexSearcher

2004-08-20 Thread Doug Cutting

Looks to me like you're using an older version of Lucene on your Linux 
box.  The code is back-compatible (it will read old indexes), but Lucene 
1.3 cannot read indexes created by Lucene 1.4, and will fail in the way 
you describe.

Doug

Sven wrote:

Hi!

I have a problem porting a Lucene based knowledgebase from Windows to Linux.
On Windows it works fine whereas I get a NegativeArraySizeException on Linux
when I try to initialise a new IndexSearcher to search the index. Deleting
and rebuilding the index didn't help. I checked permissions, file path and
lock_dir but as far as I can say they seem to be all right. As I couldn't
find anyone else with the same problem I guess I've overlooked something, but I've
run out of ideas. I use lucene-1.4-rc2 and tomcat 5.0.18. Can someone please
help me with this, or does anyone have an idea?

Kind regards,
Sven

java.lang.NegativeArraySizeException
 at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
 at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
 at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
 at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
 at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
 at org.apache.lucene.store.Lock$With.run(Lock.java:148)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:99)
 at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
 at com.sykon.knowledgebase.action.ListQueryResultAction.act(ListQueryResultAction.java:134)
 at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:159)
 at org.apache.cocoon.components.treeprocessor.sitemap.ActionSetNode.call(ActionSetNode.java:121)
 at org.apache.cocoon.components.treeprocessor.sitemap.ActSetNode.invoke(ActSetNode.java:98)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
 at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:133)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
 at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
 at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
 at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
 at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
 at org.apache.cocoon.Cocoon.process(Cocoon.java:656)
 at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1112)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
 at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:742)
 at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:506)
 at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:443)
 at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:359)
 at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:712)
 at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:682)
 at org.apache.jsp.knowlegebase.controller_jsp._jspService(controller_jsp.java:844)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:133)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
 at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:311)
 at org.apache.jasper.serv

RE: continuous index updates

2004-08-20 Thread Crump, Michael

So the finalizer on the underlying reader closes file handles?

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 20, 2004 2:41 PM
To: Lucene Users List
Subject: Re: continuous index updates

I just create a new IndexSearcher, leave the old IndexSearcher alone,
and JVM's garbage collection cleans it up.

Otis

--- "Crump, Michael" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
>  
> 
> I am currently working on a server app that will require the ability
> to
> make index additions/deletions at any time.  I want to cache/reuse
> index
> searchers and readers.  I know that once an index has changed only
> newly
> opened readers will see the changes.  Creating a new reader to see
> the
> changes and caching it will be no problem.  My  problem is that since
> this is a multithreaded app other threads may be using the old
> readers
> making it difficult to know when to close them.  I assume that a
> reader
> must be closed to free the associated resources.  I was thinking
> about
> using some kind of reference counted reader that would keep track of
> its
> references and only truly close when there were no references.
> 
>  
> 
> Am I making this too difficult?
> 
>  
> 
> Is there a better way?
> 
>  
> 
> I assume others have had to do this using Lucene, do you have any
> recommendations?
> 
>  
> 
> Regards,
> 
>  
> 
> Michael
> 
> 



Re: continuous index updates

I just create a new IndexSearcher, leave the old IndexSearcher alone,
and JVM's garbage collection cleans it up.
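
An editorial sketch of the reference-counting alternative raised in the question, for cases where waiting on garbage collection to release file handles is a concern (the thread leaves open whether a finalizer closes them). The wrapper below works on any java.io.Closeable; RefCounted is a hypothetical name, and wrapping a searcher or reader in it is an assumption about how it would be used.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that closes a resource only when its last reference is released. */
public class RefCounted<T extends Closeable> {

    private final T resource;
    // Starts at 1: the creator (e.g. the component that swaps searchers) holds a reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    public RefCounted(T resource) {
        this.resource = resource;
    }

    /** Takes a reference and returns the resource, or null if it has already been closed. */
    public T acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return null; // closed; the caller should fetch the current instance instead
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Drops one reference; the thread that drops the last one closes the resource. */
    public void release() throws IOException {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("released more times than acquired");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    resource.close();
                }
                return;
            }
        }
    }
}
```

A request thread would call acquire(), search, and call release() in a finally block. When the index changes, the owner publishes a new wrapper and calls release() once on the old one, so the old resource closes as soon as the last in-flight search finishes.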

Otis

--- "Crump, Michael" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
>  
> 
> I am currently working on a server app that will require the ability
> to
> make index additions/deletions at any time.  I want to cache/reuse
> index
> searchers and readers.  I know that once an index has changed only
> newly
> opened readers will see the changes.  Creating a new reader to see
> the
> changes and caching it will be no problem.  My  problem is that since
> this is a multithreaded app other threads may be using the old
> readers
> making it difficult to know when to close them.  I assume that a
> reader
> must be closed to free the associated resources.  I was thinking
> about
> using some kind of reference counted reader that would keep track of
> its
> references and only truly close when there were no references.
> 
>  
> 
> Am I making this too difficult?
> 
>  
> 
> Is there a better way?
> 
>  
> 
> I assume others have had to do this using Lucene, do you have any
> recommendations?
> 
>  
> 
> Regards,
> 
>  
> 
> Michael
> 
> 



Re: Debian build problem with 1.4.1

On Aug 20, 2004, at 12:36 PM, Jeff Breidenbach wrote:

>> I don't understand this.  StandardTokenizer.java hasn't changed since
>> last year.
>
> I have packaged Lucene such that 'ant javacc' is called at package
> build time. I now see the problem - 'import java.io.*;' has been
> removed from StandardTokenizer.jj in Lucene 1.4.1.  When I put that
> line back in, things build fine.
>
> Now that I know what the problem is, I'll go ahead and patch the Debian
> package. Please make sure the Lucene codebase gets fixed as well.

The codebase has been fixed, as of a couple of weeks ago :)

Erik

Re: Debian build problem with 1.4.1


Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have
enough time to percolate before the sarge release.

Now that that is taken care of, I'm curious about the status of gcj
compilation. Packaging Lucene as a native library might be useful for
projects such as PyLucene, and it is also advantageous for license
reasons i.e. avoiding the non-free JVM dependency. What's the current
gcj compilation recipe? The best I could find on Google (below) seems
a little bit stale.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html

Cheers,
Jeff




memory leak in lucene?

2004-08-20 Thread iouli . golovatyi


When I run queries against Lucene I run into a memory problem: it looks like
memory is not given back after the query has been executed.

I use ParallelMultiSearcher and call its close method after the results are
displayed.

hits=null; // Hits class
if (ms!=null) ms.close(); //ParallelMultiSearcher

It doesn't help; the memory is not freed. On queries like "No*" memory
consumption grows by about 20-70 MB per query.
Imagine what happens to my web server...

I also tried from the command line and got a similar result.

Am I doing something wrong, or am I missing something?

Please help. I use 1.4.1 on a Linux box.
Joel






Re: Debian build problem with 1.4.1


>I don't understand this.  StandardTokenizer.java hasn't changed since  
>last year.

I have packaged Lucene such that 'ant javacc' is called at package
build time. I now see the problem - 'import java.io.*;' has been
removed from StandardTokenizer.jj in Lucene 1.4.1.  When I put that
line back in, things build fine.

Now that I know what the problem is, I'll go ahead and patch the Debian 
package. Please make sure the Lucene codebase gets fixed as well.

Cheers,
Jeff


Re: lucene and ejb applications

2004-08-20 Thread Praveen Peddi

In fact we do the exact same thing. A session bean method called search()
delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in
memory, and invalidate that object when someone else modifies the index. This
trick works wonderfully for us. Search has become faster after caching
the searcher.

Praveen
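
The lazy-load, cache and invalidate approach described in this message might be
sketched roughly as follows. This is only an illustration under assumptions:
LazyCache and its methods are hypothetical names, and a plain Supplier stands in
for whatever code opens the real IndexSearcher, so nothing here is Lucene API.

```java
import java.util.function.Supplier;

/** Lazily creates a value, caches it, and rebuilds it after invalidate(). */
final class LazyCache<T> {
    private final Supplier<T> opener; // hypothetical: would open an IndexSearcher
    private T cached;                 // guarded by "this"

    LazyCache(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the cached value, opening it on first use. */
    synchronized T get() {
        if (cached == null) {
            cached = opener.get();
        }
        return cached;
    }

    /** Call after the index is modified; the next get() opens a fresh value. */
    synchronized void invalidate() {
        cached = null;
    }
}

final class LazyCacheDemo {
    public static void main(String[] args) {
        int[] opens = {0};
        // A string stands in for the searcher so the sketch runs without Lucene.
        LazyCache<String> cache = new LazyCache<>(() -> "searcher#" + (++opens[0]));
        System.out.println(cache.get()); // opened on first use
        System.out.println(cache.get()); // same cached value, no second open
        cache.invalidate();              // the index was modified
        System.out.println(cache.get()); // reopened on demand
    }
}
```

The sketch simply drops the old value on invalidate(); whether the old searcher
is closed explicitly or left to the garbage collector is a separate decision.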
- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, August 20, 2004 12:02 PM
Subject: Re: lucene and ejb applications


> On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:
> > hi erik
> >
> >  thanks for the warning and the code.
> >  Let me re-phrase the question,
> >
> >  i have a index generated by lucene, i need to have the search
> > capabilty
> >  to have a high availabilty. What solutions would be the most optimal
>
> I'm guessing from your descriptions that you want a search server that
> multiple applications can access.  Correct?  Is that what you mean by
> "high availability"?
>
> Take a look at Nutch for examples of doing this kind of thing.  And
> also...
>
> >
> >  Currentlly i have two senarions in mind
> >   a) setup a RMI based app. that on start-up initializes a
> > IndexSearcher
> > object
> >  and waits for invocation of a method like Vector
> > executeQuery(Query )
>
> Lucene has built-in RMI capability, so you don't need to recreate this
> yourself.  Look at RemoteSearchable (and the test cases that use it).
>
> >   b) create a web based app(jsp/servlet or struts)  that initialises
> > the
> > IndexSearcher object, and stores in the servletContext on
> > intialization, and
> > all request invoke the Hits search(Query q)
>
> This is ok, but you have the same issues with servlet context
> (application scope or even session scope) with distributed
> applications.  IndexSearcher, at the very least, should be transient
> and lazy initialized, perhaps nested under a controller object of your
> making.
>
> >   with senario a)  i can have more control over updates, insert, and
> > deletes
> >   where as with  senario b) has higher availabilty
>
> I disagree with your analysis of those scenarios.  Neither has more or
> less control or availability than the other.
>
> >  I want to create and store the IndexSearcher object, during
> > initailization
> > to save on
> >  mutlitple open and reads. once updates are ready signal can be sent to
> > block further searches while the updates are integrated into the
> > existing
> > index.
>
> It is a good thing to keep an IndexSearcher instance around for big
> indexes to save on that I/O, I completely agree.  A simple
> IndexSearcher-encapsulating Java object which lazy initializes and
> keeps IndexSearcher as a transient would be quite sufficient, I think.
> Store that object wherever you like - application scope seems to be
> appropriate for your web application scenario.
>
> Erik
>
>

Re: Debian build problem with 1.4.1

On Aug 20, 2004, at 11:12 AM, Jeff Breidenbach wrote:

> Hi Otis,
>
>> I'm asking, because it looks like your compiler is not finding Reader
>> and IOException classes, both of which are in java.io.* package, which
>> I see imported in StandardTokenizer.java as 'import java.io.*;'.
>
> In my copy of StandardTokenizer.java, there is no 'import java.io.*;'
> (and in fact this is a change from lucene-1.4-final).

I don't understand this.  StandardTokenizer.java hasn't changed since
last year.

% cvs log StandardTokenizer.java
  ...

revision 1.3
date: 2003/12/22 22:12:24;  author: cutting;  state: Exp;  lines: +6 -6
Fix StandardTokenizer's handling of CJK characters.

revision 1.2
date: 2003/10/01 16:39:26;  author: ehatcher;  state: Exp;  lines: +7 -4
oops, forgot to check in JavaCC generated files

revision 1.1
date: 2003/09/11 01:51:33;  author: ehatcher;  state: Exp;
PR 19468, but not exactly as it was done in the provided patches.   
JavaCC is no longer required to build Lucene, but can be run optionally
 
=

And I have import java.io.* at the top.

> Since this file
> is apparently generated from JavaCC, I'm not sure what to do.

You can regenerate StandardTokenizer by running:

    ant javacc

(you'll need JavaCC installed, of course, and this is the reason we
check in the generated files in order to save the hassle for others)

It seems something is fishy with the copy of the code you have.

Erik

Re: lucene and ejb applications

On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:

> hi erik
>
>  thanks for the warning and the code.
>  Let me re-phrase the question,
>
>  i have an index generated by lucene, i need the search capability
>  to have high availability. What solution would be the most optimal?

I'm guessing from your descriptions that you want a search server that
multiple applications can access.  Correct?  Is that what you mean by
"high availability"?

Take a look at Nutch for examples of doing this kind of thing.  And
also...

>  Currently i have two scenarios in mind
>   a) set up an RMI based app that on start-up initializes an
>  IndexSearcher object
>  and waits for invocation of a method like Vector executeQuery(Query )

Lucene has built-in RMI capability, so you don't need to recreate this
yourself.  Look at RemoteSearchable (and the test cases that use it).

>   b) create a web based app (jsp/servlet or struts) that initialises
>  the IndexSearcher object, and stores it in the servletContext on
>  initialization, and all requests invoke the Hits search(Query q)

This is ok, but you have the same issues with servlet context
(application scope or even session scope) with distributed
applications.  IndexSearcher, at the very least, should be transient
and lazy initialized, perhaps nested under a controller object of your
making.

>   with scenario a) i can have more control over updates, inserts, and
>  deletes
>   whereas scenario b) has higher availability

I disagree with your analysis of those scenarios.  Neither has more or
less control or availability than the other.

>  I want to create and store the IndexSearcher object during
>  initialization to save on
>  multiple opens and reads. Once updates are ready a signal can be sent to
>  block further searches while the updates are integrated into the
>  existing index.

It is a good thing to keep an IndexSearcher instance around for big
indexes to save on that I/O, I completely agree.  A simple
IndexSearcher-encapsulating Java object which lazy initializes and
keeps IndexSearcher as a transient would be quite sufficient, I think.
Store that object wherever you like - application scope seems to be
appropriate for your web application scenario.

Erik
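
As a rough illustration of the "transient and lazy initialized" advice above,
the sketch below shows a serializable holder whose searcher field is skipped
during serialization and rebuilt on first use afterwards. SearcherHolder,
FakeSearcher and roundTrip are hypothetical names; FakeSearcher is only a
non-serializable stand-in for IndexSearcher so that the example runs without
Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Stand-in for IndexSearcher; deliberately NOT Serializable. */
final class FakeSearcher {
    final String path;

    FakeSearcher(String path) {
        this.path = path;
    }
}

/** Serializable holder that keeps the searcher transient and opens it lazily. */
final class SearcherHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String indexPath;
    private transient FakeSearcher searcher; // skipped by serialization

    SearcherHolder(String indexPath) {
        this.indexPath = indexPath;
    }

    /** Opens the searcher on first use, also after deserialization. */
    synchronized FakeSearcher getSearcher() {
        if (searcher == null) {
            searcher = new FakeSearcher(indexPath);
        }
        return searcher;
    }

    synchronized boolean isOpen() {
        return searcher != null;
    }

    /** Serializes and deserializes a holder, as session replication might. */
    static SearcherHolder roundTrip(SearcherHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SearcherHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class SearcherHolderDemo {
    public static void main(String[] args) {
        SearcherHolder holder = new SearcherHolder("/hypothetical/index");
        holder.getSearcher();                            // opened lazily
        SearcherHolder copy = SearcherHolder.roundTrip(holder);
        System.out.println(copy.isOpen());               // transient field was not copied
        System.out.println(copy.getSearcher().path);     // reopened on demand
    }
}
```

Without the transient keyword, serializing the holder would fail in this sketch
because FakeSearcher is not Serializable.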

Re: Indexing and Searching Database in Lucene





Funny thing is, I was thinking of doing something like this just today.
This is especially good when you perform a lot of queries using the
LIKE statement.  Lucene would increase search performance a great deal.

Aviran wrote:

  You need to create a lucene index from the database.
Just  index the columns and the records from the database.
It will be useful to have also a field in lucene that contains the
database's primary key, so you can retrieve the actual record from the
database

Aviran

-Original Message-
From: sivalingam T [mailto:[EMAIL PROTECTED]] 
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene


  Hi

  Can we index and search database in Lucene Search Engine?
  if anybody have please send reply.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92




  



-- 

Don Vaillancourt
Director of Software Development


WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com




This email message is intended only for the addressee(s)
and contains information that may be confidential and/or
copyright. If you are not the intended recipient please
notify the sender by reply email and immediately delete
this email. Use, disclosure or reproduction of this email
by anyone other than the intended recipient(s) is strictly
prohibited. No representation is made that this email or
any attachments are free of viruses. Virus scanning is
recommended and is the responsibility of the recipient.





Re: Debian build problem with 1.4.1



Hi Otis,

>I'm asking, because it looks like your compiler is not finding Reader
>and IOException classes, both of which are in java.io.* package, which
>I see imported in StandardTokenizer.java as 'import java.io.*;'.


In my copy of StandardTokenizer.java, there is no 'import java.io.*;'
(and in fact this is a change from lucene-1.4-final). Since this file
is apparently generated from JavaCC, I'm not sure what to do.  I'm
happy to supply a login to a Debian computer if someone is interested
in helping debug.

>Are any of those commands actually using Lucene's build.xml?

Yes, they are just a wrapper around calling ant. The build.xml 
file has very minimal debian specific modifications.

Cheers,
Jeff


RE: Indexing and Searching Database in Lucene

2004-08-20 Thread Aviran

You need to create a Lucene index from the database.
Just index the columns of each record in the database.
It is also useful to have a field in Lucene that contains the
database's primary key, so you can retrieve the actual record from the
database.

Aviran

-Original Message-
From: sivalingam T [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene


  Hi

  Can we index and search database in Lucene Search Engine?
  if anybody have please send reply.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92
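
The idea in the reply above (index the column text, keep the primary key next
to it, then fetch the real record by key) can be illustrated with a toy
example. This sketch deliberately does not use Lucene or JDBC: TinyIndex is a
hypothetical in-memory inverted index and a HashMap stands in for the database
table. In a real setup the primary key would go into a field of each Lucene
document instead.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each token to the primary keys of matching rows. */
final class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the column values of one row under its primary key. */
    void add(int primaryKey, String... columnValues) {
        for (String value : columnValues) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(primaryKey);
                }
            }
        }
    }

    /** Returns the primary keys of rows containing the term (sorted, possibly empty). */
    Set<Integer> search(String term) {
        Set<Integer> keys = postings.get(term.toLowerCase(Locale.ROOT));
        return keys == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(keys);
    }
}

final class TinyIndexDemo {
    public static void main(String[] args) {
        // Stand-in for a database table: primary key -> {title, body}.
        Map<Integer, String[]> table = new HashMap<>();
        table.put(1, new String[] {"Lucene in Action", "a search library"});
        table.put(2, new String[] {"MySQL manual", "LIKE queries can be slow"});
        table.put(3, new String[] {"Notes", "Lucene and MySQL together"});

        TinyIndex index = new TinyIndex();
        for (Map.Entry<Integer, String[]> row : table.entrySet()) {
            index.add(row.getKey(), row.getValue());
        }

        // Search the index, then use the primary key to load the actual record.
        for (int key : index.search("lucene")) {
            System.out.println(key + ": " + table.get(key)[0]);
        }
    }
}
```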




Indexing and Searching Database in Lucene

2004-08-20 Thread sivalingam T

  Hi

  Can we index and search a database with the Lucene search engine?
  If anybody has done this, please reply.


With Warm Regards,
Sivalingam.T

Sai Eswar Innovations (P) Ltd,
Chennai-92

Indexing and Searching Database Values in Lucene Search Engine

2004-08-20 Thread sivalingam T

  
How to index and search database values using Lucene Search Engine?

By

T.Sivalingam.

Sivalingam T

Re: pdfboxhelp

I am sorry, that mail was sent accidentally.
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 8:02 PM
  Subject: Re: pdfboxhelp

  Did I leave you speechless!?  :-)

  Santosh wrote:

  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 7:37 PM
  Subject: Re: pdfboxhelp

  Here is the super simple code required.

  import org.pdfbox.searchengine.lucene.*;

  File pdfFile = new File("/path/to/the/file.pdf"); 

  // Below returns a parse PDF file in a Lucene Document object.
  Document doc = LucenePDFDocument.getDocument(pdfFile);

  Santosh wrote:

exactly, the same is required to me
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 6:39 PM
  Subject: Re: pdfboxhelp

  What are your intensions with PDFBox?

  You want to use it to index PDF files?

  Santosh wrote:

hi,

I have downloaded pdfbox zip. but i am in ambigous state that where to start. how can 
I check with demo, I dont see any help document with this download, please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

---SOFTPRO DISCLAIMER--

Information contained in this E-MAIL and any attachments are
confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'
and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or
have received it in error, You are notified that any use, copying or
dissemination  of the information contained in this E-MAIL in any
manner whatsoever is strictly prohibited. Please delete it immediately
and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects. 

The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.

  -- 
  Don Vaillancourt
  Director of Software Development

  WEB IMPACT INC.
  phone: 416-815-2000 ext. 245
  fax: 416-815-2001
  email: [EMAIL PROTECTED]
  web: http://www.web-impact.com


Re: pdfboxhelp

Did I leave you speechless!? :-)

Santosh wrote:

- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp

Here is the super simple code required.

import org.pdfbox.searchengine.lucene.*;

File pdfFile = new File("/path/to/the/file.pdf");

// Below returns a parse PDF file in a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrote:

exactly, the same is required to me
- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intensions with PDFBox?

You want to use it to index PDF files?

Santosh wrote:

hi,

I have downloaded pdfbox zip. but i am in ambigous state that where to start. how can I check with demo, I dont see any help document with this download, please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"


--
Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com


continuous index updates

2004-08-20 Thread Crump, Michael

Hello,

 

I am currently working on a server app that will require the ability to
make index additions/deletions at any time.  I want to cache/reuse index
searchers and readers.  I know that once an index has changed only newly
opened readers will see the changes.  Creating a new reader to see the
changes and caching it will be no problem.  My  problem is that since
this is a multithreaded app other threads may be using the old readers
making it difficult to know when to close them.  I assume that a reader
must be closed to free the associated resources.  I was thinking about
using some kind of reference counted reader that would keep track of its
references and only truly close when there were no references.

 

Am I making this too difficult?

 

Is there a better way?

 

I assume others have had to do this using Lucene, do you have any
recommendations?

 

Regards,

 

Michael
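
The reference-counted reader idea from this message could look roughly like the
sketch below. It is an illustration only, not part of Lucene: RefCounted,
tryAcquire and release are hypothetical names, and any java.io.Closeable stands
in for the reader. The cache holds the first reference; each search thread
acquires one before use and releases it afterwards, and the resource is closed
when the last reference goes away.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Wraps a Closeable and closes it when the last reference is released. */
final class RefCounted<T extends Closeable> {
    private final T resource;
    // Starts at 1: the creator (e.g. the reader cache) holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Takes a reference; returns false if the resource was already closed. */
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    T get() {
        return resource;
    }

    /** Drops one reference and closes the resource when none remain. */
    void release() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("released more often than acquired");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    try {
                        resource.close();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                return;
            }
        }
    }
}

final class RefCountedDemo {
    public static void main(String[] args) {
        RefCounted<Closeable> old =
                new RefCounted<>(() -> { System.out.println("old reader closed"); });
        old.tryAcquire(); // a search thread starts using the old reader
        old.release();    // the cache swaps in a new reader and drops its reference
        old.release();    // the search thread finishes; the reader is closed now
    }
}
```

A thread that gets false from tryAcquire() would ask the cache for the current
reader instead of using the closed one.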

Re: pdfboxhelp

  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 7:37 PM
  Subject: Re: pdfboxhelp

  Here is the super simple code required.

  import org.pdfbox.searchengine.lucene.*;

  File pdfFile = new File("/path/to/the/file.pdf"); 

  // Below returns a parse PDF file in a Lucene Document object.
  Document doc = LucenePDFDocument.getDocument(pdfFile);

  Santosh wrote:

exactly, the same is required to me
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 6:39 PM
  Subject: Re: pdfboxhelp

  What are your intensions with PDFBox?

  You want to use it to index PDF files?

  Santosh wrote:

hi,

I have downloaded pdfbox zip. but i am in ambigous state that where to start. how can 
I check with demo, I dont see any help document with this download, please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"


  -- 
  Don Vaillancourt
  Director of Software Development

  WEB IMPACT INC.
  phone: 416-815-2000 ext. 245
  fax: 416-815-2001
  email: [EMAIL PROTECTED]
  web: http://www.web-impact.com


Re: pdfboxhelp

Here is the super simple code required.

import java.io.File;
import org.apache.lucene.document.Document;
import org.pdfbox.searchengine.lucene.*;

File pdfFile = new File("/path/to/the/file.pdf");

// Returns the parsed PDF file as a Lucene Document object
// (getDocument can throw IOException).
Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrote:

exactly, the same is required to me
- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intensions with PDFBox?

You want to use it to index PDF files?

Santosh wrote:

hi,

I have downloaded pdfbox zip. but i am in ambigous state that where to start. how can I check with demo, I dont see any help document with this download, please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"


--
Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com


Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PRO

Re: Lucene Search Applet

2004-08-20 Thread Simon mcIlwaine

I'm a new Lucene user and I'm not too familiar with applets either, but I've
been doing a bit of testing on Java applet security. If I'm correct in
saying that applets can read anything below their codebase, then my problem
is not a security restriction. The error is
java.lang.NoClassDefFoundError, and the classpath is set, as I have it working
in a Swing app. Does someone actually have Lucene working in an applet? Can
it be done? Please help.

Thanks

Simon

- Original Message - 

From: "Terry Steichen" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, August 18, 2004 4:17 PM
Subject: Re: Lucene Search Applet


I suspect it has to do with the security restrictions of the applet, 'cause
it doesn't appear to be finding your Lucene jar file.  Also, regarding the
lock files, I believe you can disable the locking stuff just for purposes
like yours (read-only index).

Regards,

Terry
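
One way to test the suspicion above (that the applet cannot see the Lucene
jar) is to ask the applet's own class loader whether it can locate the class
named in the error. The helper below is only a sketch: ClassVisibility and
isVisible are hypothetical names, not part of Lucene or of the applet posted
in this thread.

```java
/** Hypothetical diagnostic helper; not part of Lucene or the original applet. */
final class ClassVisibility {

    private ClassVisibility() {}

    /** Returns true if the given class loader can locate the named class. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when a class this one depends on is missing
            return false;
        }
    }

    /** Convenience overload that uses the loader that loaded this helper. */
    static boolean isVisible(String className) {
        return isVisible(className, ClassVisibility.class.getClassLoader());
    }
}
```

Calling ClassVisibility.isVisible("org.apache.lucene.store.Directory") from
the applet's init() and printing the result to the Java console would show
whether the jar is reachable. If it returns false, the Lucene jar is probably
not listed in the archive attribute of the applet tag.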
  - Original Message - 
  From: Simon mcIlwaine
  To: Lucene Users List
  Sent: Wednesday, August 18, 2004 11:03 AM
  Subject: Lucene Search Applet


  I'm developing a Lucene CD-ROM based search which will search HTML pages on
CD-ROM, using an applet as the UI. I know that there's a problem with lock
files and also security restrictions on applets, so I am using the
RAMDirectory. I have it working in a Swing application; however, when I put it
into an applet it gives me problems. It compiles, but when I run the
applet I get the error below. Can anyone help? Thanks in advance.
  Simon

  Error:

  java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
      at java.lang.Class.getDeclaredConstructors0(Native Method)
      at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
      at java.lang.Class.getConstructor0(Class.java:1922)
      at java.lang.Class.newInstance0(Class.java:278)
      at java.lang.Class.newInstance(Class.java:261)
      at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
      at sun.applet.AppletPanel.runLoader(AppletPanel.java:546)
      at sun.applet.AppletPanel.run(AppletPanel.java:298)
      at java.lang.Thread.run(Thread.java:534)

  Code:

  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Hits;
  import java.awt.*;
  import java.awt.event.*;
  import javax.swing.*;
  import java.io.*;

  public class MemorialApp2 extends JApplet implements ActionListener {

    JLabel prompt;
    JTextField input;
    JButton search;
    JPanel panel;
    String indexDir = "C:/Java/lucene/index-list";
    private static RAMDirectory idx;

    // Build the search form: a label, a text field and a button.
    public void init() {
      Container cp = getContentPane();
      panel = new JPanel();
      panel.setLayout(new FlowLayout(FlowLayout.CENTER, 4, 4));
      prompt = new JLabel("Keyword search:");
      input = new JTextField("", 20);
      search = new JButton("Search");
      search.addActionListener(this);
      panel.add(prompt);
      panel.add(input);
      panel.add(search);
      cp.add(panel);
    }

    // Run the search when the button is pressed.
    public void actionPerformed(ActionEvent e) {
      if (e.getSource() == search) {
        String surname = input.getText();
        try {
          findSurname(indexDir, surname);
        } catch (Exception ex) {
          System.err.println(ex);
        }
      }
    }

    // Load the on-disk index into memory and print every matching surname.
    public static void findSurname(String indexDir, String surname) throws Exception {
      idx = new RAMDirectory(indexDir);
      IndexSearcher searcher = new IndexSearcher(idx);
      try {
        Query query = new TermQuery(new Term("surname", surname));
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
          System.out.println("Surname: " + hits.doc(i).get("surname"));
        }
      } finally {
        searcher.close();
      }
    }
  }




Re: pdfboxhelp

Exactly, that is what I need.
  - Original Message - 
  From: Don Vaillancourt 
  To: Lucene Users List 
  Sent: Friday, August 20, 2004 6:39 PM
  Subject: Re: pdfboxhelp

  What are your intentions with PDFBox?

  You want to use it to index PDF files?

  Santosh wrote:

hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can
I try the demo? I don't see any help document in this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

---SOFTPRO DISCLAIMER--

Information contained in this E-MAIL and any attachments are
confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'
and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or
have received it in error, You are notified that any use, copying or
dissemination  of the information contained in this E-MAIL in any
manner whatsoever is strictly prohibited. Please delete it immediately
and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects. 

The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.

  -- 
  Don Vaillancourt
  Director of Software Development

  WEB IMPACT INC.
  phone: 416-815-2000 ext. 245
  fax: 416-815-2001
  email: [EMAIL PROTECTED]
  web: http://www.web-impact.com

  This email message is intended only for the addressee(s)
  and contains information that may be confidential and/or
  copyright. If you are not the intended recipient please
  notify the sender by reply email and immediately delete
  this email. Use, disclosure or reproduction of this email
  by anyone other than the intended recipient(s) is strictly
  prohibited. No representation is made that this email or
  any attachments are free of viruses. Virus scanning is
  recommended and is the responsibility of the recipient.

Re: pdf search

hi karthik,

I have a website with some items, each of which contains HTML and PDF documents. I
have to store keywords against each item. Whenever a user enters a search
word that matches one of the existing keywords, the site should
show a link to the matching item.
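
The requirement above boils down to a lookup from keyword to item links. The
sketch below shows that lookup with a plain in-memory map standing in for a
Lucene index. KeywordIndex, addKeyword and the "/items/..." links are
hypothetical names used only for illustration. With Lucene, each item would
instead be a document carrying a keyword field and a stored link.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a keyword index; not Lucene code. */
final class KeywordIndex {

    // keyword (lower-cased) -> links of the items tagged with it, in insertion order
    private final Map<String, Set<String>> linksByKeyword = new HashMap<>();

    /** Stores one keyword against one item link. */
    void addKeyword(String keyword, String itemLink) {
        linksByKeyword
            .computeIfAbsent(normalize(keyword), k -> new LinkedHashSet<>())
            .add(itemLink);
    }

    /** Returns the links of all items whose keyword list contains the word. */
    List<String> search(String word) {
        Set<String> links = linksByKeyword.get(normalize(word));
        return links == null ? Collections.<String>emptyList() : new ArrayList<>(links);
    }

    // Case-insensitive, whitespace-tolerant matching
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

This only matches whole keywords exactly (ignoring case). Searching inside
the HTML and PDF text itself needs the extraction step discussed elsewhere in
this thread.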


- Original Message -
From: "Karthik N S" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, August 20, 2004 6:56 PM
Subject: RE: pdf search


> hi
>
> What do you intend to search, and what do you mean by your own 'search words'?
>
> Please first explain your requirement properly to the forum to get the intended
> results.
>
>
>
> with regards
> Karthik
>
> -Original Message-
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 20, 2004 5:59 PM
> To: Lucene Users List
> Subject: pdf search
>
>
> Hi,
>
> I am a newbie to Lucene.
>
> I have downloaded the zip file. How can I give my own list of words to Lucene?
> In the demo I saw that Lucene automatically creates an index when we run the
> Java program, but I want to supply my own search words. How is that possible?
>
>
> regards
> Santosh kumar
> SoftPro Systems
> Hyderabad
>
>
> "The harder you train in peace, the lesser you bleed in war"
>



Re: Debian build problem with 1.4.1

Hello Jeff,

I don't have Debian to try this out, and this is going to be a stupid
question and suggestion, but where/how is the CLASSPATH set?  Are any
of those commands actually using Lucene's build.xml?

I'm asking, because it looks like your compiler is not finding Reader
and IOException classes, both of which are in java.io.* package, which
I see imported in StandardTokenizer.java as 'import java.io.*;'.

Otis

--- Jeff Breidenbach <[EMAIL PROTECTED]> wrote:

> 
> Hi all,
> 
> I am the Debian package maintainer for Lucene, and I'm having build
> problems with 1.4.1. We are very close to a major Debian release (code
> named 'sarge'), and the window for changes is very small. Can someone
> please help me in the next day or two, otherwise Debian stable will ship
> Lucene 1.4-final for the next couple of years. It looks to me like the
> problem is in javacc generated code, and it's not obvious to me what
> to do.
> 
> For debian sarge or sid users out there who want to reproduce the
> build problem, download the lucene 1.4.1 source tarball, then:
> 
>   apt-get install devscripts
>   apt-get source liblucene-java
>   cd lucene-1.4
>   uupdate -v 1.4.1 ../lucene-1.4.1-src.tar.gz 
>   cd ../lucene-1.4.1
>   debuild -us -uc
> 
> Cheers,
> Jeff
> 
> =
> 
> 
> compile-core:
> [mkdir] Created dir: /tmp/lucene/lucene-1.4.1/build/classes/java
> [javac] Compiling 160 source files to /tmp/lucene/lucene-1.4.1/build/classes/java
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: cannot resolve symbol
> [javac] symbol  : class Reader
> [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
> [javac]   public StandardTokenizer(Reader reader) {
> [javac]^
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:24: cannot resolve symbol
> [javac] symbol  : class IOException
> [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
> [javac]   final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
> [javac]  ^
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: recursive constructor invocation
> [javac]   public StandardTokenizer(Reader reader) {
> [javac]  ^
> [javac] 3 errors
> 



RE: lucene and ejb applications

Option b) sounds simpler and sufficient to me.  I don't see why you
would need to involve RMI for something as simple as this.  I use
something similar to your b) option for some indices behind
http://www.simpy.com/ .  I don't store IndexSearcher in the servlet
context, though - I just have some logic like this:


/**
 * Returns an instance of {@link IndexDescriptor} for the given
 * indexID, which must represent an absolute file
 * path to the index directory.
 *
 * This method caches {@link IndexDescriptor}s in a LRU Map and
 * first tries to retrieve them from there.
 *
 * If the specified index has been changed since the last time
 * it was used, its {@link Searcher} is reloaded.
 *
 * @param indexID the full path to the index directory
 * @return an instance of {@link IndexDescriptor}
 * @throws SearcherException if the given index cannot be accessed
 */
IndexDescriptor getUserSearcherIndexDescriptor(String indexID)
    throws SearcherException
{
    File indexDir = validateIndex(indexID);
    IndexDescriptor indexDescriptor = getIndexDescriptorFromCache(indexDir);

    try
    {
        // if this is a known index
        if (indexDescriptor != null)
        {
            // if the index has changed since this Searcher was created,
            // make a new Searcher
            long currentVersion = IndexReader.getCurrentVersion(indexDir);
            if (currentVersion > indexDescriptor.lastKnownVersion)
            {
                indexDescriptor.lastKnownVersion = currentVersion;
                indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
            }
        }
        // if this is a new index
        else
        {
            indexDescriptor = new IndexDescriptor();
            indexDescriptor.indexDir = indexDir;
            indexDescriptor.lastKnownVersion = IndexReader.getCurrentVersion(indexDir);
            indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
        }
        return cacheIndexDescriptor(indexDescriptor);
    }
    catch (IOException e)
    {
        throw new SearcherException("Cannot open index: " + indexDir, e);
    }
}

IndexDescriptor is a simple struct-like class.
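
For readers who want to see the pieces the method relies on, here is a rough
guess at what they might look like. This is not the code behind simpy.com.
The three field names come from the method above, but their types, the
IndexDescriptorCache class and its size limit are assumptions, and the
searcher field is typed as Object so the sketch compiles without Lucene.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Struct-like holder; field names follow the method above, types are assumed. */
final class IndexDescriptor {
    File indexDir;
    long lastKnownVersion;
    Object searcher; // stands in for the Searcher instance created above
}

/** Hypothetical LRU cache of descriptors, keyed by index directory. */
final class IndexDescriptorCache {

    private final Map<File, IndexDescriptor> cache;

    IndexDescriptorCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<File, IndexDescriptor>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<File, IndexDescriptor> eldest) {
                // evict the least recently used descriptor once the limit is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached descriptor for the directory, or null if unknown. */
    synchronized IndexDescriptor get(File indexDir) {
        return cache.get(indexDir);
    }

    /** Caches the descriptor under its index directory and returns it. */
    synchronized IndexDescriptor put(IndexDescriptor descriptor) {
        cache.put(descriptor.indexDir, descriptor);
        return descriptor;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

A real cache would probably also close the evicted searcher inside
removeEldestEntry so file handles are released. That step is left out here
because it depends on the searcher class.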


Otis


--- Rupinder Singh Mazara <[EMAIL PROTECTED]> wrote:

> hi erik
> 
>  Thanks for the warning and the code.
>  Let me rephrase the question.
> 
>  I have an index generated by Lucene, and I need the search capability
>  to be highly available. Which solution would be best?
> 
>  Currently I have two scenarios in mind:
>   a) Set up an RMI-based app that on start-up initializes an IndexSearcher
>      object and waits for invocation of a method like
>      Vector executeQuery(Query)
> 
>   b) Create a web-based app (JSP/servlet or Struts) that initializes the
>      IndexSearcher object, stores it in the ServletContext on
>      initialization, and has all requests invoke Hits search(Query q)
> 
>   Scenario a) gives me more control over updates, inserts, and deletes,
>   whereas scenario b) has higher availability.
> 
>  I want to create and store the IndexSearcher object during initialization
>  to save on multiple opens and reads. Once updates are ready, a signal can
>  be sent to block further searches while the updates are integrated into
>  the existing index.
> 
> 
> 
> >-Original Message-
> >From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> >Sent: 20 August 2004 11:13
> >To: Lucene Users List
> >Subject: Re: lucene and ejb applications
> >
> >
> >What would be the best way?  Use Lucene outside of EJB.  It's quite
> >silly to make such a decision "purely due to a policy decision" when
> >the technicalities of it show that it is an unwise decision.
> >
> >You're going to navigate Hits through a session bean?  And as you said,
> >the EJB spec says not to use file I/O from EJB's.  That is a good
> >recommendation if you are distributing your system across servers and
> >replication is occurring - if another call to a session bean occurs and
> >ends up on a different server, then the file handle is lost.
> >
> >I violate the spec in my JavaDevWithAnt project and have one mode where
> >I have a stateless session bean returning search results:
> >http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely do
> >not recommend it.  It works when you are in a single-server
> >environment.
> >
> >In summary - EJB and Lucene are not a good mix - don't force it just to
> >be buzzword compliant.
> >
> > Erik
> >
> >
> >On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:
> >
> >> hi all
> >>
> >>purely due to a policy decision, we would like to host our
> lucene
> >> search
> >> application , in a j2ee container, preferable by means of a ejb.
> >> Since access to java.io is restricted by the ej

RE: pdf search

2004-08-20 Thread Karthik N S

hi

What do you intend to search, and what do you mean by your own 'search words'?

Please first explain your requirement properly to the forum to get the intended
results.



with regards
Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Friday, August 20, 2004 5:59 PM
To: Lucene Users List
Subject: pdf search


Hi,

I am a newbie to Lucene.

I have downloaded the zip file. How can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the
Java program, but I want to supply my own search words. How is that possible?


regards
Santosh kumar
SoftPro Systems
Hyderabad


"The harder you train in peace, the lesser you bleed in war"


Re: pdfboxhelp

What are your intentions with PDFBox?

You want to use it to index PDF files?

Santosh wrote:

hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com


pdfboxhelp

hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can
I try the demo? I don't see any help document in this download. Please help me.


regards
Santosh kumar
SoftPro Systems
Hyderabad


"The harder you train in peace, the lesser you bleed in war"

RE: pdf search

2004-08-20 Thread David Townsend

Hi Santosh,

Lucene doesn't search pdfs per se.  To make anything searchable you have to first 
extract the content and then put it in lucene in a form it understands (i.e document 
objects).  So in order to search your pdfs you first need to extract the info from the 
PDFs using something like PDFBox.  So your battleplan should be forget lucene for a 
while, get the raw data out of all the items you want to search. Then look at the 
lucene articles about creating simple searchable indices.

DT

"If we didn't train to fight, who'd fight the wars?" :)

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 20 August 2004 13:30
To: Lucene Users List
Subject: Fw: pdf search


How can I search through PDF?
- Original Message - 
From: Santosh 
To: Lucene Users List 
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search


Hi,

I am a newbie to Lucene.

I have downloaded the zip file. How can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the
Java program, but I want to supply my own search words. How is that possible?


regards
Santosh kumar
SoftPro Systems
Hyderabad


"The harder you train in peace, the lesser you bleed in war"


Re: Fw: pdf search

2004-08-20 Thread Ben Litchfield



In order to search through a PDF document the text must be extracted from
the PDF document.  There are several libraries to do that, including
http://www.pdfbox.org   After you have the text from the PDF document you
just add it to the lucene index like any other text document.  You should
go through the intro tutorial to understand how to index/search text using
lucene.
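
The two steps described above (extract the text, then index it like any
other text) can be sketched without either library. In the sketch below,
TextExtractor is a hypothetical interface standing in for a PDF text
extractor such as PDFBox, and TinyTextIndex is a toy inverted index standing
in for Lucene. Neither reflects the real API of those projects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical extraction step: turns raw document bytes into plain text. */
interface TextExtractor {
    String extract(byte[] rawDocument);
}

/** Toy inverted index standing in for Lucene: word -> ids of documents containing it. */
final class TinyTextIndex {

    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes already-extracted text under the given document id. */
    void add(String docId, String text) {
        // split on anything that is not a letter or a digit
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of all documents that contain the term (case-insensitive). */
    Set<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }
}
```

The point of the split is that the index never sees PDF bytes: whatever the
extractor returns is indexed exactly like text from an HTML page or a .txt
file.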

Ben



On Fri, 20 Aug 2004, Santosh wrote:

> How can I search through PDF?
> - Original Message -
> From: Santosh
> To: Lucene Users List
> Sent: Friday, August 20, 2004 5:59 PM
> Subject: pdf search
>
>
> Hi,
>
> I am a newbie to Lucene.
>
> I have downloaded the zip file. How can I give my own list of words to Lucene?
> In the demo I saw that Lucene automatically creates an index when we run the
> Java program, but I want to supply my own search words. How is that possible?
>
>
> regards
> Santosh kumar
> SoftPro Systems
> Hyderabad
>
>
> "The harder you train in peace, the lesser you bleed in war"
>

Fw: pdf search

How can I search through PDF?
- Original Message - 
From: Santosh 
To: Lucene Users List 
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. How can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the
Java program, but I want to supply my own search words. How is that possible?

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. How can I give my own list of words to Lucene?
In the demo I saw that Lucene automatically creates an index when we run the
Java program, but I want to supply my own search words. How is that possible?


regards
Santosh kumar
SoftPro Systems
Hyderabad


"The harder you train in peace, the lesser you bleed in war"

RE: lucene and ejb applications

2004-08-20 Thread Rupinder Singh Mazara

hi erik

 Thanks for the warning and the code.
 Let me rephrase the question.

 I have an index generated by Lucene, and I need the search capability
 to be highly available. Which solution would be best?

 Currently I have two scenarios in mind:
  a) Set up an RMI-based app that on start-up initializes an IndexSearcher
     object and waits for invocation of a method like Vector executeQuery(Query)

  b) Create a web-based app (JSP/servlet or Struts) that initializes the
     IndexSearcher object, stores it in the ServletContext on initialization,
     and has all requests invoke Hits search(Query q)

  Scenario a) gives me more control over updates, inserts, and deletes,
  whereas scenario b) has higher availability.

 I want to create and store the IndexSearcher object during initialization
 to save on multiple opens and reads. Once updates are ready, a signal can be
 sent to block further searches while the updates are integrated into the
 existing index.
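
The "block further searches while the updates are integrated" idea can be
sketched with a read/write lock: searches share the read lock, and the update
takes the write lock. SearchGate and its methods are hypothetical names. The
sketch assumes a Java 5 or later JVM (for java.util.concurrent) and uses
Java 8's Supplier for brevity, and it leaves out the Lucene calls themselves.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical gate that lets searches run concurrently but pauses them during an update. */
final class SearchGate {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Runs a search; many searches may hold the read lock at the same time. */
    <T> T search(Supplier<T> query) {
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs an index update; waits for running searches and blocks new ones until done. */
    void update(Runnable integrateUpdates) {
        lock.writeLock().lock();
        try {
            integrateUpdates.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** True while an update holds the write lock. */
    boolean isUpdating() {
        return lock.isWriteLocked();
    }
}
```

In scenario b) a single SearchGate could be kept in the ServletContext next
to the IndexSearcher. The update callback would then integrate the changes
and open a new IndexSearcher before the gate lets searches through again.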



>-Original Message-
>From: Erik Hatcher [mailto:[EMAIL PROTECTED]
>Sent: 20 August 2004 11:13
>To: Lucene Users List
>Subject: Re: lucene and ejb applications
>
>
>What would be the best way?  Use Lucene outside of EJB.  It's quite
>silly to make such a decision "purely due to a policy decision" when
>the technicalities of it show that it is an unwise decision.
>
>You're going to navigate Hits through a session bean?  And as you said,
>the EJB spec says not to use file I/O from EJB's.  That is a good
>recommendation if you are distributing your system across servers and
>replication is occurring - if another call to a session bean occurs and
>ends up on a different server, then the file handle is lost.
>
>I violate the spec in my JavaDevWithAnt project and have one mode where
>I have a stateless session bean returning search results:
>http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely do
>not recommend it.  It works when you are in a single-server
>environment.
>
>In summary - EJB and Lucene are not a good mix - don't force it just to
>be buzzword compliant.
>
>   Erik
>
>
>On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:
>
>> hi all
>>
>> Purely due to a policy decision, we would like to host our Lucene search
>> application in a J2EE container, preferably by means of an EJB.
>> Since access to java.io is restricted by the EJB specification, what would
>> be the best way to design the application?
>>   I have taken a look at [EMAIL PROTECTED] but it relies on MBeans and not a
>> session bean.
>>   Does anyone have pointers or samples that can be looked at?
>>

Re: lucene and ejb applications