RE : RE : Lucene scalability/clustering

2004-02-24 Thread Rasik Pandey
 I'm trying to see what are some common ways to scale Lucene onto
 multiple boxes.  Is RMI-based search and using a MultiSearcher the
 general approach?

More details about what you are attempting would be helpful.


RBP





TermDocs

2004-02-24 Thread Damien Lust
I used the TermDocs class to get the (freq, doc) pairs of a term, where doc is a document number.

I would like to retrieve the (freq, Document) pairs of a term instead, i.e. the Document itself rather than just its number.

Any ideas?

Thanks.

Here is my code:

// Create an IndexReader on a specific Directory
Directory directory = FSDirectory.getDirectory(lucenePathIndex, false);
IndexReader reader = IndexReader.open(directory);

// Enumerate all of the terms in the index
// (renamed "enum", which later became a reserved word in Java)
TermEnum termEnum = reader.terms();
while (termEnum.next()) {
    Term term = termEnum.term();
    // TermDocs enumerates (document number, frequency) pairs for this term
    TermDocs termDocs = reader.termDocs(term);
}
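
One way to get from the document number to the Document itself is IndexReader.document(int), which loads the stored fields for that document. A minimal sketch building on the loop above (untested, Lucene 1.x API):

TermDocs termDocs = reader.termDocs(term);
while (termDocs.next()) {
    int docNum = termDocs.doc();            // document number
    int freq = termDocs.freq();             // term frequency in that document
    Document doc = reader.document(docNum); // the stored-fields Document itself
    // ... use the (freq, doc) pair here ...
}
termDocs.close();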

Re: Lucene and Message Driven Bean

2004-02-24 Thread Otis Gospodnetic
IOException has been discussed here, but it has never itched anyone
enough to change the code.
Look at the IndexReader methods for checking whether an index is locked
and whether it exists.

A missing segments file, the index being locked, etc. are signs of
possible misunderstanding or misuse of IndexReader and IndexWriter.
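
A minimal sketch of those checks, using static methods on the Lucene 1.x
IndexReader (the index path here is hypothetical):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory dir = FSDirectory.getDirectory("/var/myapp/index", false); // hypothetical path

if (!IndexReader.indexExists(dir)) {
    // no usable index here (e.g. missing segments file)
}
if (IndexReader.isLocked(dir)) {
    // another writer holds the lock; retry later, or call
    // IndexReader.unlock(dir) only if you are certain the lock is stale
}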

Otis

--- Clandes Tino [EMAIL PROTECTED] wrote:
 Hi all,
 I am new to this mailing list, although I have been
 using Lucene for quite a long time.
 I have built Lucene indexing and search for a pretty big
 multi-language groupware application, but I still have
 some problems and dilemmas.
 I cannot run Lucene indexing as a scheduled procedure
 (which I found to be the common way to use Lucene),
 because I am supposed to make an item searchable as soon
 as it is uploaded (document, meeting, forum article, etc.).
 So I built a solution (described below) and would like
 to hear from experts in this field whether it is good or
 bad in general, along with suggestions and opinions.
 1. Indexing process:
 After upload (parallel storage in the DB and the file
 system) I call my Stateless Session Bean, which puts the
 uploaded item (wrapped in a JMS Message) into a Queue. A
 Message Driven Bean (configured as one instance in the
 pool, under JBoss) receives the message and calls the
 Lucene methods that perform the indexing.
 Dilemma: Is there a better way to do this while providing
 the same functionality?
 Problem: I see an IOException raised by the IndexWriter
 constructor IndexWriter(Directory d, Analyzer a, final
 boolean create), with different messages:
 - Index locked for write
 - Lock obtain timed out
 - Other messages if the index is corrupted (e.g. no
 segments file; I deleted it on purpose)
 What I would like to do is:
 - If the index is locked for any reason, roll back the
 transaction, i.e. bring the message back into the queue.
 - If the index is corrupted, discard the messages in the
 queue and send mail to an administrator.
 Would it be appropriate to subclass IOException so that
 the locked-index case can be handled differently from the
 corrupted-index case?
 
 Thanks a lot in advance.
 My next problem/dilemma concerns analyzing content and
 will follow in a separate message.
 Best regards
 Milan Agatonovic
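
A hypothetical sketch of the rollback-vs-discard handling described in
the quoted message, assuming container-managed transactions under JBoss
(indexDir, indexItem, and notifyAdministrator are invented helper names):

// inside the Message Driven Bean
public void onMessage(javax.jms.Message msg) {
    try {
        if (!IndexReader.indexExists(indexDir)) {
            // corrupted or missing index: consume the message, alert a human
            notifyAdministrator("search index is missing or corrupted");
            return;
        }
        if (IndexReader.isLocked(indexDir)) {
            // locked: a runtime exception makes the container roll back the
            // transaction, so the message goes back into the queue
            throw new javax.ejb.EJBException("index is locked, retry later");
        }
        indexItem(msg); // the actual Lucene indexing work
    } catch (java.io.IOException e) {
        throw new javax.ejb.EJBException("indexing failed: " + e.getMessage());
    }
}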
 
 
 
   
   
   
 
 





Lucene HARD lock during indexing... BAD bug!?

2004-02-24 Thread Kevin A. Burton
I've recently noticed a major problem with our indexer.

It only seems to happen running on 1.3 final, not on 1.3 RC2.

Essentially we have about 80 threads doing indexing... they all add to a
RAMDirectory which, after 10k documents, is committed to disk (sketched
below).
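
For reference, a minimal sketch of that batch-in-RAM-then-flush pattern
(the analyzer, the on-disk directory, and the threshold bookkeeping are
hypothetical; only the RAMDirectory-plus-flush idea comes from the
description above):

RAMDirectory ramDir = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

// indexing threads call ramWriter.addDocument(doc) ...

// after ~10k documents, merge the in-memory index into the on-disk one
ramWriter.close();
IndexWriter fsWriter = new IndexWriter(fsDir, analyzer, false);
fsWriter.addIndexes(new Directory[] { ramDir });
fsWriter.close();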

This appears to be a problem with FSDirectory, which was not modified
in 1.3 final (it is the same as in RC2).

If anyone has any suggestions they would be appreciated!

Here's the full stack trace:

ksa-task-thread-32 prio=1 tid=0x89d04140 nid=0x4666 runnable [ba7ff000..ba7ff8d0]
   at java.io.RandomAccessFile.readBytes(Native Method)
   at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
   at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:433)
   - locked 0x46df74a0 (a org.apache.lucene.store.FSInputStream$Descriptor)
   at org.apache.lucene.store.InputStream.refill(InputStream.java:196)
   at org.apache.lucene.store.InputStream.readByte(InputStream.java:81)
   at org.apache.lucene.store.InputStream.readVLong(InputStream.java:141)
   at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:117)
   at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:110)
   at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
   at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
   at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
   at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
   at org.apache.lucene.store.Lock$With.run(Lock.java:148)
   at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
   - locked 0x4ccc3790 (a org.apache.lucene.store.FSDirectory)
   at ksa.index.AdvancedIndexWriter.doDeleteByResource(AdvancedIndexWriter.java:264)
   at ksa.index.AdvancedIndexWriter.commit(AdvancedIndexWriter.java:172)
   - locked 0x497cfde0 (a java.lang.Object)
   at ksa.index.IndexManager.getAdvancedIndexWriter(IndexManager.java:282)
   - locked 0x497e0a50 (a java.lang.Object)
   at ksa.robot.FeedTaskParserListener.onItemEnd(FeedTaskParserListener.java:315)
   at org.apache.commons.feedparser.RSSFeedParser.doParseItem(RSSFeedParser.java:260)
   at org.apache.commons.feedparser.RSSFeedParser.parse(RSSFeedParser.java:126)
   at org.apache.commons.feedparser.FeedParser.parse(FeedParser.java:146)
   at org.apache.commons.feedparser.FeedParser.parse(FeedParser.java:113)
   at ksa.robot.FeedTask.doUpdate(FeedTask.java:375)
   at ksa.robot.FeedTask.doUpdate(FeedTask.java:331)
   at ksa.robot.FeedTask.run(FeedTask.java:147)
   at ksa.robot.TaskThread.doProcessTask(TaskThread.java:213)
   at ksa.robot.TaskThread.run(TaskThread.java:78)



Re: Lucene HARD lock during indexing... BAD bug!?

2004-02-24 Thread Kevin A. Burton
Kevin A. Burton wrote:

I've recently noticed a major problem with our indexer.

It only seems to happen running on 1.3 final, not on 1.3 RC2.

Essentially we have about 80 threads doing indexing... they all add to
a RAMDirectory which, after 10k documents, is committed to disk.
... also: this seems to happen only when I delete items from the index.

Kevin



Re: can't delete from an index using IndexReader.delete()

2004-02-24 Thread Kevin A. Burton
Dhruba Borthakur wrote:

Hi folks,

I am using the latest and greatest Lucene jar file and am facing a
problem with deleting documents from the index. Browsing the mail
archive, I found that the following email (June 2003) described the
exact problem that I am encountering.

In short: I am using Field.text(id, value) to mark a document. Then I
use reader.delete(new Term(id, value)) to remove the document: this
call returns 0 and fails to delete the document. The attached sample
program shows this behaviour.
Agreed... your values are probably being tokenized by the analyzer when
indexed with Field.text()... try adding them as untokenized Keyword
fields instead, as sketched below...

Kevin
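
A minimal sketch of that fix (the "id" field name and surrounding setup
are illustrative; Field.Keyword indexes the value as a single
untokenized term, so an exact Term can match it at delete time):

// at indexing time: store the identifier as one untokenized term
Document doc = new Document();
doc.add(Field.Keyword("id", value));
writer.addDocument(doc);

// later: delete by exact term match on that field
IndexReader reader = IndexReader.open(directory);
int deleted = reader.delete(new Term("id", value)); // returns how many were deleted
reader.close();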



Re: RE : Lucene scalability/clustering

2004-02-24 Thread Doug Cutting
Anson Lau wrote:
I'm trying to see what are some common ways to scale Lucene onto
multiple boxes.  Is RMI-based search and using a MultiSearcher the
general approach?
Yes, although you probably want to use ParallelMultiSearcher.
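
A minimal sketch of that arrangement (host names and the index path are
hypothetical; RemoteSearchable exports a local searcher over RMI, and
ParallelMultiSearcher queries the remote searchers in parallel):

// on each search box: export the local searcher over RMI
Searchable local = new IndexSearcher("/path/to/index");
java.rmi.Naming.rebind("//localhost/searchable", new RemoteSearchable(local));

// on the front end: look up the remote searchers and fan the query out
Searchable box1 = (Searchable) java.rmi.Naming.lookup("//host1/searchable");
Searchable box2 = (Searchable) java.rmi.Naming.lookup("//host2/searchable");
Searcher searcher = new ParallelMultiSearcher(new Searchable[] { box1, box2 });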

Doug



RE : Lucene scalability/clustering

2004-02-24 Thread Anson Lau
RBP,

I'm implementing a search engine for a project at work.  It's going to
index approx 1.5 million rows in a database.

I am trying to get a feel for what my options are when scalability
becomes an issue.  I also want to know whether those options require me
to implement my app in a different way right from the start.

Anson

-Original Message-
From: Rasik Pandey [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 24, 2004 9:34 PM
To: 'Lucene Users List'
Subject: RE : RE : Lucene scalability/clustering

 I'm trying to see what are some common ways to scale Lucene
 onto multiple boxes.  Is RMI-based search and using a
 MultiSearcher the general approach?

More details about what you are attempting would be helpful.


RBP







Re: Prevent duplicate results?

2004-02-24 Thread Dror Matalon
The short answer is no.
The longer answer is that the best way to achieve this is to first
check whether the entry is already in the index and, if it is, replace
the old entry with the new one, as in the sketch below.
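
A hypothetical sketch of that delete-then-add update, assuming each
document carries a unique untokenized "id" field:

// remove any older copy of this entry first...
IndexReader reader = IndexReader.open(directory);
reader.delete(new Term("id", id));
reader.close();

// ...then add the fresh version
IndexWriter writer = new IndexWriter(directory, analyzer, false);
writer.addDocument(newDoc);
writer.close();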

Dror

On Tue, Feb 24, 2004 at 04:10:00PM -0800, Kevin A. Burton wrote:
 Is there any way to prevent Lucene from returning duplicate (but
 'older') results within a search result?
 
 Kevin
 
