RE : RE : Lucene scalability/clustering
> I'm trying to see what are some common ways to scale Lucene onto multiple boxes. Is RMI-based search using a MultiSearcher the general approach?

More details about what you are attempting would be helpful.

RBP
TermDocs
I used the TermDocs class to get the (doc number, freq) pairs for a term. I would like to retrieve the (Document, freq) pairs instead. Any ideas? Thanks. Here is my code:

// Create an IndexReader on a specific Directory
Directory directory = FSDirectory.getDirectory(lucenePathIndex, false);
IndexReader reader = IndexReader.open(directory);

// Enumerate all the terms in the index
TermEnum enum = reader.terms();
while (enum.next()) {
    Term term = enum.term();
    // Returns the document numbers and the frequency of the term in each doc
    TermDocs termDocs = reader.termDocs(term);
}
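If the goal is the stored Document rather than just its number, the doc number returned by TermDocs can be passed to IndexReader.document(int). A minimal sketch against the Lucene 1.x API; the index path and the "title" field are placeholders, not from the original post:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TermDocsExample {
    public static void main(String[] args) throws Exception {
        Directory directory = FSDirectory.getDirectory("/path/to/index", false);
        IndexReader reader = IndexReader.open(directory);

        TermEnum terms = reader.terms();
        while (terms.next()) {
            Term term = terms.term();
            TermDocs termDocs = reader.termDocs(term);
            while (termDocs.next()) {
                int docNum = termDocs.doc();             // document number
                int freq = termDocs.freq();              // frequency of the term in that doc
                Document doc = reader.document(docNum);  // the stored Document itself
                System.out.println(term + " -> " + doc.get("title") + " (freq " + freq + ")");
            }
            termDocs.close();
        }
        terms.close();
        reader.close();
    }
}
```

Note that only stored fields come back from reader.document(); fields indexed but not stored will return null from doc.get().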
Re: Lucene and Message Driven Bean
IOException has been discussed here, but never really itched anyone enough to change the code. Look at the IndexReader methods for checking whether the index is locked and whether it exists. A missing segments file, a locked index, etc. are signs of possible misunderstanding/misuse of IndexReader and IndexWriter.

Otis

--- Clandes Tino [EMAIL PROTECTED] wrote:

Hi all,

I am new to this mailing list, although I have been using Lucene for quite a long time. I have implemented the Lucene API for a pretty big multi-language groupware application, but I still have some problems and dilemmas.

I cannot do my indexing in a scheduled procedure (which seems to be the common way to use Lucene), because I am supposed to make each item (document, meeting, forum article, etc.) searchable as soon as it is uploaded. So I built the solution described below, and would like to hear from experts in this field whether it is good or bad in general, plus suggestions and opinions.

1. Indexing process: After upload (parallel storage in the DB and file system) I call my Stateless Session Bean, which puts the uploaded item (wrapped in a JMS message) into a queue. A Message Driven Bean (configured as one instance in the pool under JBoss) receives the message and calls the Lucene methods which then perform the indexing.

Dilemma: Is there a better way to do this that provides the same functionality?

Problem: I face the situation that an IOException is raised after calling the IndexWriter constructor IndexWriter(Directory d, Analyzer a, final boolean create), with different messages:
- Index locked for write
- Lock obtain timed out
- Other messages if the index is corrupted (no segments file, e.g. I deleted it on purpose)

What I would like to do is:
- If the index is locked for any reason, roll back the transaction and bring the message back into the queue.
- If the index is corrupted, discard the messages in the queue and send mail to an administrator.

Do you find the idea of subclassing IOException, to treat a locked index differently from a corrupted one, appropriate? Thanks a lot in advance. My next dilemma, regarding analyzing content, will follow.

Best regards,
Milan Agatonovic
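The checks Otis mentions can be wired in before constructing the IndexWriter, so the MDB can tell a locked index (roll back and requeue) apart from a missing or corrupted one (alert an administrator). A sketch, assuming the Lucene 1.x static helpers IndexReader.indexExists and IndexReader.isLocked; the exception names are invented for illustration:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

public class IndexHealthCheck {
    /** Hypothetical: thrown when the index is locked; caller rolls back and requeues. */
    public static class IndexLockedException extends IOException {}

    /** Hypothetical: thrown when no index exists; caller alerts an administrator. */
    public static class IndexMissingException extends IOException {}

    /** Run before new IndexWriter(...) to classify the failure mode up front. */
    public static void check(Directory dir) throws IOException {
        if (!IndexReader.indexExists(dir)) {
            throw new IndexMissingException();
        }
        if (IndexReader.isLocked(dir)) {
            throw new IndexLockedException();
        }
    }
}
```

This maps the two recovery paths onto distinct exception types without subclassing inside Lucene itself; the caller's try/catch can then requeue or alert accordingly. There is still a race between the check and the constructor, so the IndexWriter call needs its own catch as a fallback.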
Lucene HARD lock during indexing... BAD bug!?
I've recently noticed this major problem with our indexer. It only seems to happen running on 1.3 final and not on 1.3 RC2. Essentially we have about 80 threads doing indexing... they all add to a RAMDirectory, which after 10k documents does a commit to disk. This appears to be a problem with FSDirectory, which was not modified in 1.3 final (same as in RC2). If anyone has any suggestions they would be appreciated! Here's the full stack trace:

ksa-task-thread-32 prio=1 tid=0x89d04140 nid=0x4666 runnable [ba7ff000..ba7ff8d0]
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
    at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:433)
    - locked 0x46df74a0 (a org.apache.lucene.store.FSInputStream$Descriptor)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:196)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:81)
    at org.apache.lucene.store.InputStream.readVLong(InputStream.java:141)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:117)
    at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:110)
    at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:82)
    at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:141)
    at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:120)
    at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
    at org.apache.lucene.store.Lock$With.run(Lock.java:148)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
    - locked 0x4ccc3790 (a org.apache.lucene.store.FSDirectory)
    at ksa.index.AdvancedIndexWriter.doDeleteByResource(AdvancedIndexWriter.java:264)
    at ksa.index.AdvancedIndexWriter.commit(AdvancedIndexWriter.java:172)
    - locked 0x497cfde0 (a java.lang.Object)
    at ksa.index.IndexManager.getAdvancedIndexWriter(IndexManager.java:282)
    - locked 0x497e0a50 (a java.lang.Object)
    at ksa.robot.FeedTaskParserListener.onItemEnd(FeedTaskParserListener.java:315)
    at org.apache.commons.feedparser.RSSFeedParser.doParseItem(RSSFeedParser.java:260)
    at org.apache.commons.feedparser.RSSFeedParser.parse(RSSFeedParser.java:126)
    at org.apache.commons.feedparser.FeedParser.parse(FeedParser.java:146)
    at org.apache.commons.feedparser.FeedParser.parse(FeedParser.java:113)
    at ksa.robot.FeedTask.doUpdate(FeedTask.java:375)
    at ksa.robot.FeedTask.doUpdate(FeedTask.java:331)
    at ksa.robot.FeedTask.run(FeedTask.java:147)
    at ksa.robot.TaskThread.doProcessTask(TaskThread.java:213)
    at ksa.robot.TaskThread.run(TaskThread.java:78)

--
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
NewsMonster - http://www.newsmonster.org/
Re: Lucene HARD lock during indexing... BAD bug!?
Kevin A. Burton wrote:
> I've recently noticed this major problem with our indexer. It only seems to happen running on 1.3 final and not on 1.3 RC2. Essentially we have about 80 threads doing indexing... they all add to a RAMDirectory, which after 10k documents does a commit to disk.

Also: this seems to only happen when I delete items from the index.

Kevin
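For reference, the buffer-in-RAM, commit-in-batches pattern described in this thread is usually built on IndexWriter.addIndexes. A sketch under the Lucene 1.x API, not Kevin's actual AdvancedIndexWriter code; the index path is a placeholder:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchedIndexer {
    public static void main(String[] args) throws Exception {
        // Buffer additions in memory until the batch threshold is reached.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        // ... addDocument() calls accumulate here ...
        ramWriter.close();

        // One merge to disk per batch instead of one disk write per document.
        Directory fsDir = FSDirectory.getDirectory("/path/to/index", false);
        IndexWriter fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
        fsWriter.addIndexes(new Directory[] { ramDir });
        fsWriter.close();
    }
}
```

The detail Kevin's follow-up points at is that deletes go through IndexReader while adds go through IndexWriter, and only one of the two may hold the write lock on a directory at a time; interleaving them across 80 threads is exactly where lock contention tends to surface.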
Re: can't delete from an index using IndexReader.delete()
Dhruba Borthakur wrote:
> Hi folks, I am using the latest and greatest Lucene jar file and am facing a problem with deleting documents from the index. Browsing the mail archive, I found that the following email (June 2003) listed the exact problem that I am encountering. In short: I am using Field.text(id, value) to mark a document. Then I use reader.delete(new Term(id, value)) to remove the document: this call returns 0 and fails to delete the document. The attached sample program shows this behaviour.

Agreed... your values might be getting tokenized when indexed... try adding them as untokenized keyword fields...

Kevin
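The point behind Kevin's suggestion: Field.text runs the value through the analyzer, so the terms actually stored may not equal the literal string later passed to reader.delete(new Term(...)). Indexing the identifier untokenized makes the delete term match exactly. A sketch against the Lucene 1.x API; the field name and value are illustrative:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByIdExample {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // Keyword: indexed as a single untokenized term, so it survives verbatim.
        doc.add(Field.Keyword("id", "doc-42"));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = IndexReader.open(dir);
        // The term now matches exactly; delete() returns the number of docs deleted.
        int deleted = reader.delete(new Term("id", "doc-42"));
        reader.close();
    }
}
```

With Field.text the same value could be lowercased or split into several tokens, so the Term("id", "doc-42") lookup finds nothing and delete() returns 0, which is exactly the symptom reported.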
Re: RE : Lucene scalability/clustering
Anson Lau wrote:
> I'm trying to see what are some common ways to scale lucene onto multiple boxes. Is RMI based search and using a MultiSearcher the general approach?

Yes, although you probably want to use ParallelMultiSearcher.

Doug
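A sketch of that setup, assuming one index per box exported with Lucene's RemoteSearchable over RMI; the RMI names, host list, and index path are placeholders:

```java
import java.rmi.Naming;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.RemoteSearchable;
import org.apache.lucene.search.Searchable;

public class ClusterSearch {
    // On each index box: export the local index over RMI.
    public static void serve() throws Exception {
        Searchable local = new IndexSearcher("/path/to/index");
        Naming.rebind("//localhost/searcher", new RemoteSearchable(local));
    }

    // On the front end: look up each box and fan queries out in parallel.
    public static ParallelMultiSearcher connect(String[] hosts) throws Exception {
        Searchable[] shards = new Searchable[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            shards[i] = (Searchable) Naming.lookup("//" + hosts[i] + "/searcher");
        }
        return new ParallelMultiSearcher(shards);
    }
}
```

ParallelMultiSearcher queries every shard concurrently and merges the hits, whereas plain MultiSearcher visits them one after another, so the parallel variant is the one that actually buys latency back as boxes are added.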
RE : Lucene scalability/clustering
RBP,

I'm implementing a search engine for a project at work. It's going to index approx 1.5 rows in a database. I am trying to get a feel for what my options are when scalability becomes an issue. I also want to know whether those options require me to implement my app in a different way right from the start.

Anson

-----Original Message-----
From: Rasik Pandey [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 24, 2004 9:34 PM
To: 'Lucene Users List'
Subject: RE : RE : Lucene scalability/clustering

> I'm trying to see what are some common ways to scale lucene onto multiple boxes. Is RMI based search and using a MultiSearcher the general approach?

More details about what you are attempting would be helpful.

RBP
Re: Prevent duplicate results?
The short answer is no. The longer answer is that the best way to achieve this is to first check whether the entry is already in the index, and replace the old entry with the new one.

Dror

On Tue, Feb 24, 2004 at 04:10:00PM -0800, Kevin A. Burton wrote:
> Is there any way to prevent lucene from returning duplicate (but 'older') results within a search result?
>
> Kevin

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
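The delete-then-add pattern Dror describes, keyed on a unique identifier, can be sketched as follows under the Lucene 1.x API; the "id" field name is an assumption and the document is expected to carry it as an untokenized keyword field:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

public class UpsertExample {
    /** Remove any older copy with the same id, then add the new document. */
    public static void upsert(Directory dir, Document doc, String id) throws Exception {
        IndexReader reader = IndexReader.open(dir);
        reader.delete(new Term("id", id)); // drop the stale copy, if any
        reader.close();                    // release the lock before writing

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        writer.addDocument(doc);
        writer.close();
    }
}
```

Since deletes need an IndexReader and adds need an IndexWriter, batching all deletes first and then all adds is considerably cheaper than alternating open/close per document.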