Bug 23650 (aka "docs out of order")?

2005-02-25 Thread petite_abeille
Re: http://issues.apache.org/bugzilla/show_bug.cgi?id=23650 Hello, I'm pretty confident that I'm misusing Lucene one way or another... and of course it was just a question of time before I ran into this "docs out of order" exception: java.lang.IllegalStateException: docs out of order at org

Re: ngramj

2005-02-24 Thread petite_abeille
On Feb 24, 2005, at 14:50, Gusenbauer Stefan wrote: Does anyone know a good tutorial or the javadoc for ngramj because i need it for guessing the language of the documents which should be indexed? http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/plugin/languageidentifier/ Cheers -- PA,

Re: Opening up one large index takes 940M or memory?

2005-01-23 Thread petite_abeille
On Jan 24, 2005, at 00:10, Vic wrote: (Is there a btree serialization impl in java?) http://jdbm.sourceforge.net/ Cheers -- PA http://alt.textdrive.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [E

Re: Opening up one large index takes 940M or memory?

2005-01-22 Thread petite_abeille
On Jan 22, 2005, at 23:50, Kevin A. Burton wrote: The problem I think for everyone right now is that 32bits just doesn't cut it in production systems... 2G of memory per process and you really start to feel it. Hmmm... no... no pain at all... or perhaps you are implying that your entire system

Re: Lucene appreciation

2004-12-16 Thread petite_abeille
On Dec 16, 2004, at 17:26, Rony Kahan wrote: If you are interested in Lucene work you can set up an rss feed or email alert from here: http://www.indeed.com/search?q=lucene&sort=date Looks great :) One thing though, the web search returns 14 hits for the above query. Using the RSS feed only retur

[RFE] IndexWriter.updateDocument()

2004-12-14 Thread petite_abeille
Well, the subject says it all... If there is one thing which is overly cumbersome in Lucene, it's updating documents, therefore this Request For Enhancement: Please consider enhancing the IndexWriter API to include an updateDocument(...) method to take care of all the gory details involved in s

Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread petite_abeille
On Dec 14, 2004, at 15:40, Kevin L. Cobb wrote: Was wondering if anyone out there was doing the same of it there are any dissenting opinions on using Lucene for this purpose. ZOE [1] [2] takes the same approach and uses Lucene as a relational engine of sort. However, for both practical and ideolo

Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 21:14, Chris Hostetter wrote: The real question in my mind is not "how should we implement 'get' given that we allow multiple values?", a better question is "how should we implement 'put'?" Yes, retrofitting Document.add() in the Map interface would be a pain. But this is not

Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 20:43, Erik Hatcher wrote: Sure, I could put it all together as a space separated String and use the WhitespaceAnalyzer, but why not do it this way? What other suggestions do you have for doing this? If this works for you, I don't see any problem with it. In general, I avoi

Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 20:06, Erik Hatcher wrote: I also extensively use multiple fields of the same name. Odd... on the other hand... perhaps this is "une affaire de gout"... So does this rule out implementing the Map interface on Document? Why? Nobody mentioned what value such a Map would hold... i

Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 18:05, Erik Hatcher wrote: Having Hits implement List sounds nice, but it could not function by itself if the backing IndexSearcher/IndexReader is closed or is not accessible. Wouldn't it be too tempting for naive users to consider passing this List around between tiers and

Re: Document-Map, Hits-List

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 17:41, Luke Francl wrote: Yes, but Otis hasn't implemented that interface. He's wrapping his Hits with a List of Maps. Right... I'm sure that Otis knows what he is doing :) As far as implementation goes, you have at least 3 options: - Implement List and Map directly in Lucene's

Re: Document-Map, Hits-List

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 17:31, Luke Francl wrote: How do you avoid the problem Eric just mentioned, iterating through all the Hits at once to populate this data structure? You don't need to iterate through anything upfront... you simply do it on-demand... eg when invoking List.get() you would invoke t
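The on-demand idea can be sketched without any Lucene dependency. What follows is only an illustration under assumptions: `LazyList` and its `loader` callback are hypothetical names, and in a real application the loader would presumably wrap a call such as `Hits.doc(int)`, which this sketch does not do.

```java
import java.util.AbstractList;
import java.util.function.IntFunction;

/**
 * Hypothetical read-only List that fetches each element only when asked for it.
 * Nothing is iterated or copied up front.
 */
public class LazyList<T> extends AbstractList<T> {
    private final int size;                 // number of elements, known in advance
    private final IntFunction<T> loader;    // called once per get(), never eagerly

    public LazyList(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        // The element is loaded here, on demand.
        return loader.apply(index);
    }

    @Override
    public int size() {
        return size;
    }
}
```

Such a list still depends on whatever the loader reads from, so it stops working once the backing searcher or reader is closed.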

Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 13:37, Karthik N S wrote: We create a ArrayList Object and Load all the Hit Values into them and return the same for Display purpose on a Servlet. Talking of which... It would be very handy if org.apache.lucene.search.Hits would implement the java.util.List interface... in

[OT] Re: Lots Of Interest in Lucene Desktop

2004-10-29 Thread petite_abeille
On Oct 28, 2004, at 20:26, Kevin A. Burton wrote: http://www.peerfear.org/rss/permalink/2004/10/28/LotsOfInterestInLuceneDesktop/ Many people, few ideas :) http://www.popsearch.net/index.html PA.

Re: Google Desktop Could be Better

2004-10-15 Thread petite_abeille
On Oct 15, 2004, at 16:10, Tom Cunningham wrote: I'd be interested in trying to implement some of these ideas on Mac OS X, mostly because it's not already covered by Google Desktop, and I think the screensaver idea would work pretty well there. Anyone else want to give this a shot? "Google i

Re: Encrypted indexes

2004-10-13 Thread petite_abeille
On Oct 13, 2004, at 15:26, Nader Henein wrote: Well, are you "storing" any data for retrieval from the index, because you could encrypt the actual data and then encrypt the search string public key style. Alternatively, write your index to an encrypted volume... something along the line of FileV

Re: indexing size

2004-09-01 Thread petite_abeille
Hi Niraj, On Sep 01, 2004, at 06:45, Niraj Alok wrote: If I make some of them Field.Unstored, I can see from the javadocs that it will be indexed and tokenized but not stored. If it is not stored, how can I use it while searching? The different type of fields don't impact how you do your search

Re: indexing size

2004-08-31 Thread petite_abeille
On Aug 31, 2004, at 17:17, Otis Gospodnetic wrote: You also have a large number of fields, and it looks like a lot (all?) of them are stored and indexed. That's what that large .fdt file indicated. That file is > 206 MB in size. Try using Field.UnStored() to avoid storing all those data in your i

alternative query syntax?

2004-08-31 Thread petite_abeille
Hello, I would like to provide an alternative query syntax for ranges by using a colon (':') or two dots ('..') instead of ' TO '. For example: mod_date:[20020101:20030101] Or mod_date:[20020101..20030101] What would be the correct procedure to modify the QueryParser to achieve this? Should I si
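One possible approach, sketched here as an assumption rather than as the recommended procedure, is to leave the QueryParser untouched and rewrite the alternative range syntax into the standard ' TO ' form before parsing. The class and method names below are hypothetical, and the pattern only handles inclusive ranges in square brackets whose bounds are plain word characters.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processor: turns [a:b] and [a..b] into [a TO b]. */
public class RangeSyntaxRewriter {
    // A bracketed pair of word-character bounds separated by ':' or '..'
    private static final Pattern ALT_RANGE =
            Pattern.compile("\\[(\\w+)\\s*(?::|\\.\\.)\\s*(\\w+)\\]");

    public static String rewrite(String query) {
        // Field prefixes such as "mod_date:" are untouched because they sit outside the brackets.
        return ALT_RANGE.matcher(query).replaceAll("[$1 TO $2]");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("mod_date:[20020101:20030101]"));
        System.out.println(rewrite("mod_date:[20020101..20030101]"));
    }
}
```

The rewritten string can then be handed to the regular parser. Queries already using ' TO ' pass through unchanged.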

Re: Lucene and MVC (was Re: Bad file descriptor (IOException) using SearchBean contribution)

2004-05-19 Thread petite_abeille
On May 20, 2004, at 04:38, Erik Hatcher wrote: OffTopic: havoc and Struts go well together ;) Pick up Tapestry instead! Nah. Keep it really Simple [1] instead :o) http://simpleweb.sourceforge.net/ PA.

index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread petite_abeille
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote: He mentioned that I might be able to squeeze 5-10% out of index merges this way. Talking of which... what strategy(ies) do people use to minimize downtime when updating an index? My current "strategy" is as follows: (1) use a temporary RAMDirect

Re: Did you mean...

2004-02-12 Thread petite_abeille
On Feb 12, 2004, at 16:42, Abhay Saswade wrote: How about creating spellcheck dictionary with all words in lucene index? That way you ensure that the word really exists in the index. You can indeed use the terms identified by Lucene as the dictionary "words" and apply traditional spell checking
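As a rough illustration of that idea, the sketch below picks the closest dictionary word by Levenshtein edit distance. It assumes the index terms have already been collected into a plain collection of strings (how they are read out of Lucene is left out), and `TermSuggester`, `suggest` and `maxDistance` are hypothetical names. Edit distance is just one traditional technique, swapped in here for the sake of the example.

```java
import java.util.Collection;

/** Hypothetical "did you mean" helper over a set of known index terms. */
public class TermSuggester {

    /** Classic Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest term within maxDistance edits, or null if none qualifies. */
    static String suggest(String word, Collection<String> terms, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : terms) {
            int d = editDistance(word, term);
            if (d < bestDistance) {
                bestDistance = d;
                best = term;
            }
        }
        return best;
    }
}
```

Scanning every term is linear in the size of the dictionary, so a large index would likely need some pre-filtering first.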

Re: index: how to store binary data or objects ?

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 14:53, Markus Brosch wrote: My application will deal with "small" data sets. The problem is, that I want to index the content (String) of some objects. I want to refer to that object once I found this by a keyword or whatever. So, using a simple map or tree? Something along

Re: Index advice...

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 14:03, Scott ganyo wrote: I have. While document.add() itself doesn't increase over time, the merge does. Ways of partially overcoming this include increasing the mergeFactor (but this will increase the number of file handles used), or building blocks of the index in memor

Re: index: how to store binary data or objects ?

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 09:32, Andrzej Bialecki wrote: Just a comment: for ext2fs and BSD FFS (dunno about NT) scalability issues with this approach can be partially addressed by building a tree of subdirectories, instead of using just one. I.e. a file named "myThesis.pdf" would go into /m/y/t/myTh

Re: index: how to store binary data or objects ?

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 03:59, [EMAIL PROTECTED] wrote: Is there a way to do this? Lucene deals with text. You could always serialize your objects in a byte array, hex encode them or something, and store that in an appropriate field. What would you suggest to do? Don't store your objects in Lucene

[OT] Re: Need Advices and Help

2004-02-05 Thread petite_abeille
On Feb 05, 2004, at 13:01, Otis Gospodnetic wrote: I believe it would be the value of a 'Message-ID' or 'Reference' or 'Reference-ID' message header. However, I remember reading that mail readers are not very good at sticking to a standard (some RFC, I guess), so they don't always provide the corr

[OT] Digital Format-Specific Validation

2003-12-06 Thread petite_abeille
http://hul.harvard.edu/jhove/ Might be of interest to some :) Cheers, PA.

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:34, Eric Jain wrote: I see. Assuming I have the relevant terms for a given document, how would I build a new document based on those terms? Something like adding each term's field and text to the new document? Yes. Ok. Retrieving the term for a document turns out to be prett

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:34, Eric Jain wrote: I believe a term always contains its own text. (It must be somewhere, after all...) Documents on the other hand may or may not contain the original text, depending on whether a field is stored or not. This seems to be the case: the term's text holds the

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:13, Eric Jain wrote: That's what I had in mind, but maybe there is a better way. Once all terms are collected, they can be reassembled into a new document that can then be indexed again. I see. Assuming I have the relevant terms for a given document, how would I build

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 13:45, Eric Jain wrote: If the document contains unstored fields, the only way to reconstruct the document is by iterating through all terms in the index and picking out those that reference the document. Hmmm... how would you do that? Something along the lines of aReader.term

moving documents from one index to another?

2003-11-20 Thread petite_abeille
Hello, I'm trying to move a Document from one Index to another, without necessarily reindexing it... The Document is composed of one Field.Keyword and a bunch of Field.UnStored. Reading such a Document from one index and then adding it to another one doesn't seem to have the expected effect

Re: Document ID's and duplicates

2003-11-19 Thread petite_abeille
On Nov 19, 2003, at 18:14, Don Kaiser wrote: If you do this will the old version of the document be replaced by the new one? No. They will coexist. In Lucene, an update implies a delete/insert sequence. PA.
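The coexistence behaviour can be pictured with a toy model. The class below is not Lucene and uses no Lucene API; `UpdateSketch` and all its methods are hypothetical, and it only assumes an append-only store where each entry carries a unique key. Adding a second entry with the same key leaves both in place, and an "update" has to be spelled out as delete-then-add.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy append-only store illustrating update = delete + insert. Not Lucene. */
public class UpdateSketch {

    /** One stored entry: a unique key plus its text. */
    static final class Entry {
        final String key;
        final String text;

        Entry(String key, String text) {
            this.key = key;
            this.text = text;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Appends blindly; an existing entry with the same key is NOT replaced. */
    public void add(String key, String text) {
        entries.add(new Entry(key, text));
    }

    /** Removes every entry with the given key and returns how many were removed. */
    public int delete(String key) {
        int before = entries.size();
        entries.removeIf(e -> e.key.equals(key));
        return before - entries.size();
    }

    /** The update idiom: delete the old version(s), then insert the new one. */
    public void update(String key, String text) {
        delete(key);
        add(key, text);
    }

    /** Counts the entries currently stored under the given key. */
    public int count(String key) {
        int n = 0;
        for (Entry e : entries) {
            if (e.key.equals(key)) {
                n++;
            }
        }
        return n;
    }
}
```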

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 21:14, Philippe Laflamme wrote: Rules of linguistics? Is there such a thing? :) Actually, yes there is. Natural Language Processing (NLP) is a very broad research subject but a lot has come out of it. A lot of what? "If" statements? :) Yes... just like every software boils down

Re: Vector Space Model in Lucene?

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 21:16, Chong, Herb wrote: if you know what TREC is, you know what i meant earlier. this isn't exotic technology, this is close to 15 year old technology. This is not really what I asked. What I would be interested to know is what approach you consider to provide the "biggest

Re: Vector Space Model in Lucene?

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 20:54, Chong, Herb wrote: it solves one part of the problem, but there are a lot of sentences in a typical document. you'll need to composite a rank of a document from its constituent sentences then. there are less drastic ways to solve the problem. the other problem is that

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 20:29, Philippe Laflamme wrote: Rules of linguistics? Is there such a thing? :) Actually, yes there is. Natural Language Processing (NLP) is a very broad research subject but a lot has come out of it. A lot of what? "If" statements? :) More specifically, Rule-based taggers ha

Re: Vector Space Model in Lucene?

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 20:27, Dror Matalon wrote: I might be the only person on the list who's having a hard time following this discussion. Nope. I don't understand a word of what those guys are talking about either :) Would one of you wise folks care to point me to a good "dummies", also known a

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 19:50, Chong, Herb wrote: if you are handling inter correlation properly, then terms can't cross sentence boundaries. Could you not break down your document along sentences boundary? If you manage to figure out what a sentence is, that is. if you are not paying attention to

Re: Query Filters on term A in query "A AND (B OR C OR D)"

2003-11-13 Thread petite_abeille
On Nov 13, 2003, at 22:32, Jie Yang wrote: I am trying to optimse the 500 OR terms so that it does not do a full 2 millions docs search but on the 1000 returned. Would it be beneficial to move the first result set into its own (transient) index to perform the second part of your query? PA.

Re: Objection to using /tmp for lock files.

2003-11-13 Thread petite_abeille
On Nov 13, 2003, at 19:00, Dror Matalon wrote: I've been experimenting with it and it seems to work as advertised. It has the advantage of not requiring *any* write capability in /tmp or anywhere else. There is a system property to turn off the lock files altogether. PA.

Re: fuzzy searches

2003-11-13 Thread petite_abeille
On Nov 11, 2003, at 21:02, Bruce Ritchie wrote: Just a note the LSI is encumbered by US patents 4,839,853 and 5,301,109. It would be wise to make sure that any implementation is either blessed by the patent holders or does not infringe on the patents. Since when did developers turn into armchai

Re: fuzzy searches

2003-11-13 Thread petite_abeille
On Nov 13, 2003, at 15:09, Thomas Krämer wrote: i am not familiar with intellectual property law, but it sounds somewhat strange to me, that it is possible to patent an abstract idea of how extracting information from data. The process of "Spreading Cream Cheese On Bagels" (C) (R) (TM) has been

Re: Overview to Lucene

2003-11-12 Thread petite_abeille
Hi Ralf, On Nov 12, 2003, at 14:06, [EMAIL PROTECTED] wrote: Does anybody know good articles which demonstrate parts of that or give a good start into Lucene? Otis Gospodnetic's articles are a good starting point: "Introduction to Text Indexing with Apache Jakarta Lucene" http://www.onjava.com/

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 21:32, maurits van wijland wrote: There is the carrot project : http://www.cs.put.poznan.pl/dweiss/carrot/ "Leo Galambos, author of the Egothor project, constantly supports us with fresh ideas and includes Carrot components in his own project!" http://www.cs.put.poznan.pl/dwe

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 16:58, Tate Avery wrote: Categorization typically assigns documents to a node in a pre-defined taxonomy. For clustering, however, the categorization 'structure' is emergent... i.e. the clusters (which are analogous to taxonomy nodes) are created dynamically based on the con

Re: Document Clustering

2003-11-11 Thread petite_abeille
Hi Otis, On Nov 11, 2003, at 16:41, Otis Gospodnetic wrote: How is document clustering different/related to text categorization? Not that I'm an expert in any of this, but clustering is a much more "holistic" approach than categorization. Usually, categorization is understood as a more precise

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 16:05, Marcel Stör wrote: As everybody seems to be so excited about it, would someone please be so kind to explain what "document based clustering" is? This mostly means finding documents which are "similar" in some way(s). The "similitude" is mostly in the eyes of the beholder

Re: The best way forward

2003-11-04 Thread petite_abeille
Hi Dror, On Nov 04, 2003, at 19:33, Dror Matalon wrote: By the way, we're also thinking of integrating newsgroups into RSS aggregator which you can see at www.fastbuzz.com. ZOE does something similar already. It can vend messages as RSS feeds: http://zoe.nu/itstories/story.php?data=stories&num

Re: Relational Search

2003-11-04 Thread petite_abeille
On Nov 04, 2003, at 19:28, Tate Avery wrote: Does anyone have any creative ideas for tackling this problem with Lucene? Perhaps Not sure if this is quite what you are after, but you could take a look at ZOE's SZObject framework. It's built on top of Lucene to provide lightweight ODBMS like fun

Re: The best way forward

2003-11-04 Thread petite_abeille
On Nov 04, 2003, at 13:04, Otis Gospodnetic wrote: Eventually i am going to try to implement something similar to google groups, indexing lots of NNTP traffic. Has anyone done this before with Lucene? Not that I know, but people have used Lucene to index their email, which is somewhat similar. Ver

Re: Exotic format indexing?

2003-10-30 Thread petite_abeille
On Oct 30, 2003, at 20:48, Ben Litchfield wrote: Unfortunately, it is not quite so easy. I am not sure about Word documents The raw text is visible. but PDFs usually have their contents compressed Yep. PDF is really an image format ;) so a raw "fishing" around for text would be pointless. That'

Re: 182 file formats for lucene!!! was: Re: Exotic format indexing?

2003-10-30 Thread petite_abeille
Hi Stefan, On Oct 30, 2003, at 21:02, Stefan Groschupf wrote: just to let you know, i had implement for the nutch project a plugin that can parse 182 file formats including m$ office. I simply use open office and use the available java api. Yes, I saw that. Great work :) Unfortunately, using Op

Exotic format indexing?

2003-10-30 Thread petite_abeille
Hello, Indexing a multitude of esoteric formats (MS Office, PDF, etc) is a popular question on this list... The traditional approach seems to be to try to find some kind of format specific reader to properly extract the textual part of such documents for indexing. The drawback of such an appro

Re: Term out of order.

2003-10-30 Thread petite_abeille
On Oct 30, 2003, at 13:36, Pasha Bizhan wrote: I think that it's problem of java version of Lucene. Because all core algorithms of Lucene and Lucene.Net are identical. Talking of which... it appears... that... something... is... wrong... somewhere... This definitely needs some additional invest

Re: java.nio.channels.FileLock

2003-10-29 Thread petite_abeille
On Oct 29, 2003, at 19:08, Ronald Muller wrote: What is the advantage of using a FileLock object instead of the way Lucene does it? (I do not see it) Less code. Less worries. Also note an important limitation: "File locks are held on behalf of the entire Java virtual machine. They are not suitab
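For reference, a minimal sketch of what a FileLock-based write lock could look like. This is not how Lucene implements its locks; `WriteLockSketch` and `runLocked` are hypothetical names, and the try-with-resources syntax needs a much newer Java than 1.4. Because the lock is held on behalf of the whole JVM, `tryLock()` throws `OverlappingFileLockException` rather than returning null when another thread of the same JVM already holds it, so this only guards against other processes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

/** Hypothetical helper: run a task while holding an exclusive lock on a file. */
public class WriteLockSketch {

    /**
     * Tries to lock lockFile, runs the work if the lock was obtained, then releases it.
     * Returns false without running the work if another process holds the lock.
     */
    public static boolean runLocked(File lockFile, Runnable work) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock();   // non-blocking; null if held elsewhere
            if (lock == null) {
                return false;
            }
            try {
                work.run();
                return true;
            } finally {
                lock.release();                  // always release before closing the channel
            }
        }
    }
}
```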

java.nio.channels.FileLock

2003-10-29 Thread petite_abeille
Hello, Just stumbled upon that: http://java.sun.com/j2se/1.4.1/docs/api/java/nio/channels/FileLock.html Which might be of interest to Lucene if the library ever migrates to 1.4 :) Cheers, PA.

Re: new release: 1.3 RC2

2003-10-22 Thread petite_abeille
Hello, On Wednesday, Oct 22, 2003, at 18:13 Europe/Amsterdam, Doug Cutting wrote: A new Lucene release is available. Very nice. Thanks :) Quick question regarding release note number 11: What's the difference between IndexWriter.addIndexes(IndexReader[]) and IndexWriter.addIndexes(Directory[]

Re: Weird NPE in RAMInputStream when merging indices

2003-10-22 Thread petite_abeille
Hi Otis, On Wednesday, Oct 22, 2003, at 18:06 Europe/Amsterdam, Otis Gospodnetic wrote: Since 'files' is a Hashtable, neither the key nor the value (file) can be null, even though the NPE in RAMInputStream constructor implies that file was null. Yep... pretty weird... but looking at openFile(Str

Weird NPE in RAMInputStream when merging indices

2003-10-21 Thread petite_abeille
Hello, What could cause such weird exception? RAMInputStream.: java.lang.NullPointerException java.lang.NullPointerException at org.apache.lucene.store.RAMInputStream.(RAMDirectory.java:217) at org.apache.lucene.store.RAMDirectory.openFile(RAMDirectory.java:182) at org.apache.lucene.index.FieldIn

[OT] Open Source Goes to COMDEX

2003-10-20 Thread petite_abeille
Hello, This is pretty much off topic, but... ZOE has been nominated as one of the candidate projects to go to the Open Source Innovation Area on the COMDEX Exhibit Floor. http://www.oreillynet.com/contest/comdex/ ZOE is one of the few Java projects short listed and it uses Lucene quite extensively

Index locked for write

2003-10-04 Thread petite_abeille
[Posted to Dev by mistake] [Reposted to User] [Sorry for the mess] Hello, I recently updated from 1.3 RC1 to the latest cvs version. RC1 has proven very reliable for me, but I needed Dmitry compound index functionality. Therefore the move to the cvs version. I have been using 1.3 RC1 without an

Re: which lock belong to which index?

2003-10-02 Thread petite_abeille
Hi Otis, On Thursday, Oct 2, 2003, at 13:56 Europe/Amsterdam, Otis Gospodnetic wrote: I cannot remember the answer I got, but I asked the same question after the code was changed to put locks in java.io.tmpdir. Because I have an application that deals with a lot of indices simultaneously, I felt

which lock belong to which index?

2003-10-02 Thread petite_abeille
Hello, 10/01 11:25:41 (Warning) IndexWriter.: java.io.IOException: Index locked for write: [EMAIL PROTECTED]:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\lucene- 08d0626209019ccc9327ba6fb063c456-write.lock Is there a straightforward way to figure out which lock belong to which index? The lock name see

Re: Design question

2003-09-23 Thread petite_abeille
I, like a lot of other people are new to Lucene. Practical examples are pretty scarce. If you don't mind learning by example, take a look at the "Powered by Lucene" page. A fair number of those projects are open source. http://jakarta.apache.org/lucene/docs/powered.html PA.

Re: Is the lucene index serializable?

2003-09-23 Thread petite_abeille
Can I send a small lucene index by SOAP/TCP/HTTP/RMI? Is there a way to serialize a Lucene Index? I wan to send it from the Indexer server to the Search Server, and then do a merge operation in the Search Server with the previous index file. Well, what about a very old fashioned way instead? Som

Re: StandardTokenizer problem

2003-09-04 Thread petite_abeille
On Thursday, Sep 4, 2003, at 16:07 Europe/Zurich, Nicolas Maisonneuve wrote: "I.B.M" can be a host or acronym, so there is a problem , no ? Perhaps as far as this parser goes... but... in practice... '.M' is not a valid TLD. PA.

Re: Lucene app to index Java code

2003-09-04 Thread petite_abeille
or Objective-C: http://homepage.mac.com/petite_abeille/MagicHat/ But from the sound of what Otis is saying this is not what you guys are looking for... back to the pampa then... Cheers, PA.

Re: Lucene app to index Java code

2003-09-04 Thread petite_abeille
Hi Otis, On Thursday, Sep 4, 2003, Otis Gospodnetic wrote: Has anyone written an application that uses Lucene to index Java code, either from the source .java files, or compiled .class files? If you are talking about my ultra secret project "Zapata: Coding Mexican Style", then yes ;) But... it

Re: IndexReader.delete(Term)?

2003-08-27 Thread petite_abeille
Hi Erik, On Wednesday, Aug 27, 2003, Erik Hatcher wrote: What you are doing looks fine to me. I'm sure these are obvious questions, kinda like "is your computer plugged in?", but here goes: - How are you determining that the document is still there? With an IndexReader? IndexSearcher? - A f

IndexReader.delete(Term)?

2003-08-26 Thread petite_abeille
Hello, This is more a sanity check, than anything else, but... I'm trying to delete a document using IndexReader.delete(Term)... (for the record I'm using 1.3-rc1) The document was created with a Field.Keyword() to uniquely identify it. The document exists, was saved, can be queried, life is g

Advanced Text Indexing with Lucene

2003-03-06 Thread petite_abeille
Another fine article by Otis: http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html PA.

Re: Best HTML Parser !!

2003-02-25 Thread petite_abeille
On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote: I have some good experiences with JTidy. It works like DOM-XML parser and cleans HTML it by the way. I use jtidy also. Both for parsing and clean-up. Works pretty nicely. This is VERY useful, because EVERY HTML have at least

Re: Indexing Tips and Hints

2003-02-25 Thread petite_abeille
On Tuesday, Feb 25, 2003, at 11:48 Europe/Zurich, Andrzej Bialecki wrote: This is strange, or at least counter-intuitive - if you buffer larger parts of data in RAM than the standard implementation does, it should definitely be faster... Let's wait and see what Terry comes up with. BTW. how la

Re: Indexing Tips and Hints

2003-02-25 Thread petite_abeille
On Tuesday, Feb 25, 2003, at 09:43 Europe/Zurich, Andrzej Bialecki wrote: No, I'm not - this is clearly stated in the class javadoc. I meant to try it out in my application, but haven't got to it yet - I need to address first the base functionality, not performance; so, I don't have the modifi

NullPointerException?

2003-01-22 Thread petite_abeille
Hello, I just ran into this exception: java.lang.NullPointerException at org.apache.lucene.store.RAMInputStream.(Unknown Source) at org.apache.lucene.store.RAMDirectory.openFile(Unknown Source) at org.apache.lucene.index.FieldInfos.(Unknown Source) at org.apache.lu

Re: RE : read past EOF?

2003-01-13 Thread petite_abeille
On Sunday, Jan 12, 2003, at 15:43 Europe/Zurich, Rasik Pandey wrote: Are you using a MultiSearcher? No. PA.

Re: read past EOF?

2003-01-08 Thread petite_abeille
On Tuesday, Jan 7, 2003, at 22:46 Europe/Zurich, Doug Cutting wrote: It looks like the .fdx and one of the .f[0-9]* files are out of sync. The .fdx file for each segment should be exactly eight times as long as all of the .f[0-9] files for that segment. This could happen if Lucene's file lock

Bad file descriptor?

2003-01-08 Thread petite_abeille
Hello, Here is another symptom of misbehavior in Lucene: java.io.IOException: Bad file descriptor at java.io.RandomAccessFile.readBytes(Native Method) at java.io.RandomAccessFile.read(RandomAccessFile.java:214) at org.apache.lucene.store.FSInputStream.readInternal(Unknown

read past EOF?

2003-01-07 Thread petite_abeille
Hello, Here is a pretty fatal exception I get from time to time in Lucene... java.io.IOException: read past EOF at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:277) at org.apache.lucene.store.InputStream.readBytes(Unknown Source) at org.apache.luce

Re: Heuristics on searching HTML Documents ?

2002-12-30 Thread petite_abeille
On Monday, Dec 30, 2002, at 15:01 Europe/Zurich, Erik Hatcher wrote: If you have control over the HTML, how about marking the navbar pieces with a certain CSS class and then filtering that out from what you index? It seems like that would be a reasonable way to filter it - but this is of cou

Re: powered by lucene question

2002-12-27 Thread petite_abeille
On Friday, Dec 27, 2002, at 18:22 Europe/Zurich, Otis Gospodnetic wrote: It would be nice to make that Lucene image clickable, which should be a piece of cake, since Zoe uses HTML for rendering the UI. Doable? Well... yes. This is how it works in the application itself: you can click on the L

Re: powered by lucene question

2002-12-26 Thread petite_abeille
Hi Otis, On Sunday, Dec 22, 2002, at 22:50 Europe/Zurich, Otis Gospodnetic wrote: I think this would be a fine place for the Powered by Lucene logo + link for an application such as Zoe. Let me know when you do it, so I can add Zoe to the Powered by Lucene page. Done. It's in the latest versio

[OT] Still lost?

2002-12-24 Thread petite_abeille
Some reading for those long winter nights ;-) "You Are Here" Still lost? A cadre of new companies want to show you the way. http://www.newarchitectmag.com/documents/s=7766/na0103a/index.html Happy Holidays. PA.

Re: package information?

2002-12-20 Thread petite_abeille
On Friday, Dec 20, 2002, at 21:44 Europe/Zurich, Eric Isakson wrote: I think this info is available via the Manifest that is created during the build. This is cut from the build.xml from the latest CVS... Great! I must have overlooked it somehow. Thanks. PA.

package information?

2002-12-20 Thread petite_abeille
Hi, Would it be possible for Lucene to provide package information? Basically all the java.lang.Package attributes... Things like implementation vendor, name, version and so on... This would make it easier to identify which packages/versions are used. Thanks. PA.

powered by lucene question

2002-12-20 Thread petite_abeille
Hello, I'm in the process of creating the "about" page for my app and I was wondering what are the requirements to get included in the "Powered by Lucene" page? The app is a desktop application... it's not a web site. The only requirement I see is "Please include something like the following w

Re: Lucene Benchmarks and Information

2002-12-20 Thread petite_abeille
On Friday, Dec 20, 2002, at 19:58 Europe/Zurich, Scott Ganyo wrote: FYI: The best thing I've found for both increasing speed and reducing file handles is to use an IndexWriter on a RAMDirectory for indexing and then use IndexWriter.addIndexes() to write the result to disk. This is subject to

Re: write.lock file

2002-12-20 Thread petite_abeille
On Friday, Dec 20, 2002, at 19:48 Europe/Zurich, Doug Cutting wrote: Can you provide a reproducible test case that demonstrates index corruption? I honestly wish I could. Unfortunately, because of the nature of the application (Otis is familiar with it), I never seem to be able to come up wi

Re: write.lock file

2002-12-17 Thread petite_abeille
On Tuesday, Dec 17, 2002, at 17:43 Europe/Zurich, Doug Cutting wrote: Index updates are atomic, so it is very unlikely that the index is corrupted, unless the underlying file system itself is corrupted. Ummm... Perhaps in theory... In practice, indexes seem to get corrupted quite easily in m

Re: Indexing in a CBD Environment

2002-12-11 Thread petite_abeille
On Wednesday, Dec 11, 2002, at 15:21 Europe/Zurich, Cohan, Sean wrote: Is there a better way to provide an acceptable searching mechanism using the relational database engine? Well it depend of what you mean by "acceptable"... but if you are using Oracle, you should look into Oracle Text: ht

Re: Indexing in a CBD Environment

2002-12-10 Thread petite_abeille
On Wednesday, Dec 11, 2002, at 07:16 Europe/Zurich, Otis Gospodnetic wrote: It uses Lucene as an object store, of sort, I believe, with various relations between objects (I did not look at the source, but I suspect it does this based on the functionality it offers). Yep. The basic approach ZOE

Re: Indexing email messages?

2002-12-06 Thread petite_abeille
On Friday, Dec 6, 2002, at 11:12 Europe/Zurich, Ashley Collins wrote: I'm using Lucene to index MIME messages and have a couple of questions. You should take a look at ZOE as it does all that and more. It's open source and uses Lucene to index every single bits of email. http://guests.evecto

Re: Readability score?

2002-11-23 Thread petite_abeille
On Friday, Nov 22, 2002, at 20:46 Europe/Zurich, petite_abeille wrote: Does anyone have a handy library to compute "readability score"? Here is an extract from a paper describing the Flesch index and an algorithm to count syllables... Does that make any sense? Thanks. "T
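For what it is worth, here is a small sketch of the commonly cited Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The syllable counter is a crude vowel-group heuristic chosen for this example and is an assumption, not the algorithm from the paper quoted above; `FleschSketch` and its method names are hypothetical.

```java
/** Hypothetical readability helper based on the Flesch Reading Ease formula. */
public class FleschSketch {

    /** Flesch Reading Ease from raw counts; higher scores mean easier text. */
    static double readingEase(int words, int sentences, int syllables) {
        double wordsPerSentence = (double) words / sentences;
        double syllablesPerWord = (double) syllables / words;
        return 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord;
    }

    /**
     * Rough English syllable estimate: count groups of consecutive vowels,
     * drop one for a trailing 'e', and never return less than one.
     */
    static int countSyllables(String word) {
        String w = word.toLowerCase();
        int groups = 0;
        boolean inVowelGroup = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = "aeiouy".indexOf(w.charAt(i)) >= 0;
            if (vowel && !inVowelGroup) {
                groups++;                 // a new vowel group starts here
            }
            inVowelGroup = vowel;
        }
        if (groups > 1 && w.endsWith("e")) {
            groups--;                     // treat a final 'e' as silent
        }
        return Math.max(groups, 1);
    }
}
```

The heuristic will miscount plenty of words, which is usually acceptable since the score is only meaningful as an average over a longer text.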

Readability score?

2002-11-22 Thread petite_abeille
Hello, This is slightly off topic but... Does anyone have a handy library to compute "readability score"? Something like Flesch Reading Ease score & Co: http://thibs.menloschool.org/~djwong/docs/wordReadabilityformulas.html Would you like to share?-) Thanks. R.

Re: Concurency in Lucene

2002-10-16 Thread petite_abeille
On Thursday, Oct 17, 2002, at 00:44 Europe/Zurich, [EMAIL PROTECTED] wrote: > If there > is enough interest I would like to donate this code to Lucene. Please do :-) I ran into exactly the same type of problems and while I seem to have hammered them out I would love to see your take on it. P

[OT] Googling Your Email

2002-10-08 Thread petite_abeille
http://www.oreillynet.com/pub/a/network/2002/10/07/udell.html Powered by Lucene :-)

Re: using lucene as a lookup table?

2002-09-27 Thread petite_abeille
On Friday, Sep 27, 2002, at 13:27 Europe/Zurich, petite_abeille wrote: > - the first field would represent a random lookup key in the form of a > Field.Keyword Ooops... I should have mentioned that the key field is stored as Field( aKey, aValue, false, true, false): eg not stored, indexe
