[OT] Intertwingle

2002-04-23 Thread petite_abeille
Thought you might be interested. http://homepage.mac.com/zoe_info/ powered by lucene ;-) PA -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]

FileNotFoundException: Too many open files

2002-04-26 Thread petite_abeille
Hello, I'm running into this exception quite often while using Lucene (the situation is so bad with the latest rc, that I had to revert to the last com.lucene package). I'm sure I have my fair share of bugs in my app, but nonetheless, how can I control Lucene's usage of RandomAccessFile? The

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille
using a RAMDir as a middle man solved my problems... Thanks. What is your heuristic to flush the RAMDirectory? Also how do you deal with System.exit() or application death? Eg, you are indexing something and the application dies or is killed. Thanks for any input. R.

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille
Thanks. What is your heuristic to flush the RAMDirectory? please explain this because i don't understand english that good :-( That's ok, I don't really understand English either :-) Simply put, when do you flush the RAMDirectory into the FSDirectory? Every five documents? Ten? A thousand?

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille
Hello again, There is no tool to detect index corruption, fixing of indexing, nor index rebuilding. The last one anyone can/has to do on their own. :-( Well, that's *very* sad to say the least... How do I know if my indexes are not corrupted even if everything seems to be working fine? Don't

rc4 and FileNotFoundException: an update

2002-04-26 Thread petite_abeille
Hello again, I guess it's really not my day... Just to make sure I'm not hallucinating too much, I downloaded the latest and greatest: rc4. Changed all the package names to org.apache. Updated a method here and there to reflect the API changes. And ran my little app. I would like to

Re: rc4 and FileNotFoundException: an update

2002-04-26 Thread petite_abeille
Have you posted code that demonstrates this problem? If so I missed it. Thanks for your help. PA.

Re: rc4 and FileNotFoundException: an update

2002-04-27 Thread petite_abeille
Hi Steven, Sounds like a pretty nasty situation. It is... This makes sense - any effort to solve the problem will first involve isolating the bug, and that's a task you're best suited for, since you know your system best. Ok... From what I understand, this situation arises depending on

FileNotFoundException: code example

2002-04-28 Thread petite_abeille
Hello again, attached is the source code of the only class interacting directly with Lucene in my app. Sorry for not providing a complete test case as it's hard for me to come up with something self contained. Maybe there is something that's obviously wrong in what I'm doing. Thanks for any

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread petite_abeille
I don't know what environment you're using Lucene in. However, we had this too many open files problem on our Solaris box, and increasing the number of file descriptors through the ulimit -n command fixed it. Thanks. That should help. However, I have a little desktop app and it will be

Re: too many open files in system

2002-04-29 Thread petite_abeille
On Tuesday, 9. April 2002 14:08, you wrote: root wrote: Doesn't Lucene release the filehandles?? because I get too many open files in system after running lucene a while! Are you closing the readers and writers after you've finished using them? cheers, Chris Yes I close the

Re: FileNotFoundException: code example

2002-04-29 Thread petite_abeille
I would add some logging to the code You lost me here... Where should I add some logging? to get more idea of which Lucene methods are actually being called, when, in what sequence. A typical sequence looks like this: - search() - deleteIndexWithID() - indexValuesWithID() PA

Re: too many open files in system

2002-04-29 Thread petite_abeille
how many open files you think can be used at your process?? Not sure. It varies with usage pattern. I will check it out in any case. cat /proc/sys/fs/file-max cat: /proc/sys/fs/file-max: No such file or directory echo 5 /proc/sys/fs/file-max Unfortunately, I cannot use this kind of

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread petite_abeille
Does this mean you tried it on other OSs and it worked? Yes. Which ones? Win2k SP2 What JDK did those have jre 1.4.0 and what was their ulimit and what is the ulimit on your OSX machine? Just curious. I don't know. Does it matter? PA

Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-29 Thread petite_abeille
First off, thanks to Jagadesh Nandasamy who pointed me in the right direction. It seems that, in my situation, more homogeneous indexes work better than fewer heterogeneous indexes: I have a dozen classes that I'm indexing. They vary from two fields to more than a dozen fields per document (aka

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-30 Thread petite_abeille
On Tuesday, April 30, 2002, at 01:57 AM, Steven J. Owens wrote: Just be glad you aren't doing this on Solaris with JDK 1.1.6 I know... In fact I'm looking forward to porting my stuff to 1.4... As my app is very much IO bound I'm really excited by this nio madness... :-) Yes and no. Setting

Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-05-01 Thread petite_abeille
On Wednesday, May 1, 2002, at 12:41 AM, Dmitry Serebrennikov wrote: - the number of files that Lucene uses depends on the number of segments in the index and the number of *stored* fields - if your fields are not stored but only indexed, they do not require separate files. Otherwise, an

Re: indexing PDF files

2002-05-01 Thread petite_abeille
On Tuesday, April 30, 2002, at 10:46 PM, Otis Gospodnetic wrote: Hm, this should be a FAQ. Maybe it should... ;-) Check Lucene contributions page, there are some starting points there, Well, this seems to be a very popular request... In fact I need something like that also. Unfortunately,

Re: indexing PDF files

2002-05-03 Thread petite_abeille
On Wednesday, May 1, 2002, at 05:41 PM, Otis Gospodnetic wrote: Wouldn't you want to convert to XML instead and use XSLT to transform the XML representation to any desired format by just applying a style sheet? Sounds like less work with bigger document type coverage. Sounds good... But

Re: indexing PDF files

2002-05-03 Thread petite_abeille
On Friday, May 3, 2002, at 03:16 PM, Moturu,Praveen wrote: Can I assume none of the people on the lucene user group had implemented indexing a pdf document using lucene. Who knows...?!? In any case, it's not public knowledge... If some one has.. Please help me by providing the solution.

[OT] An Open Letter

2002-05-27 Thread petite_abeille
FYI. Begin forwarded message: From: Alex Horovitz [EMAIL PROTECTED] Date: Mon May 27, 2002 01:58:27 PM Europe/Zurich To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Cc: Steve Jobs [EMAIL PROTECTED], [EMAIL PROTECTED], toni Trujillo-Vian [EMAIL PROTECTED],

source code available

2002-05-27 Thread petite_abeille
For entertainment purposes only, ZOË's source code is available at: http://guests.evectors.it/zoe/ PA.

[OT] Zoe open source

2002-06-03 Thread petite_abeille
Hello, I'm releasing Zoe under the Apple Public Source License and putting together a SourceForge project to coordinate the future development of Zoe. Our plan is to choose a handful of experienced developers to form the core development team for Zoe. Anyone is free to contribute code which

Re: [OT] Zoe open source

2002-06-03 Thread petite_abeille
On Monday, June 3, 2002, at 04:44 PM, Peter Carlson wrote: Good luck with your project. It looks very exciting and refreshing. I haven't tried it yet, but the screen shots look useful and beautiful. Thanks. I hope that you will stay active in the Lucene user community and contribute

Lucene for OSX?

2002-07-16 Thread petite_abeille
Hello, I was wondering if anybody knows of a Lucene port to straight C or Objective C...?!? I need something equivalent to Lucene (but native if possible) on Mac OS X... Thanks for any pointers!-) PA.

Re: Lucene for OSX?

2002-07-16 Thread petite_abeille
On Tuesday, July 16, 2002, at 03:41 , Otis Gospodnetic wrote: The only thing that I can think of right now is omseek on sf.net, but that project seems somewhat dead. I think that is in C or C++. Thanks. I also found something called Onix (http://www.lextek.com/onix/) Anybody have any

Re: Lucene for OSX?

2002-07-16 Thread petite_abeille
Hi James, On Tuesday, July 16, 2002, at 03:52 , Brook, James wrote: How about this? I think it's what they use for Sherlock. Apple Information Access Toolkit (AIAT) http://www.devworld.apple.com/dev/aiat/ Well, that's basically the first incarnation of Lucene :-) And in fact I was

Re: Lucene for OSX?

2002-07-16 Thread petite_abeille
On Tuesday, July 16, 2002, at 04:04 , Brook, James wrote: It looks like it's available for FTP download as an 'SDK' on this page http://developer.apple.com/sdk/ I have no idea whether this is up-to-date or compatible with the latest OS X. Thanks. I will take a look into it. Cheers,

text format and scoring

2002-08-02 Thread petite_abeille
Hello, I was wondering what would be a good way to incorporate text format information in Lucene word/document scoring. For example, when turning HTML into plain text for indexing purposes, a lot of potentially useful information is lost: eg tags like bold, strong and so on could be

Re: text format and scoring

2002-08-03 Thread petite_abeille
Hi Alex, On Saturday, August 3, 2002, at 11:13 , Alex Murzaku wrote: Hi PA! How are things going? Doing all right :-) It's an interesting question but I don't think Lucene (as it is today) could change weights based on semantics (either assigned by formatting tags or maybe looked up in

using lucene as a lookup table?

2002-09-27 Thread petite_abeille
Hello, I would like to use Lucene as a kind of lookup table (aka Map): A document would have two fields: - the first field would represent a random lookup key in the form of a Field.Keyword - the second field would be an object id also stored as a Field.Keyword Which sounds fine in theory.

Re: using lucene as a lookup table?

2002-09-27 Thread petite_abeille
On Friday, Sep 27, 2002, at 13:27 Europe/Zurich, petite_abeille wrote: - the first field would represent a random lookup key in the form of a Field.Keyword Ooops... I should have mentioned that the key field is stored as Field( aKey, aValue, false, true, false): eg not stored, indexed

Readability score?

2002-11-22 Thread petite_abeille
Hello, This is slightly off topic but... Does anyone have a handy library to compute a readability score? Something like Flesch Reading Ease score Co: http://thibs.menloschool.org/~djwong/docs/wordReadabilityformulas.html Would you like to share?-) Thanks. R.

Re: Readability score?

2002-11-23 Thread petite_abeille
On Friday, Nov 22, 2002, at 20:46 Europe/Zurich, petite_abeille wrote: Does anyone have a handy library to compute readability score? Here is an extract from a paper describing the Flesch index and an algorithm to count syllables... Does that make any sense? Thanks. The Flesch index
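For reference, the Flesch Reading Ease score is commonly stated as 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). Below is a minimal Java sketch of that formula. The class name `Flesch` is hypothetical, and the vowel-run syllable counter is a crude assumption made for illustration, not the algorithm from the paper quoted in the message.

```java
final class Flesch {
    // Flesch Reading Ease computed from raw counts.
    static double score(int words, int sentences, int syllables) {
        return 206.835
                - 1.015 * ((double) words / sentences)
                - 84.6 * ((double) syllables / words);
    }

    // Crude heuristic (an assumption, not the paper's algorithm):
    // one syllable per run of consecutive vowels, at least one per word.
    static int syllables(String word) {
        int count = 0;
        boolean inVowelRun = false;
        for (char c : word.toLowerCase().toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !inVowelRun) {
                count++;
            }
            inVowelRun = vowel;
        }
        return Math.max(count, 1);
    }
}
```

Higher scores mean easier text. The heuristic miscounts silent vowels (a trailing "e", for instance), so a real implementation would probably need exception rules or a dictionary.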

Re: Indexing email messages?

2002-12-06 Thread petite_abeille
On Friday, Dec 6, 2002, at 11:12 Europe/Zurich, Ashley Collins wrote: I'm using Lucene to index MIME messages and have a couple of questions. You should take a look at ZOE as it does all that and more. It's open source and uses Lucene to index every single bit of email.

Re: Indexing in a CBD Environment

2002-12-10 Thread petite_abeille
On Wednesday, Dec 11, 2002, at 07:16 Europe/Zurich, Otis Gospodnetic wrote: It uses Lucene as an object store, of sorts, I believe, with various relations between objects (I did not look at the source, but I suspect it does this based on the functionality it offers). Yep. The basic approach

Re: Indexing in a CBD Environment

2002-12-11 Thread petite_abeille
On Wednesday, Dec 11, 2002, at 15:21 Europe/Zurich, Cohan, Sean wrote: Is there a better way to provide an acceptable searching mechanism using the relational database engine? Well it depends on what you mean by acceptable... but if you are using Oracle, you should look into Oracle Text:

Re: write.lock file

2002-12-17 Thread petite_abeille
On Tuesday, Dec 17, 2002, at 17:43 Europe/Zurich, Doug Cutting wrote: Index updates are atomic, so it is very unlikely that the index is corrupted, unless the underlying file system itself is corrupted. Ummm... Perhaps in theory... In practice, indexes seem to get corrupted quite easily in

Re: write.lock file

2002-12-20 Thread petite_abeille
On Friday, Dec 20, 2002, at 19:48 Europe/Zurich, Doug Cutting wrote: Can you provide a reproducible test case that demonstrates index corruption? I honestly wish I could. Unfortunately, because of the nature of the application (Otis is familiar with it), I never seem to be able to come up

powered by lucene question

2002-12-20 Thread petite_abeille
Hello, I'm in the process of creating the about page for my app and I was wondering what are the requirements to get included in the Powered by Lucene page? The app is a desktop application... it's not a web site. The only requirement I see is Please include something like the following with

package information?

2002-12-20 Thread petite_abeille
Hi, Would it be possible for Lucene to provide package information? Basically all the java.lang.Package attributes... Things like implementation vendor, name, version and so on... This would make it easier to identify which packages/versions are used. Thanks. PA.

Re: package information?

2002-12-20 Thread petite_abeille
On Friday, Dec 20, 2002, at 21:44 Europe/Zurich, Eric Isakson wrote: I think this info is available via the Manifest that is created during the build. This is cut from the build.xml from the latest CVS... Great! I must have overlooked it somehow. Thanks. PA.

Re: powered by lucene question

2002-12-27 Thread petite_abeille
On Friday, Dec 27, 2002, at 18:22 Europe/Zurich, Otis Gospodnetic wrote: It would be nice to make that Lucene image clickable, which should be a piece of cake, since Zoe uses HTML for rendering the UI. Doable? Well... yes. This is how it works in the application itself: you can click on the

Re: Heuristics on searching HTML Documents ?

2002-12-30 Thread petite_abeille
On Monday, Dec 30, 2002, at 15:01 Europe/Zurich, Erik Hatcher wrote: If you have control over the HTML, how about marking the navbar pieces with a certain CSS class and then filtering that out from what you index? It seems like that would be a reasonable way to filter it - but this is of

read past EOF?

2003-01-07 Thread petite_abeille
Hello, Here is a pretty fatal exception I get from time to time in Lucene... java.io.IOException: read past EOF at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:277) at org.apache.lucene.store.InputStream.readBytes(Unknown Source) at

Re: Indexing Tips and Hints

2003-02-25 Thread petite_abeille
On Tuesday, Feb 25, 2003, at 11:48 Europe/Zurich, Andrzej Bialecki wrote: This is strange, or at least counter-intuitive - if you buffer larger parts of data in RAM than the standard implementation does, it should definitely be faster... Let's wait and see what Terry comes up with. BTW. how

Re: Best HTML Parser !!

2003-02-25 Thread petite_abeille
On Monday, Feb 24, 2003, at 20:28 Europe/Zurich, Lukas Zapletal wrote: I have some good experiences with JTidy. It works like DOM-XML parser and cleans HTML it by the way. I use jtidy also. Both for parsing and clean-up. Works pretty nicely. This is VERY useful, because EVERY HTML have at least

Advanced Text Indexing with Lucene

2003-03-06 Thread petite_abeille
Another fine article by Otis: http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html PA.

IndexReader.delete(Term)?

2003-08-26 Thread petite_abeille
Hello, This is more a sanity check, than anything else, but... I'm trying to delete a document using IndexReader.delete(Term)... (for the record I'm using 1.3-rc1) The document was created with a Field.Keyword() to uniquely identify it. The document exists, was saved, can be queried, life is

Re: IndexReader.delete(Term)?

2003-08-27 Thread petite_abeille
Hi Erik, On Wednesday, Aug 27, 2003, Erik Hatcher wrote: What you are doing looks fine to me. I'm sure these are obvious questions, kinda like is your computer plugged in?, but here goes: - How are you determining that the document is still there? With an IndexReader? IndexSearcher? - A

Re: Lucene app to index Java code

2003-09-04 Thread petite_abeille
Hi Otis, On Thursday, Sep 4, 2003, Otis Gospodnetic wrote: Has anyone written an application that uses Lucene to index Java code, either from the source .java files, or compiled .class files? If you are talking about my ultra secret project Zapata: Coding Mexican Style, then yes ;) But... it

Re: Lucene app to index Java code

2003-09-04 Thread petite_abeille
: http://homepage.mac.com/petite_abeille/MagicHat/ But from the sound of what Otis is saying this is not what you guys are looking for... back to the pampa then... Cheers, PA.

Re: StandardTokenizer problem

2003-09-04 Thread petite_abeille
On Thursday, Sep 4, 2003, at 16:07 Europe/Zurich, Nicolas Maisonneuve wrote: I.B.M can be a host or acronym, so there is a problem , no ? Perhaps as far as this parser goes... but... in practice... '.M' is not a valid TLD. PA.

Re: Is the lucene index serializable?

2003-09-23 Thread petite_abeille
Can I send a small lucene index by SOAP/TCP/HTTP/RMI? Is there a way to serialize a Lucene Index? I want to send it from the Indexer server to the Search Server, and then do a merge operation in the Search Server with the previous index file. Well, what about a very old fashioned way instead?

Re: Design question

2003-09-23 Thread petite_abeille
I, like a lot of other people are new to Lucene. Practical examples are pretty scarce. If you don't mind learning by example, take a look at the Powered by Lucene page. A fair number of those projects are open source. http://jakarta.apache.org/lucene/docs/powered.html PA.

Re: which lock belong to which index?

2003-10-02 Thread petite_abeille
Hi Otis, On Thursday, Oct 2, 2003, at 13:56 Europe/Amsterdam, Otis Gospodnetic wrote: I cannot remember the answer I got, but I asked the same question after the code was changed to put locks in java.io.tmpdir. Because I have an application that deals with a lot of indices simultaneously, I

Index locked for write

2003-10-04 Thread petite_abeille
[Posted to Dev by mistake] [Reposted to User] [Sorry for the mess] Hello, I recently updated from 1.3 RC1 to the latest cvs version. RC1 has proven very reliable for me, but I needed Dmitry's compound index functionality. Therefore the move to the cvs version. I have been using 1.3 RC1 without

[OT] Open Source Goes to COMDEX

2003-10-20 Thread petite_abeille
Hello, This is pretty much off topic, but... ZOE has been nominated as one of the candidate projects to go to the Open Source Innovation Area on the COMDEX Exhibit Floor. http://www.oreillynet.com/contest/comdex/ ZOE is one of the few Java projects shortlisted and it uses Lucene quite

Weird NPE in RAMInputStream when merging indices

2003-10-21 Thread petite_abeille
Hello, What could cause such a weird exception? RAMInputStream.init: java.lang.NullPointerException java.lang.NullPointerException at org.apache.lucene.store.RAMInputStream.init(RAMDirectory.java:217) at org.apache.lucene.store.RAMDirectory.openFile(RAMDirectory.java:182) at

Re: Weird NPE in RAMInputStream when merging indices

2003-10-22 Thread petite_abeille
Hi Otis, On Wednesday, Oct 22, 2003, at 18:06 Europe/Amsterdam, Otis Gospodnetic wrote: Since 'files' is a Hashtable, neither the key nor the value (file) can be null, even though the NPE in RAMInputStream constructor implies that file was null. Yep... pretty weird... but looking at

Re: new release: 1.3 RC2

2003-10-22 Thread petite_abeille
Hello, On Wednesday, Oct 22, 2003, at 18:13 Europe/Amsterdam, Doug Cutting wrote: A new Lucene release is available. Very nice. Thanks :) Quick question regarding release note number 11: What's the difference between IndexWriter.addIndexes(IndexReader[]) and

Re: java.nio.channels.FileLock

2003-10-29 Thread petite_abeille
On Oct 29, 2003, at 19:08, Ronald Muller wrote: What is the advantage of using a FileLock object instead of the way Lucene does it? (I do not see it) Less code. Less worries. Also note an important limitation: File locks are held on behalf of the entire Java virtual machine. They are not
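As a rough illustration of the java.nio.channels.FileLock approach under discussion, here is a minimal sketch in modern Java. The class name `LockDemo` and the method `runExclusively` are hypothetical, and this is not how Lucene implements its own write.lock. It only shows one process-level lock guarding a task.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

final class LockDemo {
    // Runs the task while holding an exclusive lock on lockFile.
    // Returns false if another process already holds the lock.
    static boolean runExclusively(File lockFile, Runnable task) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            // tryLock() returns null when another process holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                task.run();
                return true;
            } finally {
                lock.release();
            }
        }
    }
}
```

Because the lock belongs to the whole virtual machine, a second attempt from another thread in the same JVM does not get a null back. It fails with an OverlappingFileLockException instead, so threads within one process still need ordinary in-memory synchronization.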

Re: Term out of order.

2003-10-30 Thread petite_abeille
On Oct 30, 2003, at 13:36, Pasha Bizhan wrote: I think that it's problem of java version of Lucene. Because all core algorithms of Lucene and Lucene.Net are identical. Talking of which... it appears... that... something... is... wrong... somewhere... This definitely needs some additional

Exotic format indexing?

2003-10-30 Thread petite_abeille
Hello, Indexing a multitude of esoteric formats (MS Office, PDF, etc) is a popular question on this list... The traditional approach seems to be to try to find some kind of format specific reader to properly extract the textual part of such documents for indexing. The drawback of such an

Re: 182 file formats for lucene!!! was: Re: Exotic format indexing?

2003-10-30 Thread petite_abeille
Hi Stefan, On Oct 30, 2003, at 21:02, Stefan Groschupf wrote: just to let you know, i had implement for the nutch project a plugin that can parse 182 file formats including m$ office. I simply use open office and use the available java api. Yes, I saw that. Great work :) Unfortunately, using

Re: Exotic format indexing?

2003-10-30 Thread petite_abeille
On Oct 30, 2003, at 20:48, Ben Litchfield wrote: Unfortunately, it is not quite so easy. I am not sure about Word documents The raw text is visible. but PDFs usually have their contents compressed Yep. PDF is really an image format ;) so a raw fishing around for text would be pointless. That's

Re: The best way forward

2003-11-04 Thread petite_abeille
On Nov 04, 2003, at 13:04, Otis Gospodnetic wrote: Eventually i am going to try to implement something similar to google groups, indexing lots of NNTP traffic. Has anyone done this before with Lucene? Not that I know, but people have used Lucene to index their email, which is somewhat similar.

Re: Relational Search

2003-11-04 Thread petite_abeille
On Nov 04, 2003, at 19:28, Tate Avery wrote: Does anyone have any creative ideas for tackling this problem with Lucene? Perhaps Not sure if this is quite what you are after, but you could take a look at ZOE's SZObject framework. It's built on top of Lucene to provide lightweight ODBMS like

Re: The best way forward

2003-11-04 Thread petite_abeille
Hi Dror, On Nov 04, 2003, at 19:33, Dror Matalon wrote: By the way, we're also thinking of integrating newsgroups into RSS aggregator which you can see at www.fastbuzz.com. ZOE does something similar already. It can vend messages as RSS feeds:

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 16:05, Marcel Stör wrote: As everybody seems to be so excited about it, would someone please be so kind to explain what document based clustering is? This mostly means finding documents which are similar in some way(s). The similitude is mostly in the eyes of the beholder. In

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 16:58, Tate Avery wrote: Categorization typically assigns documents to a node in a pre-defined taxonomy. For clustering, however, the categorization 'structure' is emergent... i.e. the clusters (which are analogous to taxonomy nodes) are created dynamically based on the

Re: Document Clustering

2003-11-11 Thread petite_abeille
On Nov 11, 2003, at 21:32, maurits van wijland wrote: There is the carrot project : http://www.cs.put.poznan.pl/dweiss/carrot/ Leo Galambos, author of the Egothor project, constantly supports us with fresh ideas and includes Carrot components in his own project!

Re: Overview to Lucene

2003-11-12 Thread petite_abeille
Hi Ralf, On Nov 12, 2003, at 14:06, [EMAIL PROTECTED] wrote: Does anybody know good articles which demonstrate parts of that or give a good start into Lucene? Otis Gospodnetic's articles are a good starting point: Introduction to Text Indexing with Apache Jakarta Lucene

Re: fuzzy searches

2003-11-13 Thread petite_abeille
On Nov 11, 2003, at 21:02, Bruce Ritchie wrote: Just a note the LSI is encumbered by US patents 4,839,853 and 5,301,109. It would be wise to make sure that any implementation is either blessed by the patent holders or does not infringe on the patents. Since when did developers turn into

Re: Objection to using /tmp for lock files.

2003-11-13 Thread petite_abeille
On Nov 13, 2003, at 19:00, Dror Matalon wrote: I've been experimenting with it and it seems to work as advertised. It has the advantage of not requiring *any* write capability in /tmp or anywhere else. There is a system property to turn off the lock files altogether. PA.

Re: Query Filters on term A in query A AND (B OR C OR D)

2003-11-13 Thread petite_abeille
On Nov 13, 2003, at 22:32, Jie Yang wrote: I am trying to optimise the 500 OR terms so that it does not do a full 2 millions docs search but on the 1000 returned. Would it be beneficial to move the first result set into its own (transient) index to perform the second part of your query? PA.

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 19:50, Chong, Herb wrote: if you are handling inter correlation properly, then terms can't cross sentence boundaries. Could you not break down your document along sentence boundaries? If you manage to figure out what a sentence is, that is. if you are not paying attention to

Re: Vector Space Model in Lucene?

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 20:27, Dror Matalon wrote: I might be the only person on the list who's having a hard time following this discussion. Nope. I don't understand a word of what those guys are talking about either :) Would one of you wise folks care to point me to a good dummies, also known as

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 20:29, Philippe Laflamme wrote: Rules of linguistics? Is there such a thing? :) Actually, yes there is. Natural Language Processing (NLP) is a very broad research subject but a lot has come out of it. A lot of what? If statements? :) More specifically, Rule-based taggers

Re: Vector Space Model in Lucene?

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 21:16, Chong, Herb wrote: if you know what TREC is, you know what i meant earlier. this isn't exotic technology, this is close to 15 year old technology. This is not really what I asked. What I would be interested to know is what approach you consider to provide the biggest

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-14 Thread petite_abeille
On Nov 14, 2003, at 21:14, Philippe Laflamme wrote: Rules of linguistics? Is there such a thing? :) Actually, yes there is. Natural Language Processing (NLP) is a very broad research subject but a lot has come out of it. A lot of what? If statements? :) Yes... just like every software boils down

Re: Document ID's and duplicates

2003-11-19 Thread petite_abeille
On Nov 19, 2003, at 18:14, Don Kaiser wrote: If you do this will the old version of the document be replaced by the new one? No. They will coexist. In Lucene, an update implies a delete/insert sequence. PA.
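The delete/insert point can be illustrated with a toy in-memory model. This is a sketch only: `ToyIndex` and its methods are hypothetical stand-ins, not Lucene classes, and real code would go through Lucene's own reader and writer APIs. The sketch just shows that adding under an existing id leaves two documents unless the old one is deleted first.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyIndex {
    // Each "document" is just {id, body}; a stand-in for an indexed document.
    private final List<String[]> docs = new ArrayList<>();

    // Adding never replaces: a second add with the same id yields two documents.
    void add(String id, String body) {
        docs.add(new String[] {id, body});
    }

    // Deletes every document whose id matches; returns how many were removed.
    int delete(String id) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(id));
        return before - docs.size();
    }

    // An update is spelled out as a delete followed by an add.
    void update(String id, String body) {
        delete(id);
        add(id, body);
    }

    // Counts the documents currently stored under the given id.
    int count(String id) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(id)) {
                n++;
            }
        }
        return n;
    }
}
```

Calling `add("42", ...)` twice leaves `count("42")` at 2, while `update("42", ...)` brings it back to 1. That mirrors the "they will coexist" behaviour described in the reply.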

moving documents from one index to another?

2003-11-20 Thread petite_abeille
Hello, I'm trying to move a Document from one Index to another, without necessarily reindexing it... The Document is composed of one Field.Keyword and a bunch of Field.UnStored. Reading such a Document from one index and then adding it to another one doesn't seem to have the expected effect

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 13:45, Eric Jain wrote: If the document contains unstored fields, the only way to reconstruct the document is by iterating through all terms in the index and picking out those that reference the document. Hmmm... how would you do that? Something along the lines of

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:13, Eric Jain wrote: That's what I had in mind, but maybe there is better way. Once all terms are collected, they can be reassembled into a new document that can then be indexed again. I see. Assuming I have the relevant terms for a given document, how would I build

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:34, Eric Jain wrote: I believe a term always contains its own text. (It must be somewhere, after all...) Documents on the other hand may or may not contain the original text, depending on whether a field is stored or not. This seems to be the case: the term's text holds the

Re: moving documents from one index to another?

2003-11-20 Thread petite_abeille
On Nov 20, 2003, at 14:34, Eric Jain wrote: I see. Assuming I have the relevant terms for a given document, how would I build a new document based on those terms? Something like adding each term's field and text to the new document? Yes. Ok. Retrieving the term for a document turns out to be

[OT] Digital Format-Specific Validation

2003-12-06 Thread petite_abeille
http://hul.harvard.edu/jhove/ Might be of interest to some :) Cheers, PA.

[OT] Re: Need Advices and Help

2004-02-05 Thread petite_abeille
On Feb 05, 2004, at 13:01, Otis Gospodnetic wrote: I believe it would be the value of a 'Message-ID' or 'Reference' or 'Reference-ID' message header. However, I remember reading that mail readers are not very good at sticking to a standard (some RFC, I guess), so they don't always provide the

Re: Index advice...

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 14:03, Scott ganyo wrote: I have. While document.add() itself doesn't increase over time, the merge does. Ways of partially overcoming this include increasing the mergeFactor (but this will increase the number of file handles used), or building blocks of the index in

Re: index: how to store binary data or objects ?

2004-02-10 Thread petite_abeille
On Feb 10, 2004, at 14:53, Markus Brosch wrote: My application will deal with small data sets. The problem is, that I want to index the content (String) of some objects. I want to refer to that object once I found this by a keyword or whatever. So, using a simple map or tree? Something along

Re: Did you mean...

2004-02-12 Thread petite_abeille
On Feb 12, 2004, at 16:42, Abhay Saswade wrote: How about creating spellcheck dictionary with all words in lucene index? That way you ensure that the word really exists in the index. You can indeed use the terms identified by Lucene as the dictionary words and apply traditional spell checking

index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread petite_abeille
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote: He mentioned that I might be able to squeeze 5-10% out of index merges this way. Talking of which... what strategy(ies) do people use to minimize downtime when updating an index? My current strategy is as follows: (1) use a temporary

Re: Lucene and MVC (was Re: Bad file descriptor (IOException) using SearchBean contribution)

2004-05-19 Thread petite_abeille
On May 20, 2004, at 04:38, Erik Hatcher wrote: OffTopic: havoc and Struts go well together ;) Pick up Tapestry instead! Nah. Keep it really Simple [1] instead :o) http://simpleweb.sourceforge.net/ PA.

alternative query syntax?

2004-08-31 Thread petite_abeille
Hello, I would like to provide an alternative query syntax for ranges by using a colon (':') or two dots ('..') instead of ' TO '. For example: mod_date:[20020101:20030101] Or mod_date:[20020101..20030101] What would be the correct procedure to modify the QueryParser to achieve this? Should I
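One way to get this effect without touching the QueryParser grammar is to rewrite the query string before parsing it. The sketch below takes that swapped-in approach, which is not the grammar change the question asks about. The class name `RangeSyntax` and method `normalize` are hypothetical, and the pattern assumes both range bounds are plain word characters such as the dates in the example.

```java
import java.util.regex.Pattern;

final class RangeSyntax {
    // Matches [lower:upper] or [lower..upper]; both bounds must be word characters.
    private static final Pattern ALT_RANGE =
            Pattern.compile("\\[(\\w+)(?:\\.\\.|:)(\\w+)\\]");

    // Rewrites the alternative separators to the standard " TO " form.
    static String normalize(String query) {
        return ALT_RANGE.matcher(query).replaceAll("[$1 TO $2]");
    }
}
```

The normalized string can then be handed to the stock parser. A query that already uses ' TO ' passes through unchanged, since the pattern does not match bounds separated by spaces. Quoted or otherwise exotic bounds are not handled by this sketch.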

Re: indexing size

2004-08-31 Thread petite_abeille
On Aug 31, 2004, at 17:17, Otis Gospodnetic wrote: You also have a large number of fields, and it looks like a lot (all?) of them are stored and indexed. That's what that large .fdt file indicated. That file is 206 MB in size. Try using Field.UnStored() to avoid storing all those data in your

Re: indexing size

2004-09-01 Thread petite_abeille
Hi Niraj, On Sep 01, 2004, at 06:45, Niraj Alok wrote: If I make some of them Field.Unstored, I can see from the javadocs that it will be indexed and tokenized but not stored. If it is not stored, how can I use it while searching? The different types of fields don't impact how you do your

Re: Encrypted indexes

2004-10-13 Thread petite_abeille
On Oct 13, 2004, at 15:26, Nader Henein wrote: Well, are you storing any data for retrieval from the index, because you could encrypt the actual data and then encrypt the search string public key style. Alternatively, write your index to an encrypted volume... something along the lines of

Re: Google Desktop Could be Better

2004-10-15 Thread petite_abeille
On Oct 15, 2004, at 16:10, Tom Cunningham wrote: I'd be interested in trying to implement some of these ideas on Mac OS X, mostly because it's not already covered by Google Desktop, and I think the screensaver idea would work pretty well there. Anyone else want to give this a shot? Google

[OT] Re: Lots Of Interest in Lucene Desktop

2004-10-29 Thread petite_abeille
On Oct 28, 2004, at 20:26, Kevin A. Burton wrote: http://www.peerfear.org/rss/permalink/2004/10/28/LotsOfInterestInLuceneDesktop/ Many people, few ideas :) http://www.popsearch.net/index.html PA.
