Re: which lock belongs to which index?

2003-10-02 Thread Otis Gospodnetic
I cannot remember the answer I got, but I asked the same question after the code was changed to put locks in java.io.tmpdir. Because I have an application that deals with a lot of indices simultaneously, I felt like this will make things more difficult in cases where you have stale locks, etc. Try

Re: German stemming algorithm problem

2003-10-05 Thread Otis Gospodnetic
I do not know enough about the German stemmer included with Lucene, but I can suggest that you look at the Snowball stemmers. Take a look at the Lucene Sandbox (link on Lucene's home page) to see how they can be used with Lucene. Otis --- Marius Seiceanu [EMAIL PROTECTED] wrote: Hello!

GermanAnalyzer.java GermanStemmer.java

2003-10-09 Thread Otis Gospodnetic
as they are currently. Otis --- Erik Hatcher [EMAIL PROTECTED] wrote: It seems to be the issue mentioned here as well: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=18410 On Wednesday, October 8, 2003, at 09:41 PM, Otis Gospodnetic wrote: Answer to question comment: possibly

Re: Builder.com article on Lucene

2003-10-13 Thread Otis Gospodnetic
Uh, this message was flagged nicely in my Lucene folder, but I just got to it now. The link should show up in the 'Articles, etc.' section on Lucene's pages next time they are refreshed. Otis --- Jeff Linwood [EMAIL PROTECTED] wrote: Hi, I wrote a short introductory article about Lucene on

Re: DBDirectory available for download

2003-10-13 Thread Otis Gospodnetic
Thank you, I added this to our 'patch queue'. If you have a more recent version, feel free to attach it to the enhancement report that you received in email. Otis --- Anthony Eden [EMAIL PROTECTED] wrote: Version 1.0 of the DBDirectory library, which implements a Directory which can store

Re: A strange Error--IllegalAccessError: tried to access field org.apache.lucene.search.IndexSearcher.reader

2003-10-15 Thread Otis Gospodnetic
Please note that the class that caused the error, org.apache.lucene.search.IndexOrderSearcher, is not really a Lucene class. You got that class from http://sf.net/projects/weblucene, most likely. Otis --- lhelper [EMAIL PROTECTED] wrote: Hi. I get a strange problem with my web application

RE: Lucene on Windows

2003-10-20 Thread Otis Gospodnetic
The CVS version of Lucene has a patch that allows one to use a 'Compound Index' instead of the traditional one. This reduces the number of open files. For more info, see/make the Javadocs for IndexWriter. Otis --- Tate Avery [EMAIL PROTECTED] wrote: You might have trouble with too many open

Re: Lucene on Windows

2003-10-21 Thread Otis Gospodnetic
A very rough and simple 'add a single document to the index' test shows that the Compound Index is marginally slower than the traditional one. I did not test searching. Otis --- Eric Jain [EMAIL PROTECTED] wrote: The CVS version of Lucene has a patch that allows one to use a 'Compound Index'

Re: Weird NPE in RAMInputStream when merging indices

2003-10-22 Thread Otis Gospodnetic
Hm, beats me. The code in question seems to be: public RAMInputStream(RAMFile f) { file = f; length = file.length; } ...which is called from: /** Returns a stream reading an existing file. */ public final InputStream openFile(String name) { RAMFile file =

Re: positional token info

2003-10-22 Thread Otis Gospodnetic
Then we agree, and it is StopFilter that needs to be patched to take into account the number of removed terms, and add appropriate positional info to each term. Otis --- Erik Hatcher [EMAIL PROTECTED] wrote: On Tuesday, October 21, 2003, at 07:31 PM, Otis Gospodnetic wrote: So phone boy
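
The positional idea in this message can be sketched without Lucene. The class below is a hypothetical illustration (the names `PositionAwareStopFilterSketch` and `Tok` are made up here and are not Lucene's StopFilter or Token). It assumes that each kept term should record how many original positions separate it from the previously kept term, so that removed stop words still leave gaps:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not Lucene's StopFilter. */
final class PositionAwareStopFilterSketch {

    /** A kept term plus its distance, in original token positions, from the previously kept term. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Drops stop words, but adds the number of dropped words to the next kept term's increment. */
    static List<Tok> filter(List<String> tokens, Set<String> stopWords) {
        List<Tok> kept = new ArrayList<Tok>();
        int skipped = 0; // stop words removed since the last kept term
        for (String t : tokens) {
            if (stopWords.contains(t)) {
                skipped++;
            } else {
                kept.add(new Tok(t, 1 + skipped));
                skipped = 0;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "of"));
        System.out.println(filter(Arrays.asList("the", "lord", "of", "the", "rings"), stop));
    }
}
```

With increments recorded this way, a phrase match could tell that two kept terms were not adjacent in the original text.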

Re: Weird NPE in RAMInputStream when merging indices

2003-10-22 Thread Otis Gospodnetic
, at 18:06 Europe/Amsterdam, Otis Gospodnetic wrote: Since 'files' is a Hashtable, neither the key nor the value (file) can be null, even though the NPE in RAMInputStream constructor implies that file was null. Yep... pretty weird... but looking at openFile(String name)... could

Re: Best practice

2003-10-28 Thread Otis Gospodnetic
--- William W [EMAIL PROTECTED] wrote: Hi Erik, Why don't you write a book about Lucene ? : ) Maybe he already is writing it. :) Regarding your original question, I don't think anyone will be able to answer it, as it is quite general. I suggest you describe pieces of your code that concern

Re: Performing subqueries on a result

2003-10-28 Thread Otis Gospodnetic
I've seen people just add an additional BooleanClause and join it with the original query using an AND. Otis --- Stephan Melchior [EMAIL PROTECTED] wrote: Hi, I'm new with Lucene and need help, My Problem: I successfully performed a query via hits = searcher.search(query); Now i want to

A book about Lucene coming soon to a book store near you

2003-10-28 Thread Otis Gospodnetic
Hello, Erik Hatcher and I are in the process of writing a book about Lucene. Among other things, we would like to include 'Lucene Patterns' / 'Lucene Best Practices' type of material in the book. If you feel that you have observed and/or implemented Lucene usage patterns that look like they

Re: Term out of order.

2003-10-29 Thread Otis Gospodnetic
Apparently so :( http://www.google.com/search?q=lucene+%22term+out+of+order%22 Otis --- Victor Hadianto [EMAIL PROTECTED] wrote: Hi all, I'm using Lucene.Net but seems appropriate to post here as well. I have been getting this exception Term out of order every now and then while doing a

Re: lucene indexing and searching engine performance

2003-10-30 Thread Otis Gospodnetic
Look at the Benchmarks page on Lucene's site. It is not complete (heh, it can never be complete), but it will give you some ideas about Lucene's performance. Feel free to submit your benchmarks, using this template: http://jakarta.apache.org/lucene/docs/benchmarktemplate.xml Thank you, Otis

Re: Term out of order.

2003-10-30 Thread Otis Gospodnetic
.) Is this a problem or not? Regards, Terry Steichen - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, October 29, 2003 7:09 PM Subject: Re: Term out of order. Apparently so :( http://www.google.com/search?q

Re: MultiFieldQueryParser default operator

2003-10-30 Thread Otis Gospodnetic
I believe a person just sent an email with a solution yesterday or the day before. Look for a message with MultiFieldQueryParser in its Subject. Otis --- Maurice Coyle [EMAIL PROTECTED] wrote: are there any plans to implement some sort of MultiFieldQueryParser.setOperator(int) method so folk

Re: The best way forward

2003-10-31 Thread Otis Gospodnetic
Wow, with 16GB RAM, I would definitely load the index into RAM. You can use RAMDirectory(Directory) constructor for that. As for RAMDrives. I have no experience with those, but I have heard of some people using ramfs under Linux. Ramfs is a memory based filesystem. Mount it and you have

Re: The best way forward

2003-11-04 Thread Otis Gospodnetic
--- jt oob [EMAIL PROTECTED] wrote: Thank you for the replies! My indexes are currently looking like they might be 12GB when finished on the current run. I have spotted a tool on the lucene site for listing the most frequently occurring words in the index. Currently I am using the

Re: best way of reusing IndexSearcher objects

2003-11-05 Thread Otis Gospodnetic
Use a single instance of IndexSearcher. When you detect that the index has changed, through that instance (see javadoc for the exact method name, I don't recall its exact name now), discard that instance, and make a new one. Do this check before every query or every X unit of time if you don't
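
The reuse pattern described here can be sketched generically. The class below is a hypothetical helper (`RefreshingHolder` is not a Lucene class). It assumes the caller supplies two things: some way to read the index's current version, and a factory that opens a new searcher. Closing the discarded instance once in-flight queries have finished is left out of the sketch.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: keeps one shared instance and replaces it when a version number changes. */
final class RefreshingHolder<T> {
    private final LongSupplier currentVersion; // assumption: reports the index's current version
    private final Supplier<T> factory;         // assumption: opens a fresh searcher-like object
    private T instance;
    private long seenVersion;

    RefreshingHolder(LongSupplier currentVersion, Supplier<T> factory) {
        this.currentVersion = currentVersion;
        this.factory = factory;
    }

    /** Returns the shared instance, recreating it only if the version changed since the last call. */
    synchronized T get() {
        long v = currentVersion.getAsLong();
        if (instance == null || v != seenVersion) {
            instance = factory.get();
            seenVersion = v;
        }
        return instance;
    }
}
```

Calling `get()` before every query gives the "check before every query" variant; calling it from a timer instead gives the "every X units of time" variant.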

Re: crash in Lucene

2003-11-07 Thread Otis Gospodnetic
--- Erik Hatcher [EMAIL PROTECTED] wrote: On Thursday, November 6, 2003, at 02:44 PM, Chong, Herb wrote: it's the line with the close(). so the remedy then is to make sure that it is called only once. what is the recommended way to process two folders worth of documents then? do i

RE: crash in Lucene

2003-11-07 Thread Otis Gospodnetic
Excellent. If you have time, please contribute a patch for the terse and vague documentation, so others don't have to suffer. Thanks, Otis --- Chong, Herb [EMAIL PROTECTED] wrote: i'm running in a single thread. the demo app is pretty vague on things and expects me to read the detailed

Re: Document Clustering

2003-11-11 Thread Otis Gospodnetic
--- Leo Galambos [EMAIL PROTECTED] wrote: Marcel Stör wrote: Hi As everybody seems to be so excited about it, would someone please be so kind to explain what document based clustering is? AFAIK, document clustering consists of detection of documents with similar content (similar

Re: Document Clustering

2003-11-11 Thread Otis Gospodnetic
Thanks for the clarification, Stefan. I should have known that... :) Otis --- Stefan Groschupf [EMAIL PROTECTED] wrote: Hi, How is document clustering different/related to text categorization? Clustering: try to find own categories and put documents that match in it. You group all

Re: Reopen IndexWriter after delete?

2003-11-11 Thread Otis Gospodnetic
1). If I delete a term using an IndexReader, can I use an existing IndexWriter to write to the index? Or do I need to close and reopen the IndexWriter? No. You should close IndexWriter first, then open IndexReader, then call delete, then close IndexReader, and then open a new IndexWriter.

Re: Index pdf files with your content in lucene.

2003-11-11 Thread Otis Gospodnetic
Ernesto, it looks like something got stripped. A ZIP file should make it to the list. If not, maybe you can post it somewhere. Could you also tell us a bit about this code? Is it better than existing PDF/Word parsing solutions? Pure Java? Uses POI? Thanks, Otis --- Ernesto De Santis

Re: Reopen IndexWriter after delete?

2003-11-12 Thread Otis Gospodnetic
Correct. write.lock is used for that. Otis --- Morus Walter [EMAIL PROTECTED] wrote: Otis Gospodnetic writes: No, it is not safe. You should close the IndexWriter, then delete the document and close IndexReader, and then get a new IndexWriter and continue writing. IIRC lucene

Re: Vector Space Model in Lucene?

2003-11-13 Thread Otis Gospodnetic
Lucene does not implement vector space model. Otis --- [EMAIL PROTECTED] wrote: Hi, does Lucene implement a Vector Space Model? If yes, does anybody have an example of how using it? Cheers, Ralf -- NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien... Fotoalbum, File

Re: Latent Semantic Indexing

2003-11-13 Thread Otis Gospodnetic
No, sorry. Otis --- Ralf Bierig [EMAIL PROTECTED] wrote: Does Lucene implement Latent Semantic Indexing? Examples? Ralf -- NEU FÜR ALLE - GMX MediaCenter - für Fotos, Musik, Dateien... Fotoalbum, File Sharing, MMS, Multimedia-Gruß, GMX FotoService Jetzt kostenlos anmelden unter

Re: Reopen IndexWriter after delete?

2003-11-13 Thread Otis Gospodnetic
an IndexWriter to delete an item? On Tue, Nov 11, 2003 at 02:46:37PM -0800, Otis Gospodnetic wrote: 1). If I delete a term using an IndexReader, can I use an existing IndexWriter to write to the index? Or do I need to close and reopen the IndexWriter? No. You should close

RE: Reopen IndexWriter after delete?

2003-11-13 Thread Otis Gospodnetic
? Which begs the question: why do you need to use an IndexReader rather than an IndexWriter to delete an item? On Tue, Nov 11, 2003 at 02:46:37PM -0800, Otis Gospodnetic wrote: 1). If I delete a term using an IndexReader, can I use an existing IndexWriter to write to the index? Or do I

Re: Poor Performance when searching for 500+ terms

2003-11-13 Thread Otis Gospodnetic
I am not using RAMDirectory due to the large size of index file. the index generated on hard disc is 1.57G for 1 million documents, each document has average 500 terms. I am using Field.UnStored(fieldName, terms), so I believe I am not storing the documents, just the index. (is that right?)

Re: Two possible solutions on Parallel Searching

2003-11-13 Thread Otis Gospodnetic
Multiple threads against the same index or multiple indices - no advantage - think about the mechanical parts involved (disk head). Multiple threads against indices on different disks (not just partitions!) - yes, that would be faster. Reading the index from the disk is the bottleneck, not the

Re: QueryParser Rules article (Erik Hatcher)

2003-11-14 Thread Otis Gospodnetic
Erik is referring to the VERY latest version - the CVS :) Otis --- Erik Hatcher [EMAIL PROTECTED] wrote: On Thursday, November 13, 2003, at 06:46 PM, Tomcat Programmer wrote: Hopefully the dev group will consider refactoring the code so that when its doing the lexing it will throw

RE: Contributing to Lucene (was RE: inter-term correlation [was R e: Vector Space Model in Lucene?])

2003-11-17 Thread Otis Gospodnetic
Dmitry once contributed a nice beefy patch that added Term Vector support to Lucene. While we never integrated the changes (for no good reason), I do recall that the patch was nice and elegant, because it allowed one to turn Term Vector support on/off at indexing time. If turned on, Lucene would

Re: Lucene version 1.3.

2003-11-24 Thread Otis Gospodnetic
Sorry, no firm date. However, 1.3 RC2 is pretty solid, so I suggest you just use that until 1.3 final is out. Otis --- Tun Lin [EMAIL PROTECTED] wrote: Hi, Anyone knows when the full version of Lucene version 1.3 will be released? Please advise. Thanks.

Re: Similarity class

2003-11-24 Thread Otis Gospodnetic
It sounds like you missed this: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DefaultSimilarity.html You can write your own implementation and use it during indexing and searching. Otis --- Ralf B [EMAIL PROTECTED] wrote: Hi, I am a very beginner of Lucene und

RE: Lucene version 1.3.

2003-11-25 Thread Otis Gospodnetic
1.3RC2. Otis --- Scott Smith [EMAIL PROTECTED] wrote: If you had to be production in January, would you be using 1.3RC2 or 1.2? -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, November 24, 2003 4:03 AM To: Lucene Users List; [EMAIL PROTECTED

Re: Search Question - not returning desired results

2003-11-25 Thread Otis Gospodnetic
You have to look at Analyzers. Figure out which one you are using and why, and see if you should be using a different one or even write your own. Some of the Analyzers break input on certain tokens (e.g. . or _ or ...), which sounds like the problem here. I think Erik's java.net article about

RE: Search Question

2003-11-25 Thread Otis Gospodnetic
Because '_' was probably removed from your input before it was indexed. I suggest reading up on Analyzers and Tokenizers. Otis --- Pleasant, Tracy [EMAIL PROTECTED] wrote: Also searching 'red_*' returns nothing, also. -Original Message- From: Dror Matalon [mailto:[EMAIL

Re: Searching different types of words

2003-11-25 Thread Otis Gospodnetic
Yes. For this particular example, PorterStemFilter will do the job. For more complex things (e.g. a search for car returning car, auto, automobile, vehicle) you'll need to add thesaurus-like capability to your indexer. This can be done by writing a custom Analyzer. It sounds like you have a lot

RE: Hits Highlighting

2003-11-25 Thread Otis Gospodnetic
There are several highlighting solutions for Lucene out there. I know of two that include source code, and I think at least one of them is on the Contributions page. There are some threads that talk about this issue, too. Otis --- Pleasant, Tracy [EMAIL PROTECTED] wrote: I have seen that

Re: Wildcard searches and BooleanQuery$TooManyClauses

2003-11-25 Thread Otis Gospodnetic
Correct. As for side-effect, well, things will be slower, obviously :) Increase the limit, perform a search, and see if it's still sufficiently fast...that's what I would do. :) Otis --- Dror Matalon [EMAIL PROTECTED] wrote: This was raised in http://www.mail-archive.com/[EMAIL

Re: Lucene command line tool

2003-11-25 Thread Otis Gospodnetic
Hm, I don't know of any such tools. It would be nice to have something like that. If you find such a tool, or write it yourself, let us know about the URL, so we can include it on Lucene's site. Otis --- Dror Matalon [EMAIL PROTECTED] wrote: Hi, I looked around the archives and didn't see

Re: Chinese input.

2003-11-26 Thread Otis Gospodnetic
Maybe this will help? http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23545 Otis --- Tun Lin [EMAIL PROTECTED] wrote: Hi, May I know how do I analyse Chinese input from Chinese text in Lucene? Do I use Analyser function in Lucene? If yes, how to go about using it?

Re: disable locks on read only indexes (performance improvement?)

2003-12-02 Thread Otis Gospodnetic
--- Dror Matalon [EMAIL PROTECTED] wrote: So, the lock is set, the segments file is opened, all the files in the segments file are opened and then the lock is released? Is that correct? Yes. See IndexReader. And we're relying on the OS to keep the file handles around even if the files

Re: New Lucene-powered Website

2003-12-02 Thread Otis Gospodnetic
Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? Thanks! Otis --- Ulrich Mayring [EMAIL PROTECTED] wrote: Hello, we (DENIC) are the world's second largest domain registry (.de-zone has almost 6.9 million

Re: New Lucene-powered Website

2003-12-02 Thread Otis Gospodnetic
Ok, let us know if you can add it. Otis --- Ulrich Mayring [EMAIL PROTECTED] wrote: Otis Gospodnetic wrote: Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ? Will suggest that to the powers

Re: Translation.

2003-12-03 Thread Otis Gospodnetic
Uh, I get to do this dirty job. :( Lucene-user and lucene-dev are not the appropriate fora for questions such as this one. Please ask the original author of the text for help, or use an online translation service, such as the one at http://babelfish.av.com Also, for questions about Lucene usage,

Re: What about Spindle

2003-12-03 Thread Otis Gospodnetic
You should ask Spindle author(s). The error doesn't look like something that is related to Lucene, really. Otis --- Zhou, Oliver [EMAIL PROTECTED] wrote: What about Spindle? Has anybody used it to crawl a jsp based web site? Do I need to install listlib.jar to do so? I got error message

Re: Indexing

2003-12-04 Thread Otis Gospodnetic
Maybe http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#maxFieldLength ? Otis --- Aaron Galea [EMAIL PROTECTED] wrote: Hi I am indexing a document but for a strange reason the word Mayo is never indexed. The thing is that in this large document this term

Re: FSDirectory.create doesn't tolerate subdirectories

2003-12-08 Thread Otis Gospodnetic
I am against making the suggested Lucene modification. Lucene index structure may change in the future. It is possible that one day Lucene developers will need to use a hierarchy of directories to implement some feature. Therefore, Lucene users should be discouraged from creating sub-directories

Re: term vector (Damian patch)

2003-12-08 Thread Otis Gospodnetic
Stefan, which patch are you referring to? I looked at the following, but did not find it:

Re: term vector (Damian patch)

2003-12-08 Thread Otis Gospodnetic
I think this never resulted in a patch. A few days after that thread another person expressed interest in implementing the same thing, but I am not sure what the status of that idea is now. Otis --- Stefan Groschupf [EMAIL PROTECTED] wrote: Otis, based on this discussion:

Re: term vector or document vector

2003-12-08 Thread Otis Gospodnetic
--- Stefan Groschupf [EMAIL PROTECTED] wrote: Just to be sure since there was a lot of discussion in the lists. There is actually no solution available to get a term vector for a document or a TF/IDF feature vector for a document, isn't it? Correct :( Some one had work on such things?

Re: term vector or document vector

2003-12-08 Thread Otis Gospodnetic
Nice. Please send the cvs diff, as I mentioned in that thread where you sent inlined diffs. Thanks, Otis --- Damian Gajda [EMAIL PROTECTED] wrote: BTW. i may send You the partly working Lucene with Dmitrys code patched in. -- Damian

Re: write.lock

2003-12-11 Thread Otis Gospodnetic
I don't think there was a follow-up to this. Aaron, please provide a listing of the directory that you are using in IndexWriter constructor. Is it empty? What are permissions on it? When the exception occurs, a file called write.lock should remain in the directory. Can you ls -al that file?

Re: Unindexed fields

2003-12-11 Thread Otis Gospodnetic
I don't fully understand what you mean by increasing the maximum string size. Are you referring to the length of terms in the field, so now your field can contain terms whose text/string value can have the size/length of 10,000 bytes? If that is so, I believe there is an internal (to Lucene)

RE: Unindexed fields

2003-12-11 Thread Otis Gospodnetic
Please try. I find this (the originally described problem) hard to believe. :) Otis --- Chong, Herb [EMAIL PROTECTED] wrote: i would like to, but the documents contain confidential content. i don't know if i can reproduce the problem with another set of documents. Herb

Re: Web Lucene classes.

2003-12-13 Thread Otis Gospodnetic
Tun Lin, WebLucene is a different project, so you should really use its mailing lists. I doubt subscribers to the Lucene mailing list will be able to help you as much as the WebLucene author and other WebLucene users. Otis --- Tun Lin [EMAIL PROTECTED] wrote: Hi, When I downloaded the web

Re: Wildcard in Field

2003-12-18 Thread Otis Gospodnetic
You can use raw *Query classes and OR, perhaps. Or, if you are using QueryParser, there is a MultiFieldQueryParser (or something like that) class, which I used a while ago. Otis --- Thijs Cadier [EMAIL PROTECTED] wrote: I'm implementing Lucene in our Content Management system. A plugin for

Re: java.io.IOException: couldn't delete backup

2003-12-18 Thread Otis Gospodnetic
Lucene writes its lock files to the directory named by the java.io.tmpdir system property, so make sure you can write to it. Otis --- Alex Gadea [EMAIL PROTECTED] wrote: I am trying to setup a Lucene installation on a Windows 2000 server. I can not get the IndexWriter to initialize properly. It fails out
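
A quick way to check the point above, assuming the property in question is the standard JVM `java.io.tmpdir`, is to print the directory the JVM reports and test whether it is writable. The class name `LockDirCheck` is made up for this sketch and uses only the standard library:

```java
import java.io.File;

/** Hypothetical helper: reports where the JVM's temp directory is and whether it is writable. */
final class LockDirCheck {

    /** Directory named by the standard java.io.tmpdir system property. */
    static File lockDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** True if the directory exists and the current process may create files in it. */
    static boolean usable(File dir) {
        return dir.isDirectory() && dir.canWrite();
    }

    public static void main(String[] args) {
        File dir = lockDir();
        System.out.println("java.io.tmpdir = " + dir.getAbsolutePath());
        System.out.println("writable       = " + usable(dir));
    }
}
```

Running this under the same account and with the same `-D` options as the failing application is what makes the result meaningful.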

Re: best way of reusing IndexSearcher objects

2003-12-18 Thread Otis Gospodnetic
You could subclass IndexSearcher like you said. You could patch IndexSearcher by adding this method. You could also change the search method to do this check automatically. You could also add setAutoRefresh(boolean) method to IndexSearcher and then do automatic refresh only if this was set to

Re: Types of field in index

2003-12-22 Thread Otis Gospodnetic
I suggest you look at the Articles section of Lucene's site, in particular an article about XML, Lucene, and Digester. Much better than using IndexHTML demo, I believe. Otis --- Thomas_Krämer [EMAIL PROTECTED] wrote: Hello Lucene Users i use Lucene 1.3rc3 to index several thousand metadata

Re: IndexWriter.optimize bug in version 1.3-final?

2004-01-06 Thread Otis Gospodnetic
This is a question for lucene-user list...redirecting. Looks okay, except it doesn't look like real code. Also, you are catching Exception and only logging it. Maybe that exception hides the source of the problem. Otis --- [EMAIL PROTECTED] wrote: Greetings, I upgraded from lucene-1.2.jar

Re: Performance question

2004-01-06 Thread Otis Gospodnetic
--- Scott Smith [EMAIL PROTECTED] wrote: I have an application that is reading in XML files and indexing them. Each XML file is 3K-6K bytes. This application preloads a database that I will add to on the fly later. However, all I want it to do initially is take some existing files and

Re: Closing the IndexSearcher object

2004-01-18 Thread Otis Gospodnetic
I think this is a FAQ. Keep that single IndexSearcher until you change the index and want that IS to see those changes. Otis --- Karl Koch [EMAIL PROTECTED] wrote: Hi all, I have a search method who is used by many programs with different queries. I therefore do not want to close the

Re: Unexpected end in indexing HTML file

2004-01-19 Thread Otis Gospodnetic
Look at the IndexWriter Javadocs. One of the fields (maxFieldLength) allows you to set the maximum number of terms indexed per field. This may also be a problem with the HTML parser you are using. You didn't share a lot of details, so I cannot help more. Otis --- Syrén_Per [EMAIL PROTECTED] wrote: Hi all, Have a question

Re: QueryParser and stopwords

2004-01-21 Thread Otis Gospodnetic
Hello Morus, --- Morus Walter [EMAIL PROTECTED] wrote: Hi, I'm currently trying to get rid of query parser problems with stopwords (depending on the query, there are ArrayIndexOutOfBoundsExceptions, e.g. for stop AND nonstop where stop is a stopword and nonstop not). While this isn't

Re: setMaxClauseCount ??

2004-01-21 Thread Otis Gospodnetic
Karl: http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=114748 Status: several people have mentioned they wanted to work on it, but nobody has contributed any patches. The code you see at the above URL is not compatible with Lucene 1.3, but could be brought up to date. Otis --- Karl

Re: Query madness with NOTs...

2004-01-23 Thread Otis Gospodnetic
Redirecting to lucene-user --- Jim Hargrave [EMAIL PROTECTED] wrote: Can anyone tell me why these two queries would produce different results: +A -B A -(-B) A and +A are not the same thing when you have multiple terms in a query. Also, we are having a hard time understanding why

Re: example of using RAMDirectory

2004-01-26 Thread Otis Gospodnetic
Use RAMDirectory and then use the addIndexes(Directory[]) method of IndexWriter. Otis --- Chong, Herb [EMAIL PROTECTED] wrote: does anyone have an example of using RAMDirectory during indexing and then copying the index into a FSDirectory? Herb...

Re: arrays of values in a field

2004-01-27 Thread Otis Gospodnetic
You can add multiple String values to a single Field. I don't remember the API to provide an example here, but you should be able to find this in the Javadoc. Maybe even in FAQ, not sure. Otis --- Gabe [EMAIL PROTECTED] wrote: If I have a group of documents and I want to filter on a

Re: Performance difference between 1.2 and 1.3?

2004-01-29 Thread Otis Gospodnetic
Hello, This is not a known problem. The mention of Cocoon makes me think XML. What format are your documents in? If they are in XML, the first place to look for performance-related problems is the XML parser. It looks like you got a new version of Cocoon, so maybe this new version includes a

Re: Japanese Analyzer

2004-01-29 Thread Otis Gospodnetic
I think that's the only one we've got. You can browse the Lucene Sandbox contributions directory, it's there. Otis --- Weir, Michael [EMAIL PROTECTED] wrote: Is the CJKAnalyzer the best to use for Japanese? If not, which is? If so, from where can I download it? Thanks. Michael Weir .

Re: Paid support for Lucene

2004-01-29 Thread Otis Gospodnetic
Otis Gospodnetic --- Boris Goldowsky [EMAIL PROTECTED] wrote: Strangely, the web site does not seem to list any vendors who provide incident support for Lucene. That can't be right, can it? Can anyone point me to organizations that would be willing to provide support for Lucene issues

Re: Paid support for Lucene - SUMMARY

2004-01-30 Thread Otis Gospodnetic
Not as far as I know. Otis --- Stefan Groschupf [EMAIL PROTECTED] wrote: Am 30.01.2004 um 22:11 schrieb Stefan Groschupf: JBoss Group http://jboss.org/ Does jboss really support maven? Sorry, doing 2 things at the same time is not good. Should be: Does jboss really

Re: Lucene with Postgres db

2004-01-31 Thread Otis Gospodnetic
Use JDBC to connect to your DB, issue appropriate SELECTs to select each of your entity/document units, then use the returned data to create instances of Lucene documents, add those to the index via IndexWriter, and you got yourself a Lucene index that represents data you have stored in DB. If

RE: weblogic cluster, index on NFS and locking problem

2004-02-04 Thread Otis Gospodnetic
The best way to submit contributions is via Bugzilla. For instance, here is the current queue of contributed code, patches, etc.:

Re: [newbie] Hit quality rating

2004-02-04 Thread Otis Gospodnetic
There is score. Look at Similarity class. Otis --- [EMAIL PROTECTED] wrote: Hi! Is there a hit quality rating in Lucene or are there only hits and non-hits? Timo - To unsubscribe, e-mail: [EMAIL PROTECTED] For

Re: Need Advices and Help

2004-02-05 Thread Otis Gospodnetic
I believe it would be something like Message-ID or --- Caroline Jen [EMAIL PROTECTED] wrote: I am trying to build message inboxes for all registered members of a web site. Therefore, each thread (i.e. under a certain discussion topic) can have several postings. And each registered member's

Re: Need Advices and Help

2004-02-05 Thread Otis Gospodnetic
I believe it would be the value of a 'Message-ID' or 'Reference' or 'Reference-ID' message header. However, I remember reading that mail readers are not very good at sticking to a standard (some RFC, I guess), so they don't always provide the correct ID, or they store it under non-standard names,

Re: ANNOUNCE: Plucene

2004-02-05 Thread Otis Gospodnetic
Good news, I was looking forward to the Perl port. I added it to the list of Lucene ports on Lucene site. Otis --- Tony Bowden [EMAIL PROTECTED] wrote: Plucene 1.0 has just been released to CPAN, and is available at http://search.cpan.org/dist/Plucene/ This is a port of Lucene to Perl,

Re: Nightly snapshots

2004-02-08 Thread Otis Gospodnetic
A problem with Gump, I believe. Otis --- Eric Jain [EMAIL PROTECTED] wrote: How come there is no nightly snapshot newer than 2003-09-09 at http://cvs.apache.org/builds/jakarta-lucene/nightly/? - To unsubscribe, e-mail:

Re: Lucene 1.3 final -- Lock/Segment permission errors

2004-02-09 Thread Otis Gospodnetic
I would first look at the exact command line that is used to start the app server. Could it be that it includes something like -Djava.io.tmpdir=some-directory-here ? Lucene uses the java.io.tmpdir System property to determine the location/directory to use for lock files. Maybe this app server uses some

Re: Another Newbie question--FSDirectory

2004-02-10 Thread Otis Gospodnetic
You should probably always try to use Directory, and not String nor FSDirectory. Directory is the most abstract 'index type and location entity', and using it smartly allows you to change your index type and location more easily, should you ever choose to do that. Otis --- Scott Smith [EMAIL

Re: Index advice...

2004-02-10 Thread Otis Gospodnetic
Without seeing more information/code, I can't tell which part of your system slows down with time, but I can tell you that Lucene's 'add' does not slow over time (i.e. as the index gets larger). Therefore, I would look elsewhere for causes of the slowdown. The easiest thing to do is add logging

Re: Ordering by a field value

2004-02-10 Thread Otis Gospodnetic
There were some recent contributions that should make this possible and simple to do. The code should be added to Lucene CVS repository in the next week or so. Otis --- Gabe [EMAIL PROTECTED] wrote: Hi, I was wondering whether it was possible to sort search results by the order of the

Re: Index advice...

2004-02-10 Thread Otis Gospodnetic
--- Leo Galambos [EMAIL PROTECTED] wrote: Otis Gospodnetic napsal(a): Without seeing more information/code, I can't tell which part of your system slows down with time, but I can tell you that Lucene's 'add' does not slow over time (i.e. as the index gets larger). Therefore, I would

Re: Index advice...

2004-02-10 Thread Otis Gospodnetic
--- Leo Galambos [EMAIL PROTECTED] wrote: Otis Gospodnetic napsal(a): Thus I do not know how it could be O(1). ~ O(1) is what I have observed through experiments with indexing of several million documents. What did you exactly measured? Just the time of the insert

Re: Ordering by a field value

2004-02-10 Thread Otis Gospodnetic
will the names of the relevant files be and will I be able to use 1.3 final still (simply integrating the contributions into my own code) or would I have to go with the latest code from CVS? Thanks again, Gabe --- Otis Gospodnetic [EMAIL PROTECTED] wrote: There were some recent contributions

Re: how to re-index

2004-02-11 Thread Otis Gospodnetic
Update in Lucene means: delete the document and then re-add it. This may be a FAQ. Otis --- Markus Brosch [EMAIL PROTECTED] wrote: However, I have problems with reindexing. First, I index all my object contents. Then some of these objects can change and need to be re-indexed. I
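
The "delete, then re-add" rule can be illustrated with a toy in-memory index. `ToyIndex` below is a hypothetical stand-in and not a Lucene class. It assumes every document carries a unique identifier field that can be used to find the old copy:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical toy index: documents are field-to-value maps; there is no in-place update. */
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    void add(Map<String, String> doc) {
        docs.add(new HashMap<String, String>(doc));
    }

    /** Removes every document whose field equals value; returns how many were removed. */
    int delete(String field, String value) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** "Update" is just delete-by-id followed by add. */
    void update(String idField, Map<String, String> doc) {
        String id = Objects.requireNonNull(doc.get(idField), "document needs an id");
        delete(idField, id);
        add(doc);
    }

    int size() {
        return docs.size();
    }

    /** Value of the given field in the first document with the given id, or null if none matches. */
    String firstValue(String idField, String id, String field) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get(idField))) {
                return d.get(field);
            }
        }
        return null;
    }
}
```

Skipping the delete step would leave both the stale and the new copy in the index, which is the usual symptom of a broken re-index.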

Re: commit.lock file

2004-02-11 Thread Otis Gospodnetic
If there are commit.lock files being left over, you should really investigate why that is happening. Something is probably dying, and you are not catching it and cleaning up by closing things like IndexReader or IndexWriter. If you want to forcefully unlock the index, use isLocked and unlock

Re: featues page in the Lucene web site

2004-02-11 Thread Otis Gospodnetic
, this will be the most comprehensive and up to date documentation about Lucene. Otis Gospodnetic --- Nicolas Maisonneuve [EMAIL PROTECTED] wrote: hy, it would be great if a page with all features of lucene would be created in the apache lucene site ! in the sourceforge website (http

Re: code for more like this query expansion - was - Re: setMaxClauseCount ??

2004-02-12 Thread Otis Gospodnetic
Lots of params in that mlt method, but it seems flexible. I'll try it. Small optimization suggestion: use int[] with a single element for that words Map, instead of creating lots of Integer()s. Actually, maybe JVMs are smart and don't allocate new objects for the same int wrapped in Integer
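
The optimization suggested here can be sketched as follows. This is an illustration of the general technique only, not the code from the mlt patch under discussion, and the class name `TermCounter` is made up. A one-element `int[]` acts as a mutable counter, so incrementing a word's count does not allocate a new wrapper object:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: word frequencies using int[1] as a mutable counter. */
final class TermCounter {

    static Map<String, int[]> count(String[] words) {
        Map<String, int[]> freq = new HashMap<String, int[]>();
        for (String w : words) {
            int[] c = freq.get(w);
            if (c == null) {
                freq.put(w, new int[] {1}); // one allocation per distinct word
            } else {
                c[0]++;                     // in-place increment, no new object
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, int[]> freq = count(new String[] {"lucene", "index", "lucene"});
        for (Map.Entry<String, int[]> e : freq.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue()[0]);
        }
    }
}
```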

Re: Word not in index

2004-02-16 Thread Otis Gospodnetic
Searches ARE case sensitive, it is just that some Analyzers lowercase all tokens. If you are using WhitespaceAnalyzer, then tokens will not be lowercased, so a search for albert and Albert may yield different results. Otis --- [EMAIL PROTECTED] wrote: On Monday 16 February 2004 19:20, [EMAIL
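
The distinction drawn here, that matching is exact and it is the analyzer that may fold case, can be shown with a toy example. The two methods below are hypothetical stand-ins for a whitespace-only analyzer and a lowercasing analyzer. They are not Lucene classes:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: the same exact-match lookup behaves differently depending on tokenization. */
final class CaseSensitivityDemo {

    /** Splits on whitespace and keeps the original case. */
    static Set<String> whitespaceTokens(String text) {
        Set<String> out = new HashSet<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Splits on whitespace and lowercases every token. */
    static Set<String> lowercasedTokens(String text) {
        Set<String> out = new HashSet<String>();
        for (String t : whitespaceTokens(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Albert Einstein";
        System.out.println(whitespaceTokens(text).contains("albert")); // exact match against "Albert"
        System.out.println(lowercasedTokens(text).contains("albert")); // exact match against "albert"
    }
}
```

The practical consequence is that the query text has to go through the same kind of analysis as the indexed text, otherwise the exact match fails.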

Re: Word not in index

2004-02-16 Thread Otis Gospodnetic
Custom? :) Otis --- [EMAIL PROTECTED] wrote: On Monday 16 February 2004 19:57, Otis Gospodnetic wrote: Searches ARE case sensitive, it is just that some Analyzers lowercase all tokens. If you are using WhitespaceAnalyzer, then tokens will not GermanAnalyzer apparently is one of them

Re: Word not in index

2004-02-16 Thread Otis Gospodnetic
Timo, by the nature of your questions it seems like you didn't see the Articles section of Lucene's site. There are links to several articles there. A few of them explain indexing (intro + more advanced), at least one explains QueryParser and maybe Analyzer, and a few explain vanilla searching.

Re: Concurrency

2004-02-20 Thread Otis Gospodnetic
I've just got a couple of questions which I can't quite work out...wondered if someone could help me with them: 1. What happens if I make a backup (copy) of an index while documents are being added? Can it cause problems, and if so is there a way to safely do this? You should be okay.
