RE: final steps

2001-10-01 Thread Doug Cutting
From: Jason van Zyl [mailto:[EMAIL PROTECTED]] If you can build the javadoc then create a link on the site for it and that should suffice. We have no central place for generated javadoc. Okay. The first step according to the HOWTO is to log on to the web server, but I don't know how to do

RE: [Lucene-dev] Problem: Maximum field size

2001-10-02 Thread Doug Cutting
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Is there any limit for the size of a field? I've tried to index a document with a field (UnStored type) of something like 8 chars and I've noticed that the words which are in the end of that field aren't indexed... If there is a

FW: Lucene 1.2 and directory write permissions?

2001-10-05 Thread Doug Cutting
Here's one vote for putting locks in a separate directory. Anyone dislike that? Doug -Original Message- From: Snyder, David [mailto:[EMAIL PROTECTED]] Sent: Friday, October 05, 2001 11:23 AM To: Doug Cutting Subject: RE: Lucene 1.2 and directory write permissions? The lock file

RE: Lucene 1.2 and directory write permissions?

2001-10-05 Thread Doug Cutting
From: Snyder, David [mailto:[EMAIL PROTECTED]] I think splitting out the locks into a separate directory would solve our problem... Do you think this is something very difficult to do? No, it will be easy. our indexes (we use many with the multisearcher) are about 13 gigs now and

RE: TermVector support - first release

2001-10-19 Thread Doug Cutting
Dmitry, Wow! This looks great! I was preparing a response to your questions of last weekend, but it seems like you figured out a lot of it on your own. I've attached that response anyway, in case you're still interested. Once we get 1.2 out the door I'd like to make you a committer

RE: Re: [Lucene-dev] Katakana characters in queries (a bug?)

2001-10-22 Thread Doug Cutting
Brian, Do you know what's going on here? I have not yet had time to look at this. If you don't have time, and no one else volunteers, then I will look into it. I would like to fix this for the 1.2 final release, if the change required is not major. Doug -Original Message- From: [EMAIL

new file: CHANGES.txt

2001-11-04 Thread Doug Cutting
I have added a new file in the top-level of Lucene named 'CHANGES.txt'. This contains a list of user-visible changes. I've filled in some historical information. Committers: please add an entry at the top of this file when you make changes. This will serve as release notes. Thanks, Doug --

Adobe Illustrator help?

2001-09-27 Thread Doug Cutting
Can someone with access to Adobe Illustrator please help Lucene? To build Lucene's new home at Apache Jakarta we need to extract the Lucene logo from the original Lucene artwork and save it as a set of GIFs. These should contain just the script of the word Lucene. We need a 300 pixel wide

RE: [Lucene-dev] CVS commit: 'lucene/com/lucene/analysis/de GermanAnalyzer.java,1.1 GermanStemFilter.java,1.1 GermanStemmer.java,1.1 Makefile,1.1 WordlistLoader.java,1.1'

2001-09-25 Thread Doug Cutting
From: Jon Stevens [mailto:[EMAIL PROTECTED]] on 9/24/01 2:52 PM, Doug Cutting [EMAIL PROTECTED] wrote: I think we should 'cvs rm' all of the files, and change the README to point to Jakarta. Does that sound reasonable? Don't do that. It will serve as the repo for the old history

RE: Adobe Illustrator help?

2001-09-27 Thread Doug Cutting
Thanks to all who responded. Matt Tucker was first and did a fine job, so I'll be using his. Thanks again, Doug

RE: final steps

2001-09-27 Thread Doug Cutting
From: Ted Husted [mailto:[EMAIL PROTECTED]] Any Committer with site karma can do this. Right now, that includes me ;-) I believe Brian would be able to grant the same to you if you want to try it yourself. I'm happy to let you do it. Can you put up the javadoc now? That will fix the

RE: final steps

2001-09-27 Thread Doug Cutting
From: Doug Cutting [mailto:[EMAIL PROTECTED]] Okay, so maybe we should just start with the nightlies. Actually, it would be nice to have at least a milestone release when we go public. What's involved in being the release manager? I'm happy to write up some release notes, if that would

RE: multithreading in SegmentsReader

2001-10-10 Thread Doug Cutting
Your analysis looks good to me. I think it would be simpler, if a bit less optimized, to just make SegmentsReader.numDocs() and SegmentsReader.delete() synchronized methods. Does that sound like a reasonable fix to you? Thanks for spotting this. As for closing, your analysis also sounds
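
A minimal sketch of the fix suggested above, assuming the race is between a cached document count and a concurrent delete. `SketchReader` and its fields are hypothetical stand-ins, not Lucene's `SegmentsReader`; the sketch only shows what making both methods `synchronized` buys.

```java
// Hypothetical stand-in for a reader that caches its live-document count.
// Not Lucene's SegmentsReader; names and fields are assumptions for illustration.
class SketchReader {
    private final boolean[] deleted;   // deleted[i] is true once document i is deleted
    private int cachedNumDocs = -1;    // -1 means "not computed yet"

    SketchReader(int maxDoc) {
        this.deleted = new boolean[maxDoc];
    }

    // synchronized: the cache check and the recount happen as one atomic step
    synchronized int numDocs() {
        if (cachedNumDocs == -1) {
            int n = 0;
            for (boolean d : deleted) {
                if (!d) n++;
            }
            cachedNumDocs = n;
        }
        return cachedNumDocs;
    }

    // synchronized: marking the document and invalidating the cache
    // cannot interleave with a numDocs() call on another thread
    synchronized void delete(int doc) {
        deleted[doc] = true;
        cachedNumDocs = -1;
    }
}
```

Locking whole methods is coarser than strictly necessary, which matches the "simpler, if a bit less optimized" trade-off described in the message.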

RE: multithreading in SegmentsReader

2001-10-11 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] But I was looking again at the MultiSearcher after reading through the SegmentsReader (and friends) and I was thinking if it wouldn't be better to write MultiSearcher not in terms of searching over multiple Searchers, but as an

RE: Token retrieval question

2001-10-11 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] Doug, thanks for posting these. I may end up going in this direction in the next few days and will use this as a blueprint. Maybe I'll end up putting in the first pass implementation and then you can later further tune it when

RE: Re: Added comments to InputStream and OutputStream

2001-10-12 Thread Doug Cutting
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Unicode is 16 bits. Unicode is currently defined as having up to 2^31 positions, although the current plan is for somewhere between 2^20 and 2^21 characters. (2^16 characters was the old Unicode standard - dropped when someone pointed

RE: build.xml still requires anakia

2001-10-15 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] The latest build.xml works fine with Ant and without the batch files, but it has a classpath statement that fails if anakia is not present. If I remove anakia, then it only fails for me when I try to build the docs target, which is

RE: build.xml still requires anakia

2001-10-15 Thread Doug Cutting
investigate. Commenting it out works fine, but it would be better, if we didn't have to modify this file for different compilation scenarios. Doug Cutting wrote: From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] The latest build.xml works fine with Ant and without the batch files

RE: Format Stripping [ was: XLS parser ]

2002-01-22 Thread Doug Cutting
From: Brian Goetz [mailto:[EMAIL PROTECTED]] I like the idea of being able to add fields to a Document after the Document is indexed. Then, for documents with a long 'body' and short metadata fields, you could process the body through an InputStream adapter, which would, as a side effect,

RE: Format Stripping [ was: XLS parser ]

2002-01-22 Thread Doug Cutting
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]] We've implemented an event based system for reading documents (so you register for what you care about and then kick it off and it throws events to listeners as it runs into them). Not sure if there is a clean way to graft those ideas onto

RE: cvs commit: jakarta-lucene/lib JavaCC.zip LICENSE.txt

2002-01-22 Thread Doug Cutting
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]] I believe you could submit this request to [EMAIL PROTECTED], or perhaps Ted could give us some direction on that. Ted's the one who asked me to remove it. Doug -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands,

RE: first draft

2002-01-24 Thread Doug Cutting
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]] Would the demos be pre-compiled in the distribution? I think they are currently. If they're not, they should be. As for packaging it in org.apache.lucene.demo in addition to keeping it in a separate jar (and hence under demo instead of

RE: Getting Started (first draft comments)

2002-01-24 Thread Doug Cutting
Currently information on how to build Lucene is in the BUILD.txt file that is in CVS and distributed with the source distribution, but not the binary distribution. Is this document inaccurate or inadequate? Should we improve it or replace it? In any case, Lucene build instructions and the

RE: update website please

2002-01-27 Thread Doug Cutting
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Can someone who has privileges, please update the website html. Done. If I also have privileges, please let me know how to update the site. ssh www.apache.org cd /www/jakarta.apache.org/lucene cvs update -d Doug -- To unsubscribe,

release 1.2 RC3

2002-01-27 Thread Doug Cutting
I just made a new release, 1.2RC3, based on the current CVS: http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc3/ I did some simple tests, and things look good to me. Does anyone see a reason not to announce this to lucene-user? Hopefully we can turn this into a 1.2 final

RE: release 1.2 RC3

2002-01-28 Thread Doug Cutting
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Another solution is to make a symbolic link (shortcut?) from ./lib/JavaCC.zip to the real JavaCC.zip, which is what I just did. That works so long as you're not building distributions. The 'dist' and 'dist-src' targets bundle in the

RE: Junit3.5 - 3.7?

2002-01-28 Thread Doug Cutting
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Does anyone see a problem with moving from Junit 3.5 to Junit 3.7? +1 Doug -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]

RE: Delete is not multi-thread safe

2002-01-31 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] It seems that either a) deletes should be write-through, or b) deletes should be done by the writer, or c) writer should not optimize non-RAM segments unless asked to. As a client, I like option b) the best, though, this is not

RE: Delete is not multi-thread safe

2002-01-31 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] If there is one user performing additions and deletions, then the two can be ordered. But if an application is such that it allows multiple people initiate index updates of various kinds, it may be much harder to order additions

RE: Proposal for Lucene

2002-02-07 Thread Doug Cutting
I think this is a great idea. Lucene badly needs this sort of high-level interface. As far as other folks' concern about keeping Lucene a library and not making it an application, I agree, but I also assumed that's what you meant to do. All of this can be layered on top of the existing API.

RE: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java

2002-02-14 Thread Doug Cutting
Thanks for making all these cleanups, Otis! One comment: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, February 13, 2002 5:47 PM To: [EMAIL PROTECTED] Subject: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java [ ... ] + * Examples of

RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Doug Cutting
From: Halácsy Péter [mailto:[EMAIL PROTECTED]] I'd like to index documents that are described by keywords. One document can have zero or more keywords and a keyword can be related to one or more documents. Assume two keywords: human computer interaction computer science If I add

RE: Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-14 Thread Doug Cutting
From: Julien Nioche [mailto:[EMAIL PROTECTED]] By the way, I was wondering if there is any Analyzer that uses the following constructor public Token(String text, int start, int end, String typ) ? StandardTokenizer uses Token's type field to communicate with StandardFilter, which does

RE: Indexes in WAR files

2002-02-14 Thread Doug Cutting
From: Les Hughes [mailto:[EMAIL PROTECTED]] Reading the servlet spec again it says that calls such as servletcontext.getRealPath() will *possibly* return null if the content is being served from a war as opposed the physical path on disk - I'm informed that weblogic actually returns

RE: Lucene Query Structure

2002-02-19 Thread Doug Cutting
From: Halácsy Péter [mailto:[EMAIL PROTECTED]] Sent: Tuesday, February 19, 2002 8:49 AM To: Lucene Developers List; Lucene Users List Subject: RE: Lucene Query Structure The queryParser of Lucene implies OR logic if no operator found in the query, doesn't it? Yes. How could I modify

RE: Status of proximity in query language

2002-02-19 Thread Doug Cutting
From: Ype Kingma [mailto:[EMAIL PROTECTED]] I happen to be familiar with a (boolean) query language that only allows proximity operators between or like queries (including prefix terms). This case is not too difficult to explain and not confusing at all. It might be not too difficult to

RE: Lucene Query Structure

2002-02-19 Thread Doug Cutting
From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] Okay, I think I finally understand how this is working. If we express the semantics of (required, prohibited) in terms of their impact on the score for a document D and query q, we get: (true, false): if q is not satisfied by D,

RE: HitCollector: Why is it abstract?

2002-02-20 Thread Doug Cutting
From: Eric Fixler [mailto:[EMAIL PROTECTED]] I'm wondering if there's a design reason why HitCollector is an abstract class, rather than an interface. I don't recall my thinking, if any, when I did this. An interface is more flexible, since it can be a mix-in, but calls to interfaces are

RE: StrictAnalyzer

2002-02-20 Thread Doug Cutting
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]] I know at least in my case, I have a much more extensive list of stop words and they are simply read from a file into an array and then passed to the existing class. Would this approach work in your case? I think that serious

RE: can some access modifiers in org.apache.lucene.search be changed/opened up

2002-02-22 Thread Doug Cutting
From: Spencer, Dave [mailto:[EMAIL PROTECTED]] Proposed solution is to change a couple of decls in Scorer and Query: Scorer.java make score() public Query.java make all methods public or protected (normalize, sumOfSquaredWeights,prepare) I'm a little hesitant.

RE: building lucene

2002-02-26 Thread Doug Cutting
From: Daniel Calvo [mailto:[EMAIL PROTECTED]] This issue has been discussed some time ago and Erik Hatcher sent a patch proposing the definition of all properties in build.xml and letting users customize their environment (javacc.home, etc.) in build.properties. IMO, this is the best

RE: new version of IndexWriter.java

2002-02-27 Thread Doug Cutting
It would be good to also know the average size of your documents, the size of your index, and the amount of RAM required for each benchmark. Lucene currently indexes using very little memory. You're making it faster by using more RAM. In particular you're able to get a 10% speedup (58 versus

RE: Hard to customize sort method in IndexSearcher via HitCollector

2002-02-28 Thread Doug Cutting
From: Che Dong [mailto:[EMAIL PROTECTED]] here is example for sort result with score multi by rank field; scorer.score(new HitCollector() { public final void collect(int doc, float score) { [ ... ] String rank = reader.doc(doc).getField(rank).stringValue(); The problem is that

on vacation through 3/19

2002-03-06 Thread Doug Cutting
FYI, I will be on vacation, without email access, starting tomorrow through March 19th. Please don't expect any responses from me about Lucene during this time. Sorry for the SPAM. Doug -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL

RE: corrupted index

2002-04-02 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Otis, You can remove the .lock file and try re-indexing or continuing indexing where you left off. I am not sure about the corrupt index. I have never seen it happen, and I believe I recall reading some messages from Doug Cutting

RE: Phonetic Encoders

2002-04-02 Thread Doug Cutting
From: Peter Carlson [mailto:[EMAIL PROTECTED]] I recently updated the contributions page (last night), but I need Doug to update the site. I just updated the site. We should get you the privileges required to do this. Once you have the privileges, all that you do is: ssh www.apache.org

release status

2002-04-02 Thread Doug Cutting
I've lost track of just where we are with the 1.2 release. Are there outstanding bugs that we intend to fix before the 1.2 release? There have been only a few minor patches since RC4. Should we make an RC5 or just go ahead with the final release? Doug -- To unsubscribe, e-mail:

Re: Bug? QueryParser may not correctly interpret RangeQuery text

2002-06-05 Thread Doug Cutting
Brian Goetz wrote: I still want to see Date and Number fields supported as basic types in the Field class, rather than use a String in this magic date format. The first part of this is easy: just add new Field constructor methods that take Date and number parameters, e.g.:

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/store FSDirectory.java

2002-06-24 Thread Doug Cutting
[EMAIL PROTECTED] wrote: [ ... ] + private static final boolean DISABLE_LOCKS = Boolean.getBoolean("disableLocks"); [ ... ] public boolean obtain() throws IOException { - if (Constants.JAVA_1_1) return true; // locks disabled in jdk 1.1 + if
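
As a rough illustration of the mechanism in the quoted diff: `Boolean.getBoolean(name)` returns true only when the JVM system property `name` is set to "true", for example via `-DdisableLocks=true`. `SketchLock` below is a hypothetical class, not Lucene's lock implementation. It reads the property on every call so that the behaviour is easy to observe, whereas the quoted commit reads it once into a static constant.

```java
// Sketch of a lock whose behaviour is switched by a JVM system property.
// SketchLock is hypothetical; only Boolean.getBoolean is a real JDK call.
class SketchLock {
    private boolean held = false;

    // True when the "disableLocks" system property is set to "true".
    static boolean locksDisabled() {
        return Boolean.getBoolean("disableLocks");
    }

    // Returns true if the lock was obtained, or if locking is disabled.
    synchronized boolean obtain() {
        if (locksDisabled()) return true;   // report success without taking a lock
        if (held) return false;             // someone else already holds it
        held = true;
        return true;
    }

    synchronized void release() {
        held = false;
    }
}
```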

Re: Interesting idea

2002-07-10 Thread Doug Cutting
Jon Scott Stevens wrote: Adding support to Lucene for Nilsimsa seems like a cool idea... http://ixazon.dynip.com/~cmeclax/nilsimsa.html The index would be the hash and one could use Lucene to rank searches based on the Nilsimsa rating of the results... Nilsimsa employs a very different

Re: CachedSearcher

2002-07-17 Thread Doug Cutting
Halácsy Péter wrote: Could you please make a proposal to the lucene-dev list of which methods and classes should be made public or protected or non-final, and what documentation should be added? 1. all package-protected abstract method of Searcher should be made to protected abstract These

Re: Remote searcher

2002-07-17 Thread Doug Cutting
I just added a remote searchable implementation. See src/test/org/apache/lucene/search/TestRemoteSearchable.java for an example of how this can be used. This is the first RMI code I've written, so please tell me if I've got something wrong. Doug -- To unsubscribe, e-mail: mailto:[EMAIL

Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/search TestDocBoost.java

2002-07-29 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Log: msg.txt Oops. That log entry was supposed to read: Added support for boosting the score of documents and fields via the new methods Document.setBoost(float) and Field.setBoost(float). Note: This changes the encoding of an indexed value. Indexes

document field boosting

2002-07-29 Thread Doug Cutting
document scoring, so that a user can alter any part of the formula without altering Lucene's core code. Enjoy! Doug Original Message Subject: Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/search TestDocBoost.java Date: Mon, 29 Jul 2002 12:14:22 -0700 From: Doug Cutting

Re: setBoost Q.

2002-08-01 Thread Doug Cutting
Mike Tinnes wrote: I've been working on tying in a PageRank algo to my web crawler using lucene and have a few problems. If I don't know the boost factor until AFTER the crawl is it possible to still set the boost? Why not: (1) crawl, saving pages to disk; (2) analyze links and compute

Re: Serialization of org.apache.lucene.search.BooleanClause

2002-08-15 Thread Doug Cutting
Karl von Randow wrote: The org.apache.lucene.search.BooleanClause is not currently Serializable, I would like to propose that it is made serializable. You're right, it should be. This is a bug. When I recently added support for remote searching I tested only TermQuery. I fixed this and

Re: Optimize

2002-08-19 Thread Doug Cutting
Christian Ullenboom wrote: I take a look at the StopFilter/StopAnalyser, the BitVector, and PorterStemmer and I would like to optimize the code. What is the best way to contribute? Please submit contributions to [EMAIL PROTECTED] If the changes are small, a diff file is appropriate. For

Re: RussianAnalyzer

2002-08-21 Thread Doug Cutting
This looks great to me. Does anyone object to adding this to Lucene as the package org.apache.lucene.analysis.ru? Doug -- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]

Re: Multi-language (ISpell-based) stemming Analyzer

2002-08-21 Thread Doug Cutting
Дмитрий Овсянко wrote: http://www.halyava.ru/do/org.apache.lucene.analysis.zip This looks great! If I understand correctly, it can be used to quickly build stemmers for lots of languages. For example, the following page lists the location of ispell dictionaries for over 30 languages!

Re: AND on two weighted fields

2002-08-21 Thread Doug Cutting
Clemens Marschner wrote: I need to perform an AND query on two fields and weight the results according to in which fields the results came from. That is, I would need something like (field1^2 OR field2^1):(+token1 +token2 +token3) This means that _all_ of the tokens _have_ to occur in

Re: [contrib]: StandardTokenizer with sigram based CJK Support

2002-08-27 Thread Doug Cutting
+1 Che Dong wrote: Attached StandardTokenizer.jj with Sigram Based east asia language support: tested under Windows and GNU/Linux Just treat different UnicodeBlock with different word segment method. Hope in the future released we can add more language support in StandardTokenizer.jj step by

Re: [Bug 12137] New: - Can '*' or '?' symbol be used as the first character of a search?

2002-08-29 Thread Doug Cutting
Did my suggestion not make sense? I think we can make everyone happy here. By adding a parameter to the existing query parser we can: 1. Keep things so that the default behaviour is not to permit initial wildcards. 2. Make it so that developers who want to permit initial wildcards can

Re: Possible Bug with MultiSearcher?

2002-09-05 Thread Doug Cutting
Can you please submit a complete, self-contained test program that demonstrates the problem? That will make it much easier for someone to debug and fix it. Thanks, Doug Rasik Pandey wrote: Hello, I am getting the following exception when searching using a MultiSearcher and the first

Re: [patch] bug with boosts in parsed queries

2002-09-06 Thread Doug Cutting
This fixes the query parser, but, unfortunately, the problem is deeper. BooleanQuery does not implement boosting. This could be fixed too, but, for now, the easiest thing to do is simply to boost each term within the boolean query. Doug -- To unsubscribe, e-mail: mailto:[EMAIL

Re: fixed url and How to contribute code to lucene sandbox?

2002-09-11 Thread Doug Cutting
Che Dong wrote: 1. custom sorting beside default score sorting: make docID alias one field you need output sorting solved by sort data before indexing(example sorted by field PostDate), so docID can be an alias to the sort field. if we make hitCollector sort with docID or 1/docID or even

Re: Query Rewriting

2002-09-11 Thread Doug Cutting
Clemens Marschner wrote: I want to perform some rewriting rules on the queries I get. The best way to do that is to edit the parse tree. However, the Query classes do not contain any methods for reading out or altering their contents or to clone them. Is there any reason for that? Or is

Re: [patch] bug with boosts in parsed queries

2002-09-11 Thread Doug Cutting
Lee Mallabone wrote: Should I update the patch for now so that BooleanQuery.setBoost() just calls setBoost() on all its clauses? That only works if you call setBoost() after all of the clauses have been added, which is a little fragile. So you'd also need to boost new clauses as they're
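
The fragility described above, and the extra step needed to avoid it, can be sketched as follows. `SketchClause` and `SketchBoostedQuery` are hypothetical stand-ins, not Lucene's `BooleanClause` or `BooleanQuery`. As a simplification, the sketch overwrites each clause's boost rather than combining it with any boost the clause already had.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not Lucene's BooleanQuery or BooleanClause.
class SketchClause {
    final String term;
    float boost = 1.0f;

    SketchClause(String term) {
        this.term = term;
    }
}

class SketchBoostedQuery {
    private final List<SketchClause> clauses = new ArrayList<>();
    private float boost = 1.0f;

    // Push the boost down to every clause added so far.
    void setBoost(float boost) {
        this.boost = boost;
        for (SketchClause c : clauses) {
            c.boost = boost;
        }
    }

    // Also boost clauses added after setBoost(), so call order does not matter.
    void add(SketchClause clause) {
        clause.boost = boost;
        clauses.add(clause);
    }
}
```

Without the assignment in `add()`, a clause added after `setBoost()` would silently keep the default boost, which is the ordering problem the message points out.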

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/index FieldsReader.java

2002-09-19 Thread Doug Cutting
Otis, I really appreciate all of the work you do on Lucene. However sometimes I have to disagree. [EMAIL PROTECTED] wrote: - Added FIXME/TODO tags about things to document. While documentation in a package private class is nice, it is not an absolute requirement. So I don't think this

Re: cvs commit: jakarta-lucene build.xml

2002-09-19 Thread Doug Cutting
Otis Gospodnetic wrote: Sorry about that, I'll put the old file back. Regarding javadocs - I simply wanted a way to see the Javadocs of some classes (FileInfos, I believe it was) that were not visible. Maybe we should add another target: javadocs-internal or something. That would be good

internal documentation

2002-09-19 Thread Doug Cutting
Otis Gospodnetic wrote: --- Doug Cutting [EMAIL PROTECTED] wrote: Maybe we should add another target: javadocs-internal or something. That would be good encouragement to add javadoc comments to internal classes. Sounds good to me. I think it would encourage documentation of internals

Re: internal documentation

2002-09-19 Thread Doug Cutting
Doug Cutting wrote: I've attached this in Open Office format and as HTML. The HTML conversion is not great, but it's readable. Perhaps I should maintain this in HTML instead of Open Office, since it contains no diagrams... For some reason the HTML conversion was dropped in the copy I

Re: internal documentation

2002-09-19 Thread Doug Cutting
Doug Cutting wrote: For some reason the HTML conversion was dropped in the copy I received. So here it is again. Looks like this mailing list drops HTML attachments... This time I zipped it. We'll see if that works. Doug FileFormats.zip Description: Macintosh archive -- To unsubscribe

coding conventions

2002-09-19 Thread Doug Cutting
Scott Ganyo wrote: Nevertheless, I'm willing to accept that you have defined it as Lucene standard style and I do abide by it when developing Lucene... I don't think style should be (or even can be) mandated. When writing new code from scratch, a developer should of course try use a style

Re: RE : TR : Possible Bug with MultiSearcher?

2002-09-19 Thread Doug Cutting
Rasik Pandey wrote: Developers, Attached is the diff for MultiSearcher which seems to correct these bugs. I have not yet found any problems caused by these changes in testing but we will keep you informed! [ ... ] diff -w -r1.4 MultiSearcher.java 96,98c100,105 public final

Re: RE : RE : TR : Possible Bug with MultiSearcher?

2002-09-20 Thread Doug Cutting
Rasik Pandey wrote: Understood. I made the second change, in MultiSearcher, and it works on this end. Do you think this change needs to be made in other places of the lucene code, such as the SegmentsReader.readerIndex(int n) method, as it uses what looks to be the same algorithm? I was

Re: memory usage - RE: your crawler

2002-09-20 Thread Doug Cutting
Otis Gospodnetic wrote: Every URL extracted from a fetched document needs to be looked up in this VisitedURLsFilter. If not there already, it needs to be added to it (and to the queue of URLs to fetch). If there already, it is thrown away. Because of this, the data structure that
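
The lookup-then-add behaviour described above can be sketched in a few lines. `SketchVisitedFilter` is a hypothetical, purely in-memory version. It keeps every URL string in a `HashSet`, which is the kind of memory cost the thread's subject is about, so a real crawler may need something more compact.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory version of the filter described above.
class SketchVisitedFilter {
    private final Set<String> seen = new HashSet<>();
    private final Queue<String> toFetch = new ArrayDeque<>();

    // Returns true if the URL was new and has been queued;
    // false if it was already known and is thrown away.
    boolean offer(String url) {
        if (!seen.add(url)) {   // Set.add returns false when already present
            return false;
        }
        toFetch.add(url);
        return true;
    }

    // Next URL to fetch, or null when the queue is empty.
    String next() {
        return toFetch.poll();
    }
}
```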

Re: Keyword boosting

2002-09-21 Thread Doug Cutting
Brian Goetz wrote: Lets say we search for text retrieval. We want to find documents that have text retrieval in the body OR in the keywords, but we want to weight hits on the keywords more heavily. I can't boost the tokens in the index base, so I have to do that through the query.

Re: Excerpt pondering

2002-10-03 Thread Doug Cutting
Tom Dunstan wrote: I'd like some feedback on an idea that I have to extend lucene to hold the extra information that it needs to stop me having to reparse the entire body text again to generate excerpts. Basically, to work out which sections of the text have the terms that generate the

Re: BooleanQuery cannot be serialized

2002-10-16 Thread Doug Cutting
Stas Chetvertkov wrote: Recently we met a necessity to pass Query objects through network. We encountered a problem that BooleanQuery cannot be serialized in spite of abstract Query object is Serializable. The source of the problem is that BooleanQuery holds a vector of BooleanClause objects

Re: Are score values always between 0 and 1?

2002-10-16 Thread Doug Cutting
Dmitry Serebrennikov wrote: I know that the FAQ says that they are, but in at least one instance in my index it appears to be equal to 1.94something. Are the scores guaranteed to be between 0 and 1 No. and if not, what would it take to make them such? A different Similarity

Re: Question: using boost for sorting

2002-10-16 Thread Doug Cutting
documented. Doug Otis Gospodnetic wrote: This sounds good to me, as it would lead us to pluggable similarity computation.... I can refactor some of this tonight. Otis --- Doug Cutting [EMAIL PROTECTED] wrote: This looks like a good approach. When I get a chance, I'd like to make

Re: Index Optimization space requirements

2002-11-04 Thread Doug Cutting
Konrad Scherer wrote: I am using lucene 1.2 (Java 1.4 on Solaris 7) and the xml indexer to index ~24000 small xml documents. The finished and optimized index uses around 340 MB disk space. The documents are reindexed once a week and this has worked without any trouble for months. Recently the

scoring API

2002-11-12 Thread Doug Cutting
Last week I checked in changes that provide a public API that lets applications easily alter Lucene's scoring function. The API is documented in the javadoc for the (now public) class org.apache.lucene.search.Similarity. Has anyone had a chance to try this? Doug -- To unsubscribe, e-mail:

Re: getAllFieldNames diffs

2002-11-13 Thread Doug Cutting
Scott Ganyo wrote: Now that we've committed to Java 2, I would not be opposed to removing Enumeration references... or at least deprecating them in favor of newer-style methods. The javadoc for Enumeration says: The functionality of this interface is duplicated by the Iterator interface. In

Re: New PhrasePrefixQuery.java

2002-11-20 Thread Doug Cutting
Konrad Scherer wrote: I have modified QueryParser.jj and PhrasePrefixQuery.java to allow wildcard searches within phrases. This turned out to be a very involved change going through a few revisions. I have tried to make the changes as clean as possible. Thanks for taking the time to work on

Re: Bug in current CVS source with DateField

2002-11-26 Thread Doug Cutting
[ I moved this discussion to lucene-dev. -drc ] This looks like a premature optimization gone bad. Brian, you made this change. Would you like to fix it, or should I? Doug Chris D wrote: I found that the current code in CVS prevents a org.apache.lucene.search.DateFilter from functioning

Re: computing size() in frequently used methods

2002-11-26 Thread Doug Cutting
Julien Nioche wrote: This kind of modification could be done in almost all the methods of the classes BooleanQuery and PhraseQuery, providing a small optimization (I did not measure it - but even small optimizations can be useful). These computations are performed only once per search. I would

Re: Why doesn't IndexWriter have delete()?

2002-12-06 Thread Doug Cutting
Perhaps IndexWriter is badly named. It might better be called IndexAppender. It doesn't normally touch any of the index but the list of segments, unless it has to merge some segments, in which case it usually only touches a small subset of the index data. IndexReader, on the other hand, is

Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java

2002-12-20 Thread Doug Cutting
Otis Gospodnetic wrote: We had this in Lucene Sandbox? I never saw it committed, weird. I just committed it today. The commit message bounced because it was too big. I can't get it from the repository, any idea why? Some protections were wrong. I think I fixed it. Try now. Doug --

Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java

2002-12-21 Thread Doug Cutting
Otis Gospodnetic wrote: I wonder about SnowballAnalyzer and SnowballFilter classes. The ctor of the later uses introspection to instantiate the appropriate Stemmer. In most use cases that will be the same Stemmer from call to call. Seems like redundant work and objects created. Wouldn't it be

Re: custom scoring api questions

2002-12-31 Thread Doug Cutting
Shah, Vineel wrote: Here's what I'm trying to do: A query that looks for java unix windows in the keywords field of an index. If the document has java unix, the score is .66..., regardless of any other factor. I want 1.0 for all three, .33... for just one, and no hit for none. This is easy
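
The scoring rule requested in this message, taken on its own, is the fraction of query terms that the document contains. The sketch below shows only that arithmetic. `SketchOverlapScore` is a hypothetical helper and does not show how the rule would be plugged into Lucene, since the reply is cut off in this listing.

```java
// Hypothetical helper: score = matched query terms / total query terms.
class SketchOverlapScore {
    // Returns 0 when no query term matches, which the caller can treat as "no hit".
    static float score(String[] queryTerms, String[] docKeywords) {
        if (queryTerms.length == 0) {
            return 0f;
        }
        int matched = 0;
        for (String q : queryTerms) {
            for (String k : docKeywords) {
                if (q.equals(k)) {
                    matched++;
                    break;   // count each query term at most once
                }
            }
        }
        return (float) matched / queryTerms.length;
    }
}
```

With the query terms java, unix and windows, a document holding java and unix matches two of three terms, which gives the .66... figure quoted above.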

Re: Lucene's use of one byte to encode document length

2003-01-14 Thread Doug Cutting
Jonathan Baxter wrote: How important is it for I/O performance that Lucene uses only one byte to represent document length? Or are there reasons other than performance for using so few bits? To achieve good search performance, field-length normalization factors must be memory-resident. So

Re: Article about Lucene

2003-01-16 Thread Doug Cutting
Great article! I look forward to the rest of the series! The Java Developers Journal also recently ran a cover story on Lucene. Full text is not freely available, but the figures and examples are at: http://www.sys-con.com/java/source.cfm?id=1777 Should we add a link to this article on the

Re: time for 1.3 release?

2003-01-17 Thread Doug Cutting
I think you're proposing that the classes in http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/api/ be added to the core Lucene jar and release. Is that right? I don't have a problem with this. Do others? The Javadoc should probably also include a pointer to:

Re: java.lang.UnsupportedOperationException

2003-01-17 Thread Doug Cutting
Sounds like a bug. Can you please supply a complete, self-contained test case? Ideally as a JUnit test class. Thanks, Doug Rasik Pandey wrote: Hello, Can anyone explain why I would be seeing this when re-using a query (MultiTermQuery or PrefixQuery, or any Query that doesn't implement the

Re: RE : java.lang.UnsupportedOperationException

2003-01-20 Thread Doug Cutting
sent previously. Let me know if I should enter a bug report? Thanks, Rasik -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED]] Sent: Friday, January 17, 2003 20:47 To: Lucene Developers List Subject: Re: java.lang.UnsupportedOperationException Sounds like a bug. Can you

Re: Automatic stop-words

2003-01-22 Thread Doug Cutting
Leo Galambos wrote: When I want to search Linux, nothing is found. This word is in every article in the content. Or is something wrong? Yes :) why? log(1)=0. it is OK, I think :-))) so where's any problem? Lucene's IDF computation is: log(maxDoc / (docFreq + 1)) + 1.0 Thus a term which
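
A small sketch of the IDF formula quoted above, taking docFreq + 1 as the denominator and using the natural logarithm (Java's `Math.log`); the choice of base is an assumption here. `SketchIdf` is a hypothetical helper, not Lucene's `Similarity`.

```java
// Hypothetical helper evaluating idf = ln(maxDoc / (docFreq + 1)) + 1.0.
class SketchIdf {
    static double idf(int docFreq, int maxDoc) {
        // Cast to double first so the division is not truncated to an integer.
        return Math.log((double) maxDoc / (docFreq + 1)) + 1.0;
    }
}
```

For a term that occurs in every one of 1000 documents this evaluates to ln(1000/1001) + 1, slightly below 1 but well above zero, so under this formula such a term still contributes to the score.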

Re: manipulating content of stored fields

2003-02-07 Thread Doug Cutting
Lucene does not permit one to modify documents that are already indexed. You must delete them and re-index them, even if changes are only to non-indexed fields. Lucene should not be used as a document database. It is a full-text indexing library, which, as a convenience, permits one to store

Re: MultiSearcher discards interim results

2003-02-07 Thread Doug Cutting
I'm confused. The contract of this method is to return the top-scoring nDocs. For a multi-searcher it must compute the top-scoring nDocs from each sub-searcher, then find the top-scoring nDocs among these. If you want more of the top-scoring documents, just pass in a larger value for nDocs.
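
The contract described above, top-scoring nDocs from each sub-searcher and then the top-scoring nDocs among those, can be sketched on bare scores. `SketchTopDocsMerge` is hypothetical and works on lists of floats rather than Lucene's searcher or hit types. It sorts for clarity where a real implementation would more likely use a bounded priority queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical merge of per-searcher results; operates on plain scores only.
class SketchTopDocsMerge {
    // Each inner list holds one sub-searcher's scores.
    // Returns the n best scores overall, best first.
    static List<Float> topN(List<List<Float>> perSearcherScores, int n) {
        List<Float> candidates = new ArrayList<>();
        for (List<Float> scores : perSearcherScores) {
            List<Float> sorted = new ArrayList<>(scores);
            sorted.sort(Collections.reverseOrder());
            // Only the top n of each sub-searcher can reach the global top n.
            candidates.addAll(sorted.subList(0, Math.min(n, sorted.size())));
        }
        candidates.sort(Collections.reverseOrder());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

Results below each sub-searcher's own top n are dropped without loss, because none of them could appear in the merged top n. Asking for more results means passing a larger n, as the message says.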

Re: [PATCH] Refactoring QueryParser.jj, setLowercaseWildcardTerms()

2003-02-12 Thread Doug Cutting
+1 I like this approach of modifying the query parser through subclassing. We should consider taking this approach further, e.g., perhaps by making addClause(), getFieldQuery() and getRangeQuery() into protected methods, so that folks can modify their behavior too. Thoughts? Also, I think we
