Re: Ok, for a Java newbie, how setup NetBeans 4 + Lucene

2005-03-22 Thread Chuck Williams
Mario Alejandro M. writes (3/22/2005 6:07 PM): I downloaded the most recent Lucene code. I downloaded NetBeans 4.0 + JDK 5. I used the wizard to import the source and I got a project with the right packages under it. Now, I want to use the Test Case code because I want to compare outputs for my Delphi

Re: Ok, for a Java newbie, how setup NetBeans 4 + Lucene

2005-03-23 Thread Chuck Williams
Mario Alejandro M. writes (3/23/2005 12:30 PM): Ok, I was able to set it up and see that it compiles fine and runs the test cases. However, I can't debug it (I put a breakpoint but nothing happened)... You need a debug target. The easiest way by far is to create a second standard project with File

Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-03-31 Thread Chuck Williams
Wolf Siberski writes (3/31/2005 1:54 AM): As some time has passed now since I submitted the Multisearcher patch, and no objections have been raised, I would like to ask to commit it now. I have put substantial effort into it, and my concern is that conflicts with newer patches will emerge if the co

Re: DO NOT REPLY [Bug 32965] - [PATCH] Use filter bits for next() and skipTo() in FilteredQuery

2005-04-04 Thread Chuck Williams
Erik Hatcher writes (4/4/2005 2:36 AM): Oh, and one other thing Paul's code relies on JDK 1.4's assert keyword. It seems this is an unnecessary reason to jump to 1.4 dependence. As a 1.5 user, I'd love to see Lucene at least at 1.4. Asserts are a good thing. Chuck

Re: HighlighterTest failure

2005-04-25 Thread Chuck Williams
Erik Hatcher wrote: I get a failure running HighlighterTest from the Subversion trunk. Below are the details. What's the fix? I don't have the code here to run, but the problem is that MultiSearcher.rewrite(line 298) is calling Query.combine, which requires all the combined queries to be equal.

Re: SortTest failing

2005-04-25 Thread Chuck Williams
Otis Gospodnetic wrote: Hm, Erik is not alone with unit tests failing. My HighlighterTest passes (I didn't do svn update today yet), but I see SortTest failing: [junit] Testcase: testNormalizedScores(org.apache.lucene.search.TestSort): FAILED [junit] expected:<0.375> but was:<0.392445> [

Re: HighlighterTest failure

2005-04-25 Thread Chuck Williams
Erik Hatcher wrote: On Apr 25, 2005, at 10:02 PM, Chuck Williams wrote: Erik Hatcher wrote: I get a failure running HighlighterTest from the Subversion trunk. Below are the details. What's the fix? I don't have the code here to run, but the problem is that MultiSearcher.rewrite(li

Correct of Query.combine() bugs with new MultiSearcher

2005-04-26 Thread Chuck Williams
As noted in the patch description I just submitted, it should be a complete, correct, robust (relative to possible user Query implementations) and reasonably optimal solution for Query.combine(). It also simplifies the previous methods, deleting all overrides of Query.combine() and Query.merge

Re: Correct of Query.combine() bugs with new MultiSearcher

2005-04-26 Thread Chuck Williams
uck's patch does fix the Highlighter test. I'm set to commit it once it gets the thumbs-up from Doug. Erik On Apr 26, 2005, at 4:58 PM, Chuck Williams wrote: As noted in the patch description I just submitted, it should be a complete, correct, robust (relative to possible us

Re: DO NOT REPLY [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()

2005-04-28 Thread Chuck Williams
Wolf Siberski wrote: --- Additional Comments From [EMAIL PROTECTED] 2005-04-27 17:15 --- Wolf's revisions to my changes to Query.combine() look fine. The single-query optimization is good -- my oversight to have not included it originally. I don't believe either of the other two chang

Re: ParallelReader

2005-04-28 Thread Chuck Williams
Doug Cutting writes (4/28/2005 2:19 PM): Please find attached something I wrote today. It has not been yet tested extensively, and the documentation could be improved, but I thought it would be good to get comments sooner rather than later. Would folks find this useful? Yes, very useful, especi

Re: java.util.zip (was Questions about DeleteFile method)

2005-05-04 Thread Chuck Williams
Doug Cutting wrote: Monsur Hossain wrote: George, what about SharpZipLib: http://www.icsharpcode.net/OpenSource/SharpZipLib/Default.aspx It's a third-party project, but it's written in C# and is under GPL. GPL unfortunately means that the library cannot be distributed by Apache with Lucene.Net. Ge

Re: usage of parallelreader

2005-12-16 Thread Chuck Williams
I also need the ability to achieve rapid updates. ParallelReader is attractive because my content naturally segments into a set of large and small stored fields where the small fields need to be accessed quickly, plus stable and mutable indexed fields where the mutable fields need to be updated qu

Re: GData, updateable IndexSearcher

2006-04-26 Thread Chuck Williams
If I'm following this correctly, it omits a related issue which is the need to periodically close and reopen the IndexWriter in order to flush its internal RAMDirectory, and similarly for the IndexReader used for delete. Is there any good solution to avoid these as well? My app has an IndexManage

Re: 2.0 release

2006-04-27 Thread Chuck Williams
Any chance at a last plea for LUCENE-362? It saves me an enormous amount of unnecessary allocation for the common case of a single large compressed field. It is an expert-level API that needs to be used carefully, but has no effect on any behavior if you don't use it. http://issues.apache.org/ji

Re: storing term text internally as byte array and bytecount as prefix, etc.

2006-05-01 Thread Chuck Williams
Could someone summarize succinctly why it is considered a major issue that Lucene uses the Java modified UTF-8 encoding within its index rather than the standard UTF-8 encoding? Is the only concern compatibility with index formats in other Lucene variants? The API to the values is a String, which

Re: storing term text internally as byte array and bytecount as prefix, etc.

2006-05-02 Thread Chuck Williams
ing data in a database for web application. You want to >> store it in such a way that other programs can manipulate easily >> other than >> only the web app program. Because there will be cases that you want >> to mass >> update or mass change the data, and

Re: storing term text internally as byte array and bytecount as prefix, etc.

2006-05-02 Thread Chuck Williams
ge to standard UTF-8 could be a hot item on the Lucene > 2.0list? > > Cheers, > > Jian Chen > > On 5/2/06, Doug Cutting <[EMAIL PROTECTED]> wrote: >> >> Chuck Williams wrote: >> > For lazy fields, there would be a substantial benefit to havin

Re: Lucene 2.0

2006-05-18 Thread Chuck Williams
I think Lucene-561 is in the "egregious" category and it has a patch to fix it (be sure to get the most recent of the two). Can this be included? Chuck Yonik Seeley wrote on 05/18/2006 10:50 AM: > On 5/18/06, DM Smith <[EMAIL PROTECTED]> wrote: >> > at the moment, there are two Jira issues with

Re: Fwd: How to combine results from several indices

2006-05-21 Thread Chuck Williams
wu fox wrote on 05/21/2006 03:02 AM: > I have several indices and each index describes part of a document. > For example, one index contains Dublin Core information of a document, > and another index contains some classification information of the same > document. If I have a query "title:lucene AND

Re: Fwd: How to combine results from several indices

2006-05-21 Thread Chuck Williams
Wu, I don't know of a general solution that you could get now. I do have such a solution, but it is part of a much larger mechanism. My Company will likely authorize contribution of this code if there is sufficient community interest in using and enhancing it. I plan to describe the capabilitie

Re: Fwd: How to combine results from several indices

2006-05-21 Thread Chuck Williams
I think it would be hard as there are many places in Lucene where it is assumed that doc-ids uniquely identify a document. This is the main reason that ParallelReader has these constraints. Are you sure the transactionalizing updates to sub-indexes will cause you performance problems? My applica

Re: Breaking up text in fields or aggregate fields idea or field inheritance

2006-05-25 Thread Chuck Williams
JMA, I think you will find that multiple fields are beneficial. However, a simple answer to your question, and one that is needed even for your examples of multiple values in a single field, is to use a position increment gap. See Analyzer.getPositionIncrementGap(). When you use multiple values

Re: Lucene and Java 1.5

2006-05-27 Thread Chuck Williams
1.5 has been out for almost 2 years now and has substantial improvements over 1.4.x, including generics for example. Isn't it time for Lucene to adopt 1.5? Chuck Simon Willnauer wrote on 05/27/2006 12:47 AM: > I guess the discussion about switching to 1.5 will startup right now > due to the 2.0

Re: Lucene and Java 1.5

2006-05-27 Thread Chuck Williams
Another issue concerns user contributions of patches and enhancements. I have a significant body of code that might be contributed, all in 1.5, to do things that have been requested by others who participate in the lists. As most of the development community is using 1.5 now, Lucene may get fewer g

Re: Lucene and Java 1.5

2006-05-27 Thread Chuck Williams
Robin H. Johnson wrote on 05/27/2006 11:05 AM: >> After all, Lucene comes with version numbers. >> > Yes it does, I just think the core functionality shouldn't be so quick > to change away from supporting 1.4. > 2 years is hardly quick. Performance, contributions from the vast majority of

Re: Lucene and Java 1.5

2006-05-27 Thread Chuck Williams
Andi Vajda wrote on 05/27/2006 12:01 PM: > > On Sat, 27 May 2006, karl wettin wrote: > >> How about a binary 1.4-target distribution? > > That's a great idea that might solve the problem as long as the > resulting bytecode is compatible with 1.4 and with gcj. > This would preclude use of the 1.5

Re: Lucene Gdata -- the best way to store the feeds / entries

2006-05-27 Thread Chuck Williams
Simon, Storing content in a Lucene index is a common approach and works well. I use a patch, LUCENE-362, to boost performance. Compress and decompress the field externally, storing just the byte[] in the Lucene index. The patch eliminates all copying of the byte[] otherwise done in lucene, at t

Re: Benchmarking on GOV2

2006-05-29 Thread Chuck Williams
Sebastiano Vigna wrote on 05/28/2006 10:39 PM: > but we will certainly need > some help to configure Lucene so that it works at its best. > > We would like to measure indexing time and query answer time > I'm not sure what form you would like that help to take, but here are a couple high-level

Re: Lucene and Java 1.5

2006-05-30 Thread Chuck Williams
Doug Cutting wrote on 05/30/2006 08:51 AM: > Chris Hostetter wrote: >> : Agreed. But, I have not heard one compelling argument for the JDK 5 >> for >> : core. (JVM certainly) >> >> Off the top of my head... >> >> * Generics for cleaner more type safe APIs >> * Varargs for cleaner APIs >> * C

Re: Lucene and Java 1.5

2006-05-30 Thread Chuck Williams
Doug Cutting wrote on 05/30/2006 10:29 AM: > Tomcat 5.5 does not yet support Java 1.5 language features in jsp pages, This is not true -- I use them all the time. 1.5 features are not supported by default, but can be enabled easily by adding this to your jsp servlet: compiler

Re: Query.combine()

2006-05-30 Thread Chuck Williams
If I understand what you are saying, unfortunately it will not work. The issue is that rewrite() needs to access the index. E.g., a* or [a TO d] rewrite to disjunctions of all terms that exist in the index that match, respectively, the prefix or range. To determine this set of terms it is necess
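To illustrate why rewrite() depends on the index, here is a minimal sketch that expands a prefix or range against a sorted term set. The class TermExpansionSketch and its methods are hypothetical stand-ins for a term dictionary, not Lucene APIs; the only point is that the expansion differs from one index to the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical stand-in for an index's term dictionary; not a Lucene class.
class TermExpansionSketch {

    // Expand a prefix query such as a* into the terms that exist in this index.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<String>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    // Expand an inclusive range query such as [a TO d] into existing terms.
    static List<String> expandRange(SortedSet<String> terms, String lower, String upper) {
        List<String> matches = new ArrayList<String>();
        for (String term : terms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound
            }
            matches.add(term);
        }
        return matches;
    }
}
```

Two sub-indexes with different term sets produce different disjunctions for the same a* query, which is why the rewritten queries have to be produced per index and then combined.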

Re: Query.combine()

2006-05-30 Thread Chuck Williams
Joe R wrote on 05/30/2006 12:37 PM: > If I understand correctly, I can still do the > combine() once on the JMS MultiSearcher if the query doesn't contain any > wildcard or range terms. > > In general, you cannot rely on this. Any app may define any subtype of Query and give it a rewrite() met

Re: Lexicon access questions

2006-06-01 Thread Chuck Williams
This approach comes to mind. You could model your semantic tags as tokens and index them at the same positions as the words or phrases to which they apply. This is particularly easy if you can integrate your taggers with your Analyzer. You would probably want to create one or more new Query subclas

Re: create feeds in GDATA - Server

2006-06-06 Thread Chuck Williams
Simon, What are your U requirements in the CRUD? Are these only on individual items so that delete/add is sufficient, or do you have any bulk update requirements? Chuck Simon Willnauer wrote on 06/06/2006 05:47 AM: > Hello, > > the first version of the GDATA server is already running and it >

Prefix and general wildcards

2006-06-09 Thread Chuck Williams
Hi all, I need to support query expressions like *xyz and possibly *lmn*. The former can be done with high search efficiency by storing (delimited) reversed tokens and the latter by storing all (delimited) rotations for each token. However, both of these techniques have high index overhead, the
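A rough sketch of the two token transformations mentioned above, in plain Java with no Lucene dependency. The '$' delimiter, the choice to prefix the reversed form with it, and the class name WildcardTokenSketch are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the '$' delimiter is an assumed marker, not a Lucene convention.
class WildcardTokenSketch {
    static final char DELIMITER = '$';

    // Delimited reversed token: a query like *xyz becomes a prefix search for "$zyx".
    static String reversed(String token) {
        return DELIMITER + new StringBuilder(token).reverse().toString();
    }

    // All rotations of token + DELIMITER: a query like *lmn* becomes a
    // prefix search for "lmn" over these rotated terms.
    static List<String> rotations(String token) {
        String marked = token + DELIMITER;
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < marked.length(); i++) {
            result.add(marked.substring(i) + marked.substring(0, i));
        }
        return result;
    }
}
```

A token of n characters yields n + 1 rotated terms, which is where the index overhead comes from.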

Re: Prefix and general wildcards

2006-06-09 Thread Chuck Williams
Doug Cutting wrote on 06/09/2006 11:00 AM: > Why not instead add the rotated and/or reversed tokens to a different > field that does not store vectors? That would be a better idea. Thanks! Chuck

Re: Prefix and general wildcards

2006-06-10 Thread Chuck Williams
Doug Cutting wrote on 06/09/2006 08:00 AM: > Chuck Williams wrote: >> one simple and substantial optimization is >> to support a token filter for term vectors, i.e. pass tokens through an >> additional filter for addition to term vectors. > > Why not instead add the rot

Re: Fwd: How to combine results from several indices

2006-06-12 Thread Chuck Williams
Hi Wu, The simplest solution is to synchronize calls to a ParallelWriter.addDocument() method that calls IndexWriter.addDocument() for each sub-index. This will work assuming there are no exceptions and assuming you never refresh your IndexReader within ParallelWriter.addDocument(). If exception
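The simple approach described here might look roughly like the following. SubIndexWriter is a hypothetical interface standing in for a per-sub-index IndexWriter, and documents are reduced to strings; this is a sketch of the locking idea only and ignores exceptions and reader refreshes, the two cases the message warns about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the writer of one sub-index.
interface SubIndexWriter {
    void addDocument(String subDocument);
}

// Sketch only: no exception recovery and no reader refresh handling.
class ParallelWriterSketch {
    private final List<SubIndexWriter> writers;

    ParallelWriterSketch(List<SubIndexWriter> writers) {
        this.writers = new ArrayList<SubIndexWriter>(writers);
    }

    // One sub-document per sub-index, added under a single lock so that every
    // sub-index receives documents in the same order (and so the same doc-ids).
    synchronized void addDocument(List<String> subDocuments) {
        if (subDocuments.size() != writers.size()) {
            throw new IllegalArgumentException("need one sub-document per sub-index");
        }
        for (int i = 0; i < writers.size(); i++) {
            writers.get(i).addDocument(subDocuments.get(i));
        }
    }
}
```

If one of the sub-index writes throws, the sub-indexes are left out of step, which is where the extra recovery logic would have to come in.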

Re: Fwd: How to combine results from several indices

2006-06-12 Thread Chuck Williams
ation and recovery get more complex. The basic idea would be to have a thread with a work queue for each sub-index. I may look at this later, or if you enhance this, please submit your version. Hope this helps, Chuck Chuck Williams wrote on 06/12/2006 09:05 AM: > Hi Wu, > > The simplest

Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader

2006-06-13 Thread Chuck Williams
t to reopen the discussion? > > Chuck Williams (JIRA) wrote: >> [ http://issues.apache.org/jira/browse/LUCENE-600?page=all ] >> >> Chuck Williams updated LUCENE-600: >> -- >> >> Attachment: ParallelWriter.pa

Re: Fwd: How to combine results from several indices

2006-06-13 Thread Chuck Williams
You can try that approach, but I think you will find it more difficult. E.g., all of the primitive query classes are written specifically to use doc-ids. So, you either need to do your searches separately on each subindex and then write your own routine to join the results, or you would need to re

Re: Fwd: How to combine results from several indices

2006-06-16 Thread Chuck Williams
Wu, Glad to hear that! Congratulations on getting it working. Looking forward to your contribution, Chuck wu fox wrote on 06/16/2006 03:30 PM: > Hi, Chuck. I have implemented my own parallelReader by overriding methods like > Document and ParallelTermDocs, and it really works. Your idea inspired > m

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-16 Thread Chuck Williams

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-17 Thread Chuck Williams
Ray Tsang wrote on 06/17/2006 06:29 AM: > I think the problem right now isn't whether we are going to have 1.5 > code or not. We will eventually have to have 1.5 code anyways. But > we need a sound plan that will make the transition easy. I believe > the transition from 1.4 to 1.5 is not an ov

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-17 Thread Chuck Williams
Tatu Saloranta wrote on 06/17/2006 06:54 AM: > And it's > bit curious as to what the current mad rush regarding > migration is -- beyond the convenience and syntactic > sugar, only the concurrency package seems like a > tempting immediate reason? > The only people who keep bringing up these no

Re: Soccer-themed question: null fields?

2006-06-18 Thread Chuck Williams
JMA wrote on 06/17/2006 10:16 PM: > 1) Is there a way to find a document that has null fields? > For example, if I have two fields (FIRST_NAME, LAST_NAME) for World Cup > players: > > FIRST_NAME: Brian LAST_NAME: McBride > FIRST_NAME: Agustin LAST_NAME: Delgado > FIRST_NAME: Zinha

Re: Recency weightage in Lucene

2006-06-18 Thread Chuck Williams
[EMAIL PROTECTED] wrote on 06/17/2006 10:52 PM: > I am thinking of modifying lucene's current ranking algorithm to include the > document's recency-weightage. So that the latest modified documents gets > preference over earlier modified documents, which makes sense for news > search. > > (I b

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-19 Thread Chuck Williams
Ray Tsang wrote on 06/19/2006 09:06 AM: > On 6/17/06, Chuck Williams <[EMAIL PROTECTED]> wrote: >> >> Ray Tsang wrote on 06/17/2006 06:29 AM: >> > I think the problem right now isn't whether we are going to have 1.5 >> > code or not. We will ev

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-20 Thread Chuck Williams
> - and a great community. > > OG: Thanks Dan, and please don't take my email(s) wrong. I'm quite > clear-headed in this issue, and am trying to be objective. I personally > wouldn't get hurt if we stayed with 1.4, I'd just be feeling bad and guilty > i

Re: Combining Hits and HitCollector

2006-06-27 Thread Chuck Williams
IMHO, Hits is the worst class in Lucene. Its atrocities are numerous, including the hardwired "50" and the strange normalization of dividing all scores by the top score if the top score happens to be greater than 1.0 (which destroys any notion of score values having any absolute meaning, although
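The normalization complained about here can be modeled in a few lines. This is a sketch of the behavior as described in the message, not the Lucene source; the class and method names are made up for illustration.

```java
// Illustrative model of the described normalization; not taken from Lucene.
class ScoreNormalizationSketch {

    // rawScores is assumed to be sorted in descending order.
    static float[] normalize(float[] rawScores) {
        float[] out = rawScores.clone();
        if (out.length > 0 && out[0] > 1.0f) {
            float top = out[0];
            for (int i = 0; i < out.length; i++) {
                out[i] = out[i] / top; // top hit becomes exactly 1.0
            }
        }
        return out;
    }
}
```

Raw scores of {4.0, 2.0, 1.0} and {8.0, 4.0, 2.0} both come out as {1.0, 0.5, 0.25}, while {0.8, 0.4} is left untouched, so a normalized score cannot be compared across queries.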

Re: Memory Leak IndexSearcher

2006-07-03 Thread Chuck Williams
I'd suggest forcing gc after each n iteration(s) of your loop to eliminate the garbage factor. Also, you can run a profiler to see which objects are leaking (e.g., the netbeans profiler is excellent). Those steps should identify any issues quickly. Chuck robert engels wrote on 07/03/2006 07:40

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-06 Thread Chuck Williams
robert engels wrote on 07/06/2006 12:24 PM: > I guess we just chose a much simpler way to do this... > > Even with you code changes, to see the modification made using the > IndexWriter, it must be closed, and a new IndexReader opened. > > So a far simpler way is to get the collection of updates fi

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-06 Thread Chuck Williams
Robert, Either you or I are missing something basic. I'm not sure which. As I understand things, an IndexWriter and an IndexReader cannot both have the write lock at the same time (they use the same write lock file name). Only an IndexReader can delete and only an IndexWriter can add. So to up

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-06 Thread Chuck Williams
r to IndexModifier without > the warning that you should do all the deletions first, and then all > the additions - the BufferedWriter would manage this for you. > > On Jul 6, 2006, at 9:16 PM, Chuck Williams wrote: > >> Robert, >> >> Either you or I are missing somethi

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-07 Thread Chuck Williams
DM Smith wrote on 07/07/2006 07:07 PM: > Otis, > First let me say, I don't want to rehash the arguments for or > against Java 1.5. This is an emotional issue for people on both sides. > However, I think you have identified that the core people need to > make a decision and the rest of us

Global field semantics

2006-07-08 Thread Chuck Williams
Many things would be cleaner in Lucene if fields had a global semantics, i.e., if properties like text vs. binary, Index, Store, TermVector, the appropriate Analyzer, the assignment of Directory in ParallelReader (or ParallelWriter), etc. were a function of just the field name and the index. This

Re: Java 1.5 (was ommented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread Chuck Williams
Doug Cutting wrote on 07/08/2006 09:41 AM: > Chuck Williams wrote: >> I only work in 1.5 and use its features extensively. I don't think >> about 1.4 at all, and so have no idea how heavily dependent the code in >> question is on 1.5. >> >> Unfortunately,

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 10:27 AM: > On Sat, 2006-07-08 at 09:46 -0700, Chuck Williams wrote: > >> Many things would be cleaner in Lucene if fields had a global semantics, >> > > >> Has this been considered before? Are there good reasons this

Re: Global field semantics

2006-07-08 Thread Chuck Williams
karl wettin wrote on 07/08/2006 12:27 PM: > On Sat, 2006-07-08 at 11:08 -0700, Chuck Williams wrote: > >> Karl, do you have specific reasons or use cases to normalize fields at >> Document rather than at Index? >> > > Nothing more than that the way the API

Re: Global field semantics

2006-07-09 Thread Chuck Williams
Marvin Humphrey wrote on 07/08/2006 11:13 PM: > > On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: > >> Many things would be cleaner in Lucene if fields had a global semantics, >> i.e., if properties like text vs. binary, Index, Store, TermVector, the >> appropriate

Re: Global field semantics

2006-07-09 Thread Chuck Williams
David Balmain wrote on 07/09/2006 06:44 PM: > On 7/10/06, Chuck Williams <[EMAIL PROTECTED]> wrote: >> Marvin Humphrey wrote on 07/08/2006 11:13 PM: >> > >> > On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: >> > >> >> Many things would be

Re: Global field semantics

2006-07-10 Thread Chuck Williams
David Balmain wrote on 07/10/2006 01:04 AM: > The only problem I could find with this solution is that > fields are no longer in alphabetical order in the term dictionary but > I couldn't think of a use-case where this is necessary although I'm > sure there probably is one. So presumably fields ar

Re: Global field semantics

2006-07-10 Thread Chuck Williams
Chris Hostetter wrote on 07/10/2006 02:06 AM: > As near as I can tell, the large issue can be summarized with the following > sentiment: > > Performance gains could be realized if Field > properties were made fixed and homogeneous for > all Documents in an index. > This is cert

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-10 Thread Chuck Williams
Yonik Seeley wrote on 07/10/2006 09:27 AM: > I'll rephrase my original question: > When implementing NewIndexModifier, what type of efficiencies do we > get by using the new protected methods of IndexWriter vs using the > public APIs of IndexReader and IndexWriter? I won't comment on Ning's imp

Re: Global field semantics

2006-07-10 Thread Chuck Williams
Chris Hostetter wrote on 07/10/2006 12:31 PM: > So i guess we are on the same page that this kind of thing can be done at > the App level -- what benefits do you see moving them into the Lucene > index level? > Other than performance per David's and Marvin's ideas, the functionality benefits

Re: Lucene/Netbean Newbie looking for help

2006-07-10 Thread Chuck Williams
y suggestions? Or any pointers to getting the tests > to work in netbeans are appreciated.

Re: Using Lucene for Semantic search

2006-07-20 Thread Chuck Williams
I have built such a system, although not with Lucene at the time. I doubt you need to modify anything in Lucene to achieve this. You may want to index words, stems and/or concepts from the ontology. Concepts from the ontology may relate to words or phrases. Lucene's token structure is flexible,

Strange behavior of positionIncrementGap

2006-08-11 Thread Chuck Williams
Hi All, There is a strange treatment of positionIncrementGap in DocumentWriter.invertDocument(). The gap is inserted between all values of a field, except it is not inserted between values if the prefix of the value list up to that point has not yet generated a token. For example, if a field F
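A small model of the behavior described above may make the difference concrete. It is based only on the wording of this message, not on the DocumentWriter source; the class PositionGapSketch, its parameters, and the simplification that each token advances the position by one are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Model of the described behavior only; not the Lucene implementation.
class PositionGapSketch {

    // tokensPerValue[i] is the number of tokens produced by the i-th value of the field.
    // With skipGapUntilFirstToken = true, no gap is added until some earlier value
    // has produced a token (the behavior reported in the message).
    // With false, the gap is added between every pair of values.
    static List<Integer> positions(int[] tokensPerValue, int gap, boolean skipGapUntilFirstToken) {
        List<Integer> result = new ArrayList<Integer>();
        int position = 0;
        int tokensSoFar = 0;
        for (int v = 0; v < tokensPerValue.length; v++) {
            if (v > 0 && (!skipGapUntilFirstToken || tokensSoFar > 0)) {
                position += gap;
            }
            for (int t = 0; t < tokensPerValue[v]; t++) {
                result.add(position++);
                tokensSoFar++;
            }
        }
        return result;
    }
}
```

With values producing 0, 2 and 1 tokens and a gap of 100, the reported behavior gives positions 0, 1, 102, whereas a uniform gap gives 100, 101, 202.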

Re: Strange behavior of positionIncrementGap

2006-08-11 Thread Chuck Williams
Chris Hostetter wrote on 08/11/2006 09:08 AM: > (using lower case > to indicate no tokens produced and upper case to indicate tokens were > produced) ... > > 1) a b C _gap_ D ...results in: C _gap_ D > 2) a B _gap_ C _gap_ D ...results in: B _gap_ C _gap_ D > 3) A _gap_ b _gap_

Re: Strange behavior of positionIncrementGap

2006-08-12 Thread Chuck Williams
Yonik Seeley wrote on 08/12/2006 05:08 AM: > On 8/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote: >> 1) a b C D ...results in: _gap_ _gap_ C _gap_ D >> 2) a B C D ...results in: _gap_ B _gap_ C _gap_ D >> 3) A b c D ...results in: A _gap_ _gap_ _gap_ D >> >

Re: Combining search steps without re-searching

2006-08-28 Thread Chuck Williams
I presume your search steps are anded, as in typical drill-downs? From a Lucene standpoint, each sequence of steps is a BooleanQuery of required clauses, one for each step. To add a step, you extend the BooleanQuery with a new clause. To not re-evaluate the full query, you'd need some query th

Re: Combining search steps without re-searching

2006-08-28 Thread Chuck Williams
Andrzej Bialecki wrote on 08/28/2006 09:19 AM: > Chuck Williams wrote: >> I presume your search steps are anded, as in typical drill-downs? >> >> From a Lucene standpoint, each sequence of steps is a BooleanQuery of >> required clauses, one for each step.

After kill -9 index was corrupt

2006-09-10 Thread Chuck Williams
Hi All, An application of ours under development had a memory leak that caused it to slow interminably. On Linux, the application did not respond to kill -15 in a reasonable time, so kill -9 was used to forcibly terminate it. After this the segments file contained a reference to a segment whose

Re: After kill -9 index was corrupt

2006-09-11 Thread Chuck Williams
Paul Elschot wrote on 09/10/2006 09:15 PM: > On Monday 11 September 2006 02:24, Chuck Williams wrote: > >> Hi All, >> >> An application of ours under development had a memory leak that caused >> it to slow interminably. On Linux, the application did no

Re: After kill -9 index was corrupt

2006-09-11 Thread Chuck Williams
robert engels wrote on 09/11/2006 07:34 AM: > A kill -9 should not affect the OS's writing of dirty buffers > (including directory modifications). If this were the case, massive > system corruption would almost always occur every time a kill -9 was > used with any program. > > The only thing a kill

Re: After kill -9 index was corrupt

2006-09-11 Thread Chuck Williams
that appears to > be the likely culprit to me. > > On Sep 11, 2006, at 2:56 PM, Chuck Williams wrote: > >> robert engels wrote on 09/11/2006 07:34 AM: >>> A kill -9 should not affect the OS's writing of dirty buffers >>> (including directory modifications).

Re: After kill -9 index was corrupt

2006-09-29 Thread Chuck Williams
t and the recovery code forgot to turn that off prior to the optimize! Thus a .cfs file was created, which confused the bulk updater -- it did not see a segment that was inside the cfs. Sorry for the false alarm and thanks to all who helped with the original question/concern, Chuck Chuck Williams

Re: Define end-of-paragraph

2006-10-03 Thread Chuck Williams
Reuven Ivgi wrote on 10/02/2006 09:32 PM: > I want to divide a document to paragraphs, still having proximity search > within each paragraph > > How can I do that? > Is your issue that you want the paragraphs to be in a single document, but you want to limit proximity search to find matches on

Re: Define end-of-paragraph

2006-10-03 Thread Chuck Williams
t the > best way > What do you think? > Thanks in advance > > Reuven Ivgi > > -Original Message- > From: Chuck Williams [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 03, 2006 10:58 AM > To: java-dev@lucene.apache.org > Subject: Re: Define end-of-para

Re: Ferret's changes

2006-10-10 Thread Chuck Williams
David Balmain wrote on 10/10/2006 03:56 PM: > Actually not using single doc segments was only possible due to the > fact that I have constant field numbers so both optimizations stem > from this one change. So I'm not sure if it is worth answering your > question but I'll try anyway. It obviousl

Re: Ferret's changes

2006-10-11 Thread Chuck Williams
David Balmain wrote on 10/10/2006 08:53 PM: > On 10/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote: > > I personally would always store term vectors since I use a > StandardTokenizer and Stemming. In this case highlighting matches in > small documents is not trivial. Ferret&

Re: Include BM25 in Lucene?

2006-10-17 Thread Chuck Williams
Vic Bancroft wrote on 10/17/2006 02:44 AM: > In some of my group's usage of Lucene over large document collections, > we have split the documents across several machines. This has led to > a concern of whether the inverse document frequency was appropriate, > since the score seems to be dependent

Re: ParallelMultiSearcher reimplementation

2006-11-03 Thread Chuck Williams
Chris Hostetter wrote on 11/03/2006 09:40 AM: > : Is there any timeline for when Java 1.5 packages will be allowed? > > I don't think i'll incite too much rioting to say "no there is no > timeline" > .. I may incite some rioting by saying "my guess is 1.5 packages will be > supported when the patch

Re: ParallelMultiSearcher reimplementation

2006-11-05 Thread Chuck Williams
Doug Cutting wrote on 11/03/2006 12:18 PM: > Chuck Williams wrote: >> Why would a thread pool be more controversial? Dynamically creating and >> garbaging threads has many downsides. > > The JVM already pools native threads, so mostly what's saved by thread

Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Hi All, Does anybody have experience dynamically varying maxBufferedDocs? In my app, I can never truncate docs and so work with maxFieldLength set to Integer.MAX_VALUE. Some documents are large, over 100 MBytes. Most documents are tiny. So a fixed value of maxBufferedDocs to avoid OOM's is too

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
eeley wrote on 11/09/2006 08:37 AM: > On 11/9/06, Chuck Williams <[EMAIL PROTECTED]> wrote: >> My main concern is that the mergeFactor escalation merging logic will >> somehow behave poorly in the presence of dynamically varying initial >> segment sizes. > > Thi

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Yonik Seeley wrote on 11/09/2006 08:50 AM: > For best behavior, you probably want to be using the current > (svn-trunk) version of Lucene with the new merge policy. It ensures > there are mergeFactor segments with size <= maxBufferedDocs before > triggering a merge. This makes for faster indexin

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Chuck Williams wrote on 11/09/2006 08:55 AM: > Yonik Seeley wrote on 11/09/2006 08:50 AM: > >> For best behavior, you probably want to be using the current >> (svn-trunk) version of Lucene with the new merge policy. It ensures >> there are mergeFactor segments with

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
> > Yonik Seeley wrote: >> On 11/9/06, Chuck Williams <[EMAIL PROTECTED]> wrote: >>> Thanks Yonik! Poor wording on my part. I won't vary maxBufferedDocs, >>> just am making flushRamSegments() public and calling it externally >>> (properly sync

Re: Dynamically varying maxBufferedDocs

2006-11-09 Thread Chuck Williams
Michael Busch wrote on 11/09/2006 09:56 AM: > >> This sounds good. Michael, I'd love to see your patch, >> >> Chuck > > Ok, I'll probably need a few days before I can submit it (have to code > unit tests and check if it compiles with the current head), because > I'm quite busy with other stuff rig

Re: ParallelMultiSearcher reimplementation

2006-11-13 Thread Chuck Williams
Doug Cutting wrote on 11/13/2006 10:50 AM: > Chuck Williams wrote: >> I followed this same logic in ParallelWriter and got burned. My first >> implementation (still the version submitted as a patch in jira) used >> dynamic threads to add the subdocuments to th

Re: [jira] Resolved: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size

2006-11-22 Thread Chuck Williams
Michael Busch wrote on 11/22/2006 08:47 AM: > Ning Li wrote: >> A possible design could be: >> First, in addDocument(), compute the byte size of a ram segment after >> the ram segment is created. In the synchronized block, when the newly >> created segment is added to ramSegmentInfos, also add its

Efficiently expunging deletions of recently added documents

2006-12-04 Thread Chuck Williams
Hi All, I'd like to open up the API to mergeSegments() in IndexWriter and am wondering if there are potential problems with this. I use ParallelReader and ParallelWriter (in jira) extensively as these provide the basis for fast bulk updates of small metadata fields. ParallelReader requires that

Re: Efficiently expunging deletions of recently added documents

2006-12-05 Thread Chuck Williams
Thanks Ning. This is all very helpful. I'll make sure to be consistent with the new merge policy and its invariant conditions. Chuck Ning Li wrote on 12/05/2006 08:01 AM: > An old issue (http://issues.apache.org/jira/browse/LUCENE-325 new > method expungeDeleted() added to IndexWriter) request

Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-12-05 Thread Chuck Williams
Mike Klaas wrote on 12/05/2006 11:38 AM: > On 12/5/06, negrinv <[EMAIL PROTECTED]> wrote: > >> Chris Hostetter wrote: > >> > If the code was not already in the core, and someone asked about >> adding it >> > I would argue against doing so on the grounds that some helpful >> utility >> > methods

Re: Locale string compare: Java vs. C#

2006-12-13 Thread Chuck Williams
Surprising but it looks to me like a bug in Java's collation rules for en-US. According to http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8 (which is Latin Capital Letter O With Stroke) should be before U, implying -1 is the correct result. Java is returning 1 for all strengths
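The comparison described above can be reproduced with a short check. This is a sketch, assuming the test in question used java.text.Collator for Locale.US. The class and method names (`CollationCheck`, `compareAtAllStrengths`) are made up for illustration. The result depends on the collation rules shipped with the JDK in use, so no expected output is shown.

```java
import java.text.Collator;
import java.util.Locale;

class CollationCheck {
    // The four strengths defined by java.text.Collator.
    static final int[] STRENGTHS = {
        Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
    };

    /** Returns the sign (-1, 0 or 1) of compare(a, b) at each strength for en-US. */
    static int[] compareAtAllStrengths(String a, String b) {
        int[] results = new int[STRENGTHS.length];
        for (int i = 0; i < STRENGTHS.length; i++) {
            Collator collator = Collator.getInstance(Locale.US);
            collator.setStrength(STRENGTHS[i]);
            results[i] = Integer.signum(collator.compare(a, b));
        }
        return results;
    }

    public static void main(String[] args) {
        // \u00D8 is Latin Capital Letter O With Stroke.
        int[] results = compareAtAllStrengths("\u00D8", "U");
        for (int i = 0; i < results.length; i++) {
            System.out.println("strength " + STRENGTHS[i] + " -> " + results[i]);
        }
    }
}
```

A value of -1 at every strength would match the chart cited above. A value of 1 would match the behavior reported in this thread.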

15 minute hang in IndexInput.clone() involving finalizers

2006-12-15 Thread Chuck Williams
Hi All, I've had a bizarre anomaly arise in an application and am wondering if anybody has ever seen anything like this. Certain queries, in not easy to reproduce cases, take 15-20 minutes to execute rather than a few seconds. The same query is fast some times and anomalously slow others. This

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-15 Thread Chuck Williams
Yonik and Robert, thanks for the suggestions and pointer to the patch! We've looked at the synchronization involved with finalizers and don't see how it could cause the issue as running the finalizers themselves is outside the lock. The code inside the lock is simple fixed-time list manipulation,

Re: 15 minute hang in IndexInput.clone() involving finalizers

2006-12-16 Thread Chuck Williams
va:175) > org.apache.lucene.store.BufferedIndexInput.clone(BufferedIndexInput.java:128) > > org.apache.lucene.store.FSIndexInput.clone(FSDirectory.java:564) > org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:45) Thanks, Chuck Chuck Williams wrote on 12/15/2006 08:22 A
