from:"Earwin Burrfoot"

Stored fields access

2010-02-25 Thread Earwin Burrfoot

I'm thinking, should Lucene introduce a new interface to read stored document fields? The current 'Document document(int n)' mechanism is barely usable due to the overhead involved. While I believe the underlying index structure works pretty fast (if it fits in memory, as is the case for most performance-concer

Re: Stored fields access

2010-02-25 Thread Earwin Burrfoot

actually want all the fields. > Erick > > On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot wrote: >> >> I'm thinking, should Lucene introduce new interface to read stored >> document fields? >> >> Current 'Document document(int n)' mec

Re: Stored fields access

2010-02-25 Thread Earwin Burrfoot

(didn't see any interest from anyone though) > > -- Tim > > Erick Erickson wrote: > > OK, never mind > Erick > > On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot wrote: >> >> My issue is with extra objects created in the process. Field selection >>

Re: Turning IndexReader.isDeleted implementations to final

2010-02-28 Thread Earwin Burrfoot

> but even non-final methods are inlined by hotspot, if the compiler is sure > that the class was not extended There's absolutely no way a JIT compiler can be sure that the class was not extended (except declaring it final) - because you can create a new classloader and load new class any time you

Re: lucene and solr trunk

2010-03-17 Thread Earwin Burrfoot

Some of these people got traumatized by maven, now they only can think in terms of "mash everything together and sprinkle with hand-downloaded dependency jars". No offence : ) I, personally, prefer side-by-side layouts. You can add new stuff, and wire dependencies to the old one, without reorganiz

Re: lucene and solr trunk

2010-03-18 Thread Earwin Burrfoot

> Unless maven has some features i'm not aware of, your "nicely depends" > works buy pulling Lucene jars from a repository The 'missing feature' is called multi-module projects. On Thu, Mar 18, 2010 at 03:33, Chris Hostetter wrote: > : build and nicely gets all dependencies to Lucene and Tika whe

Re: (LUCENE-2297) IndexWriter should let you optionally enable reader pooling

2010-03-22 Thread Earwin Burrfoot

I think that would be ideal because right now it is somewhat confusing on where to pull your latest-and-greatest from and what should you base your patches on. On Mon, Mar 22, 2010 at 14:21, Chris Male wrote: > I think that would be ideal because we can then start getting some nightly > builds us

Re: Modules

2010-03-26 Thread Earwin Burrfoot

> Sounds good to me. I guess one thing to think about is the analyzers > in core (should they move to this module, too?). > If so, perhaps we could make 'ant test' of lucene depend on this > module, since core tests use analyzers. > But you could use lucene without an analyzers module, it wouldnt b

Re: Modules

2010-03-26 Thread Earwin Burrfoot

On Fri, Mar 26, 2010 at 18:24, Robert Muir wrote: > I would really love to see them all in one place though, for the > users. I think that the elegance of our tests should be second to the > users ease. > Perhaps we could just have a fast and dirty TestAnalyzer so the core > tests don't need to de

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/org

2010-03-27 Thread Earwin Burrfoot

I think original Directory.copy() just copied everything in flex, without nocommits? Unlike before, now you can specify which files do you want to have copied, so people can query Codecs and whatnot themselves. >> Author: uschindler >> Date: Sat Mar 27 19:12:08 2010 >> New Revision: 928246 >> >

Re: svn commit: r928246 [1/6] - in /lucene/java/branches/flex_1458: ./ backwards/src/ backwards/src/java/org/apache/lucene/search/ backwards/src/test/org/apache/lucene/analysis/ backwards/src/test/org

2010-03-27 Thread Earwin Burrfoot

) ;) >> >> Ie Directory.copy used to filter for only index files, but >> Directory.copyTo copies everything so you must provide your own list >> if this matters. >> >> Mike >> >> On Sat, Mar 27, 2010 at 4:24 PM, Earwin Burrfoot wrote: >>> I

Re: Incremental Field Updates

2010-03-28 Thread Earwin Burrfoot

>>> Of course introducing the idea of updates also introduces the notion of a >>> primary key and there's probably an entirely separate discussion to be had >>> around user-supplied vs Lucene-generated keys. >> Not sure I see that need. Can you explain your reasoning a bit more? > If you want to u

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot

> Of course introducing the idea of updates also introduces the notion of a > primary key and there's probably an entirely separate discussion to be had > around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your reasoning a bit more? >>> If you

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot

>>If someone needs this, it can be built over lucene, without >>introducing it as a core feature and needlessly complicating things. > > I think with any partial-update feature the *absence* of primary key support > would "needlessly complicate things": > If Lucene is not capable of performing du

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot

>>Variant d) sounds most logical? And enables all sorts of fun stuff. > > So the duplicate-key docs can have different values for initial-insert fields > but partial updates will cause sharing of a common field value? > And subsequent same-key doc inserts do or don't share these previous > "part

Re: Incremental Field Updates

2010-03-29 Thread Earwin Burrfoot

>>Who ever said that some_condition should point to a unique document? > > My assumption was, for now, we were still talking about the simpler case of > updating a single document. > If we extend the discussion to support set-based updates it's worth > considering the common requirements for upda

Re: Build failed in Hudson: Lucene-trunk #1144

2010-04-01 Thread Earwin Burrfoot

No, no, no, Lucene still has no need for maven or ivy for dependency management. We can just hack around all issues with ant scripts. : ) On Thu, Apr 1, 2010 at 09:48, Chris Hostetter wrote: > > : I was wondering yesterday why aren't the required libs checked in to SVN? We > > Licensing issues. >

Re: Welcome Uwe Schindler to the Lucene PMC

2010-04-01 Thread Earwin Burrfoot

Generics SpecOps made it to the top and are gonna rule us from the shadows :) Congrats! On Thu, Apr 1, 2010 at 16:37, Robert Muir wrote: > Congrats Uwe! > > On Thu, Apr 1, 2010 at 7:05 AM, Grant Ingersoll wrote: >> >> I'm pleased to announce that the Lucene PMC has voted to add Uwe Schindler >>

Re: Build failed in Hudson: Lucene-trunk #1144

2010-04-01 Thread Earwin Burrfoot

> it doesn't really matter if it's ant scripts, or ivy declarations, or > maven pom entries -- the point is the same. > > We can't distribute the jars, but we can distribute programatic means for > users to fetch teh jars themselves. > > (even if we magicly switched to ivy or maven for dependency m

Term space continuity

2010-04-05 Thread Earwin Burrfoot

A random thought from some of the earlier discussions. Has anybody used the fact that Lucene's Term space is continuous (a single per-index/segment space instead of separate per-field spaces) at least once? I only see code around that copes with this somehow, like checking "term.field() == field" just

Re: Term space continuity

2010-04-05 Thread Earwin Burrfoot

Wow! Cool. On Tue, Apr 6, 2010 at 03:51, Michael McCandless wrote: > The flex API isolates fields, ie you get a TermsEnum for a given field > and it enums only the term's text (as a BytesRef). > > Mike > > On Mon, Apr 5, 2010 at 7:22 PM, Earwin Burrfoot wrote: >>

Getting fsync out of the loop

2010-04-06 Thread Earwin Burrfoot

So, I want to pump my IndexWriter hard and fast with documents. Removing fsync from FSDirectory helps. But for that I pay with possibility of index corruption, not only if my node suddenly loses power/kernelpanics, but also if it runs out of disk space (which happens more frequently). I invented

Re: Getting fsync out of the loop

2010-04-06 Thread Earwin Burrfoot

> Running out of disk space with fsync disabled won't lead to corruption. > Even kill -9 the JRE process with fsync disabled won't corrupt. > In these cases index just falls back to last successful commit. > > It's "only" power loss / OS / machine crash where you need fsync to > avoid possible corr

Re: Getting fsync out of the loop

2010-04-07 Thread Earwin Burrfoot

nning time improves, but I'm curious to know by how much. > > Shai > > On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot wrote: >> >> > Running out of disk space with fsync disabled won't lead to corruption. >> > Even kill -9 the JRE process with fsync dis

Re: Getting fsync out of the loop

2010-04-07 Thread Earwin Burrfoot

> No, this doesn't make sense. The OS detects a disk full on accepting > the write into the write cache, not [later] on flushing the write > cache to disk. If the OS accepts the write, then disk is not full (ie > flushing the cache will succeed, unless some other not-disk-full > problem happens).

Re: Getting fsync out of the loop

2010-04-08 Thread Earwin Burrfoot

> But, IW doesn't let you "hold on to" checkpoints... only to commits. > > Ie SnapshotDP will only "see" actual commit/close calls, not > intermediate checkpoints like a random segment merge completing, a > flush happening, etc. > > Or... maybe you would in fact call commit frequently from the main

Re: Proposal about Version API "relaxation"

2010-04-13 Thread Earwin Burrfoot

I wholeheartedly support this anti-version riot :) On Tue, Apr 13, 2010 at 19:27, Shai Erera wrote: > Hi > > I'd like to propose a relaxation on the Version API. Uwe, please read the > entire email before you reply :). > > I was thinking, following a question on the user list, that the > Version-

Re: [jira] Account password

2010-04-13 Thread Earwin Burrfoot

Priceless On Wed, Apr 14, 2010 at 00:53, wrote: > > You (or someone else) has reset your password. > > - > > Your password has been changed to: MCwqNr > > You can change your password here: > > https://issues.apache.org/jira/

Re: [jira] Account password

2010-04-13 Thread Earwin Burrfoot

It wasn't On Wed, Apr 14, 2010 at 02:06, Erick Erickson wrote: > A, good. That means the very long e-mail that came to my regular account > about someone hacking the JIRA server is bogus too I assume.. > Erick > > On Tue, Apr 13, 2010 at 5:58 PM, Uwe Schindler wrote: >> >> LOL! >> >> Thi

Re: Proposal about Version API "relaxation"

2010-04-14 Thread Earwin Burrfoot

The thread somehow got sidetracked. So, let's get this carriage back on its rails? Let me remind - we have an API on hands that is mandatory and tends to be cumbersome. Proposed solution does indeed have ultrascary word "static" in it. But if you brace yourself and look closer - the use of said st

Re: Proposal about Version API "relaxation"

2010-04-14 Thread Earwin Burrfoot

Can't believe my eyes. +1 On Thu, Apr 15, 2010 at 01:22, Michael McCandless wrote: > On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey > wrote: > >> Essentially, we're free to break back compat within "Lucy" at any time, but >> we're not able to break back compat within a stable fork like "Lucy

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

3.1.1 >> just to get a new feature but get it API back-supported? As soon as they >> upgrade to 3.2, that means a new set of API right? >> >> Major releases will just change the index structure format then? Or move >> to Java 1.6? Well ... not even that bec

Re: SnapshotDeletionPolicy throws NPE if no commit happened

2010-04-15 Thread Earwin Burrfoot

We should just let IW create a null commit on an empty directory, like it always did ;) Then a whole class of such problems disappears. On Thu, Apr 15, 2010 at 11:16, Shai Erera wrote: > SDP throws NPE if the index includes no commits, but snapshot() is called. > This is an extreme case, but can

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

I think an index upgrade tool is okay? While you still definitely have to code it, things like "if idxVer==m doOneStuff elseif idxVer==n doOtherStuff else blowUp" are kept away from lucene innards and we all profit? On Thu, Apr 15, 2010 at 16:21, Robert Muir wrote: > its open source, if you feel t
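The version dispatch quoted in this message could live in a standalone tool along these lines. This is only a sketch under assumptions: the class name, the version numbers and the step labels are all made up for illustration, and nothing here is an actual Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical offline upgrade tool; versions and step names are invented. */
final class IndexUpgradeTool {
    /** Assumed number of the newest on-disk format. */
    static final int CURRENT_VERSION = 3;

    /**
     * Walks an index from idxVer up to CURRENT_VERSION one step at a time
     * and returns the labels of the conversion steps applied, oldest first.
     * Unknown versions blow up instead of being silently tolerated.
     */
    static List<String> upgrade(int idxVer) {
        if (idxVer < 1 || idxVer > CURRENT_VERSION) {
            throw new IllegalArgumentException("unsupported index version: " + idxVer);
        }
        List<String> applied = new ArrayList<>();
        for (int v = idxVer; v < CURRENT_VERSION; v++) {
            switch (v) {
                case 1: applied.add("v1->v2"); break; // doOneStuff
                case 2: applied.add("v2->v3"); break; // doOtherStuff
                default: throw new IllegalStateException("no step for version " + v);
            }
        }
        return applied;
    }
}
```

Keeping each step as a conversion between adjacent versions means the per-version branches stay in the tool and the library itself only has to read the current format.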

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

I like the idea of index conversion tool over silent online upgrade because it is 1. controllable - with online upgrade you never know for sure when your index is completely upgraded, even optimize() won't help here, as it is a noop for already-optimized indexes 2. way easier to write - as flex sho

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

On Thu, Apr 15, 2010 at 17:17, Yonik Seeley wrote: > Seamless online upgrades have their place too... say you are upgrading > one server at a time in a cluster. Nothing here that can't be solved with an upgrade tool. Down one server, upgrade index, upgrade software, up. -- Kirill Zakharenko/Кири

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

On Thu, Apr 15, 2010 at 17:49, Robert Muir wrote: > wrong, it doesnt fix the analyzers problem. > you need to reindex. > > On Thu, Apr 15, 2010 at 9:39 AM, Earwin Burrfoot wrote: >> >> On Thu, Apr 15, 2010 at 17:17, Yonik Seeley >> wrote: >> > Seamle

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

> reasonable, but changing APIs around when there's not a good reason > behind it (other than someone liked the name a little better) should > still be approached with caution. Changing names is a good enough reason :) They make a darn difference between having to read a book to be able to use som

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

ays nice to be > able to work without dealing with pesky legacy issues . Perhaps > splitting out the indexing upgrades into a separate program lets us > accommodate both concerns. > FWIW > Erick > On Thu, Apr 15, 2010 at 9:42 AM, Danil ŢORIN wrote: >> >> True. Just ne

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

> First, the index format. IMHO, it is a good thing for a major release to be > able to read the prior major release's index. And the ability to convert it > to the current format via optimize is also good. Whatever is decided on this > thread should take this seriously. Optimize is a bad way to co

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

ote: > On 04/15/2010 01:50 PM, Earwin Burrfoot wrote: >>> >>> First, the index format. IMHO, it is a good thing for a major release to >>> be >>> able to read the prior major release's index. And the ability to convert >>> it >>> to

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

> BTW Earwin, we can come up w/ a migrate() method on IW to accomplish > manual migration on the segments that are still on old versions. > That's not the point about whether optimize() is good or not. It is > the difference between telling the customer to run a 5-day migration > process, or a coup

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

On Thu, Apr 15, 2010 at 23:07, DM Smith wrote: > On 04/15/2010 03:04 PM, Earwin Burrfoot wrote: >>> >>> BTW Earwin, we can come up w/ a migrate() method on IW to accomplish >>> manual migration on the segments that are still on old versions. >>> That'

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

> Not sure if plain users are allowed/encouraged to post in this list, > but wanted to mention (just an opinion from a happy user), as other > users have, that not all of us can reindex just like that. It would > not be 10 min for one of our installations for sure... > > First, i would need to impl

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

2010/4/15 Shai Erera : > The reason Earwin why online migration is faster is because when u > finally need to *fully* migrate your index, most chances are that most > of the segments are already on the newer format. Offline migration > will just keep the application idle for some amount of time unt

Re: Proposal about Version API "relaxation"

2010-04-15 Thread Earwin Burrfoot

I think this should split off the mega-thread :) On Thu, Apr 15, 2010 at 23:28, Uwe Schindler wrote: > Hi Earwin, > > I am strongly +1 on this. I would also make the Release Manager for 3.1, if > nobody else wants to do this. I would like to take the preflex tag or some > revisions before (mayb

Re: official GIT repository / switch to GIT?

2010-04-17 Thread Earwin Burrfoot

Why can't people just use svn or mercurial as a client for subversion repository? What is the benefit of migrating repository itself? On Sat, Apr 17, 2010 at 11:20, Thomas Koch wrote: > Hi, > > at least since august 2009 nobody has dared to ask this question, so let's > start a flamewar: > Don't

Re: official GIT repository / switch to GIT?

2010-04-17 Thread Earwin Burrfoot

These are broken, by the way. We need to kick someone to merge entries for lucene&solr and point them to a new svn url. On Sun, Apr 18, 2010 at 04:10, John Wang wrote: > Hi Thomas: > There is a git mirror already: http://github.com/apache/lucene > All of apache projects are: http://git.

Re: wiki

2009-01-24 Thread Earwin Burrfoot

Looks like Czech to my slavic eyes :) On Sat, Jan 24, 2009 at 18:14, Paul Elschot wrote: > On Saturday 24 January 2009 15:29:12 Grant Ingersoll wrote: > >> Anyone know what this is: >> http://wiki.apache.org/lucene-java/IndeksRe%C4%8Di > > After looking around on the lucene wiki a bit I also foun

Re: Integrating Language Models into Lucene

2009-02-25 Thread Earwin Burrfoot

Have you looked at MG4J (http://mg4j.dsi.unimi.it/)? Last time I did, it looked like an opposite of lucene - nice and up-to-date algorithmics, but hard to apply to complex real-world tasks. On Thu, Feb 26, 2009 at 04:21, Koren Krupko wrote: > > Hello Lucene Developers! > > My name is Koren Krupko

Re: Bitmap index

2009-02-27 Thread Earwin Burrfoot

> Maybe we can use the > compression technology mentioned in this Wikipedia article to further > optimize filters and their DocIdSetIterators. We already use WAH-encoded bitmap filters over here for roughly a year. And yes, they are nice. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) H

Re: Sorting and multi-term fields again

2009-03-02 Thread Earwin Burrfoot

My opinion is that if you want to enable sorting on multi-term fields, you need a pluggable selection policy. I see someone wanting biggest/smallest term represent a document when sorting. Or maybe a function of the terms. On Mon, Mar 2, 2009 at 20:34, Uwe Schindler wrote: > I updated yesterday h
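A pluggable selection policy of the kind suggested here might look roughly like the following. The interface name and its two constants are hypothetical and only illustrate the idea of picking one representative term per document for sorting.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical policy: picks the term that represents a document when sorting. */
interface SortTermSelector {
    /** Chooses one representative from the (non-empty) terms of a field. */
    String select(List<String> terms);

    /** Uses the smallest term in natural String order. */
    SortTermSelector SMALLEST = terms -> Collections.min(terms);

    /** Uses the biggest term in natural String order. */
    SortTermSelector BIGGEST = terms -> Collections.max(terms);
}
```

A "function of the terms" would simply be another implementation of the same interface, for example one that concatenates the sorted terms.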

Re: extending the query parser

2009-03-11 Thread Earwin Burrfoot

Take ANTLR and roll your own query parser from scratch? It's pretty easy. On Thu, Mar 12, 2009 at 04:24, Candide Kemmler wrote: > Hello, > > I'm looking at a way to extend the lucene query parser to allow for semantic > computations in IEML space (see http://ieml.org). What I'd like to know is: >

Re: extending the query parser

2009-03-12 Thread Earwin Burrfoot

On Thu, Mar 12, 2009 at 21:16, Candide Kemmler wrote: > > On 11 Mar 2009, at 23:21, Earwin Burrfoot wrote: > >> Take ANTLR and roll your own query parser from scratch? It's pretty easy. >> > > Hi Earwin, > > That would be fantastic, since our parser is a

Re: move TrieRange* to core?

2009-03-18 Thread Earwin Burrfoot

On Wed, Mar 18, 2009 at 23:08, Andi Vajda wrote: > > On Mar 18, 2009, at 13:01, Michael McCandless > wrote: > >> I think we should move TrieRange* into core before 2.9? >> >> It's received alot of attention, from both developers (Uwe & Yonik did >> lots of iterations, and Solr is folding it in) a

Re: Modularization

2009-03-23 Thread Earwin Burrfoot

> - contrib has always had a lower bar and stuff was committed under > that lower bar - there should be no blanket promotion. > - contrib items may have different dependencies... putting it all > under the same source root can make a developers job harder > - many contrib items are less related to

Re: Modularization

2009-03-23 Thread Earwin Burrfoot

On Mon, Mar 23, 2009 at 22:13, Mark Miller wrote: > Earwin Burrfoot wrote: >>> >>> - contrib has always had a lower bar and stuff was committed under >>> that lower bar - there should be no blanket promotion. >>> - contrib items may have different dependenci

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot

I'd say it is a bad name. A raw hit is a long way from being the result of a search. If you're already breaking back compat with the 3.0 release (by incrementing the java version), maybe it's worth breaking it in a few more places, just so ugly names like MRHC and special code paths that check for n-year-old interf

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot

> BTW, I like the name ResultsCollector, as it's just like HitCollector, but > does not commit too much to "hits" .. i.e., facets aren't hits ... I think? What this class consumes and what it produces is a totally different thing. HitCollector always collects 'hits', and then produces whatever imp

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot

> On Thu, Mar 26, 2009 at 08:44:57AM -0400, Michael McCandless wrote: > >> do you have an alternative? > > Brainstorming > > * Harvester > * Trawler > * HitPicker > * HitGrabber > > Marvin Humphrey NitPicker - that absolutely made my day -- Kirill Zakharenko/Кирилл Захаренко (ear...@gma

Re: Is TopDocCollector's collect() implementation correct?

2009-03-26 Thread Earwin Burrfoot

> I think ResultsCollector (or maybe ResultCollector) is my favorite so far... > > But how about simply Collector? (I realize it's very generic... but > we don't collect anything else in Lucene?). That's exactly what I'm using in my app -> abstract class Collector extends HitCollector, that serves

Re: NIO.2

2009-03-28 Thread Earwin Burrfoot

On Sat, Mar 28, 2009 at 16:44, Michael Busch wrote: > NIO.2 sounds great. > Though, it will probably take a pretty long time before we can switch Lucene > to Java 1.7 :( > > We could write a (contrib) module that we don't ship together with the core > that has a Directory implementation which uses

Re: NIO.2

2009-03-28 Thread Earwin Burrfoot

> I think having async IO will be great, though I wonder how we would > change Lucene to take advantage of it. It ought to gain us > concurrency (eg we can score last chunk while we have an io request > out to retrieve next chunk, of term docs / positions / etc.). A presentation given above refere

Possible IndexInput optimization

2009-03-28 Thread Earwin Burrfoot

While drooling over MappedBigByteBuffer, which we'll (hopefully) see in JDK7, I revisited my own Directory code and noticed a certain peculiarity, shared by Lucene core classes: Each and every IndexInput implementation only implements readByte() and readBytes(), never trying to override readInt/VIn
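To make the observation concrete, here is a minimal sketch assuming a ByteBuffer-backed input. SketchInput and both subclasses are hypothetical stand-ins, not Lucene's IndexInput; the base class composes readInt() from four readByte() calls, and one subclass overrides it with a single bulk read.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for an input abstraction that only requires readByte(). */
abstract class SketchInput {
    abstract byte readByte();

    /** Default: assemble a big-endian int from four single-byte reads. */
    int readInt() {
        return ((readByte() & 0xFF) << 24)
             | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8)
             |  (readByte() & 0xFF);
    }
}

/** Implements only readByte() and inherits the byte-by-byte readInt(). */
final class ByteAtATimeInput extends SketchInput {
    private final ByteBuffer buf;
    ByteAtATimeInput(byte[] data) { this.buf = ByteBuffer.wrap(data); }
    @Override byte readByte() { return buf.get(); }
}

/** Also overrides readInt() to use the buffer's own bulk accessor. */
final class BulkInput extends SketchInput {
    private final ByteBuffer buf;
    BulkInput(byte[] data) { this.buf = ByteBuffer.wrap(data); } // big-endian by default
    @Override byte readByte() { return buf.get(); }
    @Override int readInt() { return buf.getInt(); }
}
```

Both variants decode the same value; whether the override is actually faster depends on the JVM and the buffer type, so it would need to be measured.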

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot

> A while ago I tried overriding the read* methods in BufferedIndexInput like > this: > > I'm still surprised there was no performance improvement at all. Maybe > something was wrong with my test and I should try it again... For BufferedIndexInput improvement should be

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot

> Earwin, > I did not experiment lately, but I'd like to add a general compressed > integer array to the basic types in an index, that would be compressed > on writing and decompressed on reading. > A first attempt is at LUCENE-1410, and one of the choices I had there > was whether or not to use NI

Re: Possible IndexInput optimization

2009-03-29 Thread Earwin Burrfoot

>> In my case I have to switch to MMap/Buffers, Java behaves ugly with >> 8Gb heaps. > Do you mean that because garbage collection does not perform well > on these larger heaps, one should avoid to create arrays to have heaps > of that size, and rather use (direct) MMap/Buffers? Yes, exactly. Keepi

Re: Modularization

2009-04-01 Thread Earwin Burrfoot

Lucene is in fact already available through Maven. POMs do exist; all that is left is to find who manages them and the releases. On Thu, Apr 2, 2009 at 01:40, Douglas Campos wrote: > +1 on maven, and I volunteer to aid in the creation of the maven project > files (pom's) > > On Wed, Apr 1, 2009 at 11

possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot

Currently, when we're seeking a given Term, it does a binary search across the whole term space, including terms belonging to other fields. I propose augmenting the fields file with two pointers (firstTerm, lastTerm) for each field. That reduces the range we need to search, and instead of comparing Terms we only
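The proposal can be sketched as follows, assuming an in-memory term list sorted by (field, text). The class and its layout are hypothetical and far simpler than the real TermInfosReader; the point is only that the per-field (firstTerm, lastTerm) pair bounds the binary search and that inside the range only the term text needs comparing (that last part is an assumption, since the message is cut off here).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory term index; not Lucene's TermInfosReader. */
final class FieldRangeTermIndex {
    private final String[] texts; // term texts, sorted by (field, text)
    private final Map<String, int[]> ranges = new HashMap<>(); // field -> {firstTerm, lastTerm}

    /** fields[i] is the field of texts[i]; both arrays must be sorted by (field, text). */
    FieldRangeTermIndex(String[] fields, String[] texts) {
        this.texts = texts;
        for (int i = 0; i < fields.length; i++) {
            int[] r = ranges.get(fields[i]);
            if (r == null) {
                ranges.put(fields[i], new int[] {i, i});
            } else {
                r[1] = i; // extend lastTerm
            }
        }
    }

    /** Returns the position of (field, text), or -1 if the term is absent. */
    int seek(String field, String text) {
        int[] r = ranges.get(field);
        if (r == null) {
            return -1; // unknown field: no search at all
        }
        int lo = r[0];
        int hi = r[1];
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = texts[mid].compareTo(text); // field is already fixed by the range
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```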

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot

On Thu, Apr 9, 2009 at 00:14, Michael McCandless wrote: > On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot wrote: > >> Currently, when we're seeking a given Term, it does a binary search >> across all term space, including terms belonging to other fields. >> I propos

Re: possible TermInfosReader speedup

2009-04-08 Thread Earwin Burrfoot

On Thu, Apr 9, 2009 at 02:01, Uwe Schindler wrote: >> >> Also, on the other topic - how hard is it to boost >> >> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be >> >> nice for TrieRangeFilter and probably some other filters. >> > I think all that's needed is to implement Se

Re: Modularization

2009-04-09 Thread Earwin Burrfoot

On Fri, Apr 10, 2009 at 02:25, Chris Hostetter wrote: > Or just make it trivial to get all jars that fit a given profile w/o > actually merging those jars into an uber-jar ... does maven's > dependency management have any like "bundles" or "virtual packages" so > we could publish a "lucene-all-ana

IndexReader plugins

2009-04-12 Thread Earwin Burrfoot

To support my dream of kicking fieldCache out of the core and to add some extensibility to Lucene, I want to introduce IndexReaderPlugins. Rough pseudocode follows: interface IndexReaderPlugin { void attach(SegmentReader reader); void detach(SegmentReader reader); void att

Re: IndexReader plugins

2009-04-12 Thread Earwin Burrfoot

> Earwin Burrfoot wrote: >> >> Benefits are numerous. We get rid of alien code like: >> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy) >> @@ -83,6 +86,8 @@ >> + protected ValueSource valueSource; >> + >> @@ -555

Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot

ferent plugin instances per-subreader. Do we want plugins supporting more than one interface, or is it an unnecessary complication? Like: indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class); And then: indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class) > Mike > >
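For illustration, the bindPlugin(...).to(...) / plugin(...) calls in this message could be backed by a registry like the one below. Everything here is a hypothetical sketch: PluginRegistry stands in for the reader, and Iface1, Iface2 and BothPlugin exist only so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

interface Iface1 {}
interface Iface2 {}

/** Example plugin that supports more than one interface. */
final class BothPlugin implements Iface1, Iface2 {}

/** Hypothetical registry standing in for the plugin-aware reader. */
final class PluginRegistry {
    private final Map<Class<?>, Object> byInterface = new HashMap<>();

    /** Second half of the fluent call: bindPlugin(instance).to(A.class, B.class). */
    final class Binding {
        private final Object instance;
        private Binding(Object instance) { this.instance = instance; }

        void to(Class<?>... ifaces) {
            for (Class<?> iface : ifaces) {
                if (!iface.isInstance(instance)) {
                    throw new IllegalArgumentException(
                        instance.getClass().getName() + " does not implement " + iface.getName());
                }
                byInterface.put(iface, instance);
            }
        }
    }

    Binding bindPlugin(Object instance) {
        return new Binding(instance);
    }

    /** Returns the plugin bound to iface, or null if nothing is bound. */
    <T> T plugin(Class<T> iface) {
        return iface.cast(byInterface.get(iface));
    }
}
```

With this shape, binding one instance to two interfaces makes both lookups return the very same object, which is the identity the message asks about.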

Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot

>> Can we outline some requirements for the plugin API? >> >> Do we want to attach/detach them to IndexReader after it is created, >> or only during construction? > > I think I'd lean towards only at construction. Seems dangerous to > allow swap in/out at some later time. I have several points pro

Re: IndexReader plugins

2009-04-13 Thread Earwin Burrfoot

On Mon, Apr 13, 2009 at 17:14, Michael McCandless wrote: > On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot wrote: > >>> I think I'd lean towards only at construction. Seems dangerous to >>> allow swap in/out at some later time. >> I have several points

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

>> IndexReader.java is littered with the likes of: >> public static IndexReader open(final Directory directory, >> IndexDeletionPolicy deletionPolicy) throws CorruptIndexException, >> IOException; > But I don't understand why is this a problem... Doubling the number of factory methods? We have to k

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

>> > With the early binding approach, you wouldn't pass all plugins during >> > creation; you'd pass a factory object that exposes methods like: >> > >> > getPostingsComponent(SegmentInfo) >> > getStoredFieldsComponent(SegmentInfo) >> > getValueSourceComponent(SegmentInfo) >> >> That basically k

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

> The original example justification was to avoid putting a ValueSource in the > IndexReader (I guess avoiding the funky init code? valueSource = new > CachingValueSource(this, new UninversionValueSource(this)) That was a bit of drama for the sake of drama, I couldn't restrain myself :) My justific

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

Mark Miller wrote: > The distinction I am making with core is that we will have to call known > methods on those > core 'modules' that are not very generic? Doesn't that keep it from playing > nice with the very generic 'attach this to this segment'? Genericity spans binding, notifications and retr

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

Michael McCandless wrote: > I gave the example to show the init vs inflight distinction, because > inflight makes me nervous. I'm thinking of some (bad name follows) PluginBundle, that has add/remove/inspect methods and constructor/method for filling it with default Lucene components. Then instead

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

On Wed, Apr 15, 2009 at 00:15, Mark Miller wrote: > Mark Miller wrote: >> >> Earwin Burrfoot wrote: >>> >>> Mark Miller wrote: >>> >>>> >>>> The distinction I am making with core is that we will have to call known >>>>

Re: IndexReader plugins

2009-04-14 Thread Earwin Burrfoot

On Wed, Apr 15, 2009 at 00:55, Mark Miller wrote: > Earwin Burrfoot wrote: >> >> On Wed, Apr 15, 2009 at 00:15, Mark Miller wrote: >> >>> >>> Mark Miller wrote: >>> >>>> >>>> Earwin Burrfoot wrote: >>>> >>

Re: I wanna contribute a Chinese analyzer to lucene

2009-04-16 Thread Earwin Burrfoot

On Thu, Apr 16, 2009 at 18:16, Ken Krugler wrote: > I wrote a Analyzer for apache lucene for analyzing sentences in Chinese > language, it's called imdict-chinese-analyzer as it is a subproject of > imdict, which is an intelligent online dictionary. > > The project on google code is here: > http:/

String.intern() alternative for field names

2009-04-19 Thread Earwin Burrfoot

Okay, we'd like to have equality-by-reference for field names, yielding überfast comparisons in all our tight inner loops. But we dislike default String.intern() for its java<->native transitions and general lentitude. There's a perfect solution. Too dumb to come up with it myself, but fortunately
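The solution referred to here is cut off in the archive, so the following is not it. It is only one generic way to get equality-by-reference for field names without String.intern(): a pure-Java pool on top of ConcurrentHashMap, with a hypothetical class name.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pure-Java interner; not the implementation discussed in the thread. */
final class FieldNamePool {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for name, registering it on first sight. */
    String intern(String name) {
        String existing = pool.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }
}
```

Once two names have gone through the same pool, `a == b` is a valid equality check for them. Unlike String.intern(), the pool never releases entries, which is acceptable only if the set of field names is small and stable.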

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot

On Sun, Apr 19, 2009 at 23:16, Chris Miller wrote: > As far as I can see, both these implementations only suffer from > threadsafety problems in that they don't guarantee visibility across > threads, ie it's possible for threads to see stale data. > So the code should work fine if you can live wi

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot

On Sun, Apr 19, 2009 at 23:42, Chris Miller wrote: >> As soon as all possible fields are in the pool, we're essentially >> readonly. > The problem is, there's no guarantee we will ever reach this point. For > example suppose you have a server app that spawns a new thread per request. > Each new th

Re: [jira] Commented: (LUCENE-1607) String.intern() faster alternative

2009-04-19 Thread Earwin Burrfoot

> Sorry I wasn't as clear as I could have been - I realise JEE servers use a > threadpool for handling requests, I was thinking of many other applications > in the real world I'm aware of that don't (be that good design or > otherwise...). You were. I just wanted to point out that in real apps you'r

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot

> Hello everyone, > > I'm looking for feedback and thoughts on the following problem (it's more of > development than user-centered problem, hope the dev list is appropriate): > > - a token stream is given, > > - a set of "synonyms" is given, where synonyms are token sequences to be > matched and t

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot

>> Building on your example, "food place in new york" will find nothing, >> because 'place' and 'in' share the same position. > You're right, but is it such a big problem in real life? Well, everyone has his own requirements for the search quality. For us it was a problem. User enters a query, the

Re: Synonym filter with support for phrases?

2009-04-22 Thread Earwin Burrfoot

> Your example concerns phrase queries, so somebody would have to keep adding > terms to a phrase. My experience with open search queries (I had access to a > larger slice of queries from Microsoft Live) is that phrases are a minority > of all searches. In the most common case, people will look for

Re: Synonym filter with support for phrases?

2009-04-23 Thread Earwin Burrfoot

> On Wed, Apr 22, 2009 at 5:12 AM, Earwin Burrfoot wrote: > >> Your synonyms will break if you try searching for phrases. >> Building on your example, "food place in new york" will find nothing, >> because 'place' and 'in' share the same

Re: Synonym filter with support for phrases?

2009-04-23 Thread Earwin Burrfoot

>> engine. So guys looking for "MSU CMC" really want to get "Московский >> Государственный Университет, факультет ВМиК" and his friends. > And? How often do they extend this particular phrase with further terms? They don't need to. Variations of this phrase alone killed my first several approaches

Score calculation with new by-segment collection

2009-04-30 Thread Earwin Burrfoot

Did I miss something, or when trunk switched to collecting on SegmentReaders we've lost proper scores? I mean, before score depended on TF calculated across all the index, and now it depends on TF for a given segment (yup, unless I missed something). Per-segment TF can vary wildly, especially in ca

Re: Score calculation with new by-segment collection

2009-04-30 Thread Earwin Burrfoot

On Fri, May 1, 2009 at 00:47, Yonik Seeley wrote: > On Thu, Apr 30, 2009 at 4:44 PM, Earwin Burrfoot wrote: >> Did I miss something, or when trunk switched to collecting on >> SegmentReaders we've lost proper scores? >> I mean, before score depended on TF calculated ac

Re: Sort on TermEnum

2009-05-08 Thread Earwin Burrfoot

Isn't it better to have specially prepared sort fields? Like lowercased, if you want case-insensitive comparisons, or stripped of whitespace and punctuation, like I did once. That way you have more flexibility and also don't kill performance outright. On Fri, May 8, 2009 at 11:58, Federica Falini
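
A minimal sketch of the "specially prepared sort field" idea, assuming a hypothetical SortKeys helper (not part of Lucene): the normalized value would be computed once at indexing time and stored in its own field, so sorting never has to transform terms on the fly. How that field is added to the index is left out here.

```java
import java.util.Locale;

// Hypothetical helper, not a Lucene class: derives a sort key from a raw value.
final class SortKeys {
    private SortKeys() {}

    // Lowercases the value for case-insensitive ordering, then strips
    // ASCII punctuation and all whitespace.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT)
                    .replaceAll("[\\p{Punct}\\s]+", "");
    }
}
```

With this sketch, "Hello, World!" and "hello world" both normalize to the same key, so they sort together regardless of case or punctuation.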

Random test failure

2009-05-16 Thread Earwin Burrfoot

Running latest lucene trunk with some patches applied, but they do not touch IndexWriter and friends anywhere. Happened once, I failed to reproduce it, with and without patches. Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed

Re(opening) (Multi)SegmentReaders

2009-05-17 Thread Earwin Burrfoot

While experimenting with indexReader 'components', I've got this thought: What if we always create MultiSegmentReader when (re)opening an index, even if index contains a single segment? Using unwrapped SegmentReader for single-segment case was a valid optimization for the times when Lucene did col

Re: Re(opening) (Multi)SegmentReaders

2009-05-18 Thread Earwin Burrfoot

that doesn't hamper backwards compatibility? 2009/5/17 Michael McCandless : > I tentatively think that's a good idea. The reopen logic is quite hairy... > > Wanna make a separate patch for that? > > Mike > > On Sun, May 17, 2009 at 8:37 AM, Earwin Burrfoot wrote: >
