I'm wondering: should Lucene introduce a new interface for reading stored
document fields?
The current 'Document document(int n)' mechanism is barely usable due to
the overhead involved. While I believe the underlying index structure works
pretty fast (if it fits in memory, as is the case for most
performance-concer
actually want all the fields.
> Erick
>
> On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot wrote:
>>
>> I'm thinking, should Lucene introduce new interface to read stored
>> document fields?
>>
>> Current 'Document document(int n)' mec
(didn't see any interest from anyone though)
>
> -- Tim
>
> Erick Erickson wrote:
>
> OK, never mind
> Erick
>
> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot wrote:
>>
>> My issue is with extra objects created in the process. Field selection
>>
> but even non-final methods are inlined by hotspot, if the compiler is sure
> that the class was not extended
There's absolutely no way a JIT compiler can be sure that the class
was not extended (short of it being declared final), because you can create
a new classloader and load a new class any time you
Some of these people got traumatized by maven, and now they can only think
in terms of "mash everything together and sprinkle with
hand-downloaded dependency jars".
No offence : )
I, personally, prefer side-by-side layouts. You can add new stuff, and
wire dependencies to the old one, without reorganiz
> Unless maven has some features i'm not aware of, your "nicely depends"
> works by pulling Lucene jars from a repository
The 'missing feature' is called multi-module projects.
On Thu, Mar 18, 2010 at 03:33, Chris Hostetter wrote:
> : build and nicely gets all dependencies to Lucene and Tika whe
I think that would be ideal, because right now it is somewhat confusing
where to pull your latest-and-greatest from and what you should
base your patches on.
On Mon, Mar 22, 2010 at 14:21, Chris Male wrote:
> I think that would be ideal because we can then start getting some nightly
> builds us
> Sounds good to me. I guess one thing to think about is the analyzers
> in core (should they move to this module, too?).
> If so, perhaps we could make 'ant test' of lucene depend on this
> module, since core tests use analyzers.
> But you could use lucene without an analyzers module, it wouldnt b
On Fri, Mar 26, 2010 at 18:24, Robert Muir wrote:
> I would really love to see them all in one place though, for the
> users. I think that the elegance of our tests should be second to the
> users' ease.
> Perhaps we could just have a fast and dirty TestAnalyzer so the core
> tests don't need to de
I think the original Directory.copy() just copied everything in flex, without
nocommits?
Unlike before, you can now specify which files you want copied, so
people can query Codecs and whatnot themselves.
>> Author: uschindler
>> Date: Sat Mar 27 19:12:08 2010
>> New Revision: 928246
>>
>
) ;)
>>
>> Ie Directory.copy used to filter for only index files, but
>> Directory.copyTo copies everything so you must provide your own list
>> if this matters.
>>
>> Mike
>>
>> On Sat, Mar 27, 2010 at 4:24 PM, Earwin Burrfoot wrote:
>>> I
>>> Of course introducing the idea of updates also introduces the notion of a
>>> primary key and there's probably an entirely separate discussion to be had
>>> around user-supplied vs Lucene-generated keys.
>> Not sure I see that need. Can you explain your reasoning a bit more?
> If you want to u
> Of course introducing the idea of updates also introduces the notion of a
> primary key and there's probably an entirely separate discussion to be had
> around user-supplied vs Lucene-generated keys.
Not sure I see that need. Can you explain your reasoning a bit more?
>>> If you
>>If someone needs this, it can be built over lucene, without
>>introducing it as a core feature and needlessly complicating things.
>
> I think with any partial-update feature the *absence* of primary key support
> would "needlessly complicate things":
> If Lucene is not capable of performing du
>>Variant d) sounds most logical? And enables all sorts of fun stuff.
>
> So the duplicate-key docs can have different values for initial-insert fields
> but partial updates will cause sharing of a common field value?
> And subsequent same-key doc inserts do or don't share these previous
> "part
>>Who ever said that some_condition should point to a unique document?
>
> My assumption was, for now, we were still talking about the simpler case of
> updating a single document.
> If we extend the discussion to support set-based updates it's worth
> considering the common requirements for upda
No, no, no, Lucene still has no need for maven or ivy for dependency management.
We can just hack around all issues with ant scripts.
: )
On Thu, Apr 1, 2010 at 09:48, Chris Hostetter wrote:
>
> : I was wondering yesterday why aren't the required libs checked in to SVN? We
>
> Licensing issues.
>
Generics SpecOps made it to the top and are gonna rule us from the
shadows :) Congrats!
On Thu, Apr 1, 2010 at 16:37, Robert Muir wrote:
> Congrats Uwe!
>
> On Thu, Apr 1, 2010 at 7:05 AM, Grant Ingersoll wrote:
>>
>> I'm pleased to announce that the Lucene PMC has voted to add Uwe Schindler
>>
> it doesn't really matter if it's ant scripts, or ivy declarations, or
> maven pom entries -- the point is the same.
>
> We can't distribute the jars, but we can distribute programmatic means for
> users to fetch the jars themselves.
>
> (even if we magicly switched to ivy or maven for dependency m
A random thought from some of the earlier discussions.
Has anybody used the fact that the Lucene Term space is continuous (a single
per-index/segment space instead of separate per-field spaces) at least
once?
I only see code around that copes with this somehow, like checking
"term.field() == field" just
Wow! Cool.
On Tue, Apr 6, 2010 at 03:51, Michael McCandless
wrote:
> The flex API isolates fields, ie you get a TermsEnum for a given field
> and it enums only the term's text (as a BytesRef).
>
> Mike
>
> On Mon, Apr 5, 2010 at 7:22 PM, Earwin Burrfoot wrote:
>>
So, I want to pump my IndexWriter hard and fast with documents.
Removing fsync from FSDirectory helps. But I pay for that with the possibility of
index corruption, not only if my node suddenly loses power or the kernel panics,
but also if it runs out of disk space (which happens more frequently).
I invented
> Running out of disk space with fsync disabled won't lead to corruption.
> Even kill -9 the JRE process with fsync disabled won't corrupt.
> In these cases index just falls back to last successful commit.
>
> It's "only" power loss / OS / machine crash where you need fsync to
> avoid possible corr
nning time improves, but I'm curious to know by how much.
>
> Shai
>
> On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot wrote:
>>
>> > Running out of disk space with fsync disabled won't lead to corruption.
>> > Even kill -9 the JRE process with fsync dis
> No, this doesn't make sense. The OS detects a disk full on accepting
> the write into the write cache, not [later] on flushing the write
> cache to disk. If the OS accepts the write, then disk is not full (ie
> flushing the cache will succeed, unless some other not-disk-full
> problem happens).
> But, IW doesn't let you "hold on to" checkpoints... only to commits.
>
> Ie SnapshotDP will only "see" actual commit/close calls, not
> intermediate checkpoints like a random segment merge completing, a
> flush happening, etc.
>
> Or... maybe you would in fact call commit frequently from the main
I wholeheartedly support this anti-version riot :)
On Tue, Apr 13, 2010 at 19:27, Shai Erera wrote:
> Hi
>
> I'd like to propose a relaxation on the Version API. Uwe, please read the
> entire email before you reply :).
>
> I was thinking, following a question on the user list, that the
> Version-
Priceless
On Wed, Apr 14, 2010 at 00:53, wrote:
>
> You (or someone else) has reset your password.
>
> -
>
> Your password has been changed to: MCwqNr
>
> You can change your password here:
>
> https://issues.apache.org/jira/
It wasn't
On Wed, Apr 14, 2010 at 02:06, Erick Erickson wrote:
> Ah, good. That means the very long e-mail that came to my regular account
> about someone hacking the JIRA server is bogus too, I assume.
> Erick
>
> On Tue, Apr 13, 2010 at 5:58 PM, Uwe Schindler wrote:
>>
>> LOL!
>>
>> Thi
The thread somehow got sidetracked. So, let's get this carriage back
on its rails?
Let me remind you: we have an API on our hands that is mandatory and tends to
be cumbersome.
The proposed solution does indeed have the ultrascary word "static" in it. But
if you brace yourself and look closer - the use of said st
Can't believe my eyes.
+1
On Thu, Apr 15, 2010 at 01:22, Michael McCandless
wrote:
> On Wed, Apr 14, 2010 at 12:06 AM, Marvin Humphrey
> wrote:
>
>> Essentially, we're free to break back compat within "Lucy" at any time, but
>> we're not able to break back compat within a stable fork like "Lucy
3.1.1
>> just to get a new feature but get it API back-supported? As soon as they
>> upgrade to 3.2, that means a new set of API right?
>>
>> Major releases will just change the index structure format then? Or move
>> to Java 1.6? Well ... not even that bec
We should just let IW create a null commit on an empty directory, like
it always did ;)
Then a whole class of such problems disappears.
On Thu, Apr 15, 2010 at 11:16, Shai Erera wrote:
> SDP throws NPE if the index includes no commits, but snapshot() is called.
> This is an extreme case, but can
I think an index upgrade tool is okay?
While you still definitely have to code it, things like "if idxVer==m
doOneStuff elseif idxVer==n doOtherStuff else blowUp" are kept away
from lucene innards and we all profit?
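
A minimal sketch of the kind of version dispatch that would live in a standalone upgrade tool instead of inside Lucene. Everything here is hypothetical: the class name, the format numbers and the converter methods are invented for illustration, and no real Lucene API is used.

```java
// Hypothetical standalone upgrade tool: all version branching lives here,
// not in the library's reading code.
final class IndexUpgradeTool {
    // Made-up format numbers standing in for "m" and "n" from the text above.
    static final int FORMAT_M = 1;
    static final int FORMAT_N = 2;

    private IndexUpgradeTool() {}

    /** Picks a converter by index format, or fails loudly for unknown formats. */
    static String upgrade(int idxVer) {
        if (idxVer == FORMAT_M) {
            return convertFromM();
        } else if (idxVer == FORMAT_N) {
            return convertFromN();
        } else {
            // The "blowUp" branch: refuse formats the tool does not know.
            throw new IllegalArgumentException("Unsupported index format: " + idxVer);
        }
    }

    // Placeholders; a real tool would rewrite segment files here.
    private static String convertFromM() { return "converted from format M"; }
    private static String convertFromN() { return "converted from format N"; }
}
```

The point of the layout is only that the branching is isolated in one place that can be dropped once old formats are no longer supported.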
On Thu, Apr 15, 2010 at 16:21, Robert Muir wrote:
> its open source, if you feel t
I like the idea of index conversion tool over silent online upgrade
because it is
1. controllable - with online upgrade you never know for sure when
your index is completely upgraded, even optimize() won't help here, as
it is a noop for already-optimized indexes
2. way easier to write - as flex sho
On Thu, Apr 15, 2010 at 17:17, Yonik Seeley wrote:
> Seamless online upgrades have their place too... say you are upgrading
> one server at a time in a cluster.
Nothing here that can't be solved with an upgrade tool. Down one
server, upgrade index, upgrade software, up.
--
Kirill Zakharenko/Кири
On Thu, Apr 15, 2010 at 17:49, Robert Muir wrote:
> wrong, it doesnt fix the analyzers problem.
> you need to reindex.
>
> On Thu, Apr 15, 2010 at 9:39 AM, Earwin Burrfoot wrote:
>>
>> On Thu, Apr 15, 2010 at 17:17, Yonik Seeley
>> wrote:
>> > Seamle
> reasonable, but changing APIs around when there's not a good reason
> behind it (other than someone liked the name a little better) should
> still be approached with caution.
Changing names is a good enough reason :)
They make a darn difference between having to read a book to be able
to use som
ays nice to be
> able to work without dealing with pesky legacy issues. Perhaps
> splitting out the indexing upgrades into a separate program lets us
> accommodate both concerns.
> FWIW
> Erick
> On Thu, Apr 15, 2010 at 9:42 AM, Danil ŢORIN wrote:
>>
>> True. Just ne
> First, the index format. IMHO, it is a good thing for a major release to be
> able to read the prior major release's index. And the ability to convert it
> to the current format via optimize is also good. Whatever is decided on this
> thread should take this seriously.
Optimize is a bad way to co
ote:
> On 04/15/2010 01:50 PM, Earwin Burrfoot wrote:
>>>
>>> First, the index format. IMHO, it is a good thing for a major release to
>>> be
>>> able to read the prior major release's index. And the ability to convert
>>> it
>>> to
> BTW Earwin, we can come up w/ a migrate() method on IW to accomplish
> manual migration on the segments that are still on old versions.
> That's not the point about whether optimize() is good or not. It is
> the difference between telling the customer to run a 5-day migration
> process, or a coup
On Thu, Apr 15, 2010 at 23:07, DM Smith wrote:
> On 04/15/2010 03:04 PM, Earwin Burrfoot wrote:
>>>
>>> BTW Earwin, we can come up w/ a migrate() method on IW to accomplish
>>> manual migration on the segments that are still on old versions.
>>> That'
> Not sure if plain users are allowed/encouraged to post in this list,
> but wanted to mention (just an opinion from a happy user), as other
> users have, that not all of us can reindex just like that. It would
> not be 10 min for one of our installations for sure...
>
> First, i would need to impl
2010/4/15 Shai Erera :
> The reason online migration is faster, Earwin, is that when you
> finally need to *fully* migrate your index, chances are that most
> of the segments are already on the newer format. Offline migration
> will just keep the application idle for some amount of time unt
I think this should split off the mega-thread :)
On Thu, Apr 15, 2010 at 23:28, Uwe Schindler wrote:
> Hi Earwin,
>
> I am strongly +1 on this. I would also act as the Release Manager for 3.1, if
> nobody else wants to do this. I would like to take the preflex tag or some
> revisions before (mayb
Why can't people just use svn or mercurial as a client for the subversion
repository?
What is the benefit of migrating the repository itself?
On Sat, Apr 17, 2010 at 11:20, Thomas Koch wrote:
> Hi,
>
> at least since august 2009 nobody has dared to ask this question, so let's
> start a flamewar:
> Don't
These are broken, by the way.
We need to kick someone to merge entries for lucene&solr and point
them to a new svn url.
On Sun, Apr 18, 2010 at 04:10, John Wang wrote:
> Hi Thomas:
> There is a git mirror already: http://github.com/apache/lucene
> All of apache projects are: http://git.
Looks like Czech to my Slavic eyes :)
On Sat, Jan 24, 2009 at 18:14, Paul Elschot wrote:
> On Saturday 24 January 2009 15:29:12 Grant Ingersoll wrote:
>
>> Anyone know what this is:
>> http://wiki.apache.org/lucene-java/IndeksRe%C4%8Di
>
> After looking around on the lucene wiki a bit I also foun
Have you looked at MG4J (http://mg4j.dsi.unimi.it/)?
Last time I did, it looked like the opposite of Lucene: nice and
up-to-date algorithmics, but hard to apply to complex real-world
tasks.
On Thu, Feb 26, 2009 at 04:21, Koren Krupko wrote:
>
> Hello Lucene Developers!
>
> My name is Koren Krupko
> Maybe we can use the
> compression technology mentioned in this Wikipedia article to further
> optimize filters and their DocIdSetIterators.
We have been using WAH-encoded bitmap filters over here for roughly a
year. And yes, they are nice.
--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
H
My opinion is that if you want to enable sorting on multi-term fields,
you need a pluggable selection policy. I can see someone wanting the
biggest/smallest term to represent a document when sorting. Or maybe a
function of the terms.
On Mon, Mar 2, 2009 at 20:34, Uwe Schindler wrote:
> I updated yesterday h
Take ANTLR and roll your own query parser from scratch? It's pretty easy.
On Thu, Mar 12, 2009 at 04:24, Candide Kemmler wrote:
> Hello,
>
> I'm looking at a way to extend the lucene query parser to allow for semantic
> computations in IEML space (see http://ieml.org). What I'd like to know is:
>
On Thu, Mar 12, 2009 at 21:16, Candide Kemmler wrote:
>
> On 11 Mar 2009, at 23:21, Earwin Burrfoot wrote:
>
>> Take ANTLR and roll your own query parser from scratch? It's pretty easy.
>>
>
> Hi Earwin,
>
> That would be fantastic, since our parser is a
On Wed, Mar 18, 2009 at 23:08, Andi Vajda wrote:
>
> On Mar 18, 2009, at 13:01, Michael McCandless
> wrote:
>
>> I think we should move TrieRange* into core before 2.9?
>>
>> It's received a lot of attention, from both developers (Uwe & Yonik did
>> lots of iterations, and Solr is folding it in) a
> - contrib has always had a lower bar and stuff was committed under
> that lower bar - there should be no blanket promotion.
> - contrib items may have different dependencies... putting it all
> under the same source root can make a developer's job harder
> - many contrib items are less related to
On Mon, Mar 23, 2009 at 22:13, Mark Miller wrote:
> Earwin Burrfoot wrote:
>>>
>>> - contrib has always had a lower bar and stuff was committed under
>>> that lower bar - there should be no blanket promotion.
>>> - contrib items may have different dependenci
I'd say it is a bad name. A raw hit is a long way from being the result of a search.
If you're already breaking back compat with the 3.0 release (by
incrementing the Java version), maybe it's worth breaking it in some more
places, just so ugly names like MRHC and special code paths that check
for n-year-old interf
> BTW, I like the name ResultsCollector, as it's just like HitCollector, but
> does not commit too much to "hits" .. i.e., facets aren't hits ... I think?
What this class consumes and what it produces are totally different
things. HitCollector always collects 'hits', and then produces whatever
imp
> On Thu, Mar 26, 2009 at 08:44:57AM -0400, Michael McCandless wrote:
>
>> do you have an alternative?
>
> Brainstorming
>
> * Harvester
> * Trawler
> * HitPicker
> * HitGrabber
>
> Marvin Humphrey
NitPicker - that absolutely made my day
--
Kirill Zakharenko/Кирилл Захаренко (ear...@gma
> I think ResultsCollector (or maybe ResultCollector) is my favorite so far...
>
> But how about simply Collector? (I realize it's very generic... but
> we don't collect anything else in Lucene?).
That's exactly what I'm using in my app -> abstract class Collector
extends HitCollector, that serves
On Sat, Mar 28, 2009 at 16:44, Michael Busch wrote:
> NIO.2 sounds great.
> Though, it will probably take a pretty long time before we can switch Lucene
> to Java 1.7 :(
>
> We could write a (contrib) module that we don't ship together with the core
> that has a Directory implementation which uses
> I think having async IO will be great, though I wonder how we would
> change Lucene to take advantage of it. It ought to gain us
> concurrency (eg we can score last chunk while we have an io request
> out to retrieve next chunk, of term docs / positions / etc.).
A presentation given above refere
While drooling over MappedBigByteBuffer, which we'll (hopefully) see
in JDK7, I revisited my own Directory code and noticed a certain
peculiarity, shared by Lucene core classes:
Each and every IndexInput implementation only implements readByte()
and readBytes(), never trying to override readInt/VIn
> A while ago I tried overriding the read* methods in BufferedIndexInput like
> this:
>
> I'm still surprised there was no performance improvement at all. Maybe
> something was wrong with my test and I should try it again...
For BufferedIndexInput improvement should be
> Earwin,
> I did not experiment lately, but I'd like to add a general compressed
> integer array to the basic types in an index, that would be compressed
> on writing and decompressed on reading.
> A first attempt is at LUCENE-1410, and one of the choices I had there
> was whether or not to use NI
>> In my case I have to switch to MMap/Buffers, Java behaves ugly with
>> 8Gb heaps.
> Do you mean that because garbage collection does not perform well
> on these larger heaps, one should avoid to create arrays to have heaps
> of that size, and rather use (direct) MMap/Buffers?
Yes, exactly. Keepi
Lucene is in fact already available through Maven. The POMs do exist; all
that is left is to find out who manages and releases them.
On Thu, Apr 2, 2009 at 01:40, Douglas Campos wrote:
> +1 on maven, and I volunteer to aid in the creation of the maven project
> files (pom's)
>
> On Wed, Apr 1, 2009 at 11
Currently, when we seek to a given Term, we do a binary search
across the entire term space, including terms belonging to other fields.
I propose augmenting the fields file with two pointers (firstTerm,
lastTerm) for each field. That reduces the range we need to search, and
instead of comparing Terms we only
On Thu, Apr 9, 2009 at 00:14, Michael McCandless
wrote:
> On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot wrote:
>
>> Currently, when we're seeking a given Term, it does a binary search
>> across all term space, including terms belonging to other fields.
>> I propos
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler wrote:
>> >> Also, on the other topic - how hard is it to boost
>> >> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be
>> >> nice for TrieRangeFilter and probably some other filters.
>> > I think all that's needed is to implement Se
On Fri, Apr 10, 2009 at 02:25, Chris Hostetter wrote:
> Or just make it trivial to get all jars that fit a given profile w/o
> actually merging those jars into an uber-jar ... does maven's
> dependency management have anything like "bundles" or "virtual packages" so
> we could publish a "lucene-all-ana
To support my dream of kicking fieldCache out of the core and to add
some extensibility to Lucene, I want to introduce IndexReaderPlugins.
Rough pseudocode follows:
interface IndexReaderPlugin {
void attach(SegmentReader reader);
void detach(SegmentReader reader);
void att
> Earwin Burrfoot wrote:
>>
>> Benefits are numerous. We get rid of alien code like:
>> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
>> @@ -83,6 +86,8 @@
>> + protected ValueSource valueSource;
>> +
>> @@ -555
ferent plugin instances per-subreader.
Do we want plugins supporting more than one interface, or is it an
unnecessary complication?
Like:
indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
And then:
indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)
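
A rough sketch of how binding one instance to several interfaces could work, written as a standalone registry so it runs without Lucene. PluginRegistry and its Binding helper are hypothetical names; only the bindPlugin(...).to(...) and plugin(...) call shapes are taken from the lines above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: maps an interface class to the plugin instance bound to it.
final class PluginRegistry {
    private final Map<Class<?>, Object> byInterface = new HashMap<>();

    /** Starts a binding for one plugin instance. */
    <T> Binding<T> bindPlugin(T instance) {
        return new Binding<>(instance);
    }

    /** Returns the plugin bound to the given interface, or null if none is bound. */
    <I> I plugin(Class<I> iface) {
        return iface.cast(byInterface.get(iface));
    }

    /** Second half of the fluent call: registers the instance under each interface. */
    final class Binding<T> {
        private final T instance;

        Binding(T instance) {
            this.instance = instance;
        }

        void to(Class<?>... ifaces) {
            for (Class<?> iface : ifaces) {
                // Reject bindings the instance cannot actually satisfy.
                if (!iface.isInstance(instance)) {
                    throw new IllegalArgumentException(
                        instance.getClass().getName() + " does not implement " + iface.getName());
                }
                byInterface.put(iface, instance);
            }
        }
    }
}
```

With this shape, looking up either interface returns the very same object, so the reference-equality check above holds for a single registry.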
> Mike
>
>
>> Can we outline some requirements for the plugin API?
>>
>> Do we want to attach/detach them to IndexReader after it is created,
>> or only during construction?
>
> I think I'd lean towards only at construction. Seems dangerous to
> allow swap in/out at some later time.
I have several points pro
On Mon, Apr 13, 2009 at 17:14, Michael McCandless
wrote:
> On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot wrote:
>
>>> I think I'd lean towards only at construction. Seems dangerous to
>>> allow swap in/out at some later time.
>> I have several points
>> IndexReader.java is littered with the likes of:
>> public static IndexReader open(final Directory directory,
>> IndexDeletionPolicy deletionPolicy) throws CorruptIndexException,
>> IOException;
> But I don't understand why is this a problem...
Doubling the number of factory methods? We have to k
>> > With the early binding approach, you wouldn't pass all plugins during
>> > creation; you'd pass a factory object that exposes methods like:
>> >
>> > getPostingsComponent(SegmentInfo)
>> > getStoredFieldsComponent(SegmentInfo)
>> > getValueSourceComponent(SegmentInfo)
>>
>> That basically k
> The original example justification was to avoid putting a ValueSource in the
> IndexReader (I guess avoiding the funky init code? valueSource = new
> CachingValueSource(this, new UninversionValueSource(this))
That was a bit of drama for the sake of drama, I couldn't restrain myself :)
My justific
Mark Miller wrote:
> The distinction I am making with core is that we will have to call known
> methods on those
> core 'modules' that are not very generic? Doesn't that keep it from playing
> nice with the very generic 'attach this to this segment'?
Genericity spans binding, notifications and retr
Michael McCandless wrote:
> I gave the example to show the init vs inflight distinction, because
> inflight makes me nervous.
I'm thinking of some (bad name follows) PluginBundle, that has
add/remove/inspect methods and constructor/method for filling it with
default Lucene components.
Then instead
On Wed, Apr 15, 2009 at 00:15, Mark Miller wrote:
> Mark Miller wrote:
>>
>> Earwin Burrfoot wrote:
>>>
>>> Mark Miller wrote:
>>>
>>>>
>>>> The distinction I am making with core is that we will have to call known
>>>>
On Wed, Apr 15, 2009 at 00:55, Mark Miller wrote:
> Earwin Burrfoot wrote:
>>
>> On Wed, Apr 15, 2009 at 00:15, Mark Miller wrote:
>>
>>>
>>> Mark Miller wrote:
>>>
>>>>
>>>> Earwin Burrfoot wrote:
>>>>
>>
On Thu, Apr 16, 2009 at 18:16, Ken Krugler wrote:
> I wrote an Analyzer for Apache Lucene for analyzing sentences in Chinese
> language, it's called imdict-chinese-analyzer as it is a subproject of
> imdict, which is an intelligent online dictionary.
>
> The project on google code is here:
> http:/
Okay, we'd like to have equality-by-reference for field names,
yielding überfast comparisons in all our tight inner loops. But we
dislike the default String.intern() for its java<->native transitions and
general sluggishness.
There's a perfect solution. Too dumb to come up with it myself, but
fortunately
On Sun, Apr 19, 2009 at 23:16, Chris Miller wrote:
> As far as I can see, both these implementations only suffer from
> threadsafety problems in that they don't guarantee visibility across
> threads, ie it's possible for threads to see stale data.
> So the code should work fine if you can live wi
On Sun, Apr 19, 2009 at 23:42, Chris Miller wrote:
>> As soon as all possible fields are in the pool, we're essentially
>> readonly.
> The problem is, there's no guarantee we will ever reach this point. For
> example suppose you have a server app that spawns a new thread per request.
> Each new th
> Sorry I wasn't as clear as I could have been - I realise JEE servers use a
> threadpool for handling requests, I was thinking of many other applications
> in the real world I'm aware of that don't (be that good design or
> otherwise...).
You were. I just wanted to point out that in real apps you'r
> Hello everyone,
>
> I'm looking for feedback and thoughts on the following problem (it's more of
> development than user-centered problem, hope the dev list is appropriate):
>
> - a token stream is given,
>
> - a set of "synonyms" is given, where synonyms are token sequences to be
> matched and t
>> Building on your example, "food place in new york" will find nothing,
>> because 'place' and 'in' share the same position.
> You're right, but is it such a big problem in real life?
Well, everyone has his own requirements for the search quality. For us
it was a problem.
User enters a query, the
> Your example concerns phrase queries, so somebody would have to keep adding
> terms to a phrase. My experience with open search queries (I had access to a
> larger slice of queries from Microsoft Live) is that phrases are a minority
> of all searches. In the most common case, people will look for
> On Wed, Apr 22, 2009 at 5:12 AM, Earwin Burrfoot wrote:
>
>> Your synonyms will break if you try searching for phrases.
>> Building on your example, "food place in new york" will find nothing,
>> because 'place' and 'in' share the same
>> engine. So guys looking for "MSU CMC" really want to get "Московский
>> Государственный Университет, факультет ВМиК" and his friends.
> And? How often do they extend this particular phrase with further terms?
They don't need to. Variations of this phrase alone killed my first
several approaches
Did I miss something, or did we lose proper scores when trunk switched
to collecting on SegmentReaders?
I mean, before, the score depended on TF calculated across the whole index,
and now it depends on TF for a given segment (yup, unless I missed
something).
Per-segment TF can vary wildly, especially in ca
On Fri, May 1, 2009 at 00:47, Yonik Seeley wrote:
> On Thu, Apr 30, 2009 at 4:44 PM, Earwin Burrfoot wrote:
>> Did I miss something, or when trunk switched to collecting on
>> SegmentReaders we've lost proper scores?
>> I mean, before score depended on TF calculated ac
Isn't it better to have specially prepared sort fields? Like
lowercased, if you want case-insensitive comparisons, or stripped of
whitespace and punctuation, like I did once.
That way you have more flexibility and also don't kill performance outright.
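
A small sketch of preparing such a sort value in plain Java, with no Lucene calls. The names SortKeys and normalize are made up for illustration, and the assumption is that the result gets indexed into a separate, untokenized field used only for sorting.

```java
import java.util.Locale;

// Hypothetical helper: builds a normalized key to index in a dedicated sort field.
final class SortKeys {
    private SortKeys() {}

    /**
     * Lowercases the value and keeps only letters and digits, which drops
     * whitespace and punctuation (and any other symbols) along the way.
     */
    static String normalize(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        StringBuilder key = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); ) {
            int cp = lower.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                key.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return key.toString();
    }
}
```

Because the work happens once at indexing time, sorting compares the prepared keys directly instead of running a custom comparison per document.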
On Fri, May 8, 2009 at 11:58, Federica Falini
Running latest lucene trunk with some patches applied, but they do not
touch IndexWriter and friends anywhere.
Happened once, I failed to reproduce it, with and without patches.
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed
While experimenting with indexReader 'components', I've got this thought:
What if we always create a MultiSegmentReader when (re)opening an index,
even if the index contains a single segment?
Using unwrapped SegmentReader for single-segment case was a valid
optimization for the times when Lucene did col
that
doesn't hamper backwards compatibility?
2009/5/17 Michael McCandless :
> I tentatively think that's a good idea. The reopen logic is quite hairy...
>
> Wanna make a separate patch for that?
>
> Mike
>
> On Sun, May 17, 2009 at 8:37 AM, Earwin Burrfoot wrote:
>