Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> > Container is only aware of the single inStream, while codec can still
> > think it's operating on 3 even if it's really 1 or 2.
> >
>
> I don't understand. If you have three streams, all of them are going to
> have to get skipped, right?
For the "all
On Apr 27, 2008, at 3:28 AM, Michael McCandless wrote:
Actually, I was picturing that the container does the seeking itself
(using skip data), to get "close" to the right point, and then it uses
the codec to step through single docs at a time until it's at or
beyond the right one.
I believe i
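
The seek strategy described in the message above (the container jumps "close" using skip data, then the codec steps through single docs) might look roughly like the following. This is only a minimal sketch, not code from Lucene or KS: the `seek` function, the `(doc_id, position)` skip-entry layout, and the plain list standing in for a decoded posting stream are all assumptions made for illustration.

```python
from bisect import bisect_right


def seek(skip_entries, postings, target):
    """Return the index of the first doc id >= target, or None if exhausted.

    skip_entries: sorted (doc_id, position) pairs; a hypothetical skip index.
    postings: sorted doc ids standing in for the stream the codec decodes.
    """
    # Container step: jump to the last skip entry whose doc id is <= target.
    skip_docs = [doc for doc, _ in skip_entries]
    i = bisect_right(skip_docs, target) - 1
    pos = skip_entries[i][1] if i >= 0 else 0

    # Codec step: advance one doc at a time until at or beyond the target.
    while pos < len(postings) and postings[pos] < target:
        pos += 1
    return pos if pos < len(postings) else None


postings = [2, 5, 9, 14, 20, 31, 40]
skip_entries = [(9, 2), (31, 5)]  # every third doc, purely illustrative
```

With these inputs, `seek(skip_entries, postings, 14)` starts from the skip entry for doc 9 and steps forward once. The container never needs to know how an individual doc is encoded, which is the separation the thread is arguing for.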
Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> > > Seeking might get a little weird, I suppose.
> >
> > Maybe not?: if the container is only aware of the single InStream, and
> > say it's "indexed" with a multi-skip index, then when you ask
> > container to seek, it forwards the request to multi-ski
On Apr 24, 2008, at 4:47 AM, Michael McCandless wrote:
Seeking might get a little weird, I suppose.
Maybe not?: if the container is only aware of the single InStream, and
say it's "indexed" with a multi-skip index, then when you ask
container to seek, it forwards the request to multi-skip whic
Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> On Apr 17, 2008, at 11:57 AM, Michael McCandless wrote:
>
>
> > If I have a pluggable indexer,
> > then on the querying side I need something (I'm not sure what/how)
> > that knows how to create the right demuxer (container) and codec
> > (decoder) to
On Apr 17, 2008, at 11:57 AM, Michael McCandless wrote:
If I have a pluggable indexer,
then on the querying side I need something (I'm not sure what/how)
that knows how to create the right demuxer (container) and codec
(decoder) to interact with whatever my indexing plugins wrote.
So I don't t
Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> On Apr 13, 2008, at 2:35 AM, Michael McCandless wrote:
>
>
> > I think the major difference is locality? In a compound file, you
> > have to seek "far away" to reach the prx & skip data (if they are
> > separate).
>
> There's another item worth mentio
On Apr 13, 2008, at 2:35 AM, Michael McCandless wrote:
I think the major difference is locality? In a compound file, you
have to seek "far away" to reach the prx & skip data (if they are
separate).
There's another item worth mentioning, something that Doug, Grant and
I discussed when this
Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> On Apr 10, 2008, at 3:10 AM, Michael McCandless wrote:
>
>
> > Can't you compartmentalize while still serializing skip data into the
> > single frq/prx file?
> >
>
> Yes, that's possible.
>
> The way KS is set up right now, PostingList objects maintai
On Apr 10, 2008, at 3:10 AM, Michael McCandless wrote:
Can't you compartmentalize while still serializing skip data into the
single frq/prx file?
Yes, that's possible.
The way KS is set up right now, PostingList objects maintain i/o
state, and Posting's Read_Record() method just deals with
Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> On Apr 9, 2008, at 6:35 AM, Michael Busch wrote:
>
>
> > We also need to come up with a good solution for the dictionary, because a
> > term with frq/prx postings needs to store two (or three for skiplist) file
> > pointers in the dictionary, whereas e.g. a
Michael Busch <[EMAIL PROTECTED]> wrote:
> > I agree we would have an abstract base Posting class that just tracks
> > the term text.
> >
> > Then, DocumentsWriter manages inverting each field, maintaining the
> > per-field hash of term Text -> abstract Posting instances, exposing
> > the methods
On Apr 9, 2008, at 6:35 AM, Michael Busch wrote:
We also need to come up with a good solution for the dictionary,
because a term with frq/prx postings needs to store two (or three
for skiplist) file pointers in the dictionary, whereas e.g. a
"binary" posting list only needs one pointer.
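
One way to picture the dictionary problem raised above is a term entry whose pointer count depends on its posting format. The sketch below is hypothetical throughout: `TermEntry`, `POINTER_LAYOUT`, and the format names `"frq_prx"` and `"binary"` are invented for illustration and do not correspond to any actual Lucene or KS structure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical mapping: which file pointers each posting format needs.
POINTER_LAYOUT: Dict[str, Tuple[str, ...]] = {
    "frq_prx": ("frq", "prx", "skip"),  # two pointers plus one for the skiplist
    "binary": ("data",),                # a "binary" posting list needs only one
}


@dataclass(frozen=True)
class TermEntry:
    term: str
    posting_format: str
    pointers: Tuple[int, ...]

    def __post_init__(self):
        # Reject entries whose pointer count does not match the format.
        expected = len(POINTER_LAYOUT[self.posting_format])
        if len(self.pointers) != expected:
            raise ValueError(
                f"{self.posting_format} expects {expected} pointers, "
                f"got {len(self.pointers)}"
            )

    def named_pointers(self) -> Dict[str, int]:
        # Pair each pointer with the name its format assigns to it.
        return dict(zip(POINTER_LAYOUT[self.posting_format], self.pointers))
```

Keeping the layout in a per-format table means the dictionary reader only has to look up the format to know how many pointers to read for a given term.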
On Mar 13, 2007, at 2:03 AM, Nicolas Lalevée wrote:
At present KS allows you to attach both a Similarity and an Analyzer
to a field name via a FieldSpec subclass. I haven't quite figured
out how to attach a posting format. Should it return an object, like
FieldSpec's similarity() method does?
On Mar 12, 2007, at 5:08 PM, Grant Ingersoll wrote:
I can see having storage at:
Index
Document/Field //already exists
Token
I hadn't thought of it that way, as a logical extension outwards at
all levels.
If I understand you correctly, it's a clever point, but the thing is,
it's cake f
On Mar 13, 2007, at 2:38 AM, Michael Busch wrote:
Global field semantics make our life with FI much easier in a
single index. But even with global field semantics we would have
the same problem with the IndexWriter.addIndexes() method, no? I'm
curious about how you solved that conflict in
On Sunday 11 March 2007 22:41, Michael Busch wrote:
> Hi Grant,
>
> I certainly agree that it would be great if we could make some progress
> and commit the payloads patch soon. I think it is quite independent from
> FI. FI will introduce different posting formats (see Wiki:
> http://wiki.apach
Marvin Humphrey wrote:
It uses global field semantics, which Hoss won't be happy about. ;)
However, I'm grateful to Hoss for past critiques, as they've helped me
to refine and improve how Schema works. For instance, as of KS
0.20_02 you can introduce new field_name => FieldSpec association
On Monday 12 March 2007 21:34, Marvin Humphrey wrote:
> On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
> > - Introduce index format. Nicolas has already written a lot of code
> > in this regard!
>
> I worry that going the interface route is going to be too
> restrictive. When I looked at Nicho
On Mar 12, 2007, at 6:54 PM, Michael Busch wrote:
Marvin Humphrey wrote:
On Mar 12, 2007, at 2:11 PM, Michael Busch wrote:
I think our best option here is to have a closed XML file for the
index format/configuration (something like you sent in your other
mail) plus a binary file for cust
On Mar 12, 2007, at 3:54 PM, Michael Busch wrote:
Sounds interesting! I will take a closer look at it...
Here's an introduction courtesy of JYaml, a YAML library for Java:
http://jyaml.sourceforge.net/tutorial.html
For an example of how YAML is well suited to the task of serializing
ind
Marvin Humphrey wrote:
On Mar 12, 2007, at 2:11 PM, Michael Busch wrote:
I think our best option here is to have a closed XML file for the
index format/configuration (something like you sent in your other
mail) plus a binary file for custom index-level metadata like Grant
suggested.
Why th
On Mar 12, 2007, at 2:11 PM, Michael Busch wrote:
I think our best option here is to have a closed XML file for the
index format/configuration (something like you sent in your other
mail) plus a binary file for custom index-level metadata like Grant
suggested.
Why the binary file?
Btw,
Marvin Humphrey wrote:
On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
I'm going to respond to this over several mails (: and possibly days
:) because there's an awful lot here, and I've already implemented a
lot of it in KS.
We should also make this public, so that users can store their
On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
- Introduce index-level metadata. Preferable in XML format, so it
will be human readable. Later on, we can store information about
the index format in this file, like the codecs that are used to
store the data.
To provoke thought about wh
On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
- Introduce index format. Nicolas has already written a lot of code
in this regard!
I worry that going the interface route is going to be too
restrictive. When I looked at Nicolas's index format spec, I
immediately wanted to add an Anal
On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:
I'm going to respond to this over several mails (: and possibly
days :) because there's an awful lot here, and I've already
implemented a lot of it in KS.
We should also make this public, so that users can store their own
index metadata.
Grant Ingersoll wrote:
With regard to FI and 662, however, I really believe we should split it
up and plan ahead (in a way I mentioned already), so that we have
more isolated patches. It is really great that we have 662 already
(Nicolas, thank you so much for your hard work, I hope you'll keep
w
On Mar 11, 2007, at 5:41 PM, Michael Busch wrote:
Hi Grant,
I certainly agree that it would be great if we could make some
progress and commit the payloads patch soon. I think it is quite
independent from FI. FI will introduce different posting formats
(see Wiki: http://wiki.apache.org/l
Hi Grant,
I certainly agree that it would be great if we could make some progress
and commit the payloads patch soon. I think it is quite independent from
FI. FI will introduce different posting formats (see Wiki:
http://wiki.apache.org/lucene-java/FlexibleIndexing). Payloads will be
part of
Hi Michael,
This is very good. I know 662 is different, just wasn't sure if
Nicolas's patch was meant to be applied after 662, b/c I know we had
discussed this before.
I do agree with you about planning this out, but I also know that
patches seem to motivate people the best and provide a c
On Jun 2, 2006, at 6:48 AM, Grant Ingersoll wrote:
I thought it was you, but wasn't sure.
I'm always looking for ways to minimize Term Vectors, because I
consider excerpting/highlighting a core feature rather than an add-
on, and they seem like such overkill. It bothers me that they
dupl
I thought it was you, but wasn't sure.
I would also like a way to store the frequency of the term in the
overall collection (probably should go in the Term dictionary, but not
sure, at the cost of an additional VInt per term, but I am open to other
places to store it). Right now, in order to
On Jun 1, 2006, at 5:48 AM, Grant Ingersoll wrote:
Someone on the list a while ago suggested moving Term Vectors out
of the postings and storing them separately, as then they don't
have to be merged (but the doc ids would have to be kept up to date)
Yes, that was me. :) I suggested stor
Marvin Humphrey wrote:
* Term Vectors (optional)
Someone on the list a while ago suggested moving Term Vectors out of the
postings and storing them separately, as then they don't have to be
merged (but the doc ids would have to be kept up to date)
--
Grant Ingersoll
Sr. Software Engi
[wild brainstorming...]
Another reason to consolidate the freqs, positions, and boosts/norms
into one file: we can isolate and distill the code that encodes/
decodes that file into a plugin, weakening the current tight coupling
between Lucene and its file format. Changing that index format