Hoss, thanks for kicking-in with your "design purist" hat on :)
about your proposal,
"The best short term approach I can think of for addressing LUCENE-1219
in 2.4:
1) list the new methods in a new interface that extends Fieldable
(ByteArrayReuseFieldable or something)
2) add the new met
On Mar 19, 2008, at 6:45 AM, eks dev wrote:
Hoss, thanks for kicking-in with your "design purist" hat on :)
about your proposal,
"The best short term approach I can think of for addressing
LUCENE-1219
in 2.4:
1) list the new methods in a new interface that extends Fieldable
(ByteArrayReu
On Mar 13, 2008, at 5:35 AM, Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578199#action_12578199 ]
Michael McCandless commented on LUCENE-1219:
--
I do agree, Hoss, that it makes sense to think about an IndexDocument
and a SearchDocument or something along those lines for 3.0.
Also note, I added some comments to 1219 about the history of Fieldable.
On Mar 18, 2008, at 9:26 PM, Chris Hostetter wrote:
: Really, I think we could go just
Grant,
"Couldn't you just mark AbstractField w/ another interface for the
methods needed for 1219"
sure it can be done this way, it is just not 100% obvious to me what benefit
that would bring compared to the current patch. Adding another interface vs
modifying only AbstractField, but the fac
On Mar 19, 2008, at 9:31 AM, eks dev wrote:
Grant,
"Couldn't you just mark AbstractField w/ another interface for the
methods needed for 1219"
sure it can be done this way, it is just not 100% obvious to me what
benefit that would bring compared to the current patch. Adding
another interfa
Hi
I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may
sound like it belongs to the user list, but I actually think there is a
problem with it, so I'm posting it to the dev list.
I'm using 2.3.1 to index a set of documents (500K Amazon books to be exact).
I don't use norms
"Well, maybe we should put 1219 off to 3.0 and maybe we should get to
3.0 sooner rather than later, as in stop adding new features and focus
on bug fixes and deprecation. :-)"
honestly, "getting to 3.0 sooner" can take far too long for an itch I currently
have, gc() is kicking in like crazy
One correction - I use 2.3.0 and not 2.3.1
On Wed, Mar 19, 2008 at 4:25 PM, Shai Erera <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may
> sound like it belongs to the user list, but I actually think there is a
> problem with it, so I'm
Shai Erera wrote:
Hi
I have a question on the setting of RAMBufferSizeMB on IndexWriter.
It may
sound like it belongs to the user list, but I actually think there
is a
problem with it, so I'm posting it to the dev list.
I'm using 2.3.1 to index a set of documents (500K Amazon books to
b
Chris Hostetter wrote:
Committers tend to prefer abstract classes for extension points because it
makes it easier to support backwards compatibility in the cases where we
want to add methods to extendable APIs and the "default" behavior for
these new methods is simple (or obvious delegation to e
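A minimal sketch of the backwards-compatibility point made in the snippet above, assuming hypothetical class names (BaseField and LegacyField are illustrative stand-ins, not Lucene classes): when an extension point is an abstract class, a method added in a later release can ship with a default body, so subclasses written against the older release keep compiling.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical extension point; the names are illustrative, not Lucene's API.
abstract class BaseField {
    // Part of the original API: every subclass already implements this.
    abstract String stringValue();

    // Imagined as added in a later release. Because it has a default body
    // that delegates to an existing method, older subclasses still compile.
    byte[] binaryValue() {
        return stringValue().getBytes(StandardCharsets.UTF_8);
    }
}

// A subclass written before binaryValue() existed; it needs no changes.
class LegacyField extends BaseField {
    private final String value;

    LegacyField(String value) {
        this.value = value;
    }

    @Override
    String stringValue() {
        return value;
    }
}
```

Had BaseField been an interface in the Java of that era, adding binaryValue() would have forced every existing implementor to be edited before it compiled again.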
I disagree on the use of interfaces.
The problem with abstract classes is that any methods you provide
"know" something of the implementation, unless the methods are
implemented solely by calling other abstract methods (which is rarely
the case if the abstract class contains ANY private m
Hi Ian,
Can you try this patch? Grasping at straws at this point, trying to
isolate where the JVM fails us...
I'm CC'ing java-dev. To sum up: sometimes when we merge fields, the
fdx file ends up exactly one document too short. In adding numerous
asserts around this code in SegmentMerge
Thanks Hoss and Grant for links to all the background on this issue.
I've read through them and now my brain is really fuzzy...
For 1219, I'd rather not introduce yet another interface into the
Field classes. I think it confuses our users to have so many
classes/interfaces to represent the fairly
On Wed, Mar 19, 2008 at 1:18 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> I'm CC'ing java-dev. To sum up: sometimes when we merge fields, the
> fdx file ends up exactly one document too short. In adding numerous
> asserts around this code in SegmentMerger.java, insanely, somehow the
> c
robert engels wrote:
The problem with abstract classes is that any methods you provide
"know" something of the implementation, unless the methods are
implemented solely by calling other abstract methods (which is rarely
the case if the abstract class contains ANY private members).
Yes, abstr
: I do like moving towards a separation of Document for indexing vs
: searching for 3.0.
:
: Disregarding for starters how we get there from here...
:
: Wouldn't we just want a base class (not an interface), say
: ReadOnlyField, that is used in documents retrieved by a reader? This
: class woul
Probably going to disagree here... but that's ok !
I think IndexReader and IndexWriter would have been perfect
interfaces - as long as the concepts were kept very abstract.
putDocument(), getDocument(), findDocument(), etc.
and supported the semantics.
That is what I find is key to hiding t
Thanks for clearing that up. I thought I missed something :-)
No .. I don't use term vectors, only stored fields and indexed ones, no
norms or term vectors.
As for the efficiency of RAM usage by IndexWriter - what would perform
better: setting the RAM limit to 128MB, or create a RAMDirectory and
Shai Erera wrote:
Thanks for clearing that up. I thought I missed something :-)
No .. I don't use term vectors, only stored fields and indexed
ones, no
norms or term vectors.
Hmm, then it's hard to explain why when you set buffer to 128 MB you
never saw the process get up to that usage.
I think you misunderstood me - ultimately, the process reached 128MB.
However it was flushing the .fdt file before it reached that. Your
explanation on stored fields explains that behavior, but it did
consume 128MB.
Also, the CFS files that were written were of size >200MB (but less than
256) - whi
Chris Hostetter wrote:
: I do like moving towards a separation of Document for indexing vs
: searching for 3.0.
:
: Disregarding for starters how we get there from here...
:
: Wouldn't we just want a base class (not an interface), say
: ReadOnlyField, that is used in documents retrieved by a re
Shai Erera wrote:
I think you misunderstood me - ultimately, the process reached 128MB.
However it was flushing the .fdt file before it reached that. Your
explanation on stored fields explains that behavior, but it did
consume 128MB.
Ahh, phew.
Also, the CFS files that were written were of siz
> > IndexableField really shouldn't be a subclass of whatever class is
> > returned after a search is done ... the methods used for accessing the
> > "stored" value of a returned document make as little sense in the
> > context of IndexableField as the setBoost/Reader/TokenStream
> > functions of
Can't you just make Document non-final and add a property (name of
class) that the reader will use to call Class.forName().newInstance()
when it needs to create a document? As long as subclasses have a no-arg
ctor, what is the problem?
Note, if you allow this kind of support, passing Document instanc
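The suggestion in the snippet above can be sketched roughly as follows. This is a hedged illustration only: Document here is a stand-in class defined in the sketch, not Lucene's Document, and DocumentFactory and MyDocument are hypothetical names. It uses getDeclaredConstructor().newInstance() in place of Class.newInstance(), which has been deprecated since Java 9.

```java
// Stand-in for a non-final Document; not Lucene's actual class.
class Document {
    Document() {}
}

// A user-supplied subclass with the required no-arg constructor.
class MyDocument extends Document {
    MyDocument() {}
}

// Hypothetical helper a reader could hold: it is configured with a class
// name and instantiates that class whenever a document has to be created.
final class DocumentFactory {
    private final Class<? extends Document> docClass;

    DocumentFactory(String className) {
        try {
            // asSubclass rejects classes that do not extend Document.
            this.docClass = Class.forName(className).asSubclass(Document.class);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown document class: " + className, e);
        }
    }

    Document newDocument() {
        try {
            return docClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Document subclass needs an accessible no-arg constructor", e);
        }
    }
}
```

A caller would build the factory once, for example new DocumentFactory(MyDocument.class.getName()), and call newDocument() for each document it has to create.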
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580592#action_12580592 ]
Michael McCandless commented on LUCENE-510:
---
{quote}
I'm wondering why the patch
: Wouldn't subclassing ReadOnlyDocument also work in this case, if you override
: the getField* to do your own new logic if it applies else fallback to super?
Sure, but how will IndexReader (or really FieldsReader) know which
subclass to instantiate? I think in LUCENE-778 the notion of passing
Bok Marko,
Very interested. I suggest you continue the discussion on
java-dev@lucene.apache.org, though (CC-ing)
You should note that there are several efforts around distributed Lucene.
There is SOLR-303 for distributed search, and there is some work in progress in
Hadoop land around distri