Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
Hoss, thanks for kicking-in with your "design purist" hat on :) about your proposal, "The best short term approach I can think of for addressing LUCENE-1219 in 2.4: 1) list the new methods in a new interface that extends Fieldable (ByteArrayReuseFieldable or something) 2) add the new met
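A minimal sketch of step 1 of the quoted proposal: the new array/offset/length accessors live in a sub-interface of Fieldable, so existing implementations keep compiling. `Fieldable` here is a simplified stand-in rather than Lucene's real interface. `ByteArrayReuseFieldable` is the placeholder name from the message. `LegacyBinaryField`, `SliceField`, `storedLength` and the method names are hypothetical.

```java
// Hypothetical, minimal stand-in for Lucene's Fieldable: only the binary accessor is modeled.
interface Fieldable {
    byte[] binaryValue();
}

// Step 1 of the proposal: the new accessors go into a sub-interface,
// so every existing Fieldable implementation still compiles unchanged.
interface ByteArrayReuseFieldable extends Fieldable {
    int getBinaryOffset();
    int getBinaryLength();
}

// An "old" implementation that only knows about Fieldable.
class LegacyBinaryField implements Fieldable {
    private final byte[] value;
    LegacyBinaryField(byte[] value) { this.value = value; }
    public byte[] binaryValue() { return value; }
}

// A field that points into a slice of a shared, reusable buffer instead of copying it.
class SliceField implements ByteArrayReuseFieldable {
    private final byte[] buffer;
    private final int offset;
    private final int length;

    SliceField(byte[] buffer, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > buffer.length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }
    public byte[] binaryValue() { return buffer; }
    public int getBinaryOffset() { return offset; }
    public int getBinaryLength() { return length; }
}

public class ByteArrayReuseDemo {
    // Consumers (e.g. code that writes stored fields) probe for the richer
    // interface and fall back to "whole array" semantics for legacy fields.
    static int storedLength(Fieldable f) {
        if (f instanceof ByteArrayReuseFieldable) {
            return ((ByteArrayReuseFieldable) f).getBinaryLength();
        }
        return f.binaryValue().length;
    }

    public static void main(String[] args) {
        byte[] shared = new byte[16];
        System.out.println(storedLength(new SliceField(shared, 4, 8)));        // 8
        System.out.println(storedLength(new LegacyBinaryField(new byte[3]))); // 3
    }
}
```

The cost of this approach is the `instanceof` probe at each consumer and one more type for users to learn. That second objection is raised later in the thread.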

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Grant Ingersoll
On Mar 19, 2008, at 6:45 AM, eks dev wrote: Hoss, thanks for kicking-in with your "design purist" hat on :) about your proposal, "The best short term approach I can think of for addressing LUCENE-1219 in 2.4: 1) list the new methods in a new interface that extends Fieldable (ByteArrayReu

Re: [jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-19 Thread Grant Ingersoll
On Mar 13, 2008, at 5:35 AM, Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578199#action_12578199 ] Michael McCandless commented on LUCENE-1219: --

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Grant Ingersoll
I do agree, Hoss, that it makes sense to think about an IndexDocument and a SearchDocument or something along those lines for 3.0. Also note, I added some comments to 1219 about the history of Fieldable. On Mar 18, 2008, at 9:26 PM, Chris Hostetter wrote: : Really, I think we could go just

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
Grant, "Couldn't you just mark AbstractField w/ another interface for the methods needed for 1219" sure it can be done this way, it is just not 100% obvious to me what benefit would that bring compared to the current patch. Adding another interface vs modifying only AbstractField, but the fac

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Grant Ingersoll
On Mar 19, 2008, at 9:31 AM, eks dev wrote: Grant, "Couldn't you just mark AbstractField w/ another interface for the methods needed for 1219" sure it can be done this way, it is just not 100% obvious to me what benefit would that bring compared to the current patch. Adding another interfa

Explanation on RAMBufferSizeMB

2008-03-19 Thread Shai Erera
Hi I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may sound like it belongs to the user list, but I actually think there is a problem with it, so I'm posting it to the dev list. I'm using 2.3.1 to index a set of documents (500K Amazon books to be exact). I don't use norms

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
"Well, maybe we should put 1219 off to 3.0 and maybe we should get to 3.0 sooner rather than later, as in stop adding new features and focus on bug fixes and deprecation. :-)" honestly, "getting to 3.0 sooner" can take far too long for an itch I currently have, gc() is kicking in like crazy

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Shai Erera
One correction - I use 2.3.0 and not 2.3.1 On Wed, Mar 19, 2008 at 4:25 PM, Shai Erera <[EMAIL PROTECTED]> wrote: > Hi > > I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may > sound like it belongs to the user list, but I actually think there is a > problem with it, so I'm

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Michael McCandless
Shai Erera wrote: Hi I have a question on the setting of RAMBufferSizeMB on IndexWriter. It may sound like it belongs to the user list, but I actually think there is a problem with it, so I'm posting it to the dev list. I'm using 2.3.1 to index a set of documents (500K Amazon books to b

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Doug Cutting
Chris Hostetter wrote: Committers tend to prefer abstract classes for extension points because it makes it easier to support backwards compatibility in the cases where we want to add methods to extendable APIs and the "default" behavior for these new methods is simple (or obvious delegation to e
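A small sketch of the backwards-compatibility argument in the quoted text. A later release adds a method to an abstract base class with a simple default, and a subclass written against the earlier release keeps compiling and running. With a pre-Java 8 interface, the same addition would break every implementor; Java 8 default methods later narrowed that gap. `FieldBase`, `LegacyField`, `boost()` and `describe()` are hypothetical names. The default only delegates to other abstract methods, which is the case robert engels concedes in his reply.

```java
// Hypothetical extension point, "version 1" plus a method added in "version 2".
abstract class FieldBase {
    abstract String name();
    abstract String stringValue();

    // Added in a later release: a concrete default, so subclasses written
    // against version 1 need no changes.
    float boost() { return 1.0f; }

    // Built only from the other (abstract or defaulted) methods.
    String describe() { return name() + "=" + stringValue() + "^" + boost(); }
}

// Written against "version 1"; it knows nothing about boost().
class LegacyField extends FieldBase {
    String name() { return "title"; }
    String stringValue() { return "lucene"; }
}

public class AbstractClassCompatDemo {
    public static void main(String[] args) {
        System.out.println(new LegacyField().describe()); // title=lucene^1.0
    }
}
```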

Re: Fieldable, AbstractField, Field

2008-03-19 Thread robert engels
I disagree on the use the interfaces. The problem with abstract classes, is that any methods you provide "know" something of the implementation, unless the methods are implemented solely by calling other abstract methods (which is rarely the case if the abstract class contains ANY private m

Re: CorruptIndexException with some versions of java

2008-03-19 Thread Michael McCandless
Hi Ian, Can you try this patch? Grasping at straws at this point, trying to isolate where the JVM fails us... I'm CC'ing java-dev. To sum up: sometimes when we merge fields, the fdx file ends up exactly one document too short. In adding numerous asserts around this code in SegmentMerge

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Michael McCandless
Thanks Hoss and Grant for links to all the background on this issue. I've read through them and now my brain is really fuzzy... For 1219, I'd rather not introduce yet another interface into the Field classes. I think it confuses our users to have so many classes/interfaces to represent the fairly

Re: CorruptIndexException with some versions of java

2008-03-19 Thread Yonik Seeley
On Wed, Mar 19, 2008 at 1:18 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I'm CC'ing java-dev. To sum up: sometimes when we merge fields, the > fdx file ends up exactly one document too short. In adding numerous > asserts around this code in SegmentMerger.java, insanely, somehow the > c

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Doug Cutting
robert engels wrote: The problem with abstract classes, is that any methods you provide "know" something of the implementation, unless the methods are implemented solely by calling other abstract methods (which is rarely the case if the abstract class contains ANY private members). Yes, abstr

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Chris Hostetter
: I do like moving towards a separation of Document for indexing vs : searching for 3.0. : : Disregarding for starters how we get there from here... : : Wouldn't we just want a base class (not an interface), say : ReadOnlyField, that is used in documents retrieved by a reader? This : class woul

Re: Fieldable, AbstractField, Field

2008-03-19 Thread robert engels
Probably going to disagree here... but that's ok ! I think IndexReader and IndexWriter would have been perfect interfaces - as long as the concepts were kept very abstract. putDocument(), getDocument(), findDocument(), etc. and supported the semantics. That is what I find is key to hiding t

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Shai Erera
Thanks for clarifying that up. I thought I miss something :-) No .. I don't use term vectors, only stored fields and indexed ones, no norms or term vectors. As for the efficiency of RAM usage by IndexWriter - what would perform better: setting the RAM limit to 128MB, or create a RAMDirectory and

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Michael McCandless
Shai Erera wrote: Thanks for clarifying that up. I thought I miss something :-) No .. I don't use term vectors, only stored fields and indexed ones, no norms or term vectors. Hmm, then it's hard to explain why when you set buffer to 128 MB you never saw the process get up to that usage.

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Shai Erera
I think you misunderstood me - ultimately, the process reached 128MB. However it was flushing the .fdt file before it reached that. Your explanation on stored fields explains that behavior, but it did consume 128MB. Also, the CFS files that were written were of size >200MB (but less than 256) - whi

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Michael McCandless
Chris Hostetter wrote: : I do like moving towards a separation of Document for indexing vs : searching for 3.0. : : Disregarding for starters how we get there from here... : : Wouldn't we just want a base class (not an interface), say : ReadOnlyField, that is used in documents retrieved by a re

Re: Explanation on RAMBufferSizeMB

2008-03-19 Thread Michael McCandless
Shai Erera wrote: I think you misunderstood me - ultimately, the process reached 128MB. However it was flushing the .fdt file before it reached that. Your explanation on stored fields explains that behavior, but it did consume 128MB. Ahh, phew. Also, the CFS files that were written were of siz

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
> > IndexableField really shouldn't be a subclass of whatever class is > > returned after a search is done ... the methods used for accessing the > > "stored" value of a returned document make as little sense in the > > context of IndexableField as the setBoost/Reader/TokenStream > > functions of

Re: Fieldable, AbstractField, Field

2008-03-19 Thread robert engels
Can't you just make Document non final and add a property (name of class) that the reader will call Class.forName().newInstance() when it needs to create document. As long as subclasses has a no-arg ctor, what is the problem? Note, if you allow this kind of support, passing Document instanc
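A minimal sketch of the suggestion above. The reader is configured with a class name and reflectively creates that `Document` subclass through its no-arg constructor. This is one possible answer to the question of how FieldsReader would know which subclass to instantiate. `Document`, `CustomDocument`, `DocumentFactory` and `source()` are hypothetical stand-ins, not Lucene's API. `Class.newInstance()`, as written in the message, has been deprecated since Java 9, so the sketch uses `getDeclaredConstructor().newInstance()`.

```java
// Hypothetical, simplified stand-in for Lucene's Document; non-final so it can be subclassed.
class Document {
    String source() { return "default"; }
}

// Application-specific subclass; it must have an accessible no-arg constructor.
class CustomDocument extends Document {
    @Override
    String source() { return "custom"; }
}

// Hypothetical reader-side factory: the class name is a configurable property.
class DocumentFactory {
    private final String className;

    DocumentFactory(String className) { this.className = className; }

    Document newDocument() {
        try {
            // asSubclass fails fast (ClassCastException) if the class is not a Document.
            Class<? extends Document> cls = Class.forName(className).asSubclass(Document.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}

public class DocumentFactoryDemo {
    public static void main(String[] args) {
        DocumentFactory factory = new DocumentFactory(CustomDocument.class.getName());
        Document doc = factory.newDocument(); // what a reader might do per retrieved document
        System.out.println(doc.getClass().getSimpleName() + " " + doc.source()); // CustomDocument custom
    }
}
```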

[jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2008-03-19 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580592#action_12580592 ] Michael McCandless commented on LUCENE-510: --- {quote} I'm wondering why the patch

Re: Fieldable, AbstractField, Field

2008-03-19 Thread Chris Hostetter
: Wouldn't subclassing ReadOnlyDocument also work in this case, if you override : the getField* to do your own new logic if it applies else fallback to super? Sure, but how will IndexReader (or really FieldsReader) know which subclass to instantiate? I think in LUCENE-778 the notion of passing

Re: Google Summer of Code

2008-03-19 Thread Otis Gospodnetic
Hi Marko, Very interested. I suggest you continue the discussion on java-dev@lucene.apache.org, though (CC-ing) You should note that there are several efforts around distributed Lucene. There is SOLR-303 for distributed search, and there is some work in progress in Hadoop land around distri