[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833860#action_12833860 ] Eks Dev commented on LUCENE-329: {quote} query for John~ Patitucci~ I'm probably more

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832911#action_12832911 ] Eks Dev commented on LUCENE-2089: - {quote} ...Aaron i think generation may pose a problem

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832424#action_12832424 ] Eks Dev commented on LUCENE-2089: - {quote} What about this, http://www.catalysoft.com

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832741#action_12832741 ] Eks Dev commented on LUCENE-2089: - {quote} I assume you mean by weighted edit distance

[jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762742#action_12762742 ] Eks Dev commented on LUCENE-1410: - Mike, That is definitely the way to go, distribution

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
October, 2009 23:33:03 Subject: Re: [jira] Commented: (LUCENE-1410) PFOR implementation Eks, [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762742#action_12762742 ] Eks Dev commented on LUCENE

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
() where postings get resorted on such fields (basically enabling rle encoding to work) and at the same time all other terms get optimal encoding format for postings... perfect for read only indexes where you want to max performance and reduce ix size From: eks dev eks...@yahoo.co.uk To: java

[jira] Commented: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735809#action_12735809 ] Eks Dev commented on LUCENE-1762: - cool, thanks for the review. Slightly more

[jira] Created: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
: Analysis Reporter: Eks Dev Priority: Trivial No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. The method was returning null as a hint to the upstream code that the current termBuffer has enough space, or the reallocated buffer

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch Slightly more readable code in TermAttributeImpl

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch made the changes in Token along the same lines, - had to change one

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch - made allocation in initTermBuffer() consistent

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
imo, it is too low level to do it better than OSs. I agree, cache unloading effect would be prevented with it, but I am not sure if it brings net-net benefit, you would get this problem fixed, but probably OS would kill you anyhow (you took valuable memory from OS) on queries that miss your

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
this should not be all that difficult to try. I accept it makes sense in some cases ... but which ones? Background: all my attempts to fight the OS went bad :( Let us think again about what the example Mike gave means. You are explicitly deciding that Lucene should get a bigger share of

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-14 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731085#action_12731085 ] Eks Dev commented on LUCENE-1743: - indeed! obvious idea, the only thing I do not like

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-14 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731104#action_12731104 ] Eks Dev commented on LUCENE-1743: - right, it is not everything about reading index, you

Re: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread eks dev
I have no test data which size is good, it is just trying out Sure, for this you need a bad OS and a large index, you are not as lucky as I am to have it :) Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits :)
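The "reduce this value until it fits" procedure from the message above could be sketched roughly as follows. This is only an illustration, not MMapDirectory code: ChunkMapper, ChunkSizeProbe, findWorkingChunkSize and the halving policy are hypothetical, and the actual mapping call is hidden behind an interface so the sketch runs without a real file. It assumes a failed mapping surfaces as an IOException or an OutOfMemoryError.

```java
import java.io.IOException;

/** Hypothetical abstraction over "try to map one chunk of this size". */
interface ChunkMapper {
    void tryMap(long chunkSize) throws IOException;
}

final class ChunkSizeProbe {
    /**
     * Starts at maxChunk and halves the chunk size until tryMap succeeds.
     * Returns the first size that worked; throws if halving again would
     * go below minChunk.
     */
    static long findWorkingChunkSize(ChunkMapper mapper, long maxChunk, long minChunk)
            throws IOException {
        long size = maxChunk;
        while (true) {
            try {
                mapper.tryMap(size);
                return size; // mapping succeeded at this chunk size
            } catch (IOException | OutOfMemoryError e) {
                if (size / 2 < minChunk) {
                    throw new IOException("could not map at chunk size " + size, e);
                }
                size /= 2; // assumption: halving is a reasonable step
            }
        }
    }
}
```

Halving is just one possible policy; the message only says to reduce the value until the mapping fits.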

[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730560#action_12730560 ] Eks Dev commented on LUCENE-1741: - Uwe, you convinced me, I looked at the code, and indeed

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread eks dev
Anybody knows other interesting open-source search engines? Minion (https://minion.dev.java.net/) - Original Message From: Earwin Burrfoot ear...@gmail.com To: java-dev@lucene.apache.org Sent: Monday, 6 July, 2009 23:01:52 Subject: Re: A Comparison of Open Source Search Engines

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725168#action_12725168 ] Eks Dev commented on LUCENE-1720: - it's been late for this issue, but maybe worth thinking

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725182#action_12725182 ] Eks Dev commented on LUCENE-1720: - Sure, I just wanted to sharpen definition what

Re: Improving TimeLimitedCollector

2009-06-24 Thread eks dev
Re: I think such a parameter should not exist on individual search methods since it's more of a global setting (i.e., I want my searches to be limited to 5 seconds, always, not just for a particular query). Right? I am not sure about this one, we had cases where one physical index served two

Re: Fuzzy search change

2009-06-18 Thread eks dev
what would be the difference/benefit compared to standard lucene SpellChecker? If I am not wrong: - Lucene SpellChecker uses standard lucene index as a storage for tokens instead of QDBM... meaning full inverted index with arbitrary N-grams length, with tf/idf/norms... not only

[jira] Commented: (LUCENE-1594) Use source code specialization to maximize search performance

2009-05-07 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707116#action_12707116 ] Eks Dev commented on LUCENE-1594: - huh, it reduces hardware costs 2-3 times for larger

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704561#action_12704561 ] Eks Dev commented on LUCENE-1518: - imo, it is really not all that important to make Filter

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704613#action_12704613 ] Eks Dev commented on LUCENE-1518: - Shai, Regarding pure ranked, CSQ is really what we

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704618#action_12704618 ] Eks Dev commented on LUCENE-1518: - Paul: ...The current patch at LUCENE-1345 does not need

Re: new TokenStream api Question

2009-04-28 Thread eks dev
Subject: Re: new TokenStream api Question Hi Eks Dev, I actually started experimenting with changing the new API slightly to overcome one drawback: with the variables now distributed over various Attribute classes (vs. being in a single class Token previously), cloning a Token (i.e. calling

[jira] Commented: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703543#action_12703543 ] Eks Dev commented on LUCENE-1619: - thanks Mike TermAttribute.termLength() optimization

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703085#action_12703085 ] Eks Dev commented on LUCENE-1616: - I am ok with both options, removing separate looks

Re: [jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread eks dev
Components: Analysis Reporter: Eks Dev Priority: Trivial Fix For: 2.9 Attachments: LUCENE-1616.patch add OffsetAttribute. setOffset(startOffset, endOffset); trivial change, no JUnit needed Changed CharTokenizer to use it

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch whoops, this time it compiles :) add one setter for start and end offset

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703254#action_12703254 ] Eks Dev commented on LUCENE-1616: - me too, sorry! Eclipse left me blind for some funny

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch ok, maybe this time it will work, I hope I managed to clean it up (core

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703335#action_12703335 ] Eks Dev commented on LUCENE-1616: - ant build-contrib add one setter for start and end

[jira] Updated: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1619: Attachment: LUCENE-1619.patch TermAttribute.termLength() optimization

[jira] Created: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-27 Thread Eks Dev (JIRA)
Reporter: Eks Dev Priority: Trivial Attachments: LUCENE-1619.patch public int termLength() { initTermBuffer(); /* This patch removes this method call */ return termLength; } I see no reason to initTermBuffer() in termLength()... all tests pass, but I could

Re: new TokenStream api Question

2009-04-27 Thread eks dev
-Original Message- From: eks dev [mailto:eks...@yahoo.co.uk] Sent: Sunday, April 26, 2009 10:39 PM To: java-dev@lucene.apache.org Subject: new TokenStream api Question I am just looking into new TermAttribute usage and wonder what would be the best way to implement PrefixFilter

[jira] Commented: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12703406#action_12703406 ] Eks Dev commented on LUCENE-1618: - Maybe, FileSwitchDirectory should have possibility

[jira] Created: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
Components: Index Reporter: Eks Dev Priority: Trivial setOmitTf(boolean) is deprecated and should not be used by core classes. One place where it appears is FieldsReader , this patch fixes it. It was necessary to change Fieldable to AbstractField at two places, only local

[jira] Updated: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1615: Attachment: LUCENE-1615.patch deprecated method used in fieldsReader / setOmitTf

[jira] Commented: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702901#action_12702901 ] Eks Dev commented on LUCENE-1615: - sure, replacing Fieldable is good, just noticed quick

new TokenStream api Question

2009-04-26 Thread eks dev
I am just looking into new TermAttribute usage and wonder what would be the best way to implement PrefixFilter that would filter out some Terms that have some prefix, something like this, where '-' represents my prefix: public final boolean incrementToken() throws IOException { // the

Re: new TokenStream api Question

2009-04-26 Thread eks dev
Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: eks dev [mailto:eks...@yahoo.co.uk] Sent: Sunday, April 26, 2009 10:39 PM To: java-dev@lucene.apache.org Subject: new TokenStream api Question I am just looking into new TermAttribute usage

Re: new TokenStream api Question

2009-04-26 Thread eks dev
is not possible, as the indexer will get a ClassCastException when using the instance retrieved with getAttribute(TermAttribute.class). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: eks dev

[jira] Created: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)
Components: Analysis Reporter: Eks Dev Priority: Trivial add OffsetAttribute. setOffset(startOffset, endOffset); trivial change, no JUnit needed Changed CharTokenizer to use it -- This message is automatically generated by JIRA. - You can reply to this email

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch add one setter for start and end offset to OffsetAttribute

Re: Another possible optimization - now in DocIdSetIterator

2009-04-24 Thread eks dev
Hi Shai, absolutely! we have been there, and there are already some micro benchmarks done in LUCENE-1345 just do not forget to use -1 < doc instead of -1 != doc, trust me, Yonik convinced me :) as a side effect, this change would have some positive effects on iterator semantics, prevents,

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-04-21 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701279#action_12701279 ] Eks Dev commented on LUCENE-1606: - Robert, in order for Lev. Automata to work, you need

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-04-21 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12701298#action_12701298 ] Eks Dev commented on LUCENE-1606: - hmmm, sounds like good idea, but I am still

[jira] Commented: (LUCENE-1410) PFOR implementation

2009-03-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688284#action_12688284 ] Eks Dev commented on LUCENE-1410: - It looks like Google went there as well (Block encoding

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-03-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688429#action_12688429 ] Eks Dev commented on LUCENE-1561: - maybe something along the lines

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-02-03 Thread eks dev
Message From: Michael McCandless luc...@mikemccandless.com To: java-dev@lucene.apache.org Sent: Tuesday, 3 February, 2009 18:28:14 Subject: Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query eks dev wrote: Thanks for confirming

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-02-02 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669595#action_12669595 ] Eks Dev commented on LUCENE-1532: - .bq but I'm not sure the exact frequency number at just

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
...many core unit tests will need to change, or.. Thinking about it a bit more, what is current contract for deleted documents in respect to terms? if we delete document from an index, do we update global freqs and eventually delete terms... or we simply say document ID will not be found

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
Right, we just filter out the docs when iterating through postings. So this means, as segments are merged, the stats get corrected, which means document scores will change for a given query. Mike Mark Miller wrote: eks dev wrote: ...many core unit tests will need to change

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-01-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12669018#action_12669018 ] Eks Dev commented on LUCENE-1532: - bq. so it can suggest a very obscure word rather than

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented). If negative test, you do just simple
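The idea in the message above (a cheap negative test in front of a large mapping, because most characters are not accented) might look like the sketch below. It is an assumption-laden illustration rather than the code the author used: AccentFolder, the four sample mappings, the two hash functions and the single 64-bit word used as the bit array are all hypothetical, and a HashMap stands in for the big switch statement.

```java
import java.util.HashMap;
import java.util.Map;

final class AccentFolder {
    // Stand-in for the large switch: accented char -> plain form.
    private static final Map<Character, Character> FOLD = new HashMap<>();
    // Tiny Bloom filter: one 64-bit word, two hash functions per char.
    private static long filter = 0L;

    static {
        add('\u00e9', 'e'); // e with acute
        add('\u00e8', 'e'); // e with grave
        add('\u00fc', 'u'); // u with diaeresis
        add('\u00f1', 'n'); // n with tilde
    }

    /** Returns the two filter bits for c (hypothetical hash functions). */
    private static long bits(char c) {
        int h1 = c & 63;
        int h2 = (c * 31 + 7) & 63;
        return (1L << h1) | (1L << h2);
    }

    private static void add(char from, char to) {
        FOLD.put(from, to);
        filter |= bits(from);
    }

    static char fold(char c) {
        long b = bits(c);
        if ((filter & b) != b) {
            return c; // definitely not mapped: skip the expensive lookup
        }
        // Possibly mapped (false positives are allowed): do the real lookup.
        Character mapped = FOLD.get(c);
        return mapped == null ? c : mapped;
    }

    static String foldAll(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i)));
        }
        return sb.toString();
    }
}
```

A false positive only costs one extra lookup, so the result stays correct whatever the filter says; the filter pays off when the common, unmapped characters are rejected by the bit test alone.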

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
Maybe we should close this issue with a won't-fix and start a new one for filtered deletions? A few thoughts, without looking at the code, just thinking aloud :) It is inverted filter what we are talking about here, Lucene uses Filter as a pass filter (Set bit defines document that should

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
...@osafoundation.org To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 23:02:15 Subject: Re: BloomFilter-s with Lucene On Fri, 30 Jan 2009, eks dev wrote: I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent-plain form mapping). Big

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
January 2009 23:24:42 eks dev wrote: ... This is conceptually almost equal (fully equal, when Paul gets Filters as boolean clauses done) to having separate, single valued field indexed isDeleted {true, false} where each Query gets implicitly transformed to OriginalQuery

Re: wiki

2009-01-24 Thread eks dev
It could be a Slavic language, but that's really no more a guess. it is one of Serbian, Croatian or Bosnian... (used to be the same language Serbo-Croatian 10-15 years ago, then it split on political boundaries). The same meaning, Index of words. cheers, eks

Re: Filesystem based bitset

2009-01-19 Thread eks dev
Hi Paul, not really an answer to your questions, I just thought you may find it useful as a confirmation that this packing of integers into (B or some other) Tree is good one. I have seen Integer set distributions that can profit hugely from the tree organization on top. have look at:

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663120#action_12663120 ] Eks Dev commented on LUCENE-1518: - nice, you did it top down (api), Paul takes it bottom

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread eks dev
That sounds much better. Trying to distribute lucene (my reason why all this would be interesting) itself is just not going to work for far too many applications and will put burden on API extensions. My point is, I do not want to distribute Lucene Index, I need to distribute my application

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread eks dev
John, sorry I have to comment, but I feel here some substantial misconceptions about Open Source 1) e.g. 30 million documents indexed and searched in realtime., and I really had to do some tweaking. So what? What I or anyone else has to do with it? some tweaking is definitely better than

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-25 Thread eks dev
simple default could be B-Tree with prefix compression, it never disappoints and is relatively simple to implement. Berkeley DB (java edition) uses this Just to clarify, are you saying BDB Java performs prefix compression by default? On Sat, Nov 22, 2008 at 6:38 AM, eks dev [EMAIL PROTECTED

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-22 Thread eks dev
that's the way to go! simple default could be B-Tree with prefix compression, it never disappoints and is relatively simple to implement. Berkeley DB (java edition) uses this, I think apache xindice as well ... if you really go heavyweight, then String B-Tree looks interesting as it mixes
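For readers unfamiliar with the term, prefix compression of sorted keys (often called front coding) can be sketched as below. This is a generic illustration and not the on-disk format of Berkeley DB, Xindice or Lucene; the FrontCoding and Entry names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FrontCoding {
    /** Number of leading chars shared with the previous term, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes terms that are already sorted, so neighbours share prefixes. */
    static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int max = Math.min(prev.length(), term.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == term.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, term.substring(shared)));
            prev = term;
        }
        return out;
    }

    /** Rebuilds the original terms by reusing the previous term's prefix. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String term = prev.substring(0, e.shared) + e.suffix;
            out.add(term);
            prev = term;
        }
        return out;
    }
}
```

With the terms "lucene", "lucid", "luck", only the first is stored in full; the other two become (3, "id") and (3, "k").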

[jira] Commented: (LUCENE-1426) Next steps towards flexible indexing

2008-10-20 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12641128#action_12641128 ] Eks Dev commented on LUCENE-1426: - Just a few random thoughts on this topic - I am sure I

Re: docid set compression and boolean docid set operations

2008-09-13 Thread eks dev
Hi Anmol, Paul, great that someone is taking on p4delta for lucene! There are people like me that believe this could bring some really nice performance benefits (if we get only 50% of speed-up the authors reported, it will still be huge) Moreover, we have implementations of Logic operators

[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-08-22 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12624634#action_12624634 ] Eks Dev commented on LUCENE-1329: - Mike, did someone measure what this brings

[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-08-22 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12624657#action_12624657 ] Eks Dev commented on LUCENE-1329: - ok, I see, thanks. At least, It resolves an issue

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-19 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12623593#action_12623593 ] Eks Dev commented on LUCENE-1219: - bq. did you ever measure the before/after performance

Re: RAMDisk.reload(FSDirectory)

2008-08-19 Thread eks dev
PM Subject: Re: RAMDisk.reload(FSDirectory) Since Lucene is write-once, this should be fairly simple to do. You just list the files, remove any that are now gone, copy over any new files? Mike eks dev wrote: short one, maybe stupid question: use case, - you load your
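The reply quoted above (list the files, remove any that are gone, copy over any new ones) could be sketched like this. Plain maps stand in for the on-disk and in-memory directories, so nothing here is Lucene API; WriteOnceSync and reload are hypothetical names. The sketch assumes every file really is write-once, which would need checking for the index format in use before relying on it.

```java
import java.util.Map;

final class WriteOnceSync {
    /**
     * Makes ram mirror disk, assuming a file never changes once written.
     * Returns the number of files that had to be copied.
     */
    static int reload(Map<String, byte[]> disk, Map<String, byte[]> ram) {
        // Drop files that no longer exist on disk.
        ram.keySet().retainAll(disk.keySet());

        // Copy only files the RAM copy has not seen yet.
        int copied = 0;
        for (Map.Entry<String, byte[]> e : disk.entrySet()) {
            if (!ram.containsKey(e.getKey())) {
                ram.put(e.getKey(), e.getValue().clone());
                copied++;
            }
        }
        return copied;
    }
}
```

Because unchanged files are skipped by name alone, the cost of a reload is proportional to what changed rather than to the size of the whole index.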

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-18 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12623332#action_12623332 ] Eks Dev commented on LUCENE-1219: - how was it: repetitio est mater studiorum ;) thanks

RAMDisk.reload(FSDirectory)

2008-08-15 Thread eks dev
short one, maybe stupid question: use case, - you load your index from Disk into RAMDisk - Use it in read only mode (no modifications on RAMDisk index) - Update on disk index from another, external process - reload only changes into RAM instead of reloading complete index is there any way to

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-09 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.extended.patch bq. couldn't you just call document.getFieldable(name

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-08 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.extended.patch Mike, This new patch includes take3 and adds the following

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-08 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12621036#action_12621036 ] Eks Dev commented on LUCENE-1219: - bq. could we instead add this to Field: byte

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-05 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12620019#action_12620019 ] Eks Dev commented on LUCENE-1219: - Great Mike, it gets better and better, i saw LUCENE

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread eks dev
then we conclude, comparison with 0 is faster :) Maybe something on my XP machine was doing something in background I have not noticed, stealing cycles, on Windows this can not be easily controlled. or when I tested it the other day, I used comparison with -1 while((doc=it.next()) -1) could

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread eks dev
as a matter of fact, you can, keeping literals on left hand side prevents some ugly accidental assignments, so at the end of day you have more time to speed things up instead of chasing bugs :) cheers Hoss, good to see you are following this - Original Message From: Chris Hostetter

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617726#action_12617726 ] Eks Dev commented on LUCENE-1345: - Yonik, this would probably work fine for int values

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617836#action_12617836 ] Eks Dev commented on LUCENE-1345: - bq. comparison with -1 is being optimized away entirely

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: OpenBitSetIteratorExperiment.java TestIteratorPerf.java I just enhanced

[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617978#action_12617978 ] Eks Dev commented on LUCENE-1340: - ouch! it is kind of getting personal between me

[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12618069#action_12618069 ] Eks Dev commented on LUCENE-1340: - that sound like consensus :) Great! in that case

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-28 Thread eks dev
... to change the semantics of these iterators not to return boolean but rather the document id, with sentinel values. This would definitely reduce the number of method invocations by a factor of 2 at least. {next() doc()} -> next() It would be pretty easy to do that, it just requires one huge patch,
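
A minimal sketch of the two iteration styles discussed in this message, assuming a hypothetical sentinel value of -1 and hypothetical interface names (none of these are the actual Lucene API). The old style needs two calls per document, `next()` and then `doc()`; the sentinel style needs only one.

```java
// Hypothetical sketch; interface names and the sentinel value are assumptions.
public class SentinelIteratorSketch {

    /** Assumed sentinel meaning "no more documents". */
    static final int NO_MORE_DOCS = -1;

    /** Old style: next() reports success, doc() returns the current id. */
    interface BooleanStyleIterator {
        boolean next();
        int doc();
    }

    /** Proposed style: next() returns the id itself, or the sentinel. */
    interface SentinelStyleIterator {
        int next();
    }

    static final class ArrayBooleanIterator implements BooleanStyleIterator {
        private final int[] docs;
        private int pos = -1;
        ArrayBooleanIterator(int[] docs) { this.docs = docs; }
        public boolean next() { return ++pos < docs.length; }
        public int doc() { return docs[pos]; }
    }

    static final class ArraySentinelIterator implements SentinelStyleIterator {
        private final int[] docs;
        private int pos = -1;
        ArraySentinelIterator(int[] docs) { this.docs = docs; }
        public int next() { return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    // Two method invocations per document.
    static long sumBooleanStyle(BooleanStyleIterator it) {
        long sum = 0;
        while (it.next()) {
            sum += it.doc();
        }
        return sum;
    }

    // One method invocation per document.
    static long sumSentinelStyle(SentinelStyleIterator it) {
        long sum = 0;
        for (int doc = it.next(); doc != NO_MORE_DOCS; doc = it.next()) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {1, 5, 9};
        System.out.println(sumBooleanStyle(new ArrayBooleanIterator(docs)));
        System.out.println(sumSentinelStyle(new ArraySentinelIterator(docs)));
    }
}
```

Both loops visit the same documents; the sentinel variant only works if the sentinel can never be a valid document id, which is why a negative value is assumed here.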

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: TestIteratorPerf.java Hi Paul, I gave it a try on micro benchmarking, and it looks like we

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617603#action_12617603 ] Eks Dev commented on LUCENE-1345: - great! Will look into it at the weekend in more detail

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread eks dev
from what I can say, this just makes it harder for the new approach, but you never know before you try it in production ... just wanted to see if it could lead anywhere before spending real time on it - Original Message From: Paul Elschot (JIRA) [EMAIL PROTECTED] To:

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-27 Thread eks dev
people made this switch in the last version as well. - Original Message From: Paul Elschot [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Sunday, 27 July, 2008 1:04:26 AM Subject: Re: ScorerDocQueue.HeapedScorerDoc On Saturday 26 July 2008 23:09:06, eks dev wrote

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: DisjunctionDISI.patch I just realised TestDisjunctionDISI had a bug (iterators have

[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617140#action_12617140 ] Eks Dev commented on LUCENE-1340: - we finished our tests Index without omitTf

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: DisjunctionDISI.patch bq. Would anyone have a DisjunctionDISI (Disjunction over

ScorerDocQueue.HeapedScorerDoc

2008-07-26 Thread eks dev
what is the reason to have the HeapedScorerDoc class in ScorerDocQueue? Caching of the doc value? Does this bring anything compared to invoking doc() on Scorer? Just curious, maybe I do not see something obvious... If doc is the reason, I would bet on doc()

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-26 Thread eks dev
Hi Paul, it sounds so familiar. I too like playing with lucene, it is fun, but I have not found a formula to make a 25-hour day (waking up one hour earlier does not work for me for some strange reason) The only other person so interested in these Filter-like issues is Yonik, but I guess he

Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-24 Thread eks dev
Reporter: Eks Dev Priority: Minor Attachments: LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch, LUCENE-1340.patch Original Estimate: 24h Remaining Estimate: 24h Term Frequency is typically not needed for all

Re: performance optimizations

2008-07-23 Thread eks dev
sure, nice article, big-O notation should be addressed first, but try running Analyzers from before Mike added char[] and compare; try indexing with some older versions. Basically nothing significantly changed from the algorithmic point of view Doug set years ago, all that happened there is just

Re: performance optimizations

2008-07-23 Thread eks dev
and just one more for argument's sake, in Lucene obscure bit twiddling is a great deal, have a look at all recent / old work on inverted index design, p4delta, rank9/16 ... it is nothing more nor less than obscure bit twiddling - Original Message From: eks dev [EMAIL
