[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833860#action_12833860 ] Eks Dev commented on LUCENE-329: {quote} query for John~ Patitucci~ I'm prob

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832911#action_12832911 ] Eks Dev commented on LUCENE-2089: - {quote} ...Aaron i think generation may pose a pro

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832741#action_12832741 ] Eks Dev commented on LUCENE-2089: - {quote} I assume you mean by weighted edit dist

[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-11 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832424#action_12832424 ] Eks Dev commented on LUCENE-2089: - {quote} What about this, http://www.catalysoft

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
() where postings get resorted on such fields (basically enabling rle encoding to work) and at the same time all other terms get optimal encoding format for postings... perfect for read only indexes where you want to max performance and reduce ix size > >From: eks dev >To:

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
, 2009 23:33:03 >Subject: Re: [jira] Commented: (LUCENE-1410) PFOR implementation > >Eks, > > >> >>> [ >>> https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276274

[jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742 ] Eks Dev commented on LUCENE-1410: - Mike, That is definitely the way to go, distribu

[jira] Commented: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735809#action_12735809 ] Eks Dev commented on LUCENE-1762: - cool, thanks for the review. > Slight

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch - made allocation in initTermBuffer() consistent with

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch made the changes in Token along the same lines, - had to change one

[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch > Slightly more readable code in TermAttributeI

[jira] Created: (LUCENE-1762) Slightly more readable code in TermAttributeImpl

2009-07-25 Thread Eks Dev (JIRA)
: Analysis Reporter: Eks Dev Priority: Trivial No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow code. the method was returning null as a hint that the current termBuffer has enough space to the upstream code or reallocated buffer

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
>Part of the challenge here is what metric is really important. Sure, depends who you ask :) Lucene is so popular, that you can find almost every pattern we could come up with. funny, I had to deal with similar situation. The simplest solution was to set warm-up with constructed Queries (from

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
this should not be all that difficult to try. I accept it makes sense in some cases ... but which ones? Background: all my attempts to fight the OS went bad :( Let us think again about what Mike's example means. You are explicitly deciding that Lucene should get a bigger share of RAM.

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
imo, it is too low level to do it better than OSs. I agree, cache unloading effect would be prevented with it, but I am not sure if it brings net-net benefit, you would get this problem fixed, but probably OS would kill you anyhow (you took valuable memory from OS) on queries that miss your inte

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-14 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731104#action_12731104 ] Eks Dev commented on LUCENE-1743: - right, it is not everything about reading index,

[jira] Commented: (LUCENE-1743) MMapDirectory should only mmap large files, small files should be opened using SimpleFS/NIOFS

2009-07-14 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731085#action_12731085 ] Eks Dev commented on LUCENE-1743: - indeed! obvious idea, the only thing I do not

[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730560#action_12730560 ] Eks Dev commented on LUCENE-1741: - Uwe, you convinced me, I looked at the code,

Re: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread eks dev
>I have no test data which size is good, it is just trying out Sure, for this you need a bad OS and a large index, you are not as lucky as I am to have it :) Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits :) n
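The "reduce this value until it fits" rule in the message above can be sketched as a simple halving loop. This is only an illustration under stated assumptions: `ChunkSizeProbe`, `findWorkingChunkSize` and the `IntPredicate` that stands in for a real mapping attempt are hypothetical names, not part of Lucene's MMapDirectory.

```java
import java.util.function.IntPredicate;

final class ChunkSizeProbe {
    /**
     * Halves the chunk size until tryMap accepts it.
     * tryMap is a hypothetical stand-in for "try to map a chunk of this size
     * and report whether it succeeded"; a real caller would wrap the mapping
     * call and translate a failed map into false.
     * Returns the first size that worked, or -1 if even minSize failed.
     */
    static int findWorkingChunkSize(int startSize, int minSize, IntPredicate tryMap) {
        if (minSize < 1 || startSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= startSize");
        }
        for (int size = startSize; size >= minSize; size /= 2) {
            if (tryMap.test(size)) {
                return size; // this chunk size fits
            }
        }
        return -1; // nothing fits, caller must fall back to another strategy
    }
}
```

Halving is an arbitrary choice here; the message only says to reduce the value, not by how much.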

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread eks dev
> Anybody knows other interesting open-source search engines? Minion (https://minion.dev.java.net/) - Original Message > From: Earwin Burrfoot > To: java-dev@lucene.apache.org > Sent: Monday, 6 July, 2009 23:01:52 > Subject: Re: A Comparison of Open Source Search Engines > > I'd sa

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725182#action_12725182 ] Eks Dev commented on LUCENE-1720: - Sure, I just wanted to "sharpen definition

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2009-06-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725168#action_12725168 ] Eks Dev commented on LUCENE-1720: - it's been late for this issue, but ma

Re: Improving TimeLimitedCollector

2009-06-24 Thread eks dev
Re: "I think such a parameter should not exist on individual search methods since it's more of a global setting (i.e., I want my searches to be limited to 5 seconds, always, not just for a particular query). Right?" I am not sure about this one, we had cases where one physical index served two lo

Re: Fuzzy search change

2009-06-18 Thread eks dev
what would be the difference/benefit compared to standard lucene SpellChecker? If I am not wrong: - Lucene SpellChecker uses standard lucene index as a storage for tokens instead of QDBM... meaning full inverted index with arbitrary N-grams length, with tf/idf/norms... not only HashMap -

[jira] Commented: (LUCENE-1594) Use source code specialization to maximize search performance

2009-05-07 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707116#action_12707116 ] Eks Dev commented on LUCENE-1594: - huh, it reduces hardware costs 2-3 times for la

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704618#action_12704618 ] Eks Dev commented on LUCENE-1518: - Paul: ...The current patch at LUCENE-1345 does

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704613#action_12704613 ] Eks Dev commented on LUCENE-1518: - Shai, Regarding pure ranked, CSQ is really

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-04-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704561#action_12704561 ] Eks Dev commented on LUCENE-1518: - imo, it is really not all that important to

Re: new TokenStream api Question

2009-04-28 Thread eks dev
TokenStream api Question Hi Eks Dev, I actually started experimenting with changing the new API slightly to overcome one drawback: with the variables now distributed over various Attribute classes (vs. being in a single class Token previously), cloning a "Token" (i.e. calling captureS

[jira] Commented: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703543#action_12703543 ] Eks Dev commented on LUCENE-1619: - thanks Mike > TermAttribute.termLength() optim

[jira] Commented: (LUCENE-1618) Allow setting the IndexWriter docstore to be a different directory

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703406#action_12703406 ] Eks Dev commented on LUCENE-1618: - Maybe, FileSwitchDirectory should have possibilit

Re: new TokenStream api Question

2009-04-27 Thread eks dev
ibute.class). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > > Sent: Sunday, April 26, 2009 10

[jira] Created: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-27 Thread Eks Dev (JIRA)
Reporter: Eks Dev Priority: Trivial Attachments: LUCENE-1619.patch

    public int termLength() {
      initTermBuffer(); // This patch removes this method call
      return termLength;
    }

I see no reason to initTermBuffer() in termLength()... all tests pass, but I could be

[jira] Updated: (LUCENE-1619) TermAttribute.termLength() optimization

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1619: Attachment: LUCENE-1619.patch > TermAttribute.termLength() optimizat

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703335#action_12703335 ] Eks Dev commented on LUCENE-1616: - ant build-contrib > add one setter for start

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch ok, maybe this time it will work, I hope I managed to clean it up (core

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703254#action_12703254 ] Eks Dev commented on LUCENE-1616: - me too, sorry! Eclipse left me blind for some f

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch whoops, this time it compiles :) > add one setter for start and end off

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch the same as the first patch, just with removed setStart/EndOffset(int

Re: [jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread eks dev
URL: https://issues.apache.org/jira/browse/LUCENE-1616 > > Project: Lucene - Java > > Issue Type: Improvement > > Components: Analysis > > Reporter: Eks Dev > > Priority: Trivial > >

[jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703085#action_12703085 ] Eks Dev commented on LUCENE-1616: - I am ok with both options, removing separate loo

Re: new TokenStream api Question

2009-04-26 Thread eks dev
lee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > > Sent: Sunday, April 26, 2009 10:39 PM > > To: java-dev@lucene.apache.org > > Subject: new TokenStream

[jira] Updated: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1616: Attachment: LUCENE-1616.patch > add one setter for start and end offset to OffsetAttrib

[jira] Created: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-26 Thread Eks Dev (JIRA)
Components: Analysis Reporter: Eks Dev Priority: Trivial add OffsetAttribute.setOffset(startOffset, endOffset); trivial change, no JUnit needed. Changed CharTokenizer to use it

Re: new TokenStream api Question

2009-04-26 Thread eks dev
with > getAttribute(TermAttribute.class). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > >

new TokenStream api Question

2009-04-26 Thread eks dev
I am just looking into new TermAttribute usage and wonder what would be the best way to implement PrefixFilter that would filter out some Terms that have some prefix, something like this, where '-' represents my prefix: public final boolean incrementToken() throws IOException { // the f

[jira] Commented: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702901#action_12702901 ] Eks Dev commented on LUCENE-1615: - sure, replacing Fieldable is good, just noticed q

[jira] Updated: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1615: Attachment: LUCENE-1615.patch > deprecated method used in fieldsReader / setOmi

[jira] Created: (LUCENE-1615) deprecated method used in fieldsReader / setOmitTf()

2009-04-26 Thread Eks Dev (JIRA)
Components: Index Reporter: Eks Dev Priority: Trivial setOmitTf(boolean) is deprecated and should not be used by core classes. One place where it appears is FieldsReader , this patch fixes it. It was necessary to change Fieldable to AbstractField at two places, only local

Re: Another possible optimization - now in DocIdSetIterator

2009-04-24 Thread eks dev
Hi Shai, absolutely! we have been there, and there are already some micro benchmarks done in LUCENE-1345 just do not forget to use -1 < doc instead of -1 != doc, trust me, Yonik convinced me :) as a side effect, this change would have some positive effects on iterator semantics, prevents, ve
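The `-1 < doc` loop test mentioned above can be illustrated with a tiny sentinel-based iterator. This is a sketch only: `SentinelIterator` and `sumAll` are hypothetical names made up for the example and are not Lucene's DocIdSetIterator API; the only assumption taken from the thread is that `next()` returns a negative sentinel when the iterator is exhausted.

```java
final class SentinelIterator {
    private final int[] docs;
    private int pos = 0;

    SentinelIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Returns the next doc id, or -1 once the iterator is exhausted. */
    int next() {
        return pos < docs.length ? docs[pos++] : -1;
    }

    /** Consumes the iterator using the "-1 < doc" test instead of "-1 != doc". */
    static long sumAll(SentinelIterator it) {
        long sum = 0;
        int doc;
        while (-1 < (doc = it.next())) {
            sum += doc;
        }
        return sum;
    }
}
```

Besides any speed difference, the `<` form also terminates on any negative value, not only on exactly -1.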

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-04-21 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701298#action_12701298 ] Eks Dev commented on LUCENE-1606: - hmmm, sounds like good idea, but I am still

[jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)

2009-04-21 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701279#action_12701279 ] Eks Dev commented on LUCENE-1606: - Robert, in order for Lev. Automata to work, you

[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs

2009-03-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688429#action_12688429 ] Eks Dev commented on LUCENE-1561: - maybe something along the l

[jira] Commented: (LUCENE-1410) PFOR implementation

2009-03-23 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688284#action_12688284 ] Eks Dev commented on LUCENE-1410: - It looks like Google went there as well (B

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-02-03 Thread eks dev
aloud and probably does not make much sense. cheers, eks - Original Message > From: Michael McCandless > To: java-dev@lucene.apache.org > Sent: Tuesday, 3 February, 2009 18:28:14 > Subject: Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or > top level Quer

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-02-02 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669595#action_12669595 ] Eks Dev commented on LUCENE-1532: - .bq but I'm not sure the exact frequency

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-02-02 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669579#action_12669579 ] Eks Dev commented on LUCENE-1532: - .bq I got better results by refining edit dist

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
level Query > > > Right, we just filter out the docs when iterating through postings. > > So this means, as segments are merged, the stats get corrected, which means > document scores will change for a given query. > > Mike > > Mark Miller wrote: > >

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
"...many core unit tests will need to change, or.." Thinking about it a bit more, what is current contract for deleted documents in respect to terms? if we delete document from an index, do we update global freqs and eventually delete terms... or we simply say document ID will not be found agai

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
indeed :) From: Paul Elschot To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 23:37:08 Subject: Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs On Friday 30 January 2009 23:24:42 eks

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
a > To: java-dev@lucene.apache.org > Sent: Friday, 30 January, 2009 23:02:15 > Subject: Re: BloomFilter-s with Lucene > > > On Fri, 30 Jan 2009, eks dev wrote: > > > I have used them for speeding up huge switch clauses in charset > > normalization > (eg lowe

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
Maybe we should close this issue with a won't-fix and start a new one for filtered deletions? A few thoughts, without looking at the code, just thinking aloud :) It is an inverted filter that we are talking about here; Lucene uses Filter as a pass filter (Set bit defines document that should pas

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent->plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented). If negative test, you do just simple a
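The idea above, a cheap negative test in front of an expensive lookup for rarely occurring accented characters, might look roughly like this. It is a sketch under assumptions: `AccentFolder`, the 1024-bit filter size and the two multiplicative hashes are illustrative choices, and a `HashMap` stands in for the big switch statement the message describes.

```java
import java.util.HashMap;
import java.util.Map;

final class AccentFolder {
    // Stand-in for the large switch statement: accented char -> plain form.
    private final Map<Character, Character> mapping = new HashMap<>();
    // 1024-bit Bloom filter: 16 longs of 64 bits each.
    private final long[] bits = new long[16];

    void add(char accented, char plain) {
        mapping.put(accented, plain);
        setBit(hash1(accented));
        setBit(hash2(accented));
    }

    /** Returns the plain form of c, or c itself if no mapping is registered. */
    char fold(char c) {
        if (!mightContain(c)) {
            return c; // common case: definitely not an accented char we know
        }
        Character plain = mapping.get(c); // rare case, or a false positive
        return plain == null ? c : plain;
    }

    private boolean mightContain(char c) {
        return testBit(hash1(c)) && testBit(hash2(c));
    }

    // Multiplicative hashes; the top 10 bits give an index in 0..1023.
    private static int hash1(char c) { return (c * 0x9E3779B1) >>> 22; }
    private static int hash2(char c) { return (c * 0x85EBCA6B) >>> 22; }

    private void setBit(int i) { bits[i >>> 6] |= 1L << (i & 63); }
    private boolean testBit(int i) { return (bits[i >>> 6] & (1L << (i & 63))) != 0; }
}
```

A false positive only costs one extra map lookup, so the result is the same with or without the filter; the filter merely shortens the common path.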

[jira] Commented: (LUCENE-1532) File based spellcheck with doc frequencies supplied

2009-01-30 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669018#action_12669018 ] Eks Dev commented on LUCENE-1532: - bq. so it can suggest a very obscure word rather

Re: wiki

2009-01-24 Thread eks dev
"It could be a Slavic language, but that's really no more a guess." it is one of Serbian, Croatian or Bosnian... (used to be the same language "Serbo-Croatian" 10-15 years ago, than it split on political boundaries). The same meaning, "Index of words". cheers, eks ___

Re: Filesystem based bitset

2009-01-19 Thread eks dev
Hi Paul, not really an answer to your questions, I just thought you may find it useful as a confirmation that this packing of integers into (B or some other) Tree is a good one. I have seen Integer set distributions that can profit hugely from the tree organization on top. have a look at: http

[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663120#action_12663120 ] Eks Dev commented on LUCENE-1518: - nice, you did it top down (api), Paul takes it bo

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread eks dev
That sounds much better. Trying to distribute lucene (my reason why all this would be interesting) itself is just not going to work for far too many applications and will put burden on API extensions. My point is, I do not want to distribute Lucene Index, I need to distribute my application tha

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread eks dev
John, sorry I have to comment, but I sense some substantial misconceptions about Open Source here 1) "e.g. >30 million documents indexed and searched in realtime., and I really had to do some tweaking." So what? What do I or anyone else have to do with it? "some tweaking" is definitely better than

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-25 Thread eks dev
dexing > simple default could be B-Tree with prefix compression, it never disappoints and is relatively simple to implement. Berkeley DB (java edition) uses this Just to clarify, are you saying BDB Java performs prefix compression by default? On Sat, Nov 22, 2008 at 6:38 AM, eks de

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2008-11-22 Thread eks dev
that's the way to go! simple default could be B-Tree with prefix compression, it never disappoints and is relatively simple to implement. Berkeley DB (java edition) uses this, I think apache xindice as well ... if you really go heavyweight, then String B-Tree looks interesting as it mixes Pa

[jira] Commented: (LUCENE-1426) Next steps towards flexible indexing

2008-10-20 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641128#action_12641128 ] Eks Dev commented on LUCENE-1426: - Just a few random thoughts on this topic - I am

Re: docid set compression and boolean docid set operations

2008-09-13 Thread eks dev
Hi Anmol, Paul, great that someone is taking on p4delta for lucene! There are people like me who believe this could bring some really nice performance benefits (if we get only 50% of the speed-up the authors reported, it will still be huge) > Moreover, we have implementations of Logic operators o

[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-08-22 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624657#action_12624657 ] Eks Dev commented on LUCENE-1329: - ok, I see, thanks. At least, It resolves an i

[jira] Commented: (LUCENE-1329) Remove synchronization in SegmentReader.isDeleted

2008-08-22 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624634#action_12624634 ] Eks Dev commented on LUCENE-1329: - Mike, did someone measure what this brings?

Re: RAMDisk.reload(FSDirectory)

2008-08-19 Thread eks dev
uesday, 19 August, 2008 6:26:50 PM > Subject: Re: RAMDisk.reload(FSDirectory) > > > Since Lucene is write-once, this should be fairly simple to do. You > just list the files, remove any that are now gone, copy over any new > files? > > Mike > > eks dev wrote: >
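Mike's suggestion above (list the files, remove any that are gone, copy over any new ones) boils down to two set differences over file names. The sketch below assumes, as the message does, that index files are written once and never modified in place; `DirectorySync` and its methods are hypothetical helpers that only compute the two name sets, and the actual delete and copy calls are left out.

```java
import java.util.Set;
import java.util.TreeSet;

final class DirectorySync {
    /** Files held in RAM that no longer exist on disk: drop these from RAM. */
    static Set<String> staleInRam(Set<String> ramFiles, Set<String> diskFiles) {
        Set<String> stale = new TreeSet<>(ramFiles);
        stale.removeAll(diskFiles);
        return stale;
    }

    /** Files on disk that the RAM copy does not have yet: copy these over. */
    static Set<String> missingInRam(Set<String> ramFiles, Set<String> diskFiles) {
        Set<String> missing = new TreeSet<>(diskFiles);
        missing.removeAll(ramFiles);
        return missing;
    }
}
```

Files present on both sides are left untouched, which is only safe under the write-once assumption stated above.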

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-19 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623593#action_12623593 ] Eks Dev commented on LUCENE-1219: - bq. did you ever measure the before/after perform

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-18 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623332#action_12623332 ] Eks Dev commented on LUCENE-1219: - how was it: "repetitio est mater studiorum"

RAMDisk.reload(FSDirectory)

2008-08-15 Thread eks dev
short one, maybe stupid question: use case,
- you load your index from Disk into RAMDisk
- Use it in read only mode (no modifications on RAMDisk index)
- Update on disk index from another, external process
- reload only changes into RAM instead of reloading complete index
is there any way to

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-09 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.extended.patch bq. couldn't you just call document.getFieldable(name), and

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-08 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621036#action_12621036 ] Eks Dev commented on LUCENE-1219: - bq. could we instead add this to Field:

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-08 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.extended.patch Mike, This new patch includes take3 and adds the following

[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-05 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620019#action_12620019 ] Eks Dev commented on LUCENE-1219: - Great Mike, it gets better and better, i saw LU

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-03 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.take3.patch - updated this patch to apply to trunk - implemented abstract

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-30 Thread eks dev
then we conclude, comparison with 0 is faster :) Maybe something on my XP machine was running in the background that I had not noticed, stealing cycles; on Windows this cannot be easily controlled. or when I tested it the other day, I used comparison with -1 while((doc=it.next()) >-1) could

[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618069#action_12618069 ] Eks Dev commented on LUCENE-1340: - that sounds like consensus :) Great! in that

[jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617978#action_12617978 ] Eks Dev commented on LUCENE-1340: - ouch! it is kind of getting personal between me

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: OpenBitSetIteratorExperiment.java TestIteratorPerf.java I just enhanced

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617836#action_12617836 ] Eks Dev commented on LUCENE-1345: - bq. comparison with -1 is being optimized

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617726#action_12617726 ] Eks Dev commented on LUCENE-1345: - Yonik, this would probably work fine for int va

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread eks dev
as a matter of fact, you can; keeping literals on the left-hand side prevents some ugly accidental assignments, so at the end of the day you have more time to speed things up instead of chasing bugs :) cheers Hoss, good to see you are following this - Original Message > From: Chris Hostetter

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread eks dev
from what I can say, this just makes it harder for the new approach, but you never know before you try it in "production" ... just wanted to see if it could lead anywhere before spending real time on it - Original Message > From: Paul Elschot (JIRA) <[EMAIL PROTECTED]> > To: java-dev@

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617603#action_12617603 ] Eks Dev commented on LUCENE-1345: - great! Will look into it at the weekend in

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: TestIteratorPerf.java Hi Paul, I gave it a try on micro benchmarking, and it looks like we

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-28 Thread eks dev
>>... to change semantics of these iterators not to return boolean but > > rather document Id with sentinel values. This would definitely reduce > > number of method invocations by factor 2 at least.--- {next() doc()} > > -> next() > > > > It would be pretty easy to do that, just requires on one h

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-27 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: DisjunctionDISI.patch I just realised TestDisjunctionDISI had a bug (iterators have to be

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-27 Thread eks dev
ike. I think MG4J people made this switch in last version as well. - Original Message > From: Paul Elschot <[EMAIL PROTECTED]> > To: java-dev@lucene.apache.org > Sent: Sunday, 27 July, 2008 1:04:26 AM > Subject: Re: ScorerDocQueue.HeapedScorerDoc > > Op Satur

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-26 Thread eks dev
Hi Paul, it sounds so familiar. I too like playing with Lucene, it is fun, but I have not found a formula for making a 25-hour day (waking up one hour earlier does not work for me for some strange reason). The only other person so interested in these Filter-like issues is Yonik, but I guess he ha

ScorerDocQueue.HeapedScorerDoc

2008-07-26 Thread eks dev
What is the reason to have the HeapedScorerDoc class in ScorerDocQueue? Caching of the doc value? Does this bring anything compared to invoking doc() on Scorer? Just curious, maybe I do not see something obvious... If doc is the reason, I would bet on doc()

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-26 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: DisjunctionDISI.patch bq. Would anyone have a DisjunctionDISI (Disjunction over
