Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-28 Thread eks dev
Might be interesting: http://www.iis.uni-stuttgart.de/intset/ Another way to represent a Bit(Integer)Set. It should comfortably outperform BitSet or HashBitSet as far as iteration speed and memory are concerned. In Lucene, where the distribution of set bits is typically exponential... usage in caching, Filter...

Re: Tree based BitSet (aka IntegerSet, DocSet...)

2006-01-29 Thread eks dev
> Unfortunately, the license distributed with the JAR > (which we must > assume takes precedence over whatever is stated on > the web pages) is > much more restrictive, it's the Java Research > License, which > specifically disallows any commercial use. So, short > of reimplementing > it from s

Re: Filter

2006-03-10 Thread eks dev
It looks to me like everybody agrees here, no? If so, it would be really useful if somebody with commit rights could add 1) and 2) to the trunk (these patches practically already exist). It is not an invasive change and there are no problems with compatibility. Also, I have noticed a lot of people try

Re: Changing Lucene scoring?

2006-05-09 Thread eks dev
Hi Otis, "I often need just yes/no (matches/doesn't match) answers,... " Not sure if you meant this: "how I could implement a pure boolean model, completely avoiding scoring?". If yes, what comes to my mind is Filtering, ChainedFilter, ConstantScore* and all these discussions about implementing n

Re: OpenBitSet

2006-05-14 Thread eks dev
It is faster than BitSet, even against Mustang. The numbers are a bit lower than on Yonik’s HW, but quite convincing. I did a small test on my XP notebook (Pentium M 1.6GHz). Only the “union” test is some 20% slower at a size of 8 million with 80k bits set. I did not dig deeper. For what it is worth,

Re: OpenBitSet

2006-05-16 Thread eks dev
>Weird... I'm not sure how that could be. Are you sure you didn't get >the numbers reversed? That is exactly what happened, sorry for the wrong numbers. Now it looks as it should: java -version Java(TM) SE Runtime Environment (build 1.6.0-beta2-b83) Java HotSpot(TM) Client VM (build 1.6.0-beta2-b8

Re: OpenBitSet

2006-05-16 Thread eks dev
t and drop it somewhere on Jira if anybody has interest to play with. A bit off topic, is there anybody who is doing ChainedFilter version that uses docNrSkipper? As I recall, you wrote BitSet version :) - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@luc

Re: Nio File Caching & Performance Test

2006-05-16 Thread eks dev
Hi Robert, I may well be wrong, but I believe I saw something on JIRA (or was it Bugzilla?) a long, long time ago, where somebody made an MMAP implementation for really big indexes that works on 32 bit. I guess it is worth checking. - Original Message From: Yonik Seeley <[EMAIL PROT

Re: Explaining a filter; Scorer extending Matcher; (was: BooleanWeight.normalize(float) doesn't normalize prohibited clauses?)

2006-05-21 Thread eks dev
"Any thoughts on whether such a Matcher would be preferable to a DocNrSkipper that only has this method: int nextDocNr(int docNr) ?" As far as I can comprehend, it makes a lot of sense to decouple Scoring from Matching (of course their intermixing as well). This would practically mean that

Re: Explaining a filter; Scorer extending Matcher;

2006-05-23 Thread eks dev
>"DocIterator", that way even TermDocs could implement it ... I played with that idea, and what I learned is that DocIterator should throw IOException, as TermDocs throws it (this has already been mentioned on the compact sparse filters JIRA issue).

Re: Lucene and Java 1.5

2006-05-27 Thread eks dev
so far: pro: 1. Code readability 2. Faster contribs (as many active developers have already moved to it) 3. "moving forward effect" as sooner or later it will be the same argument for 1.6, 1.7... good feeling to stay close 4. Some performance boost not only from better hotspot, but from new jvm

Re: Benchmarking on GOV2

2006-05-29 Thread eks dev
That would be great to see! There are a million enhancements and ideas that could come up as a result of this comparison. For example, I would not be surprised to see mg4j "perfect skipping" become an interesting optimization for Lucene; a trie-based lexicon could make some regex queries signi

Re: Lucene and Java 1.5

2006-05-30 Thread eks dev
LinkedHashMap for LRUs, StringBuilder... - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Tuesday, 30 May, 2006 7:51:23 PM Subject: Re: Lucene and Java 1.5 : Agreed. But, I have not heard one compelling argument for the JDK 5 for : core.

trivial util to Visualize BitSets (Query results actually)

2006-05-31 Thread eks dev
Maybe there are some more people that like to see bits. Feel free to do whatever you like with it. Idea is simple, map 8 bits from HitCollector to one pixel by changing gray levels. Implementation is Quick 'n Dirty, but does the job. /** * Copyright 2004 The Apache Software Foundation * * L
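
The mapping described in this post (8 result bits folded into one gray pixel) might look roughly like the sketch below. It assumes a plain java.util.BitSet as input rather than a HitCollector, and the names BitSetGrayMap and toGrayLevels, as well as the linear count-to-gray scaling, are illustrative assumptions and not the code that was posted.

```java
import java.util.BitSet;

// Hypothetical helper: folds every 8 consecutive bits into one gray level.
final class BitSetGrayMap {

    // Returns one value per group of 8 bits: 0 (no bit set) up to 255 (all 8 set).
    static int[] toGrayLevels(BitSet bits, int numBits) {
        int pixels = (numBits + 7) / 8;
        int[] gray = new int[pixels];
        // Count the set bits that fall into each 8-bit bucket.
        for (int i = bits.nextSetBit(0); i >= 0 && i < numBits; i = bits.nextSetBit(i + 1)) {
            gray[i / 8]++;
        }
        // Scale the count (0..8) linearly to a gray level (0..255).
        for (int p = 0; p < pixels; p++) {
            gray[p] = gray[p] * 255 / 8;
        }
        return gray;
    }
}
```

The resulting array could then be written into a grayscale image row by row; dense regions of the result set show up as bright areas.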

Re: trivial util to Visualize BitSets (Query results actually)

2006-05-31 Thread eks dev
harwood <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org; eks dev <[EMAIL PROTECTED]> Sent: Wednesday, 31 May, 2006 5:12:06 PM Subject: Re: trivial util to Visualize BitSets (Query results actually) I added something similar to Luke but without the colour intensity - I may add you

Lexicon access questions

2006-06-01 Thread eks dev
We have faced the following use case: In order to optimize performance and, more importantly, the quality of search results, we are forced to attach more attributes to particular words (Terms). Generic attributes like TF, IDF are useful to model our "similarity" only up to some level. Examples: 1.

Re: Lexicon access questions

2006-06-03 Thread eks dev
to extend the idea to support this by naming your tags something like TERM_TAG where TERM is the term they apply to (best if the character used for '_' cannot occur in any term). Then something like a TaggedTermQuery could easily find the tags relevant to a term in the query and iterate t

Re: Edit-distance strategy

2006-06-08 Thread eks dev
>I'm about to replace the edit-distance algorithm in FuzzyQuery from >Levenshtein to Hirschberg to save a couple of clockticks. Have you already confirmed that the Hirschberg algorithm is faster than the current implementation of edit distance? I am not convinced it really helps. Hirschberg and standard

Re: Edit-distance strategy

2006-06-08 Thread eks dev
distance calculations... good luck with it - Original Message From: eks dev <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, 8 June, 2006 4:07:07 PM Subject: Re: Edit-distance strategy >I'm about to replace the edit-distance algorithm in FuzzyQuery from

Re: Edit-distance strategy (slicing and one vs. all algorithms)

2006-06-08 Thread eks dev
Hi Bob, really nice answer! >The real gain would be to do something like the >edit-distance generalization of Aho-Corasick. The >basic idea is that instead of n iterations of string vs. string, >you do one iteration of string vs. trie. I was experimenting a bit with ternary trie as it has so

Re: Edit-distance strategy (slicing and one vs. all algorithms)

2006-06-10 Thread eks dev
No worries at all Yonik, Lingpipe is too big to be included in Lucene and nobody plans to go the shady route of stealing somebody's hard work :) On the other hand, I am convinced Bob won't mind if we learn something from Lingpipe (I must say, at first look, the thing has some extremely clever solut

Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-17 Thread eks dev
Chuck, you nailed it! This reverse view is really what brings clarity, at least to me. It boils down to the question "Who is losing what?" Move to 1.5: some people will not have an opportunity to use new cool features that will come in 2.x versions. So they know the feeling, they cannot use co

New work on Perfect Skip List (mg4j work)

2006-07-31 Thread eks dev
looks interesting: http://vigna.dsi.unimi.it/ftp/papers/CompressedPerfectEmbeddedSkipLists.pdf

Re: Combining search steps without re-searching

2006-08-28 Thread eks dev
you are right Chuck, it depends... Filters are great for fields with small cardinality (majority of terms in normal collection) or things that are sorted (assuming Paul's patch gets committed so we do not use BitSet and we could use less memory hungry structures like interval lists :) With BitSet

Re: Combining search steps without re-searching

2006-08-30 Thread eks dev
Paul, my offer is valid, please shout if and where you need some help, test cases... not t skilled with deep Lucene internals, but could help at least in API view... .. >At the moment I don't remember what the FIXME's are about, so I'll >need a bit of time getting back into it. >Once t

Re: LUCENE-584, was "Combining search steps without re-searching"

2006-08-30 Thread eks dev
g Sent: Wednesday, 30 August, 2006 9:49:41 PM Subject: LUCENE-584, was "Combining search steps without re-searching" On Wednesday 30 August 2006 21:08, eks dev wrote: > Paul, > my offer is valid, please shout if and where you need some help, test cases... not t skilled with de

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-04 Thread eks dev
Yonik, any reason to have the BitSetIterator method int next(int fromIndex) {... package protected? It would be interesting to see how BitSetIterator works in Matcher; skipping is needed there. - Original Message From: paul.elschot (JIRA) <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sen

Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-05 Thread eks dev
15:42 PM Subject: Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet On 9/4/06, Eks Dev (JIRA) <[EMAIL PROTECTED]> wrote: > Here are some Matcher implementations, > > - OpenBitsMatcher- the same as the code Paul wrote for BitsMatcher, with > replaced OpenBitSet instead

Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-06 Thread eks dev
"Keep in mind that BitSetIterator is fast for iteration over all it's bits. If it's used as a filter (with skipping), I would expect it to be slower." still, DenseBitsMatcher (BitSetIterator warpped in Matcher) works faster than anything else for this case: int skip(Matcher m) throws IOExcepti

Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-07 Thread eks dev
UCENE-584) Decouple Filter from BitSet On 9/6/06, eks dev <[EMAIL PROTECTED]> wrote: > still, DenseBitsMatcher (BitSetIterator warpped in Matcher) works faster than > anything else for this case: > > int skip(Matcher m) throws IOException{ > int doc=-1, ret = 0;

Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-07 Thread eks dev
, everything looks as it should be, BitSetIterator is slower than OpenBitSet in skipTo() scenario... - Original Message From: eks dev <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Thursday, 7 September, 2006 10:03:45 AM Subject: Re: [jira] Updated: (LUCENE-584) De

Re: [jira] Updated: (LUCENE-584) Decouple Filter from BitSet

2006-09-07 Thread eks dev
"What's the point of using a sorted interval list for a category?" Just terminology first to avoid misunderstanding :), category is "category field" that can take N valus Now, the case I am facing goes as follows: I have category field in 50Mio collection which has more or less uniform distri

Re: [jira] Commented: (LUCENE-665) temporary file access denied on Windows

2006-09-13 Thread eks dev
we hit it a few times, and we use dedicated servers, but unfortunately someone else is hosting our app. Well, if it does not hurt anyone else, it would be nice to patch it somehow (like ant did, for example) - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene

Re: [jira] Commented: (LUCENE-665) temporary file access denied on Windows

2006-09-13 Thread eks dev
Also, folks write desktop apps with lucene... and users of desktop search are not sys admins ... - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 13 September, 2006 9:58:06 PM Subject: Re: [jira] Commented: (LUCENE-665) temporary

Re: [jira] Commented: (LUCENE-665) temporary file access denied on Windows

2006-09-13 Thread eks dev
not promoting, "let lucene fix all Winblows problems", just saying, if someone has cool, simple trick in patch form, that hurts nobody, would be nice to accept it. Enough people burned their fingers on this one - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@lu

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-14 Thread eks dev
not being impatient, just asking if all holes are covered. Matcher rocks and I'd like to clean up a lot of the mess we created in our local copy in order to simulate what Matcher will permit us to do in a really elegant way... if being patient is all it takes, cool ;) - Original Message

Re: ParallelMultiSearcher reimplementation

2006-11-13 Thread eks dev
Maybe someone is interested. I just remembered, we tested pure Hadoop RPC a few (5+) months ago in a simple setup, a kind of balancing server receiving and distributing requests to 3 "search units"... we went that far because Java RMI proved to have ugly latency problems (or we did not get it right, don't kno

Re: Filesystem based bitset

2009-01-19 Thread eks dev
Hi Paul, not really an answer to your questions, I just thought you may find it useful as a confirmation that this packing of integers into a (B or some other) Tree is a good one. I have seen Integer set distributions that can profit hugely from the tree organization on top. Have a look at: http

Re: wiki

2009-01-24 Thread eks dev
"It could be a Slavic language, but that's really no more a guess." it is one of Serbian, Croatian or Bosnian... (used to be the same language "Serbo-Croatian" 10-15 years ago, than it split on political boundaries). The same meaning, "Index of words". cheers, eks ___

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
I have used them for speeding up huge switch clauses in charset normalization (eg lowercase and accent->plain form mapping). Big number of accented characters (this causes big switch statement) that appear seldom in corpus (big majority being not accented). If negative test, you do just simple a
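
The trick described here (a cheap negative test that lets the common, unaccented characters bypass a large switch) could be sketched as follows. This is only an illustration under assumptions: the class AccentFolder, the tiny character table, the filter size and the two hash functions are all made up for the example and are not taken from the original code.

```java
// Hypothetical sketch: a small Bloom filter guards a (normally huge) switch statement.
final class AccentFolder {
    private static final int FILTER_BITS = 1024;            // must be a power of two (10 hash bits)
    private static final long[] FILTER = new long[FILTER_BITS / 64];

    // Characters the switch knows about (a-grave, a-acute, a-umlaut, c-cedilla, e-grave,
    // e-acute, n-tilde, o-umlaut, u-umlaut); a real table would be far larger.
    private static final char[] ACCENTED = {
        '\u00e0', '\u00e1', '\u00e4', '\u00e7', '\u00e8', '\u00e9', '\u00f1', '\u00f6', '\u00fc'
    };

    static {
        for (char c : ACCENTED) {
            setBit(hash1(c));
            setBit(hash2(c));
        }
    }

    // Two simple multiplicative hashes, each producing a 10-bit index.
    private static int hash1(char c) { return (c * 0x9E3779B1) >>> 22; }
    private static int hash2(char c) { return ((c ^ (c >>> 4)) * 0x85EBCA6B) >>> 22; }

    private static void setBit(int i) { FILTER[i >>> 6] |= 1L << (i & 63); }
    private static boolean getBit(int i) { return (FILTER[i >>> 6] & (1L << (i & 63))) != 0; }

    // False means "definitely not in the table"; true means "maybe" (false positives possible).
    static boolean mightBeAccented(char c) {
        return getBit(hash1(c)) && getBit(hash2(c));
    }

    static char fold(char c) {
        if (!mightBeAccented(c)) {
            return c;                                        // fast path for the vast majority
        }
        switch (c) {                                         // slow path, rarely taken
            case '\u00e0': case '\u00e1': case '\u00e4': return 'a';
            case '\u00e7': return 'c';
            case '\u00e8': case '\u00e9': return 'e';
            case '\u00f1': return 'n';
            case '\u00f6': return 'o';
            case '\u00fc': return 'u';
            default: return c;                               // false positive of the filter
        }
    }
}
```

A false positive only costs one trip through the switch, so correctness does not depend on the quality of the hash functions, only the speed-up does.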

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
Maybe we should close this issue with a won't-fix and start a new one for filtered deletions? A few thoughts, without looking at the code, just thinking aloud :) It is inverted filter what we are talking about here, Lucene uses Filter as a pass filter (Set bit defines document that should pas

Re: BloomFilter-s with Lucene

2009-01-30 Thread eks dev
a > To: java-dev@lucene.apache.org > Sent: Friday, 30 January, 2009 23:02:15 > Subject: Re: BloomFilter-s with Lucene > > > On Fri, 30 Jan 2009, eks dev wrote: > > > I have used them for speeding up huge switch clauses in charset > > normalization > (eg lowe

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

2009-01-30 Thread eks dev
indeed :) From: Paul Elschot To: java-dev@lucene.apache.org Sent: Friday, 30 January, 2009 23:37:08 Subject: Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs On Friday 30 January 2009 23:24:42 eks

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
"...many core unit tests will need to change, or.." Thinking about it a bit more, what is current contract for deleted documents in respect to terms? if we delete document from an index, do we update global freqs and eventually delete terms... or we simply say document ID will not be found agai

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-01-31 Thread eks dev
level Query > > > Right, we just filter out the docs when iterating through postings. > > So this means, as segments are merged, the stats get corrected, which means > document scores will change for a given query. > > Mike > > Mark Miller wrote: > >

Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

2009-02-03 Thread eks dev
aloud and probably does not make much sense. cheers, eks - Original Message > From: Michael McCandless > To: java-dev@lucene.apache.org > Sent: Tuesday, 3 February, 2009 18:28:14 > Subject: Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or > top level Quer

Re: Another possible optimization - now in DocIdSetIterator

2009-04-24 Thread eks dev
Hi Shai, absolutely! we have been there, and there are already some micro benchmarks done in LUCENE-1345 just do not forget to use -1 < doc instead of -1 != doc, trust me, Yonik convinced me :) as a side effect, this change would have some positive effects on iterator semantics, prevents, ve

new TokenStream api Question

2009-04-26 Thread eks dev
I am just looking into new TermAttribute usage and wonder what would be the best way to implement PrefixFilter that would filter out some Terms that have some prefix, something like this, where '-' represents my prefix: public final boolean incrementToken() throws IOException { // the f

Re: new TokenStream api Question

2009-04-26 Thread eks dev
with > getAttribute(TermAttribute.class). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > >

Re: new TokenStream api Question

2009-04-26 Thread eks dev
lee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > > Sent: Sunday, April 26, 2009 10:39 PM > > To: java-dev@lucene.apache.org > > Subject: new TokenStream

Re: [jira] Commented: (LUCENE-1616) add one setter for start and end offset to OffsetAttribute

2009-04-27 Thread eks dev
                URL: https://issues.apache.org/jira/browse/LUCENE-1616 > >            Project: Lucene - Java > >          Issue Type: Improvement > >          Components: Analysis > >            Reporter: Eks Dev > >            Priority: Trivial > >           

Re: new TokenStream api Question

2009-04-27 Thread eks dev
ibute.class). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: eks dev [mailto:eks...@yahoo.co.uk] > > Sent: Sunday, April 26, 2009 10

Re: new TokenStream api Question

2009-04-28 Thread eks dev
TokenStream api Question Hi Eks Dev, I actually started experimenting with changing the new API slightly to overcome one drawback: with the variables now distributed over various Attribute classes (vs. being in a single class Token previously), cloning a "Token" (i.e. calling captureS

Re: Fuzzy search change

2009-06-18 Thread eks dev
what would be the difference/benefit compared to standard lucene SpellChecker? If I am not wrong: - Lucene SpellChecker uses standard lucene index as a storage for tokens instead of QDBM... meaning full inverted index with arbitrary N-grams length, with tf/idf/norms... not only HashMap -

Re: Improving TimeLimitedCollector

2009-06-24 Thread eks dev
Re: "I think such a parameter should not exist on individual search methods since it's more of a global setting (i.e., I want my searches to be limited to 5 seconds, always, not just for a particular query). Right?" I am not sure about this one, we had cases where one physical index served two lo

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread eks dev
> Anybody knows other interesting open-source search engines? Minion (https://minion.dev.java.net/) - Original Message > From: Earwin Burrfoot > To: java-dev@lucene.apache.org > Sent: Monday, 6 July, 2009 23:01:52 > Subject: Re: A Comparison of Open Source Search Engines > > I'd sa

Re: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread eks dev
>I have no test data which size is good, it is just trying out Sure, for this you need a bad OS and a large index; you are not as lucky as I am to have them :) Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits :) n
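
The "reduce this value until it fits" idea can be sketched as a simple halving loop. The sketch below is an assumption-laden illustration: ChunkSizeProbe, findWorkingChunkSize and the LongPredicate probe are hypothetical names, and the probe stands in for a real mapping attempt (for example a call to FileChannel.map that is treated as failed when it throws).

```java
import java.util.function.LongPredicate;

// Hypothetical helper: halves the chunk size until a mapping attempt succeeds.
final class ChunkSizeProbe {

    // tryMap returns true when a mapping of the given size succeeded.
    // Returns the first size that worked, or -1 if nothing down to minSize fits.
    static long findWorkingChunkSize(long initialSize, long minSize, LongPredicate tryMap) {
        if (minSize < 1) {
            throw new IllegalArgumentException("minSize must be at least 1");
        }
        long size = initialSize;
        while (size >= minSize) {
            if (tryMap.test(size)) {
                return size;
            }
            size >>>= 1;   // halve and retry
        }
        return -1;
    }
}
```

Whether such probing should happen automatically or be left to the user as a configurable value is exactly the question discussed in this thread.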

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
imo, it is too low level to do it better than OSs. I agree, cache unloading effect would be prevented with it, but I am not sure if it brings net-net benefit, you would get this problem fixed, but probably OS would kill you anyhow (you took valuable memory from OS) on queries that miss your inte

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
this should not be all that difficult to try. I accept it makes sense in some cases ... but which ones? Background: all my attempts to fight the OS went badly :( Let us think again about what Mike's example means. You are explicitly deciding that Lucene should get a bigger share of RAM.

Re: Java caching of low-level index data?

2009-07-22 Thread eks dev
>Part of the challenge here is what metric is really important. Sure, depends who you ask :) Lucene is so popular, that you can find almost every pattern we could come up with. funny, I had to deal with similar situation. The simplest solution was to set warm-up with constructed Queries (from

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
, 2009 23:33:03 >Subject: Re: [jira] Commented: (LUCENE-1410) PFOR implementation > >Eks, > > >> >>> [ >>> https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1276274

Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
() where postings get resorted on such fields (basically enabling rle encoding to work) and at the same time all other terms get optimal encoding format for postings... perfect for read only indexes where you want to max performance and reduce ix size > >From: eks dev >To:

Ideas to refactor Filed

2008-03-05 Thread eks dev
I have noticed two potential enhancements in Field, and I am not sure if I read it correctly, so better to ask before creating a Jira issue :) 1. Field uses two methods to determine the type of fieldsData, sometimes with boolean isBinary; and sometimes with instanceof byte[]. The proposal is to redu

Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
From: Michael McCandless <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 5 March, 2008 10:09:26 AM Subject: Re: Ideas to refactor Filed Good morning! eks dev wrote: > I have noticed the two potential enhancements in Field, and I am > not sure if I read it c

Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
tip with extra checks is good, deprecate even better, I will update patch - Original Message From: Michael McCandless <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Tuesday, 11 March, 2008 2:45:56 PM Subject: Re: Ideas to refactor Filed Hello! Responses below: e

Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
Michael, others, what is the Lucene/Jira best practice for new versions of the same patch: 1. delete existing / add new patch with the same name 2. add new patch with some funky version e.g. "Jira-1219-take3.patch" 3. just add new patch with the same name ?

Re: Ideas to refactor Filed

2008-03-11 Thread eks dev
but am fine with #2 as well. #3 makes it easier, IMO, to find the latest. -Grant On Mar 11, 2008, at 10:26 AM, Michael McCandless wrote: > > I like #2. > > I don't think we should delete/replace attachments in Jira. The > history can be useful.. > > Mike > &g

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread eks dev
>>fix typo that's been bugging me excuse my ignorance, but i do not understand this entry. Typo we need to fix, which one?

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread eks dev
Thanks for diff Hoss! I was staring 10min at it but was not able to see any difference. Well, that is the price to pay when you work with us, non-native English speakers :) - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 12 Ma

Re: [jira] Created: (LUCENE-1231) Column-stride fields (aka per-document Payloads)

2008-03-14 Thread eks dev
"A better way to do this is using payloads. By creating a "special" posting list that has one posting with payload for each document you can "simulate" a column- stride field. The performance is significantly better compared to stored fields, however still not optimal. The reason is that for each d

How to avoid byte[] allocation in Document.getBinaryValue(String name)

2008-03-14 Thread eks dev
I am looking for ideas on how I could pass my byte[] to Document.getBinaryValue(String name) in order to avoid allocation of new byte[] for each Field retrieved. first idea I had was to add something like this in Document: public final byte[] getBinaryValue(String name, byte[] myBuffer) {

Re: Fieldable, AbstractField, Field

2008-03-17 Thread eks dev
Additionally, this very reason makes something like Document.getBinaryValue(String name, byte[] myBuffer), to put it mildly, impractical. This could be a handy way to reduce allocations when fetching, as stored fields can be big - Original Message From: Michael McCandless <[EMAIL PRO

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
Hoss, thanks for kicking-in with your "design purist" hat on :) about your proposal, "The best short term approach I can think of for addressing LUCENE-1219 in 2.4: 1) list the new methods in a new interface that extends Fieldable (ByteArrayReuseFieldable or something) 2) add the new met

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
ke such changes. - Original Message From: Grant Ingersoll <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Wednesday, 19 March, 2008 12:01:34 PM Subject: Re: Fieldable, AbstractField, Field On Mar 19, 2008, at 6:45 AM, eks dev wrote: > Hoss, thanks for kicking-in with you

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
"Well, maybe we should put 1219 off to 3.0 and maybe we should get to 3..0 sooner rather than later, as in stop adding new features and focus on bug fixes and deprecation. :-)" honestly, "getting to 3.0 sooner" can take far too long for an itch I currently have, gc() is kicking in like crazy

Re: Fieldable, AbstractField, Field

2008-03-19 Thread eks dev
> > IndexableField really shouldn't be a subclass of whatever class is > > returned after a sarch is done ... the methods used for accessing the > > "stored" value of a returned document make as little sense in the > > context of IndexableField as the setBoost/Reader/TokenStream > > functions of

Re: Fieldable, AbstractField, Field

2008-03-23 Thread eks dev
i enjoy reading Carp's blog, he has/had the same dilemma :) interfaces vs abstract classes, nice reading on http://lingpipe-blog.com/

Strange Exception

2008-04-18 Thread eks dev
does anyone have an idea what the reason for this could be? corrupt index? (this is RAMDirectory loaded from FSDirectory!?) unfortunately I have very limited possibilities to access this system to dig deeper thanks! INFO | jvm 49 | 2008/04/10 11:30:41 | 080410 113041 SEVERE Server handl

Re: Strange Exception

2008-04-18 Thread eks dev
ve 2 fields? The exception is happening > because TermBuffer is trying to look up the field name for > fieldNumber=3. > > What version of Lucene is this? > > How was the index produced? > > Were there any other exceptions before this? > > Mike > > eks de

Re: Strange Exception

2008-04-22 Thread eks dev
x produced? > > Were there any other exceptions before this? > > Mike > > eks dev wrote: > > > > does anyone have an idea what the reason for this could be? corrupt > > index? (this is RAMDirectoryloaded from FSDirectory!?) > > unfortunately I ha

Index without tf, anyone?

2008-07-17 Thread eks dev
hi all, is there any solution to have pure postings lists without interleaved tf ... this eats a lot of CPU for VInt decoding on dense terms (also doubles IO...) in our case. Can be a untested patch, tips how to do it or whatever... I know about flexible indexing, but cannot wait (I guess it w
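
To make the cost in question concrete, here is a deliberately simplified sketch that writes a postings list as variable-length ints, once with a tf value interleaved after every doc delta and once with doc deltas only. PostingsSketch and its methods are hypothetical names, and the layout ignores refinements of the real index format; it only illustrates why dense terms with interleaved tf mean roughly twice the bytes to read and decode.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified postings encoder (not the actual Lucene file format).
final class PostingsSketch {

    // Writes a non-negative int using 7 bits per byte, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // docs must be sorted ascending; tfs is ignored when omitTf is true.
    static byte[] encode(int[] docs, int[] tfs, boolean omitTf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            writeVInt(out, docs[i] - last);   // delta to the previous doc id
            last = docs[i];
            if (!omitTf) {
                writeVInt(out, tfs[i]);       // interleaved term frequency
            }
        }
        return out.toByteArray();
    }
}
```

For a dense term whose doc deltas and tf values all fit in one byte, the tf-free encoding in this sketch is half the size, and the decoder performs half as many VInt reads.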

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
hat you can do after first getting the > above working... > > On the search side, you'll need to fix scoring to be OK with tf=0. > > I think this would be a useful addition to Lucene (it comes up every > so often), even before we fully work out flexible indexing. > >

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
s RAM & CPU) that you can do after first getting the > above working... > > On the search side, you'll need to fix scoring to be OK with tf=0. > > I think this would be a useful addition to Lucene (it comes up every > so often), even before we fully work out fle

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
also, another one: what should happen with payloads and omitTf options in the case of storePayloads==true && omitTf==true? Should we say: 1. ignore omitTf and go on with payloads or 2. disable payloads and omit tf? Other combinations are clear - Original Message > From: eks

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
anything on reader side! - Original Message > From: eks dev <[EMAIL PROTECTED]> > To: java-dev@lucene.apache.org > Sent: Friday, 18 July, 2008 9:48:04 PM > Subject: Re: Index without tf, anyone? > > also, another one: > > what should happen with payloads a

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
docDelta+1), > and you'd save some CPU when decoding as well. But maybe first do it > this way, then if necessary/it helps/etc, explore the optimization? > > Mike > > eks dev wrote: > > > am I boring :) > > > > would it be ok to assume tf == 1 alway

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
ut, I can imagine > you'd want to index a TokenStream once with a field that's storing tf, > positions & payloads, and then again as an field that doesn't. > > Mike > > eks dev wrote: > > > also, another one: > > > > what should happen

Re: Index without tf, anyone?

2008-07-18 Thread eks dev
I have created https://issues.apache.org/jira/browse/LUCENE-1340 for this, with a patch; not properly tested, missing asserts and unit tests, but basic ant test-core passed ... released early for feedback - Original Message > From: eks dev <[EMAIL PROTECTED]

Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-18 Thread eks dev
s a holiday coming up. > > > Make it posible not to include TF information in index > > -- > > > > Key: LUCENE-1340 > > URL: https://issues.apache.org/jira/browse/LUCENE-1340 > > Project: Lucene - Java > > Issue Type: N

Re: [jira] Commented: (LUCENE-1278) Add optional storing of document numbers in term dictionary

2008-07-20 Thread eks dev
ues.apache.org/jira/browse/LUCENE-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615077#action_12615077 > > ] > > Eks Dev commented on LUCENE-1278: > - > > in light of Mike's comments hier (Michael McCa

Re: performance optimizations

2008-07-23 Thread eks dev
sure, nice article, big Ohhh notation should be addressed first, but try running Analyzers before Mike added char[] and compare try Indexing with some older versions, basically nothing significantly changed from the algorithmic point of view Doug set years ago, all that happened there is just r

Re: performance optimizations

2008-07-23 Thread eks dev
and just one more for arguments sake, in Lucene "obscure bit twiddling" is "the great deal", have a look at all recent / old work on inverted index design, p4delta, rank9/16 ... it is nothing more nor less than "obscure bit twiddling" - Original Messag

Re: performance optimizations

2008-07-23 Thread eks dev
> > It also seems that many more "obscure, index corruption" type bugs > have crept in as the pursuit of performance has taken place, whereas > the 1.9 and prior code was very stable. come on, this bug you point to is clear jvm bug that Lucene community contributed back to Sun... on the ot

Re: [jira] Commented: (LUCENE-1340) Make it posible not to include TF information in index

2008-07-24 Thread eks dev
> URL: https://issues.apache.org/jira/browse/LUCENE-1340 > > Project: Lucene - Java > > Issue Type: New Feature > > Components: Index > >Reporter: Eks Dev > >Priority: Minor > > A

ScorerDocQueue.HeapedScorerDoc

2008-07-26 Thread eks dev
what is the reason to have the HeapedScorerDoc class in ScorerDocQueue? Caching of the doc value? Does this bring anything compared to invoking doc() on Scorer? Just curious, maybe I do not see something obvious... If doc is the reason, I would bet on doc()

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-26 Thread eks dev
Hi Paul, it sounds so familiar. I too like playing with lucene, makes fun, but I have not found formula to make 25 Hours day (waking up one hour earlier does not work for me for some strange reason) The only other person being so interested in this Filter-like issues is Yonik, but I guess he ha

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-27 Thread eks dev
ike. I think MG4J people made this switch in last version as well. - Original Message > From: Paul Elschot <[EMAIL PROTECTED]> > To: java-dev@lucene.apache.org > Sent: Sunday, 27 July, 2008 1:04:26 AM > Subject: Re: ScorerDocQueue.HeapedScorerDoc > > Op Satur

Re: ScorerDocQueue.HeapedScorerDoc

2008-07-28 Thread eks dev
>>... to change semantics of these iterators not to return boolen but > > rather document Id with sentinel values. This would definitely reduce > > number of method invocations by factor 2 at least.--- {next() doc()} > > -> next() > > > > It would be pretty easy to do that, just requires on one h
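
The proposed change of semantics (one call that returns the document id, with a sentinel marking the end, instead of a boolean next() followed by doc()) can be sketched like this. SentinelDocIterator, NO_MORE_DOCS and the choice of -1 as the sentinel are assumptions made for the illustration and are not an existing Lucene API.

```java
// Hypothetical iterator: next() returns the doc id directly, or a sentinel when exhausted.
final class SentinelDocIterator {
    static final int NO_MORE_DOCS = -1;   // assumed sentinel; valid doc ids are >= 0

    private final int[] docs;             // sorted doc ids
    private int pos = 0;

    SentinelDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // One call per document instead of the {next(), doc()} pair.
    int next() {
        return pos < docs.length ? docs[pos++] : NO_MORE_DOCS;
    }

    // Typical consumer loop: a single comparison against the sentinel per document.
    static int count(SentinelDocIterator it) {
        int n = 0;
        while (NO_MORE_DOCS < it.next()) {
            n++;
        }
        return n;
    }
}
```

Calling next() again after exhaustion keeps returning the sentinel in this sketch, so consumers do not need a separate "has more" check.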

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread eks dev
from what I can tell, this just makes it harder for the new approach, but you never know before you try it in "production" ... just wanted to see if it could lead anywhere before spending real time on it - Original Message > From: Paul Elschot (JIRA) <[EMAIL PROTECTED]> > To: java-dev@

Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-29 Thread eks dev
as a matter of fact, you can, keeping literals on the left-hand side prevents some ugly accidental assignments, so at the end of the day you have more time to speed things up instead of chasing bugs :) cheers Hoss, good to see you are following this - Original Message > From: Chris Hostetter
