Re: 1.9 RC1

2006-02-19 Thread Nadav Har'El
sting constructor, but I think Lucene definitely needs a new constructor or convenience function that will do "the right thing" for opening a potentially-existing index. -- Nadav Har'El. - To unsubscribe, e-

Crash tolerance in Lucene

2006-04-20 Thread Nadav Har'El
crash in-tolerance issues in Lucene that I should consider working on? Thanks, Nadav. -- Nadav Har'El IBM Haifa Research Lab - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: IndexWriter mergeSegments

2006-05-07 Thread Nadav Har'El
commit lock held, and not outside it. > I can send patch but firstly I need to find svn client in gentoo :) and > it's to late here. > Can be smb so kind and give me link where I can find how to generate > patch in lucene/apache way? I'm sorry I can't reall

Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Nadav Har'El
so mentioned) of document updates: every single insert is preceded by a delete, 25% of which actually delete (the updated document existed previously) and the rest end up not finding an old document and not deleting anything. I expect t

Re: Lucene Planning

2006-05-31 Thread Nadav Har'El
ld that keeps a list of "categories" that a document is in. A document can either be, or not be, in a category, but there is no significance in the order of these categories in a document's list. -- Nadav Har'El ---

javadoc compilation problem?

2006-06-14 Thread Nadav Har'El
being created? Am I doing something wrong? Is something wrong in the build.xml, or something? Thanks, Nadav. -- Nadav Har'El [EMAIL PROTECTED] +972-4-829-6326

Re: Scoring

2006-06-15 Thread Nadav Har'El
"Scorer"? A "Similarity"? Or what? I think this is an interesting topic. -- Nadav Har'El [EMAIL PROTECTED] +972-4-829-6326 Grant Ingersoll

Combining Hits and HitCollector

2006-06-27 Thread Nadav Har'El
Nadav. -- Nadav Har'El| Tuesday, Jun 27 2006, 1 Tammuz 5766 IBM Haifa Research Lab |- |Unix is user friendly - it's just picky http://nadav.harel.org.il |ab

Re: Combining Hits and HitCollector

2006-06-27 Thread Nadav Har'El
llector) TopFieldDocs search(Query, Filter, int, Sort, HitCollector) In the long run, perhaps we need to give some thought as to whether we should continue demonstrating the use of Hits (rather than TopDocs) in most Lucene examples, and whether perhaps, the Hits API should be deprecate

Re: Limit of QueryParser ?

2006-06-29 Thread Nadav Har'El
n this BooleanQuery) you can have in a query parser expression. The default limit is 1024, but you can change it with BooleanQuery.setMaxClauseCount() Note, however, that if you really use such huge queries, they may be extremely slow. -- Nadav Har'El| Thu

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Nadav Har'El
obviously not the best we can do: it is inefficient (goes through each posting list three times), and not tuned. A better solution would be like you said, to create a modified version of BooleanQuery's scoring.

Re: Flexible index format / Payloads Cont'd

2006-07-04 Thread Nadav Har'El
vily than text > between tags. Indeed. If you want a "poor man's version" of their capability, before per-position payloads are added to Lucene, you can try this simple trick: double every word inside the . This will give these words a boost compared to the other words. Of course,

Proximity-enhanced boolean scoring (was: Re: Flexible index format / Payloads Cont'd)

2006-07-06 Thread Nadav Har'El
Yes, I think you described the situation well. At this stage, I'll continue to try to develop this feature using Lucene's existing Spans/SpanQuery framework. I hope this is possible, because the ideas you raised (adding weight to Spans or spans to Scorer) will require signif

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-07 Thread Nadav Har'El
ticated searches to decide what to delete? As I mentioned in a previous post, I needed this capability in an application which indexed emails and attachments, and when an email document was deleted I also had to delete the attached documents (listed in a field of the email) from the index. -- Na

BufferedIndexInput performance improvement

2006-10-23 Thread Nadav Har'El
If anybody has any comments, or knows of any reason why the existing code was so inefficient (while the code in BufferedIndexOutput makes more sense), I'd love to hear. If a committer will agree to commit this change, even better :-) When JIRA is back online, I'll put the patches the

Re: Controlling Hits

2006-11-26 Thread Nadav Har'El
s" is what differentiates Hits from TopDocs, perhaps we don't need Hits at all? So, how about deprecating Hits altogether, and recommending the TopDocs alternatives instead? -- Nadav Har'El| Sunday, Nov 26 2006, 5 Kislev 5767 IBM Haifa Research Lab

Re: Payloads

2007-01-03 Thread Nadav Har'El
API, it seems doing this is much more difficult and requires writing some sort of new Analyzer - one that will do the regular analysis that I want for the regular fields, and add the payload to the one specific field that lists the facets. Am I understanding correctly? Or am I missing a better way t

Re: Payloads

2007-01-10 Thread Nadav Har'El
rm (F,W) with the payload you want for each document (basically, the list of categories that this document belongs to). I'm not saying this is the best way to do it, and certainly not the cleanest, but it's just one of the things that payloads enable you to do. -- Nadav Har'El

Re: adding "explicit commits" to Lucene?

2007-01-17 Thread Nadav Har'El
e segment that is being written. So perhaps a "grand unified Index" does make sense, instead of repeating the same code and/or functionality in both IndexReader and IndexWriter. -- Nadav Har'El|Wednesday, Jan 17 2007, 27 Tevet 5767 [EMAIL PROTECTED]

Re: Payloads

2007-01-18 Thread Nadav Har'El
g in this area). I'll add a comment about this use-case to LUCENE-580. -- Nadav Har'El| Thursday, Jan 18 2007, 28 Tevet 5767 IBM Haifa Research Lab |- |If glor

Re: determining whether a directory is on NFS?

2007-01-22 Thread Nadav Har'El
"df -F nfs INDEXDIR". If the result is empty with a "mounted as a ... file system" on stderr, it's not NFS. If the result on stdout has one line, it's NFS. It's (very) ugly, but it can work. Of course, NFS is not the only network file system out there. -- Nadav H

Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-12 Thread Nadav Har'El
to work hard to get around this limitation. Wouldn't it be better if Lucene included this functionality that many (if not most) users need, out of the box? -- Nadav Har'El| Tuesday, Feb 13 2007, 25 Shevat 5767 IBM Haifa Research Lab |--

Re: Concurrent merge

2007-02-21 Thread Nadav Har'El
the queue and merges them. > > This would effectively block adding of documents some times, but that > is not different than what happens now. So if adds can still block, what is the point of making this change? -- Nadav Har'El| Wednesday, Feb 21 2

Re: Concurrent merge

2007-02-21 Thread Nadav Har'El
uch this idea can improve performance on systems with multiple separate disks and multiple CPUs. > size (buffered documents) is not too big. And multiple disk merge > threads require significant system resources to add benefit. See my comments above on why multiple concurrent merges might be n

Re: IndexWriter.rollback() logic

2009-03-18 Thread Nadav Har'El
luding this one), commit() is equivalent to a close() followed by a new open(), but a person reading this javadoc wouldn't know that. -- Nadav Har'El| Wednesday, Mar 18 2009, 22 Adar 5769 IBM Haifa Research Lab |-

Re: Is TopDocCollector's collect() implementation correct?

2009-03-22 Thread Nadav Har'El
tend the TopScoreDocCollector class, and it can be final. -- Nadav Har'El|Sunday, Mar 22 2009, 26 Adar 5769 IBM Haifa Research Lab |- |"Did you sleep well?" "

Re: Modularization

2009-04-01 Thread Nadav Har'El
n't overlook modules they might > want (like highlighting) because they are just as easy to find the "core" > and people wouldn't wind up with bloated jars containing a lot of code > they don't need. (beating a dead horse for a moment: this would

Re: WebLuke - include Jetty in Lucene binary distribution?

2008-04-27 Thread Nadav Har'El
rom Sun, at around 40 K (this is part of J2EE but not of J2SE, so you need to include this as well if you want to use the servlet API). And that's it. I'm sure that similar tiny Web Servers can also be found on the Web, but if there's interest, I can see about publishing mine. --

The 2GB segment size limit

2008-06-25 Thread Nadav Har'El
indices are less rare than they used to be, and 32 bit JVMs are still quite common, so I think this is a problem we should solve properly. Thanks, Nadav. -- Nadav Har'El|Wednesday, Jun 25 2008, 22 Sivan 5768 [EMAIL PROTECTED] |--

Re: Extending TopDocCollector

2008-08-13 Thread Nadav Har'El
ent different sorting mechanisms (e.g., according to payloads, database data, or whatever). Does anyone disagree? Is there a reason why this change should not be done? -- Nadav Har'El|

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Nadav Har'El
rtant goal. > At one point there was even talk of refactoring additional code out of the > core and into a contrib (this was already done with some analyzers when > Lucene became a TLP) -- Nadav Har'El| Wednesday, Sep 3 2008, 3 Elul 5768 IBM Haifa

Re: [jira] Commented: (LUCENE-1406) new Arabic Analyzer (Apache license)

2008-10-01 Thread Nadav Har'El
t wasn't even mentioned, let alone taught! As a result, some words have a few spelling variants in the wild, with each dictionary typically considering one correct and the others misspellings. -- Nadav Har'El|Wed

Re: draft 2.4 announcement

2008-10-05 Thread Nadav Har'El
of that new code being written). Thanks, Nadav. -- Nadav Har'El| Sunday, Oct 5 2008, 6 Tishri 5769 IBM Haifa Research Lab |- |Anyone who quotes me in their sig is

Re: draft 2.4 announcement

2008-10-05 Thread Nadav Har'El
making filters more efficient and flexible. Searching with a > Filter is now more efficient: now the filter is applied to a > document before scoring is done. Thanks, it's better I think. Maybe it even deserves its own bullet - I don't think there's too much connection

Re: Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Nadav Har'El
t to count them twice, so it might indeed be useful to have this proposed behavior as an option. Anyway, this is just my opinion (not backed by any hard research or experimentation), so it might be wrong. -- Nadav Har'El

Re: [jira] Created: (LUCENE-1439) Inconsistent API

2008-11-11 Thread Nadav Har'El
ady exist. On the other hand, binaryValue() does something different - if I understand correctly, it may need to do array copying to get a byte[] which it can return. So this API is not at all inconsistent - maybe it is just a bit redundant and a bit confusing or not documented well en

Re: 2.9/3.0 plan & Java 1.5

2008-12-14 Thread Nadav Har'El
ds() and so on, but again, this would not be backward compatible (although, for 3.0 we may decide that this is not absolutely necessary). -- Nadav Har'El| Sunday, Dec 14 2008, 1

Re: IndexReader.isCurrent in presence of many files

2007-05-13 Thread Nadav Har'El
nably quick (at least on local disks), it would be great. -- Nadav Har'El| Sunday, May 13 2007, 25 Iyyar 5767 IBM Haifa Research Lab |- |How do you get holy water? Boil the he

Re: search quality - assessment & improvements

2007-06-26 Thread Nadav Har'El
g Term Relevance Sets", Einat Amitay, David Carmel, Ronny Lempel and Aya Soffer, SIGIR 2004, http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf -- Nadav Har'El| Tuesday, Jun

Re: Performance Improvement for Search using PriorityQueue

2007-12-11 Thread Nadav Har'El
urn null; > } else if (size > 0 && !lessThan(element, top())) { > Object ret = heap[1]; > heap[1] = element; > adjustTop(); > return ret; >
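The quoted fragment is the overflow branch of a bounded priority queue: once the queue is full, a new element that is not smaller than the current least element replaces it, and the displaced element is returned. A minimal, self-contained sketch of that idea follows. The class `BoundedHeap` and its helper methods are hypothetical names for illustration, not Lucene's own `PriorityQueue`; only `lessThan`, `top`, `heap[1]` and the shape of the branch come from the quoted code, and `downHeap` here plays the role of the quoted `adjustTop()`.

```java
import java.util.Comparator;

/** Hypothetical fixed-capacity min-heap; an illustration, not Lucene's PriorityQueue. */
final class BoundedHeap<T> {
    private final Object[] heap;   // 1-based: heap[1] holds the least element
    private final Comparator<T> cmp;
    private int size = 0;

    BoundedHeap(int capacity, Comparator<T> cmp) {
        this.heap = new Object[capacity + 1];
        this.cmp = cmp;
    }

    @SuppressWarnings("unchecked")
    private T at(int i) { return (T) heap[i]; }

    private boolean lessThan(T a, T b) { return cmp.compare(a, b) < 0; }

    int size() { return size; }

    /** Least element currently held, or null when empty. */
    T top() { return size == 0 ? null : at(1); }

    /**
     * Adds the element if there is room or if it is not smaller than the
     * current least element. Returns the element that was dropped (which may
     * be the argument itself), or null if nothing was dropped.
     */
    T insertWithOverflow(T element) {
        if (size < heap.length - 1) {
            heap[++size] = element;
            upHeap();
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            T ret = top();
            heap[1] = element;     // overwrite the least element in place
            downHeap();            // restore heap order from the root
            return ret;
        } else {
            return element;        // too small to enter a full queue
        }
    }

    /** Sifts the last element up to its place. */
    private void upHeap() {
        int i = size;
        T node = at(i);
        int parent = i >>> 1;
        while (parent > 0 && lessThan(node, at(parent))) {
            heap[i] = heap[parent];
            i = parent;
            parent = i >>> 1;
        }
        heap[i] = node;
    }

    /** Sifts the root element down to its place. */
    private void downHeap() {
        int i = 1;
        T node = at(i);
        int child = 2;
        while (child <= size) {
            if (child + 1 <= size && lessThan(at(child + 1), at(child))) {
                child++;           // pick the smaller of the two children
            }
            if (!lessThan(at(child), node)) {
                break;
            }
            heap[i] = heap[child];
            i = child;
            child = i << 1;
        }
        heap[i] = node;
    }
}
```

Replacing the root in place costs one sift-down per accepted element, where a separate pop followed by a push would cost a sift-down plus a sift-up.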

Re: Unique doc ids

2008-01-23 Thread Nadav Har'El
a query, get a list of docids, and then delete them all. I said "theoretically" because unfortunately, the current IndexWriter interface doesn't support the necessary calls (either a deleteDocuments(Query) or a deleteDocuments(int docid) call), but I don

[jira] Commented: (LUCENE-383) ConstantScoreRangeQuery - fixes "too many clauses" exception

2006-04-10 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-383?page=comments#action_12373843 ] Nadav Har'El commented on LUCENE-383: - Hi, It appears that ConstantScoreRangeQuery is already in the trunk. However, QueryParser still uses RangeQuery

[jira] Commented: (LUCENE-130) org.apache.lucene.search.Query.toString(String field) ignores it's only parameter

2006-04-10 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373847 ] Nadav Har'El commented on LUCENE-130: - toString(field) works very well, if you understand what it does. Perhaps the javadoc isn't explicit enough on what it doe

[jira] Commented: (LUCENE-322) [PATCH] Add IndexSearcher.numDocs() method

2006-04-10 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-322?page=comments#action_12373849 ] Nadav Har'El commented on LUCENE-322: - I wonder, is this change at all necessary? After all, we have the IndexSearcher().getIndexReader() function, which return

[jira] Commented: (LUCENE-130) org.apache.lucene.search.Query.toString(String field) ignores it's only parameter

2006-04-11 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373980 ] Nadav Har'El commented on LUCENE-130: - Daniel, sorry for the mess, but I actually misspelled the word "omitted" in that sentence. Should h

[jira] Created: (LUCENE-554) Possible index corruption if crashing while replacing segments file

2006-04-23 Thread Nadav Har'El (JIRA)
Versions: 1.9 Reporter: Nadav Har'El Priority: Minor Lucene's indexing is expected to be reasonably tolerant to computer crashes or the indexing process being killed. By reasonably tolerant, I mean that it is ok to lose a few documents (those currently buffered in memory),
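The report concerns a crash that happens while the segments file is being replaced. One general technique for making such a replacement less fragile is to write the new contents to a temporary file in the same directory, force them to disk, and then rename the temporary file over the target. The sketch below illustrates that general technique only; it is not the patch discussed in LUCENE-554, and the class `AtomicReplace` and its `replace` method are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: replace a file via write-to-temp, sync, rename. */
final class AtomicReplace {
    static void replace(Path target, byte[] contents) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // The temp file lives in the same directory so the rename stays on one file system.
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".new");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(contents);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true);    // flush data and metadata before the rename
            }
            // Whether ATOMIC_MOVE replaces an existing target is implementation
            // specific according to the JDK documentation; POSIX rename() does.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp);   // no-op after a successful move
        }
    }
}
```

With this pattern a reader sees either the old file or the complete new one, provided the platform's rename is atomic. The sketch does not sync the containing directory, which some file systems need before the rename itself is durable.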

[jira] Commented: (LUCENE-554) Possible index corruption if crashing while replacing segments file

2006-05-07 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-554?page=comments#action_12378295 ] Nadav Har'El commented on LUCENE-554: - Hi Otis, sorry about lingering with this patch (I've been very busy, not to mention a daughter two weeks ago :-) I sti

[jira] Commented: (LUCENE-504) FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2006-06-14 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-504?page=comments#action_12416168 ] Nadav Har'El commented on LUCENE-504: - Hi Doron and Otis, My view is that this bug is a problem in FuzzyQuery, not in PriorityQueue or BooleanQuery. It is the cal

[jira] Updated: (LUCENE-504) FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2006-06-14 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-504?page=all ] Nadav Har'El updated LUCENE-504: Attachment: fuzzyquery.patch This is my proposed patch described above. > FuzzyQuery produces a "java.lang.NegativeArraySize

[jira] Commented: (LUCENE-504) FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2006-06-29 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-504?page=comments#action_12418446 ] Nadav Har'El commented on LUCENE-504: - Hi Otis, you did not comment on my patch (fuzzyquery.patch), which I think solves your objections to Doron's previous pat

[jira] Updated: (LUCENE-623) RAMDirectory.close() should have a comment about not releasing any resources

2006-07-06 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-623?page=all ] Nadav Har'El updated LUCENE-623: Attachment: ramdirectory.diff I propose a trivial patch, which does two very simple things: 1. RAMDirectory.close(), instead of being a no-op, sets files

[jira] Created: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
: Store Affects Versions: 2.0.0 Reporter: Nadav Har'El Priority: Minor During a profiling session, I discovered that BufferedIndexInput.readBytes(), the function which reads a bunch of bytes from an index, is very inefficient in many cases. It is efficient for one o

[jira] Updated: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=all ] Nadav Har'El updated LUCENE-695: Attachment: readbytes.patch The patch, which includes the change to BufferedIndexInput.readBytes(), and a new unit test for that class. >

[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444322 ] Nadav Har'El commented on LUCENE-695: - Sorry, I didn't notice that my fix broke this unit test. Thanks for catching that. What is happening is i

[jira] Updated: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=all ] Nadav Har'El updated LUCENE-695: Attachment: readbytes.patch A fixed patch, which now checks that we don't read past end of file. This is now checked correctly in all three case

[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-26 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444903 ] Nadav Har'El commented on LUCENE-695: - > If "given" a null array? Is this ever done in Lucene? Which should be fixed, > the testcase o

[jira] Commented: (LUCENE-580) Pre-analyzed fields

2007-01-18 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465797 ] Nadav Har'El commented on LUCENE-580: - This patch will be useful for users of LUCENE-755, the payloads patch.

[jira] Created: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-08 Thread Nadav Har'El (JIRA)
Reporter: Nadav Har'El Priority: Minor Hi, I found a potentially serious efficiency problem with OpenBitSet. One typical (I think) way to build a bit set is to set() the bits one by one - e.g., have a HitCollector set() the bit for each matching document. The underlying arr
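The snippet is cut off, so the exact complaint is not visible here. Assuming it goes on to describe the backing array being regrown to fit each newly set bit, the cost can be illustrated with a toy bit set that counts its reallocations under two growth policies. `GrowableBits` is a hypothetical class, not `OpenBitSet`, and the one-eighth over-allocation is an arbitrary illustrative choice.

```java
import java.util.Arrays;

/** Hypothetical growable bit set that counts how often its array is reallocated. */
final class GrowableBits {
    private long[] words = new long[0];
    private final boolean oversize;
    int reallocations = 0;

    /** @param oversize true to over-allocate on growth, false to grow to the exact size needed */
    GrowableBits(boolean oversize) {
        this.oversize = oversize;
    }

    /** Sets the bit at a non-negative index, growing the array when needed. */
    void set(int index) {
        int word = index >>> 6;            // 64 bits per long
        if (word >= words.length) {
            int needed = word + 1;
            int newLen = oversize
                    ? Math.max(needed, words.length + (words.length >>> 3) + 4)
                    : needed;
            words = Arrays.copyOf(words, newLen);   // copies every existing word
            reallocations++;
        }
        words[word] |= 1L << (index & 63);
    }

    boolean get(int index) {
        int word = index >>> 6;
        return word < words.length && (words[word] & (1L << (index & 63))) != 0;
    }
}
```

Setting bits in ascending order with exact-size growth reallocates once per 64 bits, and each reallocation copies the whole array, so the total work grows quadratically with the highest bit. Geometric over-allocation makes the number of reallocations grow only logarithmically, at the price of some unused space.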

[jira] Created: (LUCENE-1900) Confusing Javadoc in Searchable.java

2009-09-08 Thread Nadav Har'El (JIRA)
Reporter: Nadav Har'El Priority: Trivial In Searchable.java, the javadoc for maxdoc() is: /** Expert: Returns one greater than the largest possible document number. * Called by search code to compute term weights. * @see org.apache.lucene.index.IndexReader#m

[jira] Commented: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-08 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752931#action_12752931 ] Nadav Har'El commented on LUCENE-1899: -- Hi Shai, I guess you're

[jira] Commented: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-09 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753019#action_12753019 ] Nadav Har'El commented on LUCENE-1899: -- Yes, you're right, 12.5%. O

[jira] Commented: (LUCENE-954) Toggle score normalization in Hits

2008-03-16 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579258#action_12579258 ] Nadav Har'El commented on LUCENE-954: - I hate to rain on the parade, but mayb

[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force)

2008-06-24 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607857#action_12607857 ] Nadav Har'El commented on LUCENE-1314: -- At first glance, my opinion was th

[jira] Commented: (LUCENE-1382) Allow storing user data when IndexWriter.commit() is called

2008-09-12 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630561#action_12630561 ] Nadav Har'El commented on LUCENE-1382: -- Hi Mike, If you add this feature,

[jira] Commented: (LUCENE-1233) Fix Document.getFieldables and others to never return null

2008-12-01 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651944#action_12651944 ] Nadav Har'El commented on LUCENE-1233: -- Hi, I know this comment is a bit

[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-12-01 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652002#action_12652002 ] Nadav Har'El commented on LUCENE-1470: -- Hi, I just wanted to comment that

[jira] Commented: (LUCENE-504) FuzzyQuery produces a "java.lang.NegativeArraySizeException" in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2009-11-03 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773402#action_12773402 ] Nadav Har'El commented on LUCENE-504: - Hi Uwe, I think that even though Prio

[jira] Commented: (LUCENE-1088) PriorityQueue 'wouldBeInserted' method

2007-12-12 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550964 ] Nadav Har'El commented on LUCENE-1088: -- Michael, I agree - the most important fix was to make heap prot

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2007-12-15 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552174 ] Nadav Har'El commented on LUCENE-997: - I'd like to add my 2 cents on this issue. The more I use