sting
constructor, but I think Lucene definitely needs a new constructor or
convenience
function that will do "the right thing" for opening a potentially-existing
index.
--
Nadav Har'El.
crash in-tolerance issues in Lucene
that I should consider working on?
Thanks,
Nadav.
--
Nadav Har'El
IBM Haifa Research Lab
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
commit lock held, and not outside it.
> I can send a patch, but first I need to find an svn client in Gentoo :) and
> it's too late here.
> Could somebody be so kind as to give me a link explaining how to generate a
> patch the Lucene/Apache way?
I'm sorry I can't reall
so mentioned)
of document updates: every single insert is preceded by a delete,
25% of which actually delete (the updated document existed previously)
and the rest end up not finding an old document and not deleting
anything. I expect t
ld that
keeps a list of "categories" that a document is in. A document can
either be, or not be, in a category, but there is no significance
in the order of these categories in a document's list.
--
Nadav Har'El
---
being created? Am I doing something wrong? Is something wrong in the
build.xml, or something?
Thanks,
Nadav.
--
Nadav Har'El
[EMAIL PROTECTED]
+972-4-829-6326
"Scorer"? A
"Similarity"? Or what?
I think this is an interesting topic.
--
Nadav Har'El
[EMAIL PROTECTED]
+972-4-829-6326
Grant Ingersoll
Nadav.
--
Nadav Har'El| Tuesday, Jun 27 2006, 1 Tammuz 5766
IBM Haifa Research Lab |-
|Unix is user friendly - it's just picky
http://nadav.harel.org.il |ab
llector)
TopFieldDocs search(Query, Filter, int, Sort, HitCollector)
In the long run, perhaps we need to give some thought as to whether we
should continue demonstrating the use of Hits (rather than TopDocs) in most
Lucene examples, and whether perhaps, the Hits API should be deprecate
n this BooleanQuery)
you can have in a query parser expression.
The default limit is 1024, but you can change it with
BooleanQuery.setMaxClauseCount()
Note, however, that if you really use such huge queries, they may be
extremely slow.
--
Nadav Har'El| Thu
obviously not the best we can do: it is inefficient
(goes through each posting list three times), and not tuned. A better solution
would be like you said, to create a modified version of BooleanQuery's
scoring.
vily than text
> between tags.
Indeed.
If you want a "poor man's version" of their capability, before per-position
payloads are added to lucene, you can try this simple trick: double every
word inside the . This will give these words a boost compared to the
other words. Of course,
Yes, I think you described the situation well. At this stage, I'll continue
to try to develop this feature using Lucene's existing Spans/SpanQuery
framework. I hope this is possible, because the ideas you raised (adding
weight to Spans or spans to Scorer) will require signif
ticated searches to decide what to delete?
As I mentioned in a previous post, I needed this capability in an
application which indexed emails and attachments, and when an email document
was deleted I also had to delete the attached documents (listed in a field
of the email) from the index.
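
A minimal sketch of that cascade, assuming a toy in-memory map in place of a real index. The class and method names (CascadeDelete, addEmail, deleteEmail) are hypothetical and are not Lucene's API; the point is only that the email's own field must be read before anything can be deleted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index; hypothetical, not Lucene's API. */
final class CascadeDelete {
    // Document id -> ids of the attachment documents listed in that document's field.
    private final Map<String, List<String>> attachmentsOf = new HashMap<>();

    /** Indexes an email together with empty entries for each of its attachments. */
    void addEmail(String id, List<String> attachmentIds) {
        attachmentsOf.put(id, new ArrayList<>(attachmentIds));
        for (String attachment : attachmentIds) {
            attachmentsOf.putIfAbsent(attachment, new ArrayList<>());
        }
    }

    /** Deletes the email and every attachment it lists; returns how many documents were removed. */
    int deleteEmail(String id) {
        // Step 1: look the email up to learn which attachments it lists.
        List<String> attached = attachmentsOf.remove(id);
        if (attached == null) {
            return 0; // nothing indexed under this id
        }
        int deleted = 1;
        // Step 2: delete each listed attachment that is still present.
        for (String attachment : attached) {
            if (attachmentsOf.remove(attachment) != null) {
                deleted++;
            }
        }
        return deleted;
    }

    boolean contains(String id) {
        return attachmentsOf.containsKey(id);
    }
}
```

With a real index, step 1 would be a search on the email's identifier and step 2 a delete per attachment identifier; the toy map only shows the order of operations.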
--
Na
If anybody has any comments, or knows of any reason why the existing code
was so inefficient (while the code in BufferedIndexOutput makes more sense),
I'd love to hear. If a committer will agree to commit this change, even
better :-) When JIRA is back online, I'll put the patches the
s" is what differentiates Hits from
TopDocs, perhaps we don't need Hits at all?
So, how about deprecating Hits altogether, and recommending the TopDocs
alternatives instead?
--
Nadav Har'El| Sunday, Nov 26 2006, 5 Kislev 5767
IBM Haifa Research Lab
API, it seems doing this is much more difficult and requires
writing some sort of new Analyzer - one that will do the regular analysis
that I want for the regular fields, and add the payload to the one specific
field that lists the facets.
Am I understanding correctly? Or am I missing a better way t
rm (F,W) with the payload you want
for each document (basically, the list of categories that this document
belongs to).
I'm not saying this is the best way to do it, and certainly not the cleanest,
but it's just one of the things that payloads enable you to do.
--
Nadav Har'El
e segment that
is being written.
So perhaps a "grand unified Index" does make sense, instead of repeating the
same code and/or functionality in both IndexReader and IndexWriter.
--
Nadav Har'El|Wednesday, Jan 17 2007, 27 Tevet 5767
[EMAIL PROTECTED]
g in this area).
I'll add a comment about this use-case to LUCENE-580.
--
Nadav Har'El| Thursday, Jan 18 2007, 28 Tevet 5767
IBM Haifa Research Lab |-
|If glor
"df -F nfs INDEXDIR". If the result
is empty with a "mounted as a ... file system" on stderr, it's not NFS.
If the result on stdout has one line, it's NFS.
It's (very) ugly, but it can work.
Of course, NFS is not the only network file system out there.
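
A rough sketch of the check described above, assuming a df that accepts -F and prints no header line on stdout (the behavior the post relies on; other df variants differ). NfsCheck, looksLikeNfs and isOnNfs are illustrative names, not anything in Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the names are illustrative only. */
final class NfsCheck {
    /** Applies the rule from the post: any non-blank stdout line means NFS. */
    static boolean looksLikeNfs(List<String> stdoutLines) {
        for (String line : stdoutLines) {
            if (!line.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Runs "df -F nfs indexDir" and interprets its stdout; stderr is discarded. */
    static boolean isOnNfs(String indexDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("df", "-F", "nfs", indexDir)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return looksLikeNfs(lines);
    }
}
```

Keeping the interpretation in looksLikeNfs separate from the process call makes the rule easy to adjust for a df that prints a header, or for other network file systems.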
--
Nadav H
to work
hard to get around this limitation. Wouldn't it be better if Lucene included
this functionality that many (if not most) users need, out of the box?
--
Nadav Har'El| Tuesday, Feb 13 2007, 25 Shevat 5767
IBM Haifa Research Lab |--
the queue and merges them.
>
> This would effectively block adding of documents sometimes, but that
> is not different than what happens now.
So if adds can still block, what is the point of making this change?
--
Nadav Har'El| Wednesday, Feb 21 2
uch this idea can improve performance
on systems with multiple separate disks and multiple CPUs.
> size (buffered documents) is not too big. And multiple disk merge
> threads require significant system resources to add benefit.
See my comments above on why multiple concurrent merges might be n
luding this one), commit() is equivalent to a close()
followed by a new open(), but a person reading this javadoc wouldn't know that.
--
Nadav Har'El| Wednesday, Mar 18 2009, 22 Adar 5769
IBM Haifa Research Lab |-
tend the TopScoreDocCollector class, and it can be final.
--
Nadav Har'El|Sunday, Mar 22 2009, 26 Adar 5769
IBM Haifa Research Lab |-
|"Did you sleep well?" "
n't overlook modules they might
> want (like highlighting) because they are just as easy to find the "core"
> and people wouldn't wind up with bloated jars containing a lot of code
> they don't need. (beating a dead horse for a moment: this would
rom Sun, at around 40 K (this is part
of J2EE but not of J2SE, so you need to include this as well if you want to
use the servlet API). And that's it.
I'm sure that similar tiny Web Servers can also be found on the Web, but if
there's interest, I can see about publishing mine.
--
indices are less rare than they used to be, and 32 bit JVMs
are still quite common, so I think this is a problem we should solve properly.
Thanks,
Nadav.
--
Nadav Har'El|Wednesday, Jun 25 2008, 22 Sivan 5768
[EMAIL PROTECTED] |--
ent different sorting mechanisms (e.g., according
to payloads, database data, or whatever).
Does anyone disagree? Is there a reason why this change should not be done?
--
Nadav Har'El|
rtant goal.
> At one point there was even talk of refactoring additional code out of the
> core and into a contrib (this was already done with some analyzers when
> Lucene became a TLP)
--
Nadav Har'El| Wednesday, Sep 3 2008, 3 Elul 5768
IBM Haifa
t wasn't
even mentioned, let alone taught! As a result, some words have a few spelling
variants in the wild, with each dictionary typically considering one correct
and the others misspellings.
--
Nadav Har'El|Wed
of that new code being written).
Thanks,
Nadav.
--
Nadav Har'El| Sunday, Oct 5 2008, 6 Tishri 5769
IBM Haifa Research Lab |-
|Anyone who quotes me in their sig is
making filters more efficient and flexible. Searching with a
> Filter is now more efficient: now the filter is applied to a
> document before scoring is done.
Thanks, it's better I think.
Maybe it even deserves its own bullet - I don't think there's too much
connection
t to count them twice, so it might
indeed be useful to have this proposed behavior as an option.
Anyway, this is just my opinion (not backed by any hard research or
experimentation), so it might be wrong.
--
Nadav Har'El
ady exist.
On the other hand, binaryValue() does something different - if I understand
correctly, it may need to do array copying to get a byte[] which
it can return.
So this API is not at all inconsistent - maybe it is just a bit redundant
and a bit confusing or not documented well en
ds() and so on, but again, this would not
be backward compatible (although, for 3.0 we may decide that this is
not absolutely necessary).
--
Nadav Har'El| Sunday, Dec 14 2008, 1
nably quick (at least on local disks),
it would be great.
--
Nadav Har'El| Sunday, May 13 2007, 25 Iyyar 5767
IBM Haifa Research Lab |-
|How do you get holy water? Boil the he
g Term Relevance Sets", Einat Amitay,
David Carmel, Ronny Lempel and Aya Soffer, SIGIR 2004,
http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf
--
Nadav Har'El| Tuesday, Jun
urn null;
> } else if (size > 0 && !lessThan(element, top())) {
> Object ret = heap[1];
> heap[1] = element;
> adjustTop();
> return ret;
>
a query, get a list of docids, and then delete
them all. I said "theoretically" because unfortunately, the current
IndexWriter interface doesn't support the necessary calls (either a
deleteDocuments(Query) or a deleteDocuments(int docid) call), but I don
[
http://issues.apache.org/jira/browse/LUCENE-383?page=comments#action_12373843 ]
Nadav Har'El commented on LUCENE-383:
-
Hi,
It appears that ConstantScoreRangeQuery is already in the trunk.
However, QueryParser still uses RangeQuery
[
http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373847 ]
Nadav Har'El commented on LUCENE-130:
-
toString(field) works very well, if you understand what it does. Perhaps the
javadoc isn't explicit enough on what it doe
[
http://issues.apache.org/jira/browse/LUCENE-322?page=comments#action_12373849 ]
Nadav Har'El commented on LUCENE-322:
-
I wonder, is this change at all necessary? After all, we have the
IndexSearcher().getIndexReader() function, which return
[
http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373980 ]
Nadav Har'El commented on LUCENE-130:
-
Daniel, sorry for the mess, but I actually misspelled the word "omitted" in
that sentence. Should h
Versions: 1.9
Reporter: Nadav Har'El
Priority: Minor
Lucene's indexing is expected to be reasonably tolerant to computer crashes or
the indexing process being killed. By reasonably tolerant, I mean that it is ok
to lose a few documents (those currently buffered in memory),
[
http://issues.apache.org/jira/browse/LUCENE-554?page=comments#action_12378295 ]
Nadav Har'El commented on LUCENE-554:
-
Hi Otis, sorry about lingering with this patch (I've been very busy, not to
mention a daughter two weeks ago :-) I sti
[
http://issues.apache.org/jira/browse/LUCENE-504?page=comments#action_12416168 ]
Nadav Har'El commented on LUCENE-504:
-
Hi Doron and Otis,
My view is that this bug is a problem in FuzzyQuery, not in PriorityQueue or
BooleanQuery. It is the cal
[ http://issues.apache.org/jira/browse/LUCENE-504?page=all ]
Nadav Har'El updated LUCENE-504:
Attachment: fuzzyquery.patch
This is my proposed patch described above.
> FuzzyQuery produces a "java.lang.NegativeArraySize
[
http://issues.apache.org/jira/browse/LUCENE-504?page=comments#action_12418446 ]
Nadav Har'El commented on LUCENE-504:
-
Hi Otis, you did not comment on my patch (fuzzyquery.patch), which I think
solves your objections to Doron's previous pat
[ http://issues.apache.org/jira/browse/LUCENE-623?page=all ]
Nadav Har'El updated LUCENE-623:
Attachment: ramdirectory.diff
I propose a trivial patch, which does two very simple things:
1. RAMDirectory.close(), instead of being a no-op, sets files
: Store
Affects Versions: 2.0.0
Reporter: Nadav Har'El
Priority: Minor
During a profiling session, I discovered that BufferedIndexInput.readBytes(),
the function which reads a bunch of bytes from an index, is very inefficient
in many cases. It is efficient for one o
[ http://issues.apache.org/jira/browse/LUCENE-695?page=all ]
Nadav Har'El updated LUCENE-695:
Attachment: readbytes.patch
The patch, which includes the change to BufferedIndexInput.readBytes(), and a
new unit test for that class.
>
[
http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444322 ]
Nadav Har'El commented on LUCENE-695:
-
Sorry, I didn't notice that my fix broke this unit test. Thanks for catching
that.
What is happening is i
[ http://issues.apache.org/jira/browse/LUCENE-695?page=all ]
Nadav Har'El updated LUCENE-695:
Attachment: readbytes.patch
A fixed patch, which now checks that we don't read past the end of the file. This is now
checked correctly in all three case
[
http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444903 ]
Nadav Har'El commented on LUCENE-695:
-
> If "given" a null array? Is this ever done in Lucene? Which should be fixed,
> the testcase o
[
https://issues.apache.org/jira/browse/LUCENE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465797
]
Nadav Har'El commented on LUCENE-580:
-
This patch will be useful for users of LUCENE-755, the payloads patch.
Reporter: Nadav Har'El
Priority: Minor
Hi, I found a potentially serious efficiency problem with OpenBitSet.
One typical (I think) way to build a bit set is to set() the bits one by one -
e.g., have a HitCollector set() the bit for each matching document.
The underlying arr
Reporter: Nadav Har'El
Priority: Trivial
In Searchable.java, the javadoc for maxDoc() is:
/** Expert: Returns one greater than the largest possible document number.
* Called by search code to compute term weights.
* @see org.apache.lucene.index.IndexReader#m
[
https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752931#action_12752931
]
Nadav Har'El commented on LUCENE-1899:
--
Hi Shai, I guess you're
[
https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753019#action_12753019
]
Nadav Har'El commented on LUCENE-1899:
--
Yes, you're right, 12.5%. O
[
https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579258#action_12579258
]
Nadav Har'El commented on LUCENE-954:
-
I hate to rain on the parade, but mayb
[
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607857#action_12607857
]
Nadav Har'El commented on LUCENE-1314:
--
At first glance, my opinion was th
[
https://issues.apache.org/jira/browse/LUCENE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630561#action_12630561
]
Nadav Har'El commented on LUCENE-1382:
--
Hi Mike,
If you add this feature,
[
https://issues.apache.org/jira/browse/LUCENE-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651944#action_12651944
]
Nadav Har'El commented on LUCENE-1233:
--
Hi, I know this comment is a bit
[
https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652002#action_12652002
]
Nadav Har'El commented on LUCENE-1470:
--
Hi, I just wanted to comment that
[
https://issues.apache.org/jira/browse/LUCENE-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773402#action_12773402
]
Nadav Har'El commented on LUCENE-504:
-
Hi Uwe, I think that even though Prio
[
https://issues.apache.org/jira/browse/LUCENE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550964
]
Nadav Har'El commented on LUCENE-1088:
--
Michael, I agree - the most important fix was to make heap prot
[
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552174
]
Nadav Har'El commented on LUCENE-997:
-
I'd like to add my 2 cents on this issue.
The more I use