OK, finally, with some pointers from Ryan, I figured out the last problem.
So, as a note to anyone else who might encounter the same problems with MultiReader:
A) Directories can contain multiple segments, with a reader for each of those segments.
B) Searches are replayed within each reader serially. **
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/811/changes
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
Thanks!
On Tue, Apr 28, 2009 at 11:48 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> On Tue, Apr 28, 2009 at 4:00 PM, Shai Erera wrote:
> > I hope that I don't make a complete fool of myself, but I'm talking about
> > this:
> >
> > private List exceptions = new ArrayList();
> >
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Rutherglen updated LUCENE-1618:
-
Attachment: LUCENE-1618.patch
Implementation of the FileSwitchDirectory. It's nice this
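FileSwitchDirectory, as the released Lucene 2.9 class describes it, switches files between two other directories based on file extension. The routing rule can be sketched in isolation as below. `ExtensionRouter` and `storeFor` are hypothetical names, and plain strings stand in for real `Directory` instances, so this illustrates the idea rather than reproducing the attached patch.

```java
import java.util.Set;

// Hypothetical sketch: route index files to a "primary" or "secondary"
// store by file extension. Strings stand in for real Directory objects.
final class ExtensionRouter {
  private final Set<String> primaryExtensions;
  private final String primary;
  private final String secondary;

  ExtensionRouter(Set<String> primaryExtensions, String primary, String secondary) {
    this.primaryExtensions = primaryExtensions;
    this.primary = primary;
    this.secondary = secondary;
  }

  /** Returns the text after the last '.', or "" if the name has no extension. */
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot == -1 ? "" : fileName.substring(dot + 1);
  }

  /** Picks the store that should hold the given file. */
  String storeFor(String fileName) {
    return primaryExtensions.contains(extension(fileName)) ? primary : secondary;
  }
}
```

For example, with `fdt` and `fdx` (stored-field files) as the primary extensions, `_0.fdt` goes to the primary store and every other file, including extension-less names like `segments_1`, goes to the secondary one.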
I'm not sure that we could parallelize it. Currently, it's a serial
process (as you say): the queue collects across readers by adjusting
the values in the queue so they sort correctly against the current reader.
That approach doesn't appear easy to parallelize.
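The serial pass described above can be sketched roughly as follows: segments are visited one after another, each segment-local doc ID is shifted by that segment's doc base, and a single bounded min-heap keeps the best N hits. This is a simplified stand-in (segments modeled as arrays of scores, hypothetical class names). It does not model the per-reader adjustment of sort values mentioned above, and it is not Lucene's `HitQueue` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Simplified sketch: collect the top-N hits across segments, one segment
// at a time. Each segment is an array of scores indexed by local doc ID.
final class SerialTopN {
  static final class Hit {
    final int globalDoc;
    final float score;

    Hit(int globalDoc, float score) {
      this.globalDoc = globalDoc;
      this.score = score;
    }
  }

  static List<Hit> topN(float[][] segments, int n) {
    List<Hit> result = new ArrayList<>();
    if (n <= 0) {
      return result;
    }
    // Min-heap on score: the weakest retained hit sits at the head.
    PriorityQueue<Hit> queue = new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));
    int docBase = 0;
    for (float[] segment : segments) {             // serial: one segment after another
      for (int localDoc = 0; localDoc < segment.length; localDoc++) {
        Hit hit = new Hit(docBase + localDoc, segment[localDoc]); // rebase to a global ID
        if (queue.size() < n) {
          queue.add(hit);
        } else if (hit.score > queue.peek().score) {
          queue.poll();                            // evict the current weakest hit
          queue.add(hit);
        }
      }
      docBase += segment.length;                   // next segment starts after this one
    }
    result.addAll(queue);
    result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
    return result;
  }
}
```

Because each segment only touches the shared queue after the previous one is finished, splitting segments across threads would need either a lock around the queue or one queue per thread merged at the end.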
patrick o'leary wrote:
Think I may ha
On Tue, 28 Apr 2009, Michael McCandless wrote:
Hmm -- this failed because the host "downloads.osafoundation.org"
fails to resolve. The contrib/db tests need to download the Berkeley
DB JARs from here.
Andi, any idea what's up w/ that? Do we need to set a different
download location?
It shou
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703855#action_12703855
]
Jason Rutherglen commented on LUCENE-1618:
--
{quote}One downside to this approach
[
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703853#action_12703853
]
Jason Rutherglen commented on LUCENE-1313:
--
{quote}EG when RAM is full, we want t
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703850#action_12703850
]
Jason Rutherglen commented on LUCENE-1618:
--
{quote}For an NRT writer using RAMDir
[
https://issues.apache.org/jira/browse/LUCENE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man resolved LUCENE-1620.
--
Resolution: Invalid
Uday: please subscribe to the java-user mailing list and post your questions
abou
I think I may have found it: the filter runs multiple times, once for each
segment reader, and I was generating a new map to hold distances on each run.
So only the distances from the last segment reader were stored.
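A minimal sketch of the fix, assuming the filter is told each segment's doc base (how it learns that depends on the API in use; here it is simply a parameter): allocate the map once per search and key it by top-level doc ID, so later segments add to it instead of replacing it. `DistanceRecorder` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the filter runs once per segment, so the distance map
// is created once for the whole search and keyed by top-level doc ID
// (docBase + segment-local doc).
final class DistanceRecorder {
  // Shared across all per-segment calls; never re-allocated mid-search.
  private final Map<Integer, Double> distances = new HashMap<>();

  /** Called once per segment with that segment's doc base and local distances. */
  void recordSegment(int docBase, double[] localDistances) {
    // The bug described above amounted to allocating a fresh map at this
    // point, which discarded every earlier segment's entries.
    for (int localDoc = 0; localDoc < localDistances.length; localDoc++) {
      distances.put(docBase + localDoc, localDistances[localDoc]);
    }
  }

  Map<Integer, Double> distances() {
    return distances;
  }
}
```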
Currently it looks like those segmented searches are done serially; well, in
Solr they
[
https://issues.apache.org/jira/browse/LUCENE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1616.
Resolution: Fixed
Thanks Eks!
> add one setter for start and end offset to Offset
You might check out this Solr exchange:
http://www.lucidimagination.com/search/document/b2ccc68ca834129/lucene_2_9_migration_issues_multireader_vs_indexreader_document_ids
There are a few suggestions throughout.
--
- Mark
http://www.lucidimagination.com
Uwe Schindler wrote:
What is the
On Tue, Apr 28, 2009 at 4:00 PM, Shai Erera wrote:
> I hope that I don't make a complete fool of myself, but I'm talking about
> this:
>
> private List exceptions = new ArrayList();
>
> and this (MergeThread.run()):
>
> synchronized(ConcurrentMergeScheduler.this) {
> except
What is the problem exactly? Maybe you use the new Collector API, where the
search is done for each segment, so caching does not work correctly?
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
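One way to keep caching useful under per-segment search, sketched here under the assumption that a segment's reader object stays the same across reopens while that segment is unchanged, is to key the cache by segment instead of by the top-level reader. `PerSegmentCache` is a hypothetical helper, not a Lucene class.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache per-segment values keyed by the segment object.
// If the same segment object is reused after a reopen, its entry is reused;
// a newly merged segment is a new key and must be loaded from scratch.
final class PerSegmentCache<S, V> {
  // Weak keys let entries disappear once a segment is no longer referenced.
  private final Map<S, V> cache = new WeakHashMap<>();
  private final Function<S, V> loader;
  private int loads = 0;

  PerSegmentCache(Function<S, V> loader) {
    this.loader = loader;
  }

  synchronized V get(S segment) {
    V value = cache.get(segment);
    if (value == null) {
      value = loader.apply(segment);   // expensive per-segment load
      cache.put(segment, value);
      loads++;
    }
    return value;
  }

  synchronized int loadCount() {
    return loads;
  }
}
```

Because `WeakHashMap` compares keys with `equals`, this relies on segment objects using identity equality. It also shows why a large merge is costly: the merged segment is a brand-new key, so its values are loaded in full on first use.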
_
From: patrick o'leary [mailto:pj..
[
https://issues.apache.org/jira/browse/LUCENE-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1604.
Resolution: Fixed
Thanks Shon!
> Stop creating huge arrays to represent the absen
hey
I've got a filter that's storing document IDs with a geo distance for
spatial Lucene, using a bitset position for the doc ID.
However, with a MultiSegmentReader that's no longer going to work.
What's the most appropriate way to go from bitset position to doc ID now?
Thanks
Patrick
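With per-segment search, a bitset filled inside a filter is indexed by segment-local doc IDs, and the top-level doc ID is the segment's doc base plus the local ID. The sketch below, using a hypothetical `DocIdMapper` built from each segment's `maxDoc`, shows that arithmetic in both directions. Going from a top-level ID back to a segment is a binary search over the cumulative starting offsets.

```java
// Hypothetical sketch: translate between top-level doc IDs and
// (segment, segment-local doc) pairs, given each segment's maxDoc.
final class DocIdMapper {
  // starts[i] = doc base of segment i; the last entry is the total maxDoc.
  private final int[] starts;

  DocIdMapper(int[] segmentMaxDocs) {
    starts = new int[segmentMaxDocs.length + 1];
    for (int i = 0; i < segmentMaxDocs.length; i++) {
      starts[i + 1] = starts[i] + segmentMaxDocs[i];
    }
  }

  /** Top-level doc ID for a segment-local doc (e.g. a bitset position). */
  int toGlobal(int segment, int localDoc) {
    return starts[segment] + localDoc;
  }

  /** Largest segment index whose start is <= globalDoc (skips empty segments). */
  int segmentOf(int globalDoc) {
    int lo = 0;
    int hi = starts.length - 2;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (starts[mid] <= globalDoc) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return lo;
  }

  /** Segment-local doc ID for a top-level doc ID. */
  int toLocal(int globalDoc) {
    return globalDoc - starts[segmentOf(globalDoc)];
  }
}
```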
[
https://issues.apache.org/jira/browse/LUCENE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1617.
Resolution: Fixed
Thanks Shai!
> Add "testpackage" to common-build.xml
>
[
https://issues.apache.org/jira/browse/LUCENE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1623:
---
Attachment: LUCENE-1623.patch
Attached patch. I plan to commit in a day or two, and
Back-compat break with non-ascii field names
Key: LUCENE-1623
URL: https://issues.apache.org/jira/browse/LUCENE-1623
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects
I hope that I don't make a complete fool of myself, but I'm talking about
this:
private List exceptions = new ArrayList();
and this (MergeThread.run()):
synchronized(ConcurrentMergeScheduler.this) {
exceptions.add(exc);
}
Nothing seems to read this exceptions l
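Going by Mike's reply elsewhere in this thread, the list exists so that `anyUnhandledExceptions` can be called. The pattern can be sketched as follows; `FailureTracker` and its methods are hypothetical, and this is not the actual `ConcurrentMergeScheduler` code. Worker threads append failures under a lock, and a test-facing check reports whether any were recorded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the real ConcurrentMergeScheduler): worker threads
// record failures in a shared list; a test-facing method reports them.
final class FailureTracker {
  private final List<Throwable> exceptions = new ArrayList<>();

  /** Called from a worker thread's catch block. */
  void record(Throwable t) {
    synchronized (this) {
      exceptions.add(t);
    }
  }

  /** True if any failure was recorded since the last call; this sketch then resets. */
  synchronized boolean anyUnhandledExceptions() {
    boolean any = !exceptions.isEmpty();
    exceptions.clear();
    return any;
  }
}
```

If nothing ever calls the check, the list only grows, which matches the observation that nothing seems to read it outside of tests.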
[
https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-1606:
Attachment: LUCENE-1606.patch
removed use of multitermquery's getTerm()
equals/hashcode are defin
[
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703790#action_12703790
]
Earwin Burrfoot edited comment on LUCENE-1622 at 4/28/09 11:50 AM:
-
[
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703790#action_12703790
]
Earwin Burrfoot commented on LUCENE-1622:
-
I'll shortly cite my experiences mentio
Apologies for the delay, guys. I tried to solve certain issues that didn't pop
up in my application (as Kirill said, the problem is indeed quite complex). I
didn't find all the answers I had been looking for, but nonetheless -- the patch
that works for my needs is in JIRA. I would be really in
[
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-1622:
Attachment: synonyms.patch
Token filter implementing synonyms. Java 1.5 is required to compile it.
Multi-word synonym filter (synonym expansion at indexing time).
---
Key: LUCENE-1622
URL: https://issues.apache.org/jira/browse/LUCENE-1622
Project: Lucene - Java
Issue Type: New Fe
Michael,
I updated the wiki under "New Features in Lucene". I can give a
presentation on realtime search in Lucene.
-J
On Mon, Apr 27, 2009 at 10:11 PM, Michael Busch wrote:
> I'm happy to give more than one talk, on the other hand I don't want to
> prevent others from presenting. So if anyon
[
https://issues.apache.org/jira/browse/LUCENE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703733#action_12703733
]
Mark Harwood commented on LUCENE-1621:
--
While we're poking around in this area I'd li
[
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703695#action_12703695
]
Yonik Seeley commented on LUCENE-1313:
--
bq. Yonik raised a good question on LUCENE-16
[
https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703686#action_12703686
]
Michael McCandless commented on LUCENE-1313:
Yonik raised a good question on L
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703684#action_12703684
]
Earwin Burrfoot commented on LUCENE-1618:
-
bq. Sorry, by "diff" I meant the differ
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703683#action_12703683
]
Michael McCandless commented on LUCENE-1618:
bq. by "diff" I meant the differe
[
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703677#action_12703677
]
Marvin Humphrey commented on LUCENE-1614:
-
Further illustration...
Good method si
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703676#action_12703676
]
Yonik Seeley commented on LUCENE-1618:
--
bq. That's not a diff
Sorry, by "diff" I me
[
https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703670#action_12703670
]
Felipe Sánchez Martínez commented on LUCENE-1284:
-
Hi,
I think that the
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703667#action_12703667
]
Michael McCandless commented on LUCENE-1593:
bq. The way I understand it Index
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703666#action_12703666
]
Earwin Burrfoot commented on LUCENE-1618:
-
bq. what is this diff anyway?
That's no
[
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703660#action_12703660
]
Marvin Humphrey commented on LUCENE-1614:
-
> nudge doesn't sound like it changes a
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703658#action_12703658
]
Yonik Seeley commented on LUCENE-1618:
--
As it relates to near real time, the search s
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Earwin Burrfoot updated LUCENE-1618:
Attachment: MemoryCachedDirectory.java
> Allow setting the IndexWriter docstore to be a di
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703656#action_12703656
]
Earwin Burrfoot commented on LUCENE-1618:
-
bq. You mean an opened IndexOutput woul
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703651#action_12703651
]
Michael McCandless commented on LUCENE-1618:
Neat. This is sounding like one
On Tue, Apr 28, 2009 at 9:27 AM, Shai Erera wrote:
>> It's there so "anyUnhandledExceptions" can be called;
>
> I will check the code again, but I remember that after commenting it out, the
> only compile errors I saw were from MergeThread adding the exception ...
> perhaps I'm missing something, so I
[
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703645#action_12703645
]
Robert Muir commented on LUCENE-1488:
-
what version of icu4j are you using? needs to b
>
> It's there so "anyUnhandledExceptions" can be called;
>
I will check the code again, but I remember that after commenting it out, the
only compile errors I saw were from MergeThread adding the exception ...
perhaps I'm missing something, so I'll re-check the code.
I understand your point now - me
[
https://issues.apache.org/jira/browse/LUCENE-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703630#action_12703630
]
Tim Smith commented on LUCENE-1618:
---
{quote}
You mean an opened IndexOutput would write
[
https://issues.apache.org/jira/browse/LUCENE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1621:
Component/s: Search
> deprecate term and getTerm in MultiTermQuery
> -
[
https://issues.apache.org/jira/browse/LUCENE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1621:
Attachment: LUCENE-1621.patch
a quick first pass at this
> deprecate term and getTerm in MultiTer
On Tue, Apr 28, 2009 at 8:28 AM, Shai Erera wrote:
> Every merge hit the exception, yes.
>
> And actually, the exceptions list is not used anywhere besides MT adding the
> exception to the list. That's why I was curious why it's there.
It's there so "anyUnhandledExceptions" can be called; we coul
On Tue, Apr 28, 2009 at 8:10 AM, Uwe Schindler wrote:
>> It's awesome that you no longer have to warm your searchers... but be
>> careful when a large segment merge commits.
>
> I know this, but in our case (e.g. creating an IN-SQL list, collecting
> measurement parameters from the documents) the
Every merge hit the exception, yes.
And actually, the exceptions list is not used anywhere besides MT adding the
exception to the list. That's why I was curious why it's there.
I still think we should protect this case somehow, because even if it hits a
disk-full exception, there's no point conti
On Tue, Apr 28, 2009 at 6:09 AM, Shai Erera wrote:
> Hi
>
> I think I've hit a bug in ConcurrentMergeScheduler, but I'd like those who
> are more familiar with the code to review it. I ran
> TestStressSort.testSort() and started to get AIOOB exceptions from
> MergeThread, the CPU spiked to 98-100%
Hi Mike,
> This is great feedback on the new Collector API, Uwe. Thanks!
- Likewise.
> It's awesome that you no longer have to warm your searchers... but be
> careful when a large segment merge commits.
I know this, but in our case (e.g. creating an IN-SQL list, collecting
measurement parameter
deprecate term and getTerm in MultiTermQuery
Key: LUCENE-1621
URL: https://issues.apache.org/jira/browse/LUCENE-1621
Project: Lucene - Java
Issue Type: Improvement
Reporter: Mark Mille
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703613#action_12703613
]
Shai Erera commented on LUCENE-1593:
bq. But actually: the thing calling scoresDocsInO
Okay, I agree - best would be to lose the method that does not make
sense for all multiterm queries.
I'll work on deprecating it and moving getTerm up to the sub queries
that it makes sense for.
- Mark
Uwe Schindler wrote:
While implementing trie range, I always wondered why
How to index and Search the special characters as well as non-english
characters like danish Å,ø,etc
-
Key: LUCENE-1620
URL: https://issues.apache.org/jira/browse
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703600#action_12703600
]
Michael McCandless commented on LUCENE-1593:
bq. I actually prefer to add a bo
This sounds like a good change!
Then we'd un-deprecate Token? We could in fact then fix all core
tokenizers to use Tokens again.
I think given how simple these interfaces would be, it's an OK
situation to use interfaces? (I.e., we disregard the normal back-compat
curse with interfaces).
Mike
On T
[
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703598#action_12703598
]
uday kumar maddigatla commented on LUCENE-1488:
---
hi,
I am also facing the
Hi Michael,
Sure, interfaces are the solution to this. They define what Lucene core expects
from these entities and give people the freedom to provide any implementation
they wish. E.g. users who do not need offset information can just provide a
dummy implementation that returns constants...
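A minimal sketch of that idea, using a hypothetical `Offsets` interface rather than any real Lucene type: core code depends only on the interface, a user who never needs offsets plugs in a constant implementation, and a Token-like class can implement the same interface directly. The single `setOffset(start, end)` setter echoes LUCENE-1616's request for one setter for both offsets.

```java
// Hypothetical interface (illustrative name): what core code would rely on.
interface Offsets {
  int startOffset();
  int endOffset();
  void setOffset(int startOffset, int endOffset); // one setter for both ends
}

/** Dummy implementation for users who never need offsets: constants, no state. */
final class NoOffsets implements Offsets {
  public int startOffset() { return 0; }
  public int endOffset() { return 0; }
  public void setOffset(int startOffset, int endOffset) { /* intentionally ignored */ }
}

/** A Token-like class can implement the same interface (and others) directly. */
final class SimpleToken implements Offsets {
  private int start;
  private int end;

  public int startOffset() { return start; }
  public int endOffset() { return end; }

  public void setOffset(int startOffset, int endOffset) {
    this.start = startOffset;
    this.end = endOffset;
  }
}
```

Since callers only see `Offsets`, swapping the stateful class for the constant one needs no change in the consuming code.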
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703588#action_12703588
]
Shai Erera commented on LUCENE-1593:
bq. Good point - can you update HitQueue's javado
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703570#action_12703570
]
Michael McCandless commented on LUCENE-1593:
bq. Ok sleeping did help.
OK...g
Hi
I think I've hit a bug in ConcurrentMergeScheduler, but I'd like those who
are more familiar with the code to review it. I ran
TestStressSort.testSort() and started to get AIOOB exceptions from
MergeThread, the CPU spiked to 98-100% and did not end for a couple of
minutes, until I was able to r
[
https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703543#action_12703543
]
Eks Dev commented on LUCENE-1619:
-
thanks Mike
> TermAttribute.termLength() optimization
[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703540#action_12703540
]
Shai Erera commented on LUCENE-1593:
bq. I think I'd lean towards the 12 impls now. Th
[
https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1619.
Resolution: Fixed
Fix Version/s: 2.9
> TermAttribute.termLength() optimizat
Haha, isn't it funny: the same idea came to me on Sunday afternoon after I
replied to Eks Dev. But I threw it away, because interfaces are not
liked here. :-)
This new interface may also spare us these useNewAPI() calls,
as the old TokenStream methods could be easily impleme
[
https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703537#action_12703537
]
Michael McCandless commented on LUCENE-1619:
Indeed it seems unnecessary -- I'
[
https://issues.apache.org/jira/browse/LUCENE-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reassigned LUCENE-1619:
--
Assignee: Michael McCandless
> TermAttribute.termLength() optimization
> -
Hmm -- this failed because the host "downloads.osafoundation.org"
fails to resolve. The contrib/db tests need to download the Berkeley
DB JARs from here.
Andi, any idea what's up w/ that? Do we need to set a different
download location?
Mike
-- Forwarded message --
From: Apache
On Tue, Apr 28, 2009 at 2:38 AM, Uwe Schindler wrote:
> Why not deprecate getTerm() in MultiTermQuery, remove the field in
> MultiTermQuery and all related occurrences? The field and methods are then
> *not* deprecated and sensibly implemented in Fuzzy*.
+1
Mike
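The proposal can be sketched with heavily simplified, hypothetical classes (`MultiTermQuerySketch` and friends, not Lucene's real ones): the base-class getter is deprecated, and only subclasses for which a single term makes sense keep a non-deprecated getter. The sketch uses the `@Deprecated` annotation for brevity; a Java 1.4 code base would rely on the javadoc `@deprecated` tag alone.

```java
// Hypothetical, simplified classes illustrating the deprecation proposal.
abstract class MultiTermQuerySketch {
  /**
   * Kept only for back-compat; not every multi-term query has a single term.
   * @deprecated use the getter on the specific subclass instead
   */
  @Deprecated
  String getTerm() {
    return null; // the base class no longer stores a term field
  }
}

final class FuzzyQuerySketch extends MultiTermQuerySketch {
  private final String term; // the field now lives in the subclass

  FuzzyQuerySketch(String term) {
    this.term = term;
  }

  @Override
  String getTerm() { // meaningful here: a fuzzy query really has one term
    return term;
  }
}

final class RangeQuerySketch extends MultiTermQuerySketch {
  private final String lower; // a range has two bounds, not one term
  private final String upper;

  RangeQuerySketch(String lower, String upper) {
    this.lower = lower;
    this.upper = upper;
  }

  String getLowerTerm() { return lower; }
  String getUpperTerm() { return upper; }
}
```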
---
Hi Eks Dev,
I actually started experimenting with changing the new API slightly to
overcome one drawback: with the variables now distributed over various
Attribute classes (vs. being in a single class Token previously),
cloning a "Token" (i.e. calling captureState()) is more expensive. This
s
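To illustrate why capturing state gets more expensive once token data is split across attribute objects, here is a simplified sketch with hypothetical classes (not Lucene's `AttributeSource`): a snapshot copies every attribute, so the work and allocations grow with the number of attributes in use, whereas a single `Token` could be cloned in one step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: capturing state copies each attribute separately,
// one allocation per attribute.
final class AttributeSnapshot {
  interface Attr {
    Attr copy();
  }

  static final class TermAttr implements Attr {
    String term = "";

    public Attr copy() {
      TermAttr c = new TermAttr();
      c.term = term;
      return c;
    }
  }

  static final class OffsetAttr implements Attr {
    int start;
    int end;

    public Attr copy() {
      OffsetAttr c = new OffsetAttr();
      c.start = start;
      c.end = end;
      return c;
    }
  }

  /** Copies every attribute, so later changes to the live ones do not leak in. */
  static List<Attr> captureState(List<Attr> attributes) {
    List<Attr> state = new ArrayList<>(attributes.size());
    for (Attr a : attributes) {
      state.add(a.copy());
    }
    return state;
  }
}
```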