> I am wondering if in an integrated solution, things like sorting still
> require the field cache? What if untokenized fields could be stored
> in H2, normal tokenized fields in Lucene. Then somehow make the query
> work properly. Yes the rowid would need to be stored. Currently
> Lucene range
I am wondering if in an integrated solution, things like sorting still
require the field cache? What if untokenized fields could be stored
in H2, normal tokenized fields in Lucene. Then somehow make the query
work properly. Yes the rowid would need to be stored. Currently
Lucene range queries a
Hi:
Integrating Lucene into an RDBMS involves two separate concerns:
- Integrate it as an index, so that it is notified when a row changes
and so that the optimizer can choose the right execution plan based on the
index statistics.
- Replace the Lucene file system store to align database changes with
Lucene changes,
Cool. I mention H2 because it does have some Lucene code in it yes.
Also according to some benchmarks it's the fastest of the open source
databases. I think it's possible to integrate realtime search for H2.
I suppose there is no need to store the data in Lucene in this case?
One loses the multi
:
: Is there a good place to place the javadocs on the Apache website once they
are more complete?
generated javadocs aren't really necessary (at least not at this stage)
just having javadoc comments in the code makes it a lot easier to review
new contributions and patches (most people revi
Yes, both Marcelo and I would be interested.
We looked into H2 and it looks like something similar to Oracle's ODCI can
be implemented. Plus the primitive full-text implementation is based on
Lucene.
I say primitive because looking at the code I saw that one cannot define an
Analyzer and for each
Perhaps an interesting project would be to integrate Ocean with H2
www.h2database.com to take advantage of both models. I'm not sure how
exactly that would work, but it seems like it would not be too
difficult. Perhaps this would solve being able to perform faster
hierarchical queries and perhaps
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Right, getCurrentIndex would return a MultiReader that includes
> SegmentReader for each segment in the index, plus a "RAMReader" that
> searches the RAM buffer. That RAMReader is a tiny shell class that would
> basica
Term dictionary? I'm curious how that would be solved?
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Yonik Seeley wrote:
>
>>> I think it's quite feasible, but, it'd still have a "reopen" cost in that
>>> any buffered delete by term or query would have to be "m
That sounds about correct and I don't think it matters much. By default
I cap the number of documents stored in InstantiatedIndex at 100, so the
heap size doesn't become a problem.
On Mon, Sep 8, 2008 at 2:58 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
> I need to point out that the only thing I know Inst
On Mon, Sep 8, 2008 at 3:56 PM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> But, how would you maintain a static view of an index...?
>>
>> IndexReader r1 = indexWriter.getCurrentIndex()
>> indexWriter.addDocument(...)
>> IndexRead
On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> But, how would you maintain a static view of an index...?
>
> IndexReader r1 = indexWriter.getCurrentIndex()
> indexWriter.addDocument(...)
> IndexReader r2 = indexWriter.getCurrentIndex()
>
> I assume r1 will have a view of
Yonik Seeley wrote:
I think it's quite feasible, but it'd still have a "reopen" cost in that
any buffered delete by term or query would have to be "materialized" into
docIDs on reopen. Though, if this somehow turns out to be a problem, in the
future we could do this materializing immedi
I need to point out that the only thing I know InstantiatedIndex to be
great at is read access in the inverted index. It consumes a lot more
heap than RAMDirectory and InstantiatedIndexWriter is slightly less
efficient than IndexWriter.
Please let me know if your experience differs from the
On Mon, Sep 8, 2008 at 12:33 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> I'm also trying to make time to explore the approach of creating an
> IndexReader impl. that searches IndexWriter's RAM buffer.
That seems like it could possibly be the best performing approach in
the long run.
> I t
I'm also trying to make time to explore the approach of creating an
IndexReader impl. that searches IndexWriter's RAM buffer.
I think it's quite feasible, but it'd still have a "reopen" cost in
that any buffered delete by term or query would have to be
"materialized" into docIDs on reop
InstantiatedIndex isn't quite realtime. Instead a new
InstantiatedIndex is created per transaction in Ocean and managed
thereafter. This however is fairly easy to build and could offer
realtime in Lucene without adding the transaction logging. It would
be good to find out what scope is acceptabl
Ning Li wrote:
I agree with Otis that the first step for Lucene is probably to
support real-time search. The instantiated index in contrib seems to be
something close.
Maybe we should start fleshing out what we want in realtime search on
the wiki?
Could it be as simple as making Instantiated
Hi,
We experimented using HBase's scalable infrastructure to scale out Lucene:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01143.html
There is a concern about the impact of HDFS's random read performance
on Lucene search performance. And we can discuss if HBase's architecture
is best for scal
[
https://issues.apache.org/jira/browse/LUCENE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1327:
---
Fix Version/s: (was: 2.4)
We're still iterating on the approach to resolve this,
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629164#action_12629164
]
Michael McCandless commented on LUCENE-914:
---
Since we're still having healthy dis
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629157#action_12629157
]
Yonik Seeley commented on LUCENE-914:
-
bq. How about we change the spec for all skipTo'
[
https://issues.apache.org/jira/browse/LUCENE-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Elschot updated LUCENE-1379:
-
Attachment: LUCENE-1379.patch
The patch of 20080908 compiles, but it is untested because of
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629142#action_12629142
]
Paul Elschot commented on LUCENE-914:
-
See LUCENE-1379 for the SpanScorer bug when slop
SpanScorer fails when sloppyFreq() returns 0
Key: LUCENE-1379
URL: https://issues.apache.org/jira/browse/LUCENE-1379
Project: Lucene - Java
Issue Type: Bug
Components: Search
Hi Joaquin,
Using HBase with realtime Lucene would be in line with what Google
does. However, the question is whether or not this is completely
necessary or the simplest approach. That probably can only be
answered by doing a live comparison of the two! Unfortunately that
would require probab
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629137#action_12629137
]
Paul Elschot commented on LUCENE-914:
-
Well, how about changing the TermDocs interface
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629133#action_12629133
]
doronc edited comment on LUCENE-914 at 9/8/08 4:38 AM:
{quote}
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629133#action_12629133
]
Doron Cohen commented on LUCENE-914:
{quote}
... else what happens is undefined ...
{qu
[
https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated LUCENE-1357:
Attachment: LUCENE-1357.patch
> SpanScorer does not respect ConstantScoreRangeQuery setting
>
[
https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629131#action_12629131
]
Michael McCandless commented on LUCENE-1279:
Grant, what's the game plan on th
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629130#action_12629130
]
[EMAIL PROTECTED] edited comment on LUCENE-914 at 9/8/08 4:20 AM:
---
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629130#action_12629130
]
Paul Elschot commented on LUCENE-914:
-
I had another look at these lines in Disjunction
[
https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629127#action_12629127
]
Mark Miller commented on LUCENE-1357:
-
I'll put it up today...just wanted to make sure
[
https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629125#action_12629125
]
Shai Erera commented on LUCENE-1131:
I agree with the body, that's what I had in mind.
[
https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629117#action_12629117
]
Michael McCandless commented on LUCENE-1357:
Mark, do you have a concrete patc
[
https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629115#action_12629115
]
Michael McCandless commented on LUCENE-914:
---
How about we change the spec for all
37 matches