"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> On Apr 4, 2007, at 10:05 AM, Michael McCandless wrote:
>
> >> (: Ironically, the numbers for Lucene on that page are a little
> >> better than they should be because of a sneaky bug. I would have
> >> made updating the results a priority if they'd go
"Ning Li" <[EMAIL PROTECTED]> wrote:
> On 4/4/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote:
> > Note that for "autoCommit=false", this optimization is somewhat less
> > important, depending on how often you actually close/open a new
> > IndexWriter. In the extreme case, if you open a w
I understand your concerns!
I was a little skeptical at the beginning. But even with the 1.5 jvm,
the improvements still hold.
Lucene creates a lot of "garbage" (strings, tokens, ...) either at
index time or query time. While the new garbage collector strategies did
seriously improve since jav
[
https://issues.apache.org/jira/browse/LUCENE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Lef updated LUCENE-789:
--
Attachment: TestMultiSearcherSimilarity.java
Attached unit test
> Custom similarity is ignored when us
Once more, now to java-dev instead of to java-commits:
Otis,
Can I ask which tool you used to catch this, and the previous one?
Regards,
Paul Elschot
On Thursday 05 April 2007 03:06, [EMAIL PROTECTED] wrote:
> Author: otis
> Date: Wed Apr 4 18:06:16 2007
> New Revision: 525669
>
> URL: http:
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jörg Hohwiller updated LUCENE-622:
--
Attachment: lucene-highlighter-2.0.0.pom
pom for lucene-highlighter
> Provide More of Lucene F
[
https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486942
]
Michael McCandless commented on LUCENE-843:
---
OK I ran old (trunk) vs new (this patch) with increasing RAM
wow, impressive numbers, congrats !
----- Original Message -----
From: Michael McCandless (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, 5 April, 2007 3:22:32 PM
Subject: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to
buffer added documents
[
h
"eks dev" <[EMAIL PROTECTED]> wrote:
> wow, impressive numbers, congrats !
Thanks! But remember many Lucene apps won't see these speedups since I've
carefully minimized cost of tokenization and cost of document retrieval. I
think for many Lucene apps these are a sizable part of time spent index
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
The one thing that still baffles me is: I can't get a persistent
Posting hash to be any faster.
Don't use a hash, then. :)
KS doesn't.
* Give Token a "position" member.
* After you've accumulated all the Tokens, calculate
po
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jörg Hohwiller updated LUCENE-622:
--
Attachment: lucene-maven.patch
patch for partial mavenization of lucene
> Provide More of Luce
[
https://issues.apache.org/jira/browse/LUCENE-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487030
]
Jörg Hohwiller commented on LUCENE-622:
---
If you apply this patch to svn
(http://svn.apache.org/repos/asf/lucen
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
>
> On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
>
> > The one thing that still baffles me is: I can't get a persistent
> > Posting hash to be any faster.
>
> Don't use a hash, then. :)
>
> KS doesn't.
>
>* Give Token a "position" memb
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
> Jörg,
Hi Otis,
> Since you offered to help - please see
> https://issues.apache.org/jira/browse/LUCENE-622 .
> lucene-core POM is there for 2.1.0, but if you need POMs for contrib/*,
> please attach them to that issue. We have Jars, obviously,
>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Eric,
>
> On Apr 4, 2007, at 4:33 PM, Otis Gospodnetic wrote:
>> Eh, missing Jars in the Maven repo again. Why does this always get
>> dropped?
>
> Because none of us Lucene committers care much about Maven? :)
It's okay for you personally. And n
[
https://issues.apache.org/jira/browse/LUCENE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487049
]
Michael McCandless commented on LUCENE-856:
---
OK I re-ran the above test (10 MM docs @ ~5,500 bytes plain te
On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote:
So you basically do not "de-dup" by field+term on your first pass
through the tokens in the doc (which is "roughly" what that hash
does). Instead, append all tokens in an array, then sort first by
field+text and second by position? This is
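A minimal sketch of the approach described above: collect every token of a document in a list, then sort by field, then text, then position, so that duplicates of the same field+term end up adjacent without a hash lookup. The `Tok` class and `sortTokens` method are hypothetical names used only for illustration; they are neither Lucene's nor KinoSearch's actual classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenSortSketch {
    /** Hypothetical token holder (assumption, not a real Lucene/KS class). */
    static final class Tok {
        final String field;
        final String text;
        final int position;

        Tok(String field, String text, int position) {
            this.field = field;
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return field + ":" + text + "@" + position;
        }
    }

    /** Returns a copy sorted by field, then text, then position. */
    static List<Tok> sortTokens(List<Tok> tokens) {
        List<Tok> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparing((Tok t) -> t.field)
                .thenComparing((Tok t) -> t.text)
                .thenComparingInt((Tok t) -> t.position));
        return sorted;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("body", "fox", 3));
        tokens.add(new Tok("title", "quick", 0));
        tokens.add(new Tok("body", "brown", 1));
        tokens.add(new Tok("body", "fox", 0));
        // After sorting, both "body:fox" tokens sit next to each other,
        // ordered by position.
        System.out.println(sortTokens(tokens));
    }
}
```

Whether this beats a hash in practice depends on the cost of the sort versus the per-token lookups; the sketch only shows the ordering, not a benchmark.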
At revision 525912:
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 sec
[junit]
[junit] Testcase:
testAddIndexOnDiskFull(org.apache.lucene.index.TestIndexWriter): FAILED
[junit] max free Directory
Nothing fancy - Eclipse. It flagged it, I removed it, nothing "turned red"
indicating everything still compiled, unit tests still passed, committed.
If I recall correctly, one has to configure Eclipse to alert you to unused
variables, methods, and such, and I have that turned on.
Otis
. . . .
On 4/4/07, Jean-Philippe Robichaud <[EMAIL PROTECTED]> wrote:
I understand your concerns!
I was a little skeptical at the beginning. But even with the 1.5 jvm,
the improvements still hold.
Lucene creates a lot of "garbage" (strings, tokens, ...) either at
index time or query time. While the
On Apr 5, 2007, at 3:58 AM, Michael McCandless wrote:
Marvin do you have any sense of what the equivalent cost is
in KS
It's big. I don't have any good optimizations to suggest in this area.
(I think for KS you "add" a previous segment not that
differently from how you "add" a document)?
"Paul Elschot" <[EMAIL PROTECTED]> wrote:
> At revision 525912:
>
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Tests run: 16, Failures: 1, Errors: 0, Time elapsed: 52.161 sec
> [junit]
> [junit] Testcase:
> testAddIndexOnDiskFull(org.apache.lucene.
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
>
> On Apr 5, 2007, at 8:54 AM, Michael McCandless wrote:
>
> > So you basically do not "de-dup" by field+term on your first pass
> > through the tokens in the doc (which is "roughly" what that hash
> > does). Instead, append all tokens in an array, t
What Mike said. Without seeing the Javalutionized Lucene in action we won't
get very far.
Jean-Philippe, are you interested in making the changes to Lucene and showing
the performance improvement?
Note that you can use the super-nice and easy to use contrib/benchmark to
compare the "vanilla Luc
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> > (I think for KS you "add" a previous segment not that
> > differently from how you "add" a document)?
>
> Yeah. KS has to decompress and serialize posting content, which sux.
>
> The one saving grace is that with the Fibonacci merge schedule and
Yes, I believe enough in this approach to try it. I'm already starting
to play with it. I took the current trunk and I'm starting to play with
it. That being said, I'm quite busy right now so I can't promise any
steady progress. Also, I won't apply patches that are already in JIRA,
so the numbe
I'm not in love with the dependency idea, though it's not that big of a deal
for me.
However, I think you will want to get some of the performance patches (e.g.
LUCENE-843) in first, so you can compare the latest and greatest version of
Lucene with your Javalutionized version. From what I gathe
Quick question, Mike:
You talk about a RAM buffer from 1MB - 96MB, but then you have the amount of
RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new 3.4 [
90.1% less]).
I don't follow 100% of what you are doing in LUCENE-843, so could you please
explain what these 2 dif
Sounds like I need to cut that out.
Since caching is built into the public BitSet bits(IndexReader reader) method,
I don't see a way to deprecate that, which means I'll just cut it out and
document it in CHANGES.txt. Anyone who wants QueryFilter caching will be able
to get the caching back by
Remove BitSet caching from QueryFilter
--
Key: LUCENE-857
URL: https://issues.apache.org/jira/browse/LUCENE-857
Project: Lucene - Java
Issue Type: Improvement
Reporter: Otis Gospodnetic
I'm not saying I'm against it, but one of the things that makes
Lucene so great is its lack of dependencies in the core. It isn't
necessarily a slippery slope, either, if we do add one dependency.
Javolution is BSD license, AFAICT. I don't know if that is a good or
bad license as far as
[
https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-857:
Attachment: LUCENE-857.patch
QueryFilter without caching.
I'll commit it tomorrow (Friday)
On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote:
(I think for KS you "add" a previous segment not that
differently from how you "add" a document)?
Yeah. KS has to decompress and serialize posting content, which sux.
The one saving grace is that with the Fibonacci merge schedule and
th
: Since caching is built into the public BitSet bits(IndexReader reader)
: method, I don't see a way to deprecate that, which means I'll just cut
: it out and document it in CHANGES.txt. Anyone who wants QueryFilter
: caching will be able to get the caching back by wrapping the QueryFilter
: in y
[
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic updated LUCENE-584:
Attachment: bench-diff.txt
Perhaps I did something wrong with the benchmark, but I didn't g
: Thanks! But remember many Lucene apps won't see these speedups since I've
: carefully minimized cost of tokenization and cost of document retrieval. I
: think for many Lucene apps these are a sizable part of time spent indexing.
true, but as long as the changes you are making have no impact on
[
https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487108
]
Matt Ericson commented on LUCENE-855:
-
I am almost done with my patch and I wanted to test it against this patch
On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Thanks! But remember many Lucene apps won't see these speedups since I've
: carefully minimized cost of tokenization and cost of document retrieval. I
: think for many Lucene apps these are a sizable part of time spent indexing.
true, bu
[
https://issues.apache.org/jira/browse/LUCENE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487116
]
Hoss Man commented on LUCENE-857:
-
From email since I didn't notice Otis opened this issue already...
Date: Thu, 5
On 4/4/07, Otis Gospodnetic (JIRA) <[EMAIL PROTECTED]> wrote:
[
https://issues.apache.org/jira/browse/LUCENE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Otis Gospodnetic resolved LUCENE-796.
-
Resolution: Fixed
Makes s
"Mike Klaas" <[EMAIL PROTECTED]> wrote:
> On 4/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> > : Thanks! But remember many Lucene apps won't see these speedups since I've
> > : carefully minimized cost of tokenization and cost of document retrieval. I
> > : think for many Lucene ap
Hi Otis!
"Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:
> You talk about a RAM buffer from 1MB - 96MB, but then you have the amount
> of RAM @ flush time (e.g. Avg RAM used (MB) @ flush: old 34.5; new
> 3.4 [ 90.1% less]).
>
> I don't follow 100% of what you are doing in LUCENE-843, so
"Marvin Humphrey" <[EMAIL PROTECTED]> wrote:
> On Apr 5, 2007, at 12:06 PM, Michael McCandless wrote:
>
> >>> (I think for KS you "add" a previous segment not that
> >>> differently from how you "add" a document)?
> >>
> >> Yeah. KS has to decompress and serialize posting content, which sux.
> >
Michael, like everyone else, I am watching this very closely. So far
it sounds great!
On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote:
When I measure "amount of RAM @ flush time", I'm calling
MemoryMXBean.getHeapMemoryUsage().getUsed(). So, this measures actual
process memory usage w
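For readers who want to reproduce that kind of measurement, here is a minimal sketch of reading used heap through `MemoryMXBean`, obtained via the standard `ManagementFactory.getMemoryMXBean()`. The class name `HeapUsageSketch`, the helper `usedHeapBytes`, and the 16 MB test allocation are assumptions for illustration; this is not the benchmark code from LUCENE-843.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapUsageSketch {
    /** Returns the heap bytes currently in use, as reported by the JVM. */
    static long usedHeapBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();

        // Allocate roughly 16 MB and keep it reachable so it is still
        // counted when we sample again.
        byte[][] buffers = new byte[16][];
        for (int i = 0; i < buffers.length; i++) {
            buffers[i] = new byte[1 << 20];
        }

        long after = usedHeapBytes();
        System.out.printf("buffers=%d usedBefore=%.1f MB usedAfter=%.1f MB%n",
                buffers.length, before / 1048576.0, after / 1048576.0);
    }
}
```

The reported value includes garbage that has not yet been collected, so a single sample can overstate live data; averaging several samples taken at the same point in the workload gives a steadier number.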
"Grant Ingersoll" <[EMAIL PROTECTED]> wrote:
>
> Michael, like everyone else, I am watching this very closely. So far
> it sounds great!
>
> On Apr 5, 2007, at 8:03 PM, Michael McCandless wrote:
>
> > When I measure "amount of RAM @ flush time", I'm calling
> > MemoryMXBean.getHeapMemoryUsage
On Apr 5, 2007, at 5:26 PM, Michael McCandless wrote:
What we need to do is cut down on decompression and conflict
resolution costs when reading from one segment to another. KS has
solved this problem for stored fields. Field defs are global and
field values are keyed by name rather than fiel
Joerg Hohwiller wrote:
>> Then we'll need .sha1 and .md5 files for all pushed Jars.
>> One of the other developers will have to do that,
>> as I don't have my PGP set up,
>> and hence no key for the KEYS file (if that's needed for the .sha1).
> You do not need PGP or something like this for SHA-* o
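As a rough illustration of that point, the checksum files can be produced with the JDK's `MessageDigest` alone, with no PGP key involved. The class name `ChecksumSketch`, the helper `hexDigest`, and the choice to write only the hex digest into the `.sha1` and `.md5` files are assumptions; the exact file layout a given Maven repository expects should be checked separately.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumSketch {
    /** Returns the lowercase hex digest of data for "SHA-1" or "MD5". */
    static String hexDigest(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Both algorithms are required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChecksumSketch <path-to-jar>");
            return;
        }
        byte[] jar = Files.readAllBytes(Paths.get(args[0]));
        // Write <jar>.sha1 and <jar>.md5 next to the input file.
        Files.write(Paths.get(args[0] + ".sha1"),
                hexDigest(jar, "SHA-1").getBytes(StandardCharsets.UTF_8));
        Files.write(Paths.get(args[0] + ".md5"),
                hexDigest(jar, "MD5").getBytes(StandardCharsets.UTF_8));
    }
}
```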