> FYI, the AMD Phenom also has the POPCNT instruction.
Don't have access to a computer with this one either. Seems like I
need to invest in hardware a bit.
I have found a person with an I7 though -- the results are attached to
the JIRA issue, about 20% speedup.
Dawid
---
> Interested in some Core i5 benchmarks?
Sure, add them to the JIRA issue if you can, please.
> I just ran the benchmark locally on latest JDK 6 and it was slightly better
> than the I7 results you posted which made me wonder..
Well, like I said -- they may depend on the architecture of the
comp
Please contact Dawid Weiss (in CC:), he had a well-advanced port,
perhaps it just needs a little polishing (Polish-ing? :) .
Yes, this project is in fact still on my list... I do have a partial
implementation of Thinlet API that emulates it in Swing. With a JGoodies
look and feel the
Hello everyone,
I'm looking for feedback and thoughts on the following problem (it's more of a
development problem than a user-centered one; I hope the dev list is appropriate):
- a token stream is given,
- a set of "synonyms" is given, where synonyms are token sequences to be matched
and token seque
Your synonyms will break if you try searching for phrases.
Good point, I did write that filter, but I never actually got to searching for
exact phrases in it (there was a very specific scenario and we used prefix
queries which worked quite well).
Building on your example, "food place in n
Well, everyone has his own requirements for the search quality. For us
it was a problem.
The topic is subjective... I don't see this as a deterioration in search
quality. Let me explain.
Your example concerns phrase queries, so somebody would have to keep adding
terms to a phrase. My exper
engine. So guys looking for "MSU CMC" really want to get "Московский
Государственный Университет, факультет ВМиК" and his friends.
And? How often do they extend this particular phrase with further terms? It must
be fun to have an index running concurrently on multi language synonyms, mixing
It'd be great to get multi-word synonyms fully working...
I agree -- this is something that seems to be useful for a wider bunch of
people.
How would you change how Lucene indexes token positions to do this "correctly"?
Kirill has some interesting points to this. I have a busy day today,
Apologies for the delay, guys. I tried to solve certain issues that didn't pop
up in my application (as Kirill said, the problem is indeed quite complex). I
didn't find all the answers I had been looking for, but nonetheless -- the patch
that works for my needs is in JIRA. I would be really in
Hi there,
Is there anyone with access to an Intel I7-machine? I'd be curious
what the results of this benchmark are, given the new JVM intrinsics
introduced in HotSpot 1.7:
https://issues.apache.org/jira/browse/LUCENE-2221
There is an executable JAR file attached to the issue. Run with (must
be
I like it too. And I'm wondering what the response to this will be -- it will
in a way show if TREC really stands up to their mission, won't it?
D.
Grant Ingersoll wrote:
How does this sound:
Dear ,
My name is Grant Ingersoll and I am a committer on the Lucene Java search
library (http:
It's more of a chicken-and-egg problem I guess; it's the same with E.U. grants
and local science grants over here (Poland) -- the government funds some
projects, but who if not us funds the government? I am a strong believer that
the results of public grants should be open and available for ev
This gets even more complicated when you throw Polish in. We do have diacritics
(such as ó, ż, ź or ą)
http://www.fileformat.info/info/unicode/char/0105/index.htm
but we _also_ have things like "ł" (l with a stroke):
http://www.fileformat.info/info/unicode/char/0142/index.htm
I don't think
I'm putting together a Google Web Toolkit-based version of Luke:
http://www.inperspective.com/lucene/Luke.war
This is neat, Mark!
At first I thought: darn, how the heck is he accessing the filesystem from
JavaScript (GWT or otherwise)?! Then it became clear to me that it's actually
the _
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802927#action_12802927
]
Dawid Weiss commented on LUCENE-2221:
-
Results from Intel I7 -- an improvemen
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: benchmark.jar)
> Micro-benchmarks for ntz and pop (BitUtils) operati
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmark.jar
An updated set of benchmarks (simple loops and JRE ntz/pop).
> Mi
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: lucene-bitset-benchmarks.zip)
> Micro-benchmarks for ntz and pop (BitUt
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: lucene-bitset-benchmarks.zip
Updated source code for the benchmarks.
> Mi
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803251#action_12803251
]
Dawid Weiss commented on LUCENE-2221:
-
Confirmed, with a simple loop it is
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss resolved LUCENE-2221.
-
Resolution: Later
I'm done with these benchmarks. The results so far indicate that
a)
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804166#action_12804166
]
Dawid Weiss edited comment on LUCENE-2221 at 1/23/10 10:5
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848054#action_12848054
]
Dawid Weiss commented on LUCENE-2298:
-
Staszek suggested that perhaps it woul
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848139#action_12848139
]
Dawid Weiss commented on LUCENE-2298:
-
I agree about classpath issues, they'
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848270#action_12848270
]
Dawid Weiss commented on LUCENE-2298:
-
The answer from the developer is: pick
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848648#action_12848648
]
Dawid Weiss commented on LUCENE-2298:
-
The dictionary's author st
[
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848649#action_12848649
]
Dawid Weiss commented on LUCENE-2341:
-
Robert, should I wait for Stempel patch f
[
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849114#action_12849114
]
Dawid Weiss commented on LUCENE-2341:
-
Oh, I forgot about this -- yes, you'
[
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436972 ]
Dawid Weiss commented on LUCENE-675:
First -- I think it's a good initiative. Grant, when you're thinking about the
infrastructure, it would be pret
Feature
Components: contrib/*
Reporter: Dawid Weiss
Priority: Minor
Attachments: synonyms.patch
It would be useful to have a filter that provides support for indexing-time
synonym expansion, especially for multi-word synonyms (with multi-word matching
for
[
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-1622:
Attachment: synonyms.patch
Token filter implementing synonyms. Java 1.5 is required to compile it
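
To make the idea of indexing-time multi-word synonym expansion concrete, here is a minimal sketch in plain Java. It is not the attached synonyms.patch and does not use the Lucene TokenFilter API; the class SynonymSketch, the Tok type and the "big apple" / "nyc" example are all hypothetical. It assumes the simplest possible strategy: every original token is kept, and the tokens of a matched synonym are stacked (position increment 0) on the first token of the match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the code from synonyms.patch.
final class SynonymSketch {
    /** A token and its position increment (0 = same position as the previous token). */
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "+" + posInc;
        }
    }

    /**
     * Keeps every input token (increment 1) and, wherever a synonym key matches
     * the token sequence starting at that token, adds the replacement tokens
     * at the same position (increment 0).
     */
    static List<Tok> expand(List<String> tokens, Map<List<String>, List<String>> synonyms) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(new Tok(tokens.get(i), 1));
            for (Map.Entry<List<String>, List<String>> e : synonyms.entrySet()) {
                List<String> key = e.getKey();
                int end = i + key.size();
                if (!key.isEmpty() && end <= tokens.size() && tokens.subList(i, end).equals(key)) {
                    for (String replacement : e.getValue()) {
                        out.add(new Tok(replacement, 0));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put(Arrays.asList("big", "apple"), Arrays.asList("nyc"));
        System.out.println(expand(Arrays.asList("the", "big", "apple"), synonyms));
    }
}
```

Stacking a whole replacement on a single position is the simplest choice, but it loses the real span of the multi-word match, which is one reason phrase searches over expanded text are hard to get right.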
Components: Other
Affects Versions: 3.0, 2.9.1, 2.9
Reporter: Dawid Weiss
Priority: Minor
OpenBitSet uses an internal buffer of long variables to store set bits and an
additional 'wlen' index that points
to the highest used component inside {@link #bi
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2216:
Attachment: openbitset.patch
> OpenBitSet#hashCode() may return false for identical s
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801160#action_12801160
]
Dawid Weiss commented on LUCENE-2213:
-
Not to be picky, Michael, but is
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801195#action_12801195
]
Dawid Weiss commented on LUCENE-2216:
-
Hi Yonik,
This class is not thread-
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801198#action_12801198
]
Dawid Weiss commented on LUCENE-2216:
-
Perhaps this is for another patch, but Bit
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801221#action_12801221
]
Dawid Weiss commented on LUCENE-2216:
-
This is only true if there is happens-be
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801230#action_12801230
]
Dawid Weiss commented on LUCENE-2216:
-
This is not entirely what I had in mind (
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801230#action_12801230
]
Dawid Weiss edited comment on LUCENE-2216 at 1/16/10 5:2
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801240#action_12801240
]
Dawid Weiss commented on LUCENE-2216:
-
uff, I started having doubts in my
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801263#action_12801263
]
Dawid Weiss commented on LUCENE-2216:
-
Chances of this happening are really
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801265#action_12801265
]
Dawid Weiss commented on LUCENE-2216:
-
For what it's worth, I checked the
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801269#action_12801269
]
Dawid Weiss commented on LUCENE-2216:
-
Ok, argument accepted.
> OpenBitSet#h
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801272#action_12801272
]
Dawid Weiss commented on LUCENE-2216:
-
Ah, ok -- I thought ntz in BitUtils is
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801278#action_12801278
]
Dawid Weiss commented on LUCENE-2213:
-
What Yonik suggested is yet ano
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801371#action_12801371
]
Dawid Weiss commented on LUCENE-2213:
-
How about if you assert that minTargetSiz
Components: Other
Reporter: Dawid Weiss
Priority: Trivial
As suggested by Yonik, I performed a suite of micro-benchmarks to investigate
the following:
* pop() (bitCount) seems to be implemented in the same way ("hacker's delight")
as in the BitUtils class
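
For reference, a minimal sketch of the two operations such micro-benchmarks compare. It assumes the standard bit-parallel ("Hacker's Delight") population count and a naive loop for trailing zeros. The class name BitOpsSketch is hypothetical and this is not the BitUtils source. Long.bitCount and Long.numberOfTrailingZeros are the JDK counterparts, which HotSpot may compile to hardware instructions on CPUs that provide them.

```java
// Hypothetical illustration; not Lucene's BitUtils.
final class BitOpsSketch {
    /** Population count using the bit-parallel reduction from "Hacker's Delight". */
    static int pop(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);                          // 2-bit sums
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);  // 4-bit sums
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;                          // 8-bit sums
        x = x + (x >>> 8);
        x = x + (x >>> 16);
        x = x + (x >>> 32);
        return (int) x & 0x7F;                                              // total fits in 7 bits
    }

    /** Number of trailing zero bits using a simple loop; returns 64 for zero. */
    static int ntz(long x) {
        if (x == 0) {
            return 64;
        }
        int n = 0;
        while ((x & 1L) == 0) {
            x >>>= 1;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 0xFFL, 0x8000L, Long.MIN_VALUE, -1L};
        for (long v : samples) {
            // Print the hand-written result next to the JDK result for comparison.
            System.out.printf("%016x pop=%d/%d ntz=%d/%d%n",
                v, pop(v), Long.bitCount(v), ntz(v), Long.numberOfTrailingZeros(v));
        }
    }
}
```

A real benchmark would also need warm-up iterations and protection against dead-code elimination, which this sketch leaves out.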
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: results-popntz.txt
Performance test results.
> Micro-benchmarks for ntz and
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801733#action_12801733
]
Dawid Weiss commented on LUCENE-2221:
-
Yes, this would be my initial sugges
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801858#action_12801858
]
Dawid Weiss commented on LUCENE-2221:
-
Look closely at the results above, Yoni
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: results-popntz.txt)
> Micro-benchmarks for ntz and pop (BitUt
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: results-popntz.txt
Plain ASCII results.
> Micro-benchmarks for ntz and pop (BitUt
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801888#action_12801888
]
Dawid Weiss commented on LUCENE-2221:
-
I had a suspicion this must be the cas
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmarks.txt
Benchmark results for array operations and iterators comparing the
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmark.jar
Executable Java JAR with benchmarking code for anybody that wishes to
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802823#action_12802823
]
Dawid Weiss commented on LUCENE-2221:
-
I wrote a set of micro-benchmarks compa
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: lucene-bitset-benchmarks.zip
Benchmarks, source code.
> Micro-benchmarks for ntz
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802900#action_12802900
]
Dawid Weiss commented on LUCENE-2221:
-
I do have a bunch of dinosaur-age compu
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521201
]
Dawid Weiss commented on LUCENE-871:
Not exactly true, Mike. Switch statements are implemented as table lookups
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521340
]
Dawid Weiss commented on LUCENE-871:
I guess it's a matter of just writing down two versions and comparing
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521350
]
Dawid Weiss commented on LUCENE-871:
Funny -- I just did the same, but my compiler (Eclipse JDT) generated a
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521353
]
Dawid Weiss commented on LUCENE-871:
To clarify: depending on the compiler/HotSpot you may get linear time
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521361
]
Dawid Weiss commented on LUCENE-871:
I was a bit curious about it, so I decided to write a table-lookup version
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-871:
---
Attachment: ISOLatin1AccentFilterAlt.java
A table-lookup version of ISO latin filter (this is not a