AFAIK this is still under heavy development and doesn't seem likely to be ready
in the near future.
It's stable as far as I'm concerned.
LUCENE-2454 includes the code and JUnit tests that work with the latest 3.0.3
release. I have versions of this running in production with 2.4 and 2.9-based
I think what would be best is a smallish but feature-complete demo,
For the nested stuff I had a reasonable demo on LUCENE-2454 that was based
around resumes - that use case has the one-to-many characteristics that lend
themselves to nested, e.g. a person has many different qualifications and
There are a number of scenarios where Lucene might be used to index a fixed
time range on a continuous stream of data e.g. a news feed.
In these scenarios I imagine the following facilities would be useful:
a) A MergePolicy that organizes content into segments on the basis of
increasing time
you can do that by subclassing IW and calling some package-private APIs
To date I have used separate physical indexes with a MultiReader to combine
them, then dropped the outdated indexes.
At least this has the benefit that a custom MergePolicy is not required to keep
content from the
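For illustration, here's a minimal sketch of that rotation scheme (3.x-era API;
directory handling and error paths omitted):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

// One physical index per time slice; searches go through a MultiReader and
// outdated slices are simply dropped from the array on rollover.
class TimeSlicedSearch {
    static IndexSearcher openSearcher(Directory[] liveSlices) throws Exception {
        IndexReader[] readers = new IndexReader[liveSlices.length];
        for (int i = 0; i < liveSlices.length; i++) {
            readers[i] = IndexReader.open(liveSlices[i], true); // read-only
        }
        return new IndexSearcher(new MultiReader(readers));
    }
}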
Good to have you aboard, Greg!
- Original Message -
From: Erick Erickson erickerick...@gmail.com
To: dev@lucene.apache.org
Cc:
Sent: Thursday, 21 June 2012, 11:56
Subject: Welcome Greg Bowyer
I'm pleased to announce that Greg Bowyer has been added as a
Lucene/Solr committer.
Greg:
I have been working on a hierarchical search capability for a while now and
wanted to see if there was general interest in adopting some of the thinking
into Lucene.
The idea needs a little explanation so I've put some slides up here to kick
things off:
I've put up code, example data and tests for the Nested Document feature here:
http://www.inperspective.com/lucene/LuceneNestedDocumentSupport.zip
The data used in the unit tests is chosen to illustrate practical use of
real-world content.
The final unit tests will work on more abstract data
of Luke that Mark Harwood started ever get dumped to
JIRA or anything? All I can find is a link to a war, but not the source.
Mark? Anyone?
- Mark
it under lucene contrib?
Thanks
-John
On Fri, Jul 9, 2010 at 7:26 AM, Mark Harwood markharw...@yahoo.co.uk wrote:
See
http://search.lucidimagination.com/search/document/63cef9e98692a126/webluke_include_jetty_in_lucene_binary_distribution
There's a link to a zip file with source
Agreed. I think Apache is a preferable home.
The major change to Luke in providing a Luke core API is the need to be
remotable, i.e. use of an interface and serializable data objects for args.
GWT RPC should take care of the marshalling and I've used similar frameworks
for applet clients.
Due to the odd behaviour of a custom Scorer of mine I discovered
ConjunctionScorer.doNext() could loop indefinitely.
It does not bail out as soon as any scorer.advance() call it makes reports back
NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting
as soon as this happens?
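For what it's worth, a self-contained toy illustration of the early exit in
question - a leapfrog intersection over sorted doc-id arrays, not the actual
ConjunctionScorer code:

import java.util.Arrays;

// Toy leapfrog intersection over sorted doc-id lists (illustration only):
// return from the loop the moment any clause reports exhaustion instead of
// advancing the remaining clauses.
public class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // First doc >= target in a sorted list, or NO_MORE_DOCS.
    static int advance(int[] docs, int target) {
        int i = Arrays.binarySearch(docs, target);
        if (i < 0) i = -i - 1;
        return i < docs.length ? docs[i] : NO_MORE_DOCS;
    }

    static void intersect(int[][] lists) {
        int target = 0;
        while (true) {
            boolean allMatch = true;
            for (int[] list : lists) {
                int doc = advance(list, target);
                if (doc == NO_MORE_DOCS) return; // the early exit in question
                if (doc != target) { target = doc; allMatch = false; break; }
            }
            if (allMatch) {
                System.out.println("match: " + target);
                target++;
            }
        }
    }

    public static void main(String[] args) {
        intersect(new int[][] { {1, 3, 7, 9}, {3, 7, 8}, {2, 3, 5, 7} });
    }
}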
- Original Message -
From: mark harwood markharw...@yahoo.co.uk
To: dev@lucene.apache.org dev@lucene.apache.org
Cc:
Sent: Thursday, 1 March 2012, 9:39
Subject: ConjunctionScorer.doNext() overstays?
Due to the odd behaviour of a custom Scorer of mine I discovered
ConjunctionScorer.doNext
; mark harwood markharw...@yahoo.co.uk
Cc:
Sent: Thursday, 1 March 2012, 13:31
Subject: Re: ConjunctionScorer.doNext() overstays?
Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs
the one-time cost at the end of calling advance(NO_MORE_DOCS) for each
sub-clause? I think
class could avoid a docID() method
invocation?
Anyhoo the profiler did not show that method up as any sort of hotspot so I
don't think it's an issue.
Thanks, Mike.
- Original Message -
From: Michael McCandless luc...@mikemccandless.com
To: dev@lucene.apache.org; mark harwood
Ideally, consumers of DISI should hold onto the int docID returned
from next/advance and use that... (ie, don't call docID() again,
unless it's too hard to hold onto the returned doc).
Yes, I remember raising that way back when:
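The consumption pattern being described, as a minimal sketch:

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

// Keep the int that nextDoc()/advance() returns instead of calling docID()
// again on each hit.
class DisiLoop {
    static void consume(DocIdSetIterator it) throws IOException {
        for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS;
                doc = it.nextDoc()) {
            // use 'doc' directly; a second docID() call is avoidable overhead
        }
    }
}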
Does anyone have any ideas?
A framework for match metadata?
Similar to the way tokenization was changed to allow tokenizers to enrich a
stream of tokens with arbitrary attributes, Scorers could provide
MatchAttributes to provide arbitrary metadata about the stream of matches
they produce.
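To make the idea concrete, a purely hypothetical shape for such an API - none
of these types exist in Lucene, the names simply mirror the TokenStream
attribute design:

// Hypothetical sketch only; these interfaces are not part of Lucene.
interface MatchAttribute {
    // marker for arbitrary per-match metadata (e.g. which fuzzy variant
    // matched, proximity distance, matching spans...)
}

interface MatchAttributeSource {
    // Mirrors TokenStream.addAttribute(): a consumer registers interest and
    // reads the attribute after each successful nextDoc()/advance().
    <A extends MatchAttribute> A addMatchAttribute(Class<A> attClass);
}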
Hi Mark
I've played with Shingles recently in some auto-categorisation work where my
starting assumption was that multi-word terms will hold more information value
than individual words and that phrase queries on separate terms will not give
these term combos their true reward (in terms of IDF)
be content to just compare it to baseline random chance.
Mark B
--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
On Fri, Sep 10, 2010 at 3:17 AM, mark harwood markharw...@yahoo.co.uk wrote:
Hi Mark
I've played
I've been looking at Graph Databases recently (neo4j, OrientDb, InfiniteGraph)
as a faster alternative to relational stores. I notice they either embed Lucene
for indexing node properties or (in the case of OrientDB) are talking about
doing this.
I think their fundamental performance
It should be possible to randomly add and delete such relationships after
indexWriter.addDocument(), is that the idea?
Yes. A 'like' action may, for example, allow me to tag an existing document by
connecting 2 documents - my personal like document and a document with
content of interest.
This slideshow has a first-cut on the Lucene file format extensions required to
support fast linking between documents:
http://www.slideshare.net/MarkHarwood/linking-lucene-documents
Interested in any of your thoughts.
Cheers,
Mark
/document/c871ea4672dda844/aw_incremental_field_updates#7ef11a70cdc95384
[2]
http://www.lucidimagination.com/search/document/ee102692c8023548/incremental_field_updates#13ffdd50440cce6e
On Sep 24, 2010, at 10:36 AM, mark harwood wrote:
This slideshow has a first-cut on the Lucene file format
path finding analysis is perhaps not a typical Lucene application, but
other forms of link analysis, e.g. recommendation engines, require similar
performance.
Cheers
Mark
On 25 Sep 2010, at 11:41, Paul Elschot wrote:
On Friday 24 September 2010 17:57:45, mark harwood wrote:
While not exactly
Perhaps another way of thinking about the problem:
Given a large range of IDs (e.g. your 300 million) you could constrain the number
of unique terms using a double-hashing technique, e.g.
pick a number n for the max number of unique terms you'll tolerate, e.g. 1
million, and store 2 terms for every
Good point, Toke. Forgot about that. Of course doubling the number of hash
algos used to 4 increases the space massively.
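A minimal sketch of the double-hashing idea (my own illustration, not code
from the thread; CRC32 stands in for any hash family):

import java.util.zip.CRC32;

// Index k differently-salted hashes of each ID as terms; look an ID up with a
// BooleanQuery of k MUST clauses. Only an ID colliding on all k hashes at
// once yields a false positive - the case Toke raises.
public class HashedIdTerms {
    static final int N = 1000000; // max unique terms tolerated per hash family

    static String[] termsFor(long id, int k) {
        String[] terms = new String[k];
        for (int salt = 0; salt < k; salt++) {
            CRC32 crc = new CRC32();
            crc.update((salt + ":" + id).getBytes());
            terms[salt] = salt + "_" + (crc.getValue() % N); // e.g. "0_734120"
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String t : termsFor(300000000L, 2)) {
            System.out.println(t); // the 2 terms to index (and to AND at query time)
        }
    }
}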
On 21 Oct 2010, at 22:51, Toke Eskildsen t...@statsbiblioteket.dk wrote:
Mark Harwood [markharw...@yahoo.co.uk]:
Given a large range of IDs (eg your 300 million) you
Look at BooleanQuery with 2 must clauses - one for the query, one for a
ConstantScoreQuery wrapping the filter.
BooleanQuery should then automatically use skips when reading matching docs
from the main query and skip to the next docs identified by the filter.
Give it a try, otherwise you may
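A minimal sketch of that combination (3.x-era API; the field name and range
are illustrative):

import org.apache.lucene.search.*;

class FilteredQueryExample {
    static Query combine(Query userQuery) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(userQuery, BooleanClause.Occur.MUST);
        // The constant-score clause contributes no real score; BooleanQuery
        // leapfrogs between the clauses using postings skip lists.
        bq.add(new ConstantScoreQuery(
                new TermRangeFilter("status", "a", "m", true, true)),
                BooleanClause.Occur.MUST);
        return bq;
    }
}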
Here's a rough overview I mapped out as a sequence diagram for the search side
of things some time ago: http://goo.gl/lE6a
- Original Message
From: Jeff Zhang zjf...@gmail.com
To: dev@lucene.apache.org
Sent: Mon, 1 November, 2010 5:43:08
Subject: How can I get started for
@lucene.apache.org
Sent: Mon, 8 November, 2010 19:03:59
Subject: Re: Document links
Any updates/progress with this?
I'm looking at ways to implement an RTree with Lucene -- and this
discussion seems relevant
thanks
ryan
On Sat, Sep 25, 2010 at 5:42 PM, mark harwood markharw...@yahoo.co.uk wrote
, Ryan McKinley ryan...@gmail.com wrote:
On Mon, Nov 8, 2010 at 2:52 PM, mark harwood markharw...@yahoo.co.uk wrote:
I came to the conclusion that the transient meaning of document ids is too
deeply ingrained in Lucene's design to use them to underpin any reliable
linking.
What about if we define
I was using within-segment doc ids stored in link files named after both the
source and target segments (a link after all is 2 endpoints).
For a complete solution you ultimately have to deal with the fact that doc ids
could be references to:
* Stable, committed docs (the easy case)
* Flushed but
I've been looking at the BlockJoin stuff in 3.4 in relation to children of
multiple types and have a couple of concerns which are either issues, or my
ignorance of the API:
Concern #1
If I only retrieve children of type A all is well.
If I only retrieve children of type B all is well.
limited by the number of docs you can hold in RAM as part of the
original IW.addDocuments call - i.e. not in the millions.
Cheers,
Mark
- Original Message -
From: Michael McCandless luc...@mikemccandless.com
To: dev@lucene.apache.org; mark harwood markharw...@yahoo.co.uk
Cc:
Sent
I've been spending quite a bit of time recently benchmarking various Key-Value
stores for a demanding project and have been largely disappointed with the results.
However, I have developed a promising implementation based on these concepts:
http://www.slideshare.net/MarkHarwood/lucene-kvstore
The code
Did you try all the well known ones?
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
-- J
On Thu, Mar 22, 2012 at 10:42 AM, mark harwood markharw...@yahoo.co.uk
wrote:
I've been spending quite a bit of time recently benchmarking various
Key-Value stores
Random question: Do you basically end up with something very similar to
LevelDB that many people were talking about a few weeks ago?
Haven't looked at LevelDB because I was concentrating on Java implementations.
Riak's Bitcask is the most similar in principle but I didn't like the
OK, I have some code and benchmarks for this solution up on a Google Code
project here: http://code.google.com/p/graphdb-load-tester/
The project exists to address the performance challenges I have encountered
when dealing with large graphs. It uses all of the Wikipedia links as a test
dataset
Instead of making other APIs to accommodate BloomFilter's current
brokenness: remove its custom per-field logic so it works with
PerFieldPostingsFormat, like every other PF.
Not looked at it in a while but I'm pretty certain, like every other PF, you
can go ahead and use PerFieldPF with Bloom
+1
On 2020/05/12 07:36:57, Dawid Weiss wrote:
> Dear Lucene and Solr developers!
>
> According to an earlier [DISCUSS] thread on the dev list [2], I am
> calling for a vote on the proposal to make Solr a top-level Apache
> project (TLP) and separate Lucene and Solr development into two
>
In LUCENE-9445 we'd like to add a case-insensitive option to regex queries
in the query parser of the form:
/Foo/i
However, today people can search for:
/foo.com/index.html
and not get an error. The searcher may think this is a query for a URL but
it's actually parsed as a regex
> I was always very skeptical of adding the regexes, as it breaks
> many queries. Now it's even more.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> *From:* Mark
>You could avoid (some of?) these problems by supporting /(?i)foo/ instead
of /foo/i
That would avoid our parsing dilemma but brings some other concerns. This
inline syntax can normally be used to selectively turn on case insensitivity
for sections of a regex and then turn it off with (?-i).
We
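An illustration of that inline-flag behaviour using Java's own regex engine
(Lucene's RegExp syntax is more restricted):

import java.util.regex.Pattern;

class InlineFlagDemo {
    public static void main(String[] args) {
        // (?i) turns case-insensitivity on, (?-i) turns it back off,
        // so only the middle section matches either case.
        Pattern p = Pattern.compile("foo(?i)bar(?-i)baz");
        System.out.println(p.matcher("fooBARbaz").matches()); // true
        System.out.println(p.matcher("FOObarbaz").matches()); // false
    }
}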
> In my opinion, the proposed syntax change should enforce having whitespace
> or any other separator char after the regex "i" parameter.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
Responses were light last time around:
I'd like to propose that Wolfgang Hoschek be given
commit rights to maintain his MemoryIndex contribution.
That would use more memory, but still permit ranked
searches. Worth it?
Not sure. I expect FuzzyQuery results would suffer if
the edit distance could no longer be factored in. At
least there's a quality threshold to limit the more
tenuous matches but all matches below the threshold
would be
The Highlighter in the lucene contrib section has a
class called TokenSources which tries to find the best
way of getting a TokenStream.
It can build a TokenStream from either:
a) an Analyzer
b) TermPositionVector (if the field was created with
one in the index)
You may find that using
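A sketch of that fallback in use (3.x contrib API; the field name is
illustrative):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.highlight.TokenSources;

class HighlightSource {
    static TokenStream streamFor(IndexReader reader, int docId, Analyzer analyzer)
            throws Exception {
        // Uses the TermPositionVector if the field was indexed with one,
        // otherwise re-analyzes the stored field text with the analyzer.
        return TokenSources.getAnyTokenStream(reader, docId, "body", analyzer);
    }
}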
See IBM's UIMA project or Gate for Entity extraction
tools.
Cheers
Mark
ps this is a java-user question, not a java-dev topic.
--- Mario Alejandro M. [EMAIL PROTECTED] wrote:
I'm building a search engine coupled with database
info... I want to detect
things like phone numbers, addresses,
a) do any other committers want a license, and
I'd appreciate a license.
b) would we be willing to put their logo somewhere
in exchange?
That seems a fair exchange provided that we 1) find
the product useful and 2) it doesn't contravene
any Apache directives about use of their
I don't think DOM and RAM is necessarily an issue.
The object construction process accesses the content
in the same order that a SAX based path takes so that
just seems an appropriate approach. There is no need
to leap around the structure in any other way from
what I can see, which is where DOM
However the moment you are promoting INTEROPERABILITY
with other
search/retrieval systems by XMLizing the query input
and the result output, like Mark is, then it makes
sense to adhere to standards
I think this is hijacking my original intentions to
some extent. I may be accused of being
Hi Chris,
Thanks for taking the time to review this.
1) I applaud the pluggable nature of your solution.
That's definitely a worthwhile objective.
2) Digging into what was involved in writing an
ObjectBuilder, I found...
don't really feel like
the API has a very clean separation from SAX.
I suspect it's a little too ambitious to provide a
unifying common abstraction which wraps event based
*and* pull parser approaches.
I'm personally happier to stick with one approach,
preferably with an existing, standardized interface
which lets me switch implementations. I didn't really
want
Yes, I've found MemoryIndex to be very fast for this
kind of thing. This contribution can be used to
further optimize and shortlist the queries to be run
against the new document sat in MemoryIndex.
This example code looks interesting. If I understand
correctly using this approach requires that builders
like the q QueryObjectBuilder instance must be
explicitly registered with each and every builder that
consumes its type of output eg BQOB and FQOB. An
alternative would be to register q just
I've just been doing some benchmarking on a reasonably
large-scale system (38 million docs) and ran into an
issue where certain *very* common terms would
dramatically slow query responses.
Some terms were abnormally common because I had
constructed the index by taking several copies and
merging
Thanks for the comments, Chris/Doug.
Chris, although I suggested it initially, I'm now a
little uncomfortable in controlling this issue with a
static variable in TermQuery because it doesn't let me
have different settings for different queries, indexes
or fields.
Doug, I'd ideally like to optimize
Before I commit this stuff to contrib I wanted to
sound out dev members on directions for this code.
We currently have an extensible parser with composable
builder modules. These builders currently only have
a role in life which involves parsing particular XML
chunks and instantiating the related
I don't think option 3 is baked in at indexing time.
Sorry, I misread it. Yes, that is another option.
So if options 3 and 4 are about search-time selection
(based on size and fieldname respectively) can they be
generalized into a more wide-reaching retrieval API?
You can imagine a high-level
Having switched the highlighter over from lots of
Query-specific code to using the generic
Query.extractTerms API I realize I have both gained
something (support for all query types) and lost
something (detailed boost info for each term in the
tree, e.g. Fuzzy spelling variants). The boost info was
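The generic extraction path referred to, as a sketch (era API; rewrite first
expands fuzzy/wildcard queries to concrete terms, but the rewritten tree's
per-term boosts are not surfaced by this call):

import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;

class TermExtraction {
    static Set<Term> termsOf(Query q, IndexReader reader) throws Exception {
        Set<Term> terms = new HashSet<Term>();
        // rewrite() is required: MultiTermQuery subclasses throw
        // UnsupportedOperationException from extractTerms() until rewritten.
        q.rewrite(reader).extractTerms(terms);
        return terms;
    }
}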
It's still the case that you often need to know what
type of query the
parent is.
For highlighting purposes I typically don't need/want
to concern myself too much with precisely interpreting
the specifics of all Query logic:
* For Boolean queries the mustNot terms typically
don't appear in the
If you are wanting to select highlights from a
document where only whole sentences are the fragments
selected you will need to implement a custom
Fragmenter class.
This will need to look for sentence boundaries, e.g. a
"." followed by whitespace only, then a word with an
uppercase first character.
I
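A minimal sketch of that boundary test, not tied to any particular Highlighter
version's Fragmenter signature:

class SentenceBoundary {
    // True when the '.' at pos is followed by whitespace and then an
    // uppercase-initial word - the heuristic described above.
    static boolean isBoundary(String text, int pos) {
        if (text.charAt(pos) != '.') return false;
        int i = pos + 1;
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
        return i > pos + 1 && i < text.length()
                && Character.isUpperCase(text.charAt(i));
    }

    public static void main(String[] args) {
        String s = "See Fig. 3 for details. The results follow.";
        System.out.println(isBoundary(s, s.indexOf(". The"))); // true
        System.out.println(isBoundary(s, s.indexOf(". 3")));   // false: no uppercase
    }
}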
I added something similar to Luke but without the
colour intensity - I may add your code in to do this.
Another Luke plugin I have visualizes vocabulary
growth for a field as a chart over time. This is
useful to see if a field is matured or is still
accumulating new terms.
A Zipf term distribution
I can pick this up, but I don't think I've got much
more bandwidth than Andrzej to work on it.
I certainly don't have the time now for a port to an
Apache-friendly GUI framework but ultimately I think
Luke should end up under the contrib section where
it can be managed and benefit from the
FWIW, I integrated sourceforge's SecondString algos
(http://secondstring.sourceforge.net/javadoc ) and others using a callout
interface which boiled down to:
float getDifference(String a, String b)
This seemed to be the cleanest lowest-common-denominator standard for plugging
in string
Given the trouble people routinely get themselves into using RangeQuery, would
it make sense to change the rewrite method to generate a ConstantScoreQuery
wrapping a RangeFilter?
The only disadvantages I can see would be:
1) Scoring would change - some users may find their apps produce
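A sketch of the suggested rewrite target (2.x-era classes; field and bounds
are illustrative):

import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeFilter;

class ConstantScoreRange {
    static Query priceRange() {
        // Executes the range as a filter: no BooleanQuery of expanded terms
        // is built, so TooManyClauses cannot be hit.
        return new ConstantScoreQuery(
                new RangeFilter("price", "000", "100", true, true));
    }
}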
are there any legitimate use cases for calling rewrite other than when a
Searcher is about to execute the query?
When using the highlighter it is recommended to use a rewritten query e.g. to
get all the variations for a fuzzy query.
However I don't think there should be a problem with the
Any objections to me adding this read-only method to ConstantScoreQuery?
I need to discover RangeFilters etc wrapped in ConstantScoreQuerys as part of a
generic query optimiser/analyser.
Cheers,
Mark
Hi Rida,
I've been talking with Jukka Zitting (involved in Nutch) about parsing/Tika and
we started to sketch out some project objectives on the Wiki over there which
may be of interest:
http://code.google.com/p/tika/w/list
I recently did a round-up of the main open source projects which
Is it correct to compare using '==', or should equals be used instead?
In this context it is OK. Term fieldnames are deliberately interned using
String.intern() so this equality test can be used.
The intention is to make comparisons faster.
Cheers,
Mark
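A quick illustration of why '==' is safe once both sides are interned:

class InternDemo {
    public static void main(String[] args) {
        String a = new String("title").intern();
        String b = "title"; // string literals are interned by the JVM
        System.out.println(a == b);      // true: same canonical instance
        System.out.println(a.equals(b)); // true too, but a char-by-char compare
    }
}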
- Original Message
From: dmitri
Thanks for the pointers Paul.
I just don't think you can 'package' up a distribution that includes these
jars.
Clearly the binary distribution need not bundle servlet-api.jar - a demo.war
file is all that is needed.
However, is the source distribution exempt from this
Mostly, though, I think it gives Lucene Java the feel that we are behind.
Isn't 1.6 the actual official release at this point?
I wouldn't say behind, just concerned about enabling Lucene for all - in the
same way popular websites might choose broad accessibility over using the
latest AJAX
Subject: Re: Fwd: Decouple Filter from BitSet: API change and xml query parser
On Friday 10 August 2007 13:12, mark harwood wrote:
Could someone give me a clue as to why the test case
TestRemoteCachingWrapperFilter fails with the patch applied?
Regardless of the reasons for this particular
and DocIdSetIterator currently part of Lucene?
I don't know how to go about these.
Regards,
Paul Elschot
-- Forwarded Message --
Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
Date: Friday 10 August 2007 01:15
From: Mark Harwood (JIRA) [EMAIL PROTECTED
This is neat, Mark!
Thanks - GWT rocks.
Then it became clear to me that it's
actually the _remote_ filesystem one is looking at (the server's).
Yes, that's a potentially worrying security issue that needs locking down
carefully. I think one mode of operation should be that Luke Server is
I don't know that we have ever checked in IDE settings
GWT development is much easier with the IDE and there is a fair amount of
manual setup required without the settings to run the hosted development
environment. Hosted development is the key productivity benefit and allows
debugging in Java
Hi Manik,
Is there a set of tests in the Lucene sources I could use to test the
JBCDirectory, as I call it?
You would probably need to adapt existing JUnit tests in
contrib/benchmark and src/test for performance and functionality
testing, respectively.
They use the
I'm chasing down a bug in my application where multiple threads were reading and
caching the same filter (same very common term, big index), causing an Out of
Memory exception when I would expect there to be plenty of memory to spare.
There are a number of layers to this app to investigate (I was
(reader). This is safe when the cache is private.
Regards,
Paul Elschot
On Monday 18 February 2008 13:50:16, mark harwood wrote:
I'm chasing down a bug in my application where multiple threads were
reading and caching the same filter (same very common term, big index),
causing an Out of Memory
Why not use Ivy or Maven for that?
That would resurrect the Ant vs Maven debate around build systems. Not having
used Maven I don't feel qualified to comment.
Stefan, the Winstone server appears to be LGPL, not Apache, which also adds some
complexity. The GWT compiler is the main cause of the
Not tried SweetSpot so can't comment on worthiness of moving to core but agree
with the principle that we can't let the hassles of a company's due diligence
testing dictate the shape of core vs contrib.
For anyone concerned with the overhead of doing these checks a company/product
of potential
One way is to read TermDocs for each candidate term and see if they are in your
filter - but that sounds like a lot of disk IO to me when responding to
individual user keystrokes.
You can use skip to avoid reading all term docs when you know what is in the
filter but it all seems a bit costly.
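A sketch of that skip-based check (pre-4.0 API; the field name is
illustrative):

import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

class TermInFilter {
    // Leapfrog between the term's postings and the filter's set bits rather
    // than scanning every posting for the term.
    static boolean termMatchesFilter(IndexReader reader, String word, BitSet filter)
            throws Exception {
        TermDocs td = reader.termDocs(new Term("suggest", word));
        try {
            int target = filter.nextSetBit(0);
            while (target >= 0 && td.skipTo(target)) {
                if (filter.get(td.doc())) return true; // term occurs in a filtered doc
                target = filter.nextSetBit(td.doc());  // jump to next filter candidate
            }
            return false;
        } finally {
            td.close();
        }
    }
}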
Interesting discussion.
I think we should seriously look at joining efforts with open-source Database
engine projects
I posted some initial dabblings here with a couple of the databases on your
list: http://markmail.org/message/3bu5klzzc5i6uhl7 but this is not really a
scalable solution
You might want to try the XML query parser in contrib. I deliberately created
this to allow remote clients to have full control over lucene (filters, caching
etc) without trying to bloat the standard query parser with special characters.
On 13 Sep 2008, at 18:26, Shai Erera [EMAIL PROTECTED]
since not many people, I think, even use the RMI stuff
I certainly binned RMI in my distributed work.
It just would not reliably stop/restart cleanly in my experience - despite
following all the RMI guidelines for clean shutdowns.
I'd happily see all RMI dependencies banished from core.
Hi Mike,
Given the repackaging, any chance you can sneak in 2 contrib fixes I added
recently?
Null pointer introduced to clients dropping in 2.4 upgrade -
http://svn.apache.org/viewvc?view=rev&revision=700815
Bug in fuzzy matching -
/lucene2.4take3
Here's my vote: +1.
Mike
mark harwood wrote:
Hi Mike,
Given the repackaging, any chance you can sneak in 2 contrib fixes
I added recently?
Null pointer introduced to clients dropping in 2.4 upgrade -
http://svn.apache.org/viewvc?view=rev&revision=700815
Bug in fuzzy matching
Just checked Solr (forgot about that obvious precedent!) and they have it in
trunk/lib and an entry in trunk/notice.txt which reads:
Includes software from other Apache Software Foundation projects, including,
but not limited to:
- Apache Tomcat (lib/servlet-api-2.4.jar)
I'm not sure I see an easy translation of copyright !mycompany into
SpanQueries which is how all the other queries are being converted.
SpanNotQuery isn't applicable here because that only tests spans don't overlap.
Yonik's approach looks good.
- Original Message
From: Yonik Seeley
I'm OK with LIA2 on the front page - as Erik suggests it does help lend
credibility to a project.
I encounter organisations who are nervous about buying into an open-source
solution and having books up there on the home page immediately helps establish
the following:
1) The APIs are stable
Welcome, Uwe.
Great work on the Trie piece - now if you could just settle the Tree vs Try
pronunciation dilemma.
:)
- Original Message
From: Mark Miller markrmil...@gmail.com
To: java-dev@lucene.apache.org
Sent: Monday, 18 May, 2009 17:46:51
Subject: Re: Welcome Uwe Schindler
When you create IndexReader, IndexWriter and others, you must pass in a
Settings instance.
I think this would also help solve the steady growth of constructor variations
(18 in 2.4's IndexWriter vs 3 in Lucene 1.9).
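A purely hypothetical sketch of such a Settings carrier - none of these names
are real Lucene API (the general shape is what later appeared as
IndexWriterConfig):

// Hypothetical sketch only; names and defaults are invented.
class Settings {
    private int maxBufferedDocs = 1000;
    private boolean readOnly = true;

    Settings setMaxBufferedDocs(int n) { maxBufferedDocs = n; return this; }
    Settings setReadOnly(boolean ro) { readOnly = ro; return this; }

    int getMaxBufferedDocs() { return maxBufferedDocs; }
    boolean isReadOnly() { return readOnly; }
}
// Intended usage: new IndexWriter(dir, new Settings().setMaxBufferedDocs(5000));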
- Original Message
From: Otis Gospodnetic
Hi John/Grant.
I haven't done any more in developing WebLuke - although I still use it regularly.
As Grant suggests there was an unease (mine) about bloating the Lucene
distribution size with GWT dependencies so it wasn't rolled into contrib.
However I guess I'm comfortable if no one else is
+1
On 11 Jun 2009, at 21:32, Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718629
#action_12718629 ]
Michael McCandless commented on LUCENE-1685:
I think the Collector approach makes the most sense to me, since
it's the only object I fully control in the search process. I cannot
control Query implementations, and I cannot control the decisions
made by IndexSearcher. But I can always wrap someone else's
Collector with TLC and pass it
Going back to my post re TimeLimitedIndexReaders - here's an
incomplete but functional prototype:
http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java
The principle is that all reader accesses check a volatile.
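A minimal sketch of that principle (my illustration, not the linked
prototype):

class TimeLimitedAccess {
    private volatile long deadlineMillis;

    void startRequest(long budgetMillis) {
        deadlineMillis = System.currentTimeMillis() + budgetMillis;
    }

    // Called from the hot read paths (termDocs, document, norms, ...):
    // a runaway request aborts at its next access rather than running on.
    void checkTime() {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new RuntimeException("activity timed out");
        }
    }
}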
That will save looping over the collection to find the next
candidate. Just an implementation detail though.
Shai
On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood
markharw...@yahoo.co.uk wrote:
Going back to my post re TimeLimitedIndexReaders - here's an
incomplete but functional prototype
Odd. I see you're responding to a message from Shai I didn't get. Some
mail being dropped somewhere along the line..
Why don't you use Thread.interrupt(), .isInterrupted() ?
Not sure where exactly you mean for that?
I'm not sure I understand that - how can a thread run 1 activity
Despite making IDF a constant, the edit distance should remain a factor
in the rankings so I would have thought this would give you what you
need.
Can you supply a more detailed example? Either print the rewritten
query or use the explain function
Cheers
Mark
On 27 Aug 2009, at 13:22,
I think those boosts shown are reflecting the edit distance. What we can't see
from this is that the Similarity class used in execution is using the same IDF
for all terms. The other factors at play will be the term frequency in the doc,
its length and any doc boost.
I don't have access to the
It seems like something higher up must accept two rects and OR them together
during the searching?
That's the way I've done it before. It's like the old Asteroids arcade game
where, as the ship drifts off-screen stage right, it is simultaneously emerging
back from stage-left.
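A sketch of the two-rectangle OR; rectQuery() is a hypothetical placeholder
for whatever spatial rectangle query is in use:

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

class DatelineSplit {
    // e.g. a box from 170 to -160 wraps: split at +/-180 and OR the halves.
    static Query acrossDateline(double minX, double maxX, double minY, double maxY) {
        BooleanQuery q = new BooleanQuery();
        q.add(rectQuery(minX, 180, minY, maxY), BooleanClause.Occur.SHOULD);
        q.add(rectQuery(-180, maxX, minY, maxY), BooleanClause.Occur.SHOULD);
        return q;
    }

    static Query rectQuery(double minX, double maxX, double minY, double maxY) {
        throw new UnsupportedOperationException("placeholder for a real spatial query");
    }
}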
-
I've been putting together some code to support highlighting of opaque query
clauses (cached filters, trie range, spatial etc etc) which shows some promise.
This is not intended as a replacement for the existing highlighter(s) which
deal with free-text but is instead concentrating on the