I have a question about the API for storing and indexing Lucene
documents (in 3.x).
If I want to index a document by providing a TokenStream, I can do that
by calling document.add(field), where field is something I write
deriving from AbstractField that returns the TokenStream from its
tokenStreamValue() method.
You can add as many Field instances for each type as you like.
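For what it's worth, you don't even need your own AbstractField subclass
for this; Field has a constructor that takes a TokenStream directly. A
rough sketch (untested; the field name and writer are just for illustration):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// tokens is whatever TokenStream you built yourself
void addWithTokenStream(IndexWriter writer, TokenStream tokens) throws Exception {
  Document doc = new Document();
  doc.add(new Field("body", tokens)); // indexed from the stream; not stored
  writer.addDocument(doc);
}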
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-----Original Message-----
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, July 11, 2012 2:54 AM
To: java-user@lucene.apache.org
If you use HTMLStripCharFilter, it extracts the text only, leaving tags
out, and remembering the word positions so that highlighting works
properly. Should do exactly what you want out of the box...
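For example, something like this (untested sketch; Lucene 4.x package
layout - in 3.x the class lived in Solr under org.apache.solr.analysis):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer htmlAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_40, reader);
    TokenStream sink = new LowerCaseFilter(Version.LUCENE_40, source);
    return new TokenStreamComponents(source, sink);
  }
  @Override
  protected Reader initReader(String field, Reader reader) {
    // strips tags and corrects offsets so highlighting lines up
    return new HTMLStripCharFilter(reader);
  }
};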
On 10/23/2012 8:00 PM, Scott Smith wrote:
I need to take an html page that I retrieve from m
ve me after I've stripped the HTML.
Suggestions?
Scott
-----Original Message-----
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Tuesday, October 23, 2012 9:04 PM
To: java-user@lucene.apache.org
Cc: Scott Smith
Subject: Re: Highlighting html pages
If you use HTMLStripCharFilter, it extracts the text only, leaving tags
out, and remembering the word positions so that highlighting works properly.
On 11/6/2012 3:29 AM, Steve Rowe wrote:
Hi Scott,
HTMLStripCharFilter doesn't require that its input be valid HTML - there is no
assumption of balanced tags.
Also, highlighted sections could span tags, e.g. if you highlight "this
phrase", and the original HTML looks like:
… thisphras
Does anyone have any experience with the stemmers? I know that Porter
is what "everyone" uses. Am I better off with KStemFilter (better
performance) or ?? Does anyone understand the differences between the
various stemmers and how to choose one over another?
We started off using Porter, t
On 11/15/2012 1:06 PM, Tom Burton-West wrote:
This paper on the Kstem stemmer lists cases where the Porter stemmer
understems or overstems and explains the logic of Kstem: "Viewing
Morphology as an Inference Process" (*Krovetz*, R., Proceedings of the
Sixteenth Annual International ACM SIGIR Con
On 11/20/2012 6:49 AM, Michael McCandless wrote:
On Tue, Nov 20, 2012 at 1:49 AM, Ravikumar Govindarajan
wrote:
Also, for a TopN query sorted by doc-id will the query terminate early?
Actually, it won't! But it really should ... you could make a
Collector that throws an exception once the N hits have been collected.
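Something along these lines (untested sketch for Lucene 4.x; the Done
exception is our own, not a Lucene API):

import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

class EarlyTerminatingCollector extends Collector {
  static class Done extends RuntimeException {}
  private final int limit;
  private int docBase, count;
  final List<Integer> hits = new ArrayList<Integer>();

  EarlyTerminatingCollector(int limit) { this.limit = limit; }

  @Override public void setScorer(Scorer scorer) {}
  @Override public void setNextReader(AtomicReaderContext context) {
    docBase = context.docBase; // make collected ids absolute
  }
  @Override public boolean acceptsDocsOutOfOrder() { return false; }
  @Override public void collect(int doc) {
    hits.add(docBase + doc);
    if (++count >= limit) {
      throw new Done(); // abort the search; caller catches Done and reads hits
    }
  }
}
// usage: try { searcher.search(query, c); } catch (Done e) { /* hits ready */ }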
On 12/31/2012 11:39 AM, Itai Peleg wrote:
Hi all,
Can someone please post a simple example showing how to add additional
attributes to tokens in a TokenStream (inside incrementToken, for example)?
I'm working on entity extraction and want to flag specific tokens as
entities, but I'm having probl
r one.
Thanks in advance,
Itai
2012/12/31 Michael Sokolov
On 12/31/2012 11:39 AM, Itai Peleg wrote:
Hi all,
Can someone please post a simple example showing how to add additional
attributes to tokens in a TokenStream (inside incrementToken, for example)?
I'm working on entity extraction an
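One simple way is a TokenFilter that sets the built-in FlagsAttribute;
here is an untested sketch, where isEntity() is a placeholder for your
own recognizer:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;

final class EntityFlagFilter extends TokenFilter {
  static final int ENTITY_FLAG = 1;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);

  EntityFlagFilter(TokenStream in) { super(in); }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (isEntity(termAtt.toString())) {
      flagsAtt.setFlags(flagsAtt.getFlags() | ENTITY_FLAG);
    }
    return true;
  }

  private boolean isEntity(String term) {
    // placeholder heuristic: plug in your entity recognizer here
    return term.length() > 0 && Character.isUpperCase(term.charAt(0));
  }
}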
On 1/3/2013 6:16 PM, Wu, Stephen T., Ph.D. wrote:
I think we've been saying that if we put something in a Payload, it will be
indexed. From what I understand of the indexing format, that means that
what you put in the Payload will be stored in the Lucene index... But it
won't *itself* be indexed
I have an indexer that already collapses field values into a Map of
(value, count) before indexing, and I would like to specify an increment
to frequency (docFreq?) when adding a field value to a Lucene Document.
Should I just add the same value multiple times?
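In other words, something like this sketch (valueCounts, the field name,
and the Lucene 4.x field types are my assumptions):

import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

// add each value once per precomputed count so tf reflects the count
void addCounted(Document doc, Map<String, Integer> valueCounts) {
  for (Map.Entry<String, Integer> e : valueCounts.entrySet()) {
    for (int i = 0; i < e.getValue(); i++) {
      doc.add(new TextField("tags", e.getKey(), Field.Store.NO));
    }
  }
}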
-Mike
On 2/28/2013 5:05 PM, Uwe Schindler wrote:
... Collector instead of HitCollector (like your ancient Lucene from 2.4), you have to
respect the new semantics that are *different* from the old HitCollector. Collector works with
low-level atomic readers (also in Lucene 3.x), the calls to the "collect(in
On 03/01/2013 07:56 AM, Uwe Schindler wrote:
The slowdown happens not on making the doc ids absolute (it is just an
addition), the slowdown appears when you retrieve the stored fields on the
top-level reader (because the composite top-level reader has to do a binary
search in the reader tree to find the sub-reader that holds each doc id).
On 03/11/2013 01:22 PM, Michael McCandless wrote:
On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
wrote:
Am 11.03.2013 13:38, schrieb Michael McCandless:
On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler wrote:
Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE,
On 5/7/2013 6:26 PM, Colin Pollock wrote:
Hi, I want to modify how the QueryScorer selects fragments for snippeting. I
want to add a small boost for fragments that contain certain terms (e.g.
"great", "amazing") to the unique term occurrence score. But I don't want
these words to actually be high
You may not have noticed that CharFilter extends Reader. The expected
pattern here is that you chain instances together -- your CharFilter
should act as *input* to the Analyzer, I think. Don't think in terms of
extending these analysis classes (except the base ones designed for it):
compose them.
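For example, a trivial CharFilter is just a Reader wrapper (untested
sketch; a real one that changes text lengths must also map offsets via
correct()):

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.CharFilter;

// toy example: fold everything to lower case before tokenization
final class LowerCaseCharFilter extends CharFilter {
  LowerCaseCharFilter(Reader in) { super(in); }

  @Override
  public int read(char[] cbuf, int off, int len) throws IOException {
    int n = input.read(cbuf, off, len);
    for (int i = off; i < off + n; i++) {
      cbuf[i] = Character.toLowerCase(cbuf[i]);
    }
    return n;
  }

  @Override
  protected int correct(int currentOff) {
    return currentOff; // we never change lengths, so offsets map 1:1
  }
}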
On 6/12/2013 7:02 PM, Steven Schlansker wrote:
On Jun 12, 2013, at 3:44 PM, Michael Sokolov
wrote:
You may not have noticed that CharFilter extends Reader. The expected pattern
here is that you chain instances together -- your CharFilter should act as
*input* to the Analyzer, I think
I'm pleased to announce the first public release of Lux (version 0.9.1),
an XML search engine embedding Saxon 9 and Lucene/Solr 4. Lux offers
many features found in XML databases: persistent XML storage,
index-optimized querying, an interactive query window, and some
application support features.
On 6/21/13 11:18 AM, Uwe Schindler wrote:
You may also be interested in this talk @ BerlinBuzzwords2013:
http://intrafind.de/tl_files/documents/INTRAFIND_BerlinBuzzwords2013_The-Typed-Index.pdf
Unfortunately the slides are not available.
Uwe
I've been wondering why we seem to handle case- and
by repeating the same term many times (I don't care
about positions or highlighting in this case, either, just scoring), but
that seems a bit perverse (and probably slower than just supplying the
counts directly).
--
Michael Sokolov
Senior Architect
Safari Books Online
On 07/28/2013 07:32 PM, Denis Bazhenov wrote:
A full JSON query ser/deser would be an especially nice addition to Solr,
allowing direct access to all Lucene Query features even if they haven't been
integrated into the higher level query parsers.
There is nothing we could do, so we wrote one, in
I had been planning something similar to what Michael was used to:
creating a regular numeric field (call it "weight", say) with a rank
value, applying a field boost to that field that is equal to the rank
value, and then querying with weight:[* TO *] as a term, thinking that
would end up bring
far from comprehensive.
Does this seem like a reasonable patch?
-Mike
Michael Sokolov
Engineering Director
www.ifactory.com
@iFactoryBoston
PubFactory: the revolutionary e-publishing platform from iFactory
In our applications, we catch ParseException and then take one of the
following actions:
1) report an error to the user
2) rewrite the query, stripping all punctuation, and try again
3) rewrite the query, quoting all punctuation, and try again
would that work for you?
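Roughly like this (untested sketch; the QueryParser package moved to
org.apache.lucene.queryparser.classic in 4.x):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

Query parseLeniently(QueryParser parser, String input) {
  try {
    return parser.parse(input);
  } catch (ParseException e) {
    try {
      // option 2: strip punctuation and try again
      return parser.parse(input.replaceAll("\\p{Punct}+", " "));
    } catch (ParseException e2) {
      return null; // option 1: report an error to the user
    }
  }
}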
On 5/5/2011 3:26 AM, Bern
I believe creating a large number of fields is not a good match w/the
underlying architecture, and you'd be better off w/a large number of
documents/small number of fields, where the same field occurs in every
document. There is some discussion here:
http://markmail.org/message/hcmt5syca7zdeac
I think you need
field:[20020101 TO *]
although the "*" option isn't available in some versions (pre 3.1?) and
you just have to supply a big value:
field:[20020101 TO ]
-Mike
On 6/19/2011 6:18 PM, Hiller, Dean x66079 wrote:
"here you can simply go for field:[20020101 TO ] and leave
On 6/19/2011 8:11 PM, Hiller, Dean x66079 wrote:
Oddly, enough, this seems to work and I get one result calling
Collector.collect(int docIt)...(I found out AND has to be caps)...
author:dean AND date:20110623
but this does not seem to work...
author:dean AND date:[ 20110623 TO * ]
I'm not sur
Koji- I'm not familiar with the benchmarking system, but maybe I'll see
if I can run that benchmark on my test data as a point of comparison -
thanks for the pointer!
-Mike
On 6/20/2011 8:21 PM, Koji Sekiguchi wrote:
Mike,
FVH used to be faster for large docs. I wrote FVH section for Lucene
e PhraseQueries - I added those and it did make FVH slightly slower,
but not all that much. I'll keep digging.
-Mike
On 6/20/2011 10:54 PM, Michael Sokolov wrote:
Koji- I'm not familiar with the benchmarking system, but maybe I'll
see if I can run that benchmark on my test
e" within
which the term may occur - I think?) so there is an n^2 growth factor in
the number of occurrences of a term in a document. Does that seem possible?
-Mike
On 6/21/2011 8:48 PM, Michael Sokolov wrote:
I did that, and the benchmark indicates FVH is 10x faster than
Highlighter no
Because of this top-n behavior, it's generally slow with Lucene to scan
deeply into the result set. If you want to go to page 100 of your search
results, the priority queue must have a size of at least n=docsPerPage*100.
Because of this, most full text search engines (e.g. Google does this, too)
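So page 100 at 10 docs per page looks something like this sketch
(searcher and query assumed):

import java.util.Arrays;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// collect the top 1000, then keep only the last 10 for page 100
TopDocs top = searcher.search(query, 1000);
ScoreDoc[] page = top.scoreDocs.length > 990
    ? Arrays.copyOfRange(top.scoreDocs, 990, top.scoreDocs.length)
    : new ScoreDoc[0];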
You should take a look at org.apache.solr.analysis.MappingCharFilter,
which provides a generic table-based approach for use with solr. There
are also a lot of other interesting CharFilters in the same package.
For lucene-only use, there's
org.apache.lucene.analysis.icu.ICUFoldingFilter, which
I tried something similar, and failed - I think the API is lacking
there? My only advice is to vote for this:
https://issues.apache.org/jira/browse/LUCENE-2878 which should provide
an alternative, better API, but it's not near completion.
-Mike
On 7/6/2011 5:34 PM, Jahangir Anwari wrote:
I h
On 8/4/2011 9:06 PM, Trejkaz wrote:...
For AND (and for any "default boolean" queries which aren't equivalent
to OR) queries, I have problems. For instance, you can't do this:
within(5, 'my', and('cat', 'dog')) -> and( within(5, 'my', 'cat'),
within(5, 'my', 'dog') )
The problem is that
Lukas, there really isn't any support for custom Query types in
Highlighter, as you've found. If you inherit from one of the types it
does support, or rewrite your query to one of them, that should work,
but the Query class just doesn't provide enough support for Highlighter
to work with in the
Daniel, since no one knowledgeable has answered I'll take a stab - there
are a number of ant targets you can run, most of which incorporate some
indexing step(s). Basically you can run:
ant -Dtask.alg=
it looks as if the ant build.xml is set up to run
conf/micro-standard.alg by default, but
In my experience, books and other semi-structured text documents are
best handled as XML. There are many many different XML "vocabularies"
for doing this, each of which has benefits for different kinds of
documents. You probably should look at TEI, NLM Book, and DocBook
though - these are som
You could simply index every term with a namespace prefix, like:
Q::term
where Q is the namespace and term is the term.
Then when you do spell corrections, submit each candidate term with the
namespace prefix prepended.
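The prefixing filter could look like this (untested sketch; the "::"
separator is just a convention):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class NamespacePrefixFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final String prefix;

  NamespacePrefixFilter(TokenStream in, String namespace) {
    super(in);
    this.prefix = namespace + "::";
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.toString();
    termAtt.setEmpty().append(prefix).append(term); // Q::term
    return true;
  }
}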
-Mike
On 11/23/2011 9:28 AM, E. van Chastelet wrote:
I currently have an id
On 10/02/2013 07:12 PM, Alice Wong wrote:
Hello,
We would like to index some documents. Each field of a document may have
multiple values. And for each (field,value) pair there are some associated
values. These associated values are just for retrieving, not searching.
For example, a document D
,2", "a2:2,3". I think that's what Aditya suggested. You still
have to parse these though, so why not use a prebuilt flexible parsing
infrastructure?
Thanks.
On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov
<mailto:msoko...@safaribooksonline.com>> wrote:
On 10
There are some Analyzer methods you might want to override (initReader
for inserting a CharFilter, stuff about gaps), but if you don't need
that, it seems to be mostly about packaging neatly, as you say.
-Mike
On 10/8/13 10:30 AM, Benson Margulies wrote:
Is there some advice around about when
I've been running some tests comparing storing large fields (documents,
say 100K .. 10M) as files vs. storing them in Lucene as stored fields.
Initial results seem to indicate storing them externally is a win (at
least for binary docs which don't compress, and presumably we can
compress the external files.
On 10/11/2013 03:04 PM, Adrien Grand wrote:
On Fri, Oct 11, 2013 at 7:03 PM, Michael Sokolov
wrote:
I've been running some tests comparing storing large fields (documents, say
100K .. 10M) as files vs. storing them in Lucene as stored fields. Initial
results seem to indicate storing
On 10/11/2013 03:19 PM, Michael Sokolov wrote:
On 10/11/2013 03:04 PM, Adrien Grand wrote:
On Fri, Oct 11, 2013 at 7:03 PM, Michael Sokolov
wrote:
I've been running some tests comparing storing large fields
(documents, say
100K .. 10M) as files vs. storing them in Lucene as stored f
On 10/13/2013 1:52 PM, Adrien Grand wrote:
Hi Michael,
I'm not aware enough of operating system internals to know what
exactly happens when a file is open but it sounds to me like having
separate files per document or field adds levels of indirection when
loading stored fields, so I would be sur
On 10/13/13 8:09 PM, Michael Sokolov wrote:
On 10/13/2013 1:52 PM, Adrien Grand wrote:
Hi Michael,
I'm not aware enough of operating system internals to know what
exactly happens when a file is open but it sounds to me like having
separate files per document or field adds levels of indire
On 10/18/2013 1:08 AM, Shai Erera wrote:
The codec intercepts merges in order to clean up files that are no longer
referenced
What happens if a document is deleted while there's a reader open on the
index, and the segments are merged? Maybe I misunderstand what you meant by
this statement, but
It sounds as if you want to create a new Query type. I would start by
having a look at BooleanQuery and trying to write an analogous object
that does what you want instead.
-Mike
On 11/13/2013 10:03 AM, Harald Kirsch wrote:
Hello all,
I wonder if a query according to the following rules is
I just posted a writeup of the Lucene/Solr Revolution Dublin
conference. I've been waiting for videos to become available, but I got
impatient. Slides are mostly there, though. Sorry if I missed your
talk -- I'm hoping to catch up when the videos are posted...
http://blog.safariflow.com/201
Have you read about numeric range faceting?
http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html
On 12/6/2013 5:34 AM, Ankit Murarka wrote:
Well, a bit strange, as this is the first time I am not receiving any
reply to the question, even after sending it again.
Would be very h
On 12/8/2013 12:03 PM, Ted Goldstein wrote:
I am new to Lucene and have begun experimenting. I've loaded both the example
books.csv and the various example electronic components documents. I then do a
variety of queries.
Querying http://su2c-dev.ucsc.edu:8983/solr/select?q=name:A* returns both
Note the comments in the source:
/** Length of used bytes. */
public int length;
length is not the same as the size of the internal buffer. It is the
number of used bytes, so the length of the "logical" value as you call it.
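In other words, to recover the logical value you must respect both
offset and length; a small sketch:

import java.nio.charset.StandardCharsets;
import org.apache.lucene.util.BytesRef;

static String logicalValue(BytesRef ref) {
  // only bytes[offset .. offset+length) are valid; the buffer may be larger
  return new String(ref.bytes, ref.offset, ref.length, StandardCharsets.UTF_8);
  // for UTF-8 term bytes, ref.utf8ToString() does the same thing
}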
-Mike
On 1/21/2014 10:32 AM, Yann-Erwan Perio wrote:
Hello,
The highlighters are the only thing I know of (in trunk) that do
something like that.
Work on this branch
(https://issues.apache.org/jira/browse/LUCENE/fixforversion/12317158) is
an attempt to make that more efficient.
In general the problem with doing this during scoring (the filtering
doc
On 2/4/14 12:16 PM, Earl Hood wrote:
On Tue, Feb 4, 2014 at 12:20 AM, Trejkaz wrote:
I'm trying to find a precise and reasonably efficient way to highlight
all occurrences of terms in the query, only highlighting fields which
...
[snip]
I am in a similar situation with a web-based application
On 2/4/2014 2:50 PM, Earl Hood wrote:
On Tue, Feb 4, 2014 at 1:16 PM, Michael Sokolov wrote:
You might be interested in looking at Lux, which layers XML services like
XQuery on top of Lucene and Solr, and includes an XML-aware highlighter:
https://github.com/msokolov/lux/blob/master/src/main
Ideally you would chunk a document at logical boundaries that will make
sense as units of both search and presentation. For some content, these
boundaries don't align; for example you might want to search for matches
within a paragraph scope, or within a section, chapter, or part of a
book, bu
No, not really. What would you do if you had a match contained entirely
within the overlapping region? You'd probably need a way to distinguish
that from a term that matched in two adjacent chunks, but *not* in the
overlap. Sounds very tricky to me.
-Mike
On 2/5/2014 2:21 AM, mrodent wrote:
On 2/5/2014 6:30 PM, raghavendra.k@barclays.com wrote:
Hi,
Can Lucene support wildcard searches such as the ones shown below?
Indexed value is "XYZ CORPORATION LIMITED".
If you index the value as a single token (KeywordTokenizer), there is
nothing really special about the examples you gave.
On 2/6/2014 12:53 AM, Earl Hood wrote:
On Tue, Feb 4, 2014 at 6:05 PM, Michael Sokolov wrote:
Thanks for the feedback. I think it's difficult to know what to do about
attribute value highlighting in the general case - do you have any
suggestions?
That is a challenging one since one h
I have an idea for something I'm calling grouped scoring, and I want to
know if anybody has already done anything like this.
The idea comes from the problem that in your search results you'd like
to show only one or a small number of items from each group: for example
on google.com, multiple results from the same site are collapsed.
You could insert a large position gap between sentences (say 100;
something larger than the largest sentence in #words), and a still
larger position gap between paragraphs (1000; larger than the largest para).
Then within-sentence search is just (A B)~100 and within-paragraph
search (A B)~1000
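Assuming each sentence is indexed as a separate value of one field, the
gap can come from the analyzer; untested sketch (paragraph gaps would
need a custom filter or a larger per-value gap):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer gapAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_47, reader);
    return new TokenStreamComponents(source);
  }
  @Override
  public int getPositionIncrementGap(String field) {
    return 100; // sentence boundary gap; then query with "A B"~100
  }
};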
I've been working on getting AnalyzingInfixSuggester to make suggestions
using tokens drawn from multiple fields. I've done this by copying
tokens from each of those fields into a destination field, and building
suggestions using that destination field. This allows me to use
different analysis for each source field.
This isn't really a good use case for an index like Lucene. The most
essential property of an index is that it lets you look up documents
very quickly based on *precomputed* values.
-Mike
On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
Hi all,
I'm looking for a way to use multi-values in a f
ShingleFilter can help with this; it concatenates neighboring tokens.
So a search for "good morning john" becomes a search for
"goodmorning john" OR
"good morningjohn" OR
"good morning john"
it makes your index much bigger because of all the terms, but you may
find it's worth the cost
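For example (untested sketch, Lucene 4.x):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer shingleAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_47, reader);
    TokenStream sink = new LowerCaseFilter(Version.LUCENE_47, source);
    // emit 2-word shingles alongside the original unigrams
    sink = new ShingleFilter(sink, 2, 2);
    return new TokenStreamComponents(source, sink);
  }
};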
-Mike
There is a Solr document cache that holds field values too, see:
http://wiki.apache.org/solr/SolrCaching
Maybe take this question over to the solr mailing list?
-Mike
On 5/30/2014 10:32 AM, Alan Woodward wrote:
Solr caches hold lucene docids, which are invalidated every time a new searcher
i
Probably the simplest thing is to define a field for each of the
contexts you are interested in, but you might want to consider using a
tagged-token approach.
I spent a while figuring out how to index tagged tree-structured data
and came up with Lux (http://luxdb.org) - basically it accepts XML
If you already have a parser for the language, you could use it to
create a TokenStream that you can feed to Lucene. That way you won't be
trying to reinvent a parser using tools designed for natural language.
-Mike
On 6/5/2014 6:42 AM, Johan Tibell wrote:
I will definitely try a prototype.
It all depends on the statistics: how the ranges are correlated. If the
integer range is small: from 1-2, for example, you might consider
indexing every integer in each range as a separate value, especially if
most documents will only have a small number of small ranges.
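At indexing time that would look something like this sketch (field name
and bounds are illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;

// enumerate the members of a small range so each one is directly searchable
void addRange(Document doc, int from, int to) {
  for (int v = from; v <= to; v++) {
    doc.add(new IntField("range", v, Field.Store.NO));
  }
}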
If there are too
I've been using AIS, and I see that it now has support for incremental
updates, which is great! I'm looking forward to getting suggestions from
newly-added documents without the need to rebuild the entire suggester
index. I've run into a few problems though, and I want to see if there
is a better way.
this in a
subclass. With that in place, there is no issue about raising
exceptions - the index is always available.
Mike McCandless
http://blog.mikemccandless.com
On Thu, Aug 14, 2014 at 10:50 AM, Michael Sokolov
wrote:
I've been using AIS, and I see that it now has support for incr
can open a ticket for that too.
-Mike S
On 08/15/2014 07:33 AM, Michael Sokolov wrote:
On 8/14/2014 5:48 PM, Michael McCandless wrote:
I think we should expose commit? Can you open an issue?
I will
-
To unsubscribe, e-mail:
Have you looked into term vectors? I think they should fit the bill
pretty neatly. Here's a nice blog post with helpful background info:
http://blog.jpountz.net/post/41301889664/putting-term-vectors-on-a-diet
-Mike
On 8/19/2014 10:04 AM, Bianca Pereira wrote:
Hi everybody,
I would like
Tokenization is tricky. You might consider using whitespace tokenizer
followed by word delimiter filter (instead of standard tokenizer); it
does a kind of secondary tokenization pass that can preserve the
original token in addition to its component parts. There are some weird
side effects to watch out for, though.
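The chain would look something like this (untested sketch, Lucene 4.x,
trimmed to the two most relevant flags):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.util.Version;

Analyzer wdfAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field, Reader reader) {
    Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_47, reader);
    int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
              | WordDelimiterFilter.PRESERVE_ORIGINAL;
    // "Wi-Fi" -> "Wi", "Fi", and the original "Wi-Fi"
    TokenStream sink = new WordDelimiterFilter(source, flags, null);
    return new TokenStreamComponents(source, sink);
  }
};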
Usually that's referred to as multiple "values" for the same field; in
the index there is no distinction between title:C and title:X as far as
which field they are in -- they're in the same field.
If you want to prevent phrase queries from matching B C X, insert a
position gap between C and X;
On 9/4/2014 6:46 AM, Larry White wrote:
Hi,
Is there a way to index an entire json document automatically as one can do
with the new PostgreSQL json support? By automatically, I mean to create an
inverted index entry (path: value) for each element in the document without
having to specify in adv
On 9/19/2014 9:07 AM, John Cecere wrote:
Is there a way to set up Lucene so that both case-sensitive and
case-insensitive searches can be done without having to generate two
indexes?
You might be interested in the discussion here:
https://issues.apache.org/jira/browse/LUCENE-5620 which addresses this.
Have you considered combining the AnalyzingInfixSuggester with a German
decompounding filter? If you break compound words into their
constituent parts during analysis, then the suggester will be able to do
what you want (prefix matches on the word-parts). I found this project
with a quick Google search.
I'm curious to know more about your use case, because I have an idea for
something that addresses this, but haven't found the opportunity to
develop it yet - maybe somebody else wants to :). The basic idea is to
reduce the number of terms needed to be looked up by collapsing
commonly-occurring
s filter into
ConstantScoreQuery, and in the other test I used FilteredQuery with
MatchAllDocsQuery and BooleanFilter. Both cases seem to perform about the
same as a simple BooleanQuery.
But of course I'll also try TermsFilter. Maybe it will speed up the
filters.
Michael
That's a lot of code to eyeball. Have you tried printing out the input
data as you are indexing it (just at doc.add)? I am guessing there is
some simple variable aliasing issue that I don't see at a glance ...
-Mike
On 10/30/14 2:03 PM, Ralf Bierig wrote:
I want to implement a Lucene Indexer
The index size will not increase as quickly as you might think, and is
not an issue in most cases. An alternative to two fields, though, is to
index both upper- and lower-case tokens at the same position in a single
field, and then to perform no case folding at query time. There is no
standard filter that does this, but it is not hard to write one.
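Here is an untested sketch of the kind of filter I mean; it emits the
folded form stacked at the same position (increment 0):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

final class DualCaseFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posAtt =
      addAttribute(PositionIncrementAttribute.class);
  private State pending;

  DualCaseFilter(TokenStream in) { super(in); }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      restoreState(pending);
      pending = null;
      String folded = termAtt.toString().toLowerCase();
      termAtt.setEmpty().append(folded);
      posAtt.setPositionIncrement(0); // stack on the original position
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.toString();
    if (!term.equals(term.toLowerCase())) {
      pending = captureState(); // queue a folded twin of this token
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
  }
}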
Why don't you want to use a highlighter? That's what they're for.
-Mike
On 11/25/2014 09:12 AM, John Cecere wrote:
I've done a bunch of searching, but I still can't seem to figure out
how to do this.
Given a WildcardQuery or PrefixQuery (something with a wildcard in
it), is there a way to r
It's impossible to tell since you didn't include the code for it, but my
advice would be to look at how the documents are being marked for
deletion. What are the terms being used to delete them? Are you trying
to use lucene docids?
-Mike
On 12/1/2014 4:22 PM, Badano Andrea wrote:
Hello,
M
On 1 Dec 2014, at 23:23, Michael Sokolov wrote:
It's impossible to tell since you didn't include the code for it, but my advice
would be to look at how the documents are being marked for deletion. What are
the terms being used to delete them? Are you trying to use lucene docids?
If you index the n-grams in their own field using ShingleFilter, you can
get statistics using the same term api on that field, in which the terms
*are* n-grams, and similarly for queries.
-Mike
On 12/02/2014 03:38 PM, Peter Organisciak wrote:
It is possible to get a total corpus frequency for
There are also Solr replication options - older snapshot-style
replication, and newer Solr Cloud, but if you are not using solr now,
you will incur some transitional costs since you would need to alter
your indexing and possibly querying code to use it
-Mike
On 12/04/2014 09:38 AM, Shai Erera
You probably don't want to use StandardAnalyzer: maybe try
WhitespaceAnalyzer, but you'll need to enhance your regex a little to
deal with punctuation since WA may give you tokens like:
5106-7922-9469-8422.
"5106-7922-9469-8422"
etc
-Mike
On 12/15/14 3:45 AM, Valentin Popov wrote:
I have
I see in the docs of ToParentBlockJoinQuery that:
* The child documents must be orthogonal to the parent
* documents: the wrapped child query must never
* return a parent document.
First, it would be helpful if the docs explained what would happen if
that assumption were violated.
Second,
OK - I see looking at the code that an exception is thrown if a parent
doc matches the subquery -- so that explains what will happen, but I
guess my further question is -- is that necessary? Could we just not
throw an exception there?
-Mike
On 12/16/2014 10:38 AM, Michael Sokolov wrote:
I
able fields; its children could include both 'Chapter' child
docs and also a 'BookMetadata' child doc.
-Greg
On Tue, Dec 16, 2014 at 10:42 AM, Michael Sokolov
wrote:
OK - I see looking at the code that an exception is thrown if a parent doc
matches the subquery -- so that
pendix had the word 'apple'. :)
It's equally possible to accidentally create a 'ToUncleJoin' or
'ToCousinJoin'.
Just my two cents,
Greg
On Tue, Dec 16, 2014 at 8:42 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
Looking at the
The basic idea seems sound, but I think you can simplify that query a
bit. For one thing, the *:* clauses can be removed in a few places:
also if you index an explicit null value you won't need them at all; for
visiblefrom, if you don't have a from time, use 0; for visibleto, if you
don't have a to time, use a maximum value.
On 1/13/2015 2:07 AM, Clemens Wyss DEV wrote:
reduced to:
( ( *:* -visiblefrom:[* TO *] AND -visibleto:[* TO *] )
OR ( -visiblefrom:[* TO *] AND visibleto:[<now> TO <max>] )
OR ( -visibleto:[* TO *] AND visiblefrom:[0 TO <now>] )
OR ( visiblefrom:[0 TO <now>] AND visibleto:[<now> TO <max>] ) )
also if y
In practice, normalization by field length proves to be more useful than
normalization by the sum of the lengths of all fields (document length),
which I think is what you seem to be after. Think of a book chapter
document with two fields: title and full text. It makes little sense to
weight
On 1/15/15 4:34 AM, rama44ster wrote:
Hi,
I am using lucene to index documents that have a multivalued text field
named ‘city’.
Each document might have multiple values for this field, like la, los
angeles etc.
Assuming
document d1 contains city = la ; city = los angeles
document d2 contains cit
On 1/15/15 11:23 AM, danield wrote:
Hi Mike,
Thank you for your reply. Yes, I had thought of this, but it is not a
solution to my problem, and this is because the Term Frequency and therefore
the results will still be wrong, as prepending or appending a string to the
term will still make it a different term.
On 1/21/2015 6:59 PM, Gregory Dearing wrote:
Jim,
I think you hit the nail on the head... that's not what BlockJoinQueries do.
If you're wanting to search for children and join to their parents... then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that m
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201412.mbox/%3ccaasl1-_ppmcnq3apjjfbt3adb4pgaspve-8o5r9gv5kldpf...@mail.gmail.com%3E>
-Greg
On Wed, Jan 21, 2015 at 7:59 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
On 1/21/2015 6:59 PM, Gregory Dearing wrote:
Jim,
I think