Supposing I have a document with just "hi there" as the text.
If I do a span query like this:
near(near(term('hi'), term('there'), slop=0, forwards),
term('hi'), slop=1, any-direction)
that returns no hits. However, if I do a span query like this:
near(near(term('hi'), term('there'), s
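In Lucene's actual span API, the nested query described above can be sketched roughly like this (a sketch against the SpanNearQuery API; "body" is a placeholder field name, not from the original message):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class NestedSpans {
    // near(near(hi, there, slop=0, in order), hi, slop=1, any direction)
    public static SpanNearQuery build(String field) {
        SpanQuery inner = new SpanNearQuery(new SpanQuery[] {
            new SpanTermQuery(new Term(field, "hi")),
            new SpanTermQuery(new Term(field, "there"))
        }, 0, true);                       // slop=0, inOrder=true
        return new SpanNearQuery(new SpanQuery[] {
            inner,
            new SpanTermQuery(new Term(field, "hi"))
        }, 1, false);                      // slop=1, any direction
    }
}
```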
On Mon, Dec 19, 2011 at 9:05 PM, Paul Taylor wrote:
> I was looking for a Query that returns all documents that contain a
> particular field, it doesn't matter what the value of the field is, just that
> the document contains the field.
If you don't care about performance (or if it runs fast enough
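One way to express "documents where this field has any value" in the 3.x API is an open-ended range query over the field (a sketch, assuming Lucene 3.x; this enumerates every term in the field, so it can be slow on large indexes):

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;

public class FieldExistsQuery {
    // Equivalent of the query-syntax form myField:[* TO *]:
    // null bounds mean the range is open at both ends.
    public static Query fieldExists(String field) {
        return new TermRangeQuery(field, null, null, true, true);
    }
}
```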
Hi all.
I want to access a Lucene index remotely. I'm aware of a couple of
options for it which seem to operate more or less at the IndexSearcher
level - send a query, get back results.
But in our case, we use IndexReader directly for building statistics,
which is too slow to do via individual q
On Mon, Jan 23, 2012 at 11:31 PM, Jamie wrote:
> Ian
>
> Thanks. I'll have to read up about it. I have a lot of comparisons to make,
> so cannot precompute the values.
How many is a lot? If it were 100 or so I would still be tempted to do
all 4,950 comparisons and find some sensible way to store
On Fri, Jan 27, 2012 at 10:41 AM, Saurabh Gokhale
wrote:
> I wanted to check whether n-gramming the document contents (space is not the
> issue) would do any good for better matching? Currently I see n-grams
> mostly used for auto-complete or spell checking, but is this useful for
> similarity search?
I
Hi all.
I've found a rather frustrating issue which I can't seem to get to the
bottom of.
Our application will crash with an access violation around the time
when the index is closed, with various indications of what's on the
stack, but the common things being SegmentTermEnum.next and
MMapIndexIn
On Wed, Feb 1, 2012 at 11:30 AM, Robert Muir wrote:
> the problem is caused by searching indexreaders after you closed them.
>
> in general we can try to add more and more safety, but at the end of the day,
> if you close an indexreader while a search is running, you will have problems.
>
> So be
On Wed, Feb 1, 2012 at 1:14 PM, Robert Muir wrote:
>
> No, I don't think you should use close at all, because your problem is
> you are calling close() when its unsafe to do so (you still have other
> threads that try to search the reader after you closed it).
>
> Instead of trying to fix the bugs
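One way to make "close when everyone is done" explicit is reader reference counting, which the 3.x-era IndexReader already supports (a sketch, not the advice from the thread verbatim):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class SafeSearch {
    // Each searching thread holds a reference for the duration of the
    // search; the reader actually closes only when the last holder
    // calls decRef(), so a concurrent close() elsewhere cannot pull the
    // reader out from under a running search.
    public static TopDocs search(IndexReader reader, Query query, int n)
            throws Exception {
        reader.incRef();
        try {
            return new IndexSearcher(reader).search(query, n);
        } finally {
            reader.decRef();
        }
    }
}
```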
Hi all.
We have 1..N indexes for each time someone adds some data. Each time
they can choose different tokenisation settings. Because of this, each
text index has its own query parser instance. Because each query
parser could generate a different Query (though I guess whether they
do or not is ano
On Wed, Feb 15, 2012 at 11:46 AM, Uwe Schindler wrote:
> Scores are only compatible if the query is the same, which is not the case
> for you.
> So you cannot merge hits from different queries.
So I guess in the case where the different query parsers happen to
generate the same query, it's safe
On Mon, Feb 20, 2012 at 12:07 PM, Uwe Schindler wrote:
> See my response. The problem is not in Lucene; it's in general a problem of
> fixed
> thread pools that execute other callables from within a callable running at
> the
> moment in the same thread pool. Callables are simply waiting for each
On Thu, Mar 1, 2012 at 6:20 PM, Sudarshan Gaikaiwari wrote:
> Hi
>
> https://builds.apache.org/job/Lucene-trunk/javadoc/core/org/apache/lucene/document/DocValuesField.html
>
> The documentation at the above link indicates that the optimal way to
> add a DocValues field is to create it once and cha
On Fri, Mar 2, 2012 at 6:22 PM, su ha wrote:
> Hi,
> I'm new to Lucene. I've indexed some documents with Lucene and need to
> sanitize them to ensure
> that they do not have any social security numbers (3-digits 2-digits
> 4-digits).
>
> (How) Can I write a query (with the QueryParser) that searche
On Wed, Apr 18, 2012 at 9:27 AM, Vladimir Gubarkov wrote:
> Hi, dear Lucene specialists,
>
> The only explanation I could think of is the new TieredMergePolicy
> instead of old LogMergePolicy. Could it be that because of
> TieredMergePolicy merges not adjacent segments - this results in not
> pres
On Fri, May 11, 2012 at 9:56 PM, Jong Kim wrote:
> 2. If Lucene can recycle old IDs, it would be even better if I could force
> it to re-use a particular doc ID when updating a document by deleting old
> one and creating new one. This scheme will allow me to reference this doc
> ID from another do
On Thu, May 17, 2012 at 7:11 AM, Chris Harris wrote:
> but also crazier ones, perhaps like
>
> agreement w/5 (medical and companion)
> (dog or dragon) w/5 (cat and cow)
> (daisy and (dog or dragon)) w/25 (cat not cow)
[skip]
Everything in your post matches our experience. We ended up writing
some
On Fri, May 18, 2012 at 6:23 AM, Jamie Johnson wrote:
> I think you want to have a look at the QueryParser classes. Not sure
> which you're using to start with but probably the default QueryParser
> should suffice.
There are (at least) two catches though:
1. The semantics of a QueryParser might
On Sat, May 26, 2012 at 12:07 PM, Chris Harris wrote:
>
> Alternatively, if you insist that query
>
> merger w/5 (medical and agreement)
>
> should match document "medical x x x merger x x x agreement"
>
> then you can propagate 2x the parent's slop value down to child queries.
This is in fact ex
On Fri, Jun 8, 2012 at 5:35 AM, Jack Krupansky wrote:
> Well, if you have defined OR/or and IN/in as stopwords, what is it you expect
> other than for the analyzer to ignore those terms (which with a boolean “AND”
> means match nothing)?
Is this behaviour really logical?
If I search for a sing
On Mon, Jul 23, 2012 at 10:16 PM, Deepak Shakya wrote:
> Hey Jack,
>
> Can you let me know how I should do that? I am using the Lucene 3.6 version
> and I don't see any parse() method for StandardAnalyzer.
In your case, presumably at indexing time you should be using a
PerFieldAnalyzerWrapper with
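The PerFieldAnalyzerWrapper setup being suggested looks roughly like this in 3.6 (a sketch; the "id" field name is illustrative, not from the thread):

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class Analyzers {
    // StandardAnalyzer as the default for most fields, with the "id"
    // field kept as a single untokenized term via KeywordAnalyzer.
    public static PerFieldAnalyzerWrapper indexingAnalyzer() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_36));
        wrapper.addAnalyzer("id", new KeywordAnalyzer());
        return wrapper;
    }
}
```

The same wrapper should then be used at both indexing and query time so the per-field tokenisation matches.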
On Thu, Jul 26, 2012 at 5:38 AM, Simon Willnauer
wrote:
> you really shouldn't do that! If you use lucene as a Primary key
> generator why don't you build your own on top. Just add one layer that
> accepts the document and returns the PID and internally put it in an
> ID field. Using no merge poli
On Thu, Aug 16, 2012 at 11:27 AM, zhoucheng2008 wrote:
>
> +(title:21 title:a title:day title:once title:a title:month)
Looks like you have a fairly big boolean query going on here, and some
of the terms you're using are really common ones like "a".
Are you using AND or OR for the default operat
On Fri, Sep 7, 2012 at 6:12 PM, Jochen Hebbrecht
wrote:
> Hi qibaoyuan,
>
> I tried your second solution, using the scoring data. I think in this way,
> I could use MoreLikeThis. All documents with a score > X are a possible
> match :-).
FWIW, there is also BooleanQuery#setMinimumNumberShouldMatc
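That method is used like this against the pre-builder BooleanQuery API (a sketch; field and term names are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MinShouldMatch {
    // Match documents containing at least 2 of the 3 SHOULD terms,
    // instead of the default "any one of them".
    public static BooleanQuery atLeastTwoOf(String field,
                                            String a, String b, String c) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term(field, a)), Occur.SHOULD);
        bq.add(new TermQuery(new Term(field, b)), Occur.SHOULD);
        bq.add(new TermQuery(new Term(field, c)), Occur.SHOULD);
        bq.setMinimumNumberShouldMatch(2);
        return bq;
    }
}
```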
On Thu, Sep 20, 2012 at 4:28 AM, vempap wrote:
> Hello All,
>
> I've an issue with respect to the distance measure of SpanNearQuery in
> Lucene. Let's say I have the following two documents:
>
> DocID: 6, content:"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1001
> 1002 1003 1004 1005 1006 1007 1008
On Sat, Oct 27, 2012 at 1:53 PM, Tom wrote:
> Hello,
>
> using Lucene 4.0.0b, I am trying to get a superset of all stop words (for
> an international app).
> I have looked around, and not found anything specific. Is this the way to go?
>
> CharArraySet internationalSet = new CharArraySet(Version.L
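Building the union set he describes would look something like this in 4.0 (a sketch, assuming the analyzers-common module; the choice of which per-language default sets to merge is up to the application):

```java
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public class StopWords {
    // Union of several languages' default stop word sets into one
    // case-insensitive CharArraySet.
    public static CharArraySet internationalStopSet() {
        CharArraySet set = new CharArraySet(Version.LUCENE_40, 64, true);
        set.addAll(StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        set.addAll(FrenchAnalyzer.getDefaultStopSet());
        set.addAll(GermanAnalyzer.getDefaultStopSet());
        return set;
    }
}
```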
On Mon, Nov 5, 2012 at 4:25 AM, Michael-O <1983-01...@gmx.net> wrote:
> Continuing my answer from above. Have you ever worked with the Spring
> Framework? They apply a very nice exception translation pattern. All
> internal exceptions are turned to specialized unchecked exceptions like
> Authentica
In our application, most users sit around in read-only mode all the time
but there is one place where write access can occur, which is essentially
scripted at the moment. (*)
Currently, we start out opening an IndexReader. When the caller declares
that they are going to start writing, we open an I
On Wed, Nov 7, 2012 at 10:11 PM, Ian Lea wrote:
> 4.0 has maybeRefreshBlocking which "is useful if you want to guarantee
> that the next call to acquire() will return a refreshed instance".
> You don't say what version you're using.
>
> If you're stuck on 3.6.1 can you do something with refreshIfN
On Wed, Nov 7, 2012 at 11:10 PM, Ian Lea wrote:
>
> Sorry, didn't notice that refreshIfNeeded is protected.
It's not only protected... but the class is final as well (the method
might as well be private so that it doesn't give a false sense of hope
that it can be overridden.)
I might have to clo
On Thu, Nov 8, 2012 at 8:29 AM, Trejkaz wrote:
> It's not only protected... but the class is final as well (the method
> might as well be private so that it doesn't give a false sense of hope
> that it can be overridden.)
>
> I might have to clone the whole class just
I have a feature I wanted to implement which required a quick way to
check whether an individual document matched a query or not.
IndexSearcher.explain seemed to be a good fit for this.
The query I tested was just a BooleanQuery with two TermQuery inside
it, both with MUST. I ran an empty query t
On Wed, Nov 21, 2012 at 12:33 AM, Ramprakash Ramamoorthy
wrote:
> On Tue, Nov 20, 2012 at 5:42 PM, Danil ŢORIN wrote:
>
>> Ironically most of the changes are in unicode handling and standard
>> analyzer ;)
>>
>
> Ouch! It hurts then ;)
What we did going from 2 -> 3 (and in some cases where passi
On Wed, Nov 21, 2012 at 10:40 AM, Robert Muir wrote:
> Explain is not performant... but the comment is fair I think? It's more of a
> worst-case, depends on the query.
> Explain is going to rewrite the query/create the weight and so on just to
> advance() the scorer to that single doc
> So if this
I recently implemented the ability for multiple users to open the
index in the same process ("whoa", you might think, but this has been
a single user application forever and we're only just making the
platform capable of supporting more than that.)
I found that filters are being stored twice and s
On Tue, Nov 27, 2012 at 9:31 AM, Robert Muir wrote:
> On Thu, Nov 22, 2012 at 11:10 PM, Trejkaz wrote:
>
>>
>> As for actually doing the invalidation, CachingWrapperFilter itself
>> doesn't appear to have any mechanism for invalidation at all, so I
>> imagine I
On Wed, Nov 28, 2012 at 2:09 AM, Robert Muir wrote:
>
> I don't understand how a filter could become invalid even though the reader
> has not changed.
I did state two ways in my last email, but just to re-iterate:
(1): The filter reflects a query constructed from lines in a text
file. If some ot
On Wed, Nov 28, 2012 at 6:28 PM, Robert Muir wrote:
> My point is really that lucene (especially clear in 4.0) assumes
> indexreaders are immutable points in time. I don't think it makes sense for
> us to provide any e.g. filtercaching or similar otherwise, because this is
> a key simplification t
On Thu, Nov 29, 2012 at 4:57 PM, Trejkaz wrote:
> doubt we're not
Rats. Accidentally double-negatived that. I doubt we are the only ones. *
TX
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Hi all.
trying to figure out what I was doing wrong in some of my own code so
I looked to LowerCaseFilter since I thought I remembered it doing this
correctly, and lo and behold, it failed the same test I had written.
Is this a bug or an intentional difference in behaviour?
@Test
public
On Fri, Nov 30, 2012 at 8:22 PM, Ian Lea wrote:
> Sounds like a side effect of possibly different, locale-dependent,
> results of using String.toLowerCase() and/or Character.toLowerCase().
>
> http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#toLowerCase()
> specifically mentions Turk
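The Turkish case in question is easy to demonstrate with plain JDK calls, independent of Lucene:

```java
import java.util.Locale;

public class TurkishLower {
    public static void main(String[] args) {
        String s = "TITLE";
        // Root locale: 'I' lowercases to the ordinary dotted 'i'.
        System.out.println(s.toLowerCase(Locale.ROOT));           // title
        // Turkish locale: 'I' lowercases to dotless 'ı' (U+0131),
        // so the two results are different strings.
        System.out.println(s.toLowerCase(new Locale("tr", "TR"))); // tıtle
    }
}
```

This is why locale-sensitive toLowerCase() inside an analysis chain can make the same text tokenize differently on machines with different default locales.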
On Tue, Dec 4, 2012 at 10:09 AM, Vitaly Funstein wrote:
> If you don't need to support case-sensitive search in your application,
> then you may be able to get away with adding string fields to your
> documents twice - lowercase version for indexing only, and verbatim to
> store.
Actually, I will
On Tue, Dec 4, 2012 at 8:33 PM, BIAGINI Nathan
wrote:
> I need to send a class containing Lucene elements such as `Query` over the
> network using EJB and of course this class need to be serialized. I marked
> my class as `Serializable` but it does not seems to be enough:
>
> org.apache.lucene
On Sat, Jan 5, 2013 at 4:06 AM, Klaus Nesbigall wrote:
> The actual behavior doesn't work either.
> The english word families will not be found in case the user types the query
> familie*
> So why solve the problem by postulating one opinion as right and the other as
> wrong?
> A simple flag which
On Wed, Jan 9, 2013 at 6:30 AM, saisantoshi wrote:
> Does Lucene's StandardAnalyzer work for all languages for tokenizing before
> indexing (since we are using Java, I think the content is converted to UTF-8
> before tokenizing/indexing)?
No. There are multiple cases where it chooses not to brea
On Wed, Jan 9, 2013 at 10:57 AM, Steve Rowe wrote:
> Trejkaz (and maybe Sai too): ICUTokenizer in Lucene's icu module may be of
> interest to you, along with the token filters in that same module. - Steve
ICUTokenizer sounds like it's implementing UAX #29, which is exac
On Wed, Jan 9, 2013 at 5:25 PM, Steve Rowe wrote:
> Dude. Go look. It allows for per-script specialization, with (non-UAX#29)
> specializations by default for Thai, Lao, Myanmar and Hebrew. See
> DefaultICUTokenizerConfig. It's filled with exactly the opposite of what you
> were describing
On Tue, Jan 29, 2013 at 3:42 AM, Andrew Gilmartin
wrote:
> When I first started using Lucene, Lucene's Query classes were not suitable
> for use with the Visitor pattern and so I created my own query class
> equivalents and other more specialized ones. Lucene's classes might have
> changed since
On Thu, Jan 31, 2013 at 11:05 PM, Michael McCandless
wrote:
> It's confusing, but you should never try to re-index a document you
> retrieved from a searcher, because certain index-time details (eg,
> whether a field was tokenized) are not preserved in the stored
> document.
>
> Instead, you shoul
Hi all.
We have an application which has been around for so long that it's
still using doc IDs to key to an external database.
Obviously this won't work forever (even in Lucene 3.x we had to use a
custom merge policy to keep it working) so we want to introduce
application IDs eventually. We have
On Sun, Mar 10, 2013 at 8:19 PM, Gili Nachum wrote:
> Answering myself for next generations' sake.
> Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job.
How about 㒨?
TX
On Tue, Mar 12, 2013 at 10:42 PM, Hu Jing wrote:
> so my question is how to achieve a non-sort query method, this method can
> get result constantly and don't travel all unnecessary doc.
>
> Does Lucene supply some strategies to implement this?
If you want the result as soon as possible, just pa
On Wed, Jul 10, 2013 at 12:53 AM, Uwe Schindler wrote:
> Hi,
>
> there is no more locale-based sorting in Lucene 4.x. It was deprecated in 3.x,
> so you should get a warning about deprecation already!
I wasn't sure about this because we are on 3.6 and I didn't see a
deprecation warning in our cod
On Wed, Jul 10, 2013 at 4:20 PM, Uwe Schindler wrote:
> Hi,
>
> The "fast" replacement (meaning sorting works as fast as without collation) is to
> index the fields
> used for sorting with CollationKeyAnalyzer ([snip]). The Collator you get
> from e.g. the locale.
[snip]
> The better way is, as menti
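The suggested setup would look roughly like this with the 3.x collation module (a sketch; the 3.x CollationKeyAnalyzer constructor takes a java.text.Collator):

```java
import java.text.Collator;
import java.util.Locale;
import org.apache.lucene.collation.CollationKeyAnalyzer;

public class SortFieldAnalyzer {
    // Index the sort field's value through this analyzer; the indexed
    // terms are then collation keys, so plain binary term order at sort
    // time matches the collator's locale-aware order.
    public static CollationKeyAnalyzer forLocale(Locale locale) {
        return new CollationKeyAnalyzer(Collator.getInstance(locale));
    }
}
```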
Hi all.
Is there some kind of callback where we can be notified about commits?
Sometimes a call to commit() doesn't actually commit anything (e.g. if
there is nothing in memory at the time.) I'm not really sure what's
wrong with assuming it does commit something, because it's another
developer as
On Mon, Sep 2, 2013 at 4:10 PM, Ankit Murarka
wrote:
> There's a reason why Writer is being opened every time inside a while loop. I
> usually open writer in main method itself as suggested by you and pass a
> reference to it. However what I have observed is that if my file contains
> more than 4 l
Hi all.
I discovered there is a normalise filter now, using ICU's Normalizer2
(org.apache.lucene.analysis.icu.ICUNormalizer2Filter). However, as
this is a filter, various problems can result if used with
StandardTokenizer.
One in particular is half-width Katakana.
Supposing you start out with t
On Mon, Jan 17, 2011 at 11:53 AM, Robert Muir wrote:
> On Sun, Jan 16, 2011 at 7:37 PM, Trejkaz wrote:
>> So I guess I have two questions:
>> 1. Is there some way to do filtering to the text before
>> tokenisation without upsetting the offsets reported by the tokeniser?
&
On Thu, Jan 20, 2011 at 9:08 AM, Paul Libbrecht wrote:
>>> Wouldn't it be better to prefer precise matches (a field that is
>>> analyzed with StandardAnalyzer for example) but also allow matches are
>>> stemmed.
>>
>> StandardAnalyzer isn't quite precise, is it? StandardFilter does some
>> kind o
On Fri, Mar 11, 2011 at 10:03 PM, shrinath.m wrote:
> I am trying to index content within certain HTML tags, how do I index it?
> Which is the best parser/tokenizer available to do this ?
This doesn't really answer the question, but I think it will help...
The features you want to look for:
1.
Hi all.
I'm trying to parallelise writing documents into an index. Let's set
aside the fact that 3.1 is much better at this than 3.0.x... but I'm
using 3.0.3.
One of the things I need to know is the doc ID of each document added
so that we can add them into auxiliary database tables which are ke
On Tue, Mar 29, 2011 at 11:21 PM, Erick Erickson
wrote:
> I'm always skeptical of storing the doc IDs since they can
> change out from underneath you (just delete even a single
> document and optimize).
We never delete documents. Even when a feature request came in to
update documents (i.e. dele
On Wed, Mar 30, 2011 at 8:21 PM, Simon Willnauer
wrote:
> Before trunk (and I think
> its in 3.1 also) merge only merged continuous segments so the actual
> per-segment ID might change but the global document ID doesn't if you
> only add documents. But this should not be considered a feature. In
>
On Sat, Apr 2, 2011 at 7:07 AM, Christopher Condit wrote:
> I see in the JavaDoc for IndexWriterConfig that:
> "Note that IndexWriter makes a private clone; if you need to
> subsequently change settings use IndexWriter.getConfig()."
>
> However when I attempt to use the same IndexWriterConfig to c
On Thu, Apr 14, 2011 at 9:44 PM, shrinath.m wrote:
> Consider this case :
>
> Lucene index contains documents with these fields :
> title
> author
> publisher
>
> I have coded my app to use MultiFieldQueryParser so that it queries all
> fields.
> Now if user types something like "author:tom" in s
On Thu, Apr 28, 2011 at 6:13 PM, Uwe Schindler wrote:
> In general a *newly* created object that was not yet seen by any other
> thread is always safe. This is why I said, set all bits in the ctor. This is
> easy to understand: Before the ctor returns, the object's contents and all
> references li
On Wed, Jun 8, 2011 at 6:52 PM, Elmer wrote:
> the parsed query becomes:
>
> '+(title:the) +(title:project desc:project)'.
>
> So, the problem is that docs that have the term 'the' only appearing in
> their desc field are excluded from the results.
Subclass MFQP and override getFieldQuery.
If th
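The subclass skeleton looks like this in 3.x (a sketch; the override just delegates here, and the per-field handling you want goes inside it):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class PerFieldAwareParser extends MultiFieldQueryParser {
    public PerFieldAwareParser(String[] fields, Analyzer analyzer) {
        super(Version.LUCENE_36, fields, analyzer);
    }

    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted)
            throws ParseException {
        // Every per-field term query passes through here, so this is the
        // hook for dropping or rewriting clauses on a per-field basis.
        return super.getFieldQuery(field, queryText, quoted);
    }
}
```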
On Wed, Jun 29, 2011 at 2:24 AM, Michael McCandless
wrote:
> Here's the issue:
>
> https://issues.apache.org/jira/browse/LUCENE-3255
>
> It's because we read the first 0 int to be an ancient segments file
> format, and the next 0 int to mean there are no segments. Yuck!
>
> This format pre-dat
Hi all.
I created a test using Lucene 2.3. When run, this generates a single token:
public static void main(String[] args) throws Exception {
String string =
"\u0412\u0430\u0441\u0438\u0301\u043B\u044C\u0435\u0432";
StandardAnalyzer analyser = new StandardAnalyzer();
On Fri, Jul 15, 2011 at 10:02 AM, Trieu, Jason T
wrote:
> Hi all,
>
> I read postings about searching for an empty field, but did not find any
> cases of successful search using the query language syntax itself (-myField:[* TO
> *] for example).
We have been using: -myField:*
You would need to us
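The catch with a purely negative query is that it matches nothing on its own, so programmatically you pair the exclusion with a match-all clause (a sketch against the 3.x API):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.WildcardQuery;

public class MissingField {
    // All documents, minus those with any term in the field: the
    // MatchAllDocsQuery gives the MUST_NOT clause something to subtract from.
    public static BooleanQuery missing(String field) {
        BooleanQuery bq = new BooleanQuery();
        bq.add(new MatchAllDocsQuery(), Occur.MUST);
        bq.add(new WildcardQuery(new Term(field, "*")), Occur.MUST_NOT);
        return bq;
    }
}
```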
On Fri, Jul 15, 2011 at 4:45 PM, Uwe Schindler wrote:
> Hi,
>
>> The crappy thing is that to actually detect if there are any tokens in the
>> field
>> you need to make a TokenStream which can be used to read the first token
>> and then rewind again. I'm not sure if there is such a thing in Luce
Hi all.
I am writing a custom query parser which strongly resembles
StandardQueryParser (I use a lot of the same processors and builders,
with a slightly customised config handler and a completely new syntax
parser written as an ANTLR grammar.) My parser has additional syntax
for span queries. T
On Fri, Aug 5, 2011 at 1:57 AM, Jim Swainston
wrote:
> So if the Text input is:
>
> Marketing AND Smith OR Davies
>
> I want my program to work out that this should be grouped as the following
> (as AND has higher precedence than OR):
>
> (Marketing AND Smith) OR Davies.
>
> I'm effectively lookin
On Mon, Aug 8, 2011 at 8:58 AM, Michael Sokolov wrote:
> Can you do something approximately equivalent like:
>
> within(5, 'my', and('cat', 'dog')) ->
> within(5, 'my', within(5, 'cat', 'dog') )
>
> Might not be exactly the same in terms of distances (eg "cat x x x my x x x
> dog") might match the
On Mon, Aug 8, 2011 at 10:00 AM, Trejkaz wrote:
>
> within(5, 'my', and('cat', 'dog')) -> within(5, 'my', within(10, 'cat',
> 'dog') )
To extend my example and maybe make it a bit more hellish, take this one:
Hi all.
Suppose I am searching for - 限定
In 3.0, QueryParser would parse this as a phrase query. In 3.3, it
parses it as a boolean query, but offers an option to treat it like a
phrase. Why would the default be not to do this? Surely you would
always want it to become a phrase query.
The new p
On Fri, Aug 19, 2011 at 11:05 AM, Chris Hostetter
wrote:
>
> See LUCENE-2458 for the backstory.
>
> the argument was that while phrase queries were historicly generated by
> the query parser when a single (white space deliminated) "chunk" of query
> parser input produced multiple tokens, that logi
On Sat, Aug 20, 2011 at 7:00 PM, Robert Muir wrote:
> On Sat, Aug 20, 2011 at 3:34 AM, Trejkaz wrote:
>
>>
>> As an aside, Google's behaviour seems to follow the "old" way. For
>> instance, [[ 限定 ]] returns 640,000,000 hits and [[ 限 定 ]] returns
Hi all.
We are using IndexWriter with no limits set and managing the commits
ourselves, mainly so that we can ensure they are done at the same time
as other (non-Lucene) commits.
After upgrading from 3.0 ~ 3.3, we are seeing a change in
ramSizeInBytes() behaviour where it is no longer resetting t
On Wed, Aug 24, 2011 at 4:45 AM, Michael McCandless
wrote:
> Hmm... this looks like a side-effect of LUCENE-2680, which was merged
> back from trunk to 3.1.
>
> So the problem is, IW recycles the RAM it has allocated, and so this
> method is returning the allocated RAM, even if those buffers are n
On Sat, Aug 27, 2011 at 2:30 AM, wrote:
> Hello,
> In our indexes we have a field that is a combination of other various
> metadata fields (i.e. subject, from, to, etc.). Each field that is added has
> a null position at the beginning. As an example, in Luke the field data looks
> like:
>
> nu
On Mon, Sep 19, 2011 at 3:50 AM, Charlie Hubbard
wrote:
> Here was the prior API I was calling:
>
> Hits hits = getSearcher().search( query, filter, sort );
>
> The new API:
>
> TopDocs hits = getSearcher().search( query, filter, startDoc +
> length, sort );
>
> So the question is wh
The current ordering of JapaneseAnalyser's token filters is as follows:
1. JapaneseBaseFormFilter
2. JapanesePartOfSpeechStopFilter
3. CJKWidthFilter (similar to NormaliseFilter)
4. StopFilter
5. JapaneseKatakanaStemFilter
6. LowerCaseFilter
Our existing support for E
In 3.6.2, I notice MultiFieldAttribute is deprecated. So I looked to
the docs to find the replacement:
https://lucene.apache.org/core/3_6_2/api/contrib-queryparser/org/apache/lucene/queryParser/standard/config/MultiFieldAttribute.html
...and the Deprecated note doesn't say what we're supposed to
On Sat, Jan 25, 2014 at 4:29 AM, Olivier Binda wrote:
> I would like to serialize a query into a string (A) and then to unserialize
> it back into a query (B)
>
> I guess that a solution is
> A) query.toString()
> B) StandardQueryParser().parse(query,"")
If your custom query parser uses the new q
On Mon, Jan 27, 2014 at 3:48 AM, Andreas Brandl wrote:
> Is there some limitation on the length of fields? How do I get around this?
[cut]
> My overall goal is to index (arbitrary sized) text files and run a regular
> expression search using lucene's RegexpQuery. I suspect the
> KeywordAnalyzer to
Hi all.
I'm trying to find a precise and reasonably efficient way to highlight
all occurrences of terms in the query, only highlighting fields which
match the corresponding fields used in the query. This seems like it
would be a fairly common requirement in applications. We have an
existing implem
On Wed, Feb 5, 2014 at 4:16 AM, Earl Hood wrote:
> Our current solution is to do highlighting on the client-side. When
> search happens, the search results from the server includes the parsed
> query terms so the client has an idea of which terms to highlight vs
> trying to reimplement a complete
On Thu, Feb 20, 2014 at 1:43 PM, Jamie Johnson wrote:
> Is there a way to limit the fields a user can query by when using the
> standard query parser or a way to get all fields/terms that make up a query
> without writing custom code for each query subclass?
If you mean StandardQueryParser, you c
On Tue, Mar 4, 2014 at 4:44 AM, Jack Krupansky wrote:
> What is the hex value for that second character returned that appears to
> display as an apostrophe? Hex 92 (decimal 146) is listed as "Private Use
> 2", so who knows what it might display as.
Well, if they're dealing with HTML, then it wil
On Mon, Jun 9, 2014 at 7:57 PM, Jamie wrote:
> Greetings
>
> Our app currently uses language specific analysers (e.g. EnglishAnalyzer,
> GermanAnalyzer, etc.). We need an option to disable stemming. What's the
> recommended way to do this? These analyzers do not include an option to
> disable stem
Hi all.
The inability to read people's existing indexes is essentially the
only thing stopping us upgrading to v4, so we're stuck indefinitely on
v3.6 until we find a way around this issue.
As I understand it, Lucene 4 added the notion of codecs which can
precisely choose how to read and write th
On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand wrote:
> Hi,
>
> It is not possible to read 2.x indices from Lucene 4, even with a
> custom codec. For instance, Lucene 4 needs to hook into
> SegmentInfos.read to detect old 3.x indices and force the use of the
> Lucene3x codec since these indices don
Someone asked if it was possible to do a SpanNearQuery between a
TermQuery and a MultiPhraseQuery.
Sadly, you can only use SpanNearQuery with other instances of
SpanQuery, so we have a gigantic method where we rewrite as many
queries as possible to SpanQuery. For instance, TermQuery can
trivially
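The TermQuery case of that rewrite is a one-liner; the rest of the method is a long chain of cases like it (a minimal sketch, handling only TermQuery):

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanRewriter {
    // Only TermQuery is handled here; PhraseQuery, BooleanQuery and so on
    // each need their own conversion case.
    public static SpanQuery toSpanQuery(Query query) {
        if (query instanceof TermQuery) {
            return new SpanTermQuery(((TermQuery) query).getTerm());
        }
        throw new IllegalArgumentException(
            "No span rewrite for: " + query.getClass().getName());
    }
}
```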
Unrelated to my previous mail to the list, but related to the same
investigation...
The following test program just indexes a phrase of nonsense words and
then queries for one of the words using the same analyser.
The same analyser is being used both for indexing and for querying,
yet in th
Also in case it makes a difference, we're using Lucene v3.6.2.
TX
On Tue, Aug 19, 2014 at 5:27 PM, Uwe Schindler wrote:
> Hi,
>
> You forgot to close (or commit) IndexWriter before opening the reader.
Huh? The code I posted is closing it:
try (IndexWriter writer = new IndexWriter(directory,
new IndexWriterConfig(Version.LUCENE_36, analyser))) {
Lucene 4.9 gives much the same result.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import
It seems like nobody knows the answer, so I'm just going to file a bug.
TX
Bit of thread necromancy here, but I figured it was relevant because
we get exactly the same error.
On Thu, Jan 19, 2012 at 12:47 AM, Michael McCandless
wrote:
> Hmm, are you certain your RAM buffer is 3 MB?
>
> Is it possible you are indexing an absurdly enormous document...?
We're seeing a cas
On Wed, Nov 26, 2014 at 2:09 PM, Erick Erickson wrote:
> Well
> 2> seriously consider the utility of indexing a 100+M file. Assuming
> it's mostly text, lots and lots and lots of queries will match it, and
> it'll score pretty low due to length normalization. And you probably
> can't return it to