Stemming is an inherently limited process. It doesn't know about the
word 'news', it just has a rule about 's'.
Some of us sell commercial products that do more complex linguistic
processing that knows about which words are which.
There may be open source implementations of similar technology.
If you want tokenization to depend on sentences, and you insist on
being inside Lucene, you have to be a Tokenizer. Your tokenizer can
set an attribute on the token that ends a sentence. Then, downstream,
filters can read-ahead tokens to get the full sentence and buffer
tokens as needed.
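An untested sketch of that buffering pattern against the 4.x analysis API; SentEndAttribute is a hypothetical custom attribute the tokenizer would set, and the sentence-level analysis step is elided:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

// Sketch only: buffer tokens until the sentence-end flag, then replay them.
public final class SentenceBufferingFilter extends TokenFilter {
  private final List<AttributeSource.State> buffer = new ArrayList<>();
  private int emit = 0;

  protected SentenceBufferingFilter(TokenStream input) { super(input); }

  @Override
  public boolean incrementToken() throws IOException {
    if (emit < buffer.size()) {            // replay the buffered sentence
      restoreState(buffer.get(emit++));
      return true;
    }
    buffer.clear();
    emit = 0;
    while (input.incrementToken()) {       // read ahead to end of sentence
      buffer.add(captureState());
      if (getAttribute(SentEndAttribute.class).isSentenceEnd()) break;
    }
    if (buffer.isEmpty()) return false;
    // ... run sentence-level analysis over the captured states here ...
    restoreState(buffer.get(emit++));
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    buffer.clear();
    emit = 0;
  }
}
```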
On
codecs may not
work as expected!
Maybe try it out, was just an idea :-)
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Thursday
Robert,
Let me lay out the scenario.
Hardware has .5T of Index is relatively small. Application profiling
shows a significant amount of time spent codec-ing.
Options as I see them:
1. Use DPF complete with the irritation of having to have this
spurious codec name in the on-disk format that has
WHOOPS.
First sentence was, until just before I clicked 'send',
Hardware has .5T of RAM. Index is relatively small (20g) ...
On Thu, Feb 12, 2015 at 4:51 PM, Benson Margulies ben...@basistech.com wrote:
Robert,
Let me lay out the scenario.
Hardware has .5T of Index is relatively small
means that I'm really married to a process
of making releases that mirror Lucene releases.
On Thu, Feb 12, 2015 at 5:33 AM, Benson Margulies ben...@basistech.com
wrote:
Based on reading the same comments you read, I'm pretty doubtful that
Codec.getDefault() is going to work. It seems
I have a class that extends FilterCodec. Written against Lucene 4.9,
it uses the Lucene49Codec.
Dropped into a copy of Solr with Lucene 4.10, it discovers that this
codec is read-only in 4.10. Is there some way to code one of these to
get 'the default codec' and not have to chase versions?
Consider a case where we have a token which can be subdivided in
several ways. This can happen in German. We'd like to represent this
with positionIncrement/positionLength, but it does not seem possible.
Once the position has moved out from one set of 'subtokens', we see no
way to move it back
...@gmail.com wrote:
Hi Benson:
This is the case with n-gramming (though you have a more complicated start
chooser than most I imagine). Does that help get your ideas unblocked?
Will
-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Friday, October 24, 2014 4
Does google actually support *?
On Wed, Aug 27, 2014 at 9:54 AM, Milind mili...@gmail.com wrote:
I see. This is going to be extremely difficult to explain to end users.
It doesn't work as they would expect. Some of the tokenizing rules are
already somewhat confusing. Their expectation is
You should construct an analysis chain that does what you need. Read the
source of the relevant analyzer and pick the tokenizer and filter(s) that
you need, and don't include stemming.
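For example, an untested sketch of such a chain against the 4.x analyzers-common API, mirroring a standard English chain but with the stemming filter deliberately left out (version constant and stop set are illustrative):

```java
Analyzer noStem = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String field, Reader reader) {
    Tokenizer tok = new StandardTokenizer(Version.LUCENE_46, reader);
    TokenStream ts = new StandardFilter(Version.LUCENE_46, tok);
    ts = new LowerCaseFilter(Version.LUCENE_46, ts);
    ts = new StopFilter(Version.LUCENE_46, ts, StandardAnalyzer.STOP_WORDS_SET);
    // PorterStemFilter deliberately omitted
    return new TokenStreamComponents(tok, ts);
  }
};
```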
On Mon, Jun 9, 2014 at 5:57 AM, Jamie ja...@mailarchiva.com wrote:
Greetings
Our app currently uses
Are you using Solr? If so you are on the wrong mailing list. If not, why do
you need a non-anonymous analyzer at all?
On Jun 9, 2014 6:55 AM, Jamie ja...@mailarchiva.com wrote:
To me, it seems strange that these default analyzers, don't provide
constructors that enable one to override
.
On Jun 9, 2014 7:02 AM, Jamie ja...@mailarchiva.com wrote:
I am not using Solr. I am using the default analyzers...
On 2014/06/09, 12:59 PM, Benson Margulies wrote:
Are you using Solr? If so you are on the wrong mailing list. If not, why do
you need a non-anonymous analyzer at all?
You must know what language each text is in, and use an appropriate
analyzer. Some people do this by using a separate field (text_eng,
text_spa, text_jpn). Other people put some extra information at the
beginning of the field, and then make an analyzer that peeks in order to
dispatch to the right analyzer for the language, or ...
you can run a language detector. They are less accurate for short
strings, or ...
you can process it in _all_ of the languages and OR up the results.
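The separate-field approach can be wired up with PerFieldAnalyzerWrapper; an untested sketch against the 4.x API (field names and version constant illustrative):

```java
Map<String, Analyzer> perField = new HashMap<>();
perField.put("text_eng", new EnglishAnalyzer(Version.LUCENE_46));
perField.put("text_spa", new SpanishAnalyzer(Version.LUCENE_46));
perField.put("text_jpn", new JapaneseAnalyzer(Version.LUCENE_46));

// Fall back to StandardAnalyzer for any field not in the map.
Analyzer analyzer = new PerFieldAnalyzerWrapper(
    new StandardAnalyzer(Version.LUCENE_46), perField);
```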
On 4/6/2014 4:51 AM, Benson Margulies wrote:
You must know what language each text is in, and use an appropriate
It sounds like you've been asked to implement Named Entity Recognition.
OpenNLP has some capability here. There are also, um, commercial
alternatives.
On Thu, Feb 20, 2014 at 6:24 AM, Yann-Erwan Perio ye.pe...@gmail.com wrote:
On Thu, Feb 20, 2014 at 10:46 AM, Geet Gangwar geetgang...@gmail.com
If you are sensitive to things being committed to trunk, that suggests that
you are building your own jars and using the trunk. Are you perfectly sure
that you have built, and are using, a consistent set of jars? It looks as
if you've got some trunk-y stuff and some 4.6.1 stuff.
On Thu, Jan 30,
-a-lucene-tokenstream/20630673#20630673
Regards,
Mindaugas
On Tue, Jan 7, 2014 at 9:45 PM, Benson Margulies ben...@basistech.com
wrote:
Yes I Do.
On Tue, Jan 7, 2014 at 3:59 PM, Robert Muir rcm...@gmail.com wrote:
Benson, do you want to open an issue to fix this constructor to not
take
,
Sure, why not - I'm just not sure if my approach (of setting reader in
reset()) is preferred over yours (using this.input instead of input in
ctor)? Or are they both equally good?
m.
On Wed, Jan 8, 2014 at 12:18 PM, Benson Margulies ben...@basistech.com
wrote:
If you'd like to join
In 4.6.0, org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException
fails if incrementToken does not throw when reset() has not been called.
How am I supposed to organize this in a Tokenizer? A quick look at
CharTokenizer did not reveal any code for the purpose.
Tokenizer.java for the state machine logic. In general you should
not have to do anything if the tokenizer is well-behaved (e.g. close
calls super.close() and so on).
On Tue, Jan 7, 2014 at 2:50 PM, Benson Margulies bimargul...@gmail.com
wrote:
In 4.6.0
. i think its confusing and contributes to bugs that you have
to have logic in e.g. the ctor THEN ALSO in reset().
if someone does it correctly in the ctor, but they only test one
time, they might think everything is working..
On Tue, Jan 7, 2014 at 3:23 PM, Benson Margulies ben
There are a handful of binary files
in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
ending in .dat.
Trailing around in the source, it seems as if at least one of these derives
from a source file named unk.def. In turn, this file comes from a
dependency. should the build
.
Uwe
-
Uwe Schindler
-Original Message-
From: Benson Margulies [mailto:ben...@basistech.com]
Sent: Monday, December 02, 2013 6:12 PM
To: java-user@lucene.apache.org; Christian Moen
,
Christian Moen
アティリカ株式会社
http://www.atilika.com
On Dec 3, 2013, at 2:11 AM, Benson Margulies ben...@basistech.com wrote:
There are a handful of binary files in
./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending
in .dat.
Trailing around in the source, it seems
How would you expect to recognize that 'Toy Story' is a thing?
On Tue, Nov 5, 2013 at 6:32 PM, Kevin glidekensing...@gmail.com wrote:
Currently I'm using StandardTokenizerFactory which tokenizes the words
bases on spaces. For Toy Story it will create tokens toy and story.
Ideally, I would
I just backported some code to 3.6.0, and it includes tests that use
org.apache.lucene.analysis.BaseTokenStreamTestCase#checkRandomData(java.util.Random,
org.apache.lucene.analysis.Analyzer, int, int)
The tests that use this method fail in 3.6.0 in ways that suggest that
multiple threads are
-
From: Benson Margulies [mailto:ben...@basistech.com]
Sent: Wednesday, October 30, 2013 12:30 AM
To: java-user@lucene.apache.org
Subject: new consistency check for token filters in 4.5.1
My token filter has no end() method at all. Am I required to have an end
method?
My token filter has no end() method at all. Am I required to have an end
method()?
BaseLinguisticsTokenFilterTest.testSegmentationReadings:175-Assert.assertTrue:41-Assert.fail:88
super.end()/clearAttributes() was not called correctly in end()
I'm working on a tool that wants to construct analyzers 'at arms length' -- a
bit like from a solr schema -- so that multiple dueling analyzers could be
in their own class loaders at one time. I want to just define a simple
configuration for char filters, tokenizer, and token filter. So it would
be,
OK, so, here I go again making a public idiot of myself. Could it be that
the tokenizer factory is 'relatively recent' as in since 4.1?
On Mon, Oct 28, 2013 at 7:39 AM, Benson Margulies ben...@basistech.com wrote:
I'm working on a tool that wants to construct analyzers 'at arms length
analyzers-commons module
(since 4.0). They are no longer part of Solr.
Uwe
-
Uwe Schindler
-Original Message-
From: Benson Margulies [mailto:ben...@basistech.com]
Sent: Monday, October 28
. I don't suppose there are some guidelines?
On Mon, Oct 28, 2013 at 9:43 AM, Benson Margulies ben...@basistech.com wrote:
Just how 'experimental' is the SPI system at this point, if that's a
reasonable question?
On Mon, Oct 28, 2013 at 8:41 AM, Uwe Schindler u...@thetaphi.de wrote:
Hi
I just built myself a sort of Solr-schema-in-a-test-tube. It's a class that
builds a classloader on some JAR files and then uses the SPI mechanism to
manufacture Analyzer objects made out of tokenizers and filters.
I can make this visible in github, or even attach it to a JIRA, if anyone
is
It might be helpful if you would explain, at a higher level, what you
are trying to accomplish. Where do these things come from? What
higher-level problem are you trying to solve?
On Sun, Oct 20, 2013 at 7:12 PM, saisantoshi saisantosh...@gmail.com wrote:
Thanks.
So, if I understand correctly,
On Wed, Oct 9, 2013 at 7:18 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies ben...@basistech.com
wrote:
On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
DirectPostingsFormat?
It stores all
it as the postings guy, is that the whole
recipe? Does it make sense to extend it any further to any of the other
codec pieces?
Mike McCandless
http://blog.mikemccandless.com
On Tue, Oct 8, 2013 at 5:45 PM, Benson Margulies ben...@basistech.com
wrote:
Consider a Lucene index consisting of 10m
On Wed, Oct 9, 2013 at 7:18 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies ben...@basistech.com
wrote:
On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
DirectPostingsFormat?
It stores all
Is there some advice around about when it's appropriate to create an
Analyzer class, as opposed to just Tokenizer and TokenFilter classes?
The advantage of the constituent elements is that they allow the
consuming application to add more filters. The only disadvantage I see
is that the following
Consider a Lucene index consisting of 10m documents with a total disk
footprint of 3G. Consider an application that treats this index as
read-only, and runs very complex queries over it. Queries with many terms,
some of them 'fuzzy' and 'should' terms and a dismax. And, finally,
consider doing all
at 5:45 PM, Benson Margulies ben...@basistech.com
wrote:
Consider a Lucene index consisting of 10m documents with a total disk
footprint of 3G. Consider an application that treats this index as
read-only, and runs very complex queries over it. Queries with many
terms,
some of them 'fuzzy
Oh, drat, I left out an 's'. I got it now.
On Tue, Oct 8, 2013 at 7:40 PM, Benson Margulies ben...@basistech.com wrote:
Mike, where do I find DirectPostingFormat?
On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless
luc...@mikemccandless.com wrote:
DirectPostingsFormat?
It stores all
http://blog.mikemccandless.com
On Tue, Oct 1, 2013 at 7:09 AM, Adrien Grand jpou...@gmail.com wrote:
Hi Benson,
On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies ben...@basistech.com
wrote:
The multithreaded index searcher fans out across segments. How
aggressively
does 'optimize' reduce
The multithreaded index searcher fans out across segments. How aggressively
does 'optimize' reduce the number of segments? If the segment count goes
way down, is there some other way to exploit multiple cores?
should be useful: there is
a JIRA for it, but it has some unresolved issues
https://issues.apache.org/jira/browse/LUCENE-4072
On Sun, Sep 15, 2013 at 7:05 PM, Benson Margulies bimargul...@gmail.com
wrote:
Can anyone shed light as to why this is a token filter and not a char
filter? I'm
Can anyone shed light as to why this is a token filter and not a char
filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the
tokenizer's lookups in its dictionaries are seeing normalized contents.
in the original that might as well be blamed for any given
component.
On Fri, Sep 6, 2013 at 9:37 PM, Robert Muir rcm...@gmail.com wrote:
On Fri, Sep 6, 2013 at 9:32 PM, Benson Margulies ben...@basistech.com wrote:
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir rcm...@gmail.com wrote:
its the latter
On Sat, Sep 7, 2013 at 8:39 AM, Robert Muir rcm...@gmail.com wrote:
On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies ben...@basistech.com wrote:
In Japanese, compounds are just decompositions of the input string. In
other languages, compounds can manufacture entire tokens from thin
air
nextToken() calls peekToken(). That seems to prevent my lookahead
processing from seeing that item later. Am I missing something?
On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies ben...@basistech.com wrote:
I think that the penny just dropped, and I should not be using this class.
If I call
the buffered tokens, and to insert your own tokens when
afterPosition() is called ...
Mike McCandless
http://blog.mikemccandless.com
On Sat, Sep 7, 2013 at 1:10 PM, Benson Margulies ben...@basistech.com wrote:
nextToken() calls peekToken(). That seems to prevent my lookahead
processing from seeing
, thanks!
Mike McCandless
http://blog.mikemccandless.com
On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies ben...@basistech.com wrote:
I think I had better build you a test case for this situation, and
attach it to a JIRA.
On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless
luc
On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies ben...@basistech.com wrote:
I'm trying to work through the logic of reading ahead until I've seen a
marker for the end of a sentence, then applying some analysis
I'm confused by the comment about compound components here.
If a single token fissions into multiple tokens, then what belongs in
the PositionLengthAttribute? I'm wanting to store a fraction in here!
Or is the idea to store N in the 'mother' token and then '1' in each
of the babies?
.
public boolean incrementToken() throws IOException {
    if (positions.getMaxPos() < 0) {
        peekSentence();
    }
    return nextToken();
}
On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies ben...@basistech.com wrote:
On Fri, Sep 6, 2013 at 7:31 AM, Michael
it.
On Fri, Sep 6, 2013 at 9:10 PM, Benson Margulies ben...@basistech.com wrote:
Michael,
I'm apparently not fully deconfused yet.
I've got a very simple incrementToken function. It calls peekToken to
stack up the tokens.
afterPosition is never called; I expected it to be called as each
On Fri, Sep 6, 2013 at 9:28 PM, Robert Muir rcm...@gmail.com wrote:
its the latter. the way its designed to work i think is illustrated
best in kuromoji analyzer where it heuristically decompounds nouns:
if it decompounds ABCD into AB + CD, then the tokens are AB and CD.
these both have
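To make the AB + CD case concrete, here is a small self-contained demonstration, in plain Java with no Lucene dependency, of how positionIncrement and positionLength produce overlapping spans; the Tok class and token values are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PosLenDemo {
    // A token carrying the two position attributes that matter here.
    static final class Tok {
        final String term; final int posInc; final int posLen;
        Tok(String term, int posInc, int posLen) {
            this.term = term; this.posInc = posInc; this.posLen = posLen;
        }
    }

    // Accumulate increments into absolute positions; each token then
    // occupies the half-open position span [pos, pos + posLen).
    static Map<String, int[]> spans(List<Tok> tokens) {
        Map<String, int[]> out = new LinkedHashMap<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            out.put(t.term, new int[] { pos, pos + t.posLen });
        }
        return out;
    }

    public static void main(String[] args) {
        // Decompounding ABCD into AB + CD: the compound starts at the same
        // position as AB (posInc = 0 on AB) and spans both part positions.
        List<Tok> stream = List.of(
            new Tok("ABCD", 1, 2),  // compound: posLen = 2
            new Tok("AB",   0, 1),  // first part, same start position
            new Tok("CD",   1, 1)); // second part, next position
        for (Map.Entry<String, int[]> e : spans(stream).entrySet()) {
            System.out.println(e.getKey() + " -> [" + e.getValue()[0]
                + ", " + e.getValue()[1] + ")");
        }
    }
}
```

So the compound occupies positions 0..2 while its parts occupy 0..1 and 1..2; nothing ever has to "move the position back."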
This useful-looking item is in the test-framework jar. Is there some subtle
reason that it isn't in the common analyzer jar? Some reason why I'd regret
using it?
I'm trying to work through the logic of reading ahead until I've seen a
marker for the end of a sentence, then applying some analysis to all of the
tokens of the sentence, and then changing some attributes of each token to
reflect the results.
The queue of tokens for a position is just a State, so
On Thu, Sep 6, 2012 at 1:59 PM, Robert Muir rcm...@gmail.com wrote:
Thanks for reporting this Mark.
I think it was not intended to have actual null characters here (or
probably anywhere in javadocs).
Our javadocs checkers should be failing on stuff like this...
On Thu, Sep 6, 2012 at 1:52
I'm failing to find advice in MIGRATE.txt on how to replace 'new
Payload(...)' in migrating to 4.0. What am I missing?
Our Solr 3.x code used init(ResourceLoader) and then called the loader to
read a file.
What's the new approach to reading content from files in the 'usual place'?
That's what I meant, thanks.
On Wed, Aug 29, 2012 at 10:20 AM, Robert Muir rcm...@gmail.com wrote:
On Wed, Aug 29, 2012 at 10:10 AM, Benson Margulies ben...@basistech.com
wrote:
Our Solr 3.x code used init(ResourceLoader) and then called the loader to
read a file.
What's the new
I'm confused. Isn't inform/ResourceLoader deprecated? But your example uses
it?
On Wed, Aug 29, 2012 at 10:20 AM, Robert Muir rcm...@gmail.com wrote:
On Wed, Aug 29, 2012 at 10:10 AM, Benson Margulies ben...@basistech.com
wrote:
Our Solr 3.x code used init(ResourceLoader) and then called
I'm close to the bottom of my list here.
I've got an Analyzer that, in 3.1, set up a CharFilter in the tokenStream
method. So now I have to migrate that to createComponents. Can someone give
me a shove in the right direction?
On Wed, Aug 29, 2012 at 10:30 AM, Robert Muir rcm...@gmail.com wrote:
On Wed, Aug 29, 2012 at 10:27 AM, Benson Margulies ben...@basistech.com
wrote:
I'm confused. Isn't inform/ResourceLoader deprecated? But your example uses
it?
Where is it deprecated? What does the deprecation message
Hang on:
[deprecation] org.apache.solr.util.plugin.ResourceLoaderAware in
org.apache.solr.util.plugin has been deprecated
On Wed, Aug 29, 2012 at 10:30 AM, Robert Muir rcm...@gmail.com wrote:
On Wed, Aug 29, 2012 at 10:27 AM, Benson Margulies ben...@basistech.com
wrote:
I'm confused
On Wed, Aug 29, 2012 at 10:42 AM, Robert Muir rcm...@gmail.com wrote:
Right and what does the @deprecated message say :)
Yes, indeed, sorry. I got caught in a maze of twisty passages and my brain
turned off. I'm better now.
On Wed, Aug 29, 2012 at 10:40 AM, Benson Margulies ben
I've read the javadoc through a few times, but I confess that I'm still
feeling dense.
Are all tokenizers responsible for implementing some way of retaining the
contents of their reader, so that a call to reset without a call to
setReader rewinds? I note that CharTokenizer doesn't implement
.
The fact that CharTokenizer is doing 'reset()-like-stuff' in here is
bogus IMO, but I don't think it will cause any bugs. Don't emulate it
:)
On Wed, Aug 29, 2012 at 3:29 PM, Benson Margulies ben...@basistech.com
wrote:
I've read the javadoc through a few times, but I confess that I'm still
Some interlinear commentary on the doc.
* Resets this stream to the beginning.
To me this implies a rewind. As previously noted, I don't see how this
works for the existing implementations.
* As all TokenStreams must be reusable,
* any implementations which have state that needs to be
I think I'm beginning to get the idea. Is the following plausible?
At the bottom of the stack, there's an actual source of data -- like a
tokenizer. For one of those, reset() is a bit silly, and something like
setReader is the brains of the operation.
Some number of other components may be
If I'm following, you've created a division of labor between setReader and
reset.
We have a tokenizer that has a good deal of state, since it has to split
the input into chunks. If I'm following here, you'd recommend that we do
nothing special in setReader, but have #reset fix up all the state on
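An untested sketch of that division of labor for a stateful tokenizer against the 4.x API; the chunking state and helper are hypothetical:

```java
// Sketch only: keep all mutable per-stream state initialization in reset(),
// so the tokenizer stays correct however many times it is reused.
public final class ChunkingTokenizer extends Tokenizer {
  private List<String> chunks;   // hypothetical per-document state
  private int index;

  public ChunkingTokenizer(Reader reader) { super(reader); }

  @Override
  public void reset() throws IOException {
    super.reset();               // base class swaps in the new reader
    chunks = null;               // force re-chunking from the fresh input
    index = 0;
  }

  @Override
  public boolean incrementToken() throws IOException {
    clearAttributes();
    if (chunks == null) {
      chunks = splitIntoChunks(input);   // hypothetical helper
    }
    // ... emit the next chunk as a token, return false when exhausted ...
    return index < chunks.size();
  }
}
```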
Uwe and Robert,
Thanks. David and I are two peas in one pod here at Basis.
--benson
On Fri, Apr 20, 2012 at 2:33 AM, Uwe Schindler u...@thetaphi.de wrote:
Hi,
Ah sorry, I misunderstood, you wanted to score the duplicate match lower! To
achieve this, you have to change the coord function in
I am trying to solve a problem using DisjunctionMaxQuery.
Consider a query like:
a:b OR c:d OR e:f OR ...
name:richard OR name:dick OR name:dickie OR name:rich ...
At most, one of the richard names matches. So the match score gets
dragged down by the long list of things that don't match, as
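The usual shape for this, sketched untested against the 4.x API (field and term values illustrative): DisjunctionMaxQuery scores the name clause by its best match rather than the sum, so the non-matching variants stop dragging the score down.

```java
// Take the best-scoring name clause instead of summing them all.
DisjunctionMaxQuery names = new DisjunctionMaxQuery(0.0f); // tieBreaker = 0
names.add(new TermQuery(new Term("name", "richard")));
names.add(new TermQuery(new Term("name", "dick")));
names.add(new TermQuery(new Term("name", "dickie")));
names.add(new TermQuery(new Term("name", "rich")));

BooleanQuery top = new BooleanQuery();
top.add(names, BooleanClause.Occur.SHOULD);
top.add(new TermQuery(new Term("a", "b")), BooleanClause.Occur.SHOULD);
```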
On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies bimargul...@gmail.com
wrote:
I am trying to solve a problem using DisjunctionMaxQuery.
Consider a query like:
a:b OR c:d OR e:f OR ...
name:richard OR name:dick
Turning on disableCoord for a nested boolean query does not seem to
change the overall maxCoord term as displayed in explain.
On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies bimargul...@gmail.com
wrote:
On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies bimargul...@gmail.com
On Thu, Apr 19, 2012 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies bimargul...@gmail.com
wrote:
On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies bimargul...@gmail.com
I see why I'm so confused, but I think I need to construct a simpler test case.
My top-level BooleanQuery, which has disableCoord=false, has 22
clauses. All but three are ordinary SHOULD TermQueries. The remainder
are a spanNear and a nested BooleanQuery, and an empty PhraseQuery
(that's a bug).
to accomplish this?
On Thu, Apr 19, 2012 at 7:37 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Apr 19, 2012 at 6:36 PM, Benson Margulies bimargul...@gmail.com
wrote:
I see why I'm so confused, but I think I need to construct a simpler
test case.
My top-level BooleanQuery, which has
We've observed something that, in some ways, is not surprising.
If you take a set of documents that are close in 'score' to some query,
and shuffle them in different orders
and then see what results you get in what order from the reference query,
the scores will vary according to the
should not change as a function of insertion
order...
Well, I assumed that TF-IDF would wiggle.
Do you have a small test case?
Since this surprises you, I will build a test case.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Apr 2, 2012 at 5:28 PM, Benson Margulies bimargul
fileformat.info
On Mar 30, 2012, at 1:04 PM, Denis Brodeur denisbrod...@gmail.com wrote:
Thanks Robert. That makes sense. Do you have a link handy where I can
find this information? i.e. word boundary/punctuation for any unicode
character set?
On Fri, Mar 30, 2012 at 12:57 PM, Robert Muir
I've posted a self-contained test case to github of a mystery.
git://github.com/bimargulies/lucene-4-update-case.git
The code can be seen at
https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java.
I write a doc to an index,
Under LUCENE-1458, LUCENE-2111: Flexible Indexing, CHANGES.txt
appears to be missing one critical hint. If you have existing code
that called IndexReader.terms(), where do you start to get a
FieldsEnum?
this instead of
MultiFields.getFields(indexReader).iterator();
which I came up with by fishing around for myself?
-
Uwe Schindler
-Original Message-
From: Benson Margulies [mailto:bimargul
.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Mar 6, 2012 at 8:50 AM, Benson Margulies bimargul...@gmail.com
wrote:
Under LUCENE-1458, LUCENE-2111: Flexible Indexing, CHANGES.txt
appears to be missing one critical hint. If you have existing code
that called IndexReader.terms
Oh, I see, I didn't read far enough down. Well, the patch still
repairs a bug in the code fragment relative to the Term enumeration.
Oh, ouch, there's no SegmentReader.getReader, I was reading IndexWriter. Sorry.
On Tue, Mar 6, 2012 at 9:14 AM, Benson Margulies bimargul...@gmail.com wrote:
On Tue, Mar 6, 2012 at 8:56 AM, Uwe Schindler u...@thetaphi.de wrote:
AtomicReader.fields
etc that take Term don't analyze any text.
Instead usually higher-level things like QueryParsers analyze text into Terms.
On Tue, Mar 6, 2012 at 8:35 AM, Benson Margulies bimargul...@gmail.com
wrote:
I've posted a self-contained test case to github of a mystery.
git://github.com
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies bimargul...@gmail.com wrote:
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir rcm...@gmail.com wrote:
I think the issue is that your analyzer is standardanalyzer, yet field
text value is value-1
Robert,
Why is this field analyzed at all? It's
that MultiFields will be fine.
--benson
Uwe
-
Uwe Schindler
-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Tuesday, March 06, 2012 3:15 PM
To: java-user
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir rcm...@gmail.com wrote:
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies bimargul...@gmail.com
wrote:
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir rcm...@gmail.com wrote:
I think the issue is that your analyzer is standardanalyzer, yet field
text
analyzing
StringField when we shouldn't...
Mike McCandless
http://blog.mikemccandless.com
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir rcm...@gmail.com wrote:
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies
bimargul...@gmail.com wrote:
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir rcm...@gmail.com
it, otherwise it
should be pkg-private.
Oh! I'll rework the patch again, then. I might include some commentary
in MultiFields at all.
-
Uwe Schindler
-Original Message-
From: Benson Margulies
for sneaking around this in the meantime?
On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies bimargul...@gmail.com
wrote:
On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler u...@thetaphi.de wrote:
String field is analyzed, but with KeywordTokenizer, so all should be fine.
I filed LUCENE-3854
Sorry, I'm coming up empty in Google here.
To reduce noise slightly I'll stay on this thread.
I'm looking at this file, and not seeing a pointer to what to do about
QueryParser. Are jar file rearrangements supposed to be in that file?
I think that I don't have the right jar yet; all I'm seeing is the
'surround' package.
-
o.a.l.queryparser.classic.TokenMgrError
-Original Message-
From: Benson Margulies [mailto:bimargul...@gmail.com]
Sent: Monday, March 05, 2012 11:15 AM
To: java-user@lucene.apache.org
Subject: Re: What replaces IndexReader.openIfChanged in Lucene 4.0?
To reduce noise slightly I'll stay on this thread.
I'm
I am walking down the document in an index by number, and I find that
I want to update one. The updateDocument API only works on queries and
terms, not numbers.
So I can call remove and add, but, then, what's the document's number
after that? Or is that not a meaningful question until I make a
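The standard answer is to index a unique id field of your own and update by Term, since document numbers are not stable identifiers across merges. An untested sketch against the 4.x API (field name and values illustrative):

```java
// Give every document an id you control...
Document doc = new Document();
doc.add(new StringField("id", "42", Field.Store.YES));
writer.addDocument(doc);

// ...then update (an atomic delete + add) by that id, not by doc number.
writer.updateDocument(new Term("id", "42"), changedDoc);
```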