Try taking a look at the patch, but on a quick glance it doesn't
look like the underlying code has changed much.
But note the whole point of this is that optimize is overused
given its former name; why do you want to keep using it?
Best
Erick
On Tue, Dec 6, 2011 at 1:04 AM, KARTHIK SHIVAKUMAR
wrote:
You may have to re-open the index. Also, is there any option
to commit? I have to admit I haven't tried this...
Best
Erick
On Wed, Dec 14, 2011 at 5:08 AM, Michael Südkamp
wrote:
> sure
>
> -Original Message-
> From: Vinaya Kumar Thimmappa [mailto:vthimma...@ariba.com]
> Sent:
Have you looked at Lucene's "MoreLikeThis"? I confess I haven't
worked with this enough to recommend *how* to use it, but it seems
like it's in the general area you're talking about.
http://lucene.apache.org/java/3_5_0/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
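Untested, but from the javadocs the general shape would be something like
this (the field name and the seed docId here are made up):

    MoreLikeThis mlt = new MoreLikeThis(reader); // an open IndexReader
    mlt.setFieldNames(new String[] {"body"});    // field(s) to mine for terms
    mlt.setMinTermFreq(1);                       // loosen defaults for small corpora
    mlt.setMinDocFreq(1);
    Query like = mlt.like(docId);                // the doc to find "more like"
    TopDocs similar = searcher.search(like, 10);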
Best
Erick
What are you specifying for your sort criteria? And what kind of field
are we talking about here?
Best
Erick
On Tue, Dec 20, 2011 at 8:45 AM, Qiurun wrote:
> Dear all,
>
> I select some docs that meet some criteria by using TopDocs search(Query
> query, int n). Also, it's easy to select
I call into question why you "retrieve and materialize as
many as 3,000 Documents from each index in order to
display a page of results to the user". You have to be
doing some post-processing because displaying
12,000 documents to the user is completely useless.
I wonder if this is an "XY" problem.
> completely redesign the system - but maybe
> that's just what we'll have to do.
>
> Anyway, thanks for your help! Any other suggestions would be appreciated,
> but if there is no (relatively) easy solution, that's ok.
>
> Rob
>
> On Thu, Dec 22, 2011 at 4:51 AM, Erick Erickson wrote:
Lucene 2.0? I don't even know how to find the docs any more; I really
suggest you upgrade to something more recent.
In 2.9, both IndexReader and IndexWriter have commit() methods.
Best
Erick
On Tue, Jan 3, 2012 at 8:35 AM, Dragon Fly wrote:
>
> Hi, I'm using Lucene 2.0 and was wondering how
the time interval is just a RangeQuery in the Lucene
world. The rest is pretty standard search stuff.
You probably want to have a look at the NRT
(near real time) stuff in trunk.
Your reads/writes are pretty high, so you'll need
some experimentation to size your site
correctly.
Best
Erick
On We
> ...field values) satisfying
> a query criteria? e.g. SELECT SUM(price) WHERE item=fruits
>
> Or do I need to use a HitCollector to achieve that?
>
> Any sample Solr/Lucene query to compute aggregates (like SUM) will be great.
>
> -Thanks,
> Prasenjit
>
> On Thu, Jan 5, 2012 at 7:10
> ...component exist for Solr with
> the SUM function built in?
>
> http://wiki.apache.org/solr/StatsComponent
>
> On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson
> wrote:
>> You will encounter endless grief until you stop
>> thinking of Solr/Lucene as a replacement for
>> an RDBMS.
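For the record, the HitCollector route asked about above is workable in
Lucene; a rough 3.x sketch (field names hypothetical, using the FieldCache
so summing doesn't load stored fields):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.Scorer;

    public class SumCollector extends Collector {
        private int[] prices; // per-segment values from the FieldCache
        private long sum;

        @Override public void setScorer(Scorer scorer) {} // score not needed

        @Override
        public void setNextReader(IndexReader reader, int docBase) throws IOException {
            prices = FieldCache.DEFAULT.getInts(reader, "price");
        }

        @Override public void collect(int doc) { sum += prices[doc]; }
        @Override public boolean acceptsDocsOutOfOrder() { return true; }
        public long getSum() { return sum; }
    }

Run it with searcher.search(query, collector) and read getSum() afterwards.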
Can you show the code? In particular are you re-opening
the index writer?
Bottom line: This isn't a problem anyone expects
in 3.1 absent some programming error on your
part, so it's hard to know what to say without
more information.
3.1 has other problems if you use spellcheck.collate,
you might
If your e-mail client sends things in anything but plain text, you might try
switching the format to plain text. I've had the spam filter
reject formatted e-mail before...
May not be relevant, but it's worth a try.
Best
Erick
On Wed, Jan 11, 2012 at 12:44 PM, Bennett, Tony
wrote:
> I tried to u
The parsing will be a trivial part of the overall
query time, so small that I wouldn't worry about
it in the least. I'd concentrate on doing the thing that
takes the least maintenance.
In the examples you're positing, it's not at all clear you
could even measure the difference...
Do what's easiest.
Depending on your analyzer, this could well be stripped
from the input. Perhaps try using Luke to examine the
actual values in the index to see if it's there.
And the escape character for Lucene is the backslash. See:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping%20Special%20Characters
What did you try and what exceptions did you get? You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Wed, Feb 1, 2012 at 8:54 AM, Prasad KVSH wrote:
> It will be great if you provide some working examples on this. We tried
> to deploy solr.war but getting exceptions.
Solr already logs the queries themselves, although there isn't any way
that I know of to associate them with a user.
That said, whatever servlet container you
use for Solr should be able to log all the URLs that hit
the server.
Best
Erick
On Mon, Feb 6, 2012 a
I'm curious what the nature of your data is such that you have 1.25
trillion documents. Even
at 100M/shard, you're still talking 12,500 shards. The "laggard"
problem will rear its ugly
head, not to mention the administration of that many machines will be,
shall we say, non-trivial...
Best
Erick
13,000 is a bit different. Now I'm in even more need of
> help.
>
> How is "easy" - 15 million audit records a month, coming from several active
> systems, and a requirement to keep and search across seven years of data.
>
>
>
> Thanks a lot,
> The Captn
>
You cannot simply count words like this and expect the docs to be ordered
as you imply. The problem is that the lengths of the fields are encoded
in a byte (or perhaps an int, I forget). Thus, some loss of precision
is inherent in the process. You have to encode values from 1 to 2^31
or so in something much smaller, so some collisions are inevitable.
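The encoding lives in org.apache.lucene.util.SmallFloat (a 3-bit-mantissa,
5-bit-exponent float). A quick sketch of the precision loss, assuming the
classic 1/sqrt(length) length norm:

    import org.apache.lucene.util.SmallFloat;

    public class NormPrecision {
        public static void main(String[] args) {
            for (int len : new int[] {1000, 1010, 1023, 1024}) {
                float norm = (float) (1.0 / Math.sqrt(len));
                byte b = SmallFloat.floatToByte315(norm);
                // nearby lengths often collapse to the same byte
                System.out.printf("len=%d norm=%.5f byte=%d decoded=%.5f%n",
                        len, norm, b, SmallFloat.byte315ToFloat(b));
            }
        }
    }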
Actually, you might well have your index be larger than your source, assuming
you're going to be both storing and indexing everything.
There's also the "deep paging" issue, see:
https://issues.apache.org/jira/browse/SOLR-1726
which comes into play if you expect to return a lot of rows.
Solr really
1> Don't use sint; it's being deprecated, and it'll take up more space
than a TrieDate.
2> Precision. Sure, use the coarsest time you can, normalizing
everything to day would be a good thing.
You won't get any space savings by storing to day resolution; it's
just a long under the covers. But
depend
Have you looked at the Searcher.search variant
that takes a Sort parameter?
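Untested, but something like this (3.x flavor, field name from your example
below; the sort field must reduce to a single token):

    Sort sort = new Sort(new SortField("OrderDate", SortField.STRING));
    TopDocs hits = searcher.search(query, null, 10, sort);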
Best
Erick
On Sun, Feb 26, 2012 at 8:30 AM, Dragon Fly wrote:
>
> Hi,
>
> Let's say I have 6 documents and each document has 2 fields (i.e.
> CustomerName and OrderDate). For example:
>
> Doc 1 John 20120115
> Do
Just try it. Sorting doesn't load the document; it does load
the unique values for the sort field. Which is why indexing
dates benefits from using the coarsest resolution you can,
i.e. don't store millisecond resolution if all you care about
is the day something was published.
In fact, sorting doe
Sure, in Solr you can specify start/rows parameters on queries like:
&start=0&rows=1
&start=1&rows=1
&start=2&rows=1
You'll hit the "deep paging" problem, however. Briefly as you page deeper and
deeper you're response time will drop, see:
https://issues.apache.org/jira/browse/S
I'll, of course, defer to Uwe for technical Lucene issues, but it looks like
you've got a copy/paste error. I doubt it's the root of your
problem, but this code reuses priceField where it seems
you intend the second line to use salesField:
NumericField priceField = new NumericField("price");
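(the second declaration, presumably cut off here, would then become)

    NumericField salesField = new NumericField("sales");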
Typically, they index the text in reverse order
as well as forward order (similar to synonyms)
so if you have a term in your field
"reverse", you index "esrever" and now your
leading-wildcard search for "*verse" becomes
a trailing search for "esrev*".
There is an implementation in Solr, see:
http:
What kind of hiding are you interested in? Solr does a lot
of this...
Best
Erick
On Mon, Apr 16, 2012 at 1:37 PM, Akos Tajti wrote:
> Hi All,
>
> I'm looking for a solution that hides the complexity and the low level
> structure of Lucene (to make it much simpler to use). I came across the
> Com
a search server and the communication with it is
> done through a RESTful API. What I need is a Java API that I can use
> programmatically.
>
> Ákos Tajti
>
>
>
> On 2012.04.16., at 19:58, Erick Erickson wrote:
>
>> What kind of hiding are you interested in? Solr doe
tom code.
>
> Ákos Tajti
>
>
>
>
> On Tue, Apr 17, 2012 at 12:23 AM, Erick Erickson
> wrote:
>
>> To do what? You're asking very general questions that are
>> hard to answer simply because of the lack of any detail,
>> use cases, etc.
>>
>>
Maybe I'm missing something here, but why not just boost the
terms in the fields at query time?
Best
Erick
On Fri, Apr 20, 2012 at 4:20 AM, Kasun Perera wrote:
> I have documents that are marked up with Taxonomy and Ontology terms
> separately.
> When I calculate the document similarity, I want
The - is part of the query syntax, so you must escape it. See:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html
Best
Erick
On Tue, Apr 24, 2012 at 5:44 AM, S Eslamian wrote:
> How can I search a negative number like -123 in Lucene?
> When I search this, Lucene excl
2012 13:40, S Eslamian a écrit :
>>
>> Thank you but when I search this : Query termQuery = new TermQuery
>>> ("field","\-1234"); I get this exception :
>>> Invalid escape sequence (valid one are \b \t \n \f \r \" \' \\)
>>>
>>
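Worth spelling out: TermQuery does no parsing, so backslash escapes never
apply there; escaping only matters going through the QueryParser. Roughly
(3.x API, field name from the thread):

    // TermQuery matches the literal token; no escaping, and it takes a Term:
    Query q = new TermQuery(new Term("field", "-1234"));

    // With QueryParser, escape the minus so it isn't treated as prohibited:
    String escaped = QueryParser.escape("-1234"); // yields \-1234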
There's no update-in-place; currently you _have_ to re-index the
entire document.
But to the original question:
There is a "limited join" capability you might investigate that would
allow you to split up the textual data and metadata into two different
documents and join them. I don't know how we
Hmmm, putting analyzed and unanalyzed values in
the same field seems like it'd be difficult to get right. In
the Solr world, two separate fields are usually used.
Sorting is right out; the results are unpredictable. What does
it mean to sort on a field with multiple tokens? For a doc
with "aardva
Not unless you provide a lot more context; there's
nothing to go on here!
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Wed, May 2, 2012 at 6:11 AM, Zeynep P. wrote:
> Hi,
>
> In the pruning package, pruneAllPositions throws an exception. In the code
> it is comm
On the face of it, it looks like one of the subclasses of lucene.search.Filter
should be what you're looking for. Or is the "dynamic slice" something
you couldn't formulate into a query?
Best
Erick
On Fri, May 4, 2012 at 2:51 PM, Earl Hood wrote:
> I require the ability to perform a search on a d
In general you can't rely on anything like this. I admit the merge
stuff isn't my area of expertise, but when segments are merged,
there's no guarantee that they're merged in order. In general
the internal Lucene doc ID should be treated as predictable only
for closed segments.
Your solution of us
TermQuerys are assumed to be parsed already, so you're
looking for a _single_ term "ncbi-geneid:379474 or XI.24622".
You'd construct something like:
Query query1 = new TermQuery(new Term("type", "gene"));
Query query2 = new TermQuery(new Term("alt_Id", "ncbi-geneid:379474"));
Query query3 = new TermQuery(new Term("alt_Id", "XI.24622"));
and OR them together in a BooleanQuery.
Performance here is an issue. The performance of Solr/Lucene's query joins is a
function of the number of _unique_ values in the fields being joined, and can
be unacceptably slow in the wild; it depends on your use-case.
Denormalization is the usual approach, but that gives DB folks the hives. Bu
Hmmm, it's not quite clear what the problem is. But let's
say you have indexed your hard drive. Somewhere you'll
have to keep a record of what you've done, say the timestamp
when you started looking at your hard drive to index it.
Next time you run, you simply index only the files that have changed
since then.
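A bare-bones sketch of that bookkeeping (paths hypothetical; persist the
timestamp wherever convenient):

    import java.io.File;

    public class IncrementalScan {
        public static void main(String[] args) {
            long lastRun = 0L; // in real life, load the saved timestamp
            long thisRun = System.currentTimeMillis();
            for (File f : new File(args[0]).listFiles()) {
                if (f.lastModified() > lastRun) {
                    // re-index just this file; updateDocument keyed on the
                    // path replaces the old copy instead of duplicating it
                    System.out.println("needs (re)indexing: " + f);
                }
            }
            // save thisRun for the next pass
        }
    }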
No, you can't delete those files, and you can't regenerate just those files;
all the various segment files are necessary and intertwined...
Consider using the CheckIndex facility, see:
http://solr.pl/en/2011/01/17/checkindex-for-the-rescue/
note, the CheckIndex class is contained in the lucene co
You can only display fields that are stored (Field.Store.YES). Only then can
you use document.get(...) and get something back to display.
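i.e. the field has to be created along these lines (3.x API, names made up)
for get() to return anything:

    doc.add(new Field("title", titleText, Field.Store.YES, Field.Index.ANALYZED));
    ...
    Document hit = searcher.doc(scoreDoc.doc);
    String title = hit.get("title"); // null if the field wasn't stored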
Best
Erick
On Thu, Jul 12, 2012 at 2:55 AM, sam wrote:
> that raises a new problem: whatever I search, I can only get the first line
> of data, if the data can be searched. And, when I
Sure, you can do it that way. But first I'd look over the zillion
tokenizers and filters
that are available and string together the ones that best suit your
need. For instance,
WhitespaceTokenizer and PatternReplaceFilter might make your regex much
easier since the PatternReplaceFilter gets just th
Lucene certainly supports multiple sort criteria; see
IndexSearcher.search, any variant that takes a Sort
object. The Sort object can contain a list of fields where
any ties in the first N field(s) are decided by looking
at field N+1.
But, Ganesh, be a little careful about resolving by internal
Lucene
Hmmm, what about simply boosting very high on owner, and probably
grouping on title?
If you boosted on owner, you wouldn't even have to index the title
separately for each user, your "owner" field could be multivalued and
contain _all_ the owner IDs. In that case you wouldn't have to group
at all.
of somehow doing an empty query to fetch all docs, sorting them to
> put docs with the userId first, and then running a DuplicateFilter on title
> with KM_USE_FIRST_OCCURRENCE. This is the duplicate elimination behavior I
> want. Then do a text search on the remainder. But this
My guess is in the September/October time frame. Things are actually
moving along pretty quickly, and there's considerable sentiment to get
it out the door...
Best
Erick
On Mon, Jul 30, 2012 at 11:11 PM, Vitaly Funstein wrote:
> Given that the Alpha is out, are there any more or less definitive
I don't see how you could without indexing everything first,
since you can't know what the most frequent terms are until
you've processed all your documents.
If you know these terms in advance, it seems like you could
just call them stopwords and use the common stopword
processing.
If you have to e
The first thing you should do is enumerate what you expect and what you get.
We have no way of knowing what expectations of yours are not being met.
Here's an interesting blog you might want to read:
http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/
Best
Erick
On Wed, Sep 5, 2012 at 8:
Yeah, payloads are probably what you want, otherwise the words are
indistinguishable.
Best
Erick
On Sat, Sep 29, 2012 at 12:23 PM, parnab kumar wrote:
> Hi All,
>
>I have an algorithm by which i measure the importance of a term
> in a document . While indexing i want to store weight
> be the payload weight .
>Is the above intuition correct ?
>
> Thanks,
> Parnab
>
> On Sun, Sep 30, 2012 at 2:13 AM, Erick Erickson
> wrote:
>
>> Yeah, payloads are probably what you want, otherwise the words are
>> indistinguishable.
>>
>> Be
You probably want to use a Lucene Filter then use
one of the query methods that takes a filter.
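Sketching it in the 3.x Filter API (class name made up; note the internal
ids are only stable until the index changes):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;

    public class DocIdListFilter extends Filter {
        private final int[] docIds;
        public DocIdListFilter(int[] docIds) { this.docIds = docIds; }

        @Override
        public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
            OpenBitSet bits = new OpenBitSet(reader.maxDoc());
            for (int id : docIds) {
                bits.set(id); // only these docs can match
            }
            return bits; // OpenBitSet is itself a DocIdSet
        }
    }

then pass it along via searcher.search(query, new DocIdListFilter(ids), n).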
Best
Erick
On Tue, Oct 16, 2012 at 4:36 AM, sxam wrote:
> Hi,
> Prior to search I have a concrete list of Lucene Document Ids (different
> every time) and I want to limit my search only to those speci
eds to change
> for every query, that might be a problem.
>
> 2012/10/16 Erick Erickson
>
>> You probably want to use a Lucene Filter then use
>> one of the query methods that takes a filter.
>>
>> Best
>> Erick
>>
>> On Tue, Oct 16, 2012 at 4:
You haven't given us much to go on here, you might review:
http://wiki.apache.org/solr/UsingMailingLists
But one can imagine that this must be something you're doing that's
unusual, or more people would have reported something similar.
At a guess (since you haven't really told us _anything_ abou
So I've gotta ask... _why_ do you want to inject the spaces?
If it's just to break this up into tokens, wouldn't something like
LetterTokenizer do? Assuming you aren't interested in
leaving in numbers. Or even StandardTokenizer unless you have
e-mail & etc.
Or what about PatternReplaceCharFilt
kenizers is that they get rid of the commas
> so if I use a ShingleFilter after them there's no way to tell if there was
> a comma there or not.
>
> (another option I consider is to add an Attribute to specify if there was
> a comma before or after a token)
>
> if there
First I'd see if omitting term frequencies, positions, and norms does what
you need; these are all things you can disable OOB...
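In 3.x terms that's roughly, per field at index time (field name made up):

    Field f = new Field("text", value, Field.Store.NO, Field.Index.ANALYZED);
    f.setOmitNorms(true);                // no length normalization
    f.setOmitTermFreqAndPositions(true); // docs-only postings
    doc.add(f);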
Best
Erick
On Mon, Nov 5, 2012 at 5:26 AM, Damian Birchler
wrote:
> Hi everyone
>
>
> We are using Lucene to search for possible duplicates in an address
First, sorting on tokenized fields is undefined/unsupported. You _might_
get away with it if the author field always reduces to one token, i.e. if
you're always indexing only the last name.
I should say unsupported/undefined when more than one token is the result
of analysis. You can do things lik
cement=""
> replace="all"/>
> > pattern="(.{1,30})(.{31,})"
> > replacement="$1"
> replace="all"/>
> >
> >
> >
> > It reduces long lis
There's nothing in Solr that I know of that does this. It would be a pretty
easy custom filter to create though
FWIW,
Erick
On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote:
> On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
> wrote:
> > By the way, why does TrimFilter option updateOffse
{
> > if (input.incrementToken()) {
> > if (termAtt.length() > maxLength) {
> > termAtt.setLength(maxLength);
> > }
> >
> > return true;
> > } else {
> > return false;
> > }
> >
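For reference, a complete filter along those lines might look like this
(4.x attribute API; class name invented):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class TruncateTokenFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final int maxLength;

        public TruncateTokenFilter(TokenStream input, int maxLength) {
            super(input);
            this.maxLength = maxLength;
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            if (termAtt.length() > maxLength) {
                termAtt.setLength(maxLength); // chop the term in place
            }
            return true;
        }
    }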
I'd make it easy for myself. Generate (programmatically) a list like you
showed for a _lot_ more terms, send it to your customer, and let _them_
pick. Unfortunately, the customer has no idea what "aggressive" means (for
that matter, I don't know how Porter handles specific words
either), I
I think you're looking for per-field similarity; does this help?
https://issues.apache.org/jira/browse/LUCENE-2236
Note, in 4.0 only
Best
Erick
On Sat, Dec 1, 2012 at 1:43 PM, Eyal Ben Meir wrote:
> Can one replace the basic scoring algorithm (TF/IDF) for a specific field,
> to use a differe
If it's a fixed list and not excessively long, would synonyms work?
But if there's some kind of logic you need to apply, I don't think you're
going to find anything OOB.
The problem is that by the time a token filter gets called, the tokens are
already split up; you'll probably
have to write a custom fil
Try looking at the admin/analysis page, that'll probably tell you a lot.
You'll have to provide quite a bit more information to help us help you,
you might want to review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Sat, Dec 22, 2012 at 9:28 PM, wrote:
> I am trying to index so
It's always amazing to me how hitting the "send" button makes the solution
obvious... too late.
Been there, done that... more times than I want to count
Best
Erick
On Sun, Dec 23, 2012 at 2:30 PM, Jeremy Long wrote:
> Have you ever wished you could retract your question to a mailing list?
BTW, if all you're interested in is the compiled code, you can always get
the latest build from:
http://wiki.apache.org/solr/NightlyBuilds (4x-SNAPSHOT). That code will
be compiled from the link Shai pointed out,
except for any commits since the build...
FWIW,
Erick
On Wed, Jan 2, 2013 at 2:01 PM,
This page might add a little insight:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html#file-names.
It's for 3.5, but the take-away is that *.fdt and *.fdx files are where a
raw copy of the data goes when you specify Store.YES. They are completely
independent of all t
Maybe do the handling as an overridable method and make it abstract?
That would give the skeleton of all the recovery stuff, but then
require the user to implement the actual recovery?
Just a thought
Erick
On Mon, Jan 21, 2013 at 9:06 AM, Michał Brzezicki wrote:
> I don't think it is possible to
P.S. Or just attach the code without your customized doc recovery
stuff with a note about how to carry it forward? That way someone
could pick it up if interested and generalize it.
Best
Erick
On Mon, Jan 21, 2013 at 12:37 PM, Erick Erickson
wrote:
> Maybe do the handling as an overrida
I haven't used it myself, but I did find this for atomic updates:
http://www.mumuio.com/solrj-4-0-0-alpha-atomic-updates/
Don't know if there really is a need for specific support in SolrJ for RTG;
isn't that all over on the Solr side and automagic?
Best
Erick
On Wed, Jan 30, 2013 at 5:47 PM, Dye
There's an overridable default of 10,000 tokens (IndexWriter's
maxFieldLength); that's the first place I'd look. I forget just how to set
it to a higher value...
Best
Erick.
P.S. Please don't hit reply to a message and change the title, but start an
e-mail fresh. See: http://people.apache.org/~hossman/#threadhijack
On Thu, Fe
If you kept an "indexed_time" field, you could always just index
to the same instance and then do a delete by query, something like
indexed_time:[* TO NOW/DAY],
commit and go. That would delete everything indexed before midnight
last night (NOW/DAY rounds down).
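In SolrJ that nightly cleanup is just this (where server is your SolrServer
instance, and assuming the field really is named indexed_time):

    server.deleteByQuery("indexed_time:[* TO NOW/DAY]");
    server.commit();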
Note, most of this would be already r
ISOLatin1AccentFilter has been deprecated for quite some time;
ASCIIFoldingFilter is preferred.
Best
Erick
On Fri, Mar 22, 2013 at 2:59 PM, Jerome Blouin wrote:
> Thanks. I'll check that later.
>
> -Original Message-
> From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJ
@Tom - done
On Mon, Mar 25, 2013 at 12:48 PM, Tom Burton-West wrote:
> Please add tburtonw to contributors
> Tom Burton-West
> tburtonw at umich dot edu
>
> Tom
>
> On Mon, Mar 25, 2013 at 9:05 AM, Steve Rowe wrote:
>
> >
> > On Mar 25, 2013, at 8:49 AM, Rafał Kuć wrote:
> > > Could you add Ra
@Simon
did I actually catch a reference to: http://xkcd.com/722/
??? that's one of my all-time favorites on XKCD, I think it
describes my entire professional life
"Bobby Tables" is another (http://xkcd.com/327/).
There, I've done my bit to stop productivity today!
Erick
On Mon, Mar 25
t you put) to
> tburtonw.
>
> Steve
>
> On Mar 25, 2013, at 1:19 PM, Erick Erickson
> wrote:
>
> > @Tom - done
> >
> >
> > On Mon, Mar 25, 2013 at 12:48 PM, Tom Burton-West >wrote:
> >
> >> Please add tburtonw to contributors
> >> Tom
There are a bunch of possibilities listed here:
http://wiki.apache.org/solr/Support
Best
Erick
On Thu, Mar 28, 2013 at 2:32 PM, Nick Hoffman wrote:
> I'm looking for a consultant for Lucene Solr.
>
> Our team of 3 extended OpenBravo (Java ERP) with a built-in Shopping Cart
> (written in JS).
>
Sounds like what you want to do is
1> with each verse, store the chapter ID. This could be the ID of
another document. There's no requirement that all docs in an index
have the same structure. In this case, you could have a "type" field
in each doc with values like "verse" and "chapter". For your v
entire chapter that were highlighted
> in the selected verse.
>
> Thanks!
>
> Sent from my iPhone
>
> On Apr 7, 2013, at 5:38 AM, Erick Erickson wrote:
>
>> Sounds like what you want to do is
>> 1> with each verse, store the chapter ID. This could be the ID
Have you seen TimeLimitingCollector?
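Roughly, in 4.x terms (earlier versions differ slightly):

    Collector wrapped = new TimeLimitingCollector(
            collector, TimeLimitingCollector.getGlobalCounter(), 2000);
    try {
        searcher.search(query, wrapped);
    } catch (TimeLimitingCollector.TimeExceededException e) {
        // whatever was collected before the 2-second budget is still usable
    }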
Best
Erick
On Wed, Jun 5, 2013 at 6:39 AM, 朱彦安 wrote:
> Hello!
>
> When a search hits a lot of docs, I want to return data for the first 2000
> docs immediately.
>
> I can not find such a method of lucene.
>
> I have tried:
>
> public int score(Collector collector,int max){
>
What Adrien said. I've had this happen when I kill a build
partway through (but just sometimes).
If you're on a fast network, you can just delete the entire
.ivy2 cache, but that's a little drastic.
Erick
On Thu, Jun 20, 2013 at 9:15 AM, Adrien Grand wrote:
> Hi,
>
> On Thu, Jun 20, 2013
Security has at least two parts. First, allowing users access to
specific documents, for which Alon's comments are the "usual"
way to do this in Solr/Lucene.
But the patch you referenced doesn't address this,
it's all about encrypting the data stored on disk. This is useful
for keeping people who
WordDelimiterFilter(Factory if you're experimenting with
Solr as Jack suggests) will fix a number of your cases since
it splits on case change and numeric/alpha changes. There
are a bunch of ways to recombine things so be aware that
it'll take some fiddling with the parameters. As Jack
suggests, us
Hey Emma! It's been a while
Building on what Steven said, here's Uwe's blog on
MMapDirectory and Lucene:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
I've always considered RAMDirectory for rather restricted
use-cases. I.e. if I know without doubt that the index
is
; see if "this letter sequence occur(s)" in it? I'm thinking I'm missing
> something because that seems no different than using wildcards. Or am I
> missing a subtle difference?
>
> Thank you.
>
> -Original Message-
> From: Erick Erickson [mailto:erick
Right, unfortunately, there's nothing that I know of that's super-recent.
Jack Krupansky is e-publishing a book on Solr, which will be more up
to date but I don't know how thoroughly it dives into the underlying Lucene
code.
Otherwise, I think the best thing is to tackle a real problem (perhaps
tr
Note: as of Lucene 4.x, you can plug in your
own scoring algorithm, it ships with several
variants (e.g. BM25) so you can look at the
pluggable scoring where all the code for the
various algorithms is concentrated.
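e.g., swapping in BM25 at search time is one line (4.x; a matching
setter also exists on IndexWriterConfig for index time):

    IndexSearcher searcher = new IndexSearcher(reader);
    searcher.setSimilarity(new BM25Similarity()); // k1/b defaults, tunable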
Erick
On Wed, Jul 17, 2013 at 12:40 AM, Jack Krupansky
wrote:
> The source code i
Well, it depends on what you put between your tokenizer and ngram
filter. Putting WordDelimiterFilterFactory would break up on the
underscore (and lots of other things besides) and submit the separate
tokens which would then be n-grammed separately. That has other
implications, of course, but you g
Even though you're on the Lucene list, consider installing Solr
just to see the admin/analysis page to see how your index and
query analysis works. There's no reason you couldn't split this
up on periods into separate words and then just use phrase query
to find java.lang.NullPointerException, but
I really don't see what the use-case here is. When you say "later",
what does that mean? You're indexing what and querying how?
Best
Erick
On Tue, Jul 23, 2013 at 7:19 AM, dheerajjoshim wrote:
> Greetings,
>
> I am looking a way to tokenize the String based on Logical operators
>
> Below String
Have you looked at either Blacklight or the Velocity Response Writer?
The latter ships standard with Solr; access it via the
/browse handler. It's pretty easily customizable.
Blacklight is here: http://projectblacklight.org/
Best
Erick
On Thu, Jul 25, 2013 at 1:14 PM, mlotfi wrote:
Take a look at the BeiderMorseFilterFactory perhaps?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory
Here's a mention that it explicitly works for French:
http://docs.lucidworks.com/display/solr/Phonetic+Matching
But I admit there's not much here on _how_,
You probably want something more like "electro hydraulic power assist
steering"~5,
quote marks and all. And note that it's not quite "within 5 positions",
it's more
"up to five single-word transpositions" which is kind of a slippery concept.
"electro hydraulic assist power steering"~5 would requi
As Mike said, this is an intended change. The test
passed in 3.5 because there was no check that
Span queries were running against a field that supports
them. In 4.x this is checked and an error is thrown.
Best
Erick
On Wed, Aug 14, 2013 at 12:22 AM, Yonghui Zhao wrote:
> In our old code, we create t
Have you looked at the whole flexible indexing functionality? Here's
a couple of places to start:
http://www.opensourceconnections.com/2013/06/05/build-your-own-lucene-codec/
http://www.slideshare.net/LucidImagination/flexible-indexing-in-lucene-40
I'm still not quite sure why you want to do this,
I really recommend you restructure your program; it's hard to follow.
For instance, you open a new IndexWriter every time through the
while (flags)
loop. You only close it in the
if (iwcTemp1.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
case. That may be the root of your problem rig
ile is read.
>
> If there's another alternative I will be more than happy to know .
>
> As of now, I still get StreamClosedException and
> LockObtainFailedException. So any help on this will be deeply appreciated..
>
>
> On 9/1/2013 5:46 PM, Erick Erickson wrote:
>
>> I
Stop. Back up. Test.
The very _first_ thing I'd do is just comment out the bit that
actually indexes the content. I'm guessing you have some
loop like:
while (more files) {
read the file
transform the data
create a Lucene document
index the document
}
Just comment out the "index the document" step.
Well, various people have measured between a 50% and 70+% reduction in
memory used for identical data, so I'd say so. The CHANGES.txt is where I'd
look to see if anything mentioned is worth your time.
Not to mention SolrCloud...
Erick
On Fri, Sep 6, 2013 at 3:41 PM, Darren Hoffman wrote:
> I