I just saw an e-mail from Yonik suggesting escaping the space. I know
so little about Solr that all I can do is parrot Yonik...
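For illustration only - a hedged sketch, and the field value below is
made up, not the one from this thread - a space inside a term can be
escaped with a backslash in the query syntax, and the backslash itself
has to be URL-encoded when the query goes over HTTP:

    q=department_exact:Men's\ Apparel
    ?q=department_exact:Men's%5C%20Apparel   (%5C is the backslash, %20 the space)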
Erick
On 8/8/07, Matthew Runo [EMAIL PROTECTED] wrote:
OK.
So a followup question..
?q=department_exact:Apparel%3EMen's%
Is there any chance you're optimizing each time you commit?
Erick
On 9/10/07, Marius Hanganu [EMAIL PROTECTED] wrote:
Hi,
We're having a problem when committing to SOLR.
Our application commits right after each update - we need the data to be
available instantaneously. The index's size is
DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful
For your line number, page number etc perspective, it is possible to index
special guaranteed-to-not-match tokens then use the termdocs/termenum
data, along with SpanQueries to figure this out at search time.
It's good you already have the data because if you somehow got it from
some sort of calculations I'd have to tell my product manager that
the feature he wanted that I told him couldn't be done with our data
was possible after all <g>...
About page breaks:
Another approach to paging is to index a
I can't answer the question, but I *can* guarantee that
the people who can will give you *much* better
responses if you include some details. Like which
analyzers you use, how you submit the query,
samples of the two queries that work and the
one that doesn't.
Imagine you're on the receiving end
The beautiful thing about a wiki is that *anybody* can update it. It's
especially useful if someone who's just struggled through the issues
can write something up since the pain is still fresh <g>. Especially
if you're better than I am about writing things down
All of which leads me to ask if
Scoring isn't that simple, but don't ask me for details <g>. This link might
be useful:
http://lucene.apache.org/java/docs/scoring.html
Erick
On Dec 6, 2007 2:15 PM, Phillip Farber [EMAIL PROTECTED] wrote:
Hello Hoss,
I appreciate your detailed response. I think I like your second
alternative
member's
index (or indices - some users have multiple indices) separate. I can't
give out the total number of Simpy users, but I can tell you it is
weeell beyond 1000 :)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Erick Erickson
How much data are we talking about here? Because it seems *much* simpler
to just index a field with each document indicating the user and then just
AND that user's ID in with your query.
Or think about facets (although I admit I don't know enough about facets
to weigh in on its merits, it's just
I think that what Yonik wants is a higher-level response.
*Why* do you want to process the tokens later? What is the
use case you're trying to satisfy?
Best
Erick
On Dec 20, 2007 1:37 AM, Rishabh Joshi [EMAIL PROTECTED] wrote:
What are you trying to do with the tokens?
Yonik, we wanted a
Well, you *still* have to store the stemmed and unstemmed version
in your index, otherwise you can't distinguish between, say,
run and running because you'd have indexed run both times.
But you could think about using special tokenizing. That is, for
a word that's stemmed, index a stem form.
You might try searching the Lucene users list for NFS. I know there has
been frequent discussion of locking issues etc. But since I'm not
using an NFS mount, I just glossed over them.
Also, my recollection is that many (most? all?) of the underlying issues
have been dealt with in newer versions
I did something like this in low-level Lucene using
FieldSortedHitQueue. The searchable lucene users
list should have more details.
Don't know how to do it in SOLR though...
Erick
On Jan 11, 2008 1:04 PM, Jörg Kiegeland [EMAIL PROTECTED] wrote:
Hello,
I have a query of the form (a or b).
Have you seen this page?
http://lucene.apache.org/java/docs/queryparsersyntax.html
From that page:
Note: The NOT operator cannot be used with just one term. For example, the
following search will return no results:
NOT jakarta apache
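A hedged illustration of the usual workaround (the field name here is
hypothetical): pair the negative clause with something that matches all
documents, e.g.

    *:* AND NOT text:jakarta
    *:* -text:jakarta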
Erick
On Jan 14, 2008 9:30 AM, Karen Loughran [EMAIL
I don't think this is a StringBuilder limitation, but rather your Java
JVM doesn't start with enough memory. i.e. -Xmx.
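For example (a hedged sketch; the 2g heap and the Jetty start command
are assumptions, adjust for your own setup):

    java -Xmx2g -jar start.jar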
In raw Lucene, I've indexed 240M files
Best
Erick
On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED]
wrote:
All,
I just found a thread about this on the
P.S. Lucene by default limits the maximum field length
to 10K tokens, so you have to bump that for large files.
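In Solr that limit is set in solrconfig.xml; a hedged sketch (the value
shown is just an example of raising it):

    <maxFieldLength>2147483647</maxFieldLength>

In raw Lucene the equivalent knob is IndexWriter.setMaxFieldLength(...).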
Erick
On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote:
I don't think this is a StringBuilder limitation, but rather your Java
JVM doesn't start with enough memory
On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote:
P.S. Lucene by default limits the maximum field length
to 10K tokens, so you have to bump that for large files.
Erick
On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED]
wrote:
I don't think
I would *strongly* encourage you to store them together
as one document. There's no real method of doing
DB like joins in the underlying Lucene search engine.
But that's generic advice. The question I have for you is:
What's the big deal about coordinating the sources?
That is, you have to have
You can always use the trunk build, but you'll have to check the
status of SOLR-303 to be sure it's in the trunk...
Here's a thread that discusses this...
http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489
Best
Erick
On Jan 21, 2008 10:55 AM, David Pratt [EMAIL
On Jan 21, 2008 11:34 AM, David Pratt [EMAIL PROTECTED] wrote:
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the
link you provided. Is this message from the solr-user list? Many thanks.
Regards,
David
Erick Erickson wrote:
You can always use the trunk build
Just to add another wrinkle, how clean is your OCR? I've seen it
range from very nice (i.e. 99.9% of the words are actually words) to
horrible (60%+ of the words are nonsense). I saw one attempt
to OCR a family tree. As in a stylized tree with the data
hand-written along the various branches in
As chance would have it, this was just discussed over on the lucene
user's list. See the thread:
Inverted search / Search on profilenet
Best
Erick
On Jan 23, 2008 1:38 PM, George Everitt [EMAIL PROTECTED]
wrote:
Verity had a function called profiler which was essentially an
inverted search
To me, it's really a question of where the work should be done given your
problem space. Injecting synonyms at index time allows the queries to be
simpler/faster. Injecting the synonyms at query time gets complex but is
more flexible.
As always, it's a time/space tradeoff. If you're willing to
What analyzers are you using? Many analyzers (both index and
query time) will remove non-alpha characters.
Best
Erick
On Feb 7, 2008 1:14 PM, nithyavembu [EMAIL PROTECTED] wrote:
Hi All,
Now I am facing a problem with special character search.
I tried with the following special characters
When in doubt, use WhitespaceAnalyzer and build up from there. It's the
simplest. Look at the Lucene docs for what the various analyzers do
under the covers.
Note: WhitespaceAnalyzer does NOT transform to lowercase, you have
to do that yourself or compose your own analyzer.
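A hedged sketch of composing that in schema.xml (the type name is made
up):

    <fieldType name="text_ws_lower" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>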
Erick
On Feb 9,
Well, the *first* sort to the underlying Lucene engine is expensive since
it builds up the terms to sort. I wonder if you're closing and opening the
underlying searcher for every request? This is a definite limiter.
Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask
me how to
I admit I know little about SOLR, but wouldn't the alphaOnlySort field type
ignore the digits?
Erick
On Thu, Feb 14, 2008 at 3:51 AM, Mahesh Udupa [EMAIL PROTECTED]
wrote:
Hello,
I have following entry in my title list:
Content1
Content2
Content3
Content4
Content5
If I try to Sort it in
Beating Hossman to the punch
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even
As always, it depends.
Just from a complexity perspective, my first choice is to store everything
in
one repository.
If I can store everything in Lucene, I'm a happy camper. If I *must* use a
database, I'd prefer to store everything there if possible. I only use both
if I can't avoid it because
Your problem might be solved by (from memory, so check it), using a filter
for indexing that collapses flexed (accented etc?) characters.
See ISOLatin1AccentFilterFactory
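A hedged schema.xml sketch (the type name is made up; check the exact
factory name for your Solr version):

    <fieldType name="text_folded" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>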
Best
Erick
On Tue, Mar 25, 2008 at 1:56 PM, Lucas F. A. Teixeira
[EMAIL PROTECTED] wrote:
Hello all,
We are having some
. A. Teixeira
[EMAIL PROTECTED] wrote:
Thanks Erick,
But its already being used :-(
still looking for something :-)
Thank you!
[]s,
Lucas
Erick Erickson wrote:
Your problem might be solved by (from memory, so check it), using a
filter
for indexing that collapses flexed (accented
You might want to bounce over to the Lucene user's list and search
for language. This topic has arisen many times and there's some good
discussion. And have you searched the Solr users list for language? I
know it's turned up here as well.
Best
Erick
On Mon, May 5, 2008 at 4:28 PM, Eli K [EMAIL
The really simple way is to index 'none' for fields that are empty, then just
search on color:none.
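One hedged way to get that 'none' in without touching your feed (the
default attribute value is an assumption): declare a default in
schema.xml and then query for it.

    <field name="color" type="string" indexed="true" stored="true" default="none"/>

    q=color:none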
On Tue, May 6, 2008 at 9:06 PM, Brendan Grainger [EMAIL PROTECTED]
wrote:
Hi,
Not sure if this is what you want, but to search for 'empty' fields we use
something like this:
(*:* AND
This still isn't very helpful. How big are the docs? How many fields do you
expect to index? What is your expected query rate?
You can get away with an old laptop if your docs are, say, 5K each and you
only
expect to query it once a day and have one text field.
If each doc is 10M, you're
I don't see a semi-colon at the end of your entity reference, is that a
typo?
i.e. &amp;
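A hedged reconstruction of what the properly escaped field should look
like in the update XML (the field name and value follow the quoted
message below):

    <field name="company">A &amp; K Inc</field>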
On Fri, May 9, 2008 at 9:26 AM, Ricky [EMAIL PROTECTED] wrote:
I have tried sending the '&amp' instead of '&' like the following,
<field name="company">A &amp K Inc</field>.
But I still get the same error entity
But are you sure you're not just masking the problem? That is, your limit
may now be 90,000 queries...
I always assume this kind of thing is a memory leak somewhere, have you
any tools to monitor your memory consumption and see if that's ever-rising?
Best
Erick
On Mon, Jun 2, 2008 at 10:38 AM,
Compression is only relevant for the original text, not the indexed
part. So in terms of searching, it's irrelevant.
Where it is relevant is when you *fetch* the document (e.g.
doc = hits.doc(32)), the de-compression work is done (for
stored documents). Depending upon your app, this may or
may
I think you want to boost specific clauses at *search* time, not
index time. Something like adding a clause
+CourseType:MATHMATICS^10
Best
Erick
On Tue, Aug 5, 2008 at 4:35 PM, Vicky_Dev [EMAIL PROTECTED]wrote:
Hi
Requirement: For a given document, if course type = MATHMATICS then search
I'm surprised, as you are, by the non-linearity. Out of curiosity, what is
your MaxFieldLength? By default only the first 10,000 tokens are added
to a field per document. If you haven't set this higher, that could account
for it.
As far as I know, optimization shouldn't really affect the index
I've done exactly this many times in straight Lucene. Since Solr is built
on Lucene, I wouldn't anticipate any problems.
Make sure your transfer is binary mode...
Best
Erick
On Fri, Aug 15, 2008 at 8:02 AM, johnwarde [EMAIL PROTECTED] wrote:
Hi,
Can I copy an index built on a Windows
You *might* be able to reconstruct enough of the original documents
from your indexes to create another without recrawling. I know Luke
can reconstruct documents from an index, but for unstored data it's
slow and may be lossy.
But it may suit your needs given how long it takes to make your index
How long does it take to build the entire index? Can you just rebuild it
from scratch every night? That would be the simplest.
Best
Erick
On Thu, Sep 25, 2008 at 12:48 PM, sundar shankar
[EMAIL PROTECTED]wrote:
Hi,
We have an index of courses (about 4 million docs in prod) and we have
a
Have you looked at Nutch? It's built on top of Lucene and might
be a better fit.
But you simply must give more details about what your
requirements are to get a meaningful answer. Imagine *you* were
reading your e-mail without knowing anything except
the information contained in the message. How
Hmmm, how do you know which particular record corresponds to which keyword?
Is this a list known at index time, as in this record should come up first
whenever bonkers is the keyword?
If that's the case, you could copy the magic keyword to a different field
(say magic_keyword) and boost it right
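A hedged sketch of the query side (the boost factor is an arbitrary
assumption):

    q=keyword:bonkers OR magic_keyword:bonkers^100

where magic_keyword is populated at index time only for the records
that should surface first for that keyword.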
What do you get back when you specify debugQuery=on?
Best
Erick
On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher
mark.fletcher2...@gmail.comwrote:
Hi,
I am using the dismax handler.
I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have
boosted myfield^20.0.
Even with such
Copying from another answer to this question on the list (See how to deploy
index on SOLR)...
It is possible but you have to take care to match Solr's schema with the
structure of documents in the Lucene index. The correct field names and
query-analyzers should be configured in schema.xml
HTH
Well, for a quick trial using trunk, I had to remove the
UnicodeNormalizationFactory, is that yours?
But with that removed, I get the results you do, ASSUMING that you've set
your default operator to AND in schema.xml...
Believe it or not, it all changes and all your queries return a hit if you
illustrating the behavior and maybe poke around
to see if it's an easy fix.
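For reference, the default-operator setting mentioned above is this
line in schema.xml (a hedged sketch):

    <solrQueryParser defaultOperator="AND"/>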
Thanks
Erick
On Thu, Apr 8, 2010 at 8:16 AM, Robert Muir rcm...@gmail.com wrote:
Erick, this sounds like https://issues.apache.org/jira/browse/SOLR-1852
On Wed, Apr 7, 2010 at 10:04 PM, Erick Erickson erickerick
We can't help with the information you've provided. Please review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Thu, Apr 8, 2010 at 7:23 AM, Pooja Verlani pooja.verl...@gmail.comwrote:
Hi,
In our search engine, we are getting numFound to be 0 for some queries
where documents
effects am I forgetting
about?
thanks,
Demian
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, April 07, 2010 10:04 PM
To: solr-user@lucene.apache.org
Subject: Re: solr.WordDelimiterFilterFactory problem with hyphenated
terms?
Well
On Thu, Apr 8, 2010 at 10:01 AM, Erick Erickson erickerick...@gmail.com
wrote:
You're right, it sure looks related. But according to that JIRA, it's
fixed
in trunk and I'm pretty sure I have a very recent version that I built
from
code I updated within the last few days.
I'll
What analyzer is your field using at index and query time?
See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Some analyzers
strip punctuation, some don't. Some lowercase,
some don't. You can chain filters together to do
Well, I think that's part of your problem. WhitespaceAnalyzer does
exactly what it says, splits on whitespace. So indexing 'carbon' and
searching for 'carbon.' won't generate a hit.
If KeywordAnalyzer doesn't work for you, you could consider either
using one of the Pattern* guys or write your own.
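A hedged sketch of the PatternReplace route in schema.xml (the type
name and the pattern are assumptions; it strips trailing periods and
commas from each token):

    <fieldType name="text_trimpunct" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="[.,]+$" replacement="" replace="all"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>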
Can we see the actual field definitions from your schema file?
Ahmet's question is vital and is best answered if you'll
copy/paste the relevant configuration entries. But based
on what you *have* posted, I'd guess you're trying to
facet on tokenized fields, which is not recommended.
You might
If you're submitting this:
field1 : This is a good string
then you're searching in field1 ONLY for 'This'. The tokens 'is',
'a', 'good' and 'string' are being searched against your default
search field as defined in your schema.
Have you tried parenthesizing?
Try the SOLR admin page for looking at
This is a little bit of hijacking going on here, but
It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and your statement
"always the same infinitive for any derivative of the word"
isn't quite what happens.
Stemmers will always produce the same
no big deal, just wanted to mention.
On Mon, Apr 19, 2010 at 1:24 PM, dar...@ontrenet.com wrote:
This is a little bit of hijacking going on here, but
You are right. Accept my regrets.
It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and
Did you try parenthesizing:
field1:(This is a good string)
You can try lots of things easily by going to
http://localhost:8983/solr/admin/form.jsp
and clicking the debug enable checkbox...
HTH
Erick
On Mon, Apr 19, 2010 at 12:23 PM, MitchK mitc...@web.de wrote:
Erick,
I am a little bit
earlier too. To test query parsing, submit
your query to
http://localhost:8983/solr/select?q=your_querydebugQuery=true and look at
the parsed query output.
Erik
On Apr 19, 2010, at 6:45 PM, Erick Erickson wrote:
Did you try parenthesizing:
field1:(This is a good string)
You can try
:this +field1:good +field1:string
Is that ok to do.
Thanks,
Sandhya
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 20, 2010 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Help using boolean operators
Did you try parenthesizing:
field1
NGrams might help here, search the SOLR list for NGram
and I think you'll find that this subject has been discussed
several times...
HTH
Erick
On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote:
Hi,
I have about 2 million documents in my index. I want to search them by a
Hmmm, what does the rest of your query look like? And does adding
debugQuery=on show anything interesting?
Best
Erick
On Thu, Apr 29, 2010 at 6:54 AM, Jan Simon Winkelmann
winkelm...@newsfactory.de wrote:
((valid_from:[* TO 2010-04-29T10:34:12Z]) AND
(valid_till:[2010-04-29T10:34:12Z TO
This is a Lucene story, but may well apply... By the time I'd sent a request
for assistance
to the vendor of one of our search tools and received the reply "you didn't
give us the
right license number", I'd found Lucene, indexed part of my corpus and run
successful
searches against it. And had
The underlying IndexReader must be reopened. If you're
searching for a document with a searcher that was opened
before the document was indexed, it won't show up on the
search results.
I'm guessing that your statement that when you search
for it with some test is coincidence, but that's just a
The problem here, I think, is that you're updating the
index in a manner that the regular SOLR webapp doesn't
know about. So the index changes without SOLR knowing
it has to reopen the index to see the modifications.
Something to try:
curl http://localhost:8983/solr/update -F stream.body='<commit/>'
How many unique terms are in your sort field?
On Sun, May 2, 2010 at 11:48 PM, Hamid Vahedi hvb...@yahoo.com wrote:
I installed 64-bit Windows and my problem is solved. Also I am using shard mode
(100M docs per machine with one Solr instance).
Is there a better solution? Because I insert at least 5M docs
It would help a lot to see your actual config file, and if you provided a
bit more
detail about what failure looks like
Best
Erick
On Mon, May 3, 2010 at 9:43 AM, Jan Kammer jan.kam...@mni.fh-giessen.dewrote:
Hi there,
I want to enable spellchecking, but I have many fields.
I tried
The mail servers are often not too friendly with attachments, so people
either inline configs or put them on a server and post the URL.
HTH
Erick
On Wed, May 5, 2010 at 12:06 PM, Markus Fischer mar...@fischer.name wrote:
Hi,
On 05.05.2010 03:49, Chris Hostetter wrote:
: Are you
There's really no connection between NGrams and *. NGrams can be used
to handle hairy wildcard expressions, in particular searching for things
like
*blah* is one potential use of NGrams.
But your problem is simple to solve without bothering with NGrams, just use
the begin* syntax, no special
, or
something?
Thanks,
Felix
2010/5/6 Erick Erickson erickerick...@gmail.com
There's really no connection between NGrams and *. NGrams can be used
to handle hairy wildcard expressions, in particular searching for things
like
*blah* is one potential use of NGrams.
But your problem
You really have to give some more details about *why* you issue such a query
and what you are measuring - search time? total response time (which would
include network transmission)? *Of course* matching 1.8M records will take
some time, especially if you're trying to return the entire set of
We really need to see your schema definitions for the relevant field. For
instance,
if you're storing these as text you may just be losing the negative sign
which would
lead to all sorts of interesting failures..
Best
Erick
On Tue, May 11, 2010 at 9:53 AM, Christopher Gross
In Eclipse (you *may* need to have the subclipse plugin installed), just
right-click on the project > Team > Apply Patch and follow the wizard
HTH
Erick
On Tue, May 11, 2010 at 12:50 PM, Jonty Rhods jonty.rh...@gmail.com wrote:
hi David,
thanks for quick reply..
please give me full command. so
What is the debug output of the query? That would shed some light on the
issue...
Best
Erick
On Tue, May 11, 2010 at 5:48 PM, Alex Wang aw...@crossview.com wrote:
Hi,
I am getting a weird behavior in my Solr (1.4) index:
I have a field defined as follows:
field name=productType
On May 11, 2010, at 7:13 PM, Erick Erickson wrote:
What is the debug output of the query? That would shed some light on the
issue...
Best
Erick
On Tue, May 11, 2010 at 5:48 PM, Alex Wang aw...@crossview.commailto:
aw...@crossview.com
the raw *indexed* terms from the
admin console? I am not familiar with the admin console.
Thanks,
On May 12, 2010, at 10:18 AM, Erick Erickson wrote:
Hmmm, nothing looks odd about that, except perhaps the casing. If you use
the admin
console to look at the raw terms, is productbean mixed case
queries on the data you intend to.
Regards
Eric
On Wed, May 12, 2010 at 3:12 PM, Erick Erickson erickerick...@gmail.com
wrote:
I'm not entirely sure this is germane, but there's absolutely no
requirement
that
all documents in SOLR have the same fields. So it's possible for you to
index
tell me how to find the raw *indexed* terms from the
admin console? I am not familiar with the admin console.
Thanks,
On May 12, 2010, at 10:18 AM, Erick Erickson wrote:
Hmmm, nothing looks odd about that, except perhaps the casing. If you use
the admin
console to look at the raw terms
Hmmm, there's not much information to go on here.
You might review this page:
http://wiki.apache.org/solr/UsingMailingLists
and post with more information. At minimum,
the field definitions, the query output (include
debugQuery=on), perhaps what comes out
of the analysis admin page for both
Not at present; you must re-index your documents when you redefine your
schema
if you want the change to apply to existing documents.
Field updating of documents already indexed is being worked on, but it's not
available yet.
Best
Erick
On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos
anderson.v...@gmail.com
Probably your analyzer is removing the @ symbol, it's hard to say if you
don't include the relevant parts of your schema.
This page might help:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Best
Erick
On Thu, May 13, 2010
HTMLStripStandardTokenizerFactory ?
Thanks
2010/5/13 Erick Erickson erickerick...@gmail.com
Probably your analyzer is removing the @ symbol, it's hard to say if you
don't include the relevant parts of your schema.
This page might help:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http
, and this works. Is this the way I should go? (Could this
cause trouble in the future?)
What are the advantages of setting the field type to long? Should I keep this
field as a string type?
Thanks
2010/5/13 Erick Erickson erickerick...@gmail.com
Not at present, you must re-index your
You might want to look at ngrams and/or shingles. In this
case I suspect that ngrams are better suited, I don't
think shingles applies with the direction you stated, but
your problem description is so short I thought I'd mention
it.
Although your collection of words can work (think synonyms) if
A couple of things:
1) Try searching with debugQuery=on attached to your URL; that'll
give you some clues.
2) It's really worthwhile exploring the admin pages for a while, it'll also
give you a world of information. It takes a while to understand what the
various pages are telling you, but you'll
Are you sure you want to recompute the length when sorting?
It's the classic time/space tradeoff, but I'd suggest that when
your index is big enough to make taking up some more space
a problem, it's far too big to spend the cycles calculating each
term length for sorting purposes considering you
this way? The relevance
calculations already factor in both term frequency and field length. What's
the use-case for sorting by field length given the above?
Best
Erick
On Tue, May 25, 2010 at 3:40 AM, Sascha Szott sz...@zib.de wrote:
Hi Erick,
Erick Erickson wrote:
Are you sure you want
Don't forget to re-index after you make the change Lance suggested...
Erick
On Tue, May 25, 2010 at 4:51 PM, Lance Norskog goks...@gmail.com wrote:
Change type=string to type=text. This causes the field to be
analyzed and then searching on words finds the document.
On Tue, May 25, 2010 at
leave as an exercise for
the reader.
I really think you're reinventing the wheel here and looking at the
default scoring mechanism would be a good use of your time.
Best
Erick
On Wed, May 26, 2010 at 4:04 AM, Sascha Szott sz...@zib.de wrote:
Hi Erick,
Erick Erickson wrote:
Ah, I may have
You can get a lot of mileage out of the admin
analysis page and the full interface page, especially
by turning on the debug option on the admin
full interface page.
It takes a bit of practice to read the debug output, but
it's really, really, really worth it
Best
Erick
On Thu, May 27, 2010
Hmmm, I don't really see the problem here. I'll have to use English
examples...
Searching on the* (assuming the is a stopword) will search on
(them OR theory OR thespian) assuming those three words are in
your index. It will NOT search on the. So I think you're OK, or are
you seeing anomalous
No. You can add new documents which will reflect the new schema, but
you can't retroactively update your index.
In your specific example, it's not possible to losslessly recreate the data
to store from the indexed fields. Consider stopword removal, or lowercasing.
HTH
Erick
On Fri, May 28, 2010
You most certainly *can* store the many-many relationship, you
are just denormalizing your data. I know it goes against the grain
of any good database admin, but it's very often a good solution
for a search application.
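A hedged sketch of what denormalizing can look like in the update XML
(field names and values are made up, and it assumes author is declared
multiValued="true" in the schema):

    <add>
      <doc>
        <field name="id">book-42</field>
        <field name="title">Some Book</field>
        <field name="author">Smith</field>
        <field name="author">Jones</field>
      </doc>
    </add>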
You've gotta forget almost everything you learned about how data
*should* be
Well, the index does, indeed, get bigger. But the searches
get much faster because there's no term expansion going
on. It's another time/space tradeoff. I'm afraid you'll have
to just experiment a bit to see if this is an acceptable tradeoff.
in your particular situation
The real memory hit
The Solr admin page has access to (and uses) the field
definitions you've put in the config file. Luke has no
knowledge of this configuration, you have to choose
your analyzer from the drop down and select the one
closest to what's in your config file for SOLR. Are you
perhaps using an analyzer in
that the default is PersianAnalyzer. I switched
to StandardAnalyzer and tried a few different Lucene Compatibility values
but it didn't help :-(
On Sun, May 30, 2010 at 4:40 AM, Erick Erickson erickerick...@gmail.com
wrote:
The Solr admin page has access to (and uses) the field
definitions
to show non-string values, too.
On Sun, May 30, 2010 at 10:57 AM, Erick Erickson
erickerick...@gmail.com wrote:
Then you have to provide a lot more detail about what you did
and what you're seeing and what you think you should see. You
might review this page:
http://wiki.apache.org/solr
Assuming your config is set up to replace unique keys, you're really
doing a delete and an add (under the covers). It could very well be that
the deleted version of the document is still in your index taking up
space and will be until it is purged.
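A hedged illustration (assumes the standard update handler URL): the
deleted copies stay in the index until segments are merged, which you
can force with

    curl http://localhost:8983/solr/update -F stream.body='<optimize/>'

though optimizing just to reclaim space is usually not worth doing
routinely.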
HTH
Erick
On Thu, Jun 3, 2010 at 10:22 AM,
Have you looked at DisMaxRequestHandler?
Best
Erick
On Thu, Jun 3, 2010 at 11:23 AM, homerlex homerlex.nab...@gmail.com wrote:
iorixxx wrote:
hl.requireFieldMatch=true
http://wiki.apache.org/solr/HighlightingParameters#hl.requireFieldMatch
I had tried this before but it did not
Index time boosting is different than search time boosting, so
asking about performance is irrelevant.
Paraphrasing Hossman from years ago on the Lucene list (from
memory).
...index time boosting is a way of saying this document's
title is more important than other documents' titles. Search
time
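A hedged sketch of the two forms (Lucene 2.x/3.x-era API; the field
names, values and boost factors are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    Field title = new Field("title", "Some Title",
        Field.Store.YES, Field.Index.ANALYZED);
    title.setBoost(2.0f); // index time: this document's title counts for more
    doc.add(title);

and at search time the equivalent per-query statement is something like

    q=title:solr^2 body:solr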