triggering negative idf.
thanks,
-Mike
That said, SOLR-216 is probably a good place to start; it has gone
through at least one round of review. I fully support having a
robust, full-featured Python client for Solr!
-Mike
You need to specify the (list of)
fields you want highlighted.
-Mike
lock
type). Also, the Solrs do not communicate with each other. You have
to tell the readers manually that the index is updated (via commit()--
autoCommit will not work).
-Mike
On 26-Feb-08, at 9:39 AM, Alok Dhir wrote:
Are you saying all the servers will use the same 'data'
And yet, if you are experiencing performance problems, consider
optimizing regularly. If not, why worry?
-Mike
On 19-Feb-08, at 4:17 PM, Otis Gospodnetic wrote:
Hi,
If your mergeFactor is "reasonable" (e.g. default 10), Lucene will
keep the number of segments in the index und
r" is implied:
bq=location:Parramatta^1.4 location:NSW^1.4
alternatively, you can use multiple bq's:
bq=location:Parramatta^1.4
bq=location:NSW^1.4
see http://wiki.apache.org/solr/SolrQuerySyntax
-Mike
On 13-Feb-08, at 6:11 PM, Leonardo Santagada wrote:
On 13/02/2008, at 23:03, Mike Klaas wrote:
On 13-Feb-08, at 4:18 PM, Leonardo Santagada wrote:
Thanks but I would like to be able to:
q=
qf=field1
q=
qf=field2
...
Try using the StandardRequestHandler with the dismax query parser.
How
same time.
In this search form you have 3 places where you can choose a field
and then enter a search query... and then there are some more
restrictions you can choose (we would put those in one or more fq).
Get the idea?
Try using the StandardRequestHandler with the dismax query parser.
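For example, a sketch of that approach using the nested-query syntax
available on trunk/1.3 (field and term names here are placeholders):
q=_query_:"{!dismax qf=field1}term1" AND _query_:"{!dismax qf=field2}term2"
Each clause is parsed by dismax against its own qf.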
-Mike
If you want to boost these queries rather than filter, use bq.
-Mike
es.
When Lucene supports efficient, reopen()able FieldCache updates, this
situation might improve, but the above architecture would still
probably be better. Note that the second index can be on the same
machine.
-Mike
is better than committing frequently, too.
-Mike
Thanks Mike,
Some clarification:
*single-valued* in my previous email means *field-with-single-only-
value*
(in SOLR terms, multiValued="false"), and not a *single-token*. This
*single-valued* field is analyzed/tokenized and it is *multi-valued-
token*
so that fieldCache can
Unfortunately, there is no way for Solr to know that
the analyzer is only outputting a single token per document, else we
could apply this optimization automatically.
-Mike
For instance, change the following line in GetXmlDocumentFromPost()
From:
xdoc.LoadXml(sr.Replace("\n", ""));
To:
xdoc.LoadXml(sr);
Cheers
Mike
On Feb 6, 2008 12:54 PM, Mike Davies <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm having a small problem with Solr, have had a
F's?
Thanks in advance
Mike
is usually fruitful.
Of course, the best way to improve performance in this regard is to
store the less-frequently-used fields in a parallel Solr index. This
only works if the largest fields are the rarely-used ones, though
(like retrieving the doc contents to create a summary).
-Mike
s of our
sites
from the search engine, not just the obvious search forms.
I also use this feature. It would be useful to optimize the case
where rows=0.
-Mike
queries are not analyzed (see
<http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a>). In
your case, turning off stemming for the field should fix the problem.
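A minimal schema.xml sketch of an unstemmed field type (the
tokenizer/filter choices here are illustrative):
<fieldType name="text_unstemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Prefix queries will then match against the literal (lowercased) tokens.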
-Mike
you had mentioned and I'll be happy to
test them. I
appreciate your help!
anuvenk,
Multi-word spell checking is available only with
extendedResults=true, and only in trunk. I believe that the current
javadocs are incorrect on this point.
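A sketch of such a request, assuming the handler is registered as
"spellchecker" in solrconfig.xml (the handler name is an assumption):
http://localhost:8983/solr/select?qt=spellchecker&q=pyhton&extendedResults=true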
-Mike
But it also depends on the OS being able to load the hot
spots of the index into the disk cache. And with memory being as
cheap as it is...
-Mike
individual
index I've worked with is on the order of 10GB, and one thing I've
learned is to not extrapolate several orders of magnitude beyond my
experience.
-Mike
e, I think it might be better to use
Lucene directly. You can read the matching doc ids for a term
directly from the posting list in that case.
-Mike
Queries involving sorting can occupy a lot of memory. During
autowarming you need 2x peak memory usage. The only thing you can do
is increase your max heap size or be careful about cache autowarming
(possibly turning it off).
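For example, when launching the example server (the heap size here is
illustrative):
java -Xmx2048m -jar start.jar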
cheers,
-Mike
On 21-Jan-08, at 9:44 PM, Marcus Herou wrote:
sion from ~1.5 months
ago, though).
-Mike
On 21-Jan-08, at 12:10 PM, Lance Norskog wrote:
Would someone please consider marking a label on the Subversion
repository
that says, "This is a clean version"? I only do HTTP requests and
have no
custom software, so I don't care a
e yet.
-Mike
Unfortunately, dismax does not do short-circuiting in the computer
science sense (stop evaluation once one clause matches).
I too have thought about implementing a scorer that has that
behaviour, but I've never gotten around to it (it would be expensive).
-Mike
On 18-Jan-08, at 3:
er query logs based on the set
of documents returned), then think about how to apply those
techniques in Solr.
-Mike
y way to handle that
efficiently with Solr OOB. There probably is some way to create some
custom logic inside Solr to handle this efficiently, but it would
require some fancy code in the internals.
-Mike
On 17-Jan-08, at 1:08 PM, [EMAIL PROTECTED] wrote:
Sorry... my fault, please disr
the fields to query
(alternatively, you can provide a lucene-style query in an fq filter).
See the documentation here:
http://wiki.apache.org/solr/DisMaxRequestHandler
-Mike
che.
Each filter that matches more than ~3000 documents will occupy
maxDocs/8 bytes of memory. Certain kinds of faceting require one
entry per unique value in a field. The best way to tune this is to
monitor your cache hit/expunge statistics for the filter cache (on
the solr admin statistics screen).
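The corresponding solrconfig.xml entry looks like this (the sizes are
illustrative and should be tuned against those statistics):
<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>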
-Mike
using them.
-Mike
strategy is as follows: index half the documents on one machine and
half on another. Execute both queries simultaneously (using threads,
f.i.), and combine the results. You should observe a speed up.
-Mike
See http://wiki.apache.org/solr/HighlightingParameters . The default
behaviour will provide snippets like google does.
Note that you need to "store" the text of fields you want to
highlight for this to work.
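For example (the field name is illustrative):
http://localhost:8983/solr/select?q=apache&hl=true&hl.fl=content
with the field declared stored in schema.xml:
<field name="content" type="text" indexed="true" stored="true"/>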
cheers,
-Mike
On 14-Jan-08, at 2:17 PM, Ycrux wrote:
Maybe the rig
that is the slop for phrase
queries in the main query (seems not to be documented on the wiki). So
qt=dismax
q="work injury"
qs=10
should do the trick.
-Mike
http://issues.apache.org/jira/browse/SOLR-303
-Mike
of the highest-scoring one (mostly).
-Mike
also might want to consider turning off the stemming for code
search.
-Mike
()
probably shouldn't be one token.
The OP's problem might have to do with index/query-time analyzer
mismatch. We'd know more if he posted the schema definitions.
-Mike
It is the fraction of the score of non-max terms that gets added to
the total score. Hence, 1.0 = sum everything.
-Mike
On 4-Jan-08, at 3:28 PM, anuvenk wrote:
Could you elaborate on what the tie param does? I did read the
definition in
the solr wiki but still not crystal clear.
Mike Klaas wrote:
you are using and the exact
query string sent to Solr. Keywords in (double!) quotes are phrase
queries and should function as you have posited (and they always have
for me).
-Mike
On 4-Jan-08, at 1:22 PM, anuvenk wrote:
My understanding of the searches with quotes is that say for eg: i
On 4-Jan-08, at 1:12 PM, s d wrote:
But I want to sum the scores and not use max. Can I still do it
with
DisMax? Am I missing anything?
If you set tie=1.0, dismax functions like dissum.
-Mike
t could be used to boost performance over a huge
Solr index. To accomplish that, you need to split it up over two
machines (for which you might find Hadoop useful).
-Mike
On 3-Jan-08, at 11:38 AM, Leonardo Santagada wrote:
I tried to put some adds and deletes in the same request to Solr
but it didn't work. Have I done something wrong, or is this really
not supported?
It isn't supported.
-Mike
r using a FieldCache that looks
up the age of each doc under consideration and does a range check.
Coaxing Solr into using it correctly might be a little tricky).
good luck,
-Mike
listed
before
Doc2 because it has "Parramatta" in the location field.
Is this possible? Thanks in advance!
Sure, just also add a boost parameter:
fq=(+latitude:[lat1 TO lat2] +longitude:[lng1 TO lng2])
location:Parramatta location:NSW
bq=location:Parramatta^1.4
-Mike
There isn't built-in capability for this. It wouldn't be atrocious to
implement, though.
-Mike
If you're writing to disk, you can minimize the chance of an
inconsistent index by hardlinking the files first (cp -l)
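A sketch of that trick (paths are illustrative):
cp -lr /var/solr/data/index /var/solr/backups/index.$(date +%Y%m%d)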
-Mike
On 2-Jan-08, at 8:10 AM, Charlie Jackson wrote:
Solr indexes are file-based, so there's no need to "dump" the index
to a file.
In terms of h
, though
(if that is an option).
-Mike
On 26-Dec-07, at 5:48 AM, Mark Baird wrote:
Well when I wasn't sending regular commits I was getting out of memory
exceptions from Solr fairly often, which I assume is due to the
size of the
documents I'm sending. I'd love to set th
Hey Kasi,
Take a look at the Solr config file for the included example
(example/solr/conf/solrconfig.xml). It is the canonical documentation.
cheers,
-Mike
On 20-Dec-07, at 1:46 PM, Kasi Sankaralingam wrote:
Hi Mike,
Thanks a lot. Where would this lock information go, and also
how do I
(meaning keep the lock file
for example in a database?)
Any plug-in available?
Well, the lock isn't really important at all for typical Solr
operation. It is recommended to use simple
(1.3) which avoids FS locks. Otherwise, I would set
true and
not worry about it.
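The config elements were stripped from the message above; presumably
they were the solrconfig.xml lock settings, something like (a sketch,
assuming Solr 1.3's options):
<lockType>simple</lockType>
<unlockOnStartup>true</unlockOnStartup>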
-Mike
Try 'ant clean' first.
On 17-Dec-07, at 10:34 AM, Owens, Martin wrote:
Hello,
I've just been rolling in my highlighter changes to the 2007-12-13
build of Solr, but even though the whole thing compiles I'm getting
the following odd error when I run a search:
SEVERE: java.lang.NoClassDefFo
I would recommend limiting the documentCache to a small number
(10-20), rather than zero. Otherwise, you will retrieve the
documents multiple times in one request if you are doing highlighting.
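In solrconfig.xml that would be something like (sizes per the
suggestion above):
<documentCache class="solr.LRUCache" size="20" initialSize="20" autowarmCount="0"/>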
-Mike
On 14-Dec-07, at 2:24 AM, Koji Sekiguchi wrote:
Just comment them out:
regards,
Koji
Not sure if this helps, but note that the work is done in
PythonWriter, which is a subclass of JSONWriter. Most of the work is
done by JSONWriter due to the similarities of syntax.
-Mike
On 14-Dec-07, at 10:19 AM, Owens, Martin wrote:
That would be a Python Solr client, not a Solr writer
threads get all jumbled together. (For instance, "Solr and
word frequencies?", "Solr 1.3 expected release date", and "Solr,
search result format" are all now mixed together in my client.)
Thanks!
-Mike
"category counts"
On 11-Dec-07, at 6:38 PM, Norskog, Lance wrote:
In SQL terms they are: 'select unique'. Except on only one field.
-Original Message-
From: Charles Hornberger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 11, 2007 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re
source to make the change?
Unfortunately, there is no easy way to enable this. Patches welcome!
-Mike
On 11-Dec-07, at 11:51 AM, Ken Krugler wrote:
Hi all,
I've got a pattern in a document (call it "xy") that I want to turn
into two tokens - "xy" and "y".
One approach I could use is PatternTokenizer to extract "xy", and
then a custom filter that returns "xy" and then "y" on the next
call.
I use JVM system properties for this; they seem to work well.
-Mike
On 11-Dec-07, at 7:39 AM, patrick o'leary wrote:
I actually have a patch for solr config parser which allows you to
use context environment variables in the solrconfig.xml
I generally use it for development when I'
See https://issues.apache.org/jira/browse/SOLR-293 for one solution.
-Mike
On 10-Dec-07, at 8:48 AM, Brendan Grainger wrote:
Hi Matt,
Thanks for the reply. I've done what you said and I get exactly
what you're saying as a result. Any ideas about how to make 2WD and
4WD be term
e comes to mind, though.
-Mike
Trunk has much more data in its spellcheck response; see
http://wiki.apache.org/solr/SpellCheckerRequestHandler .
-Mike
On 7-Dec-07, at 3:46 PM, Matthew Runo wrote:
I'll give it a try. Seems like the Spellcheck response type is
pretty basic.
Thanks!
Matthew Runo
Sof
d up all the
individual term counts (or, again, using LukeRequestHandler).
Ultimately, I think that it will be relatively hard to get sub-second
performance on such an index on a single box, but it may be possible
if you structure your queries intelligently. Definitely go for 16 gigs.
On 5-Dec-07, at 1:02 PM, Owens, Martin wrote:
Thanks Mike. So in essence I need to write a new RequestHandler
plugin which takes the query string, tokenises it, then performs
some kind of action against the index to return results, from which I
should then be able to get the termVectors
y. I haven't worked with Term Vectors
(a Lucene API), so I'm not sure exactly how to go about this.
-Mike
tricky subject. It is hard to give any kind of
useful answer that applies in general. The one thing I can say is
that 110M is a _lot_ of docs for one system, especially if these are
normal-sized documents.
regards,
-Mike
tell
you if you set the size too small. You can try it; it may help.
This seems surprising unless you are positively hammering Solr with
tons of different threads during indexing. It's probably not worth
using more than # processors + a few.
-Mike
On 30-Nov-07, at 4:43 PM, Dave C. wrote:
Thanks for the quick response Mike...
Ideally it should match more than just a single character, i.e.
"the" in "weather" or "pro" in "profile" or "000" in "18000".
Would these cases be tak
for efficiency reasons. This is
configured via the "StopFilterFactory" in schema.xml: just remove it
from the field you are interested in and reindex.
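That is, delete this line from the field type's analyzer (attribute
values as in the example schema):
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>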
-Mike
TermDocs would be needed).
-Mike
On 30-Nov-07, at 1:45 PM, Norskog, Lance wrote:
What would also help is a query to find records for the spellcheck
dictionary builder. We would like to make separate spelling indexes
for
all records in English, one in Spanish, etc. We would also like to
can Solr tell us
which page the match happened on too?
Again, not automatically. However, if you wrote an analyzer that
bumped up the position increment of tokens every time a new page was
found (to, say, the next multiple of 1000), then you could infer the
matching page from the token position.
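A rough sketch of such a filter against the Lucene 2.2-era TokenStream
API; the form-feed page marker and the multiple-of-1000 scheme are
assumptions, not an existing Solr feature:

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public final class PageGapFilter extends TokenFilter {
  private int pos = 0;          // running position of the last token emitted
  private boolean pageBreak = false;

  public PageGapFilter(TokenStream in) { super(in); }

  public Token next() throws IOException {
    Token t = input.next();
    // swallow page-break marker tokens, remembering the boundary
    while (t != null && "\f".equals(t.termText())) {
      pageBreak = true;
      t = input.next();
    }
    if (t == null) return null;
    int inc = t.getPositionIncrement();
    if (pageBreak) {
      // jump to the next multiple of 1000, so page = position / 1000
      inc = (pos / 1000 + 1) * 1000 - pos;
      pageBreak = false;
    }
    t.setPositionIncrement(inc);
    pos += inc;
    return t;
  }
}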
cheers,
-Mike
to "reverse analyze" the suggestions as well,
so "Pyhton" gets corrected to "Python" and not "python". Similarly,
"ad-hco" should probably suggest "ad-hoc" and not "adhoc".
-Mike
analyzers are just wrappers of the already-available
analyzers in Lucene. I suspect (but am not sure) that the core devs
aren't fluent in the issues surrounding the analysis of Asian text (I
certainly am not). Any improvements in this regard would be greatly
appreciated.
-Mike
On 23-Nov-07, at 5:24 PM, Mike Klaas wrote:
On 23-Nov-07, at 5:17 PM, Chris Hostetter wrote:
: The best way to help is to try out the patch, make sure it
applies, see if the
: functionality is working, and review the code changes. Review
is usually the
: biggest bottleneck in open
good, and
works well for you -- please add a comment saying so. if you think
the patch can be made
"
-Mike
ment order in the index is sufficient to
achieve O(100), but you'll have to insert code in Solr to stop after
100 docs (another alternative is to stop processing after a given
amount of time). Also, using O() in this case isn't quite accurate:
there are costs that vary based on the number of docs in the index too.
-Mike
1.3.
This one is less likely as it depends on other components which are
not yet included.
-Mike
is plenty of
literature on near-dup detection, so you should be able to get one for
free!
To help your googling: the main algorithm used for this is called
'shingling' or 'shingle printing'.
-Mike
It's hard to imagine that happening at 30 threads.
-Mike
On 19-Nov-07, at 5:48 PM, Jonathan Ariel wrote:
Hi!
I'm wondering if someone is using a PHP client for Solr. Actually
I'm not
sure if there is one out there.
Would you be interested in having a SolrJ port for PHP?
see http://wiki.apache.org/solr/SolPHP
cheers,
-Mike
indexing via a webapp to eliminate
all your code as a possible factor. Then, look for signs of what is
happening when indexing slows. For instance, is Solr high in CPU, is
the computer thrashing, etc.?
-Mike
On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
Hi,
Thanks for answering this
On 18-Nov-07, at 9:59 PM, Dilip.TS wrote:
Hello,
Does SOLR support searching for a keyword which
has a
combination of more than 1 language within the same search page?
Sure: Solr is totally language-agnostic.
-Mike
On 18-Nov-07, at 8:17 AM, Eswar K wrote:
Is there any plan to implement that feature in the upcoming releases?
Not currently. Feel free to contribute something if you find a good
solution.
-Mike
On Nov 18, 2007 9:35 PM, Stuart Sierra <[EMAIL PROTECTED]> wrote:
On Nov 18, 2
your slave web container is configured
differently from the master.
-Mike
Have you built the project ('$ ant example')?
-Mike
On 15-Nov-07, at 2:41 PM, Thiago Jackiw wrote:
Grant,
Yes, I'm just starting it out from the examples directory flat out of
the trunk repository.
This is the output when I run "java -jar start.jar"
2007-11-15 14:
On 13-Nov-07, at 4:44 PM, Pieter Berkel wrote:
On Nov 14, 2007 6:44 AM, Mike Klaas <[EMAIL PROTECTED]> wrote:
Thanks Mike, that looks like a good place to start. While I really
can't think of any practical use for limiting the size of DocSet other
than simple faceting, the
web.xml ...etc...
Perhaps check your cache statistics on the admin GUI. Is it possible
that you have set the capacity high and they are just filling up?
Another thing to look out for is if you tend to sort on many
different fields, but each only rarely.
-Mike
Not really--there have been a few threads on this topic recently.
Perhaps in a couple months?
It may depend on the timing of the Lucene release.
-Mike
On 13-Nov-07, at 3:41 PM, Dave C. wrote:
Ah...
:(
Is there a timeline for the 1.3 release?
- david
Date: Tue, 13 Nov 2007 18:33:01
when
doing lots of faceting on huge indices, if N is low (say, 500-1000).
One problem with the implementation above is that it stymies the
query caching in SolrIndexSearcher (since the generated DocList is >
the cache upper bound).
-Mike
deprecated
methods in external libraries as well.
I don't think so, but I suggest asking this question on
java-[EMAIL PROTECTED], which has a much broader Lucene-related
audience.
-Mike
ld only be possible
in the event of cold process termination (like power loss).
-Mike
-Original Message-
From: David Neubert [mailto:[EMAIL PROTECTED]
Sent: Friday, November 09, 2007 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete all docs in a SOLR index?
Thanks!
On 7-Nov-07, at 2:27 PM, briand wrote:
I need to perform a search against a limited set of documents. I
have the
set of document ids, but was wondering what is the best way to
formulate the
query to SOLR?
add fq=docId:(id1 id2 id3 id4 id5...)
cheers,
-Mike
Hi Brian,
Found the SVN location, will download from there and give it a try.
Thanks for the help.
On 07/11/2007, Mike Davies <[EMAIL PROTECTED]> wrote:
>
> I'm using 1.2, downloaded from
>
> http://apache.rediris.es/lucene/solr/
>
> Where can I get the trunk ver
I'm using 1.2, downloaded from
http://apache.rediris.es/lucene/solr/
Where can I get the trunk version?
On 07/11/2007, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
>
> On Nov 7, 2007, at 10:00 AM, Mike Davies wrote:
> > java -Djetty.port=8521 -jar start.jar
> >
8983. Any suggestions?
Also, I'd really like to get hold of the source code for start.jar, but I
can't seem to find it anywhere. Again, any suggestions?
Thanks
Mike
ments that have the same resulting
token will be considered "the same").
If this is violated, the behaviour is undefined (but I wouldn't be
surprised if the first token was used).
-Mike
haven't been discovered yet. I'm using it in production.
More important than any claims we make is running it against your own
application's test suite, of course.
-Mike
to house the # of unique values
you are faceting on? Check the cache statistics on the admin GUI.
Are there large numbers of evictions?
Alternatively, is company_facet multi- or single-valued? If the
latter, the filter cache is not used at all.
-Mike
More generally, does anyone have a
when stemming, you'd store (account accountant)
(account accounts), etc., when filtering, (epee épée) (fantome
fantôme), etc.
Now when querying, transform your query, boosting the original
form by ^10:
épée -> epee épée^10
accountant -> account accountant^10
A bit of work to do in general, though.
-Mike
ev (Lucene)? I think someone
once implemented a solution using hashmaps for sorting, but I can't
recall the issue #.
-Mike
scenes if you aren't using
multiple threads.
Some possible differences:
1. Solr has more aggressive default buffering settings
(maxBufferedDocs, mergeFactor)
2. Solr trunk (if that is what you are using) uses a more recent
version of Lucene than the released 2.2
-Mike