No, I have not encountered an OOM exception yet with the current field collapse patch.
How large is your configured JVM heap space (-Xmx)? Field collapsing
requires more memory than regular searches do. Does Solr run out of
memory during the first search(es), or does it run out of memory after
a while when
I understand that synonyms are domain-specific, although I could still see a
benefit of having standardized synonyms.txt files (a thesaurus) for general
use, just like the ones you can download or that are already embedded in word
processors like OpenOffice Writer or MS Word.
I can understand that
Well that is odd. How have you configured field collapsing with the
dismax request handler?
The collapse counts should be X - 1 (if collapse.threshold=1).
Martijn
2009/10/1 Joe Calderon calderon@gmail.com:
thx for the reply, i just want the number of dupes in the query
result, but it seems i
On Oct 1, 2009, at 12:18 PM, Phillip Farber wrote:
Resuming this discussion in a new thread to focus only on this
question:
What is the best way to get the size of an index so it does not get
too big to be optimized (or to allow a very large segment merge)
given space limits?
I
Hi,
Long story short: how can I take every 100th row from a Solr result set?
What would the syntax for this be?
Long story:
Currently I have lots of documents (articles) indexed. They all have a
field 'title' with a corresponding value:
atitle
btitle
...
*title
How do I build a menu so I can search
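A hedged client-side sketch in SolrJ, since Solr has no built-in "every Nth row" syntax: page through the sorted results with rows=1 and start stepping by 100. The title field is from the question; the SolrServer named server is assumed, and this fires one request per sampled row.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;
// Get the total hit count first, then jump to every 100th row.
long total = server.query(new SolrQuery("*:*").setRows(0)).getResults().getNumFound();
for (int k = 0; k * 100 < total; k++) {
    SolrQuery q = new SolrQuery("*:*");
    q.addSortField("title", SolrQuery.ORDER.asc);
    q.setStart(k * 100); // start offset lands on every 100th row
    q.setRows(1);
    SolrDocumentList page = server.query(q).getResults();
    if (!page.isEmpty()) {
        System.out.println(page.get(0).getFieldValue("title"));
    }
}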
Hi,
I ran debug on a query to examine the score, as I was surprised by the results.
Here is the diff of the same explain section for two different rows that I
found troubling.
It looks for 'pari' in the ancestorName field, but the first row looks in
241135 records
while for the second row it's just 187821 records.
On Thu, Oct 1, 2009 at 7:37 PM, matrix_psj matrix_...@hotmail.com wrote:
An example:
My schema is about web files. Part of the syntax is a text field of authors
that have worked on each file, e.g.
<file>
  <filename>login.php</filename>
  <lastModDate>2009-01-01</lastModDate>
  <authors>alex,
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella claudio.marte...@tis.bz.it
wrote:
About the copyField issue in general: as it copies the content to the
other field, what is the sense of defining analyzers for the destination
field? The source is already analyzed, so I guess that the RESULT of
Phillip Farber wrote:
Resuming this discussion in a new thread to focus only on this question:
What is the best way to get the size of an index so it does not get
too big to be optimized (or to allow a very large segment merge) given
space limits?
I already have the largest 15,000rpm SCSI
Did you try this?
http://blogs.msdn.com/dgorti/archive/2005/09/18/470766.aspx
Also, please post the full exception stack trace.
2009/10/2 Steinar Asbjørnsen steinar...@gmail.com
Tried running Solr on Jetty now, and I still get the same
On Thu, Oct 1, 2009 at 3:10 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC,
56340 peter.th...@navy.mil wrote:
1. While playing around with
sending in an XML document within an XML CDATA tag,
with termVectors=true
I noticed the following behavior:
<person>peter</person>
collapses to the term
Hi all,
I need to perform sorting of my query hits by different criteria depending
on the number of hits. For instance, if there are <= 10 hits, sort by
date_entered, otherwise, sort by popularity.
Does anyone know if there is a way to do that with a single query, or will I
have to send another
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella claudio.marte...@tis.bz.it
wrote:
About the copyField issue in general: as it copies the content to the
other field, what is the sense of defining analyzers for the destination
field? The source is already analyzed, so I guess that the RESULT of
On Thu, Oct 1, 2009 at 2:54 PM, Lance Norskog goks...@gmail.com wrote:
Trie fields also do not support faceting.
Only those that index multiple tokens per value to speed up range queries.
They also take more RAM in
some operations.
Should be less memory on average.
-Yonik
On Fri, Oct 2, 2009 at 6:44 PM, Fergus McMenemie fer...@twig.me.uk wrote:
The copy is done before analysis. The original text is sent to the
copyField, which can choose to do analysis differently from the source field.
I have been wondering about this as well. The WIKI is not explicit about
Ah yes we do have some warming queries which would look like a search. Did
that side change enough to push up the memory limits where we would run out
like this? Also, would FastLRU cache make a difference?
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
On Fri, Oct 2, 2009 at 9:54 AM, Jeff Newburn jnewb...@zappos.com wrote:
Ah yes we do have some warming queries which would look like a search. Did
that side change enough to push up the memory limits where we would run out
like this?
What does the warming request(s) look like, and what are
Jeff Newburn wrote:
that side change enough to push up the memory limits where we would run out
like this?
Yes - now give us the FieldCache section from the stats section please :)
It's not likely gonna do you any good, but it could be good information
for us.
--
- Mark
On Fri, Oct 2, 2009 at 10:02 AM, Mark Miller markrmil...@gmail.com wrote:
Jeff Newburn wrote:
that side change enough to push up the memory limits where we would run out
like this?
Yes - now give us the FieldCache section from the stats section please :)
And the fieldValueCache section too
Hi,
I read pretty much all posts on this thread (before and after this one). Looks
like the main suggestion from you and others is to keep max heap size (-Xmx) as
small as possible (as long as you don't see an OOM exception). This brings more
questions than answers (for me at least. I'm new to
siping liu wrote:
Hi,
I read pretty much all posts on this thread (before and after this one).
Looks like the main suggestion from you and others is to keep max heap size
(-Xmx) as small as possible (as long as you don't see an OOM exception). This
brings more questions than answers (for me
If the threshold is only 10, why can't you always sort by popularity and,
if the result set is <= 10, re-sort on the client side based on
date_entered?
Uri
Bojan Šmid wrote:
Hi all,
I need to perform sorting of my query hits by different criterion depending
on the number of hits. For
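A rough SolrJ sketch of Uri's suggestion; the field names popularity and date_entered come from the thread, while the SolrServer named server and the query string are assumed:
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
SolrQuery q = new SolrQuery("the user query"); // placeholder query
q.addSortField("popularity", SolrQuery.ORDER.desc);
q.setRows(10);
SolrDocumentList docs = server.query(q).getResults();
if (docs.getNumFound() <= 10) {
    // Small result set: re-sort the page we already have by date_entered.
    Collections.sort(docs, new Comparator<SolrDocument>() {
        public int compare(SolrDocument a, SolrDocument b) {
            Date d1 = (Date) a.getFieldValue("date_entered");
            Date d2 = (Date) b.getFieldValue("date_entered");
            return d2.compareTo(d1); // newest first
        }
    });
}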
Nope, that just gets you the number of results returned, not how many
there could be. Like I said, if you look at the XML returned, you'll
see something like
<result name='response' numFound='1251' start='0'>
but only 10 docs returned. getNumFound returns 10 in that case, not 1251.
2009/10/2
I tried to simplify the problem, but the point is that I could have really
complex requirements. For instance, if in the first 5 results none are
older than one year, use sort by X, otherwise sort by Y.
So, the question is, is there a way to make Solr recognize complex
situations and apply
The warmers return 11 fields:
3 Strings
2 booleans
2 doubles
2 longs
1 sint (solr.SortableIntField)
Let me know if you need the fields actually be searched on.
name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the
Another thing to remember about wildcard and fuzzy searches is that none
of the token filters will be applied.
If you are using the LowerCaseFilterFactory at index time, then
RI-MC50034-1 gets converted to ri-mc50034-1 which is never going to
match RI-MC5000*
Also, I would probably use the
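A small sketch of the usual client-side workaround, assuming the field (here called part, a made-up name) was indexed through LowerCaseFilterFactory:
import java.util.Locale;
import org.apache.solr.client.solrj.SolrQuery;
// Wildcard terms bypass the analysis chain, so lowercase them on the
// client to match the terms LowerCaseFilterFactory produced at index time.
String userInput = "RI-MC5000*";
SolrQuery q = new SolrQuery("part:" + userInput.toLowerCase(Locale.ROOT));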
heap space is 4gb set to grow up to 8gb, usage is normally ~1-2gb,
seems to happen within a few searches.
if it's just me I'll try to isolate it, it could be some other part of
my implementation
thx much
On Fri, Oct 2, 2009 at 1:18 AM, Martijn v Groningen
martijn.is.h...@gmail.com wrote:
No I
Hello,
I'm trying to create a tag cloud from a term vector, but the array
returned (using JSON wt) is quite complex and takes an inordinate
amount of time to process. Is there a better way to retrieve terms and
their document TF? The TermVectorComponent allows for retrieval of tf
and df though
Hello,
A couple questions with regard to snapshots and distribution:
1. If two snapshots are created in between a snappull, are the changes from
the first snapshot missed by the slave, as it only pulls the most recent
snapshot?
2. When triggering snapshooter from the postCommit hook, does a
http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html
This is really cool, and a version for Solr would help in doing
relevance experiments. We don't need the select A or B feature, just
seeing search result sets side-by-side would be great.
--
Lance Norskog
Mark Miller wrote:
Phillip Farber wrote:
Resuming this discussion in a new thread to focus only on this question:
What is the best way to get the size of an index so it does not get
too big to be optimized (or to allow a very large segment merge) given
space limits?
I already have the
I reran the test to try to ensure that other cores on the instance didn't
have searches against them. This time I get NPE errors just trying to get
into the stats after the system hits its limit.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
From: Jeff
A snapshot is a copy of the index at a particular moment in time. So
changes in earlier snapshots are in the latest one as well. Nothing is
missed by pulling the latest snapshot.
When triggering snapshooter with the postCommit hook, a commit always
results in a snapshot being created.
Bill
On
Yes. I think it would be a very helpful tool for tuning search relevancy - you
can do a controlled experiment with your target audiences to understand
their responses to the parameter changes. We plan to use this feature to
benchmark Lucene/SOLR against our in-house commercial search engine - it
will
Have you considered using facet counts for your tag cloud?
Bill
On Fri, Oct 2, 2009 at 11:34 AM, aod...@gmail.com wrote:
Hello,
I'm trying to create a tag cloud from a term vector, but the array
returned (using JSON wt) is quite complex and takes an inordinate
amount of time to process. Is
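For reference, a minimal SolrJ facet query along those lines; the field name tags and the limit of 50 are assumptions, and server is an existing SolrServer:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
SolrQuery q = new SolrQuery("*:*");
q.setRows(0);            // only the facet counts are needed
q.setFacet(true);
q.addFacetField("tags"); // hypothetical multi-valued tag field
q.setFacetLimit(50);     // top 50 terms for the cloud
q.setFacetMinCount(1);
QueryResponse rsp = server.query(q);
for (FacetField.Count c : rsp.getFacetField("tags").getValues()) {
    System.out.println(c.getName() + ": " + c.getCount());
}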
Does the PatternReplaceFilter have an option where you can keep the
original token in addition to the modified token? From what I looked at, it
does not seem to, but I want to confirm.
Alternatively, is there a filter available which takes in a pattern and
produces additional forms of
Hi
I have a question regarding the SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin
What's baffling me is that if I give the word austin martin at query time,
it first goes through whitespace tokenization and generates two words in the
analysis page, austin and martin
then
On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin ptomb...@xcski.com wrote:
Nope, that just gets you the number of results returned, not how many
there could be. Like I said, if you look at the XML returned, you'll
see something like
<result name='response' numFound='1251' start='0'>
but only 10
On Fri, Oct 2, 2009 at 3:13 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin ptomb...@xcski.com wrote:
Nope, that just gets you the number of results returned, not how many
there could be. Like I said, if you look at the XML returned, you'll
We have the same issue as Paul. We currently parse the XML manually to pull
out the numFound from the response.
Cheers!
Adam
- Original Message
From: Paul Tomblin ptomb...@xcski.com
To: solr-user@lucene.apache.org
Sent: Friday, October 2, 2009 2:39:01 PM
Subject: Re: How to access
Hi,
My doc has three fields, say field1, field2, field3.
My search would be q=field1:string1 field2:string2. I also need to
do some computation and comparison of string1 and string2 with the
contents in field3 and then determine if it is a hit.
What can I do to implement this?
Thanks.
Hello,
I know I can invoke expungeDeletes using the update handler (curl update
-F stream.body='<commit expungeDeletes="true"/>'), however, I was
wondering if it is possible to invoke it using SolrJ.
It looks like, currently, there are no SolrServer.commit(..) methods
that I can use for this
I know that the Solr FAQ says
Users should decide for themselves which Servlet Container they
consider the easiest/best for their use cases based on their
needs/experience. For high traffic scenarios, investing time for tuning
the servlet container can often make a big difference.
but is
Hi,
Is there a way to request all fields in an object EXCEPT a particular
one? In other words, the following pseudo code is what I'd like to express:
req = Solr::Request::Standard.new(:start => page*size, :rows => size,
:query => my_query, :field_list => [ALL EXCEPT 'text'])
Is there a way to
Just go for Tomcat. For all its problems, and I should know having used
it since it was originally JavaWebServer, it is perfectly capable of
handling high-end production environments provided you tune it
correctly. We use it with our customized Solr 1.3 version without any
problems.
Lajos
Netflix uses Tomcat throughout, and they tail the log to figure out whether
it has started, except they look for a message from Solr to see whether
Solr is ready to go to work.
wunder
-Original Message-
From: Lajos [mailto:la...@protulae.com]
Sent: Friday, October 02, 2009 1:35 PM
To:
Hi
I have a question regarding the SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin
...
Can anybody please explain if my observation is correct. This is a very
critical aspect for my work.
That is correct - the synonym filter can recognize multi-token
On Sat, Oct 3, 2009 at 1:09 AM, Paul Tomblin ptomb...@xcski.com wrote:
Nope. Check again. getNumFound will definitely give you 1251.
SolrDocumentList#size() will give you 10.
I don't have to check again. I put this log into my query code:
QueryResponse resp =
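For anyone following along, a minimal SolrJ sketch of the distinction being argued about (the query string and the SolrServer named server are assumed):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
SolrQuery q = new SolrQuery("title:foo"); // hypothetical query
q.setRows(10);
QueryResponse resp = server.query(q);
SolrDocumentList results = resp.getResults();
System.out.println(results.getNumFound()); // total matches in the index, e.g. 1251
System.out.println(results.size());        // docs actually in this page: at most 10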
On Sat, Oct 3, 2009 at 1:35 AM, Jibo John jiboj...@mac.com wrote:
Hello,
I know I can invoke expungeDeletes using the update handler (curl update -F
stream.body='<commit expungeDeletes="true"/>'), however, I was wondering
if it is possible to invoke it using SolrJ.
It looks like, currently,
LucidityWorks.com is my client. The similarity to lucid is purely coincidental
- the client didn't even know I was going to choose Solr. I am using Solr
trunk, last updated and compiled a few weeks ago.
-- Sent from my Palm Prē
Shalin Shekhar Mangar wrote:
On Sat, Oct 3, 2009 at 1:09 AM,
AOL uses Tomcat for all Solr deployments. Our load balancers use a ping
query to put a box back into rotation.
On Sat, Oct 3, 2009 at 2:15 AM, Walter Underwood wun...@wunderwood.orgwrote:
Netflix uses Tomcat throughout, and they tail the log to figure out whether
it has started, except they
Thanks, Mark. I really appreciate your confirmation.
Phil
Mark Miller wrote:
Phillip Farber wrote:
Resuming this discussion in a new thread to focus only on this question:
What is the best way to get the size of an index so it does not get
too big to be optimized (or to allow a very large
You can always add arbitrary parameters to an update request:
UpdateRequest ureq = new UpdateRequest();
ureq.add(doc);
ureq.setParam("expungeDeletes", "true");
NamedList<Object> rsp = server.request(ureq);
-Yonik
http://www.lucidimagination.com
On Fri, Oct 2, 2009 at 4:05 PM, Jibo
Created jira issue https://issues.apache.org/jira/browse/SOLR-1487
Thanks,
-Jibo
On Oct 2, 2009, at 2:17 PM, Shalin Shekhar Mangar wrote:
On Sat, Oct 3, 2009 at 1:35 AM, Jibo John jiboj...@mac.com wrote:
Hello,
I know I can invoke expungeDeletes using updatehandler ( curl
update -F
Doing a second search immediately after the first one is consistently
under 100 ms for me, usually under 25, on cheap hardware. Even while
sorting the results, you should have no problems. If necessary, you
could run Solr with the embedded client and do one search right after
the other, avoid the
On Fri, Oct 2, 2009 at 5:04 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
Can you try this with the Solrj client
in the official 1.3 release or even trunk?
I did an svn update to 821188 and that seems to have fixed the problem.
(The jar files changed from -1.3.0 to -1.4-dev) I guess
This is not working. When I search documents - I have a document which contains
the text aston martin -
when I search carDescription:"austin martin" I get a match, but when I don't
give double quotes,
like carDescription:austin martin,
there is no match.
In the analyser, if I give austin martin without
When you use a field qualifier (fieldName:valueToLookFor), it only applies
to the word right after the colon. If you look at the debug
information you will notice that for the second word it is using the
default field:
<str name="parsedquery_toString">carDescription:austin *text*:martin</str>
the
Thanks
As I said, it works when I give double quotes,
like carDescription:"austin martin".
So is the conclusion that in order to map a two-word synonym I have to
always enclose it in double quotes, so that it does not split the words?
Christian Zambrano wrote:
When you use a
No, there is only a list of fields, star, and score. You can choose
to index it and not store it, and then have your application fetch it
from the original data store. This is a common system design pattern
to avoid storing giant text blobs in the index.
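So the workaround is to enumerate the stored fields you do want and leave out the big text field. A hedged SolrJ sketch (field names invented, server assumed, paging values taken from the Ruby snippet above):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
int page = 0, size = 10; // same paging as the Ruby request
SolrQuery q = new SolrQuery("my_query");
q.setFields("id", "title", "score"); // everything wanted except 'text'
q.setStart(page * size);
q.setRows(size);
QueryResponse rsp = server.query(q);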
Thanks, Lance, for the quick reply.
Well, unfortunately, we need the highlighting feature on that field, so
I think we have to store it.
It's not a big deal, it just seemed like something that would be useful
and probably be easy to implement, so I figured I just missed it.
Alternately, is
Simon,
Have you tried the bin/jetty.sh script that comes with Jetty
distributions? It contains the standard start|stop|restart functions.
Joshua
On Oct 2, 2009, at 1:11 PM, Simon Wistow wrote:
I know that the Solr FAQ says
Users should decide for themselves which Servlet Container they
Maybe the TermsComponent?
You can't ask for facets with a wildcard in the field name. This would
do the trick. It's an issue in JIRA, if you want to vote for it.
http://issues.apache.org/jira/browse/SOLR-247
http://issues.apache.org/jira/browse/SOLR-1387
On Fri, Oct 2, 2009 at 6:36 PM, Paul