:)
On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote:
I've been doing work evaluating Solr for use on a high-traffic
website for some time and things are looking positive. I have some
concerns from my higher-ups that I need to address. I have suggested
that we use a single index in order to keep
I have been using Jmeter to perform some load testing. In your case you might
like to take a look at
http://jakarta.apache.org/jmeter/usermanual/component_reference.html#CSV_Data_Set_Config
. This will allow you to use a random item from your query list.
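For reference, a minimal sketch of what that setup might look like (the file name and variable name below are my own choices, not from this thread). First a CSV file with one search term per line:

```
# queries.csv -- one search term per line
solr replication
dismax handler
faceted search
```

Then point a CSV Data Set Config element at the file with Variable Names set to `query`, and the HTTP Request sampler can reference it as `q=${query}`. With 'Recycle on EOF' enabled, each thread keeps cycling through the list for the duration of the test.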
Regards,
Kallin Nagelberg
Hey,
A question was raised during a meeting about our new Solr based search
projects. We're getting 4 cutting edge servers each with something like 24 Gigs
of ram dedicated to search. However there is some problem with the amount of
SAS based storage each machine can handle, and people wonder
. See
http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond
for details.
Tom
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Tuesday, April 27, 2010 4:13 PM
To: 'solr-user
Hi,
Does anyone have an idea about the performance benefits of searching across
floats compared to strings? I have one multi-valued field that contains about
3000 distinct IDs across 5 million documents. There are going to be a lot of queries
like q=id:102 OR id:303 OR id:305, etc. Right now it is
You might want to look at DateMath,
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I
believe the default precision is to the millisecond, so if you can afford to round
to the nearest second or even minute you might see some performance gains.
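For example (assuming a tdate-style field called 'datetime'; the field name is just for illustration), the rounding goes right in the query:

```
datetime:[NOW/MINUTE-1HOUR TO NOW/MINUTE]
datetime:[NOW/DAY TO NOW/DAY+1DAY]
```

Besides making the range cheaper to compute, rounding keeps the query string identical from one request to the next, so the filter and queryResult caches can actually get hits instead of seeing a unique NOW value every time.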
-Kallin Nagelberg
I had a very hard time selling Solr to business folks. Most are of the mind
that if you're not paying for something it can't be any good. That might also
be why they refrain from posting 'powered by solr' on their website, as if it
might show them to be cheap. They are also fearful of lack of
, Apr 28, 2010 at 11:22 AM,
Nagelberg, Kallin
knagelb...@globeandmail.com
wrote:
Does anyone have an idea about the performance
benefits of searching across floats compared to strings? I
have one multi-valued field that contains about 3000
distinct IDs across 5 million documents. I am going
Hey,
I've been using the dismax query parser so that I can pass a user created
search string directly to Solr. Now I'm getting the requirement that something
like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think
of a way to make this happen with Dismax, though it's
Hey everyone,
I'm curious if anyone has experience working with the company NStein and
their Solr based search solution S3. Any comments on performance, usability,
support etc. would be really appreciated.
Thanks,
-Kallin Nagelberg
Hey everyone,
I'm having some difficulty figuring out the best way to optimize for a certain
query situation. My documents have a multi-valued field that stores lists of
IDs. All in all there are probably about 10,000 distinct IDs throughout my
index. I need to be able to query and find all
Hey everyone,
Does anyone know if it is possible to control cache behavior on a per-request
basis? I would like to be able to use the queryResultCache for certain queries,
but have it bypassed for others. IE, I know at query time if there is 0 chance
of a hit and would like to avoid the cache
I'm not sure I understand how your results are truncated. They both find 21502
documents. The fact that you are sorting on '_erstelldatum' ascending and not
seeing any results for that field on the first page leads me to think that you
have 'sortMissingLast=false' on that field's fieldType. In
I must be missing something very obvious here. I have a filter query like so:
(-rootdir:somevalue)
I get results for that filter
However, when I OR it with another term like so I get nothing:
((-rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true))
How is this possible? Have I gone
Awesome that works, thanks Ahmet.
-Kallin Nagelberg
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, May 13, 2010 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: confused by simple OR
I must be missing something very
obvious here. I have a
I am trying to tune my Solr setup so that the caches are well warmed after the
index is updated. My documents are quite small, usually under 10k. I currently
have a document cache size of about 15,000, and am warming up 5,000 with a
query after each indexing. Autocommit is set at 30 seconds,
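For what it's worth, the warming query I mention is just a QuerySenderListener in solrconfig.xml; a sketch along these lines (the query itself is a placeholder, not my real one):

```
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">a popular query</str>
      <str name="rows">5000</str>
    </lst>
  </arr>
</listener>
```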
I agree that pulling all attributes into the parent sku during indexing could
work well. Define a Boolean field like 'isVirtual' to identify the non-leaf
skus, and use a multi-valued field for each of the attributes. For now you can
do a search like (isVirtual:true AND doorType:screen). If at a
products
in result set
sorry, what does sku mean?
I understand you like this: indexing base and variants, and include all
attributes (for one base and its variants) in each document. I think that
would work. Thanks.
Nagelberg, Kallin wrote:
I agree that pulling all attributes into the parent
I suppose you are still losing some performance on the replicated box since it
needs to use some resources to warm the cache. It would be nice if a warmed
cache could be replicated from the master, though perhaps that's not practical.
Chris is right though: The newly updated index created by a
Hey everyone,
I've recently been given a requirement that is giving me some trouble. I need
to retrieve up to 100 documents, but I can't see a way to do it without making
100 different queries.
My schema has a multi-valued field like 'listOfIds'. Each document has between
0 and N of these ids
How about throwing a BlockingQueue,
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html,
between your document-creator and solrserver? Give it a size of 10,000 or
something, with one thread trying to feed it, and one thread waiting for it to
get near full then
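A rough sketch of that arrangement in plain Java (integer ids stand in for whatever document objects your feeder builds, and the batch hand-off marks where a real indexer would call SolrServer.add(batch); the queue and batch sizes are arbitrary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Producer/consumer split with a bounded queue between document creation
// and indexing. SolrJ is omitted so the sketch stays self-contained.
public class QueueIndexer {

    private static final int POISON = -1; // end-of-stream marker

    /** Feeds totalDocs ids through a bounded queue and returns the
     *  batches the consumer would have sent to Solr. */
    public static List<List<Integer>> run(final int totalDocs, final int batchSize) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(10000);
        final List<List<Integer>> batches = new ArrayList<List<Integer>>();

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < totalDocs; i++) {
                        queue.put(i);       // blocks when the queue is full
                    }
                    queue.put(POISON);      // tell the consumer we're done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                List<Integer> batch = new ArrayList<Integer>();
                try {
                    while (true) {
                        int id = queue.take(); // blocks when the queue is empty
                        if (id == POISON) break;
                        batch.add(id);
                        if (batch.size() >= batchSize) {
                            // a real indexer would call solr.add(batch) here
                            batches.add(new ArrayList<Integer>(batch));
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) batches.add(new ArrayList<Integer>(batch));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return batches;
    }
}
```

The nice part is the backpressure you get for free: put() blocks the producer when the queue is full and take() blocks the consumer when it's empty, so neither side can run away from the other.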
, Kallin knagelb...@globeandmail.com wrote:
From: Nagelberg, Kallin knagelb...@globeandmail.com
Subject: RE: Machine utilization while indexing
To: 'solr-user@lucene.apache.org' solr-user@lucene.apache.org
Date: Thursday, May 20, 2010, 8:16 AM
How about throwing a blockingqueue,
http
your doing. Currently it takes
about 2 hours to index the 5m documents I'm talking about. But I still
feel as if my machine is underutilized.
Thijs
On 20-5-2010 17:16, Nagelberg, Kallin wrote:
How about throwing a blockingqueue,
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent
Thanks Darren,
The problem with that is that it may not return one document per id, which is
what I need. IE, I could give 100 ids in that OR query and retrieve 100
documents, all containing just 1 of the IDs.
-Kallin Nagelberg
-Original Message-
From: dar...@ontrenet.com
Yeah I need something like:
(id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
I'm not sure how I can hit solr once. If I do try and do them all in one big OR
query then I'm probably not going to get a hit for each ID. I would need to
request probably 1000 documents to
with 1 matching
doc for each id.
Again it is not guaranteed that all docs returned are different. Since you
didn't specify this as a requirement I think this will suffice.
Cheers,
Geert-Jan
2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com
Yeah I need something like:
(id:1 and maxhits:1
StreamingUpdateSolrServer already has multiple threads and uses multiple
connections under the covers. At least the API says 'Uses an internal
MultiThreadedHttpConnectionManager to manage http connections'. The constructor
allows you to specify the number of threads used,
Nagelberg, Kallin knagelb...@globeandmail.com
Yeah I need something like:
(id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
I'm not sure how I can hit solr once. If I do try and do them all in one
big OR query then I'm probably not going to get a hit for each ID. I
-Jan
2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com
Yeah I need something like:
(id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that..
I'm not sure how I can hit solr once. If I do try and do them all in one
big OR query then I'm probably not going to get a hit
As I understand from looking at
https://issues.apache.org/jira/browse/SOLR-236 field
collapsing has been disabled on multi-valued fields. Is this really necessary?
Let's say I have a multi-valued field, 'my-mv-field'. I have a query like
(my-mv-field:1 OR
I'm afraid nothing is completely 'real-time'. Even when doing your inserts on
the database there is time taken for those operations to complete. Right now I
have my solr server autocommitting every 30 seconds, which is 'real-time' enough
for me. You need to figure out what your threshold is, and
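If it helps, that 30-second autocommit is just this in solrconfig.xml (the number is mine; tune it to whatever your threshold turns out to be):

```
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>30000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```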
Searching is very fast with Solr, but nowhere near as fast as keying into a map.
There is possibly disk I/O if your document isn't cached. Your situation sounds
unique enough I think you're going to need to prototype to see if it meets your
demands. Figure out how 'fast' is 'fast' for your
.
Hopefully someone finds this useful eventually!
-Kallin Nagelberg
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Friday, May 21, 2010 4:44 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: seemingly impossible query
I just realized something
Good read here: http://mysolr.com/tips/denormalized-data-structure/ .
Are consultation requests unique to each consultant? In that case you could
represent the request as a JSON string and store it as a multi-valued string
field for each consultant, though that makes querying against requests
Multi-core is an option, but keep in mind if you go that route you will need to
do two searches to correlate data between the two.
-Kallin Nagelberg
-Original Message-
From: Robert Zotter [mailto:robertzot...@gmail.com]
Sent: Friday, May 28, 2010 12:26 PM
To:
, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
Multi-core is an option, but keep in mind if you go that route you will
need to do two searches to correlate data between the two.
-Kallin Nagelberg
-Original Message-
From: Robert Zotter [mailto:robertzot...@gmail.com
your config is set up to replace unique keys, you're really
doing a delete and an add (under the covers). It could very well be that
the deleted version of the document is still in your index taking up
space and will be until it is purged.
HTH
Erick
On Thu, Jun 3, 2010 at 10:22 AM, Nagelberg
How much memory have you given tomcat? The default is 64M which is going to be
really small for 5MB documents.
-Original Message-
From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo
Sent: Thursday, June 03, 2010 2:05 PM
To: solr-user@lucene.apache.org
03, 2010 2:29 PM
To: solr-user@lucene.apache.org
Subject: Re: general debugging techniques?
On Thu, Jun 3, 2010 at 11:17 AM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
How much memory have you given tomcat? The default is 64M which is going to
be really small for 5MB documents
-
From: Nagelberg, Kallin
Sent: Thursday, June 03, 2010 1:36 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: index growing with updates
Is there a way to trigger a purge, or under what conditions does it occur?
-Kallin Nagelberg
-Original Message-
From: Erick Erickson
I'm pretty sure you need to be running the patch against a checkout of the
trunk sources, not a generated .war file. Once you've done that you can use the
build scripts to make a new war.
-Kallin Nagelberg
-Original Message-
From: Moazzam Khan [mailto:moazz...@gmail.com]
Sent:
So you want to take the top 1000 sorted by score, then sort those by another
field. It's a strange case, and I can't think of a clean way to accomplish it.
You could do it in two queries, where the first is by score and you only
request your IDs to keep it snappy, then do a second query against
How about:
1. Create a date field to indicate indextime.
2. Use a date filter to restrict articles to today and yesterday such as
myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]
3. sort on that field.
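Putting those three steps together, the request might look something like this (field name as above, everything else illustrative):

```
q=*:*&fq=myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]&sort=myindexdate desc
```

As a bonus, the NOW/DAY rounding keeps the fq string stable within a day, so the filter cache can reuse it.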
-Kallin Nagelberg
-Original Message-
From: oferiko [mailto:ofer...@gmail.com]
Sent:
Yeah you should definitely just set up a custom parser for each site.. should be
easy to extract the title using Groovy's XML parsing along with TagSoup for sloppy
HTML. If you can't find the pattern for each site leading to the job title how
can you expect Solr to? Humans have the advantage here :P
Hey,
I recently moved a solr app from a testing environment into a production
environment, and I'm seeing a brand new error which never occurred during
testing. I'm seeing this in the solrJ-based app logs:
org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException:
client
I think you just want something like:
p_value:Pramod AND p_type:Supplier
no?
-Kallin Nagelberg
-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem
Hi,
Lets
.
On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
I think you just want something like:
p_value:Pramod AND p_type:Supplier
no?
-Kallin Nagelberg
-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com
ManBearPig is still a threat.
-Kallin Nagelberg
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Tuesday, July 27, 2010 7:44 PM
To: solr-user@lucene.apache.org
Subject: RE: How to 'filter' facet results
Is there a way to tell Solr to only return a specific
Hi everyone,
I've been trying to add a date based boost to my queries. I have a field like:
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
precisionStep="6" positionIncrementGap="0"/>
<field name="datetime" type="tdate" indexed="true" stored="true"
required="true" />
When I look at the datetime
more memory, ord() isn't even going to work for
a field with multiple tokens indexed per value (like tdate).
I'd recommend using a function on the date value itself.
http://wiki.apache.org/solr/FunctionQuery#ms
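For example, something along the lines of the wiki's recip example, adapted to a field named datetime (a boost that decays with document age; 3.16e-11 is roughly 1 over the number of milliseconds in a year):

```
bf=recip(ms(NOW,datetime),3.16e-11,1,1)
```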
-Yonik
http://www.lucidimagination.com
On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg
Hi everyone,
I'm trying to enhance a more like this search I'm conducting by boosting the
documents that have a date close to the original. I would like to do something
like a parabolic function centered on the date (would make tuning a little more
effective), though a linear function would
Hi everyone,
I am attempting to implement a faceted drill down feature with Solr. I am
having problems explaining some results of the fq parameter.
Let's say I have two fields, 'people' and 'category'. I do a search for 'dog'
and ask to facet on the people and category fields.
I am told that
Problem solved. I wasn't quoting the value. Since I was using names such as
'Gary Bettman' Solr must have been returning all the Garys.
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com]
Sent: Tuesday, February 16, 2010 3:22 PM
To: 'solr-user@lucene.apache.org
I've noticed some peculiar behavior with the dismax searchhandler.
In my case I'm making the search "The British Open", and am getting 0 results.
When I change it to "British Open" I get many hits. I looked at the query
analyzer and it should be broken down to 'british' and 'open' tokens ('the' is
a
I'm having a problem when users enter stopwords in their query. I'm using a
dismax request handler against a field setup like:
<fieldType name="simpleText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
Try setting the boost to 0 for the fields you don't want to contribute to the
score.
Kallin Nagelberg
-Original Message-
From: Jason Chaffee [mailto:jchaf...@ebates.com]
Sent: Thursday, February 25, 2010 4:03 PM
To: solr-user@lucene.apache.org
Subject: How to use dismax and boosting
copyField, if you
also need to be able to search or display the original values.
Just out of curiosity, can you tell us anything about what the Globe and
Mail is using Solr for? (assuming the question is work-related)
Peter
-Original Message-
From: Nagelberg, Kallin [mailto:knagelb
Hi,
I've got a situation where I need to reindex a core once a day. To do this I
was thinking of having two cores, one 'live' and one 'staging'. The app is
always serving 'live', but when the daily index happens it goes into 'staging',
then staging is swapped into 'live'. I can see how to do
:19 PM, Nagelberg, Kallin
knagelb...@globeandmail.com wrote:
Hi,
I've got a situation where I need to reindex a core once a day. To
do this I was thinking of having two cores, one 'live' and one
'staging'. The app is always serving 'live', but when the daily
index happens it goes