RE: index corruption / deployment strategy

2010-04-09 Thread Nagelberg, Kallin
:) On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote: I've been doing work evaluating Solr for use on a high-traffic website for some time and things are looking positive. I have some concerns from my higher-ups that I need to address. I have suggested that we use a single index in order to keep

RE: Benchmarking Solr

2010-04-12 Thread Nagelberg, Kallin
I have been using JMeter to perform some load testing. In your case you might like to take a look at http://jakarta.apache.org/jmeter/usermanual/component_reference.html#CSV_Data_Set_Config . This will allow you to use a random item from your query list. Regards, Kallin Nagelberg

nfs vs sas in production

2010-04-27 Thread Nagelberg, Kallin
Hey, A question was raised during a meeting about our new Solr-based search projects. We're getting 4 cutting-edge servers, each with something like 24 GB of RAM dedicated to search. However, there is some problem with the amount of SAS-based storage each machine can handle, and people wonder

RE: nfs vs sas in production

2010-04-28 Thread Nagelberg, Kallin
. See http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond for details. Tom -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, April 27, 2010 4:13 PM To: 'solr-user

benefits of float vs. string

2010-04-28 Thread Nagelberg, Kallin
Hi, Does anyone have an idea about the performance benefits of searching across floats compared to strings? I have one multi-valued field that contains about 3000 distinct IDs across 5 million documents. I am going to be doing a lot of queries like q=id:102 OR id:303 OR id:305, etc. Right now it is

RE: Slow Date-Range Queries

2010-04-29 Thread Nagelberg, Kallin
You might want to look at DateMath, http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html. I believe the default precision is to the millisecond, so if you can afford to round to the nearest second or even minute you might see some performance gains. -Kallin Nagelberg
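
The rounding suggested above can be sketched as a small query-string helper. This is only an illustration: the field name `pubdate` and the seven-day window are assumptions, not details from the thread. The `/MINUTE` suffix is Solr date math for "round down to the start of the minute", so every request issued within the same minute produces the same range clause.

```python
def rounded_range(field, start="NOW-7DAYS", end="NOW", unit="MINUTE"):
    """Build a Solr range clause whose endpoints are rounded down to `unit`."""
    allowed = {"SECOND", "MINUTE", "HOUR", "DAY"}
    if unit not in allowed:
        raise ValueError(f"unit must be one of {sorted(allowed)}")
    # Appending "/UNIT" truncates the computed instant to the start of that unit.
    return f"{field}:[{start}/{unit} TO {end}/{unit}]"


# Hypothetical field name, for illustration only.
clause = rounded_range("pubdate")
```

Coarser units trade precision for fewer distinct query strings; how coarse is acceptable depends on the application.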

RE: Evangelism

2010-04-29 Thread Nagelberg, Kallin
I had a very hard time selling Solr to business folks. Most are of the mind that if you're not paying for something it can't be any good. That might also be why they refrain from posting 'powered by solr' on their website, as if it might show them to be cheap. They are also fearful of lack of

RE: benefits of float vs. string

2010-04-30 Thread Nagelberg, Kallin
, Apr 28, 2010 at 11:22 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Does anyone have an idea about the performance benefits of searching across floats compared to strings? I have one multi-valued field that contains about 3000 distinct IDs across 5 million documents. I am going

prefixing with dismax

2010-04-30 Thread Nagelberg, Kallin
Hey, I've been using the dismax query parser so that I can pass a user created search string directly to Solr. Now I'm getting the requirement that something like 'Bo' must match 'Bob', or 'Bob Jo' must match 'Bob Jones'. I can't think of a way to make this happen with Dismax, though it's

nstein and 3S

2010-05-05 Thread Nagelberg, Kallin
Hey everyone, I'm curious if anyone has experience working with the company NStein and their Solr-based search solution 3S. Any comments on performance, usability, support etc. would be really appreciated. Thanks, -Kallin Nagelberg

caching repeated OR'd terms

2010-05-06 Thread Nagelberg, Kallin
Hey everyone, I'm having some difficulty figuring out the best way to optimize for a certain query situation. My documents have a many-valued field that stores lists of IDs. All in all there are probably about 10,000 distinct IDs throughout my index. I need to be able to query and find all

cache control per-request

2010-05-06 Thread Nagelberg, Kallin
Hey everyone, Does anyone know if it is possible to control cache behavior on a per-request basis? I would like to be able to use the queryResultCache for certain queries, but have it bypassed for others. I.e., I know at query time if there is 0 chance of a hit and would like to avoid the cache

RE: strange behaviour when sorting, fields are missing in result

2010-05-12 Thread Nagelberg, Kallin
I'm not sure I understand how your results are truncated. They both find 21502 documents. The fact that you are sorting on '_erstelldatum' ascending and not seeing any results for that field on the first page leads me to think that you have 'sortMissingLast=false' on that field's fieldType. In

confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
I must be missing something very obvious here. I have a filter query like so: (-rootdir:somevalue) I get results for that filter. However, when I OR it with another term like so, I get nothing: ((-rootdir:somevalue) OR (rootdir:somevalue AND someboolean:true)) How is this possible? Have I gone
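
The reply that resolved this is not shown in the excerpt, so the following is an assumption rather than the thread's actual fix. A common explanation is that a nested clause containing only a negation has nothing to subtract from in the Lucene query parser and so matches no documents; the usual workaround is to subtract from `*:*` (all documents). The helper names below are hypothetical.

```python
def all_except(clause):
    """Rewrite a purely negative clause so it subtracts from all documents."""
    return f"(*:* -{clause})"


def any_of(*clauses):
    """OR together already-parenthesised clauses."""
    return "(" + " OR ".join(clauses) + ")"


fq = any_of(
    all_except("rootdir:somevalue"),
    "(rootdir:somevalue AND someboolean:true)",
)
```

The top-level form `(-rootdir:somevalue)` works on its own because Solr special-cases a purely negative top-level query; that special case does not extend to sub-clauses.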

RE: confused by simple OR

2010-05-13 Thread Nagelberg, Kallin
Awesome that works, thanks Ahmet. -Kallin Nagelberg -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, May 13, 2010 12:24 PM To: solr-user@lucene.apache.org Subject: Re: confused by simple OR I must be missing something very obvious here. I have a

maximum recommended document cache size

2010-05-13 Thread Nagelberg, Kallin
I am trying to tune my Solr setup so that the caches are well warmed after the index is updated. My documents are quite small, usually under 10k. I currently have a document cache size of about 15,000, and am warming up 5,000 with a query after each indexing. Autocommit is set at 30 seconds,

RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
I agree that pulling all attributes into the parent sku during indexing could work well. Define a Boolean field like 'isVirtual' to identify the non-leaf skus, and use a multi-valued field for each of the attributes. For now you can do a search like (isVirtual:true AND doorType:screen). If at a

RE: Challenge: Searching for variant products and get basic products in result set

2010-05-19 Thread Nagelberg, Kallin
products in result set sorry, what does sku mean? I understand you like this: indexing base and variants, and include all attributes (for one base and its variants) in each document. I think that would work. Thanks. Nagelberg, Kallin wrote: I agree that pulling all attributes into the parent

RE: disable caches in real time

2010-05-19 Thread Nagelberg, Kallin
I suppose you are still losing some performance on the replicated box since it needs to use some resources to warm the cache. It would be nice if a warmed cache could be replicated from the master though perhaps that's not practical. Chris is right though: The newly updated index created by a

seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Hey everyone, I've recently been given a requirement that is giving me some trouble. I need to retrieve up to 100 documents, but I can't see a way to do it without making 100 different queries. My schema has a multi-valued field like 'listOfIds'. Each document has between 0 and N of these ids

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
How about throwing a BlockingQueue, http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html, between your document-creator and SolrServer? Give it a size of 10,000 or something, with one thread trying to feed it, and one thread waiting for it to get near full then
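
The producer/consumer idea above can be sketched as follows. The thread refers to Java's `BlockingQueue`; this sketch uses Python's standard `queue.Queue` as an analogue, and it drains the queue in fixed-size batches rather than waiting for it to get near full, which is a simplification. `send_batch` is a hypothetical stand-in for whatever call ships documents to Solr.

```python
import queue
import threading

_DONE = object()  # sentinel that tells the consumer the producer has finished


def produce(docs, q):
    """Feed documents into the bounded queue; put() blocks while it is full."""
    for doc in docs:
        q.put(doc)
    q.put(_DONE)


def consume(q, send_batch, batch_size):
    """Drain the queue, handing documents to send_batch in fixed-size batches."""
    batch = []
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        send_batch(batch)


def index_all(docs, send_batch, queue_size=10_000, batch_size=1_000):
    """Run the producer in a background thread and consume in the caller."""
    q = queue.Queue(maxsize=queue_size)
    producer = threading.Thread(target=produce, args=(docs, q))
    producer.start()
    consume(q, send_batch, batch_size)
    producer.join()
```

The bounded queue keeps memory use predictable: if the sender falls behind, the document creator simply blocks instead of building an unbounded backlog.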

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
, Kallin knagelb...@globeandmail.com wrote: From: Nagelberg, Kallin knagelb...@globeandmail.com Subject: RE: Machine utilization while indexing To: 'solr-user@lucene.apache.org' solr-user@lucene.apache.org Date: Thursday, May 20, 2010, 8:16 AM How about throwing a blockingqueue, http

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
you're doing. Currently it takes about 2 hours to index the 5m documents I'm talking about. But I still feel as if my machine is underutilized. Thijs On 20-5-2010 17:16, Nagelberg, Kallin wrote: How about throwing a blockingqueue, http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Thanks Darren, The problem with that is that it may not return one document per id, which is what I need. I.e., I could give 100 ids in that OR query and retrieve 100 documents, all containing just 1 of the IDs. -Kallin Nagelberg -Original Message- From: dar...@ontrenet.com

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. I'm not sure how I can hit Solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit for each ID. I would need to request probably 1000 documents to
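
The over-fetch approach mentioned above could be post-processed on the client roughly like this. It is a sketch under assumptions: the documents are plain dicts already returned in rank order by one large OR query, and the multi-valued field is called `listOfIds` as in the original question. It does not guarantee a hit for every id, and the same document may be chosen for more than one id.

```python
def one_doc_per_id(wanted_ids, docs, field="listOfIds"):
    """For each wanted id, keep the first ranked document that carries it."""
    wanted = set(wanted_ids)
    chosen = {}
    for doc in docs:  # docs are assumed to be in rank order
        for value in doc.get(field, []):
            if value in wanted and value not in chosen:
                chosen[value] = doc
    return chosen  # ids with no matching document are simply absent
```

Any id missing from the result would need a follow-up query, which is the cost the thread is trying to avoid.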

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
with 1 matching doc for each id. Again it is not guaranteed that all docs returned are different. Since you didn't specify this as a requirement I think this will suffice. Cheers, Geert-Jan 2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com Yeah I need something like: (id:1 and maxhits:1

RE: Machine utilization while indexing

2010-05-20 Thread Nagelberg, Kallin
StreamingUpdateSolrServer already has multiple threads and uses multiple connections under the covers. At least the API says 'Uses an internal MultiThreadedHttpConnectionManager to manage http connections'. The constructor allows you to specify the number of threads used,

RE: seemingly impossible query

2010-05-20 Thread Nagelberg, Kallin
Nagelberg, Kallin knagelb...@globeandmail.com Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. I'm not sure how I can hit solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit for each ID. I

RE: seemingly impossible query

2010-05-21 Thread Nagelberg, Kallin
-Jan 2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxhits:1).. something crazy like that.. I'm not sure how I can hit solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit

field collapsing on multi-valued field

2010-05-21 Thread Nagelberg, Kallin
As I understand from looking at https://issues.apache.org/jira/login.jsp?os_destination=/browse/SOLR-236, field collapsing has been disabled on multi-valued fields. Is this really necessary? Let's say I have a multi-valued field, 'my-mv-field'. I have a query like (my-mv-field:1 OR

RE: Any realtime indexing plugin available for SOLR

2010-05-26 Thread Nagelberg, Kallin
I'm afraid nothing is completely 'real-time'. Even when doing your inserts on the database there is time taken for those operations to complete. Right now I have my Solr server autocommitting every 30 seconds, which is 'real-time' enough for me. You need to figure out what your threshold is, and

RE: How real-time are Solr/Lucene queries?

2010-05-26 Thread Nagelberg, Kallin
Searching is very fast with Solr, but nowhere near as fast as keying into a map. There is possibly disk I/O if your document isn't cached. Your situation sounds unique enough that I think you're going to need to prototype to see if it meets your demands. Figure out how 'fast' is 'fast' for your

RE: seemingly impossible query

2010-05-26 Thread Nagelberg, Kallin
. Hopefully someone finds this useful eventually! -Kallin Nagelberg -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Friday, May 21, 2010 4:44 PM To: 'solr-user@lucene.apache.org' Subject: RE: seemingly impossible query I just realized something

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
Good read here: http://mysolr.com/tips/denormalized-data-structure/ . Are consultation requests unique to each consultant? In that case you could represent the request as a JSON string and store it as a multi-valued string field for each consultant, though that makes querying against requests

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
Multi-core is an option, but keep in mind if you go that route you will need to do two searches to correlate data between the two. -Kallin Nagelberg -Original Message- From: Robert Zotter [mailto:robertzot...@gmail.com] Sent: Friday, May 28, 2010 12:26 PM To:

RE: Storing different entities in Solr

2010-05-28 Thread Nagelberg, Kallin
, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Multi-core is an option, but keep in mind if you go that route you will need to do two searches to correlate data between the two. -Kallin Nagelberg -Original Message- From: Robert Zotter [mailto:robertzot...@gmail.com

RE: index growing with updates

2010-06-03 Thread Nagelberg, Kallin
your config is set up to replace unique keys, you're really doing a delete and an add (under the covers). It could very well be that the deleted version of the document is still in your index taking up space and will be until it is purged. HTH Erick On Thu, Jun 3, 2010 at 10:22 AM, Nagelberg

RE: general debugging techniques?

2010-06-03 Thread Nagelberg, Kallin
How much memory have you given Tomcat? The default is 64M which is going to be really small for 5MB documents. -Original Message- From: jim.bl...@pbwiki.com [mailto:jim.bl...@pbwiki.com] On Behalf Of Jim Blomo Sent: Thursday, June 03, 2010 2:05 PM To: solr-user@lucene.apache.org

RE: general debugging techniques?

2010-06-03 Thread Nagelberg, Kallin
03, 2010 2:29 PM To: solr-user@lucene.apache.org Subject: Re: general debugging techniques? On Thu, Jun 3, 2010 at 11:17 AM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: How much memory have you given Tomcat? The default is 64M which is going to be really small for 5MB documents

RE: index growing with updates

2010-06-04 Thread Nagelberg, Kallin
- From: Nagelberg, Kallin Sent: Thursday, June 03, 2010 1:36 PM To: 'solr-user@lucene.apache.org' Subject: RE: index growing with updates Is there a way to trigger a purge, or under what conditions does it occur? -Kallin Nagelberg -Original Message- From: Erick Erickson

RE: Help patching Solr

2010-06-15 Thread Nagelberg, Kallin
I'm pretty sure you need to be running the patch against a checkout of the trunk sources, not a generated .war file. Once you've done that you can use the build scripts to make a new war. -Kallin Nagelberg -Original Message- From: Moazzam Khan [mailto:moazz...@gmail.com] Sent:

RE: limiting the total number of documents matched

2010-07-14 Thread Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another field. It's a strange case, and I can't think of a clean way to accomplish it. You could do it in two queries, where the first is by score and you only request your IDs to keep it snappy, then do a second query against
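
The "top N by score, then re-sort" idea can be illustrated on the client side. This sketch replaces the second query with an in-memory sort, which is an assumption and only practical when the first query also returns the sort field; the `score` key and the dict layout are hypothetical.

```python
def top_n_then_sort(docs, n, sort_key, reverse=False):
    """Keep the n highest-scoring docs, then order those by another field."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    return sorted(top, key=lambda d: d[sort_key], reverse=reverse)
```

With a Solr backend the first `sorted` call would normally be the query itself (`rows=1000`, default score ordering), leaving only the second sort to the client.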

RE: how to eliminating scoring from a query?

2010-07-15 Thread Nagelberg, Kallin
How about: 1. Create a date field to indicate index time. 2. Use a date filter to restrict articles to today and yesterday, such as myindexdate:[NOW/DAY-1DAY TO NOW/DAY+1DAY]. 3. Sort on that field. -Kallin Nagelberg -Original Message- From: oferiko [mailto:ofer...@gmail.com] Sent:
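
The three steps above might translate into request parameters along these lines. The field name `myindexdate` and the range come from the message; the match-all `q` and the descending sort direction are assumptions added to make the sketch complete.

```python
from urllib.parse import urlencode


def recent_articles_params(date_field="myindexdate"):
    """Filter to yesterday and today, then order by the date field, not score."""
    return {
        "q": "*:*",  # assumed: match everything and let the filter do the work
        "fq": f"{date_field}:[NOW/DAY-1DAY TO NOW/DAY+1DAY]",
        "sort": f"{date_field} desc",  # assumed direction: newest first
    }


query_string = urlencode(recent_articles_params())
```

Because the restriction lives in `fq` and the ordering in `sort`, relevance scores play no part in which documents appear or in what order.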

RE: faceted search with job title

2010-07-21 Thread Nagelberg, Kallin
Yeah you should definitely just set up a custom parser for each site.. should be easy to extract title using Groovy's XML parsing along with TagSoup for sloppy HTML. If you can't find the pattern for each site leading to the job title how can you expect Solr to? Humans have the advantage here :P

solrj occasional timeout on commit

2010-07-23 Thread Nagelberg, Kallin
Hey, I recently moved a Solr app from a testing environment into a production environment, and I'm seeing a brand new error which never occurred during testing. I'm seeing this in the SolrJ-based app logs: org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: client

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
I think you just want something like: p_value:Pramod AND p_type:Supplier, no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com] Sent: Friday, July 23, 2010 2:17 PM To: solr-user@lucene.apache.org Subject: help with a schema design problem Hi, Let's

RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
. On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I think you just want something like: p_value:Pramod AND p_type:Supplier no? -Kallin Nagelberg -Original Message- From: Pramod Goyal [mailto:pramod.go...@gmail.com

RE: How to 'filter' facet results

2010-07-28 Thread Nagelberg, Kallin
ManBearPig is still a threat. -Kallin Nagelberg -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Tuesday, July 27, 2010 7:44 PM To: solr-user@lucene.apache.org Subject: RE: How to 'filter' facet results Is there a way to tell Solr to only return a specific

ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
Hi everyone, I've been trying to add a date-based boost to my queries. I have a field like: <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> <field name="datetime" type="tdate" indexed="true" stored="true" required="true"/> When I look at the datetime

RE: ord on TrieDateField always returning max

2010-01-06 Thread Nagelberg, Kallin
more memory, ord() isn't even going to work for a field with multiple tokens indexed per value (like tdate). I'd recommend using a function on the date value itself. http://wiki.apache.org/solr/FunctionQuery#ms -Yonik http://www.lucidimagination.com On Wed, Jan 6, 2010 at 10:52 AM, Nagelberg

parabolic type function centered on a date

2010-02-11 Thread Nagelberg, Kallin
Hi everyone, I'm trying to enhance a more like this search I'm conducting by boosting the documents that have a date close to the original. I would like to do something like a parabolic function centered on the date (would make tuning a little more effective), though a linear function would

filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Hi everyone, I am attempting to implement a faceted drill down feature with Solr. I am having problems explaining some results of the fq parameter. Let's say I have two fields, 'people' and 'category'. I do a search for 'dog' and ask to facet on the people and category fields. I am told that

RE: filter queries not fully filtering

2010-02-16 Thread Nagelberg, Kallin
Problem solved. I wasn't quoting the value. Since I was using names such as 'Gary Bettman', Solr must have been giving all the Garys. -Original Message- From: Nagelberg, Kallin [mailto:knagelb...@globeandmail.com] Sent: Tuesday, February 16, 2010 3:22 PM To: 'solr-user@lucene.apache.org
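
The quoting fix described above can be sketched as a helper that turns a facet value into a phrase filter. Without quotes, `people:Gary Bettman` is parsed as `people:Gary` plus a separate term `Bettman` searched against the default field, which would explain the extra Garys. The helper name is hypothetical.

```python
def phrase_filter(field, value):
    """Build an fq clause that matches `value` as one quoted phrase."""
    # Escape backslashes first, then embedded double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


fq = phrase_filter("people", "Gary Bettman")
```

For facet drill-down on untokenized string fields this keeps the filter aligned with the exact facet value that was clicked.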

including 'the' dismax query kills results

2010-02-18 Thread Nagelberg, Kallin
I've noticed some peculiar behavior with the dismax search handler. In my case I'm making the search 'The British Open', and am getting 0 results. When I change it to 'British Open' I get many hits. I looked at the query analyzer and it should be broken down to 'british' and 'open' tokens ('the' is a

stop words make dismax fail

2010-02-24 Thread Nagelberg, Kallin
I'm having a problem when users enter stopwords in their query. I'm using a dismax request handler against a field set up like: <fieldType name="simpleText" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/>

RE: How to use dismax and boosting properly?

2010-02-25 Thread Nagelberg, Kallin
Try setting the boost to 0 for the fields you don't want to contribute to the score. Kallin Nagelberg -Original Message- From: Jason Chaffee [mailto:jchaf...@ebates.com] Sent: Thursday, February 25, 2010 4:03 PM To: solr-user@lucene.apache.org Subject: How to use dismax and boosting

RE: lowercasing for sorting

2010-03-23 Thread Nagelberg, Kallin
copyField, if you also need to be able to search or display the original values. Just out of curiosity, can you tell us anything about what the Globe and Mail is using Solr for? (assuming the question is work-related) Peter -Original Message- From: Nagelberg, Kallin [mailto:knagelb

multicore embedded swap / reload etc.

2010-03-24 Thread Nagelberg, Kallin
Hi, I've got a situation where I need to reindex a core once a day. To do this I was thinking of having two cores, one 'live' and one 'staging'. The app is always serving 'live', but when the daily index happens it goes into 'staging', then staging is swapped into 'live'. I can see how to do
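
For a Solr instance reached over HTTP, the live/staging swap described above is usually done with the CoreAdmin `SWAP` action. The thread is about an embedded setup, so this is only a sketch of the HTTP equivalent; the base URL is an assumption, and the core names are taken from the message.

```python
from urllib.parse import urlencode


def swap_url(base_url, core, other):
    """Build the CoreAdmin request that exchanges two cores' names."""
    params = {"action": "SWAP", "core": core, "other": other}
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Assumed local base URL, for illustration only.
url = swap_url("http://localhost:8983/solr", "live", "staging")
```

After the swap, requests addressed to 'live' are served by the freshly built index, and the old index remains available under 'staging' for the next rebuild.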

RE: multicore embedded swap / reload etc.

2010-03-26 Thread Nagelberg, Kallin
:19 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: Hi, I've got a situation where I need to reindex a core once a day. To do this I was thinking of having two cores, one 'live' and one 'staging'. The app is always serving 'live', but when the daily index happens it goes