RE: Solr commits before documents are added

2009-10-20 Thread Feak, Todd
with the last 20 RECORD_IDs missing (for example, the last id is 999,980 instead of 1,000,000). - Sharmila Feak, Todd wrote: A few questions to help the troubleshooting. Solr version #? Is there just 1 commit through SolrJ for the millions of documents? Or do you do it on a regular interval

RE: Solr commits before documents are added

2009-10-19 Thread Feak, Todd
A few questions to help the troubleshooting. Solr version #? Is there just 1 commit through SolrJ for the millions of documents? Or do you do it on a regular interval (every 100k documents, for example) and then one at the end to be sure? How are you observing that the last few didn't make it

RE: using regular expressions in solr query

2009-10-06 Thread Feak, Todd
Any particular reason for the double quotes in the 2nd and 3rd query examples, but not the 1st, or is this just an artifact of your email? -Todd -Original Message- From: Rakhi Khatwani [mailto:rkhatw...@gmail.com] Sent: Tuesday, October 06, 2009 2:26 AM To: solr-user@lucene.apache.org

RE: Solr Timeouts

2009-10-06 Thread Feak, Todd
maxDocs - commit if the number of updates since the last commit is greater than this; maxTime - commit if the oldest uncommitted update (in ms) is this long ago. <autoCommit> <maxDocs>1</maxDocs> <maxTime>1000</maxTime> </autoCommit> -- -Original Message- From: Feak, Todd [mailto:todd.f

RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
How often are you committing? Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (the 4-5 you mention), then eventually the server gets behind and you start to pile up commit requests. Once this starts to happen, it will
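
To illustrate the suggestion, here is a minimal SolrJ sketch (assuming the 1.3-era SolrJ API; the URL and field name are hypothetical) that funnels all adds through one client and commits once at the end instead of committing from each parallel job:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            for (int i = 0; i < 6000000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i);
                server.add(doc);   // documents are indexed but not visible until a commit
            }
            server.commit();       // a single commit opens one new searcher
        }
    }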

RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
[mailto:gfernandez-kinc...@capitaliq.com] Sent: Monday, October 05, 2009 9:30 AM To: solr-user@lucene.apache.org Subject: RE: Solr Timeouts I'm not committing at all actually - I'm waiting for all 6 million to be done. -Original Message- From: Feak, Todd [mailto:todd.f...@smss.sony.com

RE: Solr Timeouts

2009-10-05 Thread Feak, Todd
@lucene.apache.org Subject: RE: Solr Timeouts I'm not committing at all actually - I'm waiting for all 6 million to be done. -Original Message- From: Feak, Todd [mailto:todd.f...@smss.sony.com] Sent: Monday, October 05, 2009 12:10 PM To: solr-user@lucene.apache.org Subject: RE: Solr Timeouts How often

RE: About SolrJ for XML

2009-10-05 Thread Feak, Todd
It looks like you have some confusion about queries vs. facets. You may want to look at the Solr wiki regarding facets a bit. In the meantime, if you just want to query for that field containing 21... I would suggest that you don't set the query type, don't set any facet fields, and only set
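
As a hedged sketch of that advice (the field name and value are hypothetical; the server setup from the earlier SolrJ sketch is assumed), a plain query on the field with no query type and no facet parameters:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery();
    query.setQuery("price:21");   // query the field directly
    // intentionally no setQueryType(...) and no addFacetField(...)
    QueryResponse rsp = server.query(query);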

RE: cleanup old index directories on slaves

2009-10-05 Thread Feak, Todd
We use the snapcleaner script. http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner Will that do the job? -Todd -Original Message- From: solr jay [mailto:solr...@gmail.com] Sent: Monday, October 05, 2009 1:58 PM To: solr-user@lucene.apache.org Subject: cleanup old

RE: NGramTokenFilter behaviour

2009-09-30 Thread Feak, Todd
My understanding of NGram tokenizing is that it helps with languages that don't necessarily use spaces as word delimiters (Japanese et al). In that case, bi-gramming is used to find words contained within a stream of unbroken characters, and you want to find all of the bi-grams that

RE: Re: WebLogic 10 Compatibility Issue - StackOverflowError

2009-01-30 Thread Feak, Todd
Are the issues you ran into due to non-standard code in Solr, or is there some WebLogic inconsistency? -Todd Feak -Original Message- From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch Sent: Friday, January 30, 2009 1:11 AM To: solr-user@lucene.apache.org Subject: Re:

RE: warmupTime : 0

2009-01-29 Thread Feak, Todd
This usually represents anything less than 8ms if you are on a Windows system. The granularity of timing on Windows systems is around 16ms. -Todd Feak -Original Message- From: sunnyfr [mailto:johanna...@gmail.com] Sent: Thursday, January 29, 2009 9:13 AM To: solr-user@lucene.apache.org

RE: solr as the data store

2009-01-28 Thread Feak, Todd
Although it's unlikely that you will need to rebuild from scratch, you might want to fully understand the cost of recovery if you *do* have to. If it's incredibly expensive (time or money), you need to keep that in mind. -Todd -Original Message- From: Ian Connor

RE: Performance dead-zone due to garbage collection

2009-01-23 Thread Feak, Todd
Can you share your experience with the IBM JDK once you've evaluated it? You are working with a heavy load; I think many would benefit from the feedback. -Todd Feak -Original Message- From: wojtekpia [mailto:wojte...@hotmail.com] Sent: Thursday, January 22, 2009 3:46 PM To:

RE: QTime in microsecond

2009-01-23 Thread Feak, Todd
The easiest way is to run maybe 100,000 or more queries and take an average. A single microsecond value for a query would be incredibly inaccurate. -ToddFeak -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Friday, January 23, 2009 1:33 AM To:

RE: Performance dead-zone due to garbage collection

2009-01-21 Thread Feak, Todd
The large drop in old generation from 27GB to 6GB indicates that things are getting into your old generation prematurely. They really don't need to get there at all, and should be collected sooner (more frequently). Look into increasing young generation sizes via JVM parameters. Also look into
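
As a hedged example (the flag values are illustrative only, not a recommendation), enlarging the young generation on a Sun JVM of that era might look like:

    java -Xmx28g -Xmn4g -XX:SurvivorRatio=6 -XX:+PrintGCDetails -jar start.jar

-Xmn sets the young generation size, and -XX:+PrintGCDetails lets you verify that fewer objects are being promoted into the old generation.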

RE: Performance dead-zone due to garbage collection

2009-01-21 Thread Feak, Todd
From a high-level view, there is a certain amount of garbage collection that must occur. That garbage is generated per request through a variety of means (buffers, request, response, cache expulsion). The only thing that JVM parameters can address is *when* that collection occurs. It can occur

RE: Performance dead-zone due to garbage collection

2009-01-21 Thread Feak, Todd
A ballpark calculation would be Collected Amount (from GC logging) / # of Requests. The GC logging can tell you how much it collected each time, no need to try to snapshot before-and-after heap sizes. However (big caveat here), this is a ballpark figure. The garbage collector is not guaranteed
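
A hypothetical worked example of that ballpark (numbers invented for illustration): if the GC log shows 1,200 MB collected over a window in which 20,000 requests were served, then garbage per request ≈ 1,200 MB / 20,000 ≈ 60 KB.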

RE: How to select *actual* match from a multi-valued field

2009-01-20 Thread Feak, Todd
Anyone who can shed some insight? -Todd -Original Message- From: Feak, Todd [mailto:todd.f...@smss.sony.com] Sent: Friday, January 16, 2009 9:55 AM To: solr-user@lucene.apache.org Subject: How to select *actual* match from a multi-valued field At a high level, I'm trying to do some

RE: New to Solr/Lucene design question

2009-01-20 Thread Feak, Todd
A third option - Use dynamic fields. Add a dynamic field called *_stash. This will allow new fields for documents to be added down the road without changing schema.xml, yet still allow you to query on fields like arresteeFirstName_stash without extra overhead. -Todd Feak -Original
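
A minimal sketch of that idea (assumes schema.xml already declares a matching dynamicField pattern; all names here are hypothetical):

    // schema.xml would contain something like:
    //   <dynamicField name="*_stash" type="string" indexed="true" stored="true"/>
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "incident-42");
    doc.addField("arresteeFirstName_stash", "John");  // matches *_stash, no schema edit needed
    server.add(doc);
    server.commit();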

RE: New to Solr/Lucene design question

2009-01-20 Thread Feak, Todd
- Original Message From: Feak, Todd todd.f...@smss.sony.com To: solr-user@lucene.apache.org Sent: Tuesday, January 20, 2009 4:49:56 PM Subject: RE: New to Solr/Lucene design question A third option - Use dynamic fields. Add a dynamic field called *_stash. This will allow new fields

How to select *actual* match from a multi-valued field

2009-01-16 Thread Feak, Todd
At a high level, I'm trying to do some more intelligent searching using an app that will send multiple queries to Solr. My current issue is around multi-valued fields and determining which entry actually generated the hit for a particular query. For example, let's say that I have a

RE: Committing index while time-consuming query is running

2009-01-13 Thread Feak, Todd
I believe that when you commit, a new IndexReader is created, which is warmed, etc. New incoming queries will be sent to this new IndexReader. Once all previously existing queries have been answered, the old IndexReader will shut down. The commit doesn't wait for the query to finish, but it

RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
:It should be fairly predictable, can you elaborate on what problems you :have just adding boost queries for the specific types? The boost queries are true queries, so the amount of boost can be affected by things like term frequency for the query. The functions aren't affected by this and

RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
First suspect would be Filter Cache settings and Query Cache settings. If they are auto-warming at all, then there is a definite difference between the first start behavior and the post-commit behavior. This affects what's in memory, caches, etc. -Todd Feak -Original Message- From:

RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
@lucene.apache.org Subject: Re: Using query functions against a type field On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd todd.f...@smss.sony.com wrote: The boost queries are true queries, so the amount boost can be affected by things like term frequency for the query. Sounds like a constant score

RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
: Using query functions against a type field On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd todd.f...@smss.sony.com wrote: I'm not sure I followed all that Yonik. Are you saying that I can achieve this affect now with a bq setting in my DisMax query instead of via a bf setting? Yep, a const QParser

RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
Kind of a side-note, but I think it may be worth your while. If your queryResultCache hit rate is 65%, consider putting a reverse proxy in front of Solr. It can give performance boosts over the query cache in Solr, as it doesn't have to pay the cost of reformulating the response. I've used

RE: Ngram Repeats

2009-01-05 Thread Feak, Todd
The ngrams are extremely fast and the recommended way to do this according to the user group. They work wonderfully except for this one issue. So do we basically have to do a separate index for this, or is there a dedup setting to only return unique brand names? On 12/24/08 7:51 AM, Feak, Todd todd.f

RE: Ngram Repeats

2008-12-24 Thread Feak, Todd
It sounds like you want to get a list of brands that start with a particular string, out of your index. But your index is based on products, not brands. Is that correct? If so, that has nothing to do with NGrams (or even tokenizing, for that matter). I think you should be doing a Facet query
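
A sketch of such a facet query in SolrJ (assumes a 'brand' facet field; the prefix value is hypothetical, and the facet.prefix parameter must be supported by the Solr version in use):

    SolrQuery query = new SolrQuery("*:*");
    query.setRows(0);                  // only the facet counts are wanted, not products
    query.setFacet(true);
    query.addFacetField("brand");
    query.set("facet.prefix", "son");  // brands starting with "son"
    query.setFacetMinCount(1);
    QueryResponse rsp = server.query(query);

Each matching brand comes back once with a count, so duplicates across products disappear.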

RE: Using query functions against a type field

2008-12-22 Thread Feak, Todd
Subject: Re: Using query functions against a type field Try document boost at index time. --wunder On 12/22/08 9:28 AM, Feak, Todd todd.f...@smss.sony.com wrote: I would like to use a query function to boost documents of a certain type. I realize that I can use a boost query

RE: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Feak, Todd
Don't forget to consider scaling concerns (if there are any). There are strong differences in the number of searches we receive for each language. We chose to create separate schema and config per language so that we can throw servers at a particular language (or set of languages) if we needed to.

RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is too much, you could reduce the number of queries used to auto-warm that cache. Notice that the 4-5 seconds is spent only putting about 420 queries into the query cache. Your autowarm of 5 for the query cache seems a

RE: Query Performance while updating the index

2008-12-12 Thread Feak, Todd
is done. Feak, Todd wrote: It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is too much, you could reduce the number of queries used to auto-warm that cache. Notice that the 4-5 seconds is spent only putting about 420 queries into the query cache. Your autowarm

RE: move /solr directory from /tomcat/bin/

2008-12-11 Thread Feak, Todd
You can set the home directory in your Tomcat context snippet/file. http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e This controls where Solr looks for solrconfig.xml and schema.xml. The solrconfig.xml in turn specifies where to find the data directory.

RE: Issue with Search when using wildcard(*) in search term.

2008-12-09 Thread Feak, Todd
I'm pretty sure * isn't supported by DisMax. From the Solr Wiki on DisMaxRequestHandler overview http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-ce5517b6c702a55af5cc14a2c284dbd9f18a18c2 This query handler supports an extremely simplified subset of the Lucene

RE: Sorting on text-fields with international characters

2008-12-08 Thread Feak, Todd
One option is to add an additional field for sorting. Create a copy of the field you want to sort on and modify the data you insert there so that it will sort the way you want it to. -ToddFeak -Original Message- From: Joel Karlsson [mailto:[EMAIL PROTECTED] Sent: Monday, December 08,
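
One hedged sketch of that approach, with the normalization done client-side at index time (the sort field name and the normalization rule are hypothetical):

    String title = "Älvsborg";
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("title", title);
    // normalized copy used only for sorting
    doc.addField("title_sort", title.toLowerCase().replace('ä', 'a').replace('ö', 'o'));
    server.add(doc);

    // at query time, sort on the normalized field:
    // query.setSortField("title_sort", SolrQuery.ORDER.asc);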

RE: Encoded search string qt=Dismax

2008-12-02 Thread Feak, Todd
Do you have a 'dismaxrequest' request handler defined in your solrconfig.xml? Or is it 'dismax'? -Todd Feak -Original Message- From: tushar kapoor [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 02, 2008 10:07 AM To: solr-user@lucene.apache.org Subject: Encoded search string qt=Dismax

RE: maxWarmingSearchers

2008-12-01 Thread Feak, Todd
The commit after each one may be hurting you. I believe that a new searcher is created after each commit. That searcher then runs through its warm up, which can be costly depending on your warming settings. Even if it's not overly costly, creating another one while the first one is running

RE: WordDelimiterFilter and its Factory: access to charTypeTable

2008-11-20 Thread Feak, Todd
I've found that creating a custom filter and filter factory isn't too burdensome when the filter doesn't quite do what I need. You could grab the source and create your own version. -Todd Feak -Original Message- From: Jerven Bolleman [mailto:[EMAIL PROTECTED] Sent: Thursday, November

RE: Searchable/indexable newsgroups

2008-11-19 Thread Feak, Todd
Can Nutch crawl newsgroups? Anyone? -Todd Feak -Original Message- From: John Martyniak [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2008 3:06 PM To: solr-user@lucene.apache.org Subject: Searchable/indexable newsgroups Does anybody know of a good way to index newsgroups using

RE: Solr security

2008-11-17 Thread Feak, Todd
I see value in this in the form of protecting the client from itself. For example, our Solr isn't accessible from the Internet. It's all behind firewalls. But, the client applications can make programming mistakes. I would love the ability to lock them down to a certain number of rows, just in

RE: solr 1.3 Modification field in schema.xml

2008-11-13 Thread Feak, Todd
I believe (someone correct me if I'm wrong) that the only fields you need to store are those fields which you wish returned from the query. In other words, if you will never put the field on the list of fields (fl) to return, there is no need to store it. It would be advantageous not to store
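
For illustration (field names hypothetical), limiting the response to the stored fields you actually need via fl in SolrJ:

    SolrQuery query = new SolrQuery("name:foo");
    query.setFields("id", "name");   // fl: only these stored fields are returned
    QueryResponse rsp = server.query(query);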

RE: maxCodeLen in the doublemetaphone solr analyzer

2008-11-13 Thread Feak, Todd
There's a patch to do that as a separate filter. See https://issues.apache.org/jira/browse/SOLR-813 You could just take the patch. It's the full filter and factory. -Todd Feak -Original Message- From: Brian Whitman [mailto:[EMAIL PROTECTED] Sent: Thursday, November 13, 2008 12:31 PM

RE: NIO not working yet

2008-11-12 Thread Feak, Todd
Is support for setting the FSDirectory this way built into the 1.3.0 release? Or is it necessary to grab a trunk build? -Todd Feak -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Wednesday, November 12, 2008 11:59 AM To:

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
If you are seeing 90% CPU usage and are not IO (File or Network) bound, then you are most probably bound by lock contention. If your CPU usage goes down as you throw more threads at the box, that's an even bigger indication that that is the issue. A good profiling tool should help you locate

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
What are your other cache hit rates looking like? Which caches are you using the FastLRUCache on? -Todd Feak -Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 8:15 AM To: solr-user@lucene.apache.org Subject: Re: Throughput Optimization

RE: Throughput Optimization

2008-11-05 Thread Feak, Todd
-Original Message- From: wojtekpia [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 05, 2008 11:08 AM To: solr-user@lucene.apache.org Subject: RE: Throughput Optimization My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak

RE: Custom sort (score + custom value)

2008-11-03 Thread Feak, Todd
Have you looked into the bf and bq arguments on the DisMaxRequestHandler? http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head -6862070cf279d9a09bdab971309135c7aea22fb3 -Todd -Original Message- From: George [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008
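
A minimal sketch of setting those parameters through SolrJ (the boost query and boost function values are hypothetical; the recip/rord expression is the recency example from the Solr wiki):

    SolrQuery query = new SolrQuery("ipod");
    query.setQueryType("dismax");
    query.set("bq", "type:premium^2.0");                // boost query
    query.set("bf", "recip(rord(date),1,1000,1000)");   // boost function on recency
    QueryResponse rsp = server.query(query);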

RE: SOLR Performance

2008-11-03 Thread Feak, Todd
I believe this is one of the reasons that a master/slave configuration comes in handy. Commits to the Master don't slow down queries on the Slave. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Monday, November 03, 2008 1:47 PM To: solr-user@lucene.apache.org

RE: Performance Lucene / Solr

2008-10-30 Thread Feak, Todd
I realize you said caching won't help because the searches are different, but what about Document caching? Is every document returned different? What's your hit rate on the Document cache? Can you throw memory at the problem by increasing Document cache size? I ask all this, as the Document cache

RE: exceeded limit of maxWarmingSearchers

2008-10-29 Thread Feak, Todd
Have you looked at how long your warm up is taking? If it's taking longer to warm up a searcher than it takes you to do an update, you will be behind the curve and eventually run into this no matter how big that number is. -Original Message- From: news [mailto:[EMAIL PROTECTED] On

RE: date range query performance

2008-10-29 Thread Feak, Todd
It strikes me that removing just the seconds could very well reduce overhead to 1/60 of the original: a 30-second query turns into a 500ms query. Just a SWAG, though. -Todd -Original Message- From: Alok Dhir [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 29, 2008 1:48 PM To:
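
A sketch of that idea using Solr date math to round both ends of the range (the field name is hypothetical); identical rounded queries can also be served from caches:

    // [NOW/MINUTE-1HOUR TO NOW/MINUTE] rounds to the minute, dropping seconds
    SolrQuery query = new SolrQuery("timestamp:[NOW/MINUTE-1HOUR TO NOW/MINUTE]");
    QueryResponse rsp = server.query(query);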

RE: Question about textTight

2008-10-28 Thread Feak, Todd
You may want to take a very close look at what the WordDelimiterFilter is doing. I believe the underscore is dropped entirely during indexing AND searching as it's not alphanumeric. Wiki doco here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(t

RE: One document inserted but nothing showing up ? SOLR 1.3

2008-10-23 Thread Feak, Todd
Unless q=ALL is a special query I don't know about, the only reason you would get results is if ALL showed up in the default field of the single document that was inserted/updated. You could try a query of *:* instead. Don't forget to URL-encode it if you are doing this via URL. -Todd
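
For illustration, the match-all query in both forms (host and path are hypothetical):

    // via URL, with *:* URL-encoded:
    //   http://localhost:8983/solr/select?q=*%3A*
    SolrQuery query = new SolrQuery("*:*");   // SolrJ encodes parameters for you
    QueryResponse rsp = server.query(query);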

RE: Question about copyField

2008-10-22 Thread Feak, Todd
The filters and tokenizer that are applied to the copy field are determined by its type in the schema. Simply create a new field type in your schema with the filters you would like, and use that type for your copy field. So, the field description would have its old type, but the field suggestion

RE: Re[2]: Question about copyField

2008-10-22 Thread Feak, Todd
: Wednesday, October 22, 2008 9:24 AM To: Feak, Todd Subject: Re[2]: Question about copyField Thanks for the reply. I want to make your point more exact, because I'm not sure that I correctly understood you :) As far as I know (correct me please, if I'm wrong), type defines the way in which the field is indexed

RE: Re[4]: Question about copyField

2008-10-22 Thread Feak, Todd
My bad. I misunderstood what you wanted. The example I gave was for the searching side of things, not the data representation in the document. -Todd -Original Message- From: Aleksey Gogolev [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 11:14 AM To: Feak, Todd Subject: Re

RE: Problem implementing a BinaryQueryResponseWriter

2008-10-21 Thread Feak, Todd
implementing a BinaryQueryResponseWriter do you have handleSelect set to true in solrconfig? <requestDispatcher handleSelect="true" ...> If not, it would use a Servlet that is now deprecated. On Oct 20, 2008, at 4:52 PM, Feak, Todd wrote: I found out what's going on. My test queries

Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
I switched from dev group for this specific question, in case other users have similar issue. I'm implementing my own BinaryQueryResponseWriter. I've implemented the interface and successfully plugged it into the Solr configuration. However, the application always calls the Writer method on

RE: Japanese language seems to not work on Solr 1.3

2008-10-20 Thread Feak, Todd
That looks like the data in the index is incorrectly encoded. If the inserts into your index came in via HTTP GET and your Tomcat wasn't configured for UTF-8 at the time, I could see it going into the index corrupted. But I'm not sure if that's even possible (depends on Update). Is it hard to

RE: Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
a BinaryQueryResponseWriter Hi Todd, Did you add your response writer in solrconfig.xml? <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/> On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd [EMAIL PROTECTED] wrote: I switched from dev group for this specific

RE: Problem implementing a BinaryQueryResponseWriter

2008-10-20 Thread Feak, Todd
= response.getWriter(); responseWriter.write(out, solrReq, solrRsp); } On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote: Yes. I've gotten it to the point where my class is called, but the wrong method on it is called. -Todd -Original Message- From

RE: Lucene 2.4 released

2008-10-15 Thread Feak, Todd
The current Subversion trunk has the new Lucene 2.4.0 libraries committed. So, it's definitely under way. -Todd -Original Message- From: Julio Castillo [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 15, 2008 9:48 AM To: solr-user@lucene.apache.org Subject: Lucene 2.4 released Any

RE: Practical number of Solr instances per machine

2008-10-14 Thread Feak, Todd
In our load testing, the limit for utilizing all of the processor time on a box was locking (synchronize, mutex, monitor, pick one). There were a couple of locking points that we saw. 1. Lucene's locking on the index for simultaneous read/write protection. 2. Solr's locking on the LRUCaches for

RE: Practical number of Solr instances per machine

2008-10-14 Thread Feak, Todd
Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, October 14, 2008 1:38 PM To: solr-user@lucene.apache.org Subject: Re: Practical number of Solr instances per machine On Tue, Oct 14, 2008 at 4:29 PM, Feak, Todd [EMAIL PROTECTED] wrote: In our load