with the last 20 RECORD_IDs missing (for example, the
last id is 999,980 instead of 1,000,000).
- Sharmila
Feak, Todd wrote:
A few questions to help the troubleshooting.
Solr version #?
Is there just 1 commit through Solrj for the millions of documents?
Or do you do it on a regular interval (every 100k documents for example) and
then one at the end to be sure?
How are you observing that the last few didn't make it?
Any particular reason for the double quotes in the 2nd and 3rd query example,
but not the 1st, or is this just an artifact of your email?
-Todd
-Original Message-
From: Rakhi Khatwani [mailto:rkhatw...@gmail.com]
Sent: Tuesday, October 06, 2009 2:26 AM
To: solr-user@lucene.apache.org
maxDocs - number of updates since last commit is greater than this
maxTime - oldest uncommitted update (in ms) is this long ago

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoCommit>
--
-Original Message-
From: Feak, Todd [mailto:todd.f
How often are you committing?
Every time you commit, Solr will close the old index and open the new one. If
you are doing this in parallel from multiple jobs (4-5 you mention) then
eventually the server gets behind and you start to pile up commit requests.
Once this starts to happen, it will
[mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts
I'm not committing at all actually - I'm waiting for all 6 million to be done.
-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts
How often are you committing?
It looks like you have some confusion about queries vs. facets. You may want to
look at the Solr wiki regarding facets a bit. In the meanwhile, if you just
want to query for that field containing 21...
I would suggest that you don't set the query type, don't set any facet fields,
and only set
We use the snapcleaner script.
http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner
Will that do the job?
-Todd
-Original Message-
From: solr jay [mailto:solr...@gmail.com]
Sent: Monday, October 05, 2009 1:58 PM
To: solr-user@lucene.apache.org
Subject: cleanup old
My understanding of NGramTokenizing is that it helps with languages that don't
necessarily contain spaces as a word delimiter (Japanese et al). In that case
bi-gramming is used to find words contained within a stream of unbroken
characters. In that case, you want to find all of the bi-grams that
Are the issues you ran into due to non-standard code in Solr, or is there
some WebLogic inconsistency?
-Todd Feak
-Original Message-
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re:
This usually represents anything less than 8ms if you are on a Windows
system. The granularity on timing on Windows systems is around 16ms.
-Todd Feak
-Original Message-
From: sunnyfr [mailto:johanna...@gmail.com]
Sent: Thursday, January 29, 2009 9:13 AM
To: solr-user@lucene.apache.org
Although the idea that you will need to rebuild from scratch is
unlikely, you might want to fully understand the cost of recovery if you
*do* have to.
If it's incredibly expensive (time or money), you need to keep that in
mind.
-Todd
-Original Message-
From: Ian Connor
Can you share your experience with the IBM JDK once you've evaluated it?
You are working with a heavy load, I think many would benefit from the
feedback.
-Todd Feak
-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Thursday, January 22, 2009 3:46 PM
To:
The easiest way is to run maybe 100,000 or more queries and take an
average. A single microsecond value for a query would be incredibly
inaccurate.
-Todd Feak
-Original Message-
From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Friday, January 23, 2009 1:33 AM
To:
The large drop in old generation from 27GB to 6GB indicates that things
are getting into your old generation prematurely. They really don't need
to get there at all, and should be collected sooner (more frequently).
Look into increasing young generation sizes via JVM parameters. Also
look into
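Concretely, the sort of JVM flags meant here might look like the following. This is a hedged sketch for a Sun JVM of that era; the sizes are placeholders, not recommendations:

```
# illustrative options only -- tune against your own heap and GC logs
-Xms8g -Xmx8g        # fixed total heap
-Xmn2g               # explicit young generation size
-XX:SurvivorRatio=6  # larger survivor spaces delay promotion to old gen
-verbose:gc -XX:+PrintGCDetails   # log collections to verify the effect
```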
From a high level view, there is a certain amount of garbage collection
that must occur. That garbage is generated per request, through a
variety of means (buffers, request, response, cache expulsion). The only
thing that JVM parameters can address is *when* that collection occurs.
It can occur
A ballpark calculation would be:
Collected Amount (from GC logging) / # of Requests
The GC logging can tell you how much it collected each time, no need to
try and snapshot before and after heap sizes. However (big caveat here),
this is a ballpark figure. The garbage collector is not guaranteed
Anyone that can shed some insight?
-Todd
-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Friday, January 16, 2009 9:55 AM
To: solr-user@lucene.apache.org
Subject: How to select *actual* match from a multi-valued field
At a high level, I'm trying to do some
A third option - Use dynamic fields.
Add a dynamic field call *_stash. This will allow new fields for
documents to be added down the road without changing schema.xml, yet
still allow you to query on fields like arresteeFirstName_stash
without extra overhead.
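A hedged schema.xml sketch of the idea (the field name suffix and type are illustrative):

```xml
<!-- matches any field ending in _stash; no schema change needed per new field -->
<dynamicField name="*_stash" type="string" indexed="true" stored="true"/>
```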
-Todd Feak
-Original
- Original Message
From: Feak, Todd todd.f...@smss.sony.com
To: solr-user@lucene.apache.org
Sent: Tuesday, January 20, 2009 4:49:56 PM
Subject: RE: New to Solr/Lucene design question
At a high level, I'm trying to do some more intelligent searching using
an app that will send multiple queries to Solr. My current issue is
around multi-valued fields and determining which entry actually
generated the hit for a particular query.
For example, let's say that I have a
I believe that when you commit, a new IndexReader is created, which is
warmed, etc. New incoming queries will be sent to this new IndexReader.
Once all previously existing queries have been answered, the old
IndexReader will shut down.
The commit doesn't wait for the query to finish, but it
:It should be fairly predictable, can you elaborate on what problems you
:have just adding boost queries for the specific types?
The boost queries are true queries, so the amount of boost can be affected
by things like term frequency for the query. The functions aren't
affected by this and
First suspect would be Filter Cache settings and Query Cache settings.
If they are auto-warming at all, then there is a definite difference
between the first start behavior and the post-commit behavior. This
affects what's in memory, caches, etc.
-Todd Feak
-Original Message-
From:
@lucene.apache.org
Subject: Re: Using query functions against a type field
On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd todd.f...@smss.sony.com
wrote:
The boost queries are true queries, so the amount of boost can be
affected
by things like term frequency for the query.
Sounds like a constant score
: Using query functions against a type field
On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd todd.f...@smss.sony.com
wrote:
I'm not sure I followed all that Yonik.
Are you saying that I can achieve this effect now with a bq setting in
my DisMax query instead of via a bf setting?
Yep, a const QParser
Kind of a side-note, but I think it may be worth your while.
If your queryResultCache hit rate is 65%, consider putting a reverse
proxy in front of Solr. It can give performance boosts over the query
cache in Solr, as it doesn't have to pay the cost of reformulating the
response. I've used
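For illustration, such a front-end could be a caching reverse proxy along these lines. This is a hypothetical nginx sketch; the cache name, paths, and TTL are placeholders:

```
# hypothetical nginx reverse-proxy cache in front of Solr
proxy_cache_path /var/cache/nginx keys_zone=solr_cache:10m;
server {
    listen 80;
    location /solr/select {
        proxy_pass http://localhost:8983;
        proxy_cache solr_cache;
        proxy_cache_valid 200 1m;   # serve identical queries from the cache
    }
}
```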
. The ngrams are extremely
fast and the recommended way to do this according to the user group. They
work wonderfully except this one issue. So do we basically have to do a
separate index for this or is there a dedup setting to only return unique
brand names.
On 12/24/08 7:51 AM, Feak, Todd todd.f
It sounds like you want to get a list of brands that start with a particular
string, out of your index. But your index is based on products, not brands. Is
that correct?
If so, that has nothing to do with NGrams (or even tokenizing for that matter)
I think you should be doing a Facet query
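As a hedged sketch, a facet query for that looks like the following, assuming an untokenized brand field (the field name and prefix are illustrative). Note that facet values are unique by definition, which also addresses the dedup concern:

```
q=*:*&rows=0&facet=true&facet.field=brand&facet.prefix=son&facet.limit=10
```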
Subject: Re: Using query functions against a type field
Try document boost at index time. --wunder
On 12/22/08 9:28 AM, Feak, Todd todd.f...@smss.sony.com wrote:
I would like to use a query function to boost documents of a certain
type. I realize that I can use a boost query
Don't forget to consider scaling concerns (if there are any). There are
strong differences in the number of searches we receive for each
language. We chose to create separate schema and config per language so
that we can throw servers at a particular language (or set of languages)
if we needed to.
It's spending 4-5 seconds warming up your query cache. If 4-5 seconds is
too much, you could reduce the number of queries to auto-warm with on
that cache.
Notice that the 4-5 seconds is spent only putting about 420 queries into
the query cache. Your autowarm of 5 for the query cache seems a
You can set the home directory in your Tomcat context snippet/file.
http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e
This controls where Solr looks for solrconfig.xml and schema.xml. The
solrconfig.xml in turn specifies where to find the data directory.
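From that wiki page, the Tomcat context file looks roughly like this (the paths are examples for your own layout):

```xml
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- solr/home tells Solr where to find solrconfig.xml and schema.xml -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```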
I'm pretty sure * isn't supported by DisMax.
From the Solr Wiki on DisMaxRequestHandler overview
http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-ce5517b6c702a55af5cc14a2c284dbd9f18a18c2
This query handler supports an extremely simplified subset of the
Lucene
One option is to add an additional field for sorting. Create a copy of the
field you want to sort on and modify the data you insert there so that it will
sort the way you want it to.
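A hedged schema.xml sketch of the extra sort field (field names are illustrative). Note copyField copies the value verbatim, so any normalization such as lowercasing or zero-padding would happen client-side or via a suitable field type:

```xml
<!-- untokenized copy of title, used only for sorting -->
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```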
-Todd Feak
-Original Message-
From: Joel Karlsson [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08,
Do you have a dismaxrequest request handler defined in your solr config xml?
Or is it dismax?
-Todd Feak
-Original Message-
From: tushar kapoor [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2008 10:07 AM
To: solr-user@lucene.apache.org
Subject: Encoded search string qt=Dismax
The commit after each one may be hurting you.
I believe that a new searcher is created after each commit. That searcher then
runs through its warm up, which can be costly depending on your warming
settings. Even if it's not overly costly, creating another one while the first
one is running
I've found that creating a custom filter and filter factory isn't too
burdensome when the filter doesn't quite do what I need. You could
grab the source and create your own version.
-Todd Feak
-Original Message-
From: Jerven Bolleman [mailto:[EMAIL PROTECTED]
Sent: Thursday, November
Can Nutch crawl newsgroups? Anyone?
-Todd Feak
-Original Message-
From: John Martyniak [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 3:06 PM
To: solr-user@lucene.apache.org
Subject: Searchable/indexable newsgroups
Does anybody know of a good way to index newsgroups using
I see value in this in the form of protecting the client from itself.
For example, our Solr isn't accessible from the Internet. It's all
behind firewalls. But, the client applications can make programming
mistakes. I would love the ability to lock them down to a certain number
of rows, just in
I believe (someone correct me if I'm wrong) that the only fields you
need to store are those fields which you wish returned from the query.
In other words, if you will never put the field on the list of fields
(fl) to return, there is no need to store it.
It would be advantageous not to store
There's a patch in to do that as a separate filter. See
https://issues.apache.org/jira/browse/SOLR-813
You could just take the patch. It's the full filter and factory.
-Todd Feak
-Original Message-
From: Brian Whitman [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 13, 2008 12:31 PM
Is support for setting the FSDirectory this way built into 1.3.0
release? Or is it necessary to grab a trunk build.
-Todd Feak
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 12, 2008 11:59 AM
To:
If you are seeing 90% CPU usage and are not IO (File or Network)
bound, then you are most probably bound by lock contention. If your CPU
usage goes down as you throw more threads at the box, that's an even
bigger indication that that is the issue.
A good profiling tool should help you locate
What are your other cache hit rates looking like?
Which caches are you using the FastLRUCache on?
-Todd Feak
-Original Message-
From: wojtekpia [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 05, 2008 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Throughput Optimization
-Original Message-
From: wojtekpia [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 05, 2008 11:08 AM
To: solr-user@lucene.apache.org
Subject: RE: Throughput Optimization
My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using
FastLRUCache on all 3 of the caches.
Feak
Have you looked into the bf and bq arguments on the
DisMaxRequestHandler?
http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=(dismax)#head-6862070cf279d9a09bdab971309135c7aea22fb3
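As a hedged illustration of bq and bf on a DisMax query, with hypothetical field names:

```
q=camera&qt=dismax&qf=name^2.0+description&bq=type:featured^5.0&bf=recip(rord(releaseDate),1,1000,1000)
```

Here bq adds a fixed boost for featured documents, while bf boosts newer documents via a function on the date field.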
-Todd
-Original Message-
From: George [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008
I believe this is one of the reasons that a master/slave configuration
comes in handy. Commits to the Master don't slow down queries on the
Slave.
-Todd
-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008 1:47 PM
To: solr-user@lucene.apache.org
I realize you said caching won't help because the searches are
different, but what about Document caching? Is every document returned
different? What's your hit rate on the Document cache? Can you throw
memory at the problem by increasing Document cache size?
I ask all this, as the Document cache
Have you looked at how long your warm up is taking?
If it's taking longer to warm up a searcher than it does for you to do
an update, you will be behind the curve and eventually run into this no
matter how big that number.
-Original Message-
From: news [mailto:[EMAIL PROTECTED] On
It strikes me that removing just the seconds could very well reduce
overhead to 1/60 of original. 30 second query turns into 500ms query.
Just a swag though.
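One hedged way to get that effect in the query itself is Solr date math, which rounds the timestamp so every query issued in the same minute is cache-identical (the field name is illustrative):

```
fq=timestamp:[NOW/MINUTE-1HOUR TO NOW/MINUTE]
```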
-Todd
-Original Message-
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 29, 2008 1:48 PM
To:
You may want to take a very close look at what the WordDelimiterFilter
is doing. I believe the underscore is dropped entirely during indexing
AND searching as it's not alphanumeric.
Wiki doco here
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(t
Unless q=ALL is a special query I don't know about, the only reason you would
get results is if ALL showed up in the default field of the single document
that was inserted/updated.
You could try a query of *:* instead. Don't forget to URL encode if you are
doing this via URL.
-Todd
The filters and tokenizer that are applied to the copy field are
determined by its type in the schema. Simply create a new field type in
your schema with the filters you would like, and use that type for your
copy field. So, the field description would have its old type, but the
field suggestion
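A hedged schema.xml sketch of that setup (the type name, field names, and filter choices are illustrative):

```xml
<fieldType name="suggestText" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
</fieldType>
<!-- description keeps its old type; the copy gets the new one -->
<field name="suggestion" type="suggestText" indexed="true" stored="false"/>
<copyField source="description" dest="suggestion"/>
```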
: Wednesday, October 22, 2008 9:24 AM
To: Feak, Todd
Subject: Re[2]: Question about copyField
Thanks for reply. I want to make your point more exact, cause I'm not
sure that I correctly understood you :)
As far as I know (correct me please, if I'm wrong) type defines the way
in which the field is indexed
My bad. I misunderstood what you wanted.
The example I gave was for the searching side of things. Not the data
representation in the document.
-Todd
-Original Message-
From: Aleksey Gogolev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 11:14 AM
To: Feak, Todd
Subject: Re
implementing a BinaryQueryResponseWriter
do you have handleSelect set to true in solrconfig?
<requestDispatcher handleSelect="true">
...
if not, it would use a Servlet that is now deprecated
On Oct 20, 2008, at 4:52 PM, Feak, Todd wrote:
I found out what's going on.
My test queries
I switched from dev group for this specific question, in case other
users have similar issue.
I'm implementing my own BinaryQueryResponseWriter. I've implemented the
interface and successfully plugged it into the Solr configuration.
However, the application always calls the Writer method on
That looks like the data in the index is incorrectly encoded.
If the inserts into your index came in via HTTP GET and your Tomcat wasn't
configured for UTF-8 at the time, I could see it going into the index
corrupted. But I'm not sure if that's even possible (depends on Update)
Is it hard to
a BinaryQueryResponseWriter
Hi Todd,
Did you add your response writer in solrconfig.xml?
<queryResponseWriter name="xml"
  class="org.apache.solr.request.XMLResponseWriter" default="true"/>
On Mon, Oct 20, 2008 at 9:35 PM, Feak, Todd [EMAIL PROTECTED]
wrote:
I switched from dev group for this specific
= response.getWriter();
responseWriter.write(out, solrReq, solrRsp);
}
On Oct 20, 2008, at 3:59 PM, Feak, Todd wrote:
Yes.
I've gotten it to the point where my class is called, but the wrong
method on it is called.
-Todd
-Original Message-
From
The current Subversion trunk has the new Lucene 2.4.0 libraries
committed. So, it's definitely under way.
-Todd
-Original Message-
From: Julio Castillo [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2008 9:48 AM
To: solr-user@lucene.apache.org
Subject: Lucene 2.4 released
Any
In our load testing, the limit for utilizing all of the processor time
on a box was locking (synchronize, mutex, monitor, pick one). There were
a couple of locking points that we saw.
1. Lucene's locking on the index for simultaneous read/write protection.
2. Solr's locking on the LRUCaches for
Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, October 14, 2008 1:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Practical number of Solr instances per machine
On Tue, Oct 14, 2008 at 4:29 PM, Feak, Todd [EMAIL PROTECTED]
wrote:
In our load