Date range problems

2007-10-26 Thread David Whalen
Hi All.

We're seeing a really interesting problem when searching by
date range.  

We have two fields of type date in our index (they are both
indexed and stored).  They are:

content_date
and
created_date

We can run any date-range query we want against content_date
and get the expected results.  However, similar queries against
created_date consistently return 0 results.

Now, here's the interesting part -- if we do a plain search
without a date range, BUT sort by created_date desc we get 
properly sorted results.

So, it seems like the index works for sorting but not for
searching.  Does that make any sense?  Does anyone have any
ideas on how we can diagnose this issue?
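For concreteness, a created_date range query would be issued along these lines (the host, port, and date bounds below are made up; Solr date fields use the full ISO-8601 form with a trailing Z):

```python
from urllib.parse import urlencode

# Hypothetical host/port and date bounds; Solr date values use the
# full ISO-8601 form, e.g. 2007-01-01T00:00:00Z.
base = "http://localhost:8983/solr/select"
query = "created_date:[2007-01-01T00:00:00Z TO 2007-10-26T00:00:00Z]"
url = base + "?" + urlencode({"q": query, "rows": 10})
print(url)
```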

Here's the relevant block from our schema (before you ask):

   <field name="id" type="string" indexed="true" stored="true"/>
   <field name="content_date" type="date" indexed="true" stored="true" />
   <field name="media_type" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="location" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="country_code" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="content_source" type="string" indexed="true" stored="true" />
   <field name="title" type="string" indexed="true" stored="true" />
   <field name="site_id" type="string" indexed="true" stored="false" />
   <field name="journalist_id" type="string" indexed="true" stored="false" />
   <field name="network" type="string" indexed="true" stored="false" />
   <field name="created_date" type="date" indexed="true" stored="true" />

TIA,

Dave W.



comment-out a filter?

2007-10-15 Thread David Whalen
Hi All.

I want to comment-out a filter in my schema.xml, specifically
the solr.EnglishPorterFilterFactory filter.

I want to know -- will this cause me to have to re-build my
index?  Or will a restart of SOLR get the job done?

Thanks!

Dave W



RE: Facets and running out of Heap Space

2007-10-10 Thread David Whalen
It looks now like I can't use facets the way I was hoping
to because the memory requirements are impractical.

So, as an alternative I was thinking I could get counts
by doing rows=0 and using filter queries.  

Is there a reason to think that this might perform better?
Or, am I simply moving the problem to another step in the
process?
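A sketch of that rows=0 counting approach, one filter query per value (the host and the media_type values here are assumptions):

```python
from urllib.parse import urlencode

# rows=0 returns no documents, only numFound, so each response is
# cheap to transfer; host and the example values are assumptions.
base = "http://localhost:8983/solr/select"
urls = []
for value in ["video", "audio", "image", "text"]:
    params = {"q": "*:*", "fq": "media_type:" + value, "rows": 0}
    urls.append(base + "?" + urlencode(params))
    print(urls[-1])
```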

DW

  

 -Original Message-
 From: Stu Hood [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 09, 2007 10:53 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Facets and running out of Heap Space
 
  Using the filter cache method on the things like media type and 
  location; this will occupy ~2.3MB of memory _per unique value_
 
 Mike, how did you calculate that value? I'm trying to tune my 
 caches, and any equations that could be used to determine 
 some balanced settings would be extremely helpful. I'm in a 
 memory limited environment, so I can't afford to throw a ton 
 of cache at the problem.
 
 (I don't want to thread-jack, but I'm also wondering whether 
 anyone has any notes on how to tune cache sizes for the 
 filterCache, queryResultCache and documentCache).
 
 Thanks,
 Stu
 
 
 -Original Message-
 From: Mike Klaas [EMAIL PROTECTED]
 Sent: Tuesday, October 9, 2007 9:30pm
 To: solr-user@lucene.apache.org
 Subject: Re: Facets and running out of Heap Space
 
 On 9-Oct-07, at 12:36 PM, David Whalen wrote:
 
 (snip)
  I'm sure we could stop storing many of these columns, 
 especially  if 
 someone told me that would make a big difference.
 
 I don't think that it would make a difference in memory 
 consumption, but storage is certainly not necessary for 
 faceting.  Extra stored fields can slow down search if they 
 are large (in terms of bytes), but don't really occupy extra 
 memory, unless they are polluting the doc cache.  Does 'text' 
 need to be stored?
 
  what does the LukeRequest Handler tell you about the # of distinct 
  terms in each field that you facet on?
 
  Where would I find that?  I could probably estimate that 
 myself on a 
  per-column basis.  it ranges from 4 distinct values for 
 media_type to 
  30-ish for location to 200-ish for country_code to almost 
 10,000 for 
  site_id to almost 100,000 for journalist_id.
 
 Using the filter cache method on the things like media type 
 and location; this will occupy ~2.3MB of memory _per unique 
 value_, so it should be a net win for those (although quite 
 close in space requirements for a 30-ary field on your index size).
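The ~2.3MB figure is consistent with one bit per document: a cached filter is, at worst, a bitset of maxDoc bits, i.e. roughly maxDoc / 8 bytes per unique value (the document counts below are illustrative, not from this thread):

```python
# A cached filter is (at worst) a bitset with one bit per document,
# so each unique facet value costs roughly maxDoc / 8 bytes of heap.
def filter_cache_mb(max_doc):
    return max_doc / 8 / (1024 * 1024)

# ~19M docs works out to about 2.3 MB per value; a 25M-doc index
# (the size mentioned elsewhere in these threads) is about 3 MB.
print(round(filter_cache_mb(19_300_000), 1))
print(round(filter_cache_mb(25_000_000), 1))
```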
 
 -Mike
 
 


RE: Facets and running out of Heap Space

2007-10-10 Thread David Whalen
According to Yonik I can't use minDf because I'm faceting
on a string field.  I'm thinking of changing it to a tokenized
type so that I can utilize this setting, but then I'll have to
rebuild my entire index.

Unless there's some way around that?


  

 -Original Message-
 From: Mike Klaas [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 10, 2007 4:56 PM
 To: solr-user@lucene.apache.org
 Cc: stuhood
 Subject: Re: Facets and running out of Heap Space
 
 On 10-Oct-07, at 12:19 PM, David Whalen wrote:
 
  It looks now like I can't use facets the way I was hoping 
 to because 
  the memory requirements are impractical.
 
 I can't remember if this has been mentioned, but upping the 
 HashDocSet size is one way to reduce memory consumption.  
 Whether this will work well depends greatly on the 
 cardinality of your facet sets.  facet.enum.cache.minDf set 
 high is another option (will not generate a bitset for any 
 value whose facet set is less than this value).
 
 Both options have performance implications.
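A sketch of a faceting request with the minDf knob set (the host, field, and threshold of 100 are assumptions; the parameter name is per the SimpleFacetParameters wiki page):

```python
from urllib.parse import urlencode

# facet.enum.cache.minDf=N tells the term-enumeration method not to
# build/cache a bitset for values matching fewer than N documents.
# Host, field, and threshold are assumptions.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "site_id",
    "facet.enum.cache.minDf": 100,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```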
 
  So, as an alternative I was thinking I could get counts by doing 
  rows=0 and using filter queries.
 
  Is there a reason to think that this might perform better?
  Or, am I simply moving the problem to another step in the process?
 
 Running one query per unique facet value seems impractical, 
 if that is what you are suggesting.  Setting minDf to a very 
 high value should always outperform such an approach.
 
 -Mike
 
  DW
 
 
 
  -Original Message-
  From: Stu Hood [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, October 09, 2007 10:53 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Facets and running out of Heap Space
 
  Using the filter cache method on the things like media type and 
  location; this will occupy ~2.3MB of memory _per unique value_
 
  Mike, how did you calculate that value? I'm trying to tune 
 my caches, 
  and any equations that could be used to determine some balanced 
  settings would be extremely helpful. I'm in a memory limited 
  environment, so I can't afford to throw a ton of cache at the 
  problem.
 
  (I don't want to thread-jack, but I'm also wondering 
 whether anyone 
  has any notes on how to tune cache sizes for the filterCache, 
  queryResultCache and documentCache).
 
  Thanks,
  Stu
 
 
  -Original Message-
  From: Mike Klaas [EMAIL PROTECTED]
  Sent: Tuesday, October 9, 2007 9:30pm
  To: solr-user@lucene.apache.org
  Subject: Re: Facets and running out of Heap Space
 
  On 9-Oct-07, at 12:36 PM, David Whalen wrote:
 
  (snip)
  I'm sure we could stop storing many of these columns,
  especially  if
  someone told me that would make a big difference.
 
  I don't think that it would make a difference in memory 
 consumption, 
  but storage is certainly not necessary for faceting.  Extra stored 
  fields can slow down search if they are large (in terms of bytes), 
  but don't really occupy extra memory, unless they are 
 polluting the 
  doc cache.  Does 'text'
  need to be stored?
 
  what does the LukeRequest Handler tell you about the # 
 of distinct 
  terms in each field that you facet on?
 
  Where would I find that?  I could probably estimate that
  myself on a
  per-column basis.  it ranges from 4 distinct values for
  media_type to
  30-ish for location to 200-ish for country_code to almost
  10,000 for
  site_id to almost 100,000 for journalist_id.
 
  Using the filter cache method on the things like media type and 
  location; this will occupy ~2.3MB of memory _per unique 
 value_, so it 
  should be a net win for those (although quite close in space 
  requirements for a 30-ary field on your index size).
 
  -Mike
 
 
 
 
 


RE: Facets and running out of Heap Space

2007-10-10 Thread David Whalen
I'll see what I can do about that.

Truthfully, the most important facet we need is the one on
media_type, which has only 4 unique values.  The second
most important one to us is location, which has about 30
unique values.

So, it would seem like we actually need a counter-intuitive
solution.  That's why I thought Field Queries might be the
solution.

Is there some reason to avoid setting multiValued to true
here?  It sounds like it would be the true cure-all

Thanks again!

dave


  

 -Original Message-
 From: Mike Klaas [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 10, 2007 6:20 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Facets and running out of Heap Space
 
 On 10-Oct-07, at 2:40 PM, David Whalen wrote:
 
  According to Yonik I can't use minDf because I'm faceting 
 on a string 
  field.  I'm thinking of changing it to a tokenized type so 
 that I can 
  utilize this setting, but then I'll have to rebuild my entire index.
 
  Unless there's some way around that?
 
 For the fields that matter (many unique values), this is 
 likely to result in a performance regression.
 
 It might be better to try storing less unique data.  For 
 instance, faceting on the blog_url field, or create_date in 
 your schema would cause problems (they probably have millions 
 of unique values).
 
 It would be helpful to know which field is causing the 
 problem.  One way would be to do a sorted query on a 
 quiescent index for each field, and see if there are any 
 suspiciously large jumps in memory usage.
 
 -Mike
 
 
 
 
  -Original Message-
  From: Mike Klaas [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, October 10, 2007 4:56 PM
  To: solr-user@lucene.apache.org
  Cc: stuhood
  Subject: Re: Facets and running out of Heap Space
 
  On 10-Oct-07, at 12:19 PM, David Whalen wrote:
 
  It looks now like I can't use facets the way I was hoping
  to because
  the memory requirements are impractical.
 
  I can't remember if this has been mentioned, but upping the
  HashDocSet size is one way to reduce memory consumption.
  Whether this will work well depends greatly on the
  cardinality of your facet sets.  facet.enum.cache.minDf set
  high is another option (will not generate a bitset for any
  value whose facet set is less than this value).
 
  Both options have performance implications.
 
  So, as an alternative I was thinking I could get counts by doing
  rows=0 and using filter queries.
 
  Is there a reason to think that this might perform better?
  Or, am I simply moving the problem to another step in the process?
 
  Running one query per unique facet value seems impractical,
  if that is what you are suggesting.  Setting minDf to a very
  high value should always outperform such an approach.
 
  -Mike
 
  DW
 
 
 
  -Original Message-
  From: Stu Hood [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, October 09, 2007 10:53 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Facets and running out of Heap Space
 
  Using the filter cache method on the things like media type and
  location; this will occupy ~2.3MB of memory _per unique value_
 
  Mike, how did you calculate that value? I'm trying to tune
  my caches,
  and any equations that could be used to determine some balanced
  settings would be extremely helpful. I'm in a memory limited
  environment, so I can't afford to throw a ton of cache at the
  problem.
 
  (I don't want to thread-jack, but I'm also wondering
  whether anyone
  has any notes on how to tune cache sizes for the filterCache,
  queryResultCache and documentCache).
 
  Thanks,
  Stu
 
 
  -Original Message-
  From: Mike Klaas [EMAIL PROTECTED]
  Sent: Tuesday, October 9, 2007 9:30pm
  To: solr-user@lucene.apache.org
  Subject: Re: Facets and running out of Heap Space
 
  On 9-Oct-07, at 12:36 PM, David Whalen wrote:
 
  (snip)
  I'm sure we could stop storing many of these columns,
  especially  if
  someone told me that would make a big difference.
 
  I don't think that it would make a difference in memory
  consumption,
  but storage is certainly not necessary for faceting.  
 Extra stored
  fields can slow down search if they are large (in terms 
 of bytes),
  but don't really occupy extra memory, unless they are
  polluting the
  doc cache.  Does 'text'
  need to be stored?
 
  what does the LukeRequest Handler tell you about the #
  of distinct
  terms in each field that you facet on?
 
  Where would I find that?  I could probably estimate that
  myself on a
  per-column basis.  it ranges from 4 distinct values for
  media_type to
  30-ish for location to 200-ish for country_code to almost
  10,000 for
  site_id to almost 100,000 for journalist_id.
 
  Using the filter cache method on the things like media type and
  location; this will occupy ~2.3MB of memory _per unique
  value_, so it
  should be a net win for those (although quite close in space
  requirements for a 30-ary field on your index size).
 
  -Mike
 
 
 
 
 
 
 
 


RE: Availability Issues

2007-10-09 Thread David Whalen
Chris:

We're using Jetty also, so I get the sense I'm looking at the
wrong log file.

On that note -- I've read that Jetty isn't the best servlet
container to use in these situations -- is that your experience?

Dave


 -Original Message-
 From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 11:20 PM
 To: solr-user
 Subject: RE: Availability Issues
 
 
 : My logs don't look anything like that.  They look like HTTP
 : requests.  Am I looking in the wrong place?
 
 what servlet container are you using?  
 
 every servlet container handles application logs differently 
 -- it's especially tricky because even the format can be 
 changed, the examples i gave before are in the default format 
 you get if you use the jetty setup in the solr example (which 
 logs to stdout), but many servlet containers won't include 
 that much detail by default (they typically leave out the 
 classname and method name).  there's also typically a setting 
 that controls the verbosity -- so in some configurations only 
 the SEVERE messages are logged and in others the INFO 
 messages are logged ... you're going to want at least the 
 INFO level to debug stuff.
 
 grep all the log files you can find for Solr home set to 
 ... that's one of the first messages Solr logs.  if you can 
 find that, you'll find the other messages i was talking about.
 
 
 -Hoss
 
 
 


RE: Availability Issues

2007-10-09 Thread David Whalen
All:

How can I break up my install onto more than one box?  We've
hit a learning curve here and we don't understand how best to
proceed.  Right now we have everything crammed onto one box
because we don't know any better.

So, how would you build it if you could?  Here are the specs:

a) the index needs to hold at least 25 million articles
b) the index is constantly updated at a rate of 10,000 articles
per minute
c) we need to have faceted queries

Again, real-world experience is preferred here over book knowledge.
We've tried to read the docs and it's only made us more confused.

TIA

Dave W
  

 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 3:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Availability Issues
 
 On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
   Do you see any requests that took a really long time to finish?
 
  The requests that take a long time to finish are just 
 simple queries.  
  And the same queries run at a later time come back much faster.
 
  Our logs contain 99% inserts and 1% queries.  We are 
 constantly adding 
  documents to the index at a rate of 10,000 per minute, so the logs 
  show mostly that.
 
 Oh, so you are using the same boxes for updating and querying?
 When you insert, are you using multiple threads?  If so, how many?
 
 What is the full URL of those slow query requests?
 Do the slow requests start after a commit?
 
   Start with the thread dump.
   I bet it's multiple queries piling up around some synchronization 
   points in lucene (sometimes caused by multiple threads generating 
   the same big filter that isn't yet cached).
 
  What would be my next steps after that?  I'm not sure I'd 
 understand 
  enough from the dump to make heads-or-tails of it.  Can I 
 share that 
  here?
 
 Yes, post it here.  Most likely a majority of the threads 
 will be blocked somewhere deep in lucene code, and you will 
 probably need help from people here to figure it out.
 
 -Yonik
 
 


Facets and running out of Heap Space

2007-10-09 Thread David Whalen
Hi All.

I run a faceted query against a very large index on a 
regular schedule.  Every now and then the query throws
an out of heap space error, and we're sunk.

So, naturally we increased the heap size and things worked
well for a while and then the errors would happen again.
We've increased the initial heap size to 2.5GB and it's
still happening.

Is there anything we can do about this?

Thanks in advance,

Dave W


RE: Facets and running out of Heap Space

2007-10-09 Thread David Whalen
Hi Yonik.

According to the doc:


 This is only used during the term enumeration method of
 faceting (facet.field type faceting on multi-valued or
 full-text fields). 

What if I'm faceting on just a plain String field?  It's
not full-text, and I don't have multiValued set for it


Dave


 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 09, 2007 12:47 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Facets and running out of Heap Space
 
 On 10/9/07, David Whalen [EMAIL PROTECTED] wrote:
  I run a faceted query against a very large index on a regular 
  schedule.  Every now and then the query throws an out of heap space 
  error, and we're sunk.
 
  So, naturally we increased the heap size and things worked 
 well for a 
  while and then the errors would happen again.
  We've increased the initial heap size to 2.5GB and it's still 
  happening.
 
  Is there anything we can do about this?
 
 Try facet.enum.cache.minDf param:
 http://wiki.apache.org/solr/SimpleFacetParameters
 
 -Yonik
 
 


RE: Facets and running out of Heap Space

2007-10-09 Thread David Whalen
 Make sure you have:
 <requestHandler name="/admin/luke" 
 class="org.apache.solr.handler.admin.LukeRequestHandler" /> 
 defined in solrconfig.xml

What's the consequence of me changing the solrconfig.xml file?
Doesn't that cause a restart of solr?

 for a large index, this can be very slow but the results are valuable.

In what way?  I'm still not clear on what this does for me
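For reference, once that handler is registered, the per-field distinct-term statistics come back from a request along these lines (host/port and the parameter values are assumptions):

```python
from urllib.parse import urlencode

# numTerms bounds how many top terms are reported per field; fl limits
# the fields examined.  Host/port and the values here are assumptions.
params = {"fl": "site_id,journalist_id", "numTerms": 10}
url = "http://localhost:8983/solr/admin/luke?" + urlencode(params)
print(url)
```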


 -Original Message-
 From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 09, 2007 4:01 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Facets and running out of Heap Space
 
  
  what does the LukeRequest Handler tell you about the # of distinct 
  terms in each field that you facet on?
  
  Where would I find that?  
 
 check:
 http://wiki.apache.org/solr/LukeRequestHandler
 
 Make sure you have:
 <requestHandler name="/admin/luke" 
 class="org.apache.solr.handler.admin.LukeRequestHandler" /> 
 defined in solrconfig.xml
 
 for a large index, this can be very slow but the results are valuable.
 
 ryan
 
 


Availability Issues

2007-10-08 Thread David Whalen
Hi All.

I'm seeing all these threads about availability and I'm
wondering why my situation is so different than others'.

We're running SOLR 1.2 with a 2.5G heap size.  On any
given day, the system becomes completely unresponsive.
We can't even get /solr/admin/ to come up, much less
any select queries.  

The only thing we can do is kill the SOLR process and
re-start it.

We are indexing over 25 million documents and we add
about as much as we remove daily, so the number remains
fairly constant.

Again, it seems like other folks are having a much
easier time with SOLR than we are.  Can anyone help
by sharing how you've got it configured?  Does anyone
have a similar experience?

TIA.

DW



RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Tom.

The logs show nothing but regular activity.  We do a tail -f
on the logfile and we can read it during the unresponsive period
and we don't see any errors.

I've attached our schema/config files.  They are pretty much
out-of-the-box values, except for our index.

Dave


 -Original Message-
 From: Tom Hill [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 2:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Availability Issues
 
 Hi -
 
 We're definitely not seeing that. What do your logs show? 
 What do your schema/solrconfig look like?
 
 Tom
 
 
 On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
 
  Hi All.
 
  I'm seeing all these threads about availability and I'm 
 wondering why 
  my situation is so different than others'.
 
  We're running SOLR 1.2 with a 2.5G heap size.  On any given 
 day, the 
  system becomes completely unresponsive.
  We can't even get /solr/admin/ to come up, much less any select 
  queries.
 
  The only thing we can do is kill the SOLR process and re-start it.
 
  We are indexing over 25 million documents and we add about 
 as much as 
  we remove daily, so the number remains fairly constant.
 
  Again, it seems like other folks are having a much easier time with 
  SOLR than we are.  Can anyone help by sharing how you've got it 
  configured?  Does anyone have a similar experience?
 
  TIA.
 
  DW
 
 
 
 


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Yonik.

 What version of Solr are you running?

We're running:
Solr Specification Version: 1.2.2007.08.24.08.06.00 
Solr Implementation Version: nightly ${svnversion} - yonik - 2007-08-24 
08:06:00 
Lucene Specification Version: 2.2.0 
Lucene Implementation Version: 2.2.0 548010 - buschmi - 2007-06-16 23:15:56 

 Is the CPU pegged at 100% when it's unresponsive?

It's a little difficult to be sure.  We have an HT box and the
CPU % we get back is misleading.  I think it's safe to say we
may spike up to 100% but we don't necessarily stay pegged there.

 Have you taken a thread dump to see what is going on?

We can't do it b/c during the unresponsive time we can't access
the admin site (/solr/admin) at all.  I don't know how to do a
thread dump via the command line

 Do you get into a situation where more than one searcher is 
 warming at a time? (there is configuration that can prevent 
 this one from happening).

Forgive me when I say I'm not totally clear on what this 
question means.  The index is constantly getting hit with
a myriad of queries, if that's what you meant

Thanks,

Dave


  

 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 2:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Availability Issues
 
 On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
  We're running SOLR 1.2 with a 2.5G heap size.  On any given 
 day, the 
  system becomes completely unresponsive.
  We can't even get /solr/admin/ to come up, much less any select 
  queries.
 
 What version of Solr are you running?
 The first step to diagnose something like this is to figure 
 out what is going on...
 Is the CPU pegged at 100% when it's unresponsive?
 Have you taken a thread dump to see what is going on?
 Do you get into a situation where more than one searcher is 
 warming at a time? (there is configuration that can prevent 
 this one from happening).
 
 -Yonik
 
 


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Yonik.

 Do you see any requests that took a really long time to finish?

The requests that take a long time to finish are just simple
queries.  And the same queries run at a later time come back
much faster.

Our logs contain 99% inserts and 1% queries.  We are constantly
adding documents to the index at a rate of 10,000 per minute,
so the logs show mostly that.


 Start with the thread dump.
 I bet it's multiple queries piling up around some 
 synchronization points in lucene (sometimes caused by 
 multiple threads generating the same big filter that isn't 
 yet cached).

What would be my next steps after that?  I'm not sure I'd
understand enough from the dump to make heads-or-tails of
it.  Can I share that here?

Dave


 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 3:01 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Availability Issues
 
 On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
  The logs show nothing but regular activity.  We do a tail -f
  on the logfile and we can read it during the unresponsive 
 period and 
  we don't see any errors.
 
 You don't see log entries for requests until after they complete.
 When a server becomes unresponsive, try shutting off further 
 traffic to it, and let it finish whatever requests it's 
 working on (assuming that's the issue) so you can see them in 
 the log.  Do you see any requests that took a really long 
 time to finish?
 
 -Yonik
 
 


RE: Availability Issues

2007-10-08 Thread David Whalen
 Oh, so you are using the same boxes for updating and querying?

Yep.  We have a MySQL database on the box and we query it and
POST directly into SOLR via wget in Perl.  We then also hit the
box for queries.

[We'd be very interested in hearing about best practices on
how to separate out the data from the index and how to balance
them when the inserts outweigh the selects by factors of 50,000:1]

 When you insert, are you using multiple threads?  If so, how many?

We're not threading at all.  We have a Perl script that does a
select statement out of a MySQL database and runs POSTs sequentially
into SOLR, one per document.  After a batch of 10,000 POSTs, we run a
background commit (using waitFlush and waitSearcher)
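For what it's worth, a common alternative to one POST per document is sending many docs in a single <add> message; a minimal sketch of building such a batch (the field names echo the schema in these threads, and the batch size and values are made up):

```python
from xml.sax.saxutils import escape

def add_message(docs):
    # Render a list of dicts as one Solr <add> XML message (sketch only;
    # field names and batch contents are assumptions).
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

batch = [{"id": i, "title": "article %d" % i} for i in range(3)]
payload = add_message(batch)
print(payload[:60])
```

One POST of this payload replaces many single-document POSTs, which cuts per-request overhead between the loader and SOLR.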

Again, I'd be very grateful for success stories from people in terms
of good server architecture.  We are ready and willing to change versions
of linux, of the Java container, etc.  And we're ready to add more
boxes if that'll help.  We just need some guidance.

 What is the full URL of those slow query requests?

They can be anything.  For example:

[08/10/2007:18:51:55 +] GET 
/solr/select/?q=solrversion=2.2start=0rows=10indent=on HTTP/1.1 200 45799

 Do the slow requests start after a commit?

Based on the way the logs read, you could argue that point.
The stream of POSTs end in the logs and then subsequent queries
take longer to run, but it's hard to be sure there's a direct
correlation.

 Yes, post it here.  Most likely a majority of the threads 
 will be blocked somewhere deep in lucene code, and you will 
 probably need help from people here to figure it out.

Next time it happens I'll shoot it over.
  
--Dave


 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 3:42 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Availability Issues
 
 On 10/8/07, David Whalen [EMAIL PROTECTED] wrote:
   Do you see any requests that took a really long time to finish?
 
  The requests that take a long time to finish are just 
 simple queries.  
  And the same queries run at a later time come back much faster.
 
  Our logs contain 99% inserts and 1% queries.  We are 
 constantly adding 
  documents to the index at a rate of 10,000 per minute, so the logs 
  show mostly that.
 
 Oh, so you are using the same boxes for updating and querying?
 When you insert, are you using multiple threads?  If so, how many?
 
 What is the full URL of those slow query requests?
 Do the slow requests start after a commit?
 
   Start with the thread dump.
   I bet it's multiple queries piling up around some synchronization 
   points in lucene (sometimes caused by multiple threads generating 
   the same big filter that isn't yet cached).
 
  What would be my next steps after that?  I'm not sure I'd 
 understand 
  enough from the dump to make heads-or-tails of it.  Can I 
 share that 
  here?
 
 Yes, post it here.  Most likely a majority of the threads 
 will be blocked somewhere deep in lucene code, and you will 
 probably need help from people here to figure it out.
 
 -Yonik
 
 


RE: Availability Issues

2007-10-08 Thread David Whalen
Hi Chris.

My logs don't look anything like that.  They look like HTTP
requests.  Am I looking in the wrong place?

Dave


 -Original Message-
 From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
 Sent: Monday, October 08, 2007 5:02 PM
 To: solr-user
 Subject: RE: Availability Issues
 
 
 :  Do the slow requests start after a commit?
 : 
 : Based on the way the logs read, you could argue that point.
 : The stream of POSTs end in the logs and then subsequent queries
 : take longer to run, but it's hard to be sure there's a direct
 : correlation.
 
 you would know based on the INFO level messages related to a 
 commit ... 
 you'll see messages that look like this when the commit starts...
 
 Oct 8, 2007 1:56:48 PM 
 org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
 
 ...then you'll see a message like this...
 
 Oct 8, 2007 1:56:48 PM 
 org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: end_commit_flush
 
 ...if you have autowarming you'll see a bunch of logs about 
 that, and then eventually you'll see a message like this...
 
 Oct 8, 2007 1:56:48 PM 
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {commit=} 0 299
 
 ...the important question is how many of these hangs or 
 really long queries happen in the midst of all that ... how 
 many happen very quickly after it (which may indicate not 
 enough warming)
 
 (NOTE: some of those log messages may look different in your 
 nightly snapshot version, but the main gist should be the 
 same .. i don't remember when exactly the LogUpdateProcessor 
 was added).
 
 
 
 
 -Hoss
 
 
 


RE: Availability Issues

2007-10-08 Thread David Whalen
Thanks for letting me know that.  Okay, here they are:


=== BEGIN SCHEMA.XML ===


<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="enr-solr" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
       Applications should change this to reflect the nature of the search collection.
       version="1.1" is Solr's version number for the schema syntax and semantics.  It should
       not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by nature
       1.1: multiValued attribute introduced, false by default -->

  <types>
    <!-- field type definitions. The "name" attribute is
         just a label to be used by field definitions.  The "class"
         attribute and any other attributes determine the real
         behavior of the fieldtype.
           Class names starting with "solr" refer to java classes in the
         org.apache.solr.analysis package.
    -->

    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
       - StrField and TextField support an optional compressThreshold which
         limits compression (if enabled in the derived fields) to values which
         exceed a certain size (in characters).
    -->
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
         currently supported on types that are sorted internally as strings.
       - If sortMissingLast="true", then a sort on this field will cause documents
         without the field to come after documents with the field,
         regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause documents
         without the field to come before documents with the field,
         regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
         then default lucene sorting will be used which places docs without the
         field first in an ascending sort and last in a descending sort.
    -->

    <!-- numeric field types that store and index the text
         value verbatim (and hence don't support range queries, since the
         lexicographic ordering isn't equal to the numeric ordering) -->
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldtype name="long" class="solr.LongField" omitNorms="true"/>
    <fieldtype name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldtype name="double" class="solr.DoubleField" omitNorms="true"/>

    <!-- Numeric field types that manipulate the value into
         a string value that isn't human-readable in its internal form,
         but with a lexicographic ordering the same as the numeric ordering,
         so that range queries work correctly. -->
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="slong" class="solr.SortableLongField"
sortMissingLast=true omitNorms=true/
fieldtype name=sfloat class=solr.SortableFloatField 
sortMissingLast=true omitNorms=true/
fieldtype name=sdouble class=solr.SortableDoubleField 
sortMissingLast=true omitNorms=true/


!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
 is a more restricted form of the canonical representation of dateTime
 http://www.w3.org/TR/xmlschema-2/#dateTime
 The trailing Z designates UTC time and is mandatory.
 Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
 All other components are mandatory.

 Expressions can also be used to denote calculations that should be
 performed 

Selecting Distinct values?

2007-09-27 Thread David Whalen
Hi there.

Is there a query I can use to select distinct values in an index?
I thought I could use a facet, but the facets don't seem to return
all the distinct values in the index, only the highest-count ones.

Is there another query I can try?  Or, can I adjust the facets
somehow to make this work?

Thanks,

DW
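For reference, what makes faceting usable as a distinct-values query is facet.limit=-1, which removes Solr's default cap on the number of facet values returned (facet.mincount is the Solr 1.2 name for suppressing zero-count values). A minimal sketch of building such a request; the endpoint and field name are placeholders, not from this thread:

```python
from urllib.parse import urlencode

def distinct_values_url(base, field):
    """Build a Solr query URL that enumerates every distinct value of a
    field via faceting: facet.limit=-1 removes the default cap and
    facet.mincount=1 skips values with no matching documents."""
    params = [
        ("q", "*:*"),      # match all documents
        ("rows", "0"),     # we only want facet counts, not result rows
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "-1"),
        ("facet.mincount", "1"),
    ]
    return base + "?" + urlencode(params)

url = distinct_values_url("http://localhost:8983/solr/select", "content_source")
```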



RE: Selecting Distinct values?

2007-09-27 Thread David Whalen
<grin>  Silly me.  Thanks!

  

 -Original Message-
 From: Mike Klaas [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 27, 2007 4:46 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Selecting Distinct values?
 
 On 27-Sep-07, at 12:01 PM, David Whalen wrote:
 
  Hi there.
 
  Is there a query I can use to select distinct values in an index?
  I thought I could use a facet, but the facets don't seem to 
 return all 
  the distinct values in the index, only the highest-count ones.
 
  Is there another query I can try?  Or, can I adjust the 
 facets somehow 
  to make this work?
 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3
 
 cheers,
 -Mike
 
 


quirks with sorting

2007-09-10 Thread David Whalen
Hi All.

I'm seeing a weird problem with sorting that I can't figure out.

I have a query that uses two fields -- a source column and a
date column.  I search on the source and I sort by the date
descending.

What I'm seeing is that depending on the value in the source,
the date sort works in reverse.

For example, the query:

content_source:(mv); content_date desc

returns 2007-09-10T09:25:00.000Z in its first row, which is what
I expect.

BUT, the query:

content_source:(thomson); content_date desc

returns 2008-08-17T00:00:00.000Z, which is the first date we
put into SOLR.

So, simply by changing the value in the field, the sort seems
to be reversed (or ignored outright).

Now, before you ask, I did a sanity-check query to make sure
that there is in fact data for that source from today, and there
is.

Can anyone help shed some light on this?

TIA

DW
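The resolution below hinges on the dates themselves: Solr's ISO-8601 UTC timestamps sort chronologically even as plain strings, so the ordering can be sanity-checked outside Solr entirely (dates taken from the thread):

```python
# Solr date fields use ISO-8601 UTC strings; lexicographic order equals
# chronological order, so sorting the raw strings is enough.
dates = [
    "2007-09-10T09:25:00.000Z",
    "2008-08-17T00:00:00.000Z",  # the future-dated document from the thread
    "2007-01-02T12:00:00.000Z",
]
newest_first = sorted(dates, reverse=True)
print(newest_first[0])  # the 2008 date correctly tops a descending sort
```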


RE: quirks with sorting

2007-09-10 Thread David Whalen
<red-faced>

You know, I must have looked at that date 10 times and I never
noticed the year.

Sorry everyone!

</red-faced>

  

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
 Of Yonik Seeley
 Sent: Monday, September 10, 2007 11:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: quirks with sorting
 
 On 9/10/07, David Whalen [EMAIL PROTECTED] wrote:
  I'm seeing a weird problem with sorting that I can't figure out.
 
  I have a query that uses two fields -- a source column and a date 
  column.  I search on the source and I sort by the date descending.
 
  What I'm seeing is that depending on the value in the 
 source, the date 
  sort works in reverse.
 
  For example, the query:
 
  content_source:(mv); content_date desc
 
  returns 2007-09-10T09:25:00.000Z in its first row, which is what I 
  expect.
 
  BUT, the query:
 
  content_source:(thomson); content_date desc
 
  returns 2008-08-17T00:00:00.000Z, which is the first date 
 we put into 
  SOLR.
 
  Is it the last (highest date) since it's 2008?
 
 -Yonik
 


searching where a value is not null?

2007-09-06 Thread David Whalen
Hi all.

I'm trying to construct a query that in pseudo-code would read
like this:

field != ''

I'm finding it difficult to write this as a solr query, though.
Stuff like:

NOT field:()

doesn't seem to do the trick.

any ideas?

dw
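One common answer (the standard Lucene/Solr idiom, not something confirmed in this thread) is an open-ended range query, field:[* TO *], which matches every document where the field has at least one indexed value; negating it finds documents missing the field, with an explicit *:* added because older Solr/Lucene versions reject purely negative queries. A sketch with made-up field names:

```python
def has_value(field):
    # Open-ended range query: matches every document where `field`
    # contains at least one indexed term.
    return "%s:[* TO *]" % field

def missing_value(field):
    # Negation; the leading *:* keeps the query from being purely
    # negative, which older Solr/Lucene versions reject.
    return "*:* AND NOT %s:[* TO *]" % field

q = has_value("journalist_id")
```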


Effects of changing schema?

2007-08-24 Thread David Whalen
Hi All.

I'm unclear on whether changing the schema.xml file
automatically causes a reindex or not.  If I'm adding
a field to the schema (and removing some unused ones),
does solr do the reindex?  Or, do I have to kick it
off myself?

Ideally, we'd like to avoid a reindex...

Thanks!

DW


Problem with stemming

2007-08-13 Thread David Whalen
Hi All.

We're running into a problem with stemming that I can't
figure out.  For example, searching for the word transit
(whether in quotes or not) returns documents with the word
transition in them.

How do I disable this?  We want our engine to be as literal
as possible.  If a user mis-types a word, that's too bad for
them

TIA

DW


RE: Problem with stemming

2007-08-13 Thread David Whalen
Yonik:

I only raised the question to the group after I had looked in
the schema.xml.  There are a lot of comments in that file, but
they make no sense to me.  

I'd appreciate some specific help on what to do...

DW

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
 Of Yonik Seeley
 Sent: Monday, August 13, 2007 3:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with stemming
 
 On 8/13/07, David Whalen [EMAIL PROTECTED] wrote:
  Hi All.
 
  We're running into a problem with stemming that I can't 
 figure out.  
  For example, searching for the word transit
  (whether in quotes or not) returns documents with the word 
  transition in them.
 
  How do I disable this?  We want our engine to be as literal as 
  possible.  If a user mis-types a word, that's too bad for them
 
 Use a different field-type for those fields that you want 
 exact matching for (and then re-index).
 Read through schema.xml if you haven't... there are quite a 
 few comments in there.
 You may want a field type with just a whitespace tokenizer 
 followed by a lowercase filter.
 
 -Yonik
 

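Yonik's suggestion — a field type with just a whitespace tokenizer followed by a lowercase filter, i.e. no stemming — would look roughly like this in schema.xml. The type name text_unstemmed is invented; the factory class names are the stock Solr 1.x analysis factories:

```xml
<fieldtype name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Fields using this type would need to be re-indexed after the change, since analysis happens at index time as well as query time.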

RE: Problem with stemming

2007-08-13 Thread David Whalen
Thanks, guys.  I'm sure that by the time I get the book and
learn all about Lucene the CEO of my company will have insisted
we find another search engine.  But the book will look great
on my coffee table


  

 -Original Message-
 From: Lance Norskog [mailto:[EMAIL PROTECTED] 
 Sent: Monday, August 13, 2007 4:37 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with stemming
 
 (Oops, try again.)
 
 You need this book:
 
  http://www.amazon.com/Lucene-Action-Erik-Hatcher/dp/1932394281/ref=pd_bbs_sr_1/103-4871137-7111056?ie=UTF8&s=books&qid=1187037246&sr=8-1
 
  Lucene in Action by Erik Hatcher and Otis Gospodnetic.  It 
 does not cover Solr really, but you will understand what 
 Lucene does and how it works.
 Until then you will not really get anywhere.
 
 Cheers,
 
 Lance 
 
 -Original Message-
 From: David Whalen [mailto:[EMAIL PROTECTED]
 Sent: Monday, August 13, 2007 1:00 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Problem with stemming
 
 Yonik:
 
 I only raised the question to the group after I had looked in 
 the schema.xml.  There are a lot of comments in that file, 
 but they make no sense to me.  
 
 I'd appreciate some specific help on what to do...
 
 DW
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
 Behalf Of Yonik 
  Seeley
  Sent: Monday, August 13, 2007 3:28 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Problem with stemming
  
  On 8/13/07, David Whalen [EMAIL PROTECTED] wrote:
   Hi All.
  
   We're running into a problem with stemming that I can't
  figure out.  
   For example, searching for the word transit
   (whether in quotes or not) returns documents with the word 
   transition in them.
  
   How do I disable this?  We want our engine to be as literal as 
   possible.  If a user mis-types a word, that's too bad for them
  
  Use a different field-type for those fields that you want exact 
  matching for (and then re-index).
  Read through schema.xml if you haven't... there are quite a few 
  comments in there.
  You may want a field type with just a whitespace tokenizer 
 followed by 
  a lowercase filter.
  
  -Yonik
  
 
 


RE: Problem with stemming

2007-08-13 Thread David Whalen
So I shut it off by removing these tags from my schema.xml
file?  Seems like it's this Porter thing that's messing
me up.


 -Original Message-
 From: Tom Mastre [mailto:[EMAIL PROTECTED] 
 Sent: Monday, August 13, 2007 5:19 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Problem with stemming
 
 Go here
 
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28stemming%29#head-88cc86e4432b359030cffdb32d095062b843d4f5
 
 Look for this 
 
 solr.PorterStemFilterFactory
 
 
 On 8/13/07 1:50 PM, David Whalen [EMAIL PROTECTED] wrote:
 
  Thanks, guys.  I'm sure that by the time I get the book and
  learn all about Lucene the CEO of my company will have insisted
  we find another search engine.  But the book will look great
  on my coffee table
  
  

  
 
 Thomas M. Mastre 
 Manager, Homeland Security Digital Library
 
  
 Center for Homeland Defense and Security
 The Nation's Homeland Security Educator
 1 University Circle
 DKL, Room 112 
 Monterey, Ca. 93943
 Phone: 831.656.1095, Cell:831.238.1451
 Fax:831.656.2619 
 email: [EMAIL PROTECTED]
 http://www.hsdl.org
 
 


RE: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread David Whalen
What we're looking for is a way to inject *without* using
curl, or wget, or any other http-based communication.  We'd
like for the HTTP daemon to only handle search requests, not
indexing requests on top of them.

Plus, I have to believe there's a faster way to get documents
into solr/lucene than using curl

_
david whalen
senior applications developer
eNR Services, Inc.
[EMAIL PROTECTED]
203-849-7240
  

 -Original Message-
 From: Clay Webster [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, August 09, 2007 11:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Any clever ideas to inject into solr? Without http?
 
 Condensing the loader into a single executable sounds right 
 if you have performance problems. ;-)
 
 You could also try adding multiple docs in a single post if 
 you notice your problems are with tcp setup time, though if 
 you're doing localhost connections that should be minimal.
 
 If you're already local to the solr server, you might check 
 out the CSV slurper. http://wiki.apache.org/solr/UpdateCSV  
 It's a little specialized.
 
 And then there's of course the question of are you doing 
 full re-indexing or incremental indexing of changes?
 
 --cw
 
 
 On 8/9/07, Kevin Holmes [EMAIL PROTECTED] wrote:
 
  I inherited an existing (working) solr indexing script that 
 runs like
  this:
 
 
 
  Python script queries the mysql DB then calls bash script
 
  Bash script performs a curl POST submit to solr
 
 
 
  We're injecting about 1000 records / minute (constantly), 
 frequently 
  pushing the edge of our CPU / RAM limitations.
 
 
 
  I'm in the process of building a Perl script to use DBI and 
  lwp::simple::post that will perform this all from a single script 
  (instead of 3).
 
 
 
  Two specific questions
 
  1: Does anyone have a clever (or better) way to perform 
 this process 
  efficiently?
 
 
 
  2: Is there a way to inject into solr without using POST / 
 curl / http?
 
 
 
  Admittedly, I'm no solr expert - I'm starting from someone else's 
  setup, trying to reverse-engineer my way out.  Any input would be 
  greatly appreciated.
 
 
 

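Clay's "multiple docs in a single post" suggestion maps to Solr's XML update message format: one <add> element can carry any number of <doc> elements, so a single HTTP POST replaces hundreds of per-document requests. A sketch; the helper name, field names, and sample documents are made up:

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build one Solr <add> update message carrying many documents."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

payload = build_add_message([
    {"id": "1", "title": "first"},
    {"id": "2", "title": "second"},
])
# POST `payload` to /solr/update with Content-Type: text/xml,
# then send a <commit/> when the batch is done.
```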

RE: Solr 1.1 HTTP server stops responding

2007-07-30 Thread David Whalen
Hi All.

I'm still hoping to get some insight into how I can solve this
issue.  If Jetty is the problem I'll happily get rid of it, but
I'd feel better if I could do some tests first to be sure I'm
solving the problem.

Has anyone else had this problem in the past?

Thanks,

DW



 -Original Message-
 From: David Whalen [mailto:[EMAIL PROTECTED] 
 Sent: Friday, July 27, 2007 10:49 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr 1.1 HTTP server stops responding
 
 We're using Jetty.  I don't know what version though.  To my 
 knowledge, Solr is the only thing running inside it.  
 
 Yes, we cannot get to the admin pages either.  Nothing on port
 8983 responds.
 
 So maybe it's actually Jetty that's messing me up?  How can I 
 make sure of that?
 
 Thanks for the help!
 
 DW
 
 
  -Original Message-
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
  Sent: Friday, July 27, 2007 10:40 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr 1.1 HTTP server stops responding
  
  Solr runs as a webapp (think .war file) inside a servlet container 
  (e.g. Tomcat, Jetty, Resin...).  It could be that the 
  servlet container 
  itself has a bug that prevents it from responding properly after a 
  while.  If you have other webapps in the same container, do 
 they still 
   respond?  Can you go to
  *any* of Solr's pages (e.g. admin page)?  Anything in 
 container or 
  Solr logs?
  
  Otis
  --
  Lucene Consulting - http://lucene-consulting.com/
  
  
  
  - Original Message 
  From: David Whalen [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Friday, July 27, 2007 4:21:18 PM
  Subject: RE: Solr 1.1 HTTP server stops responding
  
  Hi Otis.
  
  I'm filling-in for the guy that installed the software for us (now 
  he's long gone), so I'm just getting familiar with all of 
 this.  Can 
  you elaborate on what you mean?
  
  DW
  
  
   -Original Message-
   From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
   Sent: Friday, July 27, 2007 10:01 AM
   To: solr-user@lucene.apache.org
   Subject: Re: Solr 1.1 HTTP server stops responding
   
   Hi David,
   
   Have you ruled out your servlet container as the source 
 of this bug?
   
   Otis
   
   
   - Original Message 
   From: David Whalen [EMAIL PROTECTED]
   To: solr-user@lucene.apache.org
   Sent: Friday, July 27, 2007 3:06:42 PM
   Subject: Solr 1.1 HTTP server stops responding
   
   Hi All.
   
   We're running Solr 1.1 and we're seeing intermittent cases
  where Solr
   stops responding to HTTP requests.  It seems like the
  listener on port
   8983 just doesn't respond.
   
   We stop and restart Solr and everything works fine for a 
 few hours, 
   and then the problem returns.  We can't seem to point to 
 any single 
   factor that would lead to this problem, and I'm hoping to 
 get some 
   hints on how to diagnose it.
   
   Here's what I can tell you now, and I can provide more info by
   request:
   
   1) The query load (via /solr/select) isn't that high.  
  Maybe 20 or 30
   requests per minute tops.
   
   2) The insert load (via /solr/update) is very high.  We
  commit almost
   500,000 documents per day.  We also trim out the same
  number however,
   so the net number of documents should stay around 20 million.
   
   3) We do see Out of Memory errors sometimes, especially 
 when making 
   facet queries (which we do most of the time).
   
   We think solr is great, and we want to keep using it, but
  the downtime
   makes the product (and us) look bad, so we need to solve 
 this soon.
   
   Thanks in advance for your help!
   
   DW
   
   
   
   
   No virus found in this incoming message.
   Checked by AVG Free Edition. 
   Version: 7.5.476 / Virus Database: 269.10.22/922 - Release
   Date: 7/27/2007 6:08 AM

   
  
  
  
  
  
 
 

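Since the symptom is that nothing on port 8983 answers at all, a cheap first diagnostic is a raw TCP probe: it separates "the container is down or hung at the socket level" from "Solr accepts connections but requests stall". A sketch (host/port are the Jetty defaults from the thread):

```python
import socket

def port_alive(host, port, timeout=2.0):
    """Liveness probe: can we even complete a TCP handshake with the
    servlet container?  True means the listener accepted a connection;
    False means refused, unreachable, or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_alive("localhost", 8983) while Solr appears hung
```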

Please help! Solr 1.1 HTTP server stops responding

2007-07-30 Thread David Whalen
Guys:

Can anyone help me?  Things are getting serious at my
company and heads are going to roll.  

I need to figure out why solr just suddenly stops responding
without any warning.

DW
  



RE: Please help! Solr 1.1 HTTP server stops responding

2007-07-30 Thread David Whalen
Hi Yonik!

I'm glad to finally get to talk to you.  We're all very impressed
with solr and when it's running it's really great.

We increased the heap size to 1500M and that didn't seem to help.
In fact, the crashes seem to occur more now than ever.  We're
constantly restarting solr just to get a response.

I don't know enough to know where the log files are to answer
your question (again, I'm filling in for the guy that set us 
up with all this).  Can I ask for your patience so we can figure
this out?

Thanks!

Dave W


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
 Of Yonik Seeley
 Sent: Monday, July 30, 2007 2:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Please help! Solr 1.1 HTTP server stops responding
 
 It may be related to the out-of-memory errors you were seeing.
 severe errors like that should never be ignored.
 Do you see any other warning or severe errors in your logs?
 
 -Yonik
 
 On 7/30/07, David Whalen [EMAIL PROTECTED] wrote:
  Guys:
 
  Can anyone help me?  Things are getting serious at my company and 
  heads are going to roll.
 
  I need to figure out why solr just suddenly stops 
 responding without 
  any warning.
 
  DW
 
 

RE: Please help! Solr 1.1 HTTP server stops responding

2007-07-30 Thread David Whalen
Yonik:

 If that's not the problem, you could decrease memory usage 
 due to faceting by upgrading to Solr 1.2 and using 
 facet.enum.cache.minDf

Is it hard to upgrade from 1.1 to 1.2?  We were considering
making that change if it wouldn't cost us a lot of downtime.

can you help me understand what using facet.enum.cache.minDf
means?  Is that a setting in the config file?

Dave W

  

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
 Of Yonik Seeley
 Sent: Monday, July 30, 2007 3:29 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Please help! Solr 1.1 HTTP server stops responding
 
 Grep for PERFORMANCE in the logs to make sure that you aren't 
 running into a scenario where more than one searcher is 
 warming in the background.
 
 If that's not the problem, you could decrease memory usage 
 due to faceting by upgrading to Solr 1.2 and using 
 facet.enum.cache.minDf
 
 -Yonik
 
 On 7/30/07, Kevin Holmes [EMAIL PROTECTED] wrote:
  Just got this:
 
 
 
  Jul 30, 2007 3:02:14 PM org.apache.solr.core.SolrException log
  SEVERE: java.lang.OutOfMemoryError: Java heap space
 
  Jul 30, 2007 3:02:30 PM org.apache.solr.core.SolrException log
  SEVERE: java.lang.OutOfMemoryError: Java heap space
 
 
 
 
 
  Kevin Holmes
  eNR Services, Inc.
  20 Glover Ave. 2nd Floor
  Norwalk, CT. 06851
  203-849-7248
  [EMAIL PROTECTED]
 
 
  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
 Behalf Of Yonik 
  Seeley
  Sent: Monday, July 30, 2007 2:55 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Please help! Solr 1.1 HTTP server stops responding
 
  On 7/30/07, David Whalen [EMAIL PROTECTED] wrote:
   We increased the heap size to 1500M and that didn't seem to help.
   In fact, the crashes seem to occur more now than ever.  We're 
   constantly restarting solr just to get a response.
  
   I don't know enough to know where the log files are to 
 answer your 
   question
 
  Me neither ;-)
  Solr's example app that uses Jetty just has logging going to stdout 
  (the console) to make it clear and visible to new users 
 when an error 
  happens.  Hopefully you've configured Jetty to log to files, or at 
  least redirected Jetty's stdout/stderr to a file.
  You need to look around and try and find those log files.
  If you find them, one thing to look for would be WARNING 
 in the log 
  files.  Another thing to look for would be Exception or Memory
 
   So maybe it's actually Jetty that's messing me up?  How 
 can I make 
   sure of that?
 
  Perhaps point your browser at http://localhost:8983/ and see if you 
  get any response at all.
 
  -Yonik
 
 
 
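To the config-file question: facet.enum.cache.minDf is a request parameter in Solr 1.2, not a schema setting, though it can be given a default in a request handler's configuration in solrconfig.xml. As I understand it, facet terms whose document frequency falls below the threshold bypass the filterCache, cutting faceting memory use at a small speed cost. A sketch of passing it per request (the value 25 is arbitrary):

```python
from urllib.parse import urlencode

# facet.enum.cache.minDf is sent with the query, like any other facet
# parameter; only terms appearing in >= 25 documents get cached here.
params = [
    ("q", "fred"),
    ("facet", "true"),
    ("facet.field", "content_source"),
    ("facet.enum.cache.minDf", "25"),
]
qs = urlencode(params)
```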


faceting on multiple columns

2007-07-30 Thread David Whalen
Hi All.

I am using facets to help me build an ajax-driven tree for
search results.  When the search is first run, all I need to
do is show the counts per facet, for example

search results for fred
+--A (102)
+--B (234)
+--C (721)
+--D (512)

sounds simple, but I also need to break-down the results from
D by a different index in lucene:

search results for fred
+--A (102)
+--B (234)
+--C (721)
+--D (512)
  +--D1 (19)
  +--D2 (34)
  +--D3 (45)

what I have been doing in my solr querystring looks like this:

rows=0&facet=true&facet.limit=-1&facet.field=field1&facet.field=field2

Unfortunately we're seeing really bad performance and we're
constantly running out of heap space on this type of query.

So, my question is, would breaking this into two calls perform
better?  That is,

rows=0&facet=true&facet.limit=-1&facet.field=field1

and then

rows=0&facet=true&facet.limit=-1&facet.field=field2

?

It seems to me that two calls would have more overhead than one,
but it might lessen the impact on the heap space on my server.

Anyone work enough with facets to throw in their two cents?

Thanks!

Dave W.
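For reference, the two strategies compared above differ only in the query string. Spelled out (the search term and field names are the placeholders from the message, not real schema fields):

```shell
# Parameters shared by both strategies.
base='q=fred&rows=0&facet=true&facet.limit=-1'

# Strategy 1: one request faceting on both fields at once.
combined="${base}&facet.field=field1&facet.field=field2"

# Strategy 2: two smaller requests, one facet field each.
first="${base}&facet.field=field1"
second="${base}&facet.field=field2"

echo "/solr/select?${combined}"
echo "/solr/select?${first}"
echo "/solr/select?${second}"
```

Splitting mainly spreads the work across two requests; whether it lowers peak heap use depends on how much of the cost is per-field rather than per-request.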



RE: Solr 1.1 HTTP server stops responding

2007-07-27 Thread David Whalen
Hi Otis.

I'm filling-in for the guy that installed the software for us
(now he's long gone), so I'm just getting familiar with all of
this.  Can you elaborate on what you mean?

DW


 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
 Sent: Friday, July 27, 2007 10:01 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 1.1 HTTP server stops responding
 
 Hi David,
 
 Have you ruled out your servlet container as the source of this bug?
 
 Otis
 
 
 - Original Message 
 From: David Whalen [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Friday, July 27, 2007 3:06:42 PM
 Subject: Solr 1.1 HTTP server stops responding
 
 Hi All.
 
 We're running Solr 1.1 and we're seeing intermittent cases 
 where Solr stops responding to HTTP requests.  It seems like 
 the listener on port 8983 just doesn't respond.
 
 We stop and restart Solr and everything works fine for a few 
 hours, and then the problem returns.  We can't seem to point 
 to any single factor that would lead to this problem, and I'm 
 hoping to get some hints on how to diagnose it.
 
 Here's what I can tell you now, and I can provide more info 
 by request:
 
 1) The query load (via /solr/select) isn't that high.  Maybe 
 20 or 30 requests per minute tops.
 
 2) The insert load (via /solr/update) is very high.  We 
 commit almost 500,000 documents per day.  We also trim out 
 the same number however, so the net number of documents 
 should stay around 20 million.
 
 3) We do see Out of Memory errors sometimes, especially when 
 making facet queries (which we do most of the time).
 
 We think solr is great, and we want to keep using it, but the 
 downtime makes the product (and us) look bad, so we need to 
 solve this soon.
 
 Thanks in advance for your help!
 
 DW
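A common first step for the Out of Memory errors described above is to raise the JVM heap when launching the container. For the Jetty example distribution that looks roughly like this (a sketch; the flag values are illustrative, not taken from the thread):

```shell
# -Xms/-Xmx set the initial/maximum JVM heap. Values are examples only;
# size them to the machine and the observed facet memory use.
HEAP_OPTS="-Xms512m -Xmx1500m"

# Uncomment on the Solr host, run from the example/ directory:
#   java $HEAP_OPTS -jar start.jar

echo "would run: java $HEAP_OPTS -jar start.jar"
```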
 
 
 
 
 


RE: Solr 1.1 HTTP server stops responding

2007-07-27 Thread David Whalen
We're using Jetty.  I don't know what version though.  To my
knowledge, Solr is the only thing running inside it.  

Yes, we cannot get to the admin pages either.  Nothing on port
8983 responds.

So maybe it's actually Jetty that's messing me up?  How can
I make sure of that?

Thanks for the help!

DW


 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
 Sent: Friday, July 27, 2007 10:40 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 1.1 HTTP server stops responding
 
 Solr runs as a webapp (think .war file) inside a servlet 
 container (e.g. Tomcat, Jetty, Resin...).  It could be that 
 the servlet container itself has a bug that prevents it from 
 responding properly after a while.  If you have other webapps 
 in the same container, do they still respond?  Can you get to 
 *any* of Solr's pages (e.g. admin page)?  Anything in the 
 container or Solr logs?
 
 Otis
 --
 Lucene Consulting - http://lucene-consulting.com/
 
 
 
 - Original Message 
 From: David Whalen [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Friday, July 27, 2007 4:21:18 PM
 Subject: RE: Solr 1.1 HTTP server stops responding
 
 Hi Otis.
 
 I'm filling-in for the guy that installed the software for us 
 (now he's long gone), so I'm just getting familiar with all 
 of this.  Can you elaborate on what you mean?
 
 DW
 
 
  -Original Message-
  From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
  Sent: Friday, July 27, 2007 10:01 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr 1.1 HTTP server stops responding
  
  Hi David,
  
  Have you ruled out your servlet container as the source of this bug?
  
  Otis
  
  
  - Original Message 
  From: David Whalen [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Friday, July 27, 2007 3:06:42 PM
  Subject: Solr 1.1 HTTP server stops responding
  
  Hi All.
  
  We're running Solr 1.1 and we're seeing intermittent cases 
 where Solr 
  stops responding to HTTP requests.  It seems like the 
 listener on port 
  8983 just doesn't respond.
  
  We stop and restart Solr and everything works fine for a few hours, 
  and then the problem returns.  We can't seem to point to any single 
  factor that would lead to this problem, and I'm hoping to get some 
  hints on how to diagnose it.
  
  Here's what I can tell you now, and I can provide more info by 
  request:
  
  1) The query load (via /solr/select) isn't that high.  
 Maybe 20 or 30 
  requests per minute tops.
  
  2) The insert load (via /solr/update) is very high.  We 
 commit almost 
  500,000 documents per day.  We also trim out the same 
 number however, 
  so the net number of documents should stay around 20 million.
  
  3) We do see Out of Memory errors sometimes, especially when making 
  facet queries (which we do most of the time).
  
  We think solr is great, and we want to keep using it, but 
 the downtime 
  makes the product (and us) look bad, so we need to solve this soon.
  
  Thanks in advance for your help!
  
  DW
  
  
  
  
  
 
 
 
 