Re: JVM random crashes

2007-03-06 Thread Dimitar Ouzounov

It will probably turn out to be a hardware problem - a bad RAM chip. I
removed it and today I will test Solr again to make sure everything is fine.

On 3/5/07, Bill Au [EMAIL PROTECTED] wrote:

> Seems like this may be a JVM bug:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6500147
>
> http://forum.java.sun.com/thread.jspa?threadID=659990&messageID=3876052
>
> Have you tried using a different garbage collector?
>
> Bill

> On 3/3/07, Jed Reynolds [EMAIL PROTECTED] wrote:
>
>> Yonik Seeley wrote:
>>> On 3/3/07, Dimitar Ouzounov [EMAIL PROTECTED] wrote:
>>>> But what hardware problem could it be? Tomorrow I'll make sure that the
>>>> memory is fine, but nothing else comes to my mind.
>>>
>>> Memory, motherboard, etc.
>>> Try http://www.memtest86.com/ to test this.
>>>
>>>> It may be OS-related - probably a buggy version of
>>>> some library. But which library?
>>>
>>> Yep, we've seen that in the past.
>>> I'd recommend going with OS versions that vendors test with.
>>> The commercial RHEL or the free clone of it, http://www.centos.org/,
>>> would be my recommendation.
>>
>> I'm running a lot of CentOS 4.4 myself, on i686 and x86_64 processors.
>> I'm testing out Solr on an i686 with JDK 1.5 and I'm running a
>> production copy of Nutch on x86_64 JDK 1.5, Tomcat 1.5. It's been rock
>> solid.
>>
>> From trying to install Java in the past on FC5, I read a lot about how
>> you had to be rather careful to make absolutely certain that you had no
>> conflicting gcj libs in your path. If this is a production box, I'd go
>> with a longer-supported OS than FC6. If the server is only for searching
>> and apache, I don't think FC6 will give you any noticeable performance
>> boost over CentOS 4.4. FC6's performance enhancements with
>> glibc-hash-binding won't affect a JVM.
>>
>> Jed




Re: problem with solr.HTMLStripWhitespaceTokenizerFactory

2007-03-06 Thread Yonik Seeley

On 3/6/07, mike topper [EMAIL PROTECTED] wrote:

> when inserting it it seems like nothing happens ie when i do a query
> here is the response for a test description:
>
> <str name="description">
> <br>hi<br>my<br>name<br>is<br>topper<br>and this <b>&nbsp;blahblah</b> is a <b>test</b>
> </str>


The tag stripping happens during the analysis phase, and affects what
gets indexed.
For returned field values, you get what you put in.

-Yonik
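
The distinction above between what is indexed and what is returned can be sketched with a toy model. This is only an illustration in Python, not Solr code: the `analyze` function and the sample value are made up, and a regex stands in for the real HTML-stripping tokenizer.

```python
import re

# Crude stand-in for HTML stripping; the real tokenizer is more careful.
TAG = re.compile(r"<[^>]+>")

def analyze(raw):
    """Hypothetical analysis step: drop tags, then split on whitespace."""
    return TAG.sub(" ", raw).split()

raw = "<br>hi<br>my name is <b>topper</b>"  # made-up sample field value

stored = raw            # what a query response would return for the field
indexed = analyze(raw)  # what searches would actually match against

print(stored)
print(indexed)
```

In this model a search for "topper" would match, while the response still shows the original markup, which is the behaviour described in the reply.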


Re: Dynamic RequestHandler loading

2007-03-06 Thread Chris Hostetter

: getRequestHandlers() would be equivalent to:
: getRequestHandlers( SolrRequestHandler.class )
:
: We will need some way to ask what is registered without knowing the
: path it is registered to.

getting instances by class seems like a pretty special case situation ...
i'd rather not add a bunch of methods that really only have one use case
which isn't even in the main code base.

Adding a Map<String,SolrRequestHandler> getRequestHandlers() method to
the core seems useful enough in a broad case to solve any special needs
custom code might have -- find instances by interface etc...

As long as we change the SolrCore initialization to construct all
SolrRequestHandler instances and build up that Map prior to calling the
init method on them, it would also solve the "what name am i registered
with" question for RequestHandlers without needing to change the interface
in a backwards incompatible way.  (handlers that want to know could get
the Map and look for a value they are == to)


-Hoss
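
The two-phase initialization proposed above (construct every handler, build the map, then call init) can be modelled in a few lines. This is a language-neutral sketch written in Python, not Solr's Java API: `RequestHandler`, `PingHandler`, `build_registry` and `handlers_of_type` are hypothetical names, and Python's `is` plays the role of Java's `==` on references.

```python
class RequestHandler:
    """Hypothetical handler; not Solr's SolrRequestHandler interface."""

    def __init__(self):
        self.registered_names = []

    def init(self, registry):
        # Identity comparison: find every name this exact instance is mapped to.
        self.registered_names = [
            name for name, handler in registry.items() if handler is self
        ]


class PingHandler(RequestHandler):
    """Hypothetical subclass, used to show lookup by type."""


def build_registry(handler_classes):
    # Phase 1: construct all handlers so the map is complete.
    registry = {name: cls() for name, cls in handler_classes.items()}
    # Phase 2: only now call init, so each handler can inspect the full map.
    for handler in registry.values():
        handler.init(registry)
    return registry


def handlers_of_type(registry, cls):
    # "Find instances by class" done by the caller, with no extra core method.
    return {name: h for name, h in registry.items() if isinstance(h, cls)}


registry = build_registry({"/select": RequestHandler, "/admin/ping": PingHandler})
print(registry["/select"].registered_names)
print(sorted(handlers_of_type(registry, PingHandler)))
```

The point of the ordering is that no handler's init runs against a half-built map, so the name lookup needs no change to the handler interface.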



RE: Time after snapshot is visible on the slave

2007-03-06 Thread Graham Stead

Hi Galo,

The snapinstaller actually performs a commit as its last step, so if that
didn't work, it's not surprising that running commit separately didn't work,
either.

I would suggest running the snapinstaller and/or commit scripts with the -V
option. This will produce verbose debugging information and allow you to see
where they encounter problems.

Hope this helps,
-Graham




improve performance after commit

2007-03-06 Thread Kaan Erdener

hello,

I'm looking for some tips / suggestions around reducing the query
time for Solr after I've post'ed a commit request. My Lucene index
contains around 2,000,000 documents, and I have a job that
periodically removes arbitrary documents from Lucene and replaces
them with fresh copies from a database. Whenever that cycle occurs, I
send a commit to Solr to expose the updates. The problem is that
immediately after the commit, a Solr query that previously took
5-20ms now takes 20-25 seconds. Ouch.


I know that commit can be expensive, although I don't know by how
much, or what I might do to mitigate the expense. I haven't found much
documentation on this topic. I've also tried different cache settings
(basically using high values for cache and auto-warm sizes), but that
doesn't seem to make much of a difference.


I'll keep investigating on my own, but if anyone has any suggestions
or additional info, I would greatly appreciate it.


thanks,
Kaan


Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace
Managed Hosting. Any dissemination, distribution or copying of the enclosed
material is prohibited. If you receive this transmission in error, please
notify us immediately by e-mail at [EMAIL PROTECTED], and delete the
original message. Your cooperation is appreciated.



Re: improve performance after commit

2007-03-06 Thread Yonik Seeley

On 3/6/07, Kaan Erdener [EMAIL PROTECTED] wrote:

> I'm looking for some tips / suggestions around reducing the query
> time for Solr after I've post'ed a commit request. My Lucene index
> contains around 2,000,000 documents, and I have a job that
> periodically removes arbitrary documents from Lucene and replaces
> them with fresh copies from a database. Whenever that cycle occurs, I
> send a commit to Solr to expose the updates. The problem is that
> immediately after the commit, a Solr query that previously took
> 5-20ms now takes 20-25 seconds. Ouch.


If this is a normal query (no faceting) then most likely the time is spent
populating a lucene FieldCache entry used for sorting results.
Put a static warming entry in solrconfig.xml that queries for a small
number of documents and sorts that query by all the fields you
commonly sort by.

-Yonik
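
Why a sorted query is slow right after a commit, and why warming helps, can be shown with a toy model. This is a Python sketch of the idea only, not Lucene's FieldCache implementation: the `Searcher` class, its `warm` method and the `cache_builds` counter are all hypothetical.

```python
class Searcher:
    """Toy searcher: the first sort on a field builds a per-document array."""

    def __init__(self, docs):
        self.docs = docs
        self.field_cache = {}   # field name -> list of values, one per document
        self.cache_builds = 0   # counts the expensive full passes over the index

    def _values_for(self, field):
        if field not in self.field_cache:
            # Expensive step: touches every document, not just the hits.
            self.cache_builds += 1
            self.field_cache[field] = [doc[field] for doc in self.docs]
        return self.field_cache[field]

    def warm(self, sort_fields):
        # Warming pays the build cost before any user query arrives.
        for field in sort_fields:
            self._values_for(field)

    def search(self, predicate, sort_field, rows=10):
        values = self._values_for(sort_field)
        hits = [i for i, doc in enumerate(self.docs) if predicate(doc)]
        hits.sort(key=lambda i: values[i])
        return hits[:rows]


docs = [{"subject": "c"}, {"subject": "a"}, {"subject": "b"}]
searcher = Searcher(docs)        # a commit would create a fresh searcher like this
searcher.warm(["subject"])       # the static warming query's job
builds_after_warming = searcher.cache_builds
result = searcher.search(lambda doc: True, "subject")
print(result, searcher.cache_builds)
```

In the model the cost of the first sorted query depends on the size of the whole index rather than on the number of hits, which is consistent with a 2,000,000-document index turning a millisecond query into a multi-second one until the cache is populated.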


Saving dynamic field name without dynamic extension

2007-03-06 Thread Debra

I want to add a suffix to my field names to use the dynamic fields feature.
Is there a way to save the field name without the suffix so users can search
by the plain field name?

-- 
View this message in context: 
http://www.nabble.com/Saving-dynamic-field-name-without-dynamic-extension-tf3358269.html#a9340901
Sent from the Solr - User mailing list archive at Nabble.com.



SQL Update

2007-03-06 Thread Debra

What is the status of the SQL update?
Should all database fields used in sql updates be added to schema.xml before
running the sql update?

-- 
View this message in context: 
http://www.nabble.com/SQL-Update-tf3358303.html#a9341018
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SQL Update

2007-03-06 Thread Ryan McKinley

SOLR-103 is waiting for SOLR-139 to solidify before i post more updates...

I have it running successfully, but it requires too many other patches
to suggest trying to get it running unless you are up for a bit of
work.  If you are, i can easily post an update.

About the schema... SOLR-103 uses the ResultSetMetaData to decide what
field to push the value into - you will need to make sure the column
names correspond to the fields in schema.xml.  If you use SELECT *
FROM, your columns will need the same names as the Solr fields; if you
use:  SELECT mysqlfield AS mysolrfieldname FROM ... they don't.

ryan
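
The column-name mapping described above can be tried out without JDBC. The sketch below is Python with an in-memory SQLite database, not SOLR-103's code: `cursor.description` stands in for `ResultSetMetaData`, and the table, column and field names are invented for the example.

```python
import sqlite3

def rows_as_documents(cursor):
    """Build one dict per row, keyed by the column labels of the result set."""
    labels = [column[0] for column in cursor.description]
    return [dict(zip(labels, row)) for row in cursor]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_title TEXT, book_author TEXT)")
conn.execute("INSERT INTO books VALUES ('example title', 'example author')")

# SELECT * keeps the database column names, so they must match the schema fields.
plain = rows_as_documents(conn.execute("SELECT * FROM books"))

# Aliases rename columns in the result set, so the table can keep its own names.
aliased = rows_as_documents(
    conn.execute("SELECT book_title AS title, book_author AS author FROM books")
)

print(plain)
print(aliased)
```

With aliases, only the SELECT statement has to know about the index field names.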




Re: Saving dynamic field name without dynamic extension

2007-03-06 Thread Mike Klaas

On 3/6/07, Debra [EMAIL PROTECTED] wrote:

> I want to add a suffix to my field names to use the dynamic fields feature.
> Is there a way to save the field name without the suffix so users can search
> by the plain field name?

No, and I'm not sure that it is possible.  Solr needs to know the type
of a field at all times--not just during indexing.

Why not create a _user suffix, and programmatically add the suffix to
user queries before they reach Solr?

-Mike
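
A client-side rewrite along the lines suggested above might look like the following. It is a minimal Python sketch under stated assumptions: `add_field_suffix` is a hypothetical helper, the regex only recognises simple `field:value` prefixes, and it does not handle colons inside quoted phrases or escaped characters.

```python
import re

# A field prefix is an identifier followed by ':' that is not part of a longer token.
FIELD_PREFIX = re.compile(r"(?<![\w:])([A-Za-z_]\w*):")

def add_field_suffix(query, suffix="_user"):
    """Append the dynamic-field suffix to every plain field name in the query."""
    def rewrite(match):
        name = match.group(1)
        if name.endswith(suffix):
            return match.group(0)  # already suffixed, leave untouched
        return name + suffix + ":"
    return FIELD_PREFIX.sub(rewrite, query)

print(add_field_suffix("color:red AND size:large"))
```

Users keep typing plain field names, and only the application layer knows about the suffix.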


Re: Reindex only records that changed

2007-03-06 Thread Traut

additional field in your DB as flag? 1 - dirty, 0 - clean.

Debra wrote:

> Hi all,
>
> This is not a direct solr issue but I need it for indexing.
>
> Is there a way to check if a database record changed since the last index
> (without using a special flag field that has to be set anywhere the record
> is updated)? I would like to re-index only records that changed.
>
> TIA
> Debra




Re: improve performance after commit

2007-03-06 Thread Kaan Erdener


On Mar 6, 2007, at 1:55 PM, Yonik Seeley wrote:


> If this is a normal query (no faceting) then most likely the time is spent
> populating a lucene FieldCache entry used for sorting results.
> Put a static warming entry in solrconfig.xml that queries for a small
> number of documents and sorts that query by all the fields you
> commonly sort by.
>
> -Yonik


I'm not exactly sure this is what you meant, but I did some more  
research and it looks close. I added the following to my solrconfig.xml:


<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">allMessageContent:test</str> <str name="start">0</str> <str name="rows">10</str> </lst>
  </arr>
</listener>

and also:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">allMessageContent:trying</str> <str name="start">0</str> <str name="rows">10</str> </lst>
  </arr>
</listener>

From what I can see in the logs, these are both invoked after the
commit. However, the query times after a commit are still slow
(around 20 seconds). I'm guessing I didn't set up the warming
correctly? I had some sorting parameters in there, but the syntax was
wrong and produced errors on startup, so I took them out for now.


Mar 6, 2007 4:51:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
documentCache{lookups=10,hits=0,hitratio=0.00,inserts=20,evictions=0,size=20,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=120,cumulative_hits=68,cumulative_hitratio=0.56,cumulative_inserts=52,cumulative_evictions=0}
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to [EMAIL PROTECTED] main
Mar 6, 2007 4:51:52 PM org.apache.solr.core.SolrCore execute
INFO: rows=10&start=0&q=allMessageContent:trying 0 410
Mar 6, 2007 4:51:52 PM org.apache.solr.core.QuerySenderListener newSearcher







Re: [2] Saving dynamic field name without dynamic extension

2007-03-06 Thread Debra

Thank you, your suggestion looks like the way to go...


-- 
View this message in context: 
http://www.nabble.com/Saving-dynamic-field-name-without-dynamic-extension-tf3358269.html#a9343182
Sent from the Solr - User mailing list archive at Nabble.com.



Re: [2] Highlighting problems with HTML tagged fields

2007-03-06 Thread nick19701


Yonik Seeley wrote:

> HTMLStripWhitespaceTokenizerFactory works in two phases...
> HTMLStripReader removes the HTML and passes the result to
> WhitespaceTokenizer... at that point, Tokens are generated, but the
> offsets will correspond to the text after HTML removal, not before.
>
> I did it this way so that HTMLStripReader could go before any
> tokenizer (like StandardTokenizer).
>
> Can you open a JIRA bug for this?  The fix would be a special version
> of HTMLStripReader integrated with a WhitespaceTokenizer to keep
> offsets correct.
>
> -Yonik

Is there a fix for this problem?

My Solr build is dated 12/17/2006. HTMLStripWhitespaceTokenizerFactory +
highlighting still doesn't work: the wrong items are highlighted.
-- 
View this message in context: 
http://www.nabble.com/Highlighting-problems-with-HTML-tagged-fields-tf2017260.html#a9343253
Sent from the Solr - User mailing list archive at Nabble.com.
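
The offset mismatch described in the quoted explanation is easy to reproduce outside Solr. The following Python sketch is an illustration only: a regex stands in for HTMLStripReader, `tokens_with_offsets` is a hypothetical whitespace tokenizer, and the sample text is made up.

```python
import re

TAG = re.compile(r"<[^>]+>")  # crude stand-in for HTML stripping

def tokens_with_offsets(text):
    """Whitespace tokenizer that records (token, start, end) for each token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

original = "<b>hello</b> world"
stripped = TAG.sub("", original)            # text the tokenizer actually sees

token, start, end = tokens_with_offsets(stripped)[1]

# The offsets are correct for the stripped text...
print(repr(stripped[start:end]))
# ...but point at the wrong characters in the stored original.
print(repr(original[start:end]))
```

A highlighter that applies these offsets to the stored (unstripped) value marks the wrong span, which matches the symptom reported here.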



Re: [2] Reindex only records that changed

2007-03-06 Thread Debra

I would like to avoid such a field, in case tables are updated by programs
not under my control; any program that updates these tables would have to
add logic for updating this field.


Sergey Polzunov-2 wrote:

> additional field in your DB as flag? 1 - dirty, 0 - clean.


-- 
View this message in context: 
http://www.nabble.com/Reindex-only-records-that-changed-tf3358652.html#a9343307
Sent from the Solr - User mailing list archive at Nabble.com.



Re: [2] Reindex only records that changed

2007-03-06 Thread Ryan McKinley

MySQL has a TIMESTAMP field that can auto-update every time something
changes... i've never used it, but that may be a place to look.

alternatively you could add a TRIGGER to automatically dump stuff to a
bucket when it changes and clear the bucket when you index
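
The trigger-and-bucket idea can be sketched end to end with an in-memory database. This example uses Python with SQLite rather than MySQL purely so it runs anywhere; the table names, trigger names and the `take_changed_ids` helper are all hypothetical, and a real deployment would want the read and the clear inside one transaction so that changes arriving in between are not lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE reindex_bucket (id INTEGER PRIMARY KEY);

-- Any insert or update drops the row id into the bucket, whichever program did it.
CREATE TRIGGER records_inserted AFTER INSERT ON records
BEGIN
    INSERT OR IGNORE INTO reindex_bucket (id) VALUES (NEW.id);
END;

CREATE TRIGGER records_updated AFTER UPDATE ON records
BEGIN
    INSERT OR IGNORE INTO reindex_bucket (id) VALUES (NEW.id);
END;
""")

def take_changed_ids(connection):
    """Return the ids that changed since the last call, then empty the bucket."""
    ids = [row[0] for row in connection.execute(
        "SELECT id FROM reindex_bucket ORDER BY id")]
    connection.execute("DELETE FROM reindex_bucket")
    connection.commit()
    return ids

conn.executemany("INSERT INTO records (id, body) VALUES (?, ?)",
                 [(1, "a"), (2, "b")])
first = take_changed_ids(conn)    # both new rows need indexing

conn.execute("UPDATE records SET body = 'b2' WHERE id = 2")
second = take_changed_ids(conn)   # only the updated row

third = take_changed_ids(conn)    # nothing changed since the last run
print(first, second, third)
```

Because the trigger lives in the database, programs that update the tables need no extra logic, which addresses the objection to a hand-maintained flag field.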



Re: Error with bin/optimize and multiple solr webapps

2007-03-06 Thread Jeff Rodenburg

This issue has been logged as:

https://issues.apache.org/jira/browse/SOLR-188

A patch file is included for those who are interested.  I've unit tested it
in my environment; please validate it for your own environment.

cheers,
j



On 3/5/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:

> Thanks Hoss.  I'll add an issue in JIRA and attach the patch.
>
> On 3/5/07, Chris Hostetter [EMAIL PROTECTED] wrote:
>
>> : This line assumes a single solr installation under Tomcat, whereas the
>> : multiple webapp scenario runs from a different location (the /solr part).
>> : I'm sure this applies elsewhere.
>>
>> good catch ... it looks like all of our scripts assume /solr/update is
>> the correct path to POST commit/optimize messages to.
>>
>> : I would submit a patch for JIRA, but couldn't find these files under
>> : version control.  Any recommendations?
>>
>> They live in src/scripts ... a patch would certainly be appreciated.
>>
>> FYI: there is an evolution underway to allow XML based update messages to
>> be sent to any path (and the fixed path /update is being deprecated)
>> so it would be handy if the entire URL path was configurable (not just the
>> webapp name)
>>
>> -Hoss
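
The suggestion that the whole URL path be configurable, not only the webapp name, can be illustrated as follows. This is a Python sketch, not the actual scripts in src/scripts: `update_url` is a hypothetical helper, and the default host, port and path are assumptions chosen for the example.

```python
from urllib.parse import urlunsplit

def update_url(host="localhost", port=8983, path="/solr/update"):
    """Build the URL that commit/optimize messages would be POSTed to."""
    if not path.startswith("/"):
        path = "/" + path
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))

# Single-webapp default versus a second webapp mounted under another path.
default_url = update_url()
second_webapp_url = update_url(port=8080, path="/solr2/update")
print(default_url)
print(second_webapp_url)
```

Taking the full path as one setting covers both the multiple-webapp case and a later move away from the fixed /update path.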





RE: Error with bin/optimize and multiple solr webapps

2007-03-06 Thread Graham Stead

Apologies in advance if SOLR-187 and SOLR-188 look the same -- they are the
same issue. I have been using adjusted scripts locally but hadn't used Jira
before and wasn't sure of the process. I decided to figure it out after
answering Galo's question this morning...then saw that Jeff had mentioned a
similar issue last night. I apologize again for confusion over the double
entry.

Thanks,
-Graham





Re: Error with bin/optimize and multiple solr webapps

2007-03-06 Thread Jeff Rodenburg

Oops, my bad. I didn't see either 186 or 187 before entering 188.  :-)

-- j







Re: improve performance after commit

2007-03-06 Thread Yonik Seeley

On 3/6/07, Kaan Erdener [EMAIL PROTECTED] wrote:

> From what I can see in the logs, these are both invoked after the
> commit. However, the query times after a commit are still slow
> (around 20 seconds).


Your warming script didn't do any sorts.
Why don't you also show the part of the log with the slow query...
that would make it much easier for people to help.

-Yonik


Re: improve performance after commit

2007-03-06 Thread Ryan McKinley


> <str name="q">allMessageContent:test;subject+asc</str>

there should be a space between subject and asc.

try: http://host/select?q=allMessageContent:test;subject%20asc

+ is supposed to become a space, but it looks like it is staying a literal +
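
The difference between a "+" that gets URL-decoded and one that is passed through literally can be checked quickly. This is a Python illustration of URL encoding in general, under the assumption suggested above that the value in solrconfig.xml is used verbatim without URL decoding; it does not call Solr.

```python
from urllib.parse import quote, unquote_plus

# The value as written in the config file: the '+' is just a character here.
config_value = "allMessageContent:test;subject+asc"

# What the same text would mean if it went through URL form decoding.
decoded_as_url = unquote_plus(config_value)

# Building a URL by hand: encode the space explicitly as %20.
encoded_for_url = quote("allMessageContent:test;subject asc", safe=":;")

print(decoded_as_url)
print(encoded_for_url)
```

So in the config file the sort spec would be written with a real space, while in a hand-typed URL the space is written as %20.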