Re: Question on StreamingUpdateSolrServer
I index in 10K batches and commit after 5 index cycles (after 50K). Is there any limitation that I can't search during commit or auto-warming? I got 8 CPU cores and only 2 were showing busy (using top) - so it's unlikely that the CPU was pegged. 2009/4/12 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: If you use StreamingUpdateSolrServer it POSTs all the docs in a single request. 10 million docs may be a bit too much for a single request. I guess you should batch it in multiple requests of smaller chunks. It is likely that the CPU is really hot when the autowarming is happening. Getting decent search performance without autowarming is not easy. autowarmCount is an attribute of a cache - see here: http://wiki.apache.org/solr/SolrCaching On Mon, Apr 13, 2009 at 3:32 AM, vivek sar vivex...@gmail.com wrote: Thanks Shalin. I noticed a couple more things. As I index around 100 million records a day, my Indexer is running pretty much at all times throughout the day. Whenever I run a search query I usually get a connection reset when the commit is happening, and a blank page when the auto-warming of searchers is happening. Here are my questions: 1) Is this a coincidence or a known issue? Can't we search while commit or auto-warming is happening? 2) How do I stop auto-warming? My search traffic is very low, so I'm trying to turn off auto-warming after the commit has happened - is there anything in solrconfig.xml to do that? 3) What would be the best strategy for searching in my scenario, where commits may be happening all the time (I commit every 50K records - so every 30-60 sec there is a commit happening, followed by auto-warming that takes 40 sec)? Search frequency is pretty low for us, but we want to make sure that whenever a search happens it is fast enough and returns results (instead of an exception or a blank screen). Thanks for all the help. 
-vivek On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Sun, Apr 12, 2009 at 2:15 AM, vivek sar vivex...@gmail.com wrote: The problem is I don't see any error message in catalina.out. I don't even see the request coming in - I simply get a blank page in the browser. If I keep trying, the request goes through and I get a response from Solr, but then it becomes unresponsive again or sometimes throws a connection reset error. I'm not sure why it would work sometimes and not other times for the same query. As soon as I stop the Indexer process things start working fine. Any way I can debug this problem? I'm not sure. I've never seen this issue myself. Could you try using the bundled Jetty instead of Tomcat, or on a different box, just to make sure this is not an environment-specific issue? -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
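Noble's suggestion above - send the documents as multiple smaller requests instead of one giant POST - amounts to slicing the document list into fixed-size chunks before handing each slice to the server. A minimal sketch of that chunking (a hypothetical helper, not code from this thread; the actual SolrJ add/commit calls are omitted):

```java
import java.util.ArrayList;
import java.util.List;

public class Batcher {
    // Split a large document list into fixed-size chunks so that each
    // HTTP request to Solr stays small (e.g. 10K docs per request).
    static <T> List<List<T>> chunks(List<T> docs, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            // subList is a view; each slice would be sent in its own request
            out.add(docs.subList(i, Math.min(i + size, docs.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) docs.add(i);
        // 25 docs in batches of 10 gives 3 requests: 10 + 10 + 5
        System.out.println(Batcher.chunks(docs, 10).size());
    }
}
```

Each chunk would then be added and (periodically) committed in its own request, which is what the 10K-batch indexing described above already does.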
Re: Question on StreamingUpdateSolrServer
On Mon, Apr 13, 2009 at 12:36 PM, vivek sar vivex...@gmail.com wrote: I index in 10K batches and commit after 5 index cyles (after 50K). Is there any limitation that I can't search during commit or auto-warming? I got 8 CPU cores and only 2 were showing busy (using top) - so it's unlikely that the CPU was pegged. No, there is no such limitation. The old searcher will continue to serve search requests until the new one is warmed and registered. So, CPU does not seem to be an issue. Does this happen only when you use StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are you using? Which servlet container and version? -- Regards, Shalin Shekhar Mangar.
DataImportHandler with multiple values
Hello, I'm trying to import a simple book table with the full-import command. The data is stored in MySQL. It worked well when I tried to import a few fields from the 'book' table: title, author, publisher etc. Now I would like to create a facet (multi-valued field) with the categories which belong to the book. Here is my SQL query to get the list of categories for a book (009959241X for example, returns 7 categories):

SELECT abn.name AS cat, ab.isbn AS isbn_temp
FROM (amazon_books AS ab LEFT JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id)
LEFT JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id
WHERE ab.isbn = '009959241X'

I tried to integrate it in my dataconfig:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""/>
  <document name="books">
    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title, publisher_name, author_name AS author_name_s from amazon_books where publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
      <field column="ID" name="id"/>
      <field column="ISBN" name="isbn"/>
      <field column="TITLE" name="title"/>
      <field column="PUBLISHER_NAME" name="publisher_name"/>
      <field column="AUTHOR_NAME_S" name="author_name_s"/>
      <entity name="book_category" pk="id" query="SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE ab.isbn = '${book.ISBN}'">
        <field column="cat" name="cat"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And my Solr schema:

<field name="id" type="sint" indexed="true" stored="true" required="true"/>
<field name="isbn" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="publisher_name" type="string" indexed="true" stored="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
And below the standard Solr 1.4 dynamic fields... The ten fields are created correctly... but without the 'cat' multi-valued field:

<doc>
  <arr name="author_name_s"><str>Terry Pratchett</str></arr>
  <int name="id">47</int>
  <str name="isbn">0552124753</str>
  <str name="publisher_name">Corgi Books</str>
  <date name="timestamp">2009-04-13T12:54:38.553Z</date>
  <str name="title">The Colour of Magic (Discworld Novel)</str>
</doc>

I guess I missed something, could you help me or redirect me to the right doc? Thank you! Vincent -- View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23022195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler with multiple values
2009/4/13 Vincent Pérès vincent.pe...@gmail.com:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""/>
  <document name="books">
    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title, publisher_name, author_name AS author_name_s from amazon_books where publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
      <field column="ID" name="id"/>
      <field column="ISBN" name="isbn"/>
      <field column="TITLE" name="title"/>
      <field column="PUBLISHER_NAME" name="publisher_name"/>
      <field column="AUTHOR_NAME_S" name="author_name_s"/>
      <entity name="book_category" pk="id" query="SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE ab.isbn = '${book.ISBN}'">
        <field column="cat" name="cat"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Ten fields are well created... but without the 'cat' multi value field.

Just a guess: try ${book.isbn} instead. Does your SQL return the column names in capitals? If you are using trunk, you do not need to specify an upper-case to lower-case mapping in data-config. In fact, the field mapping is not required at all if your schema has a field with the same name as returned by the SQL; DataImportHandler will populate it with the value, irrespective of case. Also, if you intend to facet on 'cat', you should probably use a non-tokenized field type in the schema, such as string. Faceting is performed on the indexed value rather than the stored value. -- Regards, Shalin Shekhar Mangar.
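Shalin's last point - facet on a non-tokenized type - would mean declaring 'cat' in schema.xml roughly like this (a sketch; only the type differs from the text_ws version posted above):

```xml
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
```

With a string type, each category value is indexed as a single term, so facet counts line up with whole category names instead of individual whitespace-split tokens.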
Re: DataImportHandler with multiple values
it is likely that your query did not return any data. Just run the query separately and see if it really works. Or try it out in debug mode; it will tell you which query was run and what got returned. --Noble 2009/4/13 Vincent Pérès vincent.pe...@gmail.com: Hello, I'm trying to import a simple book table with the full-import command. The data is stored in MySQL. It worked well when I tried to import a few fields from the 'book' table: title, author, publisher etc. Now I would like to create a facet (multi-valued field) with the categories which belong to the book. Here is my SQL query to get the list of categories for a book (009959241X for example, returns 7 categories):

SELECT abn.name AS cat, ab.isbn AS isbn_temp
FROM (amazon_books AS ab LEFT JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id)
LEFT JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id
WHERE ab.isbn = '009959241X'

I tried to integrate it in my dataconfig:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""/>
  <document name="books">
    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title, publisher_name, author_name AS author_name_s from amazon_books where publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
      <field column="ID" name="id"/>
      <field column="ISBN" name="isbn"/>
      <field column="TITLE" name="title"/>
      <field column="PUBLISHER_NAME" name="publisher_name"/>
      <field column="AUTHOR_NAME_S" name="author_name_s"/>
      <entity name="book_category" pk="id" query="SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE ab.isbn = '${book.ISBN}'">
        <field column="cat" name="cat"/>
      </entity>
    </entity>
  </document>
</dataConfig>

And my Solr schema:

<field name="id" type="sint" indexed="true" stored="true" required="true"/>
<field name="isbn" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="publisher_name" type="string" indexed="true" stored="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>

And below the standard Solr 1.4 dynamic fields... The ten fields are created correctly... but without the 'cat' multi-valued field:

<doc>
  <arr name="author_name_s"><str>Terry Pratchett</str></arr>
  <int name="id">47</int>
  <str name="isbn">0552124753</str>
  <str name="publisher_name">Corgi Books</str>
  <date name="timestamp">2009-04-13T12:54:38.553Z</date>
  <str name="title">The Colour of Magic (Discworld Novel)</str>
</doc>

I guess I missed something, could you help me or redirect me to the right doc? Thank you! Vincent -- View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23022195.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: DataImportHandler with multiple values
I changed the ISBN to lowercase (and the other fields as well) and it works ! Thanks very much ! -- View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23023374.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: PHP Remove From Index/Search By Fields
Also, in reference to the other question, I'm currently trying to edit the main search page to search multiple fields. Essentially, I detect whether each field has been posted or not using: if ($_POST['FIELD'] != '') { $query = $query . '+FIELDNAME:' . $_POST['FIELD']; } Once it's processed all the fields, it's then sent to query Solr, but I'm not sure if I'm getting the syntax right or if there's anything in the Solr config file I need to modify (dismax?), because it still only returns results when I enter a search in the 'content' field (also the default Solr field). My Solr query looks like: $query = "?q=" . trim(urlencode($query)) . "&version=2.2&start=0&rows=99&indent=on"; where $query will look something like Content: 35 million+Date: 16th Oct etc, until it has been urlencoded/trimmed. Will it still only return results on 'content' searches because that's the only default field? Johnny X wrote: Thanks for the reply Erik! Based on a previous page I used to return queries, I've developed the code below for the page I need to do all of the above. 
CODE

<?php
$id = $_GET['id'];

$connection = mysqli_connect("localhost", "root", "onion", "collection")
    or die("Couldn't connect to MySQL");

define('SOLR_URL', 'http://localhost:8080/solr/');

function request($reqData, $type){
    $header[] = "Content-type: text/xml; charset=UTF-8";
    $session = curl_init();
    curl_setopt($session, CURLOPT_HEADER, true);
    curl_setopt($session, CURLOPT_HTTPHEADER, $header);
    curl_setopt($session, CURLOPT_URL, SOLR_URL.$type);
    curl_setopt($session, CURLOPT_POSTFIELDS, $reqData);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($session, CURLOPT_POST, 1);
    $response = curl_exec($session);
    curl_close($session);
    return $response;
}

function solrQuery($q){
    $query = "?q=".trim(urlencode($q))."&qf=Message-ID&version=2.2&start=0&rows=99&indent=on";
    return $results = request("", "select".$query);
}

echo "<html><head><title>IP E-mail</title>";
echo '<link rel="stylesheet" type="text/css" href="stylesheet.css" />';
echo '<script type="text/javascript">
<!--
function confirmation() {
    var answer = confirm("Remove spam?")
    if (answer){
        alert("Spam removed!")
        $results = solrQuery('.$id.');
    }
}
//-->
</script>';
echo "</head><body>";
echo '<form method="post">';
echo '<table width="100%">';
echo "<tr>";
echo '<td><h1>Trace/Mark IP E-mail</h1></td>';
echo '<td><p align="right">Powered by</p></td>';
echo '<td width="283px"><img src="mysql_logo.jpg" /></td>';
echo "</tr>";
echo "</table>";
echo "</form>";

/* Send a query to the server */
if ($location = mysqli_query($connection, "SELECT location FROM hashes WHERE message_id = '$id'")) {
    echo '<br/>';
    echo '<p>Mark as: <input type="button" onclick="confirmation()" value="Spam"> <input type="button" value="Non-Business"> <input type="button" value="Non-Confidential"></p>';
    print("<h3>Message Location:\n</h3>");
    /* Fetch the results of the query */
    while( $row = mysqli_fetch_assoc($location) ){
        printf("<p>%s\n</p>", $row['location']);
    }
    /* Destroy the result set and free the memory used for it */
    mysqli_free_result($location);
}

/* Send a query to the server */
if ($duplicates = mysqli_query($connection, "SELECT location FROM hashes WHERE (md5 = (SELECT md5 FROM hashes WHERE message_id = '$id') AND message_id <> '$id')")) {
    print("<h3>Duplicate Locations:\n</h3>");
    /* Fetch the results of the query */
    while( $row = mysqli_fetch_assoc($duplicates) ){
        printf("<p>%s\n</p>", $row['location']);
    }
    /* Destroy the result set and free the memory used for it */
    mysqli_free_result($duplicates);
}

/* Close the connection */
mysqli_close($connection);

$results = explode('<?xml version="1.0" encoding="UTF-8"?>', $results);
$results = $results[1];

$dom = new DomDocument;
$dom->loadXML($results);
$docs = $dom->getElementsByTagName('doc');
foreach ($docs as $doc) {
    $strings = $doc->getElementsByTagName('arr');
    foreach($strings as $str){
        $attr = $str->getAttribute('name');
        $data = $str->textContent;
        switch($attr){
            case 'Bcc': $Bcc = $data; break;
            case 'Cc': $Cc = $data; break;
            case 'Content': $Content = $data; break;
            case 'Content-Transfer-Encoding': $ContentTransferEncoding = $data; break;
            case 'Content-Type': $ContentType = $data; break;
            case 'Date': $Date = $data; break;
            case 'From':
Re: PHP Remove From Index/Search By Fields
On Apr 13, 2009, at 11:20 AM, Johnny X wrote: Also, in reference to the other question, I'm currently trying to edit the main search page to search multiple fields. Essentially, I detect whether each field has been posted or not using: if ($_POST['FIELD'] != '') { $query = $query . '+FIELDNAME:' . $_POST['FIELD']; } Once it's processed all the fields, it's then sent to query Solr, but I'm not sure if I'm getting the syntax right or if there's anything in the Solr config file I need to modify (dismax?), because it still only returns results when I enter a search in the 'content' field (also the default Solr field). My Solr query looks like: $query = "?q=" . trim(urlencode($query)) . "&version=2.2&start=0&rows=99&indent=on"; where $query will look something like Content: 35 million+Date: 16th Oct etc, until it has been urlencoded/trimmed. Will it still only return results on 'content' searches because that's the only default field? You'll need to read up on Lucene/Solr query parser syntax to be able to build useful queries programmatically like that: http://wiki.apache.org/solr/SolrQuerySyntax Your syntax above is not doing what you might think... you'll want to surround expressions with quotes or parens for a single field - Content:(35 million), for example. It'll be best if you decouple your questions about query parsing from PHP code, though. And don't forget that debugQuery=true is your friend, so you can see how queries are being parsed. Providing that output would be helpful to see what is actually happening with what you're sending. Erik
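Erik's point about grouping each field's words can be sketched as a tiny query builder (hypothetical names, not code from this thread; it assumes the fielded clauses should be ANDed together): wrapping each multi-word value in parentheses after its field name keeps the parser from sending the trailing words to the default field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldedQuery {
    // Build e.g. "Content:(35 million) AND Date:(16th Oct)" so that
    // "million" is searched in Content rather than in the default field.
    static String build(Map<String, String> fields) {
        StringBuilder q = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() == null || e.getValue().trim().isEmpty()) continue;
            if (q.length() > 0) q.append(" AND ");
            q.append(e.getKey()).append(":(").append(e.getValue().trim()).append(")");
        }
        return q.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> posted = new LinkedHashMap<>();   // stands in for $_POST
        posted.put("Content", "35 million");
        posted.put("Date", "16th Oct");
        String q = FieldedQuery.build(posted);
        System.out.println(q);
        // The whole q value is then URL-encoded before being appended to the request
        System.out.println("?q=" + URLEncoder.encode(q, "UTF-8") + "&rows=99&indent=on");
    }
}
```

The same shape works in PHP with `urlencode()`; the important part is the parentheses (or quotes, for exact phrases) around each field's value.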
Solr posts xml
Hi there. I installed Solr on Tomcat 6 and whenever I click search it displays the XML as if I am editing it - is that normal? I added a connector line in my server.xml below. --

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" />

<!-- A Connector using the shared thread pool -->
<!-- <Connector executor="tomcatThreadPool" port="8080" protocol="HTTP/1.1" connectionTimeout="2" redirectPort="8443" /> -->

<!-- Define a SSL HTTP/1.1 Connector on port 8443. This connector uses the JSSE configuration; when using APR, the connector should be using the OpenSSL style configuration described in the APR documentation -->
<!-- <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="150" scheme="https" secure="true" clientAuth="false" sslProtocol="TLS" /> -->

I added this line ---

<Connector port="8983" maxHttpHeaderSize="8192" maxThreads="150" minSpareThreads="25" maxSpareThreads="75" enableLookups="false" redirectPort="8443" acceptCount="100" connectionTimeout="2" disableUploadTimeout="true" URIEncoding="UTF-8" />

-- -- View this message in context: http://www.nabble.com/Solr-posts-xml-tp23024642p23024642.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: PHP Remove From Index/Search By Fields
Do you know the specific syntax for querying different fields? http://localhost:8080/solr/select/?q=Date:%222000%22&version=2.2&start=0&rows=10&indent=on doesn't appear to return anything when I post it in my browser, when it should, but (as before) if you change 'Date' to 'Content' it works! (presumably because content is the default field). Is there nothing else I have to change to make sure they're returned? All fields are indexed and stored, but 'Content' is the only 'text' field; the others are 'string'. Going back to dismax, it looks like that's more useful for boosting than for specifying multiple fields, because it works a lot like copyFields (in that it compounds all of the fields together into one big search). If I were to do that, there'd be no need to have anything more than one user input box because it won't be separated by field anyway. Erik Hatcher wrote: On Apr 13, 2009, at 11:20 AM, Johnny X wrote: Also, in reference to the other question, I'm currently trying to edit the main search page to search multiple fields. Essentially, I detect whether each field has been posted or not using: if ($_POST['FIELD'] != '') { $query = $query . '+FIELDNAME:' . $_POST['FIELD']; } Once it's processed all the fields, it's then sent to query Solr, but I'm not sure if I'm getting the syntax right or if there's anything in the Solr config file I need to modify (dismax?), because it still only returns results when I enter a search in the 'content' field (also the default Solr field). My Solr query looks like: $query = "?q=" . trim(urlencode($query)) . "&version=2.2&start=0&rows=99&indent=on"; where $query will look something like Content: 35 million+Date: 16th Oct etc, until it has been urlencoded/trimmed. Will it still only return results on 'content' searches because that's the only default field? You'll need to read up on Lucene/Solr query parser syntax to be able to build useful queries programmatically like that: http://wiki.apache.org/solr/SolrQuerySyntax Your syntax above is not doing what you might think... you'll want to surround expressions with quotes or parens for a single field - Content:(35 million), for example. It'll be best if you decouple your questions about query parsing from PHP code, though. And don't forget that debugQuery=true is your friend, so you can see how queries are being parsed. Providing that output would be helpful to see what is actually happening with what you're sending. Erik -- View this message in context: http://www.nabble.com/PHP-Remove-From-Index-Search-By-Fields-tp22996701p23024816.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: PHP Remove From Index/Search By Fields
A further update on this is that (when 'Date' is searched using the same URL as posted in the previous message), whether Date is of type string or text, the full (exact) content of a field has to be searched to return a result. Why is this not the case with Content? I tried changing the default search field to 'Date' to see if that made a difference, and nothing changed. Johnny X wrote: Do you know the specific syntax for querying different fields? http://localhost:8080/solr/select/?q=Date:%222000%22&version=2.2&start=0&rows=10&indent=on doesn't appear to return anything when I post it in my browser, when it should, but (as before) if you change 'Date' to 'Content' it works! (presumably because content is the default field). Is there nothing else I have to change to make sure they're returned? All fields are indexed and stored, but 'Content' is the only 'text' field; the others are 'string'. Going back to dismax, it looks like that's more useful for boosting than for specifying multiple fields, because it works a lot like copyFields (in that it compounds all of the fields together into one big search). If I were to do that, there'd be no need to have anything more than one user input box because it won't be separated by field anyway. Erik Hatcher wrote: On Apr 13, 2009, at 11:20 AM, Johnny X wrote: Also, in reference to the other question, I'm currently trying to edit the main search page to search multiple fields. Essentially, I detect whether each field has been posted or not using: if ($_POST['FIELD'] != '') { $query = $query . '+FIELDNAME:' . $_POST['FIELD']; } Once it's processed all the fields, it's then sent to query Solr, but I'm not sure if I'm getting the syntax right or if there's anything in the Solr config file I need to modify (dismax?), because it still only returns results when I enter a search in the 'content' field (also the default Solr field). My Solr query looks like: $query = "?q=" . trim(urlencode($query)) . "&version=2.2&start=0&rows=99&indent=on"; where $query will look something like Content: 35 million+Date: 16th Oct etc, until it has been urlencoded/trimmed. Will it still only return results on 'content' searches because that's the only default field? You'll need to read up on Lucene/Solr query parser syntax to be able to build useful queries programmatically like that: http://wiki.apache.org/solr/SolrQuerySyntax Your syntax above is not doing what you might think... you'll want to surround expressions with quotes or parens for a single field - Content:(35 million), for example. It'll be best if you decouple your questions about query parsing from PHP code, though. And don't forget that debugQuery=true is your friend, so you can see how queries are being parsed. Providing that output would be helpful to see what is actually happening with what you're sending. Erik -- View this message in context: http://www.nabble.com/PHP-Remove-From-Index-Search-By-Fields-tp22996701p23025514.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Term Counts/Term Frequency Vector Info
The query method seems to only support solr/select requests. I subclassed SolrRequest and created a request class that supports solr/autoSuggest - following the pattern in LukeRequest. It seems to work fine for me. Clay -----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, April 07, 2009 10:41 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info You can send arbitrary requests via SolrJ, just use the parameter map via the query method: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html . -Grant On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote: These URLs give me what I want - word completion and term counts. What I don't see is a way to call these via SolrJ. I could call the server directly using java.net classes and process the XML myself, I guess. There needs to be an auto-suggest request class.

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
      <int name="Londoners">2</int>
    </lst>
  </lst>
</response>

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
    </lst>
  </lst>
</response>

-----Original Message----- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, April 06, 2009 5:43 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info See also http://wiki.apache.org/solr/TermsComponent You might be able to apply these patches to 1.3 and have them work, but there is no guarantee. You also can get some termDocs-like capabilities through Solr's faceting capabilities, but I am not aware of any way to get at the term vector capabilities. 
HTH, Grant On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote: I want the functionality that Lucene IndexReader.termDocs gives me. That, or access at the document level to the term vector. This (http://wiki.apache.org/solr/TermVectorComponent) seems to suggest that this will be available in 1.4. Is there any way to do this in 1.3? Thanks, Clay -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Question on StreamingUpdateSolrServer
Here is some more information about my setup: Solr - v1.4 (nightly build 03/29/09) Servlet Container - Tomcat 6.0.18 JVM - 1.6.0 (64 bit) OS - Mac OS X Server 10.5.6 Hardware Overview: Processor Name: Quad-Core Intel Xeon Processor Speed: 3 GHz Number Of Processors: 2 Total Number Of Cores: 8 L2 Cache (per processor): 12 MB Memory: 20 GB Bus Speed: 1.6 GHz JVM Parameters (for Solr): export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360" Other: lsof|grep solr|wc -l 2493 ulimit -a: open files (-n) 9000 Tomcat: <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="2" maxThreads="100" /> Total Solr cores on same instance - 65 useCompoundFile - true The tests I ran: While the Indexer is running 1) Go to http://juum19.co.com:8080/solr - returns a blank page (no error in catalina.out) 2) Try telnet juum19.co.com 8080 - returns with Connection closed by foreign host Stop the Indexer program (Tomcat is still running with Solr) 3) Go to http://juum19.co.com:8080/solr - works ok, shows the list of all the Solr cores 4) Try telnet - able to telnet fine 5) Now comment out all the caches in solrconfig.xml. Try the same tests, but Tomcat still doesn't respond. Is there a way to stop the auto-warmer? 
I commented out the caches in the solrconfig.xml but still see the following log, INFO: autowarming result for searc...@3aba3830 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} INFO: Closing searc...@175dc1e2 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 6) Change the Indexer frequency so it runs every 2 min (instead of all the time). I noticed once the commit is done, I'm able to run my searches. During commit and auto-warming period I just get blank page. 7) Changed from Solrj to XML update - I still get the blank page whenever update/commit is happening. Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 So, looks like it's not just StreamingUpdateSolrServer, but whenever the update/commit is happening I'm not able to search. I don't know if it's related to using multi-core. 
In this test I was using only single thread for update to a single core using only single Solr instance. So, it's clearly related to index process (update, commit and auto-warming). As soon as update/commit/auto-warming is completed I'm able to run my queries again. Is there anything that could stop searching while update process is in-progress - like any lock or something? Any other ideas? Thanks, -vivek On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 12:36 PM, vivek sar vivex...@gmail.com wrote: I index in 10K batches and commit after 5 index cyles (after 50K). Is there any limitation that I can't search during commit or auto-warming? I got 8 CPU cores and only 2 were showing busy (using top) - so it's unlikely that the CPU was pegged. No, there is no such limitation. The old searcher will continue to serve search requests until the new one is warmed and registered. So, CPU does not seem to be an issue. Does this happen only when you use StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are you using? Which servlet container and version? -- Regards, Shalin Shekhar Mangar.
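On the question in point 5 above about stopping the auto-warmer: autowarmCount is set per cache in solrconfig.xml, and setting it to 0 should disable warming for that cache. A sketch (the sizes are the stock example values, not a recommendation):

```xml
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

Note that any newSearcher listener queries configured in solrconfig.xml also run when a new searcher is opened, so they would need to be removed as well to make warm-up as cheap as possible.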
Re: Index Version Number
Interesting. Do you know if it's possible to get the HTTP headers with SolrJ? Yonik Seeley wrote: On Fri, Apr 10, 2009 at 11:58 AM, Richard Wiseman rwise...@infosciences.com wrote: Is it possible for a Solr client to determine if the index has changed since the last time it performed a query? For example, is it possible to query the current Lucene indexVersion? Grant pointed to one way - the Luke handler. Another way is to look at the Last-Modified or ETag HTTP headers. $ curl -i http://localhost:8983/solr/select?q=solr HTTP/1.1 200 OK Last-Modified: Fri, 10 Apr 2009 17:40:54 GMT ETag: OWZlNjdkN2Q4ODAwMDAwU29scg== Content-Type: text/xml; charset=utf-8 Content-Length: 2308 Server: Jetty(6.1.3) -Yonik http://www.lucidimagination.com -- Richard Wiseman Information Sciences Corp. (301) 962-5707
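If the client fetches the response over plain HTTP (as in Yonik's curl example) rather than through SolrJ, pulling the ETag out of the raw headers is simple string work. A sketch with hypothetical names, parsing a captured response; the client can cache the last ETag and treat a changed value as "index has changed":

```java
public class SolrHeaders {
    // Naive extraction of one header value from a raw HTTP response,
    // e.g. the ETag Solr returns, which changes when the index changes.
    static String header(String rawResponse, String name) {
        for (String line : rawResponse.split("\r?\n")) {
            if (line.isEmpty()) break;              // blank line ends the headers
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).equalsIgnoreCase(name)) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;                                // header not present
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 200 OK\r\n"
                   + "ETag: OWZlNjdkN2Q4ODAwMDAwU29scg==\r\n"
                   + "Content-Type: text/xml; charset=utf-8\r\n"
                   + "\r\n<response/>";
        System.out.println(SolrHeaders.header(raw, "ETag"));
    }
}
```

A real client would use an HTTP library's header accessors instead of parsing by hand; this only illustrates where the index-version signal lives in the response.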
DataImporter : Java heap space
Hi All, I am trying to set up a Solr instance on my MacBook. I get the following errors when I'm trying to do a full db import ... please help me with this. Apr 13, 2009 11:53:28 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity slideshow with URL: jdbc:mysql://localhost/mydb_development Apr 13, 2009 11:53:29 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 319 Apr 13, 2009 11:53:32 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.<init>(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) My Java version: $ java -version java version "1.5.0_16" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284) Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing) Do I need to install a new Java version? My db is also very huge (~15 GB). Please do the needful ... thanks, mani kumar
Re: DataImporter : Java heap space
I am using Tomcat ... On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.comwrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this Apr 13, 2009 11:53:28 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity slideshow with URL: jdbc:mysql://localhost/mydb_development Apr 13, 2009 11:53:29 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 319 Apr 13, 2009 11:53:32 PM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) My Java version $ java -version java version 1.5.0_16 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284) Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing) Is that i need to install a new java version? my db is also very huge ~15 GB please do the need full ... thanks mani kumar
Re: DataImporter : Java heap space
On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.comwrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar.
Re: DataImporter : Java heap space
Hi Shalin: Thanks for the quick response! By default it was set to 1.93 MB. But I also tried it with the following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried the tricks given on the http://wiki.apache.org/solr/DataImportHandlerFaq page. What should I try next? Thanks! Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar.
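One possible culprit (an assumption, not confirmed in the thread): Tomcat's startup.sh does not pass trailing command-line arguments on to the JVM, so the -Xms/-Xmx flags above may never take effect. Heap options are normally supplied via the CATALINA_OPTS (or JAVA_OPTS) environment variable, e.g.:

```shell
# Sketch: set heap options via CATALINA_OPTS before starting Tomcat;
# flags appended to startup.sh itself are silently ignored.
export CATALINA_OPTS="-Xms300M -Xmx400M"
echo "$CATALINA_OPTS"
# ./apache-tomcat-6.0.18/bin/startup.sh   # then start Tomcat as usual
```

You can verify the settings took effect by checking the command line of the running process with `ps` or via the JMX runtime MBean.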
Re: Term Counts/Term Frequency Vector Info
Sorry, should have added that you should set the qt param: http://wiki.apache.org/solr/CoreQueryParameters#head-2c940d42ec4f2a74c5d251f12f4077e53f2f00f4 -Grant On Apr 13, 2009, at 1:35 PM, Fink, Clayton R. wrote: The query method seems to only support solr/select requests. I subclassed SolrRequest and created a request class that supports solr/autoSuggest - following the pattern in LukeRequest. It seems to work fine for me. Clay -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, April 07, 2009 10:41 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info You can send arbitrary requests via SolrJ, just use the parameter map via the query method: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html -Grant On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote: These URLs give me what I want - word completion and term counts. What I don't see is a way to call these via SolrJ. I could call the server directly using java.net classes and process the XML myself, I guess. There needs to be an auto suggest request class.

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
      <int name="Londoners">2</int>
    </lst>
  </lst>
</response>

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
    </lst>
  </lst>
</response>

-Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, April 06, 2009 5:43 PM To: solr-user@lucene.apache.org Subject: Re: Term Counts/Term Frequency Vector Info See also http://wiki.apache.org/solr/TermsComponent You might be able to apply these patches to 1.3 and have them work, but there is no guarantee. You also can get some termDocs like capabilities through Solr's faceting capabilities, but I am not aware of any way to get at the term vector capabilities. HTH, Grant On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote: I want the functionality that Lucene IndexReader.termDocs gives me. That or access on the document level to the term vector. This (http://wiki.apache.org/solr/TermVectorComponent) seems to suggest that this will be available in 1.4. Is there any way to do this in 1.3? Thanks, Clay -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
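Following Grant's suggestion to set the qt param: a TermsComponent request is just a URL with terms parameters, so a plain-Java sketch of building one looks like this (the helper class, base URL, and field names are hypothetical, not from the thread). With SolrJ you would put the same parameters on a SolrQuery and set qt to the handler path instead:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class TermsRequest {
    // Build a TermsComponent request URL against a handler registered at
    // /autoSuggest. Values are URL-encoded; keys are assumed to be plain ASCII.
    static String buildUrl(String base, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("terms", "true");
        p.put("terms.fl", "CONTENTS");
        p.put("terms.prefix", "Lond");
        System.out.println(buildUrl("http://localhost:8983/solr/autoSuggest", p));
        // prints http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.prefix=Lond
    }
}
```

The response can then be fetched with any HTTP client and parsed as XML, which is essentially what the subclassed SolrRequest approach above automates.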
Re: DataImporter : Java heap space
Depending on your dataset and how your queries look you may very likely need to increase to a larger heap size. How many queries and rows are required for each of your documents to be generated? Ilan On 4/13/09 12:21 PM, Mani Kumar wrote: Hi Shalin: Thanks for quick response! By defaults it was set to 1.93 MB. But i also tried it with following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried tricks given on http://wiki.apache.org/solr/DataImportHandlerFaq page. what should i try next ? Thanks! Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar. -- Ilan Rabinovitch i...@fonz.net --- SCALE 7x: 2009 Southern California Linux Expo Los Angeles, CA http://www.socallinuxexpo.org
indexing txt file
Hi all, Currently I have written an XML file and a schema.xml file. What is the next step to index a txt file? Where should I put the txt file I want to index? Thank you, Alex V.
Re: Question on StreamingUpdateSolrServer
Some more update. As I mentioned earlier, we are using multi-core Solr (up to 65 cores in one Solr instance, with each core 10G). This was opening around 3000 file descriptors (lsof). I removed some cores, and after some trial and error I found that at 25 cores the system seems to work fine (around 1400 file descriptors). Tomcat is responsive even when the indexing is happening at Solr (for 25 cores). But as soon as it goes to 26 cores, Tomcat becomes unresponsive again. The puzzling thing is, if I stop indexing I can search on even 65 cores, but while indexing is happening it seems to support only up to 25 cores. 1) Is there a limit on the number of cores a Solr instance can handle? 2) Does Solr do anything to the existing cores while indexing? I'm writing to only one core at a time. We are struggling to find why Tomcat stops responding on a high number of cores while indexing is in progress. Any help is very much appreciated. Thanks, -vivek On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote: Here is some more information about my setup, Solr - v1.4 (nightly build 03/29/09) Servlet Container - Tomcat 6.0.18 JVM - 1.6.0 (64 bit) OS - Mac OS X Server 10.5.6 Hardware Overview: Processor Name: Quad-Core Intel Xeon Processor Speed: 3 GHz Number Of Processors: 2 Total Number Of Cores: 8 L2 Cache (per processor): 12 MB Memory: 20 GB Bus Speed: 1.6 GHz JVM Parameters (for Solr): export CATALINA_OPTS=-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 Other: lsof|grep solr|wc -l 2493 ulimit -an open files (-n) 9000 Tomcat Connector port=8080 protocol=HTTP/1.1 connectionTimeout=2 maxThreads=100 / Total Solr cores on same instance - 65 useCompoundFile - true The tests I ran, While Indexer is running 1) Go to http://juum19.co.com:8080/solr; - returns blank page (no error in the catalina.out) 2) Try telnet juum19.co.com 8080 - returns with 
Connection closed by foreign host Stop the Indexer Program (Tomcat is still running with Solr) 3) Go to http://juum19.co.com:8080/solr; - works ok, shows the list of all the Solr cores 4) Try telnet - able to Telnet fine 5) Now comment out all the caches in solrconfig.xml. Try same tests, but the Tomcat still doesn't response. Is there a way to stop the auto-warmer. I commented out the caches in the solrconfig.xml but still see the following log, INFO: autowarming result for searc...@3aba3830 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} INFO: Closing searc...@175dc1e2 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 6) Change the Indexer frequency so it runs every 2 min (instead of all the time). I noticed once the commit is done, I'm able to run my searches. During commit and auto-warming period I just get blank page. 7) Changed from Solrj to XML update - I still get the blank page whenever update/commit is happening. 
Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 So, looks like it's not just StreamingUpdateSolrServer, but whenever the update/commit is happening I'm not able to search. I don't know if it's related to using multi-core. In this test I was using only single thread for update to a single core using only single Solr instance. So, it's clearly related to index process (update, commit and auto-warming). As soon as update/commit/auto-warming is completed I'm able to run my queries again. Is there anything that could stop searching while update process is in-progress - like any lock or something? Any other ideas? Thanks, -vivek On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar
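On the question above about stopping the auto-warmer: commenting the caches out does not remove the new-searcher warm-up cycle. The usual way (a sketch against a stock 1.4-era solrconfig.xml, not tested against this exact setup) is to keep the caches but set autowarmCount="0" on each, and also check for any newSearcher event listeners that run warm-up queries:

```xml
<!-- solrconfig.xml sketch: disable autowarming by setting autowarmCount="0";
     size/initialSize values here are illustrative. Also look for
     <listener event="newSearcher"> blocks, which fire warm-up queries. -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```

With autowarmCount at 0 the "autowarming result" log lines should show no inserts and near-zero warmupTime, at the cost of cold caches after each commit.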
Search included in *all* fields
I'll start a new thread to make things easier, because I've only really got one problem now. I've configured my Solr to search on all fields, so it will only search for a specific query in a specific field (e.g. q=Date:October will only search the 'Date' field, rather than all the others). The issue is when you build up multiple fields to search on: only one of those has to match for a result to be returned, rather than all of them. Is there a way to change this? Cheers! -- View this message in context: http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search included in *all* fields
what about: fieldA:value1 AND fieldB:value2 this can also be written as: +fieldA:value1 +fieldB:value2 On Apr 13, 2009, at 9:53 PM, Johnny X wrote: I'll start a new thread to make things easier, because I've only really got one problem now. I've configured my Solr to search on all fields, so it will only search for a specific query in a specific field (e.g. q=Date:October) will only search the 'Date' field, rather than all the others. The issue is when you build up multiple fields to search on. Only one of those has to match for a result to be returned, rather than all of them. Is there a way to change this? Cheers! -- View this message in context: http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html Sent from the Solr - User mailing list archive at Nabble.com.
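A quick way to try the required-clause syntax from the command line (field names here are hypothetical): note that a literal '+' in a URL decodes to a space, so it must be sent as %2B:

```shell
# Make both clauses mandatory with '+', then URL-encode the query string:
# '+' -> %2B (a literal '+' in a URL decodes to a space), ' ' -> %20.
q='+Date:October +Subject:meeting'
encoded=$(printf '%s' "$q" | sed 's/+/%2B/g; s/ /%20/g')
echo "$encoded"
# curl "http://localhost:8983/solr/select?q=$encoded"
```

Forgetting the encoding is a common reason the '+' operator silently appears to do nothing.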
Re: Question on StreamingUpdateSolrServer
On Tue, Apr 14, 2009 at 7:14 AM, vivek sar vivex...@gmail.com wrote: Some more update. As I mentioned earlier we are using multi-core Solr (up to 65 cores in one Solr instance with each core 10G). This was opening around 3000 file descriptors (lsof). I removed some cores and after some trial and error I found at 25 cores system seems to work fine (around 1400 file descriptors). Tomcat is responsive even when the indexing is happening at Solr (for 25 cores). But, as soon as it goes to 26 cores the Tomcat becomes unresponsive again. The puzzling thing is if I stop indexing I can search on even 65 cores, but while indexing is happening it seems to support only up to 25 cores. 1) Is there a limit on number of cores a Solr instance can handle? 2) Does Solr do anything to the existing cores while indexing? I'm writing to only one core at a time. There is no hard limit (it is Integer.MAX_VALUE). But in reality your mileage depends on your hardware and the number of file handles the OS can open. We are struggling to find why Tomcat stops responding on high number of cores while indexing is in-progress. Any help is very much appreciated. 
Thanks, -vivek On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote: Here is some more information about my setup, Solr - v1.4 (nightly build 03/29/09) Servlet Container - Tomcat 6.0.18 JVM - 1.6.0 (64 bit) OS - Mac OS X Server 10.5.6 Hardware Overview: Processor Name: Quad-Core Intel Xeon Processor Speed: 3 GHz Number Of Processors: 2 Total Number Of Cores: 8 L2 Cache (per processor): 12 MB Memory: 20 GB Bus Speed: 1.6 GHz JVM Parameters (for Solr): export CATALINA_OPTS=-server -Xms6044m -Xmx6044m -DSOLR_APP -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360 Other: lsof|grep solr|wc -l 2493 ulimit -an open files (-n) 9000 Tomcat Connector port=8080 protocol=HTTP/1.1 connectionTimeout=2 maxThreads=100 / Total Solr cores on same instance - 65 useCompoundFile - true The tests I ran, While Indexer is running 1) Go to http://juum19.co.com:8080/solr; - returns blank page (no error in the catalina.out) 2) Try telnet juum19.co.com 8080 - returns with Connection closed by foreign host Stop the Indexer Program (Tomcat is still running with Solr) 3) Go to http://juum19.co.com:8080/solr; - works ok, shows the list of all the Solr cores 4) Try telnet - able to Telnet fine 5) Now comment out all the caches in solrconfig.xml. Try same tests, but the Tomcat still doesn't response. Is there a way to stop the auto-warmer. 
I commented out the caches in the solrconfig.xml but still see the following log, INFO: autowarming result for searc...@3aba3830 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} INFO: Closing searc...@175dc1e2 main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} 6) Change the Indexer frequency so it runs every 2 min (instead of all the time). I noticed once the commit is done, I'm able to run my searches. During commit and auto-warming period I just get blank page. 7) Changed from Solrj to XML update - I still get the blank page whenever update/commit is happening. Apr 13, 2009 6:46:18 PM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005, 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948 So, looks like it's not just StreamingUpdateSolrServer, but whenever the update/commit is happening I'm not able to search. I don't know if it's related to using multi-core. 
In this test I was using only single thread for update to a single core using only single Solr instance. So, it's clearly related to index process (update, commit and auto-warming). As soon as update/commit/auto-warming is completed I'm
Re: DataImporter : Java heap space
Hi Ilan: Only one query is required to generate a document. Here is my data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" name="sp" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development" user="root" password="**"/>
  <document name="items">
    <entity name="item" dataSource="sp" query="select * from items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

and other useful info:

mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
|   900051 |
+----------+
1 row in set (0.00 sec)

Each record consists of id and title. id is of type int(11) and title's avg. length is 50 chars. I am using Tomcat with Solr. Here is the command I am using to start it: ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M Thanks for the help! I appreciate it. -Mani Kumar On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote: Depending on your dataset and how your queries look you may very likely need to increase to a larger heap size. How many queries and rows are required for each of your documents to be generated? Ilan On 4/13/09 12:21 PM, Mani Kumar wrote: Hi Shalin: Thanks for quick response! By defaults it was set to 1.93 MB. But i also tried it with following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried tricks given on http://wiki.apache.org/solr/DataImportHandlerFaq page. what should i try next ? Thanks! Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... 
please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar. -- Ilan Rabinovitch i...@fonz.net --- SCALE 7x: 2009 Southern California Linux Expo Los Angeles, CA http://www.socallinuxexpo.org
Re: DataImporter : Java heap space
DIH itself may not be consuming so much memory. It also includes the memory used by Solr. Do you have a hard limit on 400MB , is it not possible to increase it? On Tue, Apr 14, 2009 at 11:09 AM, Mani Kumar manikumarchau...@gmail.com wrote: Hi ILAN: Only one query is required to generate a document ... Here is my data-config.xml dataConfig dataSource type=JdbcDataSource name=sp driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/mydb_development user=root password=** / document name=items entity name=item dataSource=sp query=select * from items field column=id name=id / field column=title name=title / /entity /document /dataConfig and other useful info: mysql select * from items +--+ | count(*) | +--+ | 900051 | +--+ 1 row in set (0.00 sec) Each record consist of id and title. id is of type int(11) and title's avg. length is 50 chars. I am using tomcat with solr. here is the command i am using to start it ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M Thanks! for help. I appreciate it. -Mani Kumar On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote: Depending on your dataset and how your queries look you may very likely need to increase to a larger heap size. How many queries and rows are required for each of your documents to be generated? Ilan On 4/13/09 12:21 PM, Mani Kumar wrote: Hi Shalin: Thanks for quick response! By defaults it was set to 1.93 MB. But i also tried it with following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried tricks given on http://wiki.apache.org/solr/DataImportHandlerFaq page. what should i try next ? Thanks! Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... 
please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar. -- Ilan Rabinovitch i...@fonz.net --- SCALE 7x: 2009 Southern California Linux Expo Los Angeles, CA http://www.socallinuxexpo.org -- --Noble Paul
Re: DataImporter : Java heap space
Here is the stack trace; notice the frame at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749). It looks like it's trying to read the whole table into memory at once, and that's why it's getting OOM. Apr 14, 2009 11:15:01 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.<init>(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:468) at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2534) at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2159) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2548) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2477) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:741) at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:587) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:243) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:207) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335) ... 5 more Apr 14, 2009 11:15:01 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Apr 14, 2009 11:15:01 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback On Tue, Apr 14, 2009 at 11:09 AM, Mani Kumar manikumarchau...@gmail.comwrote: Hi ILAN: Only one query is required to generate a document ... Here is my data-config.xml dataConfig dataSource type=JdbcDataSource name=sp driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/mydb_development user=root password=** / document name=items entity name=item dataSource=sp query=select * from items field column=id name=id / field column=title name=title / /entity /document /dataConfig and other useful info: mysql select * from items +--+ | count(*) | +--+ | 900051 | +--+ 1 row in set (0.00 sec) Each record consist of id and title. id is of type int(11) and title's avg. length is 50 chars. I am using tomcat with solr. here is the command i am using to start it ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M Thanks! for help. I appreciate it. -Mani Kumar On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote: Depending on your dataset and how your queries look you may very likely need to increase to a larger heap size. How many queries and rows are required for each of your documents to be generated? Ilan On 4/13/09 12:21 PM, Mani Kumar wrote: Hi Shalin: Thanks for quick response! By defaults it was set to 1.93 MB. But i also tried it with following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried tricks given on http://wiki.apache.org/solr/DataImportHandlerFaq page. what should i try next ? Thanks! 
Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at
Re: DataImporter : Java heap space
Hi Noble: But the question is: how much memory? Are there any rules of thumb, so that I can estimate how much memory it requires? Yeah, I can increase it up to 800MB max. Will try it and let you know. Thanks! Mani 2009/4/14 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com DIH itself may not be consuming so much memory. It also includes the memory used by Solr. Do you have a hard limit on 400MB , is it not possible to increase it? On Tue, Apr 14, 2009 at 11:09 AM, Mani Kumar manikumarchau...@gmail.com wrote: Hi ILAN: Only one query is required to generate a document ... Here is my data-config.xml dataConfig dataSource type=JdbcDataSource name=sp driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/mydb_development user=root password=** / document name=items entity name=item dataSource=sp query=select * from items field column=id name=id / field column=title name=title / /entity /document /dataConfig and other useful info: mysql select * from items +--+ | count(*) | +--+ | 900051 | +--+ 1 row in set (0.00 sec) Each record consist of id and title. id is of type int(11) and title's avg. length is 50 chars. I am using tomcat with solr. here is the command i am using to start it ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M Thanks! for help. I appreciate it. -Mani Kumar On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote: Depending on your dataset and how your queries look you may very likely need to increase to a larger heap size. How many queries and rows are required for each of your documents to be generated? Ilan On 4/13/09 12:21 PM, Mani Kumar wrote: Hi Shalin: Thanks for quick response! By defaults it was set to 1.93 MB. But i also tried it with following command: $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M I also tried tricks given on http://wiki.apache.org/solr/DataImportHandlerFaq page. what should i try next ? Thanks! 
Mani Kumar On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.com wrote: Hi All, I am trying to setup a Solr instance on my macbook. I get following errors when m trying to do a full db import ... please help me on this java.lang.OutOfMemoryError: Java heap space at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351) Caused by: java.lang.OutOfMemoryError: Java heap space at com.mysql.jdbc.Buffer.init(Buffer.java:58) at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444) at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840) How much heap size have you allocated to the jvm? Also see http://wiki.apache.org/solr/DataImportHandlerFaq -- Regards, Shalin Shekhar Mangar. -- Ilan Rabinovitch i...@fonz.net --- SCALE 7x: 2009 Southern California Linux Expo Los Angeles, CA http://www.socallinuxexpo.org -- --Noble Paul
Re: DataImporter : Java heap space
On Tue, Apr 14, 2009 at 11:18 AM, Mani Kumar manikumarchau...@gmail.com wrote: Here is the stack trace: notice in stack trace * at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)* It looks like that its trying to read whole table into memory at a time. n thts y getting OOM. Mani, the data-config.xml you posted does not have the batchSize="-1" attribute on your data source. Did you try that? This is a known bug in the MySQL JDBC driver. -- Regards, Shalin Shekhar Mangar.
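For reference, a sketch of the dataSource element with that attribute added (based on the config posted earlier in this thread): with batchSize="-1" the DataImportHandler asks the MySQL driver to stream rows one at a time (it sets the fetch size to Integer.MIN_VALUE under the hood) instead of buffering the entire result set in memory:

```xml
<!-- data-config.xml sketch: batchSize="-1" enables row streaming in the
     MySQL JDBC driver, avoiding the readAllResults OOM seen above. -->
<dataSource type="JdbcDataSource" name="sp" batchSize="-1"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb_development"
            user="root" password="**"/>
```

This is the workaround described on the DataImportHandler FAQ page linked earlier in the thread.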