Re: Question on StreamingUpdateSolrServer

2009-04-13 Thread vivek sar
I index in 10K batches and commit after 5 index cycles (after 50K). Is
there any limitation that I can't search during a commit or
auto-warming? I have 8 CPU cores and only 2 were showing busy (using
top) - so it's unlikely that the CPU was pegged.

2009/4/12 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 If you use StreamingUpdateSolrServer it POSTs all the docs in a single
 request. 10 million docs may be a bit too much for a single request. I
 guess you should batch it in multiple requests of smaller chunks.
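 A rough sketch of that chunking idea in Python (illustrative only - the thread's indexer is Java/SolrJ, and the function name here is made up):

```python
def batches(docs, size=10000):
    """Yield successive slices of docs so each update request stays small."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# 25,000 docs in 10K batches -> three requests of 10000, 10000 and 5000 docs
print([len(b) for b in batches(list(range(25000)))])  # [10000, 10000, 5000]
```

 Each yielded batch would then be sent as its own update request instead of one giant POST.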

 It is likely that the CPU is really hot when the autowarming is happening.

 Getting decent search performance without autowarming is not easy.

 autowarmCount is an attribute of a cache. See here:
 http://wiki.apache.org/solr/SolrCaching
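 A sketch of what that looks like in solrconfig.xml (cache names as in the standard example config; setting autowarmCount to 0 disables warming for that cache, which is what the question earlier in the thread asked for):

```xml
<!-- solrconfig.xml: autowarmCount="0" disables autowarming for this cache -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache    class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```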

 On Mon, Apr 13, 2009 at 3:32 AM, vivek sar vivex...@gmail.com wrote:
 Thanks Shalin.

 I noticed a couple more things. As I index around 100 million records a
 day, my Indexer is running pretty much at all times throughout the day.
 Whenever I run a search query I usually get a "connection reset" when
 the commit is happening and a blank page when the auto-warming of
 searchers is happening. Here are my questions:

 1) Is this a coincidence or a known issue? Can't we search while a
 commit or auto-warming is happening?
 2) How do I stop auto-warming? My search traffic is very low so I'm
 trying to turn off auto-warming after the commit has happened - is
 there anything in solrconfig.xml to do that?
 3) What would be the best strategy for searching in my scenario, where
 commits may be happening all the time (I commit every 50K records - so
 every 30-60 sec there is a commit happening, followed by auto-warming
 that takes 40 sec)?

 Search frequency is pretty low for us, but we want to make sure that
 whenever it happens it is fast enough and returns results (instead of
 an exception or a blank screen).

 Thanks for all the help.

 -vivek



 On Sat, Apr 11, 2009 at 1:48 PM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 On Sun, Apr 12, 2009 at 2:15 AM, vivek sar vivex...@gmail.com wrote:


 The problem is I don't see any error message in catalina.out. I
 don't even see the request coming in - I simply get a blank page in the
 browser. If I keep trying, the request goes through and I get a response
 from Solr, but then it becomes unresponsive again or sometimes throws a
 connection reset error. I'm not sure why it would work sometimes and
 not other times for the same query. As soon as I stop the Indexer
 process things start working fine. Is there any way I can debug this problem?


 I'm not sure. I've never seen this issue myself.

 Could you try using the bundled Jetty instead of Tomcat, or a different
 box, just to make sure this is not an environment-specific issue?

 --
 Regards,
 Shalin Shekhar Mangar.





 --
 --Noble Paul



Re: Question on StreamingUpdateSolrServer

2009-04-13 Thread Shalin Shekhar Mangar
On Mon, Apr 13, 2009 at 12:36 PM, vivek sar vivex...@gmail.com wrote:

 I index in 10K batches and commit after 5 index cycles (after 50K). Is
 there any limitation that I can't search during a commit or
 auto-warming? I have 8 CPU cores and only 2 were showing busy (using
 top) - so it's unlikely that the CPU was pegged.


No, there is no such limitation. The old searcher will continue to serve
search requests until the new one is warmed and registered.

So, CPU does not seem to be an issue. Does this happen only when you use
StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are
you using? Which servlet container and version?

-- 
Regards,
Shalin Shekhar Mangar.


DataImportHandler with multiple values

2009-04-13 Thread Vincent Pérès

Hello,

I'm trying to import a simple book table with the full-import command. The
data is stored in MySQL.
It worked well when I tried to import a few fields from the 'book' table:
title, author, publisher etc.
Now I would like to create a facet (multi-valued field) with the categories
which belong to the book.

Here is my SQL query to get the list of categories for a book
(009959241X, for example, returns 7 categories):
SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT
JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT
JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
ab.isbn = '009959241X'

I tried to integrate it in my data config:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""/>
  <document name="books">
    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title,
        publisher_name, author_name AS author_name_s from amazon_books where
        publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
      <field column="ID" name="id" />
      <field column="ISBN" name="isbn" />
      <field column="TITLE" name="title" />
      <field column="PUBLISHER_NAME" name="publisher_name" />
      <field column="AUTHOR_NAME_S" name="author_name_s" />
      <entity name="book_category" pk="id" query="SELECT abn.name AS cat,
          ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN
          amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN
          amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
          ab.isbn = '${book.ISBN}'">
        <field column="cat" name="cat" />
      </entity>
    </entity>
  </document>
</dataConfig>

And my Solr schema:
<field name="id" type="sint" indexed="true" stored="true" required="true" />
<field name="isbn" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="publisher_name" type="string" indexed="true" stored="true"/>
<field name="cat" type="text_ws" indexed="true" stored="true"
    multiValued="true" omitNorms="true" termVectors="true" />
And below, the standard Solr 1.4 dynamic fields...


Ten fields are created correctly... but without the 'cat' multi-valued field.
<doc>
  <arr name="author_name_s">
    <str>Terry Pratchett</str>
  </arr>
  <int name="id">47</int>
  <str name="isbn">0552124753</str>
  <str name="publisher_name">Corgi Books</str>
  <date name="timestamp">2009-04-13T12:54:38.553Z</date>
  <str name="title">The Colour of Magic (Discworld Novel)</str>
</doc>

I guess I missed something, could you help me or redirect me to the right
doc?

Thank you !
Vincent

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23022195.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler with multiple values

2009-04-13 Thread Shalin Shekhar Mangar
2009/4/13 Vincent Pérès vincent.pe...@gmail.com


 <dataConfig>
   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
     url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""/>
   <document name="books">
     <entity name="book" pk="ID" query="select isbn, listing_id AS id, title,
         publisher_name, author_name AS author_name_s from amazon_books where
         publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
       <field column="ID" name="id" />
       <field column="ISBN" name="isbn" />
       <field column="TITLE" name="title" />
       <field column="PUBLISHER_NAME" name="publisher_name" />
       <field column="AUTHOR_NAME_S" name="author_name_s" />
       <entity name="book_category" pk="id" query="SELECT abn.name AS cat,
           ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN
           amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN
           amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
           ab.isbn = '${book.ISBN}'">
         <field column="cat" name="cat" />
       </entity>
     </entity>
   </document>
 </dataConfig>


 Ten fields are created correctly... but without the 'cat' multi-valued field.


Just a guess: try ${book.isbn} instead.

Does your SQL return the column names in capitals? If you are using trunk,
you do not need to specify the upper-case to lower-case mapping in
data-config. In fact, the field mapping is not required at all if your
schema has a field with the same name as returned by the SQL.
DataImportHandler will populate it with the value, irrespective of case.
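A sketch of the inner entity from the data-config quoted above with only the variable reference lowercased (the change suggested here):

```xml
<entity name="book_category" pk="id"
        query="SELECT abn.name AS cat, ab.isbn AS isbn_temp
               FROM (amazon_books AS ab
                     LEFT JOIN amazon_book_browse_nodes AS abbn
                            ON ab.isbn = abbn.amazon_book_id)
               LEFT JOIN amazon_browse_nodes AS abn
                      ON abbn.amazon_browse_node_id = abn.id
               WHERE ab.isbn = '${book.isbn}'">
  <field column="cat" name="cat" />
</entity>
```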

Also, if you intend to facet on 'cat', you should probably use a
non-tokenized field type in the schema, such as string. Faceting is
performed on the indexed value rather than the stored value.
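For example, a non-tokenized definition for 'cat' might look like this (a sketch; the other attribute values are carried over from the schema quoted earlier):

```xml
<!-- "string" is not analyzed, so each category facets as one whole value -->
<field name="cat" type="string" indexed="true" stored="true"
       multiValued="true" omitNorms="true" termVectors="true" />
```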

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler with multiple values

2009-04-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is likely that your query did not return any data. Just run the
query separately and see if it really works.

Or try it out in debug mode. It will tell you which query was run and
what got returned.

--Noble

2009/4/13 Vincent Pérès vincent.pe...@gmail.com:

-- 
--Noble Paul


Re: DataImportHandler with multiple values

2009-04-13 Thread Vincent Pérès

I changed the ISBN to lowercase (and the other fields as well) and it works!

Thanks very much !
-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23023374.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PHP Remove From Index/Search By Fields

2009-04-13 Thread Johnny X

Also, in reference to the other question, I'm currently trying to edit the
main search page to search multiple fields.

Essentially, I detect if each field has been posted or not using:

if ($_POST['FIELD'] != '') {
$query = $query . '+FIELDNAME:' . $_POST['FIELD'];
}

Once it's processed all the fields, it's then sent to query Solr, but I'm not
sure if I'm getting the syntax right, or if there's anything in the Solr
config file I need to modify (dismax?), because it still only returns results
when I enter a search in the 'content' field (also the default Solr field).

My Solr query looks like:

$query = "?q=" . trim(urlencode($query)) .
    '&version=2.2&start=0&rows=99&indent=on';

where $query will look something like "Content: 35 million+Date: 16th Oct"
etc., until it has been urlencoded/trimmed.

Will it still only return results on 'content' searches because that's the
only default field?




Johnny X wrote:
 
 Thanks for the reply Erik!
 
 Based on a previous page I used to return queries, I've developed the
 code below for the page that needs to do all of the above.
 
 CODE
 
 <?php
 
 $id = $_GET['id'];
 
 $connection = mysqli_connect("localhost", "root", "onion", "collection")
     or die ("Couldn't connect to MySQL");
 
 define('SOLR_URL', 'http://localhost:8080/solr/');
 
 function request($reqData, $type){
 
     $header[] = "Content-type: text/xml; charset=UTF-8";
 
     $session = curl_init();
     curl_setopt($session, CURLOPT_HEADER, true);
     curl_setopt($session, CURLOPT_HTTPHEADER, $header);
     curl_setopt($session, CURLOPT_URL, SOLR_URL.$type);
     curl_setopt($session, CURLOPT_POSTFIELDS, $reqData);
     curl_setopt($session, CURLOPT_RETURNTRANSFER, 1);
     curl_setopt($session, CURLOPT_POST, 1);
 
     $response = curl_exec($session);
     curl_close($session);
 
     return $response;
 }
 
 function solrQuery($q){
     $query = "?q=".trim(urlencode($q))."&qf=Message-ID&version=2.2&start=0&rows=99&indent=on";
     return $results = request("", "select".$query);
 }
 
 echo "<html><head><title>IP E-mail</title>";
 echo '<link rel="stylesheet" type="text/css" href="stylesheet.css" />';
 
 echo '<script type="text/javascript">
 <!--
 function confirmation() {
   var answer = confirm("Remove spam?")
   if (answer){
     alert("Spam removed!")
     $results = solrQuery("'.$id.'");
   }
 }
 //-->
 </script>';
 
 echo "</head><body>";
 
 echo '<form method="post">';
 echo '<table width="100%">';
 echo "<tr>";
 echo '<td><h1>Trace/Mark IP E-mail</h1></td>';
 echo '<td><p align="right">Powered by</p></td>';
 echo '<td width="283px"> mysql_logo.jpg </td>';
 echo "</tr>";
 echo "</table>";
 echo "</form>";
 
 /* Send a query to the server */
 if ($location = mysqli_query($connection, "SELECT location FROM hashes
     WHERE message_id = '$id'")) {
 
     echo '<br/>';
     echo '<p>Mark as: <input type="button" onclick="confirmation()"
         value="Spam"> <input type="button" value="Non-Business"> <input
         type="button" value="Non-Confidential"></p>';
 
     print("<h3>Message Location:\n</h3>");
 
     /* Fetch the results of the query */
     while( $row = mysqli_fetch_assoc($location) ){
         printf("<p>%s\n</p>", $row['location']);
     }
 
     /* Destroy the result set and free the memory used for it */
     mysqli_free_result($location);
 }
 
 /* Send a query to the server */
 if ($duplicates = mysqli_query($connection, "SELECT location FROM hashes
     WHERE (md5 = (SELECT md5 FROM hashes WHERE message_id = '$id') AND
     message_id <> '$id')")) {
 
     print("<h3>Duplicate Locations:\n</h3>");
 
     /* Fetch the results of the query */
     while( $row = mysqli_fetch_assoc($duplicates) ){
         printf("<p>%s\n</p>", $row['location']);
     }
 
     /* Destroy the result set and free the memory used for it */
     mysqli_free_result($duplicates);
 }
 
 /* Close the connection */
 mysqli_close($connection);
 
 $results =
     explode('<?xml version="1.0" encoding="UTF-8"?>', $results);
 $results = $results[1];
 
 $dom = new DomDocument;
 $dom->loadXML($results);
 $docs = $dom->getElementsByTagName('doc');
 
 foreach ($docs as $doc) {
     $strings = $doc->getElementsByTagName('arr');
     foreach($strings as $str){
         $attr = $str->getAttribute('name');
         $data = $str->textContent;
         switch($attr){
             case 'Bcc':
                 $Bcc = $data;
                 break;
             case 'Cc':
                 $Cc = $data;
                 break;
             case 'Content':
                 $Content = $data;
                 break;
             case 'Content-Transfer-Encoding':
                 $ContentTransferEncoding = $data;
                 break;
             case 'Content-Type':
                 $ContentType = $data;
                 break;
             case 'Date':
                 $Date = $data;
                 break;
             case 'From':

Re: PHP Remove From Index/Search By Fields

2009-04-13 Thread Erik Hatcher


On Apr 13, 2009, at 11:20 AM, Johnny X wrote:





You'll need to read up on Lucene/Solr query parser syntax to be able
to build useful queries programmatically like that:
http://wiki.apache.org/solr/SolrQuerySyntax
Your syntax above is not doing what you might think... you'll want to
surround expressions with quotes or in parens for a single field -
Content:(35 million), for example.


It'll be best if you decouple your questions about query parsing from
the PHP code, though. And don't forget that debugQuery=true is your
friend, so you can see how queries are being parsed. Providing that
output would be helpful to see what is actually happening with what
you're sending.
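As a toy illustration of building such a fielded query programmatically (Python rather than PHP, and the function name is invented for the example):

```python
from urllib.parse import quote_plus

def build_query(fields):
    """Build a Lucene-syntax query from non-empty form fields,
    parenthesizing each value so multi-word input stays on its field."""
    parts = []
    for name, value in fields.items():
        if value:
            parts.append('+%s:(%s)' % (name, value))
    return quote_plus(' '.join(parts))

q = build_query({'Content': '35 million', 'Date': '16th Oct'})
print('?q=' + q + '&version=2.2&start=0&rows=99&indent=on')
```

Without the parentheses, +Content:35 million would search only "35" against Content and leave "million" on the default field - which matches the symptom described above.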


Erik




Solr posts xml

2009-04-13 Thread slyistheman

Hi there

I installed Solr on Tomcat 6, and whenever I click search it displays the
XML as if I am editing it.

Is that normal?

I added a connector line in my server.xml, shown below.


--
<Connector port="8080" protocol="HTTP/1.1"
    connectionTimeout="2"
    redirectPort="8443" />
<!-- A Connector using the shared thread pool -->
<!--
<Connector executor="tomcatThreadPool"
    port="8080" protocol="HTTP/1.1"
    connectionTimeout="2"
    redirectPort="8443" />
-->
<!-- Define a SSL HTTP/1.1 Connector on port 8443
     This connector uses the JSSE configuration, when using APR, the
     connector should be using the OpenSSL style configuration
     described in the APR documentation -->
<!--
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
    maxThreads="150" scheme="https" secure="true"
    clientAuth="false" sslProtocol="TLS" />
-->

 I added this line
---
<Connector port="8983" maxHttpHeaderSize="8192"
    maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    connectionTimeout="2" disableUploadTimeout="true"
    URIEncoding="UTF-8" />

--
-- 
View this message in context: 
http://www.nabble.com/Solr-posts-xml-tp23024642p23024642.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PHP Remove From Index/Search By Fields

2009-04-13 Thread Johnny X

Do you know the specific syntax when querying different fields?

http://localhost:8080/solr/select/?q=Date:%222000%22&version=2.2&start=0&rows=10&indent=on

doesn't appear to return anything when I post it in my browser, when it
should, but (as before) if you change 'Date' to 'Content' it works!
(presumably because content is the default field). Is there anything else I
have to change to make sure they're returned? All fields are indexed and
stored, but 'Content' is the only 'text' field; the others are 'string'.

Going back to dismax, it looks like that's more useful for boosting than
for specifying multiple fields, because it works a lot like copyFields (in
that it compounds all of the fields together in one big search). If I were
to do that, there'd be no need to have anything more than one user input
box, because it won't be separated by field anyway.



Erik Hatcher wrote:
 
 

-- 
View this message in context: 
http://www.nabble.com/PHP-Remove-From-Index-Search-By-Fields-tp22996701p23024816.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: PHP Remove From Index/Search By Fields

2009-04-13 Thread Johnny X

A further update on this is that (when 'Date' is searched using the same URL
as posted in the previous message), whether Date is of type string or text,
the full (exact) content of the field has to be searched to return a result.

Why is this not the case with Content? I tried changing the default search
field to 'Date' to see if that made a difference, and nothing changed.
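A toy Python sketch (not Solr code; just an assumed model of field analysis) of why a 'string' field only matches on the full exact value, while a tokenized 'text' field matches per word:

```python
def matches(query_term, field_value, analyzed):
    """A tokenized 'text' field indexes each word as a separate term,
    so a single-word query can hit; an unanalyzed 'string' field is
    indexed as one term equal to the entire value."""
    if analyzed:
        return query_term.lower() in field_value.lower().split()
    return query_term == field_value

print(matches("2000", "16 Oct 2000", analyzed=True))   # True  (text-style field)
print(matches("2000", "16 Oct 2000", analyzed=False))  # False (string-style field)
```

This is consistent with the symptom above: 'Content' is a 'text' field, so partial terms match, while the 'string' fields need the full exact value.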



Johnny X wrote:
 

-- 
View this message in context: 
http://www.nabble.com/PHP-Remove-From-Index-Search-By-Fields-tp22996701p23025514.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Term Counts/Term Frequency Vector Info

2009-04-13 Thread Fink, Clayton R.
The query method seems to only support solr/select requests. I subclassed 
SolrRequest and created a request class that supports solr/autoSuggest - 
following the pattern in LukeRequest. It seems to work fine for me.

Clay 

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, April 07, 2009 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Term Counts/Term Frequency Vector Info

You can send arbitrary requests via SolrJ, just use the parameter map via the 
query method: 
http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html
.

-Grant

On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote:

 These URLs give me what I want - word completion and term counts.
 What I don't see is a way to call these via SolrJ. I could call the
 server directly using java.net classes and process the XML myself, I
 guess. There needs to be an auto-suggest request class.

 http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <lst name="terms">
     <lst name="CONTENTS">
       <int name="London">11</int>
       <int name="Londoners">2</int>
     </lst>
   </lst>
 </response>

 http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <lst name="terms">
     <lst name="CONTENTS">
       <int name="London">11</int>
     </lst>
   </lst>
 </response>

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Monday, April 06, 2009 5:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Term Counts/Term Frequency Vector Info

 See also http://wiki.apache.org/solr/TermsComponent

 You might be able to apply these patches to 1.3 and have them work,
 but there is no guarantee. You can also get some termDocs-like
 capabilities through Solr's faceting capabilities, but I am not aware
 of any way to get at the term vector capabilities.

 HTH,
 Grant

 On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote:

 I want the functionality that Lucene IndexReader.termDocs gives me.
 That, or access at the document level to the term vector. This page
 (http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector))
 seems to suggest that this will be available in 1.4. Is there any way
 to do this in 1.3?

 Thanks,

 Clay


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search 


Re: Question on StreamingUpdateSolrServer

2009-04-13 Thread vivek sar
Here is some more information about my setup,

Solr - v1.4 (nightly build 03/29/09)
Servlet Container - Tomcat 6.0.18
JVM - 1.6.0 (64 bit)
OS -  Mac OS X Server 10.5.6

Hardware Overview:

Processor Name: Quad-Core Intel Xeon
Processor Speed: 3 GHz
Number Of Processors: 2
Total Number Of Cores: 8
L2 Cache (per processor): 12 MB
Memory: 20 GB
Bus Speed: 1.6 GHz

JVM Parameters (for Solr):

export CATALINA_OPTS="-server -Xms6044m -Xmx6044m -DSOLR_APP \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
-Dsun.rmi.dgc.client.gcInterval=360 \
-Dsun.rmi.dgc.server.gcInterval=360"

Other:

lsof|grep solr|wc -l
2493

ulimit -an
  open files  (-n) 9000

Tomcat
<Connector port="8080" protocol="HTTP/1.1"
    connectionTimeout="2"
    maxThreads="100" />

Total Solr cores on same instance - 65

useCompoundFile - true

The tests I ran:

While the Indexer is running:

1) Go to "http://juum19.co.com:8080/solr" - returns a blank page (no
error in catalina.out)

2) Try "telnet juum19.co.com 8080" - returns with "Connection closed
by foreign host"

Stop the Indexer program (Tomcat is still running with Solr).

3) Go to "http://juum19.co.com:8080/solr" - works OK, shows the list
of all the Solr cores

4) Try telnet - able to telnet fine

5) Now comment out all the caches in solrconfig.xml and try the same
tests - Tomcat still doesn't respond.

Is there a way to stop the auto-warmer? I commented out the caches in
solrconfig.xml but still see the following log:

INFO: autowarming result for searc...@3aba3830 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

INFO: Closing searc...@175dc1e2
main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


6) Change the Indexer frequency so it runs every 2 min (instead of all
the time). I noticed that once the commit is done, I'm able to run my
searches. During the commit and auto-warming period I just get a blank page.

7) Changed from SolrJ to XML update - I still get the blank page
whenever an update/commit is happening.

Apr 13, 2009 6:46:18 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005,
621094006, 621094007, 621094008, ...(6992 more)]} 0 1948
Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute
INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948


So, it looks like it's not just StreamingUpdateSolrServer: whenever
an update/commit is happening I'm not able to search. I don't know if
it's related to using multi-core. In this test I was using only a single
thread updating a single core on a single Solr instance.

So, it's clearly related to the index process (update, commit and
auto-warming). As soon as the update/commit/auto-warming is completed I'm
able to run my queries again. Is there anything that could stop
searching while the update process is in progress - like a lock or
something?

Any other ideas?

Thanks,
-vivek

On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Mon, Apr 13, 2009 at 12:36 PM, vivek sar vivex...@gmail.com wrote:

 I index in 10K batches and commit after 5 index cycles (after 50K). Is
 there any limitation that I can't search during a commit or
 auto-warming? I have 8 CPU cores and only 2 were showing busy (using
 top) - so it's unlikely that the CPU was pegged.


 No, there is no such limitation. The old searcher will continue to serve
 search requests until the new one is warmed and registered.

 So, CPU does not seem to be an issue. Does this happen only when you use
 StreamingUpdateSolrServer? Which OS, file system? What JVM parameters are
 you using? Which servlet container and version?

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Index Version Number

2009-04-13 Thread Richard Wiseman
Interesting.  Do you know if it's possible to get the HTTP headers with 
Solrj?
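Not directly, as far as I can tell - SolrJ's QueryResponse does not expose the raw HTTP headers. One workaround is to read the headers with plain java.net and compare ETags between queries. A minimal sketch (the class and method names below are made up for illustration, not part of SolrJ):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class IndexChangeDetector {
    private String lastETag;  // ETag seen on the previous query

    /** Remembers the new ETag; returns true when it differs from the stored one. */
    public boolean indexChanged(String newETag) {
        boolean changed = (lastETag == null) || !lastETag.equals(newETag);
        lastETag = newETag;
        return changed;
    }

    /** Reads the ETag header from a Solr select URL (needs a running Solr). */
    public static String fetchETag(String selectUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(selectUrl).openConnection();
        try {
            conn.connect();
            return conn.getHeaderField("ETag");
        } finally {
            conn.disconnect();
        }
    }
}
```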


Yonik Seeley wrote:

On Fri, Apr 10, 2009 at 11:58 AM, Richard Wiseman
rwise...@infosciences.com wrote:
  

Is it possible for a Solr client to determine if the index has changed since
the last time it performed a query?  For example, is it possible to query
the current Lucene indexVersion?



Grant pointed to one way - the Luke handler.
Another way is to look at the Last-Modified or ETag HTTP headers.

$ curl -i http://localhost:8983/solr/select?q=solr
HTTP/1.1 200 OK
Last-Modified: Fri, 10 Apr 2009 17:40:54 GMT
ETag: OWZlNjdkN2Q4ODAwMDAwU29scg==
Content-Type: text/xml; charset=utf-8
Content-Length: 2308
Server: Jetty(6.1.3)


-Yonik
http://www.lucidimagination.com


  



--
Richard Wiseman
Information Sciences Corp.
(301) 962-5707



DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
Hi All,
I am trying to set up a Solr instance on my MacBook.

I get the following errors when I'm trying to do a full db import... please
help me with this:

Apr 13, 2009 11:53:28 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity slideshow with URL:
jdbc:mysql://localhost/mydb_development
Apr 13, 2009 11:53:29 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 319
Apr 13, 2009 11:53:32 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: Java heap space
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.<init>(Buffer.java:58)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)


My Java version
$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)


Do I need to install a new Java version?
My db is also very large (~15 GB).

Please do the needful...

thanks
mani kumar


Re: DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
I am using Tomcat ...

On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.comwrote:

 Hi All,
 I am trying to setup a Solr instance on my macbook.

 I get following errors when m trying to do a full db import ... please help
 me on this

 Apr 13, 2009 11:53:28 PM
 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Creating a connection for entity slideshow with URL:
 jdbc:mysql://localhost/mydb_development
 Apr 13, 2009 11:53:29 PM
 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
 INFO: Time taken for getConnection(): 319
 Apr 13, 2009 11:53:32 PM org.apache.solr.handler.dataimport.DataImporter
 doFullImport
 SEVERE: Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
 at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
 at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)
 at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)
 at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)
 at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 at com.mysql.jdbc.Buffer.init(Buffer.java:58)
 at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
 at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)


 My Java version
 $ java -version
 java version 1.5.0_16
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
 Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)


 Is that i need to install a new java version?
 my db is also very huge ~15 GB

 please do the need full ...

 thanks
 mani kumar




Re: DataImporter : Java heap space

2009-04-13 Thread Shalin Shekhar Mangar
On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.comwrote:

 Hi All,
 I am trying to setup a Solr instance on my macbook.

 I get following errors when m trying to do a full db import ... please help
 me on this

 java.lang.OutOfMemoryError: Java heap space
at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)
at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)
at

 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)
at

 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)
 Caused by: java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.init(Buffer.java:58)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)


How much heap size have you allocated to the jvm?

Also see http://wiki.apache.org/solr/DataImportHandlerFaq

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
Hi Shalin:

Thanks for quick response!

By default it was set to 1.93 MB.
I also tried it with the following command:

$  ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M
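(Note: Tomcat's startup.sh normally ignores JVM flags passed as arguments; heap options are usually picked up from the JAVA_OPTS or CATALINA_OPTS environment variables, e.g. via bin/setenv.sh. A sketch, reusing the sizes above:)

```
# bin/setenv.sh (create it if absent) - sourced by catalina.sh at startup
export CATALINA_OPTS="-Xmn50M -Xms300M -Xmx400M"
```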

I also tried tricks given on
http://wiki.apache.org/solr/DataImportHandlerFaq page.

What should I try next?

Thanks!
Mani Kumar

On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumar manikumarchau...@gmail.com
 wrote:

  Hi All,
  I am trying to setup a Solr instance on my macbook.
 
  I get following errors when m trying to do a full db import ... please
 help
  me on this
 
  java.lang.OutOfMemoryError: Java heap space
 at
 
 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
 at
 
 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
 at
 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)
 at
 
 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)
 at
 
 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)
 at
 
 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)
  Caused by: java.lang.OutOfMemoryError: Java heap space
 at com.mysql.jdbc.Buffer.init(Buffer.java:58)
 at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
 at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)
 
 
 How much heap size have you allocated to the jvm?

 Also see http://wiki.apache.org/solr/DataImportHandlerFaq

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Term Counts/Term Frequency Vector Info

2009-04-13 Thread Grant Ingersoll

Sorry, I should have added that you should set the qt param: 
http://wiki.apache.org/solr/CoreQueryParameters#head-2c940d42ec4f2a74c5d251f12f4077e53f2f00f4

-Grant
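(For reference, a hedged SolrJ sketch of setting qt - names assume the 1.3-era SolrQuery/CommonsHttpSolrServer API and a handler registered at /autoSuggest, as in the URLs quoted below:)

```java
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery();
query.setQueryType("/autoSuggest");   // sets the qt parameter
query.set("terms", true);
query.set("terms.fl", "CONTENTS");
query.set("terms.prefix", "Lon");
QueryResponse rsp = server.query(query);
```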

On Apr 13, 2009, at 1:35 PM, Fink, Clayton R. wrote:

The query method seems to only support solr/select requests. I  
subclassed SolrRequest and created a request class that supports  
solr/autoSuggest - following the pattern in LukeRequest. It seems  
to work fine for me.


Clay

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, April 07, 2009 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Term Counts/Term Frequency Vector Info

You can send arbitrary requests via SolrJ, just use the parameter  
map via the query method: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrServer.html

.

-Grant

On Apr 7, 2009, at 1:52 PM, Fink, Clayton R. wrote:


These URLs give me what I want - word completion and term counts.
What I don't see is a way to call these via SolrJ. I could call the
server directly using java.net classes and process the XML myself, I
guess. There needs to be an auto suggest request class.

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=Lond&terms.prefix=Lon&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
      <int name="Londoners">2</int>
    </lst>
  </lst>
</response>

http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=CONTENTS&terms.lower=London&terms.upper=London&terms.upper.incl=true&indent=true

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="CONTENTS">
      <int name="London">11</int>
    </lst>
  </lst>
</response>

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, April 06, 2009 5:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Term Counts/Term Frequency Vector Info

See also http://wiki.apache.org/solr/TermsComponent

You might be able to apply these patches to 1.3 and have them work,
but there is no guarantee.  You also can get some termDocs like
capabilities through Solr's faceting capabilities, but I am not aware
of any way to get at the term vector capabilities.

HTH,
Grant

On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote:


I want the functionality that Lucene IndexReader.termDocs gives me.
That or access on the document level to the term vector. This
(http://wiki.apache.org/solr/TermVectorComponent?highlight=(term
)|(vector) seems to suggest that this will be available in 1.4. Is
there any way to do this in 1.3?

Thanks,

Clay



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: DataImporter : Java heap space

2009-04-13 Thread Ilan Rabinovitch
Depending on your dataset and what your queries look like, you may very
likely need a larger heap size. How many queries and rows are required
to generate each of your documents?


Ilan

On 4/13/09 12:21 PM, Mani Kumar wrote:

Hi Shalin:

Thanks for quick response!

By defaults it was set to 1.93 MB.
But i also tried it with following command:

$  ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M

I also tried tricks given on
http://wiki.apache.org/solr/DataImportHandlerFaq page.

what should i try next ?

Thanks!
Mani Kumar

On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar
shalinman...@gmail.com  wrote:


On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com

wrote:



Hi All,
I am trying to setup a Solr instance on my macbook.

I get following errors when m trying to do a full db import ... please

help

me on this

java.lang.OutOfMemoryError: Java heap space
at



org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)

at



org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)

at


org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)

at



org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)

at



org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)

at



org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)

Caused by: java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.init(Buffer.java:58)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)



How much heap size have you allocated to the jvm?

Also see http://wiki.apache.org/solr/DataImportHandlerFaq

--
Regards,
Shalin Shekhar Mangar.






--
Ilan Rabinovitch
i...@fonz.net

---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org



indexing txt file

2009-04-13 Thread Alex Vu
Hi all,

Currently I've written an XML file and a schema.xml file.  What is the next
step to index a txt file?  Where should I put the txt file I want to index?

thank you,
Alex V.
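(For reference: with the stock example setup the file doesn't go anywhere special - you wrap the text in Solr's XML update format and POST it to the update handler. A sketch; the URL assumes the example Jetty distribution, and the field names are placeholders that must match your schema.xml:)

```
# mydoc.xml:
# <add><doc>
#   <field name="id">doc1</field>
#   <field name="text">contents of your txt file...</field>
# </doc></add>

curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
     --data-binary @mydoc.xml
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
     --data-binary "<commit/>"
```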


Re: Question on StreamingUpdateSolrServer

2009-04-13 Thread vivek sar
Some more updates. As I mentioned earlier, we are using multi-core Solr
(up to 65 cores in one Solr instance, with each core 10G). This was
opening around 3000 file descriptors (lsof). I removed some cores and
after some trial and error I found that at 25 cores the system seems to
work fine (around 1400 file descriptors). Tomcat is responsive even when
the indexing is happening at Solr (for 25 cores). But, as soon as it
goes to 26 cores the Tomcat becomes unresponsive again. The puzzling
thing is if I stop indexing I can search on even 65 cores, but while
indexing is happening it seems to support only up to 25 cores.

1) Is there a limit on the number of cores a Solr instance can handle?
2) Does Solr do anything to the existing cores while indexing? I'm
writing to only one core at a time.

We are struggling to find why Tomcat stops responding on high number
of cores while indexing is in-progress. Any help is very much
appreciated.

Thanks,
-vivek
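(As an aside on stopping the auto-warmer, asked below: warming is configured per cache in solrconfig.xml via the autowarmCount attribute, and setting it to 0 should disable warming for that cache. A sketch based on the stock config:)

```
<filterCache class="solr.LRUCache"
             size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="0"/>
```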

On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote:
 Here is some more information about my setup,

 Solr - v1.4 (nightly build 03/29/09)
 Servlet Container - Tomcat 6.0.18
 JVM - 1.6.0 (64 bit)
 OS -  Mac OS X Server 10.5.6

 Hardware Overview:

 Processor Name: Quad-Core Intel Xeon
 Processor Speed: 3 GHz
 Number Of Processors: 2
 Total Number Of Cores: 8
 L2 Cache (per processor): 12 MB
 Memory: 20 GB
 Bus Speed: 1.6 GHz

 JVM Parameters (for Solr):

 export CATALINA_OPTS=-server -Xms6044m -Xmx6044m -DSOLR_APP
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
 -Dsun.rmi.dgc.client.gcInterval=360
 -Dsun.rmi.dgc.server.gcInterval=360

 Other:

 lsof|grep solr|wc -l
    2493

 ulimit -a
  open files                      (-n) 9000

 Tomcat
     <Connector port="8080" protocol="HTTP/1.1"
                connectionTimeout="2"
                maxThreads="100" />

 Total Solr cores on same instance - 65

 useCompoundFile - true

 The tests I ran,

 While Indexer is running
 1)  Go to "http://juum19.co.com:8080/solr" - returns a blank page (no
 error in catalina.out)

 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed
 by foreign host"

 Stop the Indexer Program (Tomcat is still running with Solr)

 3)  Go to "http://juum19.co.com:8080/solr" - works ok, shows the list
 of all the Solr cores

 4) Try telnet - able to Telnet fine

 5)  Now comment out all the caches in solrconfig.xml. Try the same tests,
 but Tomcat still doesn't respond.

 Is there a way to stop the auto-warmer? I commented out the caches in
 the solrconfig.xml but still see the following log,

 INFO: autowarming result for searc...@3aba3830 main
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

 INFO: Closing searc...@175dc1e2
 main    
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


 6) Change the Indexer frequency so it runs every 2 min (instead of all
 the time). I noticed once the commit is done, I'm able to run my
 searches. During commit and auto-warming period I just get blank page.

  7) Changed from Solrj to XML update -  I still get the blank page
 whenever update/commit is happening.

 Apr 13, 2009 6:46:18 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005,
 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948
 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute
 INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948


 So, looks like it's not just StreamingUpdateSolrServer, but whenever
 the update/commit is happening I'm not able to search. I don't know if
 it's related to using multi-core. In this test I was using only single
 thread for update to a single core using only single Solr instance.

 So, it's clearly related to index process (update, commit and
 auto-warming). As soon as update/commit/auto-warming is completed I'm
 able to run my queries again. Is there anything that could stop
 searching while update process is in-progress - like any lock or
 something?

 Any other ideas?

 Thanks,
 -vivek

 On Mon, Apr 13, 2009 at 12:14 AM, Shalin Shekhar 

Search included in *all* fields

2009-04-13 Thread Johnny X

I'll start a new thread to make things easier, because I've only really got
one problem now.

I've configured my Solr to search on all fields, so a query on a specific
field (e.g. q=Date:October) will only search the 'Date' field, rather
than all the others.

The issue is when you build up multiple fields to search on. Only one of
those has to match for a result to be returned, rather than all of them. Is
there a way to change this?


Cheers!
-- 
View this message in context: 
http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search included in *all* fields

2009-04-13 Thread Ryan McKinley

what about:
 fieldA:value1 AND fieldB:value2

this can also be written as:
 +fieldA:value1 +fieldB:value2
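If every field clause should be required by default, the default operator can also be flipped from OR to AND - per request with q.op=AND, or globally in schema.xml (a sketch of the stock setting):

```
<!-- schema.xml: make multiple clauses ANDed by default -->
<solrQueryParser defaultOperator="AND"/>
```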


On Apr 13, 2009, at 9:53 PM, Johnny X wrote:



I'll start a new thread to make things easier, because I've only  
really got

one problem now.

I've configured my Solr to search on all fields, so it will only  
search for

a specific query in a specific field (e.g. q=Date:October) will only
search the 'Date' field, rather than all the others.

The issue is when you build up multiple fields to search on. Only  
one of
those has to match for a result to be returned, rather than all of  
them. Is

there a way to change this?


Cheers!
--
View this message in context: 
http://www.nabble.com/Search-included-in-*all*-fields-tp23031829p23031829.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Question on StreamingUpdateSolrServer

2009-04-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Apr 14, 2009 at 7:14 AM, vivek sar vivex...@gmail.com wrote:
 Some more update. As I mentioned earlier we are using multi-core Solr
 (up to 65 cores in one Solr instance with each core 10G). This was
 opening around 3000 file descriptors (lsof). I removed some cores and
 after some trial and error I found at 25 cores system seems to work
 fine (around 1400 file descriptors). Tomcat is responsive even when
 the indexing is happening at Solr (for 25 cores). But, as soon as it
 goes to 26 cores the Tomcat becomes unresponsive again. The puzzling
 thing is if I stop indexing I can search on even 65 cores, but while
 indexing is happening it seems to support only up to 25 cores.

 1) Is there a limit on number of cores a Solr instance can handle?
 2) Does Solr do anything to the existing cores while indexing? I'm
 writing to only one core at a time.
There is no hard limit (it is Integer.MAX_VALUE). But in reality your
mileage depends on your hardware and the number of file handles the OS
can open.
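(For reference, on most Unix-like systems the per-process file-descriptor limit can be raised in the shell that launches Tomcat. A sketch; the value is a tuning choice and raising it may require root, or launchctl on OS X:)

```
# run in the same shell, before bin/startup.sh
ulimit -n 65536
```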

 We are struggling to find why Tomcat stops responding on high number
 of cores while indexing is in-progress. Any help is very much
 appreciated.

 Thanks,
 -vivek

 On Mon, Apr 13, 2009 at 10:52 AM, vivek sar vivex...@gmail.com wrote:
 Here is some more information about my setup,

 Solr - v1.4 (nightly build 03/29/09)
 Servlet Container - Tomcat 6.0.18
 JVM - 1.6.0 (64 bit)
 OS -  Mac OS X Server 10.5.6

 Hardware Overview:

 Processor Name: Quad-Core Intel Xeon
 Processor Speed: 3 GHz
 Number Of Processors: 2
 Total Number Of Cores: 8
 L2 Cache (per processor): 12 MB
 Memory: 20 GB
 Bus Speed: 1.6 GHz

 JVM Parameters (for Solr):

 export CATALINA_OPTS=-server -Xms6044m -Xmx6044m -DSOLR_APP
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
 -Dsun.rmi.dgc.client.gcInterval=360
 -Dsun.rmi.dgc.server.gcInterval=360

 Other:

 lsof|grep solr|wc -l
    2493

 ulimit -an
  open files                      (-n) 9000

 Tomcat
     <Connector port="8080" protocol="HTTP/1.1"
                connectionTimeout="2"
                maxThreads="100" />

 Total Solr cores on same instance - 65

 useCompoundFile - true

 The tests I ran,

 While Indexer is running
 1)  Go to "http://juum19.co.com:8080/solr" - returns a blank page (no
 error in catalina.out)

 2) Try "telnet juum19.co.com 8080" - returns with "Connection closed
 by foreign host"

 Stop the Indexer Program (Tomcat is still running with Solr)

 3)  Go to "http://juum19.co.com:8080/solr" - works ok, shows the list
 of all the Solr cores

 4) Try telnet - able to Telnet fine

 5)  Now comment out all the caches in solrconfig.xml. Try the same tests,
 but Tomcat still doesn't respond.

 Is there a way to stop the auto-warmer. I commented out the caches in
 the solrconfig.xml but still see the following log,

 INFO: autowarming result for searc...@3aba3830 main
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

 INFO: Closing searc...@175dc1e2
 main    
 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}


 6) Change the Indexer frequency so it runs every 2 min (instead of all
 the time). I noticed once the commit is done, I'm able to run my
 searches. During commit and auto-warming period I just get blank page.

  7) Changed from Solrj to XML update -  I still get the blank page
 whenever update/commit is happening.

 Apr 13, 2009 6:46:18 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {add=[621094001, 621094002, 621094003, 621094004, 621094005,
 621094006, 621094007, 621094008, ...(6992 more)]} 0 1948
 Apr 13, 2009 6:46:18 PM org.apache.solr.core.SolrCore execute
 INFO: [20090413_12] webapp=/solr path=/update params={} status=0 QTime=1948


 So, looks like it's not just StreamingUpdateSolrServer, but whenever
 the update/commit is happening I'm not able to search. I don't know if
 it's related to using multi-core. In this test I was using only single
 thread for update to a single core using only single Solr instance.

 So, it's clearly related to index process (update, commit and
 auto-warming). As soon as update/commit/auto-warming is completed I'm
 

Re: DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
Hi Ilan:

Only one query is required to generate a document ...
Here is my data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource" name="sp"
    driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
    user="root" password="**" />
  <document name="items">
    <entity name="item" dataSource="sp" query="select * from items">
      <field column="id" name="id" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>

and other useful info:

mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
|   900051 |
+----------+
1 row in set (0.00 sec)

Each record consist of id and title.

id  is of type int(11) and title's avg. length is 50 chars.


I am using Tomcat with Solr.
Here is the command I am using to start it:

./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M


Thanks for the help! I appreciate it.

-Mani Kumar

On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote:

 Depending on your dataset and how your queries look you may very likely
 need to increase to a larger heap size.  How many queries and rows are
 required for each of your documents to be generated?

 Ilan


 On 4/13/09 12:21 PM, Mani Kumar wrote:

 Hi Shalin:

 Thanks for quick response!

 By defaults it was set to 1.93 MB.
 But i also tried it with following command:

 $  ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M

 I also tried tricks given on
 http://wiki.apache.org/solr/DataImportHandlerFaq page.

 what should i try next ?

 Thanks!
 Mani Kumar

 On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com  wrote:

  On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com

 wrote:


  Hi All,
 I am trying to setup a Solr instance on my macbook.

 I get following errors when m trying to do a full db import ... please

 help

 me on this

 java.lang.OutOfMemoryError: Java heap space
at


 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)

at


 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)

at

 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)

at


 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)

at


 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)

at


 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)

 Caused by: java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.Buffer.init(Buffer.java:58)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)


  How much heap size have you allocated to the jvm?

 Also see http://wiki.apache.org/solr/DataImportHandlerFaq

 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Ilan Rabinovitch
 i...@fonz.net

 ---
 SCALE 7x: 2009 Southern California Linux Expo
 Los Angeles, CA
 http://www.socallinuxexpo.org




Re: DataImporter : Java heap space

2009-04-13 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH itself may not be consuming that much memory; the figure also includes
the memory used by Solr itself.

Do you have a hard limit of 400MB, or is it possible to increase it?

On Tue, Apr 14, 2009 at 11:09 AM, Mani Kumar manikumarchau...@gmail.com wrote:
 Hi ILAN:

 Only one query is required to generate a document ...
 Here is my data-config.xml

 <dataConfig>
     <dataSource type="JdbcDataSource" name="sp"
  driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
  user="root" password="**" />
     <document name="items">
         <entity name="item" dataSource="sp" query="select * from items">
             <field column="id" name="id" />
             <field column="title" name="title" />
         </entity>
     </document>
 </dataConfig>

 and other useful info:

 mysql> select count(*) from items;
 +----------+
 | count(*) |
 +----------+
 |   900051 |
 +----------+
 1 row in set (0.00 sec)

 Each record consist of id and title.

 id  is of type int(11) and title's avg. length is 50 chars.


 I am using tomcat with solr.
 here is the command i am using to start it

 ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M


 Thanks! for help. I appreciate it.

 -Mani Kumar

 On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote:

 Depending on your dataset and how your queries look you may very likely
 need to increase to a larger heap size.  How many queries and rows are
 required for each of your documents to be generated?

 Ilan


 On 4/13/09 12:21 PM, Mani Kumar wrote:

 Hi Shalin:

 Thanks for quick response!

 By defaults it was set to 1.93 MB.
 But i also tried it with following command:

 $  ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M

 I also tried tricks given on
 http://wiki.apache.org/solr/DataImportHandlerFaq page.

 what should i try next ?

 Thanks!
 Mani Kumar

 On Tue, Apr 14, 2009 at 12:12 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com  wrote:

  On Mon, Apr 13, 2009 at 11:57 PM, Mani Kumarmanikumarchau...@gmail.com

 wrote:


  Hi All,
 I am trying to setup a Solr instance on my macbook.

 I get following errors when m trying to do a full db import ... please

 help

 me on this

 java.lang.OutOfMemoryError: Java heap space
        at


 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)

        at


 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)

        at

 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)

        at


 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)

        at


 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)

        at


 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)

 Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.mysql.jdbc.Buffer.init(Buffer.java:58)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)


  How much heap size have you allocated to the jvm?

 Also see http://wiki.apache.org/solr/DataImportHandlerFaq

 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Ilan Rabinovitch
 i...@fonz.net

 ---
 SCALE 7x: 2009 Southern California Linux Expo
 Los Angeles, CA
 http://www.socallinuxexpo.org






-- 
--Noble Paul


Re: DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
Here is the stack trace:

Notice in the stack trace:    at
com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)

It looks like it's trying to read the whole table into memory at once, and
that's why it's getting OOM.

Apr 14, 2009 11:15:01 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:400)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:221)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:164)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:312)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:370)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:351)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.mysql.jdbc.Buffer.<init>(Buffer.java:58)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1444)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2840)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:468)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2534)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2159)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2548)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2477)
        at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:741)
        at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:587)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:243)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:207)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:335)
        ... 5 more
Apr 14, 2009 11:15:01 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Apr 14, 2009 11:15:01 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback




On Tue, Apr 14, 2009 at 11:09 AM, Mani Kumar manikumarchau...@gmail.com wrote:

 Hi ILAN:

 Only one query is required to generate a document ...
 Here is my data-config.xml

 <dataConfig>
   <dataSource type="JdbcDataSource" name="sp"
       driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb_development"
       user="root" password="**" />
   <document name="items">
     <entity name="item" dataSource="sp" query="select * from items">
       <field column="id" name="id" />
       <field column="title" name="title" />
     </entity>
   </document>
 </dataConfig>

 and other useful info:

 mysql> select count(*) from items;
 +----------+
 | count(*) |
 +----------+
 |   900051 |
 +----------+
 1 row in set (0.00 sec)

 Each record consists of an id and a title.

 id is of type int(11) and the title's avg. length is 50 chars.


 I am using Tomcat with Solr.
 Here is the command I am using to start it:

 ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M
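An editorial aside, not something confirmed in the thread: Tomcat's startup.sh does not parse JVM flags passed as command-line arguments, so the -Xmn/-Xms/-Xmx values above may be silently ignored. The usual mechanism is to export CATALINA_OPTS (or JAVA_OPTS) before invoking the script; a minimal sketch:

```shell
# startup.sh ignores trailing JVM flags; catalina.sh reads them from the
# CATALINA_OPTS (or JAVA_OPTS) environment variable instead.
export CATALINA_OPTS="-Xms300m -Xmx400m"
# ./apache-tomcat-6.0.18/bin/startup.sh   # would now start with a 400 MB max heap
echo "$CATALINA_OPTS"
```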


 Thanks for the help! I appreciate it.

 -Mani Kumar

 On Tue, Apr 14, 2009 at 2:31 AM, Ilan Rabinovitch i...@fonz.net wrote:

 Depending on your dataset and how your queries look you may very likely
 need to increase to a larger heap size.  How many queries and rows are
 required for each of your documents to be generated?

 Ilan


 On 4/13/09 12:21 PM, Mani Kumar wrote:

 Hi Shalin:

 Thanks for quick response!

  By default it was set to 1.93 MB,
  but I also tried it with the following command:

  $ ./apache-tomcat-6.0.18/bin/startup.sh -Xmn50M -Xms300M -Xmx400M

  I also tried the tricks given on the
  http://wiki.apache.org/solr/DataImportHandlerFaq page.

  What should I try next?

 Thanks!
 Mani Kumar


Re: DataImporter : Java heap space

2009-04-13 Thread Mani Kumar
Hi Noble:


But the question is: how much memory? Are there any rules of thumb so that I
can estimate how much memory it requires?
Yes, I can increase it up to 800 MB max; I will try it and let you know.

Thanks!
Mani
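As a rough back-of-envelope sketch (an editorial illustration, not from the thread; the per-row overhead figure is an assumption), buffering the whole 900,051-row table client-side already needs on the order of 100+ MB before Solr's own index buffers and caches are counted:

```java
public class HeapEstimate {
    public static void main(String[] args) {
        long rows = 900_051L;               // row count reported earlier in the thread
        long bytesPerRow = 4 + 50 * 2 + 64; // int id + ~50-char UTF-16 title + assumed per-row object overhead
        long totalMb = rows * bytesPerRow / (1024 * 1024);
        System.out.println(totalMb + " MB"); // prints "144 MB" for the raw rows alone
    }
}
```

This is why a fixed heap limit is hard to recommend: the driver's buffered copy of the result set scales linearly with table size, so streaming the rows (rather than raising the heap) is the robust fix.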

2009/4/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com

 DIH itself may not be consuming so much memory. It also includes the
 memory used by Solr.

 Do you have a hard limit of 400 MB, or is it not possible to increase it?




 --
 --Noble Paul



Re: DataImporter : Java heap space

2009-04-13 Thread Shalin Shekhar Mangar
On Tue, Apr 14, 2009 at 11:18 AM, Mani Kumar manikumarchau...@gmail.com wrote:

 Here is the stack trace:

 Notice this frame in the stack trace:
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1749)

 It looks like it is trying to read the whole table into memory at once, and
 that is why it is getting the OOM.


Mani, the data-config.xml you posted does not have the batchSize="-1"
attribute on your data source. Did you try that? The full buffering you are
seeing is a known bug in the MySQL JDBC driver.
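For reference, a sketch of what that change might look like, based on the data-config.xml posted earlier in the thread (with the MySQL driver, DataImportHandler translates batchSize="-1" into a streaming fetch size, so rows are read one at a time instead of being buffered all at once):

```xml
<dataConfig>
  <!-- batchSize="-1" makes DIH request row streaming from the MySQL driver
       (internally fetchSize=Integer.MIN_VALUE), so the full result set is
       never held in memory. -->
  <dataSource type="JdbcDataSource" name="sp" batchSize="-1"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb_development"
              user="root" password="**" />
  <document name="items">
    <entity name="item" dataSource="sp" query="select * from items">
      <field column="id" name="id" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>
```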

-- 
Regards,
Shalin Shekhar Mangar.