Re: Solr with Auto-suggest

2009-09-23 Thread dharhsana

Hi Ryan,

I have gone through your post
https://issues.apache.org/jira/browse/SOLR-357

where you mention a prefix filter. Can you tell me how to use that
patch? You mentioned using the code below:

<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

...
<field name="prefix1" type="prefix_full" indexed="true" stored="false"/>
<field name="prefix2" type="prefix_token" indexed="true" stored="false"/>
...
<copyField source="name" dest="prefix1"/>
<copyField source="name" dest="prefix2"/>

For the above code, are you using EdgeNGramFilterFactory or
PrefixingFilterFactory?

Or does the above code work with EdgeNGramFilterFactory alone? I am not clear
about it. Without applying the PrefixingFilterFactory patch, can I still use
the above code?


And next: is the name field used in copyField of text type or string type?


waiting for your reply,

Regards,

Rekha



Highlighting not working on a prefix_token field

2009-09-23 Thread Avlesh Singh
I have a prefix_token field defined as underneath in my schema.xml

fieldType name=prefix_token class=solr.TextField
positionIncrementGap=1
analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.LowerCaseFilterFactory /
filter class=solr.EdgeNGramFilterFactory minGramSize=1
maxGramSize=20/
/analyzer
analyzer type=query
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.LowerCaseFilterFactory /
/analyzer
/fieldType

Searches on the field work fine and as expected.
However, attempts to highlight on this field do not yield any results.
Highlighting on other fields works fine.

Any clues? I am using Solr 1.3

Cheers
Avlesh


Re: Highlighting not working on a prefix_token field

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh avl...@gmail.com wrote:

 I have a prefix_token field defined as underneath in my schema.xml

 <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Searches on the field work fine and as expected.
 However, attempts to highlight on this field do not yield any results.
 Highlighting on other fields works fine.



Won't work until SOLR-1268 comes along.

http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible

-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting not working on a prefix_token field

2009-09-23 Thread Avlesh Singh
Hmmm .. But ngrams with KeywordTokenizerFactory instead of the
WhitespaceTokenizerFactory work just as fine. Related issues?

Cheers
Avlesh

On Wed, Sep 23, 2009 at 12:27 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Sep 23, 2009 at 12:23 PM, Avlesh Singh avl...@gmail.com wrote:

  I have a prefix_token field defined as underneath in my schema.xml
 
  <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Searches on the field work fine and as expected.
  However, attempts to highlight on this field do not yield any results.
  Highlighting on other fields works fine.
 


 Won't work until SOLR-1268 comes along.


 http://www.lucidimagination.com/search/document/4da480fe3eb0e7e4/highlighting_in_stemmed_or_n_grammed_fields_possible

 --
 Regards,
 Shalin Shekhar Mangar.



Re: Solr with Auto-suggest

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 11:30 AM, dharhsana rekha.dharsh...@gmail.com wrote:


 Hi Ryan,

 I have gone through your post
 https://issues.apache.org/jira/browse/SOLR-357

 where you mention a prefix filter. Can you tell me how to use that
 patch? You mentioned using the code below:

 <fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 <fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 ...
 <field name="prefix1" type="prefix_full" indexed="true" stored="false"/>
 <field name="prefix2" type="prefix_token" indexed="true" stored="false"/>
 ...
 <copyField source="name" dest="prefix1"/>
 <copyField source="name" dest="prefix2"/>

 For the above code, are you using EdgeNGramFilterFactory or
 PrefixingFilterFactory?

 Or does the above code work with EdgeNGramFilterFactory alone? I am not clear
 about it. Without applying the PrefixingFilterFactory patch, can I still use
 the above code?


There is no such thing in Solr as a PrefixingFilterFactory. Use
EdgeNGramFilterFactory.



 And next: is the name field used in copyField of text type or string type?


name was a field in his schema. Whatever fields' values you want for
auto-suggest, copy them over to that field.
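
For example, copyField rules like these feed several source fields into one
suggestion field (the title and author field names are illustrative, not from
Ryan's schema):

<copyField source="title" dest="prefix1"/>
<copyField source="author" dest="prefix1"/>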

-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting not working on a prefix_token field

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 12:31 PM, Avlesh Singh avl...@gmail.com wrote:

 Hmmm .. But ngrams with KeywordTokenizerFactory instead of the
 WhitespaceTokenizerFactory work just as fine. Related issues?


I'm sorry I don't understand the question. Do you mean to say that
highlighting works with one but not with another?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Highlighting not working on a prefix_token field

2009-09-23 Thread Avlesh Singh

 I'm sorry I don't understand the question. Do you mean to say that
 highlighting works with one but not with another?

Yes.

Cheers
Avlesh

On Wed, Sep 23, 2009 at 12:59 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Sep 23, 2009 at 12:31 PM, Avlesh Singh avl...@gmail.com wrote:

  Hmmm .. But ngrams with KeywordTokenizerFactory instead of the
  WhitespaceTokenizerFactory work just as fine. Related issues?
 
 
 I'm sorry I don't understand the question. Do you mean to say that
 highlighting works with one but not with another?

 --
 Regards,
 Shalin Shekhar Mangar.



Finding near duplicates while searching Documents

2009-09-23 Thread Ninad Raut
Hi,
When we have news content crawled, we face a problem of the same content being
repeated in many documents. We want to add a near-duplicate document filter
to detect such documents. Is there a way to do that in SOLR?
Regards,
Ninad Raut.


Re: Finding near duplicates while searching Documents

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:

 Hi,
 When we have news content crawled, we face a problem of the same content
 being repeated in many documents. We want to add a near-duplicate document
 filter to detect such documents. Is there a way to do that in SOLR?


Look at http://wiki.apache.org/solr/Deduplication
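
In short, that page sets up a SignatureUpdateProcessorFactory chain in
solrconfig.xml. A minimal sketch along those lines (chain name, signature
field, and source fields are illustrative), using TextProfileSignature, the
fuzzy signature implementation suited to near-duplicate detection:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The signatureField itself must also be declared in schema.xml (e.g. as an
indexed string field).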

-- 
Regards,
Shalin Shekhar Mangar.


Phrase stopwords

2009-09-23 Thread Pooja Verlani
Hi,
Is it possible to have a phrase as a stopword in solr? In case, please share
how to do so?

regards,
Pooja


Re: Finding near duplicates while searching Documents

2009-09-23 Thread Ninad Raut
Is this feature included in SOLR 1.4??

On Wed, Sep 23, 2009 at 3:29 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com
 wrote:

  Hi,
  When we have news content crawled, we face a problem of the same content
  being repeated in many documents. We want to add a near-duplicate document
  filter to detect such documents. Is there a way to do that in SOLR?
 

 Look at http://wiki.apache.org/solr/Deduplication

 --
 Regards,
 Shalin Shekhar Mangar.



RE: Oracle incomplete DataImport results

2009-09-23 Thread Daniel Bradley
After investigating the log files, the DataImporter was throwing an error from 
the Oracle DB driver:

java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB to 
RAW conversion (actual: 2890, maximum: 2000)

In other words: there was a problem with the 551st item, where a related item
had a text field of type CLOB that was too long and was therefore causing a
problem when using the function TO_NCHAR to fix the type.

FIX:
Used the Oracle function dbms_lob.substr(FIELD_NAME, MAX_LENGTH, 1) to just
trim the string (this also applies an implicit conversion).

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: 22 September 2009 19:27
To: solr-user@lucene.apache.org
Subject: Re: Oracle incomplete DataImport results

On Tue, Sep 22, 2009 at 10:53 PM, Daniel Bradley 
daniel.brad...@adfero.co.uk wrote:

 I appear to be getting only a small number of items imported into Solr
 when doing a full-import against an oracle data-provider. The query I'm
 running is something approximately similar to:

 SELECT ID, dbms_lob.substr(Text, 4000, 1) Text, Date,
 LastModified, Type, Created, Available, Parent, Title from
 TheTableName where Available < CURRENT_DATE and Available >
 add_months(current_date, -1)

 This retrieves the last month's items from the database (the
 dbms_lob.substr function is used to avoid Solr simply indexing the
 object name, as Text is the Oracle CLOB type). When running this in
 Oracle SQL Developer, approximately 5600 rows are returned; however,
 running a full import only imports approximately 550 items.

 There's no visible memory use and no exceptions suggesting any problems
 with lack of memory. Is there any limiting of the number of items you
 can import in a single request? Any other thoughts on this problem would
 be much appreciated.


What is the uniqueKey in schema.xml? Is it possible that many of those 5600
rows share the same value for solr's uniqueKey field?

There are no limits on the number of items you can import. The number of
documents created should correspond to the number of rows returned by the
root level entity's query (assuming the uniqueKey for each of those
documents is actually unique).
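
For reference, a uniqueKey setup in schema.xml looks like this sketch (the
field name is illustrative); rows that share this value overwrite one another
on import:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>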

-- 
Regards,
Shalin Shekhar Mangar.




Re: Finding near duplicates while searching Documents

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 3:50 PM, Ninad Raut hbase.user.ni...@gmail.com wrote:

 Is this feature included in SOLR 1.4??


Yep.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Oracle incomplete DataImport results

2009-09-23 Thread Shalin Shekhar Mangar
On Wed, Sep 23, 2009 at 3:53 PM, Daniel Bradley daniel.brad...@adfero.co.uk
 wrote:

 After investigating the log files, the DataImporter was throwing an error
 from the Oracle DB driver:

 java.sql.SQLException: ORA-22835: Buffer too small for CLOB to CHAR or BLOB
 to RAW conversion (actual: 2890, maximum: 2000)

 In other words: there was a problem with the 551st item, where a related item
 had a text field of type CLOB that was too long and was therefore causing a
 problem when using the function TO_NCHAR to fix the type.

 FIX:
 Used the Oracle function dbms_lob.substr(FIELD_NAME, MAX_LENGTH, 1) to
 just trim the string (this also applies an implicit conversion).


Phew, tricky one! Thanks for bringing closure.

-- 
Regards,
Shalin Shekhar Mangar.


Exact match

2009-09-23 Thread bhaskar chandrasekar
Hi,
 
I am doing an exact search in Solr. In the Solr admin page I am giving the
search input string.
For example, when I give “channeL12” as the search input string, the Solr home
page displays search results such as:
 
<doc>
  <str name="url">http://rediff</str>
  <str name="title">first</str>
  <str name="description">channeL12</str>
</doc>
 
As there is a matching input for “channeL12”.
 
If I give “channel12” as the search input string, with the L in lower case, I
do not get any search results.

In fact, I set ignoreCase=”true” in schema.xml:
 
schema.xml
<analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" ignoreCase="true"/>
</analyzer>
 
I want my search results to be case-insensitive.

Please let me know if I need to make changes anywhere else, or what to do to
achieve the desired output.
 
Regards
Bhaskar
Re: Exact match

2009-09-23 Thread AHMET ARSLAN
 Hi,
  
 I am doing exact search in Solr .In Solr admin page I  am
 giving the search input string for search.
 For ex: I am giving “channeL12” as search input string
 in solr home page it displays search results as 
  
 <doc>
   <str name="url">http://rediff</str>
   <str name="title">first</str>
   <str name="description">channeL12</str>
 </doc>
  
 As there is a matching input for “channeL12”.
  
 If I give “channel12” as search input string with L in
 lower case I am not getting any search results.
  
 In fact I changed ignoreCase =”true” in schema.xml
  
 schema.xml
 <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory" ignoreCase="true"/>
 </analyzer>
  
 I want to ignore casesensitive search in my search
 results.
  
 Please let me know if I need to make changes any where else
 or what to do to achieve the desired output.
  
 Regards
 Bhaskar

First of all, WhitespaceTokenizerFactory does not have an ignoreCase parameter.
You need to add
<filter class="solr.LowerCaseFilterFactory"/>
to both your query and index analyzers.

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

But I think it will be more convenient for you to use the same analyzer for
index and query.
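
For instance, a minimal case-insensitive field type might look like this
sketch (the type name is illustrative); an analyzer declared without a type
attribute is applied at both index and query time:

<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Remember to re-index after changing the index-time analysis.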

Hope this helps.

about url field error

2009-09-23 Thread net_nav

Hello guys,
  I am a newbie with Solr.
  I have Solr running on Tomcat 6 and all is OK,
  but when I add data to the Solr server via HTTP POST it causes an error.
  The code is below:
SolrInputDocument solrdoc = new SolrInputDocument();
solrdoc.addField("url", request.getParameter("URL"));
2009-9-23 21:18:03 org.apache.solr.update.processor.LogUpdateProcessor
finish
info: {add=[http://www.yahoo.com]} 0 1
Invalid version or the data in not in 'javabin' format

The schema.xml content is:
<field name="url" type="url" stored="true" indexed="true" required="true"/>
I need to solve this error.

Thank you very much.



Re: Phrase stopwords

2009-09-23 Thread AHMET ARSLAN
 From: Pooja Verlani pooja.verl...@gmail.com
 Subject: Phrase stopwords
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 23, 2009, 1:15 PM
 Hi,
 Is it possible to have a phrase as a stopword in solr? In
 case, please share
 how to do so?
 
 regards,
 Pooja
 

I think that can be implemented by combining SynonymFilterFactory and
StopFilterFactory.

<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true"
expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

syn.txt will contain lines:

phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken

stopwords.txt will contain the line:
somestupidtoken

IMO it will work since SynonymFilterFactory can handle multi-word synonyms like
a b c d => foo. With expand=false, you can use this filter to reduce your
multi-word stopwords to a single token (one that has a low probability of
occurring in your documents). Then remove this single token with StopFilter.
This combination will remove the multi-word entries in your syn.txt.
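
Putting the two filters together, a complete field type might look like this
sketch (the type name is illustrative; the file names match the examples
above):

<fieldType name="text_phrasestop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>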

Hope this helps.



RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi
For 8-CPU load-stress testing of Tomcat you are probably making a mistake:
- you should start the load-stress software and wait 5-30 minutes (depending on
index size) BEFORE taking measurements.

1. The JVM HotSpot compiler needs to compile everything into native code
2. The Tomcat thread pool needs to warm up
3. The SOLR caches need to warm up(!)
Etc.

8 parallel requests are too few for a default Tomcat; it uses 150 threads
(the default for old versions) and the new Concurrent package from Java 5.
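
For reference, that worker pool is configured on the HTTP connector in
Tomcat's server.xml, roughly like this sketch (values illustrative):

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150" acceptCount="100"
           connectionTimeout="20000"/>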

You should not test manually; use software such as The Grinder. Also note:
there is a difference between mean time and response time, and between average
(successful) requests per second and average response time...

 Tomcat is serializing the requests
- doesn't mean anything for performance... yes, it has a dedicated Listener on
a dedicated port dispatching requests to worker threads... and the LAN NIC card
serializes everything too...



Fuad Efendi
http://www.linkedin.com/in/liferay



 -Original Message-
 From: Michael [mailto:solrco...@gmail.com]
 Sent: September-22-09 4:04 PM
 To: solr-user@lucene.apache.org
 Subject: Parallel requests to Tomcat
 
 Hi,
 I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just tried
 sending parallel requests to it and measuring response time.  I would
expect
 that it could handle up to 8 parallel requests without significant
slowdown
 of any individual request.
 
 Instead, I found that Tomcat is serializing the requests.
 
 For example, the response time for each of 2 parallel requests is nearly 2
 times that for a single request, and the time for each of 8 parallel
 requests is about 4 times that of a single request.
 
 I am pretty sure this is a Tomcat issue, for when I started 8 identical
 instances of Solr+Tomcat on the machine (on 8 different ports), I could
send
 one request to each in parallel with only a 20% slowdown (compared to 300%
 in a single Tomcat.)
 
 I'm using the stock Tomcat download with minimal configuration changes,
 except that I disabled all logging (in case the logger was blocking for
each
 request, serializing them.)  I'm giving 2G RAM to each JVM.
 
 Does anyone more familiar with Tomcat know what's wrong?  I can't imagine
 that Tomcat really can't handle parallel requests.




Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
I'm using a Solr 1.4 nightly from around July.  Is that recent enough to
have the improved reader implementation?
I'm not sure whether you'd call my operations IO heavy -- each query has so
many terms (~50) that even against a 45K document index a query takes 130ms,
but the entire index is in a ramfs.
- Michael

On Tue, Sep 22, 2009 at 8:08 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 What version of Solr are you using?
 Solr1.3 and Lucene 2.4 defaulted to an index reader implementation
 that had to synchronize, so search operations that are IO heavy
 can't proceed in parallel.  You shouldn't see this with 1.4

 -Yonik
 http://www.lucidimagination.com



 On Tue, Sep 22, 2009 at 4:03 PM, Michael solrco...@gmail.com wrote:
  Hi,
  I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just tried
  sending parallel requests to it and measuring response time.  I would
 expect
  that it could handle up to 8 parallel requests without significant
 slowdown
  of any individual request.
 
  Instead, I found that Tomcat is serializing the requests.
 
  For example, the response time for each of 2 parallel requests is nearly
 2
  times that for a single request, and the time for each of 8 parallel
  requests is about 4 times that of a single request.
 
  I am pretty sure this is a Tomcat issue, for when I started 8 identical
  instances of Solr+Tomcat on the machine (on 8 different ports), I could
 send
  one request to each in parallel with only a 20% slowdown (compared to
 300%
  in a single Tomcat.)
 
  I'm using the stock Tomcat download with minimal configuration changes,
  except that I disabled all logging (in case the logger was blocking for
 each
  request, serializing them.)  I'm giving 2G RAM to each JVM.
 
  Does anyone more familiar with Tomcat know what's wrong?  I can't imagine
  that Tomcat really can't handle parallel requests.
 



RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi

I have 0-15ms for 50M (million) docs, Tomcat, 8-CPU:
http://www.tokenizer.org
==

- something obviously wrong in your case, 130ms is too high. Is it dedicated
server? Disk swapping? Etc.



 -Original Message-
 From: Michael [mailto:solrco...@gmail.com]
 Sent: September-23-09 11:17 AM
 To: solr-user@lucene.apache.org; yo...@lucidimagination.com
 Subject: Re: Parallel requests to Tomcat
 
 I'm using a Solr 1.4 nightly from around July.  Is that recent enough to
 have the improved reader implementation?
 I'm not sure whether you'd call my operations IO heavy -- each query has
so
 many terms (~50) that even against a 45K document index a query takes
130ms,
 but the entire index is in a ramfs.
 - Michael
 
 On Tue, Sep 22, 2009 at 8:08 PM, Yonik Seeley
 yo...@lucidimagination.comwrote:
 
  What version of Solr are you using?
  Solr1.3 and Lucene 2.4 defaulted to an index reader implementation
  that had to synchronize, so search operations that are IO heavy
  can't proceed in parallel.  You shouldn't see this with 1.4
 
  -Yonik
  http://www.lucidimagination.com
 
 
 
  On Tue, Sep 22, 2009 at 4:03 PM, Michael solrco...@gmail.com wrote:
   Hi,
   I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just
tried
   sending parallel requests to it and measuring response time.  I would
  expect
   that it could handle up to 8 parallel requests without significant
  slowdown
   of any individual request.
  
   Instead, I found that Tomcat is serializing the requests.
  
   For example, the response time for each of 2 parallel requests is
nearly
  2
   times that for a single request, and the time for each of 8 parallel
   requests is about 4 times that of a single request.
  
   I am pretty sure this is a Tomcat issue, for when I started 8
identical
   instances of Solr+Tomcat on the machine (on 8 different ports), I
could
  send
   one request to each in parallel with only a 20% slowdown (compared to
  300%
   in a single Tomcat.)
  
   I'm using the stock Tomcat download with minimal configuration
changes,
   except that I disabled all logging (in case the logger was blocking
for
  each
   request, serializing them.)  I'm giving 2G RAM to each JVM.
  
   Does anyone more familiar with Tomcat know what's wrong?  I can't
imagine
   that Tomcat really can't handle parallel requests.
  
 




Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
Hi Fuad, thanks for the reply.
My queries are heavy enough that the difference in performance is obvious.
 I am using a home-grown load testing script that sends 1000 realistic
queries to the server and takes the average response time.  My index is on a
ramfs which I've shown makes the QR and doc caches unnecessary; I am warming
up the filter and fieldvalue caches before beginning the test.  There's no
appreciable difference between query times at the beginning, middle, or end
of the test, so I can't blame the hotspot or the Tomcat thread pool for not
being warmed up.

The queries I'm using are complex enough that they take a long time to run.
 8 queries against 1 Tomcat average 600ms per query, while 8 queries against
8 Tomcats average 190ms per query (on a dedicated 8 CPU server w 32G RAM).
 I don't see how to interpret these numbers except that Tomcat is not
multithreading as well as it should :)

Your thoughts?
Michael


On Wed, Sep 23, 2009 at 10:48 AM, Fuad Efendi f...@efendi.ca wrote:

 For 8-CPU load-stress testing of Tomcat you are probably making mistake:
 - you should execute load-stress software and wait 5-30 minutes (depends on
 index size) BEFORE taking measurements.

 1. JVM HotSpot need to compile everything into native code
 2. Tomcat Thread Pool needs warm up
 3. SOLR caches need warm up(!)
 And etc.

 8 parallel requests are too small for default Tomcat; it uses 150 threads
 (default for old versions), and new Concurrent package from Java 5

 You should not test manually; use software such as The Grinder etc., also
 note please: there is difference between mean time and response time,
 between average (successful) requests per second and average response
 time...

  Tomcat is serializing the requests
 - doesn't mean anything for performance... yes, it has dedicated Listener
 on
 dedicated port dispatching requests to worker threads... and LAN NIC card
 serializes everything too...



 Fuad Efendi
 http://www.linkedin.com/in/liferay



  -Original Message-
  From: Michael [mailto:solrco...@gmail.com]
  Sent: September-22-09 4:04 PM
  To: solr-user@lucene.apache.org
  Subject: Parallel requests to Tomcat
 
  Hi,
  I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just tried
  sending parallel requests to it and measuring response time.  I would
 expect
  that it could handle up to 8 parallel requests without significant
 slowdown
  of any individual request.
 
  Instead, I found that Tomcat is serializing the requests.
 
  For example, the response time for each of 2 parallel requests is nearly
 2
  times that for a single request, and the time for each of 8 parallel
  requests is about 4 times that of a single request.
 
  I am pretty sure this is a Tomcat issue, for when I started 8 identical
  instances of Solr+Tomcat on the machine (on 8 different ports), I could
 send
  one request to each in parallel with only a 20% slowdown (compared to
 300%
  in a single Tomcat.)
 
  I'm using the stock Tomcat download with minimal configuration changes,
  except that I disabled all logging (in case the logger was blocking for
 each
  request, serializing them.)  I'm giving 2G RAM to each JVM.
 
  Does anyone more familiar with Tomcat know what's wrong?  I can't imagine
  that Tomcat really can't handle parallel requests.





RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi

Correction: 0 - 150ms (depends on size of query results; 150ms for
non-cached (new) queries returning more than 50K docs).


 -Original Message-
 From: Fuad Efendi [mailto:f...@efendi.ca]
 Sent: September-23-09 11:26 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Parallel requests to Tomcat
 
 
 I have 0-15ms for 50M (millions docs), Tomcat, 8-CPU:
 http://www.tokenizer.org
 ==
 
 - something obviously wrong in your case, 130ms is too high. Is it
dedicated
 server? Disk swapping? Etc.
 
 
 
  -Original Message-
  From: Michael [mailto:solrco...@gmail.com]
  Sent: September-23-09 11:17 AM
  To: solr-user@lucene.apache.org; yo...@lucidimagination.com
  Subject: Re: Parallel requests to Tomcat
 
  I'm using a Solr 1.4 nightly from around July.  Is that recent enough to
  have the improved reader implementation?
  I'm not sure whether you'd call my operations IO heavy -- each query has
 so
  many terms (~50) that even against a 45K document index a query takes
 130ms,
  but the entire index is in a ramfs.
  - Michael
 
  On Tue, Sep 22, 2009 at 8:08 PM, Yonik Seeley
  yo...@lucidimagination.com wrote:
 
   What version of Solr are you using?
   Solr1.3 and Lucene 2.4 defaulted to an index reader implementation
   that had to synchronize, so search operations that are IO heavy
   can't proceed in parallel.  You shouldn't see this with 1.4
  
   -Yonik
   http://www.lucidimagination.com
  
  
  
   On Tue, Sep 22, 2009 at 4:03 PM, Michael solrco...@gmail.com wrote:
Hi,
I have a Solr+Tomcat installation on an 8 CPU Linux box, and I just
 tried
sending parallel requests to it and measuring response time.  I
would
   expect
that it could handle up to 8 parallel requests without significant
   slowdown
of any individual request.
   
Instead, I found that Tomcat is serializing the requests.
   
For example, the response time for each of 2 parallel requests is
 nearly
   2
times that for a single request, and the time for each of 8 parallel
requests is about 4 times that of a single request.
   
I am pretty sure this is a Tomcat issue, for when I started 8
 identical
instances of Solr+Tomcat on the machine (on 8 different ports), I
 could
   send
one request to each in parallel with only a 20% slowdown (compared
to
   300%
in a single Tomcat.)
   
I'm using the stock Tomcat download with minimal configuration
 changes,
except that I disabled all logging (in case the logger was blocking
 for
   each
request, serializing them.)  I'm giving 2G RAM to each JVM.
   
Does anyone more familiar with Tomcat know what's wrong?  I can't
 imagine
that Tomcat really can't handle parallel requests.
   
  
 





Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
On Wed, Sep 23, 2009 at 11:26 AM, Fuad Efendi f...@efendi.ca wrote:

 - something obviously wrong in your case, 130ms is too high. Is it
 dedicated
 server? Disk swapping? Etc.


It's that my queries are ridiculously complex.  My users are very familiar
with boolean searching, and I'm doing a lot of processing outside of Solr
that increases the query size by something like 50x.

I'm OK with the individual query time -- I can always shave terms off if I
must.  It's the difference between 1 Tomcat and 8 Tomcats that is the
problem: I'd like to be able to harness all 8 CPUs!  While my test corpus is
45K docs, my actual corpus will be 30MM, and so I'd like to get all the
performance I can out of my box.

Michael


RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi
  8 queries against 1 Tomcat average 600ms per query, while 8 queries
against
 8 Tomcats average 190ms per query (on a dedicated 8 CPU server w 32G RAM).
  I don't see how to interpret these numbers except that Tomcat is not
 multithreading as well as it should :)


Hi Michael, I think it is very natural; 8 single processes not sharing
anything are faster than 8 threads sharing something.


However, 600ms is too high.

My index is on a
ramfs which I've shown makes the QR and doc caches unnecessary;


However, SOLR is faster than pure Lucene, try SOLR caches! 

(I am not sure about the current version of SOLR/Lucene, but Lucene always used
a synchronized isDeleted() method, which causes 'serialization').




Re: Parallel requests to Tomcat

2009-09-23 Thread Yonik Seeley
On Wed, Sep 23, 2009 at 11:17 AM, Michael solrco...@gmail.com wrote:
 I'm using a Solr 1.4 nightly from around July.  Is that recent enough to
 have the improved reader implementation?
 I'm not sure whether you'd call my operations IO heavy -- each query has so
 many terms (~50) that even against a 45K document index a query takes 130ms,
 but the entire index is in a ramfs.

This could well be IO bound - lots of seeks and reads.
Perhaps try on a normal filesystem and see if you still see the
serialization - someone recently saw some funny results with tmpfs and
lucene, so it would be good to rule that out.

If you want to try and rule out tomcat, throw the webapp in jetty.

-Yonik
http://www.lucidimagination.com


Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
Hi Fuad,

On Wed, Sep 23, 2009 at 11:37 AM, Fuad Efendi f...@efendi.ca wrote:

   8 queries against 1 Tomcat average 600ms per query, while 8 queries
 against
  8 Tomcats average 190ms per query (on a dedicated 8 CPU server w 32G
 RAM).
   I don't see how to interpret these numbers except that Tomcat is not
  multithreading as well as it should :)

 Hi Michael, I think it is very natural; 8 single processes not sharing
 anything are faster than 8 threads sharing something.


8 threads sharing something may have *some* overhead versus 8 processes, but
as you say, 410ms overhead points to a different problem.


However, 600ms is too high.

 My index is on a
 ramfs which I've shown makes the QR and doc caches unnecessary;

 However, SOLR is faster than pure Lucene, try SOLR caches!


I have.  In a separate test, I verified that the caches that save disk I/O
(QR and doc) make no difference to query time, because my index is on a
ramfs.  The caches that save CPU cycles (filter and fieldvalue, because I'm
doing heavy faceting) DO help and I do have them turned on.
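
For reference, those two caches are sized in solrconfig.xml along these lines
(sizes are illustrative, and solr.LRUCache can stand in for solr.FastLRUCache):

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096"
             autowarmCount="4096"/>
<fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128"
                 showItems="32"/>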

Michael


Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
Hi Yonik,

On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley
yo...@lucidimagination.com wrote:


 This could well be IO bound - lots of seeks and reads.


If this were IO bound, wouldn't I see the same results when sending my 8
requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
querying 8 processes or 8 threads in 1 process, right?

Michael


Re: Parallel requests to Tomcat

2009-09-23 Thread Walter Underwood
This sure seems like a good time to try LucidGaze for Solr. That would  
give some Solr-specific profiling data.


http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr

wunder

On Sep 23, 2009, at 8:47 AM, Michael wrote:


Hi Yonik,

On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley
yo...@lucidimagination.com wrote:



This could well be IO bound - lots of seeks and reads.



If this were IO bound, wouldn't I see the same results when sending my 8
requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
querying 8 processes or 8 threads in 1 process, right?

Michael




Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
Thanks for the suggestion, Walter!  I've been using Gaze 1.0 for a while
now, but when I moved to a multicore approach (which was the impetus behind
all of this testing) Gaze failed to start and I had to comment it out of
solrconfig.xml to get Solr to start.  Are you aware whether Gaze is able to
work in a multicore environment?
Michael

On Wed, Sep 23, 2009 at 11:55 AM, Walter Underwood wun...@wunderwood.org wrote:

 This sure seems like a good time to try LucidGaze for Solr. That would give
 some Solr-specific profiling data.

 http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr

 wunder


 On Sep 23, 2009, at 8:47 AM, Michael wrote:

  Hi Yonik,

 On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:


 This could well be IO bound - lots of seeks and reads.


 If this were IO bound, wouldn't I see the same results when sending my 8
 requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
 querying 8 processes or 8 threads in 1 process, right?

 Michael





Re: Parallel requests to Tomcat

2009-09-23 Thread Yonik Seeley
On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote:
 Hi Yonik,

 On Wed, Sep 23, 2009 at 11:42 AM, Yonik Seeley yo...@lucidimagination.com
 wrote:

 This could well be IO bound - lots of seeks and reads.

 If this were IO bound, wouldn't I see the same results when sending my 8
 requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
 querying 8 processes or 8 threads in 1 process, right?

Right - I was thinking IO bound at the Lucene Directory level - which
synchronized in the past and led to poor concurrency.  But your Solr
version is recent enough to use the newer unsynchronized method by
default (on non-windows)

-Yonik
http://www.lucidimagination.com


Re: Parallel requests to Tomcat

2009-09-23 Thread Michael
On Wed, Sep 23, 2009 at 12:05 PM, Yonik Seeley
yo...@lucidimagination.com wrote:

 On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote:
  If this were IO bound, wouldn't I see the same results when sending my 8
  requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
  querying 8 processes or 8 threads in 1 process, right?

 Right - I was thinking IO bound at the Lucene Directory level - which
 synchronized in the past and led to poor concurrency.  Buy your Solr
 version is recent enough to use the newer unsynchronized method by
 default (on non-windows)


Ah, OK.  So it looks like comparing to Jetty is my only next step.  Although
I'm not sure what I'm going to do based on the result of that test -- if
Jetty behaves differently, then I still don't know why the heck Tomcat is
behaving badly! :)

Michael


RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi
 8 threads sharing something may have *some* overhead versus 8 processes,
but
 as you say, 410ms overhead points to a different problem.


- You have baseline (single-threaded load-stress script sending requests to
SOLR) (1-request-in-parallel, 8 requests to 8 Tomcats); 200ms looks
extremely high... only if you are not GETting more than top-1 docs in a
single query (instead of default top-10).




RE: Parallel requests to Tomcat

2009-09-23 Thread Fuad Efendi
 I'm not sure whether you'd call my operations IO heavy -- each query has
so
 many terms (~50) that even against a 45K document index a query takes
130ms,
 but the entire index is in a ramfs.



The more terms, the longer it takes to find docset intersections (belonging to
each term); something in SOLR/Lucene is still synchronized...

Try to compare with smaller 1-term queries, using different terms for the
parallel requests...




Re: Solr via ruby

2009-09-23 Thread Ian Connor
Hi,

Thanks for the discussion. We use the distributed option so I am not sure
embedded is possible.

As you also guessed, we use haproxy for load balancing and failover between
replicas of the shards so giving this up for a minor performance boost is
probably not wise.

So essentially we have: User -> HTTP Load Balancer -> Mongrel Cluster ->
Haproxy -> N x Solr Shards

and it looks like that is the standard setup for performance from what you
suggest here and most of the performance tweaks I thought of are already in
use.

Ian.

On Fri, Sep 18, 2009 at 3:09 AM, Erik Hatcher erik.hatc...@gmail.com wrote:


 On Sep 18, 2009, at 1:09 AM, rajan chandi wrote:

 We are planning to use the external Solr on tomcat for scalability
 reasons.

 We thought that EmbeddedSolrServer uses HTTP too to talk with Ruby and
 vise-versa as in acts_as_solr ruby plugin.


 EmbeddedSolrServer is a way to run Solr as an API (like Lucene) rather than
 with any web container involved at all.  In other words, only Java can use
 EmbeddedSolrServer (which means JRuby works great).

 The acts_as_solr plugin uses the solr-ruby library to communicate with
 Solr.  Under solr-ruby, it's HTTP with ruby (wt=ruby) formatted responses
 for searches, and documents being indexed get converted to Solr's XML format
 and POSTed to the Solr URL used to open the Solr::Connection

Erik




 If Ruby is not using the HTTP to talk EmbeddedSolrServer, what is it
 using?

 Thanks and Regards
 Rajan Chandi

 On Thu, Sep 17, 2009 at 9:44 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:


 On Sep 17, 2009, at 11:40 AM, Ian Connor wrote:

  Is there any support for connection pooling or a more optimized data
 exchange format?


 The solr-ruby library (as do other Solr + Ruby libraries) use the ruby
 response format and eval it.  solr-ruby supports keeping the HTTP
 connection
 alive too.

 We are looking at any further ways to optimize the solr

 queries so we can possibly make more of them in the one request.

 The JSON like format seems pretty tight but I understand when the
 distributed search takes place it uses a binary protocol instead of
 text.
 I
 wanted to know if that was available or could be available via the ruby
 library.

 Is it possible to host a local shard and skip HTTP between ruby and
 solr?


 If you use JRuby you can do some fancy stuff, like use the javabin update
 and response formats so no XML is involved, and you could also use Solr's
 EmbeddedSolrServer to avoid HTTP.   However, in practice rarely is HTTP
 the
 bottleneck and actually offers a lot of advantages, such as easy
 commodity
 load balancing and caching.

 But JRuby + Solr is a very beautiful way to go!

 If you're using MRI Ruby, though, you don't really have any options other
 than to go over HTTP. You could use json or ruby formatted responses -
 I'd
 be curious to see some performance numbers comparing those two.

  Erik






-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


Multiple DisMax Queries spanning across multiple fields

2009-09-23 Thread Kay Kay
For a particular requirement we have, we need to do a query that is a
combination of multiple dismax queries behind the scenes (using a Solr
1.4 nightly).


The DisMaxQParser, org.apache.solr.search.DisMaxQParser (details at
http://wiki.apache.org/solr/DisMaxRequestHandler), takes in the qf
parameter, applies the parser to q, and computes relevance based on
the same.


We need a case where the final query is a combination of { (q = keywords,
qf = map of field weights), (q1, qf1), (q2, qf2), ... etc. } combined by a
boolean AND over the individual queries.


Creating a custom QParser works right away, as below.



public class MultiTermDisMaxQParser extends DisMaxQParser
{
  ...

  @Override
  public Query parse() throws ParseException
  {
    // true = disable coord scoring for the top-level boolean combination
    BooleanQuery finalQuery = new BooleanQuery(true);

    // Handles the (q, qf) combination.
    Query superQuery = super.parse();
    ...
    // finalQuery adds superQuery (and the other (qN, qfN) queries) with a weight.

    return finalQuery;
  }
}


Curious to see if there is an alternate method to implement the same, or any
other suggestions on the problem itself.


ReversedWildcardFilterFactory (SOLR-1321) and KeywordTokenizerFactory

2009-09-23 Thread Ravi Kiran
Hello,
 Can ReversedWildcardFilterFactory be used with KeywordTokenizerFactory?
I get the following error; it looks like Solr expects
WhitespaceTokenizerFactory. Can anybody suggest how to rectify it? My schema
snippet is also given below. Data is extracted via OpenNLP and indexed into
Solr for performing leading-wildcard searches for faceting purposes.

ERROR

HTTP Status 500 - 
org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer;
 java.lang.AbstractMethodError: 
org.apache.solr.analysis.WhitespaceTokenizerFactory.create(Ljava/io/Reader;)Lorg/apache/lucene/analysis/Tokenizer;
 at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69) 
at 
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74) 
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
 at 
org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543) 
at 
org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:153) 
at 
org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:807)
 at 
org.apache.solr.util.SolrPluginUtils$DisjunctionMaxQueryParser.getFieldQuery(SolrPluginUtils.java:794)
 at
 org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1425) at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313) at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1241) at 
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1230) 
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:176) at 
org.apache.solr.search.DisMaxQParser.getUserQuery(DisMaxQParser.java:195) at 
org.apache.solr.search.DisMaxQParser.addMainQuery(DisMaxQParser.java:158) at 
org.apache.solr.search.DisMaxQParser.parse(DisMaxQParser.java:74) at 
org.apache.solr.search.QParser.getQuery(QParser.java:131) at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
 at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1313) at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) 
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
 at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198)
 at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:297)
 at 
org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271)
 at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202)
 at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) 
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) 
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:206) 
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) 
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) 
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) 
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:150)
 at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) 
at 
org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) 
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:571) 
at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1080) at 
org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:272) at
 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.invokeAdapter(DefaultProcessorTask.java:637)
 at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.doProcess(DefaultProcessorTask.java:568)
 at 
com.sun.enterprise.web.connector.grizzly.DefaultProcessorTask.process(DefaultProcessorTask.java:813)
 at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.executeProcessorTask(DefaultReadTask.java:341)
 at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:263)
 at 
com.sun.enterprise.web.connector.grizzly.DefaultReadTask.doTask(DefaultReadTask.java:214)
 at com.sun.enterprise.web.connector.grizzly.TaskBase.run(TaskBase.java:265) at 
com.sun.enterprise.web.connector.grizzly.ssl.SSLWorkerThread.run(SSLWorkerThread.java:106)
 

SCHEMA snippet
---

<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true"
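
For reference, a complete field type pairing KeywordTokenizerFactory with
ReversedWildcardFilterFactory might look like the following sketch (attribute
values are the stock examples, not taken from the post above); note that the
reversing filter belongs only in the index-time analyzer:

<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>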

Very big numbers

2009-09-23 Thread Jonathan Ariel
Hi! I need to index very big numbers in Solr, something like
99,999,999,999,999.99.
Right now I'm using an sdouble field type because I need to make range
queries on this field.
The problem is that the field value is being returned in scientific
notation. Is there any way to avoid that?

Thanks!
Jonathan


Re: Solr http post performance seems slow - help?

2009-09-23 Thread Dan A. Dickey
On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
...
 Our JBoss expert and I will be looking into why this might be occurring.
 Does anyone know of any JBoss related slowness with Solr?
 And does anyone have any other sort of suggestions to speed indexing
 performance?   Thanks for your help all!  I'll keep you up to date with
 further progress.

Ok, further progress... just to keep any interested parties up to date
and for the record...

I'm finding that using the example jetty setup (will be switching very
very soon to a real jetty installation) is about the fastest.  Using
several processes to send posts to Solr helps a lot, and we're seeing
about 80 posts a second this way.

We also stripped down JBoss to the bare bones and the Solr in it
is running nearly as fast - about 50 posts a second.  It was our previous
JBoss configuration that was making it appear slow for some reason.

We will be running more tests and spreading out the pre-index workload
across more machines and more processes. In our case we were seeing
the bottleneck being one machine running 18 processes.
The 2 quad core xeon system is experiencing about a 25% cpu load.
And I'm not certain, but I think this may be actually 25% of one of the 8 cores.
So, there's *lots* of room for Solr to be doing more work there.
-Dan

-- 
Dan A. Dickey | Senior Software Engineer

Savvis
10900 Hampshire Ave. S., Bloomington, MN  55438
Office: 952.852.4803 | Fax: 952.852.4951
E-mail: dan.dic...@savvis.net


java doc error local params syntax for dismax

2009-09-23 Thread Naomi Dushay

The javadoc for  DisMaxQParserPlugin states:

{!dismax qf=myfield,mytitle^2}foo creates a dismax query

but actually, that gives an error.

The correct syntax is

{!dismax qf="myfield mytitle^2"}foo

(could use single quote instead of double quote).

- Naomi

Re: java doc error local params syntax for dismax

2009-09-23 Thread Yonik Seeley
On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu wrote:
 The javadoc for  DisMaxQParserPlugin states:

 {!dismax qf=myfield,mytitle^2}foo creates a dismax query

 but actually, that gives an error.

 The correct syntax is

 {!dismax qf="myfield mytitle^2"}foo

 (could use single quote instead of double quote).

Thanks, I always forget that dismax uses space separated, not comma
separated lists.

-Yonik


Re: Solrj possible deadlock

2009-09-23 Thread pof

I had the same problem again yesterday except the process halted after about
20mins this time. 


pof wrote:
 
 Hello, I was running a batch index the other day using the Solrj
 EmbeddedSolrServer when the process abruptly froze in its tracks after
 running for about 4-5 hours and indexing ~400K documents. There were no
 document locks so it would seem likely that there was some kind of thread
 deadlock. I was hoping someone might be able to tell me some information
 about the following thread dump taken at the time:
 
 Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode):
 
 DestroyJavaVM prio=10 tid=0x9322a800 nid=0xcef waiting on condition
 [0x..0x0018a044]
java.lang.Thread.State: RUNNABLE
 
 Java2D Disposer daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait()
 [0x0311d000..0x0311def4]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
 - locked 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
 at sun.java2d.Disposer.run(Disposer.java:143)
 at java.lang.Thread.run(Thread.java:636)
 
 pool-1-thread-1 prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition
 [0x08a6a000..0x08a6b074]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x967acfd0 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1978)
 at
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
 at
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
 
 Low Memory Detector daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable
 [0x..0x]
java.lang.Thread.State: RUNNABLE
 
 CompilerThread0 daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on
 condition [0x..0x096a7af4]
java.lang.Thread.State: RUNNABLE
 
 Signal Dispatcher daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on
 condition [0x..0x]
java.lang.Thread.State: RUNNABLE
 
 Finalizer daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait()
 [0x005ca000..0x005caef4]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
 - locked 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
 at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
 
 Reference Handler daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in
 Object.wait() [0x00579000..0x00579d74]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x966e6dc8 (a java.lang.ref.Reference$Lock)
 at java.lang.Object.wait(Object.java:502)
 at
 java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
 - locked 0x966e6dc8 (a java.lang.ref.Reference$Lock)
 
 VM Thread prio=10 tid=0x09fcf800 nid=0xcf0 runnable
 
 VM Periodic Task Thread prio=10 tid=0x93a02400 nid=0xcf6 waiting on
 condition
 
 JNI global references: 1072
 
 Heap
  def new generation   total 36288K, used 23695K [0x93f1, 0x9667,
 0x9667)
   eden space 32256K,  73% used [0x93f1, 0x95633f60, 0x95e9)
   from space 4032K,   0% used [0x95e9, 0x95e9, 0x9628)
   to   space 4032K,   0% used [0x9628, 0x9628, 0x9667)
  tenured generation   total 483968K, used 72129K [0x9667, 0xb3f1,
 0xb3f1)
the space 483968K,  14% used [0x9667, 0x9ace04b8, 0x9ace0600,
 0xb3f1)
  compacting perm gen  total 23040K, used 22983K [0xb3f1, 0xb559,
 0xb7f1)
the space 23040K,  99% used [0xb3f1, 0xb5581ff8, 0xb5582000,
 0xb559)
 No shared spaces configured.
 
 Cheers. Brett.
 

-- 
View this message in context: 
http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solrj possible deadlock

2009-09-23 Thread Ryan McKinley

do you have anything custom going on?

The fact that the lock is in java2d seems suspicious...


On Sep 23, 2009, at 7:01 PM, pof wrote:



I had the same problem again yesterday except the process halted
after about 20mins this time.


pof wrote:


Hello, I was running a batch index the other day using the Solrj
EmbeddedSolrServer when the process abruptly froze in it's tracks  
after
running for about 4-5 hours and indexing ~400K documents. There  
were no
document locks so it would seem likely that there was some kind of  
thread
deadlock. I was hoping someone might be able to tell me some  
information

about the following thread dump taken at the time:

[thread dump snipped - it is the same dump quoted in full in the original
message above]

Cheers. Brett.



--
View this message in context: 
http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html
Sent from the Solr - User mailing list archive at Nabble.com.
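
As an aside, dumps like the one above are normally captured with jstack <pid>,
or by sending the JVM a SIGQUIT (kill -3 <pid>), which prints the dump to
stdout. The same information can also be pulled from inside a running
application via Thread.getAllStackTraces() - a minimal sketch, for
illustration only:

import java.util.Map;

public class DumpThreads {
    // Prints a stack trace for every live thread in this JVM -
    // a rough programmatic equivalent of jstack / kill -3.
    public static void dumpAll() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            System.out.println("\"" + e.getKey().getName()
                    + "\" state=" + e.getKey().getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("\tat " + frame);
            }
        }
    }

    public static void main(String[] args) {
        dumpAll();
    }
}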





Re: java doc error local params syntax for dismax

2009-09-23 Thread Naomi Dushay
It's not just the spaces - it's that the quotes (single or double
flavor) are required as well.



On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote:

On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu  
wrote:

The javadoc for DisMaxQParserPlugin states:

{!dismax qf=myfield,mytitle^2}foo creates a dismax query

but actually, that gives an error.

The correct syntax is

{!dismax qf="myfield mytitle^2"}foo

(could use single quote instead of double quote).


Thanks, I always forget that dismax uses space separated, not comma
separated lists.

-Yonik




Re: java doc error local params syntax for dismax

2009-09-23 Thread Yonik Seeley
On Wed, Sep 23, 2009 at 8:24 PM, Naomi Dushay ndus...@stanford.edu wrote:
 It's not just the spaces - it's that the quotes (single or double flavor) are
 required as well.

LocalParams are space delimited, so the original example would have
worked if the dismax parser accepted comma delimited fields.

-Yonik
http://www.lucidimagination.com



 On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote:

 On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu
 wrote:

 The javadoc for  DisMaxQParserPlugin states:

 {!dismax qf=myfield,mytitle^2}foo creates a dismax query

 but actually, that gives an error.

 The correct syntax is

 {!dismax qf="myfield mytitle^2"}foo

 (could use single quote instead of double quote).

 Thanks, I always forget that dismax uses space separated, not comma
 separated lists.

 -Yonik




Can solr build on top of HBase

2009-09-23 Thread 梁景明
hi, i use HBase and Solr. i now have a large data set to index, which means
the Solr index will be large, and as the data increases it will grow larger still.
so, can i set solrconfig.xml's <dataDir>/solrhome/data</dataDir> from the api
and point it at my distributed HBase data storage?
and if the index is too large, will it be slow?
thanks.


Re: solr caching problem

2009-09-23 Thread satya
Is there any way to analyze or see which documents are getting cached
by the documentCache -

  <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>



On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote:

 First of all, thanks a lot for the clarification. Is there any way to see
 how this cache works internally, what objects are being stored, and how much
 memory it is consuming, so that we can get a clear picture in mind? And how
 can we test the cache's performance?


 On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote:

  1) Then do you mean that if we delete a particular doc, it is going to be
  deleted from the cache also?

 When you delete a document and then COMMIT your changes, new caches will be
 warmed up (and prepopulated with some key-value pairs from the old
 instances), etc.:

 <!-- documentCache caches Lucene Document objects (the stored fields
      for each document). Since Lucene internal document ids are transient,
      this cache will not be autowarmed. -->
 <documentCache
     class="solr.LRUCache"
     size="512"
     initialSize="512"
     autowarmCount="0"/>

 - this one won't be 'prepopulated'.




  2) In Solr, does the cache store the entire document in memory, or only
  references to the documents?

 There are many different cache instances; the DocumentCache should store
 <ID, Document> pairs, etc.
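
For what it's worth, there is no per-document view of the documentCache in
Solr 1.3 - it is keyed on Lucene's transient internal document ids - but each
cache does expose aggregate counters (lookups, hits, hitratio, inserts,
evictions, size, warmupTime) on the admin stats page, typically at

http://localhost:8983/solr/admin/stats.jsp

(the default example port; adjust for your deployment). Watching hitratio and
evictions there before and after a load test is the usual way to judge
whether a cache's size setting is paying off.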






Re: Can solr build on top of HBase

2009-09-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
can hbase be mounted on the filesystem? Solr can only read data from a
 filesystem

On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote:
 hi,  i use hbase and solr ,now i have a large data need to index ,it means
 solr-index  will be large,
 as the data increases,it will be more larger than now.
 so  solrconfig.xml 's dataDir/solrhome/data/dataDir ,can i used it from
 api ,and point to my
 distrabuted hbase data storage,
 and if the index is too large ,will it be slow?
 thanks.




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Can we point a Solr server to index directory dynamically at runtime..

2009-09-23 Thread Silent Surfer
Hi,

Is there any way to dynamically point the Solr servers at different index/data
directories at run time?

We are generating 200 GB worth of index per day, and we want to retain the
index for approximately 1 month. So our idea is to keep the most recent 1 week
of index available at any time for the users, i.e. have a set of Solr servers
up and running that handle requests for the past 1 week of data.

But when a user tries to query data that is older than 7 days, we want to
dynamically point the existing Solr instances to the inactive/dormant indexes
and get the results.

The main intention is to limit the number of Solr slave instances and thereby
limit the number of servers required.

If the index directory and Solr instances are tightly coupled, then most of the
Solr instances would just be up and running while hardly used, as most of the
users are mainly interested in the past 1 week of data and not beyond that.

Any thoughts or any other approaches to tackle this would be greatly 
appreciated.

Thanks,
sS
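
One avenue worth investigating here (nothing in this thread confirms it) is
Solr's multicore support: since 1.3 the CoreAdmin handler can create, swap,
and unload cores at runtime, so a small pool of cores could be re-pointed at
dormant index directories on demand. A rough sketch, assuming Solr 1.4-style
CoreAdmin parameters - core names and paths are invented, and the parameter
names are worth verifying against your version:

http://localhost:8983/solr/admin/cores?action=CREATE&name=week35&instanceDir=archive&dataDir=/indexes/week35
http://localhost:8983/solr/admin/cores?action=SWAP&core=archive&other=week35
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=week35

Whether this beats simply running more (mostly idle) instances depends on how
long it takes to open one of these large indexes and warm its caches.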


  



Re: Can solr build on top of HBase

2009-09-23 Thread Amit Nithian
Would FUSE (http://wiki.apache.org/hadoop/MountableHDFS) be of use?
I wonder if you could take the data from HBase and index it into a Lucene
index stored on HDFS.

2009/9/23 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 can hbase be mounted on the filesystem? Solr can only read data from a
  filesystem

 On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote:
  hi, i use HBase and Solr. i now have a large data set to index, which means
  the Solr index will be large, and as the data increases it will grow larger still.
  so, can i set solrconfig.xml's <dataDir>/solrhome/data</dataDir> from the api
  and point it at my distributed HBase data storage?
  and if the index is too large, will it be slow?
  thanks.
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
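
An alternative to mounting anything is to sidestep the filesystem question:
scan the table with the HBase client and push documents to Solr over SolrJ.
A minimal sketch, assuming the HBase 0.20-era client API and Solr 1.3 SolrJ;
the table, column family, and field names are invented:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class HBaseToSolr {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ResultScanner scanner = table.getScanner(new Scan());
        try {
            for (Result row : scanner) {
                SolrInputDocument doc = new SolrInputDocument();
                // Use the HBase row key as the Solr unique key.
                doc.addField("id", Bytes.toString(row.getRow()));
                doc.addField("name", Bytes.toString(
                        row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
                solr.add(doc);
            }
            solr.commit();
        } finally {
            scanner.close();
        }
    }
}

The index then lives on an ordinary local filesystem, which avoids the
question of whether Lucene can read from HDFS efficiently at all.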



Re: java doc error local params syntax for dismax

2009-09-23 Thread Naomi Dushay

Okay, but

{!dismax qf='myfield mytitle^2'}foo works

{!dismax qf=myfield mytitle^2}foo does NOT work

- Naomi

On Sep 23, 2009, at 5:52 PM, Yonik Seeley wrote:

On Wed, Sep 23, 2009 at 8:24 PM, Naomi Dushay ndus...@stanford.edu  
wrote:
It's not just the spaces - it's that the quotes (single or double
flavor) are required as well.


LocalParams are space delimited, so the original example would have
worked if the dismax parser accepted comma delimited fields.

-Yonik
http://www.lucidimagination.com




On Sep 23, 2009, at 3:10 PM, Yonik Seeley wrote:


On Wed, Sep 23, 2009 at 5:59 PM, Naomi Dushay ndus...@stanford.edu
wrote:


The javadoc for  DisMaxQParserPlugin states:

{!dismax qf=myfield,mytitle^2}foo creates a dismax query

but actually, that gives an error.

The correct syntax is

{!dismax qf="myfield mytitle^2"}foo

(could use single quote instead of double quote).


Thanks, I always forget that dismax uses space separated, not comma
separated lists.

-Yonik
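
Pulling the thread's conclusion together: the qf value must be quoted (single
or double) so that the space-separated field list survives local-param
parsing. A minimal sketch with SolrJ, reusing the illustrative field names
from the examples above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxLocalParams {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Quoted qf: two fields, space separated, with a boost on the second.
        SolrQuery query = new SolrQuery("{!dismax qf='myfield mytitle^2'}foo");
        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

(SolrJ takes care of URL-encoding the query string; in a hand-built GET
request the spaces and quotes inside the local params would need to be
percent-encoded.)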