Re:Re: How to speed up solr search speed

2010-07-26 Thread Dennis Gearon
Isn't it always one of these four? (from most likely to least likely, 
generally)

Memory
Disk Speed
WebServer and its code
CPU.

Memory and Disk are related, as swapping occurs between them. As long as memory 
is high enough, it becomes:

Disk Speed
WebServer and its code
CPU

If the WebServer is configured to be as fast as possible, only then does the CPU 
come into play.

So normally:

1/ Put enough memory in so it doesn't swap
2/ Buy the fastest damn disk/diskArrays/SolidState/HyperDrive RamDisk/RAIDed 
HyperDrive RamDisk that you can afford.
3/ Tune your webserver code.

1 GOOD *LAPTOP* with 8-16 gig of ram (with a 64bit OS), and a single, external 
SATA HyperDrive 64Gig RamDrive is SCREAMING, way beyond most single server 
boxes you'll pay to get hosting on.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/16/10, marship mars...@126.com wrote:

 From: marship mars...@126.com
 Subject: Re:Re: How to speed up solr search speed
 To: solr-user@lucene.apache.org
 Date: Friday, July 16, 2010, 11:26 AM
 Hi. Peter. 
 
  Thanks for replying.
 
 
 Hi Scott!
 
  I am aware these cores on same server are
 interfering with each other.
 
That's not good. Try to use only one core per CPU. With more per CPU you
won't have any benefits over the single-core version, I think.
 
 I only have 2 servers, each with an 8-core CPU. Each server has 6G memory,
so I have 16 CPU cores in total. But I have 70 solr cores, so I have to
run them on my 2 servers. Based on my observation, even when a search is
processing, the CPU usage is not high. The memory usage is not high either.
Each solr (jetty) instance only consumes 40M-60M memory. My server always
has 2-3G memory available.
 
  can solr use more memory to avoid disk operation
 conflicts?
 
 Yes, only the memory you have on the machine of course.
 Are you using
 tomcat or jetty?
 
 
 I am using jetty.
  For my case, I don't think solr can work as fast
 as 100-200ms on average.
 
We have indices with a lot of entries, not as large as yours, but in the
range of X million, and have response times under 100ms.
What about testing only one core with 5-10 Mio docs? If the response
time isn't any better, maybe you need a different field config or something
else is wrong?
 
For the moment, I really don't know. I tried to use java -server -jar
start.jar to start jetty/solr. I saw that when solr starts, sometimes some
core's search for a simple keyword like "design" will take 70s, while of
course some only take 0-15ms. From my side, I do believe the harddisk access
by these cores delays each other, so finally some cores fall behind. But the
bad news for me is that the solr distributed search's speed is decided by
the slowest one. 
 
 
 
 So should I add it, or is the default (without it) ok?

Without is also okay - solr uses the default.
With 75 Mio docs it should be around 20,000, but I guess there is something
else wrong: maybe caching or the field definition. Could you post the
latter one?
 
 
 Sorry. What are you asking me to post?
 
  
 
 
 Regards,
 Peter.
 
 Hi. Peter.
 I think I am not using faceting, highlighting ... I read about them
 but don't know how to work with them. I am using the default example,
 just changing the indexed fields.
 For my case, I don't think solr can work as fast as 100-200ms on
 average. I tried some keywords on only a single solr instance. It
 sometimes takes more than 20s. I just input 4 keywords. I agree it
 depends on the keywords. But the issue is it doesn't work consistently.
 
 When 37 instances on the same server work at the same time (when a
 distributed search starts), it gets worse. I saw some solr cores
 execute very fast: 0ms, ~40ms, ~200ms. But more solr cores executed at
 ~2500ms, ~3500ms, ~6700ms, and about 5-10 solr cores need more than
 17s. I have 70 cores running. And the search speed depends on the
 SLOWEST one. Even if 69 cores run at 1ms but the last one needs 50s,
 then the distributed search speed is 50s.
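The SLOWEST-one effect can be sketched directly: a distributed query fans out to every shard, and the coordinator can only merge results after the last shard answers, so total latency tracks the maximum, not the average. A toy model using the per-core timings reported above:

```python
# Toy model: distributed Solr QTime is bounded below by the slowest shard,
# because the coordinating core waits for every shard before merging.
shard_qtimes_ms = [0, 40, 200, 2500, 3500, 6700]  # per-core times from above

aggregate_ms = max(shard_qtimes_ms)  # what the user actually waits for
average_ms = sum(shard_qtimes_ms) / len(shard_qtimes_ms)

print(aggregate_ms)        # 6700: the one slow core dominates
print(round(average_ms))   # 2157: the average looks far healthier
```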
 I am aware these cores on the same server are interfering with each other.
 As I have lots of free memory, I want to know: with that prerequisite,
 can solr use more memory to avoid disk operation conflicts?
 
  Thanks.
  Regards.
  Scott
 
 On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote:
 What do your queries look like? Do you use faceting, highlighting, ... ?
 Did you try to customize the cache?
 Setting the HashDocSet to 0.005 of all documents improved our
 search speed a lot.
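For reference, the HashDocSet setting mentioned above lives in solrconfig.xml; a sketch with illustrative values (0.005 of a 1M-doc core would be about 5,000 — check your Solr version's example solrconfig.xml for the exact placement):

```xml
<!-- In the <query> section of solrconfig.xml (values are illustrative).
     Docsets smaller than maxSize are stored as hash sets instead of
     bitsets, which is cheaper for small filter results. -->
<HashDocSet maxSize="5000" loadFactor="0.75"/>
```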
  Did you optimize the index?
 
 500ms seems to be slow for an 'average' search. I am not an expert,
 but without highlighting it should be faster than 100ms, or at least 200ms
 
  Regards,
  Peter.
 
 
  Hi.
  Thanks for replying.
 My document has many different fields (about 30 fields, 10 different
 types of documents, but these are not the point) and I have to search
 over several fields

Re:Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Peter and All.
I merged my indexes today. Now each index stores 10M documents, and I only have 
10 solr cores. 
And I used 

java -Xmx1g -jar -server start.jar
to start the jetty server.

At first I deployed them all on one server. The search speed was about 3s. Then 
I noticed from the cmd output that when a search starts, 4 of the 10 cores' 
QTime only cost about 10ms-500ms; the rest cost more, up to 2-3s. Then I put 6 
on the web server and 4 on another (the DB server, under high load most of the 
time). The search speed then went down to about 1s most of the time. 
Now most searches take about 1s. That's great. 

I watched the jetty output in the cmd windows on the web server; now when each 
search starts, I see 2 of the 6 cost 60ms-80ms and the other 4 cost 170ms-700ms. 
I do believe the bottleneck is still the hard disk, but at least the search 
speed at the moment is acceptable. Maybe I should try memdisk to see if that helps.


And for -Xmx1g: actually I only see jetty consume about 150M memory, and 
considering the index is now 10x bigger, I don't think that setting is taking 
effect. From what I googled, -Xmx enlarges the heap size; I'm not sure whether 
that helps search. I still have 3.5G memory free on the server. 

Now the issue I found is that searching with the fq argument seems to slow down the search.

Thanks All for your help and suggestions.
Thanks.
Regards.
Scott


On 2010-07-17 03:36:19, Peter Karich peat...@yahoo.de wrote:
  Each solr (jetty) instance only consumes 40M-60M memory.

 java -Xmx1024M -jar start.jar

That's a good suggestion!
Please, double check that you are using the -server version of the jvm
and the latest 1.6.0_20 or so.

Additionally you can start jvisualvm (shipped with the jdk) and hook
into jetty/tomcat easily to see the current CPU and memory load.

 But I have 70 solr cores

if you ask me: I would reduce them to 10-15 or even fewer and increase
the RAM.
Try out tomcat too.

 solr distributed search's speed is decided by the slowest one. 

so, try to reduce the cores

Regards,
Peter.

 you mentioned that you have a lot of mem free, but your jetty containers
 are only using between 40-60 MB.

 probably stating the obvious, but have you increased the -Xmx param, like for
 instance:
 java -Xmx1024M -jar start.jar

 that way you're configuring the container to use a maximum of 1024 MB of RAM
 instead of the standard, which is much lower (I'm not sure what exactly, but
 it could well be 64MB for non -server, aligning with what you're seeing)

 Geert-Jan

 2010/7/16 marship mars...@126.com

   


-- 
http://karussell.wordpress.com/



Re: Re: How to speed up solr search speed

2010-07-17 Thread Geert-Jan Brits
My query string is always simple, like "design", "principle of design",
"tom"
EG:
URL:
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on

IMO, indeed with these types of simple searches caching (and thus RAM usage)
cannot be fully exploited, i.e.: there isn't really anything to cache (no
sort-ordering or faceting (Lucene fieldcache), no documentsets for faceting
(Solr filtercache)).

The only thing that helps you here would be a big solr querycache, depending
on how often queries are repeated.
Just execute the same query twice; the second time you should see a fast
response (say < 20ms) - that's the querycache (and thus RAM) working for
you.
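The warm-vs-cold effect described above can be modeled in a few lines (a toy stand-in for the query cache, not Solr's actual implementation):

```python
import time

# Toy query cache: the first identical request pays the full search cost,
# repeats are served from the in-memory cache.
query_cache = {}

def search(q):
    if q in query_cache:          # cache hit: served from RAM
        return query_cache[q]
    time.sleep(0.05)              # stand-in for the real index search
    results = ["doc-for-" + q]    # hypothetical result list
    query_cache[q] = results
    return results

t0 = time.time(); search("design"); cold = time.time() - t0
t0 = time.time(); search("design"); warm = time.time() - t0
print(warm < cold)  # True: the repeated query skips the index entirely
```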

Now the issue I found is that searching with the fq argument seems to slow
down the search.

This doesn't align with your previous statement that you only use search
with a q-param (e.g.:
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
)
For your own sake, explain what you're trying to do, otherwise we really are
guessing in the dark.

Anyway, the fq-param lets you cache (using the Solr filtercache) individual
documentsets that can be used to efficiently intersect your resultset.
Also, the first time, caches have to be warmed (i.e.: the fq-query has to be
executed and its results saved to the cache, since there isn't anything there
yet). Only from the second time on would you start seeing improvements.

For instance:
http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on

would only show documents containing "design" when doctype=pdf (again, this is
just an example where I'm assuming that you have defined a field 'doctype').
Since the number of values of doctype would be pretty low, and it would be
used independently of other queries, this would be an excellent candidate for
the fq-param.

http://wiki.apache.org/solr/CommonQueryParameters#fq
This was a longer reply than I intended. Really think about your use-cases
first, then present some real examples of what you want to achieve, and then
we can help you in a more useful manner.

Cheers,
Geert-Jan

2010/7/17 marship mars...@126.com


Re:Re: Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Geert-Jan.
   Thanks for replying.
   I know solr has a querycache and that it improves search speed from the 
second time on. But when I talk about search speed, I don't mean the speed of 
the cache. When users search on our site, I don't want the first query to cost 
10s even if all following ones cost 0s. That is unacceptable. So I want the 
first time to be as fast as it can be, and all my speed tests only count the 
first time. 
   For fq, yes, I need it. We have 5 different types; for a general search, 
the user doesn't need to specify which type to search over, but sometimes he 
needs to search over e.g. type:product, and that's when I use fq, which I 
believe I understand correctly. Before I got today's speed, I was always 
testing the simple search ("design" etc.); back then even the simple search 
speed was not acceptable, so I didn't care how fq would perform. Today, as the 
simple search speed is acceptable, I moved on to check fq, and it sometimes is 
much slower than the simple search (slower meaning it would take more than 2s, 
maybe 10s). 

The only thing that helps you here would be a big solr querycache, depending
on how often queries are repeated.
I don't agree. I don't really care about the speed of the cache, as I know it 
is always super fast. What I want solr to do is consume as much memory as it 
can to pre-load the lucene index (maybe 50% or even 100%), so that when the 
time comes to run the first query for a keyword, it is fast. (I haven't got 
the answer to this question.)

Thanks.
Regards.




On 2010-07-17 19:30:26, Geert-Jan Brits gbr...@gmail.com wrote:

Re: How to speed up solr search speed

2010-07-17 Thread Shawn Heisey

 On 7/17/2010 3:28 AM, marship wrote:

Hi. Peter and All.
I merged my indexes today. Now each index stores 10M document. Now I only have 
10 solr cores.
And I used

java -Xmx1g -jar -server start.jar
to start the jetty server.


How big are the indexes on each of those cores? You can easily get this 
info from a URL like this (assuming the bundled Jetty and its standard 
port):


http://hostname:8983/solr/corename/admin/replication/index.jsp

If your server only has 4GB of RAM, low memory is almost guaranteed to 
be the true problem. With low ram levels, the disk cache is nearly 
useless, and high disk I/O is the symptom.


My system runs as virtual machines. I've got six static indexes each a 
little over 12GB in size (7 million rows) and an incremental index that 
gets to about 700MB (300,000 rows). I've only got one active index core 
per virtual machine, except when doing a full reindex, which is rare. 
Each static VM is allocated 2 CPUs and 9GB of memory, each incremental 
has 2 CPUs and 3GB of memory. As I'm not using VMware, the memory is not 
oversubscribed. There is a slight oversubscription of CPUs, but I've 
never seen a CPU load problem. I've got dedicated VMs for load balancing 
and for the brokers.


With a max heap of 1.5GB, that leaves over 7GB of RAM to act as disk 
cache for a 12GB index. My statistics show that each of my two broker 
cores has 185000 queries under its belt, with an average query time of 
about 185 milliseconds. If I had enough memory to fit the entire 12GB 
index into RAM, I'm sure my query times would be MUCH smaller.
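The arithmetic above, spelled out (numbers taken from this message):

```python
# RAM not claimed by the JVM heap is left to the OS disk cache, which is
# what actually keeps index files in memory.
vm_ram_gb = 9.0      # RAM allocated to each static VM
jvm_heap_gb = 1.5    # -Xmx max heap
index_gb = 12.0      # size of one static index

disk_cache_gb = vm_ram_gb - jvm_heap_gb
print(disk_cache_gb)              # 7.5 -- the "over 7GB" noted above
print(disk_cache_gb >= index_gb)  # False: the 12GB index cannot fully fit
```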


Here's a screenshot of the status page that aggregates my Solr statistics:

http://www.flickr.com/photos/52107...@n05/4801491979/sizes/l/



Re:Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Shawn.
My indexes are smaller than yours. I only store id + type in the indexes, so 
each core index is about 1-1.5GB on disk.
I don't have as many servers/VPS as you have. In my opinion, my problem is not 
CPU. If possible, I prefer to add more memory to fit the indexes on my server; 
at least memory is cheaper. And I see lots of my CPU time wasted because no 
program can fully use it. 

Is there a way to tell solr to load all indexes into memory, like the RAM 
directory in lucene? That would be blazing fast.
Btw, how do you get that status page? 

Thanks.
Regards.
Scott



On 2010-07-17 23:38:58, Shawn Heisey s...@elyograg.org wrote:



Re: How to speed up solr search speed

2010-07-17 Thread Shawn Heisey
 I don't know of a way to tell Solr to load all the indexes into 
memory, but if you were to simply read all the files at the OS level, 
that would do it. Under a unix OS, cat * > /dev/null would work. Under 
Windows, I can't think of a way to do it off the top of my head, but if 
you had Cygwin installed, you could use the Unix method. That's not 
really necessary to do, however. Just the act of running queries against 
the index will load the relevant bits into the disk cache, making 
subsequent queries go to RAM instead of disk.
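A minimal sketch of that unix warm-up (the index path is hypothetical; point INDEX_DIR at a real core's data/index directory):

```shell
# Read every file in a core's index directory once so the OS pulls the
# pages into its disk cache; later queries then hit RAM, not the disk.
INDEX_DIR="${INDEX_DIR:-/var/solr/core0/data/index}"   # hypothetical path

if [ -d "$INDEX_DIR" ]; then
    cat "$INDEX_DIR"/* > /dev/null
    echo "warmed $INDEX_DIR"
fi
```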


With 10 cores at 1.5GB each, your total index is a little bigger than 
one of my static indexes. Performance might be reasonable with 8GB of 
total RAM, if the machine is running Linux/Unix and doing nothing but 
Solr, but would be better with 12-16GB. It would be important to set up 
the Solr caches properly. Here's mine:


<filterCache
  class="solr.FastLRUCache"
  size="256"
  initialSize="64"
  autowarmCount="32"
  cleanupThread="true"/>

<queryResultCache
  class="solr.FastLRUCache"
  size="1024"
  initialSize="256"
  autowarmCount="64"
  cleanupThread="true"/>

<documentCache
  class="solr.FastLRUCache"
  size="16384"
  initialSize="4096"
  cleanupThread="true"/>

The status page is a CGI script that I wrote which queries a couple of 
Solr pages on all my VMs. It's heavily tied into the central 
configuration used by my Solr build system, so it's not directly usable 
by the masses.


Thanks,
Shawn


On 7/17/2010 10:36 AM, marship wrote:





Re:Re: How to speed up solr search speed

2010-07-16 Thread marship
...@truvo.com
 wrote:

 Is there any reason why you have to limit each instance to only 1M
 documents?
 If you could put more documents in the same core I think it would
 dramatically improve your response times.

 -Original Message-
 From: marship [mailto:mars...@126.com]
 Sent: donderdag 15 juli 2010 6:23
 To: solr-user
 Subject: How to speed up solr search speed

 Hi. All.
 I got a problem with distributed solr search. The issue is:
 I have 76M documents spread over 76 solr instances, each instance
 handling 1M documents.
 Previously I put all 76 instances on a single server, and when I tested
 I found each time it runs it will take several seconds, mostly 10-20s, to
 finish a search.
 Now I have split these instances onto 2 servers, each one with 38
 instances. The search speed is about 5-10s each time.
 10s is a bit unacceptable for me. And based on my observation, the slowness
 is caused by disk operations, as all these instances are on the same server.
 When I test each single instance, it is pretty fast, always
 ~400ms. But when I use distributed search, I find some instances say they
 need 7000+ms.
 Our server has plenty of memory free. I am thinking, is there a
 way we can make solr use more memory instead of the on-disk index, like
 loading all indexes into memory so it can speed up?

 welcome any help.
 Thanks.
 Regards.
 Scott




Re:Re:Re: How to speed up solr search speed

2010-07-16 Thread marship
Hi Tom Burton-West.

  Sorry, it looks like my email ISP filtered out your replies. I checked the 
web version of the mailing list and saw your reply.

  My query string is always simple, like "design", "principle of design", "tom"

 

EG:

URL: 
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on

Response:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">16</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">design</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="5981" start="0">
<doc>
<str name="id">product_208619</str>
</doc>

 

 

EG: 
http://localhost:7550/solr/select/?q=Principle&version=2.2&start=0&rows=10&indent=on

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">94</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">Principle</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="104" start="0">
<doc>
<str name="id">product_56926</str>
</doc>

 

As I am querying over a single core and the other cores are not querying at 
the same time, the QTime looks good.

But when I query the distributed node: (For this case, 6422ms is still not a 
bad one. Many cost ~20s)

URL: 
http://localhost:7499/solr/select/?q=the+first+world+war&version=2.2&start=0&rows=10&indent=on&debugQuery=true

Response: 

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6422</int>
<lst name="params">
<str name="debugQuery">true</str>
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">the first world war</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="4231" start="0">

 

Actually I am thinking about and testing a solution: as I believe the 
bottleneck is in the harddisk, and all our indexes add up to about 10-15G, 
what about I just add another 16G memory to my server, then use MemDisk to map 
a memory disk and put all my indexes into it? Then each time solr/jetty needs 
to load the index from the harddisk, it would be loading from memory. This 
should give solr the most throughput and avoid the harddisk access delay. I am 
testing 

But if there is a way to make solr better use our limited resources and avoid 
adding new ones, that would be great.

 

 

 

Re: Re:Re: How to speed up solr search speed

2010-07-16 Thread Geert-Jan Brits
you mentioned that you have a lot of mem free, but your jetty containers
are only using between 40-60 MB.

probably stating the obvious, but have you increased the -Xmx param, like for
instance:
java -Xmx1024M -jar start.jar

that way you're configuring the container to use a maximum of 1024 MB of RAM
instead of the standard, which is much lower (I'm not sure what exactly, but
it could well be 64MB for non -server, aligning with what you're seeing)

Geert-Jan

2010/7/16 marship mars...@126.com

 Hi Tom Burton-West.

  Sorry looks my email ISP filtered out your replies. I checked web version
 of mailing list and saw your reply.

  My query string is always simple like design, principle of design,
 tom



 EG:

 URL:
 http://localhost:7550/solr/select/?q=designversion=2.2start=0rows=10indent=on

 Response:

 response
 -
 lst name=responseHeader
 int name=status0/int
 int name=QTime16/int
 -
 lst name=params
 str name=indenton/str
 str name=start0/str
 str name=qdesign/str
 str name=version2.2/str
 str name=rows10/str
 /lst
 /lst
 -
 result name=response numFound=5981 start=0
 -
 doc
 str name=idproduct_208619/str
 /doc





 EG:
 http://localhost:7550/solr/select/?q=Principleversion=2.2start=0rows=10indent=on

 response
 -
 lst name=responseHeader
 int name=status0/int
 int name=QTime94/int
 -
 lst name=params
 str name=indenton/str
 str name=start0/str
 str name=qPrinciple/str
 str name=version2.2/str
 str name=rows10/str
 /lst
 /lst
 -
 result name=response numFound=104 start=0
 -
 doc
 str name=idproduct_56926/str
 /doc



 As I am querying over single core and other cores are not querying at same
 time. The QTime looks good.

 But when I query the distributed node: (For this case, 6422ms is still a
 not bad one. Many cost ~20s)

 URL:
 http://localhost:7499/solr/select/?q=the+first+world+warversion=2.2start=0rows=10indent=ondebugQuery=true

 Response:

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">6422</int>
 <lst name="params">
 <str name="debugQuery">true</str>
 <str name="indent">on</str>
 <str name="start">0</str>
 <str name="q">the first world war</str>
 <str name="version">2.2</str>
 <str name="rows">10</str>
 </lst>
 </lst>
 <result name="response" numFound="4231" start="0">
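(An aside on what the "distributed node" is doing: in Solr 1.x a distributed request is just the ordinary /select URL plus a shards parameter listing every core. A minimal sketch of building such a URL — the shard hosts and ports below are hypothetical, not the poster's actual 70-core layout:)

```python
from urllib.parse import urlencode

# A Solr 1.x distributed search is an ordinary /select request plus a
# "shards" parameter naming every core. Hosts/ports here are made up.
shards = ",".join("localhost:%d/solr" % port for port in range(7500, 7510))
params = urlencode({
    "q": "the first world war",
    "version": "2.2",
    "start": 0,
    "rows": 10,
    "shards": shards,
})
url = "http://localhost:7499/solr/select/?" + params
print(url)
```

The coordinating core fans the query out to every entry in shards and merges the results, which is why its QTime tracks the slowest shard.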



 Actually I am thinking about and testing a solution: as I believe the bottleneck
 is the harddisk, and all our indexes add up to about 10-15G, what if I just
 add another 16G of memory to my server, then use MemDisk to map a memory disk
 and put all my indexes on it? Then each time solr/jetty needs to load the
 index from the harddisk, it is actually loading from memory. This should give
 solr the most throughput and avoid the harddisk access delay. I am testing 

 But if there is a way to make solr better use our limited resources and
 avoid adding new ones, that would be great.
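(A cheaper variant of the RAM-disk idea above: on Linux the OS page cache keeps index files in memory once they have been read, so simply reading every index file through once — "pre-warming" — gets much of the benefit without a dedicated RAM disk. A hedged sketch; the index path is hypothetical:)

```python
import os

def warm_index(index_dir, chunk_size=1 << 20):
    """Read every file under a Lucene/Solr index directory so the OS
    page cache holds it; returns the total number of bytes read."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                while True:
                    data = f.read(chunk_size)
                    if not data:
                        break
                    total += len(data)
    return total

# Hypothetical path; run once after startup (and after large merges):
# print(warm_index("/opt/solr/core0/data/index"))
```

Unlike a RAM disk, the page cache is evicted under memory pressure, so this only helps while the index files fit comfortably in free RAM.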








Re:Re: How to speed up solr search speed

2010-07-16 Thread Dennis Gearon
Isn't it always one of these four? (from most likely to least likely, generally)

Memory (as a ceiling limit)
Disk Speed
WebServer and its code
CPU.

Memory and Disk are related, as swapping occurs between them. As long as memory 
is high enough, it becomes:

Disk Speed
WebServer and its code
CPU

If the WebServer is configured to be as fast as possible, only THEN does the CPU 
come into play.

So normally:

1/ Put enough memory in so it doesn't swap
2/ Buy the fastest damn disk/diskArrays/SolidState/HyperDrive RamDisk/RAIDed 
HyperDrive RamDisk that you can afford.
3/ Tune your webserver code.

1 moderate *LAPTOP* with 8-16 gigs of RAM (with a 64bit OS :-), and a single 
external SATA HyperDrive 64Gig RamDrive, is SCREAMING, way beyond most single 
server boxes you'll pay to get hosting on. The processor almost doesn't matter. 
Imagine what it's like with an array of those things.

Shows how much Ram and Disk slow things down.

Get going that fast and it's the Ethernet connection that slows things down. 
Good gaming boards now come stock with dual Ethernet IO, with software 
preconfigured to issue requests via both and get delivered on both. If one dies, 
it falls back to the other.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/16/10, marship mars...@126.com wrote:

 From: marship mars...@126.com
 Subject: Re:Re: How to speed up solr search speed
 To: solr-user@lucene.apache.org
 Date: Friday, July 16, 2010, 11:26 AM
 Hi. Peter. 
 
  Thanks for replying.
 
 
 Hi Scott!
 
  I am aware these cores on same server are
 interfering with each other.
 
 That's not good. Try to use only one core per CPU. With
 more per CPU you
 won't have any benefits over the single-core version, I
 think.
 
  I only have 2 servers, each with an 8-core CPU. Each server
 has 6G memory. So I have 16 CPU cores in total. But I have 70
 solr cores, so I have to run them on my 2 servers. Based on
 my observation, even when a search is processing, the CPU
 usage is not high. The memory usage is not high either. Each
 solr(jetty) instance only consumes 40M-60M memory. My server
 always has 2-3G memory available.
 
  can solr use more memory to avoid disk operation
 conflicts?
 
 Yes, only the memory you have on the machine of course.
 Are you using
 tomcat or jetty?
 
 
 I am using jetty.
  For my case, I don't think solr can work as fast
 as 100-200ms on average.
 
 We have indices with a lot of entries, not as large as
 yours, but in the
 range of X million, and have response times under
 100ms.
 What about testing only one core with 5-10 Mio docs? If
 the response
 time isn't any better maybe you need a different field
 config or sth.
 different is wrong?
 
 For the moment, I really don't know. I tried to use java
 -server -jar start.jar to start jetty/solr. I saw that when solr
 starts, sometimes a core's search for a simple keyword like
 "design" will take 70s, while of course some only take 0-15ms.
 From my side, I do believe the harddisk access by
 these cores delays each other, so finally some cores fall
 behind. And the bad news for me is that the solr distributed
 search's speed is decided by the slowest one. 
 
 
 
  So should I add it or the default(without it ) is
 ok?
 
 Without is also okay - solr uses default.
 With 75 Mio docs it should be around 20 000, but I guess
 there is sth.
 different wrong: maybe caching or field definition.
 Could you post the
 latter one?
 
 
 Sorry. What are you asking me to post?
 
  
 
 
 Regards,
 Peter.
 
  Hi. Peter.
  I think I am not using faceting, highlighting ...
 I read about them
  but don't know how to work with them. I am using
 the default example
  just change the indexed fields.
  For my case, I don't think solr can work as fast
 as 100-200ms on
  average. I tried some keywords on only single solr
 instance. It
  sometimes takes more than 20s. I just input 4
 keywords. I agree it is
  keyword concerns. But the issue is it doesn't work
 consistently.
 
  When 37 instances on same server works at same
 time (when a
  distributed search start), it goes worse, I saw
 some solr cores
  execute very fast, 0ms, ~40ms, ~200ms. But more
 solr cores executed as
  ~2500ms, ~3500ms, ~6700ms. and about 5-10 solr
 cores need more than
  17s. I have 70 cores running. And the search speed
 depends on the
  SLOWEST one. Even 69 cores can run at 1ms. but
 last one need 50s. then
  the distributed search speed is 50s.
  I am aware these cores on same server are
 interfering with each other.
  As I have lots of free memory. I want to know,
 with the prerequisite,
  can solr use more memory to avoid disk operation
 conflicts?
 
  Thanks.
  Regards.
  Scott
 
 On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote:
  How does your queries look like? Do you use
 faceting, highlighting, ... ?
  Did you try to customize the cache?
  Setting the HashDocSet to 0.005 of all
 documents

Re: How to speed up solr search speed

2010-07-16 Thread Peter Karich
  Each solr(jetty) instance only consumes 40M-60M memory.

 java -Xmx1024M -jar start.jar

That's a good suggestion!
Please, double check that you are using the -server version of the jvm
and the latest 1.6.0_20 or so.

Additionally you can start jvisualvm (shipped with the jdk) and hook
into jetty/tomcat easily to see the current CPU and memory load.

 But I have 70 solr cores

if you ask me: I would reduce them to 10-15 or even fewer and increase
the RAM.
Try out Tomcat, too.

 solr distributed search's speed is decided by the slowest one. 

so, try to reduce the cores

Regards,
Peter.

 you mentioned that you have a lot of mem free, but your jetty containers
 are only using between 40-60 MB.

 probably stating the obvious, but have you increased the -Xmx param like for
 instance:
 java -Xmx1024M -jar start.jar

 that way you're configuring the container to use a maximum of 1024 MB of RAM
 instead of the default, which is much lower (I'm not sure exactly what, but
 it could well be 64MB for non -server, aligning with what you're seeing)

 Geert-Jan



-- 
http://karussell.wordpress.com/



RE: How to speed up solr search speed

2010-07-15 Thread Fornoville, Tom
Is there any reason why you have to limit each instance to only 1M
documents?
If you could put more documents in the same core I think it would
dramatically improve your response times.

-Original Message-
From: marship [mailto:mars...@126.com] 
Sent: Thursday, 15 July 2010 6:23
To: solr-user
Subject: How to speed up solr search speed

Hi. All.
I got a problem with distributed solr search. The issue is: 
I have 76M documents spread over 76 solr instances; each instance
handles 1M documents. 
   Previously I put all 76 instances on a single server, and when I tested
I found that each time it runs it takes quite a while, mostly 10-20s, to
finish a search. 
   Now, I have split these instances across 2 servers, each one with 38
instances. The search speed is about 5-10s each time. 
10s is a bit unacceptable for me. And based on my observation, the slowness
is caused by disk operations, as all these instances are on the same server,
because when I test each single instance it is purely fast, always
~400ms. When I use distributed search, I find some instances say they need
7000+ms. 
   Our server has plenty of free memory. I am thinking: is there a
way we can make solr use more memory instead of the harddisk index, like
loading all indexes into memory so it can speed up?

welcome any help.
Thanks.
Regards.
Scott


Re:RE: How to speed up solr search speed

2010-07-15 Thread marship
Hi.
   Thanks for replying.
   My documents have many different fields (about 30 fields, 10 different types of 
documents, but these are not the point) and I have to search over several 
fields. 
   I was putting all 76M documents into several lucene indexes and used the 
default lucene.net ParaSearch to search over these indexes. That was slow, more 
than 20s.
   Then someone suggested I needed to merge all our indexes into a huge one; he 
thought lucene could handle 76M documents in one index easily. So I merged all 
the documents into a single huge index (which took me 3 days). This time the 
index folder is about 15G (I don't store info in the index, I just index it). 
The search is actually still very slow, more than 20s too, and looks slower 
than using several indexes. 
   Then I came to solr. Why I put 1M into each core is that I found when a core 
has 1M documents, the search speed is fast, ranging from 0-500ms, which is 
acceptable. I don't know how many documents per core is proper. 
   The problem is, even if I put 2M documents into each core, then I have only 
36 cores at the moment. But when our documents double in the future, the same 
issue will rise again. So I don't think saving 1M in each core is the issue. 
   The issue is I put too many cores on one server. I don't have extra servers 
to spread solr cores over. So we have to improve solr search speed some other 
way. 
   Any suggestions?

Regards.
Scott





On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote:
Is there any reason why you have to limit each instance to only 1M
documents?
If you could put more documents in the same core I think it would
dramatically improve your response times.



Re: How to speed up solr search speed

2010-07-15 Thread Peter Karich
What do your queries look like? Do you use faceting, highlighting, ...?
Did you try to customize the cache?
Setting the HashDocSet to 0.005 of all documents improves our search speed a 
lot.
Did you optimize the index?

500ms seems to be slow for an 'average' search. I am not an expert, but without 
highlighting it should be faster than 100ms, or at least 200ms

Regards,
Peter.


 Hi.
Thanks for replying.
My documents have many different fields (about 30 fields, 10 different types 
 of documents, but these are not the point) and I have to search over several 
 fields. 
I was putting all 76M documents into several lucene indexes and used the 
 default lucene.net ParaSearch to search over these indexes. That was slow, 
 more than 20s.
Then someone suggested I needed to merge all our indexes into a huge one; he 
 thought lucene could handle 76M documents in one index easily. So I merged 
 all the documents into a single huge index (which took me 3 days). This time 
 the index folder is about 15G (I don't store info in the index, I just index 
 it). The search is actually still very slow, more than 20s too, and looks 
 slower than using several indexes. 
Then I came to solr. Why I put 1M into each core is that I found when a core 
 has 1M documents, the search speed is fast, ranging from 0-500ms, which is 
 acceptable. I don't know how many documents per core is proper. 
The problem is, even if I put 2M documents into each core, then I have only 
 36 cores at the moment. But when our documents double in the future, the same 
 issue will rise again. So I don't think saving 1M in each core is the issue. 
The issue is I put too many cores on one server. I don't have extra servers 
 to spread solr cores over. So we have to improve solr search speed some 
 other way. 
Any suggestions?

 Regards.
 Scott







-- 
http://karussell.wordpress.com/



Re:Re: How to speed up solr search speed

2010-07-15 Thread marship
Hi. Peter.
  I think I am not using faceting, highlighting, ... I read about them but don't 
know how to work with them. I am using the default example, just changing the 
indexed fields.
  For my case, I don't think solr can work as fast as 100-200ms on average. I 
tried some keywords on only a single solr instance. It sometimes takes more than 
20s, and I just input 4 keywords. I agree it is partly a keyword concern. But 
the issue is it doesn't work consistently. 

   When 37 instances on the same server work at the same time (when a distributed 
search starts), it gets worse: I saw some solr cores execute very fast, 0ms, 
~40ms, ~200ms, but more solr cores executed at ~2500ms, ~3500ms, ~6700ms, and 
about 5-10 solr cores need more than 17s. I have 70 cores running, and the 
search speed depends on the SLOWEST one. Even if 69 cores run at 1ms but the 
last one needs 50s, then the distributed search speed is 50s. 
I am aware these cores on the same server are interfering with each other. As I 
have lots of free memory, I want to know: with that prerequisite, can solr use 
more memory to avoid disk operation conflicts?
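(The effect described above — the coordinating core must wait for every shard, so one straggler sets the total time — in a toy sketch using the timings reported in this thread:)

```python
# Per-shard response times (ms) as reported in this thread; a distributed
# request only returns once the slowest shard has answered, so the total
# is governed by the maximum, not the mean.
shard_times_ms = [1, 40, 200, 2500, 3500, 6700, 50000]

mean_ms = sum(shard_times_ms) / len(shard_times_ms)
total_ms = max(shard_times_ms)   # -> 50000: the 50 s straggler dominates

print("mean shard time: %.0f ms" % mean_ms)
print("distributed search time: %d ms" % total_ms)
```

This is why reducing per-shard variance (fewer cores contending for one disk) matters more here than improving the average.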

Thanks.
Regards.
Scott

On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote:
What do your queries look like? Do you use faceting, highlighting, ...?
Did you try to customize the cache?
Setting the HashDocSet to 0.005 of all documents improves our search speed a 
lot.
Did you optimize the index?

500ms seems to be slow for an 'average' search. I am not an expert, but without 
highlighting it should be faster than 100ms, or at least 200ms

Regards,
Peter.



-- 
http://karussell.wordpress.com/



Re:Re: How to speed up solr search speed

2010-07-15 Thread marship
Hi, Peter.
And I checked my example/solr/conf/solrconfig.xml. (solr 1.4)
I don't see 

<HashDocSet maxSize="3000" loadFactor="0.75"/>

in it. 
But I see it in solr website's solrconfig.xml wiki.

So should I add it, or is the default (without it) ok?


Thanks

On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote:
What do your queries look like? Do you use faceting, highlighting, ...?
Did you try to customize the cache?
Setting the HashDocSet to 0.005 of all documents improves our search speed a 
lot.
Did you optimize the index?

500ms seems to be slow for an 'average' search. I am not an expert, but without 
highlighting it should be faster than 100ms, or at least 200ms

Regards,
Peter.




-- 
http://karussell.wordpress.com/



Re: How to speed up solr search speed

2010-07-15 Thread Peter Karich
Hi Scott!

 I am aware these cores on same server are interfering with each other.

That's not good. Try to use only one core per CPU. With more per CPU you
won't have any benefits over the single-core version, I think.
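(Applying that rule of thumb to the numbers in this thread — two servers with 8 CPU cores each, 76M documents — a back-of-envelope sketch:)

```python
total_docs = 76_000_000
cpu_cores = 2 * 8            # two servers, 8 CPU cores each

solr_cores = cpu_cores       # rule of thumb: one Solr core per CPU core
docs_per_core = total_docs // solr_cores

print("%d Solr cores, about %d docs each" % (solr_cores, docs_per_core))
# -> 16 Solr cores, about 4750000 docs each
```

At ~4.75M documents per core this stays well within what a single Lucene index handles comfortably, while removing the 70-way disk contention.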

 can solr use more memory to avoid disk operation conflicts?

Yes, only the memory you have on the machine of course. Are you using
tomcat or jetty?

 For my case, I don't think solr can work as fast as 100-200ms on average.

We have indices with a lot of entries, not as large as yours but in the
range of X million, and have response times under 100ms.
What about testing only one core with 5-10 Mio docs? If the response
time isn't any better, maybe you need a different field config, or sth.
different is wrong?

 So should I add it or the default(without it ) is ok?

Without is also okay - solr uses the default.
With 75 Mio docs it should be around 20 000, but I guess there is sth.
different wrong: maybe caching or field definition. Could you post the
latter one?

Regards,
Peter.

 Hi. Peter.
 I think I am not using faceting, highlighting ... I read about them
 but don't know how to work with them. I am using the default example
 just change the indexed fields.
 For my case, I don't think solr can work as fast as 100-200ms on
 average. I tried some keywords on only single solr instance. It
 sometimes takes more than 20s. I just input 4 keywords. I agree it is
 partly a keyword concern. But the issue is it doesn't work consistently.

 When 37 instances on same server works at same time (when a
 distributed search start), it goes worse, I saw some solr cores
 execute very fast, 0ms, ~40ms, ~200ms. But more solr cores executed as
 ~2500ms, ~3500ms, ~6700ms. and about 5-10 solr cores need more than
 17s. I have 70 cores running. And the search speed depends on the
 SLOWEST one. Even if 69 cores run at 1ms but the last one needs 50s, then
 the distributed search speed is 50s.
 I am aware these cores on same server are interfering with each other.
 As I have lots of free memory. I want to know, with the prerequisite,
 can solr use more memory to avoid disk operation conflicts?

 Thanks.
 Regards.
 Scott


How to speed up solr search speed

2010-07-14 Thread marship
Hi. All.
I got a problem with distributed solr search. The issue is: 
I have 76M documents spread over 76 solr instances; each instance handles 
1M documents. 
   Previously I put all 76 instances on a single server, and when I tested I 
found that each time it runs it takes quite a while, mostly 10-20s, to finish a 
search. 
   Now, I have split these instances across 2 servers, each one with 38 
instances. The search speed is about 5-10s each time. 
10s is a bit unacceptable for me. And based on my observation, the slowness is 
caused by disk operations, as all these instances are on the same server, 
because when I test each single instance it is purely fast, always ~400ms. When 
I use distributed search, I find some instances say they need 7000+ms. 
   Our server has plenty of free memory. I am thinking: is there a way we 
can make solr use more memory instead of the harddisk index, like loading all 
indexes into memory so it can speed up?

welcome any help.
Thanks.
Regards.
Scott