[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678242#action_12678242
 ] 

noble.paul edited comment on SOLR-1044 at 3/3/09 12:30 AM:
-----------------------------------------------------------

bq. Is our use of HTTP really a bottleneck?
We are limited by the servlet engine's ability to serve requests. I guess it
would easily top out at 600-800 req/sec, whereas an NIO-based system can serve
far more with lower latency (http://www.jboss.org/netty/performance.html). If
a request is served out of cache (no Lucene search involved), the only
overhead is that of HTTP itself, plus the overhead of the servlet engine.
Moreover, HTTP is not very efficient for a large volume of small
requests.
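To make the thread-per-request vs. NIO contrast concrete, here is a minimal sketch (not Solr code, just an illustration): a single selector thread multiplexing connections, the model Netty and other NIO servers are built on. The class name and the loopback client are my own, for the demo only.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

public class NioSketch {
    // One selector thread multiplexes every connection, instead of the
    // servlet engine dedicating a thread to each in-flight request.
    static String roundTrip() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Blocking client, just for the demo: send one tiny request.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

        ByteBuffer buf = ByteBuffer.allocate(64);
        String received = null;
        while (received == null) {
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    // New connection: register it with the same selector.
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    ((SocketChannel) key.channel()).read(buf);
                    buf.flip();
                    received = StandardCharsets.UTF_8.decode(buf).toString();
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        server.close();
        selector.close();
        return received;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints "ping"
    }
}
```

The point is only that no thread is parked per connection; readiness events drive the work, which is why such servers sustain higher request rates.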

bq. My feeling has been that if we go to a call mechanism, it should be based on
something more standard that will have many off the shelf bindings - perl, 
python, php, C, etc.

I agree. Hadoop RPC looked like a simple mechanism, but we can choose any of
them (Thrift, Etch, Grizzly, etc.). We would rely on them for the transport
alone; the payload would have to be our own, say xml/json/javabin. None of
them yet supports a flexible payload format.
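The transport/payload split described above could look something like this (interface and class names are hypothetical, purely to illustrate the layering; the loopback transport stands in for a real Thrift/Etch/Hadoop RPC binding):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RpcSketch {
    // The RPC layer only moves opaque bytes; Solr's own writers/parsers
    // (xml/json/javabin) stay in charge of encoding the payload.
    interface RpcTransport {
        byte[] call(String uri, Map<String, String> params, byte[] body);
    }

    // Loopback stand-in: a real transport would ship uri + params + body
    // to a remote handler and return its response bytes.
    static class LoopbackTransport implements RpcTransport {
        public byte[] call(String uri, Map<String, String> params, byte[] body) {
            return body; // echo, just to show the contract
        }
    }

    public static void main(String[] args) {
        RpcTransport t = new LoopbackTransport();
        byte[] out = t.call("/select", new HashMap<String, String>(),
                "q=*:*".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8)); // prints "q=*:*"
    }
}
```

Because the transport sees only bytes, swapping Thrift for Hadoop RPC (or keeping HTTP) would not touch the payload format at all.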

bq. That can also be a potential weakness though I think... a slow reader or 
writer for one request/response hangs up all the others.

The requests on the server are served by multiple handlers (each one a
thread). One request will not block another as long as there are enough
handlers/threads.
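A small sketch of that handler-pool behavior (my own demo code, not from Hadoop): with a fixed pool, a slow reader/writer ties up one handler while the others keep serving.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandlerPoolSketch {
    static String[] demo() throws Exception {
        // Fixed pool of handler threads, as in Hadoop RPC's handler count.
        ExecutorService handlers = Executors.newFixedThreadPool(4);

        Future<String> slow = handlers.submit(() -> {
            Thread.sleep(500);          // simulate a slow client
            return "slow";
        });
        Future<String> fast = handlers.submit(() -> "fast");

        // "fast" completes even though "slow" is still in flight,
        // because each occupies its own handler thread.
        String first = fast.get();
        String second = slow.get();
        handlers.shutdown();
        return new String[] { first, second };
    }

    public static void main(String[] args) throws Exception {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```

Starvation only appears when slow requests outnumber handlers, which is the "enough handlers/threads" caveat above.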


> Use Hadoop RPC for inter Solr communication
> -------------------------------------------
>
>                 Key: SOLR-1044
>                 URL: https://issues.apache.org/jira/browse/SOLR-1044
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if
> we use an RPC mechanism that is more lightweight/efficient.
> Hadoop RPC looks like a good candidate for this.
> The implementation should have just one protocol, and it should follow Solr's
> idiom of making remote calls: a uri + params + [optional stream(s)]. The
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in
> distributed search. Users should be able to choose between the current
> CommonsHttpSolrServer and a HadoopRpcSolrServer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.