2012/8/9 Emmanuel Lécharny <[email protected]>:
> On 8/9/12 7:23 PM, Antonio Rodriges wrote:
>
>>>> Maybe check the CPU load on your server machine and/or run a (java)
>>>> profiler.
>>>> Except the "unusual" buffer size you have chosen (maybe you have a good
>>>> reason for that?!), I don't see any issues.
>>>
>>>
>>> I don't think that the chosen buffer size is "unusual". If you are going
>>> to transfer large amounts of data, it is quite sane to use large
>>> buffers. That being said, it all depends on what you do when you
>>> process an incoming message.
>>>
>>> Also check that the pb is not on your injector : are you sending enough
>>> messages to saturate your server ?
>>>
>> When the client receives the incoming message it issues a new server
>> query.
>
> Not sure that I understand what your client does and what your server does
> here. Can you be a bit more explicit ? For instance :
> - is your client just sending messages and waiting for a response ?
> - is your client using MINA or simple blocking IO ?
> - is your server using MINA (I'm assuming that it does, but just for
> clarity) ?
> - what exactly does the server do ? Is it processing the message in any way
> (like waiting for a third tier such as a database) ?
>
>
>>
>> We have currently 3 sizes of messages, the biggest one is 1 MB.
>
> What kind of messages are they ? Are you doing something "special" with them
> on the server ?
>
>
>> The CPU is surely not loaded. Munin live statistics show no more than
>> 400K of output traffic per 0.5 sec from the server.
>
> That's clearly awfully slow.
>
>
>> We ran some tests on
>> our network and they showed about 500 Mb/sec mean speed. Thus, the
>> network is not saturated by our server.
>
> You bet !
>
>>
>> Also it is interesting that when we increase the buffer size, the
>> number of MESSAGE_RECEIVED events at the client does not noticeably
>> change, nor does the speed increase.
>>
>>> Also check that the pb is not on your injector : are you sending enough
>>> messages to saturate your server ?
>>
>> What do you mean by "pb" and "injector"?
>
> pb = problem
> Injector = the client used to send requests to the server.
>
> At some point, if your client is waiting for a response from the server, and
> if it takes time to receive these responses, then it's likely that the
> throughput will be low.
>
> Let's do some math here. You have a 1 Gb/s network, let's say the average
> message is 1 MB. It will take roughly 1 MB * 10 / 1 Gb/s seconds to send a
> message from a client to a server, around 10 ms. In order to saturate your
> bandwidth, you will need to use 100 clients sending 1 MB messages every
> second.
>
> If you just use one client, then it will be able to send 100 messages per
> second, max. Probably less, if the server does some processing. For instance,
> if your server takes 100 ms to process a single message, then with one
> client, you will only be able to process around 9 messages per second, max.
>
> As I have no idea what your server does, it's really difficult to know where
> exactly to dig to find a solution. You need to tell us more here.
>
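The arithmetic quoted above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the factor of 10 is assumed to mean roughly 10 bits on the wire per payload byte (8 data bits plus protocol overhead), which is one reading of the "1 MB * 10" in the quoted estimate. It also assumes a single client that sends one message and waits for it to be processed before sending the next.

```java
// Hypothetical helper, not part of MINA: reproduces the back-of-the-envelope
// estimate for one client sending messages one at a time.
final class ThroughputEstimate {
    // Assumption: ~10 bits on the wire per payload byte (8 bits + overhead).
    static final double WIRE_BITS_PER_BYTE = 10.0;

    // Time in milliseconds to push one message through the link.
    static double transferMillis(double messageBytes, double linkBitsPerSecond) {
        return messageBytes * WIRE_BITS_PER_BYTE / linkBitsPerSecond * 1000.0;
    }

    // Upper bound on messages per second for a single client that waits
    // for each message to be transferred and processed before the next one.
    static double maxMessagesPerSecond(double transferMillis, double serverMillis) {
        return 1000.0 / (transferMillis + serverMillis);
    }

    public static void main(String[] args) {
        double transfer = transferMillis(1_000_000, 1_000_000_000); // 1 MB over 1 Gb/s
        System.out.printf("transfer time: %.1f ms%n", transfer);
        System.out.printf("no server work: %.1f msg/s%n",
                maxMessagesPerSecond(transfer, 0.0));
        System.out.printf("100 ms server work: %.1f msg/s%n",
                maxMessagesPerSecond(transfer, 100.0));
    }
}
```

With these inputs the transfer time comes out at about 10 ms, which gives about 100 messages per second with no server-side work and about 9 per second once 100 ms of processing is added, matching the figures in the quoted text.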
After more analysis of the statistics we have identified the bottleneck
but have not yet found a solution.
You are right that when the client waits for the response, the throughput
will be low. However, we measure all stages of the query processing,
and the transfer stage takes a long time. We synchronize the clocks between
the server, the gate (see below) and the client to within +/-5 ms. 10 clients
can generate and receive responses for about 7 queries per minute.
Both client and server use MINA.
Client:
while ( time not finished )
{
    q = generateQueryString()   // it is several bytes
    send(q)
    waitForResponse()
}   // that's all
There is a gate (also MINA-based) which simply retransfers queries to
servers and results back to clients:
Gate (has 32 nio acceptors):
    Unpack query
    Parse query
    Choose server
    Transfer request -> server
Server:
    Unpack request
    Extract data (implies disk IO)
    Create response
    Pack response
    Transfer resp -> Gate
Again Gate:
    Unpack response from server
    Repack response for client
    Transfer Gate -> Client
The overall performance should be dominated by disk IO, which
currently takes up to 200 ms. However, as I mentioned before, we measure
all stages.
The median for a stage is given to the right of each of them. The statistics
are for 10 clients and 4 MB query results. They are able to receive 7
queries per minute.
The query response reaches the client in 7237 ms on average, while
Gate -> client takes 6627 ms and the response server -> gate takes only
220 ms on average. Why does the gate not keep pace with the overall
load, when it simply retransfers the message? Maybe there is too much IO
for a single machine, or maybe MINA can be tuned?
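To put the measured stage times in perspective, here is a rough Java sketch of the same numbers. The class and method names are hypothetical. It assumes that 1 MB is 10^6 bytes, that all 10 clients receive their 4 MB responses from the gate at the same time, and that each client issues its next query only after the previous response has arrived.

```java
// Hypothetical helper, not part of MINA: turns the measured stage times
// into shares and rates so they can be compared with the network tests.
final class GateStageBreakdown {
    // Fraction of the total response time spent in one stage.
    static double shareOfTotal(double stageMillis, double totalMillis) {
        return stageMillis / totalMillis;
    }

    // Aggregate rate in megabits per second when every client receives
    // payloadMegabytes within the given stage time (assumed concurrent).
    static double megabitsPerSecond(double payloadMegabytes, double stageMillis,
                                    int concurrentClients) {
        return payloadMegabytes * 8.0 * concurrentClients / (stageMillis / 1000.0);
    }

    // Upper bound on queries per minute for one client that waits for
    // each response before sending the next query.
    static double closedLoopQueriesPerMinute(double responseMillis) {
        return 60_000.0 / responseMillis;
    }

    public static void main(String[] args) {
        System.out.printf("Gate -> client share of response time: %.1f%%%n",
                100.0 * shareOfTotal(6627, 7237));
        System.out.printf("Aggregate Gate -> client rate: %.1f Mb/s%n",
                megabitsPerSecond(4.0, 6627, 10));
        System.out.printf("Max queries per minute per client: %.1f%n",
                closedLoopQueriesPerMinute(7237));
    }
}
```

Under these assumptions the Gate -> client stage accounts for about 92% of the response time, and the aggregate rate out of the gate is roughly 48 Mb/s, about a tenth of the 500 Mb/sec measured on the network. A response time of 7237 ms also caps a single waiting client at about 8.3 queries per minute, which is in line with the reported 7 if that figure is per client.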