[ https://issues.apache.org/jira/browse/IGNITE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367398#comment-16367398 ]

Kirill Shirokov commented on IGNITE-7736:
-----------------------------------------

I think the best way to implement it would be to introduce a sliding window
protocol [1], similar to the TCP window. Item #4 in your list could then be
the sliding window implementation.

We send a number of packets without waiting for an ACK. This number is called
the window size. Ideally:

{noformat}
window_size = ceil(network_bandwidth * (network_rtt / 2 + server_single_packet_processing_time) / packet_size)
{noformat}
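A minimal Java sketch of the formula above. The class and method names are hypothetical, and the inputs are assumed to be already known in consistent units (bytes and seconds):

```java
// Hypothetical helper for the ideal window size formula.
final class SlidingWindowMath {
    private SlidingWindowMath() {
    }

    /**
     * window_size = ceil(bandwidth * (rtt / 2 + server_processing_time) / packet_size)
     *
     * @param bandwidthBytesPerSec network bandwidth, bytes per second
     * @param rttSec               network round-trip time, seconds
     * @param serverProcSec        server time to process one packet, seconds
     * @param packetSizeBytes      packet size, bytes
     * @return number of packets that may be sent without waiting for an ACK (at least 1)
     */
    static int windowSize(double bandwidthBytesPerSec, double rttSec,
                          double serverProcSec, int packetSizeBytes) {
        // Bytes that "fit" on the wire while the first packet travels and is processed.
        double bytesInFlight = bandwidthBytesPerSec * (rttSec / 2 + serverProcSec);
        // Never go below 1, otherwise the sender could never send anything.
        return Math.max(1, (int) Math.ceil(bytesInFlight / packetSizeBytes));
    }
}
```

For example, a 1 Gbit/s link (about 125,000,000 bytes/s) with a 10 ms RTT, 1 ms of server processing per packet and 64 KiB packets gives windowSize(125_000_000, 0.01, 0.001, 65536) = 12 packets in flight.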

Then, before sending the next packet, we wait for the ACK of the packet with
id = (current_packet_id - window_size).
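On the client, this waiting rule could look roughly like the sketch below. It assumes cumulative ACKs (an ACK for packet N also confirms every earlier packet) delivered by a separate reader thread; `WindowedSender`, `maySend` and the `transport` callback are hypothetical names:

```java
import java.util.function.IntConsumer;

// Hypothetical client-side sender: at most windowSize packets are un-ACKed at any time.
final class WindowedSender {
    private final int windowSize;
    // Highest packet id ACKed so far; ACKs are assumed to be cumulative.
    private int lastAckedId = -1;

    WindowedSender(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Called by the network reader thread when an ACK for packetId arrives. */
    synchronized void onAck(int packetId) {
        if (packetId > lastAckedId) {
            lastAckedId = packetId;
            notifyAll(); // wake up a sender blocked in send()
        }
    }

    /** True if packet id may be sent now, i.e. packet (id - windowSize) is ACKed. */
    synchronized boolean maySend(int id) {
        return lastAckedId >= id - windowSize;
    }

    /** Blocks until packet id fits into the window, then hands it to the transport. */
    void send(int id, IntConsumer transport) {
        synchronized (this) {
            while (!maySend(id)) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for ACK", e);
                }
            }
        }
        transport.accept(id);
    }
}
```

With a window of 3, packets 0, 1 and 2 go out immediately, and packet 3 is held back until packet 0 is ACKed.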

Since the network bandwidth is not directly available on either the client or
the server side, we can calculate and adjust the optimal window size on the
client using the round-trip time between sending a particular packet and
receiving the ACK for it. So we can assume that:

{noformat}
bandwidth = packet_size * (current_packet_number - packet_X_number) / (packet_X_ack_receive_time - packet_X_sending_start_time)
{noformat}
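A sketch of this estimate, with hypothetical names and timestamps passed in explicitly (in practice they could come from `System.nanoTime()`). The resulting bandwidth could then be fed into the window size formula:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side bandwidth estimator based on the formula above.
final class BandwidthEstimator {
    private final int packetSize;                                   // bytes per packet
    private final Map<Long, Long> sendStartNanos = new HashMap<>(); // packet id -> send start
    private long currentPacketNumber = -1;                          // id of the last packet sent

    BandwidthEstimator(int packetSize) {
        this.packetSize = packetSize;
    }

    /** Records the moment the client starts sending packet packetId. */
    void onSendStart(long packetId, long nowNanos) {
        sendStartNanos.put(packetId, nowNanos);
        currentPacketNumber = packetId;
    }

    /**
     * Called when the ACK for packet X arrives. Returns the estimated bandwidth in
     * bytes per second, or -1 if no estimate can be made.
     */
    double onAck(long packetX, long ackReceiveNanos) {
        // With cumulative ACKs, entries for packets that never get their own ACK
        // would also have to be purged here; omitted for brevity.
        Long startNanos = sendStartNanos.remove(packetX);
        if (startNanos == null || ackReceiveNanos <= startNanos)
            return -1;
        long packetsSentSince = currentPacketNumber - packetX;
        if (packetsSentSince <= 0)
            return -1; // nothing was sent while waiting, so the link was not saturated
        double seconds = (ackReceiveNanos - startNanos) / 1e9;
        return packetSize * (double) packetsSentSince / seconds;
    }
}
```

For instance, if 1000-byte packets 1..10 are sent 1 ms apart after packet 0 and the ACK for packet 0 arrives 10 ms after it was sent, the estimate is 10 * 1000 bytes / 0.01 s = 1,000,000 bytes/s.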

[1] http://computing.dcu.ie/~humphrys/Notes/Networks/data.sliding.html

And a few thoughts about moving data to separate threads:

Part of this is already implemented in IGNITE-7681 (see the ignite-7681-2
branch): data conversion (surprisingly, the most CPU-intensive part) and
streaming are handed off to pool threads in a sliding-window manner.
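Handing conversion and streaming to pool threads while bounding the number of packets in flight could be sketched with a `Semaphore`. This is not the code from the ignite-7681-2 branch; `WindowedPipeline`, `convert` and `sink` are hypothetical, and the sink is assumed to be thread-safe:

```java
import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical pipeline: at most windowSize packets are converted/streamed concurrently.
final class WindowedPipeline<P, R> {
    private final Executor pool;
    private final int windowSize;
    private final Semaphore window;
    private final Function<P, List<R>> convert; // CPU-heavy conversion, e.g. text -> typed rows
    private final Consumer<List<R>> sink;        // e.g. pushes rows to a streamer; must be thread-safe

    WindowedPipeline(Executor pool, int windowSize,
                     Function<P, List<R>> convert, Consumer<List<R>> sink) {
        this.pool = pool;
        this.windowSize = windowSize;
        this.window = new Semaphore(windowSize);
        this.convert = convert;
        this.sink = sink;
    }

    /** Called by the network thread; blocks only when windowSize packets are already in flight. */
    void submit(P packet) {
        window.acquireUninterruptibly();
        try {
            pool.execute(() -> {
                try {
                    sink.accept(convert.apply(packet));
                } finally {
                    window.release(); // free the slot even if conversion fails
                }
            });
        } catch (RejectedExecutionException e) {
            window.release();
            throw e;
        }
    }

    /** Waits until every submitted packet has been converted and streamed. */
    void drain() {
        window.acquireUninterruptibly(windowSize);
        window.release(windowSize);
    }
}
```

The network thread only blocks once the window is full, so the socket keeps being read while earlier packets are still being converted. Packets may complete in any order, so rows can reach the sink out of order.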

Parsing can also be parallelized: for each packet we skip characters up to the
first line separator and start processing from there. The unprocessed heads
and tails of the packets are processed later, separately for each pair of
consecutive packets. However, this approach conflicts with any kind of line
numbering (such as an auto-incremented line number, if we decide to
introduce one).
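A sketch of the head/tail stitching over text packets, using Java parallel streams as a stand-in for the real thread pool. `ParallelLineSplitter` is hypothetical, and `\n` is assumed to be the only line separator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: split packets into lines in parallel, then stitch packet boundaries.
final class ParallelLineSplitter {
    /** Per-packet result: complete lines plus the unprocessed head and tail fragments. */
    private static final class Piece {
        final String head;          // text before the first separator (or the whole packet)
        final List<String> lines;   // complete lines strictly inside the packet
        final String tail;          // text after the last separator
        final boolean hasSeparator; // false if the packet is a fragment of a longer line

        Piece(String head, List<String> lines, String tail, boolean hasSeparator) {
            this.head = head;
            this.lines = lines;
            this.tail = tail;
            this.hasSeparator = hasSeparator;
        }
    }

    private static Piece split(String packet) {
        int first = packet.indexOf('\n');
        if (first < 0)
            return new Piece(packet, List.of(), "", false);
        int last = packet.lastIndexOf('\n');
        List<String> lines = new ArrayList<>();
        for (int start = first + 1; start <= last; ) {
            int end = packet.indexOf('\n', start);
            lines.add(packet.substring(start, end));
            start = end + 1;
        }
        return new Piece(packet.substring(0, first), lines, packet.substring(last + 1), true);
    }

    static List<String> splitLines(List<String> packets) {
        // Parallel step: every packet skips to its first separator and parses whole lines.
        List<Piece> pieces = packets.parallelStream()
                .map(ParallelLineSplitter::split)
                .collect(Collectors.toList()); // keeps packet order
        List<String> result = new ArrayList<>();
        for (Piece piece : pieces)
            result.addAll(piece.lines);
        // Sequential step: join the tail of each packet with the head of the next one.
        StringBuilder carry = new StringBuilder();
        for (Piece piece : pieces) {
            carry.append(piece.head);
            if (piece.hasSeparator) {
                result.add(carry.toString());
                carry.setLength(0);
                carry.append(piece.tail);
            }
        }
        if (carry.length() > 0)
            result.add(carry.toString()); // last line without a trailing separator
        return result;
    }
}
```

For the packets "id,name\n1,a", "lice\n2,bob\n3," and "carol\n" the result is [2,bob, id,name, 1,alice, 3,carol]: every line is complete, but the lines are not in file order, which is exactly why line numbering becomes a problem.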

> SQL COPY: streaming model for network packets instead of request-response 
> model
> -------------------------------------------------------------------------------
>
>                 Key: IGNITE-7736
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7736
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Vladimir Ozerov
>            Priority: Major
>             Fix For: 2.5
>
>
> *Problem*
> Currently, data transfer in the COPY command is implemented as a series of 
> request-response exchanges. When a request is received, it is parsed 
> synchronously and passed to the streamer, and then a response is sent. This 
> is not a very efficient approach:
> # We can hardly utilize long fat network channels efficiently, as we spend a 
> lot of time waiting for a very small response (ack).
> # Parsing and adding data to the streamer take time (especially if we 
> reach streamer buffer limits and are blocked waiting for responses 
> from data nodes). During this period the network is not utilized and file 
> data is not transferred further.
> *Solution*
> Let's fix the problem iteratively as follows:
> # Introduce asynchrony: when a request is received, send the response 
> immediately, before processing the data
> # Then consider sending one ack for several requests instead of sending an 
> ack for every request
> # When multiple simultaneous requests are enabled (previous point), consider 
> moving data processing to a separate thread, so that we can read data from 
> the socket as fast as possible 
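Items 1 and 2 of the proposed solution could be combined on the server roughly as follows. This assumes in-order delivery over a single TCP connection, so that one cumulative ACK covers all earlier requests; `BatchingAcker` and the `ackEvery` threshold are hypothetical:

```java
import java.util.function.LongConsumer;

// Hypothetical server-side ACK batching: ACK on receipt, but only once per ackEvery requests.
final class BatchingAcker {
    private final int ackEvery;         // one cumulative ACK per this many requests
    private final LongConsumer sendAck; // sends "all requests up to this id were received"
    private long lastReceivedId = -1;
    private long lastAckedId = -1;

    BatchingAcker(int ackEvery, LongConsumer sendAck) {
        this.ackEvery = ackEvery;
        this.sendAck = sendAck;
    }

    /** Called as soon as a request is read from the socket, before it is parsed. */
    void onRequest(long id) {
        lastReceivedId = id;
        if (lastReceivedId - lastAckedId >= ackEvery)
            flush();
        // Parsing and streaming would be handed off to other threads here (not shown).
    }

    /** Acknowledges everything received so far, e.g. after the last packet of the file. */
    void flush() {
        if (lastReceivedId > lastAckedId) {
            sendAck.accept(lastReceivedId);
            lastAckedId = lastReceivedId;
        }
    }
}
```

With ackEvery = 3, requests 0..6 produce ACKs for 2 and 5, and a final flush() acknowledges 6.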



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
