[ https://issues.apache.org/jira/browse/IGNITE-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367398#comment-16367398 ]
Kirill Shirokov commented on IGNITE-7736:
-----------------------------------------

I think the best way to implement it would be to introduce some kind of sliding window protocol [1], like the TCP window. So item #4 in your list can be the sliding window implementation.

We send a number of packets without waiting for an ACK. This number is called the window size. Ideally:

{noformat}
window_size = ceil(network_bandwidth * (network_rtt / 2 + server_single_packet_processing_time) / packet_size)
{noformat}

Then, before sending the next packet, we wait for the ACK of the packet with id = (current_packet_id - window_size).

The network bandwidth is not directly available to us on either the client or the server side. However, we may calculate the optimal window size on the client, and keep adjusting it, from the round-trip time between sending a particular packet X and receiving the ACK for it. So we can assume that:

{noformat}
bandwidth = packet_size * (current_packet_number - packet_X_number) / (packet_X_ack_receive_time - packet_X_sending_start_time)
{noformat}

[1] http://computing.dcu.ie/~humphrys/Notes/Networks/data.sliding.html

And a few thoughts about moving data processing to separate threads:

Part of it is already implemented as part of IGNITE-7681 (see the ignite-7681-2 branch): data conversion (surprisingly, the most CPU-intensive part) and streaming are offloaded to pool threads in the manner of a sliding window.

Parsing can also be parallelized: for each packet, we skip characters up to the first line separator and start processing from there. The unprocessed heads and tails of the packets are processed later, separately for each pair of consecutive packets. However, this approach has problems with any kind of line numbering (such as an auto-incremented line number, if we are going to introduce it).
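A minimal Java sketch of the client side of this sliding window, assuming cumulative ACKs (an ACK for packet k acknowledges every packet up to k). `SlidingWindowSketch` and its methods are hypothetical names, not part of Ignite. `windowSize` and `estimateBandwidth` transcribe the two {noformat} formulas as written, and `canSend` enforces "wait for the ACK of packet current_packet_id - window_size".

```java
/**
 * Client-side sliding window for COPY packets (hypothetical sketch, not Ignite code).
 * Assumption: ACKs are cumulative, i.e. an ACK for packet k acknowledges all packets <= k.
 */
public class SlidingWindowSketch {

    /** window_size = ceil(bandwidth * (rtt / 2 + processing_time) / packet_size), at least 1. */
    static long windowSize(double bandwidthBytesPerSec, double rttSec,
                           double procTimeSec, long packetSizeBytes) {
        double packets = bandwidthBytesPerSec * (rttSec / 2 + procTimeSec) / packetSizeBytes;
        return Math.max(1, (long) Math.ceil(packets));
    }

    /**
     * bandwidth = packet_size * (current_packet_number - packet_X_number)
     *             / (packet_X_ack_receive_time - packet_X_sending_start_time)
     */
    static double estimateBandwidth(long packetSizeBytes, long currentPacketNumber, long packetXNumber,
                                    double ackReceiveTimeSec, double sendStartTimeSec) {
        double elapsed = ackReceiveTimeSec - sendStartTimeSec;
        if (elapsed <= 0)
            throw new IllegalArgumentException("ACK must arrive after the packet was sent");
        return packetSizeBytes * (double) (currentPacketNumber - packetXNumber) / elapsed;
    }

    private long window;
    private long nextPacketId = 0;   // id of the next packet to send
    private long highestAcked = -1;  // every packet id <= highestAcked has been acknowledged

    SlidingWindowSketch(long initialWindow) {
        this.window = Math.max(1, initialWindow);
    }

    /** Packet nextPacketId may go out only once packet (nextPacketId - window) is ACKed. */
    boolean canSend() {
        long mustBeAcked = nextPacketId - window;
        return mustBeAcked < 0 || highestAcked >= mustBeAcked;
    }

    /** Reserves the next packet id; the caller then writes the packet to the socket. */
    long send() {
        if (!canSend())
            throw new IllegalStateException("window is full, wait for an ACK");
        return nextPacketId++;
    }

    /** Records a cumulative ACK received from the server. */
    void onAck(long packetId) {
        highestAcked = Math.max(highestAcked, packetId);
    }

    /** Adjusts the window, e.g. after re-estimating the bandwidth. */
    void resize(long newWindow) {
        window = Math.max(1, newWindow);
    }
}
```

Shrinking the window with `resize` never recalls packets already in flight: `canSend` simply returns false until enough ACKs arrive to bring the in-flight count back under the new limit.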
> SQL COPY: streaming model for network packets instead of request-response model
> -------------------------------------------------------------------------------
>
>                 Key: IGNITE-7736
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7736
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Vladimir Ozerov
>            Priority: Major
>             Fix For: 2.5
>
>
> *Problem*
> Currently, data transfer in the COPY command is implemented as a series of request-responses. When a request is received, it is parsed synchronously and passed to the streamer, and then a response is sent. This is not a very efficient approach:
> # We can hardly utilize long fat network channels efficiently, as we spend a lot of time waiting for a very small response (ACK).
> # Parsing and adding data to the streamer take time (especially if we have reached the streamer buffer limits and are blocked waiting for responses from data nodes). During this period the network is not utilized and file data is not transferred further.
> *Solution*
> Let's fix the problem iteratively as follows:
> # Introduce asynchrony: when a request is received, send the response immediately, before data processing.
> # Then consider sending one ACK for several requests instead of an ACK for every request.
> # When multiple simultaneous requests are enabled (previous point), consider moving data processing to a separate thread, so that we can read data from the socket as fast as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
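The three *Solution* steps of IGNITE-7736 can be sketched on the server side as follows. This is a minimal Java sketch assuming in-order delivery (as over a single TCP connection) and cumulative ACKs. `BatchedAckReceiverSketch`, `ackEvery` and the `LongConsumer` ACK callback are hypothetical names, and decoding the payload into a `String` stands in for the real parsing and streamer calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/**
 * Server-side receiver for COPY packets (hypothetical sketch, not Ignite's COPY code).
 * Assumes packets arrive in order and that an ACK for packet k covers all packets <= k.
 */
public class BatchedAckReceiverSketch implements AutoCloseable {
    private final int ackEvery;            // hypothetical batching parameter: one ACK per N packets
    private final LongConsumer ackSender;  // stands in for writing an ACK frame to the socket
    // A single worker keeps lines in file order while the reading thread moves on.
    private final ExecutorService processor = Executors.newSingleThreadExecutor();
    private final List<String> processed = Collections.synchronizedList(new ArrayList<>());
    private long received = 0;             // touched only by the socket-reading thread

    BatchedAckReceiverSketch(int ackEvery, LongConsumer ackSender) {
        if (ackEvery < 1)
            throw new IllegalArgumentException("ackEvery must be >= 1");
        this.ackEvery = ackEvery;
        this.ackSender = ackSender;
    }

    /** Called by the socket-reading thread for every incoming packet. */
    void onPacket(long packetId, byte[] payload) {
        received++;
        // Steps 1 and 2: ACK before processing, and only once per ackEvery packets.
        if (received % ackEvery == 0)
            ackSender.accept(packetId);
        // Step 3: parsing/streaming runs on another thread, so this method returns at once.
        processor.submit(() -> {
            processed.add(new String(payload, StandardCharsets.UTF_8));
        });
    }

    /** Acknowledges a trailing partial batch, e.g. at the end of the file. */
    void flush(long lastPacketId) {
        if (received % ackEvery != 0)
            ackSender.accept(lastPacketId);
    }

    /** Snapshot of processed payloads (stands in for data handed to the streamer). */
    List<String> processed() {
        synchronized (processed) {
            return new ArrayList<>(processed);
        }
    }

    /** Waits for queued processing to finish. */
    @Override
    public void close() {
        processor.shutdown();
        try {
            processor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The executor's queue is unbounded, so in this sketch back-pressure has to come from the client's window; a bounded queue would block the reading thread again, which is what the third step tries to avoid.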