David,

Many thanks for your perfect answer.
I don't think disk I/O will be a bottleneck. There are four servers which share disks. When I ran the script on separate servers, the total upload time dropped a lot. After I saw this performance improvement, I started changing the script. I will try your suggestion.

Thanks,
Jaepyoung

On Sat, Jul 10, 2010 at 3:56 PM, David Bolen <[email protected]> wrote:
> Jaepyoung Kim <[email protected]> writes:
>
>> The current script is uploading using ftplib and it takes about 1 hour.
>> I want to change this script to use Twisted's asynchronous functions.
>> I thought that if I used asynchronous functions in Twisted like the
>> following, the file uploads would be executed in parallel.
>> But they were executed sequentially: uploading the second file starts
>> only after the first file upload completes.
>> Could you check what was wrong in my source code? Or am I wrong in my
>> understanding of asynchronous functions?
>
> I'm pretty sure you'll need separate connections to an FTP server to
> achieve parallel transfers, regardless of how you write the client.
> At least as long as you stick with regular get/put commands. So while
> using a twisted approach can enable you to manage those parallel
> streams pretty easily, you'll need distinct connections for each
> transfer and manage which file transfer is using which connection in
> your code.
>
> Essentially a store or fetch FTP operation initiates a transfer over
> the dedicated data channel, so that channel is in use until the
> transfer completes or is aborted. The data on the data channel is
> neither encapsulated nor multiplexed in any way, so you can only have
> a single transfer using the data channel at once. Passive transfers do
> create new data channels, but the FTP protocol specifically says a
> server needs to stop accepting connections and shut down any open
> connections on old passive ports once a new passive request is
> received, so you're still limited to one at a time.
>
> Thus, your callbacks for each store operation will only fire when the
> store has completed, and you'll only be able to initiate the next
> store request at that point, since it's only then that the channel to
> the server is free to transfer another file.
>
> I believe some servers have implemented custom extensions to implement
> parallel operations at a finer grained level than a file, but I don't
> think they're commonly implemented in ftp libraries (nor in servers
> commonly in use).
>
> What I'd suggest, in terms of your code, is to instantiate a pool of
> FTPClients to the same server, initiate transfers on them in parallel
> and then, as one completes, use it to pick up the next file. You'll
> need to handle the distribution of files amongst the pool of clients
> yourself.
>
> Is there any particular reason you expect this to yield an improvement
> in overall time? Unless you're transferring very large numbers of
> files that are very small compared to the bandwidth*latency of your
> network connection to the server (which doesn't sound like the case
> here), the overhead of the protocol itself will be quite small, and
> your bottleneck is either going to be the network throughput, or the
> slower of the disk I/O on either end.
>
> Neither of those bottlenecks will likely be improved by doing multiple
> transfers in parallel, and in fact your total time can worsen if the
> prior bottleneck was the disk I/O, since you'll have competing
> operations for the disks as opposed to simple sequential access. Or
> you may find that you get very marginal benefit at the expense of
> much more complicated code to maintain.
>
> You might grab an existing ftp client that supports parallel transfers
> and use it to run some tests before trying to re-implement things
> yourself. There should be several options, but for example, I believe
> FileZilla supports it under Windows, or lftp under Linux.
>
> -- David
>
>
> _______________________________________________
> Twisted-Python mailing list
> [email protected]
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
>

--
Jaepyoung Kim
(Cellular phone) 1-310-848-7774
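David's suggestion (a pool of clients to the same server, each picking up the next file as soon as its current transfer finishes) can be sketched roughly as follows. This is a minimal sketch that deliberately swaps in the standard library's `ftplib` plus worker threads instead of Twisted's `FTPClient`; the helper names `make_ftp`, `_drain_queue` and `parallel_upload`, and the pool size of 4, are hypothetical choices for illustration. The key point it shows is that each worker owns its own connection, so only one STOR uses any given data channel at a time.

```python
import ftplib
import os
import queue
from concurrent.futures import ThreadPoolExecutor


def make_ftp(host, user, password):
    """Open and log in one FTP connection (host/user/password are placeholders)."""
    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    return ftp


def _drain_queue(work, connect):
    """Worker: own one connection and keep taking the next file until none are left."""
    uploaded = []
    conn = connect()  # one control connection, hence one data channel, per worker
    try:
        while True:
            try:
                path = work.get_nowait()
            except queue.Empty:
                break
            with open(path, "rb") as f:
                # storbinary() returns only after this file's transfer completes,
                # which is exactly when this connection is free for the next file.
                conn.storbinary("STOR " + os.path.basename(path), f)
            uploaded.append(path)
    finally:
        conn.quit()
    return uploaded


def parallel_upload(paths, connect, num_connections=4):
    """Upload `paths` over up to `num_connections` separate FTP connections.

    `connect` is a zero-argument callable returning a logged-in object that has
    ftplib.FTP's storbinary() and quit() methods.
    """
    if not paths:
        return []
    work = queue.Queue()
    for path in paths:
        work.put(path)
    n = min(num_connections, len(paths))
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(_drain_queue, work, connect) for _ in range(n)]
        # result() re-raises any exception a worker hit (e.g. an ftplib error).
        return [p for fut in futures for p in fut.result()]


# Hypothetical usage (server name and credentials are placeholders):
# parallel_upload(["a.bin", "b.bin"],
#                 lambda: make_ftp("ftp.example.com", "user", "secret"))
```

A shared queue is used instead of pre-assigning files to connections so that a connection that finishes a small file early immediately takes the next one. As David notes, whether this helps at all depends on where the real bottleneck is, so it is worth timing against the sequential version.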
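David's argument that per-file protocol overhead only matters for files much smaller than the bandwidth*latency product can be checked with rough arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a fixed number of round trips per file (four is a guess covering roughly PASV, STOR, opening the data connection and the final reply), ignores TCP slow start and disk speed, and uses a hypothetical 10 MB/s link with a 50 ms round-trip time.

```python
def overhead_fraction(file_size_bytes, bandwidth_bytes_per_s, rtt_s, round_trips_per_file=4):
    """Share of one file's upload time spent waiting on protocol round trips.

    round_trips_per_file is an assumed figure; real counts depend on the
    client, the server and whether passive mode is used.
    """
    transfer_s = file_size_bytes / bandwidth_bytes_per_s
    overhead_s = round_trips_per_file * rtt_s
    return overhead_s / (overhead_s + transfer_s)


def bandwidth_delay_product(bandwidth_bytes_per_s, rtt_s):
    """Bytes the link can carry during one round trip."""
    return bandwidth_bytes_per_s * rtt_s


# Hypothetical link: 10 MB/s with a 50 ms round-trip time.
BW, RTT = 10_000_000, 0.05
print(bandwidth_delay_product(BW, RTT))                    # about 500 KB
print(round(overhead_fraction(100_000_000, BW, RTT), 3))  # 100 MB file: 0.02
print(round(overhead_fraction(10_000, BW, RTT), 3))       # 10 KB file: 0.995
```

Under these assumptions a 100 MB file spends about 2% of its time on protocol chatter, so parallel connections could save little, while a 10 KB file spends over 99% of its time waiting on round trips, which is the case where overlapping transfers can pay off.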
