Re: How does POE speed up operation in reality?
Quoting Roy M. ([email protected]):
> So far the POE::FTP components only allow me to connect to one ftp
> server at a time, how can I get the same/similar speed as 8 concurrent
> process?).
>
> In fact, this is a quite common multiple-producer and

As per usual in these cases, there is a slight misunderstanding going
on, where Rocco knows too much about POE to figure out what elementary
knowledge you are missing.

POE is pretty simple once you get your head around the fact that it is
cooperative multitasking of bundles of short events. POE::Component::FTP
is one such bundle. You can kick off this bundle of events multiple
times, and each instance will cooperate with all the other instances.
As long as they are not CPU bound, and network tasks seldom are, they
will happily retrieve all network data "concurrently".

So, just set up the component for each server, call $poe_kernel->run()
and sit back. If you need to, you can throttle back, having only 10 or
15 servers connected at the same time and adding one more for each
finished transfer.

Cheers,
--
Merijn Broeren | We take risks, we know we take them. Therefore, when things
               | come out against us, we have no cause for complaint.
               | - Scott, last journal entry, March 1912
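
[Editor's sketch] The throttling idea above (keep 10 to 15 servers busy, start one more whenever a transfer finishes) could look roughly like this. It is a minimal sketch, not a real downloader: the host names are made up, and each "transfer" is only a POE timer standing in for an FTP component instance. The function name run_throttled is an assumption for illustration.

```perl
use strict;
use warnings;
use POE;

# Run one simulated transfer per host, never more than $max at a time.
# Returns a hash ref with the number of finished transfers and the
# highest number that were in flight together.
sub run_throttled {
    my ($max, @hosts) = @_;
    my %stats = (done => 0, peak => 0);

    POE::Session->create(
        inline_states => {
            _start => sub {
                $_[HEAP]{queue}   = [@hosts];
                $_[HEAP]{running} = 0;
                $_[KERNEL]->yield('start_more');
            },
            start_more => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                while ($heap->{running} < $max and @{ $heap->{queue} }) {
                    my $host = shift @{ $heap->{queue} };
                    $heap->{running}++;
                    $stats{peak} = $heap->{running}
                        if $heap->{running} > $stats{peak};
                    # Assumption: a timer stands in for starting an FTP
                    # component on $host and waiting for it to finish.
                    $kernel->delay_set(transfer_done => 0.05 + rand(0.1), $host);
                }
            },
            transfer_done => sub {
                my ($kernel, $heap, $host) = @_[KERNEL, HEAP, ARG0];
                $heap->{running}--;
                $stats{done}++;
                $kernel->yield('start_more');   # refill the free slot
            },
        },
    );

    POE::Kernel->run();
    return \%stats;
}

# Hypothetical host names, 50 servers, at most 10 at once.
my $stats = run_throttled(10, map { "ftp$_.example" } 1 .. 50);
print "transfers: $stats->{done}, peak concurrency: $stats->{peak}\n";
```

With these inputs the run should report 50 transfers and a peak of 10. In a real program the timer would be replaced by whatever event the FTP component emits when a server's downloads are complete.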
Re: How does POE speed up operation in reality?
On Sun, May 3, 2009 at 12:17 AM, Roy M. wrote:
> hello,
>
> On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo wrote:
>> You should consult the component's
>> author directly if he doesn't respond on the mailing list.
>
> you are right, I will send to author off the list then.
>
>> I don't understand the question. Do you mean to imply that an
>> event-driven program cannot consume data from multiple producers?
>
> I think at least if you want to access blocking method in the main
> event loop, then thread is needed.
>
> e.g. in python twisted, they have threadpool
> http://twistedmatrix.com/projects/core/documentation/howto/threading.html
>
> Just wonder if have the same thing in POE.

My opinion is that you are making this problem harder than it has to
be. I've written solutions to this exact problem of collecting data
from a large number of hosts concurrently using FTP. The trivially easy
way to do it is to manage a moderate number of single-host transfer
processes. This can be done in plain Perl using Parallel::ForkManager,
or in POE using POE::Component::JobQueue and POE::Wheel::Run to launch
and monitor the FTP subprocesses. Accounting can be handled either by
instrumenting the FTP subprocesses to update some scorecard or by
letting the management process manage the scorecard itself.

JMHO
cfedde
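
[Editor's sketch] The "moderate number of single-host transfer processes" approach can be sketched with nothing but core Perl. This uses plain fork and waitpid as a stand-in for Parallel::ForkManager, so it shows the idea rather than that module's API. The run_pool function, the host names, and the fake job are all assumptions; the parent keeps the scorecard, which is the second accounting option mentioned above.

```perl
use strict;
use warnings;

$| = 1;   # avoid duplicated buffered output in forked children

# Run $job->($host) for every host in a child process, with at most
# $max_children alive at once. Returns { host => exit_code }.
sub run_pool {
    my ($max_children, $job, @hosts) = @_;
    my (%host_for_pid, %status_for);

    my $reap_one = sub {
        my $pid = waitpid(-1, 0);              # block until any child exits
        die "waitpid failed: $!" if $pid <= 0;
        my $host = delete $host_for_pid{$pid};
        $status_for{$host} = $? >> 8 if defined $host;
    };

    for my $host (@hosts) {
        $reap_one->() while keys(%host_for_pid) >= $max_children;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                       # child: one host, then exit
            exit( $job->($host) ? 0 : 1 );
        }
        $host_for_pid{$pid} = $host;           # parent: note who is running
    }
    $reap_one->() while %host_for_pid;         # wait for the stragglers
    return \%status_for;
}

# Hypothetical job: a real one would run the existing Net::FTP script
# for $host. Here one host is made to fail on purpose.
my $fetch = sub { my ($host) = @_; sleep 1; return $host ne 'ftp3.example' };

my $result = run_pool(3, $fetch, map { "ftp$_.example" } 1 .. 6);
printf "%-14s %s\n", $_, $result->{$_} ? 'FAILED' : 'ok'
    for sort keys %$result;
```

Each child is an ordinary sequential downloader, so the existing script can be reused almost unchanged; only the pool size decides how many servers are served at once.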
Re: How does POE speed up operation in reality?
On Sun, 3 May 2009 03:19:00 -0400
Rocco Caputo wrote:
> On May 3, 2009, at 02:17, Roy M. wrote:
> >
> > On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo
> > wrote:
> >> I don't understand the question. Do you mean to imply that an
> >> event-driven
> >> program cannot consume data from multiple producers?
> >
> > I think at least if you want to access blocking method in the main
> > event loop, then thread is needed.
>
> True concurrency is only needed if an operation blocks too long for
> the application.
>
> > e.g. in python twisted, they have threadpool
> > http://twistedmatrix.com/projects/core/documentation/howto/threading.html
> >
> > Just wonder if have the same thing in POE.
>
>
> Yes, Perl is Turing complete. You can write programs with it that
> use POE and true concurrency at the same time.
>
> Be aware that threads aren't the only form of concurrency at your
> disposal.
>
Maybe it is just me, but doing a for(0..$x) { [instantiate FTP component
here] } would solve this non-problem. Or am I missing something?
And I back Rocco here. You have not demonstrated that your blocking call
to the database would be significant enough to require further
engineering. Even if you made a connection to the database,
authenticated, prepared a statement, and executed that statement for
every file that you downloaded, you would /still/ not introduce enough
of a block. Largely because if you are in POE, each one of those
steps would be an individual event that would be interleaved and
executed along with everything else that is going on. See any of the
POE::Component::*DBI* modules that demonstrate non-blocking database
calls (i.e. all of them).
--
Nicholas Perez
XMPP/Email: [email protected]
http://search.cpan.org/~nperez/
http://github.com/nperez
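
[Editor's sketch] A rough illustration of the two points above: creating one session per server in a loop, and each step being its own event that is interleaved with everyone else's. The step names are invented for the example and nothing here talks to a real FTP server or database; each step just records that it ran.

```perl
use strict;
use warnings;
use POE;

# Hypothetical steps for one server; real code would do I/O in each.
my @steps = qw(connect authenticate log_to_db download);
my @trace;   # the order in which steps actually ran

for my $n (0 .. 2) {
    POE::Session->create(
        inline_states => {
            _start => sub {
                $_[HEAP]{todo} = [@steps];
                $_[KERNEL]->yield('next_step');
            },
            next_step => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                # When nothing is left, no event is queued and the
                # session stops on its own.
                my $step = shift @{ $heap->{todo} } or return;
                push @trace, "server$n:$step";
                $kernel->yield('next_step');   # give other sessions a turn
            },
        },
    );
}

POE::Kernel->run();
print "$_\n" for @trace;
```

The printed trace should show steps from the three sessions mixed together rather than one server finishing before the next begins, because every handler returns to the kernel after a single short step.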
Re: How does POE speed up operation in reality?
On May 3, 2009, at 02:17, Roy M. wrote:
> On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo wrote:
>> I don't understand the question. Do you mean to imply that an
>> event-driven program cannot consume data from multiple producers?
>
> I think at least if you want to access blocking method in the main
> event loop, then thread is needed.

True concurrency is only needed if an operation blocks too long for
the application.

> e.g. in python twisted, they have threadpool
> http://twistedmatrix.com/projects/core/documentation/howto/threading.html
>
> Just wonder if have the same thing in POE.

Yes, Perl is Turing complete. You can write programs with it that use
POE and true concurrency at the same time.

Be aware that threads aren't the only form of concurrency at your
disposal.

--
Rocco Caputo - [email protected]
Re: How does POE speed up operation in reality?
hello,

On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo wrote:
> You should consult the component's
> author directly if he doesn't respond on the mailing list.

you are right, I will send to author off the list then.

> I don't understand the question. Do you mean to imply that an
> event-driven program cannot consume data from multiple producers?

I think at least if you want to access blocking method in the main
event loop, then thread is needed.

e.g. in python twisted, they have threadpool
http://twistedmatrix.com/projects/core/documentation/howto/threading.html

Just wonder if have the same thing in POE.
Re: How does POE speed up operation in reality?
On May 3, 2009, at 01:25, Roy M. wrote:
> total bandwidth needed for all ftp sessions = 10KB x 300 files x 50 =
> 146MB
>
> as you can see, bandwidth (giga), disk, cpu should not be a problem to
> handle 146MB of data in 1 hour of time.
>
> so I am thinking 2 solutions:
>
> 1. Use thread (e.g. 8 concurrent process as my CPU is quad core SMP)
>    to parallel the worker to ftp servers.
>
> 2. Investigate other solutions such as POE and that's why I am asking
>    here: So far the POE::FTP components only allow me to connect to
>    one ftp server at a time, how can I get the same/similar speed as
>    8 concurrent process?).

Without having looked at the FTP component, I would assume either (a)
that the component supports multiple connections, or (b) that you would
instantiate the component once per connection. You should consult the
component's author directly if he doesn't respond on the mailing list.

> In fact, this is a quite common multiple-producer and
> multiple-consumer pattern in multi-threading theory. But so far the
> POE example only allow me to do thing in a event-driven way, but not
> a way to handle this pattern, or something I have missed?

I don't understand the question. Do you mean to imply that an
event-driven program cannot consume data from multiple producers?

--
Rocco Caputo - [email protected]
Re: How does POE speed up operation in reality?
Hello,

On Sun, May 3, 2009 at 1:27 AM, Rocco Caputo wrote:
> On May 2, 2009, at 05:37, Roy M. wrote:
> Time constraint = 1 hour.
> 50 servers * 300 files = 15,000 files.
> 15,000 files * 5 minutes = 75,000 minutes.

Sorry for being misleading: I meant that downloading all 300 files from
a single ftp server takes less than a few minutes, e.g. 5 minutes to be
safe. (The files are very small, most are a few KB, e.g. <10KB.)

So currently I already have a sequential Net::FTP perl script which can
handle the job for ONE ftp server every hour very nicely. The problem
arises when I add more ftp servers for the script to handle, e.g.

50 ftp servers x 5 minutes = 250 mins = about 4 hours (worst case)

total bandwidth needed for all ftp sessions = 10KB x 300 files x 50 =
146MB

As you can see, bandwidth (gigabit), disk and cpu should not be a
problem when handling 146MB of data in 1 hour.

So I am thinking of 2 solutions:

1. Use threads (e.g. 8 concurrent processes, as my CPU is quad core SMP)
   to run the workers for the ftp servers in parallel.

2. Investigate other solutions such as POE, and that's why I am asking
   here: so far the POE::FTP component only allows me to connect to one
   ftp server at a time, so how can I get the same/similar speed as 8
   concurrent processes?

In fact, this is a quite common multiple-producer and multiple-consumer
pattern in multi-threading theory. But so far the POE examples only
allow me to do things in an event-driven way, and do not show a way to
handle this pattern. Or is there something I have missed?

Thanks.
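
[Editor's sketch] The figures in the message above can be turned into a minimum concurrency. This is only the arithmetic, using the worst-case numbers as quoted (50 servers, 300 files, 10KB per file, 5 minutes per server, a one-hour window); the plan function and its field names are made up for the example.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Worst-case figures quoted in the message above.
my $servers          = 50;
my $files_per_server = 300;
my $kb_per_file      = 10;
my $mins_per_server  = 5;    # one sequential session per server
my $window_mins      = 60;   # the job repeats every hour

# Return the planning numbers as a hash reference.
sub plan {
    my $sequential_mins = $servers * $mins_per_server;
    my $total_kb        = $servers * $files_per_server * $kb_per_file;
    return {
        sequential_mins => $sequential_mins,
        total_mb        => $total_kb / 1024,
        # Fewest server sessions that must overlap to fit the window.
        min_concurrency => ceil($sequential_mins / $window_mins),
        # Average rate if the transfers fill the whole window.
        avg_kb_per_sec  => $total_kb / ($window_mins * 60),
    };
}

my $p = plan();
printf "sequential: %d min, data: %.1f MB, concurrency >= %d, avg %.1f KB/s\n",
    @{$p}{qw(sequential_mins total_mb min_concurrency avg_kb_per_sec)};
# prints: sequential: 250 min, data: 146.5 MB, concurrency >= 5, avg 41.7 KB/s
```

So on these numbers, five server sessions in flight at once are already enough to fit the hour. With the 8 concurrent workers mentioned above, the 50 servers form ceil(50/8) = 7 batches of at most 5 minutes each, about 35 minutes in the worst case.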
Re: How does POE speed up operation in reality?
On May 2, 2009, at 05:37, Roy M. wrote:
> On Sat, May 2, 2009 at 4:04 PM, Rocco Caputo wrote:
>> You seem to be seeking examples where POE isn't necessarily
>> appropriate. I recommend looking for CPU-bound problems, since
>> that's where POE doesn't directly help. Even in the CPU-bound case,
>> a subthread or child process can externalize and parallelize the
>> work. POE can continue doing non-blocking work in the meantime, and
>> it can be notified when the side thread or process is done.
>
> Thanks for your lengthy reply. I think I better explain my usage in
> more detail (sorry for not having done so before).
>
> Case:
>
> 1. I need to write a script to download files from 50 FTP servers
>    every hour.
>
> 2. For each FTP server, I need to download 100-300 files.
>
> 3. I can accept connection to each FTP server is being single
>    threaded (since the max. time for operations on a single FTP
>    server is quite short, i.e. < 5 minutes)

Time constraint = 1 hour.
50 servers * 300 files = 15,000 files.
15,000 files * 5 minutes = 75,000 minutes.

You need to fit 1,250 hours worth of file transfer into a realtime
hour. You will need to run up to 1,250 simultaneous connections.

15,000 files / 1,250 connections = 12 files per connection.
12 files * 5 minutes = 60 minutes, so that works.
1,250 connections / 50 servers = 25 simultaneous connections per server.

How large are the files, in octets? File size divided by 5 minutes =
data rate per file, in octets per minute. Multiply that by 1,250 to
determine your required network capacity.

Divide what you require by your actual network capacity to determine
how many networks you'll need. If the result is greater than one, then
you'll need to buy one or more networks. Remember to round the quotient
up to the next integer; you don't want to be half a network short.

Plan for growth.

> 4. So I tried to investigate POE, as FTP component is available.

You should explore the number of network interfaces, disk channels,
memory, and CPU cores you'll need with each technology option at your
disposal.

Divide machine capacities by resource requirements, to find the number
of machines you'll need. Use the maximum number, which will reflect
your tightest bottleneck.

If the bottleneck is network interfaces, make sure you don't use more
than your network can support.

If budget is a bottleneck, choose the technology that minimizes
hardware costs.

> 5. Currently written using simple single threaded POE program, and
>    In the `authenticated` method in POE::Component::Client::FTP, I
>    need to write the log into an external MSSQL (not MySQL, wrongly
>    typed b4), so this process should be blocking as I think.

The log seems to be per connection. You can avoid blocking by
journaling log entries to files, then spooling the journals into your
SQL server with a dedicated background process.

> Questions:
>
> a. Is POE suitable for my jobs?

Your question implies additional constraints that you haven't
mentioned. POE is suitable for the current constraints and the work as
specified, assuming that you have adequate hardware resources to
achieve the work at all.

> b. Do I need to use thread + POE in order to run 50 works together?

Threads are inappropriate for this task. You will almost certainly
need more than one machine to do this work, which implies a
multiprocess solution. Threads are redundant complexity, unless you
can show a compelling need to share memory between downloaders.

> c. Within each worker/process, I have some blocking operation in the
>    callback which prevent it from switching event more quickly, what
>    is the normal way to handle this?

You have yet to show that the blocking is significant or cannot be
worked around trivially.

You have larger worries right now. Your questions imply that you
haven't thought the problem through. If you need to hire an analyst,
just ask. I'm sure some who read this list are available.

-- Rocco Caputo - [email protected]
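
[Editor's sketch] The journaling suggestion above (append log entries to a local file, and let a separate background process spool them into the SQL server) might be sketched as follows, using only core Perl. The function names, the tab-separated format, and the $sink callback are assumptions for illustration; in a real setup the sink would perform the database insert, which is deliberately left out here.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Append one log entry as a tab-separated line. This is a short local
# write, so the event handler that calls it returns almost at once.
sub journal_entry {
    my ($path, @fields) = @_;
    open(my $fh, '>>', $path) or die "open $path: $!";
    flock($fh, LOCK_EX)       or die "lock $path: $!";
    print {$fh} join("\t", time(), @fields), "\n";
    close($fh)                or die "close $path: $!";
}

# Background side: move the journal out of the way, then hand every
# entry to $sink. Returns the number of entries processed.
sub spool_journal {
    my ($path, $sink) = @_;
    return 0 unless -e $path;
    my $batch = "$path.$$.spool";
    rename($path, $batch)     or die "rename $path: $!";
    open(my $fh, '<', $batch) or die "open $batch: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $sink->(split /\t/, $line);   # (timestamp, host, event, ...)
        $count++;
    }
    close $fh;
    unlink $batch;
    return $count;
}

# Demonstration with a throwaway directory and hypothetical hosts.
my $journal = tempdir(CLEANUP => 1) . '/ftp.journal';
journal_entry($journal, 'ftp1.example', 'authenticated');
journal_entry($journal, 'ftp2.example', 'authenticated');
my $spooled = spool_journal($journal, sub {
    my ($when, $host, $event) = @_;
    print "would insert: $host $event\n";   # stand-in for the SQL insert
});
print "spooled $spooled entries\n";
```

This leaves out details a production version would need, such as fields that contain tabs or newlines and a writer that still has the old journal open while the spooler removes it.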
Re: How does POE speed up operation in reality?
Hi,

On Sat, May 2, 2009 at 4:04 PM, Rocco Caputo wrote:
> You seem to be seeking examples where POE isn't necessarily
> appropriate. I recommend looking for CPU-bound problems, since that's
> where POE doesn't directly help. Even in the CPU-bound case, a
> subthread or child process can externalize and parallelize the work.
> POE can continue doing non-blocking work in the meantime, and it can
> be notified when the side thread or process is done.

Thanks for your lengthy reply. I think I better explain my usage in
more detail (sorry for not having done so before).

Case:

1. I need to write a script to download files from 50 FTP servers
   every hour.

2. For each FTP server, I need to download 100-300 files.

3. I can accept connection to each FTP server is being single threaded
   (since the max. time for operations on a single FTP server is quite
   short, i.e. < 5 minutes)

4. So I tried to investigate POE, as FTP component is available.

5. Currently written using simple single threaded POE program, and
   In the `authenticated` method in POE::Component::Client::FTP, I need
   to write the log into an external MSSQL (not MySQL, wrongly typed
   b4), so this process should be blocking as I think.

Questions:

a. Is POE suitable for my jobs?

b. Do I need to use thread + POE in order to run 50 works together?

c. Within each worker/process, I have some blocking operation in the
   callback which prevent it from switching event more quickly, what
   is the normal way to handle this?

Thanks.
Re: How does POE speed up operation in reality?
DBI is generally not CPU bound on the client side. Again, it's a case
where the client spends its time waiting idly, so this may be made
non-blocking. A number of components have been written to help you
execute DBI requests in the background.

A simple user lookup (presumably to authenticate a user) may take on
the order of a few hundredths of a second, possibly longer if the table
is large and poorly indexed, you mistakenly prepare() the select
statement each time, or your mysqld or network are saturated.

None of these problems are insurmountable: Upgrade your mysqld.
Upgrade your network. Properly index your table. Shard your database.

You seem to be seeking examples where POE isn't necessarily
appropriate. I recommend looking for CPU-bound problems, since that's
where POE doesn't directly help. Even in the CPU-bound case, a
subthread or child process can externalize and parallelize the work.
POE can continue doing non-blocking work in the meantime, and it can be
notified when the side thread or process is done.

--
Rocco Caputo - [email protected]

On May 2, 2009, at 03:03, Roy M. wrote:
> Hi.
>
> On Sat, May 2, 2009 at 2:56 PM, Rocco Caputo wrote:
>> Incorrect. A lot of the work of transferring a file is waiting for
>> the data to arrive. POE can wait for data to arrive in parallel
>> without threading.
>>
>> POE::Component::Client::FTP may or may not be programmed to allow
>> this, but POE can do it. It's a common misunderstanding to say "POE"
>> when you mean "a module using POE". POE's authors have little
>> control over how it's used.
>
> What if I have some blocking operation within a callback, e.g. MySQL
> query in `authenticated` method in POE::Component::Client::FTP ?
>
> I think unless I make it run as parallel, the bottleneck still exist
> as it is single threaded?
>
> Thanks.
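
[Editor's sketch] For the CPU-bound case described above, one way to push the work into a child process while POE keeps running is POE::Wheel::Run. The sketch below is an assumption about how one might wire it up, not code from the thread: the child adds up some numbers as a stand-in for heavy work, and a repeating timer shows that the parent's event loop stays responsive until the child reports back.

```perl
use strict;
use warnings;
use POE qw(Wheel::Run);

my %seen = (ticks => 0, result => undef);

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            # The child process does the heavy lifting and prints its
            # answer; the sum is a stand-in for real CPU-bound work.
            my $wheel = POE::Wheel::Run->new(
                Program => sub {
                    my $sum = 0;
                    $sum += $_ for 1 .. 5_000_000;
                    print "$sum\n";
                },
                StdoutEvent => 'child_output',
                CloseEvent  => 'child_closed',
            );
            $heap->{wheel} = $wheel;
            $kernel->sig_child($wheel->PID, 'child_exited');
            $kernel->delay(tick => 0.1);
        },
        # Other non-blocking work carries on while the child runs.
        tick => sub {
            $seen{ticks}++;
            $_[KERNEL]->delay(tick => 0.1) if $_[HEAP]{wheel};
        },
        # One line of child output arrives as ARG0.
        child_output => sub { $seen{result} = $_[ARG0] },
        # The child closed its output: drop the wheel, stop the timer.
        child_closed => sub {
            delete $_[HEAP]{wheel};
            $_[KERNEL]->delay('tick');
        },
        # The child was reaped; nothing more to do here.
        child_exited => sub { },
    },
);

POE::Kernel->run();
print "child result: $seen{result} (parent ticked $seen{ticks} times)\n";
```

The same shape applies to a blocking database call: run it in the child and treat the line it prints as the notification that the work is done.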
Re: How does POE speed up operation in reality?
Hi.

On Sat, May 2, 2009 at 2:56 PM, Rocco Caputo wrote:
> Incorrect. A lot of the work of transferring a file is waiting for
> the data to arrive. POE can wait for data to arrive in parallel
> without threading.
>
> POE::Component::Client::FTP may or may not be programmed to allow
> this, but POE can do it. It's a common misunderstanding to say "POE"
> when you mean "a module using POE". POE's authors have little control
> over how it's used.

What if I have some blocking operation within a callback, e.g. MySQL
query in `authenticated` method in POE::Component::Client::FTP ?

I think unless I make it run as parallel, the bottleneck still exist as
it is single threaded?

Thanks.
Re: How does POE speed up operation in reality?
Incorrect. A lot of the work of transferring a file is waiting for the
data to arrive. POE can wait for data to arrive in parallel without
threading.

POE::Component::Client::FTP may or may not be programmed to allow this,
but POE can do it. It's a common misunderstanding to say "POE" when you
mean "a module using POE". POE's authors have little control over how
it's used.

--
Rocco Caputo - [email protected]

On May 2, 2009, at 02:39, Roy M. wrote:
> Hi guys,
>
> Some basic questions about POE. For example, in the module:
>
> POE::Component::Client::FTP
> http://search.cpan.org/~bingos/POE-Component-Client-FTP-0.22/lib/POE/Component/Client/FTP.pm
>
> What I see is the old sequential ftp operations now being replaced by
> some call back handlers and they will trigger automatically. But the
> speed of the whole FTP operations (e.g. login, download, and bye)
> should remain the same (or slower in fact due to overhead).
>
> So if I want to have faster speed, e.g. download more than one file
> at the same time, I need to use thread and POE together?
>
> Am I correct?
>
> Thanks.
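
[Editor's sketch] "Waiting for data to arrive in parallel without threading" can be shown without POE at all, using core IO::Select, which wraps the same select() call that event loops of this kind are typically built on. Everything here is a stand-in: the fake_server children only simulate slow remote servers (they are not part of the downloader), and the names are invented. The point is that the collecting process is a single thread that waits on all handles at once.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time sleep);

$| = 1;   # avoid duplicated buffered output in forked children

# Start a child that stands in for a slow remote server: it waits
# $delay seconds and then sends one line down a pipe.
sub fake_server {
    my ($name, $delay) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        sleep $delay;
        syswrite $writer, "data from $name\n";
        exit 0;
    }
    close $writer;
    return $reader;
}

# One process, one thread: wait on every handle at once and deal with
# whichever becomes readable first.
sub collect_all {
    my (@handles) = @_;
    my $select = IO::Select->new(@handles);
    my @lines;
    while ($select->count) {
        for my $fh ($select->can_read) {
            my $n = sysread $fh, my $buf, 4096;
            if ($n) { push @lines, $buf }
            else    { $select->remove($fh); close $fh }   # EOF or error
        }
    }
    return @lines;
}

my $start   = time();
my @readers = map { fake_server("server$_", 1) } 1 .. 5;
my @lines   = collect_all(@readers);
1 while wait() != -1;                  # reap the stand-in children
printf "%d replies in %.1f s (waiting for each in turn would take about 5 s)\n",
    scalar @lines, time() - $start;
```

Five one-second waits overlap, so the total stays close to one second. That overlap, not faster individual operations, is where the speedup in the FTP case comes from.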
How does POE speed up operation in reality?
Hi guys,

Some basic questions about POE. For example, in the module:

POE::Component::Client::FTP
http://search.cpan.org/~bingos/POE-Component-Client-FTP-0.22/lib/POE/Component/Client/FTP.pm

What I see is the old sequential ftp operations now being replaced by
some callback handlers that trigger automatically. But the speed of the
whole FTP operation (e.g. login, download, and bye) should remain the
same (or in fact be slower, due to overhead).

So if I want more speed, e.g. to download more than one file at the
same time, I need to use threads and POE together?

Am I correct?

Thanks.
