Re: How does POE speed up operation in reality?

2009-05-19 Thread Merijn Broeren
Quoting Roy M. ([email protected]):
> 
> So far the POE::FTP components only allow me to connect to one ftp
> server at a time, how can I get the same/similar speed as 8 concurrent
> process?).
> 
> In fact, this is a quite common multiple-producer and

As per usual in these cases, there is a slight misunderstanding going
on, where Rocco knows too much about POE to figure out what elementary
knowledge you are missing. 

POE is pretty simple once you get your head around the idea that it is
cooperative multitasking of bundles of short events.
POE::Component::Client::FTP is one such bundle. You can kick off this
bundle of events multiple times, and each instance will cooperate with
all the others. As long as they are not CPU bound, and network tasks
seldom are, they will happily retrieve all network data "concurrently".

So, just set up the component for each server, call POE::Kernel->run() and
sit back. If you need to, then you can throttle back, having only 10 or
15 servers connected at the same time, adding one more for each finished
transfer.
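
To illustrate the throttling idea above, here is a minimal sketch. It is written with Python's asyncio rather than POE, purely as an analogy for cooperative multitasking; `fetch_server`, `run_all`, the host names and the cap of 10 are all assumptions made up for the example, and the sleep stands in for real FTP traffic.

```python
import asyncio

MAX_ACTIVE = 10  # assumed cap, taken from the "10 or 15 servers" suggestion


async def fetch_server(name, limiter, stats):
    """Stand-in for one server's FTP session; the sleep simulates network waits."""
    async with limiter:  # wait here until one of the slots frees up
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # control returns to the event loop while waiting
        stats["active"] -= 1
        stats["done"].append(name)


async def run_all(servers, max_active=MAX_ACTIVE):
    limiter = asyncio.Semaphore(max_active)
    stats = {"active": 0, "peak": 0, "done": []}
    # Every session is started up front; the semaphore admits a new one
    # each time a running session finishes.
    await asyncio.gather(*(fetch_server(s, limiter, stats) for s in servers))
    return stats


# Hypothetical host names, one per FTP server.
stats = asyncio.run(run_all([f"ftp{i:02d}.example.com" for i in range(50)]))
print(len(stats["done"]), "servers finished; peak concurrency", stats["peak"])
```

The semaphore plays the role of "adding one more for each finished transfer": no session beyond the cap starts until an earlier one releases its slot.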

Cheers,
-- 
Merijn Broeren | We take risks, we know we take them. Therefore, when things
   | come out against us, we have no cause for complaint.
   | - Scott, last journal entry, march 1912


Re: How does POE speed up operation in reality?

2009-05-04 Thread chris fedde
On Sun, May 3, 2009 at 12:17 AM, Roy M.  wrote:
> hello,
>
> On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo  wrote:
>> You should consult the component's
>> author directly if he doesn't respond on the mailing list.
>>
>
> you are right, I will send to author off the list then.
>
>
>> I don't understand the question.  Do you mean to imply that an event-driven
>> program cannot consume data from multiple producers?
>>
>
> I think at least if you want to access blocking method in the main
> event loop, then thread is needed.
>
> e.g. in python twisted, they have threadpool
> http://twistedmatrix.com/projects/core/documentation/howto/threading.html
>
>
> Just wonder if have the same thing in POE.
>

My opinion is that you are making this problem harder than it has to be.

I've written solutions to this exact problem of collecting data from a
large number of hosts concurrently using FTP.  The trivially easy way
to do it is to manage a moderate number of single-host transfer
processes.  This can be done in plain Perl using Parallel::ForkManager, or
in POE using POE::Component::JobQueue and POE::Wheel::Run to launch
and monitor the FTP subprocesses.

Accounting can be handled either by instrumenting the FTP sub
processes to update some score card or by letting the management
process manage the score card itself.
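
As a rough sketch of the "moderate number of single-host transfer processes" approach, with the manager keeping the score card itself: the example below uses Python's `subprocess` module rather than Parallel::ForkManager or POE::Wheel::Run. The host names, the worker count, `run_pool` and the child command (which only prints its argument instead of doing FTP) are assumptions for illustration.

```python
import subprocess
import sys
from collections import deque

# Stand-in for a single-host transfer script: it only echoes the host name.
CHILD_CODE = "import sys; print('fetched', sys.argv[1])"


def run_pool(hosts, max_workers):
    """Keep at most max_workers child processes alive; return a score card."""
    pending = deque(hosts)
    running = {}    # host -> Popen handle, in start order
    scorecard = {}  # host -> outcome, maintained by the manager
    while pending or running:
        # Top the pool up to the limit.
        while pending and len(running) < max_workers:
            host = pending.popleft()
            running[host] = subprocess.Popen(
                [sys.executable, "-c", CHILD_CODE, host],
                stdout=subprocess.DEVNULL,
            )
        # Reap the oldest child (simplest policy; a real manager might
        # react to whichever child exits first).
        host = next(iter(running))
        code = running.pop(host).wait()
        scorecard[host] = "ok" if code == 0 else f"failed ({code})"
    return scorecard


hosts = [f"ftp{i:02d}.example.com" for i in range(6)]  # hypothetical hosts
scorecard = run_pool(hosts, max_workers=3)
print(scorecard)
```

Because each child handles exactly one host, a failure shows up as a non-zero exit status for that host and does not disturb the other transfers.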

JMHO
cfedde


Re: How does POE speed up operation in reality?

2009-05-03 Thread Nick Perez
On Sun, 3 May 2009 03:19:00 -0400
Rocco Caputo  wrote:

> On May 3, 2009, at 02:17, Roy M. wrote:
> >
> > On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo   
> > wrote:
> >> I don't understand the question.  Do you mean to imply that an  
> >> event-driven
> >> program cannot consume data from multiple producers?
> >
> > I think at least if you want to access blocking method in the main
> > event loop, then thread is needed.
> 
> True concurrency is only needed if an operation blocks too long for  
> the application.
> 
> > e.g. in python twisted, they have threadpool
> > http://twistedmatrix.com/projects/core/documentation/howto/threading.html
> >
> > Just wonder if have the same thing in POE.
> 
> 
> Yes, Perl is Turing complete.  You can write programs with it that
> use POE and true concurrency at the same time.
> 
> Be aware that threads aren't the only form of concurrency at your  
> disposal.
> 

Maybe it is just me, but doing a for(0..$x) { [instantiate FTP component
here] } would solve this non-problem. Or am I missing something? 

And I back Rocco here. You have not demonstrated that your blocking call
to the database would be significant enough to require further
engineering. Even if you made a connection to the database,
authenticated, prepared a statement, and executed that statement for
every file that you downloaded, you would /still/ not introduce enough
of a block. Largely because if you are in POE, each one of those
steps would be an individual event that would be interleaved and
executed along with everything else that is going on. See any of the
POE::Component::*DBI* modules that demonstrate non-blocking database
calls (ie. all of them). 
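
The interleaving described above can be sketched as follows. This is a Python asyncio analogy, not POE code; `log_download`, the step names and the file ids are assumptions for the example. Each step hands control back to the event loop, so the steps of several downloads alternate instead of running one download to completion before the next starts.

```python
import asyncio

STEPS = ["connect", "authenticate", "prepare", "execute"]


async def log_download(file_id, trace):
    """Record each database step, yielding to the event loop between steps."""
    for step in STEPS:
        trace.append((file_id, step))
        await asyncio.sleep(0)  # give the other pending work a turn


async def main():
    trace = []
    await asyncio.gather(*(log_download(i, trace) for i in range(3)))
    return trace


trace = asyncio.run(main())
for entry in trace:
    print(entry)
```

In the printed trace, the "connect" steps of all three downloads appear before any "authenticate" step, which is the kind of interleaving the paragraph above refers to.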

-- 

Nicholas Perez
XMPP/Email: [email protected]
http://search.cpan.org/~nperez/
http://github.com/nperez


Re: How does POE speed up operation in reality?

2009-05-03 Thread Rocco Caputo

On May 3, 2009, at 02:17, Roy M. wrote:


> On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo wrote:
>> I don't understand the question.  Do you mean to imply that an
>> event-driven program cannot consume data from multiple producers?
>
> I think at least if you want to access blocking method in the main
> event loop, then thread is needed.


True concurrency is only needed if an operation blocks too long for  
the application.



> e.g. in python twisted, they have threadpool
> http://twistedmatrix.com/projects/core/documentation/howto/threading.html
>
> Just wonder if have the same thing in POE.



Yes, Perl is Turing complete.  You can write programs with it that use  
POE and true concurrency at the same time.


Be aware that threads aren't the only form of concurrency at your  
disposal.


--
Rocco Caputo - [email protected]


Re: How does POE speed up operation in reality?

2009-05-02 Thread Roy M.
hello,

On Sun, May 3, 2009 at 2:02 PM, Rocco Caputo  wrote:
> You should consult the component's
> author directly if he doesn't respond on the mailing list.
>

You are right; I will write to the author off-list, then.


> I don't understand the question.  Do you mean to imply that an event-driven
> program cannot consume data from multiple producers?
>

I think that, at least if you want to call a blocking method from the
main event loop, a thread is needed.

e.g. in python twisted, they have threadpool
http://twistedmatrix.com/projects/core/documentation/howto/threading.html


I just wonder whether POE has the same thing.


Re: How does POE speed up operation in reality?

2009-05-02 Thread Rocco Caputo

On May 3, 2009, at 01:25, Roy M. wrote:


> total bandwidth needed for all ftp sessions = 10KB x 300 files x 50 = 146MB
>
> as you can see, bandwidth (giga), disk, cpu should not be a problem to
> handle 146MB of data in 1 hour of time.
>
> so I am thinking 2 solutions:
>
> 1. Use thread (e.g. 8 concurrent process as my CPU is quad core SMP)
> to parallel the worker to ftp servers.
> 2. Investigate other solutions such as POE and that's why I am asking here:
>
> So far the POE::FTP components only allow me to connect to one ftp
> server at a time, how can I get the same/similar speed as 8 concurrent
> process?).


Without having looked at the FTP component, I would either assume (a)  
that the component supports multiple connections, or (b) you would  
instantiate the component once per connection.  You should consult the  
component's author directly if he doesn't respond on the mailing list.



> In fact, this is a quite common multiple-producer and
> multiple-consumer pattern in multi-threading theory. But so far the
> POE example only allow me to do thing in a event-driven way, but not a
> way to handle this  pattern, or something I have missed?



I don't understand the question.  Do you mean to imply that an
event-driven program cannot consume data from multiple producers?


--
Rocco Caputo - [email protected]


Re: How does POE speed up operation in reality?

2009-05-02 Thread Roy M.
Hello,

On Sun, May 3, 2009 at 1:27 AM, Rocco Caputo  wrote:
> On May 2, 2009, at 05:37, Roy M. wrote:
> Time constraint = 1 hour.
> 50 servers * 300 files = 15,000 files.
> 15,000 files * 5 minutes = 75,000 minutes.
>

Sorry for the confusion.

What I meant is that downloading 300 files from a single FTP server
takes only a few minutes, say 5 minutes to be safe. (The files are very
small; most are a few KB, e.g. <10KB.)

So I already have a sequential Net::FTP Perl script that handles the
job for ONE FTP server every hour very nicely.


The problem arises when I add more FTP servers for the script to handle,
e.g. 50 FTP servers x 5 minutes = 250 minutes, or just over 4 hours (worst case).

total bandwidth needed for all ftp sessions = 10KB x 300 files x 50 = 146MB

as you can see, bandwidth (giga), disk, cpu should not be a problem to
handle 146MB of data in 1 hour of time.
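
The arithmetic above can be checked with a few lines of Python. The figures are the ones quoted in this message; the variable names, and the last line estimating how many sessions must overlap to fit the hourly window, are additions for illustration.

```python
import math

SERVERS = 50
FILES_PER_SERVER = 300
MINUTES_PER_SERVER = 5  # worst case for one server, handled sequentially
FILE_SIZE_KB = 10       # upper bound on a single file

# Handling every server back to back:
sequential_minutes = SERVERS * MINUTES_PER_SERVER  # 250
sequential_hours = sequential_minutes / 60         # a little over 4 hours

# Total data moved per run:
total_kb = FILE_SIZE_KB * FILES_PER_SERVER * SERVERS  # 150000
total_mb = total_kb / 1024                            # about 146 MB

# Average rate needed to move everything within one hour:
kb_per_second = total_kb / 3600

# Fewest overlapping sessions that fit 250 minutes of work into 60 minutes:
min_sessions = math.ceil(sequential_minutes / 60)  # 5

print(sequential_minutes, round(total_mb, 1), round(kb_per_second, 1), min_sessions)
```

The required average rate is tiny compared with a gigabit link, which supports the point that the time is spent waiting on each session and not on bandwidth, disk or CPU.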


so I am thinking 2 solutions:

1. Use thread (e.g. 8 concurrent process as my CPU is quad core SMP)
to parallel the worker to ftp servers.
2. Investigate other solutions such as POE and that's why I am asking here:

So far the POE::FTP components only allow me to connect to one ftp
server at a time, how can I get the same/similar speed as 8 concurrent
process?).

In fact, this is quite a common multiple-producer,
multiple-consumer pattern in multi-threading theory. But so far the
POE examples only let me do things in an event-driven way, and I don't
see a way to handle this pattern. Or is there something I have missed?



Thanks.


Re: How does POE speed up operation in reality?

2009-05-02 Thread Rocco Caputo

On May 2, 2009, at 05:37, Roy M. wrote:


> On Sat, May 2, 2009 at 4:04 PM, Rocco Caputo wrote:
>> You seem to be seeking examples where POE isn't necessarily appropriate.  I
>> recommend looking for CPU-bound problems, since that's where POE doesn't
>> directly help.  Even in the CPU-bound case, a subthread or child process can
>> externalize and parallelize the work.  POE can continue doing non-blocking
>> work in the meantime, and it can be notified when the side thread or process
>> is done.
>
> Thanks for your lengthy reply.
> I think I better explain my usage in more detail (sorry for not being
> done before).
>
> Case:
>
> 1. I need to write a script to download files from 50 FTP servers every hour.
>
> 2. For each FTP server, I need to download 100-300 files.
>
> 3. I can accept connection to each FTP server is being single threaded
> (since the max. time for operations on a single FTP server is quite
> short, i.e. <  5 minutes)


Time constraint = 1 hour.
50 servers * 300 files = 15,000 files.
15,000 files * 5 minutes = 75,000 minutes.

You need to fit 1,250 hours worth of file transfer into a realtime hour.

You will need to run up to 1,250 simultaneous connections.
15,000 files / 1,250 connections = 12 files per connection.
12 files * 5 minutes = 60 minutes, so that works.

1,250 connections / 50 servers = 25 simultaneous connections per server.

How large are the files, in octets?

File size divided by 5 minutes = data rate per file, in octets per  
minute.  Multiply that by 1,250 to determine your required network  
capacity.  Divide what you require by your actual network capacity to  
determine how many networks you'll need.  If the result is greater  
than one, then you'll need to buy one or more networks.  Remember to  
round the quotient up to the next integer; you don't want to be half a  
network short.
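
The capacity estimate above, worked through in Python. It follows this message's reading of the requirement (5 minutes per file, not per server); the function name `networks_needed`, the 10 KiB file size and the gigabit link capacity are illustrative assumptions.

```python
import math

SERVERS = 50
FILES_PER_SERVER = 300
MINUTES_PER_FILE = 5  # this message's reading of the requirement
WINDOW_MINUTES = 60   # everything must finish within one hour

files = SERVERS * FILES_PER_SERVER                      # 15,000 files
total_minutes = files * MINUTES_PER_FILE                # 75,000 minutes
total_hours = total_minutes / 60                        # 1,250 hours
connections = math.ceil(total_minutes / WINDOW_MINUTES) # 1,250 connections
files_per_connection = files / connections              # 12 files each
minutes_per_connection = files_per_connection * MINUTES_PER_FILE  # 60 minutes
connections_per_server = connections / SERVERS          # 25 per server


def networks_needed(file_size_octets, capacity_octets_per_minute):
    """Required capacity over available capacity, rounded up to whole networks."""
    rate_per_file = file_size_octets / MINUTES_PER_FILE  # octets per minute
    required = rate_per_file * connections
    return math.ceil(required / capacity_octets_per_minute)


# Assumed inputs: 10 KiB files, one gigabit link (125,000,000 octets/second).
print(connections, connections_per_server,
      networks_needed(10 * 1024, 125_000_000 * 60))
```

With small files the quotient rounds up to a single network, so under these assumed inputs the connection count is the number to watch, not raw bandwidth.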


Plan for growth.


> 4. So I tried to investigate POE, as FTP component is available.


You should explore the number of network interfaces, disk channels,  
memory, and CPU cores you'll need with each technology option at your  
disposal.  Divide machine capacities by resource requirements, to find  
the number of machines you'll need.  Use the maximum number, which  
will reflect your tightest bottleneck.  If the bottleneck is network  
interfaces, make sure you don't use more than your network can support.


If budget is a bottleneck, choose the technology that minimizes  
hardware costs.



> 5. Currently written using simple single threaded POE program, and In
> the `authenticated` method in POE::Component::Client::FTP, I need to
> write the log into an external MSSQL (not MySQL, wrongly typed b4), so
> this process should be blocking as I think.


The log seems to be per connection.  You can avoid blocking by  
journaling log entries to files, then spooling the journals into your  
SQL server with a dedicated background process.
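
A minimal sketch of the journal-then-spool idea, in Python instead of Perl. The journal format (one JSON object per line), the table layout and the function names are assumptions, and an in-memory SQLite database stands in for the MSSQL server so that the example is self-contained.

```python
import json
import os
import sqlite3
import tempfile


def journal_entry(path, entry):
    """Fast path, called from the download callback: append one line locally."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def spool_journal(path, conn):
    """Slow path, run by a separate background process: load the journal into SQL."""
    work = path + ".spooling"
    os.replace(path, work)  # take the journal; later entries start a fresh file
    with open(work, encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    conn.executemany(
        "INSERT INTO ftp_log (server, event) VALUES (?, ?)",
        [(row["server"], row["event"]) for row in rows],
    )
    conn.commit()
    os.remove(work)
    return len(rows)


journal = os.path.join(tempfile.mkdtemp(), "ftp.journal")
for n in range(3):
    journal_entry(journal, {"server": f"ftp{n}.example.com", "event": "authenticated"})

conn = sqlite3.connect(":memory:")  # stand-in for the real SQL server
conn.execute("CREATE TABLE ftp_log (server TEXT, event TEXT)")
spooled = spool_journal(journal, conn)
count = conn.execute("SELECT COUNT(*) FROM ftp_log").fetchone()[0]
print(spooled, count)
```

The download callback only ever pays for a local append; the database round trips happen in the spooler, outside the event loop.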



> Questions:
>
> a. Is POE suitable for my jobs?


Your question implies additional constraints that you haven't mentioned.

POE is suitable for the current constraints and the work as specified,  
assuming that you have adequate hardware resources to achieve the work  
at all.



> b. Do I need to use thread + POE in order to run 50 works together?


Threads are inappropriate for this task.  You will almost certainly  
need more than one machine to do this work, which implies a  
multiprocess solution.  Threads are redundant complexity, unless you  
can show a compelling need to share memory between downloaders.



> c. Within each worker/process, I have some blocking operation in the
> callback which prevent it from switching event more quickly, what are
> the normal way to handle?


You have yet to show that the blocking is significant or cannot be  
worked around trivially.  You have larger worries right now.


Your questions imply that you haven't thought the problem through.  If  
you need to hire an analyst, just ask.  I'm sure some who read this  
list are available.


--
Rocco Caputo - [email protected]


Re: How does POE speed up operation in reality?

2009-05-02 Thread Roy M.
Hi,

On Sat, May 2, 2009 at 4:04 PM, Rocco Caputo  wrote:
> You seem to be seeking examples where POE isn't necessarily appropriate.  I
> recommend looking for CPU-bound problems, since that's where POE doesn't
> directly help.  Even in the CPU-bound case, a subthread or child process can
> externalize and parallelize the work.  POE can continue doing non-blocking
> work in the meantime, and it can be notified when the side thread or process
> is done.
>


Thanks for your lengthy reply.
I think I had better explain my usage in more detail (sorry for not
having done so before).

Case:

1. I need to write a script to download files from 50 FTP servers every hour.

2. For each FTP server, I need to download 100-300 files.

3. I can accept connection to each FTP server is being single threaded
(since the max. time for operations on a single FTP server is quite
short, i.e. <  5 minutes)

4. So I tried to investigate POE, as an FTP component is available.

5. It is currently written as a simple single-threaded POE program. In
the `authenticated` method in POE::Component::Client::FTP, I need to
write the log into an external MSSQL database (not MySQL, as I wrongly
typed before), so I think this step will block.


Questions:

a. Is POE suitable for my jobs?
b. Do I need to use thread + POE in order to run 50 works together?
c. Within each worker/process, I have some blocking operation in the
callback which prevents it from switching between events quickly. What is
the normal way to handle this?


Thanks.


Re: How does POE speed up operation in reality?

2009-05-02 Thread Rocco Caputo
DBI is generally not CPU bound on the client side.  Again, it's a case
where the client spends its time waiting idly, so this may be made
non-blocking.  A number of components have been written to help you
execute DBI requests in the background.


A simple user lookup (presumably to authenticate a user) may take on  
the order of a few hundredths of a second, possibly longer if the  
table is large and poorly indexed, you mistakenly prepare() the select  
statement each time, or your mysqld or network are saturated.  None of  
these problems are insurmountable:


Upgrade your mysqld.
Upgrade your network.
Properly index your table.
Shard your database.


You seem to be seeking examples where POE isn't necessarily  
appropriate.  I recommend looking for CPU-bound problems, since that's  
where POE doesn't directly help.  Even in the CPU-bound case, a  
subthread or child process can externalize and parallelize the work.   
POE can continue doing non-blocking work in the meantime, and it can  
be notified when the side thread or process is done.
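
A sketch of externalizing CPU-bound work to a child process while the event loop keeps running. It uses Python's asyncio as a stand-in for POE; the `heartbeat` task, the child's computation and all names are assumptions for the example.

```python
import asyncio
import sys

# CPU-bound job run in a separate process (a stand-in for real work).
CHILD_CODE = "print(sum(i * i for i in range(200000)))"


async def heartbeat(ticks):
    """Represents the other non-blocking work the event loop keeps doing."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.001)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD_CODE,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()  # resumes here once the child is done
    beat.cancel()
    return int(out), len(ticks)


result, tick_count = asyncio.run(main())
print(result, "computed in the child;", tick_count, "heartbeats meanwhile")
```

The heartbeat count shows that the loop was not stalled while the child did the computing; the awaited `communicate()` plays the part of the "done" notification.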


--
Rocco Caputo - [email protected]


On May 2, 2009, at 03:03, Roy M. wrote:


> Hi.
>
> On Sat, May 2, 2009 at 2:56 PM, Rocco Caputo wrote:
>> Incorrect.  A lot of the work of transferring a file is waiting for the data
>> to arrive.  POE can wait for data to arrive in parallel without threading.
>>
>> POE::Component::Client::FTP may or may not be programmed to allow this, but
>> POE can do it.  It's a common misunderstanding to say "POE" when you mean "a
>> module using POE".  POE's authors have little control over how it's used.
>
> What if I have some blocking operation within a callback, e.g. MySQL
> query in `authenticated` method in POE::Component::Client::FTP ?
>
> I think unless I make it run as parallel, the bottleneck still exist
> as it is single threaded?
>
> Thanks.




Re: How does POE speed up operation in reality?

2009-05-02 Thread Roy M.
Hi.

On Sat, May 2, 2009 at 2:56 PM, Rocco Caputo  wrote:
> Incorrect.  A lot of the work of transferring a file is waiting for the data
> to arrive.  POE can wait for data to arrive in parallel without threading.
>
> POE::Component::Client::FTP may or may not be programmed to allow this, but
> POE can do it.  It's a common misunderstanding to say "POE" when you mean "a
> module using POE".  POE's authors have little control over how it's used.
>


What if I have some blocking operation within a callback, e.g. a MySQL
query in the `authenticated` method in POE::Component::Client::FTP?

I think that unless I make it run in parallel, the bottleneck still exists,
as it is single threaded?


Thanks.


Re: How does POE speed up operation in reality?

2009-05-01 Thread Rocco Caputo
Incorrect.  A lot of the work of transferring a file is waiting for  
the data to arrive.  POE can wait for data to arrive in parallel  
without threading.


POE::Component::Client::FTP may or may not be programmed to allow  
this, but POE can do it.  It's a common misunderstanding to say "POE"  
when you mean "a module using POE".  POE's authors have little control  
over how it's used.
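
To make "waiting in parallel without threading" concrete, here is a small sketch using Python's asyncio as an analogy for a POE event loop. `fake_transfer` only sleeps to simulate waiting for data; the transfer count and delay are made-up numbers.

```python
import asyncio
import time


async def fake_transfer(delay):
    """Simulate a transfer that spends its time waiting for data to arrive."""
    await asyncio.sleep(delay)
    return delay


async def main(count, delay):
    start = time.perf_counter()
    await asyncio.gather(*(fake_transfer(delay) for _ in range(count)))
    return time.perf_counter() - start


COUNT, DELAY = 50, 0.05
elapsed = asyncio.run(main(COUNT, DELAY))
sequential = COUNT * DELAY  # what the same waits would cost back to back
print(f"{elapsed:.2f}s with overlapping waits vs {sequential:.2f}s one after another")
```

A single thread runs all fifty "transfers"; the total time stays close to one delay because the waits overlap.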


--
Rocco Caputo - [email protected]


On May 2, 2009, at 02:39, Roy M. wrote:


> Hi guy,
>
> Some basic questions about POE.
>
> For example, in the module: POE::Component::Client::FTP
>
> http://search.cpan.org/~bingos/POE-Component-Client-FTP-0.22/lib/POE/Component/Client/FTP.pm
>
> What I see is the old sequential ftp operations now being replaced by
> some call back handlers and they will trigger automatically.
>
> But the speed of the whole FTP operations (e.g. login, download, and
> bye) should remain the same (or slower in fact due to overhead).
>
> So if I want to have faster speed, e.g. download more than one file
> at the same time, I need to use thread and POE together?
>
> Am I correct?
>
> Thanks.




How does POE speed up operation in reality?

2009-05-01 Thread Roy M.
Hi guy,

Some basic questions about POE.


For example, in the module: POE::Component::Client::FTP

http://search.cpan.org/~bingos/POE-Component-Client-FTP-0.22/lib/POE/Component/Client/FTP.pm


What I see is that the old sequential FTP operations are now replaced by
callback handlers that are triggered automatically.

But the speed of the whole FTP operation (e.g. login, download, and
bye) should remain the same (or in fact be slower, due to overhead).

So if I want more speed, e.g. to download more than one file
at the same time, do I need to use threads and POE together?

Am I correct?


Thanks.