Re: [ql-developers] Massive amount of job state transitions and re-scheduling

2003-09-08 Thread Richard Zidlicky

On Sun, Sep 07, 2003 at 10:48:50PM +0200, BRANE wrote:
 
 
 - Original Message - 
 From: Peter Graf [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Sunday, September 07, 2003 9:53 PM
 Subject: Re: [ql-developers] Massive amount of job state transitions and
 re-scheduling
 

  Simple example: A M$ or Unix machine sends a file to the QDOS machine via
  TCP. It will send one or two packets, then stop and wait for ACK. Further
  packets will only be sent after further ACKs. Your ACKs can only be
  generated in 50 Hz rhythm, so packets will crawl one-by-one in 50 Hz
  rhythm. (Or two-by-two, if you're lucky.)
 
 AFAIK with TCP/IP this is negotiable. There is no need for such a small
 window...
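
To put rough numbers on the two positions above: if ACKs can only leave on the 50 Hz tick, the sender delivers at most one window per tick. A minimal sketch, assuming a 1460-byte segment (the usual TCP payload on Ethernet, an assumption at this point in the thread) and ignoring transfer and processing time. The function name and the window sizes are invented for illustration:

```python
TICK_HZ = 50  # ACK rhythm named in the thread


def ack_clocked_throughput(segment_bytes, segments_per_tick, tick_hz=TICK_HZ):
    """Upper bound in bytes/s when the sender may emit only
    `segments_per_tick` segments between two ACK opportunities."""
    return segment_bytes * segments_per_tick * tick_hz


# Hypothetical window sizes, in segments, to show how the bound scales.
for window in (1, 2, 8, 32):
    rate = ack_clocked_throughput(1460, window)
    print(f"{window:2d} segments per tick -> {rate / 1000:.0f} kB/s")
```

Under this model the bound grows linearly with the window, which is the point of the "negotiable" remark. Whether a simple stack can rely on a larger window in practice is a separate question.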

Don't forget this is a rather simple TCP/IP implementation, and apparently
it is already hard enough to get the simplest variant working reliably
with the garden variety of TCP/IP implementations out there.

Richard


Re: [ql-developers] Massive amount of job state transitions and re-scheduling

2003-09-07 Thread P Witte

Richard Zidlicky writes:


  Plus, I'm a bit surprised that you are apparently using jobs to fetch the
  data from the ethernet card... It should be done via an interrupt handler
  instead... Actually, the best design would be to have the Q60 fast interrupt
  handler to fill a buffer, and a frame interrupt task to move the data from
  that buffer into a bigger one for your job to fetch it in big chunks...).

 this was discussed a while ago here, the big problem is that
 neither QDOS nor SMSQ will attempt to reschedule after interrupt
 handling and there is no way to deal with the complexities of the
 TCP/IP protocol inside the interrupt handler.
 That means sending of protocol replies would be very often delayed
 by 1/50s which would make especially TCP crawl..

The last words you wrote the last time we discussed this topic were:

 Otoh checking for sys_rschd after isr processing looks really trivial 
 and top priority now.

Did you ever get round to it?

And Peter, did you try out the suggestions that were made at that time?

Could the effects Peter mentions have anything to do with the cache?

Per




Re: [ql-developers] Massive amount of job state transitions and re-scheduling

2003-09-07 Thread Thierry Godefroy

On Sat, 06 Sep 2003 00:24:18 +0200, Peter Graf wrote:

 
 Thierry wrote:
 
 .../...

 Plus, I'm a bit surprised that you are apparently using jobs to fetch the
 data from the ethernet card... It should be done via an interrupt handler
 instead...
 
 At first sight it looks like that of course. QDOS/SMS reality is different 
 though.
 
 Actually, the best design would be to have the Q60 fast interrupt
 handler to fill a buffer, and a frame interrupt task to move the data from
 that buffer into a bigger one for your job to fetch it in big chunks...).
 
 Wrong.
 
 1. TCP is not a linear flow of data into one direction, even if the purpose 
 is file transfer.

Yes, this I know, thanks... I'm perfectly aware of the fragmentation and of
out-of-order receipt of TCP packets... That doesn't change the fact that you
could use the fast interrupt to store as many TCP packets as needed (i.e. when
they come in) in a buffer (organized as a linked list of received packets),
then transfer the whole lot of packets to the higher-level layers of the
TCP/IP stack at once, every 1/50th of a second...
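
The two-stage scheme described above can be sketched as follows. This is only a model of the idea, not Q60 driver code: a `deque` stands in for the linked list of received packets, and the class name, method names and capacity are hypothetical.

```python
from collections import deque


class PacketBuffer:
    """Hypothetical two-stage receive path: a fast interrupt handler queues
    raw packets, a 50 Hz frame task hands the whole batch to the stack."""

    def __init__(self, capacity=64):
        self.capacity = capacity  # assumed limit, not from the thread
        self.pending = deque()
        self.dropped = 0

    def on_packet_interrupt(self, packet):
        # Stand-in for the fast interrupt handler: store and return quickly.
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # buffer full, this packet is lost
            return
        self.pending.append(packet)

    def on_frame_tick(self):
        # Stand-in for the frame interrupt task: drain all queued packets
        # in arrival order and pass them on as one batch.
        batch = list(self.pending)
        self.pending.clear()
        return batch


buf = PacketBuffer(capacity=4)
for n in range(6):
    buf.on_packet_interrupt(f"pkt{n}".encode())
batch = buf.on_frame_tick()
print(len(batch), "packets delivered,", buf.dropped, "dropped")
```

The sketch leaves open how large the buffer must be to hold everything that can arrive within one 20 ms tick.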

 QDOS (and likely SMSQ/E, too) is so primitive that an 
 interrupt service routine can _not_ trigger immediate rescheduling of jobs 
 after it has completed. The time until the next rescheduling can be 20 ms 
 (worst case) so the user job has to wait that time until it can process the 
 data. The effect is that the other TCP endpoint in the network has to wait 
 20 ms + processing + transfer time until it can react to the response 
 packet. Given MTU=1460=1.5KB your interrupt driven approach can not 
 guarantee more than a throughput of 1.5 KB / 20 ms = 75 KB/s with TCP, even 
 if the other endpoint needs zero time to process its packets. (75 KB/s is 
 not quite what I want.)

Wrong... With my method, you simply get a 20ms penalty (at worst) on the
acknowledgment of all the packets that were buffered... I.e. you'll have
a (worst case) 20ms penalty when pinging a Q60 on a network, compared to
another computer...
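
The worst-case figure above follows from the 50 Hz tick alone. A minimal sketch, assuming a reply can only go out on a tick boundary (the 20 ms period is from the thread; the function name and the sample arrival times are invented for illustration):

```python
TICK_US = 20_000  # 1/50 s expressed in microseconds


def wait_for_next_tick(arrival_us, tick_us=TICK_US):
    """Microseconds a packet arriving at `arrival_us` waits until the
    next tick boundary (zero if it arrives exactly on a tick)."""
    return (-arrival_us) % tick_us


# Hypothetical arrival times, in microseconds since some tick at t = 0.
arrivals_us = [1_000, 7_500, 19_999, 20_000, 20_001]
delays = [wait_for_next_tick(t) for t in arrivals_us]
print(delays)
```

In this model the wait is always below one tick period, and all packets that arrive within the same tick share a single wait.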

 Unlike an ISR, a job _can_ trigger immediate rescheduling! You don't need 
 to always poll the NIC, a clever approach can lead to full TCP throughput 
 during network activity, but zero polling waste (except for a few tens of 
 instructions per 50 Hz) when the network is inactive.
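
The scheme itself is not spelled out in this message, so the following is only a guess at its general shape: a job that keeps yielding and re-polling while packets are arriving, and falls back to waiting for the 50 Hz tick after a few empty polls. The class name, the action names and the threshold are all hypothetical.

```python
IDLE_POLLS_BEFORE_SLEEP = 3  # hypothetical threshold, not from the thread


class AdaptivePoller:
    """Decides after each poll whether the job should poll again soon
    or go back to waking only on the 50 Hz tick."""

    def __init__(self):
        self.empty_polls = 0

    def next_action(self, packet_available):
        # 'yield': give up the CPU but poll again right after rescheduling.
        # 'sleep': wait for the next 50 Hz tick before polling again.
        if packet_available:
            self.empty_polls = 0
            return "yield"
        self.empty_polls += 1
        if self.empty_polls >= IDLE_POLLS_BEFORE_SLEEP:
            return "sleep"
        return "yield"


poller = AdaptivePoller()
polls = (True, True, False, False, False, False, True)
trace = [poller.next_action(p) for p in polls]
print(trace)
```

The threshold trades a little polling overhead after a burst against the risk of falling back to 20 ms latency in the middle of a transfer.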

You don't need to poll the hardware as long as you can use an interrupt
to signal the arrival of each new packet. Is the Q60 able to trigger an
external interrupt on such conditions?  If yes, then the lowest layer
of the TCP/IP stack (actually of the Ethernet driver) could be implemented
as the external interrupt handler...

 The details are 
 somewhat complex, but as long as the OS isn't changed, I have no better choice.
 
 2. You waste response (and processor) time by your second copying level. 
 Imagine running the TCP/IP stack on a SuperGoldCard. Copying or not copying 
 about 1 MB every second _does_ matter.

Well, aren't we speaking about the Q60 (or Q40) here?  I mean, there's not
even an Ethernet I/F on (S)GCs...

 3. The idea of collecting fragments into larger buffers is not feasible, 
 unless you implement the TCP/IP stack itself within ISRs. (There are good 
 reasons not to do that!)

This is wrong... The low level part is only responsible for moving the data
from the hardware into an area of the memory where it can wait until it's
processed... I see no problem at all...

Thierry.