Re: [Haskell-cafe] Re: Odd parallel haskell observations (some more numbers)

2010-08-09 Thread Don Stewart
sacha:
> Hi.
> 
> > On Mon, 9 Aug 2010 09:44:00 +0200
> > "JG" == Jean-Marie Gaillourdet  wrote:
> JG> 
> JG> I am no expert in web server tuning, but I will share my thoughts
> JG> about your approach and expectations nevertheless.
> 
> I would rather focus on GHC than on the web server. I believe the
> numbers I already provided (especially their deviation) show that
> the GHC runtime sometimes performs quite badly.
> 
> I also found out that it can do better than it does by default. I can
> accept that the runtime may not be able to tune itself optimally for
> every task, but then it would be nice to know, for instance, why GC
> settings change performance severalfold for such an I/O workload.

I'd consider boiling this down into a small test case and asking Simon
Marlow -- runtime hacker uber-guru -- to help.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: Odd parallel haskell observations (some more numbers)

2010-08-09 Thread Alexander Kotelnikov
Hi.

> On Mon, 9 Aug 2010 09:44:00 +0200
> "JG" == Jean-Marie Gaillourdet  wrote:
JG> 
JG> I am no expert in web server tuning, but I will share my thoughts
JG> about your approach and expectations nevertheless.

I would rather focus on GHC than on the web server. I believe the
numbers I already provided (especially their deviation) show that
the GHC runtime sometimes performs quite badly.

I also found out that it can do better than it does by default. I can
accept that the runtime may not be able to tune itself optimally for
every task, but then it would be nice to know, for instance, why GC
settings change performance severalfold for such an I/O workload.

To illustrate further that httpd has nothing to do with the phenomenon,
I wrote a small C application which does pretty much the same thing. The
numbers show that Apache can serve much faster than the Haskell version
needed (as a reminder, that took 3.4s single-threaded and went as low as
1.9s with 4 threads). I attached the code just in case.

10:27 sa...@loft4633:~/work/cube-server/tests 99> for i in 1 2 3 4; do for j in `seq 1 5`;do ./getc $i 1;done;done
1 1.352978
1 1.34
1 1.344545
1 1.345116
1 1.189060
2 0.668219
2 0.625113
2 0.698073
2 0.732621
2 0.722310
3 0.569121
3 0.581570
3 0.563512
3 0.566186
3 0.564232
4 0.510132
4 0.496181
4 0.529212
4 0.504506
4 0.511847
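
To put a number on the spread of timings like these, a small helper along
the following lines could summarise each group of runs. This is only a
sketch: the names `struct summary`, `summarize` and `relative_spread` are
made up for illustration and are not part of the attached program.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one group of timing samples (hypothetical helper type). */
struct summary {
  double mean;  /* arithmetic mean, seconds */
  double min;   /* fastest run, seconds */
  double max;   /* slowest run, seconds */
};

/* Compute mean, minimum and maximum of n > 0 samples. */
struct summary summarize(const double *samples, size_t n) {
  struct summary s;
  double sum = 0.0;
  size_t i;

  assert(samples != NULL && n > 0);
  s.min = s.max = samples[0];
  for (i = 0; i < n; i++) {
    sum += samples[i];
    if (samples[i] < s.min) s.min = samples[i];
    if (samples[i] > s.max) s.max = samples[i];
  }
  s.mean = sum / (double)n;
  return s;
}

/* Relative spread: (max - min) / mean, so 0.1 means a 10% spread. */
double relative_spread(struct summary s) {
  assert(s.mean > 0.0);
  return (s.max - s.min) / s.mean;
}
```

Calling `summarize` once per thread count on the five timings of that
group, and then `relative_spread` on the result, makes it easy to compare
how noisy the C runs are against the Haskell ones.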

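#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/time.h>

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>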

int fib(int n) {
  if ( n > 1 )
    return fib(n-1) + fib(n-2);
  else
    return 1;
}

# define REQ "GET / HTTP/1.1\r\n\r\n"

int get() {
  int s;
  int n;
  struct sockaddr_in sa;
  struct in_addr ia;
  char buf[1];

  sa.sin_family = AF_INET;
  if ( !inet_pton(AF_INET, "127.0.0.1", &ia) ) {
    fprintf(stderr, "inet_pton\n");
    exit(1);
  }
  else {
    sa.sin_addr.s_addr = ia.s_addr;
  }
  sa.sin_port = htons(80);

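  s = socket(AF_INET, SOCK_STREAM, 0);
  if ( s < 0 ) {
    perror("socket");
    exit(1);
  }
  /* Fail loudly: a refused connection would otherwise look like a fast run. */
  n = connect(s, (struct sockaddr*)&sa, sizeof(sa));
  if ( n < 0 ) {
    perror("connect");
    exit(1);
  }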

  send(s, REQ, strlen(REQ), 0);
  while ( (n = recv(s, buf, 1, 0)) > 0 );
  //printf("%d\n", fib(38));
  close (s);
  return 0;
}

void* nget(void* p){
  int i;
  int n = *(int*)p;
  for ( i = 0; i < n; i++ )
    get();
  return NULL;
}

int main(int argc, char* argv[]) {
  int c;
  int n;
  int p;
  int i;
  double run_time;
  struct timeval start;
  struct timeval end;
  
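  if ( argc < 3 ) {
    fprintf(stderr, "usage: %s threads requests\n", argv[0]);
    return 1;
  }
  c = strtol(argv[1], NULL, 10);  /* number of threads */
  n = strtol(argv[2], NULL, 10);  /* total number of requests */
  if ( c < 1 ) {
    fprintf(stderr, "thread count must be positive\n");
    return 1;
  }
  p = n/c;                        /* requests per thread */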

  pthread_t *thread_ids;

  /* Allocate c thread handles: the element size is pthread_t, not a pointer to it. */
  thread_ids = (pthread_t*)malloc(sizeof(pthread_t) * c);

  gettimeofday(&start, NULL);
  for (i = 0; i < c; i++) {
    pthread_create(&thread_ids[i], NULL, nget, &p);
  }

  for (i = 0; i < c; i++) {
    pthread_join(thread_ids[i], NULL);
  }
  gettimeofday(&end, NULL);
  
  run_time = end.tv_sec - start.tv_sec + 1e-6 * (end.tv_usec - start.tv_usec);
  printf("%d %f\n", c, run_time);
  return 0;
}
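
One caveat about REQ in the program above: it sends no Host header, which
HTTP/1.1 requires, so a compliant server is entitled to reject the request.
Persistent connections are also the HTTP/1.1 default, so a recv() loop that
runs until end-of-stream only finishes once the server decides to close.
How Apache reacts depends on its configuration. A request that avoids both
issues could be built roughly like this; `build_request` is a hypothetical
helper, not something the original program contains.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.1 GET request into buf.
 * Returns the request length, or -1 if buf is too small. */
int build_request(char *buf, size_t cap, const char *host, const char *path) {
  int n;

  assert(buf != NULL && host != NULL && path != NULL);
  assert(strlen(path) > 0 && path[0] == '/');
  n = snprintf(buf, cap,
               "GET %s HTTP/1.1\r\n"
               "Host: %s\r\n"           /* mandatory in HTTP/1.1 */
               "Connection: close\r\n"  /* ask the server to close after replying */
               "\r\n",
               path, host);
  if (n < 0 || (size_t)n >= cap)
    return -1;  /* output was truncated or formatting failed */
  return n;
}
```

The returned length can be passed straight to send() in place of
strlen(REQ).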


Re: [Haskell-cafe] Re: Odd parallel haskell observations (some more numbers)

2010-08-09 Thread Jean-Marie Gaillourdet
Hello,

I am no expert in web server tuning, but I will share my thoughts about your 
approach and expectations nevertheless.

On 08.08.2010, at 21:07, Alexander Kotelnikov wrote:

> So I continued to issue thousands of HTTP GET requests to a local Apache
> and got some ThreadScope pictures and numbers (BTW, all this happens on
> a 4-core machine).

Your Apache configuration is crucial to the performance figures you will
get from your "benchmark". As far as I know, web server benchmarks usually
run continuously and report the current throughput or the average latency
over the last n seconds, or something like that.

This allows the tested web server to adapt to the kind of load it experiences,
and the benchmarker is able to wait until those numbers stabilize.
When you execute your program with a different number of capabilities
(different -N settings), Apache will see a different kind of load and behave
differently. This makes it hard to change your program and expect comparable
results.
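
The "wait until the numbers stabilize" idea can be sketched as follows.
This is only an illustration and not how any particular benchmarking tool
works: the function name `is_stable`, the array of per-second request
counts and the tolerance parameter are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of counts[from .. from+len-1]. */
static double window_mean(const unsigned *counts, size_t from, size_t len) {
  double sum = 0.0;
  size_t i;
  for (i = 0; i < len; i++)
    sum += counts[from + i];
  return sum / (double)len;
}

/* Returns 1 if throughput has stabilised: the mean of the last `window`
 * per-second request counts differs from the mean of the `window` counts
 * before them by at most `tolerance` (relative, e.g. 0.05 for 5%).
 * Returns 0 otherwise, including when fewer than 2*window samples exist. */
int is_stable(const unsigned *counts, size_t n, size_t window, double tolerance) {
  double prev, last, diff;

  assert(counts != NULL && window > 0 && tolerance >= 0.0);
  if (n < 2 * window)
    return 0;
  prev = window_mean(counts, n - 2 * window, window);
  last = window_mean(counts, n - window, window);
  if (prev == 0.0)
    return last == 0.0;
  diff = last > prev ? last - prev : prev - last;
  return diff / prev <= tolerance;
}
```

A benchmark driver would append one count per second and only start
recording results once `is_stable` returns 1, so that warm-up effects on
the server side are excluded from the reported figures.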

> I would point out the following as deserving an explanation:
> 1. It looks like none of the tests used the resources of more than 2 cores.

This might be an indication that CPU resources are not the limiting factor in
this benchmark. You are basically benchmarking the I/O capabilities of your
operating system, of your installed Apache with your configuration, and of your
installed GHC.

Therefore, increasing the available CPU resources doesn't necessarily lead to
increased performance.

> 2. Sometimes there is no activity in any thread of a running program
> (get.N4qg_withgaps.eventlog.png; does this mean that the process is in an
> OS queue for scheduling, or something else?)


> 3. Without the RTS's -c or -qg options, a multithreaded run suffers from
> excessive GC activity.
> 4. Even with -c/-qg, the threads' runs look to be interrupted too frequently.
> 5. Provided that 1 requests in a row can be completed in ~3.4s, I
> would expect that 4 threads might come close to or even under 1s, but 1.9s
> was the best result.
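
The expectation in point 5 can be made concrete with the usual speedup and
efficiency figures: perfect scaling on 4 threads would give 3.4s / 4 =
0.85s. The helpers below are a minimal sketch; the names `speedup` and
`efficiency` are chosen here for illustration only.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the single-threaded run. */
double speedup(double t_single, double t_parallel) {
  assert(t_single > 0.0 && t_parallel > 0.0);
  return t_single / t_parallel;
}

/* Parallel efficiency: speedup divided by thread count (1.0 is ideal). */
double efficiency(double t_single, double t_parallel, int threads) {
  assert(threads > 0);
  return speedup(t_single, t_parallel) / (double)threads;
}
```

With the figures quoted above, `speedup(3.4, 1.9)` is about 1.8 and
`efficiency(3.4, 1.9, 4)` about 0.45, so less than half of the available
parallelism shows up in the wall-clock time.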

A last point to consider:

Is getRequest strict? Does it internally use some kind of lazy IO? Is it
possible that some resources aren't properly freed? Perhaps because the library
relies on the garbage collector to reclaim sockets? Or because the requests
aren't completely read?

I simply don't know the internals of Network.HTTP, but if it uses lazy IO it is
IMHO not suitable for such a benchmark.

Just my two euro cents.

-- Jean

