Re: [Openstack] [openstack-dev] CLI command to figure out security-group's association to particular tenant/user

2013-06-28 Thread Rick Jones

On 06/28/2013 01:55 AM, Rahul Sharma wrote:

Thanks Aaron for your kind help. It worked. Is there any doc which lists
all the possible commands and their usage for quantum? Because --help
doesn't help in identifying all the parameters, is there any reference
one can use to get the complete command syntax?


If you use "quantum help <command>" rather than "quantum --help", it will 
give you more detailed help about the command.  For example:


$ quantum help security-group-rule-create
usage: quantum security-group-rule-create [-h]
  [-f {html,json,shell,table,yaml}]
  [-c COLUMN] [--variable VARIABLE]
  [--prefix PREFIX]
  [--request-format {json,xml}]
  [--tenant-id TENANT_ID]
  [--direction {ingress,egress}]
  [--ethertype ETHERTYPE]
  [--protocol PROTOCOL]
  [--port-range-min PORT_RANGE_MIN]
  [--port-range-max PORT_RANGE_MAX]
  [--remote-ip-prefix REMOTE_IP_PREFIX]
  [--remote-group-id SOURCE_GROUP]
  SECURITY_GROUP

Create a security group rule.

positional arguments:
  SECURITY_GROUP        Security group name or id to add rule.

optional arguments:
  -h, --help            show this help message and exit
  --request-format {json,xml}
the xml or json request format
  --tenant-id TENANT_ID
the owner tenant ID
  --direction {ingress,egress}
direction of traffic: ingress/egress
  --ethertype ETHERTYPE
IPv4/IPv6
  --protocol PROTOCOL   protocol of packet
  --port-range-min PORT_RANGE_MIN
starting port range
  --port-range-max PORT_RANGE_MAX
ending port range
  --remote-ip-prefix REMOTE_IP_PREFIX
cidr to match on
  --remote-group-id SOURCE_GROUP
remote security group name or id to apply rule

output formatters:
  output formatter options

  -f {html,json,shell,table,yaml}, --format {html,json,shell,table,yaml}
the output format, defaults to table
  -c COLUMN, --column COLUMN
specify the column(s) to include, can be repeated

shell formatter:
  a format a UNIX shell can parse (variable=value)

  --variable VARIABLE   specify the variable(s) to include, can be repeated
  --prefix PREFIX   add a prefix to all variable names

rick jones

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Swift performance issues with requests

2013-06-04 Thread Rick Jones

On 06/04/2013 02:45 AM, Klaus Schürmann wrote:

Hi Rick,

I found the problem. I placed a hardware balancer in front of the proxy server.
The balancer lost some packets because of a faulty network interface.
Your tip was excellent.


I'm glad it helped.  Looking back I see that my math on the cumulative 
time for successive retransmissions of TCP SYNs was completely wrong - 3 
+ 6 + 12 isn't 17, but 21... :)
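That cumulative-backoff arithmetic is easy to sketch (a minimal illustration, assuming the classic doubling of the retransmission timeout described in the quoted message below; the function name is just for the example):

```python
# Cumulative time elapsed after N unanswered TCP SYN (re)transmissions,
# with the timeout doubling on each retry.

def cumulative_backoff(initial_rto, retries):
    """Total seconds spent across `retries` exponentially backed-off timeouts."""
    return sum(initial_rto * (2 ** i) for i in range(retries))

# Older kernels, 3-second initial RTO: 3 + 6 + 12 = 21 seconds
print(cumulative_backoff(3, 3))   # 21

# Later kernels, 1-second initial RTO: 1 + 2 + 4 + 8 = 15 seconds
print(cumulative_backoff(1, 4))   # 15
```

Either way, a delay of 17-and-change seconds is in the right neighborhood for a transient failure to establish a TCP connection.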


rick



Thanks
Klaus

-Ursprüngliche Nachricht-
Von: Rick Jones [mailto:rick.jon...@hp.com]
Gesendet: Freitag, 31. Mai 2013 19:17
An: Klaus Schürmann
Cc: openstack@lists.launchpad.net
Betreff: Re: [Openstack] Swift performance issues with requests

On 05/31/2013 04:55 AM, Klaus Schürmann wrote:

May 31 10:33:08 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 
31/May/2013/08/33/08 GET /v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - 
Wget/1.12%20%28linux-gnu%29 provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 
- 283354 - txd4a3a4bf3f384936a0bc14dbffddd275 - 0.1020 -
May 31 10:33:26 swift-proxy1 proxy-logging 10.4.2.99 10.4.2.99 31/May/2013/08/33/26 GET 
/v1/AUTH_provider1/129450/829188397.31 HTTP/1.0 200 - Wget/1.12%20%28linux-gnu%29 
provider1%2CAUTH_tke6408efec4b2439091fb6f4e75911602 - 283354 - txd8c6b34b8e41460bb2c5f3f4b6def0ef - 17.7330 -   



Something I forgot to mention, which was the basis for my TCP
retransmissions guess.  Depending on your kernel revision, the initial
TCP retransmission timeout is 3 seconds, and it will double each time -
eg 3, 6, 12.  As it happens, the cumulative time for that is 17
seconds...  So, the 17 seconds and change would be consistent with a
transient problem in establishing a TCP connection.  Of course, it could
just be a coincidence.

Later kernels - I  forget where in the 3.X stream exactly - have the
initial retransmission timeout of 1 second.  In that case the timeouts
would go 1, 2, 4, 8, etc...

rick






Re: [Openstack] Swift performance issues with requests

2013-05-31 Thread Rick Jones

On 05/31/2013 04:55 AM, Klaus Schürmann wrote:

Hi,

when I test my new swift cluster I get a strange behavior with GET and PUT 
requests.
Most time it is really fast. But sometimes it takes a long time to get the data.
Here is an example with the same request which took one time 17 seconds:

...

Can someone explain such behavior?


I'm sure others will suggest storage things to check.  Being a 
networking type, I will suggest looking into TCP retransmissions. 
Netstat -s commands can be helpful there.  On your client, the proxy and 
perhaps even your object server(s).


rick jones





Re: [Openstack] New code name for networks

2013-05-11 Thread Rick Jones

On 05/11/2013 01:07 PM, Monty Taylor wrote:

I have been arguing for:

mutnuaq


As someone who named a netperf test MAERTS because it transferred data 
in the opposite direction from the STREAM test, I'm good with that.


If that does not work, with Quantum being something of Spooky 
networking at a distance - perhaps something else in the realm of 
quantum physics?


rick jones



Re: [Openstack] launching multiple VMs takes very long time

2013-05-01 Thread Rick Jones

On 05/01/2013 10:43 AM, Steve Heistand wrote:

there may be some network issues going on here, trying to shove some amount of 
data
bigger then a few Gig seems to start slowing things down.


If you think there is an actual network problem, you could I suppose try 
exploring that with netperf or iperf.  It seems unlikely that the 
network could retain a memory of a previous transfer to cause a 
subsequent, non-overlapping transfer to run more slowly.


The logs on the compute node(s) will show how long it took to actually 
retrieve the image yes?


rick jones



Re: [Openstack] launching multiple VMs takes very long time

2013-04-30 Thread Rick Jones

On 04/30/2013 11:42 AM, Steve Heistand wrote:

if I launch one vm at a time it doesn't take very long to start up the instance, 
maybe a minute.
if I launch 4 instances (of the same snapshot as before) it takes 30 minutes.

they are all launching to different compute nodes, the controllers are all 
multicore,
I don't see any processes on the compute nodes taking much cpu power, the 
controller has
a keystone process mostly sucking up 1 core, loads and loads of beam.smp from 
rabbitmq but
none are really taking any cpu time.

the glance image storage is on the controller node, the snapshots are 2-3G in 
size.

where should I start looking to find out why things are so slow?


I am something of a networking guy so that will color my response :)

Is each  instance using the same image or a different one?  If a 
different one, the workload to the glance storage will become (I 
suspect) a random read workload accessing the four different images even 
though each individual stream may be sequential.  How much storage 
oomph do you have on/serving the glance/controller node?


After that, what do the network statistics look like?  Starting I 
suppose with the glance/controller node.  Take some snapshots over an 
interval in each case and run them through something like beforeafter:


netstat -s > before
...wait a defined/consistent moment...
netstat -s > after
beforeafter before after > delta

and go from there.

rick jones

/*
 * beforeafter
 *
 *   SYNOPSIS
 *
 *   beforeafter before_file after_file
 *
 *   Description
 *
 *   Subtract the numbers in before_file from the numbers in
 *   after_file.
 * 
 *   Example
 *
 *   # netstat -s > netstat.before
 *   # run some test here
 *   # netstat -s > netstat.after
 *   # beforeafter netstat.before netstat.after
 *
 *   Note
 *
 *   The long double is usually implemented as the IEEE double
 *   extended precision: its mantissa is _at_least_ 64 bits.
 *   Therefore, long double should be able to handle 64-bit
 *   integer numbers.
 */

#include <stdio.h>
#include <ctype.h>
#include <unistd.h>
#include <stdlib.h>

char *USAGE = "before_file after_file";

int
main(int argc, char *argv[])
{
    FILE	*fp1;		/* before file */
    FILE	*fp2;		/* after file */
    char	*fname1;
    char	*fname2;
    int		i;
    int		c;
    int		c2;
    int		separator  = 1;
    int		separator2 = 1;
    unsigned int	n1 = 0;
    unsigned int	n2 = 0;
    long double		d1 = 0.0;
    long double		d2 = 0.0;
    long double		delta = 0.0;
    long double		p31;
    long double		p32;
    long double		p63;
    long double		p64;

    /*
     * Check # of arguments.
     */
    if (argc != 3) {
	long double	x;

	fprintf(stderr, "Usage: %s %s\n", argv[0], USAGE);

	printf ("\n");
	printf (" Testing how many decimal digits can be handled ...\n");
	printf (" #digits #bits  1=pass, 0=fail\n");
	x = 999999999.0L -
	    999999998.0L;
	printf ("%10s %6s   %1.0Lf\n",  "9",  "~30", x);
	x = 999999999999999999.0L -
	    999999999999999998.0L;
	printf ("%10s %6s   %1.0Lf\n", "18",  "~60", x);
	x = 999999999999999999999999999.0L -
	    999999999999999999999999998.0L;
	printf ("%10s %6s   %1.0Lf\n", "27",  "~90", x);
	x = 999999999999999999999999999999.0L -
	    999999999999999999999999999998.0L;
	printf ("%10s %6s   %1.0Lf\n", "30", "~100", x);
	x = 999999999999999999999999999999999.0L -
	    999999999999999999999999999999998.0L;
	printf ("%10s %6s   %1.0Lf\n", "33", "~110", x);
	x = 999999999999999999999999999999999999.0L -
	    999999999999999999999999999999999998.0L;
	printf ("%10s %6s   %1.0Lf\n", "36", "~120", x);

	exit (1);
    }
    /*
     * Open files.
     */
    fname1 = argv[1];
    fname2 = argv[2];
    fp1 = fopen(fname1, "r");
    fp2 = fopen(fname2, "r");
    if (!fp1 || !fp2) {
	fprintf(stderr, "fp1 = %p  fp2 = %p", (void *)fp1, (void *)fp2);
	perror ("Could not open files");
	exit (2);
    }
    /*
     * Prepare for 32-bit and 64-bit overflow check.
     */
    for (p31=1.0, i=0; i<31; i++) {
	p31 *= 2.0;			/* 2^31 */
    }
    p32 = p31 * 2.0;			/* 2^32 */
    p63 = p31 * p32;			/* 2^63 */
    p64 = p63 * 2.0;			/* 2^64 */

    /*
     * Parse.
     */
    while ((c = getc(fp1)) != EOF) {
	if (c==' ' || c=='\t' || c==':' || c=='(' || c=='\n') {
	    printf("%c", c);
	    separator = 1;
	} else if (!isdigit(c)) {
	    printf("%c", c);
	    separator = 0;
	} else {
	    if (separator == 0) {
		printf("%c", c);	/* this digit is a part of a word */
		continue;
	    }
	    n1 = c - '0';
	    d1 = n1;
	    while ((c = getc(fp1)) != EOF) {
		if (isdigit(c)) {
		    n1 = c - '0';
		    d1 = 10.0 * d1 + n1;
		} else {
		    break;
		}
	    }
	    /*
	     * Find the counterpart in the after file.
	     */
	    while ((c2 = getc(fp2)) != EOF) {
		if (c2==' ' || c2=='\t' || c2==':' || c2=='(' || c2=='\n') {
		    separator2 = 1;
		} else if (!isdigit(c2)) {
		    separator2 = 0;
		} else {
		    if (separator2 == 0) {
			continue;
		    }
		    n2 = c2 - '0';
		    d2 = n2;
		    while ((c2 = getc(fp2)) != EOF) {
			if (isdigit(c2)) {
			    n2 = c2 - '0';
			    d2 = 10.0 * d2 + n2;
			} else {
			    break;
			}
		    }
		    break;	/* counterpart number found */
		}
	    }
	    /*
	     * Emit the delta, compensating for a counter which
	     * wrapped at 32 or 64 bits.
	     */
	    delta = d2 - d1;
	    if (delta < 0.0) {
		delta += (d1 < p32) ? p32 : p64;
	    }
	    printf("%.0Lf", delta);
	    /*
	     * Do not lose the non-digit which terminated the number.
	     */
	    if (c != EOF) {
		printf("%c", c);
		separator = (c==' ' || c=='\t' || c==':' || c=='(' || c=='\n');
	    }
	}
    }
    fclose(fp1);
    fclose(fp2);
    return 0;
}

Re: [Openstack] launching multiple VMs takes very long time

2013-04-30 Thread Rick Jones

On 04/30/2013 01:36 PM, Melanie Witt wrote:

This presentation from the summit might be of interest to you:

http://www.openstack.org/summit/portland-2013/session-videos/presentation/scaling-the-boot-barrier-identifying-and-eliminating-contention-in-openstack


A nice presentation.  Based on his comment at the end, I did the web 
search and found his slides at:


http://www.cs.utoronto.ca/~peter/feiner_slides_openstack_summit_portland_2013.pdf

A caveat/nit/whatnot about looking at overall system CPU utilization and 
assuming no CPU bottleneck (the hardware portion at the beginning) at 
points even well below 100% utilization - with multiple CPUs in a system 
now, there are for example, many ways for there to be 50% overall CPU 
utilization.  It could be that all the CPUs are indeed at 50% util, but 
it could also be that 1/2 the CPUs are at 100% and the other half are 
idle.  Now, perhaps that fits in the space between a hardware and a 
software bottleneck, but I'd be cautious about overall CPU utilization 
figures.


For example, a single or small number of CPUs saturating can happen 
rather easily in some networking workloads - the CPU servicing 
interrupts from the NIC (or CPUs if the NIC is multiqueue) can saturate. 
 I'd consider that a hardware saturation, even though many of the other 
CPUs in the system are largely idle.  That is why in later versions of 
netperf, there is a way to report the ID and utilization of the most 
utilized CPU during a test, in addition to reporting the overall CPU 
utilization.
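To illustrate that caveat with invented numbers (nothing here is from the presentation):

```python
# Two very different systems can report the same overall CPU utilization.
per_cpu_balanced  = [50, 50, 50, 50]    # every CPU half busy
per_cpu_saturated = [100, 100, 0, 0]    # half the CPUs pegged, half idle

def overall(per_cpu):
    """Overall utilization as reported by tools that average across CPUs."""
    return sum(per_cpu) / len(per_cpu)

print(overall(per_cpu_balanced))    # 50.0
print(overall(per_cpu_saturated))   # 50.0 -- same average, hidden bottleneck
print(max(per_cpu_saturated))       # 100  -- what a "most utilized CPU" report reveals
```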


rick jones



Re: [Openstack] question on the GRE Performance

2013-03-15 Thread Rick Jones

On 03/15/2013 08:05 AM, tommy(小包) wrote:

Hi Guys,

in my test, I found OVS GRE performance is much lower. For example:
on a 100 Mbit/s switch, GRE gets just 26 Mbit/s, but using a Linux bridge,
95 Mbit/s.

So, my question is: why is the GRE speed low, or maybe my config is not right?


95 and 26 Mbit/s measured at what level?  On the wire (including all 
the protocol headers) or to user level (after all the protocol headers)? 
That you were seeing 95 Mbit/s suggests user level but I'd like to make 
certain.


GRE adds header overhead, but I wouldn't think enough to take one from 
95 down to 26 Mbit/s to user level.  I would suggest looking at, in no 
particular order:


*) Netstat stats on your sender - is it retransmitting in one case and 
not the other?


*) per-CPU CPU utilization - is any one CPU on the sending, receiving or 
intervening iron saturating in one case and not the other?


and go from there. I'm guessing your tests are all bulk-transfer - you 
might want to consider adding some latency and/or aggregate small-packet 
performance tests.


happy benchmarking,

rick jones
The applicability varies, but attached is some boilerplate I've 
built up over time on the matter of "why is my network performance 
slow?"  PS - the beforeafter utility mentioned is no longer available 
via ftp.cup.hp.com because ftp.cup.hp.com no longer exists.  I probably 
ought to put it up on ftp.netperf.org...



Some of my checklist items when presented with assertions of poor
network performance, in no particular order, numbered only for
convenience of reference:

1) Is *any one* CPU on either end of the transfer at or close to 100%
   utilization?  A given TCP connection cannot really take advantage
   of more than the services of a single core in the system, so
   average CPU utilization being low does not a priori mean things are
   OK.

2) Are there TCP retransmissions being registered in netstat
   statistics on the sending system?  Take a snapshot of netstat -s -t
   from just before the transfer, and one from just after and run it
   through beforeafter from
   ftp://ftp.cup.hp.com/dist/networking/tools:

   netstat -s -t > before
   transfer or wait 60 or so seconds if the transfer was already going
   netstat -s -t > after
   beforeafter before after > delta

3) Are there packet drops registered in ethtool -S statistics on
   either side of the transfer?  Take snapshots in a manner similar to
   that with netstat.

4) Are there packet drops registered in the stats for the switch(es)
   being traversed by the transfer?  These would be retrieved via
   switch-specific means.

5) What is the latency between the two end points?  Install netperf on
   both sides, start netserver on one side and on the other side run:

   netperf -t TCP_RR -l 30 -H remote

   and invert the transaction/s rate to get the RTT latency.  There
   are caveats involving NIC interrupt coalescing settings defaulting
   in favor of throughput/CPU util over latency:

   ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt

   but when the connections are over a WAN latency is important and
   may not be clouded as much by NIC settings.

   This all leads into:

6) What is the *effective* TCP (or other) window size for the
   connection?  One limit to the performance of a TCP bulk transfer
   is:

   Tput = W(eff)/RTT

   The effective window size will be the lesser of:

   a) The classic TCP window advertised by the receiver. This is the
  value in the TCP header's window field shifted by the window
  scaling factor which was exchanged during connection
  establishment. The window scale factor is why one wants to get
  traces including the connection establishment.
   
  The size of the classic window will depend on whether/what the
  receiving application has requested via a setsockopt(SO_RCVBUF)
  call and the sysctl limits set in the OS.  If the receiving
  application does not call setsockopt(SO_RCVBUF) then under Linux
  the stack will autotune the advertised window based on other
  sysctl limits in the OS.  Other stacks may or may not autotune.
 
   b) The computed congestion window on the sender - this will be
  affected by the packet loss rate over the connection, hence the
  interest in the netstat and ethtool stats.

   c) The quantity of data to which the sending TCP can maintain a
  reference while waiting for it to be ACKnowledged by the
  receiver - this will be akin to the classic TCP window case
  above, but on the sending side, and concerning
  setsockopt(SO_SNDBUF) and sysctl settings.

   d) The quantity of data the sending application is willing/able to
  send at any one time before waiting for some sort of
  application-level acknowledgement.  FTP and rcp will just blast
  all the data of the file into the socket as fast as the socket
  will take it.  Scp has some application-layer windowing which
  may cause it to put less data out onto the connection
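To make items 5 and 6 of the checklist concrete, here is a small numeric sketch (illustrative values only; netperf reports the transaction rate, the rest is arithmetic):

```python
# Item 5: invert a netperf TCP_RR transaction rate to get round-trip time.
# With a single transaction outstanding, one transaction == one RTT.
trans_per_sec = 2000.0                  # hypothetical TCP_RR result
rtt_sec = 1.0 / trans_per_sec
print(rtt_sec)                          # 0.0005 -> 500 microseconds

# Item 6: the bulk-transfer bound Tput = W(eff)/RTT.
# A 64 KB effective window across a 50 ms WAN path:
w_eff_bytes = 64 * 1024
wan_rtt_sec = 0.050
tput_mbit = (w_eff_bytes / wan_rtt_sec) * 8 / 1e6
print(tput_mbit)                        # 10.48576 -- regardless of link speed
```

The point of the bound: on a long-RTT path, no amount of raw link bandwidth helps until the effective window grows.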

Re: [Openstack] [QUANTUM] (Bug ?) L3 routing not correctly fragmenting packets ?

2013-03-11 Thread Rick Jones

On 03/11/2013 06:09 AM, Sylvain Bauza wrote:

Okay. I think I got the reason why it's not working with OVS/GRE
contrary to FlatDHCP nova-network.
So, as per
http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml
,
GRE encapsulation protocol can add up to 34 bytes to the IP datagram
(meaning the TCP segment is only 1456 bytes if MTU set to 1500).
When the packet is about 1500 bytes, then it should fragment to keep the
1500-byte size of the reply (including GRE encap then).


That sounds like the reason.


Unfortunately, for security reasons, the ICMP type 3/code 4 packets
(frag. needed) can't reach the X.X.X.X backend, as this backend
is denying any ICMP request (firewall).
As a consequence, Path MTU discovery is failing and packets are still
retransmitted with 1500-byte size again and again...

As said in my first post, the only workaround I found is to modify *all*
my VMs with MTU set to 1454 (don't know why there is a 2-byte overhead
compared to the 1456 bytes I mentioned above), including my Windows VMs, which
is not a cool thing (modifying a registry key and rebooting the VM. Yes,
you aren't dreaming. This is the way for Windows-based machines to
modify MTUs...)

Do you know of any idea that would avoid modifying the VMs, doing
things only on the network node?


Yes.  Let the ICMP Destination Unreachable, Datagram Too Big messages 
through.   So the network can function the way it was intended.


Otherwise you have no recourse but alter the MTU in the VMs.  Or add the 
insult of tweaking the code to ignore the DF bit to the injury of 
blocking the ICMP messages. (Assuming that is even possible)


If you are Very Lucky (tm) all your network infrastructure in the 
broadcast domain (everything on the same side of a router - device 
forwarding based on Layer3 (eg IP) addressing or put another way, 
everything reachable via just switches - in the proper sense of the term 
wherein a switch is a device making forwarding decisions based on 
layer2, eg Ethernet addresses) then you can try to increase the MTU of 
your physical interfaces so the GRE encapsulation overhead can be 
hidden from the VMs.  But *everything* in the broadcast domain must 
have the same maximum frame size (MTU) or life becomes even more 
interesting.
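For illustration, the MTU/MSS arithmetic behind those numbers can be sketched as follows (the overhead values are assumptions for the example; the 1454-byte workaround from the post above would correspond to a 46-byte total encapsulation overhead):

```python
# Largest inner packet that fits in a physical frame after encapsulation.
def inner_mtu(phys_mtu, encap_overhead):
    return phys_mtu - encap_overhead

PHYS_MTU = 1500
IP_HDR = TCP_HDR = 20

# Overheads to try: 24 (outer IPv4 + basic GRE header), 34 (the figure in
# the Cisco note quoted above), 46 (what the observed 1454 value implies).
for overhead in (24, 34, 46):
    mtu = inner_mtu(PHYS_MTU, overhead)
    mss = mtu - IP_HDR - TCP_HDR       # TCP payload per segment
    print(overhead, mtu, mss)
# The 46-byte row yields a VM MTU of 1454 -- the workaround value above.
```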


My suggestion is let the ICMP Destination Unreachable, Datagram Too Big 
messages through.  It is perhaps my failing, but I fail to see how 
blocking them improves security.


rick jones
adde parvum parvo magnus acervus erit - Ovid quoted in The Mythical Man 
Month




Re: [Openstack] [QUANTUM] (Bug ?) L3 routing not correctly fragmenting packets ?

2013-03-08 Thread Rick Jones

On 03/08/2013 11:49 AM, Aaron Rosen wrote:

Hi Rick,

You are right. I just ran curl to test for myself and it does set the
DF bit. Why is this? Any ideas why it specifies that the packet cannot
be fragmented?


Because most, if not virtually all TCP stacks going back to the mid 
1990s (RFC 1191 is from 1990 and I figured a couple years to propagate) 
enable Path MTU discovery by default for TCP.  At least those with which 
I have come into contact.


I doubt that curl itself asked for it.  I suspect you will find the DF 
bit set in the IP datagrams carrying the TCP segments of any application 
on your system using TCP - even netperf :)  PathMTU discovery, for TCP 
at least, and perhaps other reliable transports, is considered a Best 
Practice (tm) and so enabled by default.  Where it may not be enabled by 
default is for UDP.


rick



Re: [Openstack] Horizon and open connections

2013-01-31 Thread Rick Jones

On 01/31/2013 11:59 AM, Gabriel Hurley wrote:

Even though I don't experience this problem (and prefer nginx to
apache), I can help diagnose:

Connections ending up in CLOSE_WAIT means that the socket isn't being
fully closed, which is controlled by the client lib (in this case
python-keystoneclient) which uses httplib2 under the hood.


Expanding on that a bit. CLOSE_WAIT is the state a TCP endpoint will 
enter upon receiving a FINished segment from the remote TCP.  When the 
FIN arrives, the local application will receive notification of this via 
the classic read return of zero on a receive/read call against the 
socket.  The FIN segment means I will be sending you no more data.


Meanwhile, the local TCP will have ACKed the FIN segment, and the remote 
TCP will transition to FIN_WAIT_2 upon receipt of that ACK (until then 
it will be in FIN_WAIT_1).


Depending on how the remote application triggered the sending of the 
FIN, the TCP connection is now in a perfectly valid simplex state 
wherein the side in CLOSE_WAIT can continue sending data to the side 
which will now be in FIN_WAIT_2.  It is exceedingly rare for 
applications to want a simplex TCP connection


If such a unidirectional TCP connection is not of any use to an 
application, (the common case)then that application should/must also 
close the connection upon the read return of zero.


Thus, seeing lots of connections stuck in CLOSE_WAIT is an indication 
of an application-level (relative to TCP) bug wherein the application on 
the CLOSE_WAIT side is ignoring the read return of zero.


Such bugs in applications may be masked by a few things:

1) If the remote side called close() rather than shutdown(SHUT_WR) then 
an attempt on the CLOSE_WAIT side to send data to the remote will cause 
the remote TCP to return a RST segment (reset) because there is no 
longer anything above TCP to receive the data.  This will then cause the 
local TCP to terminate the connection.  This may also happen if the 
local application set SO_KEEPALIVE to enable TCP keepalives.


2) If the local side doesn't send anything, and doesn't have TCP 
keepalives set, if the remote TCP has a FIN_WAIT_2 timer of some sort 
going (long story involving a hole in the TCP specification and 
implementation workarounds, email if you want to hear it) then when that 
FIN_WAIT_2 timer expires the remote TCP may sent a RST segment.


RST segments are best effort in sending - they don't get retransmitted 
explicitly.  In case 1 if the RST segment doesn't make it back, the 
local TCP will retransmit the data it was sending (because it will not 
have received an ACKnowledgement either).  It will then either receive 
the RST triggered by that retransmission, or if no RSTs ever make it 
back, the local TCP will at some point reach its retransmission limit 
and terminate the connection.  In case 2, if that one RST is lost, 
that's it, and the CLOSE_WAIT may remain forever.


Again though, given the rarity of actual application use of a simplex 
TCP connection, 99 times out of 100, seeing lots of CLOSE_WAIT 
connections building up implies a buggy application or the libraries 
doing work on its behalf.
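The "read return of zero" behavior described above can be demonstrated with a plain connected socket pair (a local sketch, not specific to Horizon or keystoneclient; socketpair() gives AF_UNIX sockets, but the EOF semantics mirror the TCP case):

```python
import socket

# A connected pair stands in for the two ends of the connection.
a, b = socket.socketpair()

b.shutdown(socket.SHUT_WR)   # b sends its FIN-equivalent: "no more data from me"
data = a.recv(4096)          # a's read returns zero bytes -- the EOF notification
print(repr(data))            # b''

# A well-behaved application closes here; ignoring the b'' and leaving the
# socket open is what leaves real TCP connections stuck in CLOSE_WAIT.
a.close()
b.close()
```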


rick jones



Re: [Openstack] How to create vm instance to specific compute node?

2012-12-30 Thread Rick Jones

On 12/27/2012 02:39 PM, Jay Pipes wrote:

No.


Pity - I'd just gotten used to that mechanism :)


Use nova boot --availability_zone=nova:hostname where nova: is your
availability zone and hostname is the hostname of the compute node you
wish to put the instance on.

Note this is an admin-only ability by default and can oversubscribe the
compute node the instance goes on.


Will it use the same /var/lib/nova/sch_hosts/id mechanism to allow 
mere mortals to use it like the onhost stuff did?


thanks,

rick



Best,
-jay

On 12/27/2012 02:45 PM, Rick Jones wrote:

Does the convention of adding --onhost--computenodename to the instance
name being created still work?

rick jones



Re: [Openstack] How to create vm instance to specific compute node?

2012-12-27 Thread Rick Jones
Does the convention of adding --onhost--computenodename to the instance 
name being created still work?


rick jones



Re: [Openstack] Strange network behavior

2012-11-09 Thread Rick Jones

On 11/09/2012 09:14 AM, Joe Warren-Meeks wrote:

What I am seeing in tcpdump is a lot of incorrect cksums. This happens
with all TCP connections.

17:12:38.539784 IP (tos 0x0, ttl 64, id 53611, offset 0, flags [DF],
proto TCP (6), length 60)
 10.0.0.240.56791 > 10.0.41.3.22: Flags [S], cksum 0x3e21 (incorrect
- 0x6de2), seq 2650163743, win 14600, options [mss 1460,sackOK,TS val
28089204 ecr 0,nop,wscale 6], length 0


17:12:38.585279 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
 10.0.41.3.22 > 10.0.0.240.56791: Flags [S.], cksum 0x3e21
(incorrect - 0xe5c5), seq 1530502549, ack 3098447117, win 14480,
options [mss 1460,sackOK,TS val 340493 ecr 28089204,nop,wscale 3], length 0

Anyone come across this before?


When a Network Interface card (NIC) offers ChecKsum Offload (CKO) in the 
outbound/transmit direction, the computation of the Layer 4 (eg TCP, 
UDP) checksum is deferred to the NIC.  You can see if a given 
interface/NIC has checksum offload, or other offloads, enabled via 
ethtool -k <interface>


When the packet passes the promiscuous tap on the way down the stack to 
a NIC offering CKO, the packet will be in essence unchecksummed and so 
tcpdump will report that as an incorrect checksum.  It is therefore 
possibly a false positive.  I say possibly because I just did a quick 
netperf test on my Ubuntu 11.04 workstation to see what the SYN's looked 
like there, and I didn't see an incorrect checksum warning out of 
tcpdump though I know the egress interface is offering outbound CKO, 
making me think that TCP may not bother with CKO for small segments like 
SYNchronize segments. One way to check if the incorrect checksum report 
is valid would be to run tcpdump on 10.0.41.3 as well.  And/or disable 
CKO if you see it is enabled in ethtool.


I would not have expected to see invalid checksums reported by tcpdump 
for an inbound packet though.  Might be good to cross-check with the 
netstat statistics.


There is what appears to be an inconsistency between those two TCP 
segments.  The sequence number of the SYNchronize (that 'S' in flags) 
segment from 10.0.0.240.56791 to 10.0.41.3.22 is 2650163743.  The SYN 
from 10.0.41.3.22 to 10.0.0.240.56791 though has the ACK flag set ('.') 
but the ACKnowledgement number is 3098447117 rather than what I would 
have expected - 2650163744.


FWIW, that there was a SYN-ACK sent in response to the SYN in the first 
place suggests that 10.0.41.3 received what it thought was a properly 
checksummed SYN segment.  All the more reason I suspect to take traces 
at both ends and compare the packets byte by byte.
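For reference, the checksum tcpdump is validating is the ones'-complement Internet checksum of RFC 1071. A minimal sketch of the computation the NIC performs when offload is enabled (the payload bytes are illustrative, not taken from the trace above):

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# A receiver validates by summing the data *including* the checksum
# field: the result folds to zero for an intact packet.
payload = bytes(range(20))                         # stand-in for a header/segment
csum = internet_checksum(payload)
print(hex(internet_checksum(payload + csum.to_bytes(2, "big"))))  # 0x0
```

An un-checksummed segment captured below the offload point fails exactly this zero test, which is why tcpdump flags it.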


rick jones





Re: [Openstack] Troubleshooting Swift 1.7.4 on mini servers

2012-10-30 Thread Rick Jones

On 10/29/2012 07:37 PM, Pete Zaitcev wrote:

On Mon, 29 Oct 2012 18:16:52 -0700
Nathan Trueblood nat...@truebloodllc.com wrote:


Definitely NOT a problem with the filesystem, but something is causing the
object-server to think there is a problem with the filesystem.


If you are willing to go all-out, you can probably catch the
error with strace, if it works on ARM.


Strace is your friend even if he is sometimes a bit on the chatty side. 
 It looks as though there is at least some support for ARM if 
http://packages.debian.org/search?keywords=strace is any indication.


rick jones




Re: [Openstack] [SWIFT] Proxies Sizing for 90.000 / 200.000 RPM

2012-10-26 Thread Rick Jones

On 10/25/2012 06:13 PM, Chander Kant wrote:

Sure. We have published a new blog related to the summit, including a
link to our presentation slides:

http://www.zmanda.com/blogs/?p=971
http://www.zmanda.com/pdf/how-swift-is-your-Swift-SD.pdf

We plan to publish more performance results within next few weeks.


Any chance of expanding on this:


disable TIME_WAIT, disable syn cookies ...


from slide 10?  Particularly the disabling of TIME_WAIT.  While the 
traditionalist couple-minutes TIME_WAIT may be a bit, oh, conservative, 
TIME_WAIT is there for a reason as part of TCP's correctness 
algorithms/heuristics.  And needing to disable it suggests an 
opportunity to tune the application for better performance instead.
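One back-of-the-envelope reason people reach for disabling TIME_WAIT is the steady-state population of sockets parked in it. A rough sketch, assuming the Linux default of 60 seconds and the usual default ephemeral port range (both are assumptions; tune to your kernel):

```python
def timewait_steady_state(conns_per_sec: float,
                          timewait_secs: float = 60.0,  # assumed Linux default
                          ephemeral_ports: int = 61000 - 32768):  # assumed range
    """Little's law: connections closed per second times the time each
    lingers in TIME_WAIT gives the population parked there.  Compare
    that to the ephemeral ports available per (source, destination) pair."""
    parked = conns_per_sec * timewait_secs
    return parked, parked / ephemeral_ports

# e.g. 3500 fresh connections per second, one per request
parked, pressure = timewait_steady_state(3500.0)
```

If that pressure ratio exceeds 1 against a single destination, that is precisely the tuning opportunity - persistent connections, more client IPs - rather than switching off TIME_WAIT outright.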


rick jones





Re: [Openstack] [SWIFT] Proxies Sizing for 90.000 / 200.000 RPM

2012-10-24 Thread Rick Jones

On Oct 11, 2012, at 4:28 PM, Alejandro Comisario 
alejandro.comisa...@mercadolibre.com wrote:


Hi Stackers !
This is the thing, today we have a 24 datanodes (3 copies, 90TB
usables) each datanode has 2 intel hexacores CPU with HT and 96GB
of RAM, and 6 Proxies with the same hardware configuration, using
swift 1.4.8 with keystone. Regarding the networking, each proxy /
datanodes has a dual 1Gb nic, bonded in LACP mode 4,


Are you seeing good balancing of traffic across the two interfaces in 
the bonds?



each of the proxies are behind an F5 BigIP Load Balancer ( so, no
worries over there ).


What is the pipe into/out-of the F5 (cluster of F5's?) and how 
utilized is that pipe already?  If it is running at anything more than 
2.5% (5000/200000) to 5.5% (5000/90000) of capacity in the direction 
the GETs will flow, it will become a bottleneck (handwaving it as 100% 
GETs rather than 90%).
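The handwave above can be made concrete once an object size is assumed; everything here other than the request rates from the thread is an illustrative assumption:

```python
def pipe_utilization(gets_per_sec: float, bytes_per_get: float,
                     pipe_bits_per_sec: float) -> float:
    """Fraction of the pipe consumed by GET responses (ignores HTTP
    headers, TCP/IP overhead, and the 10% PUT traffic)."""
    return gets_per_sec * bytes_per_get * 8.0 / pipe_bits_per_sec

# 3500 req/s peak from the thread; 64 KB objects and a 10 Gbit/s pipe
# into the F5 are purely assumed numbers for illustration.
util = pipe_utilization(3500.0, 64 * 1024, 10e9)
```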


rick jones



Today, we are receiving 5000 RPM ( Requests per Minute ) with 660
RPM per proxy. I know it's low, but now, with a new product
migration, soon ( really soon ) we are expecting to receive about a
total of 90.000 RPM average ( 1500 req / s ) with weekly peaks of
200.000 RPM ( 3500 req / s ) to the swift api, which will be 90%
public GETs ( no keystone auth ) and 10% authorized PUTs ( keystone
in the middle; worth knowing that we have a 10 keystone vms pool,
connected to a 5 nodes galera mysql cluster, so no worries there
either )

So, 3500 req/s divided by 6 proxy nodes doesn't sound like too much,
but well, it's a number that we can't ignore. What do you think about
these numbers? Do these 6 proxies sound adequate, or should we double
or triple the proxies? Does anyone handle this volume of requests and
can share their configs?




Re: [Openstack] Bad performance on physical hosts kvm + bonding + bridging

2012-07-13 Thread Rick Jones

On 07/13/2012 06:55 AM, Leandro Reox wrote:
Ok, here is the story: we deployed some in-house APIs in our Openstack 
private cloud, and while stressing them we realized that some requests 
were taking very long. To rule out the behavior of the api, we 
installed apache, lighttpd and even tried netcat, of course on 
the guest systems running ubuntu 10.10 w/virtio. After getting nuts 
modifying sysctl parameters to change the guest behavior, we realized 
that if we installed apache or lighttpd on the PHYSICAL host the 
behavior was the same. That surprised us: when we try the same 
benchmark on a node without bonding, bridging and without any KVM 
packages or nova installed, with the same HW specs, the benchmark 
passes OK, but if we run the same tests on a spare nova node with 
everything installed + bonding + bridging that never ran a virtual 
guest machine, the test fails too. So, so far:


Tested on hosts with Ubuntu 10.10, 11.10 and 12.04

- Clean node without bonding + briding or KVM - just the eth0 
configured - PASS

- Spare node with bridging - PASS
- Spare node with just bonding (dynamic link aggr mode4) - PASS
- Spare node with nova + kvm + bonding + bridging - FAILS
- Spare node with nova + kvm - PASS

Is there a chance that working with bridging + bonding + nova some 
module gets screwed up? I'll attach the tests; you can see that a small 
number of requests take TOO LONG, like 3 secs, and the overhead time 
is in the CONNECT phase


If I recall correctly, 3 seconds is the default, initial TCP 
retransmission timeout (at least in older kernels - what is your load 
generator running?).  Between that, and your mentioning connect phase, 
my first guess (it is only a guess) would be that something is causing 
TCP SYNchronize segments to be dropped.  If that is the case, it should 
show-up in netstat -s statistics.  Snap them on both client and server 
before the test is started, and after the test is completed, and then 
run them through something like beforeafter ( 
ftp://ftp.cup.hp.com/dist/networking/tools )


netstat -s > before.server
# run benchmark
netstat -s > after.server
beforeafter before.server after.server > delta.server
less delta.server

(As a sanity check, make certain that before.server and after.server 
have the same number of lines. The habit of Linux's netstat to avoid 
printing a statistic with a value of zero can, sometimes, confuse 
beforeafter if a stat appears in after that was not present in before.)
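For those without beforeafter handy, the same delta - with that missing-zero-line caveat handled by matching on the text of each counter line rather than line position - can be sketched in a few lines:

```python
import re

def netstat_delta(before: str, after: str) -> dict:
    """Subtract before-counters from after-counters in netstat -s output,
    keyed on each counter's descriptive text.  A counter absent from
    'before' (netstat omits zero-valued statistics) is taken as zero."""
    def parse(text):
        counters = {}
        for line in text.splitlines():
            m = re.match(r"\s*(\d+) (.+)", line)
            if m:
                counters[m.group(2)] = int(m.group(1))
        return counters
    b, a = parse(before), parse(after)
    return {name: count - b.get(name, 0) for name, count in a.items()}

# toy snapshots; real input would be the files captured around the benchmark
before = "    10 segments retransmitted\n"
after = "    14 segments retransmitted\n    3 resets sent\n"
delta = netstat_delta(before, after)
```

Note this only handles the simple "count description" lines; netstat also prints a few lines in other shapes, which beforeafter itself copes with.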


It might not be a bad idea to include ethtool -S statistics from each of 
the interfaces in that procedure as well.


rick jones
probably a good idea to mention the bonding mode you are using


This is ApacheBench, Version 2.3 $Revision: 655654 $
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.16.161.25 (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
Completed 25000 requests
Finished 25000 requests


Server Software:Apache/2.2.16
Server Hostname:172.16.161.25
Server Port:80

Document Path:  /
Document Length:177 bytes

Concurrency Level:  5
Time taken for tests:   7.493 seconds
Complete requests:  25000
Failed requests:0
Write errors:   0
Total transferred:  11350000 bytes
HTML transferred:   4425000 bytes
Requests per second:3336.53 [#/sec] (mean)
Time per request:   1.499 [ms] (mean)
Time per request:   0.300 [ms] (mean, across all concurrent requests)
Transfer rate:  1479.28 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   46.6      0   3009
Processing:     0    1    5.7      0    277
Waiting:        0    0    4.6      0    277
Total:          0    1   46.9      1   3010

Percentage of the requests served within a certain time (ms)
 50%  1
 66%  1
 75%  1
 80%  1
 90%  1
 95%  1
 98%  1
 99%  1
100%   3010 (longest request)

Regards!








Re: [Openstack] [Swift][Object-server] Why that arp_cache consumes memory followed with uploading objects?

2012-07-12 Thread Rick Jones

On 07/12/2012 06:36 AM, Kuo Hugo wrote:


Hi all

I found that the arp_cache in slabinfo on the object-server grows 
along with the number of uploaded objects.


Is any code using it?


The code which maps from IP to Ethernet addresses does.  That mapping is 
what enables sending IP datagrams to their next-hop destination (which 
may be the final hop, depending) on an Ethernet network.



2352000 1329606  56%0.06K  36750   64 147000K kmalloc-64
1566617 1257226  80%0.21K  42341   37338728K xfs_ili
1539808 1257748  81%1.00K  48119   32   1539808K xfs_inode
538432 470882  87%0.50K  16826   32269216K kmalloc-512
403116 403004  99%0.19K   9598   42 76784K dentry
169250 145824  86% 0.31K   6770   25 54160K arp_cache


Could it cause any performance concern?


I believe that is one of those "it depends" kinds of questions.
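For reference, the slab rows above can be decoded mechanically. The column layout is assumed here to be the slabtop-style OBJS ACTIVE USE OBJ-SIZE SLABS OBJ/SLAB CACHE-SIZE NAME:

```python
def decode_slab_row(row: str) -> dict:
    """Pull apart one slab row and sanity-check that OBJS == SLABS * OBJ/SLAB."""
    objs, active, use, obj_size, slabs, per_slab, cache, name = row.split()
    objs, active, slabs, per_slab = map(int, (objs, active, slabs, per_slab))
    assert objs == slabs * per_slab, "column layout assumption violated"
    return {"name": name, "objects": objs, "active": active,
            "active_pct": use, "cache_kb": int(cache.rstrip("K"))}

arp = decode_slab_row("169250 145824 86% 0.31K 6770 25 54160K arp_cache")
```

So the arp_cache footprint here is roughly 54 MB; whether ~145k active ARP entries is expected depends on how many distinct peers the object-server really talks to.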


Btw, how could I flush the memory of the arp_cache which is being used by XFS (SWIFT)?


You can use the classic arp command to manipulate the ARP cache. It 
can also show you how many entries there are.  I suspect that a web 
search on "linux flush arp cache" may yield some helpful results as well.


rick jones


Re: [Openstack] Performance metrics

2012-06-29 Thread Rick Jones

On 06/21/2012 02:21 PM, Rick Jones wrote:

TSO and GRO can cover a multitude of path-length sins :)

That is one of the reasons netperf does more than just bulk transfer :)
  When I was/am measuring scaling of an SMP node I would use
aggregate, burst-mode, single-byte netperf TCP_RR tests to maximize the
packets per second while minimizing the actual bandwidth consumed.

And if there is a concern about flows coming and going there is the
TCP_CRR test which is like the TCP_RR test but each transaction is a
freshly created and torn-down TCP connection.


It doesn't do TCP_CRR, and it is not geared towards the 
scores/hundreds/thousands of instances, but I've just put a script into 
the netperf repository at netperf.org which will use novaclient.v1_1 to 
launch three instances of a specified flavor and run the 
runemomniaggdemo.sh script on one of them, targeting the other two.


http://www.netperf.org/svn/netperf2/trunk/doc/examples/netperf_by_flavor.py

It is only my second bit of Python, so I'm sure it has lots of room for 
improvement, but perhaps it will be of use to folks and help act as a 
seed crystal.


happy benchmarking,

rick jones




Re: [Openstack] Performance metrics

2012-06-21 Thread Rick Jones

On 06/20/2012 08:09 PM, Huang Zhiteng wrote:

On Thu, Jun 21, 2012 at 12:36 AM, Rick Jones rick.jon...@hp.com wrote:

I do not have numbers I can share, but do have an interest in discussing
methodology for evaluating scaling  particularly as regards to
networking.  My initial thoughts are simply starting with what I have done
for network scaling  on SMP systems (as vaguely instantiated in the likes
of the runemomniaggdemo.sh script under
http://www.netperf.org/svn/netperf2/trunk/doc/examples/ ) though expanding
it by adding more and more VMs/hypervisors etc as one goes.


By 'network scaling', do you mean the aggregated throughput
(bandwidth, packets/sec) of the entire cloud (or part of it)? I think
picking 'netperf' as a micro benchmark is just the 1st step; there's
more work that needs to be done.


Indeed. A great deal more.


For OpenStack network, there's 'inter-cloud' and
'cloud-to-external-world' throughput.  If we care about the
performance for the end user, then reasonable numbers (for network
scaling) should be captured inside VM instances.  For example, spawn
1,000 VM instances across the cloud, then pair them to do 'netperf'
tests in order to measure 'inter-cloud' network throughput.


That would certainly be an interesting test yes.

rick jones



Re: [Openstack] Performance metrics

2012-06-21 Thread Rick Jones

On 06/21/2012 12:41 PM, Narayan Desai wrote:


We did a bunch of similar tests to determine the overhead caused by
kvm and limitations of the nova network architecture. We found that
VMs themselves were able to consistently saturate the network link
available to the host system, whether it was 1GE or 10GE, with
relatively modern node and network hardware. With the default
VLANManager network setup, there isn't much you can do to scale your
outbound connectivity beyond the hardware you can reasonably drive
with a single node, but using multi-host nova-network, we were able to
run a bunch of nodes in parallel, scaling up our outbound bandwidth
linearly. We managed to get 10 nodes, with a single VM per node, each
running 4 TCP streams, up to 99 gigabits on a dedicated cross country
link. There was a bunch of tuning that we needed to do, but it wasn't
anything particularly outlandish compared with the tuning needed for
doing this with bare metal. We've been meaning to do a full writeup,
but haven't had time yet.


TSO and GRO can cover a multitude of path-length sins :)

That is one of the reasons netperf does more than just bulk transfer :) 
 When I was/am measuring scaling of an SMP node I would use 
aggregate, burst-mode, single-byte netperf TCP_RR tests to maximize the 
packets per second while minimizing the actual bandwidth consumed.


And if there is a concern about flows coming and going there is the 
TCP_CRR test which is like the TCP_RR test but each transaction is a 
freshly created and torn-down TCP connection.


happy benchmarking,

rick jones



Re: [Openstack] Performance metrics

2012-06-20 Thread Rick Jones

On 06/20/2012 05:56 AM, Neelakantam Gaddam wrote:

Hi All,

I want to do performance analysis on top of
[openstack,Qauntum,openvswitch] setup. I am interested in the following
metrics.

VM life cycle (creation, deletion, boot..,etc)
VM Migration
Quantum (network, port creation/deletion..,etc)

Are there any performance metric tools/scripts available in openstack ?
If not, how can I do the performance analysis of the above metrics on
openstack quantum setup ? Please help me regarding performance metrics.

I want to know details of the biggest deployment with
[openstack,Qauntum,openvswitch] setup interms of number of tenant
networks, number of compute nodes, number of VMs per tenant.


I do not have numbers I can share, but do have an interest in discussing 
methodology for evaluating scaling, particularly as regards networking. 
My initial thoughts are simply to start with what I have done for 
network scaling on SMP systems (as vaguely instantiated in the likes of 
the runemomniaggdemo.sh script under 
http://www.netperf.org/svn/netperf2/trunk/doc/examples/ ) though 
expanding it by adding more and more VMs/hypervisors etc as one goes.


While netperf (or its like) is simply a microbenchmark, and so somewhat 
removed from reality it does have the benefit of not (directly at 
least :) leaking anything proprietary about what is going-on in any one 
vendor's environment.  And if something will scale well under the rigors 
of netperf workloads it will probably scale well under real workloads. 
 Such scaling under netperf may not be necessary, but it should be 
sufficient.


happy benchmarking,

rick jones




Re: [Openstack] Swift performance for very small objects

2012-05-21 Thread Rick Jones
-host.americas.hpqcorp.net.35594: Flags [P.], seq 1:319, ack 1, 
win 115, options [nop,nop,TS val 111970417 ecr 428732549], length 318
00:00:00.000158 IP internal-host.americas.hpqcorp.net.35594 > 
tardy.63978: Flags [.], ack 319, win 486, options [nop,nop,TS val 
428732549 ecr 111970417], length 0
00:00:00.62 IP internal-host.americas.hpqcorp.net.35594 > 
tardy.63978: Flags [P.], seq 1:129, ack 319, win 486, options 
[nop,nop,TS val 428732549 ecr 111970417], length 128
00:00:00.04 IP internal-host.americas.hpqcorp.net.35594 > 
tardy.63978: Flags [F.], seq 129, ack 319, win 486, options [nop,nop,TS 
val 428732549 ecr 111970417], length 0
00:00:00.06 IP tardy.63978 > 
internal-host.americas.hpqcorp.net.35594: Flags [.], ack 129, win 123, 
options [nop,nop,TS val 111970417 ecr 428732549], length 0
00:00:00.08 IP tardy.63978 > 
internal-host.americas.hpqcorp.net.35594: Flags [F.], seq 319, ack 130, 
win 123, options [nop,nop,TS val 111970417 ecr 428732549], length 0


It was in this LAN-based case a very short time between the SYN (the 
first line, with [SEW] in the flags - the E and W relate to Explicit 
Congestion Notification, which is enabled on my systems) and the second 
line, which is the SYN|ACK (and acceptance of ECN for this connection). 
I used port 63978 as a packet filter to let the relative timestamps be 
helpful - otherwise, a last ACK from a previous connection would have 
been in there.


Anyway, it was only 97+7 microseconds (0.000104 s) before the connection 
was established from the point-of-view of the client, and only another 
10 microseconds after that before the request was on its way. (Strictly 
speaking, past the tracepoint in the stack running on the client, but I 
rather doubt it was much longer before it was actually through the NIC 
and on the wire)



It would be great if you could share your thoughts on this and how could
the performance of this special case be improved.


You should repeat the above (at least the tcpdump of your actual PUTs, 
if not also the pure netperf test) on your setup.  I am as much a fan 
of persistent connections as anyone, but I think you will find that TCP 
connection setup wall-clock time is not the biggest component of the 
response-time floor you have found.


That was suggesting tcpdump on the client.  If you also take a tcpdump 
trace at the server, you can see the length of time inside the server 
between when the request arrived off the wire, and when the response was 
sent (queued to the driver at least).


Caveat - you should not try to do math between absolute timestamps on 
the client and on the server unless you know that the two systems have 
really, Really, REALLY well synchronized clocks...


rick jones
http://www.netperf.org/

If you are still reading, for grins here is the TCP_RR for the same 
sizes.  Just no connection establishment overhead:
raj@tardy:~/netperf2_trunk$ src/netperf -H 
raj-8510w.americas.hpqcorp.net -t TCP_RR -l 30 -v 2 -- -r `expr 190 + 
128`,128
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET 
to internal-host.americas.hpqcorp.net () port 0 AF_INET : histogram : 
first burst 0

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  318      128     30.00    4114.35
16384  87380
Alignment      Offset          RoundTrip  Trans     Throughput
Local  Remote  Local  Remote   Latency    Rate      10^6bits/s
Send   Recv    Send   Recv     usec/Tran  per sec   Outbound   Inbound
8      0       0      0        243.052    4114.347  10.467     4.213

Histogram of request/response times
UNIT_USEC :0:0:0:0:0:0:0:0:0:0
TEN_USEC  :0:0:0:0:0:0:0:0:0:0
HUNDRED_USEC  :0:  774: 117564: 4923:  102:   25:9:4:4:7
UNIT_MSEC :0:   14:2:1:0:0:0:1:1:1
TEN_MSEC  :0:0:0:0:0:0:0:0:0:0
HUNDRED_MSEC  :0:0:0:0:0:0:0:0:0:0
UNIT_SEC  :0:0:0:0:0:0:0:0:0:0
TEN_SEC   :0:0:0:0:0:0:0:0:0:0
100_SECS: 0
HIST_TOTAL:  123432

Certainly, when/if there is very little service time in the server, a 
persistent connection will be faster - it will send rather fewer 
segments back and forth per transaction.  But if the service times are 
rather larger than the network round-trip-times involved the gains from 
a persistent connection will be minimized.
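That last point can be put in sketch form - the numbers below are illustrative assumptions, not measurements:

```python
def txn_time(rtt: float, service_time: float, persistent: bool) -> float:
    """Crude floor on request/response time: one round trip for the
    request/response itself plus the service time, plus one extra RTT
    for the three-way handshake when every transaction opens a fresh
    connection (the TCP_CRR case)."""
    handshake = 0.0 if persistent else rtt
    return handshake + rtt + service_time

rtt = 100e-6  # an assumed 100 microsecond LAN round trip
# ratio of fresh-connection to persistent-connection transaction time
fast_server = txn_time(rtt, 50e-6, False) / txn_time(rtt, 50e-6, True)
slow_server = txn_time(rtt, 5e-3, False) / txn_time(rtt, 5e-3, True)
```

With a 50 microsecond service time the handshake costs roughly 67% extra; with a 5 millisecond service time it is under 2%.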




Re: [Openstack] Openstack Beginners guide for Ubuntu 12.04/Essex

2012-05-10 Thread Rick Jones

On 05/10/2012 07:33 AM, Atul Jha wrote:

Suggestion/criticism would be highly appreciated.


Tried a few times to send this directly to Atul and the 
css.ossbo...@csscorp.com address in the paper, but was getting 
"rejected content" for Atul's email destination and "no such user" for 
the css.ossbooks email.  So, some feedback, mostly little things, 
wording/format/etc:



11th Page - List of Tables - This is a tutorial style beginner’s guide
for OpenStackTM on Ubuntu 12.04, Precise Pangolin. The aim is to help
the reader in setting up a minimal installation of OpenStack. doesn't
seem like a list of tables.

13th page, section 1.1 - since it is a beginners guide, a short sentence
describing IaaS, PaaS and SaaS would be a good thing to include.


13th page, section 1.2 - similar to previous, a short sentence
describing what a Compute, Storage, Imaging, Identity and UI service
are/do would be goodness.


14th page - Perhaps a dialect thing but should it be The diagram below
rather than The below diagram? Also, I would put the overall diagram
before the Nova-specific one and then call them Overall Architecture
and Nova Architecture respectively.  Show the beginner the overall 
first before hitting him with the complex :)


Also, in the overall diagram, should Glance be called STORE or should
that be IMAGE to maintain consistency with previous discussion -
someone seeing Glance:Store and Swift:Storage will wonder about the
difference.

15th page - section 1.2.1.2.2 - I think that should start with
OpenStack components communicate

section 1.2.1.2.3 - Compute workers deal with the instance management
life cycle... and I might add based on the scheduling algorithm used
by nova-scheduler.

Section 1.2.1.2.4 - security groups are mentioned without prior definition.

16th page - section 1.2.1.2.6 - previously, it was said that OpenStack
Nova provides EC2 apis and the native was mentioned just as an aside.
Now though we read The scheduler maps the nova-API calls to the ... -
what has become of EC2?

section 1.2.2 - might it be worthwhile to include the Swift project
name along with Open Stack Object Store in the second bullet item?

22nd page - section 2.2.2 - should there be some sort of caveat about
using IP addresses appropriate for the admin's specific situation?

Section 2.2.3 - the NTP gods are quite adamant about configuring at
least four sources of time. That allows the bad clock detection
heuristics to operate even if one of the time sources is unavailable.

IP addresses of the servers are resolvable sounds like asking for PTR
records to go from IP to name, but I think you mean to verify that the
names can be resolved to IPs no? Perhaps Ensure that the hostnames can
be resolved to their respective IP addresses. If they are not
resolvable via DNS, you can add entries to the /etc/hosts file.

Some discussion of how long it will take Server1 to get its time
synchronized and so be willing to serve time to others is probably in order.

27th page - it might be an artifact of document viewer, but it isn't
possible to cut-and-paste the keystone commands from the document. And
even if it were, where I'd expect to find a backslash '\' there is an
arrow with a curled shaft - is that something bash et al will recognize
and deal with properly as a continued-on-the-next-line indication?


40th page - why is Server2 a child of Server1 section 2.2 instead of
its own section 2.3? Also, the interfaces file seems to be the first
indication that Server2 needs to have two NICs.

42nd page - same sort of question about Client1

56th page - 5.2.1 Instances - the text is on this page, but the image is
on the 57th page. And that continues with the other sections.
Something should be done to force the text and image to be on the same page.

58th page - section 5.2.3 - Flavors as a term just sort of magically
appears for the first time here.

80th page - section 8.1 - not an issue with the document per-se but with
the terms nova chose. To someone with much knowledge of TCP From Port
sounds like the source port number and To Port sounds like the
destination port number. That is very different from what they are in
this context, which are the Beginning and Ending port numbers of an
instance-local range of ports being opened. Some verbiage about that
might be goodness.

Also the example description for adding port 22 is incomplete - it isn't
allowing TCP traffic generally. It is allowing ssh/scp traffic
specifically.

hope that helps,

rick jones



Re: [Openstack] Caching strategies in Nova ...

2012-03-23 Thread Rick Jones

On 03/23/2012 01:26 PM, Mark Washenberger wrote:



Johannes Erdfelt johan...@erdfelt.com said:



MySQL isn't exactly slow and Nova doesn't have particularly large
tables. It looks like the slowness is coming from the network and how
many queries are being made.

Avoiding joins would mean even more queries, which looks like it would
slow it down even further.



This is exactly what I saw in my profiling. More complex queries did
still seem to take longer than less complex ones, but it was a second
order effect compared to the overall volume of queries.

I'm not sure that network was the culprit though, since my ping
roundtrip time was small relative to the wall time I measured for each
nova.db.api call.


How much data would the queries return, and how long between queries? 
One networking thing that might come into play would be slow start 
after idle - if the query returns are larger than INITCWND (either 3 or 
10 segments depending on which kernel) and they are separated by at 
least one RTO (or is it RTT?) then they will hit slow start each time. 
Now, the extent to which that matters is a function of how large the 
return is, and it is only adding RTTs, so it wouldn't be minutes, but 
it could add up a bit I suppose.
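To put a number on "it could add up a bit": an idealized slow-start model - no losses, cwnd doubling each round trip from INITCWND, and an assumed MSS - gives the extra RTT count directly:

```python
import math

def rtts_to_deliver(nbytes: int, mss: int = 1448, initcwnd: int = 10) -> int:
    """Round trips needed to push nbytes through a connection that has
    just (re-)entered slow start, with cwnd doubling each RTT."""
    segments = math.ceil(nbytes / mss)
    rtts, cwnd, sent = 0, initcwnd, 0
    while sent < segments:
        sent += cwnd   # this RTT's window worth of segments
        cwnd *= 2      # slow start doubles the window each round trip
        rtts += 1
    return rtts

small_reply = rtts_to_deliver(4 * 1024)    # fits in the initial window
large_reply = rtts_to_deliver(256 * 1024)  # needs several doubling rounds
```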


rick jones



Re: [Openstack] glance performance gains via sendfile()

2012-02-06 Thread Rick Jones
If one wants to experiment with the performance effects of sendfile(), 
the netperf benchmark http://www.netperf.org/ has a TCP_SENDFILE 
test which complements the TCP_STREAM test.  It can also report CPU 
utilization and service demand to allow a comparison of efficiency.


netperf -H destination -t TCP_SENDFILE -F file -c -C -l 30

will run a 30-second TCP_SENDFILE test using file as the data source 
(one is created if no -F option is specified), sending to destination 
(this assumes netserver has been launched on destination).  The 
corresponding TCP_STREAM test would be the obvious substitution.


One area of investigation would be the effect of send size on things. 
That can be accomplished with a test-specific (following a -- on the 
command line) -m option:


netperf  ...as above...  -- -m 64K

would cause netperf to send 65536 bytes in each send call.  The manual 
for the current top-of-trunk version of netperf is at:


http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html

and the top-of-trunk bits can be pulled via subversion pointing at 
http://www.netperf.org/svn/netperf2/trunk


happy benchmarking,

rick jones

For example, between a pair of Ubuntu 11.04 systems with Mellanox 10GbE, 
and a pair of X5650 processors each (so 24 CPUs):


~$ ./netperf -p 12866 -H destination -c -C -l 30 -- -P 12867 -m 64K
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to 
destination () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  65536    30.00      9271.36   2.52     2.72     0.535   0.576


~$ ./netperf -t TCP_SENDFILE -p 12866 -H destination -c -C -l 30 -- -P 12867 -m 64K
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to 
destination () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  65536    30.00      9332.46   0.82     2.71     0.173   0.572


It would be good to repeat each a couple times, but in this case at 
least, we see a considerable drop in sending side CPU utilization and 
service demand, the latter being a direct measure of efficiency.
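For anyone wanting to reproduce the efficiency comparison from other runs, the service demand column can be reconstructed from the utilization and throughput columns. The formula here is inferred from the numbers above, so treat it as an assumption rather than netperf's exact internal arithmetic:

```python
def service_demand_us_per_kb(cpu_pct: float, n_cpus: int,
                             throughput_mbits: float) -> float:
    """CPU-microseconds consumed per KB transferred: utilization summed
    across all CPUs converted to us/sec, divided by KB/sec of throughput."""
    cpu_us_per_sec = cpu_pct / 100.0 * n_cpus * 1e6
    kb_per_sec = throughput_mbits * 1e6 / 8.0 / 1024.0
    return cpu_us_per_sec / kb_per_sec

# send side of the two 64K runs above (24 CPUs in these systems)
sd_stream = service_demand_us_per_kb(2.52, 24, 9271.36)    # ~0.535 us/KB
sd_sendfile = service_demand_us_per_kb(0.82, 24, 9332.46)  # ~0.173 us/KB
```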


(the socket sizes are simply what they were at the onset of the 
connection, not by the end.  for that, use omni output selectors - 
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Omni-Output-Selection 
- the test-specific -P option is to explicitly select port numbers for 
the data connection to deal with firewalls in my test environment - 
similarly for the global -p option selecting the port number on which 
netserver at destination is waiting)


With a smaller send size  the results may be a bit different:

~$ ./netperf -p 12866 -H destination -c -C -l 30 -- -P 12867
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to 
destination () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    30.00      9332.43   2.64     2.74     0.556   0.578
~$ ./netperf -t TCP_SENDFILE -p 12866 -H destination -c -C -l 30 -- -P 12867
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to 
destination () port 12867 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    30.00      9351.32   1.26     2.73     0.264   0.574


Mileage will vary depending on link-type, CPU's present, etc etc etc...
