Re: HAST instability

2011-06-14 Thread Daniel Kalchev
On 10.06.11 20:07, Mikolaj Golub wrote: On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: MG Could you please try this patch? MG http://people.freebsd.org/~trociny/hastd.no_shutdown.patch Sure you still have to have your kernel patched with uipc_socket.c.patch

Re: HAST instability

2011-06-14 Thread Mikolaj Golub
On Tue, 14 Jun 2011 16:39:11 +0300 Daniel Kalchev wrote: DK On 10.06.11 20:07, Mikolaj Golub wrote: On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: MG Could you please try this patch? MG http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

Re: HAST instability

2011-06-14 Thread Daniel Kalchev
On 14.06.11 17:56, Mikolaj Golub wrote: It has turned out that automatic receive buffer sizing works only for connections in ESTABLISHED state. And with a small receive buffer the connection might get stuck sending data only via TCP window probes -- one byte every few seconds (see Scenario to make
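
For readers unfamiliar with the receive buffer issue mentioned above, here is a minimal sketch of one general way to avoid depending on automatic sizing: request an explicit receive buffer on the listening socket before any connection exists. This is an illustration only and is not claimed to be what either patch in this thread does; the function name make_listener and the 64 KB size are assumptions made up for the example.

```python
import socket

# Hypothetical buffer size for illustration; not taken from the thread or hastd.
RCVBUF_BYTES = 64 * 1024


def make_listener(host="127.0.0.1", port=0, rcvbuf=RCVBUF_BYTES):
    """Create a listening TCP socket with an explicitly requested receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask for the buffer size before listen(), so it is in place before
    # any connection is established on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    sock.listen(1)
    return sock


if __name__ == "__main__":
    listener = make_listener()
    # The kernel may round or cap the requested value, so report what it granted.
    print(listener.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    listener.close()
```

The value reported by getsockopt can differ from the requested one because the operating system is free to adjust or limit it.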

Re: HAST instability

2011-06-10 Thread Mikolaj Golub
On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote: DK Well, apparently my HAST joy was short. On a second run, I got stuck with DK Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive DK reply header: Operation timed out. DK on the primary. No messages on the

Re: HAST instability

2011-06-10 Thread Mikolaj Golub
On Fri, 10 Jun 2011 20:05:43 +0300 Mikolaj Golub wrote to Daniel Kalchev: MG Could you please try this patch? MG http://people.freebsd.org/~trociny/hastd.no_shutdown.patch Sure you still have to have your kernel patched with uipc_socket.c.patch :-) -- Mikolaj Golub

Re: HAST instability

2011-06-03 Thread Daniel Kalchev
Decided to apply the patch proposed in -current by Mikolaj Golub: http://people.freebsd.org/~trociny/uipc_socket.c.patch This apparently fixed my issue as well. Running without checksums for a full bonnie++ run (~100GB write/rewrite) produced no disconnects, no stalls and generated up to

Re: HAST instability

2011-06-03 Thread Daniel Kalchev
Well, apparently my HAST joy was short. On a second run, I got stuck with Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive reply header: Operation timed out. on the primary. No messages on the secondary. On primary: # netstat -an | grep 8457 tcp4 0 0

Re: HAST instability

2011-06-01 Thread Daniel Kalchev
Here goes the second run, without checksums. systat -if /0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10 Load Average Interface Traffic Peak Total lo0 in 0.000 KB/s 71.666 KB/s 361.825

Re: HAST instability

2011-05-31 Thread Daniel Kalchev
On 30.05.11 21:42, Mikolaj Golub wrote: DK One strange thing is that there is never established TCP connection DK between both nodes: DK tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 DK tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457

Re: HAST instability

2011-05-31 Thread Mikolaj Golub
On Tue, 31 May 2011 15:51:07 +0300 Daniel Kalchev wrote: DK On 30.05.11 21:42, Mikolaj Golub wrote: DK One strange thing is that there is never established TCP connection DK between both nodes: DK tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2

Re: HAST instability

2011-05-31 Thread Daniel Kalchev
On 31.05.11 17:08, Mikolaj Golub wrote: As I wrote privately, it would be nice to see both netstat and hast logs (from both nodes) for the same rather long period, when several cases occurred. It would be good to place them somewhere on the web so other guys could access them too, as I will be

Re: HAST instability

2011-05-30 Thread Daniel Kalchev
Some further investigation: The HAST nodes do not disconnect when checksum is enabled (either crc32 or sha256). One strange thing is that there is never established TCP connection between both nodes: tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 tcp4 0

Re: HAST instability

2011-05-30 Thread Mikolaj Golub
On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote: DK Some further investigation: DK The HAST nodes do not disconnect when checksum is enabled (either DK crc32 or sha256). DK One strange thing is that there is never established TCP connection DK between both nodes: DK tcp4 0

Re: HAST instability

2011-05-30 Thread Mikolaj Golub
On Mon, 30 May 2011 17:43:04 +0300 Daniel Kalchev wrote: DK tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2 DK tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457 CLOSE_WAIT DK tcp4 0 0 10.2.101.11.46346 10.2.101.12.8457
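
When comparing netstat output from both nodes over a long period, as requested in this thread, it can help to tally connection states for the HAST port automatically. The following is a rough sketch, assuming the six-column netstat -an line layout quoted above (protocol, Recv-Q, Send-Q, local address, foreign address, state); the helper name tally_states is made up for this example.

```python
from collections import Counter


def tally_states(netstat_text, port):
    """Count TCP states for connections whose local or foreign port matches."""
    counts = Counter()
    suffix = "." + str(port)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        # Lines without a state column (e.g. truncated ones) are skipped.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            counts[state] += 1
    return counts


# Two complete lines quoted in the thread.
sample = """\
tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2
tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457 CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, count in sorted(tally_states(sample, 8457).items()):
        print(state, count)
```

A count of zero for ESTABLISHED across repeated samples would match the observation that no established connection is ever seen between the nodes.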