Re: Issue with hast replication

2012-03-17 Thread Mikolaj Golub
On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:
PR> (side note: hastd doesn't pick up configuration changes even with SIGHUP,
PR> which makes it hard to provision new resources on the fly)
I just tried to reproduce this and failed. For me a new resource was added without

Re: Issue with hast replication

2012-03-17 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
I just tried to reproduce this and failed. For me a new resource was added without problems on reload.
Mar 17 20:04:24 kopusha hastd[52678]: Reloading configuration...
Mar 17 20:04:24 kopusha hastd[52678]: Keep listening on address 0.0.0.0:7771.
Mar

Re: Issue with hast replication

2012-03-13 Thread Mikolaj Golub
On Tue, 13 Mar 2012 00:22:23 +0100 Phil Regnauld wrote:
PR> Mikolaj Golub (to.my.trociny) writes:
It looks like in the case of hastd it was send(2) that returned ENOMEM, but it would be good to check. Could you please start synchronization again, ktrace the primary worker process when

Re: Issue with hast replication

2012-03-13 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
Ok. So it is send(2). I suppose the network driver could generate the error. Did you say which network adaptor you have?
Not yet.
bce0: HP NC382i DP Multifunction Gigabit Server Adapter (C0) mem 0xf400-0xf5ff irq 16 at device 0.0

Re: Issue with hast replication

2012-03-13 Thread Mikolaj Golub
On Tue, 13 Mar 2012 22:19:28 +0100 Phil Regnauld wrote:
PR> dev.bce.0.l2fhdr_error_count: 0
PR> dev.bce.0.stat_emac_tx_stat_dot3statsinternalmactransmiterrors: 0
PR> dev.bce.0.stat_Dot3StatsCarrierSenseErrors: 0
PR> dev.bce.0.stat_Dot3StatsFCSErrors: 0
PR>

Re: Issue with hast replication

2012-03-13 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
What about failed counters like mbuf_alloc_failed_count, dma_map_addr_rx_failed_count, dma_map_addr_tx_failed_count?
dev.bce.0.l2fhdr_error_count: 0
dev.bce.0.mbuf_alloc_failed_count: 0
dev.bce.0.mbuf_frag_count: 0

Re: Issue with hast replication

2012-03-12 Thread Phil Regnauld
Phil Regnauld (regnauld) writes:
7) ktrace on the destination dd:
fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
lseek(0,0x0,SEEK_CUR) ERR#29 'Illegal seek'
[...]
Illegal seek, eh? Any clues? The boxes are

Re: Issue with hast replication

2012-03-12 Thread Mikolaj Golub
On Mon, 12 Mar 2012 15:31:27 +0100 Phil Regnauld wrote:
PR> Phil Regnauld (regnauld) writes:
7) ktrace on the destination dd:
fstat(0,{ mode=p- ,inode=5,size=16384,blksize=4096 }) = 0 (0x0)
lseek(0,0x0,SEEK_CUR) ERR#29 'Illegal seek'
PR>
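A note on the 'Illegal seek' line quoted above: errno 29 is ESPIPE, which lseek(2) reports when the descriptor is a pipe, socket or FIFO, and the fstat mode starting with "p" suggests that dd's stdin is a pipe here. The following is only a minimal sketch to reproduce that error in isolation; it is not taken from the thread, and the helper name seek_on_pipe is hypothetical.

```python
import errno
import os


def seek_on_pipe():
    """Call lseek() on the read end of a pipe and return the errno it fails with.

    Hypothetical helper for illustration only; returns None if the seek
    unexpectedly succeeds.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Same call as in the quoted trace: lseek(fd, 0, SEEK_CUR).
        os.lseek(read_fd, 0, os.SEEK_CUR)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None


if __name__ == "__main__":
    err = seek_on_pipe()
    # Compare against the symbolic constant rather than a hard-coded number.
    print("ESPIPE" if err == errno.ESPIPE else err)
```

If the sketch reports ESPIPE, the lseek failure in the trace would be consistent with dd simply reading from a pipe, independent of the ENOMEM problem discussed elsewhere in the thread.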

Re: Issue with hast replication

2012-03-12 Thread Phil Regnauld
Mikolaj Golub (to.my.trociny) writes:
It looks like in the case of hastd it was send(2) that returned ENOMEM, but it would be good to check. Could you please start synchronization again, ktrace the primary worker process when ENOMEM errors are observed and show the output here?
Ok, took a

Issue with hast replication

2012-03-11 Thread Phil Regnauld
Hi, I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable if told to, but want to check here first), ZFS and HAST. HAST is configured to run on top of zvols configured on each host, as illustrated: FS FS +--+

Re: Issue with hast replication

2012-03-11 Thread Mikolaj Golub
On Sun, 11 Mar 2012 19:54:57 +0100 Phil Regnauld wrote:
PR> Hi,
PR> I've got a fairly simple setup: two hosts running 9.0-R (will upgrade to stable
PR> if told to, but want to check here first), ZFS and HAST. HAST is configured to
PR> run on top of zvols configured on each host, as

Re: Issue with hast replication

2012-03-11 Thread Phil Regnauld
Mikolaj Golub (trociny) writes:
PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
PR> Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
PR> Mar 11 02:02:41 h1 hastd[2282]: [hvol]