Re: [OmniOS-discuss] NVMe JBOF

2018-12-14 Thread Schweiss, Chip
On Fri, Dec 14, 2018 at 10:20 AM Richard Elling <
richard.ell...@richardelling.com> wrote:

>
> I can't speak to the Supermicro, but I can talk in detail about
> https://www.vikingenterprisesolutions.com/products-2/nds-2244/
>
> >
> > While I do not run HA because of too many issues, I still build
> > everything with two server nodes.  This makes updates and reboots
> > possible by moving a pool to the sister host and greatly minimizing
> > downtime.  This is essential when the NFS target is hosting 300+
> > vSphere VMs.
>
> The NDS-2244 is a 24-slot U.2 NVMe chassis with programmable PCIe switches.
> To the host, the devices look like locally attached NVMe, and no software
> changes are required. Multiple hosts can connect, up to the PCIe port
> limits. If you use dual-port NVMe drives, then you can share the drives
> between any two hosts concurrently. Programming the switches is done
> out-of-band through an HTTP-based interface that also monitors the
> enclosure.
>
> In other words, if you want an NVMe equivalent to a dual-hosted SAS JBOD,
> the NDS-2244 is very capable, and more configurable.
>  -- richard
>
>
This is excellent.  I like the idea of only one host seeing the SSDs at a
time, with a programmatic way to flip them to the other host.  This solves
the fencing problem in ZFS nicely.
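A failover script could drive that flip alongside the pool move. Below is a minimal Python sketch that builds (but does not send) a request to a hypothetical switch-management endpoint; the URL path, payload shape, and port naming are invented for illustration and are not the NDS-2244's actual API:

```python
import json
import urllib.request

def build_flip_request(mgmt_host, drive_slots, target_host_port):
    # Build, but do not send, a request asking the JBOF's PCIe switch to
    # bind the given drive slots to another host port.  The endpoint path
    # and payload are hypothetical; consult the enclosure's real
    # management interface.
    payload = json.dumps({"slots": drive_slots,
                          "host_port": target_host_port}).encode()
    return urllib.request.Request(
        "http://%s/api/switch/bind" % mgmt_host,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_flip_request("jbof-mgmt.example", list(range(24)), "B")
print(req.get_method(), req.full_url)
```

In practice the sequence would be: `zpool export` on the departing host, send the request with `urllib.request.urlopen(req)`, then `zpool import` on the sister host.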

Thanks for the product reference.  The Viking JBOF looks like what I need.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] NVMe JBOF

2018-12-14 Thread Schweiss, Chip
Has the NVMe support in Illumos come far enough along to properly support
two servers connected to NVMe JBOF storage such as the Supermicro
SSG-136R-N32JBF?

While I do not run HA because of too many issues, I still build everything
with two server nodes.  This makes updates and reboots possible by moving a
pool to the sister host, greatly minimizing downtime.  This is essential
when the NFS target is hosting 300+ vSphere VMs.

Thanks!
-Chip


Re: [OmniOS-discuss] Panic on OmniOS CE r151022ay

2018-08-30 Thread Schweiss, Chip
Here's the dump from the panic:

ftp://ftp.nrg.wustl.edu/pub/zfs/mirpool03-xattr-20180830-vmdump.1



On Thu, Aug 30, 2018 at 9:29 AM, Schweiss, Chip  wrote:

> I've seen this panic twice now in the past couple weeks.   Does anyone
> know if there is a patch already that fixes this?  Looks like another xattr
> problem.
>
> # fmdump -Vp -u b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> TIME   UUID
>  SUNW-MSG-ID
> Aug 30 2018 08:29:32.089419000 b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> SUNOS-8000-KL
>
>   TIME CLASS ENA
>   Aug 30 08:27:50.8299 ireport.os.sunos.panic.dump_pending_on_device
> 0x
>
> nvlist version: 0
> version = 0x0
> class = list.suspect
> uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> code = SUNOS-8000-KL
> diag-time = 1535635766 223254
> de = fmd:///module/software-diagnosis
> fault-list-sz = 0x1
> fault-list = (array of embedded nvlists)
> (start fault-list[0])
> nvlist version: 0
> version = 0x0
> class = defect.sunos.kernel.panic
> certainty = 0x64
> asru = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> resource = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> savecore-succcess = 0
> os-instance-uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> panicstr = BAD TRAP: type=d (#gp General protection)
> rp=d001e9855360 addr=d063784ee8d0
> panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
> unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
> genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
> genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
> nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
> nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 ()
> | rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3
> () |
> crashtime = 1535633923
> panic-time = Thu Aug 30 07:58:43 2018 CDT
> (end fault-list[0])
>
> fault-status = 0x1
> severity = Major
> __ttl = 0x1
> __tod = 0x5b87f13c 0x5546cf8
>
> Let me know what other information I can provide here.
>
> -Chip
>
>


[OmniOS-discuss] Panic on OmniOS CE r151022ay

2018-08-30 Thread Schweiss, Chip
I've seen this panic twice now in the past couple of weeks.  Does anyone
know if there is already a patch that fixes this?  Looks like another xattr
problem.

# fmdump -Vp -u b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
TIME   UUID
 SUNW-MSG-ID
Aug 30 2018 08:29:32.089419000 b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
SUNOS-8000-KL

  TIME CLASS ENA
  Aug 30 08:27:50.8299 ireport.os.sunos.panic.dump_pending_on_device
0x

nvlist version: 0
version = 0x0
class = list.suspect
uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
code = SUNOS-8000-KL
diag-time = 1535635766 223254
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
resource = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
savecore-succcess = 0
os-instance-uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
panicstr = BAD TRAP: type=d (#gp General protection)
rp=d001e9855360 addr=d063784ee8d0
panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 ()
| rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3
() |
crashtime = 1535633923
panic-time = Thu Aug 30 07:58:43 2018 CDT
(end fault-list[0])

fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x5b87f13c 0x5546cf8
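As an aside, the panicstack above is a pipe-separated list of module:function+offset frames; a small illustrative parser (sample frames copied from the output above) makes such strings easier to read:

```python
import re

def parse_panicstack(panicstack):
    # Split an illumos panicstack string into (module, function, offset)
    # tuples.  Frames look like 'zfs:zfs_getattr+1a0 ()'.
    frames = []
    for frame in panicstack.split("|"):
        m = re.match(r"\s*(\w+):([\w`]+)\+([0-9a-f]+)\s*\(\)", frame)
        if m:
            frames.append((m.group(1), m.group(2), int(m.group(3), 16)))
    return frames

stack = ("zfs:zfs_getattr+1a0 () | genunix:fop_getattr+a8 () | "
         "genunix:xattr_dir_getattr+16c ()")
for module, func, offset in parse_panicstack(stack):
    print("%s`%s+0x%x" % (module, func, offset))
```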

Let me know what other information I can provide here.

-Chip


[OmniOS-discuss] rpcgen

2018-05-09 Thread Schweiss, Chip
I need rpcgen on OmniOS.  Any suggestions on where I can get it, or do I
need to build it myself?

Thanks,
-Chip


Re: [OmniOS-discuss] rpcgen

2018-05-09 Thread Schweiss, Chip
I just found it in 'developer/object-file'.

Sorry for the bother.

-Chip

On Wed, May 9, 2018 at 7:54 AM, Schweiss, Chip <c...@innovates.com> wrote:

> I need the rpcgen on OmniOS.  Any suggestions on where I can get this, or
> do I need to build it myself?
>
> Thanks,
> -Chip
>


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
The fault manager is starting now.  However, the disks still show as
UNAVAIL when running zpool import.

# zpool import
   pool: hcp-arc01
     id: 11579406004081253836
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing devices and try again.
    see: http://illumos.org/msg/ZFS-8000-3C
 config:

        hcp-arc01                    UNAVAIL  insufficient replicas
          raidz3-0                   UNAVAIL  insufficient replicas
            c0t5000C50093E3BE87d0p0  UNAVAIL  cannot open
            c0t5000C50086B52EABd0p0  UNAVAIL  cannot open
            c0t5000C50093F046A7d0p0  UNAVAIL  cannot open
            c0t5000C50093E3086Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E85C07d0p0  UNAVAIL  cannot open
            c0t5000C50093E3BED3d0p0  UNAVAIL  cannot open
            c0t5000C50093E39267d0p0  UNAVAIL  cannot open
            c0t5000C50093E309DBd0p0  UNAVAIL  cannot open
            c0t5000C50093E31407d0p0  UNAVAIL  cannot open
            c0t5000C50093E3885Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E344D7d0p0  UNAVAIL  cannot open
            c0t5000C50093E332AFd0p0  UNAVAIL  cannot open
            c0t5000C50093F04A2Fd0p0  UNAVAIL  cannot open
            c0t5000C50093F04763d0p0  UNAVAIL  cannot open
            c0t5000C50086B5DCE3d0p0  UNAVAIL  cannot open
            c0t5000C50086B5CD37d0p0  UNAVAIL  cannot open
            c0t5000C50086B5E263d0p0  UNAVAIL  cannot open
            c0t5000C50086B5CD07d0p0  UNAVAIL  cannot open
            c0t5000C50086B5DB3Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5D95Fd0p0  UNAVAIL  cannot open
            c0t5000C50086B566BBd0p0  UNAVAIL  cannot open
            c0t5000C50086B5F38Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E37C97d0p0  UNAVAIL  cannot open
            c0t5000C50093E3909Bd0p0  UNAVAIL  cannot open
          raidz3-1                   UNAVAIL  insufficient replicas
            c0t5000C50093E85C1Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E3A29Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E342BFd0p0  UNAVAIL  cannot open
            c0t5000C50093E359DFd0p0  UNAVAIL  cannot open
            c0t5000C50086B5281Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E331F7d0p0  UNAVAIL  cannot open
            c0t5000C50093E35A93d0p0  UNAVAIL  cannot open
            c0t5000C50093E38347d0p0  UNAVAIL  cannot open
            c0t5000C50093E8532Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E3422Fd0p0  UNAVAIL  cannot open
            c0t5000C50093CFA493d0p0  UNAVAIL  cannot open
            c0t5000C50093E29DB3d0p0  UNAVAIL  cannot open
            c0t5000C50093E3B70Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E3946Fd0p0  UNAVAIL  cannot open
            c0t5000C50086B5319Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5608Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5D9B7d0p0  UNAVAIL  cannot open
            c0t5000C50086B5E1ABd0p0  UNAVAIL  cannot open
            c0t5000C50093E85D93d0p0  UNAVAIL  cannot open
            c0t5000C50093E85C73d0p0  UNAVAIL  cannot open
            c0t5000C50086B5D7CBd0p0  UNAVAIL  cannot open
            c0t5000C50093E33F23d0p0  UNAVAIL  cannot open
            c0t5000C50093E36A8Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E30193d0p0  UNAVAIL  cannot open
          raidz3-2                   UNAVAIL  insufficient replicas
            c0t5000C50093E34E3Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E36DB7d0p0  UNAVAIL  cannot open
            c0t5000C50093E2C467d0p0  UNAVAIL  cannot open
            c0t5000C50093E3A213d0p0  UNAVAIL  cannot open

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
On Mon, Mar 19, 2018 at 9:33 AM, Andy Fiddaman <omn...@citrus-it.net> wrote:

>
> On Mon, 19 Mar 2018, Schweiss, Chip wrote:
>
> ; On Mon, Mar 19, 2018 at 9:19 AM, Andy Fiddaman <omn...@citrus-it.net>
> wrote:
> ;
> ; >
> ; > I'll have a look at this for you and get a hot-fix built. I have the
> core
> ; > file that you made available so just need to go through and work out
> why
> ; > it thinks there are 0 phys somewhere.
> ; >
> ; >
> ; Many thanks!
> ;
> ; In discussion with JBOD vendor support, this JBOD has two SAS
> ; expanders, which are linked together.  One is likely incorrectly
> ; reporting 0 and should be ignored.
>
> The device is identifying as ESC_ELECTRONICS rather than a SAS_EXPANDER but
> I'll do some more digging.
>
>
You might be on to something.  I was suspicious of Element 96 when
examining via sg_ses.  This JBOD has 96 slots and no display panel.  The
vendor suspected other issues.

sg_ses -p ed /dev/es/ses1
  RAIDINC   96BAY 1715
  Primary enclosure logical identifier (hex): 500093d230938000
Element Descriptor In diagnostic page:
  generation code: 0x1
  element descriptor list (grouped by type):
Element type: Array device slot, subenclosure id: 0 [ti=0]
  Overall descriptor: Array Dev Slot
  Element 0 descriptor: SLOT 01 11
  Element 1 descriptor: SLOT 02 12
  Element 2 descriptor: SLOT 03 13
  Element 3 descriptor: SLOT 04 14
  Element 4 descriptor: SLOT 05 15
  Element 5 descriptor: SLOT 06 16
  Element 6 descriptor: SLOT 07 17
  Element 7 descriptor: SLOT 08 18
  Element 8 descriptor: SLOT 09 19
  Element 9 descriptor: SLOT 10 1A
  Element 10 descriptor: SLOT 11 1B
  Element 11 descriptor: SLOT 12 1C
  Element 12 descriptor: SLOT 13 1D
  Element 13 descriptor: SLOT 14 1E
  Element 14 descriptor: SLOT 15 21
  Element 15 descriptor: SLOT 16 22
  Element 16 descriptor: SLOT 17 23
  Element 17 descriptor: SLOT 18 24
  Element 18 descriptor: SLOT 19 25
  Element 19 descriptor: SLOT 20 26
  Element 20 descriptor: SLOT 21 27
  Element 21 descriptor: SLOT 22 28
  Element 22 descriptor: SLOT 23 29
  Element 23 descriptor: SLOT 24 2A
  Element 24 descriptor: SLOT 25 2B
  Element 25 descriptor: SLOT 26 2C
  Element 26 descriptor: SLOT 27 2D
  Element 27 descriptor: SLOT 28 2E
  Element 28 descriptor: SLOT 29 31
  Element 29 descriptor: SLOT 30 32
  Element 30 descriptor: SLOT 31 33
  Element 31 descriptor: SLOT 32 34
  Element 32 descriptor: SLOT 33 35
  Element 33 descriptor: SLOT 34 36
  Element 34 descriptor: SLOT 35 37
  Element 35 descriptor: SLOT 36 38
  Element 36 descriptor: SLOT 37 39
  Element 37 descriptor: SLOT 38 3A
  Element 38 descriptor: SLOT 39 3B
  Element 39 descriptor: SLOT 40 3C
  Element 40 descriptor: SLOT 41 3D
  Element 41 descriptor: SLOT 42 3E
  Element 42 descriptor: SLOT 43 41
  Element 43 descriptor: SLOT 44 42
  Element 44 descriptor: SLOT 45 43
  Element 45 descriptor: SLOT 46 44
  Element 46 descriptor: SLOT 47 45
  Element 47 descriptor: SLOT 48 46
  Element 48 descriptor: SLOT 49 47
  Element 49 descriptor: SLOT 50 49
  Element 50 descriptor: SLOT 51 4A
  Element 51 descriptor: SLOT 52 4B
  Element 52 descriptor: SLOT 53 4C
  Element 53 descriptor: SLOT 54 4D
  Element 54 descriptor: SLOT 55 4E
  Element 55 descriptor: SLOT 56 51
  Element 56 descriptor: SLOT 57 52
  Element 57 descriptor: SLOT 58 53
  Element 58 descriptor: SLOT 59 54
  Element 59 descriptor: SLOT 60 55
  Element 60 descriptor: SLOT 61 56
  Element 61 descriptor: SLOT 62 57
  Element 62 descriptor: SLOT 63 59
  Element 63 descriptor: SLOT 64 5A
  Element 64 descriptor: SLOT 65 5B
  Element 65 descriptor: SLOT 66 5C
  Element 66 descriptor: SLOT 67 5D
  Element 67 descriptor: SLOT 68 5E
  Element 68 descriptor: SLOT 69 61
  Element 69 descriptor: SLOT 70 62
  Element 70 descriptor: SLOT 71 63
  Element 71 descriptor: SLOT 72 64
  Element 72 descriptor: SLOT 73 65
  Element 73 descriptor: SLOT 74 66
  Element 74 descriptor: SLOT 75 67
  Element 75 descriptor: SLOT 76 68
  Element 76 descriptor: SLOT 77 69
  Element 77 descriptor: SLOT 78 6A
  Element 78 descriptor: SLOT 79 6B
  Element 79 descriptor: SLOT 80 6C
  Element 80 descriptor: SLOT 81 6D
  Element 81 descriptor: SLOT 82 6E
  Element 82 descriptor: SLOT 83 71
  Element 83 descriptor: SLOT 84 72
  Element 84 descriptor: SLOT 85 73
  Element 85 descriptor: SLOT 86 74
  Element 86 descriptor: SLOT 87 75
  Element 87 descriptor: SLOT 88 76
  Element 88 descriptor: SLOT 89 77
  Element 89 descriptor: SLOT 90 78
  Element 90 descriptor: SLOT 91 79
  
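A quick way to compare what each expander reports is to count the array-device-slot elements in the `sg_ses -p ed` output; a small illustrative parser (matching the line format above) is:

```python
import re

def count_slot_elements(sg_ses_output):
    # Count 'Element N descriptor: SLOT ...' lines from `sg_ses -p ed`
    # output; a healthy view of this 96-bay enclosure should yield 96.
    return len(re.findall(r"Element \d+ descriptor: SLOT", sg_ses_output))

sample = """\
  Element 0 descriptor: SLOT 01 11
  Element 1 descriptor: SLOT 02 12
"""
print(count_slot_elements(sample))
```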

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
On Mon, Mar 19, 2018 at 9:19 AM, Andy Fiddaman  wrote:

>
> I'll have a look at this for you and get a hot-fix built. I have the core
> file that you made available so just need to go through and work out why
> it thinks there are 0 phys somewhere.
>
>
Many thanks!

In discussion with JBOD vendor support, this JBOD has two SAS expanders,
which are linked together.  One is likely incorrectly reporting 0 and
should be ignored.
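The crash path is libses aborting on a zero-length allocation when a descriptor advertises zero phys; a toy Python model (not libses code, names and structure are illustrative only) of the defensive parse it would need looks like this:

```python
def parse_aes_descriptor(num_phys, phy_data):
    # Toy model of parsing a SES Additional Element Status descriptor.
    # A zero phy count is treated as "no phys" instead of triggering a
    # zero-length allocation and abort.
    if num_phys == 0:
        return []  # tolerate an expander that (incorrectly) reports 0 phys
    if len(phy_data) != num_phys:
        raise ValueError("phy count does not match descriptor data")
    return [{"phy": i, "data": d} for i, d in enumerate(phy_data)]

print(parse_aes_descriptor(0, []))
print(parse_aes_descriptor(2, ["sas-addr-a", "sas-addr-b"]))
```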

-Chip


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
Even unloading all the modules except 'fmd-self-diagnosis', which will not
unload, fmd still dies as soon as I plug in the JBOD.

# fmadm config
MODULE   VERSION STATUS  DESCRIPTION
fmd-self-diagnosis   1.0 active  Fault Manager Self-Diagnosis

# fmadm unload fmd-self-diagnosis
fmadm: failed to unload fmd-self-diagnosis: module is in use and cannot be
unloaded

Looks like I'm dead in the water trying to make this work with Illumos
until this bug is fixed.

-Chip

On Fri, Mar 16, 2018 at 3:42 PM, Richard Elling <
richard.ell...@richardelling.com> wrote:

> fmadm allows you to load/unload modules.
>  -- richard
>
> On Mar 16, 2018, at 8:24 AM, Schweiss, Chip <c...@innovates.com> wrote:
>
> I need to get this JBOD working with OmniOS.  Is there a way to get FMD to
> ignore this SES device until this issue is fixed?
>
> It is a RAID, Inc. 4U 96-Bay
> http://www.raidinc.com/products/object-storage/ability-4u-96-bay
>
> -Chip
>
> On Fri, Mar 16, 2018 at 9:18 AM, Schweiss, Chip <c...@innovates.com>
> wrote:
>
>> While this problem was originally ruled out as an artifact of running as
>> a virtual machine, I've now installed the same HBA and JBOD to a physical
>> server.   The problem is exactly the same.
>>
>> This is on OmniOS CE r151024r
>>
>> -Chip
>>
>> # /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
>> fmd: [ loading modules ... ABORT: attempted zero-length allocation:
>> Operation not supported
>> Abort (core dumped)
>>
>> > $C
>> 080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
>> 8046330)
>> 080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
>> 08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
>> 61203a54)
>> 08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
>> 83eb0a8, fdb6c398)
>> 08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130,
>> fddf7000, fdb6658f)
>> 08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
>> 080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
>> 08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
>> 08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
>> 080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930,
>> 4)
>> 080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
>> 080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
>> fdde394c)
>> 08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8,
>> fdde394c)
>> 08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968,
>> fdde394c)
>> 08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998,
>> fdde394c)
>> 08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
>> 08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
>> 080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
>> 08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
>> 81053f8, fede4042)
>> 08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
>> 08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
>> 08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
>> 08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
>> 8386608, 0, 400)
>> 08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
>> 08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
>> 83d6f78, 8047008)
>> 08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0,
>> 82f21a0)
>> 08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
>> 82f21a0, 8105a18, 83cbac0)
>> 080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
>> 82f21a0)
>> 08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
>> 82de080, 82f21a0, fed7b1f9)
>> 08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
>> 82de300, 82f21a0, 81eb0d8)
>> 08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300,
>> 82f21a0, 83ce128, 81d8638)
>> 08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
>> 83ce100, 80472a8)
>> 080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300,
>> 81eb198)
>> 08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
>> 81eb198, 8103f40, fed8c000)
>> 08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
>> feac2000)
>> 08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
>> fe20bda

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-16 Thread Schweiss, Chip
I need to get this JBOD working with OmniOS.  Is there a way to get FMD to
ignore this SES device until this issue is fixed?

It is a RAID, Inc. 4U 96-Bay
http://www.raidinc.com/products/object-storage/ability-4u-96-bay

-Chip

On Fri, Mar 16, 2018 at 9:18 AM, Schweiss, Chip <c...@innovates.com> wrote:

> While this problem was originally ruled out as an artifact of running as a
> virtual machine, I've now installed the same HBA and JBOD to a physical
> server.   The problem is exactly the same.
>
> This is on OmniOS CE r151024r
>
> -Chip
>
> # /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
> fmd: [ loading modules ... ABORT: attempted zero-length allocation:
> Operation not supported
> Abort (core dumped)
>
> > $C
> 080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
> 8046330)
> 080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
> 08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
> 61203a54)
> 08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
> 83eb0a8, fdb6c398)
> 08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130,
> fddf7000, fdb6658f)
> 08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
> 080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
> 08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
> 08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
> 080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930, 4)
> 080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
> 080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
> fdde394c)
> 08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8,
> fdde394c)
> 08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968,
> fdde394c)
> 08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998,
> fdde394c)
> 08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
> 08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
> 080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
> 08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
> 81053f8, fede4042)
> 08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
> 08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
> 08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
> 08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
> 8386608, 0, 400)
> 08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
> 08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
> 83d6f78, 8047008)
> 08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0,
> 82f21a0)
> 08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
> 82f21a0, 8105a18, 83cbac0)
> 080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
> 82f21a0)
> 08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
> 82de080, 82f21a0, fed7b1f9)
> 08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
> 82de300, 82f21a0, 81eb0d8)
> 08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300,
> 82f21a0, 83ce128, 81d8638)
> 08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
> 83ce100, 80472a8)
> 080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300,
> 81eb198)
> 08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
> 81eb198, 8103f40, fed8c000)
> 08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
> feac2000)
> 08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
> fe20bda2, 0, 82d2000)
> 080474b8 libtopo.so.1`topo_mod_enummap+0x26(8105a18, 81eb198, fe20c127,
> fe20bda2, 8105a18, fe20b11c)
> 08047508 x86pi.so`x86pi_enum_start+0xc5(8105a18, 8047530, 8047538,
> fe205580, 8105a18, 8105a18)
> 08047558 x86pi.so`x86pi_enum+0x55(8105a18, 81eb198, 81d8a90, 0, 0, 0)
> 080475a8 libtopo.so.1`topo_mod_enumerate+0xc4(8105a18, 81eb198, 80ebf38,
> 81d8a90, 0, 0)
> 080475f8 libtopo.so.1`enum_run+0xe9(8105b68, 81f1070, a, fed7b1dd)
> 08047648 libtopo.so.1`topo_xml_range_process+0x13e(8105b68, 81f94c8,
> 81f1070, 8047678)
> 08047698 libtopo.so.1`tf_rdata_new+0x135(8105b68, 81f4240, 81f94c8,
> 81eb198)
> 080476f8 libtopo.so.1`topo_xml_walk+0x246(8105b68, 81f4240, 81f9608,
> 81eb198, 8103f40, fed8c000)
> 08047728 libtopo.so.1`topo_xml_enum+0x67(8105b68, 81f4240, 81eb198,
> 81d8ad0)
> 08047858 libtopo.so.1`topo_file_load+0x139(8105b68, 81eb198, 80f3f38,
> 81d8aa0, 0, 2c)
> 08047898 libtopo.so.1`topo_tree_enum+0x89(8103f40, 81f51c8, 80478c8,
> fe70e6f8, 81f7f78, 8103f40)
> 080478b8 libtopo.s

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-16 Thread Schweiss, Chip
While this problem was originally ruled out as an artifact of running as a
virtual machine, I've now installed the same HBA and JBOD to a physical
server.   The problem is exactly the same.

This is on OmniOS CE r151024r

-Chip

# /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
fmd: [ loading modules ... ABORT: attempted zero-length allocation:
Operation not supported
Abort (core dumped)

> $C
080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
8046330)
080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
61203a54)
08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
83eb0a8, fdb6c398)
08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130, fddf7000,
fdb6658f)
08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930, 4)
080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
fdde394c)
08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8, fdde394c)
08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968, fdde394c)
08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998, fdde394c)
08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
81053f8, fede4042)
08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
8386608, 0, 400)
08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
83d6f78, 8047008)
08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0, 82f21a0)
08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
82f21a0, 8105a18, 83cbac0)
080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
82f21a0)
08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
82de080, 82f21a0, fed7b1f9)
08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
82de300, 82f21a0, 81eb0d8)
08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300, 82f21a0,
83ce128, 81d8638)
08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
83ce100, 80472a8)
080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300, 81eb198)
08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
81eb198, 8103f40, fed8c000)
08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
feac2000)
08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
fe20bda2, 0, 82d2000)
080474b8 libtopo.so.1`topo_mod_enummap+0x26(8105a18, 81eb198, fe20c127,
fe20bda2, 8105a18, fe20b11c)
08047508 x86pi.so`x86pi_enum_start+0xc5(8105a18, 8047530, 8047538,
fe205580, 8105a18, 8105a18)
08047558 x86pi.so`x86pi_enum+0x55(8105a18, 81eb198, 81d8a90, 0, 0, 0)
080475a8 libtopo.so.1`topo_mod_enumerate+0xc4(8105a18, 81eb198, 80ebf38,
81d8a90, 0, 0)
080475f8 libtopo.so.1`enum_run+0xe9(8105b68, 81f1070, a, fed7b1dd)
08047648 libtopo.so.1`topo_xml_range_process+0x13e(8105b68, 81f94c8,
81f1070, 8047678)
08047698 libtopo.so.1`tf_rdata_new+0x135(8105b68, 81f4240, 81f94c8, 81eb198)
080476f8 libtopo.so.1`topo_xml_walk+0x246(8105b68, 81f4240, 81f9608,
81eb198, 8103f40, fed8c000)
08047728 libtopo.so.1`topo_xml_enum+0x67(8105b68, 81f4240, 81eb198, 81d8ad0)
08047858 libtopo.so.1`topo_file_load+0x139(8105b68, 81eb198, 80f3f38,
81d8aa0, 0, 2c)
08047898 libtopo.so.1`topo_tree_enum+0x89(8103f40, 81f51c8, 80478c8,
fe70e6f8, 81f7f78, 8103f40)
080478b8 libtopo.so.1`topo_tree_enum_all+0x20(8103f40, 81f7f78, 80478f8,
fed71087)
080478f8 libtopo.so.1`topo_snap_create+0x13d(8103f40, 804794c, 0, fed7118d,
807c010, 4d5)
08047928 libtopo.so.1`topo_snap_hold+0x56(8103f40, 0, 804794c, 80e7f08, 0,
8047ac8)
08047968 fmd_topo_update+0x9f(80e7f08, 8085dfa, 8047a68, 80601f7, 0, 0)
08047978 fmd_topo_init+0xb(0, 0, 0, 0, 2, 80992f8)
08047a68 fmd_run+0x118(809a8c0, , 0, 2)
08047ae8 main+0x344(8047adc, fef4f348, 8047b18, 805fdd3, 5, 8047b24)
08047b18 _start+0x83(5, 8047c38, 8047c4c, 8047c4f, 8047c57, 8047c5a)


On Fri, Feb 16, 2018 at 10:57 AM, Schweiss, Chip <c...@innovates.com> wrote:

> On Fri, Feb 16, 2018 at 10:47 AM, Robert Mustacchi <r...@joyent.com> wrote:
>
>> We're getting a zero length allocation here.

Re: [OmniOS-discuss] 8806 back port

2018-03-15 Thread Schweiss, Chip
On Tue, Mar 13, 2018 at 5:50 PM, Andy Fiddaman  wrote:

>
> This error message actually looks like you still have the publisher set
> from the last time you applied the fix by hand. Can you check the output
> of 'pkg publisher'?
>
>
That was it.  The publisher was still there because I had rolled back to
the BE from when the fix was previously installed.

Thanks!

-Chip


[OmniOS-discuss] 8806 back port

2018-03-13 Thread Schweiss, Chip
The 8806 backport will no longer apply to r151022ap.

# pkg apply-hot-fix --be-name=omniosce-r151022ap-8806 8806-backport_r22.p5p
No updates available for this image.
pkg set-publisher: Could not refresh the catalog for omnios

file protocol error: code: E_FAILED_INIT (2) reason: Package archive
/root/8806-backport_r22.p5p does not contain the requested package file(s):
publisher/omnios/catalog/catalog.attrs.
Repository URL: 'file:///root/8806-backport_r22.p5p'. (happened 4 times)


Some things are not clear about hot-fixes.

Do they need to be reapplied after each update?

Are they only compatible with the release they are built against?

Cheers!
-Chip


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-02-16 Thread Schweiss, Chip
On Fri, Feb 16, 2018 at 10:47 AM, Robert Mustacchi  wrote:

> We're getting a zero length allocation here. It appears that the number
> of phys that we're detecting in one of the elements is probably zero. Is
> it possible to upload the core so we can confirm the data and fix the
> ses module to handle this, potentially odd, case?
>
>
Sure, where would you like me to upload the core?

I've put it here if you'd like to grab it:
ftp://ftp.nrg.wustl.edu/pub/zfs/fmd.core

-Chip



> Thanks,
> Robert
>


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-02-16 Thread Schweiss, Chip
Here's what I'm seeing:

# /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
fmd: [ loading modules ... ABORT: attempted zero-length allocation: No such
device or address
Abort (core dumped)

# mdb core
Loading modules: [ fmd libumem.so.1 libc.so.1 libnvpair.so.1 libtopo.so.1
libuutil.so.1 libavl.so.1 libcmdutils.so.1 libsysevent.so.1 ld.so.1 ]
> $C
08046298 libc.so.1`_lwp_kill+0x15(1, 6, 80462e8, fef42000, fef42000,
8046320)
080462b8 libc.so.1`raise+0x2b(6, 0, 80462d0, feec1b59, 0, 0)
08046308 libc.so.1`abort+0x10e(fede2a40, fef44cb8, 8046348, 6, 524f4241,
61203a54)
08046738 libses.so.1`ses_panic(fdde6758, 8046764, 80467d8, fdb6b67a,
83f1048, fdb6c398)
08046758 libses.so.1`ses_realloc(fdde6758, 0, 83f5078, fdde6130, fddf7000,
fdb6658f)
08046778 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 80f4627)
080467a8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f5050, 8)
08046828 ses2.so`elem_parse_aes_misc+0x91(80f44f4, 83f1048, 8, fdb65d85)
08046878 ses2.so`elem_parse_aes+0xfc(82bd388, 83f5148, 80468e8, fdb80eae)
08046898 ses2.so`ses2_fill_element_node+0x37(82bd388, 83f5148, 8303ed8, 4)
080468c8 ses2.so`ses2_node_parse+0x53(82bd388, 83f5148, e, fddf7000)
080468e8 libses.so.1`ses_fill_node+0x22(83f5148, 83f5208, fdde38ae,
fdde394c)
08046908 libses.so.1`ses_fill_tree+0x21(83f5148, 82bd548, 83e9cc8, fdde394c)
08046928 libses.so.1`ses_fill_tree+0x33(82bd648, 82bd448, 8046958, fdde394c)
08046948 libses.so.1`ses_fill_tree+0x33(82bd548, 82a5270, 8046988, fdde394c)
08046968 libses.so.1`ses_fill_tree+0x33(82bd448, 0, 18, fddf7000)
08046988 libses.so.1`ses_fill_snap+0x22(82c08d0, 80, 0, fdde56eb)
080469d8 libses.so.1`ses_snap_new+0x325(82bd408, 0, 8046a08, fdde3006)
08046a08 libses.so.1`ses_open_scsi+0xc4(1, 82a51a0, 8046a90, fed71c1b,
80e9468, fede4042)
08046a58 libses.so.1`ses_open+0x98(1, 8046a90, 0, feecedd3, 43, fde1fc58)
08046ea8 ses.so`ses_process_dir+0x133(fde20159, 83d8ed8, 0, fed77e40)
08046ed8 ses.so`ses_enum+0xc1(80e9468, 83aeb58, 8356570, 0, 400, 0)
08046f28 libtopo.so.1`topo_mod_enumerate+0xc4(80e9468, 83aeb58, 82d4a88,
8356570, 0, 400)
08046f78 libtopo.so.1`enum_run+0xe9(80e9a18, 83d77c8, a, fed7b1dd)
08046fc8 libtopo.so.1`topo_xml_range_process+0x13e(80e9a18, 82bb0b0,
83d77c8, 8046ff8)
08047018 libtopo.so.1`tf_rdata_new+0x135(80e9a18, 81c8790, 82bb0b0, 83aeb58)
08047078 libtopo.so.1`topo_xml_walk+0x246(80e9a18, 81c8790, 82bb830,
83aeb58, 80e9a18, 83d5bc0)
080470d8 libtopo.so.1`topo_xml_walk+0x1b2(80e9a18, 81c8790, 82b0b28,
83aeb58)
08047118 libtopo.so.1`dependent_create+0x127(80e9a18, 81c8790, 83d6ab0,
82b0b28, 83aeb58, fed7b1f9)
08047158 libtopo.so.1`dependents_create+0x64(80e9a18, 81c8790, 83d6ab0,
82b0da8, 83aeb58, 81bd0d8)
08047208 libtopo.so.1`pad_process+0x51e(80e9a18, 83d79a8, 82b0da8, 83aeb58,
83d79d0, 8356340)
08047268 libtopo.so.1`topo_xml_range_process+0x31f(80e9a18, 82b0da8,
83d79a8, 8047298)
080472b8 libtopo.so.1`tf_rdata_new+0x135(80e9a18, 81c8790, 82b0da8, 81bd258)
08047318 libtopo.so.1`topo_xml_walk+0x246(80e9a18, 81c8790, 82a37a0,
81bd258, 80e5f40, fed8c000)
08047348 libtopo.so.1`topo_xml_enum+0x67(80e9a18, 81c8790, 81bd258,
feac2000)
08047478 libtopo.so.1`topo_file_load+0x139(80e9a18, 81bd258, fe20c127,
fe20bda2, 0, 82a6000)
080474a8 libtopo.so.1`topo_mod_enummap+0x26(80e9a18, 81bd258, fe20c127,
fe20bda2, 80e9a18, fe20b11c)
080474f8 x86pi.so`x86pi_enum_start+0xc5(80e9a18, 8047520, 8047528,
fe205580, 80e9a18, 80e9a18)
08047548 x86pi.so`x86pi_enum+0x55(80e9a18, 81bd258, 81a6a70, 0, 0, 0)
08047598 libtopo.so.1`topo_mod_enumerate+0xc4(80e9a18, 81bd258, 80cdf38,
81a6a70, 0, 0)
080475e8 libtopo.so.1`enum_run+0xe9(80e9b68, 82a5fa8, a, fed7b1dd)
08047638 libtopo.so.1`topo_xml_range_process+0x13e(80e9b68, 82a3f70,
82a5fa8, 8047668)
08047688 libtopo.so.1`tf_rdata_new+0x135(80e9b68, 81c8bd0, 82a3f70, 81bd258)
080476e8 libtopo.so.1`topo_xml_walk+0x246(80e9b68, 81c8bd0, 81c7108,
81bd258, 80e5f40, fed8c000)
08047718 libtopo.so.1`topo_xml_enum+0x67(80e9b68, 81c8bd0, 81bd258, 81a6ab0)
08047848 libtopo.so.1`topo_file_load+0x139(80e9b68, 81bd258, 80d4f38,
81a6a80, 0, 2c)
08047888 libtopo.so.1`topo_tree_enum+0x89(80e5f40, 81c5318, 80478b8,
fe70e6f8, 81b5310, 80e5f40)
080478a8 libtopo.so.1`topo_tree_enum_all+0x20(80e5f40, 81b5310, 80478e8,
fed71087)
080478e8 libtopo.so.1`topo_snap_create+0x13d(80e5f40, 804793c, 0, fed7118d,
807c010, 21)
08047918 libtopo.so.1`topo_snap_hold+0x56(80e5f40, 0, 804793c, 80c9f08, 0,
8047ab8)
08047958 fmd_topo_update+0x9f(80c9f08, 8085dfa, 8047a58, 80601f7, 0, 0)
08047968 fmd_topo_init+0xb(0, 0, 0, 0, 2, 80992f8)
08047a58 fmd_run+0x118(809a8c0, , 0, 0)
08047ad8 main+0x344(8047acc, fef4f348, 8047b0c, 805fdd3, 5, 8047b18)
08047b0c _start+0x83(5, 8047c2c, 8047c40, 8047c43, 8047c4b, 8047c4e)



On Fri, Feb 16, 2018 at 10:29 AM, Yuri Pankov <yur...@yuripv.net> wrote:

> Schweiss, Chip wrote:
>
>> This is on OmniOS CE r151024l running in a VMware virtual machine under
>> ESXi 6.5 with PCI pass-thru to a SAS3008 HBA.
>>
>> Th

[OmniOS-discuss] FMD fails to run

2018-02-16 Thread Schweiss, Chip
This is on OmniOS CE r151024l running in a VMware virtual machine under
ESXi 6.5 with PCI pass-thru to a SAS3008 HBA.

The problem is related to the HBA on pass-thru.  If I disconnect it,
everything starts fine, but I am not clear why or how to fix this.   I have
done similar VM passthrough setups with older versions of OmniOS and SAS2
HBAs without any problems.

The same HBA was being used successfully in the same configuration with
CentOS 7 in the VM so I know this can function.

I can see all the disks, but cannot import the pool because the fault
manager is not running.

The logs show:

[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:02:14 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:02:14 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:05:09 Leaving maintenance because clear requested. ]
[ Feb 16 10:05:09 Enabled. ]
[ Feb 16 10:05:09 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:05:10 Method "start" exited with status 1. ]

Any hope of making this work?

Thanks!
-Chip


Re: [OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed

2018-01-17 Thread Schweiss, Chip
I haven't seen this bug filed yet.

Please submit this.  For anyone using automounter this bug is a ticking
time bomb.

I've been able to extend the interval between reboots by about a week with
ndd -set /dev/tcp tcp_smallest_anon_port 1024

However, until this is fixed, I'm forced to reboot every couple of weeks.

Thank you,

-Chip

On Mon, Jan 8, 2018 at 10:46 AM, Dan McDonald  wrote:

> OH PHEW!
>
> > On Jan 8, 2018, at 11:43 AM, Youzhong Yang  wrote:
> >
> > This is our patch. It was applied 3 years ago so the line number could
> be different for the latest version of the file.
> > diff --git a/usr/src/uts/common/rpc/clnt_cots.c
> b/usr/src/uts/common/rpc/clnt_cots.c
> > index 4466e93..0a0951d 100644
> > --- a/usr/src/uts/common/rpc/clnt_cots.c
> > +++ b/usr/src/uts/common/rpc/clnt_cots.c
> > @@ -2285,6 +2285,7 @@ start_retry_loop:
> >   if (rpcerr->re_status == RPC_SUCCESS)
> >   rpcerr->re_status = RPC_XPRTFAILED;
> >   cm_entry->x_connected = FALSE;
> > + cm_entry->x_dead = TRUE;
> >   } else
> >   cm_entry->x_connected = connected;
> >
> > @@ -2403,6 +2404,7 @@ connmgr_wrapconnect(
> >   if (rpcerr->re_status == RPC_SUCCESS)
> >   rpcerr->re_status = RPC_XPRTFAILED;
> >   cm_entry->x_connected = FALSE;
> > + cm_entry->x_dead = TRUE;
> >   } else
> >   cm_entry->x_connected = connected;
>
> This makes TONS more sense, and alleviates/obviates my concerns previously.
>
> If there isn't a bug already, please file one.  Once filed or found,
> please add me as a code reviewer for this.
>
> Thanks,
> Dan
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T8f10bde64dc0d5c5-M889b6aaf7cbeb0b32617f321
> Powered by Topicbox: https://topicbox.com
>


[OmniOS-discuss] OmniOSce installer rpool slicing

2018-01-05 Thread Schweiss, Chip
In the previous Solaris style installer we had the option of only using a
portion of the disk that the rpool went on.   This was very good for SSDs
that perform better and last longer if they have some additional slack
space that never has data written to it.

Is there a way to achieve this with the new installer?

-Chip


Re: [OmniOS-discuss] OmniOSce installer rpool slicing

2018-01-05 Thread Schweiss, Chip
I didn't think about that.  Thanks!

On Fri, Jan 5, 2018 at 9:11 AM, Volker A. Brandt  wrote:

> Hi Chip!
>
>
> > In the previous Solaris style installer we had the option of only using a
> > portion of the disk that the rpool went on.   This was very good for
> SSDs that
> > perform better and last longer if they have some additional slack space
> that
> > never has data written to it.
> >
> > Is there a way to achieve this with the new installer?
>
> Yes.  Just drop to the shell from the installation menu and create your
> rpool using fdisk, format, and zpool create.  Exit the shell and select
> "use existing pool".
>
>
> Regards -- Volker
> --
> 
> Volker A. Brandt   Consulting and Support for Oracle Solaris
> Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
> Am Wiesenpfad 6, 53340 Meckenheim, GERMANYEmail: v...@bb-c.de
> Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 46
> Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt
>
> "When logic and proportion have fallen sloppy dead"
>


Re: [OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed

2018-01-03 Thread Schweiss, Chip
Hopefully the patch Marcel is talking about fixes this.  I've at least
figured out enough to predict when the problem is imminent.

We have been migrating to using the automounter instead of hard mounts, which
could be related to this problem growing over time.

Just an FYI:  I've kept the server running in this state, but moved its
storage pool to a sister server.   The port binding problem remains with NO
NFS clients connected, but neither pfiles nor lsof shows rpcbind as the
culprit:

# netstat -an|grep BOUND|wc -l
32739

# /opt/ozmt/bin/SunOS/lsof -i:41155

{nothing returned}

# pfiles `pgrep rpcbind`
449:/usr/sbin/rpcbind
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   1: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   2: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   3: S_IFCHR mode: dev:527,0 ino:61271 uid:0 gid:0 rdev:231,64
  O_RDWR
sockname: AF_INET6 ::  port: 111
  /devices/pseudo/udp6@0:udp6
  offset:0
   4: S_IFCHR mode: dev:527,0 ino:50998 uid:0 gid:0 rdev:231,59
  O_RDWR
sockname: AF_INET6 ::  port: 0
  /devices/pseudo/udp6@0:udp6
  offset:0
   5: S_IFCHR mode: dev:527,0 ino:61264 uid:0 gid:0 rdev:231,58
  O_RDWR
sockname: AF_INET6 ::  port: 60955
  /devices/pseudo/udp6@0:udp6
  offset:0
   6: S_IFCHR mode: dev:527,0 ino:64334 uid:0 gid:0 rdev:224,57
  O_RDWR
sockname: AF_INET6 ::  port: 111
  /devices/pseudo/tcp6@0:tcp6
  offset:0
   7: S_IFCHR mode: dev:527,0 ino:64333 uid:0 gid:0 rdev:224,56
  O_RDWR
sockname: AF_INET6 ::  port: 0
  /devices/pseudo/tcp6@0:tcp6
  offset:0
   8: S_IFCHR mode: dev:527,0 ino:64332 uid:0 gid:0 rdev:230,55
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 111
  /devices/pseudo/udp@0:udp
  offset:0
   9: S_IFCHR mode: dev:527,0 ino:64330 uid:0 gid:0 rdev:230,54
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 0
  /devices/pseudo/udp@0:udp
  offset:0
  10: S_IFCHR mode: dev:527,0 ino:64331 uid:0 gid:0 rdev:230,53
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 60994
  /devices/pseudo/udp@0:udp
  offset:0
  11: S_IFCHR mode: dev:527,0 ino:64327 uid:0 gid:0 rdev:223,52
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 111
  /devices/pseudo/tcp@0:tcp
  offset:0
  12: S_IFCHR mode: dev:527,0 ino:64326 uid:0 gid:0 rdev:223,51
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 0
  /devices/pseudo/tcp@0:tcp
  offset:0
  13: S_IFCHR mode: dev:527,0 ino:64324 uid:0 gid:0 rdev:226,32
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  14: S_IFCHR mode: dev:527,0 ino:64328 uid:0 gid:0 rdev:226,33
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  15: S_IFCHR mode: dev:527,0 ino:64324 uid:0 gid:0 rdev:226,35
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  16: S_IFCHR mode: dev:527,0 ino:64322 uid:0 gid:0 rdev:226,36
  O_RDWR
  /devices/pseudo/tl@0:ticotsord
  offset:0
  17: S_IFCHR mode: dev:527,0 ino:64321 uid:0 gid:0 rdev:226,37
  O_RDWR
  /devices/pseudo/tl@0:ticotsord
  offset:0
  18: S_IFCHR mode: dev:527,0 ino:64030 uid:0 gid:0 rdev:226,39
  O_RDWR
  /devices/pseudo/tl@0:ticots
  offset:0
  19: S_IFCHR mode: dev:527,0 ino:64029 uid:0 gid:0 rdev:226,40
  O_RDWR
  /devices/pseudo/tl@0:ticots
  offset:0
  20: S_IFIFO mode: dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
  O_RDWR|O_NONBLOCK
  21: S_IFIFO mode: dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
  O_RDWR|O_NONBLOCK
  23: S_IFCHR mode: dev:527,0 ino:33089 uid:0 gid:0 rdev:129,21273
  O_WRONLY FD_CLOEXEC
  /devices/pseudo/log@0:conslog
  offset:0

Restarting rpcbind doesn't affect it either:

# svcadm restart svc:/network/rpc/bind:default

# netstat -an|grep BOUND|wc -l
32739

In the interim of this patch getting integrated I'll monitor the number of
bound ports to know when I should fail my pool over again.


On Wed, Jan 3, 2018 at 10:32 AM, Marcel Telka <mar...@telka.sk> wrote:

> On Wed, Jan 03, 2018 at 10:02:43AM -0600, Schweiss, Chip wrote:
> > The problem occurred again starting last night.  I have another clue,
> but I
> > still don't know how it is occurring or how to fix it.
> >
> > It looks like all the TCP ports are in "bound" state, but not being
> > released.
> >
> > How can I isolate the cause of this?
>
> This is a bug in rpcmod, very likely related to
> https://www.illumos.org/issues/1616
>
> I discussed this a few weeks back with some guy who faced the same issue.  It
> looks like he found the cause an

Re: [OmniOS-discuss] rpcbind: t_bind failed

2018-01-03 Thread Schweiss, Chip
vate/defer
d063568fd7e0 stream-ord 000 000
d063568fdb90 stream-ord 000 000
d06356840078 stream-ord d0635685a700 000 private/bounce
d06356840428 stream-ord 000 000
d063568407d8 stream-ord 000 000
d06356840b88 stream-ord d0635685a800 000 private/rewrite
d06356843070 stream-ord d06356810380 000 private/tlsmgr
d06356843420 stream-ord 000 000
d063568437d0 stream-ord 000 000
d06356849068 stream-ord 000 000
d06356849418 stream-ord 000 000
d063568497c8 stream-ord d0635685a000 000 public/qmgr
d06356849b78 stream-ord d0635685a100 000 public/cleanup
d0635684d060 stream-ord 000 000
d0635684d410 stream-ord 000 000
d0635684db70 stream-ord 000 000
d06355646058 stream-ord 000 000
d06355646b68 stream-ord d0635685a300 000 public/pickup
d063551bf3f8 stream-ord d063193fe900 000 /var/run/.inetd.uds
d063550e7b50 dgram  d063550eb380 000 /var/run/in.rdisc_mib
d06355031798 dgram  d063536c8800 000 /var/run/in.ndpd_mib
d06355031b48 stream-ord d063536c8c00 000 /var/run/in.ndpd_ipadm
d0635265a028 stream-ord 000 d0634e4acd00
/var/run/dbus/system_bus_socket
d0635265a788 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d0635265ab38 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d553d0 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d55780 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d55b30 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d06351996018 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d063519963c8 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351996778 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d063500fe010 stream-ord 000 000 /var/run/hald/dbus-5Qrha0Wmu3
d063500fe3c0 stream-ord 000 d063500ffa80
/var/run/hald/dbus-5Qrha0Wmu3
d063500fe770 stream-ord d063500ffa80 000
/var/run/hald/dbus-5Qrha0Wmu3
d063500feb20 stream-ord d063500ffc80 000
/var/run/hald/dbus-y1Me9kLIpf
d0634e4ad008 stream-ord 000 000
d0634e4ad3b8 stream-ord 000 000
d0634e4ad768 stream-ord 000 000 /var/run/dbus/system_bus_socket
d0634e4adb18 stream-ord d0634e4acd00 000
/var/run/dbus/system_bus_socket


A sorted output shows nearly all 64K ports in bound state.

On Tue, Jan 2, 2018 at 8:40 AM, Schweiss, Chip <c...@innovates.com> wrote:

> About once every week or two I'm having NFS connections start to collapse
> to one of my servers.   Clients will lose thier connections of the the
> course of several hours. The logs fill with these messages:
>
> Dec 25 16:21:14 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
> t_bind failed : Couldn't allocate address
> Dec 25 16:21:14 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
> Dec 25 16:21:31 mir-zfs03 last message repeated 85 times
> Dec 25 16:21:31 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
> t_bind failed : Couldn't allocate address
> Dec 25 16:21:32 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
> Dec 25 16:21:34 mir-zfs03 last message repeated 19 times
> Dec 25 16:21:37 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 200/transport tcp) TLI error 5
> Dec 25 16:22:17 mir-zfs03 last message repeated 116 times
> Dec 25 16:22:21 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 206/transport tcp) TLI error 5
> Dec 25 16:23:04 mir-zfs03 last message repeated 81 times
>
> This is a fully updated OmniOS CE r151022.
>
> I've tried restarting NFS services, but the only thing that has been
> successful in restoring services has been rebooting.
>
> I'm not finding anything useful via Google except the source code that
> spits out this message.   HP-UX appears to have had the same issue that
> they patched years ago.   I'm guessing shared NFS/RPC code.
>
> Any clue as to the cause of this and how to fix?
>
> -Chip
>
>
>


[OmniOS-discuss] rpcbind: t_bind failed

2018-01-02 Thread Schweiss, Chip
About once every week or two I'm having NFS connections start to collapse
to one of my servers.   Clients will lose their connections over the
course of several hours. The logs fill with these messages:

Dec 25 16:21:14 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
t_bind failed : Couldn't allocate address
Dec 25 16:21:14 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
Dec 25 16:21:31 mir-zfs03 last message repeated 85 times
Dec 25 16:21:31 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
t_bind failed : Couldn't allocate address
Dec 25 16:21:32 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
Dec 25 16:21:34 mir-zfs03 last message repeated 19 times
Dec 25 16:21:37 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 200/transport tcp) TLI error 5
Dec 25 16:22:17 mir-zfs03 last message repeated 116 times
Dec 25 16:22:21 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 206/transport tcp) TLI error 5
Dec 25 16:23:04 mir-zfs03 last message repeated 81 times

This is a fully updated OmniOS CE r151022.

I've tried restarting NFS services, but the only thing that has been
successful in restoring services has been rebooting.

I'm not finding anything useful via Google except the source code that
spits out this message.   HP-UX appears to have had the same issue that
they patched years ago.   I'm guessing shared NFS/RPC code.

Any clue as to the cause of this and how to fix?

-Chip


[OmniOS-discuss] Editing kernel command line with BSD Loader

2017-10-30 Thread Schweiss, Chip
Forgive me if there is a FAQ somewhere on this, but I could not locate one.

How do I edit the command line now that my OmniOS is using the BSD loader?

I'd like to disable a driver at boot time such as:

-B disable-mpt_sas=true

-Chip


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-09-08 Thread Schweiss, Chip
Robert,

That is awesome.   I'd definitely be interested in testing this.

I'll get my feet wet with building OmniOS CE with it.

Thanks!
-Chip

On Fri, Sep 8, 2017 at 10:07 AM, Robert Mustacchi <r...@joyent.com> wrote:

> On 9/8/17 6:43 , Schweiss, Chip wrote:
> > Now that I'm back to working on this.   The only way I could get the
> > firmware updated was booting into the UEFI shell.   A bit of a pain but
> it
> > worked.
> >
> > Unfortunately, it has not changed the behavior of the HBA.
> >
> > Where do I go from here?Any hope of getting this working on OmniOS?
>
> Hi Chip,
>
> I'm just catching up on this. So, I do have some good news and bad news.
> First, the bad news. This is based on the SAS3224 chipset, which it
> appears the 16e is also describing itself as. Of note, this uses a
> slightly newer version of the MPI specification and the driver as it is
> written doesn't quite notice that it requires slightly different
> behavior and a simple PCI ID update isn't sufficient.
>
> The good news is that I just finished doing this work for the LSI
> 9305-24i and was going to send that up to illumos shortly. If you want,
> I can send those changes your way if you're comfortable building illumos
> and want to test that.
>
> Robert
>
> > On Thu, Aug 31, 2017 at 9:53 AM, Schweiss, Chip <c...@innovates.com>
> wrote:
> >
> >> This server will be serving NFS for vSphere.  It is running OmniOS CE,
> >> nothing VMware.
> >>
> >> I'm working on flashing firmware now and will report back any changes.
> >>
> >> -Chip
> >>
> >> On Thu, Aug 31, 2017 at 9:42 AM, Dale Ghent <da...@elemental.org>
> wrote:
> >>
> >>>> On Aug 31, 2017, at 9:29 AM, Schweiss, Chip <c...@innovates.com>
> wrote:
> >>>>
> >>>> I've added mpt_sas "pciex1000,c9" to /etc/driver_aliases and rebooted.
> >>>>
> >>>> Looks like it's partially working, but it's not fully functional.
> >>> Service are timing out:
> >>>>
> >>>> Here's what I see in /var/adm/messages:
> >>>>
> >>>>
> >>>> Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> >>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> >>>> Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
> >>>> Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> >>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> >>>
> >>> The driver is reporting that the MPT IOC (IO Controller) is reporting a
> >>> fault. It's just reading this condition off the controller chip
> itself, and
> >>> unfortunately there doesn't seem to be a handy reference published by
> >>> LSI/Avago regarding what 2667h actually means.
> >>>
> >>> However I note from your machine's hostname that this is perhaps an ESXi
> >>> guest that is being given the HBA in passthrough mode? It would seem
> that
> >>> someone else has encountered a similar issue as yourself in this case,
> with
> >>> the same MPT fault code, but on Linux running Proxmox. According to
> this
> >>> forum thread, they ended up flashing the firmware on the card to
> something
> >>> newer and the problem went away:
> >>>
> >>> https://forum.proxmox.com/threads/pci-passthrough.16483/
> >>>
> >>> I would suggest Tim's approach and flashing your card up to the newest
> IT
> >>> (not IR) firmware.
> >>>
> >>> /dale
> >>>
> >
> > --
> > illumos-zfs
> > Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-M4bd824d5e1881e2772ee518a
> > Powered by Topicbox: https://topicbox.com
> >
>
>


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
This server will be serving NFS for vSphere.  It is running OmniOS CE,
nothing VMware.

I'm working on flashing firmware now and will report back any changes.

-Chip

On Thu, Aug 31, 2017 at 9:42 AM, Dale Ghent <da...@elemental.org> wrote:

>
> > On Aug 31, 2017, at 9:29 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > I've added mpt_sas "pciex1000,c9" to /etc/driver_aliases and rebooted.
> >
> > Looks like it's partially working, but it's not fully functional.
> Services are timing out:
> >
> > Here's what I see in /var/adm/messages:
> >
> >
> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> > Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
>
> The driver is reporting that the MPT IOC (IO Controller) is reporting a
> fault. It's just reading this condition off the controller chip itself, and
> unfortunately there doesn't seem to be a handy reference published by
> LSI/Avago regarding what 2667h actually means.
>
> However I note from your machine's hostname that this is perhaps an ESXi
> guest that is being given the HBA in passthrough mode? It would seem that
> someone else has encountered a similar issue as yourself in this case, with
> the same MPT fault code, but on Linux running Proxmox. According to this
> forum thread, they ended up flashing the firmware on the card to something
> newer and the problem went away:
>
> https://forum.proxmox.com/threads/pci-passthrough.16483/
>
> I would suggest Tim's approach and flashing your card up to the newest IT
> (not IR) firmware.
>
> /dale
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-Mb0dd6c92e5393440a8b0c8fb
> Powered by Topicbox: https://topicbox.com
>
>


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
 kern.info]
w5000c5002c6d6512 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d7192 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d36ee FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d64f2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4b3c16 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c50070a5c08a FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d3662 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d60d6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d35a2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1fbe FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d27c6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d66a2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c500056fb256 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c40e3be FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1846 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d5ff6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1e52 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d2276 FastPath Capable and Enabled
Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING: /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING: /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:49 vsphere-zfs01   ioc reset abort passthru
Aug 31 08:15:49 vsphere-zfs01 mpt_sas: [ID 201859 kern.warning] WARNING:
smp_start do passthru error 11
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   MPT Firmware version v9.0.100.0 (?)
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   mpt_sas0 MPI Version 0x206
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   mpt0: IOC Operational.

Where do I go from here?

-Chip


On Thu, Aug 31, 2017 at 7:30 AM, Schweiss, Chip <c...@innovates.com> wrote:

> On Wed, Aug 30, 2017 at 3:12 PM, Dan McDonald <dan...@joyent.com> wrote:
>
>> > On Aug 30, 2017, at 4:11 PM, Dale Ghent <da...@elemental.org> wrote:
>> >
>> > Or rather:
>> >
>> > # update_drv -a -i '"pciex1000,c9"' mpt_sas
>>
>> It MIGHT fail because mpt_sas checks PCI IDs explicitly itself.  :(
>>
>>
> Yes, the update_drv command just hangs indefinitely.
>
> -Chip
>
>
>
>> FYI,
>> Dan
>>
>>
>> --
>> illumos-zfs
>> Archives: https://illumos.topicbox.com/groups/zfs/discussions/T372d7dd
>> d75316296-M31c90a92ac6d9d9dbd977114
>> Powered by Topicbox: https://topicbox.com
>>
>
>


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
On Wed, Aug 30, 2017 at 3:12 PM, Dan McDonald  wrote:

> > On Aug 30, 2017, at 4:11 PM, Dale Ghent  wrote:
> >
> > Or rather:
> >
> > # update_drv -a -i '"pciex1000,c9"' mpt_sas
>
> It MIGHT fail because mpt_sas checks PCI IDs explicitly itself.  :(
>
>
Yes, the update_drv command just hangs indefinitely.

-Chip



> FYI,
> Dan
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-M31c90a92ac6d9d9dbd977114
> Powered by Topicbox: https://topicbox.com
>


[OmniOS-discuss] SAS 9305-16e HBA support in Illumos

2017-08-30 Thread Schweiss, Chip
I made the assumption that a Broadcom/LSI HBA would be supported already in
OmniOS CE r151022o.

This HBA is not loading.

Here's the 'lspci -vv' output:

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3216
PCI-Express Fusion-MPT SAS-3 (rev 01)
Subsystem: LSI Logic / Symbios Logic Device 3180
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-


Re: [OmniOS-discuss] Upgrade to 151022m from 014 - horrible NFS performance

2017-08-24 Thread Schweiss, Chip
I switched back to 014 for now; it was too bad to inflict on my users.

I have some new systems coming in soon that I'll test on r151022 before
making them live.   I will start with the NFS defaults.

-Chip

On Thu, Aug 24, 2017 at 8:35 AM, Dan McDonald <dan...@kebe.com> wrote:

>
> > On Aug 24, 2017, at 8:41 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > I just moved one of my production systems to OmniOS CE 151022m from
> 151014 and my NFS performance has tanked.
> >
> > Here's a snapshot of nfssvrtop:
> >
> > 2017 Aug 24 07:34:39, load: 1.54, read: 5427 KB, swrite: 104 KB, awrite: 9634 KB
> > Ver Client        NFSOPS  Reads SWrites AWrites Commits  Rd_bw SWr_bw AWr_bw   Rd_t  SWr_t  AWr_t   Com_t Align%
> > 3   10.28.17.10        0      0       0       0       0      0      0      0      0      0      0       0      0
> > 3   all                0      0       0       0       0      0      0      0      0      0      0       0      0
> > 4   10.28.17.19        0      0       0       0       0      0      0      0      0      0      0       0      0
> > 4   10.28.16.160      17      0       0       0       0      0      0      0      0      0      0       0      0
> > 4   10.28.16.127      20      0       0       0       0      0      0      0      0      0      0       0      0
> > 4   10.28.16.113      74      6       6       0       0     48     56      0   1366  20824      0       0    100
> > 4   10.28.16.64      338     16       0      36       3    476      0   1065    120      0    130  117390    100
> > 4   10.28.16.54      696     68       0      91       5   2173      0   2916     52      0     93  142083    100
> > 4   all             1185     90       6     127       8   2697     56   3996    151  20824    104  133979    100
> >
> > The pool is not doing anything but serving NFS.   Before the upgrade,
> the pool would sustain 20k NFS ops.
> >
> > Is there some significant change in NFS that I need to adjust its tuning?
>
> Oh my.
>
> I'd start pinging the illumos list on this.  Also, are there any special
> tweaks you made in the 014 configuration?  IF you did, I'd start back
> removing them and seeing what a default system does, just in case.
>
> I know Delphix and Nexenta still care about NFS quite a bit, so I can't
> believe something would be that bad.
>
> Maintainers:  Check for NFS changes RIGHT AFTER 022 closed for blanket
> upstream pull-ins.  Maybe it closed during a poor-performance window?
>
> Dan
>
>


[OmniOS-discuss] Upgrade to 151022m from 014 - horrible NFS performance

2017-08-24 Thread Schweiss, Chip
I just moved one of my production systems to OmniOS CE 151022m from 151014
and my NFS performance has tanked.

Here's a snapshot of nfssvrtop:

2017 Aug 24 07:34:39, load: 1.54, read: 5427 KB, swrite: 104 KB, awrite: 9634 KB
Ver Client        NFSOPS  Reads SWrites AWrites Commits  Rd_bw SWr_bw AWr_bw   Rd_t  SWr_t  AWr_t   Com_t Align%
3   10.28.17.10        0      0       0       0       0      0      0      0      0      0      0       0      0
3   all                0      0       0       0       0      0      0      0      0      0      0       0      0
4   10.28.17.19        0      0       0       0       0      0      0      0      0      0      0       0      0
4   10.28.16.160      17      0       0       0       0      0      0      0      0      0      0       0      0
4   10.28.16.127      20      0       0       0       0      0      0      0      0      0      0       0      0
4   10.28.16.113      74      6       6       0       0     48     56      0   1366  20824      0       0    100
4   10.28.16.64      338     16       0      36       3    476      0   1065    120      0    130  117390    100
4   10.28.16.54      696     68       0      91       5   2173      0   2916     52      0     93  142083    100
4   all             1185     90       6     127       8   2697     56   3996    151  20824    104  133979    100

The pool is not doing anything but serving NFS.   Before the upgrade, the
pool would sustain 20k NFS ops.

Is there some significant change in NFS that I need to adjust its tuning?
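For anyone scripting around nfssvrtop output to quantify a regression like this, a minimal sketch in Python; the column layout and the 20k-ops baseline are taken from the message above, and the sample rows are abbreviated:

```python
# Abbreviated nfssvrtop rows: Ver, Client, NFSOPS are the first three
# columns; the 'all' row per NFS version carries the aggregate op count.
SAMPLE = """\
3   10.28.17.10     0   0   0   0   0
3   all             0   0   0   0   0
4   10.28.16.64   338  16   0  36   3
4   10.28.16.54   696  68   0  91   5
4   all          1185  90   6 127   8
"""

def total_nfsops(text):
    """Sum NFSOPS across the per-version 'all' summary rows."""
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "all":
            total += int(fields[2])
    return total

BASELINE = 20000  # ops the pool sustained before the upgrade
observed = total_nfsops(SAMPLE)
print(observed, observed < 0.1 * BASELINE)  # 1185 True -> serious regression
```

Run periodically against fresh nfssvrtop snapshots, this gives a single number to compare before and after an upgrade.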

-Chip


Re: [OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
I'm talking about an offline pool.   I started this thread after rebooting
a server that is part of an HA pair. The other server has the pools
online.  It's been over 4 hours now and it still hasn't completed its disk
scan.

Every tool I have that helps me locate disks suffers from the same insanely
long command timeout, repeated many times, before moving on.   Operations that
typically take seconds blow up to hours really fast because of a few dead
disks.

-Chip



On Thu, Jun 22, 2017 at 3:12 PM, Dale Ghent <da...@omniti.com> wrote:

>
> Have you able to and have tried offlining it in the zpool?
>
> zpool offline thepool <disk>
>
> I'm assuming the pool has some redundancy which would allow for this.
>
> /dale
>
> > On Jun 22, 2017, at 11:54 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > Whenever a disk goes south, several disk-related tasks become painfully
> slow.  Boot-up times can jump into the hours to complete the disk scans.
> >
> > The logs slowly get these type messages:
> >
> > genunix: WARNING /pci@0,0/pci8086,340c@5/pci15d9,400@0 (mpt_sas0):
> > Timeout of 60 seconds expired with 1 commands on target 16 lun 0
> >
> > I thought this /etc/system setting would reduce the timeout to 5 seconds:
> > set sd:sd_io_time = 5
> >
> > But this doesn't seem to change anything.
> >
> > Is there any way to make this a more reasonable timeout, besides pulling
> the disk that's causing it?   Just locating the defective disk is also
> painfully slow because of this problem.
> >
> > -Chip
>
>


Re: [OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
On Thu, Jun 22, 2017 at 11:05 AM, Michael Rasmussen  wrote:

>
> > I thought this /etc/system setting would reduce the timeout to 5 seconds:
> > set sd:sd_io_time = 5
> >
> I think it expects a hex value so try 0x5 instead.
>
>
Unfortunately, no, I've tried that too.

-Chip


> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xE3E80917
> --
> /usr/games/fortune -es says:
> Look, we play the Star Spangled Banner before every game.  You want us
> to pay income taxes, too?
> -- Bill Veeck, Chicago White Sox
>
>
>


[OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
Whenever a disk goes south, several disk-related tasks become painfully
slow.  Boot-up times can jump into the hours while the disk scans complete.

The logs slowly get these type messages:

genunix: WARNING /pci@0,0/pci8086,340c@5/pci15d9,400@0 (mpt_sas0):
Timeout of 60 seconds expired with 1 commands on target 16 lun 0

I thought this /etc/system setting would reduce the timeout to 5 seconds:
set sd:sd_io_time = 5

But this doesn't seem to change anything.

Is there any way to make this a more reasonable timeout, besides pulling the
disk that's causing it?   Just locating the defective disk is also
painfully slow because of this problem.

-Chip


[OmniOS-discuss] Resilver zero progress

2017-05-10 Thread Schweiss, Chip
I have a pool that has had a resilver running for about an hour, but the
progress status is alarming.  I'm concerned that for some reason it will
never complete.   Resilvers are tuned to be faster in /etc/system.   This is
on OmniOS r151014, currently fully updated.   Any suggestions?

-Chip

from /etc/system:

set zfs:zfs_resilver_delay = 0
set zfs:zfs_scrub_delay = 0
set zfs:zfs_top_maxinflight = 64
set zfs:zfs_resilver_min_time_ms = 5000


# zpool status hcp03
  pool: hcp03
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May 10 09:22:15 2017
1 scanned out of 545T at 1/s, (scan is slow, no estimated time)
0 resilvered, 0.00% done
config:

NAME STATE READ WRITE CKSUM
hcp03DEGRADED 0 0 0
  raidz2-0   DEGRADED 0 0 0
c0t5000C500846F161Fd0ONLINE   0 0 0
spare-1  UNAVAIL  0 0 0
  5676922542927845170UNAVAIL  0 0 0  was
/dev/dsk/c0t5000C5008473DBF3d0s0
  c0t5000C500846F1823d0  ONLINE   0 0 0
c0t5000C500846F134Fd0ONLINE   0 0 0
c0t5000C500846F139Fd0ONLINE   0 0 0
c0t5000C5008473B89Fd0ONLINE   0 0 0
c0t5000C500846F145Bd0ONLINE   0 0 0
c0t5000C5008473B6BBd0ONLINE   0 0 0
c0t5000C500846F131Fd0ONLINE   0 0 0
  raidz2-1   ONLINE   0 0 0
c0t5000C5008473BB63d0ONLINE   0 0 0
c0t5000C5008473C9C7d0ONLINE   0 0 0
c0t5000C500846F1A17d0ONLINE   0 0 0
c0t5000C5008473A0A3d0ONLINE   0 0 0
c0t5000C5008473D047d0ONLINE   0 0 0
c0t5000C5008473BF63d0ONLINE   0 0 0
c0t5000C5008473BC83d0ONLINE   0 0 0
c0t5000C5008473E35Bd0ONLINE   0 0 0
  raidz2-2   ONLINE   0 0 0
c0t5000C5008473ABAFd0ONLINE   0 0 0
c0t5000C5008473ADF3d0ONLINE   0 0 0
c0t5000C5008473AE77d0ONLINE   0 0 0
c0t5000C5008473A23Bd0ONLINE   0 0 0
c0t5000C5008473C907d0ONLINE   0 0 0
c0t5000C5008473CCABd0ONLINE   0 0 0
c0t5000C5008473C77Fd0ONLINE   0 0 0
c0t5000C5008473B6D3d0ONLINE   0 0 0
  raidz2-3   ONLINE   0 0 0
c0t5000C5008473E4FFd0ONLINE   0 0 0
c0t5000C5008473ECFFd0ONLINE   0 0 0
c0t5000C5008473F4C3d0ONLINE   0 0 0
c0t5000C5008473F8CFd0ONLINE   0 0 0
c0t5000C500846F1897d0ONLINE   0 0 0
c0t5000C500846F14B7d0ONLINE   0 0 0
c0t5000C500846F1353d0ONLINE   0 0 0
c0t5000C5008473EEDFd0ONLINE   0 0 0
  raidz2-4   ONLINE   0 0 0
c0t5000C500846F144Bd0ONLINE   0 0 0
c0t5000C5008473F10Fd0ONLINE   0 0 0
c0t5000C500846F15CBd0ONLINE   0 0 0
c0t5000C500846F1493d0ONLINE   0 0 0
c0t5000C5008473E26Fd0ONLINE   0 0 0
c0t5000C500846F1A0Bd0ONLINE   0 0 0
c0t5000C5008473EE07d0ONLINE   0 0 0
c0t5000C500846F1453d0ONLINE   0 0 0
  raidz2-5   ONLINE   0 0 0
c0t5000C500846F153Bd0ONLINE   0 0 0
c0t5000C5008473F9EBd0ONLINE   0 0 0
c0t5000C500846F14EFd0ONLINE   0 0 0
c0t5000C5008473AB0Bd0ONLINE   0 0 0
c0t5000C500846F140Bd0ONLINE   0 0 0
c0t5000C5008473FC0Fd0ONLINE   0 0 0
c0t5000C5008473DFA3d0ONLINE   0 0 0
c0t5000C5008473F89Bd0ONLINE   0 0 0
  raidz2-6   ONLINE   0 0 0
c0t5000C500846F19BFd0ONLINE   0 0 0
c0t5000C5008473D1ABd0ONLINE   0 0 0
c0t5000C50084739FD3d0ONLINE   0 0 0
c0t5000C5008473FFB7d0ONLINE   0 0 0
c0t5000C5008473E72Fd0ONLINE   0 0 0
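A stalled scan like the one above is easy to watch for mechanically: poll `zpool status` and parse the percent-done figure. A rough sketch in Python; the regex assumes the output format shown above:

```python
import re

def resilver_percent(status_text):
    """Return percent-done from a 'zpool status' scan section, or None."""
    m = re.search(r"([\d.]+)% done", status_text)
    return float(m.group(1)) if m else None

SAMPLE = """\
  scan: resilver in progress since Wed May 10 09:22:15 2017
    1 scanned out of 545T at 1/s, (scan is slow, no estimated time)
    0 resilvered, 0.00% done
"""

pct = resilver_percent(SAMPLE)
# A real watchdog would run this periodically (e.g. from cron) against
# `zpool status <pool>` and alert if pct stays at 0.0 for too long.
print("stalled" if pct == 0.0 else "progressing")  # prints "stalled"
```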

[OmniOS-discuss] OmniOS DOS'd my entire network

2017-05-09 Thread Schweiss, Chip
This was a first for me and extremely painful to locate.

In the middle of the night between last Friday and Saturday, I started
getting down alerts from most of my network.   It took 4 engineers
including myself 9 hours to pinpoint the source of the problem.

The problem turned out to be one of my OmniOS boxes sending out pure
garbage constantly on layer 2 out the 10G network ports.   This disrupted
ARP caches on every machine on every VLAN that was trunked on these ports,
not just the VLANs that were configured on the server.   The switches
reported every port healthy and without error.   The traffic on the bad
port was not high either, just severely disruptive.

The affected OmniOS box appeared to be healthy, as it was still serving the
VM data stores for over 350 virtual machines.   However, like every other
service on the network, it appeared to be up and down repeatedly, though NFS
kept recovering gracefully.

The only thing that finally identified this server was one of us plugging a
monitor into the console and seeing "WARNING: proxy ARP problem?" scrolling
so fast that it took a high-frame-rate cellphone picture to read it.
Powering off this server cleared the problem for the entire
network, and its pools were taken over by its HA sister.

Googling for that warning brings up nothing useful.

Has anyone ever seen a problem like this?   How did you locate it?
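For future readers hunting a layer-2 talker like this: one approach is to capture ARP frames from any host and count them per source MAC. A sketch in Python; the capture lines below are hypothetical `tcpdump -nne arp`-style output fed in as a string:

```python
from collections import Counter

# Hypothetical capture: the source MAC is the first field of each line,
# as in `tcpdump -nne arp` output redirected to a file.
CAPTURE = """\
0:c:29:aa:bb:cc > ff:ff:ff:ff:ff:ff ARP Request who-has 10.28.0.1
0:25:90:11:22:33 > ff:ff:ff:ff:ff:ff ARP Request who-has 10.28.0.5
0:25:90:11:22:33 > ff:ff:ff:ff:ff:ff ARP Request who-has 10.28.0.6
0:25:90:11:22:33 > ff:ff:ff:ff:ff:ff ARP Request who-has 10.28.0.7
"""

def top_talker(capture):
    """Return (source_mac, frame_count) for the noisiest ARP sender."""
    counts = Counter(line.split()[0]
                     for line in capture.splitlines() if line.strip())
    return counts.most_common(1)[0]

mac, frames = top_talker(CAPTURE)
print(mac, frames)  # 0:25:90:11:22:33 3
```

The MAC with an order-of-magnitude more frames than anyone else is the box to power off.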

-Chip


Re: [OmniOS-discuss] new supermicro server

2017-03-08 Thread Schweiss, Chip
On Wed, Mar 8, 2017 at 8:36 AM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

>
>
> Perhaps there is a way to tell the HBA BIOS to not advertize the SAS
> drives which are not needed for booting?


In the HBA BIOS configuration, set the HBA to disabled.  OmniOS will still
see the HBA and disks, but the BIOS will no longer enumerate all the disks
at boot.  You only need it enabled if you boot from a disk attached to the HBA.

-Chip


Re: [OmniOS-discuss] Corrupted file recovery and restoring pool to ONLINE

2017-01-23 Thread Schweiss, Chip
To get back to an online state you need to detach the offline disk:

zpool detach B-034 c10t5C0F0132772Ed0s0

If the corrupted file is in any snapshots, those snapshots will have to be
destroyed to stop it from being flagged as corruption during a scrub.

-Chip

On Mon, Jan 23, 2017 at 9:53 AM,  wrote:

>
>   Howdy!
>
>  I had a corrupted file during resilvering after a drive
> failure/replacement. I replaced the file from a backup and the
> pool started to resilver again. It finished and is still in a state
> that shows DEGRADED with file errors. I can read the file fine and md5sum
> checks out.
>
>   What do I need to do to put this pool into an ONLINE state and
> remove the error?
> Or is this pool still problematic??
>
> thanx - steve
>
>   pool: B-034
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
> corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
> entire pool from backup.
>see: http://illumos.org/msg/ZFS-8000-8A
>   scan: resilvered 3.46T in 28h19m with 3 errors on Sun Jan 22 17:43:53
> 2017
> config:
>
> NAMESTATE READ WRITE CKSUM
> B-034   DEGRADED 0 0 3
>   raidz1-0  DEGRADED 0 0 6
> c0t5000C500571D5D9Fd0s0 ONLINE   0 0 0
> c0t5000C500571D69D3d0s0 ONLINE   0 0 0
> c10t5C0F01F82C82d0s0ONLINE   0 0 0
> c10t5C0F01F84B6Ad0s0ONLINE   0 0 0
> replacing-4 DEGRADED 0 0 0
>   c10t5C0F0132772Ed0s0  OFFLINE  0 0 0
>   c10t539578C8A83Ed0s0  ONLINE   0 0 0
> c10t5C0F0136989Ad0s0ONLINE   0 0 0
> c10t5C0F01327226d0s0ONLINE   0 0 0
> c10t5C0F01327316d0s0ONLINE   0 0 0
>
> errors: Permanent errors have been detected in the following files:
>
> B-034@nov_5_2016:/51/17/1000621751.bkt
>
>
>


Re: [OmniOS-discuss] Network >10Gb/s

2017-01-10 Thread Schweiss, Chip
On Tue, Jan 10, 2017 at 9:58 AM, Dan McDonald <dan...@omniti.com> wrote:

>
> > On Jan 10, 2017, at 8:41 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > It appears that my options for 40Gb/s Ethernet are Intel, Chelsio and
> SolarFlare.
> >
> > Can anyone comment on which of these is the most stable solution when
> running under OmniOS?   What's the fastest NFS throughput you've been able
> to achieve?
>
> The Intel i40e driver is nascent, but it will receive more attention as
> time passes.  Doug's point about SolarFlare is a good one.
>
>
I'm a bit concerned on the Intel because of posts like this:
https://news.ycombinator.com/item?id=11373848  and the fact that they seem
to have shifted their focus to Omni-Path which from my understanding is
incompatible with the existing 40G gear.

SolarFlare seems promising, but I'd like to know of at least one success
story.

-Chip



> You may wish to ping the larger illumos community about this as well.
>


> Dan
>
>


[OmniOS-discuss] Network >10Gb/s

2017-01-10 Thread Schweiss, Chip
It appears that my options for 40Gb/s Ethernet are Intel, Chelsio and
SolarFlare.

Can anyone comment on which of these is the most stable solution when
running under OmniOS?   What's the fastest NFS throughput you've been able
to achieve?

Also is there any work being done by anyone to bring an Omni-Path
compatible NIC to Illumos/OmniOS?

Thanks!
-Chip


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Schweiss, Chip
I don't have a lot of experience with the 850 Pro, but a lot with the 840
Pro under OmniOS

With a 4K block size set in sd.conf, and the drives sliced to use only 80% of
their capacity, a pool of 72 of them has been under near-constant heavy
read/write workload for over 3 years without a single checksum error.

-Chip

On Tue, Jul 26, 2016 at 1:30 PM, Piotr Jasiukajtis  wrote:

> I don’t know a root cause, but it’s better to have a workaround than
> corrupted pools.
>
> --
> Piotr Jasiukajtis
>
> > On 26 Jul 2016, at 20:06, Dan McDonald  wrote:
> >
> > I wonder if those sd.conf changes should be upstreamed or not?
> >
> > Dan
> >
> > Sent from my iPhone (typos, autocorrect, and all)
> >
> >> On Jul 26, 2016, at 1:28 PM, Piotr Jasiukajtis  wrote:
> >>
> >> You may want to force the driver to use 4k instead of 512b for those
> drives and create a new pool:
> >>
> >>
> https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5
> >>
> >> --
> >> Piotr Jasiukajtis
> >>
> >>> On 26 Jul 2016, at 02:24, Shaun McGuane  wrote:
> >>>
> >>> Hi List,
> >>>
> >>> I want to report very strange SSD behaviour on a new pool I setup.
> >>>
> >>> The hardware is a HP DL180 G6 Server with the LSI 9207-8i Card
> >>> And 8x 1TB Samsung SSD Pro drives. Running omnios-10b9c79
> >>>
> >>> All the drives are brand spanking new setup in a raidz2 array.
> >>>
> >>> Within 2 months the below has happened and there has been very
> >>> Little use on this array.
> >>>
> >>> pool: SSD-TANK
> >>> state: DEGRADED
> >>> status: One or more devices are faulted in response to persistent
> errors.
> >>>   Sufficient replicas exist for the pool to continue functioning
> in a
> >>>   degraded state.
> >>> action: Replace the faulted device, or use 'zpool clear' to mark the
> device
> >>>   repaired.
> >>> scan: scrub repaired 23K in 1h12m with 0 errors on Mon Jul 25 20:13:04
> 2016
> >>> config:
> >>>
> >>>   NAME   STATE READ WRITE CKSUM
> >>>   SSD-TANK   DEGRADED 16735
> >>> raidz2-0 DEGRADED 472   113
> >>>   c5t500253884014D0D3d0  ONLINE   0 0 2
> >>>   c5t50025388401F767Ad0  DEGRADED 0 019  too many
> errors
> >>>   c5t50025388401F767Bd0  FAULTED  0 0 0  too many
> errors
> >>>   c5t50025388401F767Dd0  ONLINE   0 0 0
> >>>   c5t50025388401F767Fd0  ONLINE   0 0 1
> >>>   c5t50025388401F7679d0  ONLINE   0 0 2
> >>>   c5t50025388401F7680d0  REMOVED  0 0 0
> >>>   c5t50025388401F7682d0  ONLINE   0 0 1
> >>>
> >>> Can anyone suggest why I would have this problem, where I am seeing
> >>> CKSUM errors on most disks, and why, while only one has faulted, others
> >>> have been degraded or removed.
> >>>
> >>> Thanks
> >>> Shaun
>


Re: [OmniOS-discuss] Supermicro X9DR3-F PCI Bus reported fault.

2016-04-27 Thread Schweiss, Chip
I've run many Supermicro servers in the X9 and X10 series.   It sounds like
you have a bad board, or at the minimum a bad slot.   If it's under
warranty get it exchanged.

-Chip

On Tue, Apr 26, 2016 at 9:47 PM, Shaun McGuane 
wrote:

> Hi OmniOS list,
>
> I am wondering if anyone has had any experience with the Supermicro boards
> for OmniOS in particular Model X9DR3-F
> I have it setup with 2x Intel E5-2670 processors (so I can use all the
> pci-e slots) and 256GB DDR3 ECC Ram as a base.
>
> I have then tried to run this with LSI 9207-8i Cards x3 for the complete
> setup, started off with none to get a base OmniOS
> Install on the server to ensure all is working OK before adding cards and
> drives.
>
> The OmniOS version I am running is r151014 – I have also tried the latest
> current build from 2016 and I get the same result
>
> I am getting the following error when performing : fmadm faulty
>
> Fault class: fault.io.pciex.device-interr
> Affects: dev:pci@78,0/pci8086,3c08@3/pci8086,a21f@0 faulted and taken
> out of service
> FRU: “CPU2_SLOT6”
>
> This slot being reported is the slot closest to the cpu.
>
> The problem that I have is that I have 2 of these boards are showing the
> same error and I have tested these boards running
> Ubuntu 14.04 and Windows and I do not have any errors or issues using this
> slot. I am new to using super micro boards for
> my ZFS arrays and are used to using HP Servers (DL180 G6, etc)
>
> I don’t necessarily need to use this slot, but I am seeing strange issues
> with removing and re-inserting drives where drives
> Show up when running "iostat –En” but not when I run format to label them.
>
> I thought the 2 issues maybe connected.
>
> Kind Regards
> Shaun McGuane
>
>
>
>
>
>


Re: [OmniOS-discuss] Routing challlenges

2016-04-07 Thread Schweiss, Chip
On Thu, Apr 7, 2016 at 12:51 PM, Michael Talbott <mtalb...@lji.org> wrote:

> Oh, I see. Sorry about that, reading it on my phone didn't render your
> diagram properly ;)
>
> The reason this is happening is because the omnios box has knowledge of
> both subnets in its routing table and it always takes the shortest path to
> reach an ip destination.
>

That's definitely the reason, but the resulting behavior is not correct when
stateful firewalls are involved.

>
> So you will need to put the "clients" in a unique subnet that always
> passes through the firewall in both directions (in a subnet that's not
> shared by the omnios machines). Any attempt to add/modify a static route to
> the omnios box to resolve this will likely fail (it'll just move the
> problem from one network to the other one and cause your "services" network
> to route improperly).
>

The problem is that each person who manages these (there are 4) is also a
client of the services (SMB, NFS).

For management, going through the firewall is fine since it is low volume,
but the services need to be on the same VLAN; otherwise the 1Gb firewall
will choke on the 10Gb service traffic.


> Either that, or remove the firewall as a hop, set sshd to listen only on
> the management IP, and add a management vlan interface to the clients
> allowed to connect.
>
>
I've considered this too, but I have some floating IPs attached to ZFS pools
in an HA cluster that SSH needs to bind to.

Unless I can get the network stack on the management vlan to act
independently of the other interfaces, it may come to modifying the
sshd_config and restarting ssh each time a pool is started or stopped on a
host.

-Chip



> Michael
>
>
> On Apr 7, 2016, at 10:25 AM, Michael Talbott <mtalb...@lji.org> wrote:
>
> It sounds like you're using the same subnet for management and service
> traffic, that would be the problem causing the split route. Give each vlan
> a unique subnet and traffic should flow correctly.
>
> Michael
> Sent from my iPhone
>
> On Apr 7, 2016, at 8:52 AM, Schweiss, Chip <c...@innovates.com> wrote:
>
> On several of my OmniOS hosts I have a setup a management interface for
> SSH access on an independent VLAN.   There are service vlans attached to
> other nics.
>
> The problem I am having is that when on privileged machine on one of the
> vlans also on the service side that has access to the management SSH port
> the TCP SYN comes in the management VLAN but the SYNACK goes out the
> service VLAN instead of routing back out its connecting port.   This causes
> a split route and the firewall blocks the connection because the connection
> never appears complete.
>
> Traffic is flowing like this:
> client   firewall omnnios
> 10.28.0.106 ->   10.28.0.254->10.28.125.254  -> 10.28.125.44
>
> 10.28.0.106  <- 10.28.0.44
>
> How can I cause connections to only communicate on the vlan that the
> connection is initiated from?
>
> I don't want to use the 10.28.0.44 interface because that is a virtual IP
> and will not always be on the same host.
>
> -Chip
>
>
>
>
>


[OmniOS-discuss] sshd logging

2016-03-31 Thread Schweiss, Chip
I'm trying to get sshd logging to work on OmniOS with OpenSSH installed.
Nothing I try seems to produce any logging.

In sshd_config I have:

# Syslog facility and level
SyslogFacility AUTH
LogLevel VERBOSE

In /etc/syslog.conf:

*.err;kern.notice;auth.notice   /dev/sysmsg
*.err;kern.debug;daemon.notice;mail.crit/var/adm/messages
*.alert;kern.err;daemon.err operator
*.alert root
*.emerg *

# if a non-loghost machine chooses to have authentication messages
# sent to the loghost machine, un-comment out the following line:
auth.notice ifdef(`LOGHOST', /var/log/authlog, @loghost)
mail.debug  ifdef(`LOGHOST', /var/log/syslog, @loghost)

#
# non-loghost machines will use the following lines to cause "user"
# log messages to be logged locally.
#
ifdef(`LOGHOST', ,
user.err/dev/sysmsg
user.err/var/adm/messages
user.alert  `root, operator'
user.emerg  *
)

I've tried many combinations, in both sshd_config and syslog.conf.

Can someone clue me in on the magic formula?
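Two common traps with this setup, offered as hedged suggestions rather than a confirmed diagnosis: Solaris syslogd requires a literal TAB between selector and action (space-separated lines are silently ignored), and sshd's VERBOSE output is largely emitted at auth.info, which an auth.notice selector never matches. Something like:

```
# /etc/syslog.conf -- selector and action must be separated by a TAB
auth.info	/var/log/authlog
# then: touch /var/log/authlog && svcadm refresh system-log
```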

-Chip


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-19 Thread Schweiss, Chip
On Thu, Feb 18, 2016 at 3:56 PM, Richard Elling <
richard.ell...@richardelling.com> wrote:

>
>
> Related to lock manager is name lookup. If you use name services, you add
> a latency
> dependency to failover for name lookups, which is why we often disable DNS
> or other
> network name services on high-availability services as a best practice.
>  -- richard
>
>
Interesting approach.  Something I will definitely test in our environment.
The biggest challenge I see is that I run Samba on a couple of hosts that
need DNS.   Hopefully I can find a workaround for it.

It would be nice if DNS could be disabled just for NFS.

-Chip


Re: [OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
On Wed, Feb 10, 2016 at 11:19 AM, Dale Ghent <da...@omniti.com> wrote:

>
> > On Feb 10, 2016, at 11:26 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > I'm updating one of my systems to r151016.   When I use:
> >
> > /usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016
> >
> > I get:
> > pkg update: 'entire@11,5.11-0.151016' matches no installed packages
> >
> > I'm ignorant of what the entire@ portion does as I've been script
> kidding my way through upgrades.   Can someone explain what this is
> supposed to be?
>
> Did you change your omnios repo to the one for r151016? Different versions
> of OmniOS reside in their own repos now, so first you must switch the
> omnios publisher, then you can just run 'pkg upgrade'
>

Yes.  That was my first step.

Been through this many times on OmniOS, but this time the entire@ seems to
be causing a problem, and I'm not clear why.

-Chip

>
> pkg set-publisher -G http://pkg.omniti.com/omnios/r151014/ -g
> http://pkg.omniti.com/omnios/r151016/ omnios
> pkg update -v
>
> /dale
>


[OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
I'm updating one of my systems to r151016.   When I use:

/usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016

I get:
pkg update: 'entire@11,5.11-0.151016' matches no installed packages

I'm ignorant of what the entire@ portion does as I've been script kidding
my way through upgrades.   Can someone explain what this is supposed to be?

Thanks!
-Chip


Re: [OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
On Wed, Feb 10, 2016 at 1:11 PM, Dan McDonald <dan...@omniti.com> wrote:

>
> > On Feb 10, 2016, at 11:26 AM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > /usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016
>
> Lose the "5."...
>
> r151016(~)[0]% pkg list -v entire
> FMRI                                               IFO
> pkg://omnios/entire@11-0.151016:20151202T161203Z   i--
> r151016(~)[0]%
>
> Do we need to update a wiki page about that?
>

Possibly.   I may be the odd user who doesn't understand what 'entire' on
the update is doing or how to correct it when my syntax is wrong.




>
> Also, you could just specify "entire" if the publisher's set right.
>
> Dan
>
>


[OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
Is anyone aware of zlib and zlib-devel packages available anywhere for
OmniOS?

These are needed for building any Samba version 4.2.0 or greater.

Thanks!
-Chip


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
I ended up downloading and building zlib separately, and the build then succeeded.

The problem only occurs when selecting --with-ads during configure.  It
fails on checking gnutls, which needs zlib-devel.

My build will not join the domain, but that's outside the scope of this
list.

Thanks!
-Chip



On Wed, Jan 20, 2016 at 1:02 PM, Peter Tribble <peter.trib...@gmail.com>
wrote:

> On Wed, Jan 20, 2016 at 6:40 PM, Schweiss, Chip <c...@innovates.com>
> wrote:
>
>> Is anyone aware of zlib and zlib-devel packages available anywhere for
>> OmniOS?
>>
>
> Installed in OmniOS by default, and cannot be uninstalled.
>
>
>> These are needed for building any Samba version 4.2.0 or greater.
>>
>
> We build samba (4.3.x) on OmniOS without any issues. We haven't done
> anything
> beyond install the basic build tools. What sort of error are you getting?
>
> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
On Wed, Jan 20, 2016 at 2:03 PM, Dan McDonald  wrote:

> Probably not useful now for you on LTS, but Nexenta's SMB2 is available
> for r151016 and later.
>

My biggest challenge is I have to support multiple domains on one server.
That's forcing me to build from source because several paths get compiled
in and break things.

-Chip

>
> Dan
>
>


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
I'll definitely be trying your build config.

Is that joined to an Active Directory domain?   If so, I'm confused about
how that works with the '--without-ad-dc' flag.

Thanks!
-Chip



On Wed, Jan 20, 2016 at 2:29 PM, Michael Talbott <mtalb...@lji.org> wrote:

> I have samba 4.2.3 working with all the bells and whistles (including
> winbind). Also I have netatalk working along side it ;)
>
> Hope this helps:
>
> Here's what I installed as prereqs before compiling them:
>
> pkg install \
>   library/security/openssl \
>   naming/ldap \
>   system/library/iconv/unicode \
>   system/library/dbus \
>   system/library/libdbus \
>   system/library/libdbus-glib \
>   developer/gnu-binutils \
>   developer/build/libtool \
>   developer/build/autoconf \
>   system/library/math/header-math \
>   /system/library/dbus \
>   /system/library/libdbus-glib \
>   /omniti/database/bdb \
>   /text/gnu-gettext \
>   /service/network/dns/mdns \
>   /developer/build/gnu-make \
>   /developer/build/automake \
>   /developer/build/libtool \
>   /developer/macro/gnu-m4 \
>   /developer/build/gnu-make \
>   /developer/gnu-binutils \
>   developer/build/autoconf \
>   developer/build/automake \
>   developer/lexer/flex \
>   developer/parser/bison \
>   developer/object-file \
>   developer/linker \
>   developer/library/lint \
>   developer/build/gnu-make \
>   library/idnkit \
>   library/idnkit/header-idnkit \
>   system/header \
>   system/library/math/header-math \
>   gcc44 \
>   gcc48
>
> pkg install /omniti/perl/dbd-mysql \
> /omniti/database/mysql-55/library
>
> pkg install libgcrypt
>
> And, this is what I use for building samba (I force 32 bit so winbind
> plays nicely with other 32 bit only tools like the "id" command):
>
> export ISALIST=i386
> CFLAGS=-m32 CXXFLAGS=-m32 CPPFLAGS=-m32 LDFLAGS=-m32 \
> ./configure \
>   --prefix=/usr/local \
>   --bindir=/usr/local/bin \
>   --sbindir=/usr/local/sbin \
>   --libdir=/usr/local/lib/ \
>   --mandir=/usr/local/man \
>   --infodir=/usr/local/info \
>   --sysconfdir=/etc/samba \
>   --with-configdir=/etc/samba \
>   --with-privatedir=/etc/samba/private \
>   --localstatedir=/var \
>   --sharedstatedir=/var \
>   --bundled-libraries=ALL \
>   --with-winbind \
>   --with-ads \
>   --with-ldap \
>   --with-pam \
>   --with-iconv \
>   --with-acl-support \
>   --with-syslog \
>   --with-aio-support \
>   --enable-fhs \
>   --without-ad-dc \
>
> --with-shared-modules=idmap_ad,vfs_zfsacl,vfs_audit,vfs_catia,vfs_full_audit,vfs_readahead,vfs_streams_xattr,time_audit,vfs_fruit
> \
>   --enable-gnutls
>
> gmake
> gmake install
>
> Good luck!
>
> Michael
>
>
> On Jan 20, 2016, at 11:48 AM, Schweiss, Chip <c...@innovates.com> wrote:
>
> I ended up downloading and building zlib separately and got it to build.
>
> The problem only occurs when selecting --with-ads during configure.  It
> fails on checking gnutls, which needs zlib-devel.
>
> My build will not join the domain, but that's out of the scope of this
> list..
>
> Thanks!
> -Chip
>
>
>
> On Wed, Jan 20, 2016 at 1:02 PM, Peter Tribble <peter.trib...@gmail.com>
> wrote:
>
>> On Wed, Jan 20, 2016 at 6:40 PM, Schweiss, Chip <c...@innovates.com>
>> wrote:
>>
>>> Is anyone aware of zlib and zlib-devel packages available anywhere for
>>> OmniOS?
>>>
>>
>> Installed in OmniOS by default, and cannot be uninstalled.
>>
>>
>>> These are needed for building any Samba version 4.2.0 or greater.
>>>
>>
>> We build samba (4.3.x) on OmniOS without any issues. We haven't done
>> anything
>> beyond install the basic build tools. What sort of error are you getting?
>>
>> --
>> -Peter Tribble
>> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
>


[OmniOS-discuss] NFS Server restart

2015-12-08 Thread Schweiss, Chip
I had an NFS server become unresponsive on one of my production systems.
The NFS server service would not restart; out of desperation I rebooted,
which fixed the problem.

Before the reboot I tried restarting all NFS-related services, to no avail.
The reboot probably wasn't necessary; what I'm missing is the correct list
and order of services to restart.

Can someone fill me in on which services in what order should be
stopped/started to get NFS fully reset?

Thanks!
-Chip


Re: [OmniOS-discuss] Updates for OmniOS r151014 & r151016

2015-11-13 Thread Schweiss, Chip
On Fri, Nov 13, 2015 at 2:13 PM, Dan McDonald  wrote:

>
> 014:
> --
>
> - OpenSSH 7.1p1, including the r151016 method(s) of changing between
> SunSSH and OpenSSH
>
>
Thank you for this!!

-Chip


>
> Happy updating!
> Dan
>


Re: [OmniOS-discuss] Slow performance with ZeusRAM?

2015-10-22 Thread Schweiss, Chip
The ZIL on log devices suffers a bit from not filling queues well.   In
order to get the queues to fill more, try running your test against several
zfs folders on the pool simultaneously and measure your total I/O.

As I understand it, if you're writing to only one zfs folder, your queue
depth will stay at 1 on the log device and you become latency-bound.
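
Back-of-the-envelope math (Little's law: throughput = concurrency / latency)
shows why queue depth dominates here. The latency figure below is a
hypothetical round number, not a measured ZeusRAM value:

```python
# Little's law applied to a log device: with one outstanding sync
# write at a time, IOPS is bounded by 1/latency; parallel writers
# (deeper queue) multiply that bound. Latency is illustrative.
def max_iops(queue_depth, latency_us):
    """Upper bound on sync-write IOPS for a device with the given
    per-write service latency (microseconds) and outstanding I/Os."""
    return queue_depth * 1_000_000 // latency_us

single = max_iops(1, 100)    # one zfs folder, one write stream
parallel = max_iops(16, 100) # sixteen folders written concurrently

print(single, parallel)  # 10000 160000
```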

-Chip

On Thu, Oct 22, 2015 at 2:02 PM, Matej Zerovnik  wrote:

> Hello,
>
> I'm building a new system and I'm having a bit of a performance problem.
> Well, its either that or I'm not getting the whole ZIL idea:)
>
> My system is following:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS Omnios (all patches applied)
>
> If I benchmark ZeusRAM on its own with random 4k sync writes, I can get
> 48k IOPS out of it, no problem there.
>
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for
> ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well,
> since this is the performance ZeusRAM can deliver?
>
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers
> --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16
> --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>
> thanks, Matej
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>


Re: [OmniOS-discuss] ZIL TXG commits happen very frequently - why?

2015-10-14 Thread Schweiss, Chip
It all has to do with the write throttle and buffers filling.   Here are
two great blog posts on how it works and how it's tuned:

http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/

http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
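
The core of the throttle those posts describe can be sketched as a toy
model. The tunable names below mirror the real ones, but the values are
illustrative, not verified OmniOS defaults:

```python
# Sketch of the OpenZFS write-throttle delay curve: no delay below
# a dirty-data threshold, then a hyperbolic ramp as dirty data
# approaches zfs_dirty_data_max. Values are illustrative.
zfs_dirty_data_max = 4 << 30        # allow 4 GiB of dirty data
zfs_delay_min_dirty_percent = 60    # start delaying at 60% dirty
zfs_delay_scale = 500_000           # controls steepness (nanoseconds)

def write_delay_ns(dirty_bytes):
    """Per-write delay injected once dirty data passes the threshold;
    grows without bound as dirty_bytes nears zfs_dirty_data_max."""
    min_dirty = zfs_dirty_data_max * zfs_delay_min_dirty_percent // 100
    if dirty_bytes <= min_dirty:
        return 0
    return (zfs_delay_scale * (dirty_bytes - min_dirty)
            // (zfs_dirty_data_max - dirty_bytes))

# Zero below the threshold, climbing steeply near the cap.
for pct in (0.5, 0.7, 0.95):
    print(pct, write_delay_ns(int(pct * zfs_dirty_data_max)))
```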

-Chip


On Wed, Oct 14, 2015 at 12:45 AM, Rune Tipsmark  wrote:

> Hi all.
>
>
>
> Wondering if anyone could shed some light on why my ZFS pool would perform
> TXG commits up to 5 times per second. It’s set to the default 5 second
> interval and occasionally it does wait 5 seconds between commits, but only
> when nearly idle.
>
>
>
> I’m not sure if this impacts my performance but I would suspect it doesn’t
> improve it. I force sync on all data.
>
>
>
> I got 11 mirrors (7200rpm sas disks) two SLOG devices and two L2 ARC
> devices and a pair of spare disks.
>
>
>
> Each log device can hold 150GB of data so plenty for 2 TXG commits. The
> system has 384GB memory.
>
>
>
>
> Below is a bit of output from zilstat during a near idle time this morning
> so you wont see 4-5 commits per second, but during load later today it will
> happen..
>
>
>
> root@zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg
>
> waiting for txg commit...
>
> TIME                    txg   N-MB  N-MB/s  N-Max-Rate  B-MB  B-MB/s  B-Max-Rate  ops  <=4kB  4-32kB  >=32kB
>
> 2015 Oct 14 06:21:19   10872771  3  3  0
> 21 21  2234 14 19201
>
> 2015 Oct 14 06:21:22   10872772 10  3  3
>  70 23 24806  0 84725
>
> 2015 Oct 14 06:21:24   10872773 12  6  5
> 56 28 26682 17107558
>
> 2015 Oct 14 06:21:25   10872774 13 13  2
>  75 75 14651  0 10641
>
> 2015 Oct 14 06:21:25   10872775  0  0  0
> 0  0  0  1  0  0  1
>
> 2015 Oct 14 06:21:26   10872776 11 11  6
> 53 53 29645  2136507
>
> 2015 Oct 14 06:21:30   10872777 11  2  4
> 81 20 32873 11 60804
>
> 2015 Oct 14 06:21:30   10872778  0  0  0
> 0  0  0  1  0  1  0
>
> 2015 Oct 14 06:21:31   10872779 12 12 11
> 56 56 52631  0  8623
>
> 2015 Oct 14 06:21:33   10872780 11  5  4
> 74 37 27858  0 44814
>
> 2015 Oct 14 06:21:36   10872781 14  4  6
> 79 26 30977 12 82883
>
> 2015 Oct 14 06:21:39   10872782 11  3  4
> 78 26 25957 18 55884
>
> 2015 Oct 14 06:21:43   10872783 13  3  4
> 80 20 24930  0135795
>
> 2015 Oct 14 06:21:46   10872784 13  4  4
> 81 27 29965 13 95857
>
> 2015 Oct 14 06:21:49   10872785 11  3  6
> 80 26 41   1077 12215850
>
> 2015 Oct 14 06:21:53   10872786  9  3  2
> 67 22 18870  1 74796
>
> 2015 Oct 14 06:21:56   10872787 12  3  5
> 72 18 26909 17163729
>
> 2015 Oct 14 06:21:58   10872788 12  6  3
> 53 26 21530  0 33497
>
> 2015 Oct 14 06:21:59   10872789 26 26 24
> 72 72 62882 12 60810
>
> 2015 Oct 14 06:22:02   10872790  9  3  5
> 57 19 28777  0 70708
>
> 2015 Oct 14 06:22:07   10872791 11  2  3
> 96 24 22   1044 12 46986
>
> 2015 Oct 14 06:22:10   10872792 13  3  4
> 78 19 22911 12 38862
>
> 2015 Oct 14 06:22:14   10872793 11  2  4
> 79 19 26930 10 94826
>
> 2015 Oct 14 06:22:17   10872794 11  3  5
> 73 24 26   1054 17151886
>
> 2015 Oct 14 06:22:17   10872795  0  0  0
> 0  0  0  2  0  0  2
>
> 2015 Oct 14 06:22:18   10872796 40 40 38
> 78 78 60707  0 28680
>
> 2015 Oct 14 06:22:22   10872797 10  3  3
> 66 22 21937 14164759
>
> 2015 Oct 14 06:22:25   10872798  9  2  2
> 66 16 21821 11 92718
>
> 2015 Oct 14 06:22:28   10872799 24 12 14
> 80 40 43750  0 23727
>
> 2015 Oct 14 06:22:28   10872800  0  0  0
> 

Re: [OmniOS-discuss] zfs send/receive corruption?

2015-10-05 Thread Schweiss, Chip
This smells of a problem reported fixed on FreeBSD and ZoL.
http://permalink.gmane.org/gmane.comp.file-systems.openzfs.devel/1545

On the Illumos ZFS list the question was posed whether those fixes have been
incorporated, but it went unanswered:
http://www.listbox.com/member/archive/182191/2015/09/sort/time_rev/page/1/entry/23:71/20150916025648:1487D326-5C40-11E5-A45A-20B0EF10038B/

I'd be curious to confirm whether this has been fixed in Illumos, as I now
have systems with lots of CIFS and ACLs that are potentially vulnerable to
the same sort of problem.  Thus far I cannot find a reference to it, but I
could be looking in the wrong place, or for the wrong keywords.

-Chip

On Mon, Oct 5, 2015 at 12:45 PM, Michael Rasmussen  wrote:

> On Mon, 5 Oct 2015 11:30:04 -0600
> Aaron Curry  wrote:
>
> > # zfs get sync pool/fs
> > NAMEPROPERTY  VALUE SOURCE
> > pool/fs  sync  standard  default
> >
> > Is that what you mean?
> >
> Yes. Default means honor sync requests.
>
> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get=0xE3E80917
> --
> /usr/games/fortune -es says:
> Love isn't only blind, it's also deaf, dumb, and stupid.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>


Re: [OmniOS-discuss] possible bug

2015-09-29 Thread Schweiss, Chip
I've seen issues like this when you run out of NFS locks.   NFSv3 in
Illumos is really slow at releasing locks.

On all my NFS servers I do:

sharectl set -p lockd_listen_backlog=256 nfs
sharectl set -p lockd_servers=2048 nfs

Everywhere I can, I use NFSv4 instead of v3.   It handles locks much better.

-Chip

On Tue, Sep 29, 2015 at 1:22 PM, Hildebrandt, Bill 
wrote:

> Over the past few weeks, I have had 3 separate occurrences where my
> OmniOS/Napp-it NAS stops responding to NFS and CIFS.  The first time was
> during the week of the ZFS corruption bug announcement.  The system and
> its replicated storage were both scrubbed and zdb analyzed, and nothing
> looked wrong.  I rebuilt the NAS from scratch with updated patches and
> imported the pool.  Same thing happened three days later, and now today,
> eight days later.  Each time, a reboot is performed to bring it back.  All
> services appear to be running.  The odd thing is that an “ls –l” hangs on
> every mountpoint.  Has anyone heard of this issue?  Since I am not OmniOS
> savvy, is there anything I can capture while in that state that could help
> debug it?
>
>
>
> Thanks,
>
> Bill
>
> --
>
> This e-mail and any documents accompanying it may contain legally
> privileged and/or confidential information belonging to Exegy, Inc. Such
> information may be protected from disclosure by law. The information is
> intended for use by only the addressee. If you are not the intended
> recipient, you are hereby notified that any disclosure or use of the
> information is strictly prohibited. If you have received this e-mail in
> error, please immediately contact the sender by e-mail or phone regarding
> instructions for return or destruction and do not use or disclose the
> content to others.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>


[OmniOS-discuss] Periodic SSH connect failures

2015-09-10 Thread Schweiss, Chip
On OmniOS r151014 I use ssh with RSA keys to allow my storage systems to
communicate and launch things like 'zfs receive'.

Periodically the connection fails with "ssh_exchange_identification:
Connection closed by remote host".   When this happens, about half of the
connection attempts fail this way for 10-20 minutes, then things return to
normal.

root@mir-dr-zfs01:/root# ssh -v mirpool02
OpenSSH_6.6, OpenSSL 1.0.1p 9 Jul 2015
debug1: Reading configuration data /etc/opt/csw/ssh/ssh_config
debug1: Connecting to mirpool02 [10.28.125.130] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type 2
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6
ssh_exchange_identification: Connection closed by remote host
root@mir-dr-zfs01:/root# echo $?
255

I've not been able to get logs out of the SunSSH server; turning things on
in /etc/syslog.conf doesn't seem to work.   What am I missing in trying to
get more information out of the ssh server?

I use the OpenSSH client from OpenCSW; with the SunSSH client the problem
happens nearly twice as often.

Any suggestions on how to make these connections robust?

Thanks!
-Chip


Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 and L2ARC

2015-09-10 Thread Schweiss, Chip
On Thu, Sep 10, 2015 at 11:43 AM, Dan McDonald <dan...@omniti.com> wrote:

>
> > On Sep 10, 2015, at 12:15 PM, Schweiss, Chip <c...@innovates.com> wrote:
> >
> > Is this limited to r151014 and bloody?
> >
> > I was under the impression this bug went back to the introduction of
> L2ARC compression.
>
> Did you read the analysis of 6214?  It calls out this commit as the cause:
>
> Author: Chris Williamson <chris.william...@delphix.com>
> Date:   Mon Dec 29 19:12:23 2014 -0800
>
> 5408 managing ZFS cache devices requires lots of RAM
> Reviewed by: Christopher Siden <christopher.si...@delphix.com>
> Reviewed by: George Wilson <george.wil...@delphix.com>
> Reviewed by: Matthew Ahrens <mahr...@delphix.com>
> Reviewed by: Don Brady <dev.fs@gmail.com>
> Reviewed by: Josef 'Jeff' Sipek <josef.si...@nexenta.com>
> Approved by: Garrett D'Amore <garr...@damore.org>
>
> That wasn't in '012, just '014 and later.
>

Sorry, I missed that.   I was going off assumptions from other
communications.

-Chip

>
> Dan
>
>


Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 and L2ARC

2015-09-10 Thread Schweiss, Chip
Is this limited to r151014 and bloody?

I was under the impression this bug went back to the introduction of L2ARC
compression.

-Chip


On Thu, Sep 10, 2015 at 6:53 AM, Dan McDonald  wrote:

>
> > On Sep 10, 2015, at 7:53 AM, Dan McDonald  wrote:
> >
> > If you are using a zpool with r151014 and you have an L2ARC ("cache")
> vdev, I recommend at this time disabling it.  You may disable it by
> uttering:
>
> This also affects bloody as well.
>
> Dan
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>


Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-29 Thread Schweiss, Chip
I have an OmniOS box with all the same hardware except the server and hard
disks.  I would wager this has something to do with the WD disks and
something different happening during init.

This is a stab in the dark, but try adding power-condition:false in
/kernel/drv/sd.conf for the WD disks.
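
A hedged sketch of what that sd.conf entry might look like. The
`sd-config-list` vendor field is padded to 8 characters, and the WD product
ID here is taken from earlier in the thread; verify both against your
drives' actual inquiry strings before using it:

```
# /kernel/drv/sd.conf -- illustrative entry, VID/PID must match
# the inquiry data of your disks exactly
sd-config-list = "WD      WD4001FYYG", "power-condition:false";
```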

-Chip



On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott mtalb...@lji.org wrote:

 Here's the specs of that server.

 Fujitsu RX300S8
  -
 http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
 128G ECC DDR3 1600 RAM
 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
 2 x LSI 9200-8e
 2 x 10Gb Intel NICs
 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
  - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm

 The enclosures are not currently set up for multipathing. The front and
 rear backplane each have a single independent SAS connection to one of the
 LSI 9200s.

 The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives
 each (90 total).
 http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353

 Booting the server up in Ubuntu or CentOS does not have that 8 second
 delay. Each drive is found in a fraction of a second (activity LEDs on the
 enclosure flash on and off really quick as the drives are scanned). On
 OmniOS, the drives seem to be scanned in the same order, but, instead of it
 spending a fraction of a second on each drive, it spends 8 seconds on 1
 drive (led of only one drive rapidly flashing during that process) before
 moving on to the next x 90 drives.

 Is there anything I can do to get more verbosity in the boot messages that
 might just reveal the root issue?

 Any suggestions appreciated.

 Thanks

 
 Michael Talbott
 Systems Administrator
 La Jolla Institute

 On Jul 29, 2015, at 7:51 AM, Schweiss, Chip c...@innovates.com wrote:



 On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott mtalb...@lji.org wrote:

 Hi,

 I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
 bios. But I'm still getting the 8 second per drive delay after the kernel
 loads. Any other ideas?


 8 seconds is way too long.   What JBODs and disks are you using?   Could
 it be they are powered off and the delay is in waiting for the power-on
 command to complete?   This could be accelerated by using lsiutil to send
 them all power-on commands first.

 While I still consider it slow, my OmniOS systems with LSI HBAs
 discover about 2 disks per second.   On systems with LOTS of disks, all
 multipathed, it still adds up to a long time to discover them all.

 -Chip



 
 Michael Talbott
 Systems Administrator
 La Jolla Institute

  On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
 Amstafs ::.. i...@houseofancients.nl wrote:
 
  Michael,
 
  I know v20 does cause lots of issue's.
  V19 , to the best of my knowledge doesn't contain any, so I would
 downgrade to v19
 
 
  Kr,
 
 
  Floris
  -Oorspronkelijk bericht-
  Van: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com]
 Namens Michael Talbott
  Verzonden: dinsdag 21 juli 2015 4:57
  Aan: Marion Hakanson hakan...@ohsu.edu
  CC: omnios-discuss omnios-discuss@lists.omniti.com
  Onderwerp: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
 
  Thanks for the reply. The bios for the card is disabled already. The 8
 second per drive scan happens after the kernel has already loaded and it is
 scanning for devices. I wonder if it's due to running newer firmware. I did
 update the cards to fw v.20.something before I moved to omnios. Is there a
 particular firmware version on the cards I should run to match OmniOS's
 drivers?
 
 
  
  Michael Talbott
  Systems Administrator
  La Jolla Institute
 
  On Jul 20, 2015, at 6:06 PM, Marion Hakanson hakan...@ohsu.edu
 wrote:
 
  Michael,
 
  I've not seen this;  I do have one system with 120 drives and it
  definitely does not have this problem.  A couple with 80+ drives are
  also free of this issue, though they are still running OpenIndiana.
 
  One thing I pretty much always do here, is to disable the boot option
  in the LSI HBA's config utility (accessible from the during boot after
  the BIOS has started up).  I do this because I don't want the BIOS
  thinking it can boot from any of the external JBOD disks;  And also
  because I've had some system BIOS crashes when they tried to enumerate
  too many drives.  But, this all happens at the BIOS level, before the
  OS has even started up, so in theory it should not affect what you are
  seeing.
 
  Regards,
 
  Marion
 
 
  
  Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
  From: Michael Talbott mtalb...@lji.org
  Date: Fri, 17 Jul 2015 16:15:47 -0700
  To: omnios-discuss omnios-discuss@lists.omniti.com
 
  Just realized my typo. I'm using this on my 90 and 180 drive systems:
 
  # svccfg -s boot-archive

Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-29 Thread Schweiss, Chip
On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott mtalb...@lji.org wrote:

 Hi,

 I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
 bios. But I'm still getting the 8 second per drive delay after the kernel
 loads. Any other ideas?


8 seconds is way too long.   What JBODs and disks are you using?   Could it
be they are powered off and the delay is in waiting for the power-on command
to complete?   This could be accelerated by using lsiutil to send them all
power-on commands first.

While I still consider it slow, my OmniOS systems with LSI HBAs
discover about 2 disks per second.   On systems with LOTS of disks, all
multipathed, it still adds up to a long time to discover them all.

-Chip



 
 Michael Talbott
 Systems Administrator
 La Jolla Institute

  On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
 Amstafs ::.. i...@houseofancients.nl wrote:
 
  Michael,
 
  I know v20 does cause lots of issue's.
  V19 , to the best of my knowledge doesn't contain any, so I would
 downgrade to v19
 
 
  Kr,
 
 
  Floris
  -Oorspronkelijk bericht-
  Van: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com]
 Namens Michael Talbott
  Verzonden: dinsdag 21 juli 2015 4:57
  Aan: Marion Hakanson hakan...@ohsu.edu
  CC: omnios-discuss omnios-discuss@lists.omniti.com
  Onderwerp: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
 
  Thanks for the reply. The bios for the card is disabled already. The 8
 second per drive scan happens after the kernel has already loaded and it is
 scanning for devices. I wonder if it's due to running newer firmware. I did
 update the cards to fw v.20.something before I moved to omnios. Is there a
 particular firmware version on the cards I should run to match OmniOS's
 drivers?
 
 
  
  Michael Talbott
  Systems Administrator
  La Jolla Institute
 
  On Jul 20, 2015, at 6:06 PM, Marion Hakanson hakan...@ohsu.edu wrote:
 
  Michael,
 
  I've not seen this;  I do have one system with 120 drives and it
  definitely does not have this problem.  A couple with 80+ drives are
  also free of this issue, though they are still running OpenIndiana.
 
  One thing I pretty much always do here, is to disable the boot option
  in the LSI HBA's config utility (accessible from the during boot after
  the BIOS has started up).  I do this because I don't want the BIOS
  thinking it can boot from any of the external JBOD disks;  And also
  because I've had some system BIOS crashes when they tried to enumerate
  too many drives.  But, this all happens at the BIOS level, before the
  OS has even started up, so in theory it should not affect what you are
  seeing.
 
  Regards,
 
  Marion
 
 
  
  Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
  From: Michael Talbott mtalb...@lji.org
  Date: Fri, 17 Jul 2015 16:15:47 -0700
  To: omnios-discuss omnios-discuss@lists.omniti.com
 
  Just realized my typo. I'm using this on my 90 and 180 drive systems:
 
  # svccfg -s boot-archive setprop start/timeout_seconds=720 # svccfg -s
  boot-archive setprop start/timeout_seconds=1440
 
  Seems like 8 seconds to detect each drive is pretty excessive.
 
  Any ideas on how to speed that up?
 
 
  
  Michael Talbott
  Systems Administrator
  La Jolla Institute
 
  On Jul 17, 2015, at 4:07 PM, Michael Talbott mtalb...@lji.org wrote:
 
  I have multiple NAS servers I've moved to OmniOS and each of them have
 90-180 4T disks. Everything has worked out pretty well for the most part.
 But I've come into an issue where when I reboot any of them, I'm getting
 boot-archive service timeouts happening. I found a workaround of increasing
 the timeout value which brings me to the following. As you can see below in
 a dmesg output, it's taking the kernel about 8 seconds to detect each of
 the drives. They're connected via a couple SAS2008 based LSI cards.
 
  Is this normal?
  Is there a way to speed that up?
 
  I've fixed my frustrating boot-archive timeout problem by adjusting
 the timeout value from the default of 60 seconds (I guess that'll work ok
 on systems with less than 8 drives?) to 8 seconds * 90 drives + a little
 extra time = 280 seconds (for the 90 drive systems). Which means it takes
 between 12-24 minutes to boot those machines up.
 
  # svccfg -s boot-archive setprop start/timeout_seconds=280
 
  I figure I can't be the only one. A little googling also revealed:
  https://www.illumos.org/issues/4614
  https://www.illumos.org/issues/4614
 
  Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at
  mpt_sas3: unit-address w5c0f0401bd43,0: w5c0f0401bd43,0 Jul
  17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is
  /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f0401bd4
  3,0 Jul 17 15:40:16 store2 genunix: [ID 408114 kern.info]
  

Re: [OmniOS-discuss] Zil Device

2015-07-16 Thread Schweiss, Chip
The 850 Pro should never be used as a log device.  It does not have
power-fail protection for its RAM cache.   You might as well set
sync=disabled and skip using a log device entirely, because the 850 Pro is
not protecting your last transactions in case of power failure.

Only SSDs with power-failure protection should be considered for log
devices.

That being said, unless you're running applications that need transaction
consistency, such as databases, don't bother with a log device and set
sync=disabled.
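
For context, the guarantee at stake is the sync-write contract: the call
must not return until data is on stable storage. That is what a slog
accelerates under sync=standard, and what you give up with sync=disabled.
A minimal sketch of the application-side contract (paths are temporary and
illustrative):

```python
# A durable write: return only after the data is on stable storage.
# Under ZFS sync=standard this is what lands in the ZIL (slog if
# present); under sync=disabled the fsync is effectively ignored and
# the last few seconds of writes can vanish on power loss.
import os
import tempfile

def durable_write(path, data):
    """Write data and force it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)  # blocks until the device acknowledges the write
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "journal")
durable_write(path, b"committed")
print(open(path, "rb").read())  # b'committed'
```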

-Chip

On Thu, Jul 16, 2015 at 11:55 AM, Doug Hughes d...@will.to wrote:

 An 8GB zil on a very active server with a 100+GB ssd lasts many years. We
 have yet, after years of use of various SSDs, to have one fail from wear,
 and that's with fairly active NFS use.
 They usually fail for other reasons.
 We started with with Intel X series, which are only 32GB in size, and some
 of them are still active, though less active use now. With Samsung 850 pro,
 you practically don't have to worry about it, and the price is really good.


 On Thu, Jul 16, 2015 at 12:36 PM, Brogyányi József bro...@gmail.com
 wrote:

  Hi Doug

 Can you tell us its lifetime? I don't trust any SSD, but I've been
 thinking for a while about using one as a ZIL+L2ARC.
 Could you share your experiences with us? I would be interested in server
 usage. Thanks.



 2015.07.15. 22:42 keltezéssel, Doug Hughes írta:

 We have been preferring commodity SSD like Intel 320 (older), intel 710,
 or currently, Samsung 850 pro. We also use it as boot drive and reserve an
 8GB slide for ZIL so that massive synchronous NFS IOPS are manageable.

 Sent from my android device.

 -Original Message-
 From: Matthew Lagoe matthew.la...@subrigo.net
 matthew.la...@subrigo.net
 To: omnios-discuss@lists.omniti.com
 Sent: Wed, 15 Jul 2015 16:29
 Subject: [OmniOS-discuss] Zil Device

  Is the zeusram SSD still the big zil device out there or are there other
 high performance reliable options that anyone knows of on the market now? I
 can't go with something like the DDRdrive as it's PCIe.

 Thanks


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss





 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss



 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss




Re: [OmniOS-discuss] re-tune round-robin reading from a mirror

2015-07-15 Thread Schweiss, Chip
This is a very interesting idea.  It could allow for the creation of a
scratch pool with great ROI.   I have a particular need for an extremely
high read-rate pool for data analysis that will leverage the fattest
read-optimized SSDs I can get for the dollar.  I was considering raidz1 for
this, but if mirroring with disks works this way, it could be even better
bang for the buck.

Did you ever do any actual testing with this type of setup?  I'd love to
see some real world performance data.

-Chip
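
The queue-aware selection described in the quoted reply below can be
sketched as a toy simulation. This is a model, not real performance data;
device latencies are hypothetical:

```python
# Toy model of latency-aware read dispatch across a mixed mirror:
# each read goes to whichever side is free first, so the faster
# device naturally serves most, but not all, of the reads.
import heapq

def simulate_reads(n_reads, latencies):
    """Dispatch n_reads to the device that becomes idle soonest;
    return the per-device read counts."""
    # heap of (time the device becomes idle, device index)
    idle = [(0.0, i) for i in range(len(latencies))]
    heapq.heapify(idle)
    counts = [0] * len(latencies)
    for _ in range(n_reads):
        t, dev = heapq.heappop(idle)
        counts[dev] += 1
        heapq.heappush(idle, (t + latencies[dev], dev))
    return counts

# SSD ~0.1 ms per read vs spinning disk ~8 ms: the SSD ends up
# serving the overwhelming majority of reads.
ssd_reads, hdd_reads = simulate_reads(1000, [0.0001, 0.008])
print(ssd_reads, hdd_reads)
```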

On Wed, Jul 15, 2015 at 10:09 AM, Jim Klimov jimkli...@cos.ru wrote:

 15 июля 2015 г. 14:10:15 CEST, Michael Mounteney gat...@landcroft.co.uk
 пишет:
 Hello list;  is it possible with OmniOS to have a multi-way mirror with
 one disk being an SSD and the rest magnetic;  then to tune ZFS to
 perform all reads from the SSD?  for the sake of performance.  The
 default case is round-robin reading, which is the best if all disks are
 of approximately equal performance, especially if they're on separate
 controllers.  But SSD changes that.
 
 __
 Michael Mounteney
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

 When I last asked a few years ago (but IIRC for a mirror of local+iSCSI
 vdevs), the answer was along the lines that round-robin first considers the
 available devices. If the faster (SSD, local) device has no queue, it gets
 the load while the slower device still struggles with its current task, so
 on average the faster device serves more I/Os - but not 100%. Queue-depth
 tuning can also help here.

 Jim

 --
 Typos courtesy of K-9 Mail on my Samsung Android
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss
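Jim's point above — that shortest-queue selection naturally skews reads toward the faster mirror side without sending 100% of them there — can be sketched as follows. This is an illustrative toy simulation, not illumos code; the tick-based service model and the 2:10 cost ratio are made-up simplifications.

```python
# Toy model of mirror read selection: each read goes to the side with the
# fewest queued "ticks" of pending work.  The SSD side costs 2 ticks per
# read, the HDD side 10; one tick of queue drains every loop iteration.
def pick_side(queues):
    """Index of the mirror side with the least pending work."""
    return min(range(len(queues)), key=lambda i: queues[i])

def simulate(n_reads, service_ticks):
    queues = [0] * len(service_ticks)
    served = [0] * len(service_ticks)
    for _ in range(n_reads):
        side = pick_side(queues)
        queues[side] += service_ticks[side]   # enqueue this read's work
        served[side] += 1
        queues = [max(0, q - 1) for q in queues]  # one tick passes
    return served

served = simulate(10_000, [2, 10])  # [ssd, hdd]
# The SSD serves the majority of reads, but the HDD still picks some up
# whenever the SSD's queue is momentarily deeper.
```

Queue-depth tuning shifts where that crossover happens, which matches Jim's observation that the fast device serves most, but not all, of the I/Os.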

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] big zfs storage?

2015-07-13 Thread Schweiss, Chip
Liam,

This report is encouraging.  Please share some details of your
configuration.   What disk failure parameters have you set?   Which
JBODs and disks are you running?

I have mostly DataON JBODs and some Supermicro.   DataON uses PMC SAS
expanders and Supermicro uses LSI; both setups have pretty much the same
behavior with disk failures.   All my servers are Supermicro with LSI HBAs.

If there's a magic combination of hardware and OS config out there that
solves the disk failure panic problem, I will certainly change my builds
going forward.

-Chip
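(For readers wondering what "disk failure parameters" look like in practice: on illumos they are usually per-drive-model retry tunables set through sd-config-list in /kernel/drv/sd.conf. A hedged sketch — the drive ID string and the values below are placeholders, not a recommendation; check the sd driver documentation for your release:)

```
# /kernel/drv/sd.conf -- illustrative fragment only.
# Fewer retries makes a dying disk fail fast instead of stalling the pool.
sd-config-list =
    "SEAGATE ST4000NM0023", "retries-timeout:2,retries-busy:2,retries-reset:1,retries-victim:2";
```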

On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser lslus...@gmail.com wrote:

 I have two 800T ZFS systems on OmniOS and a bunch of smaller 50T
 systems.  Things generally work very well.  We lose a disk here and there,
 but it's never resulted in downtime.  They're all on Dell hardware with LSI
 or Dell PERC controllers.

 Putting in smaller disk failure parameters, so disks fail quicker, was a
 big help when something does go wrong with a disk.

 thanks,
 liam


 On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip c...@innovates.com
 wrote:

 Unfortunately for the past couple years panics on disk failure has been
 the norm.   All my production systems are HA with RSF-1, so at least things
 come back online relatively quick.  There are quite a few open tickets in
 the Illumos bug tracker related to mpt_sas related panics.

 Most of the work to fix these problems has been committed in the past
 year, though problems still exist.  For example, my systems are dual path
 SAS, however, mpt_sas will panic if you pull a cable instead of dropping a
 path to the disks.  Dan McDonald is actively working to resolve this.   He
 is also pushing a bug fix in genunix from Nexenta that appears to fix a lot
 of the panic problems.   I'll know for sure in a few months after I see a
 disk or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
 responsible for most of the updates to mpt_sas including support for 3008
 (12G SAS).

 I haven't run any 12G SAS yet, but plan to on my next build in a couple
 months.   This will be about 300TB using an 84 disk JBOD.  All the code
 from Nexenta to support the 3008 appears to be in Illumos now, and they
 fully support it so I suspect it's pretty stable now.  From what I
 understand there may be some 12G performance fixes coming sometime.

 The fault manager is nice when the system doesn't panic.  When it panics,
 the fault manager never gets a chance to take action.  It is still the
 consensus that it is better to run pools without hot spares because there
 are situations where the fault manager will do bad things.   I witnessed this
 myself when building a system and the fault manager replaced 5 disks in a
 raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
 yielded to the best practice.  I now run one hot spare per pool.  I figure
 with raidz2, the odds of the fault manager causing something catastrophic
 are much lower.

 -Chip



 On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley lkate...@kateley.com
 wrote:

  I have to build and maintain my own system. I usually help others
 build (I teach ZFS and FreeNAS classes/consulting). I really love fault
 management in Solaris and miss it. Just thought since it's my system and I
 get to choose, I would use omni. I have 20+ years using Solaris and only 2
 on FreeBSD.

 I like FreeBSD for how well tuned it is for ZFS out of the box. I miss the
 network, v12n and resource controls in Solaris.

 Concerned about panics on disk failure. Is that common?


 linda


 On 7/9/15 9:30 PM, Schweiss, Chip wrote:

   Linda,

  I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs
 which is considered the best choice for HBAs.

 Illumos leaves a bit to be desired with handling faults from disks or
 SAS problems, but things under OmniOS have been improving, much thanks to
 Dan McDonald and OmniTI.   We have a paid support on all of our production
 systems with OmniTI.  Their response and dedication has been very good.
 Other than the occasional panic and restart from a disk failure, OmniOS has
 been solid.   ZFS of course never has lost a single bit of information.

  I'd be curious why you're looking to move, have there been specific
 problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
 course the skeletons in the closet never seem to come out until you do
 something big.

  -Chip

 On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley lkate...@kateley.com
 wrote:

 Hey is there anyone out there running big zfs on omni?

  I have been doing mostly ZoL and FreeBSD for the last year but have to
  build a 300+TB box and I want to come back home to my roots (Solaris).
  Feeling kind of hesitant :) Also, if you had to do it over, is there
  anything you would do differently.

  Also, what is the go-to HBA these days? Seems like I saw stable code
  for LSI 3008?

 TIA

 linda


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com

Re: [OmniOS-discuss] big zfs storage?

2015-07-10 Thread Schweiss, Chip
Unfortunately for the past couple years panics on disk failure has been the
norm.   All my production systems are HA with RSF-1, so at least things
come back online relatively quick.  There are quite a few open tickets in
the Illumos bug tracker related to mpt_sas related panics.

Most of the work to fix these problems has been committed in the past year,
though problems still exist.  For example, my systems are dual path SAS,
however, mpt_sas will panic if you pull a cable instead of dropping a path
to the disks.  Dan McDonald is actively working to resolve this.   He is
also pushing a bug fix in genunix from Nexenta that appears to fix a lot of
the panic problems.   I'll know for sure in a few months after I see a disk
or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
responsible for most of the updates to mpt_sas including support for 3008
(12G SAS).

I haven't run any 12G SAS yet, but plan to on my next build in a couple
months.   This will be about 300TB using an 84 disk JBOD.  All the code
from Nexenta to support the 3008 appears to be in Illumos now, and they
fully support it so I suspect it's pretty stable now.  From what I
understand there may be some 12G performance fixes coming sometime.

The fault manager is nice when the system doesn't panic.  When it panics,
the fault manager never gets a chance to take action.  It is still the
consensus that it is better to run pools without hot spares because there
are situations where the fault manager will do bad things.   I witnessed this
myself when building a system and the fault manager replaced 5 disks in a
raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
yielded to the best practice.  I now run one hot spare per pool.  I figure
with raidz2, the odds of the fault manager causing something catastrophic
are much lower.

-Chip



On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley lkate...@kateley.com
wrote:

   I have to build and maintain my own system. I usually help others build
  (I teach ZFS and FreeNAS classes/consulting). I really love fault management
  in Solaris and miss it. Just thought since it's my system and I get to
  choose, I would use omni. I have 20+ years using Solaris and only 2 on
  FreeBSD.

  I like FreeBSD for how well tuned it is for ZFS out of the box. I miss the
  network, v12n and resource controls in Solaris.

 Concerned about panics on disk failure. Is that common?


linda


 On 7/9/15 9:30 PM, Schweiss, Chip wrote:

   Linda,

  I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs
 which is considered the best choice for HBAs.

 Illumos leaves a bit to be desired with handling faults from disks or SAS
 problems, but things under OmniOS have been improving, much thanks to Dan
  McDonald and OmniTI.   We have paid support on all of our production
  systems with OmniTI.  Their response and dedication have been very good.
 Other than the occasional panic and restart from a disk failure, OmniOS has
 been solid.   ZFS of course never has lost a single bit of information.

  I'd be curious why you're looking to move, have there been specific
 problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
 course the skeletons in the closet never seem to come out until you do
 something big.

  -Chip

 On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley lkate...@kateley.com
 wrote:

 Hey is there anyone out there running big zfs on omni?

  I have been doing mostly ZoL and FreeBSD for the last year but have to
  build a 300+TB box and I want to come back home to my roots (Solaris).
  Feeling kind of hesitant :) Also, if you had to do it over, is there
  anything you would do differently.

  Also, what is the go-to HBA these days? Seems like I saw stable code for
  LSI 3008?

 TIA

 linda


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss



 --
 Linda Kateley
 Kateley Company
  Skype ID: kateleyco
  http://kateleyco.com


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] big zfs storage?

2015-07-09 Thread Schweiss, Chip
Linda,

I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs
which is considered the best choice for HBAs.

Illumos leaves a bit to be desired with handling faults from disks or SAS
problems, but things under OmniOS have been improving, much thanks to Dan
McDonald and OmniTI.   We have paid support on all of our production
systems with OmniTI.  Their response and dedication have been very good.
Other than the occasional panic and restart from a disk failure, OmniOS has
been solid.   ZFS of course never has lost a single bit of information.

I'd be curious why you're looking to move, have there been specific
problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
course the skeletons in the closet never seem to come out until you do
something big.

-Chip

On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley lkate...@kateley.com wrote:

 Hey is there anyone out there running big zfs on omni?

 I have been doing mostly ZoL and FreeBSD for the last year but have to
 build a 300+TB box and I want to come back home to my roots (Solaris).
 Feeling kind of hesitant :) Also, if you had to do it over, is there
 anything you would do differently.

 Also, what is the go-to HBA these days? Seems like I saw stable code for
 LSI 3008?

 TIA

 linda


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Highly Available ZFS

2015-06-29 Thread Schweiss, Chip
On Mon, Jun 29, 2015 at 1:52 PM, Michael Rasmussen m...@miras.org wrote:

 Does anybody have an idea of how Nexenta does their HA-setup?

 My guess is that it must involve something with a constant snapshot of
 the pool using zfs send combined with forced import.


Nexenta uses RSF-1 from HighAvailability.com.  It is dual servers connected
to SAS devices.  Pools are exported either gracefully or by force on one
host and imported on the other.   A floating IP address allows the clients
to maintain connectivity.

I use RSF-1 with OmniOS.   It works well, but HA in general has a steep
learning curve and A LOT of gotchas that are not well documented anywhere.

It took me about a year of learning until HA started actually increasing my
storage availability.  The price of RSF-1 is well justified if you don't
have a lot of experience with HA on ZFS.   Eventually I will attempt HA
without it, but in the meantime it is serving me well.

-Chip
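For the curious, the export/import dance described above can be sketched as an ordered command plan. This is a bare skeleton under stated assumptions — the pool name "tank", the VIP address, and the NIC "ixgbe0" are hypothetical, and real RSF-1 layers heartbeats, quorum, and device reservations on top of this:

```python
# Sketch of a ZFS HA failover sequence: drop the floating IP and release
# the pool on the old host, then import the pool and raise the floating IP
# on the new host.  All names/addresses are placeholders.
def failover_plan(pool, vip, nic, force=False):
    export = ["zpool", "export"] + (["-f"] if force else []) + [pool]
    return {
        "old_host": [
            ["ipadm", "delete-addr", f"{nic}/{pool}-vip"],  # drop floating IP
            export,                                         # release the pool
        ],
        "new_host": [
            ["zpool", "import", pool],                      # take over the pool
            ["ipadm", "create-addr", "-T", "static",
             "-a", vip, f"{nic}/{pool}-vip"],               # raise floating IP
        ],
    }

plan = failover_plan("tank", "192.0.2.50/24", "ixgbe0", force=True)
```

The ordering is the important part: clients must lose the VIP before the pool is force-exported, or they can observe writes vanishing mid-flight.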



 --
 Hilsen/Regards
 Michael Rasmussen

 Get my public GnuPG keys:
 michael at rasmussen dot cc
 http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xD3C9A00E
 mir at datanom dot net
 http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE501F51C
 mir at miras dot org
 http://pgp.mit.edu:11371/pks/lookup?op=getsearch=0xE3E80917
 --
 /usr/games/fortune -es says:
 Ego sum ens omnipotens.

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-10 Thread Schweiss, Chip
On Wed, Jun 10, 2015 at 4:09 AM, Robert A. Brock 
robert.br...@2hoffshore.com wrote:

  What did you use to flash them? Fwflash just gives an error about
 firmware file being too large.


Santools.

-Chip



 *From:* Schweiss, Chip [mailto:c...@innovates.com]
 *Sent:* 09 June 2015 21:25
 *To:* Robert A. Brock
 *Cc:* omnios-discuss
 *Subject:* Re: [OmniOS-discuss] Zpool export while resilvering?



 I went through this problem a while back.  There are some gotchas in
 getting them back online and the firmware upgraded.   The OS will not talk to
 the drive until its firmware is upgraded or it is cleared from the fault
 database.

 These drives will not flash with multipath enabled either.

 I ended up clearing the fault manager's database, disabling it and
 disconnecting half the SAS cables to get them flashed.

 -Chip



 snip
  2H Offshore Engineering Ltd | Registered in England No. 02790139 |
 Registered office: Ferryside, Ferry Road, Norwich NR1 1SW.


  2H Offshore is an Acteon company specializing in the design, monitoring
 and integrity management of offshore riser and conductor systems. Acteon is
 a group of specialist international engineering companies serving the
 offshore oil and gas industry. Its focus is on subsea services spanning the
 entire life of field. For more information, visit www.acteon.com


  The information in and/or accompanying this email is intended for the
 use of the stated recipient only and may be confidential and/or privileged.
 It should not be forwarded or copied nor should its contents be disclosed
 in any manner without the express consent of the sender/author. Any views
 or opinions presented are solely those of the author and do not necessarily
 represent those of 2H.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-09 Thread Schweiss, Chip
I went through this problem a while back.  There are some gotchas in
getting them back online and the firmware upgraded.   The OS will not talk to
the drive until its firmware is upgraded or it is cleared from the fault
database.

These drives will not flash with multipath enabled either.

I ended up clearing the fault manager's database, disabling it and
disconnecting half the SAS cables to get them flashed.

-Chip
On Jun 9, 2015 2:32 PM, Robert A. Brock robert.br...@2hoffshore.com
wrote:

  They are failed as far as OmniOS is concerned, from what I can tell:



  Jun 08 01:08:54 710768e8-2f2b-4b3d-9d4b-a85ef5617219  DISK-8000-12   Major

  Host: 2hus291
  Platform: S5500BC   Chassis_id  :
  Product_sn  :

  Fault class : fault.io.disk.over-temperature
  Affects : dev:///:devid=id1,sd@n5000c5007242271f
            //scsi_vhci/disk@g5000c5007242271f
            faulted and taken out of service
  FRU : Slot 21
  (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=500304800033213f:serial=S1Z02A8MK4361NF4:part=SEAGATE-ST4000NM0023:revision=0003/ses-enclosure=1/bay=20/disk=0)
            faulty

  Description : A disk's temperature exceeded the limits established by
            its manufacturer.
            Refer to http://illumos.org/msg/DISK-8000-12 for more
            information.

  root@2hus291:/root# zpool status pool0
    pool: pool0
   state: DEGRADED
  status: One or more devices is currently being resilvered.  The pool will
          continue to function, possibly in a degraded state.
  action: Wait for the resilver to complete.
    scan: resilver in progress since Tue Jun  9 11:11:16 2015
          18.8T scanned out of 91.7T at 667M/s, 31h48m to go
          591G resilvered, 20.55% done
  config:

  NAME                         STATE     READ WRITE CKSUM
  pool0                        DEGRADED     0     0     0
    raidz2-0                   DEGRADED     0     0     0
      c0t5000C50055ECA49Bd0    ONLINE       0     0     0
      c0t5000C50055ECA4B3d0    ONLINE       0     0     0
      c0t5000C50055ECA587d0    ONLINE       0     0     0
      c0t5000C50055ECA6CFd0    ONLINE       0     0     0
      c0t5000C50055ECA7F3d0    ONLINE       0     0     0
      spare-5                  REMOVED      0     0     0
        c0t5000C5007242271Fd0  REMOVED      0     0     0
        c0t5000C50055EF8A6Fd0  ONLINE       0     0     0  (resilvering)
      c0t5000C50055ECAB23d0    ONLINE       0     0     0
      c0t5000C50055ECABABd0    ONLINE       0     0     0
    raidz2-1                   ONLINE       0     0     0
      c0t5000C50055EE9D87d0    ONLINE       0     0     0
      c0t5000C50055EE9E43d0    ONLINE       0     0     0
      c0t5000C50055EEA5ABd0    ONLINE       0     0     0
      c0t5000C50055EEBA5Fd0    ONLINE       0     0     0
      c0t5000C50055EEC1E3d0    ONLINE       0     0     0
      c0t5000C500636670BFd0    ONLINE       0     0     0
      c0t5000C50055EF8CBBd0    ONLINE       0     0     0
      c0t5000C50055EF8D33d0    ONLINE       0     0     0
    raidz2-2                   ONLINE       0     0     0
      c0t5000C50055F7942Fd0    ONLINE       0     0     0
      c0t5000C50055F79E03d0    ONLINE       0     0     0
      c0t5000C50055F7A8DFd0    ONLINE       0     0     0
      c0t5000C50055F81C1Bd0    ONLINE       0     0     0
      c0t5000C5005604A42Bd0    ONLINE       0     0     0
      c0t5000C5005604A487d0    ONLINE       0     0     0
      c0t5000C5005604A74Bd0    ONLINE       0     0     0
      c0t5000C5005604A91Bd0    ONLINE       0     0     0
    raidz2-4                   DEGRADED     0     0     0
      c0t5000C500562ED6A3d0    ONLINE       0     0     0
      c0t5000C500562F8DEFd0    ONLINE       0     0     0
      c0t5000C500562F92D7d0    ONLINE       0     0     0
      c0t5000C500562FA0DFd0    ONLINE       0     0     0
      c0t5000C500636679EBd0    ONLINE       0     0     0
      spare-5                  DEGRADED     0     0    14
        c0t5000C50057FBB127d0  REMOVED      0     0     0
        c0t5000C5006366906Bd0  ONLINE       0     0     0
      c0t5000C5006366808Fd0    ONLINE       0     0     0
      spare-7                  REMOVED      0     0     0
        c0t5000C50057FC84F3d0  REMOVED      0     0     0
        c0t5000C50063669937d0  ONLINE       0     0     0
  logs
    mirror-3                   ONLINE       0     0     0
      c13t5003048000308398d0   ONLINE       0     0     0
      c13t5003048000308399d0   ONLINE       0     0     0
  cache
    c13t5E83A9705BC3d0         ONLINE       0     0     0


Re: [OmniOS-discuss] Backing up HUGE zfs volumes

2015-05-21 Thread Schweiss, Chip
I would caution against anything using 'zfs diff'.  It has been perpetually
broken, either not working at all or returning incomplete information.

Avoiding crawling the directory is pretty much impossible unless you use
'zfs send'.   However, as long as there is enough cache on the system,
directory crawls can be very efficient.I have daily rsync jobs that
crawl over 200 million files.   The impact of the crawl is not noticeable
to other users.

I have also used ZFS send to AWS Glacier.   This worked well until the data
change rate got high enough that I needed to start over too often to keep the
storage size reasonable on Glacier.

I also use CrashPlan on my home OmniOS server to back up about 5TB.  It
works very nicely.

-Chip
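For reference, the first stage of the proposed pipeline — turning `zfs diff snap1 snap2` output into an archive list plus a plain-text index — could be sketched like this. The line format assumed here (a change code M/+/-/R, a tab, then the path, with renames written as "old -> new") matches what zfs diff documents, but given the reliability caveat above, verify against your own output before depending on it:

```python
# Parse zfs diff output into (files to archive, plain-text index lines).
# Change codes: M = modified, + = added, - = removed, R = renamed.
def parse_zfs_diff(lines):
    to_archive, index = [], []
    for line in lines:
        change, _, rest = line.rstrip("\n").partition("\t")
        if change == "R":
            old, _, new = rest.partition(" -> ")
            index.append(f"renamed\t{old}\t{new}")
            to_archive.append(new)          # archive the file's new name
        elif change in ("M", "+"):
            index.append(("modified" if change == "M" else "added") + "\t" + rest)
            to_archive.append(rest)
        elif change == "-":
            index.append("removed\t" + rest)  # nothing to archive
    return to_archive, index

sample = ["M\t/tank/data/report.txt",
          "+\t/tank/data/new.bin",
          "-\t/tank/data/old.bin",
          "R\t/tank/data/a.txt -> /tank/data/b.txt"]
to_archive, index = parse_zfs_diff(sample)
```

From `to_archive` one could build the tar and par files, and `index` is the accompanying plain-text index Michael describes.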

On Wed, May 20, 2015 at 6:51 PM, Michael Talbott mtalb...@lji.org wrote:

 I'm trying to find ways of efficiently archiving some huge (120TB and
 growing) ZFS volumes with millions, maybe billions, of files of all sizes. I
 use zfs send/recv for replication to another box for tier 1/2 recovery.
 But, I'm trying to find a good open source solution that runs on Omni for
 archival purposes that doesn't have to crawl the filesystem or rely on any
 proprietary formats.

 I was thinking I could use zfs diff to get a list of changed data, parse
 that into a usable format, create a tar and par of the data, and an
 accompanying plain text index file. From there, upload that set of data to
 a cloud provider. While I could probably script it all out myself to
 accomplish this, I'm hoping someone knows of an existing solution that can
 produce somewhat similar results.

 Ideas anyone?

 Thanks,

 Michael
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] High density 2.5 chassis

2015-05-09 Thread Schweiss, Chip
I have an SSD server in one of those chassis.  Here's a write-up about it
on my blog; there are 3 posts about it.

http://www.bigdatajunkie.com/index.php/9-solaris/zfs/10-short-stroking-consumer-ssds

Not necessarily a build for everyone, but it has been absolutely awesome
for our use. After a few bumps at the beginning and giving up on HA on this
server, it has been rock solid.  Many will swear against the interposers,
but combined with Samsung SSDs they have worked very well.

-Chip


On Sat, May 9, 2015 at 1:06 PM, Chris Nagele nag...@wildbit.com wrote:

 Hi all. Continuing on my all SSD discussion, I am looking for some
 recommendations on a new Supermicro
 chassis for our file servers. So far I have been looking at this
 thing:

 http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400LP.cfm

 Does anyone have experience with this? If so, what would you recommend
 for a motherboard and HBA to support all of the disks? We've
 traditionally used the X9DRD-7LN4F-JBOD or the X9DRi-F with a LSI
 9211-8i HBA.

 Thanks,
 Chris
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] What repos do people use to build a *AMP server?

2015-05-08 Thread Schweiss, Chip
I've done really well with the OpenCSW packages on OmniOS.

-Chip
On May 8, 2015 11:50 AM, Saso Kiselkov skiselkov...@gmail.com wrote:

 I've decided to try and update my r151006 box to something newer, seeing
 as r151014 just came out and it's supposed to be LTS. Trouble is, I'm
 trying to build a *AMP box and I can't find any prebuilt packages for it
 in any of these repos:
 http://omnios.omniti.com/wiki.php/Packaging
 What do you guys use for getting pre-built software? Do all people here
 just roll their own?

 Also, allow me to say, I *hate* consolidations and the way they lock
 accessible package versions. Where are the days when OSes used to be
 backwards-compatible?

 Cheers,
 --
 Saso
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slow ssh login, maybee to many locales?

2015-04-20 Thread Schweiss, Chip
Slow ssh login is almost always caused by reverse DNS problems on the server
side.  Adding the client to the server's /etc/hosts will usually resolve the
problem.

-Chip
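Concretely, the two usual fixes look like this (the address and hostnames are placeholders):

```
# /etc/hosts on the server -- pin the client's name locally
192.0.2.10   client1.example.com   client1

# /etc/ssh/sshd_config -- or skip reverse lookups during login entirely
# OpenSSH:          UseDNS no
# SunSSH (illumos): LookupClientHostnames no
```

Note this addresses the DNS stall; the locale enumeration shown in the truss output below is a separate (smaller) cost.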

On Sat, Apr 18, 2015 at 10:37 PM, PÁSZTOR György 
pasz...@sagv5.gyakg.u-szeged.hu wrote:

 Hi,

 I have found that login to my new OmniOS zone is slow.
 I tried to debug.
 Many of the symptoms seemed pretty the same as this:

 http://broken.net/uncategorized/resolving-slow-ssh-login-performance-problems-on-openindiana/

 Also in my case it stopped at the same point: after the kexinit sent.

 However, on my omnios, the cryptadm list showed this:
 pasztor@omni:~$ cryptoadm list

 User-level providers:
 Provider: /usr/lib/security/$ISA/pkcs11_kernel.so
 Provider: /usr/lib/security/$ISA/pkcs11_softtoken.so

 Kernel software providers:
 des
 aes
 arcfour
 blowfish
 ecc
 sha1
 sha2
 md4
 md5
 rsa
 swrand

 Kernel hardware providers:
 [---end of output]

 So, in my case it did not contained the tpm module.
  I tried the opposite: enabling the tpm module, but nothing changed.
  (Maybe it became even slower. I did not count the seconds.)
  So, I reverted it and ran the same truss command, which revealed this:
  There are tons of file opens in the /usr/lib/locale dir at that point:

 24560:   8.8232 stat(/usr/lib/locale/is_IS.UTF-8, 0x08047D48) = 0
 24560:   8.8234 open(/usr/lib/locale//is_IS.UTF-8/LC_CTYPE/LCL_DATA,
 O_RDONLY) = 7
 24560:   8.8236 fstat(7, 0x08047658)= 0
 24560:   8.8237 mmap(0x, 94904, PROT_READ, MAP_PRIVATE, 7, 0) =
 0xFEDE7000
 24560:   8.8238 close(7)= 0
 ...
 24560:  14.5883
 open(/usr/lib/locale//el_GR.ISO8859-7/LC_MESSAGES/LCL_DATA, O_RDONLY) = 7
 24560:  14.5884 fstat(7, 0x08047678)= 0
 24560:  14.6061 read(7,  ^ ( ( [EDCD ] ( [E1C1 ].., 82)   = 82
 24560:  14.6063 close(7)= 0
 24560:  14.6065 getdents64(5, 0xFEE04000, 8192) = 0
 24560:  14.6069 ioctl(1, TCGETA, 0x08046DBE)Err#22
 EINVAL
 24560:  14.6069 fstat64(1, 0x08046E00)  = 0
 24560:  14.6070 brk(0x080689D0) = 0
 24560:  14.6071 brk(0x0806A9D0) = 0
 24560:  14.6072 fstat64(1, 0x08046D00)  = 0
 24560:  14.6074 close(5)= 0
 24560:  14.6075 write(1,  C\n P O S I X\n a f _ Z.., 2891)= 2891
 24556:  14.6076 read(3,  C\n P O S I X\n a f _ Z.., 5120) = 2891
 24560:  14.6077 _exit(0)
 24556:  14.6080 brk(0x080D0488) = 0
 24556:  14.6082 brk(0x080D2488) = 0
 24556:  14.6083 read(3, 0x080CD544, 5120)   = 0
 24556:  14.6084 llseek(3, 0, SEEK_CUR)  Err#29
 ESPIPE
 24556:  14.6085 close(3)= 0
 24556:  14.6296 waitid(P_PID, 24560, 0x080473F0, WEXITED|WTRAPPED) = 0

 So, does somebody knows what is happening at that point,
 why,
 and how can I fine-tune it?

 Kind regards,
 György Pásztor
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] esxi 5.5 to omnios r151014 nfs server issue

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 9:30 AM, Dan McDonald dan...@omniti.com wrote:


  On Apr 6, 2015, at 5:50 AM, Hafiz Rafiyev rafibe...@gmail.com wrote:
 
 
  only log I see from omnios side is:
 
  nfs4cbd[468]: [ID 867284 daemon.notice] nfsv4 cannot determine local
 hostname binding for transport tcp6 - delegations will not be available on
 this transport

 Are you having DNS problems?

 This error is in an unchanged subsystem, the NFSv4 callback daemon.  The
 error looks like something caused by a naming-services failure.


I'd say this is a red herring.  ESXi 5.5 will only use NFSv3.  However, DNS
resolution is critical for ESXi NFS mounts even when mounting via IP
address.

I always put host entries in /etc/hosts on each ESXi host for all other
hosts, vCenter and NFS servers.   The same on the NFS server side.  I
learned this years ago on a 3 AM call to VMware support. :)

-Chip




 I'll forward your note along to an illumos-community NFS expert, I may
 find out more.

 Dan

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fwd: All SSD pool advice

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 8:53 AM, Fábio Rabelo fa...@fabiorabelo.wiki.br
wrote:

 Sorry, forget to forward to the list ...


 -- Forwarded message --
 From: Fábio Rabelo fa...@fabiorabelo.wiki.br
 Date: 2015-04-06 10:51 GMT-03:00
 Subject: Re: [OmniOS-discuss] All SSD pool advice
 To: Chris Nagele nag...@wildbit.com


 I never get my hands at that 4U model ...

 I have 2 of this babys in a customer of mine :

 http://www.supermicro.com/products/chassis/2U/216/SC216BA-R1K28LP.cfm

 Each one with 24 1TB Samsung 850PRO for a litle over an year,
 OminOS+Napp-it , no issue whatsoever ...

 Expanded Chassis brings me lots and lots of headaches  ...


The system I've built with interposers has SAS expanders and gives me no
problems.  Samsung SSDs are the only SSDs I've found that work well with
the interposers.

-Chip



 Fábio Rabelo

 2015-04-06 10:41 GMT-03:00 Chris Nagele nag...@wildbit.com:

 Thanks everyone. Regarding the expanders, our 4U servers are on the
 following chassis:

 http://www.supermicro.com/products/chassis/4U/846/SC846E16-R1200.cfm

 We are using all SAS disks, except for the SSDs. How big is the risk
 here when it comes to SAS-to-SATA conversion? Our newer servers have
 direct connections on each lane to the disk.

 Chris

 Chris Nagele
 Co-founder, Wildbit
 Beanstalk, Postmark, dploy.io


 On Sat, Apr 4, 2015 at 7:18 PM, Doug Hughes d...@will.to wrote:
 
  We have a couple of machines with all SSD pool (~6-10 Samsung 850 pro
 is the
  current favorite). They work great for IOPS. Here's my take.
  1) you don't need a dedicated zil. Just let the zpool intersperse it
 amongst
  the existing zpool devices. They are plenty fast enough.
  2) you don't need an L2arc for the same reason. a smaller number of
  dedicated devices would likely cause more of a bottleneck than serving
 off
  the existing pool devices (unless you were to put it on one of those
 giant
  RDRAM things or similar, but that adds a lot of expense)
 
 
 
 
 
  On 4/4/2015 3:07 PM, Chris Nagele wrote:
 
  We've been running a few 4U Supermicro servers using ZeusRAM for zil and
  SSDs for L2. The main disks are regular 1TB SAS.
 
  I'm considering moving to all SSD since the pricing has dropped so much.
  What things should I know or do when moving to all SSD pools? I'm
 assuming I
  don't need L2 and that I should keep the ZeusRAM. Should I only use
 certain
  types of SSDs?
 
  Thanks,
  Chris
 
 
  --
 
  Chris Nagele
  Co-founder, Wildbit
  Beanstalk, Postmark, dploy.io
 
 
 
  ___
  OmniOS-discuss mailing list
  OmniOS-discuss@lists.omniti.com
  http://lists.omniti.com/mailman/listinfo/omnios-discuss
 
 
 
  ___
  OmniOS-discuss mailing list
  OmniOS-discuss@lists.omniti.com
  http://lists.omniti.com/mailman/listinfo/omnios-discuss
 
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss




 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] All SSD pool advice

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 8:41 AM, Chris Nagele nag...@wildbit.com wrote:

 Thanks everyone. Regarding the expanders, our 4U servers are on the
 following chassis:

 http://www.supermicro.com/products/chassis/4U/846/SC846E16-R1200.cfm

 We are using all SAS disks, except for the SSDs. How big is the risk
 here when it comes to SAS-to-SATA conversion? Our newer servers have
 direct connections on each lane to the disk.


There are A LOT of opinions on this.  What I have done that has worked
extremely well was to use 70 Samsung 840 Pro SSDs with LSI interposers in
Supermicro chassis.  There were a couple of early failures of the interposers,
but it has been rock solid ever since.   mpt_sas blew chunks and panicked
the system on one.  On another I caught it in action, and doing a 'zpool
offline {device}' kept everything running without a hitch.

I run with ZIL off because this is used entirely for scratch data and
virtual machines that can be redeployed in minutes.   It would be sync safe
with the addition of some good log devices.

I'm not sure if the interposers increased stability or it has simply been
the quality of the Samsung SSD.

-Chip



 Chris

 Chris Nagele
 Co-founder, Wildbit
 Beanstalk, Postmark, dploy.io


 On Sat, Apr 4, 2015 at 7:18 PM, Doug Hughes d...@will.to wrote:
 
  We have a couple of machines with all SSD pool (~6-10 Samsung 850 pro is
 the
  current favorite). They work great for IOPS. Here's my take.
  1) you don't need a dedicated zil. Just let the zpool intersperse it
 amongst
  the existing zpool devices. They are plenty fast enough.
  2) you don't need an L2arc for the same reason. a smaller number of
  dedicated devices would likely cause more of a bottleneck than serving
 off
  the existing pool devices (unless you were to put it on one of those
 giant
  RDRAM things or similar, but that adds a lot of expense)
 
 
 
 
 
  On 4/4/2015 3:07 PM, Chris Nagele wrote:
 
  We've been running a few 4U Supermicro servers using ZeusRAM for zil and
  SSDs for L2. The main disks are regular 1TB SAS.
 
  I'm considering moving to all SSD since the pricing has dropped so much.
  What things should I know or do when moving to all SSD pools? I'm
 assuming I
  don't need L2 and that I should keep the ZeusRAM. Should I only use
 certain
  types of SSDs?
 
  Thanks,
  Chris
 
 
  --
 
  Chris Nagele
  Co-founder, Wildbit
  Beanstalk, Postmark, dploy.io
 
 
 
  ___
  OmniOS-discuss mailing list
  OmniOS-discuss@lists.omniti.com
  http://lists.omniti.com/mailman/listinfo/omnios-discuss
 
 
 
  ___
  OmniOS-discuss mailing list
  OmniOS-discuss@lists.omniti.com
  http://lists.omniti.com/mailman/listinfo/omnios-discuss
 
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best infrastructure for VSphere/Hyper-V

2015-04-02 Thread Schweiss, Chip
On Apr 2, 2015 11:50 AM, Nate Smith nsm...@careyweb.com wrote:

 So going over the forum over the last month, it appears more than a
couple people have had problem with Omnios as a storage backend for
virtualization platforms, both as iSCSI targets and as Fibre Channel
targets. Looking at a list of possible alternatives, what infrastructure
works well?

 Is it limited to NFS on VSphere, or is there some way I can get this
working with Hyper-V (which would be vastly preferable due to licensing
advantages for me)?

I run OmniOS with NFS for vSphere.  It works very well.

One bit of disclosure: all my VM storage is SSD, but ZFS with
compression makes SSD go a lot further.  I also use linked clones via
the vSphere API.

I have 250 VMs running on 5 TB of SSD.  Performance is awesome for every VM.

-Chip

 -Nate

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] best or preferred 10g card for OmniOS

2015-03-29 Thread Schweiss, Chip
On Sun, Mar 29, 2015 at 8:51 AM, Matthew Lagoe matthew.la...@subrigo.net
wrote:

 The intel cards are nice but they don't have any cx4 cards so we don't use
 them. Copper connections have less latency on short links than fiber, as you
 don't have the electrical-to-optical conversion (when done properly)


On short links (< 20 m), twin-ax copper SFP+ is much more economical and
lower latency than optics.   I would only use optics and fiber for
long runs.

-Chip


 -Original Message-
 From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On
 Behalf Of Richard Elling
 Sent: Saturday, March 28, 2015 07:40 AM
 To: Doug Hughes
 Cc: omnios-discuss
 Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS


  On Mar 26, 2015, at 9:24 AM, Doug Hughes d...@will.to wrote:
 
  any recommendations? We're having some pretty big problems with the
 Solarflare card and driver dropping network under high load. We eliminated
 LACP as a culprit, and the switch.
 
  Intel? Chelsio? other?

 I've been running exclusively Intel for several years now. It gets the most
 attention in the illumos community.

  -- richard


 
  - Doug
  ___
  OmniOS-discuss mailing list
  OmniOS-discuss@lists.omniti.com
  http://lists.omniti.com/mailman/listinfo/omnios-discuss
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] About P19

2015-03-11 Thread Schweiss, Chip
I have P19 on 3 active servers.  No issues.

I consider it safe.

Also interesting, P20 was on them when I first purchased them.  It was
nearly a month of usage before I found out about P20 and then downgraded.
I didn't have any problems with P20 like others were seeing.

-Chip

On Wed, Mar 11, 2015 at 9:22 AM, Dan McDonald dan...@omniti.com wrote:


  On Mar 11, 2015, at 4:20 AM, Tobias Oetiker t...@oetiker.ch wrote:
 
  Dan,
 
  you mentioned in an earlier post that you had not heard anything
  good about P19 ... this seems to prompt people to consider
  downgreading to P18 ...

 I've heard little/nothing about P19.  I've only heard P18 is known to be
 good.

 Dan

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00

2015-03-10 Thread Schweiss, Chip
On Tue, Mar 10, 2015 at 5:48 AM, Stephan Budach stephan.bud...@jvm.de
wrote:

 Am 09.03.15 um 15:47 schrieb Dan McDonald:

 On Mar 9, 2015, at 10:23 AM, Eric Sproul eric.spr...@circonus.com
 wrote:

 On Sat, Mar 7, 2015 at 3:56 PM, Brogyányi József bro...@gmail.com
 wrote:

 Has anyone tested this firmware? Is it free from this error message
 Parity
 Error on path?
 Thanks any information.

 P20 firmware is known to be toxic; just google for lsi p20 firmware
 for the carnage.

 P19 and below are fine, as far as I know.

 I've not heard good things about 19.  I HAVE heard that 18 is the best
 level of FW to run for right now.

 Thanks!
 Dan

 Is there a known good way to flash a LSI back to P18 if it already came
 with P19? I happen to have two new LSIs running P19.
 Afaik, the readme explicitly warns about flashing back the fw…


Backwards is hard.  I went through that trying to get P20 reverted on some
new HBAs.

The only method I could find that worked was using the UEFI shell and the
UEFI sas2flash utility to erase the firmware and install the old version.  On
older motherboards, the DOS method should work; the Solaris/illumos sas2flash
is incapable of erasing the firmware.
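The UEFI-shell sequence described above looks roughly like this (a sketch; the controller number and firmware file names are examples — check your own with `-listall` first):

```sh
# From the UEFI shell, with sas2flash.efi and the P18 images on the boot media
sas2flash.efi -listall                           # identify the controller number
sas2flash.efi -c 0 -o -e 6                       # erase flash on controller 0
sas2flash.efi -c 0 -f 2118it.bin -b mptsas2.rom  # flash P18 IT firmware + BIOS
```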

-Chip





 Cheers,
 budy

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] smtp-notify dependency on sendmail

2015-03-10 Thread Schweiss, Chip
On Tue, Mar 10, 2015 at 10:36 AM, Dan McDonald dan...@omniti.com wrote:


 svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri:
 svc:/milestone/multi-user:default
 svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri:
 svc:/system/fmd:default



That's the trick.  Thanks!
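Scripted end to end (an untested sketch; the refresh and verification steps are my addition, so the running service picks up the change):

```sh
svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri: \
    svc:/milestone/multi-user:default
svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri: \
    svc:/system/fmd:default
svcadm refresh system/fm/smtp-notify
svcprop -p startup_req/entities system/fm/smtp-notify   # verify both FMRIs
```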

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] smtp-notify dependency on sendmail

2015-03-10 Thread Schweiss, Chip
I haven't used sendmail since the 1990s and don't intend to change.

I've figured out how to get smtp-notify to start with sendmail-client
disabled, but it was a manual process using 'svccfg -s smtp-notify
editprop'.

What I can't figure out is how to do the same from the command line.  Everything
I try gives either a syntax error or 'svccfg: No such property group
startup_req.'  I really don't want to have to add a manual step to my
system setup scripts.

What's the proper syntax for this setting?:

svccfg -s system/fm/smtp-notify:default setprop startup_req/entities =
fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\"

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware

2015-03-04 Thread Schweiss, Chip
Sounds like the problem I had on a new Supermicro box.   I found by trial
and error that turning off x2APIC in the BIOS fixed the problem.

Also disable C sleep states.

-Chip

On Wed, Mar 4, 2015 at 10:27 AM, John Barfield john.barfi...@bissinc.com
wrote:

  Greetings,

 I’m writing to see if anyone could point me in the direction of a
 document that would detail how to get OmniOS to boot on IBM’s newest UEFI
 firmware on system X machines.

  I’m using a DX360 3U chassis as a storage appliance and I’m having a
 hard time booting the installer iso from USB.

  The installer ISO simply does not work but I can boot another
 “installed” OmniOS appliance image off of a different USB stick.

  However this image just crashes and reboots after the SunOS 5.11 screen
 and goes into an infinite reboot loop.

  If anyone has any experience with this server I would be very grateful
 if you shared your knowledge.

  I’ve tried disabling UEFI or enabling legacy mode but I just don’t think
 that its working…after scanning through IBM’s docs from what I can tell…it
 should just work automatically.

  Thanks in advance for any help!

  John Barfield





 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] speeding up file access

2015-03-04 Thread Schweiss, Chip
No USB flash drive is going to bring any benefit as a log device.  If it
has any RAM cache to increase write performance, it's useless as a log
device because the RAM will have no power protection.  More likely it will
have no RAM at all, and write performance will be poor.  Decent log devices
don't come cheap.

If it's just a home server, set up frequent snapshots and turn sync
off.   You may have to throw away the most recent writes after a
power failure, but your performance will be maximized.

I've been doing this for 4 years on my home ZFS server, through about a half
dozen power failures, and I've never lost anything.  I still keep it backed
up; I use Code42's CrashPlan.

-Chip

On Tue, Mar 3, 2015 at 10:26 PM, Michael Mounteney gat...@landcroft.co.uk
wrote:

 Hello list;  this is a very basic question about ZFS performance from
 someone with limited sysadmin knowledge.  I've seen various messages
 about ZILs and caching and noticed that my Supermicro 5017C-LF
 (http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm).
 This has a single USB socket on the board so I wondered if it would be
 worth putting a USB stick / `thumbdrive' in there and using it as the
 ZIL / cache.  I know the real answer to my question is 'buy a proper
 server' but this is a home system and cost, noise and power-consumption
 all mandate the current choice of machine.

 (Yes;  the USB socket is vertical;  I'd have to buy a right-angle
 converter)

 Thanks, Michael.
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] The ixgbe driver, Lindsay Lohan, and the Greek economy

2015-02-21 Thread Schweiss, Chip
I can't say I totally agree with your performance assessment.   I run Intel
X520s in all my OmniOS boxes.

Here is a capture of nfssvrtop I made while running many Storage vMotions
between two OmniOS boxes hosting NFS datastores.   This is a 10-host VMware
cluster.  Both OmniOS boxes are dual 10G connected with copper twin-ax to
the in-rack Nexus 5010.

VMware does 100% sync writes; I use ZeusRAM SSDs for log devices.

-Chip

2014 Apr 24 08:05:51, load: 12.64, read: 17330243 KB, swrite: 15985 KB, awrite: 1875455 KB

Ver Client         NFSOPS  Reads SWrites AWrites Commits    Rd_bw  SWr_bw  AWr_bw  Rd_t SWr_t AWr_t Com_t Align%
4   10.28.17.105        0      0       0       0       0        0       0       0     0     0     0     0      0
4   10.28.17.215        0      0       0       0       0        0       0       0     0     0     0     0      0
4   10.28.17.213        0      0       0       0       0        0       0       0     0     0     0     0      0
4   10.28.16.151        0      0       0       0       0        0       0       0     0     0     0     0      0
4   all                 1      0       0       0       0        0       0       0     0     0     0     0      0
3   10.28.16.175        3      0       3       0       0        1      11       0  4806    48     0     0     85
3   10.28.16.183        6      0       6       0       0        3     162       0   549   124     0     0     73
3   10.28.16.180       11      0      10       0       0        3      27       0   776    89     0     0     67
3   10.28.16.176       28      2      26       0       0       10     405       0  2572   198     0     0    100
3   10.28.16.178     4606   4602       4       0       0   294534       3       0   723    49     0     0     99
3   10.28.16.179     4905   4879      26       0       0   312208     311       0   735   271     0     0     99
3   10.28.16.181     5515   5502      13       0       0   352107      77       0    89    87     0     0     99
3   10.28.16.184    12095  12059      10       0       0   763014      39       0   249   147     0     0     99
3   10.28.58.115  4016040 1166354      53  191605     474   202346     192      96   144    83   99
3   all             42574  33086 2176354      53  1913488     1582  202300     348   138   153   105     99




On Fri, Feb 20, 2015 at 11:46 PM, W Verb wver...@gmail.com wrote:

 Hello All,

 Thank you for your replies.
 I tried a few things, and found the following:

 1: Disabling hyperthreading support in the BIOS drops performance overall
 by a factor of 4.
 2: Disabling VT support also seems to have some effect, although it
 appears to be minor. But this has the amusing side effect of fixing the
 hangs I've been experiencing with fast reboot. Probably by disabling kvm.
 3: The performance tests are a bit tricky to quantify because of caching
 effects. In fact, I'm not entirely sure what is happening here. It's just
 best to describe what I'm seeing:

 The commands I'm using to test are
 dd if=/dev/zero of=./test.dd bs=2M count=5000
 dd of=/dev/null if=./test.dd bs=2M count=5000
 The host vm is running Centos 6.6, and has the latest vmtools installed.
 There is a host cache on an SSD local to the host that is also in place.
 Disabling the host cache didn't immediately have an effect as far as I
 could see.

 The host MTU set to 3000 on all iSCSI interfaces for all tests.

 Test 1: Right after reboot, with an ixgbe MTU of 9000, the write test
 yields an average speed over three tests of 137MB/s. The read test yields
 an average over three tests of 5MB/s.

 Test 2: After setting ifconfig ixgbe0 mtu 3000, the write tests yield
 140MB/s, and the read tests yield 53MB/s. It's important to note here that
 if I cut the read test short at only 2-3GB, I get results upwards of
 350MB/s, which I assume is local cache-related distortion.

 Test 3: MTU of 1500. Read tests are up to 156 MB/s. Write tests yield
 about 142MB/s.
 Test 4: MTU of 1000: Read test at 182MB/s.
 Test 5: MTU of 900: Read test at 130 MB/s.
 Test 6: MTU of 1000: Read test at 160MB/s. Write tests are now
 consistently at about 300MB/s.
 Test 7: MTU of 1200: Read test at 124MB/s.
 Test 8: MTU of 1000: Read test at 161MB/s. Write at 261MB/s.

 A few final notes:
 L1ARC grabs about 10GB of RAM during the tests, so there's definitely some
 read caching going on.
 The write operations are easier to observe with iostat, and I'm seeing io
 rates that closely correlate with the network write speeds.


 Chris, thanks for your specific details. I'd appreciate it if you could
 tell me which copper NIC you tried, as well as to pass on the iSCSI tuning
 parameters.

 I've ordered an Intel EXPX9502AFXSR, which uses the 82598 chip instead of
 the 82599 in the X520. If I get 

Re: [OmniOS-discuss] OmniOS Bloody update for Feb 18

2015-02-20 Thread Schweiss, Chip
I will second that request as my OmniOS experiments always start as VMware
VMs.

-Chip

On Fri, Feb 20, 2015 at 3:17 AM, Alexander Lesle gro...@tierarzt-mueller.de
 wrote:

 Hello Dan McDonald and List,

 do your remember my feature request?
 ,-[  ]-
 |
 | at Friday, 3. Okt. 2014 17:22 Dan McDonald
 | at mid:a762c059-f687-42de-a1f7-6cf2a7f58...@omniti.com wrote:
 |
 |  On Oct 3, 2014, at 9:29 AM, Alexander Lesle
 |  gro...@tierarzt-mueller.de wrote:
 |
 |  Hello Dan McDonald and List,
 | 
 |  in my mind this was a great proposal what you done at Sep, 02.
 |  That's the way to build a great OmniOS near to the users.
 | 
 |  Here my request for the next release:
 |  It would be convenient if open-vm-tools were installed in the new
 |  release. http://sourceforge.net/projects/open-vm-tools/files/
 |
 |  Actually, this work has been done.  I dropped it on the ground:
 |
 |  https://github.com/omniti-labs/omnios-build/pull/39
 |
 |  I will be merging this pull requests now in the master branch,
 |  and seeing how OOod chews on it.
 |
 |
 |  I know there's other work to be done for OmniOS-as-a-guest as well.
 |
 |  Thanks,
 |  DAn
 |
 `---

 Do you find the time to integrate open-vm-tools in the next stable
 version?

 Thanks.

 On Februar, 19 2015, 03:36 Dan McDonald wrote in [1]:

  There will be only 1-3 more updates prior to the next OmniOS stable
  (and for this time, this stable is also Long-Term Support) release.
  If you notice anything weird about the bloody bits, please let me
  know ASAP.  I've heard little/no complaints about the updates to
  pkg(5), so either you're very happy, or not using it.  :)

  This update will only be reaching the repo.

  * omnios-build master branch, revision f3d6d48

  * Git to 2.3.0.

  * UnZIP fixes.

  * OpenJDK7 up to update 76, build 31.

  * Microtasking libraries (/lib/libmtsk*) are back as a distinct package
(system/library/mtsk) now that they are not part of the (now
 open-source)
Math libraries.

  * illumos-omnios master branch, revision cbf73e4 (last illumos-gate
 merge 336069c)

  * Bugfixes in PF_PACKET, SMB/CIFS, header files, NFS, and man pages.

 --
 Best Regards
 Alexander
 Februar, 20 2015
 
 [1] mid:f70c482d-68e8-43a2-bf73-86ade4b63...@omniti.com
 

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS Bloody update for Feb 18

2015-02-20 Thread Schweiss, Chip
On Fri, Feb 20, 2015 at 9:27 AM, Dan McDonald dan...@omniti.com wrote:

 Are you two volunteering for testing?  If so, I can see what I can do
 about pushing the package (without it being in an incorporation) into the
 bloody repo server -- assuming I can build it properly.


Absolutely.   I have several OmniOS VMs already.   At least 1/2 of them
need VMware tools.

-Chip


 Dan

 Sent from my iPhone (typos, autocorrect, and all)

 On Feb 20, 2015, at 9:20 AM, Schweiss, Chip c...@innovates.com wrote:

 I will second that request as my OmniOS experiments always start as VMware
 VMs.

 -Chip

 On Fri, Feb 20, 2015 at 3:17 AM, Alexander Lesle 
 gro...@tierarzt-mueller.de wrote:

 Hello Dan McDonald and List,

 do your remember my feature request?
 ,-[  ]-
 |
 | at Friday, 3. Okt. 2014 17:22 Dan McDonald
 | at mid:a762c059-f687-42de-a1f7-6cf2a7f58...@omniti.com wrote:
 |
 |  On Oct 3, 2014, at 9:29 AM, Alexander Lesle
 |  gro...@tierarzt-mueller.de wrote:
 |
 |  Hello Dan McDonald and List,
 | 
 |  in my mind this was a great proposal what you done at Sep, 02.
 |  That's the way to build a great OmniOS near to the users.
 | 
 |  Here my request for the next release:
 |  It would be convenient if open-vm-tools were installed in the new
 |  release. http://sourceforge.net/projects/open-vm-tools/files/
 |
 |  Actually, this work has been done.  I dropped it on the ground:
 |
 |  https://github.com/omniti-labs/omnios-build/pull/39
 |
 |  I will be merging this pull requests now in the master branch,
 |  and seeing how OOod chews on it.
 |
 |
 |  I know there's other work to be done for OmniOS-as-a-guest as well.
 |
 |  Thanks,
 |  DAn
 |
 `---

 Do you find the time to integrate open-vm-tools in the next stable
 version?

 Thanks.

 On Februar, 19 2015, 03:36 Dan McDonald wrote in [1]:

  There will be only 1-3 more updates prior to the next OmniOS stable
  (and for this time, this stable is also Long-Term Support) release.
  If you notice anything weird about the bloody bits, please let me
  know ASAP.  I've heard little/no complaints about the updates to
  pkg(5), so either you're very happy, or not using it.  :)

  This update will only be reaching the repo.

  * omnios-build master branch, revision f3d6d48

  * Git to 2.3.0.

  * UnZIP fixes.

  * OpenJDK7 up to update 76, build 31.

  * Microtasking libraries (/lib/libmtsk*) are back as a distinct package
(system/library/mtsk) now that they are not part of the (now
 open-source)
Math libraries.

  * illumos-omnios master branch, revision cbf73e4 (last illumos-gate
 merge 336069c)

  * Bugfixes in PF_PACKET, SMB/CIFS, header files, NFS, and man pages.

 --
 Best Regards
 Alexander
 Februar, 20 2015
 
 [1] mid:f70c482d-68e8-43a2-bf73-86ade4b63...@omniti.com
 

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Dell vs. Supermicro and any recommendations..

2015-02-18 Thread Schweiss, Chip
On Wed, Feb 18, 2015 at 7:41 AM, Andy omn...@citrus-it.net wrote:



 I'd prefer to run Supermicro but might have some problems convincing
 those with the purse strings. Is anyone running, or can anyone recommend,
 a Supermicro server roughly equivalent to the Del R730xd, or give me an
 idea of what chipsets/HBAs etc. to choose or avoid for OmniOS?


I've been quite happy with the 6028U-TR4+.   It was the first 2U Ultra
server Supermicro was shipping.   Any of the 2U Ultras would be a good
choice.   What I like about these in particular is lots of PCIe slots for
HBAs and 10GbE NICs.   If you're running RJ45 10GbE, they have versions with
it built in.

If it had been available at the time, I would have gone with the 2028U, since
it has all 2.5" drive bays.   The only drives I've put on the SATA side have
been SSDs for boot and L2ARC.   With an internal HBA you could load it up with
24 drives; I think with the onboard SATA you are limited to 10 drives.

-Chip



 Any help appreciated,

 Thanks

 Andy

 --
 Citrus IT Limited | +44 (0)870 199 8000 | enquir...@citrus-it.co.uk
 Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
 Registered in England and Wales | Company number 4899123

 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] FINALLY - OmniOS bloody is now updated. (long, please read)

2015-02-04 Thread Schweiss, Chip
Maybe I haven't had enough coffee this morning or I'm missing something.

Where should I set the pkg publisher to point to?

I tried:
http://pkg.omniti.com/omnios/r151013/
and
http://pkg.omniti.com/omnios/r151013-20150203/

Neither works.

-Chip

On Tue, Feb 3, 2015 at 3:08 PM, Dan McDonald dan...@omniti.com wrote:


  On Feb 3, 2015, at 4:07 PM, Schweiss, Chip c...@innovates.com wrote:
 
  Good to know.  Regardless, I will run bloody until I'm ready to go live
 on this system.   Hopefully I can contribute some valuable information,
 even if it is in the form of crash dumps.

 Crash dumps are appreciated.  Also, you saw what else (packaging) I'm
 interested in for this particular one.

 Thanks!
 Dan


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] FINALLY - OmniOS bloody is now updated. (long, please read)

2015-02-03 Thread Schweiss, Chip
On Tue, Feb 3, 2015 at 2:59 PM, Dan McDonald dan...@omniti.com wrote:


 It has the latest illumos-gate versions of things as of two days ago.
 Very recently (mid-January) we backported all mpt_sas stuff into 012 and
 006.  So if you're updated to the latest 012, you won't have much new in
 mpt_sas.  ZFS, OTOH, has newness in it, and alas, not all of it is good
 (5531 has siblings, apparently).

 Dan


Good to know.  Regardless, I will run bloody until I'm ready to go live on
this system.   Hopefully I can contribute some valuable information, even
if it is in the form of crash dumps.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Windows crashes my ZFS box

2015-02-02 Thread Schweiss, Chip
On Sun, Feb 1, 2015 at 6:21 PM, Rune Tipsmark r...@steait.net wrote:

 I got some major problems... when using Windows and Fibre Channel I am
 able to kill my ZFS box totally for at least 15 minutes... it simply drops
 all connections to all hosts connected via FC. This happens under load, for
 example doing backups writing to the ZFS, running IO Meter against my ZFS...


...


 Latest FW on all items, HBA, Switch etc. Monitoring shows a distributed
 load on the ports as expected using Round Robin and MPIO.

 This might be a shot in the dark, but the latest firmware on LSI HBAs is
known to have serious problems.  Those are mostly data-corruption issues, so
I'm not sure this is your cause.  Use P18 or P19, but not P20.

-Chip




 One thing that irritates me is that I don't get any more than ~80-120
 MB/sec (sync=always) throughput when writing to this LUN in Windows, where
 I get 6-700 MB/sec (sync=always) when writing from a VM on ESXi... The
 abysmal performance is a pain, but the fact that I can downright crash or
 hang my ZFS box just by running IOMeter is disturbing...



 Any ideas why this might happen? Seems to me like a queue problem but I
 can't really get any closer than that... maybe Windows is just crappy at
 handling Fibre Channel... however no problems against HP EVA Storage
 same machine, same tests



 br,

 Rune






 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] High Availability storage with ZFS

2015-01-06 Thread Schweiss, Chip
On Tue, Jan 6, 2015 at 5:16 AM, Filip Marvan filip.mar...@aira.cz wrote:

 Hi



 as few guys before, I'm thinking again about High Availability storage
 with ZFS. I know, that there is great commercial RSF-1, but that's quite
 expensive for my needs.

 I know, that Sašo did a great job about that on his blog
 http://zfs-create.blogspot.cz but I never found the way, how to
 successfully configure that on current OmniOS versions.



 So I'm thinking about something more simple. Arrange two LUNs from two
 OmniOS ZFS storages in one software mirror through fibrechannel. Arrange
 that mirror in client, for example mdadm in Linux. I know, that it will
 have performance affect and I will lost some ZFS advantages, but I still
 can use snapshots, backups with send/receive and some other interesting ZFS
 things, so it could be usable for some projects.

 Is there anyone, who tried that before? Any eperience with that?


While this sounds technically possible, it is not HA.  Your client is the
single point of failure.   I would wager that mdadm would create more
availability issues than it would solve.

I run RSF-1, and HA is still hard to achieve.   I don't think I have gained
any additional uptime overcoming failures, but it definitely helps with
planned maintenance.   Unfortunately, there are still too many ways a ZFS
pool can fail where having a second server connected does not help.

-Chip




 Thank you,



 Filip Marvan





 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Postgres on Stable

2014-12-29 Thread Schweiss, Chip
I use OpenCSW for Postgres and many other packages on OmniOS.

While targeted at Solaris, OpenCSW packages have worked flawlessly for me
on OmniOS.

http://www.opencsw.org/
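The OpenCSW route is a two-step bootstrap: install pkgutil, then pull packages from the catalog. A sketch, assuming the catalog still carries a PostgreSQL package under a name like this:

```sh
# Bootstrap pkgutil, then install PostgreSQL from the OpenCSW catalog
pkgadd -d http://get.opencsw.org/now
/opt/csw/bin/pkgutil -U                   # refresh the catalog
/opt/csw/bin/pkgutil -y -i postgresql93   # package name may differ per release
```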

-Chip

On Mon, Dec 29, 2014 at 12:40 AM, Michael Mounteney gat...@landcroft.co.uk
wrote:

 On Fri, 28 Nov 2014 18:37:09 -0500
 Zach Malone zmal...@omniti.com wrote:

  I built postgresql-935 for you on a r151012 system, and published it
  to the omniti-ms repo.  Want to give it a shot?  Past installing it, I
  have not tried it on a production system (I'm planning on moving some
  systems to r151012 this month), and I have not rolled any of the other
  postgres libraries or modules that we publish for 9.2.

 Zac, can I be really cheeky and ask for this for bloody, r151013 ?
 I've just switched over to that distro, and there are no versions of
 Postgres available from the standard repositories.

 Much thanks in anticipation.

 Michael.
 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] NFSv4 id mapping only working on client but not server?

2014-12-16 Thread Schweiss, Chip
It seems there are many ways to map IDs in NFSv4; is there a way to not map
them at all?

I'm working to configure an OmniOS NFS server that will serve several
domains simultaneously, each of which has its own ID map with many
conflicting uid/gid values across the domains.

All the current file systems being migrated are NFSv3 with AUTH_SYS.   I'd
consider moving them all to Kerberos authentication, but something tells me
that may be impossible with multiple domains.
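A minimal illustration of serving NFSv4 with AUTH_SYS on illumos (the version cap and share options are hypothetical examples, and assume uid/gid values are kept in sync between server and clients):

```sh
# Cap the server at NFSv4 and export with sys security over one subnet
sharectl set -p server_versmax=4 nfs
zfs set sharenfs='sec=sys,rw=@10.28.16.0/24' tank/export
```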

-Chip



On Mon, Dec 8, 2014 at 3:42 PM, Paul B. Henson hen...@acm.org wrote:

  From: Ian Kaufman
  Sent: Monday, December 08, 2014 12:37 PM
 
  Yes, I oversimplified things. The issue is that AUTH_SYS/AUTH_UNIX
  does not support NFSv4. AUTH_SYS uses the UID, not the name, so the
  mapping fails. So, when RPC uses AUTH_SYS, NFSv4 is SOL.

 I apologize for being a pedant; it's a character flaw :). Your wording
 implies that you cannot use AUTH_SYS with NFSv4, which is not true. NFSv4
 works perfectly fine with AUTH_SYS as long as you maintain synchronization
 of uid/gid's on both sides. I would pick NFSv4 with AUTH_SYS over NFSv3
 with AUTH_SYS. Both require that you maintain uid synchronization, so it's
 not like you're gaining something by falling back, and why miss out on the
 other features of NFSv4?

  Supposedly. at least on the client side, this has been fixed somewhere
  upstream. However, the server side is not.
 
  https://bugzilla.linux-nfs.org/show_bug.cgi?id=226

 I don't know if I would call this fixed 8-/. They are basically just
 disabling the idmapper and passing raw uid/gid info at the NFS level to
 match the raw info at the RPC level. I guess it makes it less confusing in
 such a scenario, because you're always broken if they don't match on each
 side rather than only broken in some cases. An actual fix would be
 introducing an RPC mechanism using names such as AUTH_SYS_NAME, one of
 these days maybe I'll find the time to go hassle some NFS developers and
 see why they don't just do that and make everything simpler.

  Regardless, my point is that this is not a Solaris/Linux issue, as a
  Linux server and Linux clients would be in the same boat.

 Agreed. It is a deficiency in the NFSv4 protocol :(...


 ___
 OmniOS-discuss mailing list
 OmniOS-discuss@lists.omniti.com
 http://lists.omniti.com/mailman/listinfo/omnios-discuss



Re: [OmniOS-discuss] live lsi firmware upgrade P17-P19

2014-12-11 Thread Schweiss, Chip
On Thu, Dec 11, 2014 at 9:44 AM, Tobias Oetiker t...@oetiker.ch wrote:

 I am looking at upgrading the firmware of our LSI HBAs to P19 since
 we suspect that our use of P17 is the cause for disk timeouts we
 are seeing every few weeks.

 I am wondering: is it safe to flash the HBAs from a running OmniOS
 system, and if so, has anyone written down notes on the process and
 tools to use?


I don't believe that is possible on a live system.  I have upgraded using
the Solaris sas2flash utility on a system with the pool exported.  The
last step of the flash process is to reset the adapter, which fails,
requiring the system to be rebooted.
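That offline procedure can be scripted; the sketch below is a dry run by default, and the pool name, firmware file, and controller index are placeholder assumptions (with `sas2flash` standing in for whichever LSI Pxx release you are flashing):

```shell
# Dry-run sketch of the export -> flash -> reboot workflow described
# above.  With DOIT unset, each step is printed instead of executed.
flash_workflow() {
    pool=$1; fw=$2
    run() { if [ "${DOIT:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }
    run zpool export "$pool"                      # quiesce I/O on the HBA
    run sas2flash -c 0 -f "$fw" -b mptsas2.rom    # flash firmware + boot ROM
    run init 6                                    # final adapter reset fails, so reboot
}

flash_workflow tank 2118it.bin   # dry run: prints the three commands
```

Set `DOIT=1` only after client I/O has been moved off the pool (or to the sister head).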

What I haven't tried is disconnecting the SAS cables and then applying the
update.  I suspect this may work, but it is only a useful method if you
have redundant SAS paths and can do one HBA at a time.

These issues are why I run HA systems: I can migrate my pools from server
to server, perform maintenance at a leisurely pace, and confirm everything
is the way I want it before putting the server back in service.

-Chip




 cheers
 tobi


 --
 Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
 www.oetiker.ch t...@oetiker.ch +41 62 775 9902




Re: [OmniOS-discuss] Parity error on path mpt_sas2

2014-12-09 Thread Schweiss, Chip
I'm still fighting this.

Under OmniOS, the erase function is disabled in sas2flash for Solaris.

I couldn't seem to find an ISO-bootable DOS in which I could stage files.
FreeDOS 1.0 won't mount the CD, and 1.1 got rid of the live mode.  Tried a
Linux rescue CD and I get "ERROR: Erase Flash Operation Failed!" with
both the P18 and P20 sas2flash.

What OS environment are you running this in?
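The erase-then-reflash procedure Filip describes in the quoted message below can be wrapped so the reflash only runs if the erase succeeds. A sketch, where the `sas2flsh` invocations are Filip's and the function, firmware file name, and ROM name are hypothetical placeholders:

```shell
# Downgrade helper: erase flash/NVDATA (-o -e 6), then immediately
# reflash the older firmware and boot ROM in the same power-on session.
# Never reboot between the two steps.
downgrade_hba() {
    fw=$1; rom=$2
    if sas2flsh -o -e 6; then
        sas2flsh -f "$fw" -b "$rom"
    else
        echo "erase failed; reflash before any reboot or power cycle" >&2
        return 1
    fi
}
```

Because the erase leaves the HBA without valid firmware, treat a failed reflash as an emergency to fix in the same boot, not a stopping point.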



On Tue, Dec 9, 2014 at 3:06 AM, Filip Marvan filip.mar...@aira.cz wrote:

  Hi,



 first you have to erase P20 firmware with:



 sas2flsh -o -e 6



 now do not reboot(!) and flash P18 with



 sas2flsh -f XX.bin -b mptsas2.rom



 Filip





 *From:* OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] *On
 Behalf Of *Schweiss, Chip
 *Sent:* Monday, December 08, 2014 08:09 AM
 *To:* Filip Marvan
 *Cc:* omnios-discuss@lists.omniti.com
 *Subject:* Re: [OmniOS-discuss] Parity error on path mpt_sas2



 I've got some new LSI HBAs I'm trying to downgrade from firmware version
 20 to 18.

 I'm getting errors when trying to downgrade:

 Attempting to flash firmware to LSI SAS SAS2308_2(D1) :

 Executing Operation: Flash Firmware Image

 Firmware Image has a Valid Checksum.
 Firmware Version 18.00.00.00
 Firmware Image compatible with Controller.

 Valid NVDATA Image found.
 NVDATA Version 11.00.00.00
 Checking for a compatible NVData image...

 NVDATA Device ID and Chip Revision match verified.
 ERROR: Cannot downgrade NVDATA version 14.00.00.06
to 11.00.11.00.

 ERROR: Failed to get valid NVDATA image from File!

 Firmware Image Validation Failed!

 Tried downgrading bios:

 Attempting to flash Boot Service to LSI SAS SAS2308_2(D1) :

 Validating BIOS Image...

 BIOS Header Signature is Valid

 BIOS Image has a Valid Checksum.

 BIOS PCI Structure Signature Valid.

 BIOS Image Compatible with the SAS Controller.

 Attempting to Flash BIOS Image...

 BIOS Version in flash: 07.39.00.00
 BIOS Version from File  : 07.35.00.00
 Skipping flash since file version is not greater than
 existing.

 Flash BIOS Image Failed!

 I've tried sas2flash from P20 and P18.

 Can someone fill me in on the trick to downgrade?

 -Chip





 On Thu, Nov 27, 2014 at 9:03 AM, Filip Marvan filip.mar...@aira.cz
 wrote:

 Thank you for your help Aaron! I downgraded firmware to P18 and there are
 no more errors today.

 It seems that there is something wrong with the P20 firmware!



 You saved me a lot of time :)



 Filip





 *From:* Aaron Curry [mailto:asc1...@gmail.com]
 *Sent:* Tuesday, November 25, 2014 5:46 PM
 *To:* Filip Marvan
 *Cc:* Dan McDonald; omnios-discuss@lists.omniti.com
 *Subject:* Re: [OmniOS-discuss] Parity error on path mpt_sas2



 I had this exact same problem recently when setting up a new home
 server... same controller, same firmware (P20). The errors were on all
 disks attached to the controller, but only on high read activity. Writes
 did not generate the errors. I downgraded the firmware to P18 (which is
 what we are using at work) and the errors went away. Has anyone had success
 with this controller and the P20 firmware?



 Aaron



 On Tue, Nov 25, 2014 at 8:30 AM, Filip Marvan filip.mar...@aira.cz
 wrote:

 Hi Dan,

 thanks for reply.
 Yes, errors are on all 5 disks in the same RAIDZ pool. No problem on other
 disks in a different pool but on the same SAS cable.
 So maybe I just had bad luck with those disks.

 Filip











Re: [OmniOS-discuss] Parity error on path mpt_sas2

2014-12-08 Thread Schweiss, Chip
I've got some new LSI HBAs I'm trying to downgrade from firmware version 20
to 18.

I'm getting errors when trying to downgrade:

Attempting to flash firmware to LSI SAS SAS2308_2(D1) :

Executing Operation: Flash Firmware Image

Firmware Image has a Valid Checksum.
Firmware Version 18.00.00.00
Firmware Image compatible with Controller.

Valid NVDATA Image found.
NVDATA Version 11.00.00.00
Checking for a compatible NVData image...

NVDATA Device ID and Chip Revision match verified.
ERROR: Cannot downgrade NVDATA version 14.00.00.06
   to 11.00.11.00.

ERROR: Failed to get valid NVDATA image from File!

Firmware Image Validation Failed!

Tried downgrading bios:

Attempting to flash Boot Service to LSI SAS SAS2308_2(D1) :

Validating BIOS Image...

BIOS Header Signature is Valid

BIOS Image has a Valid Checksum.

BIOS PCI Structure Signature Valid.

BIOS Image Compatible with the SAS Controller.

Attempting to Flash BIOS Image...

BIOS Version in flash: 07.39.00.00
BIOS Version from File  : 07.35.00.00
Skipping flash since file version is not greater than
existing.

Flash BIOS Image Failed!


I've tried sas2flash from P20 and P18.

Can someone fill me in on the trick to downgrade?

-Chip


On Thu, Nov 27, 2014 at 9:03 AM, Filip Marvan filip.mar...@aira.cz wrote:

 Thank you for your help Aaron! I downgraded firmware to P18 and there are
 no more errors today.

 It seems that there is something wrong with the P20 firmware!



 You saved me a lot of time :)



 Filip





 *From:* Aaron Curry [mailto:asc1...@gmail.com]
 *Sent:* Tuesday, November 25, 2014 5:46 PM
 *To:* Filip Marvan
 *Cc:* Dan McDonald; omnios-discuss@lists.omniti.com
 *Subject:* Re: [OmniOS-discuss] Parity error on path mpt_sas2



 I had this exact same problem recently when setting up a new home
 server... same controller, same firmware (P20). The errors were on all
 disks attached to the controller, but only on high read activity. Writes
 did not generate the errors. I downgraded the firmware to P18 (which is
 what we are using at work) and the errors went away. Has anyone had success
 with this controller and the P20 firmware?



 Aaron



 On Tue, Nov 25, 2014 at 8:30 AM, Filip Marvan filip.mar...@aira.cz
 wrote:

 Hi Dan,

 thanks for reply.
 Yes, errors are on all 5 disks in the same RAIDZ pool. No problem on other
 disks in a different pool but on the same SAS cable.
 So maybe I just had bad luck with those disks.

 Filip









Re: [OmniOS-discuss] anyone doing DirectPath I/O?

2014-12-03 Thread Schweiss, Chip
I've been using DirectPath on ESXi 5.0, with an LSI HBA passed through to an
OpenIndiana server, for almost 2 years.  It's been very stable:

root@zfs01:~# uptime
 21:01pm  up 649 days  6:58,  2 users,  load average: 0.46, 0.30, 0.25


On Wed, Dec 3, 2014 at 6:10 PM, Joseph Boren jbo...@drakecooper.com wrote:

 Greetings,

 I was wondering if anyone was using DirectPath, specifically for exclusive
 use of a drive controller and its attached drives by a specific VM.  I have
 a use case that would seem to be a good fit for this, so I played around with
 a couple of RAID controllers I had, and was able to get one (3ware 9650se)
 configured for DirectPath, but none of the attached drives would show up in
 OmniOS, regardless of how I configured them in the RAID BIOS (JBOD,
 individual disks, etc.).  I know that controller is poorly supported, and I
 was curious if anyone was using DirectPath this way in production, and what
 kind of drive controller/HBA/whatever was working.  Also, any "for the love
 of god, don't do it this way" scenarios?  I seem to be really adept at
 finding and trying those out first.

 Thanks a ton and best regards,

 -jb-
 *Joseph Boren*

 IT Specialist
 *DRAKE COOPER*
 + c: (208) 891-2128 + o: (208) 342-0925
 + 416 S. 8th St., Boise, ID 83702
 + w: drakecooper.com + f: /drakecooper http://facebook.com/drakecooper +
  t: @drakecooper http://twitter.com/drakecooper





