Re: [zfs-discuss] LAST CALL: zfs-discuss is moving Sunday, March 24, 2013

2013-03-24 Thread Hans J. Albertsson

However, the zfs-discuss list seems to be archived at gmane.


On 2013-03-22 22:57, Cindy Swearingen wrote:

I hope to see everyone on the other side...

***

The ZFS discussion list is moving to java.net.

This opensolaris/zfs discussion will not be available after March 24.
There is no way to migrate the existing list to the new list.

The solaris-zfs project is here:

http://java.net/projects/solaris-zfs

See the steps below to join the ZFS project or just the discussion list,
but you must create an account on java.net to join the list.

Thanks, Cindy

1. Create an account on java.net.

https://java.net/people/new

2. When logged in to your java.net account, join the solaris-zfs
project as an Observer by clicking the Join This Project link on the
left side of this page:

http://java.net/projects/solaris-zfs

3. Subscribe to the zfs discussion mailing list here:

http://java.net/projects/solaris-zfs/lists
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Drives going offline in Zpool

2013-03-23 Thread Ram Chander
Hi,

I have a Dell MD1200 connected to two heads (Dell R710). The heads have a
PERC H800 card and the drives are configured as RAID0 virtual disks in the
RAID controller.

One of the drives crashed and was replaced by a spare. Resilvering was
triggered but fails to complete because drives go offline. I have to
reboot the head (R710) and the drives come back online. This happened
repeatedly: the resilver hung at 4% done, the head was rebooted, it hung
again at 27% done, and so on.

The issue happens with both Solaris 11.1 and OmniOS.
It's a 100TB pool with 69TB used. I have critical data and can't afford
data loss.
Can I recover the data anyway (at least partially)?

I have verified there is no hardware issue with the H800 and have also
upgraded the H800 firmware. The issue happens with both heads.
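
For reference, the standard Solaris fault-management and device-error
commands that should show why the drives were retired (a generic sketch,
not output from this system):

# fmadm faulty      (active faults and the devices FMA has retired)
# fmdump -eV        (recent error telemetry: transport/device errors)
# iostat -En        (per-device hard/soft/transport error counters)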

Current OS: Solaris 11.1

Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@12,0 (sd26):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@c,0 (sd20):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@18,0 (sd32):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1c,0 (sd36):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1b,0 (sd35):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1e,0 (sd38):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@19,0 (sd33):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@1d,0 (sd37):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@27,0 (sd47):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone
Mar 22 21:47:55 solaris scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,340e@7/pci1028,1f15@0/sd@26,0 (sd46):
Mar 22 21:47:55 solaris    Command failed to complete...Device is gone

# zpool status -v

  pool: test
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 20 19:13:40 2013
27.4T scanned out of 69.6T at 183M/s, 67h11m to go
2.43T resilvered, 39.32% done
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            c8t0d0                  ONLINE       0     0     0
            c8t1d0                  DEGRADED     0     0     0
            c8t2d0                  DEGRADED     0     0     0
            c8t3d0                  ONLINE       0     0     0
            spare-4                 DEGRADED     0     0     0
              12459181442598970150  UNAVAIL      0     0     0
              c8t45d0               DEGRADED     0     0     0  (resilvering)
          raidz1-1                  ONLINE       0     0     0
            c8t5d0                  ONLINE       0     0     0
            c8t6d0                  ONLINE       0     0     0
            c8t7d0                  ONLINE       0     0     0
            c8t8d0                  ONLINE       0     0     0
            c8t9d0                  ONLINE       0     0     0
          raidz1-3                  DEGRADED     0     0     0
            c8t12d0                 ONLINE       0     0     0
            c8t13d0                 ONLINE       0     0     0
            c8t14d0                 ONLINE       0     0     0
            c8t15d0                 DEGRADED     0     0     0
            c8t16d0                 ONLINE       0     0     0
            c8t17d0                 ONLINE       0     0     0
            c8t18d0                 ONLINE       0     0     0
            c8t19d0                 ONLINE       0     0     0
            c8t20d0                 DEGRADED     0     0     0
            c8t21d0                 DEGRADED     0     0     0
            spare-10                DEGRADED     0     0     0
              c8t22d0               DEGRADED     0     0     0
              c8t47d0               DEGRADED     0

[zfs-discuss] How to enforce probing of all disks?

2013-03-22 Thread Jim Klimov

Hello all,

  I have a kind of lame question here: how can I force the system (OI)
to probe all the HDD controllers and disks that it can find, and be
certain that it has searched everywhere for disks?

  My remotely supported home-NAS PC was unavailable for a while, and
a friend rebooted it for me from a LiveUSB image with SSH (oi_148a).
I can see my main pool disks, but not the old boot (rpool) drive.
Meaning, it does not appear in 'zpool import' nor in 'format' output.
While it is possible that it has finally kicked the bucket, which won't
really be unexpected, I'd like to try and confirm.

  For example, it might fail to spin up or come into contact with
the SATA cable initially - but subsequent probing of the same
controller might just find it. Happened before, too - though
via a reboot and full POST... The friend won't be available for a
few days, and there's no other remote management nor inspection
facility for this box, so I'd like to probe from within OI as much
as I can. Should be an educational quest, too ;)

# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
Slot36                         sata/hp      connected    configured   ok
sata0/0::dsk/c5t0d0            disk         connected    configured   ok
sata0/1::dsk/c5t1d0            disk         connected    configured   ok
sata0/2::dsk/c5t2d0            disk         connected    configured   ok
sata0/3::dsk/c5t3d0            disk         connected    configured   ok
sata0/4::dsk/c5t4d0            disk         connected    configured   ok
sata0/5::dsk/c5t5d0            disk         connected    configured   ok
sata1/0                        sata-port    empty        unconfigured ok
sata1/1                        sata-port    empty        unconfigured ok
... (USB reports follow)

# devfsadm -Cv  -- nothing new found

Nothing of interest in dmesg...

# scanpci -v | grep -i ata
 Intel Corporation 82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA AHCI Controller
 JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller
 JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller

# prtconf -v | grep -i ata
name='ata-dma-enabled' type=string items=1
name='atapi-cd-dma-enabled' type=string items=1
value='ADATA USB Flash Drive'
value='ADATA'
value='ADATA'
name='sata' type=int items=1 dev=none
value='SATA AHCI 1.0 Interface'
dev_link=/dev/cfg/sata1/0
dev_link=/dev/cfg/sata1/1
name='ata-options' type=int items=1
value='atapi'
name='sata' type=int items=1 dev=none
value='\_SB_.PCI0.SATA'
value='SATA AHCI 1.0 Interface'
dev_link=/dev/cfg/sata0/0
dev_link=/dev/cfg/sata0/1
dev_link=/dev/cfg/sata0/2
dev_link=/dev/cfg/sata0/3
dev_link=/dev/cfg/sata0/4
dev_link=/dev/cfg/sata0/5

value='id1,sd@SATA_ST2000DL003-9VT15YD217ZL'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'


value='id1,sd@SATA_ST2000DL003-9VT15YD1XWWB'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'


value='id1,sd@SATA_ST2000DL003-9VT15YD1VLKC'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'


value='id1,sd@SATA_ST2000DL003-9VT15YD21QZL'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'


value='id1,sd@SATA_ST2000DL003-9VT15YD24GCA'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'


value='id1,sd@SATA_ST2000DL003-9VT15YD24GDG'
name='sata-phy' type=int items=1

value='scsiclass,00.vATA.pST2000DL003-9VT1.rCC32' + 
'scsiclass,00.vATA.pST2000DL003-9VT1' + 'scsiclass,00' + 'scsiclass'



This only sees the six ST2000DL003 drives of the main data pool,
and the LiveUSB flash drive...

So - is it possible to try reinitializing and locating connections to
the disk on a commodity motherboard (i.e. no lsiutil, IPMI and such)
using only OI, without rebooting the box?
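
A sketch of what I might try next, using the SATA-specific cfgadm
operations from cfgadm_sata(1M) - sata1/0 below is just the first empty
port from the listing above, so treat the argument as a placeholder:

# cfgadm -x sata_reset_port sata1/0     (reset the port and re-probe the link)
# cfgadm -c connect sata1/0             (re-activate the receptacle)
# cfgadm -c configure sata1/0           (configure the occupant if a disk shows up)
# devfsadm -Cv                          (rebuild /dev links afterwards)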

The pools are not imported, so if I can detach and reload the sata
drivers - I might try that, but I am stumped at how 

[zfs-discuss] LAST CALL: zfs-discuss is moving Sunday, March 24, 2013

2013-03-22 Thread Cindy Swearingen

I hope to see everyone on the other side...

***

The ZFS discussion list is moving to java.net.

This opensolaris/zfs discussion will not be available after March 24.
There is no way to migrate the existing list to the new list.

The solaris-zfs project is here:

http://java.net/projects/solaris-zfs

See the steps below to join the ZFS project or just the discussion list,
but you must create an account on java.net to join the list.

Thanks, Cindy

1. Create an account on java.net.

https://java.net/people/new

2. When logged in to your java.net account, join the solaris-zfs
project as an Observer by clicking the Join This Project link on the
left side of this page:

http://java.net/projects/solaris-zfs

3. Subscribe to the zfs discussion mailing list here:

http://java.net/projects/solaris-zfs/lists
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] This mailing list EOL???

2013-03-21 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
mail-archive.com is an independent third party.

This is one of their FAQ's
http://www.mail-archive.com/faq.html#duration

The Mail Archive has been running since 1998. Archiving services are planned to 
continue indefinitely. We do not plan on ever needing to remove archived 
material. Do not, however, misconstrue these intentions with a warranty of any 
kind. We reserve the right to discontinue service at any time.



From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Deirdre Straughan
Sent: Wednesday, March 20, 2013 5:16 PM
To: Cindy Swearingen; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] This mailing list EOL???

Will the archives of all the lists be preserved? I don't think we've seen a 
clear answer on that (it's possible you haven't, either!).
On Wed, Mar 20, 2013 at 2:14 PM, Cindy Swearingen 
cindy.swearin...@oracle.com wrote:
Hi Ned,

This list is migrating to java.net and will not be available
in its current form after March 24, 2013.

The archive of this list is available here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/

I will provide an invitation to the new list shortly.

Thanks for your patience.

Cindy


On 03/20/13 15:05, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
wrote:
I can't seem to find any factual indication that 
opensolaris.org mailing
lists are going away, and I can't even find the reference to whoever
said it was EOL in a few weeks ... a few weeks ago.

So ... are these mailing lists going bye-bye?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--


best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager


cell 720 371 4107
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD for L2arc

2013-03-21 Thread Ram Chander
Hi,

Can I know how to configure an SSD to be used for L2ARC? Basically I want
to improve read performance.
To increase write performance, will an SSD for the ZIL help? From what I
read on forums, the ZIL is only used for MySQL/transaction-based writes. I
have regular writes only.

Thanks.

Regards,
Ram
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD for L2arc

2013-03-21 Thread Jim Mauro

 Can I know how to configure a SSD to be used for L2arc ? Basically I want to 
 improve read performance.

Read the documentation, specifically the section titled:

Creating a ZFS Storage Pool With Cache Devices


 To increase write performance, will SSD for Zil help ? As I read on forums, 
 Zil is only used for mysql/transaction based writes. I have regular writes 
 only.

That is not correct - the ZIL is used for synchronous writes.

From the documentation:

The ZFS intent log (ZIL) is provided to satisfy POSIX requirements for
synchronous transactions. For example, databases often require their
transactions to be on stable storage devices when returning from a system
call. NFS and other applications can also use fsync() to ensure data
stability.

By default, the ZIL is allocated from blocks within the main pool. However,
better performance might be possible by using separate intent log devices,
such as NVRAM or a dedicated disk.
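
As a quick illustration of that section (the pool and device names below
are placeholders, not from the original post):

# zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 cache c0t4d0   (cache device at pool creation)
# zpool add tank cache c0t4d0                                 (or add one to an existing pool)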


 
 Thanks.
 
 Regards,
 Ram
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD for L2arc

2013-03-21 Thread Jim Klimov

On 2013-03-21 16:24, Ram Chander wrote:

Hi,

Can I know how to configure a SSD to be used for L2arc ? Basically I
want to improve read performance.


The zpool(1M) man page is quite informative on theory and concepts ;)

If your pool already exists, you can prepare the SSD (partition/slice
it) and:
# zpool add POOLNAME cache cXtYdZsS

Likewise, to add a ZIL device you can add a log device, either as
a single disk (slice) or as a mirror of two or more:
# zpool add POOLNAME log cXtYdZsS
# zpool add POOLNAME log mirror cXtYdZsS1 cXtYdZsS2
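
To confirm the devices were picked up, and to watch the cache device fill
and the log device absorb sync writes, the usual status/iostat views work
(POOLNAME as above):

# zpool status POOLNAME
# zpool iostat -v POOLNAME 5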



To increase write performance, will SSD for Zil help ? As I read on
forums, Zil is only used for mysql/transaction based writes. I have
regular writes only.


It may increase performance in two ways:

If you have any apps (including NFS, maybe VMs, iSCSI, etc. - not only
databases) that regularly issue synchronous writes - those which must
be stored on media (not just cached and queued) before the call returns
success - then the ZIL catches these writes instead of the main pool
devices. The ZIL is written as a ring buffer, so its size is proportional
to your pool's throughput - about 3 full-size TXG syncs should fit into
the designated ZIL space. That's usually max bandwidth (X MB/s) times
15 sec (3*5s), or a bit more for peace of mind.
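
(A worked example with made-up numbers: if the pool can sustain roughly
500 MB/s of writes, then 500 MB/s * 15 s = 7.5 GB, so an 8-16 GB slice on
the SSD would already be comfortably oversized for the SLOG.)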

1) If the ZIL device (SLOG) is an SSD, it is presumably quick, so
writes should return quickly and sync IOs are less blocked.

2) If the SLOG is on HDD(s) separate from the main pool, ZIL writes no
longer force the main pool's disk heads to travel to the reserved ZIL area
and back during normal pool IOs - time that would otherwise be stolen from
both reads and writes in the pool. *Possibly*, fragmentation might also be
reduced by having the ZIL outside of the main pool, though that statement
may be technically wrong on my part.

3) As a *speculation*, it is likely that an HDD doing nothing but SLOG
(i.e. a hotspare with a designated slice for ZIL, so it does something
useful while waiting for failover of a larger pool device) would also
give a good boost to performance, since it won't have to seek much.
The rotational latency will still be there, however, limiting reachable
IOPS in comparison to an SSD SLOG.

HTH,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I have two identical Supermicro boxes with 32GB ram. Hardware details at
the end of the message.

They were running OI 151.a.5 for months. The zpool configuration was one
storage zpool with 3 vdevs of 8 disks in RAIDZ2.

The OI installation is absolutely clean. Just next-next-next until done.
All I do is configure the network after install. I don't install or enable
any other services.

Then I added more disks and rebuilt the systems with OI 151.a.7, and this
time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

The systems started crashing really badly. They just disappear from the
network: black and unresponsive console, no error lights, but no activity
indication either. The only way out is to power cycle the system.

There is no pattern to the crashes. It may crash in 2 days or it may crash
in 2 hours.

I upgraded the memory on both systems to 128GB, to no avail. This is the
max memory they can take.

In summary, all I did was upgrade to OI 151.a.7 and reconfigure the zpool.

Any idea what could be the problem?

Thank you

-- Peter

Supermicro X9DRH-iF
Xeon E5-2620 @ 2.0 GHz 6-Core
LSI SAS9211-8i HBA
32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Michael Schuster
Peter,

sorry if this is so obvious that you didn't mention it: Have you checked
/var/adm/messages and other diagnostic tool output?

regards
Michael

On Wed, Mar 20, 2013 at 4:34 PM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details at
 the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash in
 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
Michael Schuster
http://recursiveramblings.wordpress.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Will Murnane
Does the Supermicro IPMI show anything when it crashes?  Does anything show
up in event logs in the BIOS, or in system logs under OI?


On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details at
 the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash in
 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm sorry, I should have mentioned that I can't find any errors in the
logs. The last entry in /var/adm/messages is that I removed the keyboard
after the last reboot, and then it shows the new boot-up messages from when
I boot the system after the crash. The BIOS log is empty. I'm not sure how
to check the IPMI, but IPMI is not configured and I'm not using it.

Just another observation - the crashes are more intense the more data the
system serves (NFS).

I'm looking into firmware upgrades for the LSI now.


On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane will.murn...@gmail.com wrote:

 Does the Supermicro IPMI show anything when it crashes?  Does anything
 show up in event logs in the BIOS, or in system logs under OI?


 On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details at
 the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash
 in 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Michael Schuster
How about crash dumps?

michael

On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood peterwood...@gmail.com wrote:

 I'm sorry. I should have mentioned it that I can't find any errors in the
 logs. The last entry in /var/adm/messages is that I removed the keyboard
 after the last reboot and then it shows the new boot up messages when I
 boot up the system after the crash. The BIOS log is empty. I'm not sure how
 to check the IPMI but IPMI is not configured and I'm not using it.

 Just another observation - the crashes are more intense the more data the
 system serves (NFS).

 I'm looking into FRMW upgrades for the LSI now.


 On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane will.murn...@gmail.com wrote:

 Does the Supermicro IPMI show anything when it crashes?  Does anything
 show up in event logs in the BIOS, or in system logs under OI?


 On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details at
 the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash
 in 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
Michael Schuster
http://recursiveramblings.wordpress.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I'm going to need some help with the crash dumps. I'm not very familiar
with Solaris.

Do I have to enable something to get the crash dumps? Where should I look
for them?

Thanks for the help.


On Wed, Mar 20, 2013 at 8:53 AM, Michael Schuster michaelspriv...@gmail.com
 wrote:

 How about crash dumps?

 michael


 On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood peterwood...@gmail.com wrote:

 I'm sorry. I should have mentioned it that I can't find any errors in the
 logs. The last entry in /var/adm/messages is that I removed the keyboard
 after the last reboot and then it shows the new boot up messages when I
 boot up the system after the crash. The BIOS log is empty. I'm not sure how
 to check the IPMI but IPMI is not configured and I'm not using it.

 Just another observation - the crashes are more intense the more data the
 system serves (NFS).

 I'm looking into FRMW upgrades for the LSI now.


 On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane will.murn...@gmail.com wrote:

 Does the Supermicro IPMI show anything when it crashes?  Does anything
 show up in event logs in the BIOS, or in system logs under OI?


 On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details
 at the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was
 one storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until
 done. All I do is configure the network after install. I don't install or
 enable any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and
 this time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash
 in 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




 --
 Michael Schuster
 http://recursiveramblings.wordpress.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Jim Klimov

On 2013-03-20 17:15, Peter Wood wrote:

I'm going to need some help with the crash dumps. I'm not very familiar
with Solaris.

Do I have to enable something to get the crash dumps? Where should I
look for them?


Typically the kernel crash dumps are created as a result of kernel
panic; also they may be forced by administrative actions like NMI.
They require you to configure a dump volume of sufficient size (see
dumpadm) and a /var/crash which may be a dataset on a large enough
pool - after the reboot the dump data will be migrated there.
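
A minimal sketch of that setup, assuming the usual rpool/dump zvol created
by the installer exists and is large enough (adjust names/sizes to your
layout):

# dumpadm                                  (show the current configuration)
# dumpadm -d /dev/zvol/dsk/rpool/dump      (point dumps at the rpool dump zvol)
# dumpadm -s /var/crash                    (directory savecore writes to after reboot)
# dumpadm -y                               (make sure savecore runs on boot)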

To help with the hangs you can try the BIOS watchdog (which would
require a bmc driver; the one known from OpenSolaris is, alas, not
open-sourced and not redistributable), or a software deadman
timer:

http://www.cuddletech.com/blog/pivot/entry.php?id=1044

http://wiki.illumos.org/display/illumos/System+Hangs
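
(A sketch of the software deadman approach from the first link - the exact
tunables and intervals are described there; this just enables it so a hard
hang turns into a panic, and therefore a dump, instead of a silent freeze.
Add to /etc/system and reboot:)

set snooping=1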

Also, if you configure crash dump on NMI and set up your IPMI card,
then you can likely gain remote access to both the server console
(physical and/or serial) and may be able to trigger the NMI, too.

HTH,
//Jim



Thanks for the help.


On Wed, Mar 20, 2013 at 8:53 AM, Michael Schuster
michaelspriv...@gmail.com mailto:michaelspriv...@gmail.com wrote:

How about crash dumps?

michael


On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood peterwood...@gmail.com
mailto:peterwood...@gmail.com wrote:

I'm sorry. I should have mentioned it that I can't find any
errors in the logs. The last entry in /var/adm/messages is that
I removed the keyboard after the last reboot and then it shows
the new boot up messages when I boot up the system after the
crash. The BIOS log is empty. I'm not sure how to check the IPMI
but IPMI is not configured and I'm not using it.

Just another observation - the crashes are more intense the more
data the system serves (NFS).

I'm looking into FRMW upgrades for the LSI now.


On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane
will.murn...@gmail.com mailto:will.murn...@gmail.com wrote:

Does the Supermicro IPMI show anything when it crashes?
  Does anything show up in event logs in the BIOS, or in
system logs under OI?


On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood
peterwood...@gmail.com mailto:peterwood...@gmail.com wrote:

I have two identical Supermicro boxes with 32GB ram.
Hardware details at the end of the message.

They were running OI 151.a.5 for months. The zpool
configuration was one storage zpool with 3 vdevs of 8
disks in RAIDZ2.

The OI installation is absolutely clean. Just
next-next-next until done. All I do is configure the
network after install. I don't install or enable any
other services.

Then I added more disks and rebuild the systems with OI
151.a.7 and this time configured the zpool with 6 vdevs
of 5 disks in RAIDZ.

The systems started crashing really bad. They
just disappear from the network, black and unresponsive
console, no error lights but no activity indication
either. The only way out is to power cycle the system.

There is no pattern in the crashes. It may crash in 2
days in may crash in 2 hours.

I upgraded the memory on both systems to 128GB at no
avail. This is the max memory they can take.

In summary all I did is upgrade to OI 151.a.7 and
reconfigured zpool.

Any idea what could be the problem.

Thank you

-- Peter

Supermicro X9DRH-iF
Xeon E5-2620 @ 2.0 GHz 6-Core
LSI SAS9211-8i HBA
32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
mailto:zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




--
Michael Schuster
http://recursiveramblings.wordpress.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Hi Jim,

Thanks for the pointers. I'll definitely look into this.


--
Peter Blajev
IT Manager, TAAZ Inc.
Office: 858-597-0512 x125


On Wed, Mar 20, 2013 at 11:29 AM, Jim Klimov jimkli...@cos.ru wrote:

 On 2013-03-20 17:15, Peter Wood wrote:

 I'm going to need some help with the crash dumps. I'm not very familiar
 with Solaris.

 Do I have to enable something to get the crash dumps? Where should I
 look for them?


 Typically the kernel crash dumps are created as a result of kernel
 panic; also they may be forced by administrative actions like NMI.
 They require you to configure a dump volume of sufficient size (see
 dumpadm) and a /var/crash which may be a dataset on a large enough
 pool - after the reboot the dump data will be migrated there.

 To help with the hangs you can try the BIOS watchdog (which would
 require a bmc driver, one which is known from OpenSolaris is alas
 not opensourced and not redistributable), or with a software deadman
 timer:

 http://www.cuddletech.com/blog/pivot/entry.php?id=1044

 http://wiki.illumos.org/display/illumos/System+Hangs

 Also, if you configure crash dump on NMI and set up your IPMI card,
 then you can likely gain remote access to both the server console
 (physical and/or serial) and may be able to trigger the NMI, too.

 HTH,
 //Jim


 Thanks for the help.


 On Wed, Mar 20, 2013 at 8:53 AM, Michael Schuster
 michaelspriv...@gmail.com wrote:

 How about crash dumps?

 michael


 On Wed, Mar 20, 2013 at 4:50 PM, Peter Wood peterwood...@gmail.com wrote:

 I'm sorry. I should have mentioned it that I can't find any
 errors in the logs. The last entry in /var/adm/messages is that
 I removed the keyboard after the last reboot and then it shows
 the new boot up messages when I boot up the system after the
 crash. The BIOS log is empty. I'm not sure how to check the IPMI
 but IPMI is not configured and I'm not using it.

 Just another observation - the crashes are more intense the more
 data the system serves (NFS).

 I'm looking into FRMW upgrades for the LSI now.


 On Wed, Mar 20, 2013 at 8:40 AM, Will Murnane
 will.murn...@gmail.com wrote:

 Does the Supermicro IPMI show anything when it crashes?
   Does anything show up in event logs in the BIOS, or in
 system logs under OI?


 On Wed, Mar 20, 2013 at 11:34 AM, Peter Wood
 peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram.
 Hardware details at the end of the message.

 They were running OI 151.a.5 for months. The zpool
 configuration was one storage zpool with 3 vdevs of 8
 disks in RAIDZ2.

 The OI installation is absolutely clean. Just
 next-next-next until done. All I do is configure the
 network after install. I don't install or enable any
 other services.

 Then I added more disks and rebuild the systems with OI
 151.a.7 and this time configured the zpool with 6 vdevs
 of 5 disks in RAIDZ.

 The systems started crashing really bad. They
 just disappear from the network, black and unresponsive
 console, no error lights but no activity indication
 either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2
 days in may crash in 2 hours.

 I upgraded the memory on both systems to 128GB at no
 avail. This is the max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and
 reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
 

Re: [zfs-discuss] [BULK] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
No problem Trey. Anything will help.

Yes, I did a clean install overwriting the old OS.



  Just to make sure, you actually did an overwrite reinstall with OI151a7
 rather than upgrading the existing OS images?   If you did a pkg
 image-update, you should be able to boot back into the oi151a5 image from
 grub.  Apologies in advance if I'm stating the obvious.

  -- Trey


 On Mar 20, 2013, at 11:34 AM, Peter Wood peterwood...@gmail.com wrote:

   I have two identical Supermicro boxes with 32GB ram. Hardware details
 at the end of the message.

  They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

  The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

  Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

  The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

  There is no pattern in the crashes. It may crash in 2 days in may crash
 in 2 hours.

  I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

  In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

  Any idea what could be the problem.

  Thank you

  -- Peter

  Supermicro X9DRH-iF
  Xeon E5-2620 @ 2.0 GHz 6-Core
  LSI SAS9211-8i HBA
  32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

  ___

 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Jens Elkner
On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
I'm sorry. I should have mentioned it that I can't find any errors in the
logs. The last entry in /var/adm/messages is that I removed the keyboard
after the last reboot and then it shows the new boot up messages when I 
 boot
up the system after the crash. The BIOS log is empty. I'm not sure how to
check the IPMI but IPMI is not configured and I'm not using it.

You definitely should! Plug a cable into the dedicated network port
and configure it (the easiest way for you is probably to jump into the BIOS
and assign the appropriate IP address etc.). Then, for a quick look,
point your browser at the given IP on port 80 (default login is
ADMIN/ADMIN). You can also configure some other details there
(accounts/passwords/roles).

To track the problem, either write a script which polls the parameters
in question periodically, or just install the latest IPMIView and use
it to monitor your sensors ad hoc;
see ftp://ftp.supermicro.com/utility/IPMIView/

Just another observation - the crashes are more intense the more data the
system serves (NFS).
I'm looking into FRMW upgrades for the LSI now.

The latest LSI FW should be P15; for this MB, type 217 (2.17), MB BIOS C28 (1.0b).
However, I doubt that your problem has anything to do with the
SAS controller, OI, or ZFS.

My guess is that either your MB is broken (we had an X9DRH-iF which
instantly disappeared as soon as it got some real load) or you have
a heat problem (watch your CPU temps, e.g. via the IPMI viewer). With 2GHz
that's not very likely, but worth a try (socket placement on this board
is not really smart IMHO).

To test quickly:
- disable all additional, unneeded services in OI which may put some
  load on the machine (like the NFS service, http and so on) and perhaps
  even export unneeded pools (just to be sure)
- fire up your IPMI viewer and look at the sensors (set update to
  10s) or refresh manually, often
- start 'openssl speed -multi 32' and keep watching your CPU temp
  sensors (with 2GHz I guess it takes ~ 12min)

I guess your machine will disappear before the CPUs get really hot
(broken MB). If the CPUs switch off (usually first CPU2 and a little bit
later CPU1) you have a cooling problem. If nothing happens, well, then
it could be an OI or ZFS problem ;-)
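
If you prefer the command line over the Java viewer, a sketch with ipmitool
(the BMC address and credentials below are placeholders; on OI the tool may
need to be installed first):

# ipmitool -I lanplus -H 10.0.0.50 -U ADMIN -P ADMIN sensor     (all sensor readings)
# ipmitool -I lanplus -H 10.0.0.50 -U ADMIN -P ADMIN sel list   (BMC event log)
# while true; do ipmitool -I lanplus -H 10.0.0.50 -U ADMIN -P ADMIN sensor | egrep -i 'temp|fan'; sleep 10; done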

Have fun,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 52768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
Great write-up, Jens.

The chance of two MBs being broken is probably low, but overheating is a
very good point. It was on my to-do list to set up IPMI, and it seems that
now is the best time to do it.

Thanks

On Wed, Mar 20, 2013 at 1:08 PM, Jens Elkner jel+...@cs.uni-magdeburg.de wrote:

 On Wed, Mar 20, 2013 at 08:50:40AM -0700, Peter Wood wrote:
 I'm sorry. I should have mentioned it that I can't find any errors in
 the
 logs. The last entry in /var/adm/messages is that I removed the
 keyboard
 after the last reboot and then it shows the new boot up messages when
 I boot
 up the system after the crash. The BIOS log is empty. I'm not sure
 how to
 check the IPMI but IPMI is not configured and I'm not using it.

 You definitely should! Plugin a cable into the dedicated network port
 and configure it (easiest way for you is probably to jump into the BIOS
 and assign the appropriate IP address etc.). Than, for a quick look,
 point your browser to the given IP port 80 (default login is
 ADMIN/ADMIN). Also you may now configure some other details
 (accounts/passwords/roles).

 To track the problem, either write a script, which polls the parameters
 in question periodically or just install the latest ipmiViewer and use
 this to monitor your sensors ad hoc.
 see ftp://ftp.supermicro.com/utility/IPMIView/

 Just another observation - the crashes are more intense the more data
 the
 system serves (NFS).
 I'm looking into FRMW upgrades for the LSI now.

 Latest LSI FW should be P15, for this MB type 217 (2.17), MB-BIOS C28
 (1.0b).
 However, I doubt, that your problem has anything to do with the
 SAS-ctrl or OI or ZFS.

 My guess is, that either your MB is broken (we had an X9DRH-iF, which
 instantly disappeared as soon as it got some real load) or you have
 a heat problem (watch you cpu temp e.g. via ipmiviewer). With 2GHz
 that's not very likely, but worth a try (socket placement on this board
 is not really smart IMHO).

 To test quickly
 - disable all addtional, unneeded service in OI, which may put some
   load on the machine (like NFS service, http and bla) and perhaps
   even export unneeded pools (just to be sure)
 - fire up your ipmiviewer and look at the sensors (set update to
   10s) or refresh manually often
 - start 'openssl speed -multi 32' and keep watching your cpu temp
   sensors (with 2GHz I guess it takes ~ 12min)

 I guess, your machine disappears before the CPUs getting really hot
 (broken MB). If CPUs switch off (usually first CPU2 and a little bit
 later CPU1) you have a cooling problem. If nothing happens, well, than
 it could be an OI or ZFS problem ;-)

 Have fun,
 jel.
 --
 Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
 Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
 39106 Magdeburg, Germany Tel: +49 391 67 52768
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] This mailing list EOL???

2013-03-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
I can't seem to find any factual indication that opensolaris.org mailing lists 
are going away, and I can't even find the reference to whoever said it was EOL 
in a few weeks ... a few weeks ago.

So ... are these mailing lists going bye-bye?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] This mailing list EOL???

2013-03-20 Thread Cindy Swearingen

Hi Ned,

This list is migrating to java.net and will not be available
in its current form after March 24, 2013.

The archive of this list is available here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/

I will provide an invitation to the new list shortly.

Thanks for your patience.

Cindy

On 03/20/13 15:05, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

I can't seem to find any factual indication that opensolaris.org mailing
lists are going away, and I can't even find the reference to whoever
said it was EOL in a few weeks ... a few weeks ago.

So ... are these mailing lists going bye-bye?



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] This mailing list EOL???

2013-03-20 Thread Deirdre Straughan
Will the archives of all the lists be preserved? I don't think we've seen a
clear answer on that (it's possible you haven't, either!).

On Wed, Mar 20, 2013 at 2:14 PM, Cindy Swearingen 
cindy.swearin...@oracle.com wrote:

 Hi Ned,

 This list is migrating to java.net and will not be available
 in its current form after March 24, 2013.

 The archive of this list is available here:

  http://www.mail-archive.com/zfs-discuss@opensolaris.org/

 I will provide an invitation to the new list shortly.

 Thanks for your patience.

 Cindy


 On 03/20/13 15:05, Edward Ned Harvey 
  (opensolarisisdeadlongliveopensolaris)
 wrote:

 I can't seem to find any factual indication that opensolaris.org mailing
 lists are going away, and I can't even find the reference to whoever
 said it was EOL in a few weeks ... a few weeks ago.

 So ... are these mailing lists going bye-bye?



  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 


best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager


cell 720 371 4107
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] System started crashing hard after zpool reconfigure and OI upgrade

2013-03-20 Thread Peter Wood
I can reproduce the problem. I can crash the system.

Here are the steps I did (some steps may not be needed but I haven't tested
it):

- Clean install of OI 151.a.7 on Supermicro hardware described above (32GB
RAM though, not the 128GB)

- Create 1 zpool, 6 raidz vdevs with 5 drives each

- NFS export a dataset
  zfs set sharenfs=rw=@10.20.1/24 vol01/htmlspace

- Create zfs child dataset
  zfs create vol01/htmlspace/A

  $ zfs get -H sharenfs vol01/htmlspace/A
  vol01/htmlspace/A   sharenfs   rw=@10.20.1/24   inherited from vol01/htmlspace

- Stop NFS sharing for the child dataset

  zfs set sharenfs=off vol01/htmlspace/A

The crash is instant after the sharenfs=off command.

I thought it was a coincidence, so after a reboot I tried it on another
dataset. Instant crash again. I get my prompt back but that's it. The
system is gone after that.

The NFS-exported file systems are not accessed by any system on the
network. They are not in use. That's why I wanted to stop exporting them.
And even if they were in use, this shouldn't crash the system, right?

I can't try the other box because it is heavily in production. At least
not until later tonight.

I thought I'd collect some advice to make each crash as useful as possible.

Any pointers are appreciated.
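
(One concrete option, sketched under the assumption that dumpadm/savecore
are configured as discussed elsewhere in this thread: verify the dump
device before reproducing, and grab a live snapshot of kernel state for
comparison.)

# dumpadm           (confirm a dump device and savecore directory are configured)
# savecore -L       (force a live crash dump of the running system before the test)

After the next panic/reboot, the post-mortem dump should land under
/var/crash for later analysis with mdb.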

Thanks,

-- Peter


On Wed, Mar 20, 2013 at 8:34 AM, Peter Wood peterwood...@gmail.com wrote:

 I have two identical Supermicro boxes with 32GB ram. Hardware details at
 the end of the message.

 They were running OI 151.a.5 for months. The zpool configuration was one
 storage zpool with 3 vdevs of 8 disks in RAIDZ2.

 The OI installation is absolutely clean. Just next-next-next until done.
 All I do is configure the network after install. I don't install or enable
 any other services.

 Then I added more disks and rebuild the systems with OI 151.a.7 and this
 time configured the zpool with 6 vdevs of 5 disks in RAIDZ.

 The systems started crashing really bad. They just disappear from the
 network, black and unresponsive console, no error lights but no activity
 indication either. The only way out is to power cycle the system.

 There is no pattern in the crashes. It may crash in 2 days in may crash in
 2 hours.

 I upgraded the memory on both systems to 128GB at no avail. This is the
 max memory they can take.

 In summary all I did is upgrade to OI 151.a.7 and reconfigured zpool.

 Any idea what could be the problem.

 Thank you

 -- Peter

 Supermicro X9DRH-iF
 Xeon E5-2620 @ 2.0 GHz 6-Core
 LSI SAS9211-8i HBA
 32x 3TB Hitachi HUS723030ALS640, SAS, 7.2K

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Please join us on the new zfs discuss list on java.net

2013-03-20 Thread Cindy Swearingen

Hi Everyone,

The ZFS discussion list is moving to java.net.

This opensolaris/zfs discussion will not be available after March 24.
There is no way to migrate the existing list to the new list.

The solaris-zfs project is here:

http://java.net/projects/solaris-zfs

See the steps below to join the ZFS project or just the discussion list,
but you must create an account on java.net to join the list.

Thanks, Cindy

1. Create an account on java.net.

https://java.net/people/new

2. When logged in to your java.net account, join the solaris-zfs
project as an Observer by clicking the Join This Project link on the
left side of this page:

http://java.net/projects/solaris-zfs

3. Subscribe to the zfs discussion mailing list here:

http://java.net/projects/solaris-zfs/lists
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Ian Collins

Andrew Werchowiecki wrote:


Thanks for the info about slices, I may give that a go later on. I’m 
not keen on that because I have clear evidence (as in zpools set up 
this way, right now, working, without issue) that GPT partitions of 
the style shown above work and I want to see why it doesn’t work in my 
set up rather than simply ignoring and moving on.




Didn't you read Richard's post? You can have only one Solaris partition 
at a time.


Your original example failed when you tried to add a second.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Cindy Swearingen

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.

Adding different slices from c25t10d1 as both log and cache devices
would need the s* identifier, but you've already added the entire
c25t10d1 as the log device. A better configuration would be to use
c25t10d1 for the log and c25t9d1 for the cache, or to provide some spares
for this large pool.

After you remove the log devices, re-add like this:

# zpool add aggr0 log c25t10d1
# zpool add aggr0 cache c25t9d1
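
(For the removal step itself - a sketch assuming the log was added as the
whole disk c25t10d1, as described above:)

# zpool remove aggr0 c25t10d1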

You might review the ZFS recommendation practices section, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html#storage-2

See example 3-4 for adding a cache device, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/gayrd.html#gazgw

Always have good backups.

Thanks, Cindy



On 03/18/13 23:23, Andrew Werchowiecki wrote:

I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0
usr
wm
64
4194367e
1
usr
wm
4194368
117214990
label
1
y

Total disk size is 9345 cylinders
Cylinder size is 12544 (512 byte) blocks

                                           Cylinders
   Partition   Status   Type          Start   End   Length    %
   =========   ======   ===========   =====   ===   ======   ===
       1                EFI               0  9345     9346   100

partition print

Current partition table (original):

Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                64       2.00GB          4194367
  1        usr    wm           4194368      53.89GB        117214990
  2 unassigned    wm                 0           0                0
  3 unassigned    wm                 0           0                0
  4 unassigned    wm                 0           0                0
  5 unassigned    wm                 0           0                0
  6 unassigned    wm                 0           0                0
  8   reserved    wm         117214991       8.00MB        117231374

This isn’t the output from when I did it but it is exactly the same
steps that I followed.

Thanks for the info about slices, I may give that a go later on. I’m not
keen on that because I have clear evidence (as in zpools set up this
way, right now, working, without issue) that GPT partitions of the style
shown above work and I want to see why it doesn’t work in my set up
rather than simply ignoring and moving on.

*From:*Fajar A. Nugraha [mailto:w...@fajar.net]
*Sent:* Sunday, 17 March 2013 3:04 PM
*To:* Andrew Werchowiecki
*Cc:* zfs-discuss@opensolaris.org
*Subject:* Re: [zfs-discuss] partioned cache devices

On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki
andrew.werchowie...@xpanse.com.au wrote:

I understand that p0 refers to the whole disk... in the logs I
pasted in I'm not attempting to mount p0. I'm trying to work out why
I'm getting an error attempting to mount p2, after p1 has
successfully mounted. Further, this has been done before on other
systems in the same hardware configuration in the exact same
fashion, and I've gone over the steps trying to make sure I haven't
missed something but can't see a fault.

How did you create the partition? Are those marked as solaris partition,
or something else (e.g. fdisk on linux use type 83 by default).

I'm not keen on using Solaris slices because I don't have an
understanding of what that does to the pool's OS interoperability.

Linux can read solaris slice and import solaris-made pools just fine, as
long as you're using compatible zpool version (e.g. zpool version 28).

--

Fajar



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Cindy Swearingen

Hi Hans,

Start with the ZFS Admin Guide, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/index.html

Or, start with your specific questions.

Thanks, Cindy

On 03/19/13 03:30, Hans J. Albertsson wrote:

as used on Illumos?

I've seen a few tutorials written by people who obviously are very
action oriented; afterwards you find you have worn your keyboard down a
bit and not learned a lot at all, at least not in the sense of
understanding what zfs is and what it does and why things are the way
they are.

I'm looking for something that would make me afterwards understand what,
say, commands like zpool import ... or zfs send ... actually do, and
some idea as to why, so I can begin to understand ZFS in a way that
allows me to make educated guesses on how to perform tasks I haven't
tried before.
And mostly without having to ask around for days on end.

For SOME part of zfs I'm already there, but only for the things I had to
do more than twice or so while managing the Swedish lab at Sun Micro.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Deirdre Straughan
There are links to videos and other materials here:
http://wiki.smartos.org/display/DOC/ZFS

Not as organized as I'd like...


On Tue, Mar 19, 2013 at 2:30 AM, Hans J. Albertsson 
hans.j.alberts...@branneriet.se wrote:

 as used on Illumos?

 I've seen a few tutorials written by people who obviously are very action
 oriented; afterwards you find you have worn your keyboard down a bit and
 not learned a lot at all, at least not in the sense of understanding what
 zfs is and what it does and why things are the way they are.

 I'm looking for something that would make me afterwards understand what,
 say, commands like  zpool import ... or zfs send ... actually do, and some
 idea as to why, so I can begin to understand ZFS in a way that allows me to
 make educated guesses on how to perform tasks I haven't tried before.
 And mostly without having to ask around for days on end.

 For SOME part of zfs I'm already there, but only for the things I had to
 do more than twice or so while managing the Swedish lab at Sun Micro.


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 


best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager


cell 720 371 4107
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

Andrew Werchowiecki wrote:


 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks
 
                        Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0  9345    9346    100


You only have a p1 (and for a GPT/EFI labeled disk, you can only
have p1 - no other FDISK partitions are allowed).


partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)
 
Part      Tag    Flag     First Sector        Size        Last Sector
  0       usr    wm                 64       2.00GB          4194367
  1       usr    wm            4194368      53.89GB        117214990
  2  unassigned  wm                  0           0                0
  3  unassigned  wm                  0           0                0
  4  unassigned  wm                  0           0                0
  5  unassigned  wm                  0           0                0
  6  unassigned  wm                  0           0                0
  8   reserved   wm          117214991       8.00MB        117231374


You have an s0 and s1.

This isn’t the output from when I did it but it is exactly the same 
steps that I followed.
 
Thanks for the info about slices, I may give that a go later on. I’m not 
keen on that because I have clear evidence (as in zpools set up this 
way, right now, working, without issue) that GPT partitions of the style 
shown above work and I want to see why it doesn’t work in my set up 
rather than simply ignoring and moving on.


You would have to blow away the partitioning you have, and create an FDISK
partitioned disk (not EFI), and then create a p1 and p2 partition. (Don't
use the 'partition' subcommand, which confusingly creates solaris slices.)
Give the FDISK partitions a partition type which nothing will recognise,
such as 'other', so that nothing will try and interpret them as OS partitions.
Then you can use them as raw devices, and they should be portable between
OS's which can handle FDISK partitioned devices.
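
For example, a rough sketch only -- the fdisk menu choices vary by release,
and the pool and device names are just the ones that appear earlier in this
thread:

format -e /dev/rdsk/c5t0d0p0
  fdisk     # in the menu: delete the existing EFI partition, then create two
            # primary partitions and give them a type such as "Other" so no
            # OS tries to interpret them
# afterwards the two partitions show up as c5t0d0p1 and c5t0d0p2 and can be
# handed to zpool as raw devices, e.g.:
zpool add aggr0 log c5t0d0p1
zpool add aggr0 cache c5t0d0p2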

--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 20:38, Cindy Swearingen wrote:

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.


I disagree; at least, I've always thought differently:
the d device is the whole disk denomination, with a
unique number for a particular controller link (c+t).

The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition p0 stands for the table
itself (i.e. to manage partitioning), and the rest kind
of depends. In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
slices in ZFS and format utility.

I believe, Solaris-based OSes accessing a p-named
partition and an s-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.

Also, if a whole disk is given to ZFS (and for OSes
other than the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses cXtYdZ as a
pool component, since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also allows ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a crime in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...

This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.

On my old home NAS with OpenSolaris I certainly did have
MBR partitions on the rpool intended initially for some
dual-booted OSes, but repurposed as L2ARC and ZIL devices
for the storage pool on other disks, when I played with
that technology. Didn't gain much with a single spindle ;)

HTH,
//Jim Klimov

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

On 03/19/13 20:27, Jim Klimov wrote:

I disagree; at least, I've always thought differently:
the d device is the whole disk denomination, with a
unique number for a particular controller link (c+t).

The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition p0 stands for the table
itself (i.e. to manage partitioning),


p0 is the whole disk regardless of any partitioning.
(Hence you can use p0 to access any type of partition table.)


and the rest kind
of depends. In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
slices in ZFS and format utility.


The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


I believe, Solaris-based OSes accessing a p-named
partition and an s-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.


No, you'll see just p0 (whole disk), and p1 (whole disk
less space for the backwards compatible FDISK partitioning).


Also, if a whole disk is given to ZFS (and for OSes
other than the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses cXtYdZ as a
pool component,


For an EFI disk, the device name without a final p* or s*
component is the whole EFI partition. (It's actually the
s7 slice minor device node, but the s7 is dropped from
the device name to avoid the confusion we had with s2
on SMI labeled disks being the whole SMI partition.)


since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also allows ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a crime in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...


Right.


This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.


There's a simpler way to think of it on x86.
You always have FDISK partitioning (p1, p2, p3, p4).
You can then have SMI or GPT/EFI slices (both called s0, s1, ...)
in an FDISK partition of the appropriate type.
With SMI labeling, s2 is by convention the whole Solaris FDISK
partition (although this is not enforced).
With EFI labeling, s7 is enforced as the whole EFI FDISK partition,
and so the trailing s7 is dropped off the device name for
clarity.

This simplicity is brought about because the GPT spec requires
that backwards compatible FDISK partitioning is included, but
with just 1 partition assigned.
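
Put concretely (the disk name below is made up; only the naming convention
matters):

# For a hypothetical x86 disk c0t0d0:
#   /dev/dsk/c0t0d0p0        the whole disk, whatever the label
#   /dev/dsk/c0t0d0p1 - p4   the four primary FDISK partitions
#   /dev/dsk/c0t0d0s0 - s7   SMI or EFI slices inside the Solaris/EFI partition
#   /dev/dsk/c0t0d0          on an EFI-labeled disk, the whole EFI partition
ls -l /dev/dsk/c0t0d0*       # lists all of the above device nodes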

--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 22:07, Andrew Gabriel wrote:

The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


Okay, I guess I got entangled in terminology now ;)
Anyhow, your words are not all news to me, though my write-up
was likely misleading to unprepared readers... sigh... Thanks
for the clarifications and deeper details that I did not know!

So, we can concur that GPT does indeed include the fake MBR
header with one EFI partition which addresses the smaller of
2TB (MBR limit) or disk size, minus a few sectors for the GPT
housekeeping. Inside the EFI partition are defined the GPT,
um, partitions (represented as slices in Solaris). This is
after all a GUID *Partition* Table, and that's how parted
refers to them too ;)

Notably, there are also unportable tricks to fool legacy OSes
and bootloaders into addressing the same byte ranges via both
MBR entries (forged manually and abusing the GPT/EFI spec) and
proper GPT entries, as partitions in the sense of each table.

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-discuss Digest, Vol 89, Issue 12

2013-03-18 Thread Kristoffer Sheather @ CloudCentral
You could always use 40-gigabit between the two storage systems which would 
speed things dramatically, or back to back 56-gigabit IB.


 From: zfs-discuss-requ...@opensolaris.org
Sent: Monday, March 18, 2013 11:01 PM
To: zfs-discuss@opensolaris.org
Subject: zfs-discuss Digest, Vol 89, Issue 12

Send zfs-discuss mailing list submissions to
zfs-discuss@opensolaris.org

To subscribe or unsubscribe via the World Wide Web, visit
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
or, via email, send a message with subject or body 'help' to
zfs-discuss-requ...@opensolaris.org

You can reach the person managing the list at
zfs-discuss-ow...@opensolaris.org

When replying, please edit your Subject line so it is more specific
than Re: Contents of zfs-discuss digest...

Today's Topics:

1. Re: [zfs] Re:  Petabyte pool? (Richard Yao)
2. Re: [zfs] Re:  Petabyte pool? (Trey Palmer)

--

Message: 1
Date: Sat, 16 Mar 2013 08:23:07 -0400
From: Richard Yao r...@gentoo.org
To: z...@lists.illumos.org
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] [zfs] Re:  Petabyte pool?
Message-ID: 5144642b.1030...@gentoo.org
Content-Type: text/plain; charset=iso-8859-1

On 03/16/2013 12:57 AM, Richard Elling wrote:
 On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?
 
 Don't forget about backups :-)
  -- richard

Transferring 1 PB over a 10 gigabit link will take at least 10 days when
overhead is taken into account. The backup system should have a
dedicated 10 gigabit link at the minimum and using incremental send/recv
will be extremely important.


--

Message: 2
Date: Sat, 16 Mar 2013 01:30:41 -0400 (EDT)
From: Trey Palmer t...@nerdmagic.com
To: z...@lists.illumos.org z...@lists.illumos.org
Cc: z...@lists.illumos.org z...@lists.illumos.org,
zfs-discuss@opensolaris.org zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] [zfs] Re:  Petabyte pool?
Message-ID: 1ce7bf11-6e42-421e-b136-14c0d557d...@nerdmagic.com
Content-Type: text/plain;   charset=us-ascii

I know it's heresy these days, but given the I/O throughput you're looking 
for and the amount you're going to spend on disks, a T5-2 could make sense 
when they're released (I think) later this month.

Crucial sells RAM they guarantee for use in SPARC T-series, and since 
you're at an edu the academic discount is 35%.   So a T4-2 with 512GB RAM 
could be had for under $35K shortly after release, 4-5 months before the E5 
Xeon was released.  It seemed a surprisingly good deal to me.

The T5-2 has 32x3.6GHz cores, 256 threads and ~150GB/s aggregate memory 
bandwidth.   In my testing a T4-1 can compete with a 12-core E-5 box on I/O 
and memory bandwidth, and this thing is about 5 times bigger than the T4-1. 
  It should have at least 10 PCIe's and will take 32 DIMMs minimum, maybe 
64.  And is likely to cost you less than $50K with aftermarket RAM.

-- Trey

On Mar 15, 2013, at 10:35 PM, Marion Hakanson hakan...@ohsu.edu wrote:

 Ray said:
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual 
pathed
 to a couple of LSI SAS switches.
 Marion said:
 How many HBA's in the R720?
 Ray said:
 We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).
 
 Sounds similar in approach to the Aberdeen product another sender 
referred to,
 with SAS switch layout:
  http://www.aberdeeninc.com/images/1-up-petarack2.jpg
 
 One concern I had is that I compared our SuperMicro JBOD with 40x 4TB 
drives
 in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool 
layout
 on a 40-slot server with 40x SATA drives in it.  But the server uses no
 expanders, instead using SAS-to-SATA octopus cables to connect the 
drives
 directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).
 
 What I found was that the internal pool was significantly faster for 
both
 sequential and random I/O than the pool on the external JBOD.
 
 My conclusion was that I would not want to exceed ~48 drives on a single
 8-port SAS HBA.  So I thought that running the I/O of all your hundreds
 of drives through only two HBA's would be a bottleneck.
 
 LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
 for that card in an x8 PCIe-2.0 slot.  Sure, the newer 9207-8e is rated
 at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
 8 SAS ports going at 4800MBytes/sec.
 
 Yes, I know the disks probably can't go that fast.  But in my tests
 above, the internal 40-disk pool measures 2000MBytes/sec sequential
 reads and writes, 

Re: [zfs-discuss] partioned cache devices

2013-03-18 Thread Andrew Werchowiecki
I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0
usr
wm
64
4194367e
1
usr
wm
4194368
117214990
label
1
y



 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks

                        Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0  9345    9346    100

partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0       usr    wm                 64       2.00GB          4194367
  1       usr    wm            4194368      53.89GB        117214990
  2  unassigned  wm                  0           0                0
  3  unassigned  wm                  0           0                0
  4  unassigned  wm                  0           0                0
  5  unassigned  wm                  0           0                0
  6  unassigned  wm                  0           0                0
  8   reserved   wm          117214991       8.00MB        117231374

This isn't the output from when I did it but it is exactly the same steps that 
I followed.

Thanks for the info about slices, I may give that a go later on. I'm not keen 
on that because I have clear evidence (as in zpools set up this way, right now, 
working, without issue) that GPT partitions of the style shown above work and I 
want to see why it doesn't work in my set up rather than simply ignoring and 
moving on.

From: Fajar A. Nugraha [mailto:w...@fajar.net]
Sent: Sunday, 17 March 2013 3:04 PM
To: Andrew Werchowiecki
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] partioned cache devices

On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:
I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
not attempting to mount p0. I'm trying to work out why I'm getting an error 
attempting to mount p2, after p1 has successfully mounted. Further, this has 
been done before on other systems in the same hardware configuration in the 
exact same fashion, and I've gone over the steps trying to make sure I haven't 
missed something but can't see a fault.

How did you create the partition? Are those marked as solaris partition, or 
something else (e.g. fdisk on linux use type 83 by default).

I'm not keen on using Solaris slices because I don't have an understanding of 
what that does to the pool's OS interoperability.


Linux can read solaris slice and import solaris-made pools just fine, as long 
as you're using compatible zpool version (e.g. zpool version 28).

--
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-17 Thread Fajar A. Nugraha
On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:

 I understand that p0 refers to the whole disk... in the logs I pasted in
 I'm not attempting to mount p0. I'm trying to work out why I'm getting an
 error attempting to mount p2, after p1 has successfully mounted. Further,
 this has been done before on other systems in the same hardware
 configuration in the exact same fashion, and I've gone over the steps
 trying to make sure I haven't missed something but can't see a fault.


How did you create the partition? Are those marked as solaris partition, or
something else (e.g. fdisk on linux use type 83 by default).

I'm not keen on using Solaris slices because I don't have an understanding
 of what that does to the pool's OS interoperability.



Linux can read solaris slice and import solaris-made pools just fine, as
long as you're using compatible zpool version (e.g. zpool version 28).

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Re: Petabyte pool?

2013-03-17 Thread Richard Yao
On 03/16/2013 12:57 AM, Richard Elling wrote:
 On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?
 
 Don't forget about backups :-)
  -- richard

Transferring 1 PB over a 10 gigabit link will take at least 10 days when
overhead is taken into account. The backup system should have a
dedicated 10 gigabit link at the minimum and using incremental send/recv
will be extremely important.
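
The back-of-the-envelope version, assuming roughly 1 GB/s of effective
throughput on a 10 gigabit link after protocol overhead:

# 1 PB divided by ~1 GB/s, first in seconds and then in days
echo $(( 1000000000000000 / 1000000000 ))   # ~1,000,000 seconds
echo $(( 1000000 / 86400 ))                 # ~11 days, i.e. 10+ days in practice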



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Re: Petabyte pool?

2013-03-17 Thread Trey Palmer
I know it's heresy these days, but given the I/O throughput you're looking for 
and the amount you're going to spend on disks, a T5-2 could make sense when 
they're released (I think) later this month.

Crucial sells RAM they guarantee for use in SPARC T-series, and since you're at 
an edu the academic discount is 35%.   So a T4-2 with 512GB RAM could be had 
for under $35K shortly after release, 4-5 months before the E5 Xeon was 
released.  It seemed a surprisingly good deal to me.

The T5-2 has 32x3.6GHz cores, 256 threads and ~150GB/s aggregate memory 
bandwidth.   In my testing a T4-1 can compete with a 12-core E-5 box on I/O and 
memory bandwidth, and this thing is about 5 times bigger than the T4-1.   It 
should have at least 10 PCIe's and will take 32 DIMMs minimum, maybe 64.  And 
is likely to cost you less than $50K with aftermarket RAM.

-- Trey



On Mar 15, 2013, at 10:35 PM, Marion Hakanson hakan...@ohsu.edu wrote:

 Ray said:
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
 to a couple of LSI SAS switches.
 Marion said:
 How many HBA's in the R720?
 Ray said:
 We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).
 
 Sounds similar in approach to the Aberdeen product another sender referred to,
 with SAS switch layout:
  http://www.aberdeeninc.com/images/1-up-petarack2.jpg
 
 One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives
 in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout
 on a 40-slot server with 40x SATA drives in it.  But the server uses no
 expanders, instead using SAS-to-SATA octopus cables to connect the drives
 directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).
 
 What I found was that the internal pool was significantly faster for both
 sequential and random I/O than the pool on the external JBOD.
 
 My conclusion was that I would not want to exceed ~48 drives on a single
 8-port SAS HBA.  So I thought that running the I/O of all your hundreds
 of drives through only two HBA's would be a bottleneck.
 
 LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
 for that card in an x8 PCIe-2.0 slot.  Sure, the newer 9207-8e is rated
 at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
 8 SAS ports going at 4800MBytes/sec.
 
 Yes, I know the disks probably can't go that fast.  But in my tests
 above, the internal 40-disk pool measures 2000MBytes/sec sequential
 reads and writes, while the external 40-disk JBOD measures at 1500
 to 1700 MBytes/sec.  Not a lot slower, but significantly slower, so
 I do think the number of HBA's makes a difference.
 
 At the moment, I'm leaning toward piling six, eight, or ten HBA's into
 a server, preferably one with dual IOH's (thus two PCIe busses), and
 connecting dual-path JBOD's in that manner.
 
 I hadn't looked into SAS switches much, but they do look more reliable
 than daisy-chaining a bunch of JBOD's together.  I just haven't seen
 how to get more bandwidth through them to a single host.
 
 Regards,
 
 Marion
 
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-16 Thread Marion Hakanson
hakan...@ohsu.edu said:
 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other than
 as much as will fit (:-).  Then again, I've been waiting for something like
 pNFS/NFSv4.1 to be usable for gluing together multiple NFS servers into a
 single global namespace, without any sign of that happening anytime soon.
 
richard.ell...@gmail.com said:
 NFS v4 or DFS (or even clever sysadmin + automount) offers single namespace
 without needing the complexity of NFSv4.1, lustre, glusterfs, etc. 

Been using NFSv4 since it showed up in Solaris-10 FCS, and it is true
that I've been clever enough (without automount -- I like my computers
to be as deterministic as possible, thank you very much :-) for our
NFS clients to see a single directory-tree namespace which abstracts
away the actual server/location of a particular piece of data.

However, we find it starts getting hard to manage when a single project
(think directory node) needs more space than their current NFS server
will hold.  Or perhaps what you're getting at above is even more clever
than I have been to date, and is eluding me at the moment.  I did see
someone mention NFSv4 referrals recently, maybe that would help.

Plus, believe it or not, some of our customers still insist on having the
server name in their path hierarchy for some reason, like /home/mynfs1/,
/home/mynfs2/, and so on.  Perhaps I've just not been persuasive enough
yet (:-).


richard.ell...@gmail.com said:
 Don't forget about backups :-)

I was hoping I could get by with telling them to buy two of everything.

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-16 Thread Bob Friesenhahn

On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:


Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within 1 rack comfortably and provide 1 PB storage..


What does one do for power?  What are the power requirements when the 
system is first powered on?  Can drive spin-up be staggered between 
JBOD chassis?  Does the server need to be powered up last so that it 
does not time out on the zfs import?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-16 Thread Jim Klimov

On 2013-03-16 15:20, Bob Friesenhahn wrote:

On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:


Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within 1 rack comfortably and provide 1 PB storage..


What does one do for power?  What are the power requirements when the
system is first powered on?  Can drive spin-up be staggered between JBOD
chassis?  Does the server need to be powered up last so that it does not
time out on the zfs import?


I guess you can use managed PDUs like those from APC (many models for
varied socket types and amounts); they can be scripted on an advanced
level, and on a basic level I think delays can be just configured
per-socket to make the staggered startup after giving power from the
wall (UPS) regardless of what the boxes' individual power sources can
do. Conveniently, they also allow to do a remote hard-reset of hung
boxes without walking to the server room ;)

My 2c,
//Jim Klimov

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-16 Thread Jim Klimov

On 2013-03-16 15:20, Bob Friesenhahn wrote:

On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:


Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within 1 rack comfortably and provide 1 PB storage..


What does one do for power?  What are the power requirements when the
system is first powered on?  Can drive spin-up be staggered between JBOD
chassis?  Does the server need to be powered up last so that it does not
time out on the zfs import?



Giving this question a second thought, I think JBODs should spin-up
quickly (i.e. when power is given) while the server head(s) take time
to pass POST, initialize their HBAs and other stuff. Booting 8 JBODs,
one every 15 seconds to complete a typical spin-up power draw, would
take a couple of minutes. It is likely that a server booted along with
the first JBOD won't get to importing the pool this quickly ;)

Anyhow, with such a system attention should be given to redundant power
and cooling, including redundant UPSes preferably fed from different
power lines going into the room.

This does not seem like a fantastic power sucker, however. 480 drives at
15W would consume 7200W; add a bit for processor/RAM heads (perhaps
a kW?) and this would still fit into 8-10kW, so a couple of 15kVA UPSes
(or more smaller ones) should suffice including redundancy. This might
overall exceed a rack in size though. But for power/cooling this seems
like a standard figure for a 42U rack or just a bit more.

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-16 Thread Tim Cook
On Sat, Mar 16, 2013 at 2:27 PM, Jim Klimov jimkli...@cos.ru wrote:

 On 2013-03-16 15:20, Bob Friesenhahn wrote:

 On Sat, 16 Mar 2013, Kristoffer Sheather @ CloudCentral wrote:

  Well, off the top of my head:

 2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
 8 x 60-Bay JBOD's with 60 x 4TB SAS drives
 RAIDZ2 stripe over the 8 x JBOD's

 That should fit within 1 rack comfortably and provide 1 PB storage..


 What does one do for power?  What are the power requirements when the
 system is first powered on?  Can drive spin-up be staggered between JBOD
 chassis?  Does the server need to be powered up last so that it does not
 time out on the zfs import?


 I guess you can use managed PDUs like those from APC (many models for
 varied socket types and amounts); they can be scripted on an advanced
 level, and on a basic level I think delays can be just configured
 per-socket to make the staggered startup after giving power from the
 wall (UPS) regardless of what the boxes' individual power sources can
 do. Conveniently, they also allow to do a remote hard-reset of hung
 boxes without walking to the server room ;)

 My 2c,
 //Jim Klimov


Any modern JBOD should have the intelligence built in to stagger drive
spin-up.  I wouldn't spend money on one that didn't.  There's really no
need to stagger the JBOD power-up at the PDU.

As for the head, yes it should have a delayed power on which you can
typically set in the BIOS.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-16 Thread Schweiss, Chip
I just recently built an OpenIndiana 151a7 system that is currently 1/2 PB
that will be expanded to 1 PB as we collect imaging data for the Human
Connectome Project at Washington University in St. Louis.  It is very much
like your use case as this is an offsite backup system that will write once
and read rarely.

It has displaced a BlueArc DR system because their mechanisms for syncing
over distances could not keep up with our data generation rate.   The fact
that it cost 5x as much per TB as the homebrew helped the decision also.

It is currently 180 4TB SAS Seagate Constellations in 4 Supermicro JBODs.
  The JBODs are currently in two branches, cascading only once.   When
expanded, 4 JBODs will be on each branch.  The pool is configured as 9 vdevs
of 19 drives in raidz3.   The remaining disks are configured as hot
spares.  Metadata only is cached in 128GB RAM and 2 480GB Intel 520 SSDs
for L2ARC.  Sync (ZIL) is turned off since the worst that would happen is
that we would need to rerun an rsync job.
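
For readers trying to picture that layout, a heavily abbreviated sketch of
what such a pool looks like at creation time -- the device names are made up
and this is not the actual build script:

zpool create tank \
    raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 \
           c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0 c1t16d0 \
           c1t17d0 c1t18d0 \
    spare c1t19d0 \
    cache c2t0d0 c2t1d0                # the two SSDs for L2ARC
# ...the other eight 19-disk raidz3 vdevs are added the same way with zpool add
zfs set primarycache=metadata tank     # cache metadata only in ARC...
zfs set secondarycache=metadata tank   # ...and in L2ARC
zfs set sync=disabled tank             # the "ZIL turned off" mentioned above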

Two identical servers were built for a cold standby configuration.   Since
it is a DR system the need for a hot standby was ruled out since even
several hours downtime would not be an issue.  Each server is fitted with 2
LSI 9207-8e HBAs configured as redundant multipath to the JBODs.

Before putting in into service I ran several iozone tests to benchmark the
pool.   Even with really fat vdevs the performance is impressive.   If
you're interested in that data let me know.It has many hours of idle
time each day so additional performance tests are not out of the question
either.

Actually I should say I designed and configured the system.  The system was
assembled by a colleague at UMINN.   If you would like more details on the
hardware I have a very detailed assembly doc I wrote and would be happy to
share.

The system receives daily rsyncs from our production BlueArc system.   The
rsyncs are split into 120 parallel rsync jobs.  This overcomes the latency
slowdown TCP suffers from, and we see total throughput between
500-700Mb/s.  The BlueArc has 120TB of 15k SAS tiered to NL-SAS.  All
metadata is on the SAS pool.   The ZFS system outpaces the BlueArc on
metadata when rsync does its tree walk.
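
Not the actual scripts in use here, but a minimal illustration of the
parallel-rsync idea (hostnames and paths are hypothetical, and the -P option
assumes GNU xargs):

cd /export/bluearc     # source tree, split at the top level
ls -d * | xargs -n1 -P120 -I{} rsync -a {} backuphost:/tank/backup/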

Given all the safeguards built into ZFS, I would not hesitate to build a
production system at the multi-petabyte scale.   If the channel to the disks is
no longer available it will simply stop writing and the data will be safe.
Given the redundant paths, power supplies, etc, the odds of that happening
are very unlikely.  The single points of failure left when running a single
server remain at the motherboard, CPU and RAM level.   Build a hot standby
server and human error becomes the most likely failure.

-Chip

On Fri, Mar 15, 2013 at 8:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:

 Greetings,

 Has anyone out there built a 1-petabyte pool?  I've been asked to look
 into this, and was told low performance is fine, workload is likely
 to be write-once, read-occasionally, archive storage of gene sequencing
 data.  Probably a single 10Gbit NIC for connectivity is sufficient.

 We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
 using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
 Back-of-the-envelope might suggest stacking up eight to ten of those,
 depending if you want a raw marketing petabyte, or a proper power-of-two
 usable petabyte.

 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other
 than as much as will fit (:-).  Then again, I've been waiting for
 something like pNFS/NFSv4.1 to be usable for gluing together multiple
 NFS servers into a single global namespace, without any sign of that
 happening anytime soon.

 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?

 Thanks and regards,

 Marion


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-16 Thread Andrew Werchowiecki
It's a home setup, the performance penalty from splitting the cache devices is 
non-existent, and that workaround sounds like a pretty crazy amount of 
overhead when I could instead just have a mirrored slog.

I'm less concerned about wasted space, more concerned about amount of SAS ports 
I have available.

I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
not attempting to mount p0. I'm trying to work out why I'm getting an error 
attempting to mount p2, after p1 has successfully mounted. Further, this has 
been done before on other systems in the same hardware configuration in the 
exact same fashion, and I've gone over the steps trying to make sure I haven't 
missed something but can't see a fault. 

I'm not keen on using Solaris slices because I don't have an understanding of 
what that does to the pool's OS interoperability. 

From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
[opensolarisisdeadlongliveopensola...@nedharvey.com]
Sent: Friday, 15 March 2013 8:44 PM
To: Andrew Werchowiecki; zfs-discuss@opensolaris.org
Subject: RE: partioned cache devices

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki

 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$

 I have two SSDs in the system, I've created an 8gb partition on each drive for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.

Sounds like you're probably running into confusion about how to partition the 
drive.  If you create fdisk partitions, they will be accessible as p0, p1, p2, 
but I think p0 unconditionally refers to the whole drive, so the first 
partition is p1, and the second is p2.

If you create one big Solaris fdisk partition and then slice it via partition 
where s2 is typically the encompassing slice, and people usually use s1 and s2 
and s6 for actual slices, then they will be accessible via s1, s2, s6

Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
Because:

If you're splitting it, evidently you're focusing on the wasted space.  Buying 
an expensive 128G device where you couldn't possibly ever use more than 4G or 
8G in the slog.  But that's not what you should be focusing on.  You should be 
focusing on the speed (that's why you bought it in the first place.)  The slog 
is write-only, and the cache is a mixture of read/write, where it should be 
hopefully doing more reads than writes.  But regardless of your actual success 
with the cache device, your cache device will be busy most of the time, and 
competing against the slog.

You have a mirror, you say.  You should probably drop both the cache & log.  
Use one whole device for the cache, use one whole device for the log.  The only 
risk you'll run is:

Since a slog is write-only (except during mount, typically at boot) it's 
possible to have a failure mode where you think you're writing to the log, but 
the first time you go back and read, you discover an error, and discover the 
device has gone bad.  In other words, without ever doing any reads, you might 
not notice when/if the device goes bad.  Fortunately, there's an easy 
workaround.  You could periodically (say, once a month) script the removal of 
your log device, create a junk pool, write a bunch of data to it, scrub it 
(thus verifying it was written correctly) and in the absence of any scrub 
errors, destroy the junk pool and re-add the device as a slog to the main pool.

I've never heard of anyone actually being that paranoid, and I've never heard 
of anyone actually experiencing the aforementioned possible undetected device 
failure mode.  So this is all mostly theoretical.

Mirroring the slog device really isn't necessary in the modern age.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-16 Thread Richard Elling

On Mar 16, 2013, at 7:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:

 It's a home setup, the performance penalty from splitting the cache devices 
 is non-existent, and that workaround sounds like a pretty crazy amount of 
 overhead when I could instead just have a mirrored slog.
 
 I'm less concerned about wasted space, more concerned about amount of SAS 
 ports I have available.
 
 I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
 not attempting to mount p0. I'm trying to work out why I'm getting an error 
 attempting to mount p2, after p1 has successfully mounted. Further, this has 
 been done before on other systems in the same hardware configuration in the 
 exact same fashion, and I've gone over the steps trying to make sure I 
 haven't missed something but can't see a fault. 

You can have only one Solaris partition at a time. Ian already shared the 
answer: create one 100% Solaris partition and then use format to create
two slices.
 -- richard

 
 I'm not keen on using Solaris slices because I don't have an understanding of 
 what that does to the pool's OS interoperability. 
 
 From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
 [opensolarisisdeadlongliveopensola...@nedharvey.com]
 Sent: Friday, 15 March 2013 8:44 PM
 To: Andrew Werchowiecki; zfs-discuss@opensolaris.org
 Subject: RE: partioned cache devices
 
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
 
 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$
 
 I have two SSDs in the system, I've created an 8gb partition on each drive 
 for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.
 
 Sounds like you're probably running into confusion about how to partition the 
 drive.  If you create fdisk partitions, they will be accessible as p0, p1, 
 p2, but I think p0 unconditionally refers to the whole drive, so the first 
 partition is p1, and the second is p2.
 
 If you create one big Solaris fdisk partition and then slice it via 
 partition where s2 is typically the encompassing slice, and people usually 
 use s1 and s2 and s6 for actual slices, then they will be accessible via s1, 
 s2, s6
 
 Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
 Because:
 
 If you're splitting it, evidently you're focusing on the wasted space.  
 Buying an expensive 128G device where you couldn't possibly ever use more 
 than 4G or 8G in the slog.  But that's not what you should be focusing on.  
 You should be focusing on the speed (that's why you bought it in the first 
 place.)  The slog is write-only, and the cache is a mixture of read/write, 
 where it should be hopefully doing more reads than writes.  But regardless of 
 your actual success with the cache device, your cache device will be busy 
 most of the time, and competing against the slog.
 
 You have a mirror, you say.  You should probably drop both the cache & log.  
 Use one whole device for the cache, use one whole device for the log.  The 
 only risk you'll run is:
 
 Since a slog is write-only (except during mount, typically at boot) it's 
 possible to have a failure mode where you think you're writing to the log, 
 but the first time you go back and read, you discover an error, and discover 
 the device has gone bad.  In other words, without ever doing any reads, you 
 might not notice when/if the device goes bad.  Fortunately, there's an easy 
 workaround.  You could periodically (say, once a month) script the removal of 
 your log device, create a junk pool, write a bunch of data to it, scrub it 
 (thus verifying it was written correctly) and in the absence of any scrub 
 errors, destroy the junk pool and re-add the device as a slog to the main 
 pool.
 
 I've never heard of anyone actually being that paranoid, and I've never heard 
 of anyone actually experiencing the aforementioned possible undetected device 
 failure mode.  So this is all mostly theoretical.
 
 Mirroring the slog device really isn't necessary in the modern age.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

ZFS and performance consulting
http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-15 Thread Ian Collins

Andrew Werchowiecki wrote:


Hi all,

I'm having some trouble with adding cache drives to a zpool, anyone 
got any ideas?


muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2

Password:

cannot open '/dev/dsk/c25t10d1p2': I/O error

muslimwookie@Pyzee:~$

I have two SSDs in the system, I've created an 8gb partition on each 
drive for use as a mirrored write cache. I also have the remainder of 
the drive partitioned for use as the read only cache. However, when 
attempting to add it I get the error above.




Create one 100% Solaris partition and then use format to create two slices.
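
Roughly, and with the caveat that the exact prompts differ between releases,
that looks like this for the SSD from this thread:

fdisk -B /dev/rdsk/c25t10d1p0   # one Solaris partition spanning the whole disk
format c25t10d1                 # then: partition -> define s0 (e.g. 8GB for the
                                # slog) and s1 (the rest for L2ARC) -> label
zpool add aggr0 log c25t10d1s0
zpool add aggr0 cache c25t10d1s1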

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partioned cache devices

2013-03-15 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
 
 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$
 
 I have two SSDs in the system, I've created an 8gb partition on each drive for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.

Sounds like you're probably running into confusion about how to partition the 
drive.  If you create fdisk partitions, they will be accessible as p0, p1, p2, 
but I think p0 unconditionally refers to the whole drive, so the first 
partition is p1, and the second is p2.

If you create one big Solaris fdisk partition and then slice it via partition 
where s2 is typically the encompassing slice, and people usually use s1 and s2 
and s6 for actual slices, then they will be accessible via s1, s2, s6

Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
Because:  

If you're splitting it, evidently you're focusing on the wasted space.  Buying 
an expensive 128G device where you couldn't possibly ever use more than 4G or 
8G in the slog.  But that's not what you should be focusing on.  You should be 
focusing on the speed (that's why you bought it in the first place.)  The slog 
is write-only, and the cache is a mixture of read/write, where it should be 
hopefully doing more reads than writes.  But regardless of your actual success 
with the cache device, your cache device will be busy most of the time, and 
competing against the slog.

You have a mirror, you say.  You should probably drop both the cache & log.  
Use one whole device for the cache, use one whole device for the log.  The only 
risk you'll run is:

Since a slog is write-only (except during mount, typically at boot) it's 
possible to have a failure mode where you think you're writing to the log, but 
the first time you go back and read, you discover an error, and discover the 
device has gone bad.  In other words, without ever doing any reads, you might 
not notice when/if the device goes bad.  Fortunately, there's an easy 
workaround.  You could periodically (say, once a month) script the removal of 
your log device, create a junk pool, write a bunch of data to it, scrub it 
(thus verifying it was written correctly) and in the absence of any scrub 
errors, destroy the junk pool and re-add the device as a slog to the main pool.
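
Sketched out with made-up pool and device names, that monthly check might look
something like:

zpool remove tank c9t0d0            # pull the SSD out of the main pool's log
zpool create junk c9t0d0            # throwaway pool on the bare device
dd if=/dev/urandom of=/junk/testfile bs=1024k count=2048
zpool scrub junk
zpool status junk                   # look for scrub errors before continuing
zpool destroy junk
zpool add tank log c9t0d0           # put it back as the slog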

I've never heard of anyone actually being that paranoid, and I've never heard 
of anyone actually experiencing the aforementioned possible undetected device 
failure mode.  So this is all mostly theoretical.

Mirroring the slog device really isn't necessary in the modern age.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-15 Thread Tiernan OToole
Thanks for the info. I am planning the install this weekend, between
Formula One and other hardware upgrades... fingers crossed it works!
On 14 Mar 2013 09:19, Heiko L. h.lehm...@hs-lausitz.de wrote:


  support for VT, but nothing for AMD... The Opterons don't have VT, so I won't
  be using XEN, but the Zones may be useful...

 We use XEN/PV on X4200 for many years without problems.
 dom0: X4200+openindiana+xvm
 guests(PV): openindiana,linux/fedora,linux/debian
 (vmlinuz-2.6.32.28-xenU-32,vmlinuz-2.6.18-xenU64)


 regards Heiko

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
Greetings,

Has anyone out there built a 1-petabyte pool?  I've been asked to look
into this, and was told low performance is fine, workload is likely
to be write-once, read-occasionally, archive storage of gene sequencing
data.  Probably a single 10Gbit NIC for connectivity is sufficient.

We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
Back-of-the-envelope might suggest stacking up eight to ten of those,
depending if you want a raw marketing petabyte, or a proper power-of-two
usable petabyte.

I get a little nervous at the thought of hooking all that up to a single
server, and am a little vague on how much RAM would be advisable, other
than as much as will fit (:-).  Then again, I've been waiting for
something like pNFS/NFSv4.1 to be usable for gluing together multiple
NFS servers into a single global namespace, without any sign of that
happening anytime soon.

So, has anyone done this?  Or come close to it?  Thoughts, even if you
haven't done it yourself?

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Ray Van Dolson
On Fri, Mar 15, 2013 at 06:09:34PM -0700, Marion Hakanson wrote:
 Greetings,
 
 Has anyone out there built a 1-petabyte pool?  I've been asked to look
 into this, and was told low performance is fine, workload is likely
 to be write-once, read-occasionally, archive storage of gene sequencing
 data.  Probably a single 10Gbit NIC for connectivity is sufficient.
 
 We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
 using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
 Back-of-the-envelope might suggest stacking up eight to ten of those,
 depending if you want a raw marketing petabyte, or a proper power-of-two
 usable petabyte.
 
 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other
 than as much as will fit (:-).  Then again, I've been waiting for
 something like pNFS/NFSv4.1 to be usable for gluing together multiple
 NFS servers into a single global namespace, without any sign of that
 happening anytime soon.
 
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?
 
 Thanks and regards,
 
 Marion

We've come close:

admin@mes-str-imgnx-p1:~$ zpool list
NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
datapool   978T   298T   680T  30%  1.00x  ONLINE  -
syspool    278G   104G   174G  37%  1.00x  ONLINE  -

Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual
pathed to a couple of LSI SAS switches.

Using Nexenta but no reason you couldn't do this w/ $whatever.

We did triple parity and our vdev membership is set up such that we can
lose up to three JBODs and still be functional (one vdev member disk
per JBOD).

This is with 3TB NL-SAS drives.
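
The trick is just in how the vdevs are laid out -- something like the
following, with made-up device names where the target number stands in for
the JBOD, so each raidz3 vdev loses at most one member per failed enclosure:

zpool create datapool \
    raidz3 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
           c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0
# ...further raidz3 vdevs are added the same way, each taking one disk
# from each JBOD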

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-15 Thread Kristoffer Sheather @ CloudCentral
Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within 1 rack comfortably and provide 1 PB storage..

Regards,

Kristoffer Sheather
Cloud Central
Scale Your Data Center In The Cloud 
Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: 
k...@cloudcentral.com.au
LinkedIn:   | Skype: kristoffer.sheather | Twitter: 
http://twitter.com/kristofferjon 


 From: Marion Hakanson hakan...@ohsu.edu
Sent: Saturday, March 16, 2013 12:12 PM
To: z...@lists.illumos.org
Subject: [zfs] Petabyte pool?

Greetings,

Has anyone out there built a 1-petabyte pool?  I've been asked to look
into this, and was told low performance is fine, workload is likely
to be write-once, read-occasionally, archive storage of gene sequencing
data.  Probably a single 10Gbit NIC for connectivity is sufficient.

We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
Back-of-the-envelope might suggest stacking up eight to ten of those,
depending if you want a raw marketing petabyte, or a proper 
power-of-two
usable petabyte.

I get a little nervous at the thought of hooking all that up to a single
server, and am a little vague on how much RAM would be advisable, other
than as much as will fit (:-).  Then again, I've been waiting for
something like pNFS/NFSv4.1 to be usable for gluing together multiple
NFS servers into a single global namespace, without any sign of that
happening anytime soon.

So, has anyone done this?  Or come close to it?  Thoughts, even if you
haven't done it yourself?

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs] Petabyte pool?

2013-03-15 Thread Kristoffer Sheather @ CloudCentral
Actually, you could use 3TB drives and with a 6/8 RAIDZ2 stripe achieve 
1080 TB usable.
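
A quick sanity check of that figure, assuming 8 JBODs of 60 bays, 3 TB drives,
and 8-disk RAIDZ2 vdevs (6 data + 2 parity per vdev):

echo "8 * 60 * 3 * 6 / 8" | bc    # -> 1080 (TB usable, before ZFS overhead)

With 4 TB drives the same arithmetic gives 1440 TB, which is where the 1 PB
figure above comes from.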

You'll also need 8-16 x SAS ports available on each storage head to provide 
redundant multi-pathed SAS connectivity to the JBOD's, recommend LSI 
9207-8E's for those and Intel X520-DA2's for the 10G NIC's.


 From: Kristoffer Sheather @ CloudCentral 
kristoffer.sheat...@cloudcentral.com.au
Sent: Saturday, March 16, 2013 12:21 PM
To: z...@lists.illumos.org
Subject: re: [zfs] Petabyte pool?

Well, off the top of my head:

2 x Storage Heads, 4 x 10G, 256G RAM, 2 x Intel E5 CPU's
8 x 60-Bay JBOD's with 60 x 4TB SAS drives
RAIDZ2 stripe over the 8 x JBOD's

That should fit within 1 rack comfortably and provide 1 PB storage..

Regards,

Kristoffer Sheather
Cloud Central
Scale Your Data Center In The Cloud 
Phone: 1300 144 007 | Mobile: +61 414 573 130 | Email: 
k...@cloudcentral.com.au
LinkedIn:   | Skype: kristoffer.sheather | Twitter: 
http://twitter.com/kristofferjon 


 From: Marion Hakanson hakan...@ohsu.edu
Sent: Saturday, March 16, 2013 12:12 PM
To: z...@lists.illumos.org
Subject: [zfs] Petabyte pool?

Greetings,

Has anyone out there built a 1-petabyte pool?  I've been asked to look
into this, and was told low performance is fine, workload is likely
to be write-once, read-occasionally, archive storage of gene sequencing
data.  Probably a single 10Gbit NIC for connectivity is sufficient.

We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
Back-of-the-envelope might suggest stacking up eight to ten of those,
depending if you want a raw marketing petabyte, or a proper power-of-two
usable petabyte.

I get a little nervous at the thought of hooking all that up to a single
server, and am a little vague on how much RAM would be advisable, other
than as much as will fit (:-).  Then again, I've been waiting for
something like pNFS/NFSv4.1 to be usable for gluing together multiple
NFS servers into a single global namespace, without any sign of that
happening anytime soon.

So, has anyone done this?  Or come close to it?  Thoughts, even if you
haven't done it yourself?

Thanks and regards,

Marion

---
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: 
https://www.listbox.com/member/archive/rss/182191/23629987-2afa167a
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=23629987&id_secret=23629987-c48148a8
Powered by Listbox: http://www.listbox.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Jan Owoc
On Fri, Mar 15, 2013 at 7:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:
 Has anyone out there built a 1-petabyte pool?

I'm not advising against your building/configuring a system yourself,
but I suggest taking look at the Petarack:
http://www.aberdeeninc.com/abcatg/petarack.htm

It shows it's been done with ZFS :-).

Jan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
rvandol...@esri.com said:
 We've come close:
 
 admin@mes-str-imgnx-p1:~$ zpool list
 NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
 datapool   978T   298T   680T  30%  1.00x  ONLINE  -
 syspool    278G   104G   174G  37%  1.00x  ONLINE  -
 
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed to
 a couple of LSI SAS switches. 

Thanks Ray,

We've been looking at those too (we've had good luck with our MD1200's).

How many HBA's in the R720?

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Ray Van Dolson
On Fri, Mar 15, 2013 at 06:31:11PM -0700, Marion Hakanson wrote:
 rvandol...@esri.com said:
  We've come close:
  
  admin@mes-str-imgnx-p1:~$ zpool list
  NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
  datapool   978T   298T   680T  30%  1.00x  ONLINE  -
  syspool    278G   104G   174G  37%  1.00x  ONLINE  -
  
  Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed 
  to
  a couple of LSI SAS switches. 
 
 Thanks Ray,
 
 We've been looking at those too (we've had good luck with our MD1200's).
 
 How many HBA's in the R720?
 
 Thanks and regards,
 
 Marion

We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).

Ray

[1] 
http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=hied&cs=65&sku=a4614101
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Marion Hakanson
Ray said:
 Using a Dell R720 head unit, plus a bunch of Dell MD1200 JBODs dual pathed
 to a couple of LSI SAS switches. 
 
Marion said:
 How many HBA's in the R720?

Ray said:
 We have qty 2 LSI SAS 9201-16e HBA's (Dell resold[1]).

Sounds similar in approach to the Aberdeen product another sender referred to,
with SAS switch layout:
  http://www.aberdeeninc.com/images/1-up-petarack2.jpg

One concern I had is that I compared our SuperMicro JBOD with 40x 4TB drives
in it, connected via a dual-port LSI SAS 9200-8e HBA, to the same pool layout
on a 40-slot server with 40x SATA drives in it.  But the server uses no SAS
expanders, instead using SAS-to-SATA octopus cables to connect the drives
directly to three internal SAS HBA's (2x 9201-16i's, 1x 9211-8i).

What I found was that the internal pool was significantly faster for both
sequential and random I/O than the pool on the external JBOD.

My conclusion was that I would not want to exceed ~48 drives on a single
8-port SAS HBA.  So I thought that running the I/O of all your hundreds
of drives through only two HBA's would be a bottleneck.

LSI's specs say 4800MBytes/sec for an 8-port SAS HBA, but 4000MBytes/sec
for that card in an x8 PCIe-2.0 slot.  Sure, the newer 9207-8e is rated
at 8000MBytes/sec in an x8 PCIe-3.0 slot, but it still has only the same
8 SAS ports going at 4800MBytes/sec.
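
A rough check on those numbers, assuming roughly 600 MBytes/sec per SAS-2 lane
(6 Gbit/s after 8b/10b encoding) and roughly 500 MBytes/sec usable per PCIe 2.0
lane:

echo "8 * 600" | bc    # 8 SAS-2 ports on the HBA -> 4800 MBytes/sec
echo "8 * 500" | bc    # x8 PCIe 2.0 slot         -> 4000 MBytes/sec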

Yes, I know the disks probably can't go that fast.  But in my tests
above, the internal 40-disk pool measures 2000MBytes/sec sequential
reads and writes, while the external 40-disk JBOD measures at 1500
to 1700 MBytes/sec.  Not a lot slower, but significantly slower, so
I do think the number of HBA's makes a difference.

At the moment, I'm leaning toward piling six, eight, or ten HBA's into
a server, preferably one with dual IOH's (thus two PCIe busses), and
connecting dual-path JBOD's in that manner.

I hadn't looked into SAS switches much, but they do look more reliable
than daisy-chaining a bunch of JBOD's together.  I just haven't seen
how to get more bandwidth through them to a single host.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabyte pool?

2013-03-15 Thread Richard Elling
On Mar 15, 2013, at 6:09 PM, Marion Hakanson hakan...@ohsu.edu wrote:

 Greetings,
 
 Has anyone out there built a 1-petabyte pool?

Yes, I've done quite a few.

  I've been asked to look
 into this, and was told low performance is fine, workload is likely
 to be write-once, read-occasionally, archive storage of gene sequencing
 data.  Probably a single 10Gbit NIC for connectivity is sufficient.
 
 We've had decent success with the 45-slot, 4U SuperMicro SAS disk chassis,
 using 4TB nearline SAS drives, giving over 100TB usable space (raidz3).
 Back-of-the-envelope might suggest stacking up eight to ten of those,
 depending if you want a raw marketing petabyte, or a proper power-of-two
 usable petabyte.

Yes. NB, for the PHB, using N^2 is found 2B less effective than N^10.

 I get a little nervous at the thought of hooking all that up to a single
 server, and am a little vague on how much RAM would be advisable, other
 than as much as will fit (:-).  Then again, I've been waiting for
 something like pNFS/NFSv4.1 to be usable for gluing together multiple
 NFS servers into a single global namespace, without any sign of that
 happening anytime soon.

NFS v4 or DFS (or even clever sysadmin + automount) offers single namespace
without needing the complexity of NFSv4.1, lustre, glusterfs, etc.
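
For what it's worth, a sketch of the clever-sysadmin-plus-automount variant: an
indirect map that stitches several NFS servers into one /data namespace. The
mount point, map name, and server/export names below are made up for
illustration:

# /etc/auto_master entry:
/data   auto_data
# /etc/auto_data, one key per NFS server export:
project1    nfs-server1:/export/project1
project2    nfs-server2:/export/project2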

 
 So, has anyone done this?  Or come close to it?  Thoughts, even if you
 haven't done it yourself?

Don't forget about backups :-)
 -- richard


--

richard.ell...@richardelling.com
+1-760-896-4422









___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-14 Thread Heiko L.

 support for VT, but nothing for AMD... The Opterons dont have VT, so i wont
 be using XEN, but the Zones may be useful...

We use XEN/PV on X4200 for many years without problems.
dom0: X4200+openindiana+xvm
guests(PV): openindiana,linux/fedora,linux/debian 
(vmlinuz-2.6.32.28-xenU-32,vmlinuz-2.6.18-xenU64)


regards Heiko

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-14 Thread Jim Klimov

On 2013-03-11 21:50, Bob Friesenhahn wrote:

On Mon, 11 Mar 2013, Tiernan OToole wrote:


I know this might be the wrong place to ask, but hopefully someone can
point me in the right direction...
I got my hands on a Sun x4200. Its the original one, not the M2, and
has 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks...
But, I dont know what to install on it... I was thinking of SmartOS,
but the site mentions Intel support for VT, but nothing for
AMD... The Opterons dont have VT, so i wont be using XEN, but the
Zones may be useful...


OpenIndiana or OmniOS seem like the most likely candidates.

You can run VirtualBox on OpenIndiana and it should be able to work
without VT extensions.


Also note that without the extensions VirtualBox has some quirks.
Most notably, lack of acceleration and support for virtual SMP.
But unlike some other virtualizers, it should work (does work for
us on a Thumper also with pre-VTx Opteron CPUs). However, recently
the VM virtual hardware clocks became way slow. I am at a loss so
far; the forum was only moderately helpful - probably the load on the
host and the latencies it induces play a role. But the problem does
happen on more modern hardware too, so the lack of VTx shouldn't be
the reason...

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-14 Thread Gary Driggs
On Mar 14, 2013, at 5:55 PM, Jim Klimov jimkli...@cos.ru wrote:

 However, recently the VM virtual hardware clocks became way slow.

Does NTP help correct the guest's clock?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-14 Thread Jim Klimov

On 2013-03-15 01:58, Gary Driggs wrote:

On Mar 14, 2013, at 5:55 PM, Jim Klimov jimkli...@cos.ru wrote:


However, recently the VM virtual hardware clocks became way slow.


Does NTP help correct the guest's clock?


Unfortunately no: neither guest NTP, ntpdate or rdate in crontabs,
nor the VirtualBox timesync settings - alone or even combined as a test
(though they are known to conflict) - have definitively helped so far.

We also have some setups on rather not-loaded hardware where after
a few days of uptime the clock stalls to the point that it has a
groundhog day - rotating over the same 2-3 second range for hours,
until the VM is powered off and booted.

Conversely, we also have dozens of VMs (and a few hosts) where no
such problems occur. Weird stuff...

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] partioned cache devices

2013-03-14 Thread Andrew Werchowiecki
Hi all,

I'm having some trouble with adding cache drives to a zpool, anyone got any 
ideas?

muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
Password:
cannot open '/dev/dsk/c25t10d1p2': I/O error
muslimwookie@Pyzee:~$

I have two SSDs in the system, I've created an 8gb partition on each drive for 
use as a mirrored write cache. I also have the remainder of the drive 
partitioned for use as the read only cache. However, when attempting to add it 
I get the error above.

Here's a zpool status:

  pool: aggr0
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 21 21:13:45 2013
1.13T scanned out of 20.0T at 106M/s, 51h52m to go
74.2G resilvered, 5.65% done
config:

NAME                         STATE     READ WRITE CKSUM
aggr0                        DEGRADED     0     0     0
  raidz2-0                   DEGRADED     0     0     0
    c7t5000C50035CA68EDd0    ONLINE       0     0     0
    c7t5000C5003679D3E2d0    ONLINE       0     0     0
    c7t50014EE2B16BC08Bd0    ONLINE       0     0     0
    c7t50014EE2B174216Dd0    ONLINE       0     0     0
    c7t50014EE2B174366Bd0    ONLINE       0     0     0
    c7t50014EE25C1E7646d0    ONLINE       0     0     0
    c7t50014EE25C17A62Cd0    ONLINE       0     0     0
    c7t50014EE25C17720Ed0    ONLINE       0     0     0
    c7t50014EE206C2AFD1d0    ONLINE       0     0     0
    c7t50014EE206C8E09Fd0    ONLINE       0     0     0
    c7t50014EE602DFAACAd0    ONLINE       0     0     0
    c7t50014EE602DFE701d0    ONLINE       0     0     0
    c7t50014EE20677C1C1d0    ONLINE       0     0     0
    replacing-13             UNAVAIL      0     0     0
      c7t50014EE6031198C1d0  UNAVAIL      0     0     0  cannot open
      c7t50014EE0AE2AB006d0  ONLINE       0     0     0  (resilvering)
    c7t50014EE65835480Dd0    ONLINE       0     0     0
logs
  mirror-1                   ONLINE       0     0     0
    c25t10d1p1               ONLINE       0     0     0
    c25t9d1p1                ONLINE       0     0     0

errors: No known data errors

As you can see, I've successfully added the 8gb partitions in a write caches. 
Interestingly, when I do a zpool iostat -v it shows the total as 111gb:

                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
aggr0                      20.0T  7.27T  1.33K    139  81.7M  4.19M
  raidz2                   20.0T  7.27T  1.33K    115  81.7M  2.70M
    c7t5000C50035CA68EDd0      -      -    566      9  6.91M   241K
    c7t5000C5003679D3E2d0      -      -    493      8  6.97M   242K
    c7t50014EE2B16BC08Bd0      -      -    544      9  7.02M   239K
    c7t50014EE2B174216Dd0      -      -    525      9  6.94M   241K
    c7t50014EE2B174366Bd0      -      -    540      9  6.95M   241K
    c7t50014EE25C1E7646d0      -      -    549      9  7.02M   239K
    c7t50014EE25C17A62Cd0      -      -    534      9  6.93M   241K
    c7t50014EE25C17720Ed0      -      -    542      9  6.95M   241K
    c7t50014EE206C2AFD1d0      -      -    549      9  7.02M   239K
    c7t50014EE206C8E09Fd0      -      -    526     10  6.94M   241K
    c7t50014EE602DFAACAd0      -      -    576     10  6.91M   241K
    c7t50014EE602DFE701d0      -      -    591     10  7.00M   239K
    c7t50014EE20677C1C1d0      -      -    530     10  6.95M   241K
    replacing                  -      -      0    922      0  7.11M
      c7t50014EE6031198C1d0    -      -      0      0      0      0
      c7t50014EE0AE2AB006d0    -      -      0    622      2  7.10M
    c7t50014EE65835480Dd0      -      -    595     10  6.98M   239K
logs                           -      -      -      -      -      -
  mirror                    740K   111G      0     43      0  2.75M
    c25t10d1p1                 -      -      0     43      3  2.75M
    c25t9d1p1                  -      -      0     43      3  2.75M
-------------------------  -----  -----  -----  -----  -----  -----
rpool                      7.32G  12.6G      2      4  41.9K  43.2K
  c4t0d0s0                 7.32G  12.6G      2      4  41.9K  43.2K
-------------------------  -----  -----  -----  -----  -----  -----

Something funky is going on here...

Wooks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sun X4200 Question...

2013-03-11 Thread Tiernan OToole
I know this might be the wrong place to ask, but hopefully someone can
point me in the right direction...

I got my hands on a Sun x4200. Its the original one, not the M2, and has 2
single core Opterons, 4Gb RAM and 4 73Gb SAS Disks... But, I dont know what
to install on it... I was thinking of SmartOS, but the site mentions Intel
support for VT, but nothing for AMD... The Opterons dont have VT, so i wont
be using XEN, but the Zones may be useful...

Any advice?

Thanks!

-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-11 Thread Bob Friesenhahn

On Mon, 11 Mar 2013, Tiernan OToole wrote:


I know this might be the wrong place to ask, but hopefully someone can point me 
in the right direction...
I got my hands on a Sun x4200. Its the original one, not the M2, and has 2 
single core Opterons, 4Gb RAM and 4 73Gb SAS Disks...
But, I dont know what to install on it... I was thinking of SmartOS, but the 
site mentions Intel support for VT, but nothing for
AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones may be 
useful... 


OpenIndiana or OmniOS seem like the most likely candidates.

You can run VirtualBox on OpenIndiana and it should be able to work 
without VT extensions.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun X4200 Question...

2013-03-11 Thread Tiernan OToole
to tell you the truth, i dont really need the virtualization stuff... Zones
sounds interesting, since it seems to be ligher weight than Xen or anything
like that...


On Mon, Mar 11, 2013 at 8:50 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Mon, 11 Mar 2013, Tiernan OToole wrote:

  I know this might be the wrong place to ask, but hopefully someone can
 point me in the right direction...
 I got my hands on a Sun x4200. Its the original one, not the M2, and has
 2 single core Opterons, 4Gb RAM and 4 73Gb SAS Disks...
 But, I dont know what to install on it... I was thinking of SmartOS, but
 the site mentions Intel support for VT, but nothing for
 AMD... The Opterons dont have VT, so i wont be using XEN, but the Zones
 may be useful...


 OpenIndiana or OmniOS seem like the most likely candidates.

 You can run VirtualBox on OpenIndiana and it should be able to work
 without VT extensions.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Huge Numbers of Illegal Requests

2013-03-06 Thread Bob Friesenhahn

On Tue, 5 Mar 2013, Ed Shipe wrote:


On 2 different OpenIndiana 151a7 systems, Im showing a huge number of Illegal
Requests.  There are no other apparent issues, performance is fine, etc, etc.
Everything works great - what are these illegal requests?  My Google-Foo is
failing me...


My system used to exhibit this problem so I opened Illumos issue 2998 
(https://www.illumos.org/issues/2998).  The weird thing is that the 
problem went away and has not returned.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Robert Milkowski
  We do the same for all of our legacy operating system backups. Take
  a snapshot then do an rsync and an excellent way of maintaining
  incremental backups for those.
 
 
  Magic rsync options used:
 
-a --inplace --no-whole-file --delete-excluded
 
  This causes rsync to overwrite the file blocks in place rather than
  writing to a new temporary file first.  As a result, zfs COW produces
  primitive deduplication of at least the unchanged blocks (by
 writing
  nothing) while writing new COW blocks for the changed blocks.
 
 If I understand your use case correctly (the application overwrites
 some blocks with the same exact contents), ZFS will ignore these no-

I think he meant to rely on rsync here to do in-place updates of files and
only for changed blocks with the above parameters (by using rsync's own
delta mechanism). So if you have a file a and only one block changed rsync
will overwrite on destination only that single block.
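
A minimal sketch of the snapshot-then-rsync cycle being described, with
placeholder dataset, path, and host names:

# Snapshot yesterday's state, then let rsync update blocks in place so that
# unchanged blocks stay shared with the snapshot and only changed blocks are
# rewritten (COW).
zfs snapshot backup/legacy@$(date +%Y%m%d)
rsync -a --inplace --no-whole-file --delete-excluded \
    legacyhost:/export/data/ /backup/legacy/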


 op writes only on recent Open ZFS (illumos / FreeBSD / Linux) builds
 with checksum=sha256 and compression!=off.  AFAIK, Solaris ZFS will COW
 the blocks even if their content is identical to what's already there,
 causing the snapshots to diverge.
 
 See https://www.illumos.org/issues/3236 for details.
 

This is interesting. I didn't know about it.
Is there an option similar to verify=on in dedup or does it just assume that
checksum is your data?

-- 
Robert Milkowski
http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Bob Friesenhahn

On Mon, 4 Mar 2013, Matthew Ahrens wrote:


Magic rsync options used:

  -a --inplace --no-whole-file --delete-excluded

This causes rsync to overwrite the file blocks in place rather than writing
to a new temporary file first.  As a result, zfs COW produces primitive
deduplication of at least the unchanged blocks (by writing nothing) while
writing new COW blocks for the changed blocks.


If I understand your use case correctly (the application overwrites
some blocks with the same exact contents), ZFS will ignore these
no-op writes only on recent Open ZFS (illumos / FreeBSD / Linux)
builds with checksum=sha256 and compression!=off.  AFAIK, Solaris ZFS
will COW the blocks even if their content is identical to what's
already there, causing the snapshots to diverge.


With these rsync options, rsync will only overwrite a block if the 
contents of the block has changed.  Rsync's notion of a block is 
different than zfs so there is not a perfect overlap.


Rsync does need to read files on the destination filesystem to see if 
they have changed.  If the system has sufficient RAM (and/or L2ARC) 
then files may still be cached from the previous day's run.  In most 
cases only a small subset of the total files are updated (at least on 
my systems) so the caching requirements are small.  Files updated on 
one day are more likely to be the ones updated on subsequent days.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread David Magda
On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:

 Rsync does need to read files on the destination filesystem to see if
 they have changed.  If the system has sufficient RAM (and/or L2ARC)
 then files may still be cached from the previous day's run.  In most
 cases only a small subset of the total files are updated (at least on
 my systems) so the caching requirements are small.  Files updated on
 one day are more likely to be the ones updated on subsequent days.

It's also possible to reduce the amount that rsync has to walk the entire
file tree.

Most folks simply do a rsync --options /my/source/ /the/dest/, but if
you use zfs diff, and parse/feed the output of that to rsync, then the
amount of thrashing can probably be minimized. Especially useful for file
hierarchies that have very many individual files, so you don't have to stat()
every single one.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Russ Poyner

On 3/5/2013 9:40 AM, David Magda wrote:

On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:


Rsync does need to read files on the destination filesystem to see if
they have changed.  If the system has sufficient RAM (and/or L2ARC)
then files may still be cached from the previous day's run.  In most
cases only a small subset of the total files are updated (at least on
my systems) so the caching requirements are small.  Files updated on
one day are more likely to be the ones updated on subsequent days.

It's also possible to reduce the amount that rsync has to walk the entire
file tree.

Most folks simply do a rsync --options /my/source/ /the/dest/, but if
you use zfs diff, and parse/feed the output of that to rsync, then the
amount of thrashing can probably be minimized. Especially useful for file
hierarchies that very many individual files, so you don't have to stat()
every single one.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


David,

Your idea to use zfs diff to limit the need to stat the entire 
filesystem tree intrigues me. My current rsync backups are normally 
limited by this very factor. It takes longer to walk the filesystem tree 
than it does to transfer the new data.


Would you be willing to provide an example of what you mean when you say 
parse/feed the ouput of zfs diff to rsync?


Russ Poyner
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Bob Friesenhahn

On Tue, 5 Mar 2013, David Magda wrote:

It's also possible to reduce the amount that rsync has to walk the entire
file tree.

Most folks simply do a rsync --options /my/source/ /the/dest/, but if
you use zfs diff, and parse/feed the output of that to rsync, then the
amount of thrashing can probably be minimized. Especially useful for file
hierarchies that very many individual files, so you don't have to stat()
every single one.


Zfs diff only works for zfs filesystems.  If one is using zfs 
filesystems then rsync may not be the best option.  In the real world, 
data may be sourced from many types of systems and filesystems.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Russ Poyner

On 3/5/2013 10:27 AM, Bob Friesenhahn wrote:

On Tue, 5 Mar 2013, David Magda wrote:
It's also possible to reduce the amount that rsync has to walk the 
entire

file tree.

Most folks simply do a rsync --options /my/source/ /the/dest/, but if
you use zfs diff, and parse/feed the output of that to rsync, then the
amount of thrashing can probably be minimized. Especially useful for 
file

hierarchies that very many individual files, so you don't have to stat()
every single one.


Zfs diff only works for zfs filesystems.  If one is using zfs 
filesystems then rsync may not be the best option.  In the real world, 
data may be sourced from many types of systems and filesystems.


Bob

Bob,

Good point. Clearly this wouldn't work for my current linux fileserver. 
I'm building a replacement that will run FreeBSD 9.1 with a zfs storage 
pool. My backups are to a thumper running solaris 10 and zfs in another 
department. I have an arm's-length collaboration with the department 
that runs the thumper, which likely precludes a direct zfs send.


Rsync has allowed us to transfer data without getting too deep into each 
others' system administration. I run an rsync daemon with read only 
access to my filesystem that accepts connections from the thumper. They 
serve the backups to me via a read-only nfs export. The only problem has 
been the iops load generated by my users' millions of small files. 
That's why the zfs diff idea excited me, but perhaps I'm missing some 
simpler approach.


Russ
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread David Magda
On Tue, March 5, 2013 11:17, Russ Poyner wrote:
 Your idea to use zfs diff to limit the need to stat the entire
 filesystem tree intrigues me. My current rsync backups are normally
 limited by this very factor. It takes longer to walk the filesystem tree
 than it does to transfer the new data.

 Would you be willing to provide an example of what you mean when you say
 parse/feed the ouput of zfs diff to rsync?

Don't have anything readily available, or a ZFS system handy to hack
something up. The output of zfs diff is roughly:

  M   /myfiles/
  M   /myfiles/link_to_me   (+1)
  R   /myfiles/rename_me -> /myfiles/renamed
  -   /myfiles/delete_me
  +   /myfiles/new_file

Take the second column and use that as the list of file to check. Solaris'
zfs(1M) has an -F option which would output something like:

   M   /   /myfiles/
   M   F   /myfiles/link_to_me  (+1)
   R   /myfiles/rename_me -> /myfiles/renamed
   -   F   /myfiles/delete_me
   +   F   /myfiles/new_file
   +   |   /myfiles/new_pipe

So the second column now has a type, and the path is pushed over to the
third column. This way you can simply select the plain files (type F) and
tell rsync to check only those.
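
A rough, untested sketch of the whole pipeline; the pool, snapshot, path, and
host names are placeholders, and only modified or new plain files (type F) are
handed to rsync:

zfs diff -F tank/myfiles@yesterday tank/myfiles@today \
  | awk '($1 == "M" || $1 == "+") && $2 == "F" { print $3 }' \
  | sed 's|^/myfiles/||' \
  | rsync -a --files-from=- /myfiles/ backuphost:/backup/myfiles/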


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-04 Thread Matthew Ahrens
On Tue, Feb 26, 2013 at 7:42 PM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Wed, 27 Feb 2013, Ian Collins wrote:

 I am finding that rsync with the right options (to directly
 block-overwrite) plus zfs snapshots is providing me with pretty
 amazing deduplication for backups without even enabling
 deduplication in zfs.  Now backup storage goes a very long way.


 We do the same for all of our legacy operating system backups. Take a
 snapshot then do an rsync and an excellent way of maintaining incremental
 backups for those.


 Magic rsync options used:

   -a --inplace --no-whole-file --delete-excluded

 This causes rsync to overwrite the file blocks in place rather than writing
 to a new temporary file first.  As a result, zfs COW produces primitive
 deduplication of at least the unchanged blocks (by writing nothing) while
 writing new COW blocks for the changed blocks.

If I understand your use case correctly (the application overwrites
some blocks with the same exact contents), ZFS will ignore these
no-op writes only on recent Open ZFS (illumos / FreeBSD / Linux)
builds with checksum=sha256 and compression!=off.  AFAIK, Solaris ZFS
will COW the blocks even if their content is identical to what's
already there, causing the snapshots to diverge.

See https://www.illumos.org/issues/3236 for details.

commit 80901aea8e78a2c20751f61f01bebd1d5b5c2ba5
Author: George Wilson george.wil...@delphix.com
Date:   Tue Nov 13 14:55:48 2012 -0800

3236 zio nop-write

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Huge Numbers of Illegal Requests

2013-03-04 Thread Ed Shipe
On 2 different OpenIndiana 151a7 systems, Im showing a huge number of
Illegal Requests.  There are no other apparent issues, performance is fine,
etc,etc.
Everything works great - what are these illegal requests?  My Google-Foo is
failing me...

Thanks,

-ed

root@NAPP1:~# iostat -Ensr
c6t0d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: SanDisk  ,Product: Extreme  ,Revision: 0001 ,Serial No:
Size: 16.01GB 16013942784 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 333 ,Predictive Failure Analysis: 0
c4t0d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: SanDisk SDSSDX24 ,Revision: R211 ,Serial No:
121562402168
Size: 240.06GB 240057409536 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992096 ,Predictive Failure Analysis: 0
c4t1d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: SanDisk SDSSDX24 ,Revision: R211 ,Serial No:
121562401118
Size: 240.06GB 240057409536 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992064 ,Predictive Failure Analysis: 0
c4t2d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: SanDisk SDSSDX24 ,Revision: R211 ,Serial No:
121562401215
Size: 240.06GB 240057409536 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992063 ,Predictive Failure Analysis: 0
c4t3d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: SanDisk SDSSDX24 ,Revision: R211 ,Serial No:
121562401014
Size: 240.06GB 240057409536 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992063 ,Predictive Failure Analysis: 0
c4t5d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: INTEL SSDSC2CT12 ,Revision: 300i ,Serial No:
CVMP219200MZ120
Size: 120.03GB 120034123776 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 1983773 ,Predictive Failure Analysis: 0
c3t0d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: ST2000DM001-9YN1 ,Revision: CC4B ,Serial No:
S240F3KN
Size: 2000.40GB 2000398934016 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992072 ,Predictive Failure Analysis: 0
c3t1d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: ST2000DM001-9YN1 ,Revision: CC4B ,Serial No:
S240F2TN
Size: 2000.40GB 2000398934016 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992031 ,Predictive Failure Analysis: 0
c3t2d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: ST2000DM001-9YN1 ,Revision: CC4B ,Serial No:
Z1E0K3C9
Size: 2000.40GB 2000398934016 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992019 ,Predictive Failure Analysis: 0
c3t3d0   ,Soft Errors: 0 ,Hard Errors: 0 ,Transport Errors: 0
Vendor: ATA  ,Product: ST2000DM001-9YN1 ,Revision: CC4B ,Serial No:
W1E0FL39
Size: 2000.40GB 2000398934016 bytes
,Media Error: 0 ,Device Not Ready: 0 ,No Device: 0 ,Recoverable: 0
Illegal Request: 992016 ,Predictive Failure Analysis: 0
root@NAPP1:~#


-- 
Ed Shipe
Candelabra Computing Inc
e...@candelabracomputing.com
Mobile: 410-929-2597
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SVM ZFS

2013-02-27 Thread Darren J Moffat



On 02/26/13 20:30, Morris Hooten wrote:

Besides copying data from /dev/md/dsk/x volume manager filesystems to
new zfs filesystems
does anyone know of any zfs conversion tools to make the
conversion/migration from svm to zfs
easier?


With Solaris 11 you can use shadow migration; it is really a VFS layer
feature, but it is integrated into the ZFS CLI tools for ease of use.


# zfs create -o shadow=file:///path/to/old  mypool/new

The new filesystem will appear to instantly have all the data, and it 
will be copied over as it is accessed, as well as by shadowd pulling it over 
in advance.


You can use shadowstat(1M) to show progress.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Ahmed Kamal
How is the quality of the ZFS Linux port today? Is it comparable to Illumos
or at least FreeBSD ? Can I trust production data to it ?


On Wed, Feb 27, 2013 at 5:22 AM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Tue, 26 Feb 2013, Gary Driggs wrote:

  On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:

   I'd also recommend that you go and subscribe to
 z...@lists.illumos.org, since this list is going to get shut
   down by Oracle next month.


 Whose description still reads, everything ZFS running on illumos-based
 distributions.


 Even FreeBSD's zfs is now based on zfs from Illumos.  FreeBSD and Linux
 zfs developers contribute fixes back to zfs in Illumos.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Sašo Kiselkov
On 02/27/2013 12:32 PM, Ahmed Kamal wrote:
 How is the quality of the ZFS Linux port today? Is it comparable to Illumos
 or at least FreeBSD ? Can I trust production data to it ?

Can't speak from personal experience, but a colleague of mine has been
running the PPA builds on Ubuntu and has had, well, a less than stellar
experience. It shows promise, but I'm not sure it's there yet.

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Dan Swartzendruber

I've been using it since rc13.  It's been stable for me as long as you don't
get into things like zvols and such... 

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Sašo Kiselkov
Sent: Wednesday, February 27, 2013 6:37 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS Distro Advice

On 02/27/2013 12:32 PM, Ahmed Kamal wrote:
 How is the quality of the ZFS Linux port today? Is it comparable to 
 Illumos or at least FreeBSD ? Can I trust production data to it ?

Can't speak from personal experience, but a colleague of mine has been PPA
builds on Ubuntu and has had, well, less than stellar experience. It shows
promise, but I'm not sure it's there yet.

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Bob Friesenhahn

On Wed, 27 Feb 2013, Ian Collins wrote:

Magic rsync options used:

-a --inplace --no-whole-file --delete-excluded

This causes rsync to overwrite the file blocks in place rather than
writing to a new temporary file first.  As a result, zfs COW produces
primitive deduplication of at least the unchanged blocks (by writing
nothing) while writing new COW blocks for the changed blocks.


Do these options impact performance or reduce the incremental stream sizes?


I don't see any adverse impact on performance and incremental stream 
size is quite considerably reduced.


The main risk is that if the disk fills up you may end up with a 
corrupted file rather than just an rsync error.  However, the 
snapshots help because an earlier version of the file is likely 
available.


I just use -a --delete and the snapshots don't take up much space (compared 
with the incremental stream sizes).


That is what I used to do before I learned better.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Tim Cook
On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber dswa...@druber.comwrote:


 I've been using it since rc13.  It's been stable for me as long as you
 don't
 get into things like zvols and such...



Then it definitely isn't at the level of FreeBSD, and personally I would
not consider that production ready.


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-27 Thread Dan Swartzendruber

On 2/27/2013 2:05 PM, Tim Cook wrote:




On Wed, Feb 27, 2013 at 2:57 AM, Dan Swartzendruber 
dswa...@druber.com mailto:dswa...@druber.com wrote:



I've been using it since rc13.  It's been stable for me as long as
you don't
get into things like zvols and such...



Then it definitely isn't at the level of FreeBSD, and personally I 
would not consider that production ready.


Everyone has to make their own risk assessment.  Keep in mind, it is 
described as a release candidate.  I understand zvols are an important 
feature, but I can do without them, so I am...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SVM ZFS

2013-02-27 Thread Alfredo De Luca
Hi Darren, you're right! With Solaris 11 and the shadow migration feature it's
fantastic.

Not sure which Solaris version we are talking about here.

Alfredo



On Wed, Feb 27, 2013 at 10:22 PM, Darren J Moffat
darr...@opensolaris.orgwrote:



 On 02/26/13 20:30, Morris Hooten wrote:

 Besides copying data from /dev/md/dsk/x volume manager filesystems to
 new zfs filesystems
 does anyone know of any zfs conversion tools to make the
 conversion/migration from svm to zfs
 easier?


 With Solaris 11 you can use shadow migration, it is really a VFS layer
 feature but it is integrated into the ZFS CLI tools for easy of use

 # zfs create -o shadow=file:///path/to/old  mypool/new

 The new filesystem will appear to instantly have all the data, and it will
 be copied over as it is access as well as shadowd pulling it over in
 advance.

 You can use shadowstat(1M) to show progress.

 --
 Darren J Moffat

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
*Alfredo*
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Tiernan OToole
Thanks all! I will check out FreeNAS and see what it can do... I will also
check my RAID Card and see if it can work with JBOD... fingers crossed...
The machine has a couple internal SATA ports (think there are 2, could be
4) so i was thinking of using those for boot disks and SSDs later...

As a follow up question: Data Deduplication: The machine, to start, will
have about 5Gb  RAM. I read somewhere that 20TB storage would require about
8GB RAM, depending on block size... Since i dont know block sizes, yet (i
store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure
how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a
ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough
memory now, can i enable DeDupe at a later stage when i add memory? Also,
if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible?
Assuming the drives are just JBOD drives (to be confirmed) could they just
get imported?

Thanks.


On Mon, Feb 25, 2013 at 6:11 PM, Tim Cook t...@cook.ms wrote:




 On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt v...@bb-c.de wrote:

 Tim Cook writes:
   I need something that will allow me to share files over SMB (3 if
   possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, i would
   like something i can manage easily and something that works with
   the Dell...
 
  All of them should provide the basic functionality you're looking
  for.
   None of them will provide SMB3 (at all) or AFP (without a third
  party package).

 FreeNAS has AFP built-in, including a Time Machine discovery method.

 The latest FreeNAS is still based on Samba 3.x, but they are aware
 of 4.x and will probably integrate it at some point in the future.
 Then you should have SMB3.  I don't know how far along they are...


 Best regards -- Volker



 FreeNAS comes with a package pre-installed to add AFP support.  There is
 no native AFP support in FreeBSD and by association FreeNAS.

 --Tim





-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Sašo Kiselkov
On 02/26/2013 09:33 AM, Tiernan OToole wrote:
 As a follow up question: Data Deduplication: The machine, to start, will
 have about 5Gb  RAM. I read somewhere that 20TB storage would require about
 8GB RAM, depending on block size...

The typical wisdom is that 1TB of dedup'ed data = 1GB of RAM. 5GB of RAM
seems too small for a 20TB pool of dedup'ed data.
Unless you know what you're doing, I'd go with just compression and let
dedup be - compression has known performance and doesn't suffer with
scaling.
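
If you do want to see what dedup would buy before enabling it, one option
(assuming the pool already exists; the pool name is a placeholder) is zdb's
dedup simulation:

zdb -S tank    # prints a simulated DDT histogram and the expected dedup ratio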

 If i dont have enough memory now, can i enable DeDupe at a later stage
 when i add memory?

Yes.

 Also, if i pick FreeBSD now, and want to move to, say, Nexenta, is that
 possible? Assuming the drives are just JBOD drives (to be confirmed)
 could they just get imported?

Yes, that's the whole point of open storage.

I'd also recommend that you go and subscribe to z...@lists.illumos.org,
since this list is going to get shut down by Oracle next month.

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Tim Cook
On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole lsmart...@gmail.comwrote:

 Thanks all! I will check out FreeNAS and see what it can do... I will also
 check my RAID Card and see if it can work with JBOD... fingers crossed...
 The machine has a couple internal SATA ports (think there are 2, could be
 4) so i was thinking of using those for boot disks and SSDs later...

 As a follow up question: Data Deduplication: The machine, to start, will
 have about 5Gb  RAM. I read somewhere that 20TB storage would require about
 8GB RAM, depending on block size... Since i dont know block sizes, yet (i
 store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure
 how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a
 ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough
 memory now, can i enable DeDupe at a later stage when i add memory? Also,
 if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible?
 Assuming the drives are just JBOD drives (to be confirmed) could they just
 get imported?

 Thanks.




Yes, you can move between FreeBSD and Illumos based distros as long as you
are at a compatible zpool version (which they currently are).  I'd avoid
deduplication unless you absolutely need it... it's still a bit of a
kludge.  Stick to compression and your world will be a much happier place.
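
The move itself is just an export on the old box and an import on the new one
(pool name is a placeholder):

zpool export tank     # on the FreeBSD system
zpool import          # on the new system: lists pools available for import
zpool import tank     # then import by name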

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Tiernan OToole
Thanks again lads. I will take all that info into advice, and will join
that new group also!

Thanks again!

--Tiernan


On Tue, Feb 26, 2013 at 8:44 AM, Tim Cook t...@cook.ms wrote:



 On Mon, Feb 25, 2013 at 10:33 PM, Tiernan OToole lsmart...@gmail.comwrote:

 Thanks all! I will check out FreeNAS and see what it can do... I will
 also check my RAID Card and see if it can work with JBOD... fingers
 crossed... The machine has a couple internal SATA ports (think there are 2,
 could be 4) so i was thinking of using those for boot disks and SSDs
 later...

 As a follow up question: Data Deduplication: The machine, to start, will
 have about 5Gb  RAM. I read somewhere that 20TB storage would require about
 8GB RAM, depending on block size... Since i dont know block sizes, yet (i
 store a mix of VMs, TV Shows, Movies and backups on the NAS) I am not sure
 how much memory i will need (my estimate is 10TB RAW (8TB usable?) in a
 ZRAID1 pool, and then 3TB RAW in a striped pool). If i dont have enough
 memory now, can i enable DeDupe at a later stage when i add memory? Also,
 if i pick FreeBSD now, and want to move to, say, Nexenta, is that possible?
 Assuming the drives are just JBOD drives (to be confirmed) could they just
 get imported?

 Thanks.




 Yes, you can move between FreeBSD and Illumos based distros as long as you
 are at a compatible zpool version (which they currently are).  I'd avoid
 deduplication unless you absolutely need it... it's still a bit of a
 kludge.  Stick to compression and your world will be a much happier place.

 --Tim





-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Robert Milkowski
Solaris 11.1 (free for non-prod use). 

 

 

From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tiernan OToole
Sent: 25 February 2013 14:58
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] ZFS Distro Advice

 

Good morning all.

 

My home NAS died over the weekend, and it leaves me with a lot of spare
drives (5 2Tb and 3 1Tb disks). I have a Dell Poweredge 2900 Server sitting
in the house, which has not been doing much over the last while (bought it a
few years back with the intent of using it as a storage box, since it has 8
Hot Swap drive bays) and i am now looking at building the NAS using ZFS...

 

But, now i am confused as to what OS to use... OpenIndiana? Nexenta?
FreeNAS/FreeBSD? 

 

I need something that will allow me to share files over SMB (3 if possible),
NFS, AFP (for Time Machine) and iSCSI. Ideally, i would like something i can
manage easily and something that works with the Dell... 

 

Any recommendations? Any comparisons to each? 

 

Thanks.


 

-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2013-02-26 Thread hagai
For what it's worth.. 
I had the same problem and found the answer here - 
http://forums.freebsd.org/showthread.php?t=27207


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Gary Driggs
On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:

I'd also recommend that you go and subscribe to z...@lists.illumos.org, since
this list is going to get shut down by Oracle next month.


Whose description still reads, "everything ZFS running on illumos-based
distributions."

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Sašo Kiselkov
On 02/26/2013 03:51 PM, Gary Driggs wrote:
 On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:
 
 I'd also recommend that you go and subscribe to z...@lists.illumos.org, since
 this list is going to get shut down by Oracle next month.
 
 Whose description still reads, everything ZFS running on illumos-based
 distributions.

We've never dismissed any topic or issue as not our problem. All
sensible ZFS-related discussion is welcome and taken seriously.

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Eugen Leitl
On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote:
 On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:
 
 I'd also recommend that you go and subscribe to z...@lists.illumos.org, since

I can't seem to find this list. Do you have an URL for that?
Mailman, hopefully?

 this list is going to get shut down by Oracle next month.
 
 
 Whose description still reads, everything ZFS running on illumos-based
 distributions.
 
 -Gary

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Sašo Kiselkov
On 02/26/2013 05:57 PM, Eugen Leitl wrote:
 On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote:
 On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:

 I'd also recommend that you go and subscribe to z...@lists.illumos.org, since
 
 I can't seem to find this list. Do you have an URL for that?
 Mailman, hopefully?

http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists

--
Saso

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Eugen Leitl
On Tue, Feb 26, 2013 at 06:01:39PM +0100, Sašo Kiselkov wrote:
 On 02/26/2013 05:57 PM, Eugen Leitl wrote:
  On Tue, Feb 26, 2013 at 06:51:08AM -0800, Gary Driggs wrote:
  On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:
 
  I'd also recommend that you go and subscribe to z...@lists.illumos.org, 
  since
  
  I can't seem to find this list. Do you have an URL for that?
  Mailman, hopefully?
 
 http://wiki.illumos.org/display/illumos/illumos+Mailing+Lists

Oh, it's the illumos-zfs one. Had me confused.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2013-02-26 Thread Paul Kraus
Be careful when testing ZFS with iozone; I ran a bunch of stats many
years ago that produced results that did not pass a basic sanity check. There
was *something* about the iozone test data that ZFS either did not like or liked
very much, depending on the specific test.

I eventually wrote my own very crude tool to test exactly what our 
workload was and started getting results that matched the reality we saw.

On Jul 17, 2012, at 4:18 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
wrote:

 On Tue, 17 Jul 2012, Michael Hase wrote:
 
 To work around these caching effects just use a file  2 times the size of 
 ram, iostat then shows the numbers really coming from disk. I always test 
 like this. a re-read rate of 8.2 GB/s is really just memory bandwidth, but 
 quite impressive ;-)
 
 Ok, the iozone benchmark finally completed.  The results do suggest that 
 reading from mirrors substantially improves the throughput. This is 
 interesting since the results differ (better than) from my 'virgin mount' 
 test approach:
 
 Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 8G -g 256G
 
              KB  reclen    write  rewrite     read   reread
 8388608  64  572933 1008668  6945355  7509762
 8388608 128 2753805 2388803  6482464  7041942
 8388608 256 2508358 2331419  2969764  3045430
 8388608 512 2407497 2131829  3021579  3086763
16777216  64  671365  879080  6323844  6608806
16777216 128 1279401 2286287  6409733  6739226
16777216 256 2382223 2211097  2957624  3021704
16777216 512 2237742 2179611  3048039  3085978
33554432  64  933712  699966  6418428  6604694
33554432 128  459896  431640  6443848  6546043
33554432 256  90  430989  2997615  3026246
33554432 512  427158  430891  3042620  3100287
67108864  64  426720  427167  6628750  6738623
67108864 128  419328  422581  153  6743711
67108864 256  419441  419129  3044352  3056615
67108864 512  431053  417203  3090652  3112296
   134217728  64  417668   55434   759351   760994
   134217728 128  409383  400433   759161   765120
   134217728 256  408193  405868   763892   766184
   134217728 512  408114  403473   761683   766615
   268435456  64  418910   55239   768042   768498
   268435456 128  408990  399732   763279   766882
   268435456 256  413919  399386   760800   764468
   268435456 512  410246  403019   766627   768739
 
 Bob
 -- 
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Richard Elling
On Feb 26, 2013, at 12:33 AM, Tiernan OToole lsmart...@gmail.com wrote:

 Thanks all! I will check out FreeNAS and see what it can do... I will also
 check my RAID card and see if it can work with JBOD... fingers crossed... The
 machine has a couple of internal SATA ports (I think there are 2, could be 4)
 so I was thinking of using those for boot disks and SSDs later...

 As a follow-up question on data deduplication: the machine, to start, will
 have about 5 GB of RAM. I read somewhere that 20 TB of storage would require
 about 8 GB of RAM, depending on block size... Since I don't know block sizes
 yet (I store a mix of VMs, TV shows, movies and backups on the NAS)

Consider using different policies for different data. For traditional file
systems, you had relatively few policy options: readonly, nosuid, quota, etc.
With ZFS, dedup and compression are also policy options. In your case, dedup
for your media is not likely to be a good policy, but dedup for your backups
could be a win (unless you're using something that already avoids backing up
duplicate data -- e.g., most backup utilities). A way to approach this is to
think of your directory structure and create file systems to match the
policies. For example:
/home/richard = compressed (default top-level, since properties are inherited)
/home/richard/media = compressed
/home/richard/backup = compressed + dedup
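
A minimal sketch of that layout in commands (assuming a pool named tank and
datasets mounted under /home -- the names are examples only):

    # Parent dataset mounted at /home; children inherit its properties.
    zfs create -o mountpoint=/home tank/home
    zfs create -o compression=on tank/home/richard
    zfs create tank/home/richard/media               # inherits compression=on
    zfs create -o dedup=on tank/home/richard/backup  # inherits compression, adds dedup

    # Confirm the effective policies.
    zfs get -r compression,dedup tank/home/richard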

 -- richard

 I am not sure how much memory I will need (my estimate is 10 TB raw (8 TB
 usable?) in a RAIDZ1 pool, and then 3 TB raw in a striped pool). If I don't
 have enough memory now, can I enable dedup at a later stage when I add
 memory? Also, if I pick FreeBSD now and want to move to, say, Nexenta, is
 that possible? Assuming the drives are just JBOD drives (to be confirmed),
 could they just get imported?
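
A rough sketch of both (pool and dataset names are placeholders): you can
estimate what dedup would buy you before spending RAM on it, enable it later
on just the datasets that benefit, and move the disks between ZFS
implementations with export/import, as long as both sides support the pool
version/features:

    # Simulate dedup on the existing data; prints the would-be DDT and ratio
    # (reads the whole pool, so it can take a while).
    zdb -S tank

    # Dedup can be enabled later; it only affects data written from then on.
    zfs set dedup=on tank/backups

    # Moving the pool to another system/OS:
    zpool export tank     # on the old box
    zpool import tank     # on the new box, after attaching the disks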
 
 Thanks.
 
 
 On Mon, Feb 25, 2013 at 6:11 PM, Tim Cook t...@cook.ms wrote:
 
 
 
 On Mon, Feb 25, 2013 at 7:57 AM, Volker A. Brandt v...@bb-c.de wrote:
 Tim Cook writes:
   I need something that will allow me to share files over SMB (3 if
   possible), NFS, AFP (for Time Machine) and iSCSI. Ideally, I would
   like something I can manage easily and something that works with
   the Dell...
 
  All of them should provide the basic functionality you're looking for.
  None of them will provide SMB3 (at all) or AFP (without a third
  party package).
 
 FreeNAS has AFP built-in, including a Time Machine discovery method.
 
 The latest FreeNAS is still based on Samba 3.x, but they are aware
 of 4.x and will probably integrate it at some point in the future.
 Then you should have SMB3.  I don't know how far along they are...
 
 
 Best regards -- Volker
 
 
 
 FreeNAS comes with a package pre-installed to add AFP support.  There is no
 native AFP support in FreeBSD and, by association, FreeNAS.
 
 --Tim
  
 
 
 
 -- 
 Tiernan O'Toole
 blog.lotas-smartman.net
 www.geekphotographer.com
 www.tiernanotoole.ie

--

richard.ell...@richardelling.com
+1-760-896-4422


[zfs-discuss] SVM ZFS

2013-02-26 Thread Morris Hooten
Besides copying data from /dev/md/dsk/x (Solaris Volume Manager) filesystems to
new ZFS filesystems, does anyone know of any conversion tools to make the
migration from SVM to ZFS easier?
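
(For reference, a sketch of the copy approach mentioned above -- the
traditional dump/restore pipeline; metadevice and dataset names are
placeholders:)

    # Create the receiving ZFS filesystem.
    zfs create -o mountpoint=/export/data tank/data

    # Stream a UFS filesystem off its SVM metadevice into it.
    ufsdump 0f - /dev/md/rdsk/d10 | (cd /export/data && ufsrestore rf -)

    # On Solaris 11, shadow migration is another option, e.g.:
    #   zfs create -o shadow=file:///old/mountpoint tank/data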

Thanks


Morris Hooten
Unix SME
Integrated Technology Delivery
mhoo...@us.ibm.com
Office: 720-342-5614


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Robert Milkowski wrote:


Solaris 11.1 (free for non-prod use).



But a ticking bomb if you use a cache device.
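
A conservative workaround until a system is patched is simply to drop the
cache device -- L2ARC vdevs can be removed and re-added at any time without
touching the data (pool and device names are placeholders):

    # Identify the cache (L2ARC) device.
    zpool status tank

    # Remove it; only read caching is lost.
    zpool remove tank c0t5d0

    # Put it back once the fix is installed.
    zpool add tank cache c0t5d0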

--

Ian.


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Robert Milkowski

 
 Robert Milkowski wrote:
 
  Solaris 11.1 (free for non-prod use).
 
 
 But a ticking bomb if you use a cache device.


It's been fixed in an SRU (although that is only available to customers with a
support contract - still, it will be in 11.2 as well).

Then, I'm sure there are other bugs which are fixed in S11 and not in
Illumos (and vice-versa).

-- 
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Robert Milkowski wrote:

Robert Milkowski wrote:

Solaris 11.1 (free for non-prod use).


But a ticking bomb if you use a cache device.


It's been fixed in an SRU (although that is only available to customers with a
support contract - still, it will be in 11.2 as well).

Then, I'm sure there are other bugs which are fixed in S11 and not in
Illumos (and vice-versa).



There may well be, but in seven+ years of using ZFS, this was the first 
one to cost me a pool.


--
Ian.


