Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Tyler Bishop
>
> [root] osci-1001.infra.cin1.corp:~/cephdeploy # ceph-deploy osd create
>> --filestore --fs-type xfs --data /dev/sdb2 --journal /dev/sdb1 osci-1001
>
> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /root/.cephdeploy.conf
>
> [ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create
>> --filestore --fs-type xfs --data /dev/sdb2 --journal /dev/sdb1 osci-1001
>
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
>
> [ceph_deploy.cli][INFO  ]  verbose   : False
>
> [ceph_deploy.cli][INFO  ]  bluestore : None
>
> [ceph_deploy.cli][INFO  ]  cd_conf   :
>> 
>
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
>
> [ceph_deploy.cli][INFO  ]  fs_type   : xfs
>
> [ceph_deploy.cli][INFO  ]  block_wal : None
>
> [ceph_deploy.cli][INFO  ]  default_release   : False
>
> [ceph_deploy.cli][INFO  ]  username  : None
>
> [ceph_deploy.cli][INFO  ]  journal   : /dev/sdb1
>
> [ceph_deploy.cli][INFO  ]  subcommand: create
>
> [ceph_deploy.cli][INFO  ]  host  : osci-1001
>
> [ceph_deploy.cli][INFO  ]  filestore : True
>
> [ceph_deploy.cli][INFO  ]  func  : <function osd at 0x7fde72db0578>
>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
>
> [ceph_deploy.cli][INFO  ]  zap_disk  : False
>
> [ceph_deploy.cli][INFO  ]  data  : /dev/sdb2
>
> [ceph_deploy.cli][INFO  ]  block_db  : None
>
> [ceph_deploy.cli][INFO  ]  dmcrypt   : False
>
> [ceph_deploy.cli][INFO  ]  overwrite_conf: False
>
> [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   :
>> /etc/ceph/dmcrypt-keys
>
> [ceph_deploy.cli][INFO  ]  quiet : False
>
> [ceph_deploy.cli][INFO  ]  debug : False
>
> [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device
>> /dev/sdb2
>
> [osci-1001][DEBUG ] connected to host: osci-1001
>
> [osci-1001][DEBUG ] detect platform information from remote host
>
> [osci-1001][DEBUG ] detect machine type
>
> [osci-1001][DEBUG ] find the location of an executable
>
> [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
>
> [ceph_deploy.osd][DEBUG ] Deploying osd to osci-1001
>
> [osci-1001][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>
> [osci-1001][DEBUG ] find the location of an executable
>
> [osci-1001][INFO  ] Running command: /usr/sbin/ceph-volume --cluster ceph
>> lvm create --filestore --data /dev/sdb2 --journal /dev/sdb1
>
> [osci-1001][WARNIN] -->  RuntimeError: command returned non-zero exit
>> status: 1
>
> [osci-1001][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
>
> [osci-1001][DEBUG ] Running command: /bin/ceph --cluster ceph --name
>> client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i
>> - osd new 81522145-e31b-4325-83fd-6cfefc1b761f
>
> [osci-1001][DEBUG ] Running command: vgcreate --force --yes
>> ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb /dev/sdb2
>
> [osci-1001][DEBUG ]  stdout: Physical volume "/dev/sdb2" successfully
>> created.
>
> [osci-1001][DEBUG ]  stdout: Volume group
>> "ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb" successfully created
>
> [osci-1001][DEBUG ] Running command: lvcreate --yes -l 100%FREE -n
>> osd-data-81522145-e31b-4325-83fd-6cfefc1b761f
>> ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb
>
> [osci-1001][DEBUG ]  stdout: Logical volume
>> "osd-data-81522145-e31b-4325-83fd-6cfefc1b761f" created.
>
> [osci-1001][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
>
> [osci-1001][DEBUG ] Running command: mkfs -t xfs -f -i size=2048
>> /dev/ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb/osd-data-81522145-e31b-4325-83fd-6cfefc1b761f
>
> [osci-1001][DEBUG ]  stdout:
>> meta-data=/dev/ceph-7b308a5a-a8e9-48aa-86a9-39957dcbd1eb/osd-data-81522145-e31b-4325-83fd-6cfefc1b761f
>> isize=2048   agcount=4, agsize=58239488 blks
>
> [osci-1001][DEBUG ]  =   sectsz=4096  attr=2,
>> projid32bit=1
>
> [osci-1001][DEBUG ]  =   crc=1
>> finobt=0, sparse=0
>
> [osci-1001][DEBUG ] data =   bsize=4096
>>  blocks=232957952, imaxpct=25
>
> [osci-1001][DEBUG ]  =   sunit=0  swidth=0
>> blks
>
> [osci-1001][DEBUG ] naming   =version 2  bsize=4096
>>  ascii-ci=0 ftype=1
>
> [osci-1001][DEBUG ] log  =internal log   bsize=4096
>>  blocks=113749, version=2
>
> [osci-1001][DEBUG ]  =   sectsz=4096  sunit=1
>> blks, lazy-count=1
>
> [osci-1001][DEBUG ] realtime =none   extsz=4096
>>  blocks=0, rtextents=0
>
> [osci-1001][DEBUG ] Running command: mount -t xfs -o
>> "rw,noatime,noquota,logbsize=256k,logbufs=8,inode64,allocsize=4M,delaylog"
>> 

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Tyler Bishop
Now I'm just trying to figure out how to create filestore OSDs in Luminous.
I've read every doc and tried every flag, but I keep ending up with
either a data LV taking 100% of the VG or a bunch of random errors for
unsupported flags...

# ceph-disk prepare --filestore --fs-type xfs --data-dev /dev/sdb1
--journal-dev /dev/sdb2 --osd-id 3
usage: ceph-disk [-h] [-v] [--log-stdout] [--prepend-to-path PATH]
 [--statedir PATH] [--sysconfdir PATH] [--setuser USER]
 [--setgroup GROUP]


{prepare,activate,activate-lockbox,activate-block,activate-journal,activate-all,list,suppress-activate,unsuppress-activate,deactivate,destroy,zap,trigger,fix}
 ...
ceph-disk: error: unrecognized arguments: /dev/sdb1
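For what it's worth, the ceph-volume call that ceph-deploy wraps (visible in the
log further up) can also be run directly on the OSD node.  This is only a sketch
using the device names from this thread; ceph-volume accepts either a partition
or a pre-created LV (vg/lv notation) for --data and --journal, and handing it
explicit LVs is one way to avoid the automatic 100%-of-VG data LV:

# ceph-volume --cluster ceph lvm create --filestore --data /dev/sdb2 --journal /dev/sdb1

or, with pre-created logical volumes:

# ceph-volume lvm create --filestore --data ceph-vg/osd-data --journal ceph-vg/osd-journal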
On Tue, Dec 11, 2018 at 7:22 PM Christian Balzer  wrote:
>
>
> Hello,
>
> On Tue, 11 Dec 2018 23:22:40 +0300 Igor Fedotov wrote:
>
> > Hi Tyler,
> >
> > I suspect you have BlueStore DB/WAL at these drives as well, don't you?
> >
> > Then perhaps you have performance issues with f[data]sync requests which
> > DB/WAL invoke pretty frequently.
> >
> Since he explicitly mentioned using these SSDs with filestore AND the
> journals on the same SSD I'd expect a similar impact aka piss-poor
> performance in his existing setup (the 300 other OSDs).
>
> Unless of course some bluestore is significantly more sync happy than the
> filestore journal and/or other bluestore particulars (reduced caching
> space, not caching in some situations) are rearing their ugly heads.
>
> Christian
>
> > See the following links for details:
> >
> > https://www.percona.com/blog/2018/02/08/fsync-performance-storage-devices/
> >
> > https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> >
> > The latter link shows pretty poor numbers for M500DC drives.
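For reference, the test those links describe boils down to single-threaded
sync writes; something along these lines reproduces it with fio (destructive,
so point it at a scratch partition, and the device name is just a placeholder):

# fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=journal-test

A drive that only manages a few hundred of these sync writes per second will
drag the whole OSD down once the journal or DB/WAL lives on it.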
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 12/11/2018 4:58 AM, Tyler Bishop wrote:
> >
> > > Older Crucial/Micron M500/M600
> > > _
> > >
> > > *Tyler Bishop*
> > > EST 2007
> > >
> > >
> > > O:513-299-7108 x1000
> > > M:513-646-5809
> > > http://BeyondHosting.net <http://beyondhosting.net/>
> > >
> > >
> > > This email is intended only for the recipient(s) above and/or
> > > otherwise authorized personnel. The information contained herein and
> > > attached is confidential and the property of Beyond Hosting. Any
> > > unauthorized copying, forwarding, printing, and/or disclosing
> > > any information related to this email is prohibited. If you received
> > > this message in error, please contact the sender and destroy all
> > > copies of this email and any attachment(s).
> > >
> > >
> > > On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer  > > <mailto:ch...@gol.com>> wrote:
> > >
> > > Hello,
> > >
> > > On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
> > >
> > > > I don't think thats my issue here because I don't see any IO to
> > > justify the
> > > > latency.  Unless the IO is minimal and its ceph issuing a bunch
> > > of discards
> > > > to the ssd and its causing it to slow down while doing that.
> > > >
> > >
> > > What does atop have to say?
> > >
> > > Discards/Trims are usually visible in it, this is during a fstrim of a
> > > RAID1 / :
> > > ---
> > > DSK |  sdb  | busy 81% |  read   0 | write  8587
> > > | MBw/s 2323.4 |  avio 0.47 ms |
> > > DSK |  sda  | busy 70% |  read   2 | write  8587
> > > | MBw/s 2323.4 |  avio 0.41 ms |
> > > ---
> > >
> > > The numbers tend to be a lot higher than what the actual interface is
> > > capable of, clearly the SSD is reporting its internal activity.
> > >
> > > In any case, it should give a good insight of what is going on
> > > activity
> > > wise.
> > > Also for posterity and curiosity, what kind of SSDs?
> > >
> > > Christian
> > >
> > > > Log isn't showing anything useful and I have most debugging
> > > disabled.
> > > >
> > > >
> > > >
> > > > On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson  > > <mailto:mnel...@redhat.com>> wrote:
> > > >
> > >

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-10 Thread Tyler Bishop
All 4 of these SSDs that I've converted to Bluestore are behaving this
way.   I have around 300 of these drives in a very large production
cluster and do not have this type of behavior with Filestore.

On the filestore setup these SSDs are partitioned with 20GB for the journal
and 800GB for data.

The systems were never powered off or anything during the conversion
from filestore to bluestore.
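For reference, that journal/data split is the sort of layout a GPT
partitioning step like the following would produce (a rough sketch only,
destructive, with the device name taken from this thread):

# parted -s /dev/sdb mklabel gpt
# parted -s /dev/sdb mkpart journal 1MiB 20GiB
# parted -s /dev/sdb mkpart data 20GiB 100%

With filestore the 20GB partition absorbs the journal double-write; bluestore
on the same drive has no journal partition at all unless block.db/block.wal
are split out explicitly.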
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-10 Thread Tyler Bishop
LVM | dm-0 | busy 101% | read 137 | write 1761 | KiB/r 4 | KiB/w 30 | MBr/s 0.1 | MBw/s 5.3 | avq 185.42 | avio 5.31 ms |
DSK | sdb  | busy 100% | read 127 | write 1208 | KiB/r 4 | KiB/w 32 | MBr/s 0.1 | MBw/s 3.9 | avq 58.39  | avio 7.49 ms |
_

Tyler Bishop
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing any
information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of
this email and any attachment(s).


On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer  wrote:
>
> Hello,
>
> On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
>
> > I don't think thats my issue here because I don't see any IO to justify the
> > latency.  Unless the IO is minimal and its ceph issuing a bunch of discards
> > to the ssd and its causing it to slow down while doing that.
> >
>
> What does atop have to say?
>
> Discards/Trims are usually visible in it, this is during a fstrim of a
> RAID1 / :
> ---
> DSK |  sdb  | busy 81% |  read   0 | write   8587  | MBw/s 
> 2323.4 |  avio 0.47 ms |
> DSK |  sda  | busy 70% |  read   2 | write   8587  | MBw/s 
> 2323.4 |  avio 0.41 ms |
> ---
>
> The numbers tend to be a lot higher than what the actual interface is
> capable of, clearly the SSD is reporting its internal activity.
>
> In any case, it should give a good insight of what is going on activity
> wise.
> Also for posterity and curiosity, what kind of SSDs?
>
> Christian
>
> > Log isn't showing anything useful and I have most debugging disabled.
> >
> >
> >
> > On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson  wrote:
> >
> > > Hi Tyler,
> > >
> > > I think we had a user a while back that reported they had background
> > > deletion work going on after upgrading their OSDs from filestore to
> > > bluestore due to PGs having been moved around.  Is it possible that your
> > > cluster is doing a bunch of work (deletion or otherwise) beyond the
> > > regular client load?  I don't remember how to check for this off the top
> > > of my head, but it might be something to investigate.  If that's what it
> > > is, we just recently added the ability to throttle background deletes:
> > >
> > > https://github.com/ceph/ceph/pull/24749
> > >
> > >
> > > If the logs/admin socket don't tell you anything, you could also try
> > > using our wallclock profiler to see what the OSD is spending it's time
> > > doing:
> > >
> > > https://github.com/markhpc/gdbpmp/
> > >
> > >
> > > ./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp
> > >
> > > ./gdbpmp -i foo.gdbpmp -t 1
> > >
> > >
> > > Mark
> > >
> > > On 12/10/18 6:09 PM, Tyler Bishop wrote:
> > > > Hi,
> > > >
> > > > I have an SSD only cluster that I recently converted from filestore to
> > > > bluestore and performance has totally tanked. It was fairly decent
> > > > before, only having a little additional latency than expected.  Now
> > > > since converting to bluestore the latency is extremely high, SECONDS.
> > > > I am trying to determine if it an issue with the SSD's or Bluestore
> > > > treating them differently than filestore... potential garbage
> > > > collection? 24+ hrs ???
> > > >
> > > > I am now seeing constant 100% IO utilization on ALL of the devices and
> > > > performance is terrible!
> > > >
> > > > IOSTAT
> > > >
> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > > >1.370.000.34   18.590.00   79.70
> > > >
> > > > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > > > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > > > sda   0.00 0.000.009.50  0.0064.00
> > > > 13.47 0.011.160.001.16  1.11   1.05
> > > > sdb   0.0096.504.50   46.50 34.00 11776.00
> > > >  463.14   132.68 1174.84  782.67 1212.80 19.61 100.00
> > > > dm-0  0.00 0.005.50  128.00

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-10 Thread Tyler Bishop
Older Crucial/Micron M500/M600
_

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net <http://beyondhosting.net/>


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing
any information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of this
email and any attachment(s).


On Mon, Dec 10, 2018 at 8:57 PM Christian Balzer  wrote:

> Hello,
>
> On Mon, 10 Dec 2018 20:43:40 -0500 Tyler Bishop wrote:
>
> > I don't think thats my issue here because I don't see any IO to justify
> the
> > latency.  Unless the IO is minimal and its ceph issuing a bunch of
> discards
> > to the ssd and its causing it to slow down while doing that.
> >
>
> What does atop have to say?
>
> Discards/Trims are usually visible in it, this is during a fstrim of a
> RAID1 / :
> ---
> DSK |  sdb  | busy 81% |  read   0 | write   8587  | MBw/s
> 2323.4 |  avio 0.47 ms |
> DSK |  sda  | busy 70% |  read   2 | write   8587  | MBw/s
> 2323.4 |  avio 0.41 ms |
> ---
>
> The numbers tend to be a lot higher than what the actual interface is
> capable of, clearly the SSD is reporting its internal activity.
>
> In any case, it should give a good insight of what is going on activity
> wise.
> Also for posterity and curiosity, what kind of SSDs?
>
> Christian
>
> > Log isn't showing anything useful and I have most debugging disabled.
> >
> >
> >
> > On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson  wrote:
> >
> > > Hi Tyler,
> > >
> > > I think we had a user a while back that reported they had background
> > > deletion work going on after upgrading their OSDs from filestore to
> > > bluestore due to PGs having been moved around.  Is it possible that
> your
> > > cluster is doing a bunch of work (deletion or otherwise) beyond the
> > > regular client load?  I don't remember how to check for this off the
> top
> > > of my head, but it might be something to investigate.  If that's what
> it
> > > is, we just recently added the ability to throttle background deletes:
> > >
> > > https://github.com/ceph/ceph/pull/24749
> > >
> > >
> > > If the logs/admin socket don't tell you anything, you could also try
> > > using our wallclock profiler to see what the OSD is spending it's time
> > > doing:
> > >
> > > https://github.com/markhpc/gdbpmp/
> > >
> > >
> > > ./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp
> > >
> > > ./gdbpmp -i foo.gdbpmp -t 1
> > >
> > >
> > > Mark
> > >
> > > On 12/10/18 6:09 PM, Tyler Bishop wrote:
> > > > Hi,
> > > >
> > > > I have an SSD only cluster that I recently converted from filestore
> to
> > > > bluestore and performance has totally tanked. It was fairly decent
> > > > before, only having a little additional latency than expected.  Now
> > > > since converting to bluestore the latency is extremely high, SECONDS.
> > > > I am trying to determine if it an issue with the SSD's or Bluestore
> > > > treating them differently than filestore... potential garbage
> > > > collection? 24+ hrs ???
> > > >
> > > > I am now seeing constant 100% IO utilization on ALL of the devices
> and
> > > > performance is terrible!
> > > >
> > > > IOSTAT
> > > >
> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > > >1.370.000.34   18.590.00   79.70
> > > >
> > > > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > > > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > > > sda   0.00 0.000.009.50  0.0064.00
> > > > 13.47 0.011.160.001.16  1.11   1.05
> > > > sdb   0.0096.504.50   46.50 34.00 11776.00
> > > >  463.14   132.68 1174.84  782.67 1212.80 19.61 100.00
> > > > dm-0  0.00 0.005.50  128.00 44.00  8162.00
> > > >  122.94   507.84 1704.93  674.09 1749.23  7.49 100.00
> > > >
> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> > > >0.850.00  

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-10 Thread Tyler Bishop
I don't think that's my issue here because I don't see any IO to justify the
latency.  Unless the IO is minimal and it's ceph issuing a bunch of discards
to the SSD, causing it to slow down while doing that.

Log isn't showing anything useful and I have most debugging disabled.



On Mon, Dec 10, 2018 at 7:43 PM Mark Nelson  wrote:

> Hi Tyler,
>
> I think we had a user a while back that reported they had background
> deletion work going on after upgrading their OSDs from filestore to
> bluestore due to PGs having been moved around.  Is it possible that your
> cluster is doing a bunch of work (deletion or otherwise) beyond the
> regular client load?  I don't remember how to check for this off the top
> of my head, but it might be something to investigate.  If that's what it
> is, we just recently added the ability to throttle background deletes:
>
> https://github.com/ceph/ceph/pull/24749
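For reference, the knob that PR adds is, if I remember right, osd_delete_sleep,
which can be injected into running OSDs, e.g.:

# ceph tell osd.* injectargs '--osd_delete_sleep 0.5'

The exact option name and a sensible value are worth double-checking against
the PR before relying on it.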
>
>
> If the logs/admin socket don't tell you anything, you could also try
> using our wallclock profiler to see what the OSD is spending it's time
> doing:
>
> https://github.com/markhpc/gdbpmp/
>
>
> ./gdbpmp -t 1000 -p`pidof ceph-osd` -o foo.gdbpmp
>
> ./gdbpmp -i foo.gdbpmp -t 1
>
>
> Mark
>
> On 12/10/18 6:09 PM, Tyler Bishop wrote:
> > Hi,
> >
> > I have an SSD only cluster that I recently converted from filestore to
> > bluestore and performance has totally tanked. It was fairly decent
> > before, only having a little additional latency than expected.  Now
> > since converting to bluestore the latency is extremely high, SECONDS.
> > I am trying to determine if it an issue with the SSD's or Bluestore
> > treating them differently than filestore... potential garbage
> > collection? 24+ hrs ???
> >
> > I am now seeing constant 100% IO utilization on ALL of the devices and
> > performance is terrible!
> >
> > IOSTAT
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >1.370.000.34   18.590.00   79.70
> >
> > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > sda   0.00 0.000.009.50  0.0064.00
> > 13.47 0.011.160.001.16  1.11   1.05
> > sdb   0.0096.504.50   46.50 34.00 11776.00
> >  463.14   132.68 1174.84  782.67 1212.80 19.61 100.00
> > dm-0  0.00 0.005.50  128.00 44.00  8162.00
> >  122.94   507.84 1704.93  674.09 1749.23  7.49 100.00
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >0.850.000.30   23.370.00   75.48
> >
> > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > sda   0.00 0.000.003.00  0.0017.00
> > 11.33 0.012.170.002.17  2.17   0.65
> > sdb   0.0024.509.50   40.50 74.00 1.00
> >  402.9683.44 2048.67 1086.11 2274.46 20.00 100.00
> > dm-0  0.00 0.00   10.00   33.50 78.00  2120.00
> >  101.06   287.63 8590.47 1530.40 10697.96 22.99 100.00
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >0.810.000.30   11.400.00   87.48
> >
> > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > sda   0.00 0.000.006.00  0.0040.25
> > 13.42 0.011.330.001.33  1.25   0.75
> > sdb   0.00   314.50   15.50   72.00  122.00 17264.00
> >  397.3961.21 1013.30  740.00 1072.13  11.41  99.85
> > dm-0  0.00 0.00   10.00  427.00 78.00 27728.00
> >  127.26   224.12  712.01 1147.00  701.82  2.28  99.85
> >
> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >1.220.000.294.010.00   94.47
> >
> > Device: rrqm/s   wrqm/s r/s w/s rkB/swkB/s
> > avgrq-sz avgqu-sz   await r_await w_await svctm  %util
> > sda   0.00 0.000.003.50  0.0017.00
> >  9.71 0.001.290.001.29  1.14   0.40
> > sdb   0.00 0.001.00   39.50  8.00 10112.00
> >  499.7578.19 1711.83 1294.50 1722.39 24.69 100.00
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SLOW SSD's after moving to Bluestore

2018-12-10 Thread Tyler Bishop
Hi,

I have an SSD-only cluster that I recently converted from filestore to
bluestore and performance has totally tanked.  It was fairly decent before,
with only a little more latency than expected.  Now, since converting to
bluestore, the latency is extremely high, SECONDS.  I am trying to determine
if it is an issue with the SSDs themselves or with Bluestore treating
them differently than filestore... potential garbage collection? 24+ hrs ???

I am now seeing constant 100% IO utilization on ALL of the devices and
performance is terrible!

IOSTAT

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.37    0.00    0.34   18.59    0.00   79.70

Device:  rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await  w_await  svctm  %util
sda        0.00     0.00    0.00    9.50     0.00     64.00    13.47     0.01    1.16    0.00     1.16   1.11   1.05
sdb        0.00    96.50    4.50   46.50    34.00  11776.00   463.14   132.68 1174.84  782.67  1212.80  19.61 100.00
dm-0       0.00     0.00    5.50  128.00    44.00   8162.00   122.94   507.84 1704.93  674.09  1749.23   7.49 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.85    0.00    0.30   23.37    0.00   75.48

Device:  rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await  w_await  svctm  %util
sda        0.00     0.00    0.00    3.00     0.00     17.00    11.33     0.01    2.17    0.00     2.17   2.17   0.65
sdb        0.00    24.50    9.50   40.50    74.00      1.00   402.96    83.44 2048.67 1086.11  2274.46  20.00 100.00
dm-0       0.00     0.00   10.00   33.50    78.00   2120.00   101.06   287.63 8590.47 1530.40 10697.96  22.99 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.81    0.00    0.30   11.40    0.00   87.48

Device:  rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await  w_await  svctm  %util
sda        0.00     0.00    0.00    6.00     0.00     40.25    13.42     0.01    1.33    0.00     1.33   1.25   0.75
sdb        0.00   314.50   15.50   72.00   122.00  17264.00   397.39    61.21 1013.30  740.00  1072.13  11.41  99.85
dm-0       0.00     0.00   10.00  427.00    78.00  27728.00   127.26   224.12  712.01 1147.00   701.82   2.28  99.85

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.22    0.00    0.29    4.01    0.00   94.47

Device:  rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await  w_await  svctm  %util
sda        0.00     0.00    0.00    3.50     0.00     17.00     9.71     0.00    1.29    0.00     1.29   1.14   0.40
sdb        0.00     0.00    1.00   39.50     8.00  10112.00   499.75    78.19 1711.83 1294.50  1722.39  24.69 100.00
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kRBD write performance for high IO use cases

2018-09-09 Thread Tyler Bishop
Running 3.10, but I don't think I can change the depth on this older kernel.

I see that config option on my 4.9 test machine.  I wonder if that will
help a lot!  My cluster has wait, but it seems entirely limited by the RBD
client; the OSDs are not busy and I don't really have any iowait at all.
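For reference, on kernels that support it the krbd queue depth is set at map
time, something like the following (pool/image is just a placeholder, and if I
remember right the default is 128, so it only helps if raised well above that):

# rbd map rbd/myimage -o queue_depth=256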
_

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net <http://beyondhosting.net/>


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing
any information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of this
email and any attachment(s).


On Sat, Sep 8, 2018 at 4:56 AM Ilya Dryomov  wrote:

> On Sat, Sep 8, 2018 at 1:52 AM Tyler Bishop
>  wrote:
> >
> > I have a fairly large cluster running ceph bluestore with extremely fast
> SAS ssd for the metadata.  Doing FIO benchmarks I am getting 200k-300k
> random write iops but during sustained workloads of ElasticSearch my
> clients seem to hit a wall of around 1100 IO/s per RBD device.  I've tried
> 1 RBD and 4 RBD devices and I still only get 1100 IO per device, so 4
> devices gets me around 4k.
> >
> > Is there some sort of setting that limits each RBD devices performance?
> I've tried playing with nr_requests but that don't seem to change it at
> all... I'm just looking for another 20-30% performance on random write
> io... I even thought about doing raid 0 across 4-8 rbd devices just to get
> the io performance.
>
> What is the I/O profile of that workload?  How did you arrive at the
> 20-30% number?
>
> Which kernel are you running?  Increasing nr_requests doesn't actually
> increase the queue depth, at least on anything moderately recent.  You
> need to map with queue_depth=X for that, see [1] for details.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b55841807fb864eccca0167650a65722fd7cd553
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] kRBD write performance for high IO use cases

2018-09-07 Thread Tyler Bishop
I have a fairly large cluster running ceph bluestore with extremely fast
SAS SSDs for the metadata.  Doing FIO benchmarks I am getting 200k-300k
random write IOPS, but during sustained ElasticSearch workloads my
clients seem to hit a wall of around 1100 IO/s per RBD device.  I've tried
1 RBD device and 4 RBD devices and I still only get 1100 IO/s per device,
so 4 devices get me around 4k.

Is there some sort of setting that limits each RBD device's performance?
I've tried playing with nr_requests but that doesn't seem to change it at
all... I'm just looking for another 20-30% performance on random write
IO... I even thought about doing RAID 0 across 4-8 RBD devices just to get
the IO performance.
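For what it's worth, the RAID 0 idea is easy to prototype with md on top of a
few mapped images; a rough sketch, assuming four images already mapped as
/dev/rbd0 through /dev/rbd3:

# mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
# mkfs.xfs /dev/md0

It only pays off if the bottleneck really is per-device queueing on the client
rather than the OSDs themselves.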

Thoughts?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Best practices for allocating memory to bluestore cache

2018-08-30 Thread Tyler Bishop
Hi,

My OSD host has 256GB of RAM and I have 52 OSDs.  Currently I have the cache
set to 1GB, and the system only consumes around 44GB of RAM; the rest sits
unallocated because I am using bluestore rather than filestore.

The documentation
(http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/)
lists defaults that devote that RAM almost exclusively to the KV cache.

With a system like mine, do you think it would be safe to allow a 3GB cache
and change the KV ratio to 0.60?
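For concreteness, what I have in mind is roughly the following in ceph.conf
(option names as I read them from the Luminous bluestore docs):

[osd]
bluestore_cache_size_ssd = 3221225472   # 3 GiB per OSD
bluestore_cache_kv_ratio = 0.60

At 3 GiB per OSD that is about 156 GiB of cache across 52 OSDs, plus per-OSD
overhead, which should still fit comfortably in 256GB of RAM.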

Thanks
_

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net <http://beyondhosting.net/>


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing
any information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of this
email and any attachment(s).
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore crashing constantly with load on newly created cluster/host.

2018-08-28 Thread Tyler Bishop
After moving back to tcmalloc my random crash issues have been resolved.
I would advise disabling support for jemalloc with bluestore since it's not
stable or safe... it seems risky to allow this?
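For anyone hitting the same thing: on the RHEL-family hosts I've seen, the
jemalloc switch lives in /etc/sysconfig/ceph as an LD_PRELOAD line, roughly as
below (the exact library path/soname may differ per distro); commenting it out
and restarting the OSDs is what moves them back to tcmalloc:

# in /etc/sysconfig/ceph
LD_PRELOAD=/usr/lib64/libjemalloc.so.1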
_

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net <http://beyondhosting.net/>


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing
any information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of this
email and any attachment(s).


On Mon, Aug 27, 2018 at 11:15 PM Tyler Bishop <
tyler.bis...@beyondhosting.net> wrote:

> I bumped another post from earlier in the year.  I got this reply:
>
>
> Adam Tygart 
> 11:06 PM (8 minutes ago)
> to me, Kyle, Ceph
> This issue was related to using Jemalloc. Jemalloc is not as well
> tested with Bluestore and lead to lots of segfaults. We moved back to
> the default of tcmalloc with Bluestore and these stopped.
>
> Check /etc/sysconfig/ceph under RHEL based distros.
>
> ---
>
> I had enabled jemalloc in the sysconfig previously. Disabled that and now
> appear to have stable OSDs.
>
>
> On Mon, Aug 27, 2018 at 11:13 PM Alfredo Daniel Rezinovsky <
> alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:
>
>> Have you created the blockdb partitions or LVM manually ?
>>
>> What size?
>> On 27/08/18 23:48, Tyler Bishop wrote:
>>
>> My host has 256GB of ram.  62GB used under most heavy io workload.
>> _
>>
>> *Tyler Bishop*
>> EST 2007
>>
>>
>> O: 513-299-7108 x1000
>> M: 513-646-5809
>> http://BeyondHosting.net <http://beyondhosting.net/>
>>
>>
>> This email is intended only for the recipient(s) above and/or
>> otherwise authorized personnel. The information contained herein and
>> attached is confidential and the property of Beyond Hosting. Any
>> unauthorized copying, forwarding, printing, and/or disclosing
>> any information related to this email is prohibited. If you received this
>> message in error, please contact the sender and destroy all copies of this
>> email and any attachment(s).
>>
>>
>> On Mon, Aug 27, 2018 at 10:36 PM Alfredo Daniel Rezinovsky <
>> alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:
>>
>>> I had blockdb in ssd, with 3 OSDs per host (8G ram) and the default 3G
>>> bluestore_cache_size_ssd
>>>
>>> I stopped having inconsistencies dropping the cache to 1G.
>>>
>>> On 27/08/18 23:32, Tyler Bishop wrote:
>>>
>>> Having a constant segfault issue under io load with my newly created
>>> bluestore deployment.
>>>
>>> https://pastebin.com/82YjXRm7
>>>
>>> Setup is 28GB SSD LVM for block.db and 6T spinner for data.
>>>
>>> Config:
>>> [global]
>>> fsid =  REDACTED
>>> mon_initial_members = cephmon-1001, cephmon-1002, cephmon-1003
>>> mon_host = 10.20.142.5,10.20.142.6,10.20.142.7
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> filestore_xattr_use_omap = true
>>>
>>> # Fixes issue where image is created with newer than supported features
>>> enabled.
>>> rbd_default_features = 3
>>>
>>>
>>> # Debug Tuning
>>> debug_lockdep = 0/0
>>> debug_context = 0/0
>>> debug_crush = 0/0
>>> debug_buffer = 0/0
>>> debug_timer = 0/0
>>> debug_filer = 0/0
>>> debug_objecter = 0/0
>>> debug_rados = 0/0
>>> debug_rbd = 0/0
>>> debug_journaler = 0/0
>>> debug_objectcatcher = 0/0
>>> debug_client = 0/0
>>> debug_osd = 0/0
>>> debug_optracker = 0/0
>>> debug_objclass = 0/0
>>> debug_filestore = 0/0
>>> debug_journal = 0/0
>>> debug_ms = 0/0
>>> debug_monc = 0/0
>>> debug_tp = 0/0
>>> debug_auth = 0/0
>>> debug_finisher = 0/0
>>> debug_heartbeatmap = 0/0
>>> debug_perfcounter = 0/0
>>> debug_asok = 0/0
>>> debug_throttle = 0/0
>>> debug_mon = 0/0
>>> debug_paxos = 0/0
>>> debug_rgw = 0/0
>>>
>>> [osd]
>>> osd_mkfs_type = xfs
>>> osd_mount_options_xfs =
>>> rw,noatime,,nodiratime,inode64,logbsize

Re: [ceph-users] Bluestore crashing constantly with load on newly created cluster/host.

2018-08-27 Thread Tyler Bishop
I bumped another post from earlier in the year.  I got this reply:


Adam Tygart 
11:06 PM (8 minutes ago)
to me, Kyle, Ceph
This issue was related to using Jemalloc. Jemalloc is not as well
tested with Bluestore and lead to lots of segfaults. We moved back to
the default of tcmalloc with Bluestore and these stopped.

Check /etc/sysconfig/ceph under RHEL based distros.

---

I had enabled jemalloc in the sysconfig previously. Disabled that and now
appear to have stable OSDs.


On Mon, Aug 27, 2018 at 11:13 PM Alfredo Daniel Rezinovsky <
alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:

> Have you created the blockdb partitions or LVM manually ?
>
> What size?
> On 27/08/18 23:48, Tyler Bishop wrote:
>
> My host has 256GB of ram.  62GB used under most heavy io workload.
> _____
>
> *Tyler Bishop*
> EST 2007
>
>
> O: 513-299-7108 x1000
> M: 513-646-5809
> http://BeyondHosting.net <http://beyondhosting.net/>
>
>
> This email is intended only for the recipient(s) above and/or
> otherwise authorized personnel. The information contained herein and
> attached is confidential and the property of Beyond Hosting. Any
> unauthorized copying, forwarding, printing, and/or disclosing
> any information related to this email is prohibited. If you received this
> message in error, please contact the sender and destroy all copies of this
> email and any attachment(s).
>
>
> On Mon, Aug 27, 2018 at 10:36 PM Alfredo Daniel Rezinovsky <
> alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:
>
>> I had blockdb in ssd, with 3 OSDs per host (8G ram) and the default 3G
>> bluestore_cache_size_ssd
>>
>> I stopped having inconsistencies dropping the cache to 1G.
>>
>> On 27/08/18 23:32, Tyler Bishop wrote:
>>
>> Having a constant segfault issue under io load with my newly created
>> bluestore deployment.
>>
>> https://pastebin.com/82YjXRm7
>>
>> Setup is 28GB SSD LVM for block.db and 6T spinner for data.
>>
>> Config:
>> [global]
>> fsid =  REDACTED
>> mon_initial_members = cephmon-1001, cephmon-1002, cephmon-1003
>> mon_host = 10.20.142.5,10.20.142.6,10.20.142.7
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> # Fixes issue where image is created with newer than supported features
>> enabled.
>> rbd_default_features = 3
>>
>>
>> # Debug Tuning
>> debug_lockdep = 0/0
>> debug_context = 0/0
>> debug_crush = 0/0
>> debug_buffer = 0/0
>> debug_timer = 0/0
>> debug_filer = 0/0
>> debug_objecter = 0/0
>> debug_rados = 0/0
>> debug_rbd = 0/0
>> debug_journaler = 0/0
>> debug_objectcatcher = 0/0
>> debug_client = 0/0
>> debug_osd = 0/0
>> debug_optracker = 0/0
>> debug_objclass = 0/0
>> debug_filestore = 0/0
>> debug_journal = 0/0
>> debug_ms = 0/0
>> debug_monc = 0/0
>> debug_tp = 0/0
>> debug_auth = 0/0
>> debug_finisher = 0/0
>> debug_heartbeatmap = 0/0
>> debug_perfcounter = 0/0
>> debug_asok = 0/0
>> debug_throttle = 0/0
>> debug_mon = 0/0
>> debug_paxos = 0/0
>> debug_rgw = 0/0
>>
>> [osd]
>> osd_mkfs_type = xfs
>> osd_mount_options_xfs =
>> rw,noatime,,nodiratime,inode64,logbsize=256k,delaylog
>> osd_mkfs_options_xfs = -f -i size=2048
>> osd_journal_size = 10240
>> filestore_queue_max_ops=1000
>> filestore_queue_max_bytes = 1048576000
>> filestore_max_sync_interval = 10
>> filestore_merge_threshold = 500
>> filestore_split_multiple = 100
>> osd_op_shard_threads = 6
>> journal_max_write_entries = 5000
>> journal_max_write_bytes = 1048576000
>> journal_queueu_max_ops = 3000
>> journal_queue_max_bytes = 1048576000
>> ms_dispatch_throttle_bytes = 1048576000
>> objecter_inflight_op_bytes = 1048576000
>> public network = 10.20.142.0/24
>> cluster_network = 10.20.136.0/24
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_thread_ioprio_class = idle
>> osd_max_backfills = 2
>> osd_recovery_sleep = 0.10
>>
>>
>> [client]
>> rbd_cache = False
>> rbd cache size = 33554432
>> rbd cache target dirty = 16777216
>> rbd cache max dirty = 25165824
>> rbd cache max dirty age = 2
>> rbd cache writethrough until flush = false
>>
>>
>> 
>>
>>
>> 2018-08-28 02:31:30.961954 7f64a895a700  4 rocksdb:
>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACH

Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-08-27 Thread Tyler Bishop
Okay, so far since switching back it looks more stable.  I have around 2GB/s
and 100k IOPS flowing with FIO at the moment to test.
_



On Mon, Aug 27, 2018 at 11:06 PM Adam Tygart  wrote:

> This issue was related to using Jemalloc. Jemalloc is not as well
> tested with Bluestore and lead to lots of segfaults. We moved back to
> the default of tcmalloc with Bluestore and these stopped.
>
> Check /etc/sysconfig/ceph under RHEL based distros.
>
> --
> Adam
> On Mon, Aug 27, 2018 at 9:51 PM Tyler Bishop
>  wrote:
> >
> > Did you solve this?  Similar issue.
> > _
> >
> >
> > On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson  wrote:
> >>
> >> I'm following up from awhile ago. I don't think this is the same bug.
> The bug referenced shows "abort: Corruption: block checksum mismatch", and
> I'm not seeing that on mine.
> >>
> >> Now I've had 8 OSDs down on this one server for a couple of weeks, and
> I just tried to start it back up. Here's a link to the log of that OSD
> (which segfaulted right after starting up):
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
> >>
> >> To me, it looks like the logs are providing surprisingly few hints as
> to where the problem lies. Is there a way I can turn up logging to see if I
> can get any more info as to why this is happening?
> >>
> >> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:
> >>>
> >>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
> >>> > We had a 26-node production ceph cluster which we upgraded to
> Luminous
> >>> > a little over a month ago. I added a 27th-node with Bluestore and
> >>> > didn't have any issues, so I began converting the others, one at a
> >>> > time. The first two went off pretty smoothly, but the 3rd is doing
> >>> > something strange.
> >>> >
> >>> > Initially, all the OSDs came up fine, but then some started to
> >>> > segfault. Out of curiosity more than anything else, I did reboot the
> >>> > server to see if it would get better or worse, and it pretty much
> >>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
> >>> > those, 3 again segfaulted
> >>> >
> >>> > I picked one that didn't properly come up and copied the log to where
> >>> > anybody can view it:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
> >>> > <http://people.beocat.ksu.edu/%7Ekylehutson/ceph-osd.426.log>
> >>> >
> >>> > You can contrast that with one that is up:
> >>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
> >>> > <http://people.beocat.ksu.edu/%7Ekylehutson/ceph-osd.428.log>
> >>> >
> >>> > (which is still showing segfaults in the logs, but seems to be
> >>> > recovering from them OK?)
> >>> >
> >>> > Any ideas?
> >>> Ideas ? yes
> >>>
> >>> There is a a bug which is hitting a small number of systems and at this
> >>> time there is no solution. Issues details at
> >>> http://tracker.ceph.com/issues/22102.
> >>>
> >>> Please submit more details of your problem on the ticket.
> >>>
> >>> Mike
> >>>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Segfaults after Bluestore conversion

2018-08-27 Thread Tyler Bishop
Did you solve this?  Similar issue.
_


On Wed, Feb 28, 2018 at 3:46 PM Kyle Hutson  wrote:

> I'm following up from awhile ago. I don't think this is the same bug. The
> bug referenced shows "abort: Corruption: block checksum mismatch", and I'm
> not seeing that on mine.
>
> Now I've had 8 OSDs down on this one server for a couple of weeks, and I
> just tried to start it back up. Here's a link to the log of that OSD (which
> segfaulted right after starting up):
> http://people.beocat.ksu.edu/~kylehutson/ceph-osd.414.log
>
> To me, it looks like the logs are providing surprisingly few hints as to
> where the problem lies. Is there a way I can turn up logging to see if I
> can get any more info as to why this is happening?
>
> On Thu, Feb 8, 2018 at 3:02 AM, Mike O'Connor  wrote:
>
>> On 7/02/2018 8:23 AM, Kyle Hutson wrote:
>> > We had a 26-node production ceph cluster which we upgraded to Luminous
>> > a little over a month ago. I added a 27th-node with Bluestore and
>> > didn't have any issues, so I began converting the others, one at a
>> > time. The first two went off pretty smoothly, but the 3rd is doing
>> > something strange.
>> >
>> > Initially, all the OSDs came up fine, but then some started to
>> > segfault. Out of curiosity more than anything else, I did reboot the
>> > server to see if it would get better or worse, and it pretty much
>> > stayed the same - 12 of the 18 OSDs did not properly come up. Of
>> > those, 3 again segfaulted
>> >
>> > I picked one that didn't properly come up and copied the log to where
>> > anybody can view it:
>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.426.log
>> > 
>> >
>> > You can contrast that with one that is up:
>> > http://people.beocat.ksu.edu/~kylehutson/ceph-osd.428.log
>> > 
>> >
>> > (which is still showing segfaults in the logs, but seems to be
>> > recovering from them OK?)
>> >
>> > Any ideas?
>> Ideas ? yes
>>
>> There is a a bug which is hitting a small number of systems and at this
>> time there is no solution. Issues details at
>> http://tracker.ceph.com/issues/22102.
>>
>> Please submit more details of your problem on the ticket.
>>
>> Mike
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore crashing constantly with load on newly created cluster/host.

2018-08-27 Thread Tyler Bishop
My host has 256GB of RAM.  62GB is used under the heaviest IO workload.
_

*Tyler Bishop*
EST 2007


O: 513-299-7108 x1000
M: 513-646-5809
http://BeyondHosting.net <http://beyondhosting.net/>


This email is intended only for the recipient(s) above and/or
otherwise authorized personnel. The information contained herein and
attached is confidential and the property of Beyond Hosting. Any
unauthorized copying, forwarding, printing, and/or disclosing
any information related to this email is prohibited. If you received this
message in error, please contact the sender and destroy all copies of this
email and any attachment(s).


On Mon, Aug 27, 2018 at 10:36 PM Alfredo Daniel Rezinovsky <
alfredo.rezinov...@ingenieria.uncuyo.edu.ar> wrote:

> I had blockdb in ssd, with 3 OSDs per host (8G ram) and the default 3G
> bluestore_cache_size_ssd
>
> I stopped having inconsistencies dropping the cache to 1G.
>
> On 27/08/18 23:32, Tyler Bishop wrote:
>
> Having a constant segfault issue under io load with my newly created
> bluestore deployment.
>
> https://pastebin.com/82YjXRm7
>
> Setup is 28GB SSD LVM for block.db and 6T spinner for data.
>
> Config:
> [global]
> fsid =  REDACTED
> mon_initial_members = cephmon-1001, cephmon-1002, cephmon-1003
> mon_host = 10.20.142.5,10.20.142.6,10.20.142.7
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
>
> # Fixes issue where image is created with newer than supported features
> enabled.
> rbd_default_features = 3
>
>
> # Debug Tuning
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
>
> [osd]
> osd_mkfs_type = xfs
> osd_mount_options_xfs =
> rw,noatime,,nodiratime,inode64,logbsize=256k,delaylog
> osd_mkfs_options_xfs = -f -i size=2048
> osd_journal_size = 10240
> filestore_queue_max_ops=1000
> filestore_queue_max_bytes = 1048576000
> filestore_max_sync_interval = 10
> filestore_merge_threshold = 500
> filestore_split_multiple = 100
> osd_op_shard_threads = 6
> journal_max_write_entries = 5000
> journal_max_write_bytes = 1048576000
> journal_queueu_max_ops = 3000
> journal_queue_max_bytes = 1048576000
> ms_dispatch_throttle_bytes = 1048576000
> objecter_inflight_op_bytes = 1048576000
> public network = 10.20.142.0/24
> cluster_network = 10.20.136.0/24
> osd_disk_thread_ioprio_priority = 7
> osd_disk_thread_ioprio_class = idle
> osd_max_backfills = 2
> osd_recovery_sleep = 0.10
>
>
> [client]
> rbd_cache = False
> rbd cache size = 33554432
> rbd cache target dirty = 16777216
> rbd cache max dirty = 25165824
> rbd cache max dirty age = 2
> rbd cache writethrough until flush = false
>
>
> 
>
>
> 2018-08-28 02:31:30.961954 7f64a895a700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:319]
> [default] [JOB 19] Level-0 flush table #688: 6121532 bytes OK
> 2018-08-28 02:31:30.962476 7f64a895a700  4 rocksdb:
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:242]
> adding log 681 to recycle list
>
> 2018-08-28 02:31:30.962495 7f64a895a700  4 rocksdb: (Original Log Time
> 2018/08/28-02:31:30.961973)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:360]
> [default] Level-0 commit table #688 started
> 2018-08-28 02:31:30.962501 7f64a895a700  4 rocksdb: (Original Log Time
> 2018/08/28-02:31:30.962413)
> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:383]
> [default] Level-0 commit table #688: memtable #1 done
> 2018-08-2

[ceph-users] Bluestore crashing constantly with load on newly created cluster/host.

2018-08-27 Thread Tyler Bishop
ST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_write.cc:725]
[default] New memtable created with log file: #689. Immutable memtables: 0.

2018-08-28 02:32:06.102542 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:49]
[JOB 20] Syncing log #687
2018-08-28 02:32:06.103394 7f64a895a700  4 rocksdb: (Original Log Time
2018/08/28-02:32:06.102527)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
Calling FlushMemTableToOutputFile with column family [default], flush slots
available 1, compaction slots allowed 1, compaction slots scheduled 1
2018-08-28 02:32:06.103407 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:264]
[default] [JOB 20] Flushing memtable with next log file: 689

2018-08-28 02:32:06.103435 7f64a895a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1535423526103422, "job": 20, "event": "flush_started",
"num_memtables": 1, "num_entries": 97689, "num_deletes": 21335,
"memory_usage": 260069984}
2018-08-28 02:32:06.103446 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:293]
[default] [JOB 20] Level-0 flush table #690: started
2018-08-28 02:32:06.155755 7f64a895a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1535423526155726, "cf_name": "default", "job": 20, "event":
"table_file_creation", "file_number": 690, "file_size": 6343137,
"table_properties": {"data_size": 6153638, "index_size": 65232,
"filter_size": 123278, "raw_key_size": 2289031, "raw_average_key_size": 52,
"raw_value_size": 5374531, "raw_average_value_size": 122,
"num_data_blocks": 1047, "num_entries": 43785, "filter_policy_name":
"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "21429", "kMergeOperands":
"220"}}
2018-08-28 02:32:06.155776 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:319]
[default] [JOB 20] Level-0 flush table #690: 6343137 bytes OK
2018-08-28 02:32:06.156214 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:242]
adding log 687 to recycle list

2018-08-28 02:32:06.156225 7f64a895a700  4 rocksdb: (Original Log Time
2018/08/28-02:32:06.155790)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:360]
[default] Level-0 commit table #690 started
2018-08-28 02:32:06.156229 7f64a895a700  4 rocksdb: (Original Log Time
2018/08/28-02:32:06.156164)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:383]
[default] Level-0 commit table #690: memtable #1 done
2018-08-28 02:32:06.156239 7f64a895a700  4 rocksdb: (Original Log Time
2018/08/28-02:32:06.156178) EVENT_LOG_v1 {"time_micros": 1535423526156172,
"job": 20, "event": "flush_finished", "lsm_state": [2, 4, 1, 0, 0, 0, 0],
"immutable_memtables": 0}
2018-08-28 02:32:06.156244 7f64a895a700  4 rocksdb: (Original Log Time
2018/08/28-02:32:06.156199)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:132]
[default] Level summary: base level 1 max bytes base 268435456 files[2 4 1
0 0 0 0] max score 0.84

2018-08-28 02:32:06.156252 7f64a895a700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Tyler Bishop
Thanks for the info. I was investigating bluestore as well.  My hosts don't
go unresponsive, but I do see parallel IO slow down.

On Thu, Aug 23, 2018, 8:02 PM Andras Pataki 
wrote:

> We are also running some fairly dense nodes with CentOS 7.4 and ran into
> similar problems.  The nodes ran filestore OSDs (Jewel, then Luminous).
> Sometimes a node would be so unresponsive that one couldn't even ssh to
> it (even though the root disk was a physically separate drive on a
> separate controller from the OSD drives).  Often these would coincide
> with kernel stack traces about hung tasks. Initially we did blame high
> load, etc. from all the OSDs.
>
> But then we benchmarked the nodes independently of ceph (with iozone and
> such) and noticed problems there too.  When we started a few dozen
> iozone processes on separate JBOD drives with xfs, some didn't even
> start and write a single byte for minutes.  The conclusion we came to
> was that there is some interference among a lot of mounted xfs file
> systems in the Red Hat 3.10 kernels.  Some kind of central lock that
> prevents dozens of xfs file systems from running in parallel.  When we
> do I/O directly to raw devices in parallel, we saw no problems (no high
> loads, etc.).  So we built a newer kernel, and the situation got
> better.  4.4 is already much better, nowadays we are testing moving to
> 4.14.
>
> Also, migrating to bluestore significantly reduced the load on these
> nodes too.  At busy times, the filestore host loads were 20-30, even
> higher (on a 28 core node), while the bluestore nodes hummed along at a
> lot of perhaps 6 or 8.  This also confirms that somehow lots of xfs
> mounts don't work in parallel.
>
> Andras
>
>
> On 08/23/2018 03:24 PM, Tyler Bishop wrote:
> > Yes I've reviewed all the logs from monitor and host.   I am not
> > getting useful errors (or any) in dmesg or general messages.
> >
> > I have 2 ceph clusters, the other cluster is 300 SSD and i never have
> > issues like this.   That's why Im looking for help.
> >
> > On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev 
> wrote:
> >> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
> >>  wrote:
> >>> During high load testing I'm only seeing user and sys cpu load around
> 60%... my load doesn't seem to be anything crazy on the host and iowait
> stays between 6 and 10%.  I have very good `ceph osd perf` numbers too.
> >>>
> >>> I am using 10.2.11 Jewel.
> >>>
> >>>
> >>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer 
> wrote:
> >>>> Hello,
> >>>>
> >>>> On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:
> >>>>
> >>>>> Hi,   I've been fighting to get good stability on my cluster for
> about
> >>>>> 3 weeks now.  I am running into intermittent issues with OSD flapping
> >>>>> marking other OSD down then going back to a stable state for hours
> and
> >>>>> days.
> >>>>>
> >>>>> The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G
> >>>>> Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST SAS drives
> >>>>> with 400GB HGST SAS 12G SSDs.   My configuration is 4 journals per
> >>>>> host with 12 disk per journal for a total of 56 disk per system and
> 52
> >>>>> OSD.
> >>>>>
> >>>> Any denser and you'd have a storage black hole.
> >>>>
> >>>> You already pointed your finger in the (or at least one) right
> direction
> >>>> and everybody will agree that this setup is woefully underpowered in
> the
> >>>> CPU department.
> >>>>
> >>>>> I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile
> >>>>> for throughput-performance enabled.
> >>>>>
> >>>> Ceph version would be interesting as well...
> >>>>
> >>>>> I have these sysctls set:
> >>>>>
> >>>>> kernel.pid_max = 4194303
> >>>>> fs.file-max = 6553600
> >>>>> vm.swappiness = 0
> >>>>> vm.vfs_cache_pressure = 50
> >>>>> vm.min_free_kbytes = 3145728
> >>>>>
> >>>>> I feel like my issue is directly related to the high number of OSD
> per
> >>>>> host but I'm not sure what issue I'm really running into.   I believe
> >>>>> that I have ruled out network issues, i am able to get 38Gbit
> >>>>> consistently via iperf testi

Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-23 Thread Tyler Bishop
Yes, I've reviewed all the logs from the monitors and hosts.   I am not
getting useful errors (or any) in dmesg or the general messages log.

I have 2 ceph clusters; the other cluster is 300 SSDs and I never have
issues like this.   That's why I'm looking for help.

On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev  wrote:
>
> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
>  wrote:
> >
> > During high load testing I'm only seeing user and sys cpu load around 
> > 60%... my load doesn't seem to be anything crazy on the host and iowait 
> > stays between 6 and 10%.  I have very good `ceph osd perf` numbers too.
> >
> > I am using 10.2.11 Jewel.
> >
> >
> > On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer  wrote:
> >>
> >> Hello,
> >>
> >> On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:
> >>
> >> > Hi,   I've been fighting to get good stability on my cluster for about
> >> > 3 weeks now.  I am running into intermittent issues with OSD flapping
> >> > marking other OSD down then going back to a stable state for hours and
> >> > days.
> >> >
> >> > The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G
> >> > Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST SAS drives
> >> > with 400GB HGST SAS 12G SSDs.   My configuration is 4 journals per
> >> > host with 12 disk per journal for a total of 56 disk per system and 52
> >> > OSD.
> >> >
> >> Any denser and you'd have a storage black hole.
> >>
> >> You already pointed your finger in the (or at least one) right direction
> >> and everybody will agree that this setup is woefully underpowered in the
> >> CPU department.
> >>
> >> > I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile
> >> > for throughput-performance enabled.
> >> >
> >> Ceph version would be interesting as well...
> >>
> >> > I have these sysctls set:
> >> >
> >> > kernel.pid_max = 4194303
> >> > fs.file-max = 6553600
> >> > vm.swappiness = 0
> >> > vm.vfs_cache_pressure = 50
> >> > vm.min_free_kbytes = 3145728
> >> >
> >> > I feel like my issue is directly related to the high number of OSD per
> >> > host but I'm not sure what issue I'm really running into.   I believe
> >> > that I have ruled out network issues, i am able to get 38Gbit
> >> > consistently via iperf testing and mtu for jump pings successfully
> >> > with no fragment set and 8972 packet size.
> >> >
> >> The fact that it all works for days at a time suggests this as well, but
> >> you need to verify these things when they're happening.
> >>
> >> > From FIO testing I seem to be able to get 150-200k iops write from my
> >> > rbd clients on 1gbit networking... This is about what I expected due
> >> > to the write penalty and my underpowered CPU for the number of OSD.
> >> >
> >> > I get these messages which I believe are normal?
> >> > 2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
> >> > >> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2
> >> > pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send, going
> >> > to standby
> >> >
> >> Ignore.
> >>
> >> > Then randomly I'll get a storm of this every few days for 20 minutes or 
> >> > so:
> >> > 2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
> >> > heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
> >> > 2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
> >> > 2018-08-22 15:48:12.630773)
> >> >
> >> Randomly is unlikely.
> >> Again, catch it in the act, atop in huge terminal windows (showing all
> >> CPUs and disks) for all nodes should be very telling, collecting and
> >> graphing this data might work, too.
> >>
> >> My suspects would be deep scrubs and/or high IOPS spikes when this is
> >> happening, starving out OSD processes (CPU wise, RAM should be fine one
> >> supposes).
> >>
> >> Christian
> >>
> >> > Please help!!!
>
> Have you looked at the OSD logs on the OSD nodes by chance?  I found
> that correlating the messages in those logs with your master ceph log
> and also correlating with any messages in syslog or kern.log can
> elucidate the cause of the problem pretty well.
> --
> Alex Gorbachev
> Storcium
>
>
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >>
> >>
> >> --
> >> Christian Balzer        Network/Systems Engineer
> >> ch...@gol.com   Rakuten Communications
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stability Issue with 52 OSD hosts

2018-08-22 Thread Tyler Bishop
During high load testing I'm only seeing user and sys cpu load around
60%... my load doesn't seem to be anything crazy on the host and iowait
stays between 6 and 10%.  I have very good `ceph osd perf` numbers too.

I am using 10.2.11 Jewel.


On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer  wrote:

> Hello,
>
> On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:
>
> > Hi,   I've been fighting to get good stability on my cluster for about
> > 3 weeks now.  I am running into intermittent issues with OSD flapping
> > marking other OSD down then going back to a stable state for hours and
> > days.
> >
> > The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G
> > Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST SAS drives
> > with 400GB HGST SAS 12G SSDs.   My configuration is 4 journals per
> > host with 12 disk per journal for a total of 56 disk per system and 52
> > OSD.
> >
> Any denser and you'd have a storage black hole.
>
> You already pointed your finger in the (or at least one) right direction
> and everybody will agree that this setup is woefully underpowered in the
> CPU department.
>
> > I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile
> > for throughput-performance enabled.
> >
> Ceph version would be interesting as well...
>
> > I have these sysctls set:
> >
> > kernel.pid_max = 4194303
> > fs.file-max = 6553600
> > vm.swappiness = 0
> > vm.vfs_cache_pressure = 50
> > vm.min_free_kbytes = 3145728
> >
> > I feel like my issue is directly related to the high number of OSD per
> > host but I'm not sure what issue I'm really running into.   I believe
> > that I have ruled out network issues, i am able to get 38Gbit
> > consistently via iperf testing and mtu for jump pings successfully
> > with no fragment set and 8972 packet size.
> >
> The fact that it all works for days at a time suggests this as well, but
> you need to verify these things when they're happening.
>
> > From FIO testing I seem to be able to get 150-200k iops write from my
> > rbd clients on 1gbit networking... This is about what I expected due
> > to the write penalty and my underpowered CPU for the number of OSD.
> >
> > I get these messages which I believe are normal?
> > 2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
> > >> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2
> > pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send, going
> > to standby
> >
> Ignore.
>
> > Then randomly I'll get a storm of this every few days for 20 minutes or
> so:
> > 2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
> > heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
> > 2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
> > 2018-08-22 15:48:12.630773)
> >
> Randomly is unlikely.
> Again, catch it in the act, atop in huge terminal windows (showing all
> CPUs and disks) for all nodes should be very telling, collecting and
> graphing this data might work, too.
>
> My suspects would be deep scrubs and/or high IOPS spikes when this is
> happening, starving out OSD processes (CPU wise, RAM should be fine one
> supposes).
>
> Christian
>
> > Please help!!!
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Stability Issue with 52 OSD hosts

2018-08-22 Thread Tyler Bishop
Hi,   I've been fighting to get good stability on my cluster for about
3 weeks now.  I am running into intermittent issues with OSD flapping
marking other OSD down then going back to a stable state for hours and
days.

The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB ram, 40G
Network to 40G Brocade VDX Switches.  The OSD are 6TB HGST SAS drives
with 400GB HGST SAS 12G SSDs.   My configuration is 4 journals per
host with 12 disk per journal for a total of 56 disk per system and 52
OSD.

I am using CentOS 7 with kernel 3.10 and the redhat tuned-adm profile
for throughput-performance enabled.

I have these sysctls set:

kernel.pid_max = 4194303
fs.file-max = 6553600
vm.swappiness = 0
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 3145728
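
For anyone reproducing this setup, the values above survive a reboot if they
are dropped into a file under /etc/sysctl.d (the filename below is arbitrary)
and reloaded:

cat > /etc/sysctl.d/90-ceph-osd.conf <<'EOF'
kernel.pid_max = 4194303
fs.file-max = 6553600
vm.swappiness = 0
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 3145728
EOF
sysctl --system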

I feel like my issue is directly related to the high number of OSD per
host but I'm not sure what issue I'm really running into.   I believe
that I have ruled out network issues: I am able to get 38Gbit
consistently via iperf testing, and jumbo-frame MTU pings succeed
with the no-fragment flag set and an 8972-byte packet size.
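
(The network checks described above translate to roughly the following; the
peer address is only an example taken from the log lines further down. -M do
forbids fragmentation, and 8972 bytes of ICMP payload corresponds to a
9000-byte MTU.)

ping -M do -s 8972 -c 4 10.20.136.10
iperf -c 10.20.136.10 -P 4 -t 30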

From FIO testing I seem to be able to get 150-200k iops write from my
rbd clients on 1gbit networking... This is about what I expected due
to the write penalty and my underpowered CPU for the number of OSD.
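
(For comparison, a client-side test of this kind can be run directly against an
RBD image with fio's rbd engine; the pool, image and client names here are only
examples, and the image has to exist first:)

rbd create fio-test --size 10240 --pool rbd
fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --direct=1 --group_reporting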

I get these messages which I believe are normal?
2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
>> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2
pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send, going
to standby

Then randomly I'll get a storm of this every few days for 20 minutes or so:
2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
2018-08-22 15:48:12.630773)

Please help!!!
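
(If deep scrubs turn out to be the trigger, as suggested elsewhere in this
thread, they can be paused temporarily while watching whether the heartbeat
storms stop. This is only a diagnostic step, not a fix:)

ceph osd set nodeep-scrub
ceph osd set noscrub
# ...observe for a while, then re-enable...
ceph osd unset noscrub
ceph osd unset nodeep-scrub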
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph iSCSI login failed due to authorization failure

2017-10-19 Thread Tyler Bishop
Where did you find the iscsi rpms etc.? I looked all through the repo and can't 
find anything but the documentation. 

_ 

Tyler Bishop 
Founder EST 2007 


O: 513-299-7108 x10 
M: 513-646-5809 
[ http://beyondhosting.net/ | http://BeyondHosting.net ] 


This email is intended only for the recipient(s) above and/or otherwise 
authorized personnel. The information contained herein and attached is 
confidential and the property of Beyond Hosting. Any unauthorized copying, 
forwarding, printing, and/or disclosing any information related to this email 
is prohibited. If you received this message in error, please contact the sender 
and destroy all copies of this email and any attachment(s). 


From: "Maged Mokhtar" <mmokh...@petasan.org> 
To: "Kashif Mumtaz" <kashif.mum...@yahoo.com> 
Cc: "Ceph Users" <ceph-users@lists.ceph.com> 
Sent: Saturday, October 14, 2017 1:40:05 PM 
Subject: Re: [ceph-users] Ceph iSCSI login failed due to authorization failure 



On 2017-10-14 17:50, Kashif Mumtaz wrote: 


Hello Dear, 
I am trying to configure the Ceph iSCSI gateway on Ceph Luminous, as per 
[ http://docs.ceph.com/docs/master/rbd/iscsi-overview/ | Ceph iSCSI Gateway — 
Ceph Documentation ] 

The Ceph iSCSI gateways are configured and CHAP auth is set. 
/> ls 
o- / ............................................................. [...] 
  o- clusters ............................................... [Clusters: 1] 
  | o- ceph ................................................. [HEALTH_WARN] 
  |   o- pools .................................................. [Pools: 2] 
  |   | o- kashif ....... [Commit: 0b, Avail: 116G, Used: 1K, Commit%: 0%] 
  |   | o- rbd ......... [Commit: 10G, Avail: 116G, Used: 3K, Commit%: 8%] 
  |   o- topology ..................................... [OSDs: 13,MONs: 3] 
  o- disks ............................................... [10G, Disks: 1] 
  | o- rbd.disk_1 ........................................... [disk_1 (10G)] 
  o- iscsi-target ............................................ [Targets: 1] 
    o- iqn.2003-01.com.redhat.iscsi-gw:tahir ................. [Gateways: 2] 
      o- gateways ................................... [Up: 2/2, Portals: 2] 
      | o- gateway ....................................... [192.168.10.37 (UP)] 
      | o- gateway2 ...................................... [192.168.10.38 (UP)] 
      o- hosts ................................................... [Hosts: 1] 
        o- iqn.1994-05.com.redhat:rh7-client ....... [Auth: CHAP, Disks: 1(10G)] 
          o- lun 0 ........................ [rbd.disk_1(10G), Owner: gateway2] 
/> 
But initiators are unable to mount it. Tried both on Linux and ESXi 6. 
Below is the error message on iscsi gateway server log file. 
Oct 14 19:34:49 gateway kernel: iSCSI Initiator Node: 
iqn.1998-01.com.vmware:esx0-36c45c69 is not authorized to access iSCSI target 
portal group: 1. 
Oct 14 19:34:49 gateway kernel: iSCSI Login negotiation failed. 
Oct 14 19:35:27 gateway kernel: iSCSI Initiator Node: 
iqn.1994-05.com.redhat:5ef55740c576 is not authorized to access iSCSI target 
portal group: 1. 
Oct 14 19:35:27 gateway kernel: iSCSI Login negotiation failed. 
I am supplying the authentication on the initiator side. 
Discovery from the initiator is working: 
root@server1 ~]# iscsiadm -m discovery -t st -p 192.168.10.37 
192.168.10.37:3260,1 iqn.2003-01.com.redhat.iscsi-gw:tahir 
192.168.10.38:3260,2 iqn.2003-01.com.redhat.iscsi-gw:tahir 
But when trying to login , it is giving "iSCSI login failed due to 
authorization failure" 
[root@server1 ~]# iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -l 
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir, 
portal: 192.168.10.37,3260] (multiple) 
Logging in to [iface: default, target: iqn.2003-01.com.redhat.iscsi-gw:tahir, 
portal: 192.168.10.3
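
(Two things worth checking in a situation like this; a sketch, not a definitive
diagnosis. First, the LIO log above names iqn.1994-05.com.redhat:5ef55740c576,
while the host entry configured on the gateway is
iqn.1994-05.com.redhat:rh7-client, and LIO rejects initiators that are not
explicitly listed, so the IQN in /etc/iscsi/initiatorname.iscsi has to match.
Second, the CHAP credentials defined on the gateway need to be set on the
initiator, roughly like this; the username and secret below are placeholders:)

iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -o update -n node.session.auth.authmethod -v CHAP
iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -o update -n node.session.auth.username -v chapuser
iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -o update -n node.session.auth.password -v chapsecret
iscsiadm -m node -T iqn.2003-01.com.redhat.iscsi-gw:tahir -l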

Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-09-05 Thread Tyler Bishop
We had to change these in our cluster for some drives to come up.

_ 

Tyler Bishop 
Founder EST 2007 


O: 513-299-7108 x10 
M: 513-646-5809 
[ http://beyondhosting.net/ | http://BeyondHosting.net ] 


This email is intended only for the recipient(s) above and/or otherwise 
authorized personnel. The information contained herein and attached is 
confidential and the property of Beyond Hosting. Any unauthorized copying, 
forwarding, printing, and/or disclosing any information related to this email 
is prohibited. If you received this message in error, please contact the sender 
and destroy all copies of this email and any attachment(s).

- Original Message -
From: "Andreas Calminder" <andreas.calmin...@klarna.com>
To: "Gregory Farnum" <gfar...@redhat.com>
Cc: "Ceph Users" <ceph-users@lists.ceph.com>
Sent: Tuesday, September 5, 2017 1:17:32 AM
Subject: Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

Hi!
Thanks for the pointer about leveldb_compact_on_mount, it took a while
to get everything compacted but after that the deep scrub of the
offending pg went smoothly without any suicides. I'm considering using
the compact-on-mount feature for all our OSDs in the cluster since
they're kind of large and thereby kind of slow (SAS, but still).
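
(For reference, the compact-on-mount setting from this thread is just a
ceph.conf flag plus an OSD restart; depending on the version there may also be
an online compaction command, as Greg hints at below. osd.34 is the OSD from
the logs in this thread.)

# in ceph.conf on the OSD host, under [osd], then restart that OSD
leveldb_compact_on_mount = true

# on recent enough releases, an online compaction can be triggered instead
ceph tell osd.34 compact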
Anyhow, thanks a lot for the help!

/andreas

On 17 August 2017 at 23:48, Gregory Farnum <gfar...@redhat.com> wrote:
> On Thu, Aug 17, 2017 at 1:02 PM, Andreas Calminder
> <andreas.calmin...@klarna.com> wrote:
>> Hi!
>> Thanks for getting back to me!
>>
>> Clients access the cluster through rgw (s3), we had some big buckets
>> containing a lot of small files. Prior to this happening I removed a
>> semi-stale bucket with a rather large index, 2.5 million objects, all but 30
>> objects didn't actually exist which left the normal radosgw-admin bucket rm
>> command to fail so I had to remove the bucket instances and bucket metadata
>> by hand, leaving the remaining 30 objects floating around in the cluster.
>>
>> I don't have access to the logs at the moment, but I see the deep-scrub
>> starting in the log for osd.34, after a while it starts with
>>
>> 1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread $THREADID' had timed out after 15
>>
>> the $THREADID seemingly is the same one as the deep scrub, after a while it
>> will suicide and a lot of operations will happen until the deep scrub tries
>> again for the same pg and the above repeats.
>>
>> The osd disk (we have 1 osd per disk) is rather large and pretty slow so it
>> might be that, but I think the behaviour should've been observed elsewhere
>> in the cluster as well since all osd disks are of the same type and size.
>>
>> One thought I had is to just kill the disk and re-add it since the data is
>> supposed to be replicated to 3 nodes in the cluster, but I kind of want to
>> find out what has happened and have it fixed.
>
> Ah. Some people have also found that compacting the leveldb store
> improves the situation a great deal. In most versions you can do this
> by setting "leveldb_compact_on_mount = true" in the OSD's config file
> and then restarting the daemon. You may also have admin socket
> commands available to trigger it.
>
> I'd try out those and then turn it on again with the high suicide
> timeout and see if things improve.
> -Greg
>
>
>>
>> /andreas
>>
>>
>> On 17 Aug 2017 20:21, "Gregory Farnum" <gfar...@redhat.com> wrote:
>>
>> On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder
>> <andreas.calmin...@klarna.com> wrote:
>>>
>>> Thanks,
>>> I've modified the timeout successfully, unfortunately it wasn't enough
>>> for the deep-scrub to finish, so I increased the
>>> osd_op_thread_suicide_timeout even higher (1200s), the deep-scrub
>>> command will however get killed before this timeout is reached, I
>>> figured it was osd_command_thread_suicide_timeout and adjusted it
>>> accordingly and restarted the osd, but it still got killed
>>> approximately 900s after starting.
>>>
>>> The log spits out:
>>> 2017-08-17 09:01:35.723865 7f062e696700  1 heartbeat_map is_healthy
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>> 2017-08-17 09:01:40.723945 7f062e696700  1 heartbeat_map is_healthy
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>> 2017-08-17 09:01:45.012105 7f05cceee700  1 heartbeat_map reset_timeout
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>>
>>> I'm thinking havin

[ceph-users] Enjoy the leap second mon skew tonight..

2016-12-31 Thread Tyler Bishop
Enjoy the leap second guys.. lol your cluster gonna be skewed. 

_ 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-19 Thread Tyler Bishop
This is a cool project, keep up the good work!


_ 

Tyler Bishop 
Founder 


O: 513-299-7108 x10 
M: 513-646-5809 
http://BeyondHosting.net 


This email is intended only for the recipient(s) above and/or otherwise 
authorized personnel. The information contained herein and attached is 
confidential and the property of Beyond Hosting. Any unauthorized copying, 
forwarding, printing, and/or disclosing any information related to this email 
is prohibited. If you received this message in error, please contact the sender 
and destroy all copies of this email and any attachment(s).

- Original Message -
From: "Maged Mokhtar" <mmokh...@petasan.org>
To: "ceph users" <ceph-users@lists.ceph.com>
Sent: Sunday, October 16, 2016 12:57:14 PM
Subject: [ceph-users] new Open Source Ceph based iSCSI SAN project

Hello,

I am happy to announce PetaSAN, an open source scale-out SAN that uses Ceph 
storage and LIO iSCSI Target.
visit us at:
www.petasan.org

your feedback will be much appreciated.
maged mokhtar 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Interested in Ceph, but have performance questions

2016-09-29 Thread Tyler Bishop
We easily see line-rate sequential IO from most disks. 

I would say that 150GB/s with 40G networking and a minimum of 20 hosts is no 
problem. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Nick Fisk" <n...@fisk.me.uk> 
To: "Gerald Spencer" <ger.spenc...@gmail.com>, ceph-users@lists.ceph.com 
Sent: Thursday, September 29, 2016 11:04:45 AM 
Subject: Re: [ceph-users] Interested in Ceph, but have performance questions 



Hi Gerald, 



I would say it’s definitely possible. I would make sure you invest in the 
networking to make sure you have enough bandwidth and choose disks based on 
performance rather than capacity. Either lots of lower capacity disks or SSD’s 
would be best. The biggest challenge may be around the client interface (ie 
block,object,file) and if you can get it to create the parallelism required to 
drive the underlying RADOS cluster. 



With my 60 disk cluster I can max out a 10G Nic with both read and writes. 
Ceph’s performance will increase with scale, so I don’t see why with 40G 
networking those figures wouldn’t be achievable. 
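
(A quick way to sanity-check raw cluster throughput of the kind Nick describes,
before involving any client interface, is rados bench against a throwaway pool;
the pool name and PG count below are only examples:)

ceph osd pool create bench 2048 2048
rados bench -p bench 60 write -t 32 --no-cleanup
rados bench -p bench 60 seq -t 32
rados -p bench cleanup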



Nick 




From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gerald 
Spencer 
Sent: 29 September 2016 15:38 
To: ceph-users@lists.ceph.com 
Subject: [ceph-users] Interested in Ceph, but have performance questions 




Greetings new world of Ceph, 





Long story short, at work we perform high throughput volumetric imaging and 
create a decent chunk of data per machine. We are about to bring the next 
generation of our system online and the IO requirements will outpace our 
current storage solution (jbod using zfs on Linux). We are currently searching 
for a template-able scale out solution that we can add as we bring each new 
system online starting in a few months. There are several quotes floating 
around from all of the big players, but the buy in on hardware and software is 
unsettling as they are a hefty chunk of change. 





The current performance we are currently estimating is per machine: 


- simultaneous 30Gbps read and 30Gbps write 


- 180 TB capacity (roughly a two day buffer into a public cloud) 








So our question is: are these types of performances possible using Ceph? I 
haven't found any benchmarks of this nature beyond 


https://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf
 


Which claims 150GB/s? I think perhaps they meant 150Gb/s (150 1Gbps clients). 





Cheers, 


Gerald Spencer 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD Down but not marked down by cluster

2016-09-29 Thread Tyler Bishop
The crush tree shows them as down, however the status summary does not.

16/330 in osds are down

When in reality it was 56/330.

I am also having issues with client IO deadlocking until a full rebuild finishes 
or the host comes back up.   I have the priorities set, but I believe it's still 
trying to write to the down OSDs.
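
(One hedged workaround while a whole host is dead is to explicitly mark its OSDs
down and out so clients and peering stop waiting on them; the ids below are
taken from the tree quoted further down, adjust for any gaps:)

for id in $(seq 303 335) $(seq 337 358) 369; do
    ceph osd down $id
    ceph osd out $id
done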

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Wido den Hollander" <w...@42on.com>
To: "ceph-users" <ceph-us...@ceph.com>, "ceph new" <ceph-users@lists.ceph.com>, 
"Tyler Bishop" <tyler.bis...@beyondhosting.net>
Sent: Thursday, September 29, 2016 3:35:14 AM
Subject: Re: [ceph-users] OSD Down but not marked down by cluster

> Op 29 september 2016 om 1:57 schreef Tyler Bishop 
> <tyler.bis...@beyondhosting.net>:
> 
> 
> S1148 is down but the cluster does not mark it as such. 
> 

A host will never be marked as down, but the output shows that all OSDs are 
marked as down however.

Wido

> cluster 3aac8ab8-1011-43d6-b281-d16e7a61b2bd 
> health HEALTH_WARN 
> 3888 pgs backfill 
> 196 pgs backfilling 
> 6418 pgs degraded 
> 52 pgs down 
> 52 pgs peering 
> 1 pgs recovery_wait 
> 3653 pgs stuck degraded 
> 52 pgs stuck inactive 
> 6088 pgs stuck unclean 
> 3653 pgs stuck undersized 
> 6417 pgs undersized 
> 186 requests are blocked > 32 sec 
> recovery 42096983/185765821 objects degraded (22.661%) 
> recovery 49940341/185765821 objects misplaced (26.883%) 
> 16/330 in osds are down 
> monmap e1: 3 mons at 
> {ceph0-mon0=10.1.8.40:6789/0,ceph0-mon1=10.1.8.41:6789/0,ceph0-mon2=10.1.8.42:6789/0}
>  
> election epoch 13550, quorum 0,1,2 ceph0-mon0,ceph0-mon1,ceph0-mon2 
> osdmap e236889: 370 osds: 314 up, 330 in; 4096 remapped pgs 
> pgmap v47890297: 20920 pgs, 19 pools, 316 TB data, 85208 kobjects 
> 530 TB used, 594 TB / 1125 TB avail 
> 42096983/185765821 objects degraded (22.661%) 
> 49940341/185765821 objects misplaced (26.883%) 
> 14390 active+clean 
> 3846 active+undersized+degraded+remapped+wait_backfill 
> 2375 active+undersized+degraded 
> 196 active+undersized+degraded+remapped+backfilling 
> 52 down+peering 
> 42 active+remapped+wait_backfill 
> 11 active+remapped 
> 7 active+clean+scrubbing+deep 
> 1 active+recovery_wait+degraded+remapped 
> recovery io 2408 MB/s, 623 objects/s 
> 
> 
> -43 304.63928 host ceph0-s1148 
> 303 5.43999 osd.303 down 0 1.0 
> 304 5.43999 osd.304 down 0 1.0 
> 305 5.43999 osd.305 down 0 1.0 
> 306 5.43999 osd.306 down 0 1.0 
> 307 5.43999 osd.307 down 0 1.0 
> 308 5.43999 osd.308 down 0 1.0 
> 309 5.43999 osd.309 down 0 1.0 
> 310 5.43999 osd.310 down 0 1.0 
> 311 5.43999 osd.311 down 0 1.0 
> 312 5.43999 osd.312 down 0 1.0 
> 313 5.43999 osd.313 down 0 1.0 
> 314 5.43999 osd.314 down 0 1.0 
> 315 5.43999 osd.315 down 0 1.0 
> 316 5.43999 osd.316 down 0 1.0 
> 317 5.43999 osd.317 down 0 1.0 
> 318 5.43999 osd.318 down 0 1.0 
> 319 5.43999 osd.319 down 0 1.0 
> 320 5.43999 osd.320 down 0 1.0 
> 321 5.43999 osd.321 down 0 1.0 
> 322 5.43999 osd.322 down 0 1.0 
> 323 5.43999 osd.323 down 0 1.0 
> 324 5.43999 osd.324 down 0 1.0 
> 325 5.43999 osd.325 down 0 1.0 
> 326 5.43999 osd.326 down 0 1.0 
> 327 5.43999 osd.327 down 0 1.0 
> 328 5.43999 osd.328 down 0 1.0 
> 329 5.43999 osd.329 down 0 1.0 
> 330 5.43999 osd.330 down 0 1.0 
> 331 5.43999 osd.331 down 0 1.0 
> 332 5.43999 osd.332 down 1.0 1.0 
> 333 5.43999 osd.333 down 1.0 1.0 
> 334 5.43999 osd.334 down 1.0 1.0 
> 335 5.43999 osd.335 down 0 1.0 
> 337 5.43999 osd.337 down 1.0 1.0 
> 338 5.43999 osd.338 down 0 1.0 
> 339 5.43999 osd.339 down 1.0 1.0 
> 340 5.43999 osd.340 down 0 1.0 
> 341 5.43999 osd.341 down 0 1.0 
> 342 5.43999 osd.342 down 0 1.0 
> 343 5.43999 osd.343 down 0 1.0 
> 344 5.43999 osd.344 down 0 1.0 
> 345 5.43999 osd.345 down 0 1.0 
> 346 5.43999 osd.346 down 0 1.0 
> 347 5.43999 osd.347 down 1.0 1.0 
> 348 5.43999 osd.348 down 1.0 1.0 
> 349 5.43999 osd.349 down 0 1.0 
> 350 5.43999 osd.350 down 1.0 1.0 
> 351 5.43999 osd.351 down 1.0 1.0 
> 352 5.43999 osd.352 down 1.00000 1.0 
> 353 5.43999 osd.353 down 1.0 1.0 
> 354 5.43999 osd.354 down 1.0 1.0 
> 355 5.43999 osd.355 down 1.0 1.0 
> 356 5.43999 osd.356 down 1.0 1.0

[ceph-users] OSD Down but not marked down by cluster

2016-09-28 Thread Tyler Bishop
S1148 is down but the cluster does not mark it as such. 

cluster 3aac8ab8-1011-43d6-b281-d16e7a61b2bd 
health HEALTH_WARN 
3888 pgs backfill 
196 pgs backfilling 
6418 pgs degraded 
52 pgs down 
52 pgs peering 
1 pgs recovery_wait 
3653 pgs stuck degraded 
52 pgs stuck inactive 
6088 pgs stuck unclean 
3653 pgs stuck undersized 
6417 pgs undersized 
186 requests are blocked > 32 sec 
recovery 42096983/185765821 objects degraded (22.661%) 
recovery 49940341/185765821 objects misplaced (26.883%) 
16/330 in osds are down 
monmap e1: 3 mons at 
{ceph0-mon0=10.1.8.40:6789/0,ceph0-mon1=10.1.8.41:6789/0,ceph0-mon2=10.1.8.42:6789/0}
 
election epoch 13550, quorum 0,1,2 ceph0-mon0,ceph0-mon1,ceph0-mon2 
osdmap e236889: 370 osds: 314 up, 330 in; 4096 remapped pgs 
pgmap v47890297: 20920 pgs, 19 pools, 316 TB data, 85208 kobjects 
530 TB used, 594 TB / 1125 TB avail 
42096983/185765821 objects degraded (22.661%) 
49940341/185765821 objects misplaced (26.883%) 
14390 active+clean 
3846 active+undersized+degraded+remapped+wait_backfill 
2375 active+undersized+degraded 
196 active+undersized+degraded+remapped+backfilling 
52 down+peering 
42 active+remapped+wait_backfill 
11 active+remapped 
7 active+clean+scrubbing+deep 
1 active+recovery_wait+degraded+remapped 
recovery io 2408 MB/s, 623 objects/s 


-43 304.63928 host ceph0-s1148 
303 5.43999 osd.303 down 0 1.0 
304 5.43999 osd.304 down 0 1.0 
305 5.43999 osd.305 down 0 1.0 
306 5.43999 osd.306 down 0 1.0 
307 5.43999 osd.307 down 0 1.0 
308 5.43999 osd.308 down 0 1.0 
309 5.43999 osd.309 down 0 1.0 
310 5.43999 osd.310 down 0 1.0 
311 5.43999 osd.311 down 0 1.0 
312 5.43999 osd.312 down 0 1.0 
313 5.43999 osd.313 down 0 1.0 
314 5.43999 osd.314 down 0 1.0 
315 5.43999 osd.315 down 0 1.0 
316 5.43999 osd.316 down 0 1.0 
317 5.43999 osd.317 down 0 1.0 
318 5.43999 osd.318 down 0 1.0 
319 5.43999 osd.319 down 0 1.0 
320 5.43999 osd.320 down 0 1.0 
321 5.43999 osd.321 down 0 1.0 
322 5.43999 osd.322 down 0 1.0 
323 5.43999 osd.323 down 0 1.0 
324 5.43999 osd.324 down 0 1.0 
325 5.43999 osd.325 down 0 1.0 
326 5.43999 osd.326 down 0 1.0 
327 5.43999 osd.327 down 0 1.0 
328 5.43999 osd.328 down 0 1.0 
329 5.43999 osd.329 down 0 1.0 
330 5.43999 osd.330 down 0 1.0 
331 5.43999 osd.331 down 0 1.0 
332 5.43999 osd.332 down 1.0 1.0 
333 5.43999 osd.333 down 1.0 1.0 
334 5.43999 osd.334 down 1.0 1.0 
335 5.43999 osd.335 down 0 1.0 
337 5.43999 osd.337 down 1.0 1.0 
338 5.43999 osd.338 down 0 1.0 
339 5.43999 osd.339 down 1.0 1.0 
340 5.43999 osd.340 down 0 1.0 
341 5.43999 osd.341 down 0 1.0 
342 5.43999 osd.342 down 0 1.0 
343 5.43999 osd.343 down 0 1.0 
344 5.43999 osd.344 down 0 1.0 
345 5.43999 osd.345 down 0 1.0 
346 5.43999 osd.346 down 0 1.0 
347 5.43999 osd.347 down 1.0 1.0 
348 5.43999 osd.348 down 1.0 1.0 
349 5.43999 osd.349 down 0 1.0 
350 5.43999 osd.350 down 1.0 1.0 
351 5.43999 osd.351 down 1.0 1.0 
352 5.43999 osd.352 down 1.0 1.0 
353 5.43999 osd.353 down 1.0 1.0 
354 5.43999 osd.354 down 1.0 1.0 
355 5.43999 osd.355 down 1.0 1.0 
356 5.43999 osd.356 down 1.0 1.0 
357 5.43999 osd.357 down 1.0 1.0 
358 5.43999 osd.358 down 0 1.0 
369 5.43999 osd.369 down 1.0 1.0 





    

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS metadata pool size

2016-09-25 Thread Tyler Bishop
800TB of NVMe?  That sounds wonderful!


- Original Message -
From: "Ryan Leimenstoll" 
To: "ceph new" 
Sent: Saturday, September 24, 2016 5:37:08 PM
Subject: [ceph-users] CephFS metadata pool size

Hi all, 

We are in the process of expanding our current Ceph deployment (Jewel, 10.2.2) 
to incorporate CephFS for fast, network attached scratch storage. We are 
looking to have the metadata pool exist entirely on SSDs (or NVMe), however I 
am not sure how big to expect this pool to grow to. Is there any good rule of 
thumb or guidance to getting an estimate on this before purchasing hardware? We 
are expecting upwards of 800T usable capacity at the start.

Thanks for any insight!

Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-23 Thread Tyler Bishop
Your monitors are sending the new cluster map out every time it changes.

This is a known issue IIRC,  I remember reading a really interesting article on 
it a few months ago.


I think there's a slideshow from CERN that explained it.

- Original Message -
From: "Stillwell, Bryan J" 
To: ceph-users@lists.ceph.com
Sent: Wednesday, September 21, 2016 5:21:18 PM
Subject: [ceph-users] Upgrading 0.94.6 -> 0.94.9 saturating mon node
networking

While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've
run into serious performance issues every time I restart an OSD.

At first I thought the problem I was running into was caused by the osdmap
encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because
I was seeing a ton (millions) of these messages in the logs:

2016-09-21 20:48:32.831040 osd.504 24.161.248.128:6810/96488 24 : cluster
[WRN] failed to encode map e727985 with expected cry

Here are the links to their descriptions of the problem:

http://www.spinics.net/lists/ceph-devel/msg30450.html
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg30783.html

I tried the solution of using the following command to stop those errors
from occurring:

ceph tell osd.* injectargs '--clog_to_monitors false'

Which did get the messages to stop spamming the log files, however, it
didn't fix the performance issue for me.

Using dstat on the mon nodes I was able to determine that every time the
osdmap is updated (by running 'ceph osd pool set data size 2' in this
example) it causes the outgoing network on all mon nodes to be saturated
for multiple seconds at a time:

system         total-cpu-usage --memory-usage- -net/total- -dsk/total- --io/total-
 time          |usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ| read  writ
21-09 21:06:53 |  1   0  99   0   0   0|11.8G  273M 18.7G  221G|2326k 9015k|   0  1348k|   0  16.0
21-09 21:06:54 |  1   1  98   0   0   0|11.9G  273M 18.7G  221G|  15M   10M|   0  1312k|   0  16.0
21-09 21:06:55 |  2   2  94   0   0   1|12.3G  273M 18.7G  220G|  14M  311M|   0    48M|   0   309
21-09 21:06:56 |  2   3  93   0   0   3|12.2G  273M 18.7G  220G|7745k 1190M|   0    16M|   0  93.0
21-09 21:06:57 |  1   2  96   0   0   1|12.0G  273M 18.7G  220G|8269k 1189M|   0  1956k|   0  10.0
21-09 21:06:58 |  3   1  95   0   0   1|11.8G  273M 18.7G  221G|4854k  752M|   0  4960k|   0  21.0
21-09 21:06:59 |  3   0  97   0   0   0|11.8G  273M 18.7G  221G|3098k   25M|   0  5036k|   0  26.0
21-09 21:07:00 |  1   0  98   0   0   0|11.8G  273M 18.7G  221G|2247k   25M|   0  9980k|   0  45.0
21-09 21:07:01 |  2   1  97   0   0   0|11.8G  273M 18.7G  221G|4149k   17M|   0    76M|   0   427

That would be 1190 MiB/s (or 9.982 Gbps).

Restarting every OSD on a node at once as part of the upgrade causes a
couple minutes worth of network saturation on all three mon nodes.  This
causes thousands of slow requests and many unhappy OpenStack users.

I'm now stuck about 15% into the upgrade and haven't been able to
determine how to move forward (or even backward) without causing another
outage.

I've attempted to run the same test on another cluster with 1300+ OSDs and
the outgoing network on the mon nodes didn't exceed 15 MiB/s (0.126 Gbps).

Any suggestions on how I can proceed?

Thanks,
Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] High OSD to Server ratio causes udev event to timeout during system boot

2016-09-23 Thread Tyler Bishop
Hi, 

My systems have 56 x 6T disk, dual 12 core processors and 256gb ram. CentOS 7 
x64. 

During boot I'm having issues with the system going into emergency mode. 

When udevd starts, "a start job is running for dev-disk-by..." appears; the 1 
minute 30 second timer runs out and the system fails to boot. 

I believe this has something to do with the ceph udev rules running out of time 
before every drive can mount cleanly. 

Has anyone had this issue and can I solve my issue by setting the timeout 
higher? Does anyone know what specific systemd process this timeout is affected 
by? 

Please help! Thank you! 
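
(For what it's worth, the 1 minute 30 second timer looks like the default
systemd start-job timeout on the dev-disk-by-* device units. Assuming that is
the one being hit, raising the default in /etc/systemd/system.conf is one way
to test the theory; it affects other units too, so treat it as a diagnostic
knob rather than a fix:)

# /etc/systemd/system.conf
[Manager]
DefaultTimeoutStartSec=10min

# then re-execute the manager (or reboot) for it to take effect
systemctl daemon-reexec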
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleaning Up Failed Multipart Uploads

2016-08-02 Thread Tyler Bishop
We're having the same issues. I have a 1200TB pool at 90% utilization however 
disk utilization is only 40% 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Brian Felton" <bjfel...@gmail.com> 
To: "ceph-users" <ceph-us...@ceph.com> 
Sent: Wednesday, July 27, 2016 9:24:30 AM 
Subject: [ceph-users] Cleaning Up Failed Multipart Uploads 

Greetings, 

Background: If an object storage client re-uploads parts to a multipart object, 
RadosGW does not clean up all of the parts properly when the multipart upload 
is aborted or completed. You can read all of the gory details (including 
reproduction steps) in this bug report: http://tracker.ceph.com/issues/16767 . 

My setup: Hammer 0.94.6 cluster only used for S3-compatible object storage. RGW 
stripe size is 4MiB. 

My problem: I have buckets that are reporting TB more utilization (and, in one 
case, 200k more objects) than they should report. I am trying to remove the 
detritus from the multipart uploads, but removing the leftover parts directly 
from the .rgw.buckets pool is having no effect on bucket utilization (i.e. 
neither the object count nor the space used are declining). 

To give an example, I have a client that uploaded a very large multipart object 
(8000 15MiB parts). Due to a bug in the client, it uploaded each of the 8000 
parts 6 times. After the sixth attempt, it gave up and aborted the upload, at 
which point RGW removed the 8000 parts from the sixth attempt. When I list the 
bucket's contents with radosgw-admin (radosgw-admin bucket list 
--bucket= --max-entries=), I see all of the object's 
8000 parts five separate times, each under a namespace of 'multipart'. 

Since the multipart upload was aborted, I can't remove the object by name via 
the S3 interface. Since my RGW stripe size is 4MiB, I know that each part of 
the object will be stored across 4 entries in the .rgw.buckets pool -- 4 MiB in 
a 'multipart' file, and 4, 4, and 3 MiB in three successive 'shadow' files. 
I've created a script to remove these parts (rados -p .rgw.buckets rm 
__multipart_<object+prefix>. and rados -p .rgw.buckets rm 
__shadow_<object+prefix>..[1-3]). The removes are completing 
successfully (in that additional attempts to remove the object result in a 
failure), but I'm not seeing any decrease in the bucket's space used, nor am I 
seeing a decrease in the bucket's object count. In fact, if I do another 
'bucket list', all of the removed parts are still included. 

I've looked at the output of 'gc list --include-all', and the removed parts are 
never showing up for garbage collection. Garbage collection is otherwise 
functioning normally and will successfully remove data for any object properly 
removed via the S3 interface. 

I've also gone so far as to write a script to list the contents of bucket 
shards in the .rgw.buckets.index pool, check for the existence of the entry in 
.rgw.buckets, and remove entries that cannot be found, but that is also failing 
to decrement the size/object count counters. 

What am I missing here? Where, aside from .rgw.buckets and .rgw.buckets.index 
is RGW looking to determine object count and space used for a bucket? 
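
(One detail that may explain the stuck counters: the per-bucket usage is
accumulated in the headers of the bucket index shard objects in
.rgw.buckets.index, so removing rados objects, or even index entries, by hand
never adjusts it. Assuming the Hammer-era radosgw-admin supports it, a bucket
check can recalculate those stats; worth trying on a test bucket first:)

radosgw-admin bucket stats --bucket=<bucket>
radosgw-admin bucket check --bucket=<bucket> --check-objects --fix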

Many thanks to any and all who can assist. 

Brian Felton 



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] civetweb vs Apache for rgw

2016-06-01 Thread Tyler Bishop
Use Haproxy.

sudomakeinstall.com/uncategorized/ceph-radosgw-nginx-tengine-apache-and-now-civetweb
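
A minimal sketch of that split (civetweb speaking plain HTTP on 7480 and
haproxy terminating TLS in front of it; the addresses and certificate path are
placeholders):

frontend rgw_https
    bind *:443 ssl crt /etc/haproxy/certs/rgw.pem
    default_backend rgw_civetweb

backend rgw_civetweb
    balance roundrobin
    option httpchk GET /
    server rgw1 10.0.0.11:7480 check
    server rgw2 10.0.0.12:7480 check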


- Original Message -
From: c...@jack.fr.eu.org
To: ceph-users@lists.ceph.com
Sent: Tuesday, May 24, 2016 5:01:05 AM
Subject: Re: [ceph-users] civetweb vs Apache for rgw

I'm using mod_rewrite and mod_expires

Dunno if it can be done via civetweb, however, my installation is older

On 24/05/2016 10:49, Karol Mroz wrote:
> On Tue, May 24, 2016 at 08:51:13AM +0100, Luis Periquito wrote:
>> It may be possible to do it with civetweb, but I use apache because of
>> HTTPS config.
> 
> Civetweb should be able to handle ssl just fine:
> 
> rgw_frontends = civetweb port=7480s ssl_certificate=/path/to/some_cert.pem
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recomendations for building 1PB RadosGW with Erasure Code

2016-02-17 Thread Tyler Bishop
I'm using 2x replication on that pool for storing RBD volumes. Our workload is 
pretty heavy; I'd imagine objects on EC would be light in comparison. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "John Hogenmiller" <j...@hogenmiller.net> 
To: "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
Cc: "Nick Fisk" <n...@fisk.me.uk>, ceph-users@lists.ceph.com 
Sent: Wednesday, February 17, 2016 7:50:11 AM 
Subject: Re: [ceph-users] Recomendations for building 1PB RadosGW with Erasure 
Code 

Tyler, 
The E5-2660 V2 is a 10-core 2.2GHz part, so the dual-socket node gives roughly 
44GHz, or 0.78GHz per OSD. 
That seems to fall in line with Nick's "golden rule" of 0.5GHz - 1GHz per OSD. 

Are you doing EC or Replication? If EC, what profile? Could you also provide an 
average of CPU utilization? 

I'm still researching, but so far, the ratio seems to be pretty realistic. 

-John 

On Tue, Feb 16, 2016 at 9:22 AM, Tyler Bishop < tyler.bis...@beyondhosting.net 
> wrote: 


We use dual E5-2660 V2 with 56 6T and performance has not been an issue. It 
will easily saturate the 40G interfaces and saturate the spindle io. 

And yes, you can run dual servers attached to 30 disk each. This gives you lots 
of density. Your failure domain will remain as individual servers. The only 
thing shared is the quad power supplies. 

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 

- Original Message - 
From: "Nick Fisk" < n...@fisk.me.uk > 
To: "Василий Ангапов" < anga...@gmail.com >, "Tyler Bishop" < 
tyler.bis...@beyondhosting.net > 
Cc: ceph-users@lists.ceph.com 
Sent: Tuesday, February 16, 2016 8:24:33 AM 
Subject: RE: [ceph-users] Recomendations for building 1PB RadosGW with Erasure 
Code 

> -Original Message- 
> From: Василий Ангапов [mailto: anga...@gmail.com ] 
> Sent: 16 February 2016 13:15 
> To: Tyler Bishop < tyler.bis...@beyondhosting.net > 
> Cc: Nick Fisk < n...@fisk.me.uk >; < ceph-users@lists.ceph.com >  us...@lists.ceph.com > 
> Subject: Re: [ceph-users] Recomendations for building 1PB RadosGW with 
> Erasure Code 
> 
> 2016-02-16 17:09 GMT+08:00 Tyler Bishop 
> < tyler.bis...@beyondhosting.net >: 
> > With ucs you can run dual server and split the disk. 30 drives per node. 
> > Better density and easier to manage. 
> I don't think I got your point. Can you please explain it in more details? 

I think he means that the 60 bays can be zoned, so you end up with physically one 
JBOD split into two 30-bay logical JBODs, each connected to a different server. 
What this does to your failure domains is another question. 

> 
> And again - is dual Xeon's power enough for 60-disk node and Erasure Code? 

I would imagine yes, but you would most likely need to go for the 12-18 core 
versions with a high clock. These are serious money. I don't know at what point 
this becomes more expensive than 12-disk nodes with "cheap" Xeon-D's or Xeon 
E3's. 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recomendations for building 1PB RadosGW with Erasure Code

2016-02-16 Thread Tyler Bishop
We use dual E5-2660 V2 with 56 6T and performance has not been an issue.  It 
will easily saturate the 40G interfaces and saturate the spindle io.

And yes, you can run dual servers attached to 30 disk each.  This gives you 
lots of density.  Your failure domain will remain as individual servers.  The 
only thing shared is the quad power supplies.

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Nick Fisk" <n...@fisk.me.uk>
To: "Василий Ангапов" <anga...@gmail.com>, "Tyler Bishop" 
<tyler.bis...@beyondhosting.net>
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, February 16, 2016 8:24:33 AM
Subject: RE: [ceph-users] Recomendations for building 1PB RadosGW with Erasure 
Code

> -Original Message-
> From: Василий Ангапов [mailto:anga...@gmail.com]
> Sent: 16 February 2016 13:15
> To: Tyler Bishop <tyler.bis...@beyondhosting.net>
> Cc: Nick Fisk <n...@fisk.me.uk>; <ceph-users@lists.ceph.com>  us...@lists.ceph.com>
> Subject: Re: [ceph-users] Recomendations for building 1PB RadosGW with
> Erasure Code
> 
> 2016-02-16 17:09 GMT+08:00 Tyler Bishop
> <tyler.bis...@beyondhosting.net>:
> > With ucs you can run dual server and split the disk.  30 drives per node.
> > Better density and easier to manage.
> I don't think I got your point. Can you please explain it in more details?

I think he means that the 60 bays can be zoned, so you end up with physically one 
JBOD split into two 30-bay logical JBODs, each connected to a different server. 
What this does to your failure domains is another question.

> 
> And again - is dual Xeon's power enough for 60-disk node and Erasure Code?

I would imagine yes, but you would most likely need to go for the 12-18 core 
versions with a high clock. These are serious money. I don't know at what point 
this becomes more expensive than 12-disk nodes with "cheap" Xeon-D's or Xeon 
E3's.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recomendations for building 1PB RadosGW with Erasure Code

2016-02-16 Thread Tyler Bishop
With UCS you can run dual servers and split the disks, 30 drives per node.  
Better density and easier to manage. 

Sent from TypeApp



On Feb 16, 2016, 3:39 AM, at 3:39 AM, "Василий Ангапов" <anga...@gmail.com> 
wrote:
>Nick, Tyler, many thanks for very helpful feedback!
>I spent many hours meditating on the following two links:
>http://www.supermicro.com/solutions/storage_ceph.cfm
>http://s3s.eu/cephshop
>
>60- or even 72-disk nodes are very capacity-efficient, but will the 2
>CPUs (even the fastest ones) be enough to handle Erasure Coding?
>Also as Nick stated with 4-5 nodes I cannot use high-M "K+M"
>combinations.
>I've did some calculations and found that the most efficient and safe
>configuration is to use 10 nodes with 29*6TB SATA and 7*200GB S3700
>for journals. Assuming 6+3 EC profile that will give me 1.16 PB of
>effective space. Also I prefer not to use precious NVMe drives. Don't
>see any reason to use them.
>
>But what about RAM? Can I go with 64GB per node with above config?
>I've seen OSDs are consuming not more than 1GB RAM for replicated
>pools (even 6TB ones). But what is the typical memory usage of EC
>pools? Does anybody know that?
>
>Also, am I right that for 6+3 EC profile i need at least 10 nodes to
>feel comfortable (one extra node for redundancy)?
>
>And finally can someone recommend what EC plugin to use in my case? I
>know it's a difficult question but anyway?
>
>
>
>
>
>
>
>
>
>2016-02-16 16:12 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
>>
>>
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
>Behalf Of
>>> Tyler Bishop
>>> Sent: 16 February 2016 04:20
>>> To: Василий Ангапов <anga...@gmail.com>
>>> Cc: ceph-users <ceph-users@lists.ceph.com>
>>> Subject: Re: [ceph-users] Recomendations for building 1PB RadosGW
>with
>>> Erasure Code
>>>
>>> You should look at a 60 bay 4U chassis like a Cisco UCS C3260.
>>>
>>> We run 4 systems at 56x6tB with dual E5-2660 v2 and 256gb ram.
>>> Performance is excellent.
>>
>> Only thing I will say to the OP, is that if you only need 1PB, then
>likely 4-5 of these will give you enough capacity. Personally I would
>prefer to spread the capacity around more nodes. If you are doing
>anything serious with Ceph its normally a good idea to try and make
>each node no more than 10% of total capacity. Also with Ec pools you
>will be limited to the K+M combo's you can achieve with smaller number
>of nodes.
>>
>>>
>>> I would recommend a cache tier for sure if your data is busy for
>reads.
>>>
>>> Tyler Bishop
>>> Chief Technical Officer
>>> 513-299-7108 x10
>>>
>>>
>>>
>>> tyler.bis...@beyondhosting.net
>>>
>>>
>>> If you are not the intended recipient of this transmission you are
>notified
>>> that disclosing, copying, distributing or taking any action in
>reliance on the
>>> contents of this information is strictly prohibited.
>>>
>>> - Original Message -
>>> From: "Василий Ангапов" <anga...@gmail.com>
>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>> Sent: Friday, February 12, 2016 7:44:07 AM
>>> Subject: [ceph-users] Recomendations for building 1PB RadosGW with
>>> Erasure   Code
>>>
>>> Hello,
>>>
>>> We are planning to build 1PB Ceph cluster for RadosGW with Erasure
>Code. It
>>> will be used for storing online videos.
>>> We do not expect outstanding write performace, something like 200-
>>> 300MB/s of sequental write will be quite enough, but data safety is
>very
>>> important.
>>> What are the most popular hardware and software recomendations?
>>> 1) What EC profile is best to use? What values of K/M do you
>recommend?
>>
>> The higher total k+m you go, you will require more CPU and sequential
>performance will degrade slightly as the IO's are smaller going to the
>disks. However larger numbers allow you to be more creative with
>failure scenarios and "replication" efficiency.
>>
>>> 2) Do I need to use Cache Tier for RadosGW or it is only needed for
>RBD? Is it
>>
>> Only needed for RBD, but depending on workload it may still benefit.
>If you are mostly doing large IO's, the gains will be a lot smaller.
>>
>>> still an overall good practice to use Cache Tier for RadosGW?
>>> 3) What hardware is recommended for EC? I assume higher-clocked CPUs
&

Re: [ceph-users] Recomendations for building 1PB RadosGW with Erasure Code

2016-02-15 Thread Tyler Bishop
You should look at a 60 bay 4U chassis like a Cisco UCS C3260.

We run 4 systems at 56x6tB with dual E5-2660 v2 and 256gb ram.  Performance is 
excellent.

I would recommend a cache tier for sure if your data is busy for reads.

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Василий Ангапов" <anga...@gmail.com>
To: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Friday, February 12, 2016 7:44:07 AM
Subject: [ceph-users] Recomendations for building 1PB RadosGW with Erasure  
Code

Hello,

We are planning to build 1PB Ceph cluster for RadosGW with Erasure
Code. It will be used for storing online videos.
We do not expect outstanding write performace, something like
200-300MB/s of sequental write will be quite enough, but data safety
is very important.
What are the most popular hardware and software recomendations?
1) What EC profile is best to use? What values of K/M do you recommend?
2) Do I need to use Cache Tier for RadosGW or it is only needed for
RBD? Is it still an overall good practice to use Cache Tier for
RadosGW?
3) What hardware is recommended for EC? I assume higher-clocked CPUs
are needed? What about RAM?
4) What SSDs for Ceph journals are the best?
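
(For question 1, the mechanics are the same whichever k/m you settle on; a 6+3
profile like the one discussed elsewhere in this thread would be created roughly
as below. Profile and pool names and the PG count are only examples; k=6, m=3
keeps data readable with any three hosts down, at a cost of 1.5x raw space.)

ceph osd erasure-code-profile set ec-6-3 k=6 m=3 ruleset-failure-domain=host
ceph osd erasure-code-profile get ec-6-3
ceph osd pool create ecpool 2048 2048 erasure ec-6-3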

Thanks a lot!

Regards, Vasily.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multipath devices with infernalis

2016-02-12 Thread Tyler Bishop
You're probably running into issues with sysvinit / upstart / whatever. 

Try partitioning the DM and then mapping it directly in your ceph.conf under 
the osd section. 

It should work, ceph is just a process using the filesystem. 
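
A rough sketch of that approach (the id, host and device names follow the thread
but are otherwise examples): mount the data partition yourself and point the
journal at the second partition, bypassing the udev/ceph-disk activation.

[osd.1]
    host = ceph-1-35b
    osd data = /var/lib/ceph/osd/ceph-1
    osd journal = /dev/mapper/mpathc2

# /etc/fstab, so the data partition is mounted before the OSD starts
/dev/mapper/mpathc1  /var/lib/ceph/osd/ceph-1  xfs  defaults,noatime  0 0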







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Andrus, Brian Contractor" <bdand...@nps.edu> 
To: ceph-users@lists.ceph.com 
Sent: Thursday, February 11, 2016 5:35:30 PM 
Subject: [ceph-users] Multipath devices with infernalis 



All, 



I have a set of hardware with a few systems connected via IB along with a DDN 
SFA12K. 

There are 4 IB/SRP paths to each block device. Those show up as 
/dev/mapper/mpath[b-d] 



I am trying to do an initial install/setup of ceph on 3 nodes. Each will be a 
monitor as well as host a single OSD. 



I am using the ceph-deploy to do most of the heavy lifting (using CentOS 
7.2.1511). 



Everything is quite successful installing monitors and even the first OSD. 



ceph status shows: 

cluster 0d9e68e4-176d-4229-866b-d408f8055e5b 

health HEALTH_OK 

monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
 

election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b 

osdmap e5: 1 osds: 1 up, 1 in 

flags sortbitwise 

pgmap v8: 64 pgs, 1 pools, 0 bytes data, 0 objects 

40112 kB used, 43888 GB / 43889 GB avail 

64 active+clean 



But as soon as I try to add the next OSD on the next system using 

ceph-deploy osd create ceph-1-35b:/dev/mapper/mpathc 

things start acting up. 

The last bit from the output seems ok: 
[ceph-1-35b][INFO ] checking OSD status... 

[ceph-1-35b][INFO ] Running command: ceph --cluster=ceph osd stat --format=json 

[ceph-1-35b][WARNIN] there is 1 OSD down 

[ceph-1-35b][WARNIN] there is 1 OSD out 

[ceph_deploy.osd][DEBUG ] Host ceph-1-35b is now ready for osd use. 



But, ceph status is now: 

cluster 0d9e68e4-176d-4229-866b-d408f8055e5b 

health HEALTH_OK 

monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
 

election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b 

osdmap e6: 2 osds: 1 up, 1 in 

flags sortbitwise 

pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects 

40120 kB used, 43888 GB / 43889 GB avail 

64 active+clean 



And ceph osd tree: 

ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY 

-1 42.86040 root default 

-2 42.86040 host ceph-1-35a 

0 42.86040 osd.0 up 1.0 1.0 

1 0 osd.1 down 0 1.0 



I don’t understand why ceph-deploy didn’t activate this one when it did for the 
first. The OSD is not mounted on the other box. 

I can try to activate the down OSD (ceph-deploy disk activate 
ceph-1-35b:/dev/mapper/mpathc1:/dev/mapper/mpathc2) 

Things look good for a bit: 

cluster 0d9e68e4-176d-4229-866b-d408f8055e5b 

health HEALTH_OK 

monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
 

election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b 

osdmap e8: 2 osds: 2 up, 2 in 

flags sortbitwise 

pgmap v14: 64 pgs, 1 pools, 0 bytes data, 0 objects 

74804 kB used, 8 GB / 87778 GB avail 

64 active+clean 



But after about 1 minute, it goes down: 

ceph osd tree 

ID WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 85.72079 root default
-2 42.86040     host ceph-1-35a
 0 42.86040         osd.0            up      1.0              1.0
-3 42.86040     host ceph-1-35b
 1 42.86040         osd.1          down      1.0              1.0



ceph status 

cluster 0d9e68e4-176d-4229-866b-d408f8055e5b 

health HEALTH_WARN 

1/2 in osds are down 

monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
 

election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b 

osdmap e9: 2 osds: 1 up, 2 in 

flags sortbitwise 

pgmap v15: 64 pgs, 1 pools, 0 bytes data, 0 objects 

74804 kB used, 8 GB / 87778 GB avail 

64 active+clean 



Has anyone played with getting multipath devices to work? 
Of course it could be something completely different and I need to step back 
and see what step is failing. Any insight into where to dig would be 
appreciated. 



Thanks in advance, 

Brian Andrus 

ITACS/Research Computing 

Naval Postgraduate School 

Monterey, California 

voice: 831-656-6238 



___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] v10.0.3 released

2016-02-12 Thread Tyler Bishop
Great work as always, Sage!

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Sage Weil" <sw...@redhat.com>
To: ceph-de...@vger.kernel.org, "ceph-users" <ceph-us...@ceph.com>, 
ceph-maintain...@ceph.com, ceph-annou...@ceph.com
Sent: Wednesday, February 10, 2016 2:40:46 PM
Subject: [ceph-users] v10.0.3 released

This is the fourth development release for Jewel. Several big pieces have 
been added this release, including BlueStore (a new backend for OSD to 
replace FileStore), many ceph-disk fixes, a new CRUSH tunable that 
improves mapping stability, a new librados object enumeration API, and a 
whole slew of OSD and RADOS optimizations.

Note that, due to general developer busyness, we aren’t building official 
release packages for this dev release. You can fetch autobuilt gitbuilder 
packages from the usual location (http://gitbuilder.ceph.com).

Notable Changes
---

http://ceph.com/releases/v10-0-3-released/

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* For packages, see 
http://ceph.com/docs/master/install/get-packages#add-ceph-development
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors wanted!

2016-02-07 Thread Tyler Bishop
http://ceph.mirror.beyondhosting.net/

I need to know what server will be keeping the master copy for rsync to pull 
changes from.
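
Once that's settled, the sync on our side would just be a cron'd rsync along these 
lines (a sketch -- the 'ceph' rsync module name and the local path are assumptions on 
my part, and the source host is whatever gets designated as the master):

    # /etc/cron.d/ceph-mirror
    0 */3 * * *  root  rsync -avrt --delete eu.ceph.com::ceph /var/www/ceph-mirror/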

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Wido den Hollander" <w...@42on.com>
To: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
Cc: "ceph-users" <ceph-us...@ceph.com>
Sent: Sunday, February 7, 2016 4:22:13 AM
Subject: Re: [ceph-users] Ceph mirrors wanted!

> Op 6 februari 2016 om 15:48 schreef Tyler Bishop
> <tyler.bis...@beyondhosting.net>:
> 
> 
> Covered except that the dreamhost mirror is constantly down or broken.
> 

Yes. Working on that.

> I can add ceph.mirror.beyondhosting.net for it.
> 

Great. Would us-east.ceph.com work for you? I can CNAME that to
ceph.mirror.beyondhosting.net.

I see that mirror.beyondhosting.net has IPv4 and IPv6, so that is good.

If you are OK, I'll add you to the ceph-mirrors list so we can get this up and
running.

Wido

> Tyler Bishop 
> Chief Technical Officer 
> 513-299-7108 x10 
> 
> 
> 
> tyler.bis...@beyondhosting.net 
> 
> 
> If you are not the intended recipient of this transmission you are notified
> that disclosing, copying, distributing or taking any action in reliance on the
> contents of this information is strictly prohibited.
> 
> - Original Message -
> From: "Wido den Hollander" <w...@42on.com>
> To: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
> Cc: "ceph-users" <ceph-us...@ceph.com>
> Sent: Saturday, February 6, 2016 2:46:50 AM
> Subject: Re: [ceph-users] Ceph mirrors wanted!
> 
> > Op 6 februari 2016 om 0:08 schreef Tyler Bishop
> > <tyler.bis...@beyondhosting.net>:
> > 
> > 
> > I have ceph pulling down from eu.   What *origin* should I setup rsync to
> > automatically pull from?
> > 
> > download.ceph.com is consistently broken.
> > 
> 
> download.ceph.com should be your best guess, since that is closest.
> 
> The US however seems covered with download.ceph.com although we might set up
> us-east and us-west.
> 
> I see that Ceph is currently in a subfolder called 'Ceph' and that is not
> consistent with the other mirrors.
> 
> Can that be fixed so that it matches the original directory structure?
> 
> Wido
> 
> > - Original Message -
> > From: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
> > To: "Wido den Hollander" <w...@42on.com>
> > Cc: "ceph-users" <ceph-us...@ceph.com>
> > Sent: Friday, February 5, 2016 5:59:20 PM
> > Subject: Re: [ceph-users] Ceph mirrors wanted!
> > 
> > We would be happy to mirror the project.
> > 
> > http://mirror.beyondhosting.net
> > 
> > 
> > - Original Message -
> > From: "Wido den Hollander" <w...@42on.com>
> > To: "ceph-users" <ceph-us...@ceph.com>
> > Sent: Saturday, January 30, 2016 9:14:59 AM
> > Subject: [ceph-users] Ceph mirrors wanted!
> > 
> > Hi,
> > 
> > My PR was merged with a script to mirror Ceph properly:
> > https://github.com/ceph/ceph/tree/master/mirroring
> > 
> > Currently there are 3 (official) locations where you can get Ceph:
> > 
> > - download.ceph.com (Dreamhost, US)
> > - eu.ceph.com (PCextreme, Netherlands)
> > - au.ceph.com (Digital Pacific, Australia)
> > 
> > I'm looking for more mirrors to become official mirrors so we can easily
> > distribute Ceph.
> > 
> > Mirrors do go down and it's always nice to have a mirror local to you.
> > 
> > I'd like to have one or more mirrors in Asia, Africa and/or South
> > America if possible. Anyone able to host there? Other locations are 
> > welcome as well!
> > 
> > A few things which are required:
> > 
> > - 1Gbit connection or more
> > - Native IPv4 and IPv6
> > - HTTP access
> > - rsync access
> > - 2TB of storage or more
> > - Monitoring of the mirror/source
> > 
> > You can easily mirror Ceph yourself with this script I wrote:
> > https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh
> > 
> > eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
> > you want to mirror Ceph locally, please pick a mirror local to you.
> > 
> > Please refer to these guidelines:
> > https://github.com/ceph/ceph/tree/master/mirroring#guidelines
> > 
> > -- 
> > Wido den Hollander
> > 42on B.V.
> > Ceph trainer and consultant
> > 
> > Phone: +31 (0)20 700 9902
> > Skype: contact42on
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH health issues

2016-02-06 Thread Tyler Bishop
You need to get your OSDs back online. 




From: "Jeffrey McDonald"  
To: ceph-users@lists.ceph.com 
Sent: Saturday, February 6, 2016 8:18:06 AM 
Subject: [ceph-users] CEPH health issues 

Hi, 
I'm seeing lots of issues with my CEPH installation. The health of the system 
is degraded and many of the OSD are down. 

# ceph -v 
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 

# ceph health 
HEALTH_ERR 2002 pgs degraded; 14 pgs down; 180 pgs inconsistent; 14 pgs 
peering; 1 pgs stale; 2002 pgs stuck degraded; 14 pgs stuck inactive; 1 pgs 
stuck stale; 2320 pgs stuck unclean; 2002 pgs stuck undersized; 2002 pgs 
undersized; 100 requests are blocked > 32 sec; recovery 3802/531925830 
objects degraded (7.150%); recovery 48881596/531925830 objects misplaced 
(9.190%); 12623 scrub errors; 11/320 in osds are down; noout flag(s) set 

Log for one of the down OSDes shows: 

-5> 2016-02-05 19:10:45.294873 7fd4d58e4700 1 -- 10.31.0.3:6835/157558 --> 
10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 
19:10:45.286934) v2 -- ?+ 
0 0x4359a00 con 0x2bc9ac60 
-4> 2016-02-05 19:10:45.294915 7fd4d70e7700 1 -- 10.31.0.67:6835/157558 --> 
10.31.0.5:0/3796 -- osd_ping(ping_reply e144138 stamp 2016-02-05 
19:10:45.286934) v2 -- ? 
+0 0x27e21800 con 0x2bacd700 
-3> 2016-02-05 19:10:45.341383 7fd4e2ea8700 0 
filestore(/var/lib/ceph/osd/ceph-299) error (39) Directory not empty not 
handled on operation 0x12c88178 (6494115.0.1, 
or op 1, counting from 0) 
-2> 2016-02-05 19:10:45.341477 7fd4e2ea8700 0 
filestore(/var/lib/ceph/osd/ceph-299) ENOTEMPTY suggests garbage data in osd 
data dir 
-1> 2016-02-05 19:10:45.341493 7fd4e2ea8700 0 
filestore(/var/lib/ceph/osd/ceph-299) transaction dump: 
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "70.532s3_head",
            "oid": "532\/\/head\/\/70\/18446744073709551615\/3"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "70.532s3_head"
        }
    ]
}

0> 2016-02-05 19:10:45.343794 7fd4e2ea8700 -1 os/FileStore.cc: In function 
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, 
int, ThreadP 
ool::TPHandle*)' thread 7fd4e2ea8700 time 2016-02-05 19:10:45.341673 
os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error") 

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0xbc60eb] 
2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa52) [0x923d12] 
3: (FileStore::_do_transactions(std::list >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x92a3a4] 
4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) 
[0x92a52a] 
5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e] 
6: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0] 
7: (()+0x8182) [0x7fd4ef916182] 
8: (clone()+0x6d) [0x7fd4ede8147d] 
NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this. 

--- logging levels --- 
0/ 5 none 
0/ 1 lockdep 
0/ 1 context 
1/ 1 crush 
1/ 5 mds 
1/ 5 mds_balancer 
1/ 5 mds_locker 
1/ 5 mds_log 
1/ 5 mds_log_expire 
1/ 5 mds_migrator 
0/ 1 buffer 
0/ 1 timer 
0/ 1 filer 
0/ 1 striper 
0/ 1 objecter 
0/ 5 rados 
0/ 5 rbd 
0/ 5 rbd_replay 
0/ 5 journaler 
0/ 5 objectcacher 
0/ 5 client 
0/ 5 osd 
0/ 5 optracker 
0/ 5 objclass 
1/ 3 filestore 
1/ 3 keyvaluestore 
1/ 3 journal 
0/ 5 ms 
1/ 5 mon 
0/10 monc 
1/ 5 paxos 
0/ 5 tp 
1/ 5 auth 
1/ 5 crypto 
1/ 1 finisher 
1/ 5 heartbeatmap 
1/ 5 perfcounter 
1/ 5 rgw 
1/10 civetweb 
1/ 5 javaclient 
1/ 5 asok 
1/ 1 throttle 
0/ 0 refs 
1/ 5 xio 
-2/-2 (syslog threshold) 
-1/-1 (stderr threshold) 
max_recent 1 
max_new 1000 
log_file /var/log/ceph/ceph-osd.299.log 
--- end dump of recent events --- 
2016-02-05 19:10:45.441428 7fd4e2ea8700 -1 *** Caught signal (Aborted) ** 
in thread 7fd4e2ea8700 

ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 
1: /usr/bin/ceph-osd() [0xacd7ba] 
2: (()+0x10340) [0x7fd4ef91e340] 
3: (gsignal()+0x39) [0x7fd4eddbdcc9] 
4: (abort()+0x148) [0x7fd4eddc10d8] 
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fd4ee6c8535] 
6: (()+0x5e6d6) [0x7fd4ee6c66d6] 
7: (()+0x5e703) [0x7fd4ee6c6703] 
8: (()+0x5e922) [0x7fd4ee6c6922] 
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) 
[0xbc62d8] 
10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, 
ThreadPool::TPHandle*)+0xa52) [0x923d12] 
11: (FileStore::_do_transactions(std::list >&, unsigned long, 
ThreadPool::TPHandle*)+0x64) [0x92a3a4 
] 
12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) 
[0x92a52a] 
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb6b4e] 
14: (ThreadPool::WorkThread::entry()+0x10) [0xbb7bf0] 
15: (()+0x8182) [0x7fd4ef916182] 
16: 

Re: [ceph-users] Ceph mirrors wanted!

2016-02-06 Thread Tyler Bishop
Covered except that the dreamhost mirror is constantly down or broken.

I can add ceph.mirror.beyondhosting.net for it.

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Wido den Hollander" <w...@42on.com>
To: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
Cc: "ceph-users" <ceph-us...@ceph.com>
Sent: Saturday, February 6, 2016 2:46:50 AM
Subject: Re: [ceph-users] Ceph mirrors wanted!

> Op 6 februari 2016 om 0:08 schreef Tyler Bishop
> <tyler.bis...@beyondhosting.net>:
> 
> 
> I have ceph pulling down from eu.   What *origin* should I setup rsync to
> automatically pull from?
> 
> download.ceph.com is consistently broken.
> 

download.ceph.com should be your best guess, since that is closest.

The US however seems covered with download.ceph.com although we might set up
us-east and us-west.

I see that Ceph is currently in a subfolder called 'Ceph' and that is not
consistent with the other mirrors.

Can that be fixed so that it matches the original directory structure?

Wido

> - Original Message -
> From: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
> To: "Wido den Hollander" <w...@42on.com>
> Cc: "ceph-users" <ceph-us...@ceph.com>
> Sent: Friday, February 5, 2016 5:59:20 PM
> Subject: Re: [ceph-users] Ceph mirrors wanted!
> 
> We would be happy to mirror the project.
> 
> http://mirror.beyondhosting.net
> 
> 
> - Original Message -
> From: "Wido den Hollander" <w...@42on.com>
> To: "ceph-users" <ceph-us...@ceph.com>
> Sent: Saturday, January 30, 2016 9:14:59 AM
> Subject: [ceph-users] Ceph mirrors wanted!
> 
> Hi,
> 
> My PR was merged with a script to mirror Ceph properly:
> https://github.com/ceph/ceph/tree/master/mirroring
> 
> Currently there are 3 (official) locations where you can get Ceph:
> 
> - download.ceph.com (Dreamhost, US)
> - eu.ceph.com (PCextreme, Netherlands)
> - au.ceph.com (Digital Pacific, Australia)
> 
> I'm looking for more mirrors to become official mirrors so we can easily
> distribute Ceph.
> 
> Mirrors do go down and it's always nice to have a mirror local to you.
> 
> I'd like to have one or more mirrors in Asia, Africa and/or South
> America if possible. Anyone able to host there? Other locations are
> welcome as well!
> 
> A few things which are required:
> 
> - 1Gbit connection or more
> - Native IPv4 and IPv6
> - HTTP access
> - rsync access
> - 2TB of storage or more
> - Monitoring of the mirror/source
> 
> You can easily mirror Ceph yourself with this script I wrote:
> https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh
> 
> eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
> you want to mirror Ceph locally, please pick a mirror local to you.
> 
> Please refer to these guidelines:
> https://github.com/ceph/ceph/tree/master/mirroring#guidelines
> 
> -- 
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors wanted!

2016-02-05 Thread Tyler Bishop
I have ceph pulling down from eu. What *origin* should I set up rsync to 
automatically pull from?

download.ceph.com is consistently broken.

- Original Message -
From: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
To: "Wido den Hollander" <w...@42on.com>
Cc: "ceph-users" <ceph-us...@ceph.com>
Sent: Friday, February 5, 2016 5:59:20 PM
Subject: Re: [ceph-users] Ceph mirrors wanted!

We would be happy to mirror the project.

http://mirror.beyondhosting.net


- Original Message -
From: "Wido den Hollander" <w...@42on.com>
To: "ceph-users" <ceph-us...@ceph.com>
Sent: Saturday, January 30, 2016 9:14:59 AM
Subject: [ceph-users] Ceph mirrors wanted!

Hi,

My PR was merged with a script to mirror Ceph properly:
https://github.com/ceph/ceph/tree/master/mirroring

Currently there are 3 (official) locations where you can get Ceph:

- download.ceph.com (Dreamhost, US)
- eu.ceph.com (PCextreme, Netherlands)
- au.ceph.com (Digital Pacific, Australia)

I'm looking for more mirrors to become official mirrors so we can easily
distribute Ceph.

Mirrors do go down and it's always nice to have a mirror local to you.

I'd like to have one or more mirrors in Asia, Africa and/or South
> America if possible. Anyone able to host there? Other locations are
welcome as well!

A few things which are required:

- 1Gbit connection or more
- Native IPv4 and IPv6
- HTTP access
- rsync access
- 2TB of storage or more
- Monitoring of the mirror/source

You can easily mirror Ceph yourself with this script I wrote:
https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh

eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
you want to mirror Ceph locally, please pick a mirror local to you.

Please refer to these guidelines:
https://github.com/ceph/ceph/tree/master/mirroring#guidelines

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mirrors wanted!

2016-02-05 Thread Tyler Bishop
We would be happy to mirror the project.

http://mirror.beyondhosting.net


- Original Message -
From: "Wido den Hollander" 
To: "ceph-users" 
Sent: Saturday, January 30, 2016 9:14:59 AM
Subject: [ceph-users] Ceph mirrors wanted!

Hi,

My PR was merged with a script to mirror Ceph properly:
https://github.com/ceph/ceph/tree/master/mirroring

Currently there are 3 (official) locations where you can get Ceph:

- download.ceph.com (Dreamhost, US)
- eu.ceph.com (PCextreme, Netherlands)
- au.ceph.com (Digital Pacific, Australia)

I'm looking for more mirrors to become official mirrors so we can easily
distribute Ceph.

Mirrors do go down and it's always nice to have a mirror local to you.

I'd like to have one or more mirrors in Asia, Africa and/or South
America if possible. Anyone able to host there? Other locations are
welcome as well!

A few things which are required:

- 1Gbit connection or more
- Native IPv4 and IPv6
- HTTP access
- rsync access
- 2TB of storage or more
- Monitoring of the mirror/source

You can easily mirror Ceph yourself with this script I wrote:
https://github.com/ceph/ceph/blob/master/mirroring/mirror-ceph.sh

eu.ceph.com and au.ceph.com use it to sync from download.ceph.com. If
you want to mirror Ceph locally, please pick a mirror local to you.

Please refer to these guidelines:
https://github.com/ceph/ceph/tree/master/mirroring#guidelines

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD Journal

2016-01-28 Thread Tyler Bishop
This is an interesting topic that I've been waiting for. 

Right now we run the journal as a partition on the data disk. I've built drives 
without journals, and the write performance seems okay, but random IO performance 
is poor in comparison to what it should be. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Bill WONG" <wongahsh...@gmail.com> 
To: "ceph-users" <ceph-users@lists.ceph.com> 
Sent: Thursday, January 28, 2016 1:36:01 PM 
Subject: [ceph-users] SSD Journal 

Hi, 
I have tested an SSD journal with SATA and it works perfectly. Now I am 
testing a full-SSD Ceph cluster; with a full-SSD cluster, do I still 
need a separate SSD as the journal disk? 

[assuming I do not have PCIe SSD flash, which performs better than a normal 
SSD disk] 

Please give some ideas on a full-SSD Ceph cluster... thank you! 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] SSD Journal

2016-01-28 Thread Tyler Bishop
What approach did SanDisk take with this for Jewel? 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Jan Schermer" <j...@schermer.cz> 
To: "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
Cc: "Bill WONG" <wongahsh...@gmail.com>, ceph-users@lists.ceph.com 
Sent: Thursday, January 28, 2016 4:32:54 PM 
Subject: Re: [ceph-users] SSD Journal 

You can't run Ceph OSD without a journal. The journal is always there. 
If you don't have a journal partition then there's a "journal" file on the OSD 
filesystem that does the same thing. If it's a partition then this file turns 
into a symlink. 
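(A quick way to see which case you have -- a sketch assuming the default OSD data path: 

    ls -l /var/lib/ceph/osd/ceph-0/journal
    # regular file             -> journal lives on the OSD filesystem
    # symlink to a partition   -> separate journal partition
) 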

You will always be better off with a journal on a separate partition because of 
the way writeback cache in linux works (someone correct me if I'm wrong). 
The journal needs to flush to disk quite often, and linux is not always able to 
flush only the journal data. You can't defer metadata flushing forever and also 
doing fsync() makes all the dirty data flush as well. ext2/3/4 also flushes 
data to the filesystem periodically (5s is it, I think?) which will make the 
latency of the journal go through the roof momentarily. 
(I'll leave researching how exactly XFS does it to those who care about that 
"filesystem'o'thing"). 

P.S. I feel very strongly that this whole concept is broken fundamentally. We 
already have a journal for the filesystem which is time proven, well behaved 
and above all fast. Instead there's this reinvented wheel which supposedly does 
it better in userspace while not really avoiding the filesystem journal either. 
It would maybe make sense if OSD was storing the data on a block device 
directly, avoiding the filesystem altogether. But it would still do the same 
bloody thing and (no disrespect) ext4 does this better than Ceph ever will. 





On 28 Jan 2016, at 20:01, Tyler Bishop < tyler.bis...@beyondhosting.net > 
wrote: 

This is an interesting topic that I've been waiting for. 

Right now we run the journal as a partition on the data disk. I've built drives 
without journals, and the write performance seems okay, but random IO performance 
is poor in comparison to what it should be. 





Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 

tyler.bis...@beyondhosting.net 

If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 





From: "Bill WONG" < wongahsh...@gmail.com > 
To: "ceph-users" < ceph-users@lists.ceph.com > 
Sent: Thursday, January 28, 2016 1:36:01 PM 
Subject: [ceph-users] SSD Journal 

Hi, 
I have tested an SSD journal with SATA and it works perfectly. Now I am 
testing a full-SSD Ceph cluster; with a full-SSD cluster, do I still 
need a separate SSD as the journal disk? 

[assuming I do not have PCIe SSD flash, which performs better than a normal 
SSD disk] 

Please give some ideas on a full-SSD Ceph cluster... thank you! 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] downloads.ceph.com no longer valid?

2016-01-27 Thread Tyler Bishop
tyte... ceph pool go rogue?

- Original Message -
From: "Gregory Farnum" 
To: "John Hogenmiller" 
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, January 27, 2016 2:08:36 PM
Subject: Re: [ceph-users] downloads.ceph.com no longer valid?

Infrastructure guys say it's down and they are working on it.

On Wed, Jan 27, 2016 at 11:01 AM, John Hogenmiller  wrote:
> I dug a bit more.
>
> download.ceph.com resolves (for me) to 173.236.253.173 and is not responding
> to icmp, port 80, or 443
>
> https://git.ceph.com/release.asc works
> https://ceph.com/keys/release.asc returns 404
> http://eu.ceph.com/keys/release.asc works
> http://au.ceph.com/keys/release.asc returns 404
>
> http://eu.ceph.com/debian-infernalis/ works
> http://ceph.com/debian-infernalis/ redirects to
> http://download.ceph.com/debian-infernalis/
> http://au.ceph.com/debian-infernalis/ returns 503
>
> So at this point, it looks like eu.ceph.com is working, au.ceph.com is out
> of sync, and download.ceph.com is not working (and it didn't work for me
> last week either, requiring me to use gitbuilder.ceph.com.
>
>
>
> On Wed, Jan 27, 2016 at 11:58 AM, Moulin Yoann  wrote:
>>
>> Hello,
>>
>>
>> > I installed ceph last week from the docs and noticed all the
>> > downloads.ceph.com  and ceph.com/download
>> >  links no longer worked.  After various
>> > searching around, I substituted and got past this. But then I forgot
>> > about it until I went to install again this week.
>> >
>> > Are the below URLs from my history the new correct?  If so, should I go
>> > ahead and try and get a PR for docs updated?
>> >
>> > https://raw.github.com/ceph/ceph/master/keys/autobuild.asc
>> >
>> > http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-$(lsb_release
>> > -sc)-x86_64-basic/ref/master
>>
>> it was OK one 1h ago, maybe something goes wrong on some servers
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] downloads.ceph.com no longer valid?

2016-01-27 Thread Tyler Bishop
No they need it to work. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "☣Adam" <a...@dc949.org> 
To: "Gregory Farnum" <gfar...@redhat.com> 
Cc: ceph-users@lists.ceph.com, "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
Sent: Wednesday, January 27, 2016 4:57:08 PM 
Subject: Re: [ceph-users] downloads.ceph.com no longer valid? 



It's not hosted on ceph? Well that's your problem right there. ;-) 
On Jan 27, 2016 3:55 PM, "Gregory Farnum" < gfar...@redhat.com > wrote: 


Nah, it's not hosted on Ceph. 

On Wed, Jan 27, 2016 at 1:39 PM, Tyler Bishop 
< tyler.bis...@beyondhosting.net > wrote: 
> tyte... ceph pool go rogue? 
> 
> - Original Message - 
> From: "Gregory Farnum" < gfar...@redhat.com > 
> To: "John Hogenmiller" < j...@hogenmiller.net > 
> Cc: ceph-users@lists.ceph.com 
> Sent: Wednesday, January 27, 2016 2:08:36 PM 
> Subject: Re: [ceph-users] downloads.ceph.com no longer valid? 
> 
> Infrastructure guys say it's down and they are working on it. 
> 
> On Wed, Jan 27, 2016 at 11:01 AM, John Hogenmiller < j...@hogenmiller.net > 
> wrote: 
>> I dug a bit more. 
>> 
>> download.ceph.com resolves (for me) to 173.236.253.173 and is not responding 
>> to icmp, port 80, or 443 
>> 
>> https://git.ceph.com/release.asc works 
>> https://ceph.com/keys/release.asc returns 404 
>> http://eu.ceph.com/keys/release.asc works 
>> http://au.ceph.com/keys/release.asc returns 404 
>> 
>> http://eu.ceph.com/debian-infernalis/ works 
>> http://ceph.com/debian-infernalis/ redirects to 
>> http://download.ceph.com/debian-infernalis/ 
>> http://au.ceph.com/debian-infernalis/ returns 503 
>> 
>> So at this point, it looks like eu.ceph.com is working, au.ceph.com is out 
>> of sync, and download.ceph.com is not working (and it didn't work for me 
>> last week either, requiring me to use gitbuilder.ceph.com . 
>> 
>> 
>> 
>> On Wed, Jan 27, 2016 at 11:58 AM, Moulin Yoann < yoann.mou...@epfl.ch > 
>> wrote: 
>>> 
>>> Hello, 
>>> 
>>> 
>>> > I installed ceph last week from the docs and noticed all the 
>>> > downloads.ceph.com < http://downloads.ceph.com > and ceph.com/download 
>>> > < http://ceph.com/download > links no longer worked. After various 
>>> > searching around, I substituted and got past this. But then I forgot 
>>> > about it until I went to install again this week. 
>>> > 
>>> > Are the below URLs from my history the new correct? If so, should I go 
>>> > ahead and try and get a PR for docs updated? 
>>> > 
>>> > https://raw.github.com/ceph/ceph/master/keys/autobuild.asc 
>>> > 
>>> > http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-$(lsb_release 
>>> > -sc)-x86_64-basic/ref/master 
>>> 
>>> it was OK one 1h ago, maybe something goes wrong on some servers 
>>> 
>>> -- 
>>> Yoann Moulin 
>>> EPFL IC-IT 
>> 
>> 
>> 
>> ___ 
>> ceph-users mailing list 
>> ceph-users@lists.ceph.com 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 






Re: [ceph-users] Again - state of Ceph NVMe and SSDs

2016-01-19 Thread Tyler Bishop
It sounds like you're just assuming these drives don't perform well...

- Original Message -
From: "Mark Nelson" <mnel...@redhat.com>
To: ceph-users@lists.ceph.com
Sent: Monday, January 18, 2016 2:17:19 PM
Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs

Take Greg's comments to heart, because he's absolutely correct here. 
Distributed storage systems almost as a rule love parallelism and if you 
have enough you can often hide other issues.  Latency is probably the 
more interesting question, and frankly that's where you'll often start 
seeing the kernel, ceph code, drivers, random acts of god, etc, get in 
the way.  It's very easy for any one of these things to destroy your 
performance, so you have to be *very* *very* careful to understand 
exactly what you are seeing.  As such, don't trust any one benchmark. 
Wait until it's independently verified, possibly by multiple sources, 
before putting too much weight into it.

Mark

On 01/18/2016 01:02 PM, Tyler Bishop wrote:
> One of the other guys on the list here benchmarked them.  They spanked every 
> other ssd on the *recommended* tree..
>
> - Original Message -
> From: "Gregory Farnum" <gfar...@redhat.com>
> To: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
> Cc: "David" <da...@visions.se>, "Ceph Users" <ceph-users@lists.ceph.com>
> Sent: Monday, January 18, 2016 2:01:44 PM
> Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs
>
> On Sun, Jan 17, 2016 at 12:34 PM, Tyler Bishop
> <tyler.bis...@beyondhosting.net> wrote:
>> The changes you are looking for are coming from Sandisk in the ceph "Jewel" 
>> release coming up.
>>
>> Based on benchmarks and testing, sandisk has really contributed heavily on 
>> the tuning aspects and are promising 90%+ native iop of a drive in the 
>> cluster.
>
> Mmmm, they've gotten some very impressive numbers but most people
> shouldn't be expecting 90% of an SSD's throughput out of their
> workloads. These tests are *very* parallel and tend to run multiple
> OSD processes on a single SSD, IIRC.
> -Greg
>
>>
>> The biggest changes will come from the memory allocation with writes.  
>> Latency is going to be a lot lower.
>>
>>
>> - Original Message -
>> From: "David" <da...@visions.se>
>> To: "Wido den Hollander" <w...@42on.com>
>> Cc: ceph-users@lists.ceph.com
>> Sent: Sunday, January 17, 2016 6:49:25 AM
>> Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs
>>
>> Thanks Wido, those are good pointers indeed :)
>> So we just have to make sure the backend storage (SSD/NVMe journals) won’t 
>> be saturated (or the controllers) and then go with as many RBD per VM as 
>> possible.
>>
>> Kind Regards,
>> David Majchrzak
>>
>> 16 jan 2016 kl. 22:26 skrev Wido den Hollander <w...@42on.com>:
>>
>>> On 01/16/2016 07:06 PM, David wrote:
>>>> Hi!
>>>>
>>>> We’re planning our third ceph cluster and been trying to find how to
>>>> maximize IOPS on this one.
>>>>
>>>> Our needs:
>>>> * Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
>>>> servers)
>>>> * Pool for storage of many small files, rbd (probably dovecot maildir
>>>> and dovecot index etc)
>>>>
>>>
>>> Not completely NVMe related, but in this case, make sure you use
>>> multiple disks.
>>>
>>> For MySQL for example:
>>>
>>> - Root disk for OS
>>> - Disk for /var/lib/mysql (data)
>>> - Disk for /var/log/mysql (binary log)
>>> - Maybe even a InnoDB logfile disk
>>>
>>> With RBD you gain more performance by sending I/O into the cluster in
>>> parallel. So when ever you can, do so!
>>>
>>> Regarding small files, it might be interesting to play with the stripe
>>> count and stripe size there. By default this is 1 and 4MB. But maybe 16
>>> and 256k work better here.
>>>
>>> With Dovecot as well, use a different RBD disk for the indexes and a
>>> different one for the Maildir itself.
>>>
>>> Ceph excels at parallel performance. That is what you want to aim for.
>>>
>>>> So I’ve been reading up on:
>>>>
>>>> https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance
>>>>
>>>> and ceph-users from october 2015:
>>>>
>>>> http://www.spinics.net/lists/ceph-users/msg22494.html

Re: [ceph-users] Again - state of Ceph NVMe and SSDs

2016-01-18 Thread Tyler Bishop
Check these out too:  
http://www.seagate.com/internal-hard-drives/solid-state-hybrid/1200-ssd/


- Original Message -
From: "Christian Balzer" 
To: "ceph-users" 
Sent: Sunday, January 17, 2016 10:45:56 PM
Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs

Hello,

On Sat, 16 Jan 2016 19:06:07 +0100 David wrote:

> Hi!
> 
> We’re planning our third ceph cluster and been trying to find how to
> maximize IOPS on this one.
> 
> Our needs:
> * Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
> servers)
> * Pool for storage of many small files, rbd (probably dovecot maildir
> and dovecot index etc)
>
I'm running dovecot for several 100k users on 2-node DRBD clusters and for
a mail archive server for a few hundred users backed by Ceph/RBD.
The latter works fine (it's not that busy), but I wouldn't consider 
replacing the DRBD clusters with Ceph/RBD at this time (higher investment
in storage 3x vs 2x and lower performance of course).

Depending on your use case you may be just fine of course.

> So I’ve been reading up on:
> 
> https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance
> 
> and ceph-users from october 2015:
> 
> http://www.spinics.net/lists/ceph-users/msg22494.html
> 
> We’re planning something like 5 OSD servers, with:
> 
> * 4x 1.2TB Intel S3510
I'd be wary of that.
As in, you're spec'ing the best Intel SSDs money can buy below for
journals, but the least write-endurable Intel DC SSDs for OSDs here.
Note that write amplification (beyond Ceph and FS journals) is very much a
thing, especially with small files. 
There's a mail about this by me in the ML archives somewhere:
http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-October/043949.html

Unless you're very sure about this being a read-mostly environment I'd go
with 3610's at least.

> * 8st 4TB HDD
> * 2x Intel P3700 Series HHHL PCIe 400GB (one for SSD Pool Journal and
> one for HDD pool journal)
You may be better off (cost and SPOF wise) with 2 200GB S3700 (not 3710)
for the HDD journals, but then again that won't fit into your case, won't
it...
Given the IOPS limits in Ceph as it is, you're unlikely to see much of
difference if you forgo a journal for the SSDs and use shared journals with
DC S3610 or 3710 OSD SSDs. 
Note that as far as pure throughput is concerned (in most operations the
least critical factor) your single journal SSD will limit things to the
speed of 2 (of your 4) storage SSDs.
But then again, your network is probably saturated before that.

> * 2x 80GB Intel S3510 raid1 for system
> * 256GB RAM
Plenty. ^o^

> * 2x 8 core CPU Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz or better
> 
Not sure about Jewel, but SSD OSDs will eat pretty much any and all CPU
cycles you can throw at them.
This also boils down to the question if having mixed HDD/SSD storage nodes
(with the fun of having to set "osd crush update on start = false") is a
good idea or not, as opposed to nodes that are optimized for their
respective storage hardware (CPU, RAM, network wise).

Regards,

Christian
> This cluster will probably run Hammer LTS unless there are huge
> improvements in Infernalis when dealing 4k IOPS.
> 
> The first link above hints at awesome performance. The second one from
> the list not so much yet.. 
> 
> Is anyone running Hammer or Infernalis with a setup like this?
> Is it a sane setup?
> Will we become CPU constrained or can we just throw more RAM on it? :D
> 
> Kind Regards,
> David Majchrzak

-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and NFS

2016-01-18 Thread Tyler Bishop
You should test out cephfs exported as an NFS target.
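Something along these lines -- a rough sketch only, the mount point, monitor address, 
client network and export options are all examples:

    # on the NFS server: kernel-mount CephFS
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # /etc/exports
    /mnt/cephfs  10.0.0.0/24(rw,sync,no_root_squash,fsid=101)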


- Original Message -
From: "david" 
To: ceph-users@lists.ceph.com
Sent: Monday, January 18, 2016 4:36:17 AM
Subject: [ceph-users] Ceph and NFS

Hello All.
Does anyone provides Ceph rbd/rgw/cephfs through NFS?  I have a 
requirement about Ceph Cluster which needs to provide NFS service. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CentOS 7 iscsi gateway using lrbd

2016-01-18 Thread Tyler Bishop

Well that's interesting. 

I've mapped block devices with the kernel client and exported them over iSCSI, but the 
performance was horrible. I wonder if this is any different? 
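
(For reference, the stock-kernel setup I tried was roughly this -- a sketch, image and 
target names are examples; target_core_rbd itself only ships in the SUSE kernel, so this 
goes through the generic block backstore instead: 

    rbd map rbd/test-img
    targetcli /backstores/block create name=rbd0 dev=/dev/rbd0
    targetcli /iscsi create iqn.2016-01.net.example:rbd0
    targetcli /iscsi/iqn.2016-01.net.example:rbd0/tpg1/luns create /backstores/block/rbd0
) 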



From: "Dominik Zalewski"  
To: ceph-users@lists.ceph.com 
Sent: Monday, January 18, 2016 6:35:20 AM 
Subject: [ceph-users] CentOS 7 iscsi gateway using lrbd 

Hi, 
I'm looking into implementing iscsi gateway with MPIO using lrbd - 
https://github.com/swiftgist/lrb 


https://www.suse.com/docrep/documents/kgu61iyowz/suse_enterprise_storage_2_and_iscsi.pdf
 

https://www.susecon.com/doc/2015/sessions/TUT16512.pdf 

>From above examples: 



For iSCSI failover and load-balancing, 

these servers must run a kernel supporting the target_core_ 

rbd module. This also requires that the target servers run at 

least the version 3.12.48-52.27.1 of the kernel-default ­package. 

Updates packages are available from the SUSE Linux 

Enterprise Server maintenance channel. 




I understand that lrbd is basically a nice way to configure LIO and rbd across 
ceph osd nodes/iscsi gateways. Does CentOS 7 have the same target_core_rbd module in 
the kernel, or is this something SUSE Enterprise Storage specific only? 




Basically, will LIO+rbd work the same way on CentOS 7? Has anyone used it with 
CentOS? 




Thanks 




Dominik 





___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] Again - state of Ceph NVMe and SSDs

2016-01-18 Thread Tyler Bishop
One of the other guys on the list here benchmarked them.  They spanked every 
other ssd on the *recommended* tree..

- Original Message -
From: "Gregory Farnum" <gfar...@redhat.com>
To: "Tyler Bishop" <tyler.bis...@beyondhosting.net>
Cc: "David" <da...@visions.se>, "Ceph Users" <ceph-users@lists.ceph.com>
Sent: Monday, January 18, 2016 2:01:44 PM
Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs

On Sun, Jan 17, 2016 at 12:34 PM, Tyler Bishop
<tyler.bis...@beyondhosting.net> wrote:
> The changes you are looking for are coming from Sandisk in the ceph "Jewel" 
> release coming up.
>
> Based on benchmarks and testing, sandisk has really contributed heavily on 
> the tuning aspects and are promising 90%+ native iop of a drive in the 
> cluster.

Mmmm, they've gotten some very impressive numbers but most people
shouldn't be expecting 90% of an SSD's throughput out of their
workloads. These tests are *very* parallel and tend to run multiple
OSD processes on a single SSD, IIRC.
-Greg

>
> The biggest changes will come from the memory allocation with writes.  
> Latency is going to be a lot lower.
>
>
> - Original Message -
> From: "David" <da...@visions.se>
> To: "Wido den Hollander" <w...@42on.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Sunday, January 17, 2016 6:49:25 AM
> Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs
>
> Thanks Wido, those are good pointers indeed :)
> So we just have to make sure the backend storage (SSD/NVMe journals) won’t be 
> saturated (or the controllers) and then go with as many RBD per VM as 
> possible.
>
> Kind Regards,
> David Majchrzak
>
> 16 jan 2016 kl. 22:26 skrev Wido den Hollander <w...@42on.com>:
>
>> On 01/16/2016 07:06 PM, David wrote:
>>> Hi!
>>>
>>> We’re planning our third ceph cluster and been trying to find how to
>>> maximize IOPS on this one.
>>>
>>> Our needs:
>>> * Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
>>> servers)
>>> * Pool for storage of many small files, rbd (probably dovecot maildir
>>> and dovecot index etc)
>>>
>>
>> Not completely NVMe related, but in this case, make sure you use
>> multiple disks.
>>
>> For MySQL for example:
>>
>> - Root disk for OS
>> - Disk for /var/lib/mysql (data)
>> - Disk for /var/log/mysql (binary log)
>> - Maybe even a InnoDB logfile disk
>>
>> With RBD you gain more performance by sending I/O into the cluster in
>> parallel. So when ever you can, do so!
>>
>> Regarding small files, it might be interesting to play with the stripe
>> count and stripe size there. By default this is 1 and 4MB. But maybe 16
>> and 256k work better here.
>>
>> With Dovecot as well, use a different RBD disk for the indexes and a
>> different one for the Maildir itself.
>>
>> Ceph excels at parallel performance. That is what you want to aim for.
>>
>>> So I’ve been reading up on:
>>>
>>> https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance
>>>
>>> and ceph-users from october 2015:
>>>
>>> http://www.spinics.net/lists/ceph-users/msg22494.html
>>>
>>> We’re planning something like 5 OSD servers, with:
>>>
>>> * 4x 1.2TB Intel S3510
>>> * 8st 4TB HDD
>>> * 2x Intel P3700 Series HHHL PCIe 400GB (one for SSD Pool Journal and
>>> one for HDD pool journal)
>>> * 2x 80GB Intel S3510 raid1 for system
>>> * 256GB RAM
>>> * 2x 8 core CPU Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz or better
>>>
>>> This cluster will probably run Hammer LTS unless there are huge
>>> improvements in Infernalis when dealing 4k IOPS.
>>>
>>> The first link above hints at awesome performance. The second one from
>>> the list not so much yet..
>>>
>>> Is anyone running Hammer or Infernalis with a setup like this?
>>> Is it a sane setup?
>>> Will we become CPU constrained or can we just throw more RAM on it? :D
>>>
>>> Kind Regards,
>>> David Majchrzak
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Cache pool redundancy requirements.

2016-01-17 Thread Tyler Bishop
Adding to this thought, even if you are using a single replica for the cache 
pool, will ceph scrub the cached block against the base tier? What if you have 
corruption in your cache? 


From: "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
To: ceph-users@lists.ceph.com 
Cc: "Sebastien han" <sebastien@enovance.com> 
Sent: Sunday, January 17, 2016 3:47:13 PM 
Subject: Ceph Cache pool redundancy requirements. 

Based on Sebastien's design I had some thoughts: 
http://www.sebastien-han.fr/images/ceph-cache-pool-compute-design.png 

Hypervisors are, for obvious reasons, more susceptible to crashes and reboots for 
security updates. Since ceph is utilizing a standard pool for the cache tier, it 
creates a requirement for placement group stability, i.e. we cannot use a pool 
with only 1 replica per PG. The ideal configuration would be to utilize a 
single-replica ssd cache pool as READ ONLY, with all writes sent to the 
base tier ssd journals; this way you're getting quick acks and fast reads without 
any flash capacity lost to redundancy. 

Has anyone tested a failure with a read-only cache pool that utilizes a single 
replica? Does ceph simply fetch the data and place it in another PG? The cache 
pool should be able to sustain drive failures with 1 replica because it's not 
needed for consistency. 

Interesting topic here.. curious if anyone has tried this. 

Our current architecture utilizes 48 hosts with 2x 1T SSD each as a 2 replica 
ssd pool. We have 4 hosts with 52x 6T disks for a capacity pool. We would like to 
run the base tier on the spindles with the SSD as a 100% utilized cache tier 
for busy pools. 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Again - state of Ceph NVMe and SSDs

2016-01-17 Thread Tyler Bishop
The changes you are looking for are coming from Sandisk in the ceph "Jewel" 
release coming up.

Based on benchmarks and testing, sandisk has really contributed heavily on the 
tuning aspects and are promising 90%+ native iop of a drive in the cluster.

The biggest changes will come from the memory allocation with writes.  Latency 
is going to be a lot lower.


- Original Message -
From: "David" 
To: "Wido den Hollander" 
Cc: ceph-users@lists.ceph.com
Sent: Sunday, January 17, 2016 6:49:25 AM
Subject: Re: [ceph-users] Again - state of Ceph NVMe and SSDs

Thanks Wido, those are good pointers indeed :)
So we just have to make sure the backend storage (SSD/NVMe journals) won’t be 
saturated (or the controllers) and then go with as many RBD per VM as possible.

Kind Regards,
David Majchrzak

16 jan 2016 kl. 22:26 skrev Wido den Hollander :

> On 01/16/2016 07:06 PM, David wrote:
>> Hi!
>> 
>> We’re planning our third ceph cluster and been trying to find how to
>> maximize IOPS on this one.
>> 
>> Our needs:
>> * Pool for MySQL, rbd (mounted as /var/lib/mysql or equivalent on KVM
>> servers)
>> * Pool for storage of many small files, rbd (probably dovecot maildir
>> and dovecot index etc)
>> 
> 
> Not completely NVMe related, but in this case, make sure you use
> multiple disks.
> 
> For MySQL for example:
> 
> - Root disk for OS
> - Disk for /var/lib/mysql (data)
> - Disk for /var/log/mysql (binary log)
> - Maybe even a InnoDB logfile disk
> 
> With RBD you gain more performance by sending I/O into the cluster in
> parallel. So when ever you can, do so!
> 
> Regarding small files, it might be interesting to play with the stripe
> count and stripe size there. By default this is 1 and 4MB. But maybe 16
> and 256k work better here.
> 
> With Dovecot as well, use a different RBD disk for the indexes and a
> different one for the Maildir itself.
> 
> Ceph excels at parallel performance. That is what you want to aim for.
> 
>> So I’ve been reading up on:
>> 
>> https://communities.intel.com/community/itpeernetwork/blog/2015/11/20/the-future-ssd-is-here-pcienvme-boosts-ceph-performance
>> 
>> and ceph-users from october 2015:
>> 
>> http://www.spinics.net/lists/ceph-users/msg22494.html
>> 
>> We’re planning something like 5 OSD servers, with:
>> 
>> * 4x 1.2TB Intel S3510
>> * 8st 4TB HDD
>> * 2x Intel P3700 Series HHHL PCIe 400GB (one for SSD Pool Journal and
>> one for HDD pool journal)
>> * 2x 80GB Intel S3510 raid1 for system
>> * 256GB RAM
>> * 2x 8 core CPU Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz or better
>> 
>> This cluster will probably run Hammer LTS unless there are huge
>> improvements in Infernalis when dealing 4k IOPS.
>> 
>> The first link above hints at awesome performance. The second one from
>> the list not so much yet.. 
>> 
>> Is anyone running Hammer or Infernalis with a setup like this?
>> Is it a sane setup?
>> Will we become CPU constrained or can we just throw more RAM on it? :D
>> 
>> Kind Regards,
>> David Majchrzak
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> -- 
> Wido den Hollander
> 42on B.V.
> Ceph trainer and consultant
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Cache pool redundancy requirements.

2016-01-17 Thread Tyler Bishop
Based on Sebastien's design I had some thoughts: 
http://www.sebastien-han.fr/images/ceph-cache-pool-compute-design.png 

Hypervisors are, for obvious reasons, more susceptible to crashes and reboots for 
security updates. Since ceph is utilizing a standard pool for the cache tier, it 
creates a requirement for placement group stability, i.e. we cannot use a pool 
with only 1 replica per PG. The ideal configuration would be to utilize a 
single-replica ssd cache pool as READ ONLY, with all writes sent to the 
base tier ssd journals; this way you're getting quick acks and fast reads without 
any flash capacity lost to redundancy. 

Has anyone tested a failure with a read-only cache pool that utilizes a single 
replica? Does ceph simply fetch the data and place it in another PG? The cache 
pool should be able to sustain drive failures with 1 replica because it's not 
needed for consistency. 

Interesting topic here.. curious if anyone has tried this. 

Our current architecture utilizes 48 hosts with 2x 1T SSD each as a 2 replica 
ssd pool. We have 4 hosts with 52x 6T disks for a capacity pool. We would like to 
run the base tier on the spindles with the SSD as a 100% utilized cache tier 
for busy pools. 
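
For reference, the tiering setup I have in mind would be built roughly like this (a 
sketch only -- pool names and PG counts are examples, the cache pool still needs a CRUSH 
rule placing it on the SSD hosts, and the size=1 cache is exactly the part I'm asking 
about): 

    ceph osd pool create cache-ssd 4096 4096
    ceph osd pool set cache-ssd size 1
    ceph osd tier add rbd-sata cache-ssd
    ceph osd tier cache-mode cache-ssd readonly
    ceph osd tier set-overlay rbd-sata cache-ssd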



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] In production - Change osd config

2016-01-07 Thread Tyler Bishop
http://sudomakeinstall.com/uncategorized/ceph-make-configuration-changes-in-realtime-without-restart

Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "Francois Lafont" <flafdiv...@free.fr>
To: "ceph-users@lists.ceph.com" <Ceph-users@lists.ceph.com>
Sent: Saturday, January 2, 2016 11:08:31 PM
Subject: Re: [ceph-users] In production - Change osd config

Hi,

On 03/01/2016 02:16, Sam Huracan wrote:

> I tried restarting all the OSDs but it was not effective.
> Is there any way to apply this change transparently to the clients?

You can use this command (it's an example):

# In a cluster node where the admin account is available.
ceph tell 'osd.*' injectargs '--osd_disk_threads 2'

After, you can check the config in a specific osd. For instance:

ceph daemon osd.5 config show | grep 'osd_disk_threads'

But you must launch this command in the node which hosts the osd.5
daemon.

Furthermore, with "ceph tell osd.\* injectargs ..." it's possible
to set a parameter for all osds from a single cluster node with just
one command, but I don't know if it's possible to just _get_ (not set)
the value of a parameter of all osds with just one command.
Does a such command exist?

Personally, I don't know a such command and currently, I have to
launch "ceph daemon osd.$id config show" for each osd which is
hosted by the current server where I'm connected and I have to
repeat the commands in the other cluster nodes.
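
(What I do today on each OSD host is roughly this -- just a sketch assuming the default 
/var/lib/ceph/osd layout: 

    for d in /var/lib/ceph/osd/ceph-*; do
        id=${d##*-}
        echo -n "osd.$id : "
        ceph daemon osd.$id config show | grep osd_disk_threads
    done
) 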

Regards.

-- 
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] more performance issues :(

2015-12-26 Thread Tyler Bishop
Add this under the [osd] section: 

osd op threads = 8 

Restart the osd services and try that. 
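
You can also push the same value into the running OSDs first to test it, in the same 
style used elsewhere on this list (a sketch): 

    ceph tell 'osd.*' injectargs '--osd_op_threads 8'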






From: "Florian Rommel"  
To: "Wade Holler"  
Cc: ceph-users@lists.ceph.com 
Sent: Saturday, December 26, 2015 4:55:06 AM 
Subject: Re: [ceph-users] more performance issues :( 

Hi, iostat shows all OSDs working when data is benched. It looks like the 
culprit is nowhere to be found. If I add SSD journals with the SSDs that we 
have, even though they give a much higher result with fio than the SATA 
drives, the speed of the cluster is exactly the same… 150-180MB/s, while reads 
max out the 10GbE network with no problem. 
rbd benchwrite however gives me NICE throughput… about 500MB/s to start with 
and then dropping and flattening out at 320MB/s, 9 IOPs…. So what the hell 
is going on? 


If I take the journals off and move them to the disks themselves, same results. 
Something is really, really off with my config, I guess, and I need to do some 
serious troubleshooting to figure this out. 

Thanks for the help so far . 
//Florian 






On 24 Dec 2015, at 13:54, Wade Holler < wade.hol...@gmail.com > wrote: 

Have a look at the iostsat -x 1 1000 output to see what the drives are doing 

On Wed, Dec 23, 2015 at 4:35 PM Florian Rommel < florian.rom...@datalounges.com 
> wrote: 

BQ_BEGIN
Ah, totally forgot the additional details :) 

OS is SUSE Enterprise Linux 12.0 with all patches, 
Ceph version 0.94.3 
4 node cluster with 2x 10GBe networking, one for cluster and one for public 
network, 1 additional server purely as an admin server. 
Test machine is also 10gbe connected 

ceph.conf is included: 
[global] 
fsid = 312e0996-a13c-46d3-abe3-903e0b4a589a 
mon_initial_members = ceph-admin, ceph-01, ceph-02, ceph-03, ceph-04 
mon_host = 
192.168.0.190,192.168.0.191,192.168.0.192,192.168.0.193,192.168.0.194 
auth_cluster_required = cephx 
auth_service_required = cephx 
auth_client_required = cephx 
filestore_xattr_use_omap = true 
public network = 192.168.0.0/24 
cluster network = 192.168.10.0/24 

osd pool default size = 2 
[osd] 
osd journal size = 2048 

Thanks again for any help and merry xmas already . 
//F 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 







___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-26 Thread Tyler Bishop
http://www.seagate.com/files/www-content/product-content/ssd-fam/1200-ssd/en-us/docs/1200-2-sas-ssd-ds1858-2-1509us.pdf
 

Which of these have you tested? I didn't even know Seagate had good flash. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "Tyler Bishop" <tyler.bis...@beyondhosting.net> 
To: "Frederic BRET" <frederic.b...@univ-lr.fr> 
Cc: "Andrei Mikhailovsky" <and...@arhont.com>, "ceph-users" 
<ceph-users@lists.ceph.com> 
Sent: Saturday, December 26, 2015 11:23:46 AM 
Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results 

Wow, what are the Seagate part numbers? I need to test those as well. 

What SAS controller are you utilizing? 

I did some tests with fio on some of the stuff we use. 

http://sudomakeinstall.com/servers/high-end-consumer-ssd-benchmarks 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] nfs over rbd problem

2015-12-25 Thread Tyler Bishop
I didn't read the whole thing, but if you're trying to do HA NFS, you need to run 
OCFS2 on your RBD and disable read/write caching on the rbd client. 
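
A minimal sketch of the ceph.conf side, assuming the clients go through
librbd (kernel-mapped RBDs do not use librbd's cache):

[client]
    rbd cache = false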




From: "Steve Anthony"  
To: ceph-users@lists.ceph.com 
Sent: Friday, December 25, 2015 12:39:01 AM 
Subject: Re: [ceph-users] nfs over rbd problem 

I've run into many problems trying to run an RBD/NFS Pacemaker setup like you describe 
on a two-node cluster. In my case, most of the problems were a result of a) no 
quorum and b) no STONITH. If you're going to be running this setup in 
production, I *highly* recommend adding more nodes (if only to maintain 
quorum). Without quorum, you will run into split-brain situations where you'll 
see anything from both nodes starting the same services to configuration 
changes disappearing when the nodes can't agree on a recent revision. 

In this case, it looks like a STONITH problem. Without a fencing mechanism, 
node2 cannot be 100% certain that node1 is dead. When you manually shutdown 
corosync, I suspect that as part of the process, node1 alerts the cluster that 
it's leaving as part of a planned shutdown. When it just disappears, node2 
can't determine what happened, it just knows node1 isn't answering anymore. 
Node1 is not necessarily down, there could just be a network problem between 
node1 and node2. In such a scenario, it would be very bad for node2 to 
map/mount the RBDs and start writing data while node1 is still providing the 
same service. Quorum can help here too, but for node2 to be certain node1 is 
down, it needs an out-of-band method to force power off node1, eg. I use IPMI 
on a separate physical network. 

So... add more nodes as quorum members. You can set weight restrictions on the 
resource groups to prevent them from running on these nodes if they are not as 
powerful as the nodes you're using now. Then, add a STONITH mechanism on all 
the nodes, and verify it works. 
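
As a rough sketch with crmsh and IPMI fencing (hostnames, addresses and
credentials below are placeholders, and the exact stonith agent available
depends on your distribution):

crm configure primitive p_fence_node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=10.0.99.1 userid=admin passwd=secret \
    interface=lan op monitor interval=60s
crm configure location l_fence_node1 p_fence_node1 -inf: node1
crm configure property stonith-enabled=true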

Once you do that, you should see things act the way you expect. Good luck! 

-Steve 

On 12/19/2015 03:46 AM, maoqi1982 wrote: 



Hi list 
I have a test ceph cluster of 3 nodes (node0: mon; node1: osd and nfs 
server 1; node2: osd and nfs server 2). 
OS: CentOS 6.6, kernel: 3.10.94-1.el6.elrepo.x86_64, ceph version 0.94.5. 
I followed the http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ 
instructions to set up an active/standby NFS environment. 
When using the commands "# service corosync stop" or "# poweroff" on node1, the 
failover went fine (the nfs server was taken over by node2). But when 
I tested cutting off the power of node1, the failover 
failed. 
1. [root@node1 ~]# crm status 
Last updated: Fri Dec 18 17:14:19 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node1 - partition with quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node1 node2 ] 
Resource Group: g_rbd_share_1 
p_rbd_map_1 (ocf::ceph:rbd.in): Started node1 
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node1 
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node1 
p_vip_1 (ocf::heartbeat:IPaddr): Started node1 
Clone Set: clo_nfs [g_nfs] 
Started: [ node1 node2 ] 
2. [root@node1 ~]# service corosync stop 
[root@node2 cluster]# crm status 
Last updated: Fri Dec 18 17:14:59 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node2 - partition WITHOUT quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node2 ] 
OFFLINE: [ node1 ] 
Resource Group: g_rbd_share_1 
p_rbd_map_1 (ocf::ceph:rbd.in): Started node2 
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node2 
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node2 
p_vip_1 (ocf::heartbeat:IPaddr): Started node2 
Clone Set: clo_nfs [g_nfs] 
Started: [ node2 ] 
Stopped: [ node1 ] 

3. cut off node1 power manually 
[root@node2 cluster]# crm status 
Last updated: Fri Dec 18 17:23:06 2015 
Last change: Fri Dec 18 17:13:29 2015 
Stack: classic openais (with plugin) 
Current DC: node2 - partition WITHOUT quorum 
Version: 1.1.11-97629de 
2 Nodes configured, 3 expected votes 
8 Resources configured 
Online: [ node2 ] 
OFFLINE: [ node1 ] 
Clone Set: clo_nfs [g_nfs] 
Started: [ node2 ] 
Stopped: [ node1 ] 
Failed actions: 
p_rbd_map_1_start_0 on node2 'unknown error' (1): call=48, status=Timed Out, 
last-rc-change='Fri Dec 18 17:22:19 2015', queued=0ms, exec=20002ms 
corosync.log: 
Dec 18 17:22:19 corosync [pcmk ] notice: pcmk_peer_update: Transitional 
membership event on ring 668: memb=1, new=0, lost=1 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: memb: node2 1211279552 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: lost: node1 1194502336 
Dec 18 17:22:19 corosync [pcmk ] notice: pcmk_peer_update: Stable membership 
event on ring 668: memb=1, new=0, lost=0 
Dec 18 17:22:19 corosync [pcmk ] info: pcmk_peer_update: MEMB: node2 1211279552 
Dec 18 

Re: [ceph-users] Tuning ZFS + QEMU/KVM + Ceph RBD’s

2015-12-25 Thread Tyler Bishop
Due to the nature of distributed storage and a filesystem built to distribute 
itself across sequential devices, you're always going to have poor performance.

Are you unable to use XFS inside the VM?


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited.

- Original Message -
From: "J David" 
To: ceph-users@lists.ceph.com
Sent: Thursday, December 24, 2015 1:10:36 PM
Subject: [ceph-users] Tuning ZFS + QEMU/KVM + Ceph RBD’s

For a variety of reasons, a ZFS pool in a QEMU/KVM virtual machine
backed by a Ceph RBD doesn’t perform very well.

Does anyone have any tuning tips (on either side) for this workload?

A fair amount of the problem is probably related to two factors.

First, ZFS always assumes it is talking to bare metal drives.  This
assumption is baked into it at a very fundamental level but Ceph is
pretty much the polar opposite of that.  For one thing, this makes any
type of write caching moderately terrifying from a dataloss
standpoint.

Although, to be fair, we run some non-critical KVM VM’s with ZFS
filesystems and cache=writeback with no observed ill-effects.  From
the available information, it *seems* safe to do that, but it’s not
certain whether under enough stress and the wrong crash at the wrong
moment, a lost/corrupted pool would be the result.  ZFS is notorious
for exploding if the underlying subsystem lies to it about whether
data has been permanently written to disk (that bare-metal assumption
again); it’s not an area that encourages pressing one’s luck.

The second issue is that ZFS likes a huge recordsize.  It uses small
blocks for small files, but as soon as a file grows a little bit, it
is happy to use 128KiB blocks (again assuming it’s talking to a
physical disk that can do a sequential read of a whole block with
minimal added overhead because the head was already there for the
first byte and what’s a little wasted bandwidth on a 6Gbps SAS bus
that has nothing else to do).

Ceph on the other hand *always* has something else to do, so a 128K
read-modify-write cycle to change one byte in the middle of a file
winds up being punishingly wasteful.

The RBD striping explanation ( on
http://docs.ceph.com/docs/hammer/man/8/rbd/ ) seems to suggest that
the default object size is 4M, so at least a single 128K read/write
should only hit one or (at most) two objects.

Whether it’s one or two seems to depend on whether ZFS has a useful
interpretation of track size, which it may not.  One such virtual
machine reports, for a 1TB ceph image, 62 sectors of 512 bytes per
track, or about 31K per track.  Which could lead to a fair number of
object-straddling reads and writes at a 128K record size.

So the main impact of that is massive write amplification; writing one
byte can turn into reading and writing 128K from/to 2-6 different
OSDs.  All of which winds up passing over the storage LAN, introducing
tons of latency compared to that hypothetical 6Gbps SAS read that ZFS
is designed to expect.

If it helps establish a baseline, the reason this subject comes up is
that currently ZFS filesystems on RBD-backed QEMU VM’s do stuff like
this:

(iostat -x at 10-second intervals)

Device:   rrqm/s  wrqm/s     r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vdc         0.00    0.00   41.00   31.30    86.35   4006.40   113.22     2.07  27.18   24.10   31.22  13.82  99.92
vdc         0.00    0.00  146.30   38.10   414.95   4876.80    57.39     2.46  13.64   10.36   26.25   5.42  99.96
vdc         0.00    0.00  127.30  102.20   256.40  13081.60   116.24     2.07   9.19    8.57    9.97   4.35  99.88
vdc         0.00    0.00  160.80  160.70   297.30  10592.80    67.75     1.21   3.76    1.73    5.78   2.91  93.68

That’s… not great… for a low-load 10G LAN Ceph cluster with 60 Intel
DC S37X0 SSD’s.

Is there some tuning that could be done (on any side, ZFS, QEMU, or
Ceph) to optimize performance?

Are there any metrics we could collect to gain more insight into what
and where the bottlenecks are?

Some combination of changing the ZFS max recordsize, the QEMU virtual
disk geometry, and Ceph backend settings seems like it might make a
big difference, but there are many combinations, and it feels like
guesswork with the available information.
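
Concretely, the kind of knobs in question look like this (a speculative
sketch, not something validated on this cluster; dataset names are made up):

# Smaller ZFS records mean less read-modify-write over the storage LAN;
# only blocks written after the change are affected.
zfs set recordsize=16K tank/vmdata

# Bias the ZIL toward throughput so small synchronous writes aren't
# effectively written twice.
zfs set logbias=throughput tank/vmdata

# In the libvirt/QEMU definition, writeback caching lets librbd coalesce
# small writes; ZFS does issue flushes, which is what makes it tolerable:
#   <driver name='qemu' type='raw' cache='writeback'/>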

So it seems worthwhile to ask if anyone has been down this road and if
so what they found before spending a week or two rediscovering the
wheel.

Thanks for any advice!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

2015-12-22 Thread Tyler Bishop
Write endurance is kinda bullshit.

We have Crucial 960GB drives storing data and we've only managed to take 2% off 
the drives' life over the period of a year, with hundreds of TB written weekly.


Stuff is way more durable than anyone gives it credit for.


- Original Message -
From: "Lionel Bouton" 
To: "Andrei Mikhailovsky" , "ceph-users" 

Sent: Tuesday, December 22, 2015 11:04:26 AM
Subject: Re: [ceph-users] Intel S3710 400GB and Samsung PM863 480GB fio results

Le 22/12/2015 13:43, Andrei Mikhailovsky a écrit :
> Hello guys,
>
> Was wondering if anyone has done testing on Samsung PM863 120 GB version to 
> see how it performs? IMHO the 480GB version seems like a waste for the 
> journal as you only need to have a small disk size to fit 3-4 osd journals. 
> Unless you get a far greater durability.

The problem is endurance. If we use the 480GB for 3 OSDs each on the
cluster we might build we expect 3 years (with some margin for error but
not including any write amplification at the SSD level) before the SSDs
will fail.
In our context a 120GB model might not even last a year (endurance is
1/4th of the 480GB model). This is why SM863 models will probably be
more suitable if you have access to them: you can use smaller ones which
cost less and get more endurance (you'll have to check the performance
though, usually smaller models have lower IOPS and bandwidth).

> I am planning to replace my current journal ssds over the next month or so 
> and would like to find out if there is an a good alternative to the Intel's 
> 3700/3500 series. 

3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
probably don't have enough endurance for many Ceph clusters to be cost
effective. The 120GB model is only rated for 70TBW and you have to
consider both client writes and rebalance events.
I'm uneasy with SSDs expected to fail within the life of the system they
are in: you can have a cascade effect where an SSD failure brings down
several OSDs triggering a rebalance which might make SSDs installed at
the same time fail too. In this case in the best scenario you will reach
your min_size (>=2) and block any writes which would prevent more SSD
failures until you move journals to fresh SSDs. If min_size = 1 you
might actually lose data.

If you expect to replace your current journal SSDs if I were you I would
make a staggered deployment over several months/a year to avoid them
failing at the same time in case of an unforeseen problem. In addition
this would allow to evaluate the performance and behavior of a new SSD
model with your hardware (there have been reports of performance
problems with some combinations of RAID controllers and SSD
models/firmware versions) without impacting your cluster's overall
performance too much.

When using SSDs for journals you have to monitor both :
* the SSD wear leveling or something equivalent (SMART data may not be
available if you use a RAID controller but usually you can get the total
amount data written) of each SSD,
* the client writes on the whole cluster.
And check periodically what the expected lifespan left there is for each
of your SSD based on their current state, average write speed, estimated
write amplification (both due to pool's size parameter and the SSD
model's inherent write amplification) and the amount of data moved by
rebalance events you expect to happen.
Ideally you should make this computation before choosing the SSD models,
but several variables are not always easy to predict and probably will
change during the life of your cluster.

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph same rbd on multiple client

2015-10-15 Thread Tyler Bishop
I don't know enough about OCFS2 to help. Sounds like you have uncoordinated writes, 
though.

Sent from TypeMail



On Oct 15, 2015, 1:53 AM, at 1:53 AM, gjprabu <gjpr...@zohocorp.com> wrote:
>Hi Tyler,
>
>
>
>   Can please send me the next setup action to be taken on this issue.
>
>
>
>Regards
>
>Prabu
>
>
>
>
>
> On Wed, 14 Oct 2015 13:43:29 +0530 gjprabu
>gjpr...@zohocorp.com wrote 
>
>
>
>
>Hi Tyler,
>
>
>
>Thanks for your reply. We have disabled rbd_cache but still issue is
>persist. Please find our configuration file.
>
>
>
># cat /etc/ceph/ceph.conf
>
>[global]
>
>fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
>
>mon_initial_members = integ-hm5, integ-hm6, integ-hm7
>
>mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
>
>auth_cluster_required = cephx
>
>auth_service_required = cephx
>
>auth_client_required = cephx
>
>filestore_xattr_use_omap = true
>
>osd_pool_default_size = 2
>
>
>
>[mon]
>
>mon_clock_drift_allowed = .500
>
>
>
>[client]
>
>rbd_cache = false
>
>
>
>--
>
>
>
> cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
>
> health HEALTH_OK
>
>monmap e2: 3 mons at
>{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
>
> election epoch 480, quorum 0,1,2 integ-hm5,integ-hm6,integ-hm7
>
> osdmap e49780: 2 osds: 2 up, 2 in
>
>  pgmap v2256565: 190 pgs, 2 pools, 1364 GB data, 410 kobjects
>
>2559 GB used, 21106 GB / 24921 GB avail
>
> 190 active+clean
>
>  client io 373 kB/s rd, 13910 B/s wr, 103 op/s
>
>
>
>
>
>Regards
>
>Prabu
>
>
>
> On Tue, 13 Oct 2015 19:59:38 +0530 Tyler Bishop
>tyler.bis...@beyondhosting.net wrote 
>
>
>
>
>You need to disable RBD caching.
>
>
>
>
>
>
>
> Tyler Bishop
>Chief Technical Officer
> 513-299-7108 x10
> 
>tyler.bis...@beyondhosting.net
>
> 
> 
>If you are not the intended recipient of this transmission you are
>notified that disclosing, copying, distributing or taking any action in
>reliance on the contents of this information is strictly prohibited.
>
> 
>
>
>
>
>
>
>
>
>
>From: "gjprabu" gjpr...@zohocorp.com
>
>To: "Frédéric Nass" frederic.n...@univ-lorraine.fr
>
>Cc: "ceph-users@lists.ceph.com"
>ceph-users@lists.ceph.com, "Siva Sokkumuthu"
>sivaku...@zohocorp.com, "Kamal Kannan Subramani(kamalakannan)"
>ka...@manageengine.com
>
>Sent: Tuesday, October 13, 2015 9:11:30 AM
>
>Subject: Re: [ceph-users] ceph same rbd on multiple client
>
>
>
>
>Hi ,
>
>
>
>
>We have CEPH  RBD with OCFS2 mounted servers. we are facing i/o errors
>simultaneously while move the folder using one nodes in the same disk
>other nodes data replicating with below said error (Copying is not
>having any problem). Workaround if we remount the partition this issue
>get resolved but after sometime problem again reoccurred. please help
>on this issue.
>
>
>
>Note : We have total 5 Nodes, here two nodes working fine other nodes
>are showing like below input/output error on moved data's.
>
>
>
>ls -althr 
>
>ls: cannot access LITE_3_0_M4_1_TEST: Input/output error 
>
>ls: cannot access LITE_3_0_M4_1_OLD: Input/output error 
>
>total 0 
>
>d? ? ? ? ? ? LITE_3_0_M4_1_TEST 
>
>d? ? ? ? ? ? LITE_3_0_M4_1_OLD 
>
>
>
>Regards
>
>Prabu
>
>
>
>
>
>
> On Fri, 22 May 2015 17:33:04 +0530 Frédéric Nass
>frederic.n...@univ-lorraine.fr wrote 
>
>
>
>
>Hi,
>
>
>
>Waiting for CephFS, you can use clustered filesystem like OCFS2 or GFS2
>on top of RBD mappings so that each host can access the same device and
>clustered filesystem.
>
>
>
>Regards,
>
>
>
>Frédéric.
>
>
>
>Le 21/05/2015 16:10, gjprabu a écrit :
>
>
>
>
>
>-- Frédéric Nass Sous direction des Infrastructures, Direction du
>Numérique, Université de Lorraine. Tél : 03.83.68.53.83
>___ 
>
>ceph-users mailing list 
>
>ceph-users@lists.ceph.com 
>
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>
>
>Hi All,
>
>
>
>We are using rbd and map the same rbd image to the rbd device on two
>different client but i can't see the data until i umount and mount -a
>partition. Kindly share the solution for this issue.
>
>
>
>Example
>
>create rbd image named foo
>
>map foo to /dev/rbd0 on server A,   mount /dev/rbd0 to /mnt
>
>map foo to /dev/rbd0 on server B,   mount /dev/rbd0 to /mnt
>
>
>
>Regards
>
>Prabu
>
>
>
>
>
>
>
>
>___ ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
>
>___
>
>ceph-users mailing list
>
>ceph-users@lists.ceph.com
>
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph same rbd on multiple client

2015-10-13 Thread Tyler Bishop
You need to disable RBD caching. 







Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: "gjprabu" <gjpr...@zohocorp.com> 
To: "Frédéric Nass" <frederic.n...@univ-lorraine.fr> 
Cc: "<ceph-users@lists.ceph.com>" <ceph-users@lists.ceph.com>, "Siva 
Sokkumuthu" <sivaku...@zohocorp.com>, "Kamal Kannan Subramani(kamalakannan)" 
<ka...@manageengine.com> 
Sent: Tuesday, October 13, 2015 9:11:30 AM 
Subject: Re: [ceph-users] ceph same rbd on multiple client 

Hi , 

We have servers with CEPH RBD and OCFS2 mounted. We are facing I/O errors 
on the other nodes when we move a folder from one node on the shared disk; the 
data replicates with the error shown below (copying does not have any 
problem). As a workaround, if we remount the partition the issue is resolved, but 
after some time the problem reoccurs. Please help with this issue. 

Note: we have 5 nodes in total; two nodes work fine, the other nodes 
show input/output errors like below on the moved data. 

ls -althr 
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error 
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error 
total 0 
d? ? ? ? ? ? LITE_3_0_M4_1_TEST 
d? ? ? ? ? ? LITE_3_0_M4_1_OLD 

Regards 
Prabu 

 On Fri, 22 May 2015 17:33:04 +0530 Frédéric Nass 
<frederic.n...@univ-lorraine.fr> wrote  




Hi, 

Waiting for CephFS, you can use a clustered filesystem like OCFS2 or GFS2 on top 
of RBD mappings so that each host can access the same device and clustered 
filesystem. 
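
A minimal sketch (assuming the o2cb cluster stack is already configured on
every node; pool and image names are placeholders):

rbd create rbd/shared --size 102400
rbd map rbd/shared                # on every node that will mount it
mkfs.ocfs2 -N 5 /dev/rbd0         # once, from one node; -N = node slots
mount /dev/rbd0 /mnt/shared       # on each node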

Regards, 

Frédéric. 

Le 21/05/2015 16:10, gjprabu a écrit : 


-- 
Frédéric Nass

Sous direction des Infrastructures,
Direction du Numérique,
Université de Lorraine.

Tél : 03.83.68.53.83 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Hi All, 

We are using rbd and map the same rbd image to an rbd device on two different 
clients, but I can't see the data until I umount and "mount -a" the partition 
again. Kindly share the solution for this issue. 

Example 
create rbd image named foo 
map foo to /dev/rbd0 on server A, mount /dev/rbd0 to /mnt 
map foo to /dev/rbd0 on server B, mount /dev/rbd0 to /mnt 

Regards 
Prabu 



___
ceph-users mailing list ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 






___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


[ceph-users] RadosGW failing to upload multipart.

2015-10-13 Thread Tyler Bishop


This just started after we removed some old OSD hardware. I feel like it may be 
related, but at the same time I'm not sure how. All of my pools have the same 
ruleset and everything else is working; uploads do work, but the multipart 
test process fails. 




Any help would be greatly appreciated! 




# s3cmd -c tests3cfg put testfile s3://vift/testing/testfile --no-ssl --verbose 

INFO: Compiling list of local files... 

INFO: Running stat() and reading/calculating MD5 values on 1 files, this may 
take some time... 

INFO: Summary: 1 local files to upload 

WARNING: Retrying failed request: /testing/testfile?uploads () 

WARNING: Waiting 3 sec... 

WARNING: Retrying failed request: /?location () 

WARNING: Waiting 3 sec... 

testfile -> s3://vift/testing/testfile [part 1 of 2, 15MB] 

15728640 of 15728640 100% in 0s 30.85 MB/s done 

testfile -> s3://vift/testing/testfile [part 2 of 2, 1024kB] 

1048576 of 1048576 100% in 0s 4.81 MB/s done 
Complete! 




Using Civetweb: 




2015-10-13 16:36:35.573656 7fc3267f4700 1 == starting new request 
req=0x7fc38c000ec0 = 

2015-10-13 16:36:35.579672 7fc3267f4700 1 == req done req=0x7fc38c000ec0 
http_status=400 == 

2015-10-13 16:36:35.579783 7fc3267f4700 1 civetweb: 0x7fc38c0017b0: 10.1.48.95 
- - [13/Oct/2015:16:36:35 -0400] "POST /testing/testfile HTTP/1.1" -1 0 - - 

2015-10-13 16:36:38.588189 7fc325ff3700 1 == starting new request 
req=0x7fc398160a00 = 

2015-10-13 16:36:38.661519 7fc325ff3700 1 == req done req=0x7fc398160a00 
http_status=200 == 

2015-10-13 16:36:38.661619 7fc325ff3700 1 civetweb: 0x7fc398004380: 10.1.48.95 
- - [13/Oct/2015:16:36:38 -0400] "POST /testing/testfile HTTP/1.1" -1 0 - - 

2015-10-13 16:36:41.667706 7fc3257f2700 1 == starting new request 
req=0x7fc394004db0 = 

2015-10-13 16:36:41.668862 7fc3257f2700 1 == req done req=0x7fc394004db0 
http_status=200 == 

2015-10-13 16:36:41.668965 7fc3257f2700 1 civetweb: 0x7fc394005120: 10.1.48.95 
- - [13/Oct/2015:16:36:41 -0400] "GET / HTTP/1.1" -1 0 - - 

2015-10-13 16:36:41.782854 7fc3257f2700 1 == starting new request 
req=0x7fc39411efa0 = 

2015-10-13 16:36:42.154775 7fc3257f2700 1 == req done req=0x7fc39411efa0 
http_status=200 == 

2015-10-13 16:36:42.154844 7fc3257f2700 1 civetweb: 0x7fc394005120: 10.1.48.95 
- - [13/Oct/2015:16:36:41 -0400] "PUT /testing/testfile HTTP/1.1" -1 0 - - 

2015-10-13 16:36:42.164454 7fc3257f2700 1 == starting new request 
req=0x7fc394428b60 = 

2015-10-13 16:36:42.363195 7fc3257f2700 1 == req done req=0x7fc394428b60 
http_status=200 == 

2015-10-13 16:36:42.363271 7fc3257f2700 1 civetweb: 0x7fc394005120: 10.1.48.95 
- - [13/Oct/2015:16:36:42 -0400] "PUT /testing/testfile HTTP/1.1" -1 0 - - 

2015-10-13 16:36:42.365241 7fc3257f2700 1 == starting new request 
req=0x7fc394001fa0 = 

2015-10-13 16:36:42.392679 7fc3257f2700 0 RGWObjManifest::operator++(): result: 
ofs=4194304 stripe_ofs=4194304 part_ofs=0 rule->part_size=15728640 

2015-10-13 16:36:42.392729 7fc3257f2700 0 RGWObjManifest::operator++(): result: 
ofs=8388608 stripe_ofs=8388608 part_ofs=0 rule->part_size=15728640 

2015-10-13 16:36:42.392739 7fc3257f2700 0 RGWObjManifest::operator++(): result: 
ofs=12582912 stripe_ofs=12582912 part_ofs=0 rule->part_size=15728640 

2015-10-13 16:36:42.392747 7fc3257f2700 0 RGWObjManifest::operator++(): result: 
ofs=15728640 stripe_ofs=15728640 part_ofs=15728640 rule->part_size=1048576 

2015-10-13 16:36:42.392755 7fc3257f2700 0 RGWObjManifest::operator++(): result: 
ofs=16777216 stripe_ofs=16777216 part_ofs=16777216 rule->part_size=1048576 

2015-10-13 16:36:42.429989 7fc3257f2700 1 == req done req=0x7fc394001fa0 
http_status=200 == 

2015-10-13 16:36:42.430094 7fc3257f2700 1 civetweb: 0x7fc394005120: 10.1.48.95 
- - [13/Oct/2015:16:36:42 -0400] "POST /testing/testfile HTTP/1.1" -1 0 - - 















Tyler Bishop 
Chief Technical Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 403 return code on S3 Gateway for remove keys or change key.

2015-07-10 Thread Tyler Bishop
Content-Type: application/x-www-form-urlencoded 
Date: Fri, 10 Jul 2015 17:42:48 GMT 
Authorization: AWS 27K8RGLQBN8K6G5PV3RS:vojakYdp1RqR3JYX5g5P6ny0vMc= 
Content-Length: 0 

 HTTP/1.1 403 Forbidden 
 Server: Tengine/2.1.0 
 Date: Fri, 10 Jul 2015 17:42:44 GMT 
 Content-Type: application/json 
 Content-Length: 32 
 Connection: keep-alive 
 Accept-Ranges: bytes 
* HTTP error before end of send, stop sending 
 
* Closing connection 0 
### END CURL VERBOSE ### 

### START Response Dump ### 
CFResponse Object 
( 
[header] = Array 
( 
[server] = Tengine/2.1.0 
[date] = Fri, 10 Jul 2015 17:42:44 GMT 
[content-type] = application/json 
[content-length] = 32 
[connection] = keep-alive 
[accept-ranges] = bytes 
[_info] = Array 
( 
[url] = 
https://s3.example.com/admin/user?key&uid=C1&access-key=ANNMJKDEZ2RN60I03GI9 
[content_type] = application/json 
[http_code] = 403 
[header_size] = 184 
[request_size] = 497 
[filetime] = -1 
[ssl_verify_result] = 20 
[redirect_count] = 0 
[total_time] = 0.312 
[namelookup_time] = 0 
[connect_time] = 0.062 
[pretransfer_time] = 0.234 
[size_upload] = 0 
[size_download] = 32 
[speed_download] = 102 
[speed_upload] = 0 
[download_content_length] = 32 
[upload_content_length] = 0 
[starttransfer_time] = 0.312 
[redirect_time] = 0 
[redirect_url] = 
[primary_ip] = 1.2.3.4 
[certinfo] = Array 
( 
) 

[primary_port] = 443 
[local_ip] = 192.168.2.12 
[local_port] = 64079 
[method] = DELETE 
) 

[x-aws-request-url] = 
https://s3.example.com/admin/user?key&uid=C1&access-key=ANNMJKDEZ2RN60I03GI9 
[x-aws-redirects] = 0 
[x-aws-stringtosign] = DELETE 

application/x-www-form-urlencoded 
Fri, 10 Jul 2015 17:42:48 GMT 
/admin/user?key 
[x-aws-requestheaders] = Array 
( 
[Content-Type] = application/x-www-form-urlencoded 
[Date] = Fri, 10 Jul 2015 17:42:48 GMT 
[Authorization] = AWS 27K8RGLQBN8K6G5PV3RS:vojakYdp1RqR3JYX5g5P6ny0vMc= 
[Expect] = 
) 

) 

[body] = CFSimpleXML Object 
( 
[Code] = SignatureDoesNotMatch 
) 

[status] = 403 
) 
### END Response Dump ### 







Tyler Bishop 
Chief Executive Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hardware cache settings recomendation

2015-06-11 Thread Tyler Bishop
You want write cache for the disks, no write cache for the SSD. 

I assume all of your data disks are single-drive RAID 0? 
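
For the LSI-based PERCs the idea looks something like this with MegaCli
(logical drive and adapter numbers are placeholders; the HP controllers have
the equivalent settings in hpssacli):

MegaCli -LDSetProp WB -L0 -a0     # spinning data drives: write-back cache on
MegaCli -LDSetProp WT -L1 -a0     # journal SSD: write-through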







Tyler Bishop 
Chief Executive Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: Mateusz Skała mateusz.sk...@budikom.net 
To: ceph-users@lists.ceph.com 
Sent: Saturday, June 6, 2015 4:09:59 AM 
Subject: [ceph-users] Hardware cache settings recomendation 



Hi, 

Please help me with the hardware cache settings on controllers for the best 
Ceph RBD performance. All Ceph hosts have one SSD drive for the journal. 



We are using 4 different controllers, all with BBU: 

· HP Smart Array P400 

· HP Smart Array P410i 

· Dell PERC 6/i 

· Dell PERC H700 



I have to set the cache policy. On the Dell controllers the settings are: 

· Read Policy 

o Read-Ahead (current) 

o No-Read-Ahead 

o Adaptive Read-Ahead 

· Write Policy 

o Write-Back (current) 

o Write-Through 

· Cache Policy 

o Cache I/O 

o Direct I/O (current) 

· Disk Cache Policy 

o Default (current) 

o Enabled 

o Disabled 

On HP controllers: 

· Cache Ratio (current: 25% Read / 75% Write) 

· Drive Write Cache 

o Enabled (current) 

o Disabled 



And there is one more setting in the LogicalDrive options: 

· Caching: 

o Enabled (current) 

o Disabled 



Please verify my settings and give me some recommendations. 

Best regards, 

Mateusz 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] TR: High apply latency on OSD causes poor performance on VM

2015-06-11 Thread Tyler Bishop
Turn off the write cache on the controller. You're probably seeing the flush to disk. 







Tyler Bishop 
Chief Executive Officer 
513-299-7108 x10 



tyler.bis...@beyondhosting.net 


If you are not the intended recipient of this transmission you are notified 
that disclosing, copying, distributing or taking any action in reliance on the 
contents of this information is strictly prohibited. 




From: Franck Allouis franck.allo...@stef.com 
To: ceph-users ceph-us...@ceph.com 
Sent: Friday, May 29, 2015 8:54:41 AM 
Subject: [ceph-users] TR: High apply latency on OSD causes poor performance on 
VM 



Hi, 



Could you take a look at my problem? 

It’s about high latency on my OSDs on HP G8 servers (ceph01, ceph02 and 
ceph03). 

When I run a rados bench for 60 sec, the results are surprising: after a few 
seconds there is no traffic, then it resumes, and so on. 

Finally, the maximum latency is high and the VMs’ disks freeze a lot. 



#rados bench -p pool-test-g8 60 write 

Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 
objects 

Object prefix: benchmark_data_ceph02_56745 

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 

0 0 0 0 0 0 - 0 

1 16 82 66 263.959 264 0.0549584 0.171148 

2 16 134 118 235.97 208 0.344873 0.232103 

3 16 189 173 230.639 220 0.015583 0.24581 

4 16 248 232 231.973 236 0.0704699 0.252504 

5 16 306 290 231.974 232 0.0229872 0.258343 

6 16 371 355 236.64 260 0.27183 0.255469 

7 16 419 403 230.26 192 0.0503492 0.263304 

8 16 460 444 221.975 164 0.0157241 0.261779 

9 16 506 490 217.754 184 0.199418 0.271501 

10 16 518 502 200.778 48 0.0472324 0.269049 

11 16 518 502 182.526 0 - 0.269049 

12 16 556 540 179.981 76 0.100336 0.301616 

13 16 607 591 181.827 204 0.173912 0.346105 

14 16 655 639 182.552 192 0.0484904 0.339879 

15 16 683 667 177.848 112 0.0504184 0.349929 

16 16 746 730 182.481 252 0.276635 0.347231 

17 16 807 791 186.098 244 0.391491 0.339275 

18 16 845 829 184.203 152 0.188608 0.342021 

19 16 850 834 175.561 20 0.960175 0.342717 

2015-05-28 17:09:48.397376min lat: 0.013532 max lat: 6.28387 avg lat: 0.346987 

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 

20 16 859 843 168.582 36 0.0182246 0.346987 

21 16 863 847 161.316 16 3.18544 0.355051 

22 16 897 881 160.165 136 0.0811037 0.371209 

23 16 901 885 153.897 16 0.0482124 0.370793 

24 16 943 927 154.484 168 0.63064 0.397204 

25 15 997 982 157.104 220 0.0933448 0.392701 

26 16 1058 1042 160.291 240 0.166463 0.385943 

27 16 1088 1072 158.798 120 1.63882 0.388568 

28 16 1125 1109 158.412 148 0.0511479 0.38419 

29 16 1155 1139 157.087 120 0.162266 0.385898 

30 16 1163 1147 152.917 32 0.0682181 0.383571 

31 16 1190 1174 151.468 108 0.0489185 0.386665 

32 16 1196 1180 147.485 24 2.95263 0.390657 

33 16 1213 1197 145.076 68 0.0467788 0.389299 

34 16 1265 1249 146.926 208 0.0153085 0.420687 

35 16 1332 1316 150.384 268 0.0157061 0.42259 

36 16 1374 1358 150.873 168 0.251626 0.417373 

37 16 1402 1386 149.822 112 0.0475302 0.413886 

38 16 1444 1428 150.3 168 0.0507577 0.421055 

39 16 1500 1484 152.189 224 0.0489163 0.416872 

2015-05-28 17:10:08.399434min lat: 0.013532 max lat: 9.26596 avg lat: 0.415296 

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 

40 16 1530 1514 151.384 120 0.951713 0.415296 

41 16 1551 1535 149.741 84 0.0686787 0.416571 

42 16 1606 1590 151.413 220 0.0826855 0.41684 

43 16 1656 1640 152.542 200 0.0706539 0.409974 

44 16 1663 1647 149.712 28 0.046672 0.408476 

45 16 1685 1669 148.34 88 0.0989566 0.424918 

46 16 1707 1691 147.028 88 0.0490569 0.421116 

47 16 1707 1691 143.9 0 - 0.421116 

48 16 1707 1691 140.902 0 - 0.421116 

49 16 1720 1704 139.088 17. 0.0480335 0.428997 

50 16 1752 1736 138.866 128 0.053219 0.4416 

51 16 1786 1770 138.809 136 0.602946 0.440357 

52 16 1810 1794 137.986 96 0.0472518 0.438376 

53 16 1831 1815 136.967 84 0.0148999 0.446801 

54 16 1831 1815 134.43 0 - 0.446801 

55 16 1853 1837 133.586 44 0.0499486 0.455561 

56 16 1898 1882 134.415 180 0.0566593 0.461019 

57 16 1932 1916 134.442 136 0.0162902 0.454385 

58 16 1948 1932 133.227 64 0.62188 0.464403 

59 16 1966 1950 132.19 72 0.563613 0.472147 

2015-05-28 17:10:28.401525min lat: 0.013532 max lat: 12.4828 avg lat: 0.472084 

sec Cur ops started finished avg MB/s cur MB/s last lat avg lat 

60 16 1983 1967 131.12 68 0.030789 0.472084 

61 16 1984 1968 129.036 4 0.0519125 0.471871 

62 16 1984 1968 126.955 0 - 0.471871 

63 16 1984 1968 124.939 0 - 0.471871 

64 14 1984 1970 123.112 2.7 4.20878 0.476035 

Total time run: 64.823355 

Total writes made: 1984 

Write size: 4194304 

Bandwidth (MB/sec): 122.425 

Stddev Bandwidth: 85.3816 

Max bandwidth (MB/sec): 268 

Min bandwidth (MB/sec): 0 

Average Latency: 0.520956 

Stddev Latency: 1.17678 

Max latency: 12.4828 

Min latency: 0.013532 





I have installed a new ceph06 box which has best latencies but hardware is 
different (RAID card, disks

[ceph-users] Cannot add OSD node into crushmap or all writes fail

2015-03-30 Thread Tyler Bishop
I have this ceph node that will correctly recover into my ceph pool, and 
performance looks normal for the rbd clients. However, a few minutes after 
recovery finishes, the rbd clients begin to fall over and cannot write 
data to the pool. 

I've been trying to figure this out for weeks! None of the logs contain 
anything relevant at all. 

If I disable the node in the crushmap the rbd clients immediately begin writing 
to the other nodes. 

Ideas? 
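
Is there anything beyond the usual suspects worth checking? i.e.:

ceph health detail                        # blocked/slow requests and the OSDs involved
ceph osd perf                             # per-OSD commit/apply latency
ceph osd tree                             # weight/placement of the re-added node
ceph daemon osd.<id> dump_ops_in_flight   # on the suspect host, stuck ops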


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Issus with device-mapper drive partition names.

2015-02-13 Thread Tyler Bishop
When trying to zap and prepare a disk it fails to find the partitions. 



[ceph@ceph0-mon0 ~]$ ceph-deploy -v disk zap 
ceph0-node1:/dev/mapper/35000c50031a1c08b 

[ ceph_deploy.conf ][ DEBUG ] found configuration file at: 
/home/ceph/.cephdeploy.conf 

[ ceph_deploy.cli ][ INFO ] Invoked (1.5.21): /usr/bin/ceph-deploy -v disk zap 
ceph0-node1:/dev/mapper/35000c50031a1c08b 

[ ceph_deploy.osd ][ DEBUG ] zapping /dev/mapper/35000c50031a1c08b on 
ceph0-node1 

[ ceph0-node1 ][ DEBUG ] connection detected need for sudo 

[ ceph0-node1 ][ DEBUG ] connected to host: ceph0-node1 

[ ceph0-node1 ][ DEBUG ] detect platform information from remote host 

[ ceph0-node1 ][ DEBUG ] detect machine type 

[ ceph_deploy.osd ][ INFO ] Distro info: CentOS Linux 7.0.1406 Core 

[ ceph0-node1 ][ DEBUG ] zeroing last few blocks of device 

[ ceph0-node1 ][ DEBUG ] find the location of an executable 

[ ceph0-node1 ][ INFO ] Running command: sudo /usr/sbin/ceph-disk zap 
/dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ DEBUG ] Creating new GPT entries. 

[ ceph0-node1 ][ DEBUG ] Warning: The kernel is still using the old partition 
table. 

[ ceph0-node1 ][ DEBUG ] The new table will be used at the next reboot. 

[ ceph0-node1 ][ DEBUG ] GPT data structures destroyed! You may now partition 
the disk using fdisk or 

[ ceph0-node1 ][ DEBUG ] other utilities. 

[ ceph0-node1 ][ DEBUG ] Warning: The kernel is still using the old partition 
table. 

[ ceph0-node1 ][ DEBUG ] The new table will be used at the next reboot. 

[ ceph0-node1 ][ DEBUG ] The operation has completed successfully. 

[ ceph_deploy.osd ][ INFO ] calling partx on zapped device 
/dev/mapper/35000c50031a1c08b 

[ ceph_deploy.osd ][ INFO ] re-reading known partitions will display errors 

[ ceph0-node1 ][ INFO ] Running command: sudo partx -a 
/dev/mapper/35000c50031a1c08b 




Now running prepare fails because it can't find the newly created partitions. 




[ceph@ceph0-mon0 ~]$ ceph-deploy -v osd prepare 
ceph0-node1:/dev/mapper/35000c50031a1c08b 




[ ceph_deploy.conf ][ DEBUG ] found configuration file at: 
/home/ceph/.cephdeploy.conf 

[ ceph_deploy.cli ][ INFO ] Invoked (1.5.21): /usr/bin/ceph-deploy -v osd 
prepare ceph0-node1:/dev/mapper/35000c50031a1c08b 

[ ceph_deploy.osd ][ DEBUG ] Preparing cluster ceph disks 
ceph0-node1:/dev/mapper/35000c50031a1c08b: 

[ ceph0-node1 ][ DEBUG ] connection detected need for sudo 

[ ceph0-node1 ][ DEBUG ] connected to host: ceph0-node1 

[ ceph0-node1 ][ DEBUG ] detect platform information from remote host 

[ ceph0-node1 ][ DEBUG ] detect machine type 

[ ceph_deploy.osd ][ INFO ] Distro info: CentOS Linux 7.0.1406 Core 

[ ceph_deploy.osd ][ DEBUG ] Deploying osd to ceph0-node1 

[ ceph0-node1 ][ DEBUG ] write cluster configuration to 
/etc/ceph/{cluster}.conf 

[ ceph0-node1 ][ INFO ] Running command: sudo udevadm trigger 
--subsystem-match=block --action=add 

[ ceph_deploy.osd ][ DEBUG ] Preparing host ceph0-node1 disk 
/dev/mapper/35000c50031a1c08b journal None activate False 

[ ceph0-node1 ][ INFO ] Running command: sudo ceph-disk -v prepare --fs-type 
xfs --cluster ceph -- /dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=osd_journal_size 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Will colocate journal with data on 
/dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ WARNIN ] DEBUG:ceph-disk:Creating journal partition num 2 size 
1 on /dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:Running command: /sbin/sgdisk 
--new=2:0:1M --change-name=2:ceph journal 
--partition-guid=2:b9202d1b-63be-4deb-ad08-0a143a31f4a9 
--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- 
/dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ DEBUG ] Information: Moved requested sector from 34 to 2048 in 

[ ceph0-node1 ][ DEBUG ] order to align on 2048-sector boundaries. 

[ ceph0-node1 ][ DEBUG ] Warning: The kernel is still using the old partition 
table. 

[ ceph0-node1 ][ DEBUG ] The new table will be used at the next reboot. 

[ ceph0-node1 ][ DEBUG ] The operation has completed successfully. 

[ ceph0-node1 ][ WARNIN ] INFO:ceph-disk:calling partx on prepared device 
/dev/mapper/35000c50031a1c08b 

[ ceph0-node1 ][ WARNIN ]