[ceph-users] Mixed Bluestore and Filestore NVMe OSDs for RGW metadata both running out of space

2018-08-29 Thread David Turner
The osd daemon perf dump output for one of my bluestore NVMe OSDs has [1] this
excerpt.  I grabbed those stats based on Wido's [2] script to determine how
much DB overhead you have per object.  My [3] calculations for this
particular OSD are staggering.  99% of the space used on this OSD is being
consumed by the DB.  This particular OSD is sitting between 90%-97% disk
usage with an occasional drop to 80%, but then back up.  It's fluctuating
wildly from one minute to the next.

One of my filestore NVMe OSDs in the same cluster has 99% of its used space
in ./current/omap/

This is causing IO stalls as well as OSDs flapping on the cluster.  Does
anyone have any ideas of anything I can try?  It's definitely not the
actual PGs on the OSDs.  I tried balancing the weights of the OSDs to
better distribute the data, but moving the PGs around seemed to make things
worse.  Thank you.


[1] "bluestore_onodes": 167,
"stat_bytes_used": 143855271936,
"db_used_bytes": 142656667648,

[2] https://gist.github.com/wido/b1328dd45aae07c45cb8075a24de9f1f

[3] Average object size = 821MB
DB overhead per object = 814MB
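For reference, the [3] figures fall straight out of the [1] counters; a quick sketch of the arithmetic (on a live OSD the three inputs would come from the `ceph daemon osd.$ID perf dump` that Wido's script parses):

```shell
#!/bin/sh
# Recompute the [3] figures from the [1] perf dump excerpt.
# Pure arithmetic on the quoted values; on a live OSD you would
# fetch them via the daemon admin socket instead.
onodes=167
stat_bytes_used=143855271936
db_used_bytes=142656667648

avg_obj_mb=$(( stat_bytes_used / onodes / 1024 / 1024 ))
db_per_obj_mb=$(( db_used_bytes / onodes / 1024 / 1024 ))
db_pct=$(( db_used_bytes * 100 / stat_bytes_used ))

echo "average object size:    ${avg_obj_mb} MB"
echo "DB overhead per object: ${db_per_obj_mb} MB"
echo "DB share of used space: ${db_pct}%"
```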
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hammer and a (little) disk/partition shrink...

2018-08-29 Thread Marco Gaiarin
Mandi! David Turner
  In chel di` si favelave...

> Replace the raid controller in the chassis with an HBA before moving into the
> new hardware? ;)

Eh... any hint on a controller I can buy?


> If you do move to the HP controller, make sure you're monitoring the health of
> the cache battery in the controller.

I've no battery in the controller... ;-)

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)


Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-29 Thread Jones de Andrade
Hi Eugen.

Sorry for the delay in answering.

Just looked in the /var/log/ceph/ directory. It only contains the following
files (for example on node01):

###
# ls -lart
total 3864
-rw--- 1 ceph ceph 904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root 898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw--- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw--- 1 ceph ceph   0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph 352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw--- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
###

So, it only contains logs concerning the node itself (is that correct? Since
node01 is also the master, I was expecting it to have logs from the other
nodes too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I
have available, and nothing stands out (sorry for my poor English) as a
possible error.

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones
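A hedged first sanity check for a node with no ceph-osd logs at all (paths are the upstream defaults and may differ in a SUSE deployment; the commented commands are the usual next steps):

```shell
#!/bin/sh
# If both counts below are zero, the OSDs were never created on this
# node and the failure is earlier in the deployment stage.
osd_dirs=$(ls /var/lib/ceph/osd/ 2>/dev/null | wc -l)
osd_logs=$(ls /var/log/ceph/ceph-osd.* 2>/dev/null | wc -l)
echo "OSD data dirs on this node: $osd_dirs"
echo "OSD log files on this node: $osd_logs"
# Worth checking next, on a real node:
#   ceph-volume lvm list          # does ceph-volume know any OSDs here?
#   journalctl -u 'ceph-osd@*'    # systemd keeps a log even when
#                                 # /var/log/ceph/ceph-osd.* was never written
```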


On Mon, Aug 27, 2018 at 5:29 AM Eugen Block  wrote:

> Hi Jones,
>
> all ceph logs are in the directory /var/log/ceph/, each daemon has its
> own log file, e.g. OSD logs are named ceph-osd.*.
>
> I haven't tried it but I don't think SUSE Enterprise Storage deploys
> OSDs on partitioned disks. Is there a way to attach a second disk to
> the OSD nodes, maybe via USB or something?
>
> Although this thread is ceph related it is referring to a specific
> product, so I would recommend to post your question in the SUSE forum
> [1].
>
> Regards,
> Eugen
>
> [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Thanks for the suggestion. I'll look for the logs (since it's our first
> > attempt with ceph, I'll have to discover where they are, but no problem).
> >
> > One thing called my attention on your response however:
> >
> > I haven't made myself clear, but one of the failures we encountered was
> > that the files now containing:
> >
> > node02:
> >--
> >storage:
> >--
> >osds:
> >--
> >/dev/sda4:
> >--
> >format:
> >bluestore
> >standalone:
> >True
> >
> > Were originally empty, and we filled them by hand following a model found
> > elsewhere on the web. It was necessary so that we could continue, but the
> > model indicated that, for example, it should have the path for /dev/sda
> > here, not /dev/sda4. We chose to include the specific partition
> > identification because we won't have dedicated disks here, rather just
> > the very same partition, as all disks were partitioned exactly the same.
> >
> > While that was enough for the procedure to continue at that point, now I
> > wonder if it was the right call and, if it indeed was, whether it was done
> > properly.  As such, I wonder: what do you mean by "wipe" the partition here?
> > /dev/sda4 is created, but is both empty and unmounted: should a different
> > operation be performed on it, should I remove it first, or should I have
> > written the files above with only /dev/sda as the target?
> >
> > I know that I probably wouldn't run into these issues with dedicated disks,
> > but unfortunately that is absolutely not an option.
> >
> > Thanks a lot in advance for any comments and/or extra suggestions.
> >
> > Sincerely yours,
> >
> > Jones
> >
> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> take a look into the logs, they should point you in the right direction.
> >> Since the deployment stage fails at the OSD level, start with the OSD
> >> logs. Something's not right with the disks/partitions, did you wipe
> >> the partition from previous attempts?
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Jones de Andrade :
> >>
> >>> (Please forgive my previous email: I was using another message and
> >>> completely forget to update the subject)
> >>>
> >>> Hi all.
> >>>
> >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
> >>> 2 that I could solve myself, now it seems that I have hit a w

Re: [ceph-users] Hammer and a (little) disk/partition shrink...

2018-08-29 Thread David Turner
Replace the raid controller in the chassis with an HBA before moving into
the new hardware? ;)

If you do move to the HP controller, make sure you're monitoring the health
of the cache battery in the controller.  We notice a significant increase
in await on our OSD nodes behind these controllers when the cache battery
fails.  We've replaced over 10 batteries on HP raid controllers for our OSD
nodes; the first time we noticed it, there were 6 of them failed across
multiple clusters, causing the OSDs in those nodes to be slower.
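A hedged sketch of checking that battery from the CLI; the tool name varies by controller generation (hpacucli/hpssacli/ssacli), so the sample text below stands in for real `ssacli ctrl all show status` output:

```shell
#!/bin/sh
# Extract the cache battery state from HP Smart Array status output.
# The sample variable stands in for: ssacli ctrl all show status
sample='Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries)'

batt=$(printf '%s\n' "$sample" | sed -n 's/.*Battery\/Capacitor Status: //p')
echo "battery: $batt"
# alert when it is anything other than OK
[ "$batt" = "OK" ] || echo "WARNING: replace controller battery"
```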

On Wed, Aug 29, 2018 at 7:21 AM Marco Gaiarin  wrote:

>
> Probably a complex question, with a simple answer: NO. ;-)
>
>
> I need to move disks from a ceph node (still on hammer) from one piece of
> hardware to another. The source hardware has a simple SATA/SAS
> controller; the 'new' server has a RAID controller with no JBOD mode
> (the infamous HP P410i), so I need to create some 'RAID 0 with a single
> disk' fake raids.
>
> These controllers seem to ''eat'' some space at the end of the disk:
> (in some tests) the disk does not get corrupted by the
> 'raid0-ification', but loses some bytes at the end, and Linux then
> complains that the (last) partition is corrupted.
>
> Hammer uses filestore, so practically I need to shrink an XFS
> filesystem, and shrinking is not supported by XFS.
> Clearly I can do an 'xfsdump' of the disks to some scratch space and
> rebuild the filesystem, but...
>
>
> Is there some escape path?
>
>
> Thanks.
>
> --
> dott. Marco Gaiarin GNUPG Key ID: 240A3D66
>   Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
>   Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
>   marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
>
> Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
>   http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
> (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
>


Re: [ceph-users] cephfs mount on osd node

2018-08-29 Thread David Turner
The problem with mounting an RBD or CephFS on an OSD node arises when
you're doing so with the kernel client.  In a previous message on the ML,
John Spray explained this wonderfully:

  "This is not a Ceph-specific thing -- it can also affect similar systems
like Lustre.  The classic case is when, under some memory pressure, the
kernel tries to free memory by flushing the client's page cache, but doing
the flush means allocating more memory on the server, making the memory
pressure worse, until the whole thing just seizes up."

If you're using ceph-fuse to mount cephfs, then you only have resource
contention as a problem, but nothing as severe as deadlocking.  Settings
like Jake mentioned can help you work around resource contention if that is
an issue for you.  Don't change the settings unless you notice a problem,
though.  Ceph is pretty good at having sane defaults.

On Wed, Aug 29, 2018 at 6:35 AM Jake Grimmett  wrote:

> Hi Marc,
>
> We mount cephfs using FUSE on all 10 nodes of our cluster, and provided
> that we limit bluestore memory use, find it to be reliable*.
>
> bluestore_cache_size = 209715200
> bluestore_cache_kv_max = 134217728
>
> Without the above tuning, we get OOM errors.
>
> As others will confirm, the FUSE client is more stable than the kernel
> client, but slower.
>
> ta ta
>
> Jake
>
> * We have 128GB of ram per 45 x 8TB Drive OSD node, way below
> recommendations (1GB RAM per TB storage); our OOM issues are completely
> predictable...
>
> On 29/08/18 13:25, Marc Roos wrote:
> >
> >
> > I have a 3 node test cluster and I would like to expand this with a 4th
> > node that currently mounts the cephfs and rsyncs backups to it. I can
> > remember reading something about how you could create a deadlock
> > situation by doing this.
> >
> > What are the risks I would be taking if I did this?
> >
> >
> >
> >
> >
> >
> >
>


Re: [ceph-users] Error EINVAL: (22) Invalid argument While using ceph osd safe-to-destroy

2018-08-29 Thread Alfredo Deza
I am addressing the doc bug at https://github.com/ceph/ceph/pull/23801
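For reference, a loop shape that does work in bash (a hedged sketch; the `ceph` function below is a stub standing in for the real CLI so the loop can be exercised without a cluster, and the exact fix for the documented loop is in the PR above):

```shell
#!/bin/sh
# Wait until an OSD is safe to destroy by polling the command's exit
# status. The stub simulates the command becoming "safe" on the third
# poll; against a real cluster, delete the stub and raise the sleep.
tries=0
ceph() {            # stand-in for the real ceph CLI
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]   # succeeds (exit 0) on the third poll
}

ID=0
while ! ceph osd safe-to-destroy "osd.$ID"; do
    sleep 0   # use e.g. "sleep 10" against a real cluster
done
echo "osd.$ID safe to destroy after $tries checks"
```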

On Mon, Aug 27, 2018 at 2:08 AM, Eugen Block  wrote:
> Hi,
>
> could you please paste your osd tree and the exact command you try to
> execute?
>
>> Extra note: the while loop in the instructions looks like it's bad.  I had
>> to change it to make it work in bash.
>
>
> The documented command didn't work for me either.
>
> Regards,
> Eugen
>
> Zitat von Robert Stanford :
>
>
>> I am following the procedure here:
>> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>>
>>  When I get to the part where I run "ceph osd safe-to-destroy $ID" in a
>> while loop, I get an EINVAL error.  I get this error when I run "ceph osd
>> safe-to-destroy 0" on the command line by itself, too.  (Extra note: the
>> while loop in the instructions looks like it's bad.  I had to change it to
>> make it work in bash.)
>>
>>  I know my ID is correct because I was able to use it in the previous step
>> (ceph osd out $ID).  I also substituted the actual number for $ID on the
>> command line and got the same error.  Why isn't this working?
>>
>> Error: Error EINVAL: (22) Invalid argument While using ceph osd
>> safe-to-destroy
>>
>>  Thank you
>> R
>
>
>
>


[ceph-users] Hammer and a (little) disk/partition shrink...

2018-08-29 Thread Marco Gaiarin


Probably a complex question, with a simple answer: NO. ;-)


I need to move disks from a ceph node (still on hammer) from one piece of
hardware to another. The source hardware has a simple SATA/SAS
controller; the 'new' server has a RAID controller with no JBOD mode
(the infamous HP P410i), so I need to create some 'RAID 0 with a single
disk' fake raids.

These controllers seem to ''eat'' some space at the end of the disk:
(in some tests) the disk does not get corrupted by the
'raid0-ification', but loses some bytes at the end, and Linux then
complains that the (last) partition is corrupted.

Hammer uses filestore, so practically I need to shrink an XFS
filesystem, and shrinking is not supported by XFS.
Clearly I can do an 'xfsdump' of the disks to some scratch space and
rebuild the filesystem, but...


Is there some escape path?


Thanks.
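A hedged sketch of that dump/restore route (it only prints the commands; $ID, /dev/sdX1 and /mnt/scratch are placeholders, not values from this thread):

```shell
#!/bin/sh
# xfsdump/xfsrestore escape path, since XFS cannot shrink in place.
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN=1
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

ID=12
run service ceph stop osd.$ID            # hammer-era init integration
run xfsdump -l 0 -f /mnt/scratch/osd-$ID.dump /var/lib/ceph/osd/ceph-$ID
run umount /var/lib/ceph/osd/ceph-$ID
# ...move the disk, build the single-disk RAID 0, then recreate a
# slightly smaller partition so the controller's reserved bytes fit...
run mkfs.xfs -f /dev/sdX1
run mount /dev/sdX1 /var/lib/ceph/osd/ceph-$ID
run xfsrestore -f /mnt/scratch/osd-$ID.dump /var/lib/ceph/osd/ceph-$ID
run service ceph start osd.$ID
```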

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)


[ceph-users] Looking for information on full SSD deployments

2018-08-29 Thread Valmar Kuristik

Hello fellow Ceph users,

We have been using a small cluster (6 data nodes with 12 disks each, 3
monitors) with OSDs on spinners and journals on SATA SSDs for a while now.
We still haven't upgraded to Luminous, and are going to test it now, as
we also need to switch some projects to a shared file system, and CephFS
seems to fit the bill.


What I'm mostly looking for is to get in contact with someone with
experience in running Ceph as a full SSD cluster, or with full SSD pool(s)
on the main cluster. The main interest is in performance-centric workloads
generated by web applications that work directly with files, heavy in
both read and write capacity, with low latency being very important.


As mentioned above, the other question is about the viability of CephFS in
a production environment right now, for web applications with several
nodes using a shared file system for certain read and write operations.


I will not go into more detail here, if you have some experience and 
would be willing to share it, please write to val...@eenet.ee



Also, thanks to everyone on this list for the insights that other people's
random problems have given us. We have probably managed to prevent some
problems in the current cluster just by skimming through these e-mails.



Re: [ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-08-29 Thread Alfredo Deza
On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
 wrote:
> Hi,
>
> after upgrading my ceph clusters from 12.2.5 to 12.2.7  I'm experiencing 
> random crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not 
> affected.
> I destroyed and recreated some of the SSD OSDs which seemed to help.
>
> this happens on centos 7.5 (different kernels tested)
>
> /var/log/messages:
> Aug 29 10:24:08  ceph-osd: *** Caught signal (Segmentation fault) **
> Aug 29 10:24:08  ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final
> Aug 29 10:24:08  kernel: traps: bstore_kv_final[187470] general protection 
> ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in 
> libtcmalloc.so.4.4.5[7f8a997a8000+46000]
> Aug 29 10:24:08  systemd: ceph-osd@2.service: main process exited, 
> code=killed, status=11/SEGV
> Aug 29 10:24:08  systemd: Unit ceph-osd@2.service entered failed state.
> Aug 29 10:24:08  systemd: ceph-osd@2.service failed.
> Aug 29 10:24:28  systemd: ceph-osd@2.service holdoff time over, scheduling 
> restart.
> Aug 29 10:24:28  systemd: Starting Ceph object storage daemon osd.2...
> Aug 29 10:24:28  systemd: Started Ceph object storage daemon osd.2.
> Aug 29 10:24:28  ceph-osd: starting osd.2 at - osd_data 
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> Aug 29 10:24:35  ceph-osd: *** Caught signal (Segmentation fault) **
> Aug 29 10:24:35  ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
> Aug 29 10:24:35  kernel: traps: tp_osd_tp[186933] general protection 
> ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in 
> libtcmalloc.so.4.4.5[7f5f430cd000+46000]
> Aug 29 10:24:35  systemd: ceph-osd@0.service: main process exited, 
> code=killed, status=11/SEGV
> Aug 29 10:24:35  systemd: Unit ceph-osd@0.service entered failed state.
> Aug 29 10:24:35  systemd: ceph-osd@0.service failed

These systemd messages aren't usually helpful, try poking around
/var/log/ceph/ for the output on that one OSD.

If those logs aren't useful either, try bumping up the verbosity (see
http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
)
>
> did I hit a known issue?
> any suggestions are highly appreciated
>
>
> br
> wolfgang
>
>
>
>


[ceph-users] cephfs mount on osd node

2018-08-29 Thread Jake Grimmett
Hi Marc,

We mount cephfs using FUSE on all 10 nodes of our cluster, and provided
that we limit bluestore memory use, find it to be reliable*.

bluestore_cache_size = 209715200
bluestore_cache_kv_max = 134217728

Without the above tuning, we get OOM errors.
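For anyone wanting to replicate this, a hedged ceph.conf sketch of where those two values would live (the [osd] section placement is an assumption; these are pre-autotuning bluestore knobs, so check whether your release still honours them):

```ini
[osd]
bluestore_cache_size = 209715200        # 200 MiB total cache per OSD
bluestore_cache_kv_max = 134217728      # cap the rocksdb share at 128 MiB
```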

As others will confirm, the FUSE client is more stable than the kernel
client, but slower.

ta ta

Jake

* We have 128GB of ram per 45 x 8TB Drive OSD node, way below
recommendations (1GB RAM per TB storage); our OOM issues are completely
predictable...
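The arithmetic behind that footnote, as a quick sketch:

```shell
#!/bin/sh
# The common "1 GB RAM per TB of storage" guidance vs. this node
# layout (45 x 8 TB drives, 128 GB RAM).
drives=45; tb_per_drive=8; ram_gb=128
recommended_gb=$(( drives * tb_per_drive ))       # 1 GB per TB
per_osd_gb=$(( ram_gb / drives ))
echo "recommended: ${recommended_gb} GB, installed: ${ram_gb} GB"
echo "per-OSD budget: ~${per_osd_gb} GB"
```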

On 29/08/18 13:25, Marc Roos wrote:
> 
> 
> I have a 3 node test cluster and I would like to expand this with a 4th
> node that currently mounts the cephfs and rsyncs backups to it. I can
> remember reading something about how you could create a deadlock
> situation by doing this.
>
> What are the risks I would be taking if I did this?
> 
> 
> 
> 
> 
> 
> 


Re: [ceph-users] New Ceph community manager: Mike Perez

2018-08-29 Thread Sage Weil
Correction:

Mike's new email is actually mipe...@redhat.com (sorry, mperez!).

sage


[ceph-users] cephfs mount on osd node

2018-08-29 Thread Marc Roos



I have a 3 node test cluster and I would like to expand this with a 4th
node that currently mounts the cephfs and rsyncs backups to it. I can
remember reading something about how you could create a deadlock
situation by doing this.

What are the risks I would be taking if I did this?








Re: [ceph-users] prevent unnecessary MON leader re-election

2018-08-29 Thread Joao Eduardo Luis
On 08/29/2018 11:02 AM, William Lawton wrote:
> 
> We have a 5 node Ceph cluster, status output copied below. During our
> cluster resiliency tests we have noted that a MON leader election takes
> place when we fail one member of the MON quorum, even though the failed
> instance is not the current MON leader. We speculate that this
> re-election process may be contributing to short periods of cluster
> unavailability when one or more cluster instances fail. Is there a way
> to configure the cluster so that there is only a MON leader election if
> the existing MON leader fails but not when some other member of the MON
> quorum fails?

Not at the moment, and this hasn't been in our plans.

My reasoning, at least, has been that if a monitor failed, an election
is the best way we have to ensure the remaining monitors are alive and
communicative. And the election itself should be a quick process anyway,
so this never became a particularly pressing feature.

I'd suggest opening a feature request in the tracker, asking for this.
And, if possible, attach logs to the ticket showing that the election is
taking too long, or evidence that you're getting I/O stalls during this
period. (for the mon logs, I'd suggest 'debug mon = 10', 'debug paxos =
10', and 'debug ms = 1')
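A hedged ceph.conf sketch of the verbosity Joao suggests (the [mon] section placement is the usual spot, but confirm against your config layout):

```ini
[mon]
debug mon = 10
debug paxos = 10
debug ms = 1
```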

  -Joao



Re: [ceph-users] Installing ceph 12.2.4 via Ubuntu apt

2018-08-29 Thread Paul Emmerich
The root cause is a restriction in reprepro used to manage the repository:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=570623

Paul
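A hedged sketch of the local-directory workaround David describes below (commands are printed, not executed; the folder name and version string are placeholders, and dpkg-scanpackages comes from the dpkg-dev package):

```shell
#!/bin/sh
# Mirror the pinned .deb files locally and point apt at the folder.
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN=1
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run mkdir -p /opt/ceph-12.2.4
# download the 12.2.4-1xenial .deb files from
#   https://download.ceph.com/debian-luminous/pool/main/c/ceph/
# into /opt/ceph-12.2.4, then build a local package index:
run sh -c 'cd /opt/ceph-12.2.4 && dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz'
run sh -c 'echo "deb [trusted=yes] file:/opt/ceph-12.2.4 ./" > /etc/apt/sources.list.d/ceph-local.list'
run apt-get update
run apt-get install ceph=12.2.4-1xenial
```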

2018-08-29 8:50 GMT+02:00 Thomas Bennett :
> Hi David,
>
> Thanks for your reply. That's how I'm currently handling it.
>
> Kind regards,
> Tom
>
> On Tue, Aug 28, 2018 at 4:36 PM David Turner  wrote:
>>
>> That is the expected behavior of the ceph repo. In the past when I needed
>> a specific version I would download the packages for the version to a folder
>> and you can create a repo file that reads from a local directory. That's how
>> I would re-install my test lab after testing an upgrade procedure to try it
>> over again.
>>
>> On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett  wrote:
>>>
>>> Hi,
>>>
>>> I'm wanting to pin to an older version of Ceph Luminous (12.2.4) and I've
>>> noticed that https://download.ceph.com/debian-luminous/ does not support
>>> this via apt install:
>>> apt install ceph works for 12.2.7 but
>>> apt install ceph=12.2.4-1xenial does not work
>>>
>>> The deb files are there, they're just not included in the package
>>> index. Is this the desired behaviour or a misconfiguration?
>>>
>>> Cheers,
>>> Tom
>>>
>>> --
>>> Thomas Bennett
>>>
>>> SARAO
>>> Science Data Processing
>
>
>
> --
> Thomas Bennett
>
> SARAO
> Science Data Processing
>
>



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


[ceph-users] prevent unnecessary MON leader re-election

2018-08-29 Thread William Lawton
Hi.

We have a 5 node Ceph cluster, status output copied below. During our cluster 
resiliency tests we have noted that a MON leader election takes place when we 
fail one member of the MON quorum, even though the failed instance is not the 
current MON leader. We speculate that this re-election process may be 
contributing to short periods of cluster unavailability when one or more 
cluster instances fail. Is there a way to configure the cluster so that there 
is only a MON leader election if the existing MON leader fails but not when 
some other member of the MON quorum fails?

cluster:
id: f774b9b2-d514-40d9-85ab-d0389724b6c0
health: HEALTH_OK

  services:
mon: 3 daemons, quorum dub-sitv-ceph-03,dub-sitv-ceph-04,dub-sitv-ceph-05
mgr: dub-sitv-ceph-04(active), standbys: dub-sitv-ceph-03, dub-sitv-ceph-05
mds: cephfs-1/1/1 up  {0=dub-sitv-ceph-02=up:active}, 1 up:standby-replay
osd: 4 osds: 4 up, 4 in

  data:
pools:   2 pools, 200 pgs
objects: 554  objects, 980 MiB
usage:   7.9 GiB used, 1.9 TiB / 2.0 TiB avail
pgs: 200 active+clean

  io:
client:   1.5 MiB/s rd, 810 KiB/s wr, 286 op/s rd, 218 op/s wr

William Lawton




Re: [ceph-users] New Ceph community manager: Mike Perez

2018-08-29 Thread Lars Marowsky-Bree
On 2018-08-29T01:13:24, Sage Weil  wrote:

Most excellent! Welcome, Mike!

I look forward to working with you.


Regards,
Lars

-- 
Architect SDS, Distinguished Engineer
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli 
Zbinden)


Re: [ceph-users] New Ceph community manager: Mike Perez

2018-08-29 Thread Joao Eduardo Luis
On 08/29/2018 02:13 AM, Sage Weil wrote:
> Hi everyone,
> 
> Please help me welcome Mike Perez, the new Ceph community manager!

Very happy to have you with us!

Let us know if there's anything we can help you with, and don't hesitate
to get in touch :)

  -Joao


Re: [ceph-users] New Ceph community manager: Mike Perez

2018-08-29 Thread Lenz Grimmer

Great news. Welcome Mike! I look forward to working with you, let me
know if there is anything I can help you with.

Lenz

On 08/29/2018 03:13 AM, Sage Weil wrote:

> Please help me welcome Mike Perez, the new Ceph community manager!
> 
> Mike has a long history with Ceph: he started at DreamHost working on 
> OpenStack and Ceph back in the early days, including work on the original 
> RBD integration.  He went on to work in several roles in the OpenStack 
> project, doing a mix of infrastructure, cross-project and community 
> related initiatives, including serving as the Project Technical Lead for 
> Cinder.
> 
> Mike lives in Pasadena, CA, and can be reached at mpe...@redhat.com, on 
> IRC as thingee, or twitter as @thingee.
> 
> I am very excited to welcome Mike back to Ceph, and look forward to 
> working together on building the Ceph developer and user communities!

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)





[ceph-users] SSD OSDs crashing after upgrade to 12.2.7

2018-08-29 Thread Wolfgang Lendl
Hi,

after upgrading my ceph clusters from 12.2.5 to 12.2.7  I'm experiencing random 
crashes from SSD OSDs (bluestore) - it seems that HDD OSDs are not affected.
I destroyed and recreated some of the SSD OSDs which seemed to help. 

this happens on centos 7.5 (different kernels tested)

/var/log/messages: 
Aug 29 10:24:08  ceph-osd: *** Caught signal (Segmentation fault) **
Aug 29 10:24:08  ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final
Aug 29 10:24:08  kernel: traps: bstore_kv_final[187470] general protection 
ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in 
libtcmalloc.so.4.4.5[7f8a997a8000+46000]
Aug 29 10:24:08  systemd: ceph-osd@2.service: main process exited, code=killed, 
status=11/SEGV
Aug 29 10:24:08  systemd: Unit ceph-osd@2.service entered failed state.
Aug 29 10:24:08  systemd: ceph-osd@2.service failed.
Aug 29 10:24:28  systemd: ceph-osd@2.service holdoff time over, scheduling 
restart.
Aug 29 10:24:28  systemd: Starting Ceph object storage daemon osd.2...
Aug 29 10:24:28  systemd: Started Ceph object storage daemon osd.2.
Aug 29 10:24:28  ceph-osd: starting osd.2 at - osd_data 
/var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
Aug 29 10:24:35  ceph-osd: *** Caught signal (Segmentation fault) **
Aug 29 10:24:35  ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
Aug 29 10:24:35  kernel: traps: tp_osd_tp[186933] general protection 
ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in 
libtcmalloc.so.4.4.5[7f5f430cd000+46000]
Aug 29 10:24:35  systemd: ceph-osd@0.service: main process exited, code=killed, 
status=11/SEGV
Aug 29 10:24:35  systemd: Unit ceph-osd@0.service entered failed state.
Aug 29 10:24:35  systemd: ceph-osd@0.service failed.

did I hit a known issue?
any suggestions are highly appreciated


br
wolfgang






[ceph-users] Ceph cluster "hung" after node failure

2018-08-29 Thread Brett Chancellor
Hi all. I have a ceph cluster that's partially upgraded to Luminous. Last
night a host died and since then the cluster has been failing to recover. It
finished backfilling but was left with thousands of PGs degraded,
inactive, or stale.  To move past the issue, I put the cluster in
noout,noscrub,nodeep-scrub and restarted all services one by one.

Here is the current state of the cluster, any idea how to get past the
stale and stuck pgs? Any help would be very appreciated. Thanks.

-Brett


## ceph -s output
###
$ sudo ceph -s
  cluster:
id:
health: HEALTH_ERR
165 pgs are stuck inactive for more than 60 seconds
243 pgs backfill_wait
144 pgs backfilling
332 pgs degraded
5 pgs peering
1 pgs recovery_wait
22 pgs stale
332 pgs stuck degraded
143 pgs stuck inactive
22 pgs stuck stale
531 pgs stuck unclean
330 pgs stuck undersized
330 pgs undersized
671 requests are blocked > 32 sec
603 requests are blocked > 4096 sec
recovery 3524906/412016682 objects degraded (0.856%)
recovery 2462252/412016682 objects misplaced (0.598%)
noout,noscrub,nodeep-scrub flag(s) set
mon.ceph0rdi-mon1-1-prd store is getting too big! 17612 MB >=
15360 MB
mon.ceph0rdi-mon2-1-prd store is getting too big! 17669 MB >=
15360 MB
mon.ceph0rdi-mon3-1-prd store is getting too big! 17586 MB >=
15360 MB

  services:
mon: 3 daemons, quorum
ceph0rdi-mon1-1-prd,ceph0rdi-mon2-1-prd,ceph0rdi-mon3-1-prd
mgr: ceph0rdi-mon3-1-prd(active), standbys: ceph0rdi-mon2-1-prd,
ceph0rdi-mon1-1-prd
osd: 222 osds: 218 up, 218 in; 428 remapped pgs
 flags noout,noscrub,nodeep-scrub

  data:
pools:   35 pools, 38144 pgs
objects: 130M objects, 172 TB
usage:   538 TB used, 337 TB / 875 TB avail
pgs: 0.375% pgs not active
 3524906/412016682 objects degraded (0.856%)
 2462252/412016682 objects misplaced (0.598%)
 37599 active+clean
 173   active+undersized+degraded+remapped+backfill_wait
 133   active+undersized+degraded+remapped+backfilling
 93activating
 68active+remapped+backfill_wait
 22activating+undersized+degraded+remapped
 13stale+active+clean
 11active+remapped+backfilling
 9 activating+remapped
 5 remapped
 5 stale+activating+remapped
 3 remapped+peering
 2 stale+remapped
 2 stale+remapped+peering
 1 activating+degraded+remapped
 1 active+clean+remapped
 1 active+degraded+remapped+backfill_wait
 1 active+undersized+remapped+backfill_wait
 1 activating+degraded
 1 active+recovery_wait+undersized+degraded+remapped

  io:
client:   187 kB/s rd, 2595 kB/s wr, 99 op/s rd, 343 op/s wr
recovery: 1509 MB/s, 1541 objects/s
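A hedged list of first diagnostics for the stale/stuck PGs (PG, OSD and mon ids are examples taken from this output; the commands are printed rather than executed, since they act on a live cluster):

```shell
#!/bin/sh
# "stale" means no OSD has reported the PG since the failure, so the
# first questions are who should serve each PG and whether those OSDs
# are actually up. Printed only; run them by hand against the cluster.
cmds='ceph pg 17.6d7 query                      # who should serve this PG?
ceph pg dump_stuck inactive                     # track the activating set
ceph osd find 224                               # is this common primary up?
ceph tell mon.ceph0rdi-mon1-1-prd compact       # trim the oversized mon store'
printf '%s\n' "$cmds"
```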

## ceph pg dump_stuck stale (this number doesn't seem to decrease)

$ sudo ceph pg dump_stuck stale
ok
PG_STAT STATE UPUP_PRIMARY ACTING
ACTING_PRIMARY
17.6d7 stale+remapped[5,223,96]  5  [223,96,148]
223
2.5c5  stale+active+clean  [224,48,179]224  [224,48,179]
224
17.64e stale+active+clean  [224,84,109]224  [224,84,109]
224
19.5b4  stale+activating+remapped  [124,130,20]124   [124,20,11]
124
17.4c6 stale+active+clean  [224,216,95]224  [224,216,95]
224
73.413  stale+activating+remapped [117,130,189]117 [117,189,137]
117
2.431  stale+remapped+peering   [5,180,142]  5  [180,142,40]
180
69.1dc stale+active+clean[62,36,54] 62[62,36,54]
 62
14.790 stale+active+clean   [81,114,19] 81   [81,114,19]
 81
2.78e  stale+active+clean [224,143,124]224 [224,143,124]
224
73.37a stale+active+clean   [224,84,38]224   [224,84,38]
224
17.42d  stale+activating+remapped  [220,130,25]220  [220,25,137]
220
72.263 stale+active+clean [224,148,117]224 [224,148,117]
224
67.40  stale+active+clean   [62,170,71] 62   [62,170,71]
 62
67.16d stale+remapped+peering[3,147,22]  3   [147,22,29]
147
20.3de stale+active+clean [224,103,126]224 [224,103,126]
224
19.721 stale+remapped[3,34,179]  3  [34,179,128]
 34
19.2f1  stale+activating+remapped [126,130,178]126  [126,178,72]
126
74.28b stale+active+clean   [224,95,56]224 
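
One thing worth pulling out of a listing like this is which OSD is primary for the stale PGs: in the rows above, osd.224 turns up as UP_PRIMARY far more often than any other OSD, which points at that daemon rather than at the PGs themselves. A small sketch of the grouping, using a few rows hand-copied from the dump (the column position of UP_PRIMARY is an assumption based on the header above; check it against your own output):

```shell
# Sample rows hand-copied from the dump above; in practice feed the
# live output of `ceph pg dump_stuck stale` into the same pipeline.
dump='17.6d7 stale+remapped [5,223,96] 5
2.5c5 stale+active+clean [224,48,179] 224
17.64e stale+active+clean [224,84,109] 224
17.4c6 stale+active+clean [224,216,95] 224
69.1dc stale+active+clean [62,36,54] 62'

# Column 4 is UP_PRIMARY in this layout (an assumption -- verify against
# the header of your own dump); count stale PGs per primary OSD.
echo "$dump" | awk '{print $4}' | sort | uniq -c | sort -rn
```

If a single OSD floats to the top, restarting or investigating that one daemon first is a reasonable next step.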

Re: [ceph-users] SAN or DAS for Production ceph

2018-08-29 Thread James Watson
Thanks, Tom and John. Input from both of you was really helpful and helped to
put things into perspective.
Much appreciated.

@John, I am based out of Dubai.


On Wed, Aug 29, 2018 at 2:06 AM John Hearns  wrote:

> James, you also use the words "enterprise" and "production-ready".
> Is Red Hat support important to you?
>
> On Tue, 28 Aug 2018 at 23:56, John Hearns  wrote:
>
>> James, well for a start don't use a SAN. I speak as someone who managed a
>> SAN with Brocade switches and multipathing for an F1 team. Ceph is Software
>> Defined Storage. You want discrete storage servers with a high-bandwidth
>> Ethernet (or maybe InfiniBand) fabric.
>>
>> Fibre Channel still has its place here, though, if you want servers with
>> FC-attached JBODs.
>>
>> Also, you ask about the choice between spinning disks, SSDs and NVMe
>> drives. Think about the COST for your petabyte archive.
>> True, these days you can argue that all-SSD is cost-comparable to
>> spinning disks. But NVMe? Yes, you get the best performance... but do you
>> really want all that video data on $$$ NVMe? You need tiering.
>>
>> Also don't forget low and slow archive tiers - shingled archive disks and
>> perhaps tape.
>>
>> Me, I would start from the building blocks of Supermicro 36-bay storage
>> servers. Fill them with 12 TB helium drives.
>> Two slots in the back for SSDs for your journaling.
>> For a higher-performance tier, look at the 'double double' storage
>> servers from Supermicro, or, even nicer, the new 'ruler' form-factor
>> servers. For a higher-density archiving tier, the 90-bay Supermicro servers.
>>
>> Please get in touch with someone for advice. If you are in the UK I am
>> happy to help and point you in the right direction.
>>
>> On Tue, 28 Aug 2018 at 21:05, James Watson 
>> wrote:
>>
>>> Dear cephers,
>>>
>>> I am new to the storage domain.
>>> Trying to get my head around an enterprise, production-ready setup.
>>>
>>> The following article helps a lot here: (Yahoo ceph implementation)
>>> https://yahooeng.tumblr.com/tagged/object-storage
>>>
>>> But a couple of questions:
>>>
>>> What HDDs would they have used here? NVMe / SATA / SAS, etc. (With just
>>> 52 storage nodes they got 3.2 PB of capacity!!)
>>> I tried to calculate a similar setup with the HGST Ultrastar He12 (12 TB,
>>> and it's more recent) and would need 86 HDDs, which adds up to 1 PB only!!
>>>
>>> How are the HDDs attached: is it DAS or a SAN (using Fibre Channel
>>> switches, host bus adapters, etc.)?
>>>
>>> Do we need a proprietary hashing algorithm to implement a multi-cluster
>>> setup of Ceph, to contain CPU/memory usage within each cluster when
>>> rebuilding happens after a device failure?
>>>
>>> If a proprietary hashing algorithm is required to set up multi-cluster
>>> Ceph using a load balancer, then what alternative setup could we deploy
>>> to address the same issue?
>>>
>>> The aim is to design a similar architecture, but with upgraded products
>>> and higher performance. Any suggestions or thoughts are welcome.
>>>
>>>
>>>
>>> Thanks in advance
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
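
As an aside on the capacity arithmetic in the quoted question: 86 x 12 TB is roughly 1 PB raw, and replication shrinks the usable figure further. A quick check (the 3x factor below is an assumption, Ceph's default replica count for replicated pools, not something stated in the thread):

```shell
# 86 drives of 12 TB each, as in the question.
raw=$((12 * 86))       # 1032 TB, i.e. roughly 1 PB raw
usable=$((raw / 3))    # assumed 3x replication -> 344 TB usable
echo "${raw} TB raw, ~${usable} TB usable at 3x replication"
```

So the gap to Yahoo's quoted 3.2 PB is even wider once replication (or erasure-coding overhead) is taken into account.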


Re: [ceph-users] Installing ceph 12.2.4 via Ubuntu apt

2018-08-29 Thread Thomas Bennett
Hi David,

Thanks for your reply. That's how I'm currently handling it.

Kind regards,
Tom

On Tue, Aug 28, 2018 at 4:36 PM David Turner  wrote:

> That is the expected behavior of the ceph repo. In the past, when I needed
> a specific version, I would download the packages for that version to a
> folder and create a repo file that reads from the local directory.
> That's how I would re-install my test lab after testing an upgrade
> procedure, to try it over again.
>
> On Tue, Aug 28, 2018, 1:01 AM Thomas Bennett  wrote:
>
>> Hi,
>>
>> I want to pin to an older version of Ceph Luminous (12.2.4), and I've
>> noticed that https://download.ceph.com/debian-luminous/ does not support
>> this via apt install:
>> apt install ceph works for 12.2.7, but
>> apt install ceph=12.2.4-1xenial does not work.
>>
>> The deb files are there; they're just not included in the repository's
>> package index. Is this the desired behaviour or a misconfiguration?
>>
>> Cheers,
>> Tom
>>
>> --
>> Thomas Bennett
>>
>> SARAO
>> Science Data Processing
>>
>

-- 
Thomas Bennett

SARAO
Science Data Processing
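
For reference, the download-and-local-repo approach described above can be sketched roughly as follows. The paths, the `dpkg-scanpackages` step (from the dpkg-dev package), and the `[trusted=yes]` option are illustrative assumptions, not commands taken from the thread:

```shell
# Put the 12.2.4 .debs in a local directory (illustrative path; the
# exact package list depends on what you install).
mkdir -p /opt/ceph-12.2.4 && cd /opt/ceph-12.2.4
# ... download the desired 12.2.4 .deb files here ...

# Build a Packages index so apt can treat the directory as a flat repo.
dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz

# Point apt at it, e.g. in /etc/apt/sources.list.d/ceph-local.list:
#   deb [trusted=yes] file:/opt/ceph-12.2.4 ./
apt update
apt install ceph=12.2.4-1xenial
```

With the version present in only one repo, the `ceph=12.2.4-1xenial` pin resolves against the local index instead of download.ceph.com.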