[ceph-users] Error ENOENT: problem getting command descriptions from mon.5

2019-08-01 Thread Christoph Adomeit
Hi there,

I have updated my ceph cluster from luminous to 14.2.1, and whenever I run a
"ceph tell mon.* version"
I get the correct version from all monitors except mon.5.

For mon.5 I get the error:
Error ENOENT: problem getting command descriptions from mon.5
mon.5: problem getting command descriptions from mon.5

The ceph status command reports the cluster as healthy, and all monitors,
including mon.5, seem to work perfectly.


I have verified that the running mon binaries are the same and that the permissions in
/var/lib/ceph are also the same as on the other monitor hosts.

I am also not sure whether the error appeared after the update to nautilus; I think it
was there before.

Any idea what the "problem getting command descriptions" error might mean, where I
can look and what to fix? The monmap and ceph.conf also seem to be okay, and the
monitor logfile does not show anything unusual.
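
In case it is useful, here is what I plan to try next to narrow it down (a
rough sketch, not verified; run on the host where mon.5 lives):

# ask mon.5 directly over its local admin socket, bypassing "ceph tell"
ceph daemon mon.5 version
ceph daemon mon.5 mon_status

# temporarily raise monitor debug logging and repeat the failing command
ceph daemon mon.5 config set debug_mon 10
ceph tell mon.5 version
ceph daemon mon.5 config set debug_mon 1/5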

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nautilus: xxx pgs not deep-scrubbed in time

2019-07-30 Thread Christoph Adomeit
Hi there,

yesterday I upgraded my ceph cluster from luminous to nautilus, and since
then I get the message "xxx pgs not deep-scrubbed in time".

My deep scrubs were okay before; I have a deep scrub
interval of 6 weeks:

osd_deep_scrub_interval = 3628800

and I had no warning.

Since yesterday it seems that every pg whose last deep scrub is older than 7 days
is marked as not deep-scrubbed in time.

Can somebody please tell me what the math behind the warning is?
I tried a lot of parameters today to get rid of the warning.

The main thing is: I want my cluster to go from HEALTH_WARN back to HEALTH_OK,
because otherwise some of my scripts are not started, which is bad.

Any ideas what I can do ?
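
For reference, this is what I plan to try next (just a sketch, not verified;
I am assuming the mons also need to see the interval, and that
mon_warn_pg_not_deep_scrubbed_ratio is the knob behind the new warning):

# make the interval visible cluster-wide (mons included), not only to the OSDs
ceph config set global osd_deep_scrub_interval 3628800
# the warning apparently fires at roughly osd_deep_scrub_interval * (1 + ratio)
ceph config set global mon_warn_pg_not_deep_scrubbed_ratio 0.75
# check what the mons actually see
ceph config get mon.0 osd_deep_scrub_interval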

Thanks
  Christoph






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Possible data damage: 1 pg inconsistent

2018-12-21 Thread Christoph Adomeit
ls, 4224 pgs
>     objects: 23.35M objects, 20.9TiB
>     usage:   64.9TiB used, 136TiB / 201TiB avail
>     pgs: 4224 active+clean
> 
>   io:
>     client:   195KiB/s rd, 7.19MiB/s wr, 17op/s rd, 127op/s wr
> 
> 
> 
> Le 19/12/2018 à 04:48, Frank Ritchie a écrit :
> >Hi all,
> >
> >I have been receiving alerts for:
> >
> >Possible data damage: 1 pg inconsistent
> >
> >almost daily for a few weeks now. When I check:
> >
> >rados list-inconsistent-obj $PG --format=json-pretty
> >
> >I will always see a read_error. When I run a deep scrub on the PG I will
> >see:
> >
> >head candidate had a read error
> >
> >When I check dmesg on the osd node I see:
> >
> >blk_update_request: critical medium error, dev sdX, sector 123
> >
> >I will also see a few uncorrected read errors in smartctl.
> >
> >Info:
> >Ceph: ceph version 12.2.4-30.el7cp
> >OSD: Toshiba 1.8TB SAS 10K
> >120 OSDs total
> >
> >Has anyone else seen these alerts occur almost daily? Can the errors
> >possibly be due to deep scrubbing too aggressively?
> >
> >I realize these errors indicate potential failing drives but I can't
> >replace a drive daily.
> >
> >thx
> >Frank
> 
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Kein Backup - kein Mitleid
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Christoph Adomeit
So my question regarding the latest ceph releases still is:

Where do all these scrub errors come from, and do we have to worry about them?
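
For what it is worth, this is how I have been looking at them so far (rough
sketch; $PG is one of the inconsistent placement groups from ceph health detail):

ceph health detail | grep inconsistent
rados list-inconsistent-obj $PG --format=json-pretty
# the "errors" field should show whether it is a read_error, a digest
# mismatch or something else, before deciding on "ceph pg repair $PG"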


On Thu, Nov 08, 2018 at 12:16:05AM +0800, Ashley Merrick wrote:
> I am seeing this on the latest mimic on my test cluster aswel.
> 
> Every automatic deep-scrub comes back as inconsistent, but doing another
> manual scrub comes back as fine and clear each time.
> 
> Not sure if related or not..
> 
> On Wed, 7 Nov 2018 at 11:57 PM, Christoph Adomeit <
> christoph.adom...@gatworks.de> wrote:
> 
> > Hello together,
> >
> > we have upgraded to 12.2.9 because it was in the official repos.
> >
> > Right after the update and some scrubs we have issues.
> >
> > This morning after regular scrubs we had around 10% of all pgs inconstent:
> >
> > pgs: 4036 active+clean
> >   380  active+clean+inconsistent
> >
> > After repairung these 380 pgs we again have:
> >
> > 1/93611534 objects unfound (0.000%)
> > 28   active+clean+inconsistent
> > 1active+recovery_wait+degraded
> >
> > Now we stopped repairing because it does not seem to solve the problem and
> > more and more error messages are occuring. So far we did not see corruption
> > but we do not feel well with the cluster.
> >
> > What do you suggest, wait for 12.2.10 ? Roll Back to 12.2.8 ?
> >
> > Is ist dangerous for our Data to leave the cluster running ?
> >
> > I am sure we do not have hardware errors and that these errors came with
> > the update to 12.2.9.
> >
> > Thanks
> >   Christoph
> >
> >
> >
> > On Wed, Nov 07, 2018 at 07:39:59AM -0800, Gregory Farnum wrote:
> > > On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside 
> > > wrote:
> > >
> > > >
> > > >
> > > > On 07/11/2018 10:59, Konstantin Shalygin wrote:
> > > > >> I wonder if there is any release announcement for ceph 12.2.9 that I
> > > > missed.
> > > > >> I just found the new packages on download.ceph.com, is this an
> > official
> > > > >> release?
> > > > >
> > > > > This is because 12.2.9 have a several bugs. You should avoid to use
> > this
> > > > > release and wait for 12.2.10
> > > >
> > > > Argh! What's it doing in the repos then?? I've just upgraded to it!
> > > > What are the bugs? Is there a thread about them?
> > >
> > >
> > > If you’ve already upgraded and have no issues then you won’t have any
> > > trouble going forward — except perhaps on the next upgrade, if you do it
> > > while the cluster is unhealthy.
> > >
> > > I agree that it’s annoying when these issues make it out. We’ve had
> > ongoing
> > > discussions to try and improve the release process so it’s less drawn-out
> > > and to prevent these upgrade issues from making it through testing, but
> > > nobody has resolved it yet. If anybody has experience working with deb
> > > repositories and handling releases, the Ceph upstream could use some
> > > help... ;)
> > > -Greg
> > >
> > >
> > > >
> > > > Simon
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

-- 
Was macht ein Clown im Büro ? Faxen 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Christoph Adomeit
Hello together,

we have upgraded to 12.2.9 because it was in the official repos.

Right after the update and some scrubs we have issues.

This morning after the regular scrubs we had around 10% of all pgs inconsistent:

pgs: 4036 active+clean
  380  active+clean+inconsistent

After repairing these 380 pgs we again have:

1/93611534 objects unfound (0.000%)
28   active+clean+inconsistent
1active+recovery_wait+degraded

Now we have stopped repairing, because it does not seem to solve the problem and more
and more error messages are occurring. So far we have not seen corruption, but we
do not feel comfortable with the cluster.

What do you suggest: wait for 12.2.10? Roll back to 12.2.8?

Is it dangerous for our data to leave the cluster running?

I am sure we do not have hardware errors and that these errors came with the 
update to 12.2.9.

Thanks
  Christoph



On Wed, Nov 07, 2018 at 07:39:59AM -0800, Gregory Farnum wrote:
> On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside 
> wrote:
> 
> >
> >
> > On 07/11/2018 10:59, Konstantin Shalygin wrote:
> > >> I wonder if there is any release announcement for ceph 12.2.9 that I
> > missed.
> > >> I just found the new packages on download.ceph.com, is this an official
> > >> release?
> > >
> > > This is because 12.2.9 have a several bugs. You should avoid to use this
> > > release and wait for 12.2.10
> >
> > Argh! What's it doing in the repos then?? I've just upgraded to it!
> > What are the bugs? Is there a thread about them?
> 
> 
> If you’ve already upgraded and have no issues then you won’t have any
> trouble going forward — except perhaps on the next upgrade, if you do it
> while the cluster is unhealthy.
> 
> I agree that it’s annoying when these issues make it out. We’ve had ongoing
> discussions to try and improve the release process so it’s less drawn-out
> and to prevent these upgrade issues from making it through testing, but
> nobody has resolved it yet. If anybody has experience working with deb
> repositories and handling releases, the Ceph upstream could use some
> help... ;)
> -Greg
> 
> 
> >
> > Simon
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous 12.2.3 Changelog ?

2018-02-21 Thread Christoph Adomeit
Hi there,

I noticed that luminous 12.2.3 is already released.

Is there any changelog for this release ?

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Separate WAL and DB Partitions for existing OSDs ?

2017-09-07 Thread Christoph Adomeit
To be more precise, what I want to know is:


I have a lot of bluestore OSDs and now I want to add separate WAL and DB devices on
new NVMe partitions.

Would it be enough to just create empty partitions with parted and make
symlinks in the OSD data directory like this:

$ sudo ln -sf /dev/disk/by-partlabel/osd-device-0-db 
/var/lib/ceph/osd/ceph-0/block.db
$ sudo ln -sf /dev/disk/by-partlabel/osd-device-0-wal 
/var/lib/ceph/osd/ceph-0/block.wal

Should I use special partition IDs or flags for the DB and WAL? And how big should
I make the DB and WAL partitions?
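
In case it helps: the procedure I have in mind looks roughly like this,
assuming a ceph-bluestore-tool new enough to support attaching a new DB/WAL
device (device names are only examples, and I have not tested this yet):

systemctl stop ceph-osd@0
ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-0 \
    --dev-target /dev/disk/by-partlabel/osd-device-0-db
ceph-bluestore-tool bluefs-bdev-new-wal --path /var/lib/ceph/osd/ceph-0 \
    --dev-target /dev/disk/by-partlabel/osd-device-0-wal
# existing RocksDB data may additionally need a bluefs-bdev-migrate run
systemctl start ceph-osd@0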


Thanks


Christoph 



On Thu, Sep 07, 2017 at 09:57:16AM +0200, Christoph Adomeit wrote:
> Hi there,
> 
> is it possible to move WAL and DB Data for Existing bluestore OSDs to 
> separate partitions ? 
> 
> I am looking for a method to maybe take an OSD out, do some magic and move 
> some data to new SSD Devices and then take the OSD back in.
> 
> Any Ideas ?
> 
> Thanks
>   Christoph
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Es gibt keine  Cloud, es gibt nur die Computer anderer Leute
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Separate WAL and DB Partitions for existing OSDs ?

2017-09-07 Thread Christoph Adomeit
Hi there,

is it possible to move WAL and DB Data for Existing bluestore OSDs to separate 
partitions ? 

I am looking for a method to maybe take an OSD out, do some magic and move some 
data to new SSD Devices and then take the OSD back in.

Any Ideas ?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-09-06 Thread Christoph Adomeit
Now that we are two years and a few Ceph releases further along and have BlueStore:

Are there by now any better ways to find out the mtime of an rbd image?

Thanks
  Christoph

On Thu, Nov 26, 2015 at 06:50:46PM +0100, Jan Schermer wrote:
> Find in which block the filesystem on your RBD image stores journal, find the 
> object hosting this block in rados and use its mtime :-)
> 
> Jan
> 
> 
> > On 26 Nov 2015, at 18:49, Gregory Farnum <gfar...@redhat.com> wrote:
> > 
> > I don't think anything tracks this explicitly for RBD, but each RADOS 
> > object does maintain an mtime you can check via the rados tool. You could 
> > write a script to iterate through all the objects in the image and find the 
> > most recent mtime (although a custom librados binary will be faster if you 
> > want to do this frequently).
> > -Greg
> > 
> > On Thursday, November 26, 2015, Christoph Adomeit 
> > <christoph.adom...@gatworks.de <mailto:christoph.adom...@gatworks.de>> 
> > wrote:
> > Hi there,
> > 
> > I am using Ceph-Hammer and I am wondering about the following:
> > 
> > What is the recommended way to find out when an rbd-Image was last modified 
> > ?
> > 
> > Thanks
> >   Christoph
> > 
> > --
> > Christoph Adomeit
> > GATWORKS GmbH
> > Reststrauch 191
> > 41199 Moenchengladbach
> > Sitz: Moenchengladbach
> > Amtsgericht Moenchengladbach, HRB 6303
> > Geschaeftsfuehrer:
> > Christoph Adomeit, Hans Wilhelm Terstappen
> > 
> > christoph.adom...@gatworks.de <javascript:;> Internetloesungen vom 
> > Feinsten
> > Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com <javascript:;>
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit
Hi,

no, I did not enable the journaling feature, since we do not use mirroring.


On Thu, Mar 23, 2017 at 08:10:05PM +0800, Dongsheng Yang wrote:
> Did you enable the journaling feature?
> 
> On 03/23/2017 07:44 PM, Christoph Adomeit wrote:
> >Hi Yang,
> >
> >I mean "any write" to this image.
> >
> >I am sure we have a lot of not-used-anymore rbd images in our pool and I am 
> >trying to identify them.
> >
> >The mtime would be a good hint to show which images might be unused.
> >
> >Christoph
> >
> >On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:
> >>Hi Christoph,
> >>
> >>On 03/23/2017 07:16 PM, Christoph Adomeit wrote:
> >>>Hello List,
> >>>
> >>>i am wondering if there is meanwhile an easy method in ceph to find more 
> >>>information about rbd-images.
> >>>
> >>>For example I am interested in the modification time of an rbd image.
> >>Do you mean some metadata changing? such as resize?
> >>
> >>Or any write to this image?
> >>
> >>Thanx
> >>Yang
> >>>I found some posts from 2015 that say we have to go over all the objects 
> >>>of an rbd image and find the newest mtime put this is not a preferred 
> >>>solution for me. It takes to much time and too many system resources.
> >>>
> >>>Any Ideas ?
> >>>
> >>>Thanks
> >>>   Christoph
> >>>
> >>>
> >>>___________
> >>>ceph-users mailing list
> >>>ceph-users@lists.ceph.com
> >>>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> 

-- 
Es gibt keine  Cloud, es gibt nur die Computer anderer Leute
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit
Hi Yang,

I mean "any write" to this image.

I am sure we have a lot of rbd images in our pool that are no longer used, and I am
trying to identify them.

The mtime would be a good hint to show which images might be unused.

Christoph

On Thu, Mar 23, 2017 at 07:32:49PM +0800, Dongsheng Yang wrote:
> Hi Christoph,
> 
> On 03/23/2017 07:16 PM, Christoph Adomeit wrote:
> >Hello List,
> >
> >i am wondering if there is meanwhile an easy method in ceph to find more 
> >information about rbd-images.
> >
> >For example I am interested in the modification time of an rbd image.
> 
> Do you mean some metadata changing? such as resize?
> 
> Or any write to this image?
> 
> Thanx
> Yang
> >
> >I found some posts from 2015 that say we have to go over all the objects of 
> >an rbd image and find the newest mtime put this is not a preferred solution 
> >for me. It takes to much time and too many system resources.
> >
> >Any Ideas ?
> >
> >Thanks
> >   Christoph
> >
> >
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
Es gibt keine  Cloud, es gibt nur die Computer anderer Leute
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Modification Time of RBD Images

2017-03-23 Thread Christoph Adomeit

Hello List,

I am wondering whether there is by now an easy method in Ceph to find more
information about rbd images.

For example I am interested in the modification time of an rbd image.

I found some posts from 2015 that say we have to go over all the objects of an
rbd image and find the newest mtime, but this is not a preferred solution for
me. It takes too much time and too many system resources.
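
Just for completeness, the brute-force variant I am trying to avoid looks
roughly like this (untested sketch; pool and image name are examples, and it
stats every single object of the image):

PREFIX=$(rbd info rbd/vm-100-disk-1 | awk '/block_name_prefix/ {print $2}')
rados -p rbd ls | grep "$PREFIX" | while read OBJ; do
    rados -p rbd stat "$OBJ"      # prints the per-object mtime
done | sort -k3,4 | tail -1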

Any Ideas ?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading 2K OSDs from Hammer to Jewel. Our experience

2017-03-13 Thread Christoph Adomeit
Thanks for the detailed upgrade report.

We have another scenario: we have already upgraded to jewel 10.2.6, but
we are still running all our monitors and osd daemons as root using the
"setuser match path" directive.

What would be the recommended way to get all daemons running as the ceph:ceph
user?

Could we chown -R the monitor and osd data directories under /var/lib/ceph one
by one while keeping the service up?

Thanks
  Christoph

On Sat, Mar 11, 2017 at 12:21:38PM +0100, cephmailingl...@mosibi.nl wrote:
> Hello list,
> 
> A week ago we upgraded our Ceph clusters from Hammer to Jewel and with this
> email we want to share our experiences.
> 
-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fast-diff map is always invalid

2016-08-05 Thread Christoph Adomeit
Hi Jason,

yes, after I also built object-maps for the snapshots the feature is working as 
expected.

Thanks
  Christoph

On Thu, Aug 04, 2016 at 01:52:54PM -0400, Jason Dillaman wrote:
> Can you run "rbd info vm-208-disk-2@initial.20160729-220225"? You most
> likely need to rebuild the object map for that specific snapshot via
> "rbd object-map rebuild vm-208-disk-2@initial.20160729-220225".
> 
> On Sat, Jul 30, 2016 at 7:17 AM, Christoph Adomeit
> <christoph.adom...@gatworks.de> wrote:
> > Hi there,
> >
> > I upgraded my cluster to jewel recently, built object maps for every image 
> > and
> > recreated all snapshots du use fast-diff feature for backups.
> >
> > Unfortunately i am still getting the following error message on rbd du:
> >
> > root@host:/backups/ceph# rbd du vm-208-disk-2
> > warning: fast-diff map is invalid for 
> > vm-208-disk-2@initial.20160729-220225. operation may be slow.
> >
> > What might be wrong ?
> >
> > root@1host:/backups/ceph# rbd --version
> > ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
> >
> > root@host:/backups/ceph# rbd info vm-208-disk-2
> > rbd image 'vm-208-disk-2':
> > size 275 GB in 70400 objects
> > order 22 (4096 kB objects)
> > block_name_prefix: rbd_data.35ea4ac2ae8944a
> > format: 2
> > features: layering, exclusive-lock, object-map, fast-diff
> > flags:
> >
> > Thanks
> >   Christoph
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read Stalls with Multiple OSD Servers

2016-08-03 Thread Christoph Adomeit
> 
> 
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Helander, 
> Thomas [thomas.helan...@kla-tencor.com]
> Sent: Monday, August 01, 2016 10:06 AM
> To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
> Subject: [ceph-users] Read Stalls with Multiple OSD Servers
> Hi,
> 
> I’m running a three server cluster (one monitor, two OSD) and am having a 
> problem where after adding the second OSD server, my read rate drops 
> significantly and eventually the reads stall (writes are improved as 
> expected). Attached is a log of the rados benchmarks for the two 
> configurations and below is my hardware configuration. I’m not using replicas 
> (capacity is more important than uptime for our use case) and am using a 
> single 10GbE network. The pool (rbd) is configured with 128 placement groups.
> 
> I’ve checked the CPU utilization of the ceph-osd processes and they all hover 
> around 10% until the stall. After the stall, the CPU usage is 0% and the 
> disks all show zero operations via iostat. Iperf reports 9.9Gb/s between the 
> monitor and OSD servers.
> 
> I’m looking for any advice/help on how to identify the source of this issue 
> as my attempts so far have proven fruitless…
> 
> Monitor server:
> 2x E5-2680V3
> 32GB DDR4
> 2x 4TB HDD in RAID1 on an Avago/LSI 3108 with Cachevault, configured as 
> write-back
> 10GbE
> 
> OSD servers:
> 2x E5-2680V3
> 128GB DDR4
> 2x 8+2 RAID6 using 8TB SAS12 drives on an Avago/LSI 9380 controller with 
> Cachevault, configured as write-back.
> - Each RAID6 is an OSD
> 10GbE
> 
> Thanks,
> 
> Tom Helander
> 
> KLA-Tencor
> One Technology Dr | M/S 5-2042R | Milpitas, CA | 95035
> 
> 



> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fast-diff map is always invalid

2016-07-30 Thread Christoph Adomeit
Hi there,

I upgraded my cluster to jewel recently, built object maps for every image and
recreated all snapshots to use the fast-diff feature for backups.

Unfortunately I am still getting the following error message on rbd du:

root@host:/backups/ceph# rbd du vm-208-disk-2
warning: fast-diff map is invalid for vm-208-disk-2@initial.20160729-220225. 
operation may be slow.

What might be wrong ?

root@1host:/backups/ceph# rbd --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

root@host:/backups/ceph# rbd info vm-208-disk-2
rbd image 'vm-208-disk-2':
size 275 GB in 70400 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.35ea4ac2ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff
flags: 

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is anyone seeing iissues with task_numa_find_cpu?

2016-07-01 Thread Christoph Adomeit
ve_wait_queue+0x4d/0x60
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685853]
> >>>>
> >>>> [] ? poll_freewait+0x4a/0xa0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685882]
> >>>>
> >>>> [] __do_page_fault+0x197/0x400
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685910]
> >>>>
> >>>> [] do_page_fault+0x22/0x30
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685939]
> >>>>
> >>>> [] page_fault+0x28/0x30
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685967]
> >>>>
> >>>> [] ? copy_page_to_iter_iovec+0x5f/0x300
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.685997]
> >>>>
> >>>> [] ? select_task_rq_fair+0x625/0x700
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686026]
> >>>>
> >>>> [] copy_page_to_iter+0x16/0xa0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686056]
> >>>>
> >>>> [] skb_copy_datagram_iter+0x14d/0x280
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686087]
> >>>>
> >>>> [] tcp_recvmsg+0x613/0xbe0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686117]
> >>>>
> >>>> [] inet_recvmsg+0x7e/0xb0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686146]
> >>>>
> >>>> [] sock_recvmsg+0x3b/0x50
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686173]
> >>>>
> >>>> [] SYSC_recvfrom+0xe1/0x160
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686202]
> >>>>
> >>>> [] ? ktime_get_ts64+0x45/0xf0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686230]
> >>>>
> >>>> [] SyS_recvfrom+0xe/0x10
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686259]
> >>>>
> >>>> [] entry_SYSCALL_64_fastpath+0x16/0x71
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686287] Code: 55 b0 4c
> >>>>
> >>>> 89 f7 e8 53 cd ff ff 48 8b 55 b0 49 8b 4e 78 48 8b 82 d8 01 00 00 48
> >>>>
> >>>> 83 c1 01 31 d2 49 0f af 86 b0 00 00 00 4c 8b 73 78 <48> f7 f1 48 8b 4b
> >>>>
> >>>> 20 49 89 c0 48 29 c1 48 8b 45 d0 4c 03 43 48
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686512] RIP
> >>>>
> >>>> [] task_numa_find_cpu+0x22e/0x6f0
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686544]  RSP
> >>>> 
> >>>>
> >>>> Jun 28 09:46:41 roc04r-sca090 kernel: [137912.686896] ---[ end trace
> >>>>
> >>>> 544cb9f68cb55c93 ]---
> >>>>
> >>>> Jun 28 09:52:15 roc04r-sca090 kernel: [138246.669713] mpt2sas_cm0:
> >>>>
> >>>> log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
> >>>>
> >>>> Jun 28 09:55:01 roc0
> >>>>
> >>>> ___
> >>>>
> >>>> ceph-users mailing list
> >>>>
> >>>> ceph-users@lists.ceph.com
> >>>>
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>>
> >>>>
> >>>> Tim.
> >>>>
> >>>> --
> >>>> Tim Bishop
> >>>> http://www.bishnet.net/tim/
> >>>> PGP Key: 0x6C226B37FDF38D55
> >>>>
> >>>> ___
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Does object map feature lock snapshots ?

2016-03-21 Thread Christoph Adomeit
Hi Jason,

I can reproduce the issue 100%

Use standard ceph version 9.2.1 from repository

Create a VM with a format 2 rbd image; in my example it is:

vm-192-disk-1

enable these features:

rbd feature enable $IMG exclusive-lock
rbd feature enable $IMG object-map
rbd feature enable $IMG fast-diff

Start the VM and run some I/O inside it; I ran bonnie++ in a loop.

Then go ahead and create the first snapshot:

/usr/bin/rbd snap create rbd/vm-192-disk-1@initial.20160321-130439

Export the snapshot (I don't know whether this is necessary):

/usr/bin/rbd export --rbd-concurrent-management-ops 20 
vm-192-disk-1@initial.20160321-130439 -|pigz -b 512|/bin/dd 
of=/backups/ceph/vm-192-disk-1.initial.20160321-130439.gz.tmp && /bin/mv 
/backups/ceph/vm-192-disk-1.initial.20160321-130439.gz.tmp 
/backups/ceph/vm-192-disk-1.initial.20160321-130439.gz 


This is no problem; it will work.

Then create the second snapshot:

/usr/bin/rbd snap create rbd/vm-192-disk-1@incremental.20160321-130741

After a few seconds you see this on the console:

2016-03-21 13:08:46.091526 7f8ab372a7c0 -1 librbd::ImageWatcher: 0x561d8a394150 
no lock owners detected


So it is not the export-diff that is hanging; it is the rbd snap create operation
on an additional snapshot.

Often the I/O in the VM also hangs, and sometimes the load in the VM goes up to
800 or more.

Even after stopping the vm I can see the image has an exclusive lock:

# rbd lock ls vm-192-disk-1
There is 1 exclusive lock on this image.
Locker  ID   Address 
client.71565451 auto 140269345641344 10.67.1.15:0/2701777604 

Without these image features I do not have these problems.

Can you reproduce this ?

Greetings
  Christoph


On Sun, Mar 20, 2016 at 10:57:16AM -0400, Jason Dillaman wrote:
> Definitely not a known issue but from a quick test (running export-diff 
> against an image being actively written) I wasn't able to recreate on v9.2.1. 
>  Are you able to recreate this reliably, and if so, can you share the steps 
> you used?
> 
> Thanks,
> 
> -- 
> 
> Jason Dillaman 
> 
> 
> ----- Original Message -
> > From: "Christoph Adomeit" <christoph.adom...@gatworks.de>
> > To: "Jason Dillaman" <dilla...@redhat.com>
> > Cc: ceph-us...@ceph.com
> > Sent: Friday, March 18, 2016 6:19:16 AM
> > Subject: Re: [ceph-users] Does object map feature lock snapshots ?
> > 
> > Hi,
> > 
> > I had no special logging activated.
> > 
> > Today I re-enabled exclusive-lock object-map and fast-diff on an image in
> > 9.2.1
> > 
> > As soon as I ran an rbd export-diff I had lots of these error messages on 
> > the
> > console of the rbd export process:
> > 
> > 2016-03-18 11:18:21.546658 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 2016-03-18 11:18:26.546750 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 2016-03-18 11:18:31.546840 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 2016-03-18 11:18:36.546928 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 2016-03-18 11:18:41.547017 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 2016-03-18 11:18:46.547105 7f77245d1700  1 heartbeat_map is_healthy
> > 'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
> > 
> > 
> > Is this a known issue ?
> > 
> > 
> > 
> > 
> > 
> > On Tue, Mar 08, 2016 at 11:22:17AM -0500, Jason Dillaman wrote:
> > > Is there anyway for you to provide debug logs (i.e. debug rbd = 20) from
> > > your rbd CLI and qemu process when you attempt to create a snapshot?  In
> > > v9.2.0, there was an issue [1] where the cache flush writeback from the
> > > snap create request was being blocked when the exclusive lock feature was
> > > enabled, but that should have been fixed in v9.2.1.
> > > 
> > > [1] http://tracker.ceph.com/issues/14542
> > > 
> > > --
> > > 
> > > Jason Dillaman
> > > 
> > > 
> > > - Original Message -
> > > > From: "Christoph Adomeit" <christoph.adom...@gatworks.de>
> > > > To: ceph-us...@ceph.com
> > > > Sent: Tuesday, March 8, 2016 11:13:04 AM
> > > > Subject: [ceph-users] Does object map feature lock snapshots ?
> > > > 
> > > > Hi,
> > > > 
> > > > i have installed ceph 9.21 on proxmox with kernel 4.2.8-1

Re: [ceph-users] Cannot remove rbd locks

2016-03-21 Thread Christoph Adomeit
Thanks Jason,

this worked ...

On Fri, Mar 18, 2016 at 02:31:44PM -0400, Jason Dillaman wrote:
> Try the following:
> 
> # rbd lock remove vm-114-disk-1 "auto 140454012457856" client.71260575
> 
> -- 
> 
> Jason Dillaman 
> 
> 
> - Original Message -
> > From: "Christoph Adomeit" <christoph.adom...@gatworks.de>
> > To: ceph-us...@ceph.com
> > Sent: Friday, March 18, 2016 11:14:00 AM
> > Subject: [ceph-users] Cannot remove rbd locks
> > 
> > Hi,
> > 
> > some of my rbds show they have an exclusive lock.
> > 
> > I think the lock can be stale or weeks old.
> > 
> > We have also once added feature exclusive lock and later removed that 
> > feature
> > 
> > I can see the lock:
> > 
> > root@machine:~# rbd lock list vm-114-disk-1
> > There is 1 exclusive lock on this image.
> > Locker  ID   Address
> > client.71260575 auto 140454012457856 10.67.1.14:0/1131494432
> > 
> > iBut I cannot remove the lock:
> > 
> > root@machine:~# rbd lock remove vm-114-disk-1 auto client.71260575
> > rbd: releasing lock failed: (2) No such file or directory
> > 
> > How can I remove the locks ?
> > 
> > Thanks
> >   Christoph
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Infernalis: chown ceph:ceph at runtime ?

2016-03-19 Thread Christoph Adomeit
Hi,

we have upgraded our ceph-cluster to infernalis from hammer.

Ceph is still running as root and we are using the 
"setuser match path = /var/lib/ceph/$type/$cluster-$id" directive in ceph.conf

Now we would like to change the ownership of data-files and devices to ceph at 
runtime.

What is the best way to do this?

I am thinking about removing the "setuser match path" directive from ceph.conf,
then stopping one OSD after the other, changing the ownership of all files to
ceph:ceph and restarting the daemon.

Is this the best and recommended way ?

I also once read about a fast parallel chown syntax on this mailing list, but I
have not found the mail again yet. Does someone remember how this was done?
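
The per-OSD procedure I have in mind looks roughly like this (untested
sketch; osd.3 is just an example, systemd hosts assumed, and the parallel
find/xargs line is my guess at the trick that was mentioned):

systemctl stop ceph-osd@3
chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
# or, parallelized for large OSDs:
# find /var/lib/ceph/osd/ceph-3 -print0 | xargs -0 -P 8 -n 1000 chown ceph:ceph
systemctl start ceph-osd@3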

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cannot remove rbd locks

2016-03-19 Thread Christoph Adomeit
Hi,

some of my rbds show they have an exclusive lock.

I think the lock can be stale or weeks old.

We also once added the exclusive-lock feature and later removed it.

I can see the lock:

root@machine:~# rbd lock list vm-114-disk-1
There is 1 exclusive lock on this image.
Locker  ID   Address 
client.71260575 auto 140454012457856 10.67.1.14:0/1131494432 

But I cannot remove the lock:

root@machine:~# rbd lock remove vm-114-disk-1 auto client.71260575
rbd: releasing lock failed: (2) No such file or directory

How can I remove the locks ?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Does object map feature lock snapshots ?

2016-03-19 Thread Christoph Adomeit
Hi,

I had no special logging activated.

Today I re-enabled exclusive-lock, object-map and fast-diff on an image in 9.2.1.

As soon as I ran an rbd export-diff I had lots of these error messages on the 
console of the rbd export process:

2016-03-18 11:18:21.546658 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
2016-03-18 11:18:26.546750 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
2016-03-18 11:18:31.546840 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
2016-03-18 11:18:36.546928 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
2016-03-18 11:18:41.547017 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60
2016-03-18 11:18:46.547105 7f77245d1700  1 heartbeat_map is_healthy 
'librbd::thread_pool thread 0x7f77137fe700' had timed out after 60


Is this a known issue ? 





On Tue, Mar 08, 2016 at 11:22:17AM -0500, Jason Dillaman wrote:
> Is there anyway for you to provide debug logs (i.e. debug rbd = 20) from your 
> rbd CLI and qemu process when you attempt to create a snapshot?  In v9.2.0, 
> there was an issue [1] where the cache flush writeback from the snap create 
> request was being blocked when the exclusive lock feature was enabled, but 
> that should have been fixed in v9.2.1.
> 
> [1] http://tracker.ceph.com/issues/14542
> 
> -- 
> 
> Jason Dillaman 
> 
> 
> - Original Message -
> > From: "Christoph Adomeit" <christoph.adom...@gatworks.de>
> > To: ceph-us...@ceph.com
> > Sent: Tuesday, March 8, 2016 11:13:04 AM
> > Subject: [ceph-users] Does object map feature lock snapshots ?
> > 
> > Hi,
> > 
> > i have installed ceph 9.21 on proxmox with kernel 4.2.8-1-pve.
> > 
> > Afterwards I have enabled the features:
> > 
> > rbd feature enable $IMG exclusive-lock
> > rbd feature enable $IMG object-map
> > rbd feature enable $IMG fast-diff
> > 
> > 
> > During the night I have a cronjob which does a rbd snap create on each
> > of my images and then an rbd export-diff
> > 
> > I found out that my cronjob was hanging during the rbd snap create and
> > does not create the snapshot.
> > 
> > Also more worse, sometimes also the vms were hanging.
> > 
> > What are your experiences with object maps ? For me it looks that they
> > are not yet production ready.
> > 
> > Thanks
> >   Christoph
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Does object map feature lock snapshots ?

2016-03-08 Thread Christoph Adomeit
Hi,

I have installed ceph 9.2.1 on proxmox with kernel 4.2.8-1-pve.

Afterwards I have enabled the features:

rbd feature enable $IMG exclusive-lock
rbd feature enable $IMG object-map
rbd feature enable $IMG fast-diff


During the night I have a cronjob which does an rbd snap create on each
of my images and then an rbd export-diff.

I found out that my cronjob was hanging during the rbd snap create and
did not create the snapshot.

Even worse, sometimes the VMs were hanging as well.

What are your experiences with object maps? To me it looks like they
are not yet production ready.

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can I rebuild object maps while VMs are running ?

2016-03-04 Thread Christoph Adomeit
Hi there,

I just updated our ceph-cluster to infernalis and now I want to enable the new 
image features.

I wonder if I can add the features to the rbd images while the VMs are running.

I want to do something like this:

rbd feature enable $IMG exclusive-lock
rbd feature enable $IMG object-map
rbd feature enable $IMG fast-diff
rbd object-map rebuild $IMG 

I am afraid of corrupting my rbds when rebuilding the object maps at runtime.

What do you think ?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Modification Time of RBD Images

2015-11-26 Thread Christoph Adomeit
Hi there,

I am using Ceph-Hammer and I am wondering about the following:

What is the recommended way to find out when an rbd-Image was last modified ?

Thanks
  Christoph

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.94.4 Hammer released

2015-10-21 Thread Christoph Adomeit
Hi there,

I was hoping for the following changes in 0.94.4 release:

-Stable Object Maps for faster Image Handling (Backups, Diffs, du etc). 
-Linking against a better malloc implementation like jemalloc

Does 0.94.4 bring any improvement in these areas ?

Thanks
  Christoph



On Mon, Oct 19, 2015 at 02:07:39PM -0700, Sage Weil wrote:
> This Hammer point fixes several important bugs in Hammer, as well as
> fixing interoperability issues that are required before an upgrade to

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] possibility to delete all zeros

2015-10-02 Thread Christoph Adomeit
Hi Stefan,

you can run fstrim on the filesystems mounted from those images. This will release
the unused space back to Ceph.
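
Something like this inside each guest (assuming the virtual disk is attached
with discard support, e.g. virtio-scsi with discard enabled):

fstrim -v /      # repeat per mounted filesystem, or use "fstrim -a"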

Greets
  Christoph


On Fri, Oct 02, 2015 at 02:16:52PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> we accidentally added zeros to all our rbd images. So all images are no 
> longer thin provisioned. As we do not have access to the qemu guests running 
> those images. Is there any other options to trim them again?
> 
> Greets,
> Stefan
> 
> Excuse my typo sent from my mobile phone.

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to disable object-map and exclusive features ?

2015-09-01 Thread Christoph Adomeit
Hi Jason,

I have a compressed coredump of about 1200 MB.

Where shall I put the dump?

I think the crashes often happen when I do a snapshot backup of the
VM images.
Then something happens with locking which causes the VM to crash.

Thanks
  Christoph

On Mon, Aug 31, 2015 at 09:10:49AM -0400, Jason Dillaman wrote:
> Unfortunately, the tool the dynamically enable/disable image features (rbd 
> feature disable  ) was added during the Infernalis 
> development cycle.  Therefore, in the short-term you would need to recreate 
> the images via export/import or clone/flatten.  
> 
> There are several object map / exclusive lock bug fixes that are scheduled to 
> be included in the 0.94.4 release that might address your issue.  However, 
> without more information, we won't know if the issue you are seeing has been 
> fixed or not in a later release.
> 
> If it is at all possible, it would be most helpful if you could provide logs 
> or a backtrace when the VMs lock up.  Since it happens so infrequently, you 
> may not be willing to increase the RBD debug log level to 20 on one of these 
> VMs. Therefore, another possibility is for you to run "gcore " 
> to generate a core dump or attach GDB to the hung VM and run "thread apply 
> all bt".  With the gcore or backtrace method, we would need a listing of all 
> the package versions installed on the machine to recreate a similar debug 
> environment.
> 
> Thanks,
> 
> Jason 
> 
> 
> - Original Message -
> > From: "Christoph Adomeit" <christoph.adom...@gatworks.de>
> > To: ceph-users@lists.ceph.com
> > Sent: Monday, August 31, 2015 7:49:00 AM
> > Subject: [ceph-users] How to disable object-map and exclusive features ?
> > 
> > Hi there,
> > 
> > I have a ceph-cluster (0.94-2) with >100 rbd kvm images.
> > 
> > Most vms are running rock-solid but 7 vms are hanging about once a week.
> > 
> > I found out the hanging machines have
> > features: layering, exclusive, object map while all other vms do not have
> > exclusive and object map set.
> > 
> > Now I want to disable these features. Is ist possible to disable these
> > features while the vms are running ? Or at least while they are shut down ?
> > Or will I have to recreate all these images ?
> > 
> > Thanks
> >   Christoph
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to disable object-map and exclusive features ?

2015-08-31 Thread Christoph Adomeit
Hi there,

I have a ceph-cluster (0.94-2) with >100 rbd kvm images.

Most vms are running rock-solid but 7 vms are hanging about once a week.

I found out that the hanging machines have
"features: layering, exclusive, object map", while all other VMs do not have
exclusive and object map set.

Now I want to disable these features. Is it possible to disable them
while the VMs are running? Or at least while they are shut down? Or will I
have to recreate all these images?

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Defective Gbic brings whole Cluster down

2015-08-27 Thread Christoph Adomeit
Hello Ceph Users,

yesterday I had a defective GBIC in one node of my 10-node ceph cluster.

The GBIC was sort of working but had 50% packet loss: some packets went
through, some did not.

What happened was that the whole cluster did not service requests in time; there
were lots of timeouts and so on
until the problem was isolated. Monitors and OSDs were asked for data but did
not answer, or answered late.

I am wondering: here we have a highly redundant network setup and a highly
redundant piece of software, yet a small
network fault brings down the whole cluster.

Is there anything that can be configured or changed in ceph so that 
availability will become better in case of flapping networks ?

I understand it is not a Ceph problem but a network problem, but maybe
something can be learned from such incidents?

Thanks
  Christoph
-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to prefer faster disks in same pool

2015-07-09 Thread Christoph Adomeit
Hi Guys,

I have a ceph pool that mixes 10k rpm and 7.2k rpm disks.

There are 85 OSDs and 10 of them are 10k rpm.
Size is not an issue; the pool is only 20% full.

I want to somehow prefer the 10k rpm disks so that they get more I/O.

What is the most intelligent way to prefer the faster disks?
Just give them a different weight, or are there other methods?
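
One idea I am considering (rough sketch, not tested): leave the CRUSH weights
alone and instead lower the primary affinity of the 7.2k rpm OSDs, so that
reads are preferentially served by the 10k rpm disks (osd numbers are examples):

ceph tell mon.\* injectargs '--mon-osd-allow-primary-affinity=true'
ceph osd primary-affinity osd.12 0.5     # 7.2k rpm OSD, primary less often
ceph osd primary-affinity osd.13 0.5
# 10k rpm OSDs keep the default primary affinity of 1.0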

Thanks
  Christoph


-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] straw to straw2 migration

2015-06-29 Thread Christoph Adomeit
Hello Dzianis,

I am also planning to change our cluster from straw to straw2, because we
have different HDD sizes and changes in the HDD sizes always trigger a lot of
reorganization load.

Did you experience any issues ? Did you already change the other hosts ?

Don't you think we will have less reorganization if we change the algorithm on
all machines at the same time ?

We are using hammer 0.94.2 and the Proxmox distribution with the RedHat kernel
2.6.32-37-pve.

Do you think the kernel can be a problem ?
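
For reference, the edit itself would be something like this (a sketch, we
have not done it here yet):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
sed -i 's/alg straw$/alg straw2/' crush.txt    # per bucket, or all at once
crushtool -c crush.txt -o crush-straw2.bin
ceph osd setcrushmap -i crush-straw2.bin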

Thanks
  Christoph


On Wed, Jun 24, 2015 at 06:11:26PM +0300, Dzianis Kahanovich wrote:
 I plan to migrate cluster from straw to straw2 mapping. Ceph and
 kernels is up to date (kernel 4.1.0), so I want to change directly
 in crush map srew to straw2 and load changed crush map (by steps -
 per host and rack). Are this relative safe and must be remapped
 runtime?
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Use object-map Feature on existing rbd images ?

2015-04-28 Thread Christoph Adomeit
Hi there,

we are using ceph hammer and we have some fully provisioned
images with only little data.

An rbd export of a 500 GB rbd image takes a long time although there are only
15 GB of used data, even if the rbd image is trimmed.

Do you think it is a good idea to enable the object-map feature on
already existing rbd images ?

I am thinking about using the rbd binary from ceph-master to change
the features of some existing rbd images like this:

rbd feature enable $IMG exclusive-lock
rbd feature enable $IMG object-map

Do you think this is a good idea, that it will work and that it will be stable?

Thanks
  Christoph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] fstrim does not shrink ceph OSD disk usage ?

2015-04-24 Thread Christoph Adomeit
Hi there,

I have a ceph cluster running hammer-release.

Recently I trimmed a lot of virtual disks and I can verify that
the size of the images has decreased a lot.

I checked this with:

/usr/bin/rbd diff $IMG | grep -v zero | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'

The output after fstrim is always lower than the output before the fstrim,
because the trimmed blocks are shown as zero by rbd diff.


However: the "used" figure in the ceph status output never decreases, no matter how
many disks I trim. I always see:
26457 GB used, 74549 GB / 101006 GB avail

I expected the 26457 GB number to decrease.

Doesn't ceph delete or truncate or zero trimmed blocks on osds ?

Is there something I am misunderstanding?
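
One thing I still want to rule out (an assumption on my side, not verified):
that the guests' discard requests actually reach RBD at all, i.e. that the
disks are attached with discard enabled, roughly like

-drive file=rbd:rbd/vm-100-disk-1,if=none,id=drive0,cache=writeback,discard=unmap,format=raw
-device scsi-hd,drive=drive0

because if the discards never reach RBD, the space cannot be given back to
the cluster no matter how often I trim.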

Thanks
  Christoph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Christoph Adomeit

Hi there,

today I had an OSD crash with ceph 0.87/giant which made my whole cluster
unusable for 45 minutes.

First it began with a disk error:

sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8 f8 
00 00 00 00 b0 08 00 00
XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5. 

Then most other osds found out that my osd.3 is down:

2014-12-16 08:45:15.873478 mon.0 10.67.1.11:6789/0 3361077 : cluster [INF] 
osd.3 10.67.1.11:6810/713621 failed (42 reports from 35 peers after 23.642482 
= grace 23.348982) 

5 minutes later the osd is marked as out:
2014-12-16 08:50:21.095903 mon.0 10.67.1.11:6789/0 3361367 : cluster [INF] 
osd.3 out (down for 304.581079) 

However, from 8:45 until 9:20 I had 1000 slow requests and 107 incomplete
pgs. Many requests were not answered:

2014-12-16 08:46:03.029094 mon.0 10.67.1.11:6789/0 3361126 : cluster [INF] 
pgmap v6930583: 4224 pgs: 4117 active+clean, 107 incomplete; 7647 GB data, 
19090 GB used, 67952 GB / 87042 GB avail; 2307 kB/s rd, 2293 kB/s wr, 407 op/s

Also, recovery to another OSD was not starting.

It seems the OSD thinks it is still up while all other OSDs think this OSD is down?
I found this in the log of osd.3:
ceph-osd.3.log:2014-12-16 08:45:19.319152 7faf81296700  0 log_channel(default) 
log [WRN] : map e61177 wrongly marked me down
ceph-osd.3.log:  -440 2014-12-16 08:45:19.319152 7faf81296700  0 
log_channel(default) log [WRN] : map e61177 wrongly marked me down

Luckily I was able to restart osd.3 and everything was working again, but I do
not understand what has happened. The cluster was simply not usable for 45
minutes.

Any ideas?

Thanks
  Christoph


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow Requests when taking down OSD Node

2014-12-03 Thread Christoph Adomeit
Yes,

I get this message:

2014-12-02 20:26:14.854853 7fe5a0a80700  0 log_channel(cluster) log [INF] : 
osd.51 marked itself down

So I am wondering why there are so many slow requests although the OSD marks
itself down.

Christoph

On Tue, Dec 02, 2014 at 11:48:07AM -0800, Craig Lewis wrote:
 If you watch `ceph -w` while stopping the OSD, do you see
 2014-12-02 11:45:17.715629 mon.0 [INF] osd.X marked itself down
 
 ?
 
 On Tue, Dec 2, 2014 at 11:06 AM, Christoph Adomeit 
 christoph.adom...@gatworks.de wrote:
 
  Thanks Craig,
 
  but this is what I am doing.
 
  After setting ceph osd set noout I do a service ceph stop osd.51
  and as soon as I do this I get growing numbers (200) of slow requests,
  although there is not a big load on my cluster.
 
  Christoph
 
  On Tue, Dec 02, 2014 at 10:40:13AM -0800, Craig Lewis wrote:
   I've found that it helps to shut down the osds before shutting down the
   host.  Especially if the node is also a monitor.  It seems that some OSD
   shutdown messages get lost while monitors are holding elections.
  
   On Tue, Dec 2, 2014 at 10:10 AM, Christoph Adomeit 
   christoph.adom...@gatworks.de wrote:
  
Hi there,
   
I have a giant cluster with 60 OSDs on 6 OSD Hosts.
   
Now I want to do maintenance on one of the OSD Hosts.
   
The documented Procedure is to ceph osd set noout and then shutdown
the OSD Node for maintenance.
   
However, as soon as I even shut down 1 OSD I get around 200 slow
  requests
and the number of slow requests is growing for minutes.
   
The test was done at night with low IOPS and I was expecting the
  cluster
to handle this condition much better.
   
Is there some way of a more graceful shutdown of OSDs so that I can
  prevent
those slow requests ? I suppose it takes some time until monitor gets
notified that an OSD was shutdown.
   
Thanks
  Christoph
   
   
   
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
   
 
 

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow Requests when taking down OSD Node

2014-12-02 Thread Christoph Adomeit
Thanks Craig,

but this is what I am doing.

After setting "ceph osd set noout" I do a "service ceph stop osd.51",
and as soon as I do this I get a growing number (200+) of slow requests,
although there is not much load on my cluster.

Christoph

On Tue, Dec 02, 2014 at 10:40:13AM -0800, Craig Lewis wrote:
 I've found that it helps to shut down the osds before shutting down the
 host.  Especially if the node is also a monitor.  It seems that some OSD
 shutdown messages get lost while monitors are holding elections.
 
 On Tue, Dec 2, 2014 at 10:10 AM, Christoph Adomeit 
 christoph.adom...@gatworks.de wrote:
 
  Hi there,
 
  I have a giant cluster with 60 OSDs on 6 OSD Hosts.
 
  Now I want to do maintenance on one of the OSD Hosts.
 
  The documented Procedure is to ceph osd set noout and then shutdown
  the OSD Node for maintenance.
 
  However, as soon as I even shut down 1 OSD I get around 200 slow requests
  and the number of slow requests is growing for minutes.
 
  The test was done at night with low IOPS and I was expecting the cluster
  to handle this condition much better.
 
  Is there some way of a more graceful shutdown of OSDs so that I can prevent
  those slow requests ? I suppose it takes some time until monitor gets
  notified that an OSD was shutdown.
 
  Thanks
Christoph
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fastest way to shrink/rewrite rbd image ?

2014-11-28 Thread Christoph Adomeit
Hi,

I would like to shrink a thin-provisioned rbd image which has grown to its maximum
size. 90% of the data in the image is deleted data which is still allocated in the
image.

So I think I can fill the unused space in the image with zeroes and then qemu-img
convert it; the newly created image should then be only 10% of the maximum size.

I will do something like
qemu-img convert -O raw rbd:pool/origimage rbd:pool/smallimage
rbd rename origimage origimage-saved
rbd rename smallimage origimage

Would this be the best and fastest way or are there other ways to do this ?
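
For the zero-filling step I would run something like this inside the guest
first (just a sketch; the file name is arbitrary and the dd will simply stop
when the filesystem is full):

dd if=/dev/zero of=/zerofile bs=1M; sync; rm -f /zerofile

so that qemu-img convert can then skip the zeroed ranges during the copy.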

Thanks
  Christoph



-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices

2014-11-12 Thread Christoph Adomeit
Hi,

I installed a Ceph cluster with 50 OSDs on 4 hosts and finally I am really
happy with it.

Linux and Windows VMs run really fast in KVM on the Ceph Storage.

Only my Solaris 10 guests are terribly slow on Ceph rbd storage. A Solaris guest on
Ceph storage needs 15 minutes to boot. When I move the Solaris image to the old
Nexenta NFS storage and start it on the same KVM host, it flies and boots in
1.5 minutes.

I have tested ceph firefly and giant, and the problem occurs with both versions.

The performance problem is not only with booting; it continues when the
server is up. Everything is terribly slow.

So the only difference here is Ceph vs. the Nexenta NFS storage, and that is what
causes the big performance problems.

The Solaris guests have a standard ZFS root installation.

Does anybody have an idea or a hint what might be going on here and what I should
try to make Solaris 10 guests faster on Ceph storage?

Many Thanks
  Christoph

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Newbie Ceph Design Questions

2014-09-18 Thread Christoph Adomeit

Hello Ceph-Community,

we are considering using a Ceph cluster for serving VMs.
We need good performance and absolute stability.

Regarding Ceph I have a few questions.

Presently we use Solaris ZFS Boxes as NFS Storage for VMs.

The ZFS boxes are very fast because they use all free RAM
for read caches. With the ARC stats we can see that 90% of all read
operations are served from memory. The ZFS read cache is also very
intelligent about which blocks to put in the read cache.

From reading about Ceph it seems that Ceph clusters don't have
such an optimized read cache. Do you think we can still perform
as well as the Solaris boxes?

Next question: I read that in Ceph an OSD is lost as
soon as its journal disk is lost. So what should I do? I don't
want to use one journal disk for each OSD. I also don't want to use
one journal disk per 4 OSDs, because then I will lose 4 OSDs if an SSD
fails. Putting the journals on the OSD disks will, I am afraid, be slow.
Again I am afraid of slow Ceph performance compared to ZFS, because
ZFS supports ZIL write cache disks.

Last question: someone told me Ceph snapshots are slow. Is this true?
I always thought making a snapshot is just moving around some pointers
to data.

And the very last question: what about btrfs, still not recommended?

Thanks for helping

Christoph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com