Re: [ceph-users] problem returning mon back to cluster

2019-11-14 Thread Nikola Ciprich
Hi,

just wanted to add some info..

1) I was able to work around the problem (as advised by Harald) by increasing
mon_lease to 50s, waiting for the monitor to join the cluster (it took hours!) and
then decreasing it again. A rough sketch of the commands is below the list.

2) since then we have been hit by the same problem on a different cluster: same
symptoms, same workaround.

3) I was able to reproduce the problem 100% reliably in a cleanly installed ceph
environment in virtual machines with the same addressing and copied monitor data.
If any of the developers are interested, I can give direct SSH access.
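
For reference, a minimal sketch of the workaround from point 1, assuming all mons
are reachable via "ceph tell" and that 5s is the default lease being restored:

# raise the lease so elections survive the long synchronization
ceph tell mon.* injectargs '--mon_lease 50'
# start the returning monitor, wait for it to reach quorum, then restore the default
ceph tell mon.* injectargs '--mon_lease 5'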

shall I file a bug report for this?

thanks

nik



On Tue, Oct 15, 2019 at 07:17:38AM +0200, Nikola Ciprich wrote:
> On Tue, Oct 15, 2019 at 06:50:31AM +0200, Nikola Ciprich wrote:
> > 
> > 
> > On Mon, Oct 14, 2019 at 11:52:55PM +0200, Paul Emmerich wrote:
> > > How big is the mon's DB?  As in just the total size of the directory you 
> > > copied
> > > 
> > > FWIW I recently had to perform mon surgery on a 14.2.4 (or was it
> > > 14.2.2?) cluster with 8 GB mon size and I encountered no such problems
> > > while syncing a new mon which took 10 minutes or so.
> > Hi Paul,
> > 
> > yup I forgot to mention this.. It doesn't seem to be too big, just about
> > 100MB. I also noticed that while third monitor tries to join the cluster,
> > leader starts flapping between "leader" and "electing", so I suppose it's
> > quorum forming problem.. I tried bumping debug_ms and debug_paxos but
> > couldn't make head or tails of it.. can paste the logs somewhere if it
> > can help
> 
> btw I just noticed, that on test cluster, third mon finally managed to join
> the cluster and forum got formed.. after more then 6 hours.. knowing that 
> during
> it, the IO blocks for clients, it's pretty scary
> 
> now I can stop/start monitors without problems on it.. so it somehow got 
> "fixed"
> 
> still dunno what to do with this production cluster though, so I'll just 
> prepare
> test environment again and try digging more into it
> 
> BR
> 
> nik
> 
> 
> 
> 
> 
> > 
> > BR
> > 
> > nik
> > 
> > 
> > 
> > > 
> > > Paul
> > > 
> > > -- 
> > > Paul Emmerich
> > > 
> > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > 
> > > croit GmbH
> > > Freseniusstr. 31h
> > > 81247 München
> > > www.croit.io
> > > Tel: +49 89 1896585 90
> > > 
> > > On Mon, Oct 14, 2019 at 9:41 PM Nikola Ciprich
> > >  wrote:
> > > >
> > > > On Mon, Oct 14, 2019 at 04:31:22PM +0200, Nikola Ciprich wrote:
> > > > > On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> > > > > > Probably same problem here. When I try to add another MON, "ceph
> > > > > > health" becomes mostly unresponsive. One of the existing ceph-mon
> > > > > > processes uses 100% CPU for several minutes. Tried it on 2 test
> > > > > > clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds
> > > > > > each). To avoid errors like "lease timeout", I temporarily increase
> > > > > > "mon lease", from 5 to 50 seconds.
> > > > > >
> > > > > > Not sure how bad it is from a customer PoV. But it is a problem by
> > > > > > itself to be several minutes without "ceph health", when there is an
> > > > > > increased risk of losing the quorum ...
> > > > >
> > > > > Hi Harry,
> > > > >
> > > > > thanks a lot for your reply! not sure we're experiencing the same 
> > > > > issue,
> > > > > i don't have it on any other cluster.. when this is happening to you, 
> > > > > does
> > > > > only ceph health stop working, or it also blocks all clients IO?
> > > > >
> > > > > BR
> > > > >
> > > > > nik
> > > > >
> > > > >
> > > > > >
> > > > > >  Harry
> > > > > >
> > > > > > On 13.10.19 20:26, Nikola Ciprich wrote:
> > > > > > >dear ceph users and developers,
> > > > > > >
> > > > > > >on one of our production clusters, we got into pretty unpleasant 
> > > > > > >situation.
> > > > > > >
> > > > > > >After rebooting one of the nodes, when trying to start monitor, 
> > > > >

Re: [ceph-users] problem returning mon back to cluster

2019-10-14 Thread Nikola Ciprich
On Tue, Oct 15, 2019 at 06:50:31AM +0200, Nikola Ciprich wrote:
> 
> 
> On Mon, Oct 14, 2019 at 11:52:55PM +0200, Paul Emmerich wrote:
> > How big is the mon's DB?  As in just the total size of the directory you 
> > copied
> > 
> > FWIW I recently had to perform mon surgery on a 14.2.4 (or was it
> > 14.2.2?) cluster with 8 GB mon size and I encountered no such problems
> > while syncing a new mon which took 10 minutes or so.
> Hi Paul,
> 
> yup I forgot to mention this.. It doesn't seem to be too big, just about
> 100MB. I also noticed that while third monitor tries to join the cluster,
> leader starts flapping between "leader" and "electing", so I suppose it's
> quorum forming problem.. I tried bumping debug_ms and debug_paxos but
> couldn't make head or tails of it.. can paste the logs somewhere if it
> can help

btw I just noticed that on the test cluster, the third mon finally managed to join
the cluster and quorum got formed.. after more than 6 hours.. knowing that during
that time IO is blocked for clients, it's pretty scary

now I can stop/start monitors on it without problems.. so it somehow got "fixed"

still don't know what to do with the production cluster though, so I'll just prepare
the test environment again and try digging into it some more

BR

nik





> 
> BR
> 
> nik
> 
> 
> 
> > 
> > Paul
> > 
> > -- 
> > Paul Emmerich
> > 
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > 
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> > 
> > On Mon, Oct 14, 2019 at 9:41 PM Nikola Ciprich
> >  wrote:
> > >
> > > On Mon, Oct 14, 2019 at 04:31:22PM +0200, Nikola Ciprich wrote:
> > > > On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> > > > > Probably same problem here. When I try to add another MON, "ceph
> > > > > health" becomes mostly unresponsive. One of the existing ceph-mon
> > > > > processes uses 100% CPU for several minutes. Tried it on 2 test
> > > > > clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds
> > > > > each). To avoid errors like "lease timeout", I temporarily increase
> > > > > "mon lease", from 5 to 50 seconds.
> > > > >
> > > > > Not sure how bad it is from a customer PoV. But it is a problem by
> > > > > itself to be several minutes without "ceph health", when there is an
> > > > > increased risk of losing the quorum ...
> > > >
> > > > Hi Harry,
> > > >
> > > > thanks a lot for your reply! not sure we're experiencing the same issue,
> > > > i don't have it on any other cluster.. when this is happening to you, 
> > > > does
> > > > only ceph health stop working, or it also blocks all clients IO?
> > > >
> > > > BR
> > > >
> > > > nik
> > > >
> > > >
> > > > >
> > > > >  Harry
> > > > >
> > > > > On 13.10.19 20:26, Nikola Ciprich wrote:
> > > > > >dear ceph users and developers,
> > > > > >
> > > > > >on one of our production clusters, we got into pretty unpleasant 
> > > > > >situation.
> > > > > >
> > > > > >After rebooting one of the nodes, when trying to start monitor, 
> > > > > >whole cluster
> > > > > >seems to hang, including IO, ceph -s etc. When this mon is stopped 
> > > > > >again,
> > > > > >everything seems to continue. Traying to spawn new monitor leads to 
> > > > > >the same problem
> > > > > >(even on different node).
> > > > > >
> > > > > >I had to give up after minutes of outage, since it's unacceptable. I 
> > > > > >think we had this
> > > > > >problem once in the past on this cluster, but after some (but much 
> > > > > >shorter) time, monitor
> > > > > >joined and it worked fine since then.
> > > > > >
> > > > > >All cluster nodes are centos 7 machines, I have 3 monitors (so 2 are 
> > > > > >now running), I'm
> > > > > >using ceph 13.2.6
> > > > > >
> > > > > >Network connection seems to be fine.
> > > > > >
> > > > > >Anyone seen similar problem? I'd be very grate

Re: [ceph-users] problem returning mon back to cluster

2019-10-14 Thread Nikola Ciprich



On Mon, Oct 14, 2019 at 11:52:55PM +0200, Paul Emmerich wrote:
> How big is the mon's DB?  As in just the total size of the directory you 
> copied
> 
> FWIW I recently had to perform mon surgery on a 14.2.4 (or was it
> 14.2.2?) cluster with 8 GB mon size and I encountered no such problems
> while syncing a new mon which took 10 minutes or so.
Hi Paul,

yup, I forgot to mention this.. it doesn't seem to be too big, just about
100MB. I also noticed that while the third monitor tries to join the cluster,
the leader starts flapping between "leader" and "electing", so I suppose it's
a quorum-forming problem.. I tried bumping debug_ms and debug_paxos but
couldn't make heads or tails of it.. I can paste the logs somewhere if that
helps

BR

nik



> 
> Paul
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
> 
> On Mon, Oct 14, 2019 at 9:41 PM Nikola Ciprich
>  wrote:
> >
> > On Mon, Oct 14, 2019 at 04:31:22PM +0200, Nikola Ciprich wrote:
> > > On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> > > > Probably same problem here. When I try to add another MON, "ceph
> > > > health" becomes mostly unresponsive. One of the existing ceph-mon
> > > > processes uses 100% CPU for several minutes. Tried it on 2 test
> > > > clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds
> > > > each). To avoid errors like "lease timeout", I temporarily increase
> > > > "mon lease", from 5 to 50 seconds.
> > > >
> > > > Not sure how bad it is from a customer PoV. But it is a problem by
> > > > itself to be several minutes without "ceph health", when there is an
> > > > increased risk of losing the quorum ...
> > >
> > > Hi Harry,
> > >
> > > thanks a lot for your reply! not sure we're experiencing the same issue,
> > > i don't have it on any other cluster.. when this is happening to you, does
> > > only ceph health stop working, or it also blocks all clients IO?
> > >
> > > BR
> > >
> > > nik
> > >
> > >
> > > >
> > > >  Harry
> > > >
> > > > On 13.10.19 20:26, Nikola Ciprich wrote:
> > > > >dear ceph users and developers,
> > > > >
> > > > >on one of our production clusters, we got into pretty unpleasant 
> > > > >situation.
> > > > >
> > > > >After rebooting one of the nodes, when trying to start monitor, whole 
> > > > >cluster
> > > > >seems to hang, including IO, ceph -s etc. When this mon is stopped 
> > > > >again,
> > > > >everything seems to continue. Traying to spawn new monitor leads to 
> > > > >the same problem
> > > > >(even on different node).
> > > > >
> > > > >I had to give up after minutes of outage, since it's unacceptable. I 
> > > > >think we had this
> > > > >problem once in the past on this cluster, but after some (but much 
> > > > >shorter) time, monitor
> > > > >joined and it worked fine since then.
> > > > >
> > > > >All cluster nodes are centos 7 machines, I have 3 monitors (so 2 are 
> > > > >now running), I'm
> > > > >using ceph 13.2.6
> > > > >
> > > > >Network connection seems to be fine.
> > > > >
> > > > >Anyone seen similar problem? I'd be very grateful for tips on how to 
> > > > >debug and solve this..
> > > > >
> > > > >for those interested, here's log of one of running monitors with 
> > > > >debug_mon set to 10/10:
> > > > >
> > > > >https://storage.lbox.cz/public/d258d0
> > > > >
> > > > >if I could provide more info, please let me know
> > > > >
> > > > >with best regards
> > > > >
> > > > >nikola ciprich
> >
> > just to add quick update, I was able to reproduce the issue by transferring 
> > monitor
> > directories to test environmen with same IP adressing, so I can safely play 
> > with that
> > now..
> >
> > increasing lease timeout didn't help me to fix the problem,
> > but at least I seem to be able to use ceph -s now.
> >
> > few things I noticed in the meantime:
> >
> > - when

Re: [ceph-users] problem returning mon back to cluster

2019-10-14 Thread Nikola Ciprich
On Mon, Oct 14, 2019 at 04:31:22PM +0200, Nikola Ciprich wrote:
> On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> > Probably same problem here. When I try to add another MON, "ceph
> > health" becomes mostly unresponsive. One of the existing ceph-mon
> > processes uses 100% CPU for several minutes. Tried it on 2 test
> > clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds
> > each). To avoid errors like "lease timeout", I temporarily increase
> > "mon lease", from 5 to 50 seconds.
> > 
> > Not sure how bad it is from a customer PoV. But it is a problem by
> > itself to be several minutes without "ceph health", when there is an
> > increased risk of losing the quorum ...
> 
> Hi Harry,
> 
> thanks a lot for your reply! not sure we're experiencing the same issue,
> i don't have it on any other cluster.. when this is happening to you, does
> only ceph health stop working, or it also blocks all clients IO?
> 
> BR
> 
> nik
> 
> 
> > 
> >  Harry
> > 
> > On 13.10.19 20:26, Nikola Ciprich wrote:
> > >dear ceph users and developers,
> > >
> > >on one of our production clusters, we got into pretty unpleasant situation.
> > >
> > >After rebooting one of the nodes, when trying to start monitor, whole 
> > >cluster
> > >seems to hang, including IO, ceph -s etc. When this mon is stopped again,
> > >everything seems to continue. Traying to spawn new monitor leads to the 
> > >same problem
> > >(even on different node).
> > >
> > >I had to give up after minutes of outage, since it's unacceptable. I think 
> > >we had this
> > >problem once in the past on this cluster, but after some (but much 
> > >shorter) time, monitor
> > >joined and it worked fine since then.
> > >
> > >All cluster nodes are centos 7 machines, I have 3 monitors (so 2 are now 
> > >running), I'm
> > >using ceph 13.2.6
> > >
> > >Network connection seems to be fine.
> > >
> > >Anyone seen similar problem? I'd be very grateful for tips on how to debug 
> > >and solve this..
> > >
> > >for those interested, here's log of one of running monitors with debug_mon 
> > >set to 10/10:
> > >
> > >https://storage.lbox.cz/public/d258d0
> > >
> > >if I could provide more info, please let me know
> > >
> > >with best regards
> > >
> > >nikola ciprich

just to add a quick update: I was able to reproduce the issue by transferring the
monitor directories to a test environment with the same IP addressing, so I can
safely play with it now..

increasing the lease timeout didn't help me fix the problem,
but at least I seem to be able to use ceph -s now.

a few things I noticed in the meantime:

- when I start the problematic monitor, monitor slow ops start to appear for the
quorum leader and their count slowly increases:

44 slow ops, oldest one blocked for 130 sec, mon.nodev1c has slow 
ops
 
- removing and recreating the monitor didn't help

- checking mon_status of the problematic monitor shows it remains in the
"synchronizing" state

I tried increasing debug_ms and debug_paxos but didn't see anything useful
there..
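
(for completeness, a sketch of the checks mentioned above via the admin socket;
<id> is a placeholder for the monitor's name and the socket path is assumed to
be the default)

ceph daemon mon.<id> mon_status            # shows "state": "synchronizing"
ceph daemon mon.<id> config set debug_ms 10
ceph daemon mon.<id> config set debug_paxos 10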

I will report further when I have something. If anyone has any idea in the
meantime, please let me know.

BR

nik




-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem returning mon back to cluster

2019-10-14 Thread Nikola Ciprich
On Mon, Oct 14, 2019 at 01:40:19PM +0200, Harald Staub wrote:
> Probably same problem here. When I try to add another MON, "ceph
> health" becomes mostly unresponsive. One of the existing ceph-mon
> processes uses 100% CPU for several minutes. Tried it on 2 test
> clusters (14.2.4, 3 MONs, 5 storage nodes with around 2 hdd osds
> each). To avoid errors like "lease timeout", I temporarily increase
> "mon lease", from 5 to 50 seconds.
> 
> Not sure how bad it is from a customer PoV. But it is a problem by
> itself to be several minutes without "ceph health", when there is an
> increased risk of losing the quorum ...

Hi Harry,

thanks a lot for your reply! not sure we're experiencing the same issue,
I don't have it on any other cluster.. when this is happening to you, does
only ceph health stop working, or does it also block all client IO?

BR

nik


> 
>  Harry
> 
> On 13.10.19 20:26, Nikola Ciprich wrote:
> >dear ceph users and developers,
> >
> >on one of our production clusters, we got into pretty unpleasant situation.
> >
> >After rebooting one of the nodes, when trying to start monitor, whole cluster
> >seems to hang, including IO, ceph -s etc. When this mon is stopped again,
> >everything seems to continue. Traying to spawn new monitor leads to the same 
> >problem
> >(even on different node).
> >
> >I had to give up after minutes of outage, since it's unacceptable. I think 
> >we had this
> >problem once in the past on this cluster, but after some (but much shorter) 
> >time, monitor
> >joined and it worked fine since then.
> >
> >All cluster nodes are centos 7 machines, I have 3 monitors (so 2 are now 
> >running), I'm
> >using ceph 13.2.6
> >
> >Network connection seems to be fine.
> >
> >Anyone seen similar problem? I'd be very grateful for tips on how to debug 
> >and solve this..
> >
> >for those interested, here's log of one of running monitors with debug_mon 
> >set to 10/10:
> >
> >https://storage.lbox.cz/public/d258d0
> >
> >if I could provide more info, please let me know
> >
> >with best regards
> >
> >nikola ciprich
> >
> >
> >
> >
> >
> >
> >
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem returning mon back to cluster

2019-10-13 Thread Nikola Ciprich
dear ceph users and developers,

on one of our production clusters, we got into a pretty unpleasant situation.

After rebooting one of the nodes, when trying to start its monitor, the whole cluster
seems to hang, including IO, ceph -s etc. When this mon is stopped again,
everything seems to continue. Trying to spawn a new monitor leads to the same
problem (even on a different node).

I had to give up after minutes of outage, since it's unacceptable. I think we had
this problem once in the past on this cluster, but after some (though much shorter)
time the monitor joined, and it has worked fine since then.

All cluster nodes are CentOS 7 machines, I have 3 monitors (so 2 are now
running), and I'm using ceph 13.2.6.

Network connection seems to be fine.

Has anyone seen a similar problem? I'd be very grateful for tips on how to debug
and solve this..

for those interested, here's a log of one of the running monitors with debug_mon
set to 10/10:

https://storage.lbox.cz/public/d258d0
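
(the debug level was raised at runtime roughly like this; just a sketch, <id>
being the monitor's name)

ceph tell mon.<id> injectargs '--debug_mon 10/10'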

if I can provide more info, please let me know

with best regards

nikola ciprich







-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unable to manually flush cache: failed to flush /xxx: (2) No such file or directory

2019-04-24 Thread Nikola Ciprich
Hi,

we're having an issue on one of our clusters: while trying to remove a cache
tier, manually flushing the cache always ends up with errors:

rados -p ssd-cache cache-flush-evict-all
.
.
.
failed to flush /rb.0.965780.238e1f29.1641: (2) No such file or 
directory
   rb.0.965780.238e1f29.02c8
failed to flush /rb.0.965780.238e1f29.02c8: (2) No such file or 
directory
   rb.0.965780.238e1f29.9113
failed to flush /rb.0.965780.238e1f29.9113: (2) No such file or 
directory
   rb.0.965780.238e1f29.9b0f
failed to flush /rb.0.965780.238e1f29.9b0f: (2) No such file or 
directory
   rb.0.965780.238e1f29.62b6
failed to flush /rb.0.965780.238e1f29.62b6: (2) No such file or 
directory
   rb.0.965780.238e1f29.030c
.
.
.


the cluster is healthy, running 13.2.5

any idea what might be wrong?
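
(in case it helps, a sketch of how one might inspect one of the failing objects;
<object> is a placeholder, the names above look truncated in the archive)

rados -p ssd-cache stat <object>        # does the head object still exist?
rados -p ssd-cache listsnaps <object>   # are there snapshot clones pinning it?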

if I should provide more details, please let me know

BR

nik



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Nikola Ciprich
Hi,

on one of my clusters, I'm getting an error message which is making me a bit
nervous.. while listing the contents of a pool, I get an error for one of the
images:

[root@node1 ~]# rbd ls -l nvme > /dev/null
rbd: error processing image  xxx: (2) No such file or directory

[root@node1 ~]# rbd info nvme/xxx
rbd image 'xxx':
size 60 GiB in 15360 objects
order 22 (4 MiB objects)
id: 132773d6deb56
block_name_prefix: rbd_data.132773d6deb56
format: 2
features: layering, operations
op_features: snap-trash
flags: 
create_timestamp: Wed Aug 29 12:25:13 2018

the volume contains production data and seems to be working correctly (it's used
by a VM)

is this something to worry about? What is the snap-trash feature? I wasn't able
to google much about it..
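
(for the record, the basic checks I'd start with; only commands I'm sure exist
on this release, deeper snapshot-trash introspection may need a newer rbd CLI)

rbd snap ls nvme/xxx     # regular snapshots of the image
rbd status nvme/xxx      # current watchers, to confirm the VM still has it open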

I'm running ceph 13.2.4 on centos 7.

I'd be grateful for any help

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2018-08-14 Thread Nikola Ciprich
> > Hi Ilya,
> >
> > hmm, OK, I'm not  sure now whether this is the bug which I'm
> > experiencing.. I've had read_partial_message  / bad crc/signature
> > problem occurance on the second cluster in short period even though
> > we're on the same ceph version (12.2.5) for quite long time (almost since
> > its release), so it's starting to pain me.. I suppose this must
> > have been caused by some kernel update, (we're currently sticking
> > to 4.14.x and lately been upgrading to 4.14.50)
> 
> These "bad crc/signature" are usually the sign of faulty hardware.
> 
> What was the last "good" kernel and the first "bad" kernel?
> 
> You said "on the second cluster".  How is it different from the first?
> Are you using the kernel client with both?  Is there Xen involved?

it's complicated.. both of those clusters are fairly new, running kernel 4.14.50
and ceph 12.2.5. Xen is not involved, but KVM is.

I think they were already installed with this kernel.

I was thinking about that, and the main difference compared to the other (and
older) clusters is that krbd is used much more: before, we were using krbd only
for postgres, and qemu-kvm accessed RBD volumes using librbd. On the new clusters
where the problems occurred, all volumes are accessed using krbd, since it
performs much better.. so we'll just revert to librbd and I'll try to find a way
to reproduce it. If I find one, we can talk about bisecting, but it's possible
the problem has been here for a long time and simply didn't show up because we
didn't use krbd heavily..

but I think we can rule out a hardware problem here..


> 
> Thanks,
> 
> Ilya
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2018-08-13 Thread Nikola Ciprich
Hi Ilya,

hmm, OK, I'm not sure now whether this is the bug I'm experiencing.. I've had
the read_partial_message / bad crc/signature problem occur on a second cluster
within a short period, even though we've been on the same ceph version (12.2.5)
for quite a long time (almost since its release), so it's starting to pain me..
I suppose this must have been caused by some kernel update (we're currently
sticking to 4.14.x and have lately been upgrading to 4.14.50)

not sure whether this is of any use..

BR

nik






On Mon, Aug 13, 2018 at 03:22:21PM +0200, Ilya Dryomov wrote:
> On Mon, Aug 13, 2018 at 2:49 PM Nikola Ciprich
>  wrote:
> >
> > Hi Paul,
> >
> > thanks, I'll give it a try.. do you think this might head to
> > upstream soon?  for some reason I can't review comments for
> > this patch on github.. Is some new version of this patch
> > on the way, or can I try to apply this one to latest luminous?
> >
> > thanks a lot!
> >
> > nik
> >
> >
> > On Fri, Aug 10, 2018 at 06:05:26PM +0200, Paul Emmerich wrote:
> > > I've built a work-around here:
> > > https://github.com/ceph/ceph/pull/23273
> 
> Those are completely different crc errors.  The ones Paul is talking
> about occur in bluestore when fetching data from the underlying disk.
> When they occur, there is no data to reply with to the client.  Paul's
> pull request is working around that (likely a bug in the core kernel)
> by adding up to two retries.
> 
> The ones this thread is about occur on the client side when receiving
> a reply from the OSD.  The retry logic is already there: the connection
> is cut, the client reconnects and resends the OSD request.
> 
> Thanks,
> 
> Ilya
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bad crc/signature errors

2018-08-13 Thread Nikola Ciprich
Hi Paul,

thanks, I'll give it a try.. do you think this might head upstream soon? for
some reason I can't see the review comments for this patch on github.. Is a new
version of this patch on the way, or can I try to apply this one to the latest
luminous?

thanks a lot!

nik


On Fri, Aug 10, 2018 at 06:05:26PM +0200, Paul Emmerich wrote:
> I've built a work-around here:
> https://github.com/ceph/ceph/pull/23273
> 
> 
> Paul
> 
> 2018-08-10 12:51 GMT+02:00 Nikola Ciprich :
> 
> > Hi,
> >
> > did this ever come to some conclusion? I've recently started seeing
> > those messages on one luminous cluster and am not sure whethere
> > those are dangerous or not..
> >
> > BR
> >
> > nik
> >
> >
> > On Fri, Oct 06, 2017 at 05:37:00PM +0200, Olivier Bonvalet wrote:
> > > Le jeudi 05 octobre 2017 à 21:52 +0200, Ilya Dryomov a écrit :
> > > > On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet  > > > > wrote:
> > > > > Le jeudi 05 octobre 2017 à 17:03 +0200, Ilya Dryomov a écrit :
> > > > > > When did you start seeing these errors?  Can you correlate that
> > > > > > to
> > > > > > a ceph or kernel upgrade?  If not, and if you don't see other
> > > > > > issues,
> > > > > > I'd write it off as faulty hardware.
> > > > >
> > > > > Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13),
> > > > > which
> > > >
> > > > Is that 4.1.13 or 4.13.1?
> > > >
> > >
> > > Linux 4.1.13. The old Debian 8, with Xen 4.6 from upstream.
> > >
> > >
> > > > > have the problem for a long time, at least since 1 month (I haven't
> > > > > older logs).
> > > > >
> > > > > But, on others hypervisors (Xen 4.8 with Linux 4.9.x), I haven't
> > > > > the
> > > > > problem.
> > > > > And it's when I upgraded thoses hypervisors to Linux 4.13.x, that
> > > > > "bad
> > > > > crc" errors appeared.
> > > > >
> > > > > Note : if I upgraded kernels on Xen 4.8 hypervisors, it's because
> > > > > some
> > > > > DISCARD commands over RBD were blocking ("fstrim" works, but not
> > > > > "lvremove" with discard enabled). After upgrading to Linux 4.13.3,
> > > > > DISCARD works again on Xen 4.8.
> > > >
> > > > Which kernel did you upgrade from to 4.13.3 exactly?
> > > >
> > > >
> > >
> > > 4.9.47 or 4.9.52, I don't have more precise data about this.
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> > --
> > -----
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Re : Re : Re : bad crc/signature errors

2018-08-10 Thread Nikola Ciprich
Hi,

did this ever come to some conclusion? I've recently started seeing these
messages on one luminous cluster and am not sure whether they are dangerous or
not..

BR

nik


On Fri, Oct 06, 2017 at 05:37:00PM +0200, Olivier Bonvalet wrote:
> Le jeudi 05 octobre 2017 à 21:52 +0200, Ilya Dryomov a écrit :
> > On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet  > > wrote:
> > > Le jeudi 05 octobre 2017 à 17:03 +0200, Ilya Dryomov a écrit :
> > > > When did you start seeing these errors?  Can you correlate that
> > > > to
> > > > a ceph or kernel upgrade?  If not, and if you don't see other
> > > > issues,
> > > > I'd write it off as faulty hardware.
> > > 
> > > Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13),
> > > which
> > 
> > Is that 4.1.13 or 4.13.1?
> > 
> 
> Linux 4.1.13. The old Debian 8, with Xen 4.6 from upstream.
> 
> 
> > > have the problem for a long time, at least since 1 month (I haven't
> > > older logs).
> > > 
> > > But, on others hypervisors (Xen 4.8 with Linux 4.9.x), I haven't
> > > the
> > > problem.
> > > And it's when I upgraded thoses hypervisors to Linux 4.13.x, that
> > > "bad
> > > crc" errors appeared.
> > > 
> > > Note : if I upgraded kernels on Xen 4.8 hypervisors, it's because
> > > some
> > > DISCARD commands over RBD were blocking ("fstrim" works, but not
> > > "lvremove" with discard enabled). After upgrading to Linux 4.13.3,
> > > DISCARD works again on Xen 4.8.
> > 
> > Which kernel did you upgrade from to 4.13.3 exactly?
> > 
> > 
> 
> 4.9.47 or 4.9.52, I don't have more precise data about this.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd vs librbd performance with qemu

2018-07-19 Thread Nikola Ciprich
> > opts="--randrepeat=1 --ioengine=rbd --direct=1 --numjobs=${numjobs}
> > --gtod_reduce=1 --name=test --pool=${pool} --rbdname=${vol} --invalidate=0
> > --bs=4k --iodepth=64 --time_based --runtime=$time --group_reporting"
> >
> 
> So that "--numjobs" parameter is what I was referring to when I said
> multiple jobs will cause a huge performance it. This causes fio to open the
> same image X images, so with (nearly) each write operation, the
> exclusive-lock is being moved from client-to-client. Instead of multiple
> jobs against the same image, you should use multiple images.

ah, I see, I didn't realize that... thanks a lot for the valuable info!
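
For anyone else benchmarking this way, a rough sketch of what the multi-image
variant could look like (the pool name is taken from the script further down in
the thread; the vol1..vol4 image names are placeholders you would have to create
first):

#!/bin/bash
# one fio job per RBD image, so the exclusive-lock never has to move between clients
pool=nvme
time=30

jobfile=$(mktemp)
cat > "$jobfile" <<EOF
[global]
ioengine=rbd
pool=${pool}
direct=1
bs=4k
iodepth=64
rw=randwrite
time_based
runtime=${time}
group_reporting

[vol1]
rbdname=vol1

[vol2]
rbdname=vol2

[vol3]
rbdname=vol3

[vol4]
rbdname=vol4
EOF

fio "$jobfile"
rm -f "$jobfile"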

n.




-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd vs librbd performance with qemu

2018-07-18 Thread Nikola Ciprich
> Care to share your "bench-rbd" script (on pastebin or similar)?
sure, no problem.. it's so short I hope nobody will get offended if I paste it
right here :)

#!/bin/bash

#export LD_PRELOAD="/usr/lib64/libtcmalloc.so.4"
numjobs=8
pool=nvme
vol=xxx
time=30

opts="--randrepeat=1 --ioengine=rbd --direct=1 --numjobs=${numjobs} 
--gtod_reduce=1 --name=test --pool=${pool} --rbdname=${vol} --invalidate=0 
--bs=4k --iodepth=64 --time_based --runtime=$time --group_reporting"

sopts="--randrepeat=1 --ioengine=rbd --direct=1 --numjobs=1 --gtod_reduce=1 
--name=test --pool=${pool} --rbdname=${vol} --invalidate=0 --bs=256k 
--iodepth=64 --time_based --runtime=$time --group_reporting"

#fio $sopts --readwrite=read --output=rbd-fio-seqread.log
echo

#fio $sopts --readwrite=write --output=rbd-fio-seqwrite.log
echo

fio $opts --readwrite=randread --output=rbd-fio-randread.log
echo

fio $opts --readwrite=randwrite --output=rbd-fio-randwrite.log
echo


hope it's of some use..

n.


-- 
---------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd vs librbd performance with qemu

2018-07-18 Thread Nikola Ciprich
> What's the output from "rbd info nvme/centos7"?
that was it! the parent had some unsupported features enabled, therefore the
child could not be mapped..

so the error message is a bit confusing, but after disabling the features on
the parent it now works for me, thanks!
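
(roughly what the fix looks like; the feature names are my guess from the 0x38
bits in the dmesg output elsewhere in this thread, i.e. object-map, fast-diff
and deep-flatten, so check "rbd info nvme/centos7" first)

rbd feature disable nvme/centos7 object-map fast-diff deep-flatten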

> Odd. The exclusive-lock code is only executed once (in general) upon the
> first write IO (or immediately upon mapping the image if the "exclusive"
> option is passed to the kernel). Therefore, it should have zero impact on
> IO performance.

hmm, then I might have found a bug..

[root@v4a bench1]# sh bench-rbd 
Jobs: 8 (f=8): [r(8)][100.0%][r=671MiB/s,w=0KiB/s][r=172k,w=0 IOPS][eta 00m:00s]
Jobs: 8 (f=8): [w(8)][100.0%][r=0KiB/s,w=230MiB/s][r=0,w=58.8k IOPS][eta 
00m:00s]

[root@v4a bench1]# rbd feature enable nvme/xxx exclusive-lock
[root@v4a bench1]# sh bench-rbd
Jobs: 8 (f=8): [r(8)][100.0%][r=651MiB/s,w=0KiB/s][r=167k,w=0 IOPS][eta 00m:00s]
Jobs: 8 (f=8): [w(8)][100.0%][r=0KiB/s,w=45.9MiB/s][r=0,w=11.7k IOPS][eta 
00m:00s]

(as you can see, the performance impact is even worse..)

I guess I should create a bug report for this one?

nik



> 
> 
> >
> > BR
> >
> > nik
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28. rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> >
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> >
> 
> 
> -- 
> Jason

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


pgp1jfmQEJQAu.pgp
Description: PGP signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] krbd vs librbd performance with qemu

2018-07-18 Thread Nikola Ciprich
Hi Jason,

> Just to clarify: modern / rebased krbd block drivers definitely support
> layering. The only missing features right now are object-map/fast-diff,
> deep-flatten, and journaling (for RBD mirroring).

I thought so as well, but at least mapping a clone does not work for me even
under 4.17.6:


[root@v4a ~]# rbd map nvme/xxx
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the 
kernel with "rbd feature disable nvme/xxx".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

(note the incorrect hint on how this is supposed to be fixed: the feature disable
command is shown without any features)

dmesg output:

[  +3.919281] rbd: image xxx: WARNING: kernel layering is EXPERIMENTAL!
[  +0.001266] rbd: id 36dde238e1f29: image uses unsupported features: 0x38


[root@v4a ~]# rbd info nvme/xxx
rbd image 'xxx':
size 20480 MB in 5120 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.6a71313887ee0
format: 2
features: layering
flags: 
create_timestamp: Wed Jun 20 13:46:38 2018
parent: nvme/centos7@template
overlap: 20480 MB

is it worth giving 4.18-rc5 a try?

> If you are running multiple fio jobs against the same image (or have the
> krbd device mapped to multiple hosts w/ active IO), then I would expect a
> huge performance hit since the lock needs to be transitioned between
> clients.

nope, only one fio instance running, and no users on the other node..

BR

nik


-- 
---------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] krbd vs librbd performance with qemu

2018-07-18 Thread Nikola Ciprich
Hi,

historically I've found many discussions about this topic over the last few
years, but it still seems to me to be a bit unresolved, so I'd like to open the
question again..

In all-flash deployments, under 12.2.5 luminous and qemu 12.2.0 using librbd,
I'm getting much worse results regarding IOPS than with krbd and direct block
device access..

I'm testing on the same 100GB RBD volume; notable ceph settings:

client rbd cache disabled
osd_enable_op_tracker = False
osd_op_num_shards = 64
osd_op_num_threads_per_shard = 1

osds are running bluestore, 2 replicas (it's just for testing)

when I run fio using librbd directly, I'm getting ~160k reads/s
and ~60k writes/s, which is not that bad.

however, when I run fio on a block device inside a VM (qemu using librbd),
I'm getting only 60/40k op/s, which is a huge loss..

when I use a VM with block access to a krbd-mapped device, the numbers
are much better: I'm getting something like 115/40k op/s, which
is not ideal, but still much better.. I've tried many optimizations
and configuration variants (multiple queues, threads vs native aio
etc.), but krbd still performs much better..
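
To make the two setups concrete, this is roughly what the two access paths look
like (a sketch only; pool/volume names and the rest of the qemu command line are
placeholders):

# librbd path: qemu talks to the cluster directly
qemu-system-x86_64 ... \
  -drive file=rbd:nvme/vm1:conf=/etc/ceph/ceph.conf,format=raw,if=virtio,cache=none

# krbd path: map the volume on the host, then hand the block device to qemu
rbd map nvme/vm1
qemu-system-x86_64 ... \
  -drive file=/dev/rbd/nvme/vm1,format=raw,if=virtio,cache=none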

My question is whether this is expected, or should both access methods
give more similar results? If possible, I'd like to stick to librbd
(especially because krbd still lacks layering support, but there are
more reasons).

interestingly, when I compare direct fio access to ceph, librbd performs
better than krbd, but this doesn't concern me that much..

another question: during the tests, I noticed that enabling the exclusive-lock
feature degrades write IOPS a lot as well; is this expected? (the performance
falls to something like 50%)

I'm doing the tests on a small 2-node cluster; the VMs are running directly on
the ceph nodes, everything is CentOS 7 with a 4.14 kernel. (I know it's not
recommended to run VMs directly on ceph nodes, but for small deployments it's
necessary for us)

if I can provide more details, I'll be happy to do so

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous - 12.2.1 - stale RBD locks after client crash

2017-11-23 Thread Nikola Ciprich
Hello Jason,

you're right! I did the upgrade according to the docs you mentioned, but I must
have completely overlooked this step with the caps..
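
(for the archives: the fix is adjusting the client caps as described in step 6
of the upgrade notes; a sketch only, with the client name and pool as
placeholders)

ceph auth caps client.libvirt \
    mon 'profile rbd' \
    osd 'profile rbd pool=rbd'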

thanks a lot for the help!

with best regards

nik


On Wed, Nov 22, 2017 at 07:52:31AM -0500, Jason Dillaman wrote:
> See previous threads about this subject [1][2] and see step 6 in the
> upgrade notes [3].
> 
> [1] 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020722.html
> [2] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg41718.html
> [3] 
> http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken
> 
> On Wed, Nov 22, 2017 at 2:50 AM, Nikola Ciprich
> <nikola.cipr...@linuxbox.cz> wrote:
> > Hello ceph users and developers,
> >
> > I've stumbled upon a bit strange problem with Luminous.
> >
> > One of our servers running multiple QEMU clients crashed.
> > When we tried restarting those on another cluster node,
> > we got lots of fsck errors, disks seemed to return "physical"
> > block errors. I figured this out to be stale RBD locks on volumes
> > from the crashed machine. Wnen I removed the locks, everything
> > started to work. (for some volumes, I was fixing those the another
> > day after crash, so it was >10-15hours later)
> >
> > My question is, it this a bug or feature? I mean, after the client
> > crashes, should locks somehow expire, or they need to be removed
> > by hand? I don't remember having this issue with older ceph versions,
> > but I suppose we didn't  have exclusive locks feature enabled..
> >
> > I'll be very grateful for any reply
> >
> > with best regards
> >
> > nik
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > ---------
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] luminous - 12.2.1 - stale RBD locks after client crash

2017-11-21 Thread Nikola Ciprich
Hello ceph users and developers,

I've stumbled upon a somewhat strange problem with Luminous.

One of our servers running multiple QEMU clients crashed.
When we tried restarting those on another cluster node,
we got lots of fsck errors; the disks seemed to return "physical"
block errors. I figured this out to be stale RBD locks on volumes
from the crashed machine. When I removed the locks, everything
started to work. (for some volumes, I was fixing this the day
after the crash, so it was >10-15 hours later)
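
(the locks were listed and removed roughly like this; pool/image are
placeholders, and the lock id and locker come from the first command's output)

rbd lock list nvme/vm1
rbd lock remove nvme/vm1 <lock-id> <locker>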

My question is: is this a bug or a feature? I mean, after the client
crashes, should the locks somehow expire, or do they need to be removed
by hand? I don't remember having this issue with older ceph versions,
but I suppose we didn't have the exclusive-lock feature enabled..

I'll be very grateful for any reply

with best regards

nik
-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-08-07 Thread Nikola Ciprich
Hi,

I tried balancing the number of OSDs per node, set their weights the same, and
increased the recovery op priority, but it still takes ages to recover..
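
(for reference, the runtime bump suggested by cgxu below can be done roughly
like this; 63 is the top of the priority scale, and it's worth reverting
afterwards)

ceph tell osd.* injectargs '--osd_recovery_op_priority 63'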

I've got my cluster OK now, so I'll try switching to kraken to see if
it behaves better..

nik



On Mon, Aug 07, 2017 at 11:36:10PM +0800, cgxu wrote:
> I encountered same issue today and I solved problem by adjusting "osd 
> recovery op priority” to 63 temporarily.
> 
> It looks like recovery PUSH/PULL op starved in op_wq prioritized queue and 
> I’ve never experienced in hammer version.
> 
> Any other idea? 
> 
> 
> > Hi,
> > 
> > I'm trying to find reason for strange recovery issues I'm seeing on
> > our cluster..
> > 
> > it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> > across nodes. jewel 10.2.9
> > 
> > the problem is that after some disk replaces and data moves, recovery
> > is progressing extremely slowly.. pgs seem to be stuck in 
> > active+recovering+degraded
> > state:
> > 
> > [root@v1d ~]# ceph -s
> > cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
> >  health HEALTH_WARN
> > 159 pgs backfill_wait
> > 4 pgs backfilling
> > 259 pgs degraded
> > 12 pgs recovering
> > 113 pgs recovery_wait
> > 215 pgs stuck degraded
> > 266 pgs stuck unclean
> > 140 pgs stuck undersized
> > 151 pgs undersized
> > recovery 37788/2327775 objects degraded (1.623%)
> > recovery 23854/2327775 objects misplaced (1.025%)
> > noout,noin flag(s) set
> >  monmap e21: 3 mons at 
> > {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
> > election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
> >   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
> >  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
> > flags noout,noin,sortbitwise,require_jewel_osds
> >   pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
> > 9215 GB used, 35572 GB / 45365 GB avail
> > 37788/2327775 objects degraded (1.623%)
> > 23854/2327775 objects misplaced (1.025%)
> > 2912 active+clean
> >  130 active+undersized+degraded+remapped+wait_backfill
> >   97 active+recovery_wait+degraded
> >   29 active+remapped+wait_backfill
> >   12 active+recovery_wait+undersized+degraded+remapped
> >6 active+recovering+degraded
> >5 active+recovering+undersized+degraded+remapped
> >4 active+undersized+degraded+remapped+backfilling
> >4 active+recovery_wait+degraded+remapped
> >1 active+recovering+degraded+remapped
> >   client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
> > 
> > 
> >  when I restart affected OSDs, it bumps the recovery, but then another
> > PGs get stuck.. All OSDs were restarted multiple times, none are even close 
> > to
> > nearfull, I just cant find what I'm doing wrong..
> > 
> > possibly related OSD options:
> > 
> > osd max backfills = 4
> > osd recovery max active = 15
> > debug osd = 0/0
> > osd op threads = 4
> > osd backfill scan min = 4
> > osd backfill scan max = 16
> > 
> > Any hints would be greatly appreciated
> > 
> > thanks
> > 
> > nik
> > 
> > 
> > -- 
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> > 
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz <http://www.linuxbox.cz/>
> > 
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> > <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
> > 
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
t;,
"items": [
{
"id": 1,
"weight": 104857,
"pos": 0
},
{
"id": 3,
"weight": 117964,
"pos": 1
},
{
"id": 9,
"weight": 104857,
"pos": 2
},
{
"id": 11,
"weight": 117964,
"pos": 3
},
{
"id": 24,
"weight": 235929,
"pos": 4
}
]
},
{
"id": -6,
"name": "v1c",
"type_id": 1,
"type_name": "host",
"weight": 511178,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 14,
"weight": 104857,
"pos": 0
},
{
"id": 15,
"weight": 117964,
"pos": 1
},
{
"id": 16,
"weight": 91750,
"pos": 2
},
{
"id": 18,
"weight": 91750,
"pos": 3
},
{
"id": 17,
"weight": 104857,
"pos": 4
}
]
},
{
"id": -7,
"name": "v1d-ssd",
"type_id": 1,
"type_name": "host",
"weight": 14417,
"alg": "straw",
"hash": "rjenkins1",
"items": [
{
"id": 19,
"weight": 14417,
"pos": 0
}
]
},
{
"id": -9,
"name": "v1c-ssd",
"type_id": 1,
"type_name": "host",
"weight": 26214,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 10,
"weight": 26214,
"pos": 0
}
]
},
{
"id": -10,
"name": "v1a-ssd",
"type_id": 1,
"type_name": "host",
"weight": 39320,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 5,
"weight": 19660,
"pos": 0
},
{
"id": 26,
"weight": 19660,
"pos": 1
}
]
},
{
"id": -11,
"name": "v1b-ssd",
"type_id": 1,
"type_name": "host",
"weight": 22282,
"alg": "straw2",
"hash": "rjenkins1",
"items": [
{
"id": 13,
"weight": 22282,
"pos": 0
}
]
}
],
"rules": [
{
"rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich

On Fri, Jul 28, 2017 at 05:43:14PM +0800, linghucongsong wrote:
> 
> 
> It look like the osd in your cluster is not all the same size.
> 
> can you show ceph osd df output?

you're right, they're not..  here's the output:

[root@v1b ~]# ceph osd df tree
ID  WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS TYPE NAME 
 -2  1.55995-  1706G  883G   805G 51.78 2.55   0 root ssd  
 -9  0.3-   393G  221G   171G 56.30 2.78   0 host v1c-ssd  
 10  0.3  1.0   393G  221G   171G 56.30 2.78  98 osd.10
-10  0.59998-   683G  275G   389G 40.39 1.99   0 host v1a-ssd  
  5  0.2  1.0   338G  151G   187G 44.77 2.21  65 osd.5 
 26  0.2  1.0   344G  124G   202G 36.07 1.78  52 osd.26
-11  0.34000-   338G  219G   119G 64.68 3.19   0 host v1b-ssd  
 13  0.34000  1.0   338G  219G   119G 64.68 3.19  96 osd.13
 -7  0.21999-   290G  166G   123G 57.43 2.83   0 host v1d-ssd  
 19  0.21999  1.0   290G  166G   123G 57.43 2.83  73 osd.19
 -1 39.29982- 43658G 8312G 34787G 19.04 0.94   0 root default  
 -4 11.89995- 12806G 2422G 10197G 18.92 0.93   0 host v1a  
  6  1.5  1.0  1833G  358G  1475G 19.53 0.96 366 osd.6 
  8  1.7  1.0  1833G  313G  1519G 17.11 0.84 370 osd.8 
  2  1.5  1.0  1833G  320G  1513G 17.46 0.86 331 osd.2 
  0  1.7  1.0  1804G  431G  1373G 23.90 1.18 359 osd.0 
  4  1.5  1.0  1833G  294G  1539G 16.07 0.79 360 osd.4 
 25  3.5  1.0  3667G  704G  2776G 19.22 0.95 745 osd.25
 -5 10.39995- 10914G 2154G  8573G 19.74 0.97   0 host v1b  
  1  1.5  1.0  1804G  350G  1454G 19.42 0.96 409 osd.1 
  3  1.7  1.0  1804G  360G  1444G 19.98 0.99 412 osd.3 
  9  1.5  1.0  1804G  331G  1473G 18.37 0.91 363 osd.9 
 11  1.7  1.0  1833G  367G  1465G 20.06 0.99 415 osd.11
 24  3.5  1.0  3667G  744G  2736G 20.30 1.00 834 osd.24
 -6  7.79996-  9051G 1769G  7282G 19.54 0.96   0 host v1c  
 14  1.5  1.0  1804G  370G  1433G 20.54 1.01 442 osd.14
 15  1.7  1.0  1833G  383G  1450G 20.92 1.03 447 osd.15
 16  1.3  1.0  1804G  295G  1508G 16.38 0.81 355 osd.16
 18  1.3  1.0  1804G  366G  1438G 20.29 1.00 381 osd.18
 17  1.5  1.0  1804G  353G  1451G 19.57 0.97 429 osd.17
 -3  9.19997- 10885G 1965G  8733G 18.06 0.89   0 host v1d-sata 
 12  1.3  1.0  1804G  348G  1455G 19.32 0.95 365 osd.12
 20  1.3  1.0  1804G  335G  1468G 18.60 0.92 371 osd.20
 21  3.5  1.0  3667G  695G  2785G 18.97 0.94 871 osd.21
 22  1.3  1.0  1804G  281G  1522G 15.63 0.77 326 osd.22
 23  1.3  1.0  1804G  303G  1500G 16.83 0.83 321 osd.23
TOTAL 45365G 9195G 35592G 20.27
MIN/MAX VAR: 0.77/3.19  STDDEV: 14.69



apart from replacing OSDs, how can I help it?
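
(not from the thread, just a sketch of the knobs I'd look at for uneven OSD
utilization; the values are examples only, dry-run first)

ceph osd test-reweight-by-utilization 110   # dry run: shows what would change
ceph osd reweight-by-utilization 110        # apply, using 110% of average as threshold
# or hand-tune a single OSD's crush weight:
ceph osd crush reweight osd.24 3.0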




> 
> 
> At 2017-07-28 17:24:29, "Nikola Ciprich" <nikola.cipr...@linuxbox.cz> wrote:
> >I forgot to add that OSD daemons really seem to be idle, no disk
> >activity, no CPU usage.. it just looks to me like  some kind of
> >deadlock, as they were waiting for each other..
> >
> >and so I'm trying to get last 1.5% of misplaced / degraded PGs
> >for almost a week..
> >
> >
> >On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
> >> Hi,
> >> 
> >> I'm trying to find reason for strange recovery issues I'm seeing on
> >> our cluster..
> >> 
> >> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> >> across nodes. jewel 10.2.9
> >> 
> >> the problem is that after some disk replaces and data moves, recovery
> >> is progressing extremely slowly.. pgs seem to be stuck in 
> >> active+recovering+degraded
> >> state:
> >> 
> >> [root@v1d ~]# ceph -s
> >> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
> >>  health HEALTH_WARN
> >> 159 pgs backfill_wait
> >> 4 pgs backfilling
> >> 259 pgs degraded
> >> 12 pgs recovering
> >> 113 pgs recovery_wait
> >> 215 pgs stuck degraded
> >> 266 pgs stuck unclean
> >> 140 pgs stuck undersized
> >> 151 pgs undersized
> >> recovery 37788/2327775 objects degraded (1.623%)
> >> recovery 23854/2327775 objects misplaced (1.

Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
I forgot to add that the OSD daemons really seem to be idle: no disk
activity, no CPU usage.. it just looks to me like some kind of
deadlock, as if they were waiting for each other..

and so I've been trying to get the last 1.5% of misplaced / degraded PGs
recovered for almost a week..


On Fri, Jul 28, 2017 at 10:56:02AM +0200, Nikola Ciprich wrote:
> Hi,
> 
> I'm trying to find reason for strange recovery issues I'm seeing on
> our cluster..
> 
> it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
> across nodes. jewel 10.2.9
> 
> the problem is that after some disk replaces and data moves, recovery
> is progressing extremely slowly.. pgs seem to be stuck in 
> active+recovering+degraded
> state:
> 
> [root@v1d ~]# ceph -s
> cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
>  health HEALTH_WARN
> 159 pgs backfill_wait
> 4 pgs backfilling
> 259 pgs degraded
> 12 pgs recovering
> 113 pgs recovery_wait
> 215 pgs stuck degraded
> 266 pgs stuck unclean
> 140 pgs stuck undersized
> 151 pgs undersized
> recovery 37788/2327775 objects degraded (1.623%)
> recovery 23854/2327775 objects misplaced (1.025%)
> noout,noin flag(s) set
>  monmap e21: 3 mons at 
> {v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
> election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
>   fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
>  osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
> flags noout,noin,sortbitwise,require_jewel_osds
>   pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
> 9215 GB used, 35572 GB / 45365 GB avail
> 37788/2327775 objects degraded (1.623%)
> 23854/2327775 objects misplaced (1.025%)
> 2912 active+clean
>  130 active+undersized+degraded+remapped+wait_backfill
>   97 active+recovery_wait+degraded
>   29 active+remapped+wait_backfill
>   12 active+recovery_wait+undersized+degraded+remapped
>6 active+recovering+degraded
>5 active+recovering+undersized+degraded+remapped
>4 active+undersized+degraded+remapped+backfilling
>4 active+recovery_wait+degraded+remapped
>1 active+recovering+degraded+remapped
>   client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr
> 
> 
>  when I restart affected OSDs, it bumps the recovery, but then another
> PGs get stuck.. All OSDs were restarted multiple times, none are even close to
> nearfull, I just cant find what I'm doing wrong..
> 
> possibly related OSD options:
> 
> osd max backfills = 4
> osd recovery max active = 15
> debug osd = 0/0
> osd op threads = 4
> osd backfill scan min = 4
> osd backfill scan max = 16
> 
> Any hints would be greatly appreciated
> 
> thanks
> 
> nik
> 
> 
> -- 
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:+420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -----
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-07-28 Thread Nikola Ciprich
Hi,

I'm trying to find reason for strange recovery issues I'm seeing on
our cluster..

it's mostly idle, 4 node cluster with 26 OSDs evenly distributed
across nodes. jewel 10.2.9

the problem is that after some disk replaces and data moves, recovery
is progressing extremely slowly.. pgs seem to be stuck in 
active+recovering+degraded
state:

[root@v1d ~]# ceph -s
cluster a5efbc87-3900-4c42-a977-8c93f7aa8c33
 health HEALTH_WARN
159 pgs backfill_wait
4 pgs backfilling
259 pgs degraded
12 pgs recovering
113 pgs recovery_wait
215 pgs stuck degraded
266 pgs stuck unclean
140 pgs stuck undersized
151 pgs undersized
recovery 37788/2327775 objects degraded (1.623%)
recovery 23854/2327775 objects misplaced (1.025%)
noout,noin flag(s) set
 monmap e21: 3 mons at 
{v1a=10.0.0.1:6789/0,v1b=10.0.0.2:6789/0,v1c=10.0.0.3:6789/0}
election epoch 6160, quorum 0,1,2 v1a,v1b,v1c
  fsmap e817: 1/1/1 up {0=v1a=up:active}, 1 up:standby
 osdmap e76002: 26 osds: 26 up, 26 in; 185 remapped pgs
flags noout,noin,sortbitwise,require_jewel_osds
  pgmap v80995844: 3200 pgs, 4 pools, 2876 GB data, 757 kobjects
9215 GB used, 35572 GB / 45365 GB avail
37788/2327775 objects degraded (1.623%)
23854/2327775 objects misplaced (1.025%)
2912 active+clean
 130 active+undersized+degraded+remapped+wait_backfill
  97 active+recovery_wait+degraded
  29 active+remapped+wait_backfill
  12 active+recovery_wait+undersized+degraded+remapped
   6 active+recovering+degraded
   5 active+recovering+undersized+degraded+remapped
   4 active+undersized+degraded+remapped+backfilling
   4 active+recovery_wait+degraded+remapped
   1 active+recovering+degraded+remapped
  client io 2026 B/s rd, 146 kB/s wr, 9 op/s rd, 21 op/s wr


 when I restart the affected OSDs, it bumps the recovery, but then other
PGs get stuck.. All OSDs were restarted multiple times, none are even close to
nearfull, I just can't find what I'm doing wrong..

possibly related OSD options:

osd max backfills = 4
osd recovery max active = 15
debug osd = 0/0
osd op threads = 4
osd backfill scan min = 4
osd backfill scan max = 16
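
(in case it helps, these can also be checked and changed at runtime without restarting the OSDs - a sketch; run the daemon command on the node hosting the given OSD, and the values are just the ones above:)

# what an OSD is actually running with
ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'
# push new values to all OSDs on the fly
ceph tell osd.\* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 15'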

Any hints would be greatly appreciated

thanks

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] after jewel 10.2.2->10.2.7 upgrade, one of OSD crashes on OSDMap::decode

2017-05-01 Thread Nikola Ciprich
Hi,

I've upgraded a tiny jewel cluster from 10.2.2 to 10.2.7 and now
one of the OSDs fails to start..

here's (hopefully) important part of the backtrace:
2017-05-01 19:54:17.627262 7fb2bbf78800 10 filestore(/var/lib/ceph/osd/ceph-1) 
stat meta/#-1:c0371625:::snapmapper:0# = 0 (size 0)
2017-05-01 19:54:17.627440 7fb2bbf78800  0  cls/hello/cls_hello.cc:305: 
loading cls_hello
2017-05-01 19:54:17.629044 7fb2bbf78800  0  cls/cephfs/cls_cephfs.cc:202: 
loading cephfs_size_scan
2017-05-01 19:54:17.630656 7fb2bbf78800 15 filestore(/var/lib/ceph/osd/ceph-1) 
read meta/#-1:3294e826:::osdmap.53:0# 0~0
2017-05-01 19:54:17.630674 7fb2bbf78800 10 filestore(/var/lib/ceph/osd/ceph-1) 
FileStore::read meta/#-1:3294e826:::osdmap.53:0# 0~0/0
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7fb2bbf78800 thread_name:ceph-osd
 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (()+0x91d8ea) [0x5609e9f938ea]
 2: (()+0xf370) [0x7fb2ba6ca370]
 3: (gsignal()+0x37) [0x7fb2b8c8b1d7]
 4: (abort()+0x148) [0x7fb2b8c8c8c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fb2b958f9d5]
 6: (()+0x5e946) [0x7fb2b958d946]
 7: (()+0x5e973) [0x7fb2b958d973]
 8: (()+0x5eb93) [0x7fb2b958db93]
 9: (ceph::buffer::list::iterator_impl::copy(unsigned int, char*)+0xa5) 
[0x5609ea09e425]
 10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x6d) [0x5609ea055a9d]
 11: (OSDMap::decode(ceph::buffer::list&)+0x2e) [0x5609ea056d9e]
 12: (OSDService::try_get_map(unsigned int)+0x4ac) [0x5609e9a0882c]
 13: (OSDService::get_map(unsigned int)+0xe) [0x5609e9a6b5fe]
 14: (OSD::init()+0x1fe2) [0x5609e9a1e782]
 15: (main()+0x2c55) [0x5609e9981dc5]
 16: (__libc_start_main()+0xf5) [0x7fb2b8c77b35]
 17: (()+0x3561e7) [0x5609e99cc1e7]
2017-05-01 19:54:17.632871 7fb2bbf78800 -1 *** Caught signal (Aborted) **
 in thread 7fb2bbf78800 thread_name:ceph-osd

full osd log is here:

http://nik.lbox.cz/download/osd-crash.txt

I've found some older discussions and reports of a similar problem, but
none for current versions, especially 10.2.7.

the cluster is very small (just 2+2 OSDs, 3 mons, no MDS), was installed as
10.2.2, therefore no upgrade from hammer or so.. OS is centos7 based, 4.4.52
x86_64 kernel..

If anyone is interested, I can provide more info if needed; otherwise
I'll reformat the OSD to get it back into an OK state..
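
(one sanity check that might be worth doing before reformatting: pull the same osdmap epoch from the monitors and see whether it decodes fine there - if it does, that points at a corrupted local copy on this OSD rather than a general decode bug. A sketch, with epoch 53 taken from the log above:)

ceph osd getmap 53 -o /tmp/osdmap.53
osdmaptool --print /tmp/osdmap.53 | head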

BR

nik


-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer - lost object after just one OSD failure?

2016-05-04 Thread Nikola Ciprich
Hi Gregory,

thanks for the reply.

> 
> Is OSD 0 the one which had a failing hard drive? And OSD 10 is
> supposed to be fine?

yes, OSD 0 crashed due to disk errors, rest of the cluster was without
problems, no crash, no restarts.. that's why it scared me a bit..

pity I purged the lost placement groups, maybe we could have dug up some
more debug info... I'll keep stressing the cluster, watch it carefully and report if
something similar happens again.. I suppose we can't do much more till
then...

BR

nik
> 
> In general what you're saying does make it sound like something under
> the Ceph code lost objects, but if one of those OSDs has never had a
> problem I'm not sure what it could be.
> 
> (The most common failure mode is power loss while the user has
> barriers turned off, or a RAID card misconfigured, or similar.)
> -Greg
> 
> >
> > I'd be grateful for any info
> >
> > br
> >
> > nik
> >
> >
> >
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> >
> > ___________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer - lost object after just one OSD failure?

2016-05-04 Thread Nikola Ciprich
Hi,

I was doing some performance tuning on a test cluster of just 2
nodes (10 OSDs each). I have a test pool with 2 replicas (size=2, min_size=2).

then one of the OSDs crashed due to a failing hard drive. All remaining OSDs were
fine, but the health status reported one lost object..

here's detail:

"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2016-05-04 07:59:10.706866",
"might_have_unfound": [
{
"osd": "0",
"status": "osd is down"
},
{
"osd": "10",
"status": "already probed"
}
],


it was not important data, so I just discarded it as I don't need
to recover it, but now I'm wondering what the cause of all this is..

I have min_size set to 2 and I thought that writes are confirmed after
they reach all target OSD journals, no? Is there something specific I should
check? Maybe I have a bug in my configuration? Or how else could this object
be lost?
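
(for the record, discarding it came down to roughly this - the pg id is a placeholder:)

# find the pg with the unfound object and list what is missing
ceph health detail | grep unfound
ceph pg 2.5 list_missing
# then give up on it and roll the object back to its previous version
# ('delete' may also be available instead of 'revert', depending on the release)
ceph pg 2.5 mark_unfound_lost revert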

I'd be grateful for any info

br

nik





-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] can't get rid of stale+active+clean pgs by no means

2016-02-06 Thread Nikola Ciprich
Hi,

I'm still struggling with health problems of my cluster..

I still have 2 stale+active+clean pgs and one creating pg..
I've just stopped all nodes and started them all again,
and those pgs still remain..

I think I've read all the related discussions and docs, and tried
virtually everything I thought could help (and be safe).

Querying those stale pgs hangs, and the OSDs which should be acting
for them are running.. I can't figure out what could be wrong..

does anyone have an idea what to try?
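
(what I've tried so far, for reference - the pg id is just an example; ceph pg map only asks the monitors, so it still answers even when pg query hangs:)

ceph pg dump_stuck stale
ceph pg map 6.11
ceph pg 6.11 query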

I'm running latest hammer (0.94.5) on centos 6..

thanks a lot in advance

cheers

nik



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-04 Thread Nikola Ciprich


On 4 February 2016 08:33:55 CET, Gregory Farnum <gfar...@redhat.com> wrote:
>The quick and dirty cleanup is to restart the OSDs hosting those PGs.
>They might have gotten some stuck ops which didn't get woken up; a few
>bugs like that have gone by and are resolved in various stable
>branches (I'm not sure what release binaries they're in).
>
That's what I thought, so I already tried restarting all the OSDs.. But those stuck
PGs still remain.
The version I'm running is 0.94.5.

Nik


>On Wed, Feb 3, 2016 at 11:32 PM, Nikola Ciprich
><nikola.cipr...@linuxbox.cz> wrote:
>>> Yeah, these inactive PGs are basically guaranteed to be the cause of
>>> the problem. There are lots of threads about getting PGs healthy
>>> again; you should dig around the archives and the documentation
>>> troubleshooting page(s). :)
>>> -Greg
>>
>> Hello Gregory,
>>
>> well, I wouldn't doubt it, but when the problems started, the only
>> unclean PGs were some remapped ones, none inactive, so I guess it must've
>> been something else..
>>
>> but I'm now struggling to get rid of those inactive ones, of course..
>> however I've not been successful so far; I've probably read all
>> the related docs and discussions and still haven't found a similar
>> problem..
>>
>> pg 6.11 is stuck stale for 79285.647847, current state
>stale+active+clean, last acting [4,10,8]
>> pg 3.198 is stuck stale for 79367.532437, current state
>stale+active+clean, last acting [8,13]
>>
>> those two are stale for some reason.. but OSDS 4, 8, 10, 13 are
>running, there
>> are no network problems.. PG query on those just hangs..
>>
>> I'm running out of ideas here..
>>
>> nik
>>
>>
>> --
>> -
>> Ing. Nikola CIPRICH
>> LinuxBox.cz, s.r.o.
>> 28. rijna 168, 709 00 Ostrava
>>
>> tel.:   +420 591 166 214
>> fax:+420 596 621 273
>> mobil:  +420 777 093 799
>>
>> www.linuxbox.cz
>>
>> mobil servis: +420 737 238 656
>> email servis: ser...@linuxbox.cz
>> -

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Nikola Ciprich
Hello Gregory,

in the meantime, I managed to break it further :(

I tried getting rid of the active+remapped pgs and got some undersized ones
instead.. not sure whether this can be related..

anyways here's the status:

ceph -s
cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
 health HEALTH_WARN
3 pgs degraded
2 pgs stale
3 pgs stuck degraded
1 pgs stuck inactive
2 pgs stuck stale
242 pgs stuck unclean
3 pgs stuck undersized
3 pgs undersized
recovery 65/3374343 objects degraded (0.002%)
recovery 186187/3374343 objects misplaced (5.518%)
mds0: Behind on trimming (155/30)
 monmap e3: 3 mons at 
{remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
 mdsmap e342: 1/1/1 up {0=remrprv1c=up:active}, 2 up:standby
 osdmap e4385: 21 osds: 21 up, 21 in; 238 remapped pgs
  pgmap v18679192: 1856 pgs, 7 pools, 4223 GB data, 1103 kobjects
12947 GB used, 22591 GB / 35538 GB avail
65/3374343 objects degraded (0.002%)
186187/3374343 objects misplaced (5.518%)
1612 active+clean
 238 active+remapped
   3 active+undersized+degraded
   2 stale+active+clean
   1 creating
  client io 0 B/s rd, 40830 B/s wr, 17 op/s


> What's the full output of "ceph -s"? Have you looked at the MDS admin
> socket at all — what state does it say it's in?

[root@remrprv1c ceph]# ceph --admin-daemon 
/var/run/ceph/ceph-mds.remrprv1c.asok dump_ops_in_flight
{
"ops": [
{
"description": "client_request(client.3052096:83 getattr Fs 
#1000288 2016-02-03 10:10:46.361591 RETRY=1)",
"initiated_at": "2016-02-03 10:23:25.791790",
"age": 3963.093615,
"duration": 9.519091,
"type_data": [
"failed to rdlock, waiting",
"client.3052096:83",
"client_request",
{
"client": "client.3052096",
"tid": 83
},
[
{
"time": "2016-02-03 10:23:25.791790",
"event": "initiated"
},
{
"time": "2016-02-03 10:23:35.310881",
"event": "failed to rdlock, waiting"
}
]
]
}
],
"num_ops": 1
}

seems there's a stuck lock here..

Killing the stuck client (it's postgres trying to access a cephfs file)
doesn't help..
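
(one more thing I still want to check is whether the MDS itself is blocked on an OSD request; if the running build supports it, the admin socket can dump its outstanding RADOS operations:)

ceph --admin-daemon /var/run/ceph/ceph-mds.remrprv1c.asok objecter_requests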


> -Greg
> 
> >
> > My question here is:
> >
> > 1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
> > which could lead to cephfs hangs?
> >
> > 2) what can I do to debug what is the cause of this hang?
> >
> > 3) is there a way to recover this without hard resetting
> > node with hung cephfs mount?
> >
> > If I could provide more information, please let me know
> >
> > I'd really appreciate any help
> >
> > with best regards
> >
> > nik
> >
> >
> >
> >
> > --
> > -
> > Ing. Nikola CIPRICH
> > LinuxBox.cz, s.r.o.
> > 28.rijna 168, 709 00 Ostrava
> >
> > tel.:   +420 591 166 214
> > fax:+420 596 621 273
> > mobil:  +420 777 093 799
> > www.linuxbox.cz
> >
> > mobil servis: +420 737 238 656
> > email servis: ser...@linuxbox.cz
> > -
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer - remapped / undersized pgs + related questions

2016-02-03 Thread Nikola Ciprich
   1 creating
  client io 14830 B/s rd, 269 kB/s wr, 94 op/s


I'd be very gratefull for any help with those..

with best regards

nik

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] placement group lost by using force_create_pg ?

2016-02-03 Thread Nikola Ciprich
Hello cephers,

I think I've got into a pretty bad situation :(

I mistakenly ran force_create_pg on one placement group in a live cluster.
Now it's stuck in the creating state, so I suppose the placement group's
content is lost, right? Is there a way to recover it? Or at least a
way to find out which objects are affected by it? I've only found
ways to find which placement group an object belongs to, but not the other
direction (apart from trying all objects).
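
(trying all objects can at least be scripted; a brute-force sketch - the pool name and pg id are placeholders, and it will be slow on a big pool:)

# print every object in the pool that maps to the damaged pg
for obj in $(rados -p rbd ls); do
    ceph osd map rbd "$obj" | grep -q '(2\.30)' && echo "$obj"
done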

some data are in rbd objects, some on cephfs...

is there a way to help?

it'd be really appreciated...

thanks a lot in advance

with best regards

nikola cirpich

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer-0.94.5 + kernel-4.1.15 - cephfs stuck

2016-02-03 Thread Nikola Ciprich
> Yeah, these inactive PGs are basically guaranteed to be the cause of
> the problem. There are lots of threads about getting PGs healthy
> again; you should dig around the archives and the documentation
> troubleshooting page(s). :)
> -Greg

Hello Gregory,

well, I wouldn't doubt it, but when the problems started, the only
unclean PGs were some remapped ones, none inactive, so I guess it must've
been something else..

but I'm now struggling to get rid of those inactive ones, of course..
however I've not been successful so far; I've probably read all
the related docs and discussions and still haven't found a similar
problem..

pg 6.11 is stuck stale for 79285.647847, current state stale+active+clean, last 
acting [4,10,8]
pg 3.198 is stuck stale for 79367.532437, current state stale+active+clean, 
last acting [8,13]

those two are stale for some reason.. but OSDS 4, 8, 10, 13 are running, there
are no network problems.. PG query on those just hangs..

I'm running out of ideas here..

nik


-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] sync writes - expected performance?

2015-12-16 Thread Nikola Ciprich
Hello Mark,

thanks for your explanation, it all makes sense. I've done
some measuring on Google and Amazon clouds as well and really,
those numbers seem to be pretty good. I'll be playing with
fine tuning a little bit more, but overall the performance
really seems to be quite nice.

Thanks to all of you for your replies guys!

nik


On Mon, Dec 14, 2015 at 11:03:16AM -0600, Mark Nelson wrote:
> 
> 
> On 12/14/2015 04:49 AM, Nikola Ciprich wrote:
> >Hello,
> >
> >i'm doing some measuring on test (3 nodes) cluster and see strange 
> >performance
> >drop for sync writes..
> >
> >I'm using SSD for both journalling and OSD. It should be suitable for
> >journal, giving about 16.1KIOPS (67MB/s) for sync IO.
> >
> >(measured using fio --filename=/dev/xxx --direct=1 --sync=1 --rw=write 
> >--bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
> >--name=journal-test)
> >
> >On top of this cluster, I have running KVM guest (using qemu librbd backend).
> >Overall performance seems to be quite good, but the problem is when I try
> >to measure sync IO performance inside the guest.. I'm getting only about 
> >600IOPS,
> >which I think is quite poor.
> >
> >The problem is, I don't see any bottleneck, OSD daemons don't seem to be 
> >hanging on
> >IO, neither hogging CPU, qemu process is also not somehow too much loaded..
> >
> >I'm using hammer 0.94.5 on top of centos 6 (4.1 kernel), all debugging 
> >disabled,
> >
> >my question is, what results I can expect for synchronous writes? I 
> >understand
> >there will always be some performance drop, but 600IOPS on top of storage 
> >which
> >can give as much as 16K IOPS seems too little..
> 
> So basically what this comes down to is latency.  Since you get 16K IOPS for
> O_DSYNC writes on the SSD, there's a good chance that it has a
> super-capacitor on board and can basically acknowledge a write as complete
> as soon as it hits the on-board cache rather than when it's written to
> flash.  Figure that for 16K O_DSYNC IOPs means that each IO is completing in
> around 0.06ms on average.  That's very fast!  At 600 IOPs for O_DSYNC writes
> on your guest, you're looking at about 1.6ms per IO on average.
> 
> So how do we account for the difference?  Let's start out by looking at a
> quick example of network latency (This is between two random machines in one
> of our labs at Red Hat):
> 
> >64 bytes from gqas008: icmp_seq=1 ttl=64 time=0.583 ms
> >64 bytes from gqas008: icmp_seq=2 ttl=64 time=0.219 ms
> >64 bytes from gqas008: icmp_seq=3 ttl=64 time=0.224 ms
> >64 bytes from gqas008: icmp_seq=4 ttl=64 time=0.200 ms
> >64 bytes from gqas008: icmp_seq=5 ttl=64 time=0.196 ms
> 
> now consider that when you do a write in ceph, you write to the primary OSD
> which then writes out to the replica OSDs.  Every replica IO has to complete
> before the primary will send the acknowledgment to the client (ie you have
> to add the latency of the worst of the replica writes!). In your case, the
> network latency alone is likely dramatically increasing IO latency vs raw
> SSD O_DSYNC writes.  Now add in the time to process crush mappings, look up
> directory and inode metadata on the filesystem where objects are stored
> (assuming it's not cached), and other processing time, and the 1.6ms latency
> for the guest writes starts to make sense.
> 
> Can we improve things?  Likely yes.  There's various areas in the code where
> we can trim latency away, implement alternate OSD backends, and potentially
> use alternate network technology like RDMA to reduce network latency.  The
> thing to remember is that when you are talking about O_DSYNC writes, even
> very small increases in latency can have dramatic effects on performance.
> Every fraction of a millisecond has huge ramifications.
> 
> >
> >Has anyone done similar measuring?
> >
> >thanks a lot in advance!
> >
> >BR
> >
> >nik
> >
> >
> >
> >
> >___
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] sync writes - expected performance?

2015-12-14 Thread Nikola Ciprich
Hello,

I'm doing some measuring on a test (3-node) cluster and I'm seeing a strange performance
drop for sync writes..

I'm using SSD for both journalling and OSD. It should be suitable for
journal, giving about 16.1KIOPS (67MB/s) for sync IO.

(measured using fio --filename=/dev/xxx --direct=1 --sync=1 --rw=write --bs=4k 
--numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting 
--name=journal-test)

On top of this cluster, I have a KVM guest running (using the qemu librbd backend).
Overall performance seems to be quite good, but the problem is when I try
to measure sync IO performance inside the guest.. I'm getting only about 600 IOPS,
which I think is quite poor.

The problem is, I don't see any bottleneck: the OSD daemons don't seem to be
hanging on IO or hogging CPU, and the qemu process is also not particularly loaded..

I'm using hammer 0.94.5 on top of centos 6 (4.1 kernel), all debugging disabled.

my question is, what results can I expect for synchronous writes? I understand
there will always be some performance drop, but 600 IOPS on top of storage which
can give as much as 16K IOPS seems too little..
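
(for reference, inside the guest I measure it the same way as the journal test above, just pointed at the guest block device - /dev/vdb here is a placeholder for whatever the guest disk is called:)

fio --filename=/dev/vdb --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=guest-sync-test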

Has anyone done similar measuring?

thanks a lot in advance!

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD pool and SATA pool

2015-11-17 Thread Nikola Ciprich
I'm not a ceph expert, but I needed to use

osd crush update on start = false

in [osd] config section..
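
(for completeness, roughly the sequence needed for a dedicated SSD root - bucket, rule and pool names are just examples, and on hammer the pool setting is still called crush_ruleset:)

# separate root for the SSD OSDs, with per-host buckets under it
ceph osd crush add-bucket ssd root
ceph osd crush add-bucket node1-ssd host
ceph osd crush move node1-ssd root=ssd
ceph osd crush set osd.0 1.0 host=node1-ssd
# rule that only picks OSDs from the ssd root, and a pool using it
ceph osd crush rule create-simple ssd-rule ssd host
ceph osd pool create ssd-pool 128 128
ceph osd pool set ssd-pool crush_ruleset 1   # ruleset id as shown by 'ceph osd crush rule dump ssd-rule'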

BR

nik


On Tue, Nov 17, 2015 at 08:53:37PM +, Michael Kuriger wrote:
> Hey everybody,
> I have 10 servers, each with 2 SSD drives, and 8 SATA drives.  Is it possible 
> to create 2 pools, one made up of SSD and one made up of SATA?  I tried 
> manually editing the crush map to do it, but the configuration doesn’t seem 
> to persist reboots.  Any help would be very appreciated.
> 
> Thanks!
> 
> Mike
> 
> 

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
-----
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] python binding - snap rollback - progress reporting

2015-11-08 Thread Nikola Ciprich
Hello,

I'd like to ask - I'm using the python RBD/rados bindings. Everything
works well for me; the only thing I'd like to improve is snapshot rollback:
as the operation is quite time consuming, I would like to report its progress.

is this somehow possible, even at the cost of implementing the whole rollback
operation myself?
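
(one possible workaround, just a sketch with placeholder names: the rbd command line tool prints percentage progress while rolling back, so it could be called from python as a subprocess and its output parsed, instead of calling Image.rollback_to_snap() directly:)

rbd snap rollback ssd/myimage@before-upgrade 2>&1 | tr '\r' '\n'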

thanks a lot in advance!

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] qemu (or librbd in general) - very high load on client side

2015-06-16 Thread Nikola Ciprich
Hello dear ceph developers and users,

I've spent some time tuning and measuring our ceph cluster performance,
and noticed quite a strange thing..

I've been using fio (using both the rbd engine on the hosts and the direct block (aio) engine
inside qemu-kvm guests (qemu connected to ceph storage using rbd)) and I
noticed that the client part always generates a huge amount of CPU load, and therefore
the CLIENT seems to be the bottleneck.

For example, when I measure direct SSD performance on one of the ceph OSDs, I'm
getting 100k IOPS (which is OK, according to the SSD specs) using fio, but when
I measure the performance of a ceph SSD pool volume, it's much worse. I'd understand
if the bottleneck were the ceph-osd processes (or some other ceph component),
but it seems to me that fio using the rbd engine is the problem here (it's able
to eat 6 CPU cores itself). It seems to be very similar when using qemu to access
the ceph storage - it shows very high cpu utilisation (I'm using virtio-scsi
for guest disk emulation). This behaviour occurs for both random and sequential IO.

preloading libtcmalloc helps fio (and I also tried compiling qemu with
libtcmalloc, which also helps), but it still seems to me that there could be something
wrong in librbd..

Has anyone else noticed this behaviour? I noticed in some mail threads that disabling
cephx authentication can help a lot, but I don't really like this idea and
haven't tried it yet..
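
(for reference, in case someone wants to try it on a throwaway test cluster - certainly nothing for production - it seems to come down to setting the three auth options to none in ceph.conf on all nodes and restarting everything:)

[global]
auth cluster required = none
auth service required = none
auth client required = none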

with best regards

nik



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very different performance on two volumes in the same pool #2

2015-05-11 Thread Nikola Ciprich
On Mon, May 11, 2015 at 06:07:21AM +, Somnath Roy wrote:
 Yes, you need to run fio clients on a separate box, it will take quite a bit 
 of cpu.
 Stopping OSDs on other nodes, rebalancing will start. Have you waited cluster 
 to go for active + clean state ? If you are running while rebalancing is 
 going on , the performance will be impacted.
I set noout, so there was no rebalancing, I forgot to mention that..


 
 ~110%  cpu util seems pretty low. Try to run fio_rbd with more num_jobs (say 
 3 or 4 or more), io_depth =64 is fine and see if it improves performance or 
 not.
ok, increasing jobs to 4 seems to squeeze a bit more from the cluster, about 
43.3K iops..

OSD cpu util jumps to ~300% on both alive nodes, so there seems to be still a 
bit
of reserves.. 

 Also, since you have 3 OSDs (3 nodes?), I would suggest to tweak the 
 following settings
 
 osd_op_num_threads_per_shard
 osd_op_num_shards 
 
 May be (1,10 / 1,15 / 2, 10 ?).

tried all those combinations, but it doesn't make almost any difference..

do you think I could get more then those 43k?

one more thing that makes me wonder a bit is this line I can see in perf:
  2.21%  libsoftokn3.so [.] 0x0001ebb2

I suppose this has something to do with resolving, 2.2% seems quite a lot to 
me..
Should I be worried about it? Does it make sense to enable kernel DNS resolving
support in ceph?

thanks for your time Somnath!

nik



 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz] 
 Sent: Sunday, May 10, 2015 10:33 PM
 To: Somnath Roy
 Cc: ceph-users; n...@linuxbox.cz
 Subject: Re: [ceph-users] very different performance on two volumes in the 
 same pool #2
 
 
 On Mon, May 11, 2015 at 05:20:25AM +, Somnath Roy wrote:
  Two things..
  
  1. You should always use SSD drives for benchmarking after preconditioning 
  it.
 well, I don't really understand... ?
 
  
  2. After creating and mapping rbd lun, you need to write data first to 
  read it afterword otherwise fio output will be misleading. In fact, I 
  think you will see IO is not even hitting cluster (check with ceph -s)
 yes, so this approves my conjecture. ok.
 
 
  
  Now, if you are saying it's a 3 OSD setup, yes, ~23K is pretty low. Check 
  the following.
  
  1. Check client or OSd node cpu is saturating or not.
 On OSD nodes, I can see cpeh-osd CPU utilisation of ~110%. On client node 
 (which is one of OSD nodes as well), I can see fio eating quite lot of CPU 
 cycles.. I tried stopping ceph-osd on this node (thus only two nodes are 
 serving data) and performance got a bit higher, to ~33k IOPS. But still I 
 think it's not very good..
 
 
  
  2. With 4K, hope network BW is fine
 I think it's ok..
 
 
  
  3. Number of PGs/pool should be ~128 or so.
 I'm using pg_num 128
 
 
  
  4. If you are using krbd, you might want to try latest krbd module where 
  TCP_NODELAY problem is fixed. If you don't want that complexity, try with 
  fio-rbd.
 I'm not using RBD (only for writing data to volume), for benchmarking, I'm 
 using fio-rbd.
 
 anything else I could check?
 
 
  
  Hope this helps,
  
  Thanks  Regards
  Somnath
  
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
  Of Nikola Ciprich
  Sent: Sunday, May 10, 2015 9:43 PM
  To: ceph-users
  Cc: n...@linuxbox.cz
  Subject: [ceph-users] very different performance on two volumes in the 
  same pool #2
  
  Hello ceph developers and users,
  
  some time ago, I posted here a question regarding very different 
  performance for two volumes in one pool (backed by SSD drives).
  
  After some examination, I probably got to the root of the problem..
  
  When I create fresh volume (ie rbd create --image-format 2 --size 
  51200 ssd/test) and run random io fio benchmark
  
  fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 
  --name=test --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k 
  --iodepth=64 --readwrite=randread
  
  I get very nice performance of up to 200k IOPS. However once the volume is 
  written to (ie when I map it using rbd map and dd whole volume with some 
  random data), and repeat the benchmark, random performance drops to ~23k 
  IOPS.
  
  This leads me to conjecture that for unwritten (sparse) volumes, read is 
  just a noop, simply returning zeroes without really having to read data 
  from physical storage, and thus showing nice performance, but once the 
  volume is written, performance drops due to need to physically read the 
  data, right?
  
  However I'm a bit unhappy about the performance drop, the pool is backed by 
  3 SSD drives (each having random io performance of 100k iops) on three 
  nodes, and object size is set to 3. Cluster is completely idle, nodes are 
  quad core Xeons E3-1220 v3 @ 3.10GHz, 32GB RAM each, centos 6, kernel 
  3.18.12, ceph 0.94.1. I'm using libtcmalloc (I even tried upgrading 
  gperftools-libs to 2.4) Nodes

Re: [ceph-users] very different performance on two volumes in the same pool #2

2015-05-10 Thread Nikola Ciprich

On Mon, May 11, 2015 at 05:20:25AM +, Somnath Roy wrote:
 Two things..
 
 1. You should always use SSD drives for benchmarking after preconditioning it.
well, I don't really understand... ?

 
 2. After creating and mapping rbd lun, you need to write data first to read 
 it afterword otherwise fio output will be misleading. In fact, I think you 
 will see IO is not even hitting cluster (check with ceph -s)
yes, so this confirms my conjecture. ok.


 
 Now, if you are saying it's a 3 OSD setup, yes, ~23K is pretty low. Check the 
 following.
 
 1. Check client or OSd node cpu is saturating or not.
On OSD nodes, I can see ceph-osd CPU utilisation of ~110%. On client node 
(which is one
of OSD nodes as well), I can see fio eating quite lot of CPU cycles.. I tried 
stopping
ceph-osd on this node (thus only two nodes are serving data) and performance 
got a bit higher,
to ~33k IOPS. But still I think it's not very good..


 
 2. With 4K, hope network BW is fine
I think it's ok..


 
 3. Number of PGs/pool should be ~128 or so.
I'm using pg_num 128


 
 4. If you are using krbd, you might want to try latest krbd module where 
 TCP_NODELAY problem is fixed. If you don't want that complexity, try with 
 fio-rbd.
I'm not using krbd (only for writing data to the volume); for benchmarking, I'm
using fio-rbd.

anything else I could check?


 
 Hope this helps,
 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Nikola Ciprich
 Sent: Sunday, May 10, 2015 9:43 PM
 To: ceph-users
 Cc: n...@linuxbox.cz
 Subject: [ceph-users] very different performance on two volumes in the same 
 pool #2
 
 Hello ceph developers and users,
 
 some time ago, I posted here a question regarding very different performance 
 for two volumes in one pool (backed by SSD drives).
 
 After some examination, I probably got to the root of the problem..
 
 When I create fresh volume (ie rbd create --image-format 2 --size 51200 
 ssd/test) and run random io fio benchmark
 
 fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test 
 --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 
 --readwrite=randread
 
 I get very nice performance of up to 200k IOPS. However once the volume is 
 written to (ie when I map it using rbd map and dd whole volume with some 
 random data), and repeat the benchmark, random performance drops to ~23k IOPS.
 
 This leads me to conjecture that for unwritten (sparse) volumes, read is just 
 a noop, simply returning zeroes without really having to read data from 
 physical storage, and thus showing nice performance, but once the volume is 
 written, performance drops due to need to physically read the data, right?
 
 However I'm a bit unhappy about the performance drop, the pool is backed by 3 
 SSD drives (each having random io performance of 100k iops) on three nodes, 
 and object size is set to 3. Cluster is completely idle, nodes are quad core 
 Xeons E3-1220 v3 @ 3.10GHz, 32GB RAM each, centos 6, kernel 3.18.12, ceph 
 0.94.1. I'm using libtcmalloc (I even tried upgrading gperftools-libs to 2.4) 
 Nodes are connected using 10gb ethernet, with jumbo frames enabled.
 
 
 I tried tuning following values:
 
 osd_op_threads = 5
 filestore_op_threads = 4
 osd_op_num_threads_per_shard = 1
 osd_op_num_shards = 25
 filestore_fd_cache_size = 64
 filestore_fd_cache_shards = 32
 
 I don't see anything special in perf:
 
   5.43%  [kernel]  [k] acpi_processor_ffh_cstate_enter
   2.93%  libtcmalloc.so.4.2.6  [.] 0x00017d2c
   2.45%  libpthread-2.12.so[.] pthread_mutex_lock
   2.37%  libpthread-2.12.so[.] pthread_mutex_unlock
   2.33%  [kernel]  [k] do_raw_spin_lock
   2.00%  libsoftokn3.so[.] 0x0001f455
   1.96%  [kernel]  [k] __switch_to
   1.32%  [kernel]  [k] __schedule
   1.24%  libstdc++.so.6.0.13   [.] std::basic_ostreamchar, 
 std::char_traitschar  std::__ostream_insertchar, std::char_traitschar 
 (std::basic_ostreamchar, std::char
   1.24%  libc-2.12.so  [.] memcpy
   1.19%  libtcmalloc.so.4.2.6  [.] operator delete(void*)
   1.16%  [kernel]  [k] __d_lookup_rcu
   1.09%  libstdc++.so.6.0.13   [.] 0x0007d6be
   0.93%  libstdc++.so.6.0.13   [.] std::basic_streambufchar, 
 std::char_traitschar ::xsputn(char const*, long)
   0.93%  ceph-osd  [.] crush_hash32_3
   0.85%  libc-2.12.so  [.] vfprintf
   0.84%  libc-2.12.so  [.] __strlen_sse42
   0.80%  [kernel]  [k] get_futex_key_refs
   0.80%  libpthread-2.12.so[.] pthread_mutex_trylock
   0.78%  libtcmalloc.so.4.2.6  [.] 
 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
  unsigned long, int)
   0.71%  libstdc++.so.6.0.13   [.] std::basic_stringchar, 
 std::char_traitschar, std::allocatorchar ::basic_string(std::string 
 const)
   0.68%  ceph-osd  [.] ceph::log::Log::flush

[ceph-users] very different performance on two volumes in the same pool #2

2015-05-10 Thread Nikola Ciprich
Hello ceph developers and users,

some time ago, I posted here a question regarding very different
performance for two volumes in one pool (backed by SSD drives).

After some examination, I probably got to the root of the problem..

When I create fresh volume (ie rbd create --image-format 2 --size 51200 
ssd/test)
and run random io fio benchmark

fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test 
--pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 
--readwrite=randread

I get very nice performance of up to 200k IOPS. However once the volume is
written to (ie when I map it using rbd map and dd whole volume with some random 
data),
and repeat the benchmark, random performance drops to ~23k IOPS.

This leads me to conjecture that for unwritten (sparse) volumes, read
is just a noop, simply returning zeroes without really having to read
data from physical storage, and thus showing nice performance, but once
the volume is written, performance drops due to need to physically read the
data, right?
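
(side note for anyone reproducing this: the volume can also be preconditioned without mapping it through krbd at all, by doing a sequential write pass with the same fio rbd engine first - a sketch, using the same pool/image placeholders as above:)

fio --ioengine=rbd --direct=1 --pool=ssd3r --rbdname=test --rw=write --bs=4M --iodepth=16 --name=precondition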

However I'm a bit unhappy about the performance drop; the pool is backed
by 3 SSD drives (each having random io performance of 100k iops) on three
nodes, and the pool size (replica count) is set to 3. The cluster is completely idle, nodes
are quad core Xeons E3-1220 v3 @ 3.10GHz, 32GB RAM each, centos 6, kernel 
3.18.12,
ceph 0.94.1. I'm using libtcmalloc (I even tried upgrading gperftools-libs to 
2.4)
Nodes are connected using 10gb ethernet, with jumbo frames enabled.


I tried tuning following values:

osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32

I don't see anything special in perf:

  5.43%  [kernel]  [k] acpi_processor_ffh_cstate_enter
  2.93%  libtcmalloc.so.4.2.6  [.] 0x00017d2c
  2.45%  libpthread-2.12.so[.] pthread_mutex_lock
  2.37%  libpthread-2.12.so[.] pthread_mutex_unlock
  2.33%  [kernel]  [k] do_raw_spin_lock
  2.00%  libsoftokn3.so[.] 0x0001f455
  1.96%  [kernel]  [k] __switch_to
  1.32%  [kernel]  [k] __schedule
  1.24%  libstdc++.so.6.0.13   [.] std::basic_ostreamchar, 
std::char_traitschar  std::__ostream_insertchar, std::char_traitschar 
(std::basic_ostreamchar, std::char
  1.24%  libc-2.12.so  [.] memcpy
  1.19%  libtcmalloc.so.4.2.6  [.] operator delete(void*)
  1.16%  [kernel]  [k] __d_lookup_rcu
  1.09%  libstdc++.so.6.0.13   [.] 0x0007d6be
  0.93%  libstdc++.so.6.0.13   [.] std::basic_streambufchar, 
std::char_traitschar ::xsputn(char const*, long)
  0.93%  ceph-osd  [.] crush_hash32_3
  0.85%  libc-2.12.so  [.] vfprintf
  0.84%  libc-2.12.so  [.] __strlen_sse42
  0.80%  [kernel]  [k] get_futex_key_refs
  0.80%  libpthread-2.12.so[.] pthread_mutex_trylock
  0.78%  libtcmalloc.so.4.2.6  [.] 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned long, int)
  0.71%  libstdc++.so.6.0.13   [.] std::basic_stringchar, 
std::char_traitschar, std::allocatorchar ::basic_string(std::string const)
  0.68%  ceph-osd  [.] ceph::log::Log::flush()
  0.66%  libtcmalloc.so.4.2.6  [.] tc_free
  0.63%  [kernel]  [k] resched_curr
  0.63%  [kernel]  [k] page_fault
  0.62%  libstdc++.so.6.0.13   [.] std::string::reserve(unsigned long)

I'm running the benchmark directly on one of the nodes, which I know is not optimal,
but it's still able to give those 200k iops for an empty volume, so I guess it
shouldn't be a problem..

Another story is random write performance, which is totally poor, but I'd like
to deal with read performance first..


so my question is, are those numbers normal? If not, what should I check?

I'll be very grateful for all the hints I could get..

thanks a lot in advance

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very different performance on two volumes in the same pool

2015-04-27 Thread Nikola Ciprich
Hello Somnath,
 Thanks for the perf data..It seems innocuous..I am not seeing single tcmalloc 
 trace, are you running with tcmalloc by the way ?

according to ldd, it seems I have it compiled in, yes:
[root@vfnphav1a ~]# ldd /usr/bin/ceph-osd
.
.
libtcmalloc.so.4 => /usr/lib64/libtcmalloc.so.4 (0x7f7a3756e000)
.
.


 What about my other question, is the performance of slow volume increasing if 
 you stop IO on the other volume ?
I don't have any other ceph users; actually the whole cluster is idle..

 Are you using default ceph.conf ? Probably, you want to try with different 
 osd_op_num_shards (may be = 10 , based on your osd server config) and 
 osd_op_num_threads_per_shard (may be = 1). Also, you may want to see the 
 effect by doing osd_enable_op_tracker = false

I guess I'm using pretty default settings, few changes probably not much 
related:

[osd]
osd crush update on start = false

[client]
rbd cache = true
rbd cache writethrough until flush = true

[mon]
debug paxos = 0



I now tried setting
throttler perf counter = false
osd enable op tracker = false
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10

and restarting all ceph servers.. but it seems to make no big difference..


 
 Are you seeing similar resource consumption on both the servers while IO is 
 going on ?
yes, on all three nodes, ceph-osd seems to be consuming lots of CPU during 
benchmark.

 
 Need some information about your client, are the volumes exposed with krbd or 
 running with librbd environment ? If krbd and with same physical box, hope 
 you mapped the images with 'noshare' enabled.

I'm using fio with the rbd engine, so I guess no krbd-related stuff is in use
here?


 
 Too many questions :-)  But, this may give some indication what is going on 
 there.
:-) hopefully my answers are not too confused, I'm still pretty new to ceph..

BR

nik


 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: Nikola Ciprich [mailto:nikola.cipr...@linuxbox.cz] 
 Sent: Sunday, April 26, 2015 7:32 AM
 To: Somnath Roy
 Cc: ceph-users@lists.ceph.com; n...@linuxbox.cz
 Subject: Re: [ceph-users] very different performance on two volumes in the 
 same pool
 
 Hello Somnath,
 
 On Fri, Apr 24, 2015 at 04:23:19PM +, Somnath Roy wrote:
  This could be again because of tcmalloc issue I reported earlier.
  
  Two things to observe.
  
  1. Is the performance improving if you stop IO on other volume ? If so, it 
  could be different issue.
 there is no other IO.. only cephfs mounted, but no users of it.
 
  
  2. Run perf top in the OSD node and see if tcmalloc traces are popping up.
 
 don't see anything special:
 
   3.34%  libc-2.12.so  [.] _int_malloc
   2.87%  libc-2.12.so  [.] _int_free
   2.79%  [vdso][.] __vdso_gettimeofday
   2.67%  libsoftokn3.so[.] 0x0001fad9
   2.34%  libfreeblpriv3.so [.] 0x000355e6
   2.33%  libpthread-2.12.so[.] pthread_mutex_unlock
   2.19%  libpthread-2.12.so[.] pthread_mutex_lock
   1.80%  libc-2.12.so  [.] malloc
   1.43%  [kernel]  [k] do_raw_spin_lock
   1.42%  libc-2.12.so  [.] memcpy
   1.23%  [kernel]  [k] __switch_to
   1.19%  [kernel]  [k] acpi_processor_ffh_cstate_enter
   1.09%  libc-2.12.so  [.] malloc_consolidate
   1.08%  [kernel]  [k] __schedule
   1.05%  libtcmalloc.so.4.1.0  [.] 0x00017e6f
   0.98%  libc-2.12.so  [.] vfprintf
   0.83%  libstdc++.so.6.0.13   [.] std::basic_ostreamchar, 
 std::char_traitschar  std::__ostream_insertchar, std::char_traitschar 
 (std::basic_ostreamchar,
   0.76%  libstdc++.so.6.0.13   [.] 0x0008092a
   0.73%  libc-2.12.so  [.] __memset_sse2
   0.72%  libc-2.12.so  [.] __strlen_sse42
   0.70%  libstdc++.so.6.0.13   [.] std::basic_streambufchar, 
 std::char_traitschar ::xsputn(char const*, long)
   0.68%  libpthread-2.12.so[.] pthread_mutex_trylock
   0.67%  librados.so.2.0.0 [.] ceph_crc32c_sctp
   0.63%  libpython2.6.so.1.0   [.] 0x0007d823
   0.55%  libnss3.so[.] 0x00056d2a
   0.52%  libc-2.12.so  [.] free
   0.50%  libstdc++.so.6.0.13   [.] std::basic_stringchar, 
 std::char_traitschar, std::allocatorchar ::basic_string(std::string 
 const)
 
 should I check anything else?
 BR
 nik
 
 
  
  Thanks  Regards
  Somnath
  
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
  Nikola Ciprich
  Sent: Friday, April 24, 2015 7:10 AM
  To: ceph-users@lists.ceph.com
  Cc: n...@linuxbox.cz
  Subject: [ceph-users] very different performance on two volumes in the same 
  pool
  
  Hello,
  
  I'm trying to solve a bit mysterious situation:
  
  I've got 3 nodes CEPH cluster

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-26 Thread Nikola Ciprich
 tcp0  0 10.0.0.1:6809   10.0.0.1:59692
 ESTABLISHED 20182/ceph-osd
 tcp0 4163543 10.0.0.1:59692  10.0.0.1:6809 ESTABLISHED -
 
 You got bitten by a recently fixed regression.  It's never been a good
 idea to co-locate kernel client with osds, and we advise not to do it.
 However it happens to work most of the time so you can do it if you
 really want to.  That happens to work part got accidentally broken in
 3.18 and was fixed in 4.0, 3.19.5 and 3.18.12.
 
 You are running 3.18.11, so you are going to need to upgrade.

tried upgrading to 3.18.12 and can no longer reproduce the issue.
Thanks a lot!

BR

nik


 
 Thanks,
 
 Ilya
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] very different performance on two volumes in the same pool

2015-04-26 Thread Nikola Ciprich
Hello Somnath,

On Fri, Apr 24, 2015 at 04:23:19PM +, Somnath Roy wrote:
 This could be again because of tcmalloc issue I reported earlier.
 
 Two things to observe.
 
 1. Is the performance improving if you stop IO on other volume ? If so, it 
 could be different issue.
there is no other IO.. only cephfs mounted, but no users of it.

 
 2. Run perf top in the OSD node and see if tcmalloc traces are popping up.

don't see anything special:

  3.34%  libc-2.12.so  [.] _int_malloc
  2.87%  libc-2.12.so  [.] _int_free
  2.79%  [vdso][.] __vdso_gettimeofday
  2.67%  libsoftokn3.so[.] 0x0001fad9
  2.34%  libfreeblpriv3.so [.] 0x000355e6
  2.33%  libpthread-2.12.so[.] pthread_mutex_unlock
  2.19%  libpthread-2.12.so[.] pthread_mutex_lock
  1.80%  libc-2.12.so  [.] malloc
  1.43%  [kernel]  [k] do_raw_spin_lock
  1.42%  libc-2.12.so  [.] memcpy
  1.23%  [kernel]  [k] __switch_to
  1.19%  [kernel]  [k] acpi_processor_ffh_cstate_enter
  1.09%  libc-2.12.so  [.] malloc_consolidate
  1.08%  [kernel]  [k] __schedule
  1.05%  libtcmalloc.so.4.1.0  [.] 0x00017e6f
  0.98%  libc-2.12.so  [.] vfprintf
  0.83%  libstdc++.so.6.0.13   [.] std::basic_ostreamchar, 
std::char_traitschar  std::__ostream_insertchar, std::char_traitschar 
(std::basic_ostreamchar,
  0.76%  libstdc++.so.6.0.13   [.] 0x0008092a
  0.73%  libc-2.12.so  [.] __memset_sse2
  0.72%  libc-2.12.so  [.] __strlen_sse42
  0.70%  libstdc++.so.6.0.13   [.] std::basic_streambufchar, 
std::char_traitschar ::xsputn(char const*, long)
  0.68%  libpthread-2.12.so[.] pthread_mutex_trylock
  0.67%  librados.so.2.0.0 [.] ceph_crc32c_sctp
  0.63%  libpython2.6.so.1.0   [.] 0x0007d823
  0.55%  libnss3.so[.] 0x00056d2a
  0.52%  libc-2.12.so  [.] free
  0.50%  libstdc++.so.6.0.13   [.] std::basic_stringchar, 
std::char_traitschar, std::allocatorchar ::basic_string(std::string const)

should I check anything else?
BR
nik


 
 Thanks  Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Nikola Ciprich
 Sent: Friday, April 24, 2015 7:10 AM
 To: ceph-users@lists.ceph.com
 Cc: n...@linuxbox.cz
 Subject: [ceph-users] very different performance on two volumes in the same 
 pool
 
 Hello,
 
 I'm trying to solve a bit mysterious situation:
 
 I've got 3 nodes CEPH cluster, and pool made of 3 OSDs (each on one node), 
 OSDs are 1TB SSD drives.
 
 pool has 3 replicas set. I'm measuring random IO performance using fio:
 
 fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test 
 --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 
 --readwrite=randread --output=randio.log
 
 it's giving very nice performance of ~ 186K IOPS for random read.
 
 the problem is, I've got one volume on which it fives only ~20K IOPS and I 
 can't figure why. It's created using python, so I first suspected it can be 
 similar to missing layerign problem I was consulting here few days ago, but 
 when I tried reproducing it, I'm beting ~180K IOPS even for another volumes 
 created using python.
 
 so there is only this one problematic, others are fine. Since there is only 
 one SSD in each box and I'm using 3 replicas, there should not be any 
 difference in physical storage used between volumes..
 
 I'm using hammer, 0.94.1, fio 2.2.6.
 
 here's RBD info:
 
 slow volume:
 
 [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-6 rbd image 'vmtst23-6':
 size 30720 MB in 7680 objects
 order 22 (4096 kB objects)
 block_name_prefix: rbd_data.1376d82ae8944a
 format: 2
 features:
 flags:
 
 fast volume:
 [root@vfnphav1a fio]# rbd info ssd3r/vmtst23-7 rbd image 'vmtst23-7':
 size 30720 MB in 7680 objects
 order 22 (4096 kB objects)
 block_name_prefix: rbd_data.13d01d2ae8944a
 format: 2
 features:
 flags:
 
 any idea on what could be wrong here?
 
 thanks a lot in advance!
 
 BR
 
 nik
 
 --
 -
 Ing. Nikola CIPRICH
 LinuxBox.cz, s.r.o.
 28.rijna 168, 709 00 Ostrava
 
 tel.:   +420 591 166 214
 fax:+420 596 621 273
 mobil:  +420 777 093 799
 www.linuxbox.cz
 
 mobil servis: +420 737 238 656
 email servis: ser...@linuxbox.cz
 -
 
 
 
 PLEASE NOTE: The information contained in this electronic mail message is 
 intended only for the use of the designated recipient(s) named above. If the 
 reader of this message is not the intended recipient, you are hereby notified 
 that you have received this message in error and that any review

Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-25 Thread Nikola Ciprich
 
 It seems you just grepped for ceph-osd - that doesn't include sockets
 opened by the kernel client, which is what I was after.  Paste the
 entire netstat?
ouch, bummer! Here are the full netstats, sorry about the delay..

http://nik.lbox.cz/download/ceph/

BR

nik


 
 Thanks,
 
 Ilya
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] very different performance on two volumes in the same pool

2015-04-24 Thread Nikola Ciprich
Hello,

I'm trying to solve a bit mysterious situation:

I've got 3 nodes CEPH cluster, and pool made of 3 OSDs
(each on one node), OSDs are 1TB SSD drives.

pool has 3 replicas set. I'm measuring random IO performance
using fio:

fio  --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test 
--pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 
--readwrite=randread --output=randio.log

it's giving very nice performance of ~ 186K IOPS for random read.

the problem is, I've got one volume on which it gives only ~20K IOPS
and I can't figure out why. It was created using python, so I first
suspected it could be similar to the missing-layering problem I was consulting
about here a few days ago, but when I tried reproducing it, I'm getting ~180K IOPS
even for other volumes created using python.

so there is only this one problematic, others are fine. Since there is
only one SSD in each box and I'm using 3 replicas, there should not be
any difference in physical storage used between volumes..

I'm using hammer, 0.94.1, fio 2.2.6.

here's RBD info:

slow volume:

[root@vfnphav1a fio]# rbd info ssd3r/vmtst23-6
rbd image 'vmtst23-6':
size 30720 MB in 7680 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.1376d82ae8944a
format: 2
features: 
flags: 

fast volume:
[root@vfnphav1a fio]# rbd info ssd3r/vmtst23-7
rbd image 'vmtst23-7':
size 30720 MB in 7680 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.13d01d2ae8944a
format: 2
features: 
flags: 

any idea on what could be wrong here?

thanks a lot in advance!

BR

nik

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Nikola Ciprich
]  [81050dc4] ? 
do_exit+0x6e4/0xaa0
Apr 24 17:09:45 vfnphav1a kernel: [340711.180987]  [8106a8b0] ? 
__init_kthread_worker+0x40/0x40
Apr 24 17:09:45 vfnphav1a kernel: [340711.187757]  [81498d88] 
ret_from_fork+0x58/0x90
Apr 24 17:09:45 vfnphav1a kernel: [340711.193652]  [8106a8b0] ? 
__init_kthread_worker+0x40/0x40

the process started running after some time, but it's excruciatingly slow,
with speeds of about 40 KB/s.
all ceph processes seem to be mostly idle..

From the backtrace I'm not sure whether this could be a network adapter problem,
since I see some bnx2x_ locking functions, but the network seems to be running
fine otherwise and I didn't have any issues until I started using RBD heavily..

If I could provide some more information, please let me know.
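
A minimal sketch for capturing more blocked-task backtraces, in case they are
useful (sysrq 'w' dumps the stacks of all tasks in uninterruptible sleep; the
output file name is arbitrary):

# requires sysrq to be enabled (kernel.sysrq=1)
echo w > /proc/sysrq-trigger
# the stacks end up in the kernel log
dmesg | tail -n 300 > blocked-tasks.log
# current hung-task watchdog timeout, for reference
sysctl kernel.hung_task_timeout_secs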

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3.18.11 - RBD triggered deadlock?

2015-04-24 Thread Nikola Ciprich
 
 Does this mean rbd device is mapped on a node that also runs one or
 more osds?
yes.. I know it's not best practice, but it's just a test cluster..
 
 Can you watch osd sockets in netstat for a while and describe what you
 are seeing or forward a few representative samples?

sure, here it is:
http://nik.lbox.cz/download/netstat-osd.log

it doesn't seem to change at all. (Just to be exact, there are
3 OSDs on each node; 2 of them are SATA drives which are not used in this
pool.) There are currently no other ceph clients apart from this testing
RBD.
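
A minimal sketch of a sampling loop that also catches kernel-client sockets
(which show up without an owning process in netstat -p); the 5-second interval
and the default ceph port ranges (6789 for mons, 6800-7300 for OSDs) are
assumptions:

# log all TCP sockets talking to ceph daemon ports, once every 5 seconds
while true; do
    date
    netstat -tn | awk '$4 ~ /:(6789|6[89][0-9][0-9]|7[0-2][0-9][0-9]|7300)$/ ||
                       $5 ~ /:(6789|6[89][0-9][0-9]|7[0-2][0-9][0-9]|7300)$/'
    sleep 5
done > netstat-osd.log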

I'll have to get off the computer for today in a few minutes, so I won't
be able to help much more today, but I'll be able to send whatever you need
tomorrow, or later, whenever you wish.

n.



 
 Thanks,
 
 Ilya
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
  Your crushmap has straw2 buckets (alg straw2).  That's going to be
  supported in 4.1 kernel - when 3.18 was released none of the straw2
  stuff existed.
  I see.. maybe this is a bit too radical a setting for the optimal preset?
 
 Well, it depends on how you look at it.  Generally optimal is something
 that is the best or most desirable, and for a hammer cluster it's going
 to be hammer tunables ;)  You have to remember that the kernel client is
 just another client as far as ceph is concerned.
yes, this makes sense and it's pretty easy to fix in case of need

thanks for your time!

 
 Thanks,
 
 Ilya
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
Hello,

I'm quite new to ceph, so please forgive my ignorance.
Yesterday I deployed a small test cluster (3 nodes, 2 SATA + 1 SSD OSD per node).

I enabled an MDS server, created the CephFS data + metadata pools and created
the filesystem.

However, on mount requests I'm getting the following error:

[Apr20 10:09] libceph: mon0 10.0.0.1:6789 feature set mismatch, my 2b84a042aca <
 server's 102b84a042aca, missing 1000000000000

These two threads seem related to me:
http://www.spinics.net/lists/ceph-users/msg17406.html
(protocol feature mismatch after upgrading to Hammer)

and 
http://www.spinics.net/lists/ceph-users/msg17445.html
(crush issues in v0.94 hammer)

but I'm using 0.94.1 on all nodes (and the 3.18.11 kernel)
and am still getting those errors, which to my understanding
I shouldn't be..

What should I check please?

In case it could help, my crushmap can be checked here:

http://nik.lbox.cz/download/ceph/crushmap.txt

with best regards

nik



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer (0.94.1) - still getting feature set mismatch for cephfs mount requests

2015-04-20 Thread Nikola Ciprich
Hello Ilya,
 Have you set your crush tunables to hammer?

I've set crush tunables to optimal (therefore I guess they got set
to hammer).


 
 Your crushmap has straw2 buckets (alg straw2).  That's going to be
 supported in 4.1 kernel - when 3.18 was released none of the straw2
 stuff existed.
I see.. maybe this is a bit too radical a setting for the optimal preset?

 
 You should be able to change alg straw2 to alg straw and that
 should make it work with 3.18 kernel.
It indeed helped! Thanks!
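
For anyone hitting the same thing, a minimal sketch of the decompile / edit /
recompile cycle (standard ceph and crushtool commands; the file names are
arbitrary):

# dump and decompile the current crushmap
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# change every "alg straw2" back to "alg straw" so that pre-4.1 kernel
# clients (such as 3.18) can still decode the map
sed -i 's/alg straw2/alg straw/' crushmap.txt

# recompile and inject the edited map
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new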

BR

nik



 
 Thanks,
 
 Ilya
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
Hello,

I'd like to ask about another problem I've stumbled upon..

I've got a format 2 image + snapshot, and while trying to protect the snapshot
I'm getting the following error:

[root@vfnphav1a ~]# rbd ls -l ssd2r
NAME                           SIZE PARENT FMT PROT LOCK
fio_test                       4096M          2
template-win2k8-20150420      40960M          2
template-win2k8-20150420@snap 40960M          2

[root@vfnphav1a ~]# rbd snap protect ssd2r/template-win2k8-20150420@snap
rbd: protecting snap failed: 2015-04-20 16:47:31.587489 7f5e9e4fa760 -1 librbd: 
snap_protect: image must support layering(38) Function not implemented


am I doing something wrong?

thanks a lot in advance for reply

BR

nik


-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
Hello Jason,

On Mon, Apr 20, 2015 at 01:48:14PM -0400, Jason Dillaman wrote:
 Can you please run 'rbd info' on template-win2k8-20150420 and 
 template-win2k8-20150420@snap?  I just want to verify which RBD features are 
 currently enabled on your images.  Have you overridden the value of 
 rbd_default_features in your ceph.conf?  Did you use the new rbd CLI option 
 '--image-features' when creating the image?

sure, now I can see the difference:


this is the image created using rbd create ...

[root@vfnphav1a python-rbd]# rbd info ssd2r/template-win2k8-20150420
rbd image 'template-win2k8-20150420':
size 40960 MB in 10240 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.abc32ae8944a
format: 2
features: layering
flags: 

this is the image created using the python script:

[root@vfnphav1a python-rbd]# rbd info ssd2r/template-win2k8-20150420_
rbd image 'template-win2k8-20150420_':
size 40960 MB in 10240 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.5e6236db3ab3
format: 2
features: 
flags: 


I haven't used --image-features, nor do I have rbd_default_features set in
ceph.conf.

Apparently the problem is the missing layering feature. So the python rbd create
method does not enable layering, even though the v2 format is used - when I added
the rbd.RBD_FEATURE_LAYERING flag, I can properly protect created snapshots.

problem solved for me :)

Maybe the question is whether layering should be enabled by default, but
now that I know what the problem is, it's no big deal..
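
For reference, a minimal sketch of the create call with layering enabled (the
image name is a placeholder, the pool name is the one used above;
RBD_FEATURE_LAYERING is the constant exposed by the python rbd module):

#!/usr/bin/python

import rados
import rbd

rc = rados.Rados(conffile='/etc/ceph/ceph.conf')
rc.connect()
ioctx = rc.open_ioctx('ssd2r')

# request the layering feature explicitly so that snapshots of the new
# image can be protected (and cloned) later
rbd.RBD().create(ioctx, 'test-layered', 1024**2, old_format=False,
                 features=rbd.RBD_FEATURE_LAYERING)

ioctx.close()
rc.shutdown()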

thanks a lot for your time!

BR

nikola ciprich





 
 -- 
 
 Jason Dillaman 
 Red Hat 
 dilla...@redhat.com 
 http://www.redhat.com 
 
 
 - Original Message -
 From: Nikola Ciprich nikola.cipr...@linuxbox.cz
 To: Jason Dillaman dilla...@redhat.com
 Cc: ceph-users@lists.ceph.com
 Sent: Monday, April 20, 2015 12:41:26 PM
 Subject: Re: [ceph-users] hammer (0.94.1) - image must support layering(38) 
 Function not implemented on v2 image
 
 Hello Jason,
 
 here it is:
 
 [root@vfnphav1a ceph]# rbd snap protect ssd2r/template-win2k8-20150420_@snap
 2015-04-20 18:33:43.635427 7fa0344ca760 20 librbd::ImageCtx: enabling 
 caching...
 2015-04-20 18:33:43.635458 7fa0344ca760 20 librbd::ImageCtx: Initial cache 
 settings: size=33554432 num_objects=10 max_dirty=25165824 
 target_dirty=16777216 max_dirty_age=1
 2015-04-20 18:33:43.635497 7fa0344ca760 20 librbd: open_image: ictx = 
 0x4672010 name = 'template-win2k8-20150420_' id = '' snap_name = ''
 2015-04-20 18:33:43.636792 7fa0344ca760 20 librbd: detect format of 
 template-win2k8-20150420_ : new
 2015-04-20 18:33:43.637901 7fa0344ca760 10 librbd::ImageCtx:  cache bytes 
 33554432 order 22 - about 42 objects
 2015-04-20 18:33:43.637906 7fa0344ca760 10 librbd::ImageCtx: init_layout 
 stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix 
 rbd_data.5e6236db3ab3 format rbd_data.5e6236db3ab3.%016llx
 2015-04-20 18:33:43.637932 7fa0344ca760 10 librbd::ImageWatcher: registering 
 image watcher
 2015-04-20 18:33:43.643651 7fa0344ca760 20 librbd: ictx_refresh 0x4672010
 2015-04-20 18:33:43.645062 7fa0344ca760 20 librbd: new snapshot id=6 
 name=snap size=42949672960 features=0
 2015-04-20 18:33:43.645075 7fa0344ca760 20 librbd::ImageCtx: finished 
 flushing cache
 2015-04-20 18:33:43.645083 7fa0344ca760 20 librbd: snap_protect 0x4672010 snap
 2015-04-20 18:33:43.645089 7fa0344ca760 20 librbd: ictx_check 0x4672010
 2015-04-20 18:33:43.645090 7fa0344ca760 -1 librbd: snap_protect: image must 
 support layering
 rbd: protecting snap failed: (38) Function not implemented
 2015-04-20 18:33:43.645115 7fa0344ca760 20 librbd: close_image 0x4672010
 2015-04-20 18:33:43.645117 7fa0344ca760 10 librbd::ImageCtx: canceling async 
 requests: count=0
 2015-04-20 18:33:43.645148 7fa0344ca760 10 librbd::ImageWatcher: 
 unregistering image watcher
 
 
 In the meantime, I realised what could be the difference here.. the image 
 I've got trouble protecting
 snapshot is created using python rbd binding..
 
 here's simple script to reproduce:
 
 #!/usr/bin/python
 
 import rados
 import rbd
 
 rc=rados.Rados(conffile='/etc/ceph/ceph.conf')
 rc.connect()
 ioctx = rc.open_ioctx('ssd2r')
 
 rbdi=rbd.RBD()
 rbdi.create(ioctx, 'test', 1024**2, old_format=False)
 
 will it help?
 
 BR
 
 nik
 
 
 
 
 On Mon, Apr 20, 2015 at 11:35:07AM -0400, Jason Dillaman wrote:
  Can you add debug rbd = 20 to your ceph.conf, re-run the command, and 
  provide a link to the generated librbd log messages?
  
  Thanks,
  
  -- 
  
  Jason Dillaman 
  Red Hat 
  dilla...@redhat.com 
  http://www.redhat.com 
  
  
  - Original Message -
  From: Nikola Ciprich nikola.cipr...@linuxbox.cz
  To: ceph-users@lists.ceph.com
  Sent: Monday, April 20, 2015 10:54:17 AM
  Subject: [ceph-users] hammer (0.94.1) - image must support layering(38) 
  Function not implemented on v2 image
  
  Hello,
  
  I'd like to ask about another problem I've stumbled upon..
  
  I've got format 2 image

Re: [ceph-users] hammer (0.94.1) - image must support layering(38) Function not implemented on v2 image

2015-04-20 Thread Nikola Ciprich
Hello Jason,

here it is:

[root@vfnphav1a ceph]# rbd snap protect ssd2r/template-win2k8-20150420_@snap
2015-04-20 18:33:43.635427 7fa0344ca760 20 librbd::ImageCtx: enabling caching...
2015-04-20 18:33:43.635458 7fa0344ca760 20 librbd::ImageCtx: Initial cache 
settings: size=33554432 num_objects=10 max_dirty=25165824 target_dirty=16777216 
max_dirty_age=1
2015-04-20 18:33:43.635497 7fa0344ca760 20 librbd: open_image: ictx = 0x4672010 
name = 'template-win2k8-20150420_' id = '' snap_name = ''
2015-04-20 18:33:43.636792 7fa0344ca760 20 librbd: detect format of 
template-win2k8-20150420_ : new
2015-04-20 18:33:43.637901 7fa0344ca760 10 librbd::ImageCtx:  cache bytes 
33554432 order 22 - about 42 objects
2015-04-20 18:33:43.637906 7fa0344ca760 10 librbd::ImageCtx: init_layout 
stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix 
rbd_data.5e6236db3ab3 format rbd_data.5e6236db3ab3.%016llx
2015-04-20 18:33:43.637932 7fa0344ca760 10 librbd::ImageWatcher: registering 
image watcher
2015-04-20 18:33:43.643651 7fa0344ca760 20 librbd: ictx_refresh 0x4672010
2015-04-20 18:33:43.645062 7fa0344ca760 20 librbd: new snapshot id=6 name=snap 
size=42949672960 features=0
2015-04-20 18:33:43.645075 7fa0344ca760 20 librbd::ImageCtx: finished flushing 
cache
2015-04-20 18:33:43.645083 7fa0344ca760 20 librbd: snap_protect 0x4672010 snap
2015-04-20 18:33:43.645089 7fa0344ca760 20 librbd: ictx_check 0x4672010
2015-04-20 18:33:43.645090 7fa0344ca760 -1 librbd: snap_protect: image must 
support layering
rbd: protecting snap failed: (38) Function not implemented
2015-04-20 18:33:43.645115 7fa0344ca760 20 librbd: close_image 0x4672010
2015-04-20 18:33:43.645117 7fa0344ca760 10 librbd::ImageCtx: canceling async 
requests: count=0
2015-04-20 18:33:43.645148 7fa0344ca760 10 librbd::ImageWatcher: unregistering 
image watcher


In the meantime, I realised what could be the difference here.. the image whose
snapshot I've got trouble protecting was created using the python rbd binding..

here's simple script to reproduce:

#!/usr/bin/python

import rados
import rbd

rc=rados.Rados(conffile='/etc/ceph/ceph.conf')
rc.connect()
ioctx = rc.open_ioctx('ssd2r')

rbdi=rbd.RBD()
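# note: no features argument, so with this binding the image ends up with an
# empty feature set and layering is not enabled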
rbdi.create(ioctx, 'test', 1024**2, old_format=False)

will it help?

BR

nik




On Mon, Apr 20, 2015 at 11:35:07AM -0400, Jason Dillaman wrote:
 Can you add debug rbd = 20 to your ceph.conf, re-run the command, and provide 
 a link to the generated librbd log messages?
 
 Thanks,
 
 -- 
 
 Jason Dillaman 
 Red Hat 
 dilla...@redhat.com 
 http://www.redhat.com 
 
 
 - Original Message -
 From: Nikola Ciprich nikola.cipr...@linuxbox.cz
 To: ceph-users@lists.ceph.com
 Sent: Monday, April 20, 2015 10:54:17 AM
 Subject: [ceph-users] hammer (0.94.1) - image must support layering(38) 
 Function not implemented on v2 image
 
 Hello,
 
 I'd like to ask about another problem I've stumbled upon..
 
 I've got format 2 image + snapshot, and while trying to protect snapshot
 I'm getting following error:
 
 [root@vfnphav1a ~]# rbd ls -l ssd2r
 NAME                           SIZE PARENT FMT PROT LOCK
 fio_test                       4096M          2
 template-win2k8-20150420      40960M          2
 template-win2k8-20150420@snap 40960M          2
 
 [root@vfnphav1a ~]# rbd snap protect ssd2r/template-win2k8-20150420@snap
 rbd: protecting snap failed: 2015-04-20 16:47:31.587489 7f5e9e4fa760 -1 
 librbd: snap_protect: image must support layering(38) Function not implemented
 
 
 am I doing something wrong?
 
 thanks a lot in advance for reply
 
 BR
 
 nik
 
 
 -- 
 -
 Ing. Nikola CIPRICH
 LinuxBox.cz, s.r.o.
 28.rijna 168, 709 00 Ostrava
 
 tel.:   +420 591 166 214
 fax:+420 596 621 273
 mobil:  +420 777 093 799
 www.linuxbox.cz
 
 mobil servis: +420 737 238 656
 email servis: ser...@linuxbox.cz
 -
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com