[ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Björn Lässig

Hi,

After having some problems with IPv6 and download.ceph.com, I made a 
mirror (debian-hammer only) for my IPv6-only cluster.


Unfortunately after the release of 0.94.5 the rsync breaks with:

# less rsync-ftpsync-mirror_ceph.error.0
rsync: send_files failed to open 
"/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ" 
(in ceph): Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) 
(code 23) at main.c(1655) [generator=3.1.1]


indeed there is:

[09:40:49] ~ > rsync -4 -L 
download.ceph.com::ceph/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
-rw--- 91,488,256 2015/10/26 19:36:46 
.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ


I would be thankful if you could remove this broken file or complete 
the mirror process.
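
In the meantime, a pull that simply skips rsync's hidden in-progress temp 
files might work around it on our side. A rough sketch (the destination path 
is made up; note that --exclude='.*' also skips any other dot-prefixed entries):

# skip hidden temp files like .ceph-dbg_...deb.3xQnIQ left by unfinished uploads
rsync -rtlpv --exclude='.*' \
    download.ceph.com::ceph/debian-hammer/ /srv/mirror/ceph/debian-hammer/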


Thanks in advance

Björn Lässig

PS: a Debian-like mirror-push mechanism via 
https://ftp-master.debian.org/git/archvsync.git would be very helpful. 
This would add traces and Archive-in-progress files and all the other 
things the Debian mirror admins had to fix over the years. This would 
lead to less traffic for you and always up-to-date mirrors for us.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Understanding the number of TCP connections between clients and OSDs

2015-10-27 Thread Dan van der Ster
On Mon, Oct 26, 2015 at 10:48 PM, Jan Schermer  wrote:
> If we're talking about RBD clients (qemu) then the number also grows with
> number of volumes attached to the client.

I never thought about that, but it might explain a problem we have
where multiple attached volumes crash an HV. I had assumed that
multiple volumes would reuse the same RADOS client instance, and thus
reuse the same connections to the OSDs.
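
A quick way to sanity-check this on a hypervisor is to count established TCP
connections to the default OSD port range (6800-7300); treat this as a sketch,
since the actual range depends on ms_bind_port_min/max:

  ss -tn state established '( dport >= :6800 and dport <= :7300 )' | tail -n +2 | wc -l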

-- dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Björn Lässig

On 10/27/2015 10:22 AM, Wido den Hollander wrote:

On 27-10-15 09:51, Björn Lässig wrote:

after having some problems with ipv6 and download.ceph.com, i made a
mirror (debian-hammer only) for my ipv6-only cluster.


I see you are from Germany, you can also sync from eu.ceph.com


Good to know that you offer rsync access, too.
The problem here is that there is no indication that the primary mirror 
(download.ceph.com) is in a complete state. I suppose you use some 
sort of cron-based method for synchronizing with download.c.c. So in case the 
primary is broken, the chain needs more time to repair.
That's why I decided it is more robust to rsync from download.c.c 
directly. I will change this if eu.c.c is pushed on update by download.c.c.



PS: a debian like mirror-push mechanic via
https://ftp-master.debian.org/git/archvsync.git would be very helpful.
This would add traces and Archive-in-progress files and all the other
things, the debian-mirror-admins had to fix over the years. This would
lead to less traffic for you and always up-to-date mirrors for us.


That is still on my TODO list. We need a nice mirror system for Ceph.


I have been running a secondary Debian mirror at my old university for more 
than 10 years now. If you start working on that, please keep me in the loop.
Maybe starting with a ceph-mirror mailing list would be a possibility, to 
bring together all interested people.


Björn
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Wido den Hollander


On 27-10-15 09:51, Björn Lässig wrote:
> Hi,
> 
> after having some problems with ipv6 and download.ceph.com, i made a
> mirror (debian-hammer only) for my ipv6-only cluster.
> 

I see you are from Germany, you can also sync from eu.ceph.com

> Unfortunately after the release of 0.94.5 the rsync breaks with:
> 
> # less rsync-ftpsync-mirror_ceph.error.0
> rsync: send_files failed to open
> "/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ"
> (in ceph): Permission denied (13)
> rsync error: some files/attrs were not transferred (see previous errors)
> (code 23) at main.c(1655) [generator=3.1.1]
> 
> indeed there is:
> 
> [09:40:49] ~ > rsync -4 -L
> download.ceph.com::ceph/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
> 
> -rw--- 91,488,256 2015/10/26 19:36:46
> .ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
> 
> i would be thankful, if you could remove this broken file or complete
> the mirror process.
> 
> Thanks in advance
> 
> Björn Lässig
> 
> PS: a debian like mirror-push mechanic via
> https://ftp-master.debian.org/git/archvsync.git would be very helpful.
> This would add traces and Archive-in-progress files and all the other
> things, the debian-mirror-admins had to fix over the years. This would
> lead to less traffic for you and always up-to-date mirrors for us.

That is still on my TODO list. We need a nice mirror system for Ceph.

Wido

> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about rbd flag(RBD_FLAG_OBJECT_MAP_INVALID)

2015-10-27 Thread Jason Dillaman
> Hi Jason dillaman
> Recently I worked on the feature http://tracker.ceph.com/issues/13500 , when
> I read the code about librbd, I was confused by RBD_FLAG_OBJECT_MAP_INVALID
> flag.
> When I create a rbd with "--image-features = 13", we enable object-map
> feature without setting RBD_FLAG_OBJECT_MAP_INVALID, then write data to
> generate an object, the existence of this object can be checked by
> object_may_exist.
> But when I use "feature enable ${name} object-map" to enable object-map
> feature of a clone rbd (we cannot specify --image-features option when I clone
> rbd), and RBD_FLAG_OBJECT_MAP_INVALID flag is set. If I use object_may_exist
> to check object existence, object_may_exist function return true, which
> means the object exists.

When you create a new (empty) image with object map enabled from the start, the 
object map is valid since it starts out recording that no objects exist.  If you use 
'rbd feature enable <image-spec> object-map', the object map will be flagged as 
invalid since you may have already written to the image (and thus the object 
map potentially doesn't match reality).  When an object map is flagged as 
invalid, any optimizations based on whether a block exists or not are disabled.  

> So there maybe inconsistency with these two methods (--image-features vs.
> feature enable) when we create a rbd. Is this a bug ?
> My question is what does RBD_FLAG_OBJECT_MAP_INVALID flag mean, does it mean
> the object map of rbd is not valid, we need rebuild the object map?

Yes, you need to rebuild an invalid object map via 'rbd object-map rebuild 
<image-spec>' to clear the RBD_FLAG_OBJECT_MAP_INVALID flag.  The 
rebuild process checks whether or not each potential object within the RBD image 
exists, and updates the object map accordingly. 
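
For example (pool/image names here are hypothetical), the sequence would look roughly like:

rbd feature enable mypool/myimage object-map    # flags the object map of an existing image as invalid
rbd object-map rebuild mypool/myimage           # scans the image and clears the invalid flag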

-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] BAD nvme SSD performance

2015-10-27 Thread Christian Balzer

Hello,

On Tue, 27 Oct 2015 11:37:42 + Matteo Dacrema wrote:

> Hi,
> 
> thanks for all the replies.
> 
> I've found the issue: 
> The Samsung nvme SSD has poor performance with sync=1. It reach only 4/5
> k iops with randwrite ops.
> 
> Using Intel DC S3700 SSDs I'm able to saturate the CPU.
> 
That's what I thought. Also keep that CPU saturation in mind for any
further test subjects.

> I'm using hammer v 0.94.5 on Ubuntu 14.04 and 3.19.0-31 kernel
> 
> What do you think about Intel 750 series :
> http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-750-series.html
> 
> I plan to use it for cache layer ( one for host - is it a problem? )
> Behind the cache layer I plan to use Mechanical HDD with Journal on SSD
> drives.
> 

That SSD has been mentioned in this ML pretty much the day it was
released. 
It certainly looks fast, but so did your Samsung one.

Whether or not it is actually fast with sync writes I can't tell, and
probably nobody has actually tested or deployed it with Ceph yet.

Why?

Because at 70GB of write endurance per day, it is very hard to come up with a
use case where this kind of performance would be required but the daily write
volume would also fit into such a small budget.

This is particularly the case in a cache tier situation, because unless
your working set (hot data) fits into your cache tier, Ceph will
constantly promote/evict objects to and from it.
This means that READs will frequently result in WRITEs to the cache
tier before they are served to the client.

With a purely SSD based Ceph pool I would take my estimated writes per day
and multiply them by 5-10; if that still is within the endurance envelope of
the SSD, fine. 
But with a cache tier that becomes even higher and much more unpredictable.
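
A rough back-of-the-envelope illustration (the client write figure is made up,
the 70GB/day is the drive's rating): 30GB of client writes per day multiplied
by 5-10 for journaling, replication and cache promotions is 150-300GB/day of
flash writes, already well past what the 750 is rated for.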

Regards,

Christian

> What do you think about it?
> 
> Thanks
> Regards,
> Matteo
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Somnath Roy Sent: lunedì 26 ottobre 2015 17:45
> To: Christian Balzer ; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] BAD nvme SSD performance
> 
> Another point,
> As Christian mentioned, try to evaluate O_DIRECT|O_DSYNC performance of
> a SSD before choosing that for Ceph.. Try to run with direct=1 and sync
> =1 with fio to a raw ssd drive..
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Somnath Roy Sent: Monday, October 26, 2015 9:20 AM
> To: Christian Balzer; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] BAD nvme SSD performance
> 
> One thing, *don't* trust iostat disk util% in case of SSDs..100% doesn't
> mean you are saturating SSDs there..I have seen a large performance
> delta even if iostat is reporting 100% disk util in both the cases.
> Also, the ceph.conf file you are using is not optimal..Try to add these..
> 
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> 
> You didn't mention anything about your cpu, considering you have
> powerful cpu complex for SSDs tweak this to high number of shards..It
> also depends on number of OSDs per box..
> 
> osd_op_num_threads_per_shard
> osd_op_num_shards
> 
> 
> Don't need to change the following..
> 
> osd_disk_threads
> osd_op_threads
> 
> 
> Instead, try increasing..
> 
> filestore_op_threads
> 
> Use the following in the global section..
> 
> ms_dispatch_throttle_bytes = 0
> throttler_perf_counter = false
> 
> Change the following..
> filestore_max_sync_interval = 1   (or even lower, need to lower
> filestore_min_sync_interval as well)
> 
> 
> I am assuming you are using hammer and newer..
> 
> Thanks & Regards
> Somnath
> 
> Try increasing the following to very big numbers..
> 
> > > filestore_queue_max_ops = 2000
> > >
> > > filestore_queue_max_bytes = 536870912
> > >
> > > filestore_queue_committing_max_ops = 500
> > >
> > > filestore_queue_committing_max_bytes = 268435456
> 
> Use the following..
> 
> osd_enable_op_tracker = false
> 
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Christian Balzer Sent: Monday, October 26, 2015 8:23 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] BAD nvme SSD performance
> 
> 
> Hello,
> 
> On Mon, 26 Oct 2015 14:35:19 +0100 Wido den Hollander wrote:
> 
> >
> >
> > On 26-10-15 14:29, Matteo Dacrema wrote:
> > > Hi Nick,
> > >
> > >
> > >
> > > I also 

Re: [ceph-users] BAD nvme SSD performance

2015-10-27 Thread Matteo Dacrema
Hi,

thanks for all the replies.

I've found the issue: 
The Samsung NVMe SSD has poor performance with sync=1. It reaches only 4-5k IOPS 
with randwrite ops.

Using Intel DC S3700 SSDs I'm able to saturate the CPU.

I'm using hammer v 0.94.5 on Ubuntu 14.04 and 3.19.0-31 kernel

What do you think about Intel 750 series : 
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-750-series.html

I plan to use it for the cache layer (one per host - is that a problem?).
Behind the cache layer I plan to use mechanical HDDs with journals on SSD drives.

What do you think about it?

Thanks
Regards,
Matteo

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: lunedì 26 ottobre 2015 17:45
To: Christian Balzer ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance

Another point,
As Christian mentioned, try to evaluate O_DIRECT|O_DSYNC performance of a SSD 
before choosing that for Ceph..
Try to run with direct=1 and sync =1 with fio to a raw ssd drive..
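
For instance, a minimal fio job along those lines might look like the
following (the device name is only an example, and this writes directly to the
raw device, destroying any data on it):

fio --name=sync-write-test --filename=/dev/nvme0n1 --ioengine=libaio \
    --direct=1 --sync=1 --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based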

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Monday, October 26, 2015 9:20 AM
To: Christian Balzer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance

One thing, *don't* trust iostat disk util% in case of SSDs..100% doesn't mean 
you are saturating SSDs there..I have seen a large performance delta even if 
iostat is reporting 100% disk util in both the cases.
Also, the ceph.conf file you are using is not optimal..Try to add these..

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

You didn't mention anything about your CPU; considering you have a powerful CPU 
complex for SSDs, tweak these to a high number of shards.. It also depends on the number 
of OSDs per box..

osd_op_num_threads_per_shard
osd_op_num_shards


Don't need to change the following..

osd_disk_threads
osd_op_threads


Instead, try increasing..

filestore_op_threads

Use the following in the global section..

ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

Change the following..
filestore_max_sync_interval = 1   (or even lower, need to lower 
filestore_min_sync_interval as well)
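
For example, in ceph.conf that could look like the snippet below (values are
only illustrative, test before rolling out):

[osd]
filestore_min_sync_interval = 0.1
filestore_max_sync_interval = 1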


I am assuming you are using hammer and newer..

Thanks & Regards
Somnath

Try increasing the following to very big numbers..

> > filestore_queue_max_ops = 2000
> >
> > filestore_queue_max_bytes = 536870912
> >
> > filestore_queue_committing_max_ops = 500
> >
> > filestore_queue_committing_max_bytes = 268435456

Use the following..

osd_enable_op_tracker = false


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Christian Balzer
Sent: Monday, October 26, 2015 8:23 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance


Hello,

On Mon, 26 Oct 2015 14:35:19 +0100 Wido den Hollander wrote:

>
>
> On 26-10-15 14:29, Matteo Dacrema wrote:
> > Hi Nick,
> >
> >
> >
> > I also tried to increase iodepth but nothing has changed.
> >
> >
> >
> > With iostat I noticed that the disk is fully utilized and write per 
> > seconds from iostat match fio output.
> >
>
> Ceph isn't fully optimized to get the maximum potential out of NVME 
> SSDs yet.
>
Indeed. Don't expect Ceph to be near raw SSD performance.

However he writes that per iostat the SSD is fully utilized.

Matteo, can you run atop instead of iostat and confirm that:

a) utilization of the SSD is 100%.
b) CPU is not the bottleneck.

My guess would be these particular NVMe SSDs might just suffer from the same 
direct sync I/O deficiencies as other Samsung SSDs.
This feeling is re-affirmed by seeing Samsung listing them as Client SSDs, 
not data center ones.
http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/MZHPV256HDGL?ia=831

Regards,

Christian

> For example, NVM-E SSDs work best with very high queue depths and 
> parallel IOps.
>
> Also, be aware that Ceph add multiple layers to the whole I/O 
> subsystem and that there will be a performance impact when Ceph is used in 
> between.
>
> Wido
>
> >
> >
> > Matteo
> >
> >
> >
> > *From:*Nick Fisk [mailto:n...@fisk.me.uk]
> > *Sent:* lunedì 26 ottobre 2015 13:06
> > *To:* Matteo Dacrema ; ceph-us...@ceph.com
> > *Subject:* RE: BAD nvme SSD performance
> >
> >
> >
> > Hi Matteo,
> >
> >
> >
> > Ceph introduces latency into the write path and so what you are 
> 

Re: [ceph-users] when an osd is started up, IO will be blocked

2015-10-27 Thread Jevon Qiao

Hi Cephers,

We're in the middle of trying to triage the issue with a Ceph cluster 
running 0.80.9 which was reported by Songbo, and we are seeking your 
expert advice.


In fact, per our testing, the process of stopping a working OSD and 
starting it again leads to a huge performance degradation. In other 
words, this issue can be reproduced quite easily, and we cannot lower 
the impact of the OSD state change by tuning settings like 
osd_max_backfills/osd_recovery_max_chunk/osd_recovery_max_active. 
Looking into the source code, we noticed that requests issued 
by clients are first queued when the corresponding PGs are in certain 
states (like recovering and backfill) and only processed later. During 
this period, the IOPS reported by fio drop significantly (from 2000 to 
60). Our understanding is that this is done to guarantee data consistency; 
are we correct? If that's the design, we're wondering how Ceph can support 
applications that are performance-sensitive. Are there any other 
parameters we can tune to reduce the impact?
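
For reference, these throttles can also be changed at runtime via injectargs;
a sketch with illustrative values (not a recommendation):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-max-chunk 1048576'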


Thanks,
Jevon
On 26/10/15 13:27, wangsongbo wrote:

Hi all,

When an OSD is started, I get a lot of slow requests in the 
corresponding OSD log, as follows:


2015-10-26 03:42:51.593961 osd.4 [WRN] slow request 3.967808 seconds 
old, received at 2015-10-26 03:42:47.625968: 
osd_repop(client.2682003.0:2686048 43.fcf 
d1ddfcf/rbd_data.196483222ac2db.0010/head//43 v 
9744'347845) currently commit_sent
2015-10-26 03:42:51.593964 osd.4 [WRN] slow request 3.964537 seconds 
old, received at 2015-10-26 03:42:47.629239: 
osd_repop(client.2682003.0:2686049 43.b4b 
cbcbbb4b/rbd_data.196483222ac2db.020b/head//43 v 
9744'193029) currently commit_sent
2015-10-26 03:42:52.594166 osd.4 [WRN] 40 slow requests, 17 included 
below; oldest blocked for > 53.692556 secs
2015-10-26 03:42:52.594172 osd.4 [WRN] slow request 2.272928 seconds 
old, received at 2015-10-26 03:42:50.321151: 
osd_repop(client.3684690.0:191908 43.540 
f1858540/rbd_data.1fc5ca7429fc17.0280/head//43 v 
9744'63645) currently commit_sent
2015-10-26 03:42:52.594175 osd.4 [WRN] slow request 2.270618 seconds 
old, received at 2015-10-26 03:42:50.323461: 
osd_op(client.3684690.0:191911 
rbd_data.1fc5ca7429fc17.0209 [write 2633728~4096] 
43.72b9f039 ack+ondisk+write e9744) currently commit_sent
2015-10-26 03:42:52.594264 osd.4 [WRN] slow request 4.968252 seconds 
old, received at 2015-10-26 03:42:47.625828: 
osd_repop(client.2682003.0:2686047 43.b4b 
cbcbbb4b/rbd_data.196483222ac2db.020b/head//43 v 
9744'193028) currently commit_sent
2015-10-26 03:42:52.594266 osd.4 [WRN] slow request 4.968111 seconds 
old, received at 2015-10-26 03:42:47.625968: 
osd_repop(client.2682003.0:2686048 43.fcf 
d1ddfcf/rbd_data.196483222ac2db.0010/head//43 v 
9744'347845) currently commit_sent
2015-10-26 03:42:52.594318 osd.4 [WRN] slow request 4.964841 seconds 
old, received at 2015-10-26 03:42:47.629239: 
osd_repop(client.2682003.0:2686049 43.b4b 
cbcbbb4b/rbd_data.196483222ac2db.020b/head//43 v 
9744'193029) currently commit_sent
2015-10-26 03:42:53.594527 osd.4 [WRN] 40 slow requests, 16 included 
below; oldest blocked for > 54.692945 secs
2015-10-26 03:42:53.594533 osd.4 [WRN] slow request 16.004669 seconds 
old, received at 2015-10-26 03:42:37.589800: 
osd_repop(client.2682003.0:2686041 43.b4b 
cbcbbb4b/rbd_data.196483222ac2db.020b/head//43 v 
9744'193024) currently commit_sent
2015-10-26 03:42:53.594536 osd.4 [WRN] slow request 16.003889 seconds 
old, received at 2015-10-26 03:42:37.590580: 
osd_repop(client.2682003.0:2686040 43.fcf 
d1ddfcf/rbd_data.196483222ac2db.0010/head//43 v 
9744'347842) currently commit_sent
2015-10-26 03:42:53.594538 osd.4 [WRN] slow request 16.000954 seconds 
old, received at 2015-10-26 03:42:37.593515: 
osd_repop(client.2682003.0:2686042 43.b4b 
cbcbbb4b/rbd_data.196483222ac2db.020b/head//43 v 
9744'193025) currently commit_sent
2015-10-26 03:42:53.594541 osd.4 [WRN] slow request 29.138828 seconds 
old, received at 2015-10-26 03:42:24.455641: 
osd_repop(client.4764855.0:65121 43.dbe 
169a9dbe/rbd_data.49a7a4633ac0b1.0021/head//43 v 
9744'12509) currently commit_sent
2015-10-26 03:42:53.594543 osd.4 [WRN] slow request 15.998814 seconds 
old, received at 2015-10-26 03:42:37.595656: 
osd_repop(client.1800547.0:1205399 43.cc5 
9285ecc5/rbd_data.1b794560c6e2ea.00d0/head//43 v 
9744'36732) currently commit_sent
2015-10-26 03:42:54.594892 osd.4 [WRN] 39 slow requests, 17 included 
below; oldest blocked for > 55.693227 secs
2015-10-26 03:42:54.594908 osd.4 [WRN] slow request 4.273600 seconds 
old, received at 2015-10-26 03:42:50.321151: 
osd_repop(client.3684690.0:191908 43.540 
f1858540/rbd_data.1fc5ca7429fc17.0280/head//43 v 
9744'63645) currently commit_sent
2015-10-26 03:42:54.594911 osd.4 [WRN] slow request 4.271290 seconds 
old, received at 

Re: [ceph-users] Our 0.94.2 OSD are not restarting : osd/PG.cc: 2856: FAILED assert(values.size() == 1)

2015-10-27 Thread Gregory Farnum
You might see if http://tracker.ceph.com/issues/13060 could apply to
your cluster. If so, upgrading to 0.94.4 should fix it.

*Don't* reset your OSD journal. That is never the answer and is
basically the same as trashing the OSD in question.
-Greg

On Tue, Oct 27, 2015 at 9:59 AM, Laurent GUERBY  wrote:
> Hi,
>
> After a host failure (and two disks failing within 8 hours)
> one of our OSD failed to start after boot with the following error:
>
> 0> 2015-10-26 08:15:59.923059 7f67f0cb2900 -1 osd/PG.cc: In function
> 'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t,
> ceph::bufferlist*)' thread 7f67f0cb2900 time 2015-10-26 08:15:59.922041
> osd/PG.cc: 2856: FAILED assert(values.size() == 1)
>
> Full log attached here:
>
> http://tracker.ceph.com/issues/13594
>
> As noted this is similar to
>
> http://tracker.ceph.com/issues/4855
>
> Which was closed as cannot reproduce.
>
> After a second host failure we got a second
> OSD with the same error (we tried multiple times to restart), which is
> scary since our cluster is not that big and recovery
> takes a very long time.
>
> We'd like to restart these OSD, may be the
> start error is linked to the journal?
> Would it be safe to reset the journal with:
>
> ceph-osd --mkjournal -i OSDNUM
>
> Thanks in advance for any help,
>
> Sincerely,
>
> Laurent
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Ken Dreyer
On Tue, Oct 27, 2015 at 2:51 AM, Björn Lässig  wrote:
> indeed there is:
>
> [09:40:49] ~ > rsync -4 -L
> download.ceph.com::ceph/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
> -rw--- 91,488,256 2015/10/26 19:36:46
> .ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
>
> i would be thankful, if you could remove this broken file or complete the
> mirror process.
>

Alfredo this looks like a leftover from some rsync upload process. I
think we can just delete this file, right?

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fedora core 22

2015-10-27 Thread Ken Dreyer
On Tue, Oct 27, 2015 at 7:13 AM, Andrew Hume  wrote:
> a while back, i had installed ceph (firefly i believe) on my fedora core 
> system and all went smoothly.
> i went to repeat this yesterday with hammer, but i am stymied by lack of 
> packages. there doesn't
> appear to be anything for fc21 or fc22.
>
> i initially tried ceph-deploy, but it fails because of the above issue.
> i then looked at the manual install documentation but am growing nervous 
> because
> it is clearly out of date (contents of ceph.conf are different than what 
> ceph-deploy generated).
>
> how do i make progress?
>
> andrew


The Fedora distribution itself ships the latest Hammer package [1], so
you can use the ceph packages from there. I think ceph-deploy's
"--no-adjust-repos" flag will keep it from trying to contact ceph.com?

- Ken

[1] https://bodhi.fedoraproject.org/updates/?packages=ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Our 0.94.2 OSD are not restarting : osd/PG.cc: 2856: FAILED assert(values.size() == 1)

2015-10-27 Thread Laurent GUERBY
Hi,

After a host failure (and two disks failing within 8 hours)
one of our OSD failed to start after boot with the following error:

0> 2015-10-26 08:15:59.923059 7f67f0cb2900 -1 osd/PG.cc: In function
'static epoch_t PG::peek_map_epoch(ObjectStore*, spg_t,
ceph::bufferlist*)' thread 7f67f0cb2900 time 2015-10-26 08:15:59.922041
osd/PG.cc: 2856: FAILED assert(values.size() == 1)

Full log attached here:

http://tracker.ceph.com/issues/13594

As noted this is similar to 

http://tracker.ceph.com/issues/4855

Which was closed as cannot reproduce.

After a second host failure we got a second
OSD with the same error (we tried multiple times to restart), which is
scary since our cluster is not that big and recovery
takes a very long time.

We'd like to restart these OSDs; maybe the
start error is linked to the journal?
Would it be safe to reset the journal with:

ceph-osd --mkjournal -i OSDNUM

Thanks in advance for any help,

Sincerely,

Laurent

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck in active+clean+replay

2015-10-27 Thread Andras Pataki
Hi Greg,

No, unfortunately I haven't found any resolution to it.  We are using
cephfs, the whole installation is on 0.94.4.  What I did notice is that
performance is extremely poor when backfilling is happening.  I wonder if
timeouts of some kind could cause PG's to get stuck in replay.  I lowered
the 'osd max backfills' parameter today from the default 10 all the way
down to 1 to see if it improves things.  Client read/write performance has
definitely improved since then, whether this improves the
'stuck-in-replay' situation, I'm still waiting to see.

Andras


On 10/27/15, 2:06 PM, "Gregory Farnum"  wrote:

>On Tue, Oct 27, 2015 at 11:03 AM, Gregory Farnum 
>wrote:
>> On Thu, Oct 22, 2015 at 3:58 PM, Andras Pataki
>>  wrote:
>>> Hi ceph users,
>>>
>>> We've upgraded to 0.94.4 (all ceph daemons got restarted) – and are in the
>>> middle of doing some rebalancing due to crush changes (removing some disks).
>>> During the rebalance, I see that some placement groups get stuck in
>>> 'active+clean+replay' for a long time (essentially until I restart the OSD
>>> they are on).  All IO for these PGs gets queued, and clients hang.
>>>
>>> ceph health details the blocked ops in it:
>>>
>>> 4 ops are blocked > 2097.15 sec
>>> 1 ops are blocked > 131.072 sec
>>> 2 ops are blocked > 2097.15 sec on osd.41
>>> 2 ops are blocked > 2097.15 sec on osd.119
>>> 1 ops are blocked > 131.072 sec on osd.124
>>>
>>> ceph pg dump | grep replay
>>> dumped all in format plain
>>> 2.121b 3836 0 0 0 0 15705994377 3006 3006 active+clean+replay
>>>2015-10-22
>>> 14:12:01.104564 123840'2258640 125080:1252265 [41,111] 41 [41,111] 41
>>> 114515'2258631 2015-10-20 18:44:09.757620 114515'2258631 2015-10-20
>>> 18:44:09.757620
>>> 2.b4 3799 0 0 0 0 15604827445 3003 3003 active+clean+replay 2015-10-22
>>> 13:57:25.490150 119558'2322127 125084:1174759 [119,75] 119 [119,75] 119
>>> 114515'2322124 2015-10-20 11:00:51.448239 114515'2322124 2015-10-17
>>> 09:22:14.676006
>>>
>>> Both osd.41 and OSD.119 are doing this "replay".
>>>
>>> The end of the log of osd.41:
>>>
>>> 2015-10-22 10:44:35.727000 7f037929b700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.170:6913/121602 pipe(0x3b4d sd=125 :6827 s=2 pgs=618 cs=1
>>>l=0
>>> c=0x374398c0).fault with nothing to send, going to standby
>>> 2015-10-22 10:50:25.954404 7f038adae700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6809/141110 pipe(0x3adff000 sd=229 :6827 s=2 pgs=94 cs=3
>>>l=0
>>> c=0x3e9d0940).fault with nothing to send, going to standby
>>> 2015-10-22 12:11:28.029214 7f03a0e0d700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.106:6864/102556 pipe(0x40afe000 sd=621 :6827 s=2 pgs=91 cs=3
>>>l=0
>>> c=0x3acf5860).fault with nothing to send, going to standby
>>> 2015-10-22 12:45:45.404765 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
>>> c=0x37b3cec0).accept connect_seq 1 vs existing 1 state standby
>>> 2015-10-22 12:45:45.405232 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
>>> c=0x37b3cec0).accept connect_seq 2 vs existing 1 state standby
>>> 2015-10-22 12:52:49.062752 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
>>> c=0x37b3ba20).accept connect_seq 3 vs existing 3 state standby
>>> 2015-10-22 12:52:49.063169 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
>>> c=0x37b3ba20).accept connect_seq 4 vs existing 3 state standby
>>> 2015-10-22 13:02:16.573546 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=2 pgs=110 cs=3
>>>l=0
>>> c=0x37b92000).fault with nothing to send, going to standby
>>> 2015-10-22 13:07:58.667432 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=2 pgs=146 cs=5
>>>l=0
>>> c=0x3e9d0940).fault with nothing to send, going to standby
>>> 2015-10-22 13:25:35.020722 7f038191a700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.111:6841/71447 pipe(0x3e78e000 sd=205 :6827 s=2 pgs=82 cs=3 l=0
>>> c=0x36bf5860).fault with nothing to send, going to standby
>>> 2015-10-22 13:45:48.610068 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=0 pgs=0 cs=0 l=0
>>> c=0x373e11e0).accept we reset (peer sent cseq 1), sending RESETSESSION
>>> 2015-10-22 13:45:48.880698 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=2 pgs=199 cs=1
>>>l=0
>>> c=0x373e11e0).reader missed message?  skipped from seq 0 to 825623574
>>> 2015-10-22 14:11:32.967937 7f035d9e4700  0 -- 10.4.36.105:6827/98624 >>
>>> 10.4.36.105:6802/98037 pipe(0x3ce82000 sd=63 :43711 s=2 pgs=144 cs=3
>>>l=0
>>> c=0x3bf8c100).fault with nothing to send, going to 

Re: [ceph-users] PGs stuck in active+clean+replay

2015-10-27 Thread Gregory Farnum
On Tue, Oct 27, 2015 at 11:03 AM, Gregory Farnum  wrote:
> On Thu, Oct 22, 2015 at 3:58 PM, Andras Pataki
>  wrote:
>> Hi ceph users,
>>
>> We’ve upgraded to 0.94.4 (all ceph daemons got restarted) – and are in the
>> middle of doing some rebalancing due to crush changes (removing some disks).
>> During the rebalance, I see that some placement groups get stuck in
>> ‘active+clean+replay’ for a long time (essentially until I restart the OSD
>> they are on).  All IO for these PGs gets queued, and clients hang.
>>
>> ceph health details the blocked ops in it:
>>
>> 4 ops are blocked > 2097.15 sec
>> 1 ops are blocked > 131.072 sec
>> 2 ops are blocked > 2097.15 sec on osd.41
>> 2 ops are blocked > 2097.15 sec on osd.119
>> 1 ops are blocked > 131.072 sec on osd.124
>>
>> ceph pg dump | grep replay
>> dumped all in format plain
>> 2.121b 3836 0 0 0 0 15705994377 3006 3006 active+clean+replay 2015-10-22
>> 14:12:01.104564 123840'2258640 125080:1252265 [41,111] 41 [41,111] 41
>> 114515'2258631 2015-10-20 18:44:09.757620 114515'2258631 2015-10-20
>> 18:44:09.757620
>> 2.b4 3799 0 0 0 0 15604827445 3003 3003 active+clean+replay 2015-10-22
>> 13:57:25.490150 119558'2322127 125084:1174759 [119,75] 119 [119,75] 119
>> 114515'2322124 2015-10-20 11:00:51.448239 114515'2322124 2015-10-17
>> 09:22:14.676006
>>
>> Both osd.41 and OSD.119 are doing this “replay”.
>>
>> The end of the log of osd.41:
>>
>> 2015-10-22 10:44:35.727000 7f037929b700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.170:6913/121602 pipe(0x3b4d sd=125 :6827 s=2 pgs=618 cs=1 l=0
>> c=0x374398c0).fault with nothing to send, going to standby
>> 2015-10-22 10:50:25.954404 7f038adae700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6809/141110 pipe(0x3adff000 sd=229 :6827 s=2 pgs=94 cs=3 l=0
>> c=0x3e9d0940).fault with nothing to send, going to standby
>> 2015-10-22 12:11:28.029214 7f03a0e0d700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.106:6864/102556 pipe(0x40afe000 sd=621 :6827 s=2 pgs=91 cs=3 l=0
>> c=0x3acf5860).fault with nothing to send, going to standby
>> 2015-10-22 12:45:45.404765 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
>> c=0x37b3cec0).accept connect_seq 1 vs existing 1 state standby
>> 2015-10-22 12:45:45.405232 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
>> c=0x37b3cec0).accept connect_seq 2 vs existing 1 state standby
>> 2015-10-22 12:52:49.062752 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
>> c=0x37b3ba20).accept connect_seq 3 vs existing 3 state standby
>> 2015-10-22 12:52:49.063169 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
>> c=0x37b3ba20).accept connect_seq 4 vs existing 3 state standby
>> 2015-10-22 13:02:16.573546 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=2 pgs=110 cs=3 l=0
>> c=0x37b92000).fault with nothing to send, going to standby
>> 2015-10-22 13:07:58.667432 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=2 pgs=146 cs=5 l=0
>> c=0x3e9d0940).fault with nothing to send, going to standby
>> 2015-10-22 13:25:35.020722 7f038191a700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.111:6841/71447 pipe(0x3e78e000 sd=205 :6827 s=2 pgs=82 cs=3 l=0
>> c=0x36bf5860).fault with nothing to send, going to standby
>> 2015-10-22 13:45:48.610068 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=0 pgs=0 cs=0 l=0
>> c=0x373e11e0).accept we reset (peer sent cseq 1), sending RESETSESSION
>> 2015-10-22 13:45:48.880698 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=2 pgs=199 cs=1 l=0
>> c=0x373e11e0).reader missed message?  skipped from seq 0 to 825623574
>> 2015-10-22 14:11:32.967937 7f035d9e4700  0 -- 10.4.36.105:6827/98624 >>
>> 10.4.36.105:6802/98037 pipe(0x3ce82000 sd=63 :43711 s=2 pgs=144 cs=3 l=0
>> c=0x3bf8c100).fault with nothing to send, going to standby
>> 2015-10-22 14:12:35.338635 7f03afffb700  0 log_channel(cluster) log [WRN] :
>> 2 slow requests, 2 included below; oldest blocked for > 30.079053 secs
>> 2015-10-22 14:12:35.338875 7f03afffb700  0 log_channel(cluster) log [WRN] :
>> slow request 30.079053 seconds old, received at 2015-10-22 14:12:05.259156:
>> osd_op(client.734338.0:50618164 10b8f73.03ef [read 0~65536]
>> 2.338a921b ack+read+known_if_redirected e124995) currently waiting for
>> replay end
>> 2015-10-22 14:12:35.339050 7f03afffb700  0 log_channel(cluster) log [WRN] :
>> slow request 30.063995 seconds old, received at 2015-10-22 14:12:05.274213:
>> osd_op(client.734338.0:50618166 10b8f73.03ef [read 65536~131072]
>> 2.338a921b 

Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Ken Dreyer
Thanks, I've deleted it from the download.ceph.com web server.

- Ken

On Tue, Oct 27, 2015 at 11:06 AM, Alfredo Deza  wrote:
> Yes that file can (should) be deleted
>
> On Tue, Oct 27, 2015 at 12:49 PM, Ken Dreyer  wrote:
>> On Tue, Oct 27, 2015 at 2:51 AM, Björn Lässig  
>> wrote:
>>> indeed there is:
>>>
>>> [09:40:49] ~ > rsync -4 -L
>>> download.ceph.com::ceph/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
>>> -rw--- 91,488,256 2015/10/26 19:36:46
>>> .ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
>>>
>>> i would be thankful, if you could remove this broken file or complete the
>>> mirror process.
>>>
>>
>> Alfredo this looks like a leftover from some rsync upload process. I
>> think we can just delete this file, right?
>>
>> - Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck in active+clean+replay

2015-10-27 Thread Gregory Farnum
On Tue, Oct 27, 2015 at 11:22 AM, Andras Pataki
 wrote:
> Hi Greg,
>
> No, unfortunately I haven't found any resolution to it.  We are using
> cephfs, the whole installation is on 0.94.4.  What I did notice is that
> performance is extremely poor when backfilling is happening.  I wonder if
> timeouts of some kind could cause PG's to get stuck in replay.  I lowered
> the 'osd max backfills' parameter today from the default 10 all the way
> down to 1 to see if it improves things.  Client read/write performance has
> definitely improved since then, whether this improves the
> 'stuck-in-replay' situation, I'm still waiting to see.

Argh. Looks like known bug http://tracker.ceph.com/issues/13116. I've
pushed a new branch hammer-pg-replay to the gitbuilders which
backports that patch and ought to improve things if you're able to
install that to test. (It's untested but I don't foresee any issues
arising.) I've also added it to the backport queue.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Alfredo Deza
Yes that file can (should) be deleted

On Tue, Oct 27, 2015 at 12:49 PM, Ken Dreyer  wrote:
> On Tue, Oct 27, 2015 at 2:51 AM, Björn Lässig  
> wrote:
>> indeed there is:
>>
>> [09:40:49] ~ > rsync -4 -L
>> download.ceph.com::ceph/debian-hammer/pool/main/c/ceph/.ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
>> -rw--- 91,488,256 2015/10/26 19:36:46
>> .ceph-dbg_0.94.5-1trusty_amd64.deb.3xQnIQ
>>
>> i would be thankful, if you could remove this broken file or complete the
>> mirror process.
>>
>
> Alfredo this looks like a leftover from some rsync upload process. I
> think we can just delete this file, right?
>
> - Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck in active+clean+replay

2015-10-27 Thread Gregory Farnum
On Thu, Oct 22, 2015 at 3:58 PM, Andras Pataki
 wrote:
> Hi ceph users,
>
> We’ve upgraded to 0.94.4 (all ceph daemons got restarted) – and are in the
> middle of doing some rebalancing due to crush changes (removing some disks).
> During the rebalance, I see that some placement groups get stuck in
> ‘active+clean+replay’ for a long time (essentially until I restart the OSD
> they are on).  All IO for these PGs gets queued, and clients hang.
>
> ceph health details the blocked ops in it:
>
> 4 ops are blocked > 2097.15 sec
> 1 ops are blocked > 131.072 sec
> 2 ops are blocked > 2097.15 sec on osd.41
> 2 ops are blocked > 2097.15 sec on osd.119
> 1 ops are blocked > 131.072 sec on osd.124
>
> ceph pg dump | grep replay
> dumped all in format plain
> 2.121b 3836 0 0 0 0 15705994377 3006 3006 active+clean+replay 2015-10-22
> 14:12:01.104564 123840'2258640 125080:1252265 [41,111] 41 [41,111] 41
> 114515'2258631 2015-10-20 18:44:09.757620 114515'2258631 2015-10-20
> 18:44:09.757620
> 2.b4 3799 0 0 0 0 15604827445 3003 3003 active+clean+replay 2015-10-22
> 13:57:25.490150 119558'2322127 125084:1174759 [119,75] 119 [119,75] 119
> 114515'2322124 2015-10-20 11:00:51.448239 114515'2322124 2015-10-17
> 09:22:14.676006
>
> Both osd.41 and OSD.119 are doing this “replay”.
>
> The end of the log of osd.41:
>
> 2015-10-22 10:44:35.727000 7f037929b700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.170:6913/121602 pipe(0x3b4d sd=125 :6827 s=2 pgs=618 cs=1 l=0
> c=0x374398c0).fault with nothing to send, going to standby
> 2015-10-22 10:50:25.954404 7f038adae700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6809/141110 pipe(0x3adff000 sd=229 :6827 s=2 pgs=94 cs=3 l=0
> c=0x3e9d0940).fault with nothing to send, going to standby
> 2015-10-22 12:11:28.029214 7f03a0e0d700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.106:6864/102556 pipe(0x40afe000 sd=621 :6827 s=2 pgs=91 cs=3 l=0
> c=0x3acf5860).fault with nothing to send, going to standby
> 2015-10-22 12:45:45.404765 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
> c=0x37b3cec0).accept connect_seq 1 vs existing 1 state standby
> 2015-10-22 12:45:45.405232 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=0 pgs=0 cs=0 l=0
> c=0x37b3cec0).accept connect_seq 2 vs existing 1 state standby
> 2015-10-22 12:52:49.062752 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
> c=0x37b3ba20).accept connect_seq 3 vs existing 3 state standby
> 2015-10-22 12:52:49.063169 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=0 pgs=0 cs=0 l=0
> c=0x37b3ba20).accept connect_seq 4 vs existing 3 state standby
> 2015-10-22 13:02:16.573546 7f038050d700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.102:6837/77957 pipe(0x39cbe000 sd=578 :6827 s=2 pgs=110 cs=3 l=0
> c=0x37b92000).fault with nothing to send, going to standby
> 2015-10-22 13:07:58.667432 7f036525c700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6809/141110 pipe(0x3f637000 sd=405 :6827 s=2 pgs=146 cs=5 l=0
> c=0x3e9d0940).fault with nothing to send, going to standby
> 2015-10-22 13:25:35.020722 7f038191a700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.111:6841/71447 pipe(0x3e78e000 sd=205 :6827 s=2 pgs=82 cs=3 l=0
> c=0x36bf5860).fault with nothing to send, going to standby
> 2015-10-22 13:45:48.610068 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=0 pgs=0 cs=0 l=0
> c=0x373e11e0).accept we reset (peer sent cseq 1), sending RESETSESSION
> 2015-10-22 13:45:48.880698 7f0361620700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6841/99063 pipe(0x3e43b000 sd=539 :6827 s=2 pgs=199 cs=1 l=0
> c=0x373e11e0).reader missed message?  skipped from seq 0 to 825623574
> 2015-10-22 14:11:32.967937 7f035d9e4700  0 -- 10.4.36.105:6827/98624 >>
> 10.4.36.105:6802/98037 pipe(0x3ce82000 sd=63 :43711 s=2 pgs=144 cs=3 l=0
> c=0x3bf8c100).fault with nothing to send, going to standby
> 2015-10-22 14:12:35.338635 7f03afffb700  0 log_channel(cluster) log [WRN] :
> 2 slow requests, 2 included below; oldest blocked for > 30.079053 secs
> 2015-10-22 14:12:35.338875 7f03afffb700  0 log_channel(cluster) log [WRN] :
> slow request 30.079053 seconds old, received at 2015-10-22 14:12:05.259156:
> osd_op(client.734338.0:50618164 10b8f73.03ef [read 0~65536]
> 2.338a921b ack+read+known_if_redirected e124995) currently waiting for
> replay end
> 2015-10-22 14:12:35.339050 7f03afffb700  0 log_channel(cluster) log [WRN] :
> slow request 30.063995 seconds old, received at 2015-10-22 14:12:05.274213:
> osd_op(client.734338.0:50618166 10b8f73.03ef [read 65536~131072]
> 2.338a921b ack+read+known_if_redirected e124995) currently waiting for
> replay end
> 2015-10-22 14:13:11.817243 7f03afffb700  0 log_channel(cluster) log [WRN] :
> 2 slow requests, 2 included 

Re: [ceph-users] PGs stuck in active+clean+replay

2015-10-27 Thread Andras Pataki
Yes, this definitely sounds plausible (the peering/activating process does
take a long time).  At the moment I’m trying to get our cluster back to a
more working state.  Once everything works, I could try building a patched
set of ceph processes from source (currently I’m using the pre-built
centos RPMs) before a planned larger rebalance.

Andras


On 10/27/15, 2:36 PM, "Gregory Farnum"  wrote:

>On Tue, Oct 27, 2015 at 11:22 AM, Andras Pataki
> wrote:
>> Hi Greg,
>>
>> No, unfortunately I haven't found any resolution to it.  We are using
>> cephfs, the whole installation is on 0.94.4.  What I did notice is that
>> performance is extremely poor when backfilling is happening.  I wonder if
>> timeouts of some kind could cause PG's to get stuck in replay.  I lowered
>> the 'osd max backfills' parameter today from the default 10 all the way
>> down to 1 to see if it improves things.  Client read/write performance has
>> definitely improved since then, whether this improves the
>> 'stuck-in-replay' situation, I'm still waiting to see.
>
>Argh. Looks like known bug http://tracker.ceph.com/issues/13116. I've
>pushed a new branch hammer-pg-replay to the gitbuilders which
>backports that patch and ought to improve things if you're able to
>install that to test. (It's untested but I don't foresee any issues
>arising.) I've also added it to the backport queue.
>-Greg

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fedora core 22

2015-10-27 Thread Brad Hubbard
- Original Message -
> From: "Andrew Hume" 
> To: ceph-users@lists.ceph.com
> Sent: Tuesday, 27 October, 2015 11:13:04 PM
> Subject: [ceph-users] fedora core 22
> 
> a while back, i had installed ceph (firefly i believe) on my fedora core
> system and all went smoothly.
> i went to repeat this yesterday with hammer, but i am stymied by lack of
> packages. there doesn't
> appear to be anything for fc21 or fc22.
> 
> i initially tried ceph-deploy, but it fails because of the above issue.
> i then looked at the manual install documentation but am growing nervous
> because
> it is clearly out of date (contents of ceph.conf are different than what
> ceph-deploy generated).
> 
> how do i make progress?

$ dnf list ceph
Last metadata expiration check performed 7 days, 16:01:51 ago on Tue Oct 20 
18:22:03 2015.
Available Packages
ceph.x86_64  1:0.94.3-1.fc22
  updates

> 
>   andrew
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about rbd flag(RBD_FLAG_OBJECT_MAP_INVALID)

2015-10-27 Thread Shu, Xinxin
Thanks for your reply. Why not rebuild the object map when the object-map feature is 
enabled?

Cheers,
xinxin

-Original Message-
From: Jason Dillaman [mailto:dilla...@redhat.com] 
Sent: Tuesday, October 27, 2015 9:20 PM
To: Shu, Xinxin
Cc: ceph-users
Subject: Re: Question about rbd flag(RBD_FLAG_OBJECT_MAP_INVALID)

> Hi Jason dillaman
> Recently I worked on the feature http://tracker.ceph.com/issues/13500 
> , when I read the code about librbd, I was confused by 
> RBD_FLAG_OBJECT_MAP_INVALID flag.
> When I create a rbd with "--image-features = 13", we enable 
> object-map feature without setting RBD_FLAG_OBJECT_MAP_INVALID, then 
> write data to generate an object, the existence of this object can be 
> checked by object_may_exist.
> But when I use "feature enable ${name} object-map" to enable 
> object-map feature of a clone rbd (we cannot specify --image-features 
> option when I clone rbd), and RBD_FLAG_OBJECT_MAP_INVALID flag is set. 
> If I use object_may_exist to check object existence, object_may_exist 
> function return true, which means the object exists.

When you create a new (empty) image with object map enabled from the start, the 
object map is valid since it starts out recording that no objects exist.  If you use 
'rbd feature enable <image-spec> object-map', the object map will be flagged as 
invalid since you may have already written to the image (and thus the object 
map potentially doesn't match reality).  When an object map is flagged as 
invalid, any optimizations based on whether a block exists or not are disabled.  

> So there maybe inconsistency with these two methods (--image-features vs.
> feature enable) when we create a rbd. Is this a bug ?
> My question is what does RBD_FLAG_OBJECT_MAP_INVALID flag mean, does 
> it mean the object map of rbd is not valid, we need rebuild the object map?

Yes, you need to rebuild an invalid object map via 'rbd object-map rebuild 
<image-spec>' to clear the RBD_FLAG_OBJECT_MAP_INVALID flag.  The 
rebuild process checks whether or not each potential object within the RBD image 
exists, and updates the object map accordingly. 

-- 

Jason Dillaman 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Read-out much slower than write-in on my ceph cluster

2015-10-27 Thread FaHui Lin

Dear Ceph experts,

I found something strange about the performance of my Ceph cluster: 
Read-out much slower than write-in.


I have 3 machines running OSDs; each has 8 OSDs running on 8 RAID0s 
(each made up of 2 HDDs). The OSD journal and data are 
on the same device.  All machines in my cluster have 10Gb networking.


I used both Ceph RBD and CephFS, with the client on another machine outside 
the cluster or on one of the OSD nodes (to rule out possible network 
issues), and so on. All of these end up with similar results: writes can 
almost reach the network limit, say 1200 MB/s, while reads reach only 
350~450 MB/s.


Trying to figure it out, I did an extra test using CephFS:

Version and Config:
[root@dl-disk1 ~]# ceph --version
ceph version *0.94.3* (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
[root@dl-disk1 ~]# cat /etc/ceph/ceph.conf
[global]
fsid = (hidden)
mon_initial_members = dl-disk1, dl-disk2, dl-disk3
mon_host = (hidden)
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

OSD tree:
# ceph osd tree
ID WEIGHT    TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 258.88000 root default
-2  87.28000 host dl-disk1
 0  10.90999 osd.0  up  1.0  1.0
 1  10.90999 osd.1  up  1.0  1.0
 2  10.90999 osd.2  up  1.0  1.0
 3  10.90999 osd.3  up  1.0  1.0
 4  10.90999 osd.4  up  1.0  1.0
 5  10.90999 osd.5  up  1.0  1.0
 6  10.90999 osd.6  up  1.0  1.0
 7  10.90999 osd.7  up  1.0  1.0
-3  87.28000 host dl-disk2
 8  10.90999 osd.8  up  1.0  1.0
 9  10.90999 osd.9  up  1.0  1.0
10  10.90999 osd.10 up  1.0  1.0
11  10.90999 osd.11 up  1.0  1.0
12  10.90999 osd.12 up  1.0  1.0
13  10.90999 osd.13 up  1.0  1.0
14  10.90999 osd.14 up  1.0  1.0
15  10.90999 osd.15 up  1.0  1.0
-4  84.31999 host dl-disk3
16  10.53999 osd.16 up  1.0  1.0
17  10.53999 osd.17 up  1.0  1.0
18  10.53999 osd.18 up  1.0  1.0
19  10.53999 osd.19 up  1.0  1.0
20  10.53999 osd.20 up  1.0  1.0
21  10.53999 osd.21 up  1.0  1.0
22  10.53999 osd.22 up  1.0  1.0
23  10.53999 osd.23 up  1.0  1.0

Pools and PG (each pool has 128 PGs):
# ceph osd lspools
0 rbd,2 fs_meta,3 fs_data0,4 fs_data1,
# ceph pg dump pools
dumped pools in format plain
pg_stat objects mip degr misp unf bytes log disklog
pool 0  0   0   0   0   0   0 0   0
pool 2  20  0   0   0   0   356958 264 264
pool 3  32640   0   0   0 16106127360 14657   14657
pool 4  0   0   0   0   0   0 0   0

To simplify the problem, I made a new crush rule so that the CephFS data 
pool uses OSDs on only one machine (dl-disk1 here), and set size = 1.

# ceph osd crush rule dump osd_in_dl-disk1__ruleset
{
"rule_id": 1,
"rule_name": "osd_in_dl-disk1__ruleset",
"ruleset": 1,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -2,
"item_name": "dl-disk1"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "osd"
},
{
"op": "emit"
}
]
}
# ceph osd pool get fs_data0 crush_ruleset
crush_ruleset: 1
# ceph osd pool get fs_data0 size
size: 1
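
For reference, pointing the pool at that rule is done with commands roughly like:

# ceph osd pool set fs_data0 crush_ruleset 1
# ceph osd pool set fs_data0 size 1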

Here starts the test.
On a client machine, I used dd to write a 4GB file to CephFS, and 
checked dstat on the OSD node dl-disk1:

[root@client ~]# dd of=/mnt/cephfs/4Gfile if=/dev/zero bs=4096k count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 3.69993 s, 1.2 GB/s

[root@dl-disk1 ~]# dstat ...
---total-cpu-usage --memory-usage- -net/total- 
--dsk/sdb-dsk/sdc-dsk/sdd-dsk/sde-dsk/sdf-dsk/sdg-dsk/sdh-dsk/sdi--
usr sys idl wai hiq siq| used  buff  cach  free| recv  send| read  writ: 
read  writ: read  writ: read  writ: read  writ: read  writ: read  writ: 
read  writ


  0   0 100   0   0   0|3461M 67.2M 15.1G 44.3G|  19k   20k| 0 0 
:   0 0 :   0 0 :   0 0 :   0 0 : 0 0 :   0 0 
:   0 0
  0   0 100   0   0   0|3461M 67.2M 15.1G 44.3G|  32k   32k| 0 0 
:   0 0 :   0 0 :   0 0 :   0 0 : 0 0 :   0 0 
:   0 0
  8  18  74   
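
For the read direction, a comparable client-side test might look like this
(dropping the page cache first so reads actually hit the cluster):

[root@client ~]# echo 3 > /proc/sys/vm/drop_caches
[root@client ~]# dd if=/mnt/cephfs/4Gfile of=/dev/null bs=4096k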

Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Björn Lässig

On 10/27/2015 02:50 PM, Wido den Hollander wrote:

i run a secondary debian mirror at my old university for more than 10
years now. If you start working on that, please keep me in the loop.
Maybe starting with a ceph-mirror mailinglist would be a possibility, to
concentrate all interested people.


Yes, that would be an idea. I can ask for that mailinglist.

But keep in mind, download.ceph.com doesn't only serve DEB, it also
serves tarballs and RPM packages.


Yes, that's why we would need help from some Fedora/RedHat/CentOS admins. 
I do not know how their stuff is distributed. I have been a Debian guy since I 
crushed YaST a very long time ago.


I see 3 points:
 * give cluster admins an easy way to get an OS/ceph-version mirror for
   their architecture (this is my use case)
 * do a full mirror (eu.ceph.com)
 * inform mirror owners (push mirrors) if something changed

Björn
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server

2015-10-27 Thread Wido den Hollander


On 27-10-15 11:45, Björn Lässig wrote:
> On 10/27/2015 10:22 AM, Wido den Hollander wrote:
>> On 27-10-15 09:51, Björn Lässig wrote:
>>> after having some problems with ipv6 and download.ceph.com, i made a
>>> mirror (debian-hammer only) for my ipv6-only cluster.
>>
>> I see you are from Germany, you can also sync from eu.ceph.com
> 
> good to know, that you give rsync access, too.
> The Problem here is that there is no indication, that the primary mirror
> (download.ceph.com) is in a complete state. I suppose you you use some
> sort of cron-method for synchronizing with download.c.c. So in case the
> primary ist broken, the chain needs more time to repair.
> Thats why i decided it is more robust to rsync from download.c.c
> directly. i will change this, if eu.c.c is pushed on update by download.c.c
> 
>>> PS: a debian like mirror-push mechanic via
>>> https://ftp-master.debian.org/git/archvsync.git would be very helpful.
>>> This would add traces and Archive-in-progress files and all the other
>>> things, the debian-mirror-admins had to fix over the years. This would
>>> lead to less traffic for you and always up-to-date mirrors for us.
>>
>> That is still on my TODO list. We need a nice mirror system for Ceph.
> 
> i run a secondary debian mirror at my old university for more than 10
> years now. If you start working on that, please keep me in the loop.
> Maybe starting with a ceph-mirror mailinglist would be a possibility, to
> concentrate all interested people.
> 
Yes, that would be an idea. I can ask for that mailinglist.

But keep in mind, download.ceph.com doesn't only serve DEB, it also
serves tarballs and RPM packages.

Wido

> Björn
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] BAD nvme SSD performance

2015-10-27 Thread Mark Nelson

On 10/27/2015 06:37 AM, Matteo Dacrema wrote:

Hi,

thanks for all the replies.

I've found the issue:
The Samsung nvme SSD has poor performance with sync=1. It reach only 4/5 k iops 
with randwrite ops.

Using Intel DC S3700 SSDs I'm able to saturate the CPU.

I'm using hammer v 0.94.5 on Ubuntu 14.04 and 3.19.0-31 kernel

What do you think about Intel 750 series : 
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-750-series.html


I briefly considered 750s for some test boxes that wouldn't actually 
hold real data, but even in that situation the write endurance is pretty 
scary looking.  I imagine they are probably going to do better than the 
very low rating Intel gives them (they sure look a lot like rebadged 
P3500s), but in the end I went with P3700s.  The rated write 
endurance is just so much higher that it was worth the extra price (to 
us at least).




I plan to use it for cache layer ( one for host - is it a problem? )
Behind the cache layer I plan to use Mechanical HDD with Journal on SSD drives.

What do you think about it?

Thanks
Regards,
Matteo

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: lunedì 26 ottobre 2015 17:45
To: Christian Balzer ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance

Another point,
As Christian mentioned, try to evaluate O_DIRECT|O_DSYNC performance of a SSD 
before choosing that for Ceph..
Try to run with direct=1 and sync =1 with fio to a raw ssd drive..

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Somnath Roy
Sent: Monday, October 26, 2015 9:20 AM
To: Christian Balzer; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance

One thing, *don't* trust iostat disk util% in case of SSDs..100% doesn't mean 
you are saturating SSDs there..I have seen a large performance delta even if 
iostat is reporting 100% disk util in both the cases.
Also, the ceph.conf file you are using is not optimal..Try to add these..

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

You didn't mention anything about your cpu, considering you have powerful cpu 
complex for SSDs tweak this to high number of shards..It also depends on number 
of OSDs per box..

osd_op_num_threads_per_shard
osd_op_num_shards


Don't need to change the following..

osd_disk_threads
osd_op_threads


Instead, try increasing..

filestore_op_threads

Use the following in the global section..

ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

Change the following..
filestore_max_sync_interval = 1   (or even lower, need to lower 
filestore_min_sync_interval as well)


I am assuming you are using hammer and newer..

Thanks & Regards
Somnath

Try increasing the following to very big numbers..


filestore_queue_max_ops = 2000

filestore_queue_max_bytes = 536870912

filestore_queue_committing_max_ops = 500

filestore_queue_committing_max_bytes = 268435456


Use the following..

osd_enable_op_tracker = false


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Christian Balzer
Sent: Monday, October 26, 2015 8:23 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] BAD nvme SSD performance


Hello,

On Mon, 26 Oct 2015 14:35:19 +0100 Wido den Hollander wrote:




On 26-10-15 14:29, Matteo Dacrema wrote:

Hi Nick,



I also tried to increase iodepth but nothing has changed.



With iostat I noticed that the disk is fully utilized and write per
seconds from iostat match fio output.



Ceph isn't fully optimized to get the maximum potential out of NVME
SSDs yet.


Indeed. Don't expect Ceph to be near raw SSD performance.

However he writes that per iostat the SSD is fully utilized.

Matteo, can you run run atop instead of iostat and confirm that:

a) utilization of the SSD is 100%.
b) CPU is not the bottleneck.

My guess would be these particular NVMe SSDs might just suffer from the same 
direct sync I/O deficiencies as other Samsung SSDs.
This feeling is re-affirmed by seeing Samsung listing them as a Client SSDs, 
not data center one.
http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/MZHPV256HDGL?ia=831

Regards,

Christian


For example, NVM-E SSDs work best with very high queue depths and
parallel IOps.

Also, be aware that Ceph add multiple layers to the whole I/O
subsystem and that