[ceph-users] add writeback to Bluestore thanks to lvm-writecache

2019-08-13 Thread Olivier Bonvalet
Hi,

we use OSDs with data on HDD and DB/WAL on NVMe.
But for now, the BlueStore DB and WAL only store metadata, NOT data.
Right?

So, when we migrated from :
A) Filestore + HDD with hardware writecache + journal on SSD
to :
B) Bluestore + HDD without hardware writecache + DB/WAL on NVMe

Performance on our random-write workloads dropped.

Since the default OSD setup now uses LVM, enabling lvm-writecache is easy.
But is it a good idea? Has anyone tried it?
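
For what it's worth, here is the rough shape of what I have in mind. This
is an untested sketch: the VG/LV/OSD names (vg-hdd, osd-block-xyz, osd.12)
and the cache size are made up, and it assumes LVM >= 2.03 with
dm-writecache support.

pvcreate /dev/nvme0n1p3
vgextend vg-hdd /dev/nvme0n1p3
lvcreate -n osd-wcache -L 50G vg-hdd /dev/nvme0n1p3   # cache LV placed on the NVMe PV
systemctl stop ceph-osd@12                            # the data LV must be quiesced first
lvconvert --type writecache --cachevol osd-wcache vg-hdd/osd-block-xyz
systemctl start ceph-osd@12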

Thanks,

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
Good point, thanks !

By creating memory pressure (playing with vm.min_free_kbytes), the memory
is indeed freed by the kernel.
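
Roughly what I did, for the record (the 2GB value is only an illustration
for this host):

old=$(sysctl -n vm.min_free_kbytes)
sysctl -w vm.min_free_kbytes=2097152   # ~2GB, enough to create reclaim pressure
sleep 60                               # let the kernel reclaim unmapped heap pages
sysctl -w vm.min_free_kbytes="$old"    # restore the previous value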

So I think I essentially need to update our monitoring rules, to avoid
false positives.

Thanks, I'll keep reading the resources you linked.


Le mardi 09 avril 2019 à 09:30 -0500, Mark Nelson a écrit :
> My understanding is that basically the kernel is either unable or 
> uninterested (maybe due to lack of memory pressure?) in reclaiming
> the 
> memory .  It's possible you might have better behavior if you set 
> /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0) or 
> maybe disable transparent huge pages entirely.
> 
> 
> Some background:
> 
> https://github.com/gperftools/gperftools/issues/1073
> 
> https://blog.nelhage.com/post/transparent-hugepages/
> 
> https://www.kernel.org/doc/Documentation/vm/transhuge.txt
> 
> 
> Mark
> 
> 
> On 4/9/19 7:31 AM, Olivier Bonvalet wrote:
> > Well, Dan seems to be right :
> > 
> > _tune_cache_size
> >  target: 4294967296
> >heap: 6514409472
> >unmapped: 2267537408
> >  mapped: 4246872064
> > old cache_size: 2845396873
> > new cache size: 2845397085
> > 
> > 
> > So we have 6GB in heap, but "only" 4GB mapped.
> > 
> > But "ceph tell osd.* heap release" should had release that ?
> > 
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > Le lundi 08 avril 2019 à 16:09 -0500, Mark Nelson a écrit :
> > > One of the difficulties with the osd_memory_target work is that
> > > we
> > > can't
> > > tune based on the RSS memory usage of the process. Ultimately
> > > it's up
> > > to
> > > the kernel to decide to reclaim memory and especially with
> > > transparent
> > > huge pages it's tough to judge what the kernel is going to do
> > > even
> > > if
> > > memory has been unmapped by the process.  Instead the autotuner
> > > looks
> > > at
> > > how much memory has been mapped and tries to balance the caches
> > > based
> > > on
> > > that.
> > > 
> > > 
> > > In addition to Dan's advice, you might also want to enable debug
> > > bluestore at level 5 and look for lines containing "target:" and
> > > "cache_size:".  These will tell you the current target, the
> > > mapped
> > > memory, unmapped memory, heap size, previous aggregate cache
> > > size,
> > > and
> > > new aggregate cache size.  The other line will give you a break
> > > down
> > > of
> > > how much memory was assigned to each of the bluestore caches and
> > > how
> > > much each case is using.  If there is a memory leak, the
> > > autotuner
> > > can
> > > only do so much.  At some point it will reduce the caches to fit
> > > within
> > > cache_min and leave it there.
> > > 
> > > 
> > > Mark
> > > 
> > > 
> > > On 4/8/19 5:18 AM, Dan van der Ster wrote:
> > > > Which OS are you using?
> > > > With CentOS we find that the heap is not always automatically
> > > > released. (You can check the heap freelist with `ceph tell
> > > > osd.0
> > > > heap
> > > > stats`).
> > > > As a workaround we run this hourly:
> > > > 
> > > > ceph tell mon.* heap release
> > > > ceph tell osd.* heap release
> > > > ceph tell mds.* heap release
> > > > 
> > > > -- Dan
> > > > 
> > > > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <
> > > > ceph.l...@daevel.fr> wrote:
> > > > > Hi,
> > > > > 
> > > > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed
> > > > > the
> > > > > osd_memory_target :
> > > > > 
> > > > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
> > > > > ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29
> > > > > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --
> > > > > setuser
> > > > > ceph --setgroup ceph
> > > > > ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29
> > > > > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --
> > > > > setuser
> > > > > ceph --setgroup ceph
> > > > > ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29
> > > > > 1

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
Well, Dan seems to be right :

_tune_cache_size
target: 4294967296
  heap: 6514409472
  unmapped: 2267537408
mapped: 4246872064
old cache_size: 2845396873
new cache size: 2845397085


So we have 6GB in heap, but "only" 4GB mapped.

But "ceph tell osd.* heap release" should had release that ?


Thanks,

Olivier


Le lundi 08 avril 2019 à 16:09 -0500, Mark Nelson a écrit :
> One of the difficulties with the osd_memory_target work is that we
> can't 
> tune based on the RSS memory usage of the process. Ultimately it's up
> to 
> the kernel to decide to reclaim memory and especially with
> transparent 
> huge pages it's tough to judge what the kernel is going to do even
> if 
> memory has been unmapped by the process.  Instead the autotuner looks
> at 
> how much memory has been mapped and tries to balance the caches based
> on 
> that.
> 
> 
> In addition to Dan's advice, you might also want to enable debug 
> bluestore at level 5 and look for lines containing "target:" and 
> "cache_size:".  These will tell you the current target, the mapped 
> memory, unmapped memory, heap size, previous aggregate cache size,
> and 
> new aggregate cache size.  The other line will give you a break down
> of 
> how much memory was assigned to each of the bluestore caches and how 
> much each case is using.  If there is a memory leak, the autotuner
> can 
> only do so much.  At some point it will reduce the caches to fit
> within 
> cache_min and leave it there.
> 
> 
> Mark
> 
> 
> On 4/8/19 5:18 AM, Dan van der Ster wrote:
> > Which OS are you using?
> > With CentOS we find that the heap is not always automatically
> > released. (You can check the heap freelist with `ceph tell osd.0
> > heap
> > stats`).
> > As a workaround we run this hourly:
> > 
> > ceph tell mon.* heap release
> > ceph tell osd.* heap release
> > ceph tell mds.* heap release
> > 
> > -- Dan
> > 
> > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <
> > ceph.l...@daevel.fr> wrote:
> > > Hi,
> > > 
> > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the
> > > osd_memory_target :
> > > 
> > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
> > > ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29
> > > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser
> > > ceph --setgroup ceph
> > > ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29
> > > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser
> > > ceph --setgroup ceph
> > > ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29
> > > 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser
> > > ceph --setgroup ceph
> > > ceph4731 19.7 12.2 6949584 5982040 ? Ssl  mars29
> > > 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser
> > > ceph --setgroup ceph
> > > ceph5073 16.7 11.6 6639568 5701368 ? Ssl  mars29
> > > 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser
> > > ceph --setgroup ceph
> > > ceph5417 14.6 11.2 6386764 5519944 ? Ssl  mars29
> > > 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser
> > > ceph --setgroup ceph
> > > ceph5760 16.9 12.0 6806448 5879624 ? Ssl  mars29
> > > 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser
> > > ceph --setgroup ceph
> > > ceph6105 16.0 11.6 6576336 5694556 ? Ssl  mars29
> > > 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser
> > > ceph --setgroup ceph
> > > 
> > > daevel-ob@ssdr712h:~$ free -m
> > >totalusedfree  shared  buff/ca
> > > che   available
> > > Mem:  47771   452101643  17 9
> > > 17   43556
> > > Swap: 0   0   0
> > > 
> > > # ceph daemon osd.147 config show | grep memory_target
> > >  "osd_memory_target": "4294967296",
> > > 
> > > 
> > > And there is no recovery / backfilling, the cluster is fine :
> > > 
> > > $ ceph status
> > >   cluster:
> > > id: de035250-323d-4cf6-8c4b-cf0faf6296b1
> > > health: HEALTH_OK
> > > 
> > >   services:
> > > mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
> > > mgr: tsyne(active), standbys: olkas, tolriq, lorunde,
> > &

Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-09 Thread Olivier Bonvalet
Thanks for the advice; we are using Debian 9 (stretch) with a custom
Linux 4.14 kernel.

But "heap release" didn't help.


Le lundi 08 avril 2019 à 12:18 +0200, Dan van der Ster a écrit :
> Which OS are you using?
> With CentOS we find that the heap is not always automatically
> released. (You can check the heap freelist with `ceph tell osd.0 heap
> stats`).
> As a workaround we run this hourly:
> 
> ceph tell mon.* heap release
> ceph tell osd.* heap release
> ceph tell mds.* heap release
> 
> -- Dan
> 
> On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet 
> wrote:
> > Hi,
> > 
> > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the
> > osd_memory_target :
> > 
> > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
> > ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29
> > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph
> > --setgroup ceph
> > ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29
> > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph
> > --setgroup ceph
> > ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29
> > 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph
> > --setgroup ceph
> > ceph4731 19.7 12.2 6949584 5982040 ? Ssl  mars29
> > 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph
> > --setgroup ceph
> > ceph5073 16.7 11.6 6639568 5701368 ? Ssl  mars29
> > 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph
> > --setgroup ceph
> > ceph5417 14.6 11.2 6386764 5519944 ? Ssl  mars29
> > 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph
> > --setgroup ceph
> > ceph5760 16.9 12.0 6806448 5879624 ? Ssl  mars29
> > 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph
> > --setgroup ceph
> > ceph6105 16.0 11.6 6576336 5694556 ? Ssl  mars29
> > 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph
> > --setgroup ceph
> > 
> > daevel-ob@ssdr712h:~$ free -m
> >   totalusedfree  shared  buff/cache
> >available
> > Mem:  47771   452101643  17 917
> >43556
> > Swap: 0   0   0
> > 
> > # ceph daemon osd.147 config show | grep memory_target
> > "osd_memory_target": "4294967296",
> > 
> > 
> > And there is no recovery / backfilling, the cluster is fine :
> > 
> >$ ceph status
> >  cluster:
> >id: de035250-323d-4cf6-8c4b-cf0faf6296b1
> >health: HEALTH_OK
> > 
> >  services:
> >mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
> >mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
> >osd: 120 osds: 116 up, 116 in
> > 
> >  data:
> >pools:   20 pools, 12736 pgs
> >objects: 15.29M objects, 31.1TiB
> >usage:   101TiB used, 75.3TiB / 177TiB avail
> >pgs: 12732 active+clean
> > 4 active+clean+scrubbing+deep
> > 
> >  io:
> >client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd,
> > 1.29kop/s wr
> > 
> > 
> >On an other host, in the same pool, I see also high memory usage
> > :
> > 
> >daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
> >ceph6287  6.6 10.6 6027388 5190032 ? Ssl  mars21
> > 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph
> > --setgroup ceph
> >ceph6759  7.3 11.2 6299140 5484412 ? Ssl  mars21
> > 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph
> > --setgroup ceph
> >ceph7114  7.0 11.7 6576168 5756236 ? Ssl  mars21
> > 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph
> > --setgroup ceph
> >ceph7467  7.4 11.1 6244668 5430512 ? Ssl  mars21
> > 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph
> > --setgroup ceph
> >ceph7821  7.7 11.1 6309456 5469376 ? Ssl  mars21
> > 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph
> > --setgroup ceph
> >ceph8174  6.9 11.6 6545224 5705412 ? Ssl  mars21
> > 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph
> > --setgroup ceph
> >ceph8746  6.6 11.1 6290004 5477204 ? Ssl  mars21
> > 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph
> > --set

[ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore

2019-04-06 Thread Olivier Bonvalet
Hi,

on a Luminous 12.2.11 deployment, my BlueStore OSDs exceed the
osd_memory_target:

daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph3646 17.1 12.0 6828916 5893136 ? Ssl  mars29 1903:42 
/usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph3991 12.9 11.2 6342812 5485356 ? Ssl  mars29 1443:41 
/usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
ceph4361 16.9 11.8 6718432 5783584 ? Ssl  mars29 1889:41 
/usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
ceph4731 19.7 12.2 6949584 5982040 ? Ssl  mars29 2198:47 
/usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
ceph5073 16.7 11.6 6639568 5701368 ? Ssl  mars29 1866:05 
/usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
ceph5417 14.6 11.2 6386764 5519944 ? Ssl  mars29 1634:30 
/usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
ceph5760 16.9 12.0 6806448 5879624 ? Ssl  mars29 1882:42 
/usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
ceph6105 16.0 11.6 6576336 5694556 ? Ssl  mars29 1782:52 
/usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph

daevel-ob@ssdr712h:~$ free -m
  totalusedfree  shared  buff/cache   available
Mem:  47771   452101643  17 917   43556
Swap: 0   0   0

# ceph daemon osd.147 config show | grep memory_target
"osd_memory_target": "4294967296",


And there is no recovery / backfilling, the cluster is fine :

   $ ceph status
 cluster:
   id: de035250-323d-4cf6-8c4b-cf0faf6296b1
   health: HEALTH_OK

 services:
   mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
   mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
   osd: 120 osds: 116 up, 116 in

 data:
   pools:   20 pools, 12736 pgs
   objects: 15.29M objects, 31.1TiB
   usage:   101TiB used, 75.3TiB / 177TiB avail
   pgs: 12732 active+clean
4 active+clean+scrubbing+deep

 io:
   client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr


   On another host, in the same pool, I also see high memory usage:

   daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
   ceph6287  6.6 10.6 6027388 5190032 ? Ssl  mars21 1511:07 
/usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
   ceph6759  7.3 11.2 6299140 5484412 ? Ssl  mars21 1665:22 
/usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
   ceph7114  7.0 11.7 6576168 5756236 ? Ssl  mars21 1612:09 
/usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
   ceph7467  7.4 11.1 6244668 5430512 ? Ssl  mars21 1704:06 
/usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
   ceph7821  7.7 11.1 6309456 5469376 ? Ssl  mars21 1754:35 
/usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
   ceph8174  6.9 11.6 6545224 5705412 ? Ssl  mars21 1590:31 
/usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
   ceph8746  6.6 11.1 6290004 5477204 ? Ssl  mars21 1511:11 
/usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
   ceph9100  7.7 11.6 6552080 5713560 ? Ssl  mars21 1757:22 
/usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph

   But! On a similar host, in a different pool, the problem is less visible:

   daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
   ceph3617  2.8  9.9 5660308 4847444 ? Ssl  mars29 313:05 
/usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
   ceph3958  2.3  9.8 5661936 4834320 ? Ssl  mars29 256:55 
/usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
   ceph4299  2.3  9.8 5620616 4807248 ? Ssl  mars29 266:26 
/usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
   ceph4643  2.3  9.6 5527724 4713572 ? Ssl  mars29 262:50 
/usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
   ceph5016  2.2  9.7 5597504 4783412 ? Ssl  mars29 248:37 
/usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup ceph
   ceph5380  2.8  9.9 5700204 4886432 ? Ssl  mars29 321:05 
/usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
   ceph5724  3.1 10.1 5767456 4953484 ? Ssl  mars29 352:55 
/usr/bin/ceph-osd -f --cluster ceph --id 157 --setuser ceph --setgroup ceph
   ceph6070  2.7  9.9 5683092 4868632 ? Ssl  mars29 309:10 
/usr/bin/ceph-osd -f --cluster ceph --id 158 --setuser ceph --setgroup ceph


   Is there some memory leak ? Or should I expect that 

[ceph-users] Bluestore & snapshots weight

2018-10-28 Thread Olivier Bonvalet
Hi,

with Filestore, to estimate the weight of snapshots we used a simple find
script on each OSD:

nice find "$OSDROOT/$OSDDIR/current/" \
-type f -not -name '*_head_*' -not -name '*_snapdir_*' \
-printf '%P\n'

Then we aggregate by image prefix and obtain an estimate of each
snapshot's weight. We use this method because we never found this
information in the Ceph tools.
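
The aggregation itself is plain shell, roughly like this (a variant that
sums on-disk size per image prefix, using %s/%f instead of %P; the
Filestore filename escaping varies, so treat the awk regex as
approximate):

nice find "$OSDROOT/$OSDDIR/current/" \
    -type f -not -name '*_head_*' -not -name '*_snapdir_*' \
    -printf '%s %f\n' \
| awk 'match($2, /udata\.[0-9a-f]+/) {
           prefix = substr($2, RSTART + 6, RLENGTH - 6)   # RBD image prefix
           bytes[prefix] += $1                            # on-disk size of this clone
       }
       END { for (p in bytes) printf "%14d %s\n", bytes[p], p }' \
| sort -rn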

Now with BlueStore we can't use this script anymore. Is there another
way to obtain this information?

I read that we can "mount" an inactive OSD with "ceph-objectstore-tool",
but I can't shut down OSDs for this.
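
For completeness, the offline approach would look roughly like this (path
and OSD id are examples); it requires stopping the OSD, which is exactly
what we can't afford here:

systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list > /tmp/osd-12-objects.txt
systemctl start ceph-osd@12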

Thanks for any help,

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet

Le vendredi 21 septembre 2018 à 19:45 +0200, Paul Emmerich a écrit :
> The cache tiering has nothing to do with the PG of the underlying
> pool
> being incomplete.
> You are just seeing these requests as stuck because it's the only
> thing trying to write to the underlying pool.

I agree, it was just to be sure that the problems on OSDs 32, 68 and 69
all come down to a single "real" problem.


> What you need to fix is the PG showing incomplete.  I assume you
> already tried reducing the min_size to 4 as suggested? Or did you by
> chance always run with min_size 4 on the ec pool, which is a common
> cause for problems like this.

Yes, it has always run with min_size 4.

We use Luminous 12.2.8 here, but some OSDs (~40%) still run Luminous
12.2.7. I was hoping to "fix" this problem before continuing the
upgrade.

pool details :

pool 37 'bkp-foo-raid6' erasure size 6 min_size 4 crush_rule 20
object_hash rjenkins pg_num 256 pgp_num 256 last_change 585715 lfor
585714/585714 flags hashpspool,backfillfull stripe_width 4096 fast_read
1 application rbd
removed_snaps [1~3]




> Can you share the output of "ceph osd pool ls detail"?
> Also, which version of Ceph are you running?
> Paul
> 
> Am Fr., 21. Sep. 2018 um 19:28 Uhr schrieb Olivier Bonvalet
> :
> > 
> > So I've totally disable cache-tiering and overlay. Now OSD 68 & 69
> > are
> > fine, no more blocked.
> > 
> > But OSD 32 is still blocked, and PG 37.9c still marked incomplete
> > with
> > :
> > 
> > "recovery_state": [
> > {
> > "name": "Started/Primary/Peering/Incomplete",
> > "enter_time": "2018-09-21 18:56:01.222970",
> > "comment": "not enough complete instances of this PG"
> > },
> > 
> > But I don't see blocked requests in OSD.32 logs, should I increase
> > one
> > of the "debug_xx" flag ?
> > 
> > 
> > Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> > > According to the query output you pasted shards 1 and 2 are
> > > broken.
> > > But, on the other hand EC profile (4+2) should make it possible
> > > to
> > > recover from 2 shards lost simultanously...
> > > 
> > > pt., 21 wrz 2018 o 16:29 Olivier Bonvalet 
> > > napisał(a):
> > > > Well on drive, I can find thoses parts :
> > > > 
> > > > - cs0 on OSD 29 and 30
> > > > - cs1 on OSD 18 and 19
> > > > - cs2 on OSD 13
> > > > - cs3 on OSD 66
> > > > - cs4 on OSD 0
> > > > - cs5 on OSD 75
> > > > 
> > > > And I can read thoses files too.
> > > > 
> > > > And all thoses OSD are UP and IN.
> > > > 
> > > > 
> > > > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a
> > > > écrit :
> > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > 
> > > > cache-
> > > > > > > flush-
> > > > > > > evict-all", but it blocks on the object
> > > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > 
> > > > > This is the object that's stuck in the cache tier (according
> > > > > to
> > > > > your
> > > > > output in https://pastebin.com/zrwu5X0w). Can you verify if
> > > > > that
> > > > > block
> > > > > device is in use and healthy or is it corrupt?
> > > > > 
> > > > > 
> > > > > Zitat von Maks Kowalik :
> > > > > 
> > > > > > Could you, please paste the output of pg 37.9c query
> > > > > > 
> > > > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet <
> > > > > > ceph.l...@daevel.fr>
> > > > > > napisał(a):
> > > > > > 
> > > > > > > In fact, one object (only one) seem to be blocked on the
> > > > 
> > > > cache
> > > > > > > tier
> > > > > > > (writeback).
> > > > > > > 
> > > > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > > > 
> > > > cache-
> > > > > > > flush-
> > > > > > > evict-all", but it blocks on the object
> > > > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > >

Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
So I've totally disabled cache-tiering and the overlay. Now OSDs 68 & 69
are fine, no longer blocked.

But OSD 32 is still blocked, and PG 37.9c is still marked incomplete with
:

"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-09-21 18:56:01.222970",
"comment": "not enough complete instances of this PG"
},

But I don't see blocked requests in the OSD 32 logs; should I increase one
of the "debug_xx" flags?


Le vendredi 21 septembre 2018 à 16:51 +0200, Maks Kowalik a écrit :
> According to the query output you pasted shards 1 and 2 are broken.
> But, on the other hand EC profile (4+2) should make it possible to
> recover from 2 shards lost simultanously... 
> 
> pt., 21 wrz 2018 o 16:29 Olivier Bonvalet 
> napisał(a):
> > Well on drive, I can find thoses parts :
> > 
> > - cs0 on OSD 29 and 30
> > - cs1 on OSD 18 and 19
> > - cs2 on OSD 13
> > - cs3 on OSD 66
> > - cs4 on OSD 0
> > - cs5 on OSD 75
> > 
> > And I can read thoses files too.
> > 
> > And all thoses OSD are UP and IN.
> > 
> > 
> > Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > cache-
> > > > > flush-
> > > > > evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > 
> > > This is the object that's stuck in the cache tier (according to
> > > your  
> > > output in https://pastebin.com/zrwu5X0w). Can you verify if that
> > > block  
> > > device is in use and healthy or is it corrupt?
> > > 
> > > 
> > > Zitat von Maks Kowalik :
> > > 
> > > > Could you, please paste the output of pg 37.9c query
> > > > 
> > > > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > > > napisał(a):
> > > > 
> > > > > In fact, one object (only one) seem to be blocked on the
> > cache
> > > > > tier
> > > > > (writeback).
> > > > > 
> > > > > I tried to flush the cache with "rados -p cache-bkp-foo
> > cache-
> > > > > flush-
> > > > > evict-all", but it blocks on the object
> > > > > "rbd_data.f66c92ae8944a.000f2596".
> > > > > 
> > > > > So I reduced (a lot) the cache tier to 200MB, "rados -p
> > cache-
> > > > > bkp-foo
> > > > > ls" now show only 3 objects :
> > > > > 
> > > > > rbd_directory
> > > > > rbd_data.f66c92ae8944a.000f2596
> > > > > rbd_header.f66c92ae8944a
> > > > > 
> > > > > And "cache-flush-evict-all" still hangs.
> > > > > 
> > > > > I also switched the cache tier to "readproxy", to avoid using
> > > > > this
> > > > > cache. But, it's still blocked.
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet
> > a
> > > > > écrit :
> > > > > > Hello,
> > > > > > 
> > > > > > on a Luminous cluster, I have a PG incomplete and I can't
> > find
> > > > > > how to
> > > > > > fix that.
> > > > > > 
> > > > > > It's an EC pool (4+2) :
> > > > > > 
> > > > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75]
> > (reducing
> > > > > > pool
> > > > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs
> > for
> > > > > > 'incomplete')
> > > > > > 
> > > > > > Of course, we can't reduce min_size from 4.
> > > > > > 
> > > > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > > > 
> > > > > > So, IO are blocked, we can't access thoses damaged data.
> > > > > > OSD blocks too :
> > > > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > > > 
> > > > > > OSD 32 is the primary of this PG.
> > > > > > And OSD 68 and 69 are for cache tiering.
> > > > > > 
> > > > > > Any idea how can I fix that ?
> > > > > > 
> > > > > > Thanks,
> > > > > > 
> > > > > > Olivier
> > > > > > 
> > > > > > 
> > > > > > ___
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > > 
> > > > > 
> > > > > ___
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > 
> > > 
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Well, on disk I can find those parts:

- cs0 on OSD 29 and 30
- cs1 on OSD 18 and 19
- cs2 on OSD 13
- cs3 on OSD 66
- cs4 on OSD 0
- cs5 on OSD 75

And I can read those files too.

And all those OSDs are UP and IN.


Le vendredi 21 septembre 2018 à 13:10 +, Eugen Block a écrit :
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > flush-
> > > evict-all", but it blocks on the object
> > > "rbd_data.f66c92ae8944a.000f2596".
> 
> This is the object that's stuck in the cache tier (according to
> your  
> output in https://pastebin.com/zrwu5X0w). Can you verify if that
> block  
> device is in use and healthy or is it corrupt?
> 
> 
> Zitat von Maks Kowalik :
> 
> > Could you, please paste the output of pg 37.9c query
> > 
> > pt., 21 wrz 2018 o 14:39 Olivier Bonvalet 
> > napisał(a):
> > 
> > > In fact, one object (only one) seem to be blocked on the cache
> > > tier
> > > (writeback).
> > > 
> > > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > > flush-
> > > evict-all", but it blocks on the object
> > > "rbd_data.f66c92ae8944a.000f2596".
> > > 
> > > So I reduced (a lot) the cache tier to 200MB, "rados -p cache-
> > > bkp-foo
> > > ls" now show only 3 objects :
> > > 
> > > rbd_directory
> > > rbd_data.f66c92ae8944a.000f2596
> > > rbd_header.f66c92ae8944a
> > > 
> > > And "cache-flush-evict-all" still hangs.
> > > 
> > > I also switched the cache tier to "readproxy", to avoid using
> > > this
> > > cache. But, it's still blocked.
> > > 
> > > 
> > > 
> > > 
> > > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a
> > > écrit :
> > > > Hello,
> > > > 
> > > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > > how to
> > > > fix that.
> > > > 
> > > > It's an EC pool (4+2) :
> > > > 
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > > pool
> > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > 'incomplete')
> > > > 
> > > > Of course, we can't reduce min_size from 4.
> > > > 
> > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > 
> > > > So, IO are blocked, we can't access thoses damaged data.
> > > > OSD blocks too :
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > 
> > > > OSD 32 is the primary of this PG.
> > > > And OSD 68 and 69 are for cache tiering.
> > > > 
> > > > Any idea how can I fix that ?
> > > > 
> > > > Thanks,
> > > > 
> > > > Olivier
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Yep :

pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26
object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor
68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode
readproxy target_bytes 209715200 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 300s
x2 decay_rate 0 search_last_n 0 min_read_recency_for_promote 10
min_write_recency_for_promote 2 stripe_width 0

I can't totally disable the cache tiering, because the base OSDs are on
Filestore (so we can't enable the EC "overwrites" feature).
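
To spell out what "totally disable" would mean here (the usual
tier-removal sequence, as I understand it, with the pool names above);
the last two steps are exactly what we can't do, because serving RBD
straight from the EC pool would need allow_ec_overwrites, which requires
BlueStore OSDs:

ceph osd tier cache-mode cache-bkp-foo readproxy      # already done
rados -p cache-bkp-foo cache-flush-evict-all          # currently hangs on that one object
ceph osd tier remove-overlay bkp-foo-raid6
ceph osd tier remove bkp-foo-raid6 cache-bkp-foo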

Le vendredi 21 septembre 2018 à 13:26 +, Eugen Block a écrit :
> > I also switched the cache tier to "readproxy", to avoid using this
> > cache. But, it's still blocked.
> 
> You could change the cache mode to "none" to disable it. Could you  
> paste the output of:
> 
> ceph osd pool ls detail | grep cache-bkp-foo
> 
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > In fact, one object (only one) seem to be blocked on the cache tier
> > (writeback).
> > 
> > I tried to flush the cache with "rados -p cache-bkp-foo cache-
> > flush-
> > evict-all", but it blocks on the object
> > "rbd_data.f66c92ae8944a.000f2596".
> > 
> > So I reduced (a lot) the cache tier to 200MB, "rados -p cache-bkp-
> > foo
> > ls" now show only 3 objects :
> > 
> > rbd_directory
> > rbd_data.f66c92ae8944a.000f2596
> > rbd_header.f66c92ae8944a
> > 
> > And "cache-flush-evict-all" still hangs.
> > 
> > I also switched the cache tier to "readproxy", to avoid using this
> > cache. But, it's still blocked.
> > 
> > 
> > 
> > 
> > Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a
> > écrit :
> > > Hello,
> > > 
> > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > how to
> > > fix that.
> > > 
> > > It's an EC pool (4+2) :
> > > 
> > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > pool
> > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > 'incomplete')
> > > 
> > > Of course, we can't reduce min_size from 4.
> > > 
> > > And the full state : https://pastebin.com/zrwu5X0w
> > > 
> > > So, IO are blocked, we can't access thoses damaged data.
> > > OSD blocks too :
> > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > 
> > > OSD 32 is the primary of this PG.
> > > And OSD 68 and 69 are for cache tiering.
> > > 
> > > Any idea how can I fix that ?
> > > 
> > > Thanks,
> > > 
> > > Olivier
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
In fact, one object (only one) seems to be blocked on the cache tier
(writeback).

I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".

So I reduced (a lot) the cache tier to 200MB; "rados -p cache-bkp-foo
ls" now shows only 3 objects:

rbd_directory
rbd_data.f66c92ae8944a.000f2596
rbd_header.f66c92ae8944a

And "cache-flush-evict-all" still hangs.

I also switched the cache tier to "readproxy", to avoid using this
cache. But, it's still blocked.




Le vendredi 21 septembre 2018 à 02:14 +0200, Olivier Bonvalet a écrit :
> Hello,
> 
> on a Luminous cluster, I have a PG incomplete and I can't find how to
> fix that.
> 
> It's an EC pool (4+2) :
> 
> pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> 'incomplete')
> 
> Of course, we can't reduce min_size from 4.
> 
> And the full state : https://pastebin.com/zrwu5X0w
> 
> So, IO are blocked, we can't access thoses damaged data.
> OSD blocks too :
> osds 32,68,69 have stuck requests > 4194.3 sec
> 
> OSD 32 is the primary of this PG.
> And OSD 68 and 69 are for cache tiering.
> 
> Any idea how can I fix that ?
> 
> Thanks,
> 
> Olivier
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
OK, so it's a replica-3 pool, and OSDs 68 & 69 are on the same host.

Le vendredi 21 septembre 2018 à 11:09 +, Eugen Block a écrit :
> > cache-tier on this pool have 26GB of data (for 5.7TB of data on the
> > EC
> > pool).
> > We tried to flush the cache tier, and restart OSD 68 & 69, without
> > any
> > success.
> 
> I meant the replication size of the pool
> 
> ceph osd pool ls detail | grep 
> 
> In the experimental state of our cluster we had a cache tier (for
> rbd  
> pool) with size 2, that can cause problems during recovery. Since
> only  
> OSDs 68 and 69 are mentioned I was wondering if your cache tier
> also  
> has size 2.
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > Hi,
> > 
> > cache-tier on this pool have 26GB of data (for 5.7TB of data on the
> > EC
> > pool).
> > We tried to flush the cache tier, and restart OSD 68 & 69, without
> > any
> > success.
> > 
> > But I don't see any related data on cache-tier OSD (filestore) with
> > :
> > 
> > find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
> > 
> > 
> > I don't see any usefull information in logs. Maybe I should
> > increase
> > log level ?
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > Le vendredi 21 septembre 2018 à 09:34 +, Eugen Block a écrit :
> > > Hi Olivier,
> > > 
> > > what size does the cache tier have? You could set cache-mode to
> > > forward and flush it, maybe restarting those OSDs (68, 69) helps,
> > > too.
> > > Or there could be an issue with the cache tier, what do those
> > > logs
> > > say?
> > > 
> > > Regards,
> > > Eugen
> > > 
> > > 
> > > Zitat von Olivier Bonvalet :
> > > 
> > > > Hello,
> > > > 
> > > > on a Luminous cluster, I have a PG incomplete and I can't find
> > > > how
> > > > to
> > > > fix that.
> > > > 
> > > > It's an EC pool (4+2) :
> > > > 
> > > > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing
> > > > pool
> > > > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > > > 'incomplete')
> > > > 
> > > > Of course, we can't reduce min_size from 4.
> > > > 
> > > > And the full state : https://pastebin.com/zrwu5X0w
> > > > 
> > > > So, IO are blocked, we can't access thoses damaged data.
> > > > OSD blocks too :
> > > > osds 32,68,69 have stuck requests > 4194.3 sec
> > > > 
> > > > OSD 32 is the primary of this PG.
> > > > And OSD 68 and 69 are for cache tiering.
> > > > 
> > > > Any idea how can I fix that ?
> > > > 
> > > > Thanks,
> > > > 
> > > > Olivier
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> 
> 
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck incomplete

2018-09-21 Thread Olivier Bonvalet
Hi,

The cache tier on this pool has 26GB of data (for 5.7TB of data on the EC
pool).
We tried to flush the cache tier and to restart OSDs 68 & 69, without any
success.

But I don't see any related data on the cache-tier OSDs (Filestore) with:

find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'


I don't see any useful information in the logs. Maybe I should increase
the log level?
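
In the meantime, this is what I plan to look at on the primary (osd.32),
via the admin socket:

ceph daemon osd.32 dump_blocked_ops      # which requests are stuck, and on what
ceph daemon osd.32 dump_ops_in_flight    # everything currently in flight
ceph daemon osd.32 dump_historic_ops     # recently completed slow ops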

Thanks,

Olivier


Le vendredi 21 septembre 2018 à 09:34 +, Eugen Block a écrit :
> Hi Olivier,
> 
> what size does the cache tier have? You could set cache-mode to  
> forward and flush it, maybe restarting those OSDs (68, 69) helps,
> too.  
> Or there could be an issue with the cache tier, what do those logs
> say?
> 
> Regards,
> Eugen
> 
> 
> Zitat von Olivier Bonvalet :
> 
> > Hello,
> > 
> > on a Luminous cluster, I have a PG incomplete and I can't find how
> > to
> > fix that.
> > 
> > It's an EC pool (4+2) :
> > 
> > pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
> > bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
> > 'incomplete')
> > 
> > Of course, we can't reduce min_size from 4.
> > 
> > And the full state : https://pastebin.com/zrwu5X0w
> > 
> > So, IO are blocked, we can't access thoses damaged data.
> > OSD blocks too :
> > osds 32,68,69 have stuck requests > 4194.3 sec
> > 
> > OSD 32 is the primary of this PG.
> > And OSD 68 and 69 are for cache tiering.
> > 
> > Any idea how can I fix that ?
> > 
> > Thanks,
> > 
> > Olivier
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG stuck incomplete

2018-09-20 Thread Olivier Bonvalet
Hello,

on a Luminous cluster, I have an incomplete PG and I can't find how to
fix it.

It's an EC pool (4+2) :

pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
'incomplete')

Of course, we can't reduce min_size from 4.

And the full state : https://pastebin.com/zrwu5X0w

So, IO is blocked and we can't access the damaged data.
OSDs block too:
osds 32,68,69 have stuck requests > 4194.3 sec

OSD 32 is the primary of this PG.
And OSD 68 and 69 are for cache tiering.

Any idea how I can fix that?
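
For reference, the kind of commands I'm using to look at this PG (the
query output is what I put on pastebin above):

ceph pg 37.9c query > /tmp/pg-37.9c.json   # full peering / recovery_state detail
ceph pg map 37.9c                          # current up and acting sets
ceph health detail | grep 37.9c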

Thanks,

Olivier


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Optane 900P device class automatically set to SSD not NVME

2018-08-13 Thread Olivier Bonvalet
On a recent Luminous cluster with nvme*n1 devices, the class is
automatically set to "nvme" on "Intel SSD DC P3520 Series" drives:

~# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF 
 -1   2.15996 root default  
 -9   0.71999 room room135  
 -3   0.71999 host ceph135a 
  0  nvme 0.35999 osd.0 up  1.0 1.0 
  1  nvme 0.35999 osd.1 up  1.0 1.0 
-11   0.71999 room room209  
 -5   0.71999 host ceph209a 
  2  nvme 0.35999 osd.2 up  1.0 1.0 
  3  nvme 0.35999 osd.3 up  1.0 1.0 
-12   0.71999 room room220  
 -7   0.71999 host ceph220a 
  4  nvme 0.35999 osd.4 up  1.0 1.0 
  5  nvme 0.35999 osd.5 up  1.0 1.0 


Le dimanche 12 août 2018 à 23:37 +0200, c...@elchaka.de a écrit :
> 
> Am 1. August 2018 10:33:26 MESZ schrieb Jake Grimmett <
> j...@mrc-lmb.cam.ac.uk>:
> > Dear All,
> 
> Hello Jake,
> 
> > 
> > Not sure if this is a bug, but when I add Intel Optane 900P drives,
> > their device class is automatically set to SSD rather than NVME.
> > 
> 
> AFAIK ceph actually difference only between hdd and ssd. Nvme would
> be handled as same like ssd.
> 
> Hth 
> - Mehmet 
>  
> > This happens under Mimic 13.2.0 and 13.2.1
> > 
> > [root@ceph2 ~]# ceph-volume lvm prepare --bluestore --data
> > /dev/nvme0n1
> > 
> > (SNIP see http://p.ip.fi/eopR for output)
> > 
> > Check...
> > [root@ceph2 ~]# ceph osd tree | grep "osd.1 "
> >  1   ssd0.25470 osd.1   up  1.0 1.0
> > 
> > Fix is easy
> > [root@ceph2 ~]# ceph osd crush rm-device-class osd.1
> > done removing class of osd(s): 1
> > 
> > [root@ceph2 ~]# ceph osd crush set-device-class nvme osd.1
> > set osd(s) 1 to class 'nvme'
> > 
> > Check...
> > [root@ceph2 ~]# ceph osd tree | grep "osd.1 "
> >  1  nvme0.25470 osd.1   up  1.0 1.0
> > 
> > 
> > Thanks,
> > 
> > Jake
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
Hi,

Good point! Changing this value *and* restarting ceph-mgr fixed the
issue. Now we have to find a way to reduce the PG count.
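
For the archives, roughly what the fix amounted to (300 as Paul
suggested; the mgr daemon name is assumed to match the short hostname):

ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'
# and persist it in ceph.conf on the monitors:
#   [global]
#   mon max pg per osd = 300
systemctl restart ceph-mgr@$(hostname -s)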

Thanks Paul !

Olivier

Le mardi 05 juin 2018 à 10:39 +0200, Paul Emmerich a écrit :
> Hi,
> 
> looks like you are running into the PG overdose protection of
> Luminous (you got > 200 PGs per OSD): try to increase
> mon_max_pg_per_osd on the monitors to 300 or so to temporarily
> resolve this.
> 
> Paul
> 
> 2018-06-05 9:40 GMT+02:00 Olivier Bonvalet :
> > Some more informations : the cluster was just upgraded from Jewel
> > to
> > Luminous.
> > 
> > # ceph pg dump | egrep '(stale|creating)'
> > dumped all
> > 15.32 10947  00 0   0 
> > 45870301184  3067 3067   
> > stale+active+clean 2018-06-04 09:20:42.594317   387644'251008   
> >  437722:754803[48,31,45] 48   
> > [48,31,45] 48   213014'224196 2018-04-22
> > 02:01:09.148152   200181'219150 2018-04-14 14:40:13.116285 
> >0 
> > 19.77  4131  00 0   0 
> > 17326669824  3076 3076   
> > stale+down 2018-06-05 07:28:33.968860394478'58307   
> >  438699:736881  [NONE,20,76] 20   
> >   [NONE,20,76] 20273736'49495 2018-05-17
> > 01:05:35.523735273736'49495 2018-05-17 01:05:35.523735 
> >0 
> > 13.76 10730  00 0   0 
> > 44127133696  3011 3011   
> > stale+down 2018-06-05 07:30:27.578512   397231'457143   
> > 438813:4600135  [NONE,21,76] 21   
> >   [NONE,21,76]         21   286462'438402 2018-05-20
> > 18:06:12.443141   286462'438402 2018-05-20 18:06:12.443141 
> >0 
> > 
> > 
> > 
> > 
> > Le mardi 05 juin 2018 à 09:25 +0200, Olivier Bonvalet a écrit :
> > > Hi,
> > > 
> > > I have a cluster in "stale" state : a lots of RBD are blocked
> > since
> > > ~10
> > > hours. In the status I see PG in stale or down state, but thoses
> > PG
> > > doesn't seem to exists anymore :
> > > 
> > > root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
> > > HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull
> > osd(s);
> > > 16 pool(s) nearfull; 4645278/103969515 objects misplaced
> > (4.468%);
> > > Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs
> > > peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515
> > > objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized;
> > 229
> > > slow requests are blocked > 32 sec; 4074 stuck requests are
> > blocked >
> > > 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-
> > sbg,hyp02-
> > > sbg,hyp03-sbg are using a lot of disk space
> > > PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12
> > pgs
> > > down, 2 pgs peering, 3 pgs stale
> > > pg 31.8b is down, acting [2147483647,16,36]
> > > pg 31.8e is down, acting [2147483647,29,19]
> > > pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
> > > 
> > > root! stor00-sbg:~# ceph pg 31.8b query
> > > Error ENOENT: i don't have pgid 31.8b
> > > 
> > > root! stor00-sbg:~# ceph pg 31.8e query
> > > Error ENOENT: i don't have pgid 31.8e
> > > 
> > > root! stor00-sbg:~# ceph pg 46.b8 query
> > > Error ENOENT: i don't have pgid 46.b8
> > > 
> > > 
> > > We just loose an HDD, and mark the corresponding OSD as "lost".
> > > 
> > > Any idea of what should I do ?
> > > 
> > > Thanks,
> > > 
> > > Olivier
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
Some more information: the cluster was just upgraded from Jewel to
Luminous.

# ceph pg dump | egrep '(stale|creating)'
dumped all
15.32 10947  00 0   0  45870301184  
3067 3067stale+active+clean 2018-06-04 
09:20:42.594317   387644'251008 437722:754803[48,31,45] 
48[48,31,45] 48   213014'224196 
2018-04-22 02:01:09.148152   200181'219150 2018-04-14 14:40:13.116285   
  0 
19.77  4131  00 0   0  17326669824  
3076 3076stale+down 2018-06-05 
07:28:33.968860394478'58307 438699:736881  [NONE,20,76] 
20  [NONE,20,76] 20273736'49495 
2018-05-17 01:05:35.523735273736'49495 2018-05-17 01:05:35.523735   
  0 
13.76 10730  00 0   0  44127133696  
3011 3011stale+down 2018-06-05 
07:30:27.578512   397231'457143438813:4600135  [NONE,21,76] 
21  [NONE,21,76] 21   286462'438402 
2018-05-20 18:06:12.443141   286462'438402 2018-05-20 18:06:12.443141   
  0 




Le mardi 05 juin 2018 à 09:25 +0200, Olivier Bonvalet a écrit :
> Hi,
> 
> I have a cluster in "stale" state : a lots of RBD are blocked since
> ~10
> hours. In the status I see PG in stale or down state, but thoses PG
> doesn't seem to exists anymore :
> 
> root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
> HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s);
> 16 pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%);
> Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs
> peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515
> objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized; 229
> slow requests are blocked > 32 sec; 4074 stuck requests are blocked >
> 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-sbg,hyp02-
> sbg,hyp03-sbg are using a lot of disk space
> PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs
> down, 2 pgs peering, 3 pgs stale
> pg 31.8b is down, acting [2147483647,16,36]
> pg 31.8e is down, acting [2147483647,29,19]
> pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
> 
> root! stor00-sbg:~# ceph pg 31.8b query
> Error ENOENT: i don't have pgid 31.8b
> 
> root! stor00-sbg:~# ceph pg 31.8e query
> Error ENOENT: i don't have pgid 31.8e
> 
> root! stor00-sbg:~# ceph pg 46.b8 query
> Error ENOENT: i don't have pgid 46.b8
> 
> 
> We just loose an HDD, and mark the corresponding OSD as "lost".
> 
> Any idea of what should I do ?
> 
> Thanks,
> 
> Olivier
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ghost PG : "i don't have pgid xx"

2018-06-05 Thread Olivier Bonvalet
Hi,

I have a cluster in a "stale" state: a lot of RBD images have been blocked
for ~10 hours. In the status I see PGs in a stale or down state, but those
PGs don't seem to exist anymore:

root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s); 16 
pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%); Reduced data 
availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale; 
Degraded data redundancy: 2723173/103969515 objects degraded (2.619%), 387 pgs 
degraded, 297 pgs undersized; 229 slow requests are blocked > 32 sec; 4074 
stuck requests are blocked > 4096 sec; too many PGs per OSD (202 > max 200); 
mons hyp01-sbg,hyp02-sbg,hyp03-sbg are using a lot of disk space
PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs 
peering, 3 pgs stale
pg 31.8b is down, acting [2147483647,16,36]
pg 31.8e is down, acting [2147483647,29,19]
pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]

root! stor00-sbg:~# ceph pg 31.8b query
Error ENOENT: i don't have pgid 31.8b

root! stor00-sbg:~# ceph pg 31.8e query
Error ENOENT: i don't have pgid 31.8e

root! stor00-sbg:~# ceph pg 46.b8 query
Error ENOENT: i don't have pgid 46.b8


We just lost an HDD, and marked the corresponding OSD as "lost".

Any idea of what I should do?

Thanks,

Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : general protection fault: 0000 [#1] SMP

2017-10-12 Thread Olivier Bonvalet
Le jeudi 12 octobre 2017 à 09:12 +0200, Ilya Dryomov a écrit :
> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
> ceph-induced, it looks like.  I don't remember seeing anything
> similar
> in the context of krbd.
> 
> This is a Xen dom0 kernel, right?  What did the workload look like?
> Can you provide dmesg before the crash?

Hi,

yes it's a Xen dom0 kernel. Linux 4.13.3, Xen 4.8.2, with an old
0.94.10 Ceph (so, Hammer).

Before this error, I had this in the logs:

Oct 11 16:00:41 lorunde kernel: [310548.899082] libceph: read_partial_message 
88021a910200 data crc 2306836368 != exp. 2215155875
Oct 11 16:00:41 lorunde kernel: [310548.899841] libceph: osd117 10.0.0.31:6804 
bad crc/signature
Oct 11 16:02:25 lorunde kernel: [310652.695015] libceph: read_partial_message 
880220b10100 data crc 842840543 != exp. 2657161714
Oct 11 16:02:25 lorunde kernel: [310652.695731] libceph: osd3 10.0.0.26:6804 
bad crc/signature
Oct 11 16:07:24 lorunde kernel: [310952.485202] libceph: read_partial_message 
88025d1aa400 data crc 938978341 != exp. 4154366769
Oct 11 16:07:24 lorunde kernel: [310952.485870] libceph: osd117 10.0.0.31:6804 
bad crc/signature
Oct 11 16:10:44 lorunde kernel: [311151.841812] libceph: read_partial_message 
880260300400 data crc 2988747958 != exp. 319958859
Oct 11 16:10:44 lorunde kernel: [311151.842672] libceph: osd9 10.0.0.51:6802 
bad crc/signature
Oct 11 16:10:57 lorunde kernel: [311165.211412] libceph: read_partial_message 
8802208b8300 data crc 369498361 != exp. 906022772
Oct 11 16:10:57 lorunde kernel: [311165.212135] libceph: osd87 10.0.0.5:6800 
bad crc/signature
Oct 11 16:12:27 lorunde kernel: [311254.635767] libceph: read_partial_message 
880236f9a000 data crc 2586662963 != exp. 2886241494
Oct 11 16:12:27 lorunde kernel: [311254.636493] libceph: osd90 10.0.0.5:6814 
bad crc/signature
Oct 11 16:14:31 lorunde kernel: [311378.808191] libceph: read_partial_message 
88027e633c00 data crc 1102363051 != exp. 679243837
Oct 11 16:14:31 lorunde kernel: [311378.808889] libceph: osd13 10.0.0.21:6804 
bad crc/signature
Oct 11 16:15:01 lorunde kernel: [311409.431034] libceph: read_partial_message 
88024ce0a800 data crc 2467415342 != exp. 1753860323
Oct 11 16:15:01 lorunde kernel: [311409.431718] libceph: osd111 10.0.0.30:6804 
bad crc/signature
Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault:  
[#1] SMP


We had to switch to TCP CUBIC (instead of a badly configured TCP BBR, without
FQ) to reduce the data crc errors.
But since we still had some errors, last night we rebooted all the OSD nodes
into Linux 4.4.91, instead of Linux 4.9.47 & 4.9.53.

Over the last 7 hours we haven't seen any data crc errors from OSDs, but we had
one from a MON, without any hang/crash.

About the workload: the Xen VMs are mainly LAMP servers, with HTTP traffic
handled by nginx or Apache, PHP, and MySQL databases.

Thanks,

Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] general protection fault: 0000 [#1] SMP

2017-10-11 Thread Olivier Bonvalet
Hi,

I had a "general protection fault: " with Ceph RBD kernel client.
Not sure how to read the call, is it Ceph related ?


Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault:  
[#1] SMP
Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid 
binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev 
br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph 
xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul 
ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel 
aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper 
intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt 
fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter 
ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables 
x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor xor async_tx raid6_pq
Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 
multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core 
xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp 
usb_common pps_core mdio
Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: 
kworker/1:0 Not tainted 4.13-dae-dom0 #2
Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel 
Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 
10/12/2016
Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr 
ceph_con_workfn [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.898355] task: 8801ce434280 
task.stack: c900151bc000
Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:c900151bfac0 
EFLAGS: 00010202
Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: 8801b63df000 RBX: 
88021b41be00 RCX: 04df
Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 04df RSI: 
4450736e24806564 RDI: 8801b63df000
Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ea0005fdd8c8 R08: 
88028545d618 R09: 0010
Oct 11 16:15:11 lorunde kernel: [311418.902104] R10:  R11: 
880215815000 R12: 
Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: 8802158156c0 R14: 
 R15: 8801ce434280
Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  () 
GS:88028544() knlGS:88028544
Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS:  ES:  
CR0: 80050033
Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 55a8461cfc20 CR3: 
01809000 CR4: 00042660
Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? 
__sk_mem_raise_allocated+0x4e/0x220
Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? 
__tcp_push_pending_frames+0x28/0x90
Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 
[libceph]
Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 
[libceph]
Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 
[libceph]
Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? 
ceph_osdc_start_request+0x23/0x30 [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? 
rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? 
kthread_create_on_node+0x40/0x40
Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 
1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 
44 00 00 48 89 f8 48 89 d1  a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 
72 7e 40 38 
Oct 11 16:15:11 

[ceph-users] Re : Re : Re : bad crc/signature errors

2017-10-06 Thread Olivier Bonvalet
Le jeudi 05 octobre 2017 à 21:52 +0200, Ilya Dryomov a écrit :
> On Thu, Oct 5, 2017 at 6:05 PM, Olivier Bonvalet <ceph.l...@daevel.fr
> > wrote:
> > Le jeudi 05 octobre 2017 à 17:03 +0200, Ilya Dryomov a écrit :
> > > When did you start seeing these errors?  Can you correlate that
> > > to
> > > a ceph or kernel upgrade?  If not, and if you don't see other
> > > issues,
> > > I'd write it off as faulty hardware.
> > 
> > Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13),
> > which
> 
> Is that 4.1.13 or 4.13.1?
> 

Linux 4.1.13. The old Debian 8, with Xen 4.6 from upstream.


> > have the problem for a long time, at least since 1 month (I haven't
> > older logs).
> > 
> > But, on others hypervisors (Xen 4.8 with Linux 4.9.x), I haven't
> > the
> > problem.
> > And it's when I upgraded thoses hypervisors to Linux 4.13.x, that
> > "bad
> > crc" errors appeared.
> > 
> > Note : if I upgraded kernels on Xen 4.8 hypervisors, it's because
> > some
> > DISCARD commands over RBD were blocking ("fstrim" works, but not
> > "lvremove" with discard enabled). After upgrading to Linux 4.13.3,
> > DISCARD works again on Xen 4.8.
> 
> Which kernel did you upgrade from to 4.13.3 exactly?
> 
> 

4.9.47 or 4.9.52, I don't have more precise data about this.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
Le jeudi 05 octobre 2017 à 17:03 +0200, Ilya Dryomov a écrit :
> When did you start seeing these errors?  Can you correlate that to
> a ceph or kernel upgrade?  If not, and if you don't see other issues,
> I'd write it off as faulty hardware.

Well... I have one hypervisor (Xen 4.6 and kernel Linux 4.1.13) which
has had the problem for a long time, at least a month (I don't have
older logs).

But on the other hypervisors (Xen 4.8 with Linux 4.9.x), I don't have the
problem.
It's when I upgraded those hypervisors to Linux 4.13.x that the "bad
crc" errors appeared.

Note: I upgraded the kernels on the Xen 4.8 hypervisors because some
DISCARD commands over RBD were blocking ("fstrim" worked, but not
"lvremove" with discard enabled). After upgrading to Linux 4.13.3,
DISCARD works again on Xen 4.8.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
Le jeudi 05 octobre 2017 à 11:10 +0200, Ilya Dryomov a écrit :
> On Thu, Oct 5, 2017 at 9:03 AM, Olivier Bonvalet <ceph.l...@daevel.fr
> > wrote:
> > I also see that, but on 4.9.52 and 4.13.3 kernel.
> > 
> > I also have some kernel panic, but don't know if it's related (RBD
> > are
> > mapped on Xen hosts).
> 
> Do you have that panic message?
> 
> Do you use rbd devices for something other than Xen?  If so, have you
> ever seen these errors outside of Xen?
> 
> Thanks,
> 
> Ilya
> 

No, I don't have that panic message: the hosts reboot way too
quickly. And no, I only use this cluster with Xen.

Sorry for this useless answer...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
Le jeudi 05 octobre 2017 à 11:47 +0200, Ilya Dryomov a écrit :
> The stable pages bug manifests as multiple sporadic connection
> resets,
> because in that case CRCs computed by the kernel don't always match
> the
> data that gets sent out.  When the mismatch is detected on the OSD
> side, OSDs reset the connection and you'd see messages like
> 
>   libceph: osd1 1.2.3.4:6800 socket closed (con state OPEN)
>   libceph: osd2 1.2.3.4:6804 socket error on write
> 
> This is a different issue.  Josy, Adrian, Olivier, do you also see
> messages of the "libceph: read_partial_message ..." type or is it
> just
> "libceph: ... bad crc/signature" errors?
> 
> Thanks,
> 
> Ilya

I have "read_partial_message" too, for example :

Oct  5 09:00:47 lorunde kernel: [65575.969322] libceph: read_partial_message 
88027c231500 data crc 181941039 != exp. 115232978
Oct  5 09:00:47 lorunde kernel: [65575.969953] libceph: osd122 10.0.0.31:6800 
bad crc/signature
Oct  5 09:04:30 lorunde kernel: [65798.958344] libceph: read_partial_message 
880254a25c00 data crc 443114996 != exp. 2014723213
Oct  5 09:04:30 lorunde kernel: [65798.959044] libceph: osd18 10.0.0.22:6802 
bad crc/signature
Oct  5 09:14:28 lorunde kernel: [66396.788272] libceph: read_partial_message 
880238636200 data crc 1797729588 != exp. 2550563968
Oct  5 09:14:28 lorunde kernel: [66396.788984] libceph: osd43 10.0.0.9:6804 bad 
crc/signature
Oct  5 10:09:36 lorunde kernel: [69704.211672] libceph: read_partial_message 
8802712dff00 data crc 2241944833 != exp. 762990605
Oct  5 10:09:36 lorunde kernel: [69704.212422] libceph: osd103 10.0.0.28:6804 
bad crc/signature
Oct  5 10:25:41 lorunde kernel: [70669.203596] libceph: read_partial_message 
880257521400 data crc 3655331946 != exp. 2796991675
Oct  5 10:25:41 lorunde kernel: [70669.204462] libceph: osd16 10.0.0.21:6806 
bad crc/signature
Oct  5 10:25:52 lorunde kernel: [70680.255943] libceph: read_partial_message 
880245e3d600 data crc 3787567693 != exp. 725251636
Oct  5 10:25:52 lorunde kernel: [70680.257066] libceph: osd60 10.0.0.23:6800 
bad crc/signature


On the OSD side, for osd122 for example, I don't see any "reset" in the
OSD log.
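
(For reference, the kind of check this amounts to; a sketch, assuming the
default log path on the node hosting osd122:)

~$ grep -E 'reset|socket closed|socket error|bad crc' /var/log/ceph/ceph-osd.122.log | tail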


Thanks,

Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re : bad crc/signature errors

2017-10-05 Thread Olivier Bonvalet
I also see that, but on 4.9.52 and 4.13.3 kernels.

I also have some kernel panics, but I don't know if they're related (the
RBDs are mapped on Xen hosts).

Le jeudi 05 octobre 2017 à 05:53 +, Adrian Saul a écrit :
> We see the same messages and are similarly on a 4.4 KRBD version that
> is affected by this.
> 
> I have seen no impact from it so far that I know about
> 
> 
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > Behalf Of
> > Jason Dillaman
> > Sent: Thursday, 5 October 2017 5:45 AM
> > To: Gregory Farnum 
> > Cc: ceph-users ; Josy
> > 
> > Subject: Re: [ceph-users] bad crc/signature errors
> > 
> > Perhaps this is related to a known issue on some 4.4 and later
> > kernels [1]
> > where the stable write flag was not preserved by the kernel?
> > 
> > [1] http://tracker.ceph.com/issues/19275
> > 
> > On Wed, Oct 4, 2017 at 2:36 PM, Gregory Farnum 
> > wrote:
> > > That message indicates that the checksums of messages between
> > > your
> > > kernel client and OSD are incorrect. It could be actual physical
> > > transmission errors, but if you don't see other issues then this
> > > isn't
> > > fatal; they can recover from it.
> > > 
> > > On Wed, Oct 4, 2017 at 8:52 AM Josy 
> > 
> > wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > We have setup a cluster with 8 OSD servers (31 disks)
> > > > 
> > > > Ceph health is Ok.
> > > > --
> > > > [root@las1-1-44 ~]# ceph -s
> > > >cluster:
> > > >  id: de296604-d85c-46ab-a3af-add3367f0e6d
> > > >  health: HEALTH_OK
> > > > 
> > > >services:
> > > >  mon: 3 daemons, quorum
> > > > ceph-las-mon-a1,ceph-las-mon-a2,ceph-las-mon-a3
> > > >  mgr: ceph-las-mon-a1(active), standbys: ceph-las-mon-a2
> > > >  osd: 31 osds: 31 up, 31 in
> > > > 
> > > >data:
> > > >  pools:   4 pools, 510 pgs
> > > >  objects: 459k objects, 1800 GB
> > > >  usage:   5288 GB used, 24461 GB / 29749 GB avail
> > > >  pgs: 510 active+clean
> > > > 
> > > > 
> > > > We created a pool and mounted it as RBD in one of the client
> > > > server.
> > > > While adding data to it, we see this below error :
> > > > 
> > > > 
> > > > [939656.039750] libceph: osd20 10.255.0.9:6808 bad
> > > > crc/signature
> > > > [939656.041079] libceph: osd16 10.255.0.8:6816 bad
> > > > crc/signature
> > > > [939735.627456] libceph: osd11 10.255.0.7:6800 bad
> > > > crc/signature
> > > > [939735.628293] libceph: osd30 10.255.0.11:6804 bad
> > > > crc/signature
> > > > 
> > > > =
> > > > 
> > > > Can anyone explain what is this and if I can fix it ?
> > > > 
> > > > 
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > > 
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > 
> > 
> > 
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> Confidentiality: This email and any attachments are confidential and
> may be subject to copyright, legal or some other professional
> privilege. They are intended solely for the attention and use of the
> named addressee(s). They may only be copied, distributed or disclosed
> with the consent of the copyright owner. If you have received this
> email by mistake or by breach of the confidentiality clause, please
> notify the sender immediately by return email and delete or destroy
> all copies of the email. Any confidentiality, privilege or copyright
> is not waived or lost because this email has been sent to you by
> mistake.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
Le mercredi 23 septembre 2015 à 13:41 +0200, Wido den Hollander a écrit
 :
> Hmm, that is weird. It works for me here from the Netherlands via
> IPv6:

You're right, I checked from other providers and it works.

So, a problem between Free (France) and Dreamhost ?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph.com IPv6 down

2015-09-23 Thread Olivier Bonvalet
Hi,

for several hours now, http://ceph.com/ hasn't been replying over IPv6.
It pings, and we can open a TCP socket, but nothing more:


~$ nc -w30 -v -6 ceph.com 80
Connection to ceph.com 80 port [tcp/http] succeeded!
GET / HTTP/1.0
Host: ceph.com




But, a HEAD query works :

~$ nc -w30 -v -6 ceph.com 80
Connection to ceph.com 80 port [tcp/http] succeeded!
HEAD / HTTP/1.0
Host: ceph.com
HTTP/1.0 200 OK
Date: Wed, 23 Sep 2015 11:35:27 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.4.16
X-Powered-By: PHP/5.4.16
Set-Cookie: PHPSESSID=q0jf4mh9rqfk5du4kn8tcnqen1; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, 
pre-check=0
Pragma: no-cache
X-Pingback: http://ceph.com/xmlrpc.php
Link: ; rel=shortlink
Connection: close
Content-Type: text/html; charset=UTF-8



So, from my browser the website is unavailable.
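
(For reference, a quick way to compare the two address families side by
side; just a sketch with curl, whose -4/-6 switches pin the family:)

~$ curl -4 -sS -o /dev/null -w '%{http_code}\n' http://ceph.com/
~$ curl -6 -sS -o /dev/null -w '%{http_code}\n' --max-time 30 http://ceph.com/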

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Le vendredi 18 septembre 2015 à 12:04 +0200, Jan Schermer a écrit :
> > On 18 Sep 2015, at 11:28, Christian Balzer <ch...@gol.com> wrote:
> > 
> > On Fri, 18 Sep 2015 11:07:49 +0200 Olivier Bonvalet wrote:
> > 
> > > Le vendredi 18 septembre 2015 à 10:59 +0200, Jan Schermer a écrit
> > > :
> > > > In that case it can either be slow monitors (slow network, slow
> > > > disks(!!!)  or a CPU or memory problem).
> > > > But it still can also be on the OSD side in the form of either
> > > > CPU
> > > > usage or memory pressure - in my case there were lots of memory
> > > > used
> > > > for pagecache (so for all intents and purposes considered
> > > > "free") but
> > > > when peering the OSD had trouble allocating any memory from it
> > > > and it
> > > > caused lots of slow ops and peering hanging in there for a
> > > > while.
> > > > This also doesn't show as high CPU usage, only kswapd spins up
> > > > a bit
> > > > (don't be fooled by its name, it has nothing to do with swap in
> > > > this
> > > > case).
> > > 
> > > My nodes have 256GB of RAM (for 12x300GB ones) or 128GB of RAM
> > > (for
> > > 4x800GB ones), so I will try track this too. Thanks !
> > > 
> > I haven't seen this (known problem) with 64GB or 128GB nodes,
> > probably
> > because I set /proc/sys/vm/min_free_kbytes to 512MB or 1GB
> > respectively.
> > 
> 
> I had this set to 6G and that doesn't help. This "buffer" is probably
> only useful for some atomic allocations that can use it, not for
> userland processes and their memory. Or maybe they get memory from
> this pool but it gets replenished immediately.
> QEMU has no problem allocating 64G on the same host, OSD struggles to
> allocate memory during startup or when PGs are added during
> rebalancing - probably because it does a lot of smaller allocations
> instead of one big.
> 

For now I dropped the caches *and* set min_free_kbytes to 1GB. I haven't
triggered any rebalance yet, but I can already see a reduced
filestore.commitcycle_latency.
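
(For reference, that boils down to the following; a sketch, with 1GB
expressed in kB, and not persistent across reboots unless also added to
sysctl.conf:)

~# sysctl -w vm.min_free_kbytes=1048576    # 1GB
~# echo 1 > /proc/sys/vm/drop_caches       # drop the pagecache before moving data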

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Le vendredi 18 septembre 2015 à 17:04 +0900, Christian Balzer a écrit :
> Hello,
> 
> On Fri, 18 Sep 2015 09:37:24 +0200 Olivier Bonvalet wrote:
> 
> > Hi,
> > 
> > sorry for missing informations. I was to avoid putting too much
> > inappropriate infos ;)
> > 
> Nah, everything helps, there are known problems with some versions,
> kernels, file systems, etc.
> 
> Speaking of which, what FS are you using on your OSDs?
> 

XFS.

> > 
> > 
> > Le vendredi 18 septembre 2015 à 12:30 +0900, Christian Balzer a
> > écrit :
> > > Hello,
> > > 
> > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
> > > 
> > > The items below help, but be a s specific as possible, from OS,
> > > kernel
> > > version to Ceph version, "ceph -s", any other specific details
> > > (pool
> > > type,
> > > replica size).
> > > 
> > 
> > So, all nodes use Debian Wheezy, running on a vanilla 3.14.x
> > kernel,
> > and Ceph 0.80.10.
> All my stuff is on Jessie, but at least Firefly should be stable and
> I
> haven't seen anything like your problem with it.
> And while 3.14 is a LTS kernel I wonder if something newer may be
> beneficial, but probably not.
> 

Well, I can try a 3.18.x kernel. But for that I have to restart all the
nodes, which will trigger some backfilling and probably some blocked IO
too ;)


> > I don't have anymore ceph status right now. But I have
> > data to move tonight again, so I'll track that.
> > 
> I was interested in that to see how many pools and PGs you have.

Well :

cluster de035250-323d-4cf6-8c4b-cf0faf6296b1
 health HEALTH_OK
 monmap e21: 3 mons at 
{faude=10.0.0.13:6789/0,murmillia=10.0.0.18:6789/0,rurkh=10.0.0.19:6789/0}, 
election epoch 4312, quorum 0,1,2 faude,murmillia,rurkh
 osdmap e847496: 88 osds: 88 up, 87 in
  pgmap v86390609: 6632 pgs, 16 pools, 18883 GB data, 5266 kobjects
68559 GB used, 59023 GB / 124 TB avail
6632 active+clean
  client io 3194 kB/s rd, 23542 kB/s wr, 1450 op/s


There are mainly 2 pools in use: an "ssd" pool and an "hdd" pool. The hdd
pool uses different OSDs, on different nodes.
Since I don't often rebalance data in the hdd pool, I haven't seen the
problem on it yet.



> >  The affected pool is a standard one (no erasure coding), with only
> > 2
> > replica (size=2).
> > 
> Good, nothing fancy going on there then.
> 
> > 
> > 
> > 
> > > > Some additionnal informations :
> > > > - I have 4 SSD per node.
> > > Type, if nothing else for anecdotal reasons.
> > 
> > I have 7 storage nodes here :
> > - 3 nodes which have each 12 OSD of 300GB
> > SSD
> > - 4 nodes which have each  4 OSD of 800GB SSD
> > 
> > And I'm trying to replace 12x300GB nodes by 4x800GB nodes.
> > 
> Type as in model/maker, but helpful information.
> 

300GB models are Intel SSDSC2BB300G4 (DC S3500).
800GB models are Intel SSDSC2BB800H4 (DC S3500 I think).




> > 
> > 
> > > > - the CPU usage is near 0
> > > > - IO wait is near 0 too
> > > Including the trouble OSD(s)?
> > 
> > Yes
> > 
> > 
> > > Measured how, iostat or atop?
> > 
> > iostat, htop, and confirmed with Zabbix supervisor.
> > 
> 
> Good. I'm sure you checked for network errors. 
> Single network or split client/cluster network?
> 

It's the first thing I checked, and latency and packet loss are
monitored between each node and the mons, but maybe I missed some checks.


> > 
> > 
> > 
> > > > - bandwith usage is also near 0
> > > > 
> > > Yeah, all of the above are not surprising if everything is stuck
> > > waiting
> > > on some ops to finish. 
> > > 
> > > How many nodes are we talking about?
> > 
> > 
> > 7 nodes, 52 OSDs.
> > 
> That be below the threshold for most system tunables (there are
> various
> threads and articles on how to tune Ceph for "large" clusters).
> 
> Since this happens only when your cluster reshuffles data (and thus
> has
> more threads going) what is your ulimit setting for open files?


Wow... the default one on Debian Wheezy : 1024.
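
(For the record, a sketch of how it can be raised for the OSDs. If I
remember correctly the Debian sysvinit script honours a "max open files"
option in ceph.conf; 131072 is just an example value:)

# /etc/ceph/ceph.conf
[global]
    max open files = 131072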



> > 
> > 
> > > > The whole cluster seems waiting for something... but I don't
> > > > see
> > > > what.
> > > > 
> > > Is it just one specific OSD (or a set of them) or is that all
> > > over
> > > the
> > > place?
> > 
> > A set of them. When I increase the weight of all 4 OSDs of a node,

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Le vendredi 18 septembre 2015 à 10:59 +0200, Jan Schermer a écrit :
> In that case it can either be slow monitors (slow network, slow
> disks(!!!)  or a CPU or memory problem).
> But it still can also be on the OSD side in the form of either CPU
> usage or memory pressure - in my case there were lots of memory used
> for pagecache (so for all intents and purposes considered "free") but
> when peering the OSD had trouble allocating any memory from it and it
> caused lots of slow ops and peering hanging in there for a while.
> This also doesn't show as high CPU usage, only kswapd spins up a bit
> (don't be fooled by its name, it has nothing to do with swap in this
> case).

My nodes have 256GB of RAM (the 12x300GB ones) or 128GB of RAM (the
4x800GB ones), so I will try to track this too. Thanks!


> echo 1 >/proc/sys/vm/drop_caches before I touch anything has become a
> routine now and that problem is gone.
> 
> Jan
> 
> > On 18 Sep 2015, at 10:53, Olivier Bonvalet <ceph.l...@daevel.fr>
> > wrote:
> > 
> > mmm good point.
> > 
> > I don't see CPU or IO problem on mons, but in logs, I have this :
> > 
> > 2015-09-18 01:55:16.921027 7fb951175700  0 log [INF] : pgmap
> > v86359128:
> > 6632 pgs: 77 inactive, 1 remapped, 10
> > active+remapped+wait_backfill, 25
> > peering, 5 active+remapped, 6 active+remapped+backfilling, 6499
> > active+clean, 9 remapped+peering; 18974 GB data, 69004 GB used,
> > 58578
> > GB / 124 TB avail; 915 kB/s rd, 26383 kB/s wr, 1671 op/s;
> > 8417/15680513
> > objects degraded (0.054%); 1062 MB/s, 274 objects/s recovering
> > 
> > 
> > So... it can be a peering problem. Didn't see that, thanks.
> > 
> > 
> > 
> > Le vendredi 18 septembre 2015 à 09:52 +0200, Jan Schermer a écrit :
> > > Could this be caused by monitors? In my case lagging monitors can
> > > also cause slow requests (because of slow peering). Not sure if
> > > that's expected or not, but it of course doesn't show on the OSDs
> > > as
> > > any kind of bottleneck when you try to investigate...
> > > 
> > > Jan
> > > 
> > > > On 18 Sep 2015, at 09:37, Olivier Bonvalet <ceph.l...@daevel.fr
> > > > >
> > > > wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > sorry for missing informations. I was to avoid putting too much
> > > > inappropriate infos ;)
> > > > 
> > > > 
> > > > 
> > > > Le vendredi 18 septembre 2015 à 12:30 +0900, Christian Balzer a
> > > > écrit :
> > > > > Hello,
> > > > > 
> > > > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
> > > > > 
> > > > > The items below help, but be a s specific as possible, from
> > > > > OS,
> > > > > kernel
> > > > > version to Ceph version, "ceph -s", any other specific
> > > > > details
> > > > > (pool
> > > > > type,
> > > > > replica size).
> > > > > 
> > > > 
> > > > So, all nodes use Debian Wheezy, running on a vanilla 3.14.x
> > > > kernel,
> > > > and Ceph 0.80.10.
> > > > I don't have anymore ceph status right now. But I have
> > > > data to move tonight again, so I'll track that.
> > > > 
> > > > The affected pool is a standard one (no erasure coding), with
> > > > only
> > > > 2 replica (size=2).
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > > Some additionnal informations :
> > > > > > - I have 4 SSD per node.
> > > > > Type, if nothing else for anecdotal reasons.
> > > > 
> > > > I have 7 storage nodes here :
> > > > - 3 nodes which have each 12 OSD of 300GB
> > > > SSD
> > > > - 4 nodes which have each  4 OSD of 800GB SSD
> > > > 
> > > > And I'm trying to replace 12x300GB nodes by 4x800GB nodes.
> > > > 
> > > > 
> > > > 
> > > > > > - the CPU usage is near 0
> > > > > > - IO wait is near 0 too
> > > > > Including the trouble OSD(s)?
> > > > 
> > > > Yes
> > > > 
> > > > 
> > > > > Measured how, iostat or atop?
> > > > 
> > > > iostat, htop, and confirmed with Zabbix supervisor.
> > > > 
> > > > 
> > > >

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Hi,

I think I found the problem: a way too large journal.
I caught this in the logs of an OSD having blocked queries:

OSD.15 :

2015-09-19 00:41:12.717062 7fb8a3c44700  1 journal check_for_full at 3548528640 
: JOURNAL FULL 3548528640 >= 1376255 (max_size 4294967296 start 3549904896)
2015-09-19 00:41:43.124590 7fb8a6181700  0 log [WRN] : 6 slow requests, 6 
included below; oldest blocked for > 30.405719 secs
2015-09-19 00:41:43.124596 7fb8a6181700  0 log [WRN] : slow request 30.405719 
seconds old, received at 2015-09-19 00:41:12.718829: 
osd_op(client.31621623.1:5392489797 rb.0.1b844d6.238e1f29.04d3 [write 
0~4096] 6.3aed306f snapc 4=[4,11096,11018] ondisk+write e847952) v4 
currently waiting for subops from 19
2015-09-19 00:41:43.124599 7fb8a6181700  0 log [WRN] : slow request 30.172735 
seconds old, received at 2015-09-19 00:41:12.951813: 
osd_op(client.31435077.1:8423014905 rb.0.1c39394.238e1f29.037a [write 
1499136~8192] 6.2ffed26e snapc 8=[8,1109a,1101c] ondisk+write e847952) 
v4 currently waiting for subops from 28
2015-09-19 00:41:43.124602 7fb8a6181700  0 log [WRN] : slow request 30.172703 
seconds old, received at 2015-09-19 00:41:12.951845: 
osd_op(client.31435077.1:8423014906 rb.0.1c39394.238e1f29.037a [write 
1523712~8192] 6.2ffed26e snapc 8=[8,1109a,1101c] ondisk+write e847952) 
v4 currently waiting for subops from 28
2015-09-19 00:41:43.124604 7fb8a6181700  0 log [WRN] : slow request 30.172576 
seconds old, received at 2015-09-19 00:41:12.951972: 
osd_op(client.31435077.1:8423014907 rb.0.1c39394.238e1f29.037a [write 
1515520~8192] 6.2ffed26e snapc 8=[8,1109a,1101c] ondisk+write e847952) 
v4 currently waiting for subops from 28
2015-09-19 00:41:43.124606 7fb8a6181700  0 log [WRN] : slow request 30.172546 
seconds old, received at 2015-09-19 00:41:12.952002: 
osd_op(client.31435077.1:8423014909 rb.0.1c39394.238e1f29.037a [write 
1531904~8192] 6.2ffed26e snapc 8=[8,1109a,1101c] ondisk+write e847952) 
v4 currently waiting for subops from 28

and at same time on OSD.19 :

2015-09-19 00:41:19.549508 7f55973c0700  0 -- 192.168.42.22:6806/28596 >> 
192.168.42.16:6828/38905 pipe(0x230f sd=358 :6806 s=2 pgs=14268 cs=3 l=0 
c=0x6d9cb00).fault with nothing to send, going to standby
2015-09-19 00:41:43.246421 7f55ba277700  0 log [WRN] : 1 slow requests, 1 
included below; oldest blocked for > 30.253274 secs
2015-09-19 00:41:43.246428 7f55ba277700  0 log [WRN] : slow request 30.253274 
seconds old, received at 2015-09-19 00:41:12.993123: 
osd_op(client.31626115.1:4664205553 rb.0.1c918ad.238e1f29.2da9 [write 
3063808~16384] 6.604ba242 snapc 10aaf=[10aaf,10a31,109b3] ondisk+write e847952) 
v4 currently waiting for subops from 15
2015-09-19 00:42:13.251591 7f55ba277700  0 log [WRN] : 1 slow requests, 1 
included below; oldest blocked for > 60.258446 secs
2015-09-19 00:42:13.251596 7f55ba277700  0 log [WRN] : slow request 60.258446 
seconds old, received at 2015-09-19 00:41:12.993123: 
osd_op(client.31626115.1:4664205553 rb.0.1c918ad.238e1f29.2da9 [write 
3063808~16384] 6.604ba242 snapc 10aaf=[10aaf,10a31,109b3] ondisk+write e847952) 
v4 currently waiting for subops from 15

So the blocking seems to be the "JOURNAL FULL" event, with big numbers.
3548528640 would be the journal size?
I just reduced filestore_max_sync_interval to 30s, and everything
seems to work fine.
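
(For reference, the knobs involved here; a sketch in ceph.conf form, with
the 4GB journal from the log above and the 30s interval I just set, plus
the matching min interval as an illustration, not a recommendation:)

[osd]
    osd journal size = 4096              # MB, i.e. the 4GB journal that filled up
    filestore max sync interval = 30     # seconds, reduced from a much larger value
    filestore min sync interval = 0.01   # default lower bound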

For SSD OSDs with the journal on the same device, a big journal is a
crazy thing... I suppose I broke this setup when trying to tune the
journal for the HDD pool.

At the same time, are there any tips for tuning the journal in the case
of HDD OSDs with a (potentially big) SSD journal and a hardware RAID
card which handles write-back?

Thanks for your help.

Olivier


Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a écrit :
> Hi,
> 
> I have a cluster with lot of blocked operations each time I try to
> move
> data (by reweighting a little an OSD).
> 
> It's a full SSD cluster, with 10GbE network.
> 
> In logs, when I have blocked OSD, on the main OSD I can see that :
> 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 33.976680 secs
> 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request
> 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.027f [write 180224~16384] 6.c11916a4
> snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> reached pg
> 2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 63.981596 secs
> 2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request
> 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.027f [write 180

Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
Le vendredi 18 septembre 2015 à 14:14 +0200, Paweł Sadowski a écrit :
> It might be worth checking how many threads you have in your system
> (ps
> -eL | wc -l). By default there is a limit of 32k (sysctl -q
> kernel.pid_max). There is/was a bug in fork()
> (https://lkml.org/lkml/2015/2/3/345) reporting ENOMEM when PID limit
> is
> reached. We hit a situation when OSD trying to create new thread was
> killed and reports 'Cannot allocate memory' (12 OSD per node created
> more than 32k threads).
> 

Thanks! For now I don't see more than 5k threads on the nodes with 12
OSDs, but maybe during recovery/backfilling?
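
(For reference, the check and the bump; a sketch, where 4194303 is only
an example of a higher ceiling:)

~$ ps -eL | wc -l                       # current number of threads on the node
~$ sysctl kernel.pid_max                # current ceiling (32768 by default)
~# sysctl -w kernel.pid_max=4194303     # raise it if recovery pushes past 32k
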
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
I use Ceph 0.80.10.

I see that IO wait is near 0 thanks to iostat and htop (in detailed
mode), and I rechecked it with the Zabbix supervisor.


Le jeudi 17 septembre 2015 à 20:28 -0700, GuangYang a écrit :
> Which version are you using?
> 
> My guess is that the request (op) is waiting for lock (might be
> ondisk_read_lock of the object, but a debug_osd=20 should be helpful
> to tell what happened to the op).
> 
> How do you tell the IO wait is near to 0 (by top?)? 
> 
> Thanks,
> Guang
> 
> > From: ceph.l...@daevel.fr
> > To: ceph-users@lists.ceph.com
> > Date: Fri, 18 Sep 2015 02:43:49 +0200
> > Subject: Re: [ceph-users] Lot of blocked operations
> > 
> > Some additionnal informations :
> > - I have 4 SSD per node.
> > - the CPU usage is near 0
> > - IO wait is near 0 too
> > - bandwith usage is also near 0
> > 
> > The whole cluster seems waiting for something... but I don't see
> > what.
> > 
> > 
> > Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a
> > écrit :
> > > Hi,
> > > 
> > > I have a cluster with lot of blocked operations each time I try
> > > to
> > > move
> > > data (by reweighting a little an OSD).
> > > 
> > > It's a full SSD cluster, with 10GbE network.
> > > 
> > > In logs, when I have blocked OSD, on the main OSD I can see that
> > > :
> > > 2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow
> > > requests, 1 included below; oldest blocked for> 33.976680 secs
> > > 2015-09-18 01:55:16.981402 7f89e8cb8700 0 log [WRN] : slow
> > > request
> > > 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> > > osd_op(client.29760717.1:18680817544
> > > rb.0.1c16005.238e1f29.027f [write 180224~16384]
> > > 6.c11916a4
> > > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4
> > > currently
> > > reached pg
> > > 2015-09-18 01:55:46.986319 7f89e8cb8700 0 log [WRN] : 2 slow
> > > requests, 1 included below; oldest blocked for> 63.981596 secs
> > > 2015-09-18 01:55:46.986324 7f89e8cb8700 0 log [WRN] : slow
> > > request
> > > 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> > > osd_op(client.29760717.1:18680817544
> > > rb.0.1c16005.238e1f29.027f [write 180224~16384]
> > > 6.c11916a4
> > > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4
> > > currently
> > > reached pg
> > > 
> > > How should I read that ? What this OSD is waiting for ?
> > > 
> > > Thanks for any help,
> > > 
> > > Olivier
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] debian repositories path change?

2015-09-18 Thread Olivier Bonvalet
Hi,

not sure if it's related, but there are recent changes because of a
security issue:

http://ceph.com/releases/important-security-notice-regarding-signing-key-and-binary-downloads-of-ceph/
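
(In practice the notice boils down to importing the new release key and
pointing sources.list at the relocated per-release paths; a sketch
assuming the firefly release, so please check the notice itself for the
authoritative details:)

~$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -

# /etc/apt/sources.list.d/ceph.list
deb http://download.ceph.com/debian-firefly/ wheezy main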




Le vendredi 18 septembre 2015 à 08:45 -0500, Brian Kroth a écrit :
> Hi all, we've had the following in our
> /etc/apt/sources.list.d/ceph.list 
> for a while based on some previous docs,
> 
> # ceph upstream stable (currently giant) release packages for wheezy:
> deb http://ceph.com/debian/ wheezy main
> 
> # ceph extras:
> deb http://ceph.com/packages/ceph-extras/debian wheezy main
> 
> but it seems like the straight "debian/" portion of that path has
> gone 
> missing recently, and now there's only debian-firefly/, debian
> -giant/, 
> debian-hammer/, etc.
> 
> Is that just an oversight, or should we be switching our sources to
> one 
> of the named releases?  I figured that the unnamed one would 
> automatically track what ceph currently considered "stable" for the 
> target distro release for me, but maybe that's not the case.
> 
> Thanks,
> Brian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
But yes, I will try to increase OSD verbosity.
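
(For reference, a sketch of bumping verbosity at runtime without
restarting the OSD; osd.15 is just an example id, and level 20 is very
chatty:)

~$ ceph tell osd.15 injectargs '--debug_osd 20'
~$ ceph tell osd.15 injectargs '--debug_osd 0/5'    # back to the default when done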

Le jeudi 17 septembre 2015 à 20:28 -0700, GuangYang a écrit :
> Which version are you using?
> 
> My guess is that the request (op) is waiting for lock (might be
> ondisk_read_lock of the object, but a debug_osd=20 should be helpful
> to tell what happened to the op).
> 
> How do you tell the IO wait is near to 0 (by top?)? 
> 
> Thanks,
> Guang
> 
> > From: ceph.l...@daevel.fr
> > To: ceph-users@lists.ceph.com
> > Date: Fri, 18 Sep 2015 02:43:49 +0200
> > Subject: Re: [ceph-users] Lot of blocked operations
> > 
> > Some additionnal informations :
> > - I have 4 SSD per node.
> > - the CPU usage is near 0
> > - IO wait is near 0 too
> > - bandwith usage is also near 0
> > 
> > The whole cluster seems waiting for something... but I don't see
> > what.
> > 
> > 
> > Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a
> > écrit :
> > > Hi,
> > > 
> > > I have a cluster with lot of blocked operations each time I try
> > > to
> > > move
> > > data (by reweighting a little an OSD).
> > > 
> > > It's a full SSD cluster, with 10GbE network.
> > > 
> > > In logs, when I have blocked OSD, on the main OSD I can see that
> > > :
> > > 2015-09-18 01:55:16.981396 7f89e8cb8700 0 log [WRN] : 2 slow
> > > requests, 1 included below; oldest blocked for> 33.976680 secs
> > > 2015-09-18 01:55:16.981402 7f89e8cb8700 0 log [WRN] : slow
> > > request
> > > 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> > > osd_op(client.29760717.1:18680817544
> > > rb.0.1c16005.238e1f29.027f [write 180224~16384]
> > > 6.c11916a4
> > > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4
> > > currently
> > > reached pg
> > > 2015-09-18 01:55:46.986319 7f89e8cb8700 0 log [WRN] : 2 slow
> > > requests, 1 included below; oldest blocked for> 63.981596 secs
> > > 2015-09-18 01:55:46.986324 7f89e8cb8700 0 log [WRN] : slow
> > > request
> > > 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> > > osd_op(client.29760717.1:18680817544
> > > rb.0.1c16005.238e1f29.027f [write 180224~16384]
> > > 6.c11916a4
> > > snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4
> > > currently
> > > reached pg
> > > 
> > > How should I read that ? What this OSD is waiting for ?
> > > 
> > > Thanks for any help,
> > > 
> > > Olivier
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-18 Thread Olivier Bonvalet
mmm, good point.

I don't see a CPU or IO problem on the mons, but in the logs I have this:

2015-09-18 01:55:16.921027 7fb951175700  0 log [INF] : pgmap v86359128:
6632 pgs: 77 inactive, 1 remapped, 10 active+remapped+wait_backfill, 25
peering, 5 active+remapped, 6 active+remapped+backfilling, 6499
active+clean, 9 remapped+peering; 18974 GB data, 69004 GB used, 58578
GB / 124 TB avail; 915 kB/s rd, 26383 kB/s wr, 1671 op/s; 8417/15680513
objects degraded (0.054%); 1062 MB/s, 274 objects/s recovering


So... it can be a peering problem. I hadn't seen that, thanks.



Le vendredi 18 septembre 2015 à 09:52 +0200, Jan Schermer a écrit :
> Could this be caused by monitors? In my case lagging monitors can
> also cause slow requests (because of slow peering). Not sure if
> that's expected or not, but it of course doesn't show on the OSDs as
> any kind of bottleneck when you try to investigate...
> 
> Jan
> 
> > On 18 Sep 2015, at 09:37, Olivier Bonvalet <ceph.l...@daevel.fr>
> > wrote:
> > 
> > Hi,
> > 
> > sorry for missing informations. I was to avoid putting too much
> > inappropriate infos ;)
> > 
> > 
> > 
> > Le vendredi 18 septembre 2015 à 12:30 +0900, Christian Balzer a
> > écrit :
> > > Hello,
> > > 
> > > On Fri, 18 Sep 2015 02:43:49 +0200 Olivier Bonvalet wrote:
> > > 
> > > The items below help, but be a s specific as possible, from OS,
> > > kernel
> > > version to Ceph version, "ceph -s", any other specific details
> > > (pool
> > > type,
> > > replica size).
> > > 
> > 
> > So, all nodes use Debian Wheezy, running on a vanilla 3.14.x
> > kernel,
> > and Ceph 0.80.10.
> > I don't have anymore ceph status right now. But I have
> > data to move tonight again, so I'll track that.
> > 
> > The affected pool is a standard one (no erasure coding), with only
> > 2 replica (size=2).
> > 
> > 
> > 
> > 
> > > > Some additionnal informations :
> > > > - I have 4 SSD per node.
> > > Type, if nothing else for anecdotal reasons.
> > 
> > I have 7 storage nodes here :
> > - 3 nodes which have each 12 OSD of 300GB
> > SSD
> > - 4 nodes which have each  4 OSD of 800GB SSD
> > 
> > And I'm trying to replace 12x300GB nodes by 4x800GB nodes.
> > 
> > 
> > 
> > > > - the CPU usage is near 0
> > > > - IO wait is near 0 too
> > > Including the trouble OSD(s)?
> > 
> > Yes
> > 
> > 
> > > Measured how, iostat or atop?
> > 
> > iostat, htop, and confirmed with Zabbix supervisor.
> > 
> > 
> > 
> > 
> > > > - bandwith usage is also near 0
> > > > 
> > > Yeah, all of the above are not surprising if everything is stuck
> > > waiting
> > > on some ops to finish. 
> > > 
> > > How many nodes are we talking about?
> > 
> > 
> > 7 nodes, 52 OSDs.
> > 
> > 
> > 
> > > > The whole cluster seems waiting for something... but I don't
> > > > see
> > > > what.
> > > > 
> > > Is it just one specific OSD (or a set of them) or is that all
> > > over
> > > the
> > > place?
> > 
> > A set of them. When I increase the weight of all 4 OSDs of a node,
> > I
> > frequently have blocked IO from 1 OSD of this node.
> > 
> > 
> > 
> > > Does restarting the OSD fix things?
> > 
> > Yes. For several minutes.
> > 
> > 
> > > Christian
> > > > 
> > > > Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a
> > > > écrit :
> > > > > Hi,
> > > > > 
> > > > > I have a cluster with lot of blocked operations each time I
> > > > > try
> > > > > to
> > > > > move
> > > > > data (by reweighting a little an OSD).
> > > > > 
> > > > > It's a full SSD cluster, with 10GbE network.
> > > > > 
> > > > > In logs, when I have blocked OSD, on the main OSD I can see
> > > > > that
> > > > > :
> > > > > 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow
> > > > > requests, 1 included below; oldest blocked for > 33.976680
> > > > > secs
> > > > > 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow
> > > > > request
> > > > > 30.125556 seconds old, received at 2015-09-18
> > > > 

[ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Hi,

I have a cluster with a lot of blocked operations each time I try to
move data (by slightly reweighting an OSD).

It's a full SSD cluster, with a 10GbE network.

In the logs, when I have a blocked OSD, on the main OSD I can see this:
2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow requests, 1 
included below; oldest blocked for > 33.976680 secs
2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request 30.125556 
seconds old, received at 2015-09-18 01:54:46.855821: 
osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.027f [write 
180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) 
v4 currently reached pg
2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow requests, 1 
included below; oldest blocked for > 63.981596 secs
2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request 60.130472 
seconds old, received at 2015-09-18 01:54:46.855821: 
osd_op(client.29760717.1:18680817544 rb.0.1c16005.238e1f29.027f [write 
180224~16384] 6.c11916a4 snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) 
v4 currently reached pg

How should I read that? What is this OSD waiting for?
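
(For reference: the pending ops can also be dumped from the admin socket
on the node hosting the slow OSD; a sketch, assuming the default socket
setup, with the osd id as a placeholder:)

~$ ceph daemon osd.<id> dump_ops_in_flight
~$ ceph daemon osd.<id> dump_historic_ops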

Thanks for any help,

Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lot of blocked operations

2015-09-17 Thread Olivier Bonvalet
Some additional information:
- I have 4 SSDs per node.
- the CPU usage is near 0
- IO wait is near 0 too
- bandwidth usage is also near 0

The whole cluster seems to be waiting for something... but I don't see what.


Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a écrit :
> Hi,
> 
> I have a cluster with lot of blocked operations each time I try to
> move
> data (by reweighting a little an OSD).
> 
> It's a full SSD cluster, with 10GbE network.
> 
> In logs, when I have blocked OSD, on the main OSD I can see that :
> 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 33.976680 secs
> 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request
> 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.027f [write 180224~16384] 6.c11916a4
> snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> reached pg
> 2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 63.981596 secs
> 2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request
> 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.027f [write 180224~16384] 6.c11916a4
> snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> reached pg
> 
> How should I read that ? What this OSD is waiting for ?
> 
> Thanks for any help,
> 
> Olivier
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?

2015-07-21 Thread Olivier Bonvalet
Le lundi 13 juillet 2015 à 11:31 +0100, Gregory Farnum a écrit :
 On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas 
 dante1...@gmail.com wrote:
  Hello,
  it seems that new packages for firefly have been uploaded to repo.
  However, I can't find any details in Ceph Release notes. There is 
  only
  one thread in ceph-devel [1], but it is not clear what this new
  version is about. Is it safe to upgrade from 0.80.9 to 0.80.10?
 
 These packages got created and uploaded to the repository without
 release notes. I'm not sure why but I believe they're safe to use.
 Hopefully Sage and our release guys can resolve that soon as we've
 gotten several queries on the subject. :)
 -Greg
 ___


Hi,

any update on that point? The packages were uploaded to the repositories
one month ago.

I would appreciate a confirmation: go! or NO go! ;)

thanks,
Olivier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly 0.80.10 ready to upgrade to?

2015-07-21 Thread Olivier Bonvalet
Le mardi 21 juillet 2015 à 07:06 -0700, Sage Weil a écrit :
 On Tue, 21 Jul 2015, Olivier Bonvalet wrote:
  Le lundi 13 juillet 2015 à 11:31 +0100, Gregory Farnum a écrit :
   On Mon, Jul 13, 2015 at 11:25 AM, Kostis Fardelas 
   dante1...@gmail.com wrote:
Hello,
it seems that new packages for firefly have been uploaded to 
 repo.
However, I can't find any details in Ceph Release notes. There 
 is 
only
one thread in ceph-devel [1], but it is not clear what this new
version is about. Is it safe to upgrade from 0.80.9 to 0.80.10?
   
   These packages got created and uploaded to the repository without
   release notes. I'm not sure why but I believe they're safe to 
 use.
   Hopefully Sage and our release guys can resolve that soon as 
 we've
   gotten several queries on the subject. :)
   -Greg
   ___
  
  
  Hi,
  
  any update on that point ? Packages were uploaded on repositories 
 one
  month ago.
  
  I would appreciate a confirmation go! or NO go! ;)
 
 Sorry, I was sick and this dropped off my list.  I'll post the 
 release 
 notes today.
 
 Thanks!
 sage

Great, I'll take that as a go!

Thanks Sage :)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] More writes on filestore than on journal ?

2015-03-23 Thread Olivier Bonvalet
Hi,

I'm still trying to find out why there are many more write operations on
the filestore since Emperor/Firefly than with Dumpling.

So, I added monitoring of all the perf counter values from the OSDs.

From what I see: «filestore.ops» reports an average of 78 operations
per second, but block device monitoring reports an average of 113
operations per second (+45%).
Please see those 2 graphs:
- https://daevel.fr/img/firefly/osd-70.filestore-ops.png
- https://daevel.fr/img/firefly/osd-70.sda-ops.png

Do you see what can explain this difference? (this OSD uses XFS)
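
(For reference, a sketch of the kind of polling behind these numbers,
assuming the default admin socket path on the OSD node; osd.70 and sda
match the graphs above:)

~$ ceph daemon osd.70 perf dump     # JSON dump; filestore.ops is under "filestore"
~$ iostat -dxk sda 60               # block-device side, for comparison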

Thanks,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] More writes on blockdevice than on filestore ?

2015-03-23 Thread Olivier Bonvalet
Erg... I sent that too fast. Bad title, please read «More writes on the
block device than on the filestore».


Le lundi 23 mars 2015 à 14:21 +0100, Olivier Bonvalet a écrit :
 Hi,
 
 I'm still trying to find why there is much more write operations on
 filestore since Emperor/Firefly than from Dumpling.
 
 So, I add monitoring of all perf counters values from OSD.
 
 From what I see : «filestore.ops» reports an average of 78 operations
 per seconds. But, block device monitoring reports an average of 113
 operations per seconds (+45%).
 please thoses 2 graphs :
 - https://daevel.fr/img/firefly/osd-70.filestore-ops.png
 - https://daevel.fr/img/firefly/osd-70.sda-ops.png
 
 Do you see what can explain this difference ? (this OSD use XFS)
 
 Thanks,
 Olivier
 
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] More writes on filestore than on journal ?

2015-03-23 Thread Olivier Bonvalet
Hi,

Le lundi 23 mars 2015 à 07:29 -0700, Gregory Farnum a écrit :
 On Mon, Mar 23, 2015 at 6:21 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  Hi,
 
  I'm still trying to find why there is much more write operations on
  filestore since Emperor/Firefly than from Dumpling.
 
 Do you have any history around this? It doesn't sound familiar,
 although I bet it's because of the WBThrottle and flushing changes.

I only have history for the block device stats and the global stats
reported by «ceph status».
When I upgraded from Dumpling to Firefly (via Emperor), write
operations increased a lot on the OSDs.
I suppose it's because of the WBThrottle too, but I can't find any
parameter able to confirm that.


 
  So, I add monitoring of all perf counters values from OSD.
 
  From what I see : «filestore.ops» reports an average of 78 operations
  per seconds. But, block device monitoring reports an average of 113
  operations per seconds (+45%).
  please thoses 2 graphs :
  - https://daevel.fr/img/firefly/osd-70.filestore-ops.png
  - https://daevel.fr/img/firefly/osd-70.sda-ops.png
 
 That's unfortunate but perhaps not surprising — any filestore op can
 change a backing file (which requires hitting both the file and the
 inode: potentially two disk seeks), as well as adding entries to the
 leveldb instance.
 -Greg
 

Ok thanks, so this part can be «normal».

 
  Do you see what can explain this difference ? (this OSD use XFS)
 
  Thanks,
  Olivier
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Yes, good idea.

I was looking at the «WBThrottle» feature, but I'll go for logging instead.


Le mercredi 04 mars 2015 à 17:10 +0100, Alexandre DERUMIER a écrit :
 Only writes ;) 
 
 ok, so maybe some background operations (snap triming, scrubing...).
 
 maybe debug_osd=20 , could give you more logs ?
 
 
 - Mail original -
 De: Olivier Bonvalet ceph.l...@daevel.fr
 À: aderumier aderum...@odiso.com
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Mercredi 4 Mars 2015 16:42:13
 Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
 
 Only writes ;) 
 
 
 Le mercredi 04 mars 2015 à 16:19 +0100, Alexandre DERUMIER a écrit : 
  The change is only on OSD (and not on OSD journal). 
  
  do you see twice iops for read and write ? 
  
  if only read, maybe a read ahead bug could explain this. 
  
  - Mail original - 
  De: Olivier Bonvalet ceph.l...@daevel.fr 
  À: aderumier aderum...@odiso.com 
  Cc: ceph-users ceph-users@lists.ceph.com 
  Envoyé: Mercredi 4 Mars 2015 15:13:30 
  Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly 
  
  Ceph health is OK yes. 
  
  The «firefly-upgrade-cluster-IO.png» graph is about IO stats seen by 
  ceph : there is no change between dumpling and firefly. The change is 
  only on OSD (and not on OSD journal). 
  
  
  Le mercredi 04 mars 2015 à 15:05 +0100, Alexandre DERUMIER a écrit : 
   The load problem is permanent : I have twice IO/s on HDD since firefly. 
   
   Oh, permanent, that's strange. (If you don't see more traffic coming from 
   clients, I don't understand...) 
   
   do you see also twice ios/ ops in ceph -w  stats ? 
   
   is the ceph health ok ? 
   
   
   
   - Mail original - 
   De: Olivier Bonvalet ceph.l...@daevel.fr 
   À: aderumier aderum...@odiso.com 
   Cc: ceph-users ceph-users@lists.ceph.com 
   Envoyé: Mercredi 4 Mars 2015 14:49:41 
   Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to 
   firefly 
   
   Thanks Alexandre. 
   
   The load problem is permanent : I have twice IO/s on HDD since firefly. 
   And yes, the problem hang the production at night during snap trimming. 
   
   I suppose there is a new OSD parameter which change behavior of the 
   journal, or something like that. But didn't find anything about that. 
   
   Olivier 
   
   Le mercredi 04 mars 2015 à 14:44 +0100, Alexandre DERUMIER a écrit : 
Hi, 

maybe this is related ?: 

http://tracker.ceph.com/issues/9503 
Dumpling: removing many snapshots in a short time makes OSDs go 
berserk 

http://tracker.ceph.com/issues/9487 
dumpling: snaptrimmer causes slow requests while backfilling. 
osd_snap_trim_sleep not helping 

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-December/045116.html
 



I think it's already backport in dumpling, not sure it's already done 
for firefly 


Alexandre 



- Mail original - 
De: Olivier Bonvalet ceph.l...@daevel.fr 
À: ceph-users ceph-users@lists.ceph.com 
Envoyé: Mercredi 4 Mars 2015 12:10:30 
Objet: [ceph-users] Perf problem after upgrade from dumpling to firefly 

Hi, 

last saturday I upgraded my production cluster from dumpling to emperor 
(since we were successfully using it on a test cluster). 
A couple of hours later, we had falling OSD : some of them were marked 
as down by Ceph, probably because of IO starvation. I marked the 
cluster 
in «noout», start downed OSD, then let him recover. 24h later, same 
problem (near same hour). 

So, I choose to directly upgrade to firefly, which is maintained. 
Things are better, but the cluster is slower than with dumpling. 

The main problem seems that OSD have twice more write operations par 
second : 
https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png 
https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png 

But journal doesn't change (SSD dedicated to OSD70+71+72) : 
https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png 

Neither node bandwidth : 
https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png 

Or whole cluster IO activity : 
https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png 

Some background : 
The cluster is splitted in pools with «full SSD» OSD and «HDD+SSD 
journal» OSD. Only «HDD+SSD» OSD seems to be affected. 

I have 9 OSD on «HDD+SSD» node, 9 HDD and 3 SSD, and only 3 «HDD+SSD» 
nodes (so a total of 27 «HDD+SSD» OSD). 

The IO peak between 03h00 and 09h00 corresponds to snapshot rotation (= 
«rbd snap rm» operations). 
osd_snap_trim_sleep is setup to 0.8 since monthes. 
Yesterday I tried to reduce osd_pg_max_concurrent_snap_trims to 1. It 
doesn't seem to really help. 

The only thing which seems to help, is to reduce osd_disk_threads from

Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Only writes ;)


Le mercredi 04 mars 2015 à 16:19 +0100, Alexandre DERUMIER a écrit :
 The change is only on OSD (and not on OSD journal). 
 
 do you see twice iops for read and write ?
 
 if only read, maybe a read ahead bug could explain this. 
 
 - Mail original -
 De: Olivier Bonvalet ceph.l...@daevel.fr
 À: aderumier aderum...@odiso.com
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Mercredi 4 Mars 2015 15:13:30
 Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
 
 Ceph health is OK yes. 
 
 The «firefly-upgrade-cluster-IO.png» graph is about IO stats seen by 
 ceph : there is no change between dumpling and firefly. The change is 
 only on OSD (and not on OSD journal). 
 
 
 Le mercredi 04 mars 2015 à 15:05 +0100, Alexandre DERUMIER a écrit : 
  The load problem is permanent : I have twice IO/s on HDD since firefly. 
  
  Oh, permanent, that's strange. (If you don't see more traffic coming from 
  clients, I don't understand...) 
  
  do you see also twice ios/ ops in ceph -w  stats ? 
  
  is the ceph health ok ? 
  
  
  
  - Mail original - 
  De: Olivier Bonvalet ceph.l...@daevel.fr 
  À: aderumier aderum...@odiso.com 
  Cc: ceph-users ceph-users@lists.ceph.com 
  Envoyé: Mercredi 4 Mars 2015 14:49:41 
  Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly 
  
  Thanks Alexandre. 
  
  The load problem is permanent : I have twice IO/s on HDD since firefly. 
  And yes, the problem hang the production at night during snap trimming. 
  
  I suppose there is a new OSD parameter which change behavior of the 
  journal, or something like that. But didn't find anything about that. 
  
  Olivier 
  
  Le mercredi 04 mars 2015 à 14:44 +0100, Alexandre DERUMIER a écrit : 
   Hi, 
   
   maybe this is related ?: 
   
   http://tracker.ceph.com/issues/9503 
   Dumpling: removing many snapshots in a short time makes OSDs go berserk 
   
   http://tracker.ceph.com/issues/9487 
   dumpling: snaptrimmer causes slow requests while backfilling. 
   osd_snap_trim_sleep not helping 
   
   http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-December/045116.html

   
   
   
   I think it's already backport in dumpling, not sure it's already done for 
   firefly 
   
   
   Alexandre 
   
   
   
   - Mail original - 
   De: Olivier Bonvalet ceph.l...@daevel.fr 
   À: ceph-users ceph-users@lists.ceph.com 
   Envoyé: Mercredi 4 Mars 2015 12:10:30 
   Objet: [ceph-users] Perf problem after upgrade from dumpling to firefly 
   
   Hi, 
   
   last saturday I upgraded my production cluster from dumpling to emperor 
   (since we were successfully using it on a test cluster). 
   A couple of hours later, we had falling OSD : some of them were marked 
   as down by Ceph, probably because of IO starvation. I marked the cluster 
   in «noout», start downed OSD, then let him recover. 24h later, same 
   problem (near same hour). 
   
   So, I choose to directly upgrade to firefly, which is maintained. 
   Things are better, but the cluster is slower than with dumpling. 
   
   The main problem seems that OSD have twice more write operations par 
   second : 
   https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png 
   https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png 
   
   But journal doesn't change (SSD dedicated to OSD70+71+72) : 
   https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png 
   
   Neither node bandwidth : 
   https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png 
   
   Or whole cluster IO activity : 
   https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png 
   
   Some background : 
   The cluster is splitted in pools with «full SSD» OSD and «HDD+SSD 
   journal» OSD. Only «HDD+SSD» OSD seems to be affected. 
   
   I have 9 OSD on «HDD+SSD» node, 9 HDD and 3 SSD, and only 3 «HDD+SSD» 
   nodes (so a total of 27 «HDD+SSD» OSD). 
   
   The IO peak between 03h00 and 09h00 corresponds to snapshot rotation (= 
   «rbd snap rm» operations). 
   osd_snap_trim_sleep is setup to 0.8 since monthes. 
   Yesterday I tried to reduce osd_pg_max_concurrent_snap_trims to 1. It 
   doesn't seem to really help. 
   
   The only thing which seems to help, is to reduce osd_disk_threads from 8 
   to 1. 
   
   So. Any idea about what's happening ? 
   
   Thanks for any help, 
   Olivier 
   
   ___ 
   ceph-users mailing list 
   ceph-users@lists.ceph.com 
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
   
  
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Hi,

Last Saturday I upgraded my production cluster from dumpling to emperor
(since we were successfully using it on a test cluster).
A couple of hours later, we had failing OSDs : some of them were marked
as down by Ceph, probably because of IO starvation. I set the cluster
to «noout», restarted the downed OSDs, then let them recover. 24h later,
same problem (at nearly the same hour).

So, I chose to upgrade directly to firefly, which is maintained.
Things are better, but the cluster is slower than with dumpling.

The main problem seems to be that OSDs do twice as many write operations
per second :
https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png
https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png

But the journal doesn't change (SSD dedicated to OSD70+71+72) :
https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png

Nor the node bandwidth :
https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png

Nor the whole cluster IO activity :
https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png

Some background :
The cluster is split into pools on «full SSD» OSDs and «HDD+SSD
journal» OSDs. Only the «HDD+SSD» OSDs seem to be affected.

Each «HDD+SSD» node has 9 OSDs (9 HDDs plus 3 journal SSDs), and there
are only 3 «HDD+SSD» nodes (so a total of 27 «HDD+SSD» OSDs).

The IO peak between 03h00 and 09h00 corresponds to snapshot rotation (=
«rbd snap rm» operations).
osd_snap_trim_sleep has been set to 0.8 for months.
Yesterday I tried to reduce osd_pg_max_concurrent_snap_trims to 1. It
doesn't seem to really help.

The only thing which seems to help is to reduce osd_disk_threads from 8
to 1.
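
For reference, here is how I check and adjust those settings at runtime
(a rough sketch ; the admin socket path and the OSD id are from my
setup, adjust them to yours) :

# current values on one OSD, via its admin socket :
ceph --admin-daemon /var/run/ceph/ceph-osd.70.asok config show \
    | egrep 'osd_snap_trim_sleep|osd_pg_max_concurrent_snap_trims|osd_disk_threads'

# inject new values on all OSDs without restarting them :
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.8'
ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'

(I'm not sure osd_disk_threads can be changed that way, so for that one
I restarted the OSDs.)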

So. Any idea about what's happening ?

Thanks for any help,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Thanks Alexandre.

The load problem is permanent : I have twice the IO/s on HDD since
firefly. And yes, the problem hangs production at night during snap
trimming.

I suppose there is a new OSD parameter which changes the behavior of the
journal, or something like that. But I didn't find anything about that.

Olivier

Le mercredi 04 mars 2015 à 14:44 +0100, Alexandre DERUMIER a écrit :
 Hi,
 
 maybe this is related ?:
 
 http://tracker.ceph.com/issues/9503
 Dumpling: removing many snapshots in a short time makes OSDs go berserk
 
 http://tracker.ceph.com/issues/9487
 dumpling: snaptrimmer causes slow requests while backfilling. 
 osd_snap_trim_sleep not helping
 
 http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-December/045116.html
 
 
 
 I think it's already backport in dumpling, not sure it's already done for 
 firefly
 
 
 Alexandre
 
 
 
 - Mail original -
 De: Olivier Bonvalet ceph.l...@daevel.fr
 À: ceph-users ceph-users@lists.ceph.com
 Envoyé: Mercredi 4 Mars 2015 12:10:30
 Objet: [ceph-users] Perf problem after upgrade from dumpling to firefly
 
 Hi, 
 
 last saturday I upgraded my production cluster from dumpling to emperor 
 (since we were successfully using it on a test cluster). 
 A couple of hours later, we had falling OSD : some of them were marked 
 as down by Ceph, probably because of IO starvation. I marked the cluster 
 in «noout», start downed OSD, then let him recover. 24h later, same 
 problem (near same hour). 
 
 So, I choose to directly upgrade to firefly, which is maintained. 
 Things are better, but the cluster is slower than with dumpling. 
 
 The main problem seems that OSD have twice more write operations par 
 second : 
 https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png 
 https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png 
 
 But journal doesn't change (SSD dedicated to OSD70+71+72) : 
 https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png 
 
 Neither node bandwidth : 
 https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png 
 
 Or whole cluster IO activity : 
 https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png 
 
 Some background : 
 The cluster is splitted in pools with «full SSD» OSD and «HDD+SSD 
 journal» OSD. Only «HDD+SSD» OSD seems to be affected. 
 
 I have 9 OSD on «HDD+SSD» node, 9 HDD and 3 SSD, and only 3 «HDD+SSD» 
 nodes (so a total of 27 «HDD+SSD» OSD). 
 
 The IO peak between 03h00 and 09h00 corresponds to snapshot rotation (= 
 «rbd snap rm» operations). 
 osd_snap_trim_sleep is setup to 0.8 since monthes. 
 Yesterday I tried to reduce osd_pg_max_concurrent_snap_trims to 1. It 
 doesn't seem to really help. 
 
 The only thing which seems to help, is to reduce osd_disk_threads from 8 
 to 1. 
 
 So. Any idea about what's happening ? 
 
 Thanks for any help, 
 Olivier 
 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Perf problem after upgrade from dumpling to firefly

2015-03-04 Thread Olivier Bonvalet
Yes, ceph health is OK.

The «firefly-upgrade-cluster-IO.png» graph shows IO stats as seen by
ceph : there is no change between dumpling and firefly. The change is
only on the OSDs (and not on the OSD journals).
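
(To double-check outside the graphs, a simple way is to watch the data
disks and the journal SSD directly on one node, something like :
iostat -xk 5 /dev/sdb /dev/sdc /dev/sdk
where the device names are of course to be adapted to the host.)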


Le mercredi 04 mars 2015 à 15:05 +0100, Alexandre DERUMIER a écrit :
 The load problem is permanent : I have twice IO/s on HDD since firefly.
 
 Oh, permanent, that's strange. (If you don't see more traffic coming from 
 clients, I don't understand...)
 
 do you see also twice ios/ ops in ceph -w  stats ?
 
 is the ceph health ok ?
 
 
 
 - Mail original -
 De: Olivier Bonvalet ceph.l...@daevel.fr
 À: aderumier aderum...@odiso.com
 Cc: ceph-users ceph-users@lists.ceph.com
 Envoyé: Mercredi 4 Mars 2015 14:49:41
 Objet: Re: [ceph-users] Perf problem after upgrade from dumpling to firefly
 
 Thanks Alexandre. 
 
 The load problem is permanent : I have twice IO/s on HDD since firefly. 
 And yes, the problem hang the production at night during snap trimming. 
 
 I suppose there is a new OSD parameter which change behavior of the 
 journal, or something like that. But didn't find anything about that. 
 
 Olivier 
 
 Le mercredi 04 mars 2015 à 14:44 +0100, Alexandre DERUMIER a écrit : 
  Hi, 
  
  maybe this is related ?: 
  
  http://tracker.ceph.com/issues/9503 
  Dumpling: removing many snapshots in a short time makes OSDs go berserk 
  
  http://tracker.ceph.com/issues/9487 
  dumpling: snaptrimmer causes slow requests while backfilling. 
  osd_snap_trim_sleep not helping 
  
  http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-December/045116.html
   
  
  
  
  I think it's already backport in dumpling, not sure it's already done for 
  firefly 
  
  
  Alexandre 
  
  
  
  - Mail original - 
  De: Olivier Bonvalet ceph.l...@daevel.fr 
  À: ceph-users ceph-users@lists.ceph.com 
  Envoyé: Mercredi 4 Mars 2015 12:10:30 
  Objet: [ceph-users] Perf problem after upgrade from dumpling to firefly 
  
  Hi, 
  
  last saturday I upgraded my production cluster from dumpling to emperor 
  (since we were successfully using it on a test cluster). 
  A couple of hours later, we had falling OSD : some of them were marked 
  as down by Ceph, probably because of IO starvation. I marked the cluster 
  in «noout», start downed OSD, then let him recover. 24h later, same 
  problem (near same hour). 
  
  So, I choose to directly upgrade to firefly, which is maintained. 
  Things are better, but the cluster is slower than with dumpling. 
  
  The main problem seems that OSD have twice more write operations par 
  second : 
  https://daevel.fr/img/firefly/firefly-upgrade-OSD70-IO.png 
  https://daevel.fr/img/firefly/firefly-upgrade-OSD71-IO.png 
  
  But journal doesn't change (SSD dedicated to OSD70+71+72) : 
  https://daevel.fr/img/firefly/firefly-upgrade-OSD70+71-journal.png 
  
  Neither node bandwidth : 
  https://daevel.fr/img/firefly/firefly-upgrade-dragan-bandwidth.png 
  
  Or whole cluster IO activity : 
  https://daevel.fr/img/firefly/firefly-upgrade-cluster-IO.png 
  
  Some background : 
  The cluster is splitted in pools with «full SSD» OSD and «HDD+SSD 
  journal» OSD. Only «HDD+SSD» OSD seems to be affected. 
  
  I have 9 OSD on «HDD+SSD» node, 9 HDD and 3 SSD, and only 3 «HDD+SSD» 
  nodes (so a total of 27 «HDD+SSD» OSD). 
  
  The IO peak between 03h00 and 09h00 corresponds to snapshot rotation (= 
  «rbd snap rm» operations). 
  osd_snap_trim_sleep is setup to 0.8 since monthes. 
  Yesterday I tried to reduce osd_pg_max_concurrent_snap_trims to 1. It 
  doesn't seem to really help. 
  
  The only thing which seems to help, is to reduce osd_disk_threads from 8 
  to 1. 
  
  So. Any idea about what's happening ? 
  
  Thanks for any help, 
  Olivier 
  
  ___ 
  ceph-users mailing list 
  ceph-users@lists.ceph.com 
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
  
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
Is the kernel client affected by the problem ?

Le mardi 03 mars 2015 à 15:19 -0800, Sage Weil a écrit :
 Hi,
 
 This is just a heads up that we've identified a performance regression in 
 v0.80.8 from previous firefly releases.  A v0.80.9 is working it's way 
 through QA and should be out in a few days.  If you haven't upgraded yet 
 you may want to wait.
 
 Thanks!
 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
Le mardi 03 mars 2015 à 16:32 -0800, Sage Weil a écrit :
 On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
  Does kernel client affected by the problem ?
 
 Nope.  The kernel client is unaffected.. the issue is in librbd.
 
 sage
 


Ok, thanks for the clarification.
So I have to dig !


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data still in OSD directories after removing

2014-05-21 Thread Olivier Bonvalet
Hi,

I have a lot of space wasted by this problem (about 10GB per OSD, just
for this RBD image).
If OSDs can't detect orphan files, should I manually detect them, then
remove them ?

This command can do the job, at least for this image prefix :
find /var/lib/ceph/osd/ -name 'rb.0.14bfb5a.238e1f29.*' -delete
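
But before deleting anything blindly, a safer variant (just a sketch ;
« sas3copies » is my pool here, and it ignores snapshot clones, so take
it with a grain of salt) is to check each object against rados first and
only list the candidates :

find /var/lib/ceph/osd/ -name 'rb.0.14bfb5a.238e1f29.*' -printf '%f\n' \
    | awk -F '__' '{ print $1 }' | sort -u \
    | while read OBJ ; do
        rados -p sas3copies stat "$OBJ" >/dev/null 2>&1 \
            || echo "orphan candidate : $OBJ"
      done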

Thanks for any advice,
Olivier

PS : not sure whether this kind of problem belongs on the user or the
dev mailing list.

Le mardi 20 mai 2014 à 11:32 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 short : I removed a 1TB RBD image, but I still see files about it on
 OSD.
 
 
 long :
 1) I did : rbd snap purge $pool/$img
but since it overload the cluster, I stopped it (CTRL+C)
 2) latter, rbd snap purge $pool/$img
 3) then, rbd rm $pool/$img
 
 now, on the disk I can found files of this v1 RBD image (prefix was
 rb.0.14bfb5a.238e1f29) :
 
 # find /var/lib/ceph/osd/ceph-64/ -name 'rb.0.14bfb5a.238e1f29.*'
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.00021431__snapdir_C96635C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.5622__a252_32F435C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.00021431__a252_C96635C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.5622__snapdir_32F435C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_9/rb.0.14bfb5a.238e1f29.00011e08__a172_594495C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_9/rb.0.14bfb5a.238e1f29.00011e08__snapdir_594495C1__9
 /var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_A/rb.0.14bfb5a.238e1f29.00021620__a252_779FA5C1__9
 ...
 
 
 So, is there a way to force OSD to detect if files are orphans, then
 remove them ?
 
 Thanks,
 Olivier
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data still in OSD directories after removing

2014-05-21 Thread Olivier Bonvalet
Le mercredi 21 mai 2014 à 08:20 -0700, Sage Weil a écrit :
 
 You should definitely not do this!  :)

Of course ;)

 
 You're certain that that is the correct prefix for the rbd image you 
 removed?  Do you see the objects lists when you do 'rados -p rbd ls - | 
 grep prefix'?

I'm pretty sure, yes : since I didn't see a lot of space freed by the
rbd snap purge command, I looked at the RBD prefix before doing the
rbd rm (it's not the first time I've seen that problem, but the
previous time I didn't have the RBD prefix, so I was not able to check).

So : 
- rados -p sas3copies ls - | grep rb.0.14bfb5a.238e1f29 returns nothing
at all
- # rados stat -p sas3copies rb.0.14bfb5a.238e1f29.0002f026
 error stat-ing sas3copies/rb.0.14bfb5a.238e1f29.0002f026: No such
file or directory
- # rados stat -p sas3copies rb.0.14bfb5a.238e1f29.
 error stat-ing sas3copies/rb.0.14bfb5a.238e1f29.: No such
file or directory
- # ls -al 
/var/lib/ceph/osd/ceph-67/current/9.1fe_head/DIR_E/DIR_F/DIR_1/DIR_7/rb.0.14bfb5a.238e1f29.0002f026__a252_E68871FE__9
-rw-r--r-- 1 root root 4194304 oct.   8  2013 
/var/lib/ceph/osd/ceph-67/current/9.1fe_head/DIR_E/DIR_F/DIR_1/DIR_7/rb.0.14bfb5a.238e1f29.0002f026__a252_E68871FE__9


 If the objects really are orphaned, teh way to clean them up is via 'rados 
 -p rbd rm objectname'.  I'd like to get to the bottom of how they ended 
 up that way first, though!

I suppose the problem came from me, by doing CTRL+C during the rbd snap
purge $IMG.
rados rm -p sas3copies rb.0.14bfb5a.238e1f29.0002f026 doesn't remove
those files, and just answers with a No such file or directory.

Thanks,
Olivier



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Data still in OSD directories after removing

2014-05-20 Thread Olivier Bonvalet
Hi,

short : I removed a 1TB RBD image, but I still see files from it on the
OSDs.


long :
1) I did : rbd snap purge $pool/$img
   but since it overloaded the cluster, I stopped it (CTRL+C)
2) later, rbd snap purge $pool/$img again
3) then, rbd rm $pool/$img

now, on the disk I can still find files of this v1 RBD image (the
prefix was rb.0.14bfb5a.238e1f29) :

# find /var/lib/ceph/osd/ceph-64/ -name 'rb.0.14bfb5a.238e1f29.*'
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.00021431__snapdir_C96635C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.5622__a252_32F435C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.00021431__a252_C96635C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_3/rb.0.14bfb5a.238e1f29.5622__snapdir_32F435C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_9/rb.0.14bfb5a.238e1f29.00011e08__a172_594495C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_9/rb.0.14bfb5a.238e1f29.00011e08__snapdir_594495C1__9
/var/lib/ceph/osd/ceph-64/current/9.5c1_head/DIR_1/DIR_C/DIR_5/DIR_A/rb.0.14bfb5a.238e1f29.00021620__a252_779FA5C1__9
...


So, is there a way to force OSDs to detect orphan files, then
remove them ?

Thanks,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance and disk usage of snapshots

2013-09-28 Thread Olivier Bonvalet
Hi,

Le mardi 24 septembre 2013 à 18:37 +0200, Corin Langosch a écrit :
 Hi there,
 
 do snapshots have an impact on write performance? I assume on each write all 
 snapshots have to get updated (cow) so the more snapshots exist the worse 
 write 
 performance will get?
 

Not exactly : the first time a write hits a snapshotted block, yes,
that block (4MB by default) is duplicated on disk. So if you do 1
snapshot per RBD per day, each modified block will be duplicated only
once during the day. So, it's not a big impact.

But if you do frequent snapshots, one per hour for example, and your
workload is a lot of 8KB random writes (MySQL InnoDB...), then each of
these 8KB writes can trigger a 4MB duplication on disk, which is a big
write amplification.
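
A quick back-of-the-envelope sketch of that worst case (assuming the
default 4MB object size) :

# every 8KB random write that hits a not-yet-copied block clones 4MB :
echo $(( 4 * 1024 / 8 ))    # → 512x write amplification on the first hit
# once a block has been cloned, further writes to it cost nothing extra,
# until the next snapshot.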


 Is there any way to see how much disk space a snapshot occupies? I assume 
 because of cow snapshots start with 0 real disk usage and grow over time as 
 the 
 underlying object changes?

Well, since rados df and ceph df don't correctly report the space used
by snapshots, no, you can't. Or at least I didn't find how !

Small example : you have an 8MB RBD, and make a snapshot of it. Then you
still have 8MB of space used. Then you write 8KB on the first block :
ceph duplicates that block and now you have 12MB used on disk. But ceph
will report 8MB + 8KB used, not 12MB.


Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
Hi,

do you need more information about that ?

thanks,
Olivier

Le mardi 10 septembre 2013 à 11:19 -0700, Samuel Just a écrit :
 Can you post the rest of you crush map?
 -Sam
 
 On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  I also checked that all files in that PG still are on that PG :
 
  for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
  sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep
  -v 6\\.31f ; echo ; done
 
  And all objects are referenced in rados (compared with rados --pool
  ssd3copies ls rados.ssd3copies.dump).
 
 
 
  Le mardi 10 septembre 2013 à 13:46 +0200, Olivier Bonvalet a écrit :
  Some additionnal informations : if I look on one PG only, for example
  the 6.31f. ceph pg dump report a size of 616GB :
 
  # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM 
  }'
  631717
 
  But on disk, on the 3 replica I have :
  # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
  1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 
  Since I was suspected a snapshot problem, I try to count only head
  files :
  # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name 
  '*head*' -print0 | xargs -r -0 du -hc | tail -n1
  448M  total
 
  and the content of the directory : http://pastebin.com/u73mTvjs
 
 
  Le mardi 10 septembre 2013 à 10:31 +0200, Olivier Bonvalet a écrit :
   Hi,
  
   I have a space problem on a production cluster, like if there is unused
   data not freed : ceph df and rados df reports 613GB of data, and
   disk usage is 2640GB (with 3 replica). It should be near 1839GB.
  
  
   I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
   rules to put pools on SAS or on SSD.
  
   My pools :
   # ceph osd dump | grep ^pool
   pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
   pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
   pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash 
   rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
   pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
   pg_num 576 pgp_num 576 last_change 68321 owner 0
   pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
   rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
   pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
   rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
   pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
   rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
  
   Only hdd3copies, sas3copies and ssd3copies are really used :
   # ceph df
   GLOBAL:
   SIZE   AVAIL  RAW USED %RAW USED
   76498G 51849G 24648G   32.22
  
   POOLS:
   NAME   ID USED  %USED OBJECTS
   data   0  46753 0 72
   metadata   1  0 0 0
   rbd2  8 0 1
   hdd3copies 3  2724G 3.56  5190954
   ssd3copies 6  613G  0.80  347668
   sas3copies 9  3692G 4.83  764394
  
  
   My CRUSH rules was :
  
   rule SASperHost {
   ruleset 4
   type replicated
   min_size 1
   max_size 10
   step take SASroot
   step chooseleaf firstn 0 type host
   step emit
   }
  
   and :
  
   rule SSDperOSD {
   ruleset 3
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 0 type osd
   step emit
   }
  
  
   but, since the cluster was full because of that space problem, I swith 
   to a different rule :
  
   rule SSDperOSDfirst {
   ruleset 7
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 1 type osd
   step emit
   step take SASroot
   step chooseleaf firstn -1 type net
   step emit
   }
  
  
   So with that last rule, I should have only one replica on my SSD OSD, so 
   613GB of space used. But if I check on OSD I see 1212GB really used.
  
   I also use snapshots, maybe snapshots are ignored by ceph df and 
   rados df ?
  
   Thanks for any help.
  
   Olivier
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
Very simple test on a new pool ssdtest, with 3 replicas, full SSD
(crush rule 3) :

# rbd create ssdtest/test-mysql --size 102400
# rbd map ssdtest/test-mysql
# dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql bs=4M count=500
# ceph df | grep ssdtest
ssdtest 10 2000M 0 502

host1:# du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1
3135780 total
host2:# du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1
3028804 total
→ so about 6020MB on disk, which seems correct (and a find reports
739+767 files of 4MB, so it's also good).



First snapshot :

# rbd snap create ssdtest/test-mysql@s1
# dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql bs=4M count=250
# ceph df | grep ssdtest
ssdtest 10 3000M 0 752
(on both hosts) # du -skc /var/lib/ceph/osd/ceph-*/*/10.* | tail -n1
→ about 9024MB on disk in total, which is correct again.



Second snapshot :

# rbd snap create ssdtest/test-mysql@s2
Here I write only 4KB in each of 100 different rados blocks :
# for I in '' 1 2 3 4 5 6 7 8 9 ; do for J in 0 1 2 3 4 5 6 7 8 9 ; do
    OFFSET=$I$J ; dd if=/dev/zero of=/dev/rbd/ssdtest/test-mysql \
        bs=1k seek=$((OFFSET*4096)) count=4 ; done ; done
# ceph df | grep ssdtest
ssdtest 10 3000M 0 852

Here the USED column of ceph df is wrong. And on disk I see about
10226MB used.


So, for me the problem comes from ceph df (and rados df), which don't
correctly report the space used by partially written objects.

Or is it XFS related only ?
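
To compare what ceph reports with what is really on disk, I use
something like this (rough sketch ; « 10 » is the id of my ssdtest pool,
and the du has to be summed over all hosts) :

# size of pool 10 according to the PG stats, in MB :
ceph pg dump 2>/dev/null | grep '^10\.' | awk '{ SUM += $6/1024/1024 } END { print SUM }'

# real on-disk usage of the same PGs on this host, in KB :
du -skc /var/lib/ceph/osd/ceph-*/current/10.*_head/ | tail -n1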


Le mercredi 11 septembre 2013 à 11:00 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 do you need more information about that ?
 
 thanks,
 Olivier
 
 Le mardi 10 septembre 2013 à 11:19 -0700, Samuel Just a écrit :
  Can you post the rest of you crush map?
  -Sam
  
  On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   I also checked that all files in that PG still are on that PG :
  
   for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
   sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep
   -v 6\\.31f ; echo ; done
  
   And all objects are referenced in rados (compared with rados --pool
   ssd3copies ls rados.ssd3copies.dump).
  
  
  
   Le mardi 10 septembre 2013 à 13:46 +0200, Olivier Bonvalet a écrit :
   Some additionnal informations : if I look on one PG only, for example
   the 6.31f. ceph pg dump report a size of 616GB :
  
   # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print 
   SUM }'
   631717
  
   But on disk, on the 3 replica I have :
   # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
   1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
  
   Since I was suspected a snapshot problem, I try to count only head
   files :
   # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name 
   '*head*' -print0 | xargs -r -0 du -hc | tail -n1
   448M  total
  
   and the content of the directory : http://pastebin.com/u73mTvjs
  
  
   Le mardi 10 septembre 2013 à 10:31 +0200, Olivier Bonvalet a écrit :
Hi,
   
I have a space problem on a production cluster, like if there is unused
data not freed : ceph df and rados df reports 613GB of data, and
disk usage is 2640GB (with 3 replica). It should be near 1839GB.
   
   
I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
rules to put pools on SAS or on SSD.
   
My pools :
# ceph osd dump | grep ^pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 576 pgp_num 576 last_change 68315 owner 0 
crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash 
rjenkins pg_num 576 pgp_num 576 last_change 68321 owner 0
pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
   
Only hdd3copies, sas3copies and ssd3copies are really used :
# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
76498G 51849G 24648G   32.22
   
POOLS:
NAME   ID USED  %USED OBJECTS
data   0  46753 0 72
metadata   1  0 0 0
rbd2  8 0 1
hdd3copies 3  2724G 3.56  5190954
ssd3copies 6  613G  0.80  347668
sas3copies 9  3692G 4.83  764394
   
   
My CRUSH rules was :
   
rule SASperHost {
ruleset 4
type replicated
min_size 1

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-11 Thread Olivier Bonvalet
I removed some garbage about hosts faude / rurkh / murmillia (they were
temporarily added because the cluster was full). So here is the clean
CRUSH map :


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 device27
device 28 device28
device 29 device29
device 30 device30
device 31 device31
device 32 device32
device 33 device33
device 34 device34
device 35 device35
device 36 device36
device 37 device37
device 38 device38
device 39 device39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78

# types
type 0 osd
type 1 host
type 2 rack
type 3 net
type 4 room
type 5 datacenter
type 6 root

# buckets
host dragan {
id -17  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.70 weight 2.720
item osd.71 weight 2.720
item osd.72 weight 2.720
item osd.73 weight 2.720
item osd.74 weight 2.720
item osd.75 weight 2.720
item osd.76 weight 2.720
item osd.77 weight 2.720
item osd.78 weight 2.720
}
rack SAS15B01 {
id -40  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item dragan weight 24.480
}
net SAS188-165-15 {
id -72  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS15B01 weight 24.480
}
room SASs15 {
id -90  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS188-165-15 weight 24.480
}
datacenter SASrbx1 {
id -100 # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SASs15 weight 24.480
}
host taman {
id -16  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.49 weight 2.720
item osd.62 weight 2.720
item osd.63 weight 2.720
item osd.64 weight 2.720
item osd.65 weight 2.720
item osd.66 weight 2.720
item osd.67 weight 2.720
item osd.68 weight 2.720
item osd.69 weight 2.720
}
rack SAS31A10 {
id -15  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item taman weight 24.480
}
net SAS178-33-62 {
id -14  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS31A10 weight 24.480
}
room SASs31 {
id -13  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS178-33-62 weight 24.480
}
host kaino {
id -9   # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.40 weight 2.720
item osd.41 weight 2.720
item osd.42 weight 2.720
item osd.43 weight 2.720
item osd.44 weight 2.720
item osd.45 weight 2.720
item osd.46 weight 2.720
item osd.47 weight 2.720
item osd.48 weight 2.720
}
rack SAS34A14 {
id -10  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item kaino weight 24.480
}
net SAS5-135-135 {
id -11  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS34A14 weight 24.480
}
room SASs34 {
id -12  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS5-135-135 weight 24.480
}
datacenter SASrbx2 {
id -101 # do not change unnecessarily
# weight 48.960
alg straw
 

[ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Hi,

I have a space problem on a production cluster, as if there were unused
data not freed : ceph df and rados df report 613GB of data, while
disk usage is 2640GB (with 3 replicas). It should be near 1839GB.


I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
rules to put pools on SAS or on SSD.

My pools :
# ceph osd dump | grep ^pool
pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 576 pgp_num 576 last_change 68317 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
576 pgp_num 576 last_change 68321 owner 0
pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins 
pg_num 200 pgp_num 200 last_change 172933 owner 0
pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash rjenkins 
pg_num 800 pgp_num 800 last_change 172929 owner 0
pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash rjenkins 
pg_num 2048 pgp_num 2048 last_change 172935 owner 0

Only hdd3copies, sas3copies and ssd3copies are really used :
# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED 
76498G 51849G 24648G   32.22 

POOLS:
NAME   ID USED  %USED OBJECTS 
data   0  46753 0 72  
metadata   1  0 0 0   
rbd2  8 0 1   
hdd3copies 3  2724G 3.56  5190954 
ssd3copies 6  613G  0.80  347668  
sas3copies 9  3692G 4.83  764394  


My CRUSH rules was :

rule SASperHost {
ruleset 4
type replicated
min_size 1
max_size 10
step take SASroot
step chooseleaf firstn 0 type host
step emit
}

and :

rule SSDperOSD {
ruleset 3
type replicated
min_size 1
max_size 10
step take SSDroot
step choose firstn 0 type osd
step emit
}


but, since the cluster was full because of that space problem, I switched
to a different rule :

rule SSDperOSDfirst {
ruleset 7
type replicated
min_size 1
max_size 10
step take SSDroot
step choose firstn 1 type osd
step emit
step take SASroot
step chooseleaf firstn -1 type net
step emit
}


So with that last rule, I should have only one replica on my SSD OSDs, so
613GB of space used. But if I check on the OSDs, I see 1212GB really used.
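
By the way, to check what a rule really maps to, crushtool can simulate
it offline (just a sketch ; « 7 » is the ruleset of SSDperOSDfirst
above) :

ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule 7 --num-rep 3 --show-mappings | head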

I also use snapshots, maybe snapshots are ignored by ceph df and rados df ?

Thanks for any help.

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Some additional information : if I look at one PG only, for example
6.31f. ceph pg dump reports a size of 616GB :

# ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
631717

But on disk, on the 3 replica I have :
# du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
1,3G    /var/lib/ceph/osd/ceph-50/current/6.31f_head/

Since I suspected a snapshot problem, I tried to count only the head
files :
# find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' 
-print0 | xargs -r -0 du -hc | tail -n1
448M    total

and the content of the directory : http://pastebin.com/u73mTvjs


Le mardi 10 septembre 2013 à 10:31 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 I have a space problem on a production cluster, like if there is unused
 data not freed : ceph df and rados df reports 613GB of data, and
 disk usage is 2640GB (with 3 replica). It should be near 1839GB.
 
 
 I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
 rules to put pools on SAS or on SSD.
 
 My pools :
 # ceph osd dump | grep ^pool
 pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68317 owner 0
 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
 pg_num 576 pgp_num 576 last_change 68321 owner 0
 pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
 rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
 pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
 rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
 pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
 rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
 
 Only hdd3copies, sas3copies and ssd3copies are really used :
 # ceph df
 GLOBAL:
 SIZE   AVAIL  RAW USED %RAW USED 
 76498G 51849G 24648G   32.22 
 
 POOLS:
 NAME   ID USED  %USED OBJECTS 
 data   0  46753 0 72  
 metadata   1  0 0 0   
 rbd2  8 0 1   
 hdd3copies 3  2724G 3.56  5190954 
 ssd3copies 6  613G  0.80  347668  
 sas3copies 9  3692G 4.83  764394  
 
 
 My CRUSH rules was :
 
 rule SASperHost {
   ruleset 4
   type replicated
   min_size 1
   max_size 10
   step take SASroot
   step chooseleaf firstn 0 type host
   step emit
 }
 
 and :
 
 rule SSDperOSD {
   ruleset 3
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 0 type osd
   step emit
 }
 
 
 but, since the cluster was full because of that space problem, I swith to a 
 different rule :
 
 rule SSDperOSDfirst {
   ruleset 7
   type replicated
   min_size 1
   max_size 10
   step take SSDroot
   step choose firstn 1 type osd
   step emit
 step take SASroot
 step chooseleaf firstn -1 type net
 step emit
 }
 
 
 So with that last rule, I should have only one replica on my SSD OSD, so 
 613GB of space used. But if I check on OSD I see 1212GB really used.
 
 I also use snapshots, maybe snapshots are ignored by ceph df and rados df 
 ?
 
 Thanks for any help.
 
 Olivier
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
I also checked that all files in that PG still map to that PG :

for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' |
sort --unique` ; do echo -n $IMG ; ceph osd map ssd3copies $IMG | grep
-v 6\\.31f ; echo ; done

And all objects are referenced in rados (compared with the output of
« rados --pool ssd3copies ls », dumped to rados.ssd3copies.dump).
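
For the record, the comparison itself was roughly (sketch, run from the
PG directory on the OSD) :

# object names present on disk :
find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' | sort -u > disk.objects

# objects known by rados for that pool :
rados --pool ssd3copies ls | sort -u > rados.ssd3copies.dump

# objects on disk but unknown to rados (empty output here) :
comm -23 disk.objects rados.ssd3copies.dump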



Le mardi 10 septembre 2013 à 13:46 +0200, Olivier Bonvalet a écrit :
 Some additionnal informations : if I look on one PG only, for example
 the 6.31f. ceph pg dump report a size of 616GB :
 
 # ceph pg dump | grep ^6\\. | awk '{ SUM+=($6/1024/1024) } END { print SUM }'
 631717
 
 But on disk, on the 3 replica I have :
 # du -sh  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 1,3G  /var/lib/ceph/osd/ceph-50/current/6.31f_head/
 
 Since I was suspected a snapshot problem, I try to count only head
 files :
 # find /var/lib/ceph/osd/ceph-50/current/6.31f_head/ -type f -name '*head*' 
 -print0 | xargs -r -0 du -hc | tail -n1
 448M  total
 
 and the content of the directory : http://pastebin.com/u73mTvjs
 
 
 Le mardi 10 septembre 2013 à 10:31 +0200, Olivier Bonvalet a écrit :
  Hi,
  
  I have a space problem on a production cluster, like if there is unused
  data not freed : ceph df and rados df reports 613GB of data, and
  disk usage is 2640GB (with 3 replica). It should be near 1839GB.
  
  
  I have 5 hosts, 3 with SAS storage and 2 with SSD storage. I use crush
  rules to put pools on SAS or on SSD.
  
  My pools :
  # ceph osd dump | grep ^pool
  pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68315 owner 0 crash_replay_interval 45
  pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash 
  rjenkins pg_num 576 pgp_num 576 last_change 68317 owner 0
  pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins 
  pg_num 576 pgp_num 576 last_change 68321 owner 0
  pool 3 'hdd3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 200 pgp_num 200 last_change 172933 owner 0
  pool 6 'ssd3copies' rep size 3 min_size 1 crush_ruleset 7 object_hash 
  rjenkins pg_num 800 pgp_num 800 last_change 172929 owner 0
  pool 9 'sas3copies' rep size 3 min_size 1 crush_ruleset 4 object_hash 
  rjenkins pg_num 2048 pgp_num 2048 last_change 172935 owner 0
  
  Only hdd3copies, sas3copies and ssd3copies are really used :
  # ceph df
  GLOBAL:
  SIZE   AVAIL  RAW USED %RAW USED 
  76498G 51849G 24648G   32.22 
  
  POOLS:
  NAME   ID USED  %USED OBJECTS 
  data   0  46753 0 72  
  metadata   1  0 0 0   
  rbd2  8 0 1   
  hdd3copies 3  2724G 3.56  5190954 
  ssd3copies 6  613G  0.80  347668  
  sas3copies 9  3692G 4.83  764394  
  
  
  My CRUSH rules was :
  
  rule SASperHost {
  ruleset 4
  type replicated
  min_size 1
  max_size 10
  step take SASroot
  step chooseleaf firstn 0 type host
  step emit
  }
  
  and :
  
  rule SSDperOSD {
  ruleset 3
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 0 type osd
  step emit
  }
  
  
  but, since the cluster was full because of that space problem, I swith to a 
  different rule :
  
  rule SSDperOSDfirst {
  ruleset 7
  type replicated
  min_size 1
  max_size 10
  step take SSDroot
  step choose firstn 1 type osd
  step emit
  step take SASroot
  step chooseleaf firstn -1 type net
  step emit
  }
  
  
  So with that last rule, I should have only one replica on my SSD OSD, so 
  613GB of space used. But if I check on OSD I see 1212GB really used.
  
  I also use snapshots, maybe snapshots are ignored by ceph df and rados 
  df ?
  
  Thanks for any help.
  
  Olivier
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
Le mardi 10 septembre 2013 à 11:19 -0700, Samuel Just a écrit :
 Can you post the rest of you crush map?
 -Sam
 

Yes :

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 device3
device 4 device4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 device27
device 28 device28
device 29 device29
device 30 device30
device 31 device31
device 32 device32
device 33 device33
device 34 device34
device 35 device35
device 36 device36
device 37 device37
device 38 device38
device 39 device39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47
device 48 osd.48
device 49 osd.49
device 50 osd.50
device 51 osd.51
device 52 osd.52
device 53 osd.53
device 54 osd.54
device 55 osd.55
device 56 osd.56
device 57 osd.57
device 58 osd.58
device 59 osd.59
device 60 osd.60
device 61 osd.61
device 62 osd.62
device 63 osd.63
device 64 osd.64
device 65 osd.65
device 66 osd.66
device 67 osd.67
device 68 osd.68
device 69 osd.69
device 70 osd.70
device 71 osd.71
device 72 osd.72
device 73 osd.73
device 74 osd.74
device 75 osd.75
device 76 osd.76
device 77 osd.77
device 78 osd.78

# types
type 0 osd
type 1 host
type 2 rack
type 3 net
type 4 room
type 5 datacenter
type 6 root

# buckets
host dragan {
id -17  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.70 weight 2.720
item osd.71 weight 2.720
item osd.72 weight 2.720
item osd.73 weight 2.720
item osd.74 weight 2.720
item osd.75 weight 2.720
item osd.76 weight 2.720
item osd.77 weight 2.720
item osd.78 weight 2.720
}
rack SAS15B01 {
id -40  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item dragan weight 24.480
}
net SAS188-165-15 {
id -72  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS15B01 weight 24.480
}
room SASs15 {
id -90  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS188-165-15 weight 24.480
}
datacenter SASrbx1 {
id -100 # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SASs15 weight 24.480
}
host taman {
id -16  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.49 weight 2.720
item osd.62 weight 2.720
item osd.63 weight 2.720
item osd.64 weight 2.720
item osd.65 weight 2.720
item osd.66 weight 2.720
item osd.67 weight 2.720
item osd.68 weight 2.720
item osd.69 weight 2.720
}
rack SAS31A10 {
id -15  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item taman weight 24.480
}
net SAS178-33-62 {
id -14  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS31A10 weight 24.480
}
room SASs31 {
id -13  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS178-33-62 weight 24.480
}
host kaino {
id -9   # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item osd.40 weight 2.720
item osd.41 weight 2.720
item osd.42 weight 2.720
item osd.43 weight 2.720
item osd.44 weight 2.720
item osd.45 weight 2.720
item osd.46 weight 2.720
item osd.47 weight 2.720
item osd.48 weight 2.720
}
rack SAS34A14 {
id -10  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item kaino weight 24.480
}
net SAS5-135-135 {
id -11  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS34A14 weight 24.480
}
room SASs34 {
id -12  # do not change unnecessarily
# weight 24.480
alg straw
hash 0  # rjenkins1
item SAS5-135-135 weight 24.480
}
datacenter SASrbx2 {
id -101 # do not change unnecessarily
# weight 48.960
alg straw
hash 0  # rjenkins1
 

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Olivier Bonvalet
   




Le mardi 10 septembre 2013 à 21:01 +0200, Olivier Bonvalet a écrit :
 I removed some garbage about hosts faude / rurkh / murmillia (they was
 temporarily added because cluster was full). So the clean CRUSH map :
 
 
 # begin crush map
 tunable choose_local_tries 0
 tunable choose_local_fallback_tries 0
 tunable choose_total_tries 50
 
 # devices
 device 0 device0
 device 1 device1
 device 2 device2
 device 3 device3
 device 4 device4
 device 5 device5
 device 6 device6
 device 7 device7
 device 8 device8
 device 9 device9
 device 10 device10
 device 11 device11
 device 12 device12
 device 13 device13
 device 14 device14
 device 15 device15
 device 16 device16
 device 17 device17
 device 18 device18
 device 19 device19
 device 20 device20
 device 21 device21
 device 22 device22
 device 23 device23
 device 24 device24
 device 25 device25
 device 26 device26
 device 27 device27
 device 28 device28
 device 29 device29
 device 30 device30
 device 31 device31
 device 32 device32
 device 33 device33
 device 34 device34
 device 35 device35
 device 36 device36
 device 37 device37
 device 38 device38
 device 39 device39
 device 40 osd.40
 device 41 osd.41
 device 42 osd.42
 device 43 osd.43
 device 44 osd.44
 device 45 osd.45
 device 46 osd.46
 device 47 osd.47
 device 48 osd.48
 device 49 osd.49
 device 50 osd.50
 device 51 osd.51
 device 52 osd.52
 device 53 osd.53
 device 54 osd.54
 device 55 osd.55
 device 56 osd.56
 device 57 osd.57
 device 58 osd.58
 device 59 osd.59
 device 60 osd.60
 device 61 osd.61
 device 62 osd.62
 device 63 osd.63
 device 64 osd.64
 device 65 osd.65
 device 66 osd.66
 device 67 osd.67
 device 68 osd.68
 device 69 osd.69
 device 70 osd.70
 device 71 osd.71
 device 72 osd.72
 device 73 osd.73
 device 74 osd.74
 device 75 osd.75
 device 76 osd.76
 device 77 osd.77
 device 78 osd.78
 
 # types
 type 0 osd
 type 1 host
 type 2 rack
 type 3 net
 type 4 room
 type 5 datacenter
 type 6 root
 
 # buckets
 host dragan {
   id -17  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item osd.70 weight 2.720
   item osd.71 weight 2.720
   item osd.72 weight 2.720
   item osd.73 weight 2.720
   item osd.74 weight 2.720
   item osd.75 weight 2.720
   item osd.76 weight 2.720
   item osd.77 weight 2.720
   item osd.78 weight 2.720
 }
 rack SAS15B01 {
   id -40  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item dragan weight 24.480
 }
 net SAS188-165-15 {
   id -72  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS15B01 weight 24.480
 }
 room SASs15 {
   id -90  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS188-165-15 weight 24.480
 }
 datacenter SASrbx1 {
   id -100 # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SASs15 weight 24.480
 }
 host taman {
   id -16  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item osd.49 weight 2.720
   item osd.62 weight 2.720
   item osd.63 weight 2.720
   item osd.64 weight 2.720
   item osd.65 weight 2.720
   item osd.66 weight 2.720
   item osd.67 weight 2.720
   item osd.68 weight 2.720
   item osd.69 weight 2.720
 }
 rack SAS31A10 {
   id -15  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item taman weight 24.480
 }
 net SAS178-33-62 {
   id -14  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS31A10 weight 24.480
 }
 room SASs31 {
   id -13  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS178-33-62 weight 24.480
 }
 host kaino {
   id -9   # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item osd.40 weight 2.720
   item osd.41 weight 2.720
   item osd.42 weight 2.720
   item osd.43 weight 2.720
   item osd.44 weight 2.720
   item osd.45 weight 2.720
   item osd.46 weight 2.720
   item osd.47 weight 2.720
   item osd.48 weight 2.720
 }
 rack SAS34A14 {
   id -10  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item kaino weight 24.480
 }
 net SAS5-135-135 {
   id -11  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS34A14 weight 24.480
 }
 room SASs34 {
   id -12  # do not change unnecessarily
   # weight 24.480
   alg straw
   hash 0  # rjenkins1
   item SAS5-135-135 weight 24.480

Re: [ceph-users] Ceph + Xen - RBD io hang

2013-08-27 Thread Olivier Bonvalet
Hi,

I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use
kernel 3.10.* on the dom0, which hangs very soon. But it's visible in
the kernel logs of the dom0, not the domU.

Anyway, you should probably re-try with kernel 3.9.11 for the dom0 (I
also use 3.10.9 in domU).

Olivier

Le mardi 27 août 2013 à 11:46 +0100, James Dingwall a écrit :
 Hi,
 
 I am doing some experimentation with Ceph and Xen (on the same host) and 
 I'm experiencing some problems with the rbd device that I'm using as the 
 block device.  My environment is:
 
 2 node Ceph 0.67.2 cluster, 4x OSD (btrfs) and 1x mon
 Xen 4.3.0
 Kernel 3.10.9
 
 The domU I'm trying to build is from the Ubuntu 13.04 desktop release.  
 When I pass through the rbd (format 1 or 2) device as 
 phy:/dev/rbd/rbd/ubuntu-test then the domU has no problems reading data 
 from it, the test I ran was:
 
 for i in $(seq 0 1023) ; do
  dd if=/dev/xvda of=/dev/null bs=4k count=1024 skip=$(($i * 4))
 done
 
 However writing data causes the domU to hang while while i is still in 
 single figures but it doesn't seem consistent about the exact value.
 for i in $(seq 0 1023) ; do
  dd if=/dev/zero of=/dev/xvda bs=4k count=1024 seek=$(($i * 4))
 done
 
 eventually the kernel in the domU will print a hung task warning.  I 
 have tried the domU as pv and hvm (with xen_platform_pci = 1 and 0) but 
 have the same behaviour in both cases.  Once this state is triggered on 
 the rbd device then any interaction with it in dom0 will result in the 
 same hang.  I'm assuming that there is some unfavourable interaction 
 between ceph/rbd and blkback but I haven't found anything in the dom0 
 logs so I would like to know if anyone has some suggestions about where 
 to start trying to hunt this down.
 
 Thanks,
 James
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl)) (ceph 0.61.7)

2013-08-19 Thread Olivier Bonvalet
Hi,

I have an OSD which crashes every time I try to start it (see logs below).
Is it a known problem ? And is there a way to fix it ?

root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log
2013-08-19 11:07:48.478558 7f6fe367a780  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 19327
2013-08-19 11:07:48.516363 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount FIEMAP ioctl is supported and appears to work
2013-08-19 11:07:48.516380 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-08-19 11:07:48.516514 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount did NOT detect btrfs
2013-08-19 11:07:48.517087 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount syscall(SYS_syncfs, fd) fully supported
2013-08-19 11:07:48.517389 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount found snaps 
2013-08-19 11:07:49.199483 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-08-19 11:07:52.191336 7f6fe367a780  1 journal _open /dev/sdk4 fd 18: 
53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-08-19 11:07:52.196020 7f6fe367a780  1 journal _open /dev/sdk4 fd 18: 
53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-08-19 11:07:52.196920 7f6fe367a780  1 journal close /dev/sdk4
2013-08-19 11:07:52.199908 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount FIEMAP ioctl is supported and appears to work
2013-08-19 11:07:52.199916 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-08-19 11:07:52.200058 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount did NOT detect btrfs
2013-08-19 11:07:52.200886 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount syscall(SYS_syncfs, fd) fully supported
2013-08-19 11:07:52.200919 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount found snaps 
2013-08-19 11:07:52.215850 7f6fe367a780  0 filestore(/var/lib/ceph/osd/ceph-65) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-08-19 11:07:52.219819 7f6fe367a780  1 journal _open /dev/sdk4 fd 26: 
53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-08-19 11:07:52.227420 7f6fe367a780  1 journal _open /dev/sdk4 fd 26: 
53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-08-19 11:07:52.500342 7f6fe367a780  0 osd.65 144201 crush map has features 
262144, adjusting msgr requires for clients
2013-08-19 11:07:52.500353 7f6fe367a780  0 osd.65 144201 crush map has features 
262144, adjusting msgr requires for osds
2013-08-19 11:08:13.581709 7f6fbdcb5700 -1 osd/OSD.cc: In function 'OSDMapRef 
OSDService::get_map(epoch_t)' thread 7f6fbdcb5700 time 2013-08-19 
11:08:13.579519
osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
 1: (OSDService::get_map(unsigned int)+0x44b) [0x6f5b9b]
 2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, 
PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, 
std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > 
>*)+0x3c8) [0x6f8f48]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, 
ThreadPool::TPHandle&)+0x31f) [0x6f975f]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, 
ThreadPool::TPHandle&)+0x14) [0x7391d4]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8f8e3a]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x8fa0e0]
 7: (()+0x6b50) [0x7f6fe3070b50]
 8: (clone()+0x6d) [0x7f6fe15cba7d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

full logs here : http://pastebin.com/RphNyLU0
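
For anyone hitting the same assert: it means the OSD cannot read an osdmap
epoch it needs from its own store. A heavily hedged sketch of the workaround
that was commonly suggested for this class of error at the time is below;
the epoch (144201, taken from the log above), the osd ids and the meta paths
are assumptions, and the broken OSD must be stopped first.

# stop the broken OSD before touching its store
service ceph stop osd.65

# on a healthy OSD of the same cluster, locate the map file for that epoch
find /var/lib/ceph/osd/ceph-66/current/meta -name 'osdmap.144201*'

# copy the matching file(s) into the same relative location under ceph-65,
# then restart and watch the log in case another epoch is missing
service ceph start osd.65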


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl)) (ceph 0.61.7)

2013-08-19 Thread Olivier Bonvalet
Le lundi 19 août 2013 à 12:27 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 I have an OSD which crashes every time I try to start it (see logs below).
 Is it a known problem ? And is there a way to fix it ?
 
 root! taman:/var/log/ceph# grep -v ' pipe' osd.65.log
 2013-08-19 11:07:48.478558 7f6fe367a780  0 ceph version 0.61.7 
 (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 19327
 2013-08-19 11:07:48.516363 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is supported and 
 appears to work
 2013-08-19 11:07:48.516380 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2013-08-19 11:07:48.516514 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount did NOT detect btrfs
 2013-08-19 11:07:48.517087 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount syscall(SYS_syncfs, fd) fully 
 supported
 2013-08-19 11:07:48.517389 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount found snaps 
 2013-08-19 11:07:49.199483 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2013-08-19 11:07:52.191336 7f6fe367a780  1 journal _open /dev/sdk4 fd 18: 
 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
 2013-08-19 11:07:52.196020 7f6fe367a780  1 journal _open /dev/sdk4 fd 18: 
 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
 2013-08-19 11:07:52.196920 7f6fe367a780  1 journal close /dev/sdk4
 2013-08-19 11:07:52.199908 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is supported and 
 appears to work
 2013-08-19 11:07:52.199916 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount FIEMAP ioctl is disabled via 
 'filestore fiemap' config option
 2013-08-19 11:07:52.200058 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount did NOT detect btrfs
 2013-08-19 11:07:52.200886 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount syscall(SYS_syncfs, fd) fully 
 supported
 2013-08-19 11:07:52.200919 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount found snaps 
 2013-08-19 11:07:52.215850 7f6fe367a780  0 
 filestore(/var/lib/ceph/osd/ceph-65) mount: enabling WRITEAHEAD journal mode: 
 btrfs not detected
 2013-08-19 11:07:52.219819 7f6fe367a780  1 journal _open /dev/sdk4 fd 26: 
 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
 2013-08-19 11:07:52.227420 7f6fe367a780  1 journal _open /dev/sdk4 fd 26: 
 53687091200 bytes, block size 4096 bytes, directio = 1, aio = 1
 2013-08-19 11:07:52.500342 7f6fe367a780  0 osd.65 144201 crush map has 
 features 262144, adjusting msgr requires for clients
 2013-08-19 11:07:52.500353 7f6fe367a780  0 osd.65 144201 crush map has 
 features 262144, adjusting msgr requires for osds
 2013-08-19 11:08:13.581709 7f6fbdcb5700 -1 osd/OSD.cc: In function 'OSDMapRef 
 OSDService::get_map(epoch_t)' thread 7f6fbdcb5700 time 2013-08-19 
 11:08:13.579519
 osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl))
 
  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
  1: (OSDService::get_map(unsigned int)+0x44b) [0x6f5b9b]
  2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, 
 PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, 
 std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > 
  >*)+0x3c8) [0x6f8f48]
  3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, 
 ThreadPool::TPHandle&)+0x31f) [0x6f975f]
  4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, 
 ThreadPool::TPHandle&)+0x14) [0x7391d4]
  5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8f8e3a]
  6: (ThreadPool::WorkThread::entry()+0x10) [0x8fa0e0]
  7: (()+0x6b50) [0x7f6fe3070b50]
  8: (clone()+0x6d) [0x7f6fe15cba7d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.
 
 full logs here : http://pastebin.com/RphNyLU0
 
 

Hi,

still same problem with Ceph 0.61.8 :

2013-08-19 23:01:54.369609 7fdd667a4780  0 osd.65 144279 crush map has features 
262144, adjusting msgr requires for osds
2013-08-19 23:01:58.315115 7fdd405de700 -1 osd/OSD.cc: In function 'OSDMapRef 
OSDService::get_map(epoch_t)' thread 7fdd405de700 time 2013-08-19 
23:01:58.313955
osd/OSD.cc: 4847: FAILED assert(_get_map_bl(epoch, bl))

 ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b)
 1: (OSDService::get_map(unsigned int)+0x44b) [0x6f736b]
 2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, 
PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, 
std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > 
>*)+0x3c8) [0x6fa708]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, 
ThreadPool::TPHandle&)+0x31f) [0x6faf1f]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, 
ThreadPool::TPHandle&)+0x14) [0x73a9b4]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x8fb69a]
 6: (ThreadPool::WorkThread

Re: [ceph-users] Replace all monitors

2013-08-10 Thread Olivier Bonvalet
Le jeudi 08 août 2013 à 18:04 -0700, Sage Weil a écrit :
 On Fri, 9 Aug 2013, Olivier Bonvalet wrote:
  Le jeudi 08 août 2013 à 09:43 -0700, Sage Weil a écrit :
   On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
Hi,

right now I have 5 monitors which share slow SSDs with several OSD
journals. As a result, each data migration operation (reweight, recovery,
etc.) is very slow and the cluster is nearly down.

So I have to change that. I'm looking to replace these 5 monitors with 3
new monitors, which would still share (very fast) SSDs with several OSDs.
I suppose it's not a good idea, since monitors should have dedicated
storage. What do you think about that ?
Is it better practice to have dedicated storage, but share CPU with
Xen VMs ?
   
    I think it's okay, as long as you aren't worried about the device filling 
    up and the monitors are on different hosts.
  
  Not sure I understand : by «dedicated storage», I was talking about the
  monitor. Can I put monitors on a Xen «host», if they have dedicated
  storage ?
 
 Yeah, Xen would work fine here, although I'm not sure it is necessary.  
 Just putting /var/lib/mon on a different storage device will probably be 
 the most important piece.  It sounds like it is storage contention, and 
 not CPU contention, that is the source of your problems.
 
 sage
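 
 For reference, a minimal sketch of what "putting the mon data on a different
 storage device" could look like, assuming the default mon data path
 /var/lib/ceph/mon/ceph-a and a spare partition /dev/sdX1 (both are
 placeholders, adjust to your own layout) :
 
 # stop the monitor before touching its data directory
 service ceph stop mon.a
 
 # prepare and mount the dedicated device somewhere temporary, then copy
 mkfs.xfs /dev/sdX1
 mount /dev/sdX1 /mnt/mon-a
 cp -a /var/lib/ceph/mon/ceph-a/. /mnt/mon-a/
 
 # remount the device over the original path and restart the monitor
 umount /mnt/mon-a
 mount /dev/sdX1 /var/lib/ceph/mon/ceph-a
 service ceph start mon.a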
 

Yop, the transition worked fine, thanks ! The new mons are really faster,
and now I can migrate data without downtime. Good job devs !

Thanks again.

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
Hi,

right now I have 5 monitors which share slow SSDs with several OSD
journals. As a result, each data migration operation (reweight, recovery,
etc.) is very slow and the cluster is nearly down.

So I have to change that. I'm looking to replace these 5 monitors with 3
new monitors, which would still share (very fast) SSDs with several OSDs.
I suppose it's not a good idea, since monitors should have dedicated
storage. What do you think about that ?
Is it better practice to have dedicated storage, but share CPU with
Xen VMs ?

Second point, I'm not sure how to do that migration without downtime.
I was hoping to add the 3 new monitors, then progressively remove the 5
old monitors, but the doc [1] indicates a special procedure for an
unhealthy cluster, which seems to be for clusters with damaged monitors,
right ? In my case I only have dead PGs [2] (#5226), from which I can't
recover, but the monitors are fine. Can I use the standard procedure ?

Thanks,
Olivier

[1] 
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors
[2] http://tracker.ceph.com/issues/5226
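
For readers following this thread, a rough outline of the add-then-remove
sequence described in [1], repeated once per new and per old monitor. The
mon id "f", the IP 10.0.0.7 and the temporary paths are placeholders, and
exact flags can vary between releases, so treat this as a sketch rather than
exact commands :

# on the new monitor host: fetch the current mon keyring and monmap
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap

# build the new monitor's data directory, register it, start it
mkdir -p /var/lib/ceph/mon/ceph-f
ceph-mon -i f --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph mon add f 10.0.0.7:6789
service ceph start mon.f

# once the new mons are in quorum, drop the old ones one at a time
ceph mon remove a
service ceph stop mon.a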

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
Le jeudi 08 août 2013 à 09:43 -0700, Sage Weil a écrit :
 On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
  Hi,
  
   right now I have 5 monitors which share slow SSDs with several OSD
   journals. As a result, each data migration operation (reweight, recovery,
   etc.) is very slow and the cluster is nearly down.
   
   So I have to change that. I'm looking to replace these 5 monitors with 3
   new monitors, which would still share (very fast) SSDs with several OSDs.
   I suppose it's not a good idea, since monitors should have dedicated
   storage. What do you think about that ?
   Is it better practice to have dedicated storage, but share CPU with
   Xen VMs ?
 
 I think it's okay, as long as you aren't worried about the device filling 
 up and the monitors are on different hosts.

Not sure I understand : by «dedicated storage», I was talking about the
monitor. Can I put monitors on a Xen «host», if they have dedicated
storage ?

 
   Second point, I'm not sure how to do that migration without downtime.
   I was hoping to add the 3 new monitors, then progressively remove the 5
   old monitors, but the doc [1] indicates a special procedure for an
   unhealthy cluster, which seems to be for clusters with damaged monitors,
   right ? In my case I only have dead PGs [2] (#5226), from which I can't
   recover, but the monitors are fine. Can I use the standard procedure ?
 
 The 'healthy' caveat in this case is about the monitor cluster; the 
 special procedure is only needed if you don't have enough healthy mons to 
 form a quorum.  The normal procedure should work just fine.
 

Great, thanks !


 sage
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-05 Thread Olivier Bonvalet
It's Xen yes, but no, I didn't try the RBD tap client, for two
reasons :
- too young to enable it in production
- Debian packages don't have the TAP driver


Le lundi 05 août 2013 à 01:43 +, James Harper a écrit :
 What VM? If Xen, have you tried the rbd tap client?
 
 James
 
  -Original Message-
  From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
  boun...@lists.ceph.com] On Behalf Of Olivier Bonvalet
  Sent: Monday, 5 August 2013 11:07 AM
  To: ceph-users@lists.ceph.com
  Subject: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103
  
  
  Hi,
  
  I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
  3.9.11 to Linux 3.10.5, and now I have kernel panic after launching some
  VM which use RBD kernel client.
  
  
  In kernel logs, I have :
  
  Aug  5 02:51:22 murmillia kernel: [  289.205652] kernel BUG at
  net/ceph/osd_client.c:2103!
  Aug  5 02:51:22 murmillia kernel: [  289.205725] invalid opcode:  [#1] 
  SMP
  Aug  5 02:51:22 murmillia kernel: [  289.205908] Modules linked in: cbc rbd
  libceph libcrc32c xen_gntdev ip6table_mangle ip6t_REJECT ip6table_filter
  ip6_tables xt_DSCP iptable_mangle xt_LOG xt_physdev ipt_REJECT
  xt_tcpudp iptable_filter ip_tables x_tables bridge loop coretemp
  ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
  ablk_helper cryptd iTCO_wdt iTCO_vendor_support gpio_ich microcode
  serio_raw sb_edac edac_core evdev lpc_ich i2c_i801 mfd_core wmi ac
  ioatdma shpchp button dm_mod hid_generic usbhid hid sg sd_mod
  crc_t10dif crc32c_intel isci megaraid_sas libsas ahci libahci ehci_pci 
  ehci_hcd
  libata scsi_transport_sas igb scsi_mod i2c_algo_bit ixgbe usbcore i2c_core
  dca usb_common ptp pps_core mdio
  Aug  5 02:51:22 murmillia kernel: [  289.210499] CPU: 2 PID: 5326 Comm:
  blkback.3.xvda Not tainted 3.10-dae-dom0 #1
  Aug  5 02:51:22 murmillia kernel: [  289.210617] Hardware name: Supermicro
  X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
  Aug  5 02:51:22 murmillia kernel: [  289.210738] task: 880037d01040 ti:
  88003803a000 task.ti: 88003803a000
  Aug  5 02:51:22 murmillia kernel: [  289.210858] RIP: 
  e030:[a02d21d0]
  [a02d21d0] ceph_osdc_build_request+0x2bb/0x3c6 [libceph]
  Aug  5 02:51:22 murmillia kernel: [  289.211062] RSP: e02b:88003803b9f8
  EFLAGS: 00010212
  Aug  5 02:51:22 murmillia kernel: [  289.211154] RAX: 880033a181c0 RBX:
  880033a182ec RCX: 
  Aug  5 02:51:22 murmillia kernel: [  289.211251] RDX: 880033a182af RSI:
  8050 RDI: 880030d34888
  Aug  5 02:51:22 murmillia kernel: [  289.211347] RBP: 2000 R08:
  88003803ba58 R09: 
  Aug  5 02:51:22 murmillia kernel: [  289.211444] R10:  R11:
   R12: 880033ba3500
  Aug  5 02:51:22 murmillia kernel: [  289.211541] R13: 0001 R14:
  88003847aa78 R15: 88003847ab58
  Aug  5 02:51:22 murmillia kernel: [  289.211644] FS:  7f775da8c700()
  GS:88003f84() knlGS:
  Aug  5 02:51:22 murmillia kernel: [  289.211765] CS:  e033 DS:  ES: 
  CR0: 80050033
  Aug  5 02:51:22 murmillia kernel: [  289.211858] CR2: 7fa21ee2c000 CR3:
  2be14000 CR4: 00042660
  Aug  5 02:51:22 murmillia kernel: [  289.211956] DR0:  DR1:
   DR2: 
  Aug  5 02:51:22 murmillia kernel: [  289.212052] DR3:  DR6:
  0ff0 DR7: 0400
  Aug  5 02:51:22 murmillia kernel: [  289.212148] Stack:
  Aug  5 02:51:22 murmillia kernel: [  289.212232]  2000
  00243847aa78  880039949b40
  Aug  5 02:51:22 murmillia kernel: [  289.212577]  2201
  880033811d98 88003803ba80 88003847aa78
  Aug  5 02:51:22 murmillia kernel: [  289.212921]  880030f24380
  880002a38400 2000 a029584c
  Aug  5 02:51:22 murmillia kernel: [  289.213264] Call Trace:
  Aug  5 02:51:22 murmillia kernel: [  289.213358]  [a029584c] ?
  rbd_osd_req_format_write+0x71/0x7c [rbd]
  Aug  5 02:51:22 murmillia kernel: [  289.213459]  [a0296f05] ?
  rbd_img_request_fill+0x695/0x736 [rbd]
  Aug  5 02:51:22 murmillia kernel: [  289.213562]  [810c96a7] ?
  arch_local_irq_restore+0x7/0x8
  Aug  5 02:51:22 murmillia kernel: [  289.213667]  [81357ff8] ?
  down_read+0x9/0x19
  Aug  5 02:51:22 murmillia kernel: [  289.213763]  [a029828a] ?
  rbd_request_fn+0x191/0x22e [rbd]
  Aug  5 02:51:22 murmillia kernel: [  289.213864]  [8117ac9e] ?
  __blk_run_queue_uncond+0x1e/0x26
  Aug  5 02:51:22 murmillia kernel: [  289.213962]  [8117b7aa] ?
  blk_flush_plug_list+0x1c1/0x1e4
  Aug  5 02:51:22 murmillia kernel: [  289.214059]  [8117baad] ?
  blk_finish_plug+0xb/0x2a
  Aug  5 02:51:22 murmillia kernel: [  289.214157]  [81255c36] ?
  dispatch_rw_block_io+0x33e/0x3f0
  Aug  5 02:51:22

Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-04 Thread Olivier Bonvalet
Sorry, the dev list is probably a better place for that one.

Le lundi 05 août 2013 à 03:07 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
 3.9.11 to Linux 3.10.5, and now I have kernel panic after launching some
 VM which use RBD kernel client. 
 
 
 In kernel logs, I have :
 
 Aug  5 02:51:22 murmillia kernel: [  289.205652] kernel BUG at 
 net/ceph/osd_client.c:2103!
 Aug  5 02:51:22 murmillia kernel: [  289.205725] invalid opcode:  [#1] 
 SMP 
 Aug  5 02:51:22 murmillia kernel: [  289.205908] Modules linked in: cbc rbd 
 libceph libcrc32c xen_gntdev ip6table_mangle ip6t_REJECT ip6table_filter 
 ip6_tables xt_DSCP iptable_mangle xt_LOG xt_physdev ipt_REJECT xt_tcpudp 
 iptable_filter ip_tables x_tables bridge loop coretemp ghash_clmulni_intel 
 aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt 
 iTCO_vendor_support gpio_ich microcode serio_raw sb_edac edac_core evdev 
 lpc_ich i2c_i801 mfd_core wmi ac ioatdma shpchp button dm_mod hid_generic 
 usbhid hid sg sd_mod crc_t10dif crc32c_intel isci megaraid_sas libsas ahci 
 libahci ehci_pci ehci_hcd libata scsi_transport_sas igb scsi_mod i2c_algo_bit 
 ixgbe usbcore i2c_core dca usb_common ptp pps_core mdio
 Aug  5 02:51:22 murmillia kernel: [  289.210499] CPU: 2 PID: 5326 Comm: 
 blkback.3.xvda Not tainted 3.10-dae-dom0 #1
 Aug  5 02:51:22 murmillia kernel: [  289.210617] Hardware name: Supermicro 
 X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
 Aug  5 02:51:22 murmillia kernel: [  289.210738] task: 880037d01040 ti: 
 88003803a000 task.ti: 88003803a000
 Aug  5 02:51:22 murmillia kernel: [  289.210858] RIP: 
 e030:[a02d21d0]  [a02d21d0] 
 ceph_osdc_build_request+0x2bb/0x3c6 [libceph]
 Aug  5 02:51:22 murmillia kernel: [  289.211062] RSP: e02b:88003803b9f8  
 EFLAGS: 00010212
 Aug  5 02:51:22 murmillia kernel: [  289.211154] RAX: 880033a181c0 RBX: 
 880033a182ec RCX: 
 Aug  5 02:51:22 murmillia kernel: [  289.211251] RDX: 880033a182af RSI: 
 8050 RDI: 880030d34888
 Aug  5 02:51:22 murmillia kernel: [  289.211347] RBP: 2000 R08: 
 88003803ba58 R09: 
 Aug  5 02:51:22 murmillia kernel: [  289.211444] R10:  R11: 
  R12: 880033ba3500
 Aug  5 02:51:22 murmillia kernel: [  289.211541] R13: 0001 R14: 
 88003847aa78 R15: 88003847ab58
 Aug  5 02:51:22 murmillia kernel: [  289.211644] FS:  7f775da8c700() 
 GS:88003f84() knlGS:
 Aug  5 02:51:22 murmillia kernel: [  289.211765] CS:  e033 DS:  ES:  
 CR0: 80050033
 Aug  5 02:51:22 murmillia kernel: [  289.211858] CR2: 7fa21ee2c000 CR3: 
 2be14000 CR4: 00042660
 Aug  5 02:51:22 murmillia kernel: [  289.211956] DR0:  DR1: 
  DR2: 
 Aug  5 02:51:22 murmillia kernel: [  289.212052] DR3:  DR6: 
 0ff0 DR7: 0400
 Aug  5 02:51:22 murmillia kernel: [  289.212148] Stack:
 Aug  5 02:51:22 murmillia kernel: [  289.212232]  2000 
 00243847aa78  880039949b40
 Aug  5 02:51:22 murmillia kernel: [  289.212577]  2201 
 880033811d98 88003803ba80 88003847aa78
 Aug  5 02:51:22 murmillia kernel: [  289.212921]  880030f24380 
 880002a38400 2000 a029584c
 Aug  5 02:51:22 murmillia kernel: [  289.213264] Call Trace:
 Aug  5 02:51:22 murmillia kernel: [  289.213358]  [a029584c] ? 
 rbd_osd_req_format_write+0x71/0x7c [rbd]
 Aug  5 02:51:22 murmillia kernel: [  289.213459]  [a0296f05] ? 
 rbd_img_request_fill+0x695/0x736 [rbd]
 Aug  5 02:51:22 murmillia kernel: [  289.213562]  [810c96a7] ? 
 arch_local_irq_restore+0x7/0x8
 Aug  5 02:51:22 murmillia kernel: [  289.213667]  [81357ff8] ? 
 down_read+0x9/0x19
 Aug  5 02:51:22 murmillia kernel: [  289.213763]  [a029828a] ? 
 rbd_request_fn+0x191/0x22e [rbd]
 Aug  5 02:51:22 murmillia kernel: [  289.213864]  [8117ac9e] ? 
 __blk_run_queue_uncond+0x1e/0x26
 Aug  5 02:51:22 murmillia kernel: [  289.213962]  [8117b7aa] ? 
 blk_flush_plug_list+0x1c1/0x1e4
 Aug  5 02:51:22 murmillia kernel: [  289.214059]  [8117baad] ? 
 blk_finish_plug+0xb/0x2a
 Aug  5 02:51:22 murmillia kernel: [  289.214157]  [81255c36] ? 
 dispatch_rw_block_io+0x33e/0x3f0
 Aug  5 02:51:22 murmillia kernel: [  289.214259]  [81054f4b] ? 
 find_busiest_group+0x28/0x1d4
 Aug  5 02:51:22 murmillia kernel: [  289.214357]  [810551b0] ? 
 load_balance+0xb9/0x5e1
 Aug  5 02:51:22 murmillia kernel: [  289.214454]  [8100122a] ? 
 xen_hypercall_xen_version+0xa/0x20
 Aug  5 02:51:22 murmillia kernel: [  289.214552]  [81255f40] ? 
 __do_block_io_op+0x258/0x390
 Aug  5 02:51:22 murmillia kernel: [  289.214649]  [810026fa

Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-04 Thread Olivier Bonvalet
Yes of course, thanks !

Le dimanche 04 août 2013 à 20:59 -0700, Sage Weil a écrit :
 Hi Olivier,
 
 This looks like http://tracker.ceph.com/issues/5760.  We should be able to 
 look at this more closely this week.  In the meantime, you might want to 
 go back to 3.9.x.  If we have a patch that addresses the bug, would you be 
 able to test it?
 
 Thanks!
 sage
 
 
 On Mon, 5 Aug 2013, Olivier Bonvalet wrote:
  Sorry, the dev list is probably a better place for that one.
  
   Le lundi 05 août 2013 à 03:07 +0200, Olivier Bonvalet a écrit :
   Hi,
   
   I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
   3.9.11 to Linux 3.10.5, and now I have kernel panic after launching some
   VM which use RBD kernel client. 
   
   
   In kernel logs, I have :
   
   Aug  5 02:51:22 murmillia kernel: [  289.205652] kernel BUG at 
   net/ceph/osd_client.c:2103!
   Aug  5 02:51:22 murmillia kernel: [  289.205725] invalid opcode:  
   [#1] SMP 
   Aug  5 02:51:22 murmillia kernel: [  289.205908] Modules linked in: cbc 
   rbd libceph libcrc32c xen_gntdev ip6table_mangle ip6t_REJECT 
   ip6table_filter ip6_tables xt_DSCP iptable_mangle xt_LOG xt_physdev 
   ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge loop 
   coretemp ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
   glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support gpio_ich 
   microcode serio_raw sb_edac edac_core evdev lpc_ich i2c_i801 mfd_core wmi 
   ac ioatdma shpchp button dm_mod hid_generic usbhid hid sg sd_mod 
   crc_t10dif crc32c_intel isci megaraid_sas libsas ahci libahci ehci_pci 
   ehci_hcd libata scsi_transport_sas igb scsi_mod i2c_algo_bit ixgbe 
   usbcore i2c_core dca usb_common ptp pps_core mdio
   Aug  5 02:51:22 murmillia kernel: [  289.210499] CPU: 2 PID: 5326 Comm: 
   blkback.3.xvda Not tainted 3.10-dae-dom0 #1
   Aug  5 02:51:22 murmillia kernel: [  289.210617] Hardware name: 
   Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
   Aug  5 02:51:22 murmillia kernel: [  289.210738] task: 880037d01040 
   ti: 88003803a000 task.ti: 88003803a000
   Aug  5 02:51:22 murmillia kernel: [  289.210858] RIP: 
   e030:[a02d21d0]  [a02d21d0] 
   ceph_osdc_build_request+0x2bb/0x3c6 [libceph]
   Aug  5 02:51:22 murmillia kernel: [  289.211062] RSP: 
   e02b:88003803b9f8  EFLAGS: 00010212
   Aug  5 02:51:22 murmillia kernel: [  289.211154] RAX: 880033a181c0 
   RBX: 880033a182ec RCX: 
   Aug  5 02:51:22 murmillia kernel: [  289.211251] RDX: 880033a182af 
   RSI: 8050 RDI: 880030d34888
   Aug  5 02:51:22 murmillia kernel: [  289.211347] RBP: 2000 
   R08: 88003803ba58 R09: 
   Aug  5 02:51:22 murmillia kernel: [  289.211444] R10:  
   R11:  R12: 880033ba3500
   Aug  5 02:51:22 murmillia kernel: [  289.211541] R13: 0001 
   R14: 88003847aa78 R15: 88003847ab58
   Aug  5 02:51:22 murmillia kernel: [  289.211644] FS:  
   7f775da8c700() GS:88003f84() knlGS:
   Aug  5 02:51:22 murmillia kernel: [  289.211765] CS:  e033 DS:  ES: 
    CR0: 80050033
   Aug  5 02:51:22 murmillia kernel: [  289.211858] CR2: 7fa21ee2c000 
   CR3: 2be14000 CR4: 00042660
   Aug  5 02:51:22 murmillia kernel: [  289.211956] DR0:  
   DR1:  DR2: 
   Aug  5 02:51:22 murmillia kernel: [  289.212052] DR3:  
   DR6: 0ff0 DR7: 0400
   Aug  5 02:51:22 murmillia kernel: [  289.212148] Stack:
   Aug  5 02:51:22 murmillia kernel: [  289.212232]  2000 
   00243847aa78  880039949b40
   Aug  5 02:51:22 murmillia kernel: [  289.212577]  2201 
   880033811d98 88003803ba80 88003847aa78
   Aug  5 02:51:22 murmillia kernel: [  289.212921]  880030f24380 
   880002a38400 2000 a029584c
   Aug  5 02:51:22 murmillia kernel: [  289.213264] Call Trace:
   Aug  5 02:51:22 murmillia kernel: [  289.213358]  [a029584c] ? 
   rbd_osd_req_format_write+0x71/0x7c [rbd]
   Aug  5 02:51:22 murmillia kernel: [  289.213459]  [a0296f05] ? 
   rbd_img_request_fill+0x695/0x736 [rbd]
   Aug  5 02:51:22 murmillia kernel: [  289.213562]  [810c96a7] ? 
   arch_local_irq_restore+0x7/0x8
   Aug  5 02:51:22 murmillia kernel: [  289.213667]  [81357ff8] ? 
   down_read+0x9/0x19
   Aug  5 02:51:22 murmillia kernel: [  289.213763]  [a029828a] ? 
   rbd_request_fn+0x191/0x22e [rbd]
   Aug  5 02:51:22 murmillia kernel: [  289.213864]  [8117ac9e] ? 
   __blk_run_queue_uncond+0x1e/0x26
   Aug  5 02:51:22 murmillia kernel: [  289.213962]  [8117b7aa] ? 
   blk_flush_plug_list+0x1c1/0x1e4
   Aug  5 02:51:22 murmillia kernel: [  289.214059]  [8117baad] ? 
   blk_finish_plug+0xb/0x2a
   Aug  5 02:51:22 murmillia kernel

Re: [ceph-users] VMs freez after slow requests

2013-06-03 Thread Olivier Bonvalet

Le lundi 03 juin 2013 à 08:04 -0700, Gregory Farnum a écrit :
 On Sunday, June 2, 2013, Dominik Mostowiec wrote:
 Hi,
 I'm trying to start a postgres cluster on VMs with a second disk
 mounted from ceph (rbd - kvm).
 I started some writes (pgbench initialisation) on 8 VMs and the
 VMs froze.
 Ceph reports slow requests on 1 osd. I restarted this osd to
 remove the slow requests, and the VMs hung permanently.
 Is this a normal situation after cluster problems?
 
 
 Definitely not. Is your cluster reporting as healthy (what's ceph -s
 say)? Can you get anything off your hung VMs (like dmesg output)?
 -Greg
 
 
 -- 
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi,

I also see that with Xen and the kernel RBD client, when the ceph cluster
was full : in fact after some errors the block device switches to
read-only mode, and I didn't find any way to fix that (mount -o
remount,rw doesn't work). I had to reboot all the VMs.

But since I don't have to unmap/remap the RBD device, I don't think it's a
Ceph/RBD problem. Probably a Xen or Linux feature.

Olivier
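
For completeness, a short sketch of the checks Greg asks for, plus the usual
first look when slow requests appear; nothing here is specific to this thread,
these are just the generic commands of that era :

# overall cluster state and any slow-request warnings
ceph -s
ceph health detail

# watch the cluster log live for "slow request" messages and the OSDs involved
ceph -w

# inside a hung VM, look for hung-task or I/O error messages
dmesg | tail -n 50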





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Hi,

sorry for the late answer : trying to fix that, I tried to delete the
image (rbd rm XXX); the rbd rm completed without errors, but rbd ls
still displays this image.

What should I do ?


Here the files for the PG 3.6b :

# find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 
'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 19 mai   22:52 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 19 mai   23:00 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 19 mai   22:59 
/var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 
'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 
/var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3

# find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 'rb.0.15c26.238e1f29*' 
-print0 | xargs -r -0 ls -l
-rw-r--r-- 1 root root 4194304 25 mars  19:18 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
-rw-r--r-- 1 root root 4194304 25 mars  19:33 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
-rw-r--r-- 1 root root 4194304 25 mars  19:34 
/var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3


As you can see, the OSDs don't contain any other data in those PGs for this RBD 
image. Should I remove them through rados ?


In fact I remember that some of those files were truncated (size 0), so I 
manually copied data from osd-5. It was probably a mistake to do that.


Thanks,
Olivier

Le jeudi 23 mai 2013 à 15:53 -0700, Samuel Just a écrit :
 Can you send the filenames in the pg directories for those 4 pgs?
 -Sam
 
 On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  No :
  pg 3.7c is active+clean+inconsistent, acting [24,13,39]
  pg 3.6b is active+clean+inconsistent, acting [28,23,5]
  pg 3.d is active+clean+inconsistent, acting [29,4,11]
  pg 3.1 is active+clean+inconsistent, acting [28,19,5]
 
  But I suppose that all PG *was* having the osd.25 as primary (on the
  same host), which is (disabled) buggy OSD.
 
  Question : 12d7 in object path is the snapshot id, right ? If it's the
  case, I haven't got any snapshot with this id for the
  rb.0.15c26.238e1f29 image.
 
  So, which files should I remove ?
 
  Thanks for your help.
 
 
  Le jeudi 23 mai 2013 à 15:17 -0700, Samuel Just a écrit :
  Do all of the affected PGs share osd.28 as the primary?  I think the
  only recovery is probably to manually remove the orphaned clones.
  -Sam
 
  On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   Not yet. I keep it for now.
  
   Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit :
   rb.0.15c26.238e1f29
  
   Has that rbd volume been removed?
   -Sam
  
   On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet 
   ceph.l...@daevel.fr wrote:
0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail.
   
   
Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit :
What version are you running?
-Sam
   
On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
ceph.l...@daevel.fr wrote:
 Is it enough ?

 # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found 
 clone without head'
 2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub 
 ok
 2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub 
 ok
 2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub 
 ok
 2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub 
 ok
 2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub 
 ok
 2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
 ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone 
 without head
 2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
 261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone 
 without head
 2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
 b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone 
 without head
 2013-05-22 15:57:51.667085 7f707dd64700  0 log

Re: [ceph-users] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Note that I still have scrub errors, but rados doesn't see those
objects :

root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
root! brontes:~# 



Le vendredi 31 mai 2013 à 15:36 +0200, Olivier Bonvalet a écrit :
 Hi,
 
 sorry for the late answer : trying to fix that, I tried to delete the
 image (rbd rm XXX); the rbd rm completed without errors, but rbd ls
 still displays this image.
 
 What should I do ?
 
 
 Here the files for the PG 3.6b :
 
 # find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 19 mai   22:52 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 19 mai   23:00 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 19 mai   22:59 
 /var/lib/ceph/osd/ceph-28/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 # find /var/lib/ceph/osd/ceph-23/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 25 mars  19:18 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:33 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:34 
 /var/lib/ceph/osd/ceph-23/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 # find /var/lib/ceph/osd/ceph-5/current/3.6b_head/ -name 
 'rb.0.15c26.238e1f29*' -print0 | xargs -r -0 ls -l
 -rw-r--r-- 1 root root 4194304 25 mars  19:18 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_6/DIR_1/DIR_C/rb.0.15c26.238e1f29.9221__12d7_ADE3C16B__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:33 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_0/DIR_C/rb.0.15c26.238e1f29.3671__12d7_261CC0EB__3
 -rw-r--r-- 1 root root 4194304 25 mars  19:34 
 /var/lib/ceph/osd/ceph-5/current/3.6b_head/DIR_B/DIR_E/DIR_A/DIR_E/rb.0.15c26.238e1f29.86a2__12d7_B10DEAEB__3
 
 
 As you can see, the OSDs don't contain any other data in those PGs for this RBD 
 image. Should I remove them through rados ?
 
 
 In fact I remember that some of those files were truncated (size 0), so I 
 manually copied data from osd-5. It was probably a mistake to do that.
 
 
 Thanks,
 Olivier
 
 Le jeudi 23 mai 2013 à 15:53 -0700, Samuel Just a écrit :
  Can you send the filenames in the pg directories for those 4 pgs?
  -Sam
  
  On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   No :
   pg 3.7c is active+clean+inconsistent, acting [24,13,39]
   pg 3.6b is active+clean+inconsistent, acting [28,23,5]
   pg 3.d is active+clean+inconsistent, acting [29,4,11]
   pg 3.1 is active+clean+inconsistent, acting [28,19,5]
  
   But I suppose that all PG *was* having the osd.25 as primary (on the
   same host), which is (disabled) buggy OSD.
  
   Question : 12d7 in object path is the snapshot id, right ? If it's the
   case, I haven't got any snapshot with this id for the
   rb.0.15c26.238e1f29 image.
  
   So, which files should I remove ?
  
   Thanks for your help.
  
  
   Le jeudi 23 mai 2013 à 15:17 -0700, Samuel Just a écrit :
   Do all of the affected PGs share osd.28 as the primary?  I think the
   only recovery is probably to manually remove the orphaned clones.
   -Sam
  
   On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr 
   wrote:
Not yet. I keep it for now.
   
Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit :
rb.0.15c26.238e1f29
   
Has that rbd volume been removed?
-Sam
   
On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet 
ceph.l...@daevel.fr wrote:
 0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail.


 Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit :
 What version are you running?
 -Sam

 On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
 ceph.l...@daevel.fr wrote:
  Is it enough ?
 
  # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found 
  clone without head'
  2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 
  scrub ok
  2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 
  scrub ok
  2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 
  scrub ok
  2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 
  scrub ok
  2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 
  scrub ok
  2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 
  3.6b ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found 
  clone without head
  2013-05

Re: [ceph-users] [solved] scrub error: found clone without head

2013-05-31 Thread Olivier Bonvalet
Ok, so :
- after a second rbd rm XXX, the image was gone
- and rados ls doesn't see any object from that image
- so I tried to move those files

= scrub is now ok !

So for me it's fixed. Thanks
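
For readers in the same situation, a heavily hedged sketch of the manual
cleanup Olivier describes : move the orphaned clone files out of the PG
directory on every replica, with the OSD stopped, then re-scrub. The OSD id,
PG id and object prefix below are the ones from this thread, not general
values, and the move is done on each OSD holding the PG (here 28, 23 and 5) :

# stop the OSD that holds this copy of the PG before touching its files
service ceph stop osd.28

# move the orphaned clone files aside (do NOT delete them right away)
mkdir -p /root/orphaned-clones
find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ \
     -name 'rb.0.15c26.238e1f29*' -exec mv -v {} /root/orphaned-clones/ \;

# restart the OSD and ask for a fresh scrub of the PG
service ceph start osd.28
ceph pg scrub 3.6b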

Le vendredi 31 mai 2013 à 16:34 +0200, Olivier Bonvalet a écrit :
 Note that I still have scrub errors, but rados doesn't see those
 objects :
 
 root! brontes:~# rados -p hdd3copies ls | grep '^rb.0.15c26.238e1f29'
 root! brontes:~# 
 
 
 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Edge effect with multiple RBD kernel clients per host ?

2013-05-25 Thread Olivier Bonvalet
Hi,

I seem to have a bad edge effect in my setup, don't know if it's a RBD
problem or a Xen problem.

So, I have one Ceph cluster, in which I set up 2 different storage
pools : one on SSD and one on SAS. With appropriate CRUSH rules, those
pools are completely separated; only the MONs are common.

Then, on a Xen host A, I run VMSSD and VMSAS. If I launch a big
rebalance on the SSD pool, then VMSSD *and* VMSAS will slow
down (a lot of IOWait). But if I move VMSAS to a different Xen
host (B), then VMSSD will still be slow, but VMSAS will be fast
again.

The first thing I checked is the network of the Xen host A, but I didn't
find any problem.

So, is there a queue shared by all RBD kernel clients running on the same
host ? Or something else which could explain this edge effect ?


Olivier

PS : one more detail : I have about 60 RBD images mapped on the Xen host A;
I don't know if that could be the key to the problem.
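
A small sketch for checking how many RBD images a host has mapped and what
each kernel block queue looks like. The sysfs paths are the generic
block-layer ones, nothing RBD-specific, and whether tuning them helps with
this edge effect is an open question; this only shows where to look :

# list the RBD images currently mapped on this host
rbd showmapped

# peek at each mapped device's block-layer queue depth
for q in /sys/block/rbd*/queue/nr_requests ; do
    echo "$q = $(cat $q)"
done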

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
Not yet. I keep it for now.

Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit :
 rb.0.15c26.238e1f29
 
 Has that rbd volume been removed?
 -Sam
 
 On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail.
 
 
  Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit :
  What version are you running?
  -Sam
 
  On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   Is it enough ?
  
   # tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
   without head'
   2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
   2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
   2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
   2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
   2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
   2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
   ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without 
   head
   2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
   261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without 
   head
   2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
   b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without 
   head
   2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
   errors
   2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
   2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
   2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 
   cs=73 l=0).fault with nothing to send, going to standby
   2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 74 vs existing 73 state standby
   --
   2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
   2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
   2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
   2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 
   l=0).fault with nothing to send, going to standby
   2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
   2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
   b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without 
   head
   2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
   bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without 
   head
   2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
   8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without 
   head
   2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 errors
   2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142  
   192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 76 vs existing 75 state standby
   2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142  
   192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 
   l=0).accept connect_seq 40 vs existing 39 state standby
   2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
   2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
  
  
   Note : I have 8 scrub errors like that, on 4 impacted PG, and all 
   impacted objects are about the same RBD image (rb.0.15c26.238e1f29).
  
  
  
   Le mercredi 22 mai 2013 à 11:01 -0700, Samuel Just a écrit :
   Can you post your ceph.log with the period including all of these 
   errors?
   -Sam
  
   On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
   maha...@bspu.unibel.by wrote:
Olivier Bonvalet пишет:
   
Le lundi 20 mai 2013 à 00:06 +0200, Olivier Bonvalet a écrit :
Le mardi 07 mai 2013 à 15:51 +0300, Dzianis Kahanovich a écrit :
I have 4 scrub errors (3 PGs - found clone without head), on one 
OSD. Not
repairing. How to repair it exclude re-creating of OSD?
   
Now it easy to clean+create OSD, but in theory - in case there 
are multiple
OSDs - it may cause data lost.
   
I have same problem : 8 objects (4 PG) with error found clone 
without
head. How can I fix that ?
since pg repair doesn't handle that kind of errors, is there a way 
to
manually fix that ? (it's a production cluster)
   
Trying to fix it manually, I caused assertions in the trimming process (the
OSD died), and many other troubles. So, if you want to keep the cluster
running, wait for the developers' answer. IMHO.
   
About manual repair attempt: see issue #4937. Also

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Olivier Bonvalet
No : 
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]

But I suppose that all PG *was* having the osd.25 as primary (on the
same host), which is (disabled) buggy OSD.

Question : 12d7 in object path is the snapshot id, right ? If it's the
case, I haven't got any snapshot with this id for the
rb.0.15c26.238e1f29 image.

So, which files should I remove ?

Thanks for your help.


Le jeudi 23 mai 2013 à 15:17 -0700, Samuel Just a écrit :
 Do all of the affected PGs share osd.28 as the primary?  I think the
 only recovery is probably to manually remove the orphaned clones.
 -Sam
 
 On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  Not yet. I keep it for now.
 
  Le mercredi 22 mai 2013 à 15:50 -0700, Samuel Just a écrit :
  rb.0.15c26.238e1f29
 
  Has that rbd volume been removed?
  -Sam
 
  On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   0.61-11-g3b94f03 (0.61-1.1), but the bug occured with bobtail.
  
  
   Le mercredi 22 mai 2013 à 12:00 -0700, Samuel Just a écrit :
   What version are you running?
   -Sam
  
   On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet 
   ceph.l...@daevel.fr wrote:
Is it enough ?
   
# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone 
without head'
2013-05-22 15:43:09.308352 7f707dd64700  0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700  0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700  0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700  0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700  0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700  0 log [ERR] : scrub 3.6b 
ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without 
head
2013-05-22 15:55:07.230114 7f707d563700  0 log [ERR] : scrub 3.6b 
261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without 
head
2013-05-22 15:56:56.456242 7f707d563700  0 log [ERR] : scrub 3.6b 
b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without 
head
2013-05-22 15:57:51.667085 7f707dd64700  0 log [ERR] : 3.6b scrub 3 
errors
2013-05-22 15:57:55.241224 7f707dd64700  0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700  0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 
pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
2013-05-22 16:01:45.542579 7f7022770700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 
l=0).accept connect_seq 74 vs existing 73 state standby
--
2013-05-22 16:29:49.544310 7f707dd64700  0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700  0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700  0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 
cs=75 l=0).fault with nothing to send, going to standby
2013-05-22 16:35:19.519019 7f707d563700  0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700  0 log [ERR] : scrub 3.1 
b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without 
head
2013-05-22 16:40:04.995256 7f707cd62700  0 log [ERR] : scrub 3.1 
bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without 
head
2013-05-22 16:41:07.008717 7f707d563700  0 log [ERR] : scrub 3.1 
8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without 
head
2013-05-22 16:41:42.460280 7f707c561700  0 log [ERR] : 3.1 scrub 3 
errors
2013-05-22 16:46:12.385678 7f7077735700  0 -- 192.168.42.3:6803/12142 
 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 
cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
2013-05-22 16:58:36.079010 7f707661a700  0 -- 192.168.42.3:6803/12142 
 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 
l=0).accept connect_seq 40 vs existing 39 state standby
2013-05-22 16:58:36.798038 7f707d563700  0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700  0 log [INF] : 9.526 scrub ok
   
   
Note : I have 8 scrub errors like that, on 4 impacted PG, and all 
impacted objects are about the same RBD image (rb.0.15c26.238e1f29).
   
   
   
Le mercredi 22 mai 2013 à 11:01 -0700, Samuel Just a écrit :
Can you post your ceph.log with the period including all of these 
errors?
-Sam
   
On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich
maha...@bspu.unibel.by wrote:
 Olivier

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Olivier Bonvalet

Le lundi 20 mai 2013 à 00:06 +0200, Olivier Bonvalet a écrit :
 Le mardi 07 mai 2013 à 15:51 +0300, Dzianis Kahanovich a écrit :
  I have 4 scrub errors (3 PGs - found clone without head), on one OSD. Not
  repairing. How to repair it exclude re-creating of OSD?
  
  Now it easy to clean+create OSD, but in theory - in case there are 
  multiple
  OSDs - it may cause data lost.
  
  -- 
  WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 
 Hi,
 
 I have same problem : 8 objects (4 PG) with error found clone without
 head. How can I fix that ?
 
 thanks,
 Olivier
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi,

since pg repair doesn't handle that kind of errors, is there a way to
manually fix that ? (it's a production cluster)

thanks in advance,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-20 Thread Olivier Bonvalet
Great, thanks. I will follow this issue, and add information if needed.

Le lundi 20 mai 2013 à 17:22 +0300, Dzianis Kahanovich a écrit :
 http://tracker.ceph.com/issues/4937
 
 For me it progressed up to ceph reinstall with repair data from backup (I help
 ceph die, but it was IMHO self-provocation for force reinstall). Now (at least
 to my summer outdoors) I keep v0.62 (3 nodes) with every pool size=3 
 min_size=2
 (was - size=2 min_size=1).
 
 But try to do nothing first and try to install latest version. And keep your
 vote to issue #4937 to force developers.
 
 Olivier Bonvalet пишет:
  Le mardi 07 mai 2013 à 15:51 +0300, Dzianis Kahanovich a écrit :
  I have 4 scrub errors (3 PGs - found clone without head), on one OSD. Not
  repairing. How to repair it exclude re-creating of OSD?
 
  Now it easy to clean+create OSD, but in theory - in case there are 
  multiple
  OSDs - it may cause data lost.
 
  -- 
  WBR, Dzianis Kahanovich AKA Denis Kaganovich, 
  http://mahatma.bspu.unibel.by/
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  
  
  Hi,
  
  I have same problem : 8 objects (4 PG) with error found clone without
  head. How can I fix that ?
  
  thanks,
  Olivier
  
  
  
 
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-19 Thread Olivier Bonvalet
Le mardi 07 mai 2013 à 15:51 +0300, Dzianis Kahanovich a écrit :
 I have 4 scrub errors (3 PGs - found clone without head), on one OSD. Not
 repairing. How to repair it exclude re-creating of OSD?
 
 Now it easy to clean+create OSD, but in theory - in case there are multiple
 OSDs - it may cause data lost.
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


Hi,

I have same problem : 8 objects (4 PG) with error found clone without
head. How can I fix that ?

thanks,
Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG down incomplete

2013-05-17 Thread Olivier Bonvalet
 incomplete, last
acting [19,30]
pg 8.71d is stuck unclean since forever, current state incomplete, last
acting [24,19]
pg 8.3fa is stuck unclean since forever, current state incomplete, last
acting [19,31]
pg 8.3e0 is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.56c is stuck unclean since forever, current state incomplete, last
acting [19,28]
pg 8.19f is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.792 is stuck unclean since forever, current state incomplete, last
acting [19,28]
pg 4.0 is stuck unclean since forever, current state incomplete, last
acting [28,19]
pg 8.78a is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.23e is stuck unclean since forever, current state incomplete, last
acting [32,13]
pg 8.2ff is stuck unclean since forever, current state incomplete, last
acting [6,19]
pg 8.5e2 is stuck unclean since forever, current state incomplete, last
acting [0,19]
pg 8.528 is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.20f is stuck unclean since forever, current state incomplete, last
acting [31,19]
pg 8.372 is stuck unclean since forever, current state incomplete, last
acting [19,24]
pg 8.792 is incomplete, acting [19,28]
pg 8.78a is incomplete, acting [31,19]
pg 8.71d is incomplete, acting [24,19]
pg 8.5e2 is incomplete, acting [0,19]
pg 8.56c is incomplete, acting [19,28]
pg 8.528 is incomplete, acting [31,19]
pg 8.3fa is incomplete, acting [19,31]
pg 8.3e0 is incomplete, acting [31,19]
pg 8.372 is incomplete, acting [19,24]
pg 8.2ff is incomplete, acting [6,19]
pg 8.23e is incomplete, acting [32,13]
pg 8.20f is incomplete, acting [31,19]
pg 8.19f is incomplete, acting [31,19]
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 4.5c is incomplete, acting [19,30]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 4.0 is incomplete, acting [28,19]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]
osd.10 is near full at 85%
19 scrub errors
noout flag(s) set
mon.d (rank 4) addr 10.0.0.6:6789/0 is down (out of quorum)


Pools 4 and 8 have only 2 replicas, and pool 3 has 3 replicas but
inconsistent data.

Thanks in advance.
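
For completeness, a sketch of the usual next step when PGs stay incomplete :
query one of them and list everything stuck. The PG id 8.71d is just one
taken from the output above; substitute your own :

# detailed state, peering history and blocking OSDs for one incomplete PG
ceph pg 8.71d query

# all PGs stuck in inactive / unclean states
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# which OSDs a PG maps to according to CRUSH
ceph pg map 8.71d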

Le vendredi 17 mai 2013 à 00:14 -0700, John Wilkins a écrit :
 If you can follow the documentation here:
 http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/  and
 http://ceph.com/docs/master/rados/troubleshooting/  to provide some
 additional information, we may be better able to help you.
 
 For example, ceph osd tree would help us understand the status of
 your cluster a bit better.
 
 On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  On Wednesday, 15 May 2013 at 00:15 +0200, Olivier Bonvalet wrote:
  Hi,
 
  I have some PGs in a down and/or incomplete state on my cluster, because I
  lost 2 OSDs and a pool had only 2 replicas. So of course that data is lost.
 
  My problem now is that I can't get back to a HEALTH_OK status: if I try
  to remove, read or overwrite the corresponding RBD images, nearly all OSDs
  hang (well... they don't do anything and requests pile up in a growing
  queue, until production grinds to a halt).
 
  So, what can I do to remove those corrupt images?
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
  Bump. Can nobody help me with this problem?
 
  Thanks,
 
  Olivier
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 -- 
 John Wilkins
 Senior Technical Writer
 Intank
 john.wilk...@inktank.com
 (415) 425-9599
 http://inktank.com
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG down incomplete

2013-05-17 Thread Olivier Bonvalet
Yes, I set the noout flag to avoid the automatic rebalancing of osd.25,
which crashes all the OSDs on that host (I have already tried several times).
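
For the record, the flag is just the standard toggle, nothing cluster-specific:

  ceph osd set noout     # keep down OSDs from being marked out, so nothing rebalances
  ceph osd unset noout   # restore normal behaviour once the host is healthy again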

On Friday, 17 May 2013 at 11:27 -0700, John Wilkins wrote:
 It looks like you have the noout flag set:
 
 noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
monmap e7: 5 mons at
 {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
 election epoch 2584, quorum 0,1,2,3 a,b,c,e
osdmap e82502: 50 osds: 48 up, 48 in
 
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
 
 If you have down OSDs that don't get marked out, that would certainly
 cause problems. Have you tried restarting the failed OSDs?
 
 What do the logs look like for osd.15 and osd.25?
 
 On Fri, May 17, 2013 at 1:31 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  Hi,
 
  thanks for your answer. In fact I have several different problems, which
  I tried to solve separately:
 
  1) I lost 2 OSDs, and some pools have only 2 replicas. So some data was
  lost.
  2) One monitor refuses the Cuttlefish upgrade, so I only have 4 of 5
  monitors running.
  3) I have 4 old inconsistent PGs that I can't repair.
 
 
  So the status :
 
 health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck
  inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors;
  noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
 monmap e7: 5 mons at
  {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
   election epoch 2584, quorum 0,1,2,3 a,b,c,e
 osdmap e82502: 50 osds: 48 up, 48 in
  pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean
  +scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean
  +scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail;
  137KB/s rd, 1852KB/s wr, 199op/s
 mdsmap e1: 0/0/1 up
 
 
 
  The tree :
 
  # id    weight  type name       up/down reweight
  -8      14.26   root SSDroot
  -27     8           datacenter SSDrbx2
  -26     8               room SSDs25
  -25     8                   net SSD188-165-12
  -24     8                       rack SSD25B09
  -23     8                           host lyll
  46      2                               osd.46  up      1
  47      2                               osd.47  up      1
  48      2                               osd.48  up      1
  49      2                               osd.49  up      1
  -10     4.26        datacenter SSDrbx3
  -12     2               room SSDs43
  -13     2                   net SSD178-33-122
  -16     2                       rack SSD43S01
  -17     2                           host kaino
  42      1                               osd.42  up      1
  43      1                               osd.43  up      1
  -22     2.26            room SSDs45
  -21     2.26                net SSD5-135-138
  -20     2.26                    rack SSD45F01
  -19     2.26                        host taman
  44      1.13                            osd.44  up      1
  45      1.13                            osd.45  up      1
  -9      2           datacenter SSDrbx4
  -11     2               room SSDs52
  -14     2                   net SSD176-31-226
  -15     2                       rack SSD52B09
  -18     2                           host dragan
  40      1                               osd.40  up      1
  41      1                               osd.41  up      1
  -1      33.43   root SASroot
  -100    15.9        datacenter SASrbx1
  -90     15.9            room SASs15
  -72     15.9                net SAS188-165-15
  -40     8                       rack SAS15B01
  -3      8                           host brontes
  0       1                               osd.0   up      1
  1       1                               osd.1   up      1
  2       1                               osd.2   up      1
  3       1                               osd.3   up      1
  4       1                               osd.4   up      1
  5       1                               osd.5   up      1
  6       1                               osd.6   up      1
  7       1                               osd.7   up

Re: [ceph-users] PG down incomplete

2013-05-16 Thread Olivier Bonvalet
On Wednesday, 15 May 2013 at 00:15 +0200, Olivier Bonvalet wrote:
 Hi,
 
 I have some PGs in a down and/or incomplete state on my cluster, because I
 lost 2 OSDs and a pool had only 2 replicas. So of course that data is lost.
 
 My problem now is that I can't get back to a HEALTH_OK status: if I try
 to remove, read or overwrite the corresponding RBD images, nearly all OSDs
 hang (well... they don't do anything and requests pile up in a growing
 queue, until production grinds to a halt).
 
 So, what can I do to remove those corrupt images?
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

Bump. Can nobody help me with this problem?

Thanks,

Olivier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG down incomplete

2013-05-14 Thread Olivier Bonvalet
Hi,

I have some PGs in a down and/or incomplete state on my cluster, because I
lost 2 OSDs and a pool had only 2 replicas. So of course that data is lost.

My problem now is that I can't get back to a HEALTH_OK status: if I try
to remove, read or overwrite the corresponding RBD images, nearly all OSDs
hang (well... they don't do anything and requests pile up in a growing
queue, until production grinds to a halt).

So, what can I do to remove those corrupt images?
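
For what it's worth, the usual last-resort commands for data that is known to be
gone are sketched below - they are destructive, <pgid> and <id> are placeholders,
and I only list them here for reference:

  ceph pg <pgid> query                        # see what the PG is waiting for ("recovery_state")
  ceph osd lost <id> --yes-i-really-mean-it   # declare a dead OSD as permanently lost
  ceph pg <pgid> mark_unfound_lost revert     # give up on the unfound objects of that PG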

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snap rm overload my cluster (during backups)

2013-04-21 Thread Olivier Bonvalet
I use Ceph 0.56.4; and to be fair, a lot of things are «doing badly» on
my cluster, so maybe I have a general OSD problem.
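
Side note: would throttling the trimming help here? A minimal sketch of what I
mean - it assumes a release where the osd_snap_trim_sleep option exists, and the
value is only an example:

  # add a small pause between snap-trim operations so client I/O keeps some headroom
  ceph tell osd.\* injectargs '--osd_snap_trim_sleep 0.1'
  # to make it persistent, put "osd snap trim sleep = 0.1" in the [osd] section of ceph.conf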


On Sunday, 21 April 2013 at 08:44 -0700, Gregory Farnum wrote:
 Which version of Ceph are you running right now and seeing this with
 (Sam reworked it a bit for Cuttlefish and it was in some of the dev
 releases)? Snapshot deletes are a little more expensive than we'd
 like, but I'm surprised they're doing this badly for you. :/
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 
 On Sun, Apr 21, 2013 at 2:16 AM, Olivier Bonvalet
 olivier.bonva...@daevel.fr wrote:
  Hi,
 
   I have a backup script which, every night:
   * creates a snapshot of each RBD image
   * then deletes all snapshots that are more than 15 days old
 
  The problem is that rbd snap rm XXX will overload my cluster for hours
  (6 hours today...).
 
   Here I see several problems:
   #1 rbd snap rm XXX is not blocking. The deletion is done in the background,
   and I know of no way to verify whether it has completed. So I add sleeps
   between the rm calls, but I have to estimate how long they will take.
  
   #2 rbd (snap) rm is sometimes very, very slow. I don't know whether it's
   because of XFS or not, but all my OSDs are at 100% I/O usage (reported by
   iostat).
 
 
 
   So:
   * is there a way to reduce the priority of snap rm, to avoid overloading
   the cluster?
   * is there a way to have a blocking snap rm which waits until it has
   completed?
   * is there a way to speed up snap rm?
  
  
   Note that my PG count is too low (200 PGs for 40 active
   OSDs; I'm trying to progressively migrate the data to a newer pool). Could
   that be the source of the problem?
 
  Thanks,
 
  Olivier
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub shutdown the OSD process

2013-04-20 Thread Olivier Bonvalet
On Wednesday, 17 April 2013 at 20:52 +0200, Olivier Bonvalet wrote:
 What I don't understand is why the OSD process crashes instead of
 marking that PG corrupted, and is that PG really corrupted, or is
 this just an OSD bug?

Once again, a bit more information: while digging into one of these
faulty PGs (3.d), I found this:

  -592 2013-04-20 08:31:56.838280 7f0f41d1b700  0 log [ERR] : 3.d osd.25 
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 
found  expected 12d7
  -591 2013-04-20 08:31:56.838284 7f0f41d1b700  0 log [ERR] : 3.d osd.4 
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 
found  expected 12d7
  -590 2013-04-20 08:31:56.838290 7f0f41d1b700  0 log [ERR] : 3.d osd.4: soid 
a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 size 4194304 != known size 0
  -589 2013-04-20 08:31:56.838292 7f0f41d1b700  0 log [ERR] : 3.d osd.11 
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 
found  expected 12d7
  -588 2013-04-20 08:31:56.838294 7f0f41d1b700  0 log [ERR] : 3.d osd.11: soid 
a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 size 4194304 != known size 0
  -587 2013-04-20 08:31:56.838395 7f0f41d1b700  0 log [ERR] : scrub 3.d 
a8620b0d/rb.0.15c26.238e1f29.4603/12d7//3 on disk size (0) does not 
match object info size (4194304)


I preferred to verify, and found this:

# md5sum 
/var/lib/ceph/osd/ceph-*/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.4603__12d7_A8620B0D__3
217ac2518dfe9e1502e5bfedb8be29b8  
/var/lib/ceph/osd/ceph-4/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.4603__12d7_A8620B0D__3
 (4MB)
217ac2518dfe9e1502e5bfedb8be29b8  
/var/lib/ceph/osd/ceph-11/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.4603__12d7_A8620B0D__3
 (4MB)
d41d8cd98f00b204e9800998ecf8427e  
/var/lib/ceph/osd/ceph-25/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.4603__12d7_A8620B0D__3
 (0B)


So this object is identical on OSDs 4 and 11, but empty on OSD 25.
Since osd.4 is the primary, this should not be a problem, so I tried a repair,
without any success:
ceph pg repair 3.d


Is there a way to force a rewrite of this replica?
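
In case it helps anyone searching the archives, the manual workaround I have seen
described for Filestore is roughly the following - a sketch only, the service
commands depend on your init system, and it removes the bad copy by hand, so use
with care:

  # stop the OSD that holds the empty replica (osd.25 in my case)
  service ceph stop osd.25
  ceph-osd -i 25 --flush-journal
  # move the zero-length object file (the one found above under 3.d_head) out of the way
  mv <path-to-empty-object-file> /root/bad-replica-backup
  # restart the OSD and re-run the repair so the PG copies the object from a good replica
  service ceph start osd.25
  ceph pg repair 3.d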

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub shutdown the OSD process

2013-04-17 Thread Olivier Bonvalet
On Tuesday, 16 April 2013 at 08:56 +0200, Olivier Bonvalet wrote:
 
 
 On Monday, 15 April 2013 at 10:57 -0700, Gregory Farnum wrote:
  On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet ceph.l...@daevel.fr 
  wrote:
   On Monday, 15 April 2013 at 10:16 -0700, Gregory Farnum wrote:
   Are you saying you saw this problem more than once, and so you
   completely wiped the OSD in question, then brought it back into the
   cluster, and now it's seeing this error again?
  
   Yes, it's exactly that.
  
  
   Are any other OSDs experiencing this issue?
  
    No, only this one has the problem.
  
  Did you run scrubs while this node was out of the cluster? If you
  wiped the data and this is recurring then this is apparently an issue
  with the cluster state, not just one node, and any other primary for
  the broken PG(s) should crash as well. Can you verify by taking this
  one down and then doing a full scrub?
  -Greg
  Software Engineer #42 @ http://inktank.com | http://ceph.com
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
  So, I marked this OSD as out to rebalance the data and be able to re-run a
  scrub. You are probably right, since I now have 3 other OSDs on the same
  host which are down.
  
  I still don't have any PG in an error state (the cluster is in HEALTH_WARN
  status), but something is going wrong.
  
  In syslog I have:
 
 Apr 16 02:07:08 alim ceph-osd: 2013-04-16 02:07:06.742915 7fe651131700 -1 
 filestore(/var/lib/ceph/osd/ceph-31) could not find 
 d452999/rb.0.2d76b.238e1f29.162d/f99//4 in index: (2) No such file or 
 directory
 Apr 16 02:07:08 alim ceph-osd: 2013-04-16 02:07:06.742999 7fe651131700 -1 
 filestore(/var/lib/ceph/osd/ceph-31) could not find 
 85242299/rb.0.1367.2ae8944a.1f9c/f98//4 in index: (2) No such file or 
 directory
 Apr 16 03:41:11 alim ceph-osd: 2013-04-16 03:41:11.758150 7fe64f12d700 -1 
 osd.31 48020 heartbeat_check: no reply from osd.5 since 2013-04-16 
 03:40:50.349130 (cutoff 2013-04-16 03:40:51.758149)
 Apr 16 04:27:40 alim ceph-osd: 2013-04-16 04:27:40.529492 7fe65e14b700 -1 
 osd.31 48416 heartbeat_check: no reply from osd.26 since 2013-04-16 
 04:27:20.203868 (cutoff 2013-04-16 04:27:20.529489)
 Apr 16 04:27:41 alim ceph-osd: 2013-04-16 04:27:41.529609 7fe65e14b700 -1 
 osd.31 48416 heartbeat_check: no reply from osd.26 since 2013-04-16 
 04:27:20.203868 (cutoff 2013-04-16 04:27:21.529605)
 Apr 16 05:01:43 alim ceph-osd: 2013-04-16 05:01:43.440257 7fe64f12d700 -1 
 osd.31 48602 heartbeat_check: no reply from osd.4 since 2013-04-16 
 05:01:22.529918 (cutoff 2013-04-16 05:01:23.440257)
 Apr 16 05:01:43 alim ceph-osd: 2013-04-16 05:01:43.523985 7fe65e14b700 -1 
 osd.31 48602 heartbeat_check: no reply from osd.4 since 2013-04-16 
 05:01:22.529918 (cutoff 2013-04-16 05:01:23.523984)
 Apr 16 05:55:28 alim ceph-osd: 2013-04-16 05:55:27.770327 7fe65e14b700 -1 
 osd.31 48847 heartbeat_check: no reply from osd.26 since 2013-04-16 
 05:55:07.392502 (cutoff 2013-04-16 05:55:07.770323)
 Apr 16 05:55:28 alim ceph-osd: 2013-04-16 05:55:28.497600 7fe64f12d700 -1 
 osd.31 48847 heartbeat_check: no reply from osd.26 since 2013-04-16 
 05:55:07.392502 (cutoff 2013-04-16 05:55:08.497598)
 Apr 16 06:04:13 alim ceph-osd: 2013-04-16 06:04:13.051839 7fe65012f700 -1 
 osd/ReplicatedPG.cc: In function 'virtual void 
 ReplicatedPG::_scrub(ScrubMap)' thread 7fe65012f700 time 2013-04-16 
 06:04:12.843977#012osd/ReplicatedPG.cc: 7188: FAILED assert(head != 
 hobject_t())#012#012 ceph version 0.56.4-4-gd89ab0e 
 (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)#012 1: 
 (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]#012 2: 
 (PG::scrub_compare_maps()+0xeb8) [0x696c18]#012 3: (PG::chunky_scrub()+0x2d9) 
 [0x6c37f9]#012 4: (PG::scrub()+0x145) [0x6c4e55]#012 5: 
 (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]#012 6: 
 (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]#012 7: 
 (ThreadPool::WorkThread::entry()+0x10) [0x817980]#012 8: (()+0x68ca) 
 [0x7fe6626558ca]#012 9: (clone()+0x6d) [0x7fe661184b6d]#012 NOTE: a copy of 
 the executable, or `objdump -rdS executable` is needed to interpret this.
 Apr 16 06:04:13 alim ceph-osd:  0 2013-04-16 06:04:13.051839 
 7fe65012f700 -1 osd/ReplicatedPG.cc: In function 'virtual void 
 ReplicatedPG::_scrub(ScrubMap)' thread 7fe65012f700 time 2013-04-16 
 06:04:12.843977#012osd/ReplicatedPG.cc: 7188: FAILED assert(head != 
 hobject_t())#012#012 ceph version 0.56.4-4-gd89ab0e 
 (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)#012 1: 
 (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]#012 2: 
 (PG::scrub_compare_maps()+0xeb8) [0x696c18]#012 3: (PG::chunky_scrub()+0x2d9) 
 [0x6c37f9]#012 4: (PG::scrub()+0x145) [0x6c4e55]#012 5: 
 (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]#012 6: 
 (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]#012 7: 
 (ThreadPool::WorkThread::entry()+0x10) [0x817980]#012 8: (()+0x68ca) 
 [0x7fe6626558ca]#012 9

Re: [ceph-users] Scrub shutdown the OSD process

2013-04-16 Thread Olivier Bonvalet


On Monday, 15 April 2013 at 10:57 -0700, Gregory Farnum wrote:
 On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  On Monday, 15 April 2013 at 10:16 -0700, Gregory Farnum wrote:
  Are you saying you saw this problem more than once, and so you
  completely wiped the OSD in question, then brought it back into the
  cluster, and now it's seeing this error again?
 
  Yes, it's exactly that.
 
 
  Are any other OSDs experiencing this issue?
 
  No, only this one has the problem.
 
 Did you run scrubs while this node was out of the cluster? If you
 wiped the data and this is recurring then this is apparently an issue
 with the cluster state, not just one node, and any other primary for
 the broken PG(s) should crash as well. Can you verify by taking this
 one down and then doing a full scrub?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

So, I marked this OSD as out to rebalance the data and be able to re-run a
scrub. You are probably right, since I now have 3 other OSDs on the same
host which are down.

I still don't have any PG in an error state (the cluster is in HEALTH_WARN
status), but something is going wrong.

In syslog I have:

Apr 16 02:07:08 alim ceph-osd: 2013-04-16 02:07:06.742915 7fe651131700 -1 
filestore(/var/lib/ceph/osd/ceph-31) could not find 
d452999/rb.0.2d76b.238e1f29.162d/f99//4 in index: (2) No such file or 
directory
Apr 16 02:07:08 alim ceph-osd: 2013-04-16 02:07:06.742999 7fe651131700 -1 
filestore(/var/lib/ceph/osd/ceph-31) could not find 
85242299/rb.0.1367.2ae8944a.1f9c/f98//4 in index: (2) No such file or 
directory
Apr 16 03:41:11 alim ceph-osd: 2013-04-16 03:41:11.758150 7fe64f12d700 -1 
osd.31 48020 heartbeat_check: no reply from osd.5 since 2013-04-16 
03:40:50.349130 (cutoff 2013-04-16 03:40:51.758149)
Apr 16 04:27:40 alim ceph-osd: 2013-04-16 04:27:40.529492 7fe65e14b700 -1 
osd.31 48416 heartbeat_check: no reply from osd.26 since 2013-04-16 
04:27:20.203868 (cutoff 2013-04-16 04:27:20.529489)
Apr 16 04:27:41 alim ceph-osd: 2013-04-16 04:27:41.529609 7fe65e14b700 -1 
osd.31 48416 heartbeat_check: no reply from osd.26 since 2013-04-16 
04:27:20.203868 (cutoff 2013-04-16 04:27:21.529605)
Apr 16 05:01:43 alim ceph-osd: 2013-04-16 05:01:43.440257 7fe64f12d700 -1 
osd.31 48602 heartbeat_check: no reply from osd.4 since 2013-04-16 
05:01:22.529918 (cutoff 2013-04-16 05:01:23.440257)
Apr 16 05:01:43 alim ceph-osd: 2013-04-16 05:01:43.523985 7fe65e14b700 -1 
osd.31 48602 heartbeat_check: no reply from osd.4 since 2013-04-16 
05:01:22.529918 (cutoff 2013-04-16 05:01:23.523984)
Apr 16 05:55:28 alim ceph-osd: 2013-04-16 05:55:27.770327 7fe65e14b700 -1 
osd.31 48847 heartbeat_check: no reply from osd.26 since 2013-04-16 
05:55:07.392502 (cutoff 2013-04-16 05:55:07.770323)
Apr 16 05:55:28 alim ceph-osd: 2013-04-16 05:55:28.497600 7fe64f12d700 -1 
osd.31 48847 heartbeat_check: no reply from osd.26 since 2013-04-16 
05:55:07.392502 (cutoff 2013-04-16 05:55:08.497598)
Apr 16 06:04:13 alim ceph-osd: 2013-04-16 06:04:13.051839 7fe65012f700 -1 
osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::_scrub(ScrubMap)' 
thread 7fe65012f700 time 2013-04-16 06:04:12.843977#012osd/ReplicatedPG.cc: 
7188: FAILED assert(head != hobject_t())#012#012 ceph version 0.56.4-4-gd89ab0e 
(d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)#012 1: 
(ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]#012 2: 
(PG::scrub_compare_maps()+0xeb8) [0x696c18]#012 3: (PG::chunky_scrub()+0x2d9) 
[0x6c37f9]#012 4: (PG::scrub()+0x145) [0x6c4e55]#012 5: 
(OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]#012 6: 
(ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]#012 7: 
(ThreadPool::WorkThread::entry()+0x10) [0x817980]#012 8: (()+0x68ca) 
[0x7fe6626558ca]#012 9: (clone()+0x6d) [0x7fe661184b6d]#012 NOTE: a copy of the 
executable, or `objdump -rdS executable` is needed to interpret this.
Apr 16 06:04:13 alim ceph-osd:  0 2013-04-16 06:04:13.051839 7fe65012f700 
-1 osd/ReplicatedPG.cc: In function 'virtual void 
ReplicatedPG::_scrub(ScrubMap)' thread 7fe65012f700 time 2013-04-16 
06:04:12.843977#012osd/ReplicatedPG.cc: 7188: FAILED assert(head != 
hobject_t())#012#012 ceph version 0.56.4-4-gd89ab0e 
(d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)#012 1: 
(ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]#012 2: 
(PG::scrub_compare_maps()+0xeb8) [0x696c18]#012 3: (PG::chunky_scrub()+0x2d9) 
[0x6c37f9]#012 4: (PG::scrub()+0x145) [0x6c4e55]#012 5: 
(OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]#012 6: 
(ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]#012 7: 
(ThreadPool::WorkThread::entry()+0x10) [0x817980]#012 8: (()+0x68ca) 
[0x7fe6626558ca]#012 9: (clone()+0x6d) [0x7fe661184b6d]#012 NOTE: a copy of the 
executable, or `objdump -rdS executable` is needed to interpret this.
Apr 16 06:04:13 alim ceph-osd: 2013

Re: [ceph-users] Performance problems

2013-04-16 Thread Olivier Bonvalet


On Friday, 12 April 2013 at 19:45 +0200, Olivier Bonvalet wrote:
 On Friday, 12 April 2013 at 10:04 -0500, Mark Nelson wrote:
  On 04/11/2013 07:25 PM, Ziemowit Pierzycki wrote:
   No, I'm not using RDMA in this configuration since this will eventually
   get deployed to production with 10G ethernet (yes RDMA is faster).  I
   would prefer Ceph because it has a storage drive built into OpenNebula
   which my company is using and as you mentioned individual drives.
  
   I'm not sure what the problem is but it appears to me that one of the
   hosts may be holding up the rest... with Ceph if the performance of one
   of the hosts is much faster than others could this potentially slow down
   the cluster to this level?
  
  Definitely!  Even 1 slow OSD can cause dramatic slow downs.  This is 
  because we (by default) try to distribute data evenly to every OSD in 
  the cluster.  If even 1 OSD is really slow, it will accumulate more and 
  more outstanding operations while all of the other OSDs complete their 
  requests.  What will happen is that eventually you will have all of your 
  outstanding operations waiting on that slow OSD, and all of the other 
  OSDs will sit idle waiting for new requests.
  
  If you know that some OSDs are permanently slower than others, you can 
  re-weight them so that they receive fewer requests than the others which 
  can mitigate this, but that isn't always an optimal solution.  Some 
  times a slow OSD can be a sign of other hardware problems too.
  
  Mark
  
 
  and is the response time of each OSD logged somewhere, so that I can
  identify the weak link?
 
 

I think I found the answer, with the admin socket:
ceph --admin-daemon /var/run/ceph/ceph-osd.14.asok perf dump

For example in the output I can see op_latency, op_w_latency,
op_r_latency, op_rw_latency, etc. Great.
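
A small follow-up for the archives: to compare the OSDs of one host, a loop like
this is what I have in mind - a sketch only; it assumes jq is installed and that
the latency counters are exposed as avgcount/sum pairs, which may differ between
releases:

  for sock in /var/run/ceph/ceph-osd.*.asok; do
    # average write latency = sum of latencies / number of ops (+1 avoids division by zero)
    lat=$(ceph --admin-daemon "$sock" perf dump \
          | jq '.osd.op_w_latency.sum / (.osd.op_w_latency.avgcount + 1)')
    echo "$sock $lat"
  done | sort -k2 -n   # the slowest OSDs end up at the bottom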


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD snapshots are not «readable», because of LVM ?

2013-04-16 Thread Olivier Bonvalet
OK, thanks for the advice. But since the kernel client is not compatible
with format 2 images, I can't use that approach.

I will look at whether LVM can work with a read-only PV. Thanks again.
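
For the archives, the clone workflow Wolfgang describes would look roughly like
this - a sketch with my pool/snapshot names from below and a placeholder child
name, and it assumes a librbd client (QEMU, for instance), since the kernel
client can't map format 2 images:

  rbd snap protect hdd3copies/jason@20130415-065314                    # a clone needs a protected snapshot
  rbd clone hdd3copies/jason@20130415-065314 hdd3copies/jason-restore  # writeable copy-on-write child
  # the child can then be attached via librbd and its VG activated with vgchange -ay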

On Tuesday, 16 April 2013 at 06:38 +0200, Wolfgang Hennerbichler wrote:
 I am not entirely sure about LVM, but snapshots are read-only by nature.
 Maybe LVM expects a writeable PV.
 Use format 2 images and create snapshot children, and all should be good.
 
 On 04/15/2013 04:48 PM, Olivier Bonvalet wrote:
  Hi,
  
   I'm trying to map an RBD snapshot, which contains an LVM PV.
  
  I can do the «map» : 
  rbd map hdd3copies/jason@20130415-065314 --id alg
  
  Then pvscan works :
  pvscan | grep rbd
PV /dev/rbd58   VG vg-jason   lvm2 [19,94 GiB / 1,44 GiB free]
  
   But activating the LVs doesn't work:
  # vgchange -ay vg-jason
device-mapper: reload ioctl failed: Argument invalide
Internal error: Maps lock 13746176  unlock 13889536
device-mapper: reload ioctl failed: Argument invalide
device-mapper: reload ioctl failed: Argument invalide
device-mapper: reload ioctl failed: Argument invalide
device-mapper: reload ioctl failed: Argument invalide
device-mapper: reload ioctl failed: Argument invalide
device-mapper: reload ioctl failed: Argument invalide
7 logical volume(s) in volume group vg-jason now active
  
  # blockdev --getsize64 /dev/mapper/vg--jason-*
  0
  0
  0
  0
  0
  0
  0
  
   Is this a problem with LVM, or should I not be using snapshots like that?
  
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
 
 
 -- 
 DI (FH) Wolfgang Hennerbichler
 Software Development
 Unit Advanced Computing Technologies
 RISC Software GmbH
 A company of the Johannes Kepler University Linz
 
 IT-Center
 Softwarepark 35
 4232 Hagenberg
 Austria
 
 Phone: +43 7236 3343 245
 Fax: +43 7236 3343 250
 wolfgang.hennerbich...@risc-software.at
 http://www.risc-software.at
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Olivier Bonvalet
Hi,

I have an OSD process which is regularly shut down by scrubbing, if I
understand this trace correctly:

 0 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) 
**
 in thread 7f5a8e3cc700

 ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
 1: /usr/bin/ceph-osd() [0x7a6289]
 2: (()+0xeff0) [0x7f5aa08faff0]
 3: (gsignal()+0x35) [0x7f5a9f3841b5]
 4: (abort()+0x180) [0x7f5a9f386fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
 6: (()+0xcb166) [0x7f5a9fc17166]
 7: (()+0xcb193) [0x7f5a9fc17193]
 8: (()+0xcb28e) [0x7f5a9fc1728e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
 13: (PG::scrub()+0x145) [0x6c4e55]
 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
 17: (()+0x68ca) [0x7f5aa08f28ca]
 18: (clone()+0x6d) [0x7f5a9f421b6d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -1/-1 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/osd.25.log
--- end dump of recent events ---


I tried reformatting that OSD and re-injecting it into the cluster, but after
the recovery the problem still occurs.

Since I don't see any hard drive errors in the kernel logs, what could the
problem be?
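
If more detail is needed, I suppose the crash can be reproduced with more verbose
logging before the next scrub of that PG - a sketch, with <pgid> as a placeholder
and the debug levels only a suggestion:

  ceph tell osd.25 injectargs '--debug-osd 20 --debug-filestore 20'   # verbose logging on the flapping OSD
  ceph pg deep-scrub <pgid>                                           # trigger the scrub that kills it
  # then look at /var/log/ceph/osd.25.log around the assert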



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD snapshots are not «readable», because of LVM ?

2013-04-15 Thread Olivier Bonvalet
Hi,

I'm trying to map an RBD snapshot, which contains an LVM PV.

I can do the «map» : 
rbd map hdd3copies/jason@20130415-065314 --id alg

Then pvscan works :
pvscan | grep rbd
  PV /dev/rbd58   VG vg-jason   lvm2 [19,94 GiB / 1,44 GiB free]

But activating the LVs doesn't work:
# vgchange -ay vg-jason
  device-mapper: reload ioctl failed: Argument invalide
  Internal error: Maps lock 13746176  unlock 13889536
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  7 logical volume(s) in volume group vg-jason now active

# blockdev --getsize64 /dev/mapper/vg--jason-*
0
0
0
0
0
0
0

Is this a problem with LVM, or should I not be using snapshots like that?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Olivier Bonvalet
On Monday, 15 April 2013 at 10:57 -0700, Gregory Farnum wrote:
 On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  On Monday, 15 April 2013 at 10:16 -0700, Gregory Farnum wrote:
  Are you saying you saw this problem more than once, and so you
  completely wiped the OSD in question, then brought it back into the
  cluster, and now it's seeing this error again?
 
  Yes, it's exactly that.
 
 
  Are any other OSDs experiencing this issue?
 
   No, only this one has the problem.
 
 Did you run scrubs while this node was out of the cluster? If you
 wiped the data and this is recurring then this is apparently an issue
 with the cluster state, not just one node, and any other primary for
 the broken PG(s) should crash as well. Can you verify by taking this
 one down and then doing a full scrub?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

Also note that no PG is marked as corrupted. I only have PGs in
active+remapped or active+degraded.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Number of OSDs per host

2013-03-06 Thread Olivier Bonvalet
Hi,

I think it depends on your total number of OSDs.

Take my case: I have 8 OSDs per host, and 5 hosts. If one host crashes, I
lose 20% of the cluster and have a huge amount of data to rebalance.

For fault tolerance, that was not a good idea.
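
The flip side is that with host as the failure domain, each replica lands on a
different host, so losing one host costs at most one copy of each object, but it
still triggers that big rebalance. The usual rule looks roughly like this - a
sketch from memory, not my exact map, and "default" is a placeholder root:

  rule replicated_per_host {
          ruleset 0
          type replicated
          min_size 1
          max_size 10
          # "default" is a placeholder root; adapt it to your own hierarchy (e.g. SASroot)
          step take default
          # spread replicas across distinct hosts
          step chooseleaf firstn 0 type host
          step emit
  }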


On Tuesday, 5 March 2013 at 02:39 -0800, waed Albataineh wrote:
 Hi there, 
 I believe the Ceph quick start gives us two OSDs; even though it's
 not recommended, I want to increase that.
 My question: is it going to end badly if I end up with 10 OSDs per host?
 And for that increase I have to edit the configuration file,
 right? Finally, if I finish the installation and then realize I need
 to change something in the configuration file, is that possible, or will it
 crash?
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com