Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-26 Thread Voloshanenko Igor
Great! Yes, the behaviour is exactly as I described, so it looks like that's the root cause ) Thank you, Sam, Ilya! 2015-08-21 21:08 GMT+03:00 Samuel Just sj...@redhat.com: I think I found the bug -- need to whiteout the snapset (or decache it) upon evict. http://tracker.ceph.com/issues/12748 -Sam On Fri,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Ilya Dryomov
On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote: Odd, did you happen to capture osd logs? No, but the reproducer is trivial to cut-and-paste. Thanks, Ilya

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
Odd, did you happen to capture osd logs? -Sam On Thu, Aug 20, 2015 at 8:10 PM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Samuel Just
I think I found the bug -- need to whiteout the snapset (or decache it) upon evict. http://tracker.ceph.com/issues/12748 -Sam On Fri, Aug 21, 2015 at 8:04 AM, Ilya Dryomov idryo...@gmail.com wrote: On Fri, Aug 21, 2015 at 5:59 PM, Samuel Just sj...@redhat.com wrote: Odd, did you happen to

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-21 Thread Voloshanenko Igor
Exactly as in our case. Ilya, the same for images on our side - headers are opened from the hot tier. On Friday, 21 August 2015, Ilya Dryomov wrote: On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Ilya Dryomov
On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was related to the caching layer, which doesn't support snapshotting per the docs... for the sake of closing the thread. On 17 August 2015

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Good joke ) 2015-08-21 2:06 GMT+03:00 Samuel Just sj...@redhat.com: Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We switched to forward mode as a step towards switching the cache layer off. Right now we have Samsung 850 Pro in the cache layer (10 SSDs, 2 per node) and they show 2 MB/s for 4K blocks... 250 IOPS... instead of the 18-20K for the Intel S3500 240G which we chose as a replacement.. So with such good disks - cache layer - very
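
(For context, a minimal sketch of how a Hammer cache tier is typically put into forward mode and then drained; the pool names ssd-cache / cold-storage are assumptions, not values from this thread:)

  # Sketch only; pool names are placeholders.
  ceph osd tier cache-mode ssd-cache forward      # stop promoting, pass client I/O on to the backing pool
  rados -p ssd-cache cache-flush-evict-all        # flush dirty objects and evict everything clean
  ceph osd tier remove-overlay cold-storage       # once the hot tier is empty
  ceph osd tier remove cold-storage ssd-cache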

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Created a ticket to improve our testing here -- this appears to be a hole. http://tracker.ceph.com/issues/12742 -Sam On Thu, Aug 20, 2015 at 4:09 PM, Samuel Just sj...@redhat.com wrote: So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right. But issues started... 2015-08-21 2:20 GMT+03:00 Samuel Just sj...@redhat.com: But that was still in writeback mode,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
As we use journal collocation now (because we want to utilize the cache layer ((( ) I used ceph-disk to create the new OSDs (with the journal size changed in ceph.conf). I don't prefer manual work )) So I created a very simple script to update the journal size 2015-08-21 2:25 GMT+03:00 Voloshanenko Igor
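
(A rough sketch of that workflow, assuming a collocated journal; the device path and the 10 GB size are placeholders:)

  # In ceph.conf, the journal size ceph-disk will use for new OSDs (MB):
  #   [osd]
  #   osd journal size = 10240
  # After removing the old OSD (out, stop, crush remove, auth del, osd rm),
  # recreate it with a collocated journal:
  ceph-disk prepare --zap-disk /dev/sdX
  ceph-disk activate /dev/sdX1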

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Will do, Sam! Thanks in advance for your help! 2015-08-21 2:28 GMT+03:00 Samuel Just sj...@redhat.com: Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor igor.voloshane...@gmail.com

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We hadn't set values for max_bytes / max_objects.. and all data was initially written only to the cache layer and not flushed to the cold layer at all. Then we received a notification from monitoring that we had collected about 750GB in the hot pool ) So I changed the values for max_object_bytes to be 0.9 of the disk size...
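
(For reference, the pool options that drive the tiering agent are presumably the target/dirty/full settings below; the pool name and the numbers are illustrative, not the thread's actual values:)

  ceph osd pool set ssd-cache target_max_bytes 750000000000    # e.g. ~0.9 of usable hot-tier capacity
  ceph osd pool set ssd-cache target_max_objects 1000000
  ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4     # start flushing at 40% dirty
  ceph osd pool set ssd-cache cache_target_full_ratio 0.8      # start evicting at 80% full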

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
But that was still in writeback mode, right? -Sam On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: WE haven't set values for max_bytes / max_objects.. and all data initially writes only to cache layer and not flushed at all to cold layer. Then we received

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I already killed the cache layer, but will try to reproduce it in the lab 2015-08-21 1:58 GMT+03:00 Samuel Just sj...@redhat.com: Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
So you started draining the cache pool before you saw either the inconsistent pgs or the anomalous snap behavior? (That is, writeback mode was working correctly?) -Sam On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Good joke ) 2015-08-21 2:06

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM, Samuel Just sj...@redhat.com wrote: Yeah, I'm trying to confirm that the issues did happen in writeback mode. -Sam On Thu, Aug

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
I mean in forward mode it's a permanent problem - snapshots don't work. And for writeback mode, after we changed the max_bytes/max_objects values, it's around 30 to 70... 70% of the time it works... 30% - not. It looks like for old images snapshots work fine (images which already existed before we change

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool. If the kernel is sending requests to the cold pool, that's probably where the bug is. Odd. It could also be a bug specific to 'forward' mode
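
(One rough way to check which tier actually holds an image's header object, assuming a format-2 image; the pool names and image id are placeholders, and listing through an overlay can be confusing, so treat this only as a sanity check:)

  rbd -p cold-storage info <image>                 # block_name_prefix gives the id, e.g. rbd_data.<id>
  rados -p ssd-cache ls | grep rbd_header.<id>     # is the header present in the hot tier?
  rados -p cold-storage ls | grep rbd_header.<id>  # and/or in the cold tier?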

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Our initial values for the journal sizes were enough, but the flush time was 5 secs, so we increased the journal size to fit a flush timeframe of min|max 29/30 seconds. I mean filestore max sync interval = 30, filestore min sync interval = 29 when I said flush time 2015-08-21 2:16 GMT+03:00 Samuel Just
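
(As a ceph.conf excerpt, assuming those settings live in the [osd] section; the journal then just has to be large enough to absorb roughly 30 seconds of writes:)

  [osd]
  filestore min sync interval = 29
  filestore max sync interval = 30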

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
It would help greatly if, on a disposable cluster, you could reproduce the snapshot problem with debug osd = 20, debug filestore = 20, debug ms = 1 on all of the osds and attach the logs to the bug report. That should make it easier to work out what is going on. -Sam On Thu, Aug 20, 2015 at 4:40
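
(Those debug levels can go into ceph.conf under [osd] followed by a restart, or - as a sketch - be injected into a running cluster:)

  ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'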

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just sj...@redhat.com wrote: Snapshotting with cache/tiering *is* supposed to work. Can you open a bug? -Sam On Thu, Aug 20, 2015 at 3:36 PM, Andrija Panic andrija.pa...@gmail.com wrote: This was

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Yes, will do. What we see: when the cache tier is in forward mode, if I do rbd snap create it uses the rbd_header not from the cold tier but from the hot tier, and these 2 headers are not synced. And it can't be evicted from hot storage, as it's locked by KVM (Qemu). If I kill the lock and evict the header - everything starts to work..
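
(A sketch of what that manual workaround could look like; the pool names, image name, lock id and locker are all placeholders:)

  rbd -p cold-storage lock list <image>                       # shows the lock id and the locker, e.g. client.4173
  rbd -p cold-storage lock remove <image> <lock-id> <locker>
  # with the lock gone, the header object can be flushed/evicted from the hot tier:
  rados -p ssd-cache cache-flush rbd_header.<id>
  rados -p ssd-cache cache-evict rbd_header.<id>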

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Certainly, don't reproduce this with a cluster you care about :). -Sam On Thu, Aug 20, 2015 at 4:02 PM, Samuel Just sj...@redhat.com wrote: What's supposed to happen is that the client transparently directs all requests to the cache pool rather than the cold pool when there is a cache pool.

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Also, what do you mean by change journal side? -Sam On Thu, Aug 20, 2015 at 4:15 PM, Samuel Just sj...@redhat.com wrote: Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Not sure what you mean by: but it's stop to work in same moment, when cache layer fulfilled with data and evict/flush started... -Sam On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: No, when we start draining cache - bad pgs was in place... We have big

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Ok, create a ticket with a timeline and all of this information, I'll try to look into it more tomorrow. -Sam On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Exactly On Friday, 21 August 2015, Samuel Just wrote: And you adjusted the

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Attachment blocked, so posting as text...

root@zzz:~# cat update_osd.sh
#!/bin/bash
ID=$1
echo "Process OSD# ${ID}"
DEV=`mount | grep ceph-${ID} | cut -d ' ' -f 1`
echo "OSD# ${ID} hosted on ${DEV::-1}"
TYPE_RAW=`smartctl -a ${DEV} | grep Rota | cut -d ' ' -f 6`
if [ "${TYPE_RAW}" == "Solid" ]
then
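
(Presumably the script is invoked once per OSD id, something like the following; the id 12 is a placeholder:)

  ./update_osd.sh 12    # re-create OSD 12 with the new journal size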

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux 2015-08-21 1:54 GMT+03:00 Samuel Just sj...@redhat.com: Also, can you include the kernel version? -Sam On Thu, Aug 20, 2015 at 3:51 PM, Samuel Just

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Right ( but there was also a rebalancing cycle 2 days before the pgs got corrupted) 2015-08-21 2:23 GMT+03:00 Samuel Just sj...@redhat.com: Specifically, the snap behavior (we already know that the pgs went inconsistent while the pool was in writeback mode, right?). -Sam On Thu, Aug 20, 2015 at 4:22 PM,

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
Exactly On Friday, 21 August 2015, Samuel Just wrote: And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
And you adjusted the journals by removing the osd, recreating it with a larger journal, and reinserting it? -Sam On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Right ( but also was rebalancing cycle 2 day before pgs corrupted) 2015-08-21 2:23 GMT+03:00

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Samuel Just
Hmm, that might actually be client side. Can you attempt to reproduce with rbd-fuse (different client side implementation from the kernel)? -Sam On Thu, Aug 20, 2015 at 3:56 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote: root@test:~# uname -a Linux ix-s5 4.0.4-040004-generic
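
(A minimal sketch of such an rbd-fuse reproduction attempt; the pool name and mountpoint are placeholders:)

  mkdir -p /mnt/rbdfuse
  rbd-fuse -p cold-storage /mnt/rbdfuse    # every rbd image in the pool appears as a file here
  ls -l /mnt/rbdfuse
  # ...repeat the snapshot test against the image file, then clean up:
  fusermount -u /mnt/rbdfuse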

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
We used the 4.x branch because we have the very good Samsung 850 Pro in production, and they don't support queued TRIM (ncq_trim)... And 4.x is the first branch which includes the exception for this in libata (libata-core.c). Sure, we could backport this one line to the 3.x branch, but we prefer not to go deeper if a package for a newer kernel exists.

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Voloshanenko Igor
No, when we started draining the cache the bad pgs were already in place... We had a big rebalance (disk by disk - to change the journal size on both hot/cold layers).. All was OK, but after 2 days scrub errors arrived and 2 pgs went inconsistent... In writeback - yes, it looks like snapshots work well, but it stops to
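
(For reference, a sketch of how inconsistent pgs are usually located and repaired on Hammer; the pgid is a placeholder, and repair on this version largely re-copies the primary's version, so it is worth checking which replica is actually the bad one first:)

  ceph health detail | grep inconsistent    # lists the inconsistent pgs and their acting sets
  ceph pg <pgid> query                      # inspect the pg
  ceph pg repair <pgid>                     # trigger a repair scrub on the primary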

Re: [ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-20 Thread Andrija Panic
This was related to the caching layer, which doesn't support snapshotting per the docs... for the sake of closing the thread. On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote: Hi all, can you please help me with an unexplained situation... All snapshots inside ceph are broken...

[ceph-users] Broken snapshots... CEPH 0.94.2

2015-08-17 Thread Voloshanenko Igor
Hi all, can you please help me with an unexplained situation... All snapshots inside ceph are broken... So, as an example, we have a VM template as an rbd inside ceph. We can map it and mount it to check that all is OK with it: root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5 /dev/rbd0
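
(A sketch of the rest of that sanity check as it presumably continued; the mountpoint, partition layout and snapshot name are assumptions:)

  rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5    # prints the device, e.g. /dev/rbd0
  mount /dev/rbd0 /mnt/template && ls /mnt/template            # the base image itself reads fine
  umount /mnt/template && rbd unmap /dev/rbd0
  # the breakage shows up when the same is done against a snapshot of the image:
  rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@<snap-name>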