Re: [ceph-users] Instance filesystem corrupt

2016-10-27 Thread Ahmed Mostafa
Well, now I know what my issue is.

It is indeed an over-utilization issue, but not one related to Ceph.

The cluster is connected via 1G interfaces, which basically get saturated
by all the bandwidth generated by these instances trying to read and mount
their root filesystems, which are stored in Ceph.

So that explains why restarting the VM later solves the problem.
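For anyone hitting the same wall, a quick check along these lines made the
saturation obvious to me; this is only a rough sketch, and the interface name
eth0 is an assumption -- a 1G link tops out around ~117 MiB/s, so a few dozen
instances reading their root filesystems at boot will pin it:

# On a hypervisor or OSD host while the instances boot (assumes sysstat is
# installed and the cluster-facing NIC is eth0 -- adjust to your environment):
sar -n DEV 1 | grep eth0    # rxkB/s + txkB/s approaching ~117000 kB/s => 1G link saturated

# Without sysstat, the raw counters tell the same story:
ip -s link show eth0        # sample RX/TX bytes twice and divide by the interval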



On Friday, 28 October 2016, Ahmed Mostafa <ahmedmostafa...@gmail.com> wrote:

> So I couldn't actually wait till the morning.
>
> I set rbd cache to false and tried to create the same number of instances,
> but the same issue happened again.
>
> I want to note that if I rebooted any of the virtual machines that had
> this issue, they worked without any problem afterwards.
>
> Does this mean that over-utilization could be the cause of my problem?
> The cluster I have has bad hardware, and this is the only logical
> explanation I can reach.
>
> By bad hardware I mean Core i5 processors, for instance; I can also see
> the %wa reaching 50-60%.
>
> Thank you
>
>
> On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <jdill...@redhat.com> wrote:
>
>> The only effect I could see out of a highly overloaded system would be
>> that the OSDs might appear to become unresponsive to the VMs. Are any of
>> you using cache tiering or librbd cache? For the latter, there was one
>> issue [1] that can result in read corruption that affects hammer and prior
>> releases.
>>
>> [1] http://tracker.ceph.com/issues/16002
>>
>> On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <ahmedmostafa...@gmail.com> wrote:
>>
>>> This is more or less the same behaviour I have in my environment.
>>>
>>> By any chance, is anyone running their OSDs and their hypervisors on the
>>> same machine?
>>>
>>> And could a high workload, like starting 40-60 or more virtual machines,
>>> have an effect on this problem?
>>>
>>>
>>> On Thursday, 27 October 2016, <keynes_...@wistron.com> wrote:
>>>
>>>>
>>>>
>>>> Most of the filesystem corruption causes instances to crash; we saw that
>>>> after a shutdown / restart
>>>>
>>>> ( triggered by OpenStack portal buttons or by OS commands inside the
>>>> instances )
>>>>
>>>>
>>>>
>>>> Some are detected early: we see filesystem errors in the OS logs on the
>>>> instances.
>>>>
>>>> Then we run a filesystem check ( fsck / chkdsk ) immediately, and the
>>>> issue is fixed.
>>>>
>>>>
>>>>
>>>> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
>>>> *Sent:* Wednesday, October 26, 2016 9:38 PM
>>>> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>
>>>> *Cc:* will.bo...@target.com; ceph-users <ceph-users@lists.ceph.com>
>>>> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>>>>
>>>>
>>>>
>>>> I am not aware of any similar reports against librbd on Firefly. Do you
>>>> use any configuration overrides? Does the filesystem corruption appear
>>>> while the instances are running or only after a shutdown / restart of the
>>>> instance?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 26, 2016 at 12:46 AM, <keynes_...@wistron.com> wrote:
>>>>
>>>> No, we are using Firefly (0.80.7).
>>>>
>>>> We are using HPE Helion OpenStack 2.1.5, and the version embedded in it
>>>> is Firefly.
>>>>
>>>> An upgrade is planned, but it will not happen soon.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

Re: [ceph-users] Instance filesystem corrupt

2016-10-27 Thread Ahmed Mostafa
So I couldn't actually wait till the morning.

I set rbd cache to false and tried to create the same number of instances,
but the same issue happened again.
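(For reference, this is roughly what I mean by turning the cache off -- a
sketch only; the [client] section lives in ceph.conf on the compute nodes,
and the libvirt cache mode shown is an assumption about the Nova setup:)

# /etc/ceph/ceph.conf on the compute nodes -- librbd reads this when the VM starts
[client]
rbd cache = false

# QEMU's own cache= setting also toggles the librbd cache, so the libvirt disk
# definition should agree with it, e.g. cache='none':
#   <driver name='qemu' type='raw' cache='none'/>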

I want to note that if I rebooted any of the virtual machines that had
this issue, they worked without any problem afterwards.

Does this mean that over-utilization could be the cause of my problem? The
cluster I have has bad hardware, and this is the only logical explanation I
can reach.

By bad hardware I mean Core i5 processors, for instance; I can also see the
%wa reaching 50-60%.
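(For what it's worth, this is how I have been reading those numbers; a
sketch, assuming sysstat is installed and sdb is an OSD data disk:)

# CPU-level view: %iowait is time the CPUs sit idle waiting on disk I/O
iostat -c 2

# Per-device view on an OSD host: sustained %util near 100% plus a high await
# means the spindles themselves are the bottleneck rather than Ceph
iostat -x sdb 2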

Thank you


On Thu, Oct 27, 2016 at 4:13 PM, Jason Dillaman <jdill...@redhat.com> wrote:

> The only effect I could see out of a highly overloaded system would be
> that the OSDs might appear to become unresponsive to the VMs. Are any of
> you using cache tiering or librbd cache? For the latter, there was one
> issue [1] that can result in read corruption that affects hammer and prior
> releases.
>
> [1] http://tracker.ceph.com/issues/16002
>
> On Thu, Oct 27, 2016 at 1:34 AM, Ahmed Mostafa <ahmedmostafa...@gmail.com>
> wrote:
>
>> This is more or less the same behaviour I have in my environment.
>>
>> By any chance, is anyone running their OSDs and their hypervisors on the
>> same machine?
>>
>> And could a high workload, like starting 40-60 or more virtual machines,
>> have an effect on this problem?
>>
>>
>> On Thursday, 27 October 2016, <keynes_...@wistron.com> wrote:
>>
>>>
>>>
>>> Most of the filesystem corruption causes instances to crash; we saw that
>>> after a shutdown / restart
>>>
>>> ( triggered by OpenStack portal buttons or by OS commands inside the
>>> instances )
>>>
>>>
>>>
>>> Some are detected early: we see filesystem errors in the OS logs on the
>>> instances.
>>>
>>> Then we run a filesystem check ( fsck / chkdsk ) immediately, and the
>>> issue is fixed.
>>>
>>>
>>>
>>> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
>>> *Sent:* Wednesday, October 26, 2016 9:38 PM
>>> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>
>>> *Cc:* will.bo...@target.com; ceph-users <ceph-users@lists.ceph.com>
>>> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>>>
>>>
>>>
>>> I am not aware of any similar reports against librbd on Firefly. Do you
>>> use any configuration overrides? Does the filesystem corruption appear
>>> while the instances are running or only after a shutdown / restart of the
>>> instance?
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 12:46 AM, <keynes_...@wistron.com> wrote:
>>>
>>> No, we are using Firefly (0.80.7).
>>>
>>> We are using HPE Helion OpenStack 2.1.5, and the version embedded in it
>>> is Firefly.
>>>
>>> An upgrade is planned, but it will not happen soon.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Will.Boege [mailto:will.bo...@target.com]
>>> *Sent:* Wednesday, October 26, 2016 12:03 PM
>>> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>;
>>> ceph-users@lists.ceph.com
>>> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>
>>>
>>>
>>> Just out of curiosity, did you recently upgrade to Jewel?
>>>
>>>
>>>
>>> *From: *ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of "
>>> keynes_...@wistron.com" <keynes_...@wistron.com>
>>> *Date: *Tuesday, October 25, 2016 at 10:52 PM
>>> *To: *"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>>> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>>>
>>>
>>>
>>> We are using OpenStack + Ceph.
>>>
>>> Recently we found a lot of filesystem corruption incidents on instances.
>>>
>>> Some of them are correctable, fixed by fsck, but the others have no luck;
>>> they are just corrupt and can never start up again.

[ceph-users] Qcow2 and RBD Import

2016-10-27 Thread Ahmed Mostafa
Hello,

Going through the documentation, I am aware that I should be using raw
images instead of qcow2 when storing my images in Ceph.

I have carried out a small test to understand how this works.

[root@ ~]# qemu-img create -f qcow2 test.qcow2 100G
Formatting 'test.qcow2', fmt=qcow2 size=107374182400 encryption=off
cluster_size=65536 lazy_refcounts=off
[root@ ~]# qemu-img info test.qcow2
image: test.qcow2
file format: qcow2
virtual size: 100G (107374182400 bytes)
disk size: 196K
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false


After that I imported the image into my Ceph cluster:

[root@ ~]# rbd import test.qcow2 --pool openstack_images
rbd: --pool is deprecated for import, use --dest-pool
Importing image: 100% complete...done.


Now, looking at the size of the image and its information:

[root@ ~]# rbd info openstack_images/test.qcow2
rbd image 'test.qcow2':
size 194 kB in 1 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.123ca2ae8944a
format: 2
features: layering, exclusive-lock, object-map, fast-diff,
deep-flatten
flags:

So, what exactly does this mean? Am I able to store qcow2? Is it safe to
store qcow2? In the backend, how did that happen? Was the image somehow
converted to raw? Was the part of the image that did not hold any data
trimmed?

Thank you
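(As far as I understand it, rbd import simply copies the file's bytes into
RADOS objects, so what landed in the pool above is still a qcow2 container;
QEMU/librbd would treat the RBD image as raw and not parse it. The usual
route is to convert to raw first -- a hedged sketch, reusing the pool and
image names from the test above:)

# Convert the qcow2 container to raw, then import the raw data:
qemu-img convert -f qcow2 -O raw test.qcow2 test.raw
rbd import test.raw openstack_images/test

# or skip the intermediate file and let qemu-img write straight into RBD:
qemu-img convert -f qcow2 -O raw test.qcow2 rbd:openstack_images/test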
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Instance filesystem corrupt

2016-10-26 Thread Ahmed Mostafa
This is more or less the same behaviour I have in my environment.

By any chance, is anyone running their OSDs and their hypervisors on the
same machine?

And could a high workload, like starting 40-60 or more virtual machines,
have an effect on this problem?


On Thursday, 27 October 2016,  wrote:

>
>
> Most of the filesystem corruption causes instances to crash; we saw that
> after a shutdown / restart
>
> ( triggered by OpenStack portal buttons or by OS commands inside the
> instances )
>
>
>
> Some are detected early: we see filesystem errors in the OS logs on instances.
>
> Then we run a filesystem check ( fsck / chkdsk ) immediately, and the issue is fixed.
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
> *Sent:* Wednesday, October 26, 2016 9:38 PM
> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>
> *Cc:* will.bo...@target.com; ceph-users <ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] [EXTERNAL] Instance filesystem corrupt
>
>
>
> I am not aware of any similar reports against librbd on Firefly. Do you
> use any configuration overrides? Does the filesystem corruption appear
> while the instances are running or only after a shutdown / restart of the
> instance?
>
>
>
> On Wed, Oct 26, 2016 at 12:46 AM, <keynes_...@wistron.com> wrote:
>
> No, we are using Firefly (0.80.7).
>
> We are using HPE Helion OpenStack 2.1.5, and the version embedded in it is
> Firefly.
>
> An upgrade is planned, but it will not happen soon.
>
>
>
>
>
>
>
>
>
>
>
> *From:* Will.Boege [mailto:will.bo...@target.com]
> *Sent:* Wednesday, October 26, 2016 12:03 PM
> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>; ceph-users@lists.ceph.com
> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>
>
>
> Just out of curiosity, did you recently upgrade to Jewel?
>
>
>
> *From: *ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> "keynes_...@wistron.com" <keynes_...@wistron.com>
> *Date: *Tuesday, October 25, 2016 at 10:52 PM
> *To: *"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>
>
>
> We are using OpenStack + Ceph.
>
> Recently we found a lot of filesystem corruption incidents on instances.
>
> Some of them are correctable, fixed by fsck, but the others have no luck;
> they are just corrupt and can never start up again.
>
>
>
> We found this issue on various operating systems of instances. They are
>
> RedHat 4 / CentOS 7 / Windows 2012
>
>
>
> Could someone please advise us on some troubleshooting direction?
>
>
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
> --
>
> Jason
>
___
ceph-users mailing list

Re: [ceph-users] Instance filesystem corrupt

2016-10-26 Thread Ahmed Mostafa
Actually I have the same problem when starting an instance backed by
librbd, but this only happens when trying to start 60+ instances.

But I decided that this is due to the fact that we are using old hardware
that is not able to respond to high demand.

Could that be the same issue that you are facing?


On Wednesday, 26 October 2016, Jason Dillaman  wrote:

> I am not aware of any similar reports against librbd on Firefly. Do you
> use any configuration overrides? Does the filesystem corruption appear
> while the instances are running or only after a shutdown / restart of the
> instance?
>
> On Wed, Oct 26, 2016 at 12:46 AM, <keynes_...@wistron.com> wrote:
>
>> No, we are using Firefly (0.80.7).
>>
>> We are using HPE Helion OpenStack 2.1.5, and the version embedded in it is
>> Firefly.
>>
>> An upgrade is planned, but it will not happen soon.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Will.Boege [mailto:will.bo...@target.com]
>> *Sent:* Wednesday, October 26, 2016 12:03 PM
>> *To:* Keynes Lee/WHQ/Wistron <keynes_...@wistron.com>; ceph-users@lists.ceph.com
>> *Subject:* Re: [EXTERNAL] [ceph-users] Instance filesystem corrupt
>>
>>
>>
>> Just out of curiosity, did you recently upgrade to Jewel?
>>
>>
>>
>> *From: *ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
>> "keynes_...@wistron.com" <keynes_...@wistron.com>
>> *Date: *Tuesday, October 25, 2016 at 10:52 PM
>> *To: *"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>> *Subject: *[EXTERNAL] [ceph-users] Instance filesystem corrupt
>>
>>
>>
>> We are using OpenStack + Ceph.
>>
>> Recently we found a lot of filesystem corruption incidents on instances.
>>
>> Some of them are correctable, fixed by fsck, but the others have no luck;
>> they are just corrupt and can never start up again.
>>
>>
>>
>> We found this issue on various operating systems of instances. They are
>>
>> RedHat 4 / CentOS 7 / Windows 2012
>>
>>
>>
>> Could someone please advise us on some troubleshooting direction?
>>
>>
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> 
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph on two data centers far away

2016-10-25 Thread Ahmed Mostafa
May I ask how your two data centers are connected together?

On Thursday, 20 October 2016, yan cui  wrote:

> The two data centers are actually across the US: one is in the west, and
> the other in the east.
> We try to sync RBD images using RBD mirroring.
>
> 2016-10-20 9:54 GMT-07:00 German Anders:
>
>> Out of curiosity, I wanted to ask what kind of network topology you are
>> trying to use across the cluster. In this type of scenario you really need
>> an ultra-low-latency network; how far are the sites from each other?
>>
>> Best,
>>
>> *German*
>>
>> 2016-10-18 16:22 GMT-03:00 Sean Redmond:
>>> Maybe this would be an option for you:
>>>
>>> http://docs.ceph.com/docs/jewel/rbd/rbd-mirroring/
>>>
>>>
>>> On Tue, Oct 18, 2016 at 8:18 PM, yan cui wrote:
 Hi Guys,

Our company has a use case which needs the support of Ceph across
 two data centers (one data center is far away from the other). The
 experience of using one data center is good. We did some benchmarking on
 two data centers, and the performance is bad because of the synchronization
 feature in Ceph and the large latency between data centers. So, are there
 setups or data-center-aware features in Ceph that would give us good
 locality? Usually, we use rbd to create volumes and snapshots, but we want
 the volumes to be highly available with acceptable performance in case one
 data center is down. Our current setup does not take the data center
 boundary into account. Any ideas?


 Thanks, Yan

 --
 Think big; Dream impossible; Make it happen.

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
>
> --
> Think big; Dream impossible; Make it happen.
>
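(For reference, the rbd-mirroring doc linked above boils down to roughly the
following on the Jewel CLI; this is only a sketch -- the pool name, image
name, and peer cluster/client names are illustrative, and an rbd-mirror
daemon has to run on the site that pulls the changes:)

# One-way mirroring of a single image between two clusters (Jewel):
rbd mirror pool enable volumes image          # per-image mirroring mode for this pool
rbd feature enable volumes/myvol journaling   # mirroring requires the journaling feature
rbd mirror image enable volumes/myvol
rbd mirror pool peer add volumes client.mirror@remote-site   # register the peer cluster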
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-rbd and ceph striping

2016-10-19 Thread Ahmed Mostafa
Does this also mean that stripe count can be thought of as the number of
parallel writes to different objects on different OSDs?

Thank you

On Thursday, 20 October 2016, Jason Dillaman <jdill...@redhat.com> wrote:

> librbd (used by QEMU to provide RBD-backed disks) uses librados and
> provides the necessary handling for striping across multiple backing
> objects. When you don't specify "fancy" striping options via
> "--stripe-count" and "--stripe-unit", it essentially defaults to
> stripe count of 1 and stripe unit of the object size (defaults to
> 4MB).
>
> The use-case for fancy striping settings for an RBD image is images
> that have lots of small, sequential IO. The rationale for that is that
> normally these small, sequential IOs will continue to hit the same PG
> until the object boundary is crossed. However, if you were to use a
> small stripe unit that matched your normal IO size (or a small
> multiple thereof), your small, sequential IO requests would be sent to
> multiple PGs -- spreading the load.
>
> On Wed, Oct 19, 2016 at 12:32 PM, Ahmed Mostafa
> <ahmedmostafa...@gmail.com> wrote:
> > Hello
> >
> > From the documentation I understand that clients that use librados must
> > perform striping for themselves, but I do not understand how this can be
> > if we have striping options in Ceph. I mean, I can create RBD images that
> > have configuration for striping: stripe count and unit size.
> >
> > So my question is: if I created an RBD image that has striping enabled
> > and configured, will that make a difference with qemu-rbd? By difference
> > I mean enhancing the performance of my virtual machines' I/O and better
> > utilizing the cluster resources.
> >
> > Thank you
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Jason
>
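(As an illustration of the fancy-striping knobs described above -- a sketch
only; the pool and image names and the particular values are made up, and
the stripe unit should evenly divide the object size:)

# Create an image whose 4 MB objects (order 22) are striped in 64 KB units
# across 16 objects, so 16 small sequential writes land on 16 different
# objects -- and therefore potentially 16 different PGs:
rbd create mypool/striped-img --size 10240 \
    --image-format 2 --order 22 \
    --stripe-unit 65536 --stripe-count 16

rbd info mypool/striped-img   # prints "stripe unit" / "stripe count" when fancy striping is set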
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] qemu-rbd and ceph striping

2016-10-19 Thread Ahmed Mostafa
Hello

From the documentation I understand that clients that use librados must
perform striping for themselves, but I do not understand how this can be
if we have striping options in Ceph. I mean, I can create RBD images that
have configuration for striping: stripe count and unit size.

So my question is: if I created an RBD image that has striping enabled and
configured, will that make a difference with qemu-rbd? By difference I
mean enhancing the performance of my virtual machines' I/O and better
utilizing the cluster resources.

Thank you
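(In case it helps frame the question: QEMU itself never sees the striping --
it hands reads and writes to librbd, which maps them onto the striped RADOS
objects. A sketch of the attachment, where the pool/image names and the auth
id are placeholders:)

# Raw QEMU invocation using its built-in rbd driver; librbd maps the guest's
# I/O onto the (optionally striped) objects transparently:
qemu-system-x86_64 -m 2048 -enable-kvm \
    -drive format=raw,if=virtio,file=rbd:mypool/striped-img:id=admin:conf=/etc/ceph/ceph.conf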
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com