[ceph-users] Restricting access of a users to only objects of a specific bucket

2016-08-04 Thread Parveen Sharma
Added appropriate subject





On Fri, Aug 5, 2016 at 10:23 AM, Parveen Sharma 
wrote:

> I have a cluster and I want a radosGW user to have access to a bucket's
> objects only (like /*), but the user should not be able to create new
> buckets or remove this bucket
>
>
>
> -
> Parveen Kumar Sharma
>


[ceph-users] (no subject)

2016-08-04 Thread Parveen Sharma
I have a cluster and I want a radosGW user to have access to a bucket's objects
only (like /*), but the user should not be able to create new buckets or
remove this bucket
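
One approach that is sometimes used for this (a rough sketch only, not verified
against a specific radosgw version; the bucket and user names below are made up)
is to keep the bucket owned by a separate "owner" account and grant the
restricted user object access through a bucket ACL, since only the owner can
remove the bucket:

# run as the bucket owner; "appuser" is the restricted user's id (example value)
s3cmd mb s3://mybucket
s3cmd setacl s3://mybucket --acl-grant=read:appuser --acl-grant=write:appuser
# appuser can now PUT/GET objects in mybucket with their own keys, but cannot
# delete the bucket because they do not own it

Note that this alone does not stop the restricted user from creating buckets of
their own.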



-
Parveen Kumar Sharma


[ceph-users] Advice on migrating from legacy tunables to Jewel tunables.

2016-08-04 Thread Goncalo Borges

Dear cephers...

I am looking for some advice on migrating from legacy tunables to Jewel 
tunables.


What would be the best strategy?

1) A step by step approach?
- starting with the transition from bobtail to firefly (and, in 
this particular step, by starting with chooseleaf_vary_r=5 and 
then decreasing it slowly to 1?)

- then from firefly to hammer
- then from hammer to jewel

2) or going directly to jewel tunables?

Any advice on how to minimize the data movement?
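
For reference, the relevant commands look roughly like this (a sketch only; the
amount of data movement depends heavily on the cluster, so test on something
disposable first):

# see what the cluster currently uses
ceph osd crush show-tunables

# option 1: step through the profiles one at a time
ceph osd crush tunables firefly
ceph osd crush tunables hammer
ceph osd crush tunables jewel

# option 2: jump straight to the jewel profile
ceph osd crush tunables jewel

# walking chooseleaf_vary_r down by hand means editing the decompiled CRUSH map:
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
#   set "tunable chooseleaf_vary_r 5" (later 4, 3, 2, 1), then recompile and inject
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new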

TIA
Goncalo



--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937



Re: [ceph-users] [Scst-devel] Thin Provisioning and Ceph RBD's

2016-08-04 Thread Alex Gorbachev
On Wed, Aug 3, 2016 at 10:54 AM, Alex Gorbachev  
wrote:
> On Wed, Aug 3, 2016 at 9:59 AM, Alex Gorbachev  
> wrote:
>> On Tue, Aug 2, 2016 at 10:49 PM, Vladislav Bolkhovitin  wrote:
>>> Alex Gorbachev wrote on 08/02/2016 07:56 AM:
 On Tue, Aug 2, 2016 at 9:56 AM, Ilya Dryomov  wrote:
> On Tue, Aug 2, 2016 at 3:49 PM, Alex Gorbachev  
> wrote:
>> On Mon, Aug 1, 2016 at 11:03 PM, Vladislav Bolkhovitin  
>> wrote:
>>> Alex Gorbachev wrote on 08/01/2016 04:05 PM:
 Hi Ilya,

 On Mon, Aug 1, 2016 at 3:07 PM, Ilya Dryomov  
 wrote:
> On Mon, Aug 1, 2016 at 7:55 PM, Alex Gorbachev 
>  wrote:
>> RBD illustration showing RBD ignoring discard until a certain
>> threshold - why is that?  This behavior is unfortunately incompatible
>> with ESXi discard (UNMAP) behavior.
>>
>> Is there a way to lower the discard sensitivity on RBD devices?
>>
 
>>
>> root@e1:/var/log# blkdiscard -o 0 -l 4096000 /dev/rbd28
>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>> print SUM/1024 " KB" }'
>> 819200 KB
>>
>> root@e1:/var/log# blkdiscard -o 0 -l 4096 /dev/rbd28
>> root@e1:/var/log# rbd diff spin1/testdis|awk '{ SUM += $2 } END {
>> print SUM/1024 " KB" }'
>> 782336 KB
>
> Think about it in terms of underlying RADOS objects (4M by default).
> There are three cases:
>
> discard range   | command
> ----------------+----------
> whole object    | delete
> object's tail   | truncate
> object's head   | zero
>
> Obviously, only delete and truncate free up space.  In all of your
> examples, except the last one, you are attempting to discard the head
> of the (first) object.
>
> You can free up as little as a sector, as long as it's the tail:
>
> Offset   Length   Type
> 0        4194304  data
>
> # blkdiscard -o $(((4 << 20) - 512)) -l 512 /dev/rbd28
>
> Offset   Length   Type
> 0        4193792  data

 Looks like ESXi is sending in each discard/unmap with the fixed
 granularity of 8192 sectors, which is passed verbatim by SCST.  There
 is a slight reduction in size via rbd diff method, but now I
 understand that actual truncate only takes effect when the discard
 happens to clip the tail of an image.
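
In other words, with the default 4 MiB objects a discard only frees space when
it covers a whole object or an object's tail. A quick way to see this, reusing
the image and device names from the examples above (a sketch only):

# discard exactly one whole 4 MiB object (object #2), which maps to a RADOS delete
blkdiscard -o $((2 * 4194304)) -l 4194304 /dev/rbd28
# the allocated size reported by rbd diff should drop by 4096 KB
rbd diff spin1/testdis | awk '{ SUM += $2 } END { print SUM/1024 " KB" }'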

 So far looking at
 https://kb.vmware.com/selfservice/microsites/search.do?language=en_US=displayKC=2057513

 ...the only variable we can control is the count of 8192-sector chunks
 and not their size.  Which means that most of the ESXi discard
 commands will be disregarded by Ceph.

 Vlad, is 8192 sectors coming from ESXi, as in the debug:

 Aug  1 19:01:36 e1 kernel: [168220.570332] Discarding (start_sector
 1342099456, nr_sects 8192)
>>>
>>> Yes, correct. However, to make sure that VMware is not (erroneously) being
>>> forced to do this, you need to perform one more check.
>>>
>>> 1. Run cat /sys/block/rbd28/queue/discard*. Ceph should report here 
>>> correct granularity and alignment (4M, I guess?)
>>
>> This seems to reflect the granularity (4194304), which matches the
>> 8192 sectors (8192 x 512 = 4194304).  However, there is no alignment
>> value.
>>
>> Can discard_alignment be specified with RBD?
>
> It's exported as a read-only sysfs attribute, just like
> discard_granularity:
>
> # cat /sys/block/rbd0/discard_alignment
> 4194304

 Ah thanks Ilya, it is indeed there.  Vlad, your email says to look for
 discard_alignment in /sys/block//queue, but for RBD it's in
 /sys/block/ - could this be the source of the issue?
>>>
>>> No. As you can see below, the alignment is reported correctly. So this must
>>> be a VMware issue, because it is ignoring the alignment parameter. You can try
>>> to align your VMware partition on a 4M boundary; it might help.
>>
>> Is this not a mismatch:
>>
>> - From sg_inq: Unmap granularity alignment: 8192
>>
>> - From "cat /sys/block/rbd0/discard_alignment": 4194304
>>
>> I am compiling the latest SCST trunk now.
>
> Scratch that, please, I just did a test that shows correct calculation
> of 4MB in sectors.
>
> - On iSCSI client node:
>
> dd if=/dev/urandom of=/dev/sdf bs=1M count=800
> blkdiscard -o 0 -l 4194304 /dev/sdf
>
> - On iSCSI server node:
>
> Aug  3 10:50:57 e1 kernel: [  893.444538] [1381]:
> vdisk_unmap_range:3832:Discarding (start_sector 

[ceph-users] Re: Bad performance when two fio write to the same image

2016-08-04 Thread Zhiyuan Wang
Hi Jason

Thanks for your information

-----Original Message-----
From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: 4 August 2016 19:49
To: Alexandre DERUMIER
Cc: Zhiyuan Wang; ceph-users

Subject: Re: [ceph-users] Bad performance when two fio write to the same image

With exclusive-lock, only a single client can have write access to the image at 
a time. Therefore, if you are using multiple fio processes against the same 
image, they will be passing the lock back and forth between each other and you 
can expect bad performance.

If you have a use-case where you really need to share the same image between 
multiple concurrent clients, you will need to disable the exclusive-lock 
feature (this can be done with the RBD cli on existing images or by passing 
"--image-shared" when creating new images).

On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER  wrote:
> Hi,
>
> I think this is because of the exclusive-lock feature enabled by default
> since Jewel on rbd images
>
>
> - Mail original -
> De: "Zhiyuan Wang" 
> À: "ceph-users" 
> Envoyé: Jeudi 4 Août 2016 11:37:04
> Objet: [ceph-users] Bad performance when two fio write to the same
> image
>
>
>
> Hi Guys
>
> I am testing the performance of Jewel (10.2.2) with FIO, but found the 
> performance would drop dramatically when two process write to the same image.
>
> My environment:
>
> 1. Server:
>
> One mon and four OSDs running on the same server.
>
> Intel P3700 400GB SSD which have 4 partitions, and each for one osd
> journal (journal size is 10GB)
>
> Inter P3700 400GB SSD which have 4 partitions, and each format to XFS
> for one osd data (each data is 90GB)
>
> 10GB network
>
> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>
> Memory: 256GB (it is not the bottleneck)
>
> 2. Client
>
> 10GB network
>
> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>
> Memory: 256GB (it is not the bottleneck)
>
> 3. Ceph
>
> Default configuration expect use async messager (have tried simple
> messager, got nearly the same result)
>
> 10GB image with 256 pg num
>
> Test Case
>
> 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
> randwrite
>
> The performance is nearly 60MB/s and IOPS is nearly 15K
>
> Four osd are nearly the same busy
>
> 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
> randwrite (write to the same image)
>
> The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each
> Terrible
>
> And I found that only one osd is busy, the other three are much more
> idle on CPU
>
> And I also run FIO on two clients, the same result
>
> 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd
> randwrite (one to image1, one to image2)
>
> The performance is nearly 35MB/s each and IOPS is nearly 8.5K each
> Reasonable
>
> Four osd are nearly the same busy
>
>
>
>
>
> Could someone help to explain the reason of TEST 2
>
>
>
> Thanks
>
>
> Email Disclaimer & Confidentiality Notice
>
> This message is confidential and intended solely for the use of the recipient 
> to whom they are addressed. If you are not the intended recipient you should 
> not deliver, distribute or copy this e-mail. Please notify the sender 
> immediately by e-mail and delete this e-mail from your system. Copyright © 
> 2016 by Istuary Innovation Labs, Inc. All rights reserved.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
Email Disclaimer & Confidentiality Notice
This message is confidential and intended solely for the use of the recipient 
to whom they are addressed. If you are not the intended recipient you should 
not deliver, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail and delete this e-mail from your system. Copyright © 2016 
by Istuary Innovation Labs, Inc. All rights reserved.



Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Guillaume Comte
Ok, I will try without creating them myself.

Nevertheless, thanks a lot Christian for your patience; I will try more
clever questions when I'm ready for them.

On 5 Aug 2016 at 02:44, "Christian Balzer"  wrote:

Hello,

On Fri, 5 Aug 2016 02:41:47 +0200 Guillaume Comte wrote:

> Maybe you are mispelling, but in the docs they dont use white space but :
> this is quite misleading if it works
>
I'm quoting/showing "ceph-disk", which is called by ceph-deploy, which
indeed uses a ":".

Christian
> On 5 Aug 2016 at 02:30, "Christian Balzer"  wrote:
>
> >
> > Hello,
> >
> > On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote:
> >
> > > I am reading half your answer
> > >
> > > Do you mean that ceph will create by itself the partitions for the
> > journal?
> > >
> > Yes, "man ceph-disk".
> >
> > > If so its cool and weird...
> > >
> > It can be very weird indeed.
> > If sdc is your data (OSD) disk and sdb your journal device then:
> >
> > "ceph-disk prepare /dev/sdc /dev/sdb1"
> > will not work, but:
> >
> > "ceph-disk prepare /dev/sdc /dev/sdb"
> > will and create a journal partition on sdb.
> > However you have no control over numbering or positioning this way.
> >
> > Christian
> >
> > > On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:
> > >
> > > >
> > > > Hello,
> > > >
> > > > you need to work on your google skills. ^_-
> > > >
> > > > I wrote about his just yesterday and if you search for "ceph-deploy
> > wrong
> > > > permission" the second link is the issue description:
> > > > http://tracker.ceph.com/issues/13833
> > > >
> > > > So I assume your journal partitions are either pre-made or non-GPT.
> > > >
> > > > Christian
> > > >
> > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > With ceph jewel,
> > > > >
> > > > > I'm pretty stuck with
> > > > >
> > > > >
> > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> > > > >
> > > > > Because when i specify a journal path like this:
> > > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > > > > And then:
> > > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > > > > I end up with "wrong permission" on the osd when activating,
> > complaining
> > > > > about "tmp" directory where the files are owned by root, and it
> > seems it
> > > > > tryes to do stuff as ceph user.
> > > > >
> > > > > It works when i don't specify a separate journal
> > > > >
> > > > > Any idea of what i'm doing wrong ?
> > > > >
> > > > > thks
> > > >
> > > >
> > > > --
> > > > Christian BalzerNetwork/Systems Engineer
> > > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > > http://www.gol.com/
> > > >
> >
> >
> > --
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >


--
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Christian Balzer
Hello,

On Fri, 5 Aug 2016 02:41:47 +0200 Guillaume Comte wrote:

> Maybe you are mispelling, but in the docs they dont use white space but :
> this is quite misleading if it works
>
I'm quoting/showing "ceph-disk", which is called by ceph-deploy, which
indeed uses a ":".
 
Christian
> On 5 Aug 2016 at 02:30, "Christian Balzer"  wrote:
> 
> >
> > Hello,
> >
> > On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote:
> >
> > > I am reading half your answer
> > >
> > > Do you mean that ceph will create by itself the partitions for the
> > journal?
> > >
> > Yes, "man ceph-disk".
> >
> > > If so its cool and weird...
> > >
> > It can be very weird indeed.
> > If sdc is your data (OSD) disk and sdb your journal device then:
> >
> > "ceph-disk prepare /dev/sdc /dev/sdb1"
> > will not work, but:
> >
> > "ceph-disk prepare /dev/sdc /dev/sdb"
> > will and create a journal partition on sdb.
> > However you have no control over numbering or positioning this way.
> >
> > Christian
> >
> > > On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:
> > >
> > > >
> > > > Hello,
> > > >
> > > > you need to work on your google skills. ^_-
> > > >
> > > > I wrote about his just yesterday and if you search for "ceph-deploy
> > wrong
> > > > permission" the second link is the issue description:
> > > > http://tracker.ceph.com/issues/13833
> > > >
> > > > So I assume your journal partitions are either pre-made or non-GPT.
> > > >
> > > > Christian
> > > >
> > > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > With ceph jewel,
> > > > >
> > > > > I'm pretty stuck with
> > > > >
> > > > >
> > > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> > > > >
> > > > > Because when i specify a journal path like this:
> > > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > > > > And then:
> > > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > > > > I end up with "wrong permission" on the osd when activating,
> > complaining
> > > > > about "tmp" directory where the files are owned by root, and it
> > seems it
> > > > > tryes to do stuff as ceph user.
> > > > >
> > > > > It works when i don't specify a separate journal
> > > > >
> > > > > Any idea of what i'm doing wrong ?
> > > > >
> > > > > thks
> > > >
> > > >
> > > > --
> > > > Christian BalzerNetwork/Systems Engineer
> > > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > > http://www.gol.com/
> > > >
> >
> >
> > --
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Guillaume Comte
Maybe you are misspelling it, but in the docs they don't use whitespace but ':';
this is quite misleading if it works

On 5 Aug 2016 at 02:30, "Christian Balzer"  wrote:

>
> Hello,
>
> On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote:
>
> > I am reading half your answer
> >
> > Do you mean that ceph will create by itself the partitions for the
> journal?
> >
> Yes, "man ceph-disk".
>
> > If so its cool and weird...
> >
> It can be very weird indeed.
> If sdc is your data (OSD) disk and sdb your journal device then:
>
> "ceph-disk prepare /dev/sdc /dev/sdb1"
> will not work, but:
>
> "ceph-disk prepare /dev/sdc /dev/sdb"
> will and create a journal partition on sdb.
> However you have no control over numbering or positioning this way.
>
> Christian
>
> > On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:
> >
> > >
> > > Hello,
> > >
> > > you need to work on your google skills. ^_-
> > >
> > > I wrote about his just yesterday and if you search for "ceph-deploy
> wrong
> > > permission" the second link is the issue description:
> > > http://tracker.ceph.com/issues/13833
> > >
> > > So I assume your journal partitions are either pre-made or non-GPT.
> > >
> > > Christian
> > >
> > > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
> > >
> > > > Hi All,
> > > >
> > > > With ceph jewel,
> > > >
> > > > I'm pretty stuck with
> > > >
> > > >
> > > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> > > >
> > > > Because when i specify a journal path like this:
> > > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > > > And then:
> > > > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > > > I end up with "wrong permission" on the osd when activating,
> complaining
> > > > about "tmp" directory where the files are owned by root, and it
> seems it
> > > > tryes to do stuff as ceph user.
> > > >
> > > > It works when i don't specify a separate journal
> > > >
> > > > Any idea of what i'm doing wrong ?
> > > >
> > > > thks
> > >
> > >
> > > --
> > > Christian BalzerNetwork/Systems Engineer
> > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
> > >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>


Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Christian Balzer

Hello,

On Fri, 5 Aug 2016 02:11:31 +0200 Guillaume Comte wrote:

> I am reading half your answer
> 
> Do you mean that ceph will create by itself the partitions for the journal?
>
Yes, "man ceph-disk".
 
> If so its cool and weird...
>
It can be very weird indeed.
If sdc is your data (OSD) disk and sdb your journal device then:

"ceph-disk prepare /dev/sdc /dev/sdb1" 
will not work, but:

"ceph-disk prepare /dev/sdc /dev/sdb"
will and create a journal partition on sdb. 
However you have no control over numbering or positioning this way.
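
If you want to pre-create the journal partitions yourself (so numbering and
position stay under your control), a sketch that may keep the ceph-disk/udev
machinery happy is to give them the Ceph journal partition type GUID; the
device and size here are examples:

# 10 GiB journal partition carrying the ceph journal type code
sgdisk --new=1:0:+10G --change-name=1:'ceph journal' \
   --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
ceph-disk prepare /dev/sdc /dev/sdb1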
 
Christian

> On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:
> 
> >
> > Hello,
> >
> > you need to work on your google skills. ^_-
> >
> > I wrote about his just yesterday and if you search for "ceph-deploy wrong
> > permission" the second link is the issue description:
> > http://tracker.ceph.com/issues/13833
> >
> > So I assume your journal partitions are either pre-made or non-GPT.
> >
> > Christian
> >
> > On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
> >
> > > Hi All,
> > >
> > > With ceph jewel,
> > >
> > > I'm pretty stuck with
> > >
> > >
> > > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> > >
> > > Because when i specify a journal path like this:
> > > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > > And then:
> > > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > > I end up with "wrong permission" on the osd when activating, complaining
> > > about "tmp" directory where the files are owned by root, and it seems it
> > > tryes to do stuff as ceph user.
> > >
> > > It works when i don't specify a separate journal
> > >
> > > Any idea of what i'm doing wrong ?
> > >
> > > thks
> >
> >
> > --
> > Christian BalzerNetwork/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Guillaume Comte
I am reading half your answer.

Do you mean that ceph will create the partitions for the journal by itself?

If so, it's cool and weird...

On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:

>
> Hello,
>
> you need to work on your google skills. ^_-
>
> I wrote about his just yesterday and if you search for "ceph-deploy wrong
> permission" the second link is the issue description:
> http://tracker.ceph.com/issues/13833
>
> So I assume your journal partitions are either pre-made or non-GPT.
>
> Christian
>
> On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
>
> > Hi All,
> >
> > With ceph jewel,
> >
> > I'm pretty stuck with
> >
> >
> > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> >
> > Because when i specify a journal path like this:
> > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > And then:
> > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > I end up with "wrong permission" on the osd when activating, complaining
> > about "tmp" directory where the files are owned by root, and it seems it
> > tryes to do stuff as ceph user.
> >
> > It works when i don't specify a separate journal
> >
> > Any idea of what i'm doing wrong ?
> >
> > thks
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>


[ceph-users] Fixing NTFS index in snapshot for new and existing clones

2016-08-04 Thread John Holder

Hello!

I would like some guidance about how to proceed with a problem inside of 
a snap which is used to clone images. My sincere apologies if what I am 
asking isn't possible.


I have snapshot which is used to create clones for guest virtual 
machines. It is a raw object with an NTFS OS contained within it.


My understanding is that when you clone the snap, all children become 
bound to the parent snap via layering.
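
For what it is worth, the parent/child relationships can be inspected (and, if
needed, severed) with something like the following; the pool, image and snapshot
names are made up:

# list the clones bound to the snapshot
rbd children rbd/golden-image@base-snap
# detach a clone from its parent entirely by copying all data into it
rbd flatten rbd/vm-clone-01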


We had a system problem from which I was able to recover almost fully. I 
could go into details, but I figure if I do, the advice will be to 
upgrade past dumpling (I can see you shaking your head :D). Upgrading is in the 
very short term plan; I just want to be sure my cluster is 
as clean as I can make it before I do it.


Recently, new clones and old clones started having a problem with the 
drive inside of Windows. It seems to be an NTFS index issue, which I can 
fix (I've exported it and verified the fix).


So I only have 4 pretty simple questions:

1) Would it be right to assume that if I fix the snapshot NTFS problem, 
that would 'cascade' to all cloned VMs? If not, I'm assuming I have to 
repair all clones individually (which I can script).
2) Am I off-base if I think the problem is in the snapshot? Could it be 
in the source image all along?
3) If there is no relationship with this snap or master image, then am I 
correct to assume that this is an individual problem on each of these 
guests? Or is there a source I should look at?

4) Would upgrading to at least firefly resolve this issue?

I've run many checks on the cluster and the data seems fully accessible 
and correct. No inconsistent pages, everything exports, snaps, can be 
moved. I also have the gdb debugger attached to watch for things which 
may arise in this version of ceph. I'll be upgrading once I find the 
answer to this.


I have also attempted to ensure the parent/child relationship is intact 
at HEAD by rolling back to the snap as mentioned on this mailing list in 
January.


Many thanks for your time!

--
John Holder
Trapp Technology
Developer, Linux, & Mail Operations
Complacency kills innovation, but ambition kills complacency.
Office: 602-443-9145 x2017
On Call Cell: 480-548-3902
Skype: z_jholder
Alt-Email: jhol...@brinkster.net



Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Guillaume Comte
Yeah, you are right.

From what I understand, using a ceph user is a good idea.

But the fact is that it doesn't work.

So I circumvented that by configuring ceph-deploy to use root.

Was that the main goal? I don't think so.

Thanks for your answer

On 5 Aug 2016 at 02:01, "Christian Balzer"  wrote:

>
> Hello,
>
> you need to work on your google skills. ^_-
>
> I wrote about his just yesterday and if you search for "ceph-deploy wrong
> permission" the second link is the issue description:
> http://tracker.ceph.com/issues/13833
>
> So I assume your journal partitions are either pre-made or non-GPT.
>
> Christian
>
> On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:
>
> > Hi All,
> >
> > With ceph jewel,
> >
> > I'm pretty stuck with
> >
> >
> > ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> >
> > Because when i specify a journal path like this:
> > ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> > And then:
> > ceph-deploy osd activate ceph-osd1:sdd:sdf7
> > I end up with "wrong permission" on the osd when activating, complaining
> > about "tmp" directory where the files are owned by root, and it seems it
> > tryes to do stuff as ceph user.
> >
> > It works when i don't specify a separate journal
> >
> > Any idea of what i'm doing wrong ?
> >
> > thks
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>


Re: [ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Christian Balzer

Hello,

you need to work on your google skills. ^_-

I wrote about this just yesterday, and if you search for "ceph-deploy wrong
permission" the second link is the issue description:
http://tracker.ceph.com/issues/13833

So I assume your journal partitions are either pre-made or non-GPT.

Christian

On Thu, 4 Aug 2016 15:34:44 +0200 Guillaume Comte wrote:

> Hi All,
> 
> With ceph jewel,
> 
> I'm pretty stuck with
> 
> 
> ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]
> 
> Because when i specify a journal path like this:
> ceph-deploy osd prepare ceph-osd1:sdd:sdf7
> And then:
> ceph-deploy osd activate ceph-osd1:sdd:sdf7
> I end up with "wrong permission" on the osd when activating, complaining
> about "tmp" directory where the files are owned by root, and it seems it
> tryes to do stuff as ceph user.
> 
> It works when i don't specify a separate journal
> 
> Any idea of what i'm doing wrong ?
> 
> thks


-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Bad performance when two fio write to the same image

2016-08-04 Thread Mark Nelson
If you search through the archives, there have been a couple of other 
people who have run into this as well with Jewel.  With the librbd 
engine, you are much better off using iodepth and/or multiple fio processes 
vs numjobs.  Even pre-Jewel, there were gotchas that might not be 
immediately apparent.  If you, for instance, increase numjobs and do 
sequential reads, after the first job reads some data, it gets cached on 
the OSD, and then all subsequent jobs will re-read the same cached data 
unless you explicitly change the offsets.


I.e., it was probably never a good idea to use numjobs, but now it's really 
apparent that it's not a good idea. :)
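
As a concrete illustration, an fio invocation along those lines might look like
this (a sketch only; the client, pool and image names are made up, and each
additional process would simply target its own image):

fio --name=rbd-test --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=testimg --rw=randwrite --bs=4k --direct=1 --iodepth=64 --numjobs=1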


Mark

On 08/04/2016 03:48 PM, Warren Wang - ISD wrote:

Wow, thanks. I think that's the tidbit of info I needed to explain why
increasing numjobs doesn't (anymore) scale performance as expected.

Warren Wang



On 8/4/16, 7:49 AM, "ceph-users on behalf of Jason Dillaman"
 wrote:


With exclusive-lock, only a single client can have write access to the
image at a time. Therefore, if you are using multiple fio processes
against the same image, they will be passing the lock back and forth
between each other and you can expect bad performance.

If you have a use-case where you really need to share the same image
between multiple concurrent clients, you will need to disable the
exclusive-lock feature (this can be done with the RBD cli on existing
images or by passing "--image-shared" when creating new images).

On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER 
wrote:

Hi,

I think this is because of the exclusive-lock feature enabled by default
since Jewel on rbd images


- Mail original -
De: "Zhiyuan Wang" 
À: "ceph-users" 
Envoyé: Jeudi 4 Août 2016 11:37:04
Objet: [ceph-users] Bad performance when two fio write to the same image



Hi Guys

I am testing the performance of Jewel (10.2.2) with FIO, but found the
performance would drop dramatically when two process write to the same
image.

My environment:

1. Server:

One mon and four OSDs running on the same server.

Intel P3700 400GB SSD which have 4 partitions, and each for one osd
journal (journal size is 10GB)

Inter P3700 400GB SSD which have 4 partitions, and each format to XFS
for one osd data (each data is 90GB)

10GB network

CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)

Memory: 256GB (it is not the bottleneck)

2. Client

10GB network

CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)

Memory: 256GB (it is not the bottleneck)

3. Ceph

Default configuration expect use async messager (have tried simple
messager, got nearly the same result)

10GB image with 256 pg num

Test Case

1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
randwrite

The performance is nearly 60MB/s and IOPS is nearly 15K

Four osd are nearly the same busy

2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
randwrite (write to the same image)

The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each
Terrible

And I found that only one osd is busy, the other three are much more
idle on CPU

And I also run FIO on two clients, the same result

3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd
randwrite (one to image1, one to image2)

The performance is nearly 35MB/s each and IOPS is nearly 8.5K each
Reasonable

Four osd are nearly the same busy





Could someone help to explain the reason of TEST 2



Thanks


Email Disclaimer & Confidentiality Notice

This message is confidential and intended solely for the use of the
recipient to whom they are addressed. If you are not the intended
recipient you should not deliver, distribute or copy this e-mail. Please
notify the sender immediately by e-mail and delete this e-mail from your
system. Copyright © 2016 by Istuary Innovation Labs, Inc. All rights
reserved.







--
Jason


This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***




Re: [ceph-users] Bad performance when two fio write to the same image

2016-08-04 Thread Warren Wang - ISD
Wow, thanks. I think that's the tidbit of info I needed to explain why
increasing numjobs doesn't (anymore) scale performance as expected.

Warren Wang



On 8/4/16, 7:49 AM, "ceph-users on behalf of Jason Dillaman"
 wrote:

>With exclusive-lock, only a single client can have write access to the
>image at a time. Therefore, if you are using multiple fio processes
>against the same image, they will be passing the lock back and forth
>between each other and you can expect bad performance.
>
>If you have a use-case where you really need to share the same image
>between multiple concurrent clients, you will need to disable the
>exclusive-lock feature (this can be done with the RBD cli on existing
>images or by passing "--image-shared" when creating new images).
>
>On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER 
>wrote:
>> Hi,
>>
>> I think this is because of the exclusive-lock feature enabled by default
>> since Jewel on rbd images
>>
>>
>> - Mail original -
>> De: "Zhiyuan Wang" 
>> À: "ceph-users" 
>> Envoyé: Jeudi 4 Août 2016 11:37:04
>> Objet: [ceph-users] Bad performance when two fio write to the same image
>>
>>
>>
>> Hi Guys
>>
>> I am testing the performance of Jewel (10.2.2) with FIO, but found the
>>performance would drop dramatically when two process write to the same
>>image.
>>
>> My environment:
>>
>> 1. Server:
>>
>> One mon and four OSDs running on the same server.
>>
>> Intel P3700 400GB SSD which have 4 partitions, and each for one osd
>>journal (journal size is 10GB)
>>
>> Inter P3700 400GB SSD which have 4 partitions, and each format to XFS
>>for one osd data (each data is 90GB)
>>
>> 10GB network
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>>
>> Memory: 256GB (it is not the bottleneck)
>>
>> 2. Client
>>
>> 10GB network
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>>
>> Memory: 256GB (it is not the bottleneck)
>>
>> 3. Ceph
>>
>> Default configuration expect use async messager (have tried simple
>>messager, got nearly the same result)
>>
>> 10GB image with 256 pg num
>>
>> Test Case
>>
>> 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
>>randwrite
>>
>> The performance is nearly 60MB/s and IOPS is nearly 15K
>>
>> Four osd are nearly the same busy
>>
>> 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd;
>>randwrite (write to the same image)
>>
>> The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each
>>Terrible
>>
>> And I found that only one osd is busy, the other three are much more
>>idle on CPU
>>
>> And I also run FIO on two clients, the same result
>>
>> 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd
>>randwrite (one to image1, one to image2)
>>
>> The performance is nearly 35MB/s each and IOPS is nearly 8.5K each
>>Reasonable
>>
>> Four osd are nearly the same busy
>>
>>
>>
>>
>>
>> Could someone help to explain the reason of TEST 2
>>
>>
>>
>> Thanks
>>
>>
>> Email Disclaimer & Confidentiality Notice
>>
>> This message is confidential and intended solely for the use of the
>>recipient to whom they are addressed. If you are not the intended
>>recipient you should not deliver, distribute or copy this e-mail. Please
>>notify the sender immediately by e-mail and delete this e-mail from your
>>system. Copyright © 2016 by Istuary Innovation Labs, Inc. All rights
>>reserved.
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>-- 
>Jason
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***


Re: [ceph-users] [Troubleshooting] I have a watcher I can't get rid of...

2016-08-04 Thread K.C. Wong
Thank you, Jason.

While I can't find the culprit for the watcher (the watcher never expired,
and survived a reboot. udev, maybe?), blacklisting the host did allow me
to remove the device.

Much appreciated,

-kc

> On Aug 4, 2016, at 4:50 AM, Jason Dillaman  wrote:
> 
> If the client is no longer running the watch should expire within 30
> seconds. If you are still experiencing this issue, you can blacklist
> the mystery client via "ceph osd blacklist add".
> 
> On Wed, Aug 3, 2016 at 6:06 PM, K.C. Wong  wrote:
>> I'm having a hard time removing an RBD that I no longer need.
>> 
>> # rbd rm /
>> 2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not 
>> removing
>> Removing image: 0% complete...failed.
>> rbd: error: image still has watchers
>> This means the image is still open or the client using it crashed. Try again 
>> after closing/unmapping it or waiting 30s for the crashed client to timeout.
>> 
>> So, I use `rbd status` to identify the watcher:
>> 
>> # rbd status /
>> Watchers:
>>watcher=:0/705293879 client.1076985 cookie=1
>> 
>> I log onto that host, and did
>> 
>> # rbd showmapped
>> 
>> which returns nothing
>> 
>> I don't use snapshot and I don't use cloning, so, there shouldn't
>> be any image sharing. I ended up rebooting that host and the
>> watcher is still around, and my problem persist: I can't remove
>> the RBD.
>> 
>> At this point, I'm all out of ideas on how to troubleshoot this
>> problem. I'm running infernalis:
>> 
>> # ceph --version
>> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>> 
>> in my set up, on CentOS 7.2 hosts
>> 
>> # uname -r
>> 3.10.0-327.22.2.el7.x86_64
>> 
>> I appreciate any assistance,
>> 
>> -kc
>> 
>> K.C. Wong
>> kcw...@verseon.com
>> 4096R/B8995EDE  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
>> hkps://hkps.pool.sks-keyservers.net
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> 
> --
> Jason

K.C. Wong
kcw...@verseon.com
4096R/B8995EDE  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
hkps://hkps.pool.sks-keyservers.net





Re: [ceph-users] fast-diff map is always invalid

2016-08-04 Thread Jason Dillaman
Can you run "rbd info vm-208-disk-2@initial.20160729-220225"? You most
likely need to rebuild the object map for that specific snapshot via
"rbd object-map rebuild vm-208-disk-2@initial.20160729-220225".

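If several snapshots are affected, a small loop along these lines can rebuild
all of them (a sketch; it reuses the image name from the report below and
whatever snapshot names "rbd snap ls" returns):

# rebuild the object map of the image itself, then of each of its snapshots
rbd object-map rebuild vm-208-disk-2
for snap in $(rbd snap ls vm-208-disk-2 | awk 'NR>1 {print $2}'); do
    rbd object-map rebuild vm-208-disk-2@$snap
done
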
On Sat, Jul 30, 2016 at 7:17 AM, Christoph Adomeit
 wrote:
> Hi there,
>
> I upgraded my cluster to jewel recently, built object maps for every image and
> recreated all snapshots to use the fast-diff feature for backups.
>
> Unfortunately i am still getting the following error message on rbd du:
>
> root@host:/backups/ceph# rbd du vm-208-disk-2
> warning: fast-diff map is invalid for vm-208-disk-2@initial.20160729-220225. 
> operation may be slow.
>
> What might be wrong ?
>
> root@1host:/backups/ceph# rbd --version
> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>
> root@host:/backups/ceph# rbd info vm-208-disk-2
> rbd image 'vm-208-disk-2':
> size 275 GB in 70400 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.35ea4ac2ae8944a
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff
> flags:
>
> Thanks
>   Christoph
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


[ceph-users] rbd-mirror questions

2016-08-04 Thread Shain Miley

Hello,

I am thinking about setting up a second Ceph cluster in the near future, 
and I was wondering about the current status of rbd-mirror.


1) Is it production ready at this point?

2) Can it be used when you have a cluster with existing data, in order to 
replicate onto a new cluster?


3) We have some rather large rbd images at this point, several in the 
90TB range. Would there be any concern using rbd-mirror given the size 
of our images?


Thanks,

Shain

--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | 
smi...@npr.org | 202.513.3649




[ceph-users] openATTIC 2.0.13 beta has been released

2016-08-04 Thread Lenz Grimmer
Hi all,

FYI, a few days ago, we released openATTIC 2.0.13 beta. On the Ceph
management side, we've made some progress with the cluster and pool
monitoring backend, which lays the foundation for the dashboard that
will display graphs generated from this data. We also added some more
RBD management functionality to the Web UI.

For more details, please see the release announcement here:

https://blog.openattic.org/posts/openattic-2.0.13-beta-has-been-released/

We're still in the early stages of development of the Ceph management
and monitoring functionality, so we're very eager on receiving feedback
and comments.

Thanks!

Lenz





[ceph-users] question about ceph-deploy osd create

2016-08-04 Thread Guillaume Comte
Hi All,

With ceph jewel,

I'm pretty stuck with


ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]

Because when i specify a journal path like this:
ceph-deploy osd prepare ceph-osd1:sdd:sdf7
And then:
ceph-deploy osd activate ceph-osd1:sdd:sdf7
I end up with "wrong permission" on the osd when activating, complaining
about a "tmp" directory where the files are owned by root, and it seems it
tries to do stuff as the ceph user.

It works when i don't specify a separate journal

Any idea of what i'm doing wrong ?

thks
-- 
*Guillaume Comte*
06 25 85 02 02  | guillaume.co...@blade-group.com

90 avenue des Ternes, 75 017 Paris


Re: [ceph-users] Ubuntu 14.04 Striping / RBD / Single Thread Performance

2016-08-04 Thread Jason Dillaman
If you are attempting to use RBD "fancy" striping (e.g. stripe unit !=
object size and stripe count != 1) with krbd, the answer is that it is
still unsupported.
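
A quick way to tell whether an existing image uses such "fancy" striping before
trying to map it with krbd (a sketch; the image name is an example) is to look
at rbd info, which only prints stripe settings when the striping feature is set:

rbd info rbd/testrbd | egrep 'features|stripe'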

On Wed, Aug 3, 2016 at 8:41 AM, w...@globe.de  wrote:
> Hi List,
> i am using Ceph Infernalis and Ubuntu 14.04 Kernel 3.13.
> 18 Data Server / 3 MON / 3 RBD Clients
>
> I want to use RBD on the Client with image format 2 and Striping.
> Is it supported?
>
> I want to create rbd with:
> rbd create testrbd -s 2T --image-format=2 --image-feature=striping
> --image-feature=exclusive-lock --stripe-unit 65536B --stripe-count 8
>
> Do I get better single-thread performance with a higher stripe count?
> If not: should I use Ubuntu 16.04 with kernel 4.4? Is it supported with that
> kernel?
>
> The manpage says:
>
> http://manpages.ubuntu.com/manpages/wily/man8/rbd.8.html
>
> PARAMETERS
>
>--image-format format
>   Specifies which object layout to use. The default is 1.
>
>   · format 2 - Use the second rbd format, which is supported by
> librbd and kernel since version 3.11 (except for striping).
> This adds support for cloning and is more easily extensible to
> allow more features in the future.
>
>
> Regards
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason


Re: [ceph-users] [Troubleshooting] I have a watcher I can't get rid of...

2016-08-04 Thread Jason Dillaman
If the client is no longer running the watch should expire within 30
seconds. If you are still experiencing this issue, you can blacklist
the mystery client via "ceph osd blacklist add".
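
For reference, the sequence might look roughly like this (a sketch; the
pool/image name and the watcher address are placeholders, the real address comes
from the "rbd status" output):

# find the stuck watcher, blacklist its address, then retry the removal
rbd status rbd/myimage
ceph osd blacklist add 192.168.0.10:0/705293879
rbd rm rbd/myimage
# the blacklist entry can be cleared again afterwards
ceph osd blacklist rm 192.168.0.10:0/705293879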

On Wed, Aug 3, 2016 at 6:06 PM, K.C. Wong  wrote:
> I'm having a hard time removing an RBD that I no longer need.
>
> # rbd rm /
> 2016-08-03 15:00:01.085784 7ff9dfc997c0 -1 librbd: image has watchers - not 
> removing
> Removing image: 0% complete...failed.
> rbd: error: image still has watchers
> This means the image is still open or the client using it crashed. Try again 
> after closing/unmapping it or waiting 30s for the crashed client to timeout.
>
> So, I use `rbd status` to identify the watcher:
>
> # rbd status /
> Watchers:
> watcher=:0/705293879 client.1076985 cookie=1
>
> I log onto that host, and did
>
> # rbd showmapped
>
> which returns nothing
>
> I don't use snapshot and I don't use cloning, so, there shouldn't
> be any image sharing. I ended up rebooting that host and the
> watcher is still around, and my problem persist: I can't remove
> the RBD.
>
> At this point, I'm all out of ideas on how to troubleshoot this
> problem. I'm running infernalis:
>
> # ceph --version
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>
> in my set up, on CentOS 7.2 hosts
>
> # uname -r
> 3.10.0-327.22.2.el7.x86_64
>
> I appreciate any assistance,
>
> -kc
>
> K.C. Wong
> kcw...@verseon.com
> 4096R/B8995EDE  E527 CBE8 023E 79EA 8BBB  5C77 23A6 92E9 B899 5EDE
> hkps://hkps.pool.sks-keyservers.net
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason


Re: [ceph-users] Bad performance when two fio write to the same image

2016-08-04 Thread Jason Dillaman
With exclusive-lock, only a single client can have write access to the
image at a time. Therefore, if you are using multiple fio processes
against the same image, they will be passing the lock back and forth
between each other and you can expect bad performance.

If you have a use-case where you really need to share the same image
between multiple concurrent clients, you will need to disable the
exclusive-lock feature (this can be done with the RBD cli on existing
images or by passing "--image-shared" when creating new images).
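
For reference, the corresponding rbd commands look roughly like this (a sketch;
the pool/image names are examples, and object-map/fast-diff depend on
exclusive-lock, so they have to be disabled first if they are enabled):

# existing image: drop the features that need the exclusive lock
rbd feature disable rbd/myimage object-map fast-diff
rbd feature disable rbd/myimage exclusive-lock

# new image intended for concurrent writers from the start
rbd create rbd/shared-image --size 10240 --image-shared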

On Thu, Aug 4, 2016 at 5:52 AM, Alexandre DERUMIER  wrote:
> Hi,
>
> I think this is because of the exclusive-lock feature enabled by default since
> Jewel on rbd images
>
>
> - Mail original -
> De: "Zhiyuan Wang" 
> À: "ceph-users" 
> Envoyé: Jeudi 4 Août 2016 11:37:04
> Objet: [ceph-users] Bad performance when two fio write to the same image
>
>
>
> Hi Guys
>
> I am testing the performance of Jewel (10.2.2) with FIO, but found the 
> performance would drop dramatically when two process write to the same image.
>
> My environment:
>
> 1. Server:
>
> One mon and four OSDs running on the same server.
>
> Intel P3700 400GB SSD which have 4 partitions, and each for one osd journal 
> (journal size is 10GB)
>
> Inter P3700 400GB SSD which have 4 partitions, and each format to XFS for one 
> osd data (each data is 90GB)
>
> 10GB network
>
> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>
> Memory: 256GB (it is not the bottleneck)
>
> 2. Client
>
> 10GB network
>
> CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck)
>
> Memory: 256GB (it is not the bottleneck)
>
> 3. Ceph
>
> Default configuration expect use async messager (have tried simple messager, 
> got nearly the same result)
>
> 10GB image with 256 pg num
>
> Test Case
>
> 1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite
>
> The performance is nearly 60MB/s and IOPS is nearly 15K
>
> Four osd are nearly the same busy
>
> 2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite 
> (write to the same image)
>
> The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible
>
> And I found that only one osd is busy, the other three are much more idle on 
> CPU
>
> And I also run FIO on two clients, the same result
>
> 3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite 
> (one to image1, one to image2)
>
> The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable
>
> Four osd are nearly the same busy
>
>
>
>
>
> Could someone help to explain the reason of TEST 2
>
>
>
> Thanks
>
>
> Email Disclaimer & Confidentiality Notice
>
> This message is confidential and intended solely for the use of the recipient 
> to whom they are addressed. If you are not the intended recipient you should 
> not deliver, distribute or copy this e-mail. Please notify the sender 
> immediately by e-mail and delete this e-mail from your system. Copyright © 
> 2016 by Istuary Innovation Labs, Inc. All rights reserved.
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason


Re: [ceph-users] Small Ceph cluster

2016-08-04 Thread Tom T
I will read your cache thread, thnx.

Now we have the following setup in mind:
X10SRH-CLN4F
E5-2620v4
32GB/64GB RAM
6x 2TB (start with 4 drives)
1x S3710 200GB for Journaling
In the future adding 2 SSD's for caching, or is it an option to use a P3700
400GB (or two) for journaling and caching ?

Kind regards,
Tom


On Mon, Aug 1, 2016 at 2:46 PM, Christian Balzer  wrote:

>
> Hello,
>
> On Mon, 1 Aug 2016 14:34:43 +0200 Tom T wrote:
>
> > Hi Christian,
> >
> > Thnx for your reply.
> >
> > Case:
> > CSE-825TQC-600LPB
> >
> > I made a typo with the CPU, it's a E3-1240 v5 3.5Ghz.
> > So a E5-2620 v4 is recommended when i want to add SSD for caching ?
> >
> It's more a question of core count (and speed per core), an E5-1650 v3 for
> example would do the trick (in all situations) with 4 HDD OSDs with SSD
> journal and 1-2 SSD OSDs.
>
> > With a caching tier, is the data on the caching tier a copy from data on
> > the normal tier ?
> You want to re-read the respective documentation and the various threads
> on this ML about cache tiering, including my
> "Cache tier operation clarifications" one.
>
> In a reasonably busy cluster a cache pool will be very different from the
> base pool and some hot data may never reach the base pool, ever.
>
> Meaning that your cache pool needs to be just as reliable as everything
> else.
>
> > Is a caching tier with one SSD recommended or should i always have two
> SSD
> > in replicated mode ?
> >
> See above.
>
> Christian
> >
> > Kind regards,
> > Tom
> >
> >
> >
> > On Mon, Aug 1, 2016 at 2:00 PM, Christian Balzer  wrote:
> >
> > >
> > > Hello,
> > >
> > > On Mon, 1 Aug 2016 11:09:00 +0200 Tom T wrote:
> > >
> > > > Hi Ceph users
> > > >
> > > > We are planning to setup a small ceph cluster, starting with 3 nodes
> for
> > > > VM's.
> > > > I have some question about CPU and caching
> > > >
> > > > We would like to start with the following config:
> > > >
> > > >
> > > > Supermicro X11SSI-LN4F
> > > In which case?
> > >
> > > > Intel E3-1246 v3 3.5Ghz
> > > A bit dated, but fast enough.
> > >
> > > > 32GB RAM
> > > While enough for 4 OSDs, don't skimp on RAM if you can afford, reads
> will
> > > thank you for it.
> > >
> > > > S3500 80GB M.2 for OS
> > > If you're short on money, maybe use a 535 (or 2!) for that purpose.
> > >
> > > > AOC-S3008L-L8e (LSI SAS3008)
> > > > 4x 2TB ST2000NM0034 SAS12Gb
> > >
> > > I fail to see the need/point for 7.2k RPM HDDs with a mere 128MB of
> cache
> > > hanging of a 12Gb/s bus, but maybe that's just me.
> > >
> > > > 1x Intel 200GB S3710 for journal (via onboard SATA)
> > > Good enough.
> > >
> > > > 4x 1Gb for networking
> > > >
> > > Unless all your clients also are limited to GbE and you have no budget
> to
> > > change that, don't.
> > >
> > > For VM's latency will be one of your biggest nemesis (nemesii?), use
> > > faster (lower latency) networking.
> > >
> > > > Questions:
> > > > Is the CPU enough ?
> > > See above.
> > >
> > > > I would like to run the monitoring deamon on the same host, would
> this
> > > be a
> > > > problem ?
> > > >
> > > Just within the normal usage needs, more RAM in that case anyway.
> > >
> > > > Optionally i would like to add an extra SSD for caching
> > > Not really recommended with that server and not particular helpful with
> > > that network.
> > > A single SSD of any caliber will/can eat one of your CPU cores by
> itself
> > > and then ask for seconds.
> > >
> > > > Does write-back caching also optimize the reads ?
> > > Yes, subject to "correct" configuration of course.
> > >
> > > > Do I need two SSD's per node
> > > >
> > > From a performance point of view, not so much.
> > > Your network can't even saturate one 200GB DC S3710.
> > >
> > > From a redundancy point of view you might be better off with more
> nodes.
> > >
> > > Christian
> > >
> > > >
> > > > Kind regards,
> > > > Tom
> > >
> > >
> > > --
> > > Christian BalzerNetwork/Systems Engineer
> > > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
> > >
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>


Re: [ceph-users] Bad performance when two fio write to the same image

2016-08-04 Thread Alexandre DERUMIER
Hi,

I think this is because of the exclusive-lock feature enabled by default since 
Jewel on rbd images


- Mail original -
De: "Zhiyuan Wang" 
À: "ceph-users" 
Envoyé: Jeudi 4 Août 2016 11:37:04
Objet: [ceph-users] Bad performance when two fio write to the same image



Hi Guys 

I am testing the performance of Jewel (10.2.2) with FIO, but found the 
performance would drop dramatically when two process write to the same image. 

My environment: 

1. Server: 

One mon and four OSDs running on the same server. 

Intel P3700 400GB SSD which have 4 partitions, and each for one osd journal 
(journal size is 10GB) 

Inter P3700 400GB SSD which have 4 partitions, and each format to XFS for one 
osd data (each data is 90GB) 

10GB network 

CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) 

Memory: 256GB (it is not the bottleneck) 

2. Client 

10GB network 

CPU: Intel(R) Xeon(R) CPU E5-2660 (it is not the bottleneck) 

Memory: 256GB (it is not the bottleneck) 

3. Ceph 

Default configuration expect use async messager (have tried simple messager, 
got nearly the same result) 

10GB image with 256 pg num 

Test Case 

1. One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite 

The performance is nearly 60MB/s and IOPS is nearly 15K 

Four osd are nearly the same busy 

2. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite 
(write to the same image) 

The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each Terrible 

And I found that only one osd is busy, the other three are much more idle on 
CPU 

And I also run FIO on two clients, the same result 

3. Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite (one 
to image1, one to image2) 

The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable 

Four osd are nearly the same busy 





Could someone help to explain the reason of TEST 2 



Thanks 


Email Disclaimer & Confidentiality Notice 

This message is confidential and intended solely for the use of the recipient 
to whom they are addressed. If you are not the intended recipient you should 
not deliver, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail and delete this e-mail from your system. Copyright © 2016 
by Istuary Innovation Labs, Inc. All rights reserved. 





[ceph-users] Bad performance when two fio write to the same image

2016-08-04 Thread Zhiyuan Wang
Hi Guys
I am testing the performance of Jewel (10.2.2) with FIO, but found the 
performance would drop dramatically when two processes write to the same image.
My environment:

1.   Server:

One mon and four OSDs running on the same server.

Intel P3700 400GB SSD which has 4 partitions, each for one osd journal 
(journal size is 10GB)

Intel P3700 400GB SSD which has 4 partitions, each formatted to XFS for one 
osd data (each data partition is 90GB)

10GB network

CPU: Intel(R) Xeon(R) CPU E5-2660  (it is not the bottleneck)

Memory: 256GB (it is not the bottleneck)

2.   Client

10GB network

CPU: Intel(R) Xeon(R) CPU E5-2660  (it is not the bottleneck)

Memory: 256GB (it is not the bottleneck)

3.   Ceph

Default configuration except using the async messenger (have tried the simple 
messenger, got nearly the same result)

10GB image with 256 pg num

Test Case

1.   One Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; randwrite

The performance is nearly 60MB/s and IOPS is nearly 15K

Four osd are nearly the same busy

2.   Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd; 
randwrite (write to the same image)

The performance is nearly 4MB/s each, and IOPS is nearly 1.5K each   Terrible

And I found that only one osd is busy, the other three are much more idle on CPU

And I also run FIO on two clients, the same result

3.   Two Fio process: bs 4KB; iodepth 256; direct 1; ioengine rbd randwrite 
(one to image1, one to image2)

The performance is nearly 35MB/s each and IOPS is nearly 8.5K each Reasonable

Four osd are nearly the same busy





Could someone help to explain the reason of TEST 2



Thanks
Email Disclaimer & Confidentiality Notice
This message is confidential and intended solely for the use of the recipient 
to whom they are addressed. If you are not the intended recipient you should 
not deliver, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail and delete this e-mail from your system. Copyright (c) 
2016 by Istuary Innovation Labs, Inc. All rights reserved.



[ceph-users] ceph and SMI-S

2016-08-04 Thread Luis Periquito
Hi all,

I was being asked if CEPH supports the Storage Management Initiative
Specification (SMI-S)? This for the context of monitoring our ceph
clusters/environments.

I've tried looking and find no references to supporting it. But does it?

thanks,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrading a "conservative" [tm] cluster from Hammer to Jewel, a nightmare in the making

2016-08-04 Thread Christian Balzer

Hello,

This is going to have some level of ranting, bear with me as the points
are all valid and poignant.

Backstory:

I currently run 3 Ceph clusters all on Debian Jessie but with SysV init,
as they all predate any systemd supporting Ceph packages. 
- A crappy test one running Hammer, manually deployed (OSDs mounted via
  fstab, a mix of Ext4 and XFS), MBR/DOS partitions.
- Our main production cluster, also with fstab mounted OSDs, all Ext4.
  Extra bonus points for OSDs being entire, unpartitioned disk and journal
  holding SSDs being MBR/DOS partitioned.
- Non critical production cluster installed with ceph-deploy and GPT OSDs.

The latter obviously is going to be the least problematic, though I'm sure
there will be enough entertainment.

Now we just finally got our real staging/testing cluster that actually
resembles the production ones so I was going to give it a few spins before
installing something that equals the production cluster.

First try was Hammer using ceph-deploy. 
Complete fail, due to lack of systemd unit files/targets:
---
[ceph-01][INFO  ] Running command: systemctl enable ceph.target
[ceph-01][WARNIN] Failed to execute operation: No such file or directory
---

Alright, let's try with Jewel.
This blew up in my face when trying to use previously created partitions
(GPT, mind ya), as documented here:
---
http://tracker.ceph.com/issues/13833
---
Incidentally, ceph-deploy once again is trying to be too helpful: when I
gave it "/dev/sda4" as the journal target with the wrong GUID but a chown'ed
dev file, it created things and linked to the partition as stated.

With partitions that have the correct GUID it will make a smarty-pants
link to /dev/disk/by-partuuid/ (good intention!), even when given a
"/dev/disk/by-id/" input. Oh well.

And ceph-deploy of course still activates new OSDs half of the time, the
other half it actually needs the activate step, thanks to udev I'm sure.
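
(For anyone following along: checking, and if need be fixing, the type GUID of a
pre-made journal partition can be done with sgdisk, roughly like this; the
partition number and device are examples.)

sgdisk --info=4 /dev/sda    # prints the current partition type GUID
sgdisk --typecode=4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda    # set the ceph journal type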

Now I'd like to repeat the question in the issue above, at what point
did GPT partitions (and udev magic) become mandatory?
And if it is indeed mandatory, where are the painless and data safe
transition tools?

I'll re-create the main production cluster (that is hammer, sysv-init,
fstab mounted OSDs) on the staging system next and see what blows up
and how violently when trying a Jewel upgrade.

My guess is that it won't be systemd (as Jewel actually has the targets
now), but the inability to deal with a manually deployed environment like
mine.

Expect news about that next week the latest.

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/