Re: [ceph-users] unable to do regionmap update

2017-01-14 Thread Orit Wasserman
On Wed, Jan 11, 2017 at 2:53 PM, Marko Stojanovic  wrote:
>
> Hello all,
>
> I have an issue with radosgw-admin regionmap update. It doesn't update the map.
>
> With zone configured like this:
>
> radosgw-admin zone get
> {
> "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2",
> "name": "oak-1",
> "domain_root": "oak-1.rgw.data.root",
> "control_pool": "oak-1.rgw.control",
> "gc_pool": "oak-1.rgw.gc",
> "log_pool": "oak-1.rgw.log",
> "intent_log_pool": "oak-1.rgw.intent-log",
> "usage_log_pool": "oak-1.rgw.usage",
> "user_keys_pool": "oak-1.rgw.users.keys",
> "user_email_pool": "oak-1.rgw.users.email",
> "user_swift_pool": "oak-1.rgw.users.swift",
> "user_uid_pool": "oak-1.rgw.users.uid",
> "system_key": {
> "access_key": "XX",
> "secret_key": "XX"
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "oak-1.rgw.buckets.index",
> "data_pool": "oak-1.rgw.buckets.data",
> "data_extra_pool": "oak-1.rgw.buckets.non-ec",
> "index_type": 0
> }
> },
> {
> "key": "ssd-placement",
> "val": {
> "index_pool": "oak-1.rgw.buckets.index-ssd",
> "data_pool": "oak-1.rgw.buckets.data-ssd",
> "data_extra_pool": "oak-1.rgw.buckets.non-ec-ssd",
> "index_type": 0
> }
> }
> ],
> "metadata_heap": "oak-1.rgw.meta",
> "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce"
> }
>
> And region
>
> radosgw-admin region get
> {
> "id": "dbec3557-87bb-4460-8546-b59b4fde7e10",
> "name": "oak",
> "api_name": "oak",
> "is_master": "true",
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "fc12ac44-e27e-44e3-9b13-347162d3c1d2",
> "zones": [
> {
> "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2",
> "name": "oak-1",
> "endpoints": [
> "http:\/\/ceph1.oak.vast.com:7480"
> ],
> "log_meta": "true",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [
> "default-placement"
> ]
> },
> {
> "name": "ssd-placement",
> "tags": [
> "ssd-placement"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce"
>
>
>
> When I run radosgw-admin regionmap update, I don't get ssd-placement as a
> placement_target:
>
> {
> "zonegroups": [
> {
> "key": "dbec3557-87bb-4460-8546-b59b4fde7e10",
> "val": {
> "id": "dbec3557-87bb-4460-8546-b59b4fde7e10",
> "name": "oak",
> "api_name": "oak",
> "is_master": "true",
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "fc12ac44-e27e-44e3-9b13-347162d3c1d2",
> "zones": [
> {
> "id": "fc12ac44-e27e-44e3-9b13-347162d3c1d2",
> "name": "oak-1",
> "endpoints": [
> "http:\/\/ceph1.oak.vast.com:7480"
> ],
> "log_meta": "true",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "67e26f6b-4774-4b14-9668-a5cf76b9e9ce"
> }
> }
> ],
> "master_zonegroup": "dbec3557-87bb-4460-8546-b59b4fde7e10",
> "bucket_quota": {
> "enabled": false,
> "max_size_kb": -1,
> "max_objects": -1
> },
> "user_quota": {
> "enabled": false,
> "max_size_kb": -1,
> "max_objects": -1
> }
> }
>
> Ceph version is:
> ceph --version
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> Any advice?
>

First, I recommend using the zonegroup commands on Jewel, as region was renamed to zonegroup.
How did you create/update the zones and zonegroup?
Did you execute a period update?
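
For reference, the Jewel-style check would look roughly like this (zonegroup,
zone and placement names are taken from your output above; this is only a
sketch, so please verify the exact sub-commands against radosgw-admin --help
on 10.2.5):

radosgw-admin zonegroup get --rgw-zonegroup=oak   # zonegroup view of 'region get', should list ssd-placement
radosgw-admin zone get --rgw-zone=oak-1           # zone view, should list the ssd placement pools
radosgw-admin period update --commit              # commit the period so the zonegroup map is regenerated
radosgw-admin period get                          # verify ssd-placement now appears under placement_targets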

Orit

>
> Thanks in advance
>
> Marko Stojanovic
>
>
>

[ceph-users] Re: Re: Re: Pipe "deadlock" in Hammer, 0.94.5

2017-01-14 Thread 许雪寒
Thanks for your help :-)

I checked the source code again, and in read_message it does hold
Connection::lock:

 while (left > 0) {
   // wait for data
   if (tcp_read_wait() < 0)
     goto out_dethrottle;

   // get a buffer
   connection_state->lock.Lock();
   map<ceph_tid_t, pair<bufferlist, int> >::iterator p =
     connection_state->rx_buffers.find(header.tid);
   if (p != connection_state->rx_buffers.end()) {
     if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
       ldout(msgr->cct,10) << "reader seleting rx buffer v "
                           << p->second.second << " at offset " << offset
                           << " len " << p->second.first.length() << dendl;
       rxbuf = p->second.first;
       rxbuf_version = p->second.second;
       // make sure it's big enough
       if (rxbuf.length() < data_len)
         rxbuf.push_back(buffer::create(data_len - rxbuf.length()));
       blp = p->second.first.begin();
       blp.advance(offset);
     }
   } else {
     if (!newbuf.length()) {
       ldout(msgr->cct,20) << "reader allocating new rx buffer at offset "
                           << offset << dendl;
       alloc_aligned_buffer(newbuf, data_len, data_off);
       blp = newbuf.begin();
       blp.advance(offset);
     }
   }
   bufferptr bp = blp.get_current_ptr();
   int read = MIN(bp.length(), left);
   ldout(msgr->cct,20) << "reader reading nonblocking into "
                       << (void*) bp.c_str() << " len " << bp.length() << dendl;
   int got = tcp_read_nonblocking(bp.c_str(), read);
   ldout(msgr->cct,30) << "reader read " << got << " of " << read << dendl;
   connection_state->lock.Unlock();
   if (got < 0)
     goto out_dethrottle;
   if (got > 0) {
     blp.advance(got);
     data.append(bp, 0, got);
     offset += got;
     left -= got;
   } // else we got a signal or something; just loop.
 }

As shown in the code above, in the reading loop it first locks
connection_state->lock and then does tcp_read_nonblocking. connection_state is
of type PipeConnectionRef, and connection_state->lock is Connection::lock.

On the other hand, I'll check whether there are a lot of messages to send, as
you suggested. Thanks :-)
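
(For what it's worth, a couple of admin-socket checks on the affected OSD can
show whether ops/messages are piling up; osd.0 below is only an example id and
the commands must be run on the node hosting that OSD:)

ceph daemon osd.0 dump_ops_in_flight     # ops currently queued or being processed
ceph daemon osd.0 dump_historic_ops      # recent slow ops with their per-step timelines
ceph daemon osd.0 perf dump              # counters, including messenger/throttle stats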



From: Gregory Farnum [gfar...@redhat.com]
Sent: 14 January 2017 9:39
To: 许雪寒
Cc: jiajia zhong; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Re: Re: Pipe "deadlock" in Hammer, 0.94.5

On Thu, Jan 12, 2017 at 7:58 PM, 许雪寒 wrote:

Thank you for your continuous help :-)

We are using the Hammer 0.94.5 version, and that is the version of the source
code I read.

However, if Pipe::do_recv does act as blocked, is it reasonable for
Pipe::reader_thread to block threads calling SimpleMessenger::submit_message
by holding Connection::lock?

I think maybe a different mutex should be used in Pipe::read_message rather
than Connection::lock.

I don't think it does use that lock. Pipe::read_message() is generally called
while the pipe_lock is held, but not Connection::lock. (They are separate.)

I haven't dug into the relevant OSD code in a while, but I think it's a lot
more likely your OSD is just overloaded and is taking a while to send a lot of
different messages, and that the loop

Re: [ceph-users] Mixing disks

2017-01-14 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marc 
> Roos
> Sent: 14 January 2017 12:56
> To: ceph-users 
> Subject: [ceph-users] Mixing disks
> 
> 
> For a test cluster, we would like to use some 5400rpm and 7200rpm drives. Is it
> advisable to customize the configuration as described on this page, or is the
> speed difference too small, and should this only be done when adding SSDs to
> the same OSD node?

I wouldn't add two different disk types to the same pool; it will likely just
bring the speed of the pool down to that of the slowest disks. You could either
use them as two different pools to provide different tiers of storage, or use
tiering to allow automatic data migration between them.
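
If you do keep them separate, a very rough sketch of the two-pool approach with
separate CRUSH roots could look like the following (bucket, rule and pool names
are made up for illustration, and this assumes the slow and fast disks live in
different hosts; if both types share a host you need the split-host trick from
the blog post linked below):

ceph osd crush add-bucket slow-root root
ceph osd crush add-bucket fast-root root
ceph osd crush move node1-slow root=slow-root          # host bucket holding the 5400rpm OSDs
ceph osd crush move node1-fast root=fast-root          # host bucket holding the 7200rpm OSDs
ceph osd crush rule create-simple slow-rule slow-root host
ceph osd crush rule create-simple fast-rule fast-root host
ceph osd pool create slow-pool 256 256 replicated slow-rule
ceph osd pool create fast-pool 256 256 replicated fast-rule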

Although I would have to question the logic of what you are trying to achieve:
the speed/cost difference between the two types isn't that great, and I'm not
sure it's worth the hassle.

> 
> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/


[ceph-users] Mixing disks

2017-01-14 Thread Marc Roos
 
For a test cluster, we would like to use some 5400rpm and 7200rpm drives. Is it
advisable to customize the configuration as described on this page, or is the
speed difference too small, and should this only be done when adding SSDs to
the same OSD node?

https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/


Re: [ceph-users] Change Partition Schema on OSD Possible?

2017-01-14 Thread Wido den Hollander

> On 14 January 2017 at 11:05, Hauke Homburg wrote:
> 
> 
> Hello,
> 
> In our Ceph cluster, the HDDs in the OSDs are configured with GPT partitions
> that use only 50% of the disk for data. Can we change this schema to get more data storage?
> 

How do you mean?

> Our HDDs are 5TB, so I hope to get more space when I grow the GPT partition
> from 2TB to 3 or 4TB.
> 

On a 5TB disk only 50% is used for data? What is the other 50% being used for?

> Can we modify the partitions without reinstalling the server?
> 

Sure! Just like changing any other GPT partition. Don't forget to resize XFS 
afterwards with xfs_growfs.

However, test this on one OSD/disk first before doing it on all.
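
As a rough outline for a single OSD (device, partition number and OSD id below
are examples only, and this assumes the free space sits directly behind the
OSD's data partition):

ceph osd set noout                      # keep data from rebalancing while the OSD is down
systemctl stop ceph-osd@3
umount /var/lib/ceph/osd/ceph-3
parted /dev/sdb resizepart 1 100%       # grow the data partition into the free space
partprobe /dev/sdb                      # re-read the partition table
mount /dev/sdb1 /var/lib/ceph/osd/ceph-3
xfs_growfs /var/lib/ceph/osd/ceph-3     # XFS can be grown while mounted
systemctl start ceph-osd@3
ceph osd unset noout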

Wido

> What's the best way to do this? Boot the node with a rescue CD, change the
> partition with gparted, and boot the server again?
> 
> Thanks for help
> 
> Regards
> 
> Hauke
> 
> -- 
> www.w3-creative.de
> 
> www.westchat.de
> 


[ceph-users] Change Partition Schema on OSD Possible?

2017-01-14 Thread Hauke Homburg
Hello,

In our Ceph cluster, the HDDs in the OSDs are configured with GPT partitions
that use only 50% of the disk for data. Can we change this schema to get more data storage?

Our HDDs are 5TB, so I hope to get more space when I grow the GPT partition
from 2TB to 3 or 4TB.

Can we modify the partitions without reinstalling the server?

What's the best way to do this? Boot the node with a rescue CD, change the
partition with gparted, and boot the server again?

Thanks for help

Regards

Hauke

-- 
www.w3-creative.de

www.westchat.de



Re: [ceph-users] All SSD cluster performance

2017-01-14 Thread Wido den Hollander

> On 14 January 2017 at 6:41, Christian Balzer wrote:
> 
> 
> 
> Hello,
> 
> On Fri, 13 Jan 2017 13:18:35 -0500 Mohammed Naser wrote:
> 
> > These Intel SSDs are more than capable of handling the workload; in
> > addition, this cluster is used as an RBD backend for an OpenStack cluster.
> >
> 
> I haven't tested the S3520s yet; being Intel's first 3D NAND offering, they
> are slightly slower than their predecessors in terms of BW and IOPS, but
> supposedly have a slightly lower latency if the specs are to be believed.
> 
> Given the history of Intel DC S SSDs I think it is safe to assume that they
> use the same/similar controller setup as their predecessors, meaning a
> large powercap backed cache which enables them to deal correctly and
> quickly with SYNC and DIRECT writes. 
> 
> What would worry me slightly more (even at their 960GB size) is the endurance
> of 1 DWPD, which with journals inline comes down to 0.5, and with FS overhead
> and write amplification (it depends a lot on the type of operations) you're
> looking at something around 0.3 DWPD to base your expectations on.
> Mind, that still leaves you with about 9.6TB per day, which is a decent
> enough number, but it only equates to about 112MB/s.
> 
> Finally, most people start by looking at bandwidth/throughput, only to
> discover ultimately that it was IOPS they needed first and foremost.

Yes! Bandwidth isn't what people usually need; they need IOPS and low latency.

I see a lot of clusters doing 10k ~ 20k IOPS with somewhere around 1Gbit/s of
traffic.
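
As a rough illustration (pool name and parameters are just examples), a
small-block rados bench run shows IOPS and latency instead of 4M throughput:

rados bench -p testbench 30 write -b 4096 -t 64 --no-cleanup   # 4K writes, 64 in flight
rados bench -p testbench 30 rand -t 64                         # random reads of the objects written above
rados -p testbench cleanup                                     # remove the benchmark objects afterwards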

Wido

> 
> Christian
> 
> > Sent from my iPhone
> > 
> > > On Jan 13, 2017, at 1:08 PM, Somnath Roy  wrote:
> > > 
> > > Also, there is a lot of discussion in the community about SSDs not being
> > > suitable for the Ceph write workload (with filestore), as those are not good
> > > for O_DIRECT/O_DSYNC kinds of writes. Hope your SSDs are tolerant of that.
> > > 
> > > -Original Message-
> > > From: Somnath Roy
> > > Sent: Friday, January 13, 2017 10:06 AM
> > > To: 'Mohammed Naser'; Wido den Hollander
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: RE: [ceph-users] All SSD cluster performance
> > > 
> > > << Both OSDs are pinned to two cores on the system
> > > 
> > > Is there any reason you are pinning OSDs like that? I would say for an object
> > > workload there is no need to pin OSDs.
> > > The configuration you mentioned, Ceph with 4M object PUTs, should be
> > > saturating your network first.
> > > 
> > > Have you run, say, a 4M object GET to see what BW you are getting?
> > > 
> > > Thanks & Regards
> > > Somnath
> > > 
> > > -Original Message-
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> > > Mohammed Naser
> > > Sent: Friday, January 13, 2017 9:51 AM
> > > To: Wido den Hollander
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] All SSD cluster performance
> > > 
> > > 
> > >> On Jan 13, 2017, at 12:41 PM, Wido den Hollander  wrote:
> > >> 
> > >> 
> > >>> On 13 January 2017 at 18:39, Mohammed Naser wrote:
> > >>> 
> > >>> 
> > >>> 
> >  On Jan 13, 2017, at 12:37 PM, Wido den Hollander  wrote:
> >  
> >  
> > > On 13 January 2017 at 18:18, Mohammed Naser wrote:
> > > 
> > > 
> > > Hi everyone,
> > > 
> > > We have a deployment with 90 OSDs at the moment, all SSD, which in my
> > > opinion is not quite hitting the performance it should. A `rados bench`
> > > run gives something along these numbers:
> > > 
> > > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
> > > Object prefix: benchmark_data_bench.vexxhost._30340
> > >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
> > >     0       0         0         0         0         0            -          0
> > >     1      16       158       142   568.513       568    0.0965336  0.0939971
> > >     2      16       287       271   542.191       516    0.0291494   0.107503
> > >     3      16       375       359    478.75       352    0.0892724   0.118463
> > >     4      16       477       461   461.042       408    0.0243493   0.126649
> > >     5      16       540       524   419.216       252     0.239123   0.132195
> > >     6      16       644       628    418.67       416     0.347606   0.146832
> > >     7      16       734       718   410.281       360    0.0534447   0.147413
> > >     8      16       811       795   397.487       308    0.0311927    0.15004
> > >     9      16       879       863   383.537       272    0.0894534   0.158513
> > >    10      16       980       964   385.578       404    0.0969865   0.162121
> > >    11