Re: [ceph-users] S3 Bucket usage up 150% diference between rgw-admin and external metering tools.

2020-01-21 Thread EDH - Manuel Rios
Hi Cbodley,

As you requested on IRC, we tested directly with the AWS CLI.

Results:
aws --endpoint=http://XX --profile=ceph s3api list-multipart-uploads 
--bucket Evol6

It reports nearly 170 uploads.

We used the last one:
{
"Initiator": {
"DisplayName": "x",
"ID": "xx"
},
"Initiated": "2019-12-03T01:23:06.007Z",
"UploadId": "2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU",
"StorageClass": "STANDARD",
"Key": 
"MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard
 disk 1$/20191203010516/431.cbrevision",
"Owner": {
"DisplayName": "x",
"ID": ""
}
}

aws --endpoint=http://x --profile=ceph s3api abort-multipart-upload 
--bucket Evol6 --key 
'MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard
 disk 1$/20191203010516/431.cbrevision' --upload-id 
2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU

Return: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload 
operation: Unknown

The same error is reported by S3CMD.
Maybe there is something wrong with how the "1$" inside the key is parsed.
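
In case it's useful to anyone else, this is roughly the loop we use to abort every
incomplete multipart upload the S3 API will let us abort (RGW-ENDPOINT is a placeholder;
the key is passed through a quoted shell variable so the literal "$" in it survives).
The orphaned entries described above still return NoSuchUpload, but anything abortable
gets cleaned up:

aws --endpoint=http://RGW-ENDPOINT --profile=ceph s3api list-multipart-uploads \
    --bucket Evol6 --query 'Uploads[].[UploadId,Key]' --output text |
while IFS=$'\t' read -r uploadid key; do
    # keys may contain spaces and "$"; the variable is always quoted
    aws --endpoint=http://RGW-ENDPOINT --profile=ceph s3api abort-multipart-upload \
        --bucket Evol6 --key "$key" --upload-id "$uploadid"
done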

Best Regards,
Manuel

-----Original Message-----
From: ceph-users  On Behalf Of EDH - Manuel Rios
Sent: Tuesday, 21 January 2020 20:09
To: Robin H. Johnson 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] S3 Bucket usage up 150% diference between rgw-admin and external metering tools.

Re: [ceph-users] S3 Bucket usage up 150% diference between rgw-admin and external metering tools.

2020-01-21 Thread EDH - Manuel Rios
Hi Robin,

- What are the external tools? CloudBerry S3 Explorer and S3 Browser.
- How many objects do the external tools report as existing? The tools report 72142
keys (approx. 6 TB) vs. Ceph num_objects 180981 (9 TB).
- Do the external tools include incomplete multipart uploads in their size
data? I don't think any external tool includes incomplete uploads in the size,
because a recursive S3 API listing doesn't return them.
Checking for incomplete multiparts, I got a 404 NoSuchKey response.
- If bucket versioning is enabled, do the tools include all versions in the
  size data? Versioning is not enabled.
- Are there leftover multipart pieces without a multipart head? How can we
check that? (See the sketch below.)
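
A hedged sketch of how one might look for leftovers (bucket name is the example from
this thread; the rados listing walks the whole data pool, so it is slow on a big cluster):

radosgw-admin bucket check --bucket=Evol6                        # index vs. objects consistency
radosgw-admin bucket check --bucket=Evol6 --check-objects --fix  # use with care, it rewrites index entries
rados -p default.rgw.buckets.data ls | grep -c multipart         # rough count of multipart part objects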

Specific bucket information:
{
"bucket": "XX",
"tenant": "",
"zonegroup": "4d8c7c5f-ca40-4ee3-b5bb-b2cad90bd007",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "default.rgw.buckets.data",
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_pool": "default.rgw.buckets.index"
},
"id": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.132873679.2",
"marker": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52",
"index_type": "Normal",
"owner": "XXX",
"ver": "0#89789,1#60165,2#80652,3#76367",
"master_ver": "0#0,1#0,2#0,3#0",
"mtime": "2020-01-05 19:29:59.360574Z",
"max_marker": "0#,1#,2#,3#",
"usage": {
"rgw.main": {
"size": 9050249319344,
"size_actual": 9050421526528,
"size_utilized": 9050249319344,
"size_kb": 8838134101,
"size_kb_actual": 8838302272,
"size_kb_utilized": 8838134101,
"num_objects": 180981
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 3861,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 4,
"num_objects": 143
        }
},
    "bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1024,
"max_size_kb": 0,
"max_objects": -1
}
}

-----Original Message-----
From: ceph-users  On Behalf Of Robin H. Johnson
Sent: Tuesday, 21 January 2020 18:58
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] S3 Bucket usage up 150% diference between rgw-admin and external metering tools.

On Mon, Jan 20, 2020 at 12:57:51PM +, EDH - Manuel Rios wrote:
> Hi Cephs
> 
> Several nodes of our Ceph 14.2.5 are fully dedicated to host cold storage / 
> backups information.
> 
> Today checking the data usage with a customer found that rgw-admin is 
> reporting:
...
> That's near 5TB used space in CEPH, and the external tools are reporting just 
> 1.42TB.
- What are the external tools?
- How many objects do the external tools report as existing?
- Do the external tools include incomplete multipart uploads in their
  size data?
- If bucket versioning is enabled, do the tools include all versions in the
  size data?
- Are there leftover multipart pieces without a multipart head?  (this
  is a Ceph bug that I think is fixed in your release, but old pieces
  might still exist).

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


[ceph-users] S3 Bucket usage up 150% diference between rgw-admin and external metering tools.

2020-01-20 Thread EDH - Manuel Rios
Hi Cephs

Several nodes of our Ceph 14.2.5 cluster are fully dedicated to hosting cold storage /
backup data.

Today, while checking data usage with a customer, we found that radosgw-admin reports:

{
"bucket": "XX",
"tenant": "",
"zonegroup": "4d8c7c5f-ca40-4ee3-b5bb-b2cad90bd007",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "default.rgw.buckets.data",
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_pool": "default.rgw.buckets.index"
},
"id": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15946848.1",
"marker": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18",
"index_type": "Normal",
"owner": "",
"ver": "0#410482,1#441516,2#401062,3#371595",
"master_ver": "0#0,1#0,2#0,3#0",
"mtime": "2019-06-08 00:26:06.266567Z",
"max_marker": "0#,1#,2#,3#",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
},
"rgw.main": {
"size": 5118399148914,
"size_actual": 5118401548288,
"size_utilized": 5118399148914,
"size_kb": 4998436669,
"size_kb_actual": 4998439012,
"size_kb_utilized": 4998436669,
"num_objects": 293083
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 378,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 1,
"num_objects": 1688
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1024,
"max_size_kb": 0,
"max_objects": -1
}
}

That's nearly 5 TB of used space according to Ceph, while the external tools report
just 1.42 TB.
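
For anyone who wants to reproduce the comparison from the command line, this is roughly
what we run (RGW-ENDPOINT is a placeholder; the bucket name is anonymized as above):

radosgw-admin bucket stats --bucket=XXXXX | grep -A 9 '"rgw.main"'     # what Ceph accounts
aws --endpoint=http://RGW-ENDPOINT --profile=ceph s3 ls s3://XXXXX \
    --recursive --summarize | tail -n 2                                # what an S3 listing sees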

In this case the difference is more than 300%. As the platform is billed by usage, this
causes problems with our customers.

Our setup doesn't use EC pools; everything is replicated. All nodes run 14.2.5, with 6
SSDs fully dedicated to the RGW index.

There is no error in the RGW logs, or anything else, that could explain this huge
difference.

The overall magnitude in our case: the customer reports using roughly 70-80 TB across
multiple buckets, but our Ceph reports 163 TB.

I'm planning to move all the customer data out to a NAS, clean up this bucket/space and
re-upload it, but that process is not very transparent or smooth for the customer.

Suggestions accepted.

Regards
Manuel




[ceph-users] [db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting

2020-01-02 Thread EDH - Manuel Rios
Hi,

Today, checking our monitor logs, we see that a RocksDB compaction is triggered every
minute.

Is that normal?

2020-01-02 14:08:33.091 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:08:33.091 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:08:33.091 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:08:33.091 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:08:33.091 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting

2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
2020-01-02 14:09:15.193 7f2b8acbe700  4 rocksdb: 
[db/db_impl_compaction_flush.cc:1403] [default] Manual compaction starting
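
A couple of hedged checks we are using to understand it (option names and paths may
differ per setup):

ceph daemon mon.$(hostname -s) config get mon_compact_on_trim   # mons compact RocksDB on trim when true
du -sh /var/lib/ceph/mon/*/store.db                             # is the store actually growing between compactions?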

Best Regards
Manuel



Re: [ceph-users] Consumer-grade SSD in Ceph

2019-12-27 Thread EDH - Manuel Rios
Micron 9300

Get Outlook for Android


From: ceph-users  on behalf of Sinan Polat 

Sent: Friday, December 27, 2019 12:35:44 PM
To: Eneko Lacunza 
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] Consumer-grade SSD in Ceph

Thanks for all the replies. In summary; consumer grade SSD is a no go.

What is an alternative to the SM863a? It is quite hard to get these since they are
often out of stock.

Thanks!
Sinan

> On 23 Dec 2019, at 08:50, Eneko Lacunza wrote:
>
> Hi Sinan,
>
> Just to reiterate: don't do this. Consumer SSDs will destroy your enterprise 
> SSD's performance.
>
> Our office cluster is made of consumer-grade servers: cheap gaming 
> motherboards, memory, ryzen processors, desktop HDDs. But SSD drives are 
> Enterprise; we had awful experiences with consumer SSDs (some perform worse
> than HDDs with Ceph).
>
> Cheers
> Eneko
>
>> On 19/12/19 at 20:20, Sinan Polat wrote:
>> Hi all,
>>
>> Thanks for the replies. I am not worried about their lifetime. We will be 
>> adding only 1 SSD disk per physical server. All SSD’s are enterprise drives. 
>> If the added consumer grade disk will fail, no problem.
>>
>> I am more curious regarding their I/O performance. I do not want a 50%
>> drop in performance.
>>
>> So anyone any experience with 860 EVO or Crucial MX500 in a Ceph setup?
>>
>> Thanks!
>>
>>> On 19 Dec 2019, at 19:18, Mark Nelson wrote:
>>>
>>> The way I try to look at this is:
>>>
>>>
>>> 1) How much more do the enterprise grade drives cost?
>>>
>>> 2) What are the benefits? (Faster performance, longer life, etc)
>>>
>>> 3) How much does it cost to deal with downtime, diagnose issues, and 
>>> replace malfunctioning hardware?
>>>
>>>
>>> My personal take is that enterprise drives are usually worth it. There may 
>>> be consumer grade drives that may be worth considering in very specific 
>>> scenarios if they still have power loss protection and high write 
>>> durability.  Even when I was in academia years ago with very limited 
>>> budgets, we got burned with consumer grade SSDs to the point where we had 
>>> to replace them all.  You have to be very careful and know exactly what you 
>>> are buying.
>>>
>>>
>>> Mark
>>>
>>>
 On 12/19/19 12:04 PM, jes...@krogh.cc wrote:
 I don't think “usually” is good enough in a production setup.



 Sent from myMail for iOS


 Thursday, 19 December 2019, 12.09 +0100 from Виталий Филиппов 
 :

Usually it doesn't, it only harms performance and probably SSD
lifetime
too

> I would not be running Ceph on SSDs without power-loss protection. It
> creates a potential data loss scenario.


>
>
> --
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943569206
> Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
>


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread EDH - Manuel Rios Fernandez
Hi,

To solve the issue:

rbd map pool/disk_id, then mount the root volume on a Linux machine (a Ceph node is
fine). This replays the filesystem journal and discards the pending changes left in the
OpenStack nodes' cache. Then unmount and rbd unmap. Boot the instance from OpenStack
again and voila, it works.

For Windows instances you must use ntfsfix on a Linux machine, with the same rbd
commands.
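
A minimal sketch of the sequence (pool/image names and the partition number are examples
only):

rbd map rbd_hdd/volume-1234                 # exposes the image as /dev/rbdX
mkdir -p /mnt/recover
mount /dev/rbd0p1 /mnt/recover              # mounting an ext4/xfs partition replays its journal
umount /mnt/recover
# for a Windows guest, instead of mount/umount:
#   ntfsfix /dev/rbd0p1
rbd unmap /dev/rbd0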

Regards,
Manuel




-----Original Message-----
From: ceph-users  On Behalf Of Simon Ironside
Sent: Friday, 15 November 2019 14:28
To: ceph-users 
Subject: Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

Hi Florian,

On 15/11/2019 12:32, Florian Haas wrote:

> I received this off-list but then subsequently saw this message pop up 
> in the list archive, so I hope it's OK to reply on-list?

Of course, I just clicked the wrong reply button the first time.

> So that cap was indeed missing, thanks for the hint! However, I am 
> still trying to understand how this is related to the issue we saw.

I had exactly the same happen to me as happened to you a week or so ago. 
Compute node lost power and once restored the VMs would start booting but
fail early on when they tried to write.

My key was also missing that cap, adding it and resetting the affected VMs
was the only action I took to sort things out. I didn't need to go around
removing locks by hand as you did. As you say, waiting 30 seconds didn't do
any good so it doesn't appear to be a watcher thing.

This was mentioned in the release notes for Luminous[1], I'd missed it too
as I redeployed Nautilus instead and skipped these steps:



Verify that all RBD client users have sufficient caps to blacklist other
client users. RBD client users with only "allow r" monitor caps should be
updated as follows:

# ceph auth caps client.<id> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'



Simon

[1]
https://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken


Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread EDH - Manuel Rios Fernandez
The balancer only balances while the cluster is healthy.

The problem is that data is not balanced on its first write, which leads to data being
improperly balanced across OSDs.

This problem only happens in Ceph; we see the same with 14.2.2 and have to change the
weights manually, because the balancer is a passive element of the cluster.

I hope a future version gets a more aggressive balancer, like enterprise storage arrays
that allow filling up to 95% of raw capacity.
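
For reference, these are the commands we end up running by hand in this situation
(osd.10 and 0.8 are just the example values from the thread below):

ceph balancer status
ceph balancer mode upmap
ceph balancer on
ceph osd df tree               # find the most and least full OSDs
ceph osd reweight osd.10 0.8   # temporary override, reverted once things are balanced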

Regards


-----Original Message-----
From: ceph-users  On Behalf Of Simon Oosthoek
Sent: Monday, 26 August 2019 11:52
To: Dan van der Ster 
Cc: ceph-users 
Subject: Re: [ceph-users] cephfs full, 2/3 Raw capacity used

On 26-08-19 11:37, Dan van der Ster wrote:
> Thanks. The version and balancer config look good.
> 
> So you can try `ceph osd reweight osd.10 0.8` to see if it helps to 
> get you out of this.

I've done this and the next fullest 3 osds. This will take some time to
recover, I'll let you know when it's done.

Thanks,

/simon

> 
> -- dan
> 
> On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek 
>  wrote:
>>
>> On 26-08-19 11:16, Dan van der Ster wrote:
>>> Hi,
>>>
>>> Which version of ceph are you using? Which balancer mode?
>>
>> Nautilus (14.2.2), balancer is in upmap mode.
>>
>>> The balancer score isn't a percent-error or anything humanly usable.
>>> `ceph osd df tree` can better show you exactly which osds are 
>>> over/under utilized and by how much.
>>>
>>
>> Aha, I ran this and sorted on the %full column:
>>
>>81   hdd   10.81149  1.0  11 TiB 5.2 TiB 5.1 TiB   4 KiB  14 GiB
>> 5.6 TiB 48.40 0.73  96 up osd.81
>>48   hdd   10.81149  1.0  11 TiB 5.3 TiB 5.2 TiB  15 KiB  14 GiB
>> 5.5 TiB 49.08 0.74  95 up osd.48
>> 154   hdd   10.81149  1.0  11 TiB 5.5 TiB 5.4 TiB 2.6 GiB  15 GiB
>> 5.3 TiB 50.95 0.76  96 up osd.154
>> 129   hdd   10.81149  1.0  11 TiB 5.5 TiB 5.4 TiB 5.1 GiB  16 GiB
>> 5.3 TiB 51.33 0.77  96 up osd.129
>>42   hdd   10.81149  1.0  11 TiB 5.6 TiB 5.5 TiB 2.6 GiB  14 GiB
>> 5.2 TiB 51.81 0.78  96 up osd.42
>> 122   hdd   10.81149  1.0  11 TiB 5.7 TiB 5.6 TiB  16 KiB  14 GiB
>> 5.1 TiB 52.47 0.79  96 up osd.122
>> 120   hdd   10.81149  1.0  11 TiB 5.7 TiB 5.6 TiB 2.6 GiB  15 GiB
>> 5.1 TiB 52.92 0.79  95 up osd.120
>>96   hdd   10.81149  1.0  11 TiB 5.8 TiB 5.7 TiB 2.6 GiB  15 GiB
>> 5.0 TiB 53.58 0.80  96 up osd.96
>>26   hdd   10.81149  1.0  11 TiB 5.8 TiB 5.7 TiB  20 KiB  15 GiB
>> 5.0 TiB 53.68 0.80  97 up osd.26
>> ...
>> 6   hdd   10.81149  1.0  11 TiB 8.3 TiB 8.2 TiB  88 KiB  18 GiB
>> 2.5 TiB 77.14 1.16  96 up osd.6
>>16   hdd   10.81149  1.0  11 TiB 8.4 TiB 8.3 TiB  28 KiB  18 GiB
>> 2.4 TiB 77.56 1.16  95 up osd.16
>> 0   hdd   10.81149  1.0  11 TiB 8.6 TiB 8.4 TiB  48 KiB  17 GiB
>> 2.2 TiB 79.24 1.19  96 up osd.0
>> 144   hdd   10.81149  1.0  11 TiB 8.6 TiB 8.5 TiB 2.6 GiB  18 GiB
>> 2.2 TiB 79.57 1.19  95 up osd.144
>> 136   hdd   10.81149  1.0  11 TiB 8.6 TiB 8.5 TiB  48 KiB  17 GiB
>> 2.2 TiB 79.60 1.19  95 up osd.136
>>63   hdd   10.81149  1.0  11 TiB 8.6 TiB 8.5 TiB 2.6 GiB  17 GiB
>> 2.2 TiB 79.60 1.19  95 up osd.63
>> 155   hdd   10.81149  1.0  11 TiB 8.6 TiB 8.5 TiB   8 KiB  19 GiB
>> 2.2 TiB 79.85 1.20  95 up osd.155
>>89   hdd   10.81149  1.0  11 TiB 8.7 TiB 8.5 TiB  12 KiB  20 GiB
>> 2.2 TiB 80.04 1.20  96 up osd.89
>> 106   hdd   10.81149  1.0  11 TiB 8.8 TiB 8.7 TiB  64 KiB  19 GiB
>> 2.0 TiB 81.38 1.22  96 up osd.106
>>94   hdd   10.81149  1.0  11 TiB 9.0 TiB 8.9 TiB 0 B  19 GiB
>> 1.8 TiB 83.53 1.25  96 up osd.94
>>33   hdd   10.81149  1.0  11 TiB 9.1 TiB 9.0 TiB  44 KiB  19 GiB
>> 1.7 TiB 84.40 1.27  96 up osd.33
>>15   hdd   10.81149  1.0  11 TiB  10 TiB 9.8 TiB  16 KiB  20 GiB
>> 877 GiB 92.08 1.38  96 up osd.15
>>53   hdd   10.81149  1.0  11 TiB  10 TiB  10 TiB 2.6 GiB  20 GiB
>> 676 GiB 93.90 1.41  96 up osd.53
>>51   hdd   10.81149  1.0  11 TiB  10 TiB  10 TiB 2.6 GiB  20 GiB
>> 666 GiB 93.98 1.41  96 up osd.51
>>10   hdd   10.81149  1.0  11 TiB  10 TiB  10 TiB  40 KiB  22 GiB
>> 552 GiB 95.01 1.42  97 up osd.10
>>
>> So the fullest one is at 95.01%, the emptiest one at 48.4%, so 
>> there's some balancing to be done.
>>
>>> You might be able to manually fix things by using `ceph osd reweight 
>>> ...` on the most full osds to move data elsewhere.
>>
>> I'll look into this, but I was hoping that the balancer module would 
>> take care of this...
>>
>>>

[ceph-users] Balancer dont work with state pgs backfill_toofull

2019-08-23 Thread EDH - Manuel Rios Fernandez
The affected root has more than 70 TB free. The only solution is to manually reweight
the OSDs, but in this situation the balancer in upmap mode should move data until
everything is HEALTHY again.

Hope a fix for this lands in one of the next 14.2.x releases.
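
A hedged workaround we have used to let stuck backfills finish while waiting for the
balancer (ratios are examples and must stay below the full ratio, 0.95 by default;
revert them afterwards):

ceph osd set-backfillfull-ratio 0.92     # temporarily allow backfill into fuller OSDs
ceph osd reweight osd.86 0.90            # osd.86 is just the fullest example from the tree below
ceph osd set-backfillfull-ratio 0.90     # back to the default once the PGs are clean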

 

Ceph 14.2.2 Centos 7.6

  cluster:

id: e1ee8086-7cce-43fd-a252-3d677af22428

health: HEALTH_ERR

2 nearfull osd(s)

2 pool(s) nearfull

Degraded data redundancy (low space): 9 pgs backfill_toofull

 

  services:

mon: 3 daemons, quorum CEPH-MON01,CEPH002,CEPH003 (age 19m)

mgr: CEPH001(active, since 12h), standbys: CEPH-MON01

osd: 90 osds: 90 up (since 4h), 90 in (since 34h); 9 remapped pgs

rgw: 1 daemon active (ceph-rgw03)

 

  data:

pools:   18 pools, 8252 pgs

objects: 105.59M objects, 294 TiB

usage:   340 TiB used, 84 TiB / 424 TiB avail

pgs: 197930/116420259 objects misplaced (0.170%)

 8243 active+clean

 9active+remapped+backfill_toofull

 

ID  CLASS   WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETAAVAIL
%USE  VAR  PGS STATUS TYPE NAME

-41 392.89362- 393 TiB 317 TiB 317 TiB  19 MiB 561 GiB  75
TiB 00   -root archive

-44 130.95793- 131 TiB 106 TiB 106 TiB 608 KiB 188 GiB  25
TiB 81.29 1.01   -host CEPH005

83 archive  10.91309  1.0  11 TiB 9.2 TiB 9.1 TiB  56 KiB  16 GiB 1.7
TiB 83.97 1.05 131 up osd.83

84 archive  10.91309  1.0  11 TiB 8.5 TiB 8.5 TiB  56 KiB  15 GiB 2.4
TiB 78.33 0.98 121 up osd.84

85 archive  10.91309  1.0  11 TiB 9.0 TiB 9.0 TiB  92 KiB  16 GiB 1.9
TiB 82.24 1.03 129 up osd.85

86 archive  10.91309  1.0  11 TiB  10 TiB  10 TiB  48 KiB  18 GiB 597
GiB 94.66 1.18 148 up osd.86

87 archive  10.91399  1.0  11 TiB  10 TiB  10 TiB  88 KiB  18 GiB 596
GiB 94.66 1.18 149 up osd.87

88 archive  10.91309  1.0  11 TiB 8.3 TiB 8.3 TiB  56 KiB  14 GiB 2.6
TiB 76.02 0.95 119 up osd.88

97 archive  10.91309  1.0  11 TiB 7.7 TiB 7.7 TiB  44 KiB  14 GiB 3.2
TiB 70.44 0.88 109 up osd.97

98 archive  10.91309  1.0  11 TiB 8.4 TiB 8.4 TiB  72 KiB  15 GiB 2.5
TiB 77.18 0.96 126 up osd.98

99 archive  10.91309  1.0  11 TiB 8.0 TiB 8.0 TiB  24 KiB  15 GiB 2.9
TiB 73.29 0.91 116 up osd.99

100 archive  10.91309  1.0  11 TiB 9.5 TiB 9.4 TiB  32 KiB  17 GiB 1.4
TiB 86.72 1.08 132 up osd.100

101 archive  10.91309  1.0  11 TiB 8.4 TiB 8.4 TiB  12 KiB  15 GiB 2.5
TiB 76.71 0.96 125 up osd.101

102 archive  10.91309  1.0  11 TiB 8.9 TiB 8.8 TiB  28 KiB  15 GiB 2.1
TiB 81.20 1.01 125 up osd.102

-17 0- 0 B 0 B 0 B 0 B 0 B 0
B 00   -host CEPH006

-26 130.96783- 131 TiB 104 TiB 104 TiB 6.7 MiB 184 GiB  27
TiB 79.21 0.99   -host CEPH007

14 archive  10.91399  1.0  11 TiB 8.3 TiB 8.3 TiB  28 KiB  15 GiB 2.6
TiB 76.06 0.95 126 up osd.14

15 archive  10.91399  1.0  11 TiB 8.9 TiB 8.9 TiB  84 KiB  16 GiB 2.0
TiB 81.72 1.02 130 up osd.15

16 archive  10.91399  1.0  11 TiB 8.7 TiB 8.7 TiB  80 KiB  15 GiB 2.2
TiB 79.98 1.00 127 up osd.16

39 archive  10.91399  1.0  11 TiB 8.1 TiB 8.1 TiB 3.4 MiB  14 GiB 2.8
TiB 74.26 0.93 118 up osd.39

40 archive  10.91399  1.0  11 TiB 9.2 TiB 9.2 TiB  53 KiB  16 GiB 1.7
TiB 84.53 1.05 132 up osd.40

44 archive  10.91399  1.0  11 TiB 8.1 TiB 8.1 TiB 2.6 MiB  15 GiB 2.8
TiB 74.40 0.93 117 up osd.44

48 archive  10.91399  1.0  11 TiB 9.7 TiB 9.7 TiB  44 KiB  17 GiB 1.2
TiB 89.02 1.11 135 up osd.48

49 archive  10.91399  1.0  11 TiB 8.6 TiB 8.6 TiB 132 KiB  15 GiB 2.3
TiB 78.90 0.98 126 up osd.49

52 archive  10.91399  1.0  11 TiB 9.3 TiB 9.3 TiB  28 KiB  17 GiB 1.6
TiB 85.28 1.06 134 up osd.52

77 archive  10.91399  1.0  11 TiB 8.1 TiB 8.1 TiB  73 KiB  15 GiB 2.8
TiB 74.44 0.93 118 up osd.77

89 archive  10.91399  1.0  11 TiB 7.2 TiB 7.2 TiB  60 KiB  13 GiB 3.7
TiB 66.22 0.83 106 up osd.89

90 archive  10.91399  1.0  11 TiB 9.4 TiB 9.3 TiB  48 KiB  16 GiB 1.6
TiB 85.68 1.07 137 up osd.90

-31 130.96783- 131 TiB 107 TiB 107 TiB  12 MiB 189 GiB  24
TiB 81.86 1.02   -host CEPH008

  5 archive  10.91399  1.0  11 TiB 9.6 TiB 9.6 TiB 2.7 MiB  17 GiB 1.3
TiB 87.81 1.10 135 up osd.5

  6 archive  10.91399  1.0  11 TiB 8.4 TiB 8.4 TiB 3.9 MiB  16 GiB 2.5
TiB 77.19 0.96 124 up osd.6

11 archive  10.91399  1.0  11 TiB 8.9 TiB 8.8 TiB  48 KiB  16 GiB 2.1
TiB 81.11 1.01 128 up osd.11

45 archive  10.91399  1.0  11 TiB 9.5 TiB 9.4 TiB  48 KiB  17 GiB 1.5
TiB 86.66 1.08 138 up osd.45

46 archive  10.91399  1.0  11 

Re: [ceph-users] Applications slow in VMs running RBD disks

2019-08-21 Thread EDH - Manuel Rios Fernandez
Use a 100% flash setup; avoid rotational disks if you want reasonable performance for
Windows guests.

Windows is very sensitive to disk latency, and a laggy interface sometimes gives the
customer a bad impression.

You can check your average read/write latency in your Grafana dashboards for Ceph; when
it goes above roughly 50 ms, performance inside Windows is painful.
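
A quick, hedged way to see the latency a guest actually gets (run inside the VM against
its RBD-backed disk; this creates a 1 GiB test file, adjust the path):

fio --name=4k-qd1-randread --filename=/root/fio-test --size=1G \
    --rw=randread --bs=4k --iodepth=1 --direct=1 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting

The clat percentiles in the output are the numbers to watch; 4k reads at queue depth 1
roughly mimic the small-file/DLL loads described below.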

 

Regards

 

Manuel

 

 

From: ceph-users  On Behalf Of Gesiel Galvão Bernardes
Sent: Wednesday, 21 August 2019 14:26
To: ceph-users 
Subject: [ceph-users] Applications slow in VMs running RBD disks

 

Hi,

I use QEMU/KVM (OpenNebula) with Ceph/RBD for running VMs, and I am having problems with
slowness in applications that often are not consuming much CPU or RAM. The problem
affects mostly Windows. Apparently the issue is that the applications typically load
many small files (e.g. DLLs), and these files take a long time to load, causing the
slowness.

I'm using 8 TB disks with 3x replication (I've tried erasure coding and 2x too), and
I've tried with and without an SSD cache tier and the problem persists. Using the same
disks over NFS, the applications run fine.

I've already tried changing the RBD object size (from 4 MB down to 128 KB), using QEMU
writeback cache, configuring virtio-scsi queues, and using the virtio (virtio-blk)
driver, and none of these brought an effective improvement.

Has anyone had a similar problem and/or any idea how to solve it, or where I should
look?

Thanks Advance,

Gesiel

 

 



Re: [ceph-users] fixing a bad PG per OSD decision with pg-autoscaling?

2019-08-21 Thread EDH - Manuel Rios Fernandez
Hi Nigel,

In Nautilus you can decrease PGs, but it takes weeks; for example, for us going from
4096 to 2048 PGs took more than 2 weeks.

First of all, pg-autoscaling is enabled per pool, and you're going to get a lot of
warnings, but it works (a quick sketch follows).
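
A hedged sketch (pool name is an example; 'warn' only reports what the autoscaler would
do, 'on' actually changes pg_num):

ceph osd pool set default.rgw.buckets.data pg_autoscale_mode warn
ceph osd pool set default.rgw.buckets.data pg_autoscale_mode on
ceph osd pool autoscale-status       # shows current vs. suggested PG counts per pool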

 

Normally it is recommended to upgrade a cluster that is in HEALTH_OK state.

 

It is also recommended to use the upmap mode to get a near-perfect distribution from the
balancer module, but it doesn't work while there are misplaced/degraded error states.

 

From my point of view, I would try to get to HEALTH_OK first, then upgrade.

 

Remember that you MUST repair all your pre-Nautilus OSDs (ceph-bluestore-tool repair)
because the per-pool statistics scheme changed; a sketch follows.
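
A minimal sketch of that stats repair, one OSD at a time and with the cluster otherwise
healthy (osd.10 is an example id):

systemctl stop ceph-osd@10
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-10
systemctl start ceph-osd@10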

 

Regards

 

Manuel

 

 

From: ceph-users  On Behalf Of Nigel Williams
Sent: Wednesday, 21 August 2019 0:33
To: ceph-users 
Subject: [ceph-users] fixing a bad PG per OSD decision with pg-autoscaling?

 

Due to a gross miscalculation several years ago I set way too many PGs for our 
original Hammer cluster. We've lived with it ever since, but now we are on 
Luminous, changes result in stuck-requests and balancing problems. 

 

The cluster currently has 12% misplaced, and is grinding to re-balance but is 
unusable to clients (even with osd_max_pg_per_osd_hard_ratio set to 32, and 
mon_max_pg_per_osd set to 1000).

 

Can I safely press on upgrading to Nautilus in this state so I can enable the 
pg-autoscaling to finally fix the problem?

 

thanks.

 



[ceph-users] Ceph Balancer code

2019-08-17 Thread EDH - Manuel Rios Fernandez
 

Hi ,

 

What is the reason for not allowing the balancer to move PGs if objects are
inactive/misplaced, at least in Nautilus 14.2.2?

 

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/balancer/module.py#L874

 


        if unknown > 0.0:
            detail = 'Some PGs (%f) are unknown; try again later' % unknown
            self.log.info(detail)
            return -errno.EAGAIN, detail
        elif degraded > 0.0:
            detail = 'Some objects (%f) are degraded; try again later' % degraded
            self.log.info(detail)
            return -errno.EAGAIN, detail
        elif inactive > 0.0:
            detail = 'Some PGs (%f) are inactive; try again later' % inactive
            self.log.info(detail)
            return -errno.EAGAIN, detail
        elif misplaced >= max_misplaced:
            detail = 'Too many objects (%f > %f) are misplaced; ' \
                     'try again later' % (misplaced, max_misplaced)
            self.log.info(detail)
            return -errno.EAGAIN, detail

 

A lot of the time, objects stay misplaced and degraded precisely because the balancer
only runs in healthy periods. From my point of view, there are "misplaced" and
"degraded" states where the balancer becomes a must, because otherwise the Ceph admin
has to do a manual ceph osd reweight to do the balancer's job and get the cluster
healthy enough for the balancer to start working.

We can understand that the balancer can't work with unknown or inactive PG states. But
degraded and misplaced?

I hope some developer can clarify this. These lines cause a lot of problems, at least in
Nautilus 14.2.2.

 

Case example:

*   A pool of size 1 is upgraded to size 2. The cluster goes into WARNING with misplaced
and degraded objects. Some objects never recover from the degraded state due to
"backfill_toofull", because some OSDs became full instead of the data being evenly
distributed and balanced, and the balancer code excludes this situation.
*   Solution: manual reweight, which shouldn't be necessary.
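
For reference, the degraded/misplaced/inactive ratios the module checks come from the PG
summary; a quick way to see the same numbers on your own cluster is:

ceph pg stat          # PG state counts plus the "objects degraded/misplaced" percentages
ceph balancer status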

 

 

Regards

Manuel



Re: [ceph-users] 14.2.2 - OSD Crash

2019-08-07 Thread EDH - Manuel Rios Fernandez
Hi Igor 

 

Yes, we have everything on the same device:

 

[root@CEPH-MON01 ~]# ceph osd df tree

ID  CLASS   WEIGHTREWEIGHT SIZERAW USE DATAOMAPMETA
AVAIL   %USE  VAR  PGS STATUS TYPE NAME

31 130.96783- 131 TiB 114 TiB 114 TiB  14 MiB  204 GiB  17
TiB 86.88 1.03   -host CEPH008

  5 archive  10.91399  0.80002  11 TiB 7.9 TiB 7.9 TiB 2.6 MiB   15 GiB 3.0
TiB 72.65 0.86 181 up osd.5

  6 archive  10.91399  1.0  11 TiB 9.4 TiB 9.3 TiB 5.8 MiB   17 GiB 1.6
TiB 85.76 1.01 222 up osd.6

11 archive  10.91399  1.0  11 TiB  10 TiB  10 TiB  48 KiB   19 GiB 838
GiB 92.50 1.09 251 up osd.11

45 archive  10.91399  1.0  11 TiB  10 TiB  10 TiB 148 KiB   18 GiB 678
GiB 93.94 1.11 248 up osd.45

46 archive  10.91399  1.0  11 TiB 9.6 TiB 9.5 TiB 4.7 MiB   17 GiB 1.4
TiB 87.52 1.04 235 up osd.46

47 archive  10.91399  1.0  11 TiB 8.8 TiB 8.8 TiB  68 KiB   17 GiB 2.1
TiB 80.43 0.95 211 up osd.47

55 archive  10.91399  1.0  11 TiB 9.9 TiB 9.9 TiB 132 KiB   17 GiB 1.0
TiB 90.74 1.07 243 up osd.55

70 archive  10.91399  1.0  11 TiB  10 TiB  10 TiB  44 KiB   19 GiB 864
GiB 92.27 1.09 236 up osd.70

71 archive  10.91399  1.0  11 TiB 9.2 TiB 9.2 TiB  28 KiB   16 GiB 1.7
TiB 84.19 1.00 228 up osd.71

78 archive  10.91399  1.0  11 TiB 8.9 TiB 8.9 TiB 182 KiB   16 GiB 2.0
TiB 81.87 0.97 215 up osd.78

79 archive  10.91399  1.0  11 TiB  10 TiB  10 TiB 152 KiB   17 GiB 958
GiB 91.43 1.08 238 up osd.79

91 archive  10.91399  1.0  11 TiB 9.7 TiB 9.7 TiB  92 KiB   17 GiB 1.2
TiB 89.22 1.06 232 up osd.91

 

The disks are 12 TB HGST drives used for archive purposes.

 

On the same OSDs we also see some BlueStore commit-latency log entries:

 

2019-08-07 06:57:33.681 7f059b06e700  0 bluestore(/var/lib/ceph/osd/ceph-46)
queue_transactions slow operation observed for l_bluestore_submit_lat,
latency = 11.163s

2019-08-07 06:57:33.703 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46)
_txc_committed_kv slow operation observed for l_bluestore_commit_lat,
latency = 11.1858s, txc = 0x55e9e3ea2c00

2019-08-07 09:14:00.620 7f059d072700  0 bluestore(/var/lib/ceph/osd/ceph-46)
queue_transactions slow operation observed for l_bluestore_submit_lat,
latency = 7.23777s

2019-08-07 09:14:00.650 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46)
_txc_committed_kv slow operation observed for l_bluestore_commit_lat,
latency = 7.26778s, txc = 0x55eaafbf6600

2019-08-07 09:19:08.242 7f059e875700  0 bluestore(/var/lib/ceph/osd/ceph-46)
queue_transactions slow operation observed for l_bluestore_submit_lat,
latency = 81.8293s

2019-08-07 09:19:08.291 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46)
_txc_committed_kv slow operation observed for l_bluestore_commit_lat,
latency = 81.8609s, txc = 0x55ea05ee6000

2019-08-07 09:19:08.467 7f059b06e700  0 bluestore(/var/lib/ceph/osd/ceph-46)
queue_transactions slow operation observed for l_bluestore_submit_lat,
latency = 87.7795s

2019-08-07 09:19:08.481 7f05a8088700  0 bluestore(/var/lib/ceph/osd/ceph-46)
_txc_committed_kv slow operation observed for l_bluestore_commit_lat,
latency = 87.7928s, txc = 0x55eaa7a40600

 

Maybe moving OMAP + META from all OSDs to a 480 GB NVMe per node would help in this
situation, but I'm not sure.
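
A hedged way to check how much of the DB already spills over to the slow device (osd.46
is just the example from the log above; counter names are from the bluefs perf section):

ceph daemon osd.46 perf dump bluefs | grep -E '"(db|slow)_(total|used)_bytes"'
grep -c 'slow operation observed' /var/log/ceph/ceph-osd.46.log
ceph health detail | grep -i spillover     # shows the BLUEFS_SPILLOVER warning, if any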

 

Manuel

 

 

 

 

From: Igor Fedotov
Sent: Wednesday, 7 August 2019 13:10
To: EDH - Manuel Rios Fernandez ; 'Ceph Users'
Subject: Re: [ceph-users] 14.2.2 - OSD Crash

 

Hi Manuel,

as Brad pointed out timeouts and suicides are rather consequences of some
other issues with OSDs.

I recall at least two recent relevant tickets:

https://tracker.ceph.com/issues/36482

https://tracker.ceph.com/issues/40741 (see last comments)

Both had massive and slow reads from RocksDB which caused timeouts..

The visible symptom in both cases was unexpectedly high read I/O from the
underlying disks (main and/or DB).

You can use iotop for inspection.

These were worsened by having a significant part of the DB on spinners due to
spillover. So I'm wondering what your layout is in this respect:

what drives back the troublesome OSDs, is there any spillover to the slow device,
and how large is it?

Could you also please inspect your OSD logs for lines containing the
"slow operation observed" substring, and share them if there are any?

 

Hope this helps.

Thanks,

Igor

 

 

On 8/7/2019 2:16 AM, EDH - Manuel Rios Fernandez wrote:

Hi 

 

We have a pair of OSDs located in one node that crash randomly since 14.2.2.

 

OS Version : Centos 7.6

 

There are a ton of lines before the crash; these were unexpected:

 

--

3045> 2019-08-07 00:39:32.013 7fe9a4996700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3044> 2019-08-07 00:39:32.013 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3043> 2019-08-07 00:39:32.

[ceph-users] Nautilus - Balancer is always on

2019-08-07 Thread EDH - Manuel Rios Fernandez
Hi All,

 

 

ceph mgr module disable balancer

Error EINVAL: module 'balancer' cannot be disabled (always-on)

 

What's the way to restart the balancer? Restart the MGR service? (See the sketch below.)
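
What we ended up trying (hedged; CEPH001 is our active mgr, adjust the name):

ceph balancer off          # stops automatic balancing without disabling the module
ceph mgr fail CEPH001      # forces a mgr failover, which restarts all mgr modules
ceph balancer on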

 

I want to suggest to the balancer developers that they add a ceph-balancer.log for this
module, so we can get more information about what it is doing.

 

Regards

 

Manuel

 

 

 

 



Re: [ceph-users] RadosGW (Ceph Object Gateay) Pools

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi,

I think default.rgw.buckets.index. For us it reaches 2K-6K IOPS for an index size of
23 GB.
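
A hedged sketch of pinning that pool to SSDs with a device-class rule (the rule name is
an example; changing a pool's rule will move its data):

ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set default.rgw.buckets.index crush_rule replicated-ssd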

Regards
Manuel



-----Original Message-----
From: ceph-users  On Behalf Of dhils...@performair.com
Sent: Wednesday, 7 August 2019 1:41
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RadosGW (Ceph Object Gateay) Pools

All;

Based on the PG Calculator, on the Ceph website, I have this list of pools
to pre-create for my Object Gateway:
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.intent-log
default.rgw.meta
default.rgw.usage
default.rgw.users.keys
default.rgw.users.email
default.rgw.users.uid
default.rgw.buckets.extra
default.rgw.buckets.index
default.rgw.buckets.data

I have a limited amount of SSDs, and I plan to create rules which limit
pools to either HDD or SSD.  My HDDs have their block.db on NVMe devices.

I intend to use the SSDs primarily to back RBD for ISCSi, to support
virtualization, but I'm not opposed to using some of the space to speed up
RGW.

Which pool(s) would have the most impact on the performance of RGW to have
on SSDs?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com





[ceph-users] 14.2.2 - OSD Crash

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi 

 

We have a pair of OSDs located in one node that crash randomly since 14.2.2.

 

OS Version : Centos 7.6

 

There are a ton of lines before the crash; these were unexpected:

 

--

3045> 2019-08-07 00:39:32.013 7fe9a4996700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3044> 2019-08-07 00:39:32.013 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3043> 2019-08-07 00:39:32.033 7fe9a4195700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3042> 2019-08-07 00:39:32.033 7fe9a4996700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

--

-

 

Some hundred lines of:

-164> 2019-08-07 00:47:36.628 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -163> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -162> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

-

 

   -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick

   -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:21.756453)

   -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick

   -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:31.756604)

   -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick

   -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:41.756788)

   -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick

   -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:51.756982)

   -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick

   -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:01.757206)

   -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick

   -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:11.757364)

   -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick

   -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:21.757535)

   -64> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

   -63> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150

   -62> 2019-08-07 00:51:52.948 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 old cache_size:
1994018210 new cache size: 1992784572

   -61> 2019-08-07 00:51:52.948 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1992784572
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654593191 data_alloc: 452984832 data_used: 455929856

   -60> 2019-08-07 00:51:57.923 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1994110827
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -59> 2019-08-07 00:51:57.973 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1994110827 new cache size: 1994442069

   -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick

   -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:31.757684)

   -56> 2019-08-07 00:52:02.933 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1995765747
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -55> 2019-08-07 00:52:02.983 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1995765747 new cache size: 1996096345

   -54> 2019-08-07 00:52:07.943 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1997417449
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -53> 2019-08-07 00:52:07.993 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 

Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread EDH - Manuel Rios Fernandez
Hi Felix,

 

You can increase the debug level with the debug_rgw option on your RGW nodes.

 

We got it to 10.
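
A rough sketch of the two ways to raise it (the instance name is an example, ours is
ceph-rgw03; the admin socket path may differ on your install):

# persistent, in ceph.conf on the RGW node:
[client.rgw.ceph-rgw03]
    debug rgw = 10/10

# at runtime, via the admin socket:
ceph daemon /var/run/ceph/ceph-client.rgw.ceph-rgw03.asok config set debug_rgw 10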

 

But at least in our case we switched back to civetweb, because beast doesn't provide a
clear access log without a lot of extra verbosity.

 

Regards

 

Manuel

 

 

From: ceph-users  On Behalf Of Félix Barbeira
Sent: Tuesday, 6 August 2019 17:43
To: Ceph Users 
Subject: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

 

Hi,

 

I'm testing radosgw with beast backend and I did not found a way to view more 
information on logfile. This is an example:

 

2019-08-06 16:59:14.488 7fc808234700  1 == starting new request 
req=0x5608245646f0 =
2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0 op 
status=0 http_status=204 latency=0.00800043s ==


 

I would be interested on typical fields that a regular webserver has: origin, 
request, useragent, etc. I checked the official docs but I don't find anything 
related:

 

https://docs.ceph.com/docs/nautilus/radosgw/frontends/ 
 

 

The only manner I found is to put in front a nginx server running as a proxy or 
an haproxy, but I really don't like that solution because it would be an 
overhead component used only to log requests. Anyone in the same situation?

 

Thanks in advance.

-- 

Félix Barbeira.



Re: [ceph-users] even number of monitors

2019-08-05 Thread EDH - Manuel Rios Fernandez
With 4 monitors, if you lose 2 you lose quorum, because quorum requires more than half
of the monitors (at least N/2 + 1).

Recommended monitor counts:

1 - 3 - 5 - 7
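
A quick worked comparison (a strict majority is floor(N/2) + 1):

3 mons -> majority 2 -> tolerates 1 monitor down
4 mons -> majority 3 -> still tolerates only 1 monitor down
5 mons -> majority 3 -> tolerates 2 monitors down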

Regards
Manuel


-----Original Message-----
From: ceph-users  On Behalf Of Alfredo Daniel Rezinovsky
Sent: Monday, 5 August 2019 12:28
To: ceph-users 
Subject: [ceph-users] even number of monitors

With 3 monitors, paxos needs at least 2 to reach consensus about the cluster
status

With 4 monitors, more than half is 3. The only problem I can see here is
that I will have only 1 spare monitor.

Is there any other problem with an even number of monitors?

--
Alfrenovsky



Re: [ceph-users] Adventures with large RGW buckets

2019-08-01 Thread EDH - Manuel Rios Fernandez
Hi Greg / Eric,

What about deleting bucket objects with a lifecycle policy?

You can set a 1-day object lifetime, and that work is done at the cluster level. Then
delete any objects younger than 1 day and remove the bucket.

That sometimes speeds up deletes, as the work is done by the RGWs.

There should be something like a background delete option, because deleting a bucket
with millions of objects takes weeks.
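
A hedged sketch of that lifecycle trick with the aws CLI (endpoint and bucket names are
placeholders; the rule only takes effect when RGW's lifecycle processing runs, on its
own schedule):

cat > expire-all.json <<'EOF'
{ "Rules": [ { "ID": "expire-everything", "Status": "Enabled",
               "Filter": { "Prefix": "" },
               "Expiration": { "Days": 1 } } ] }
EOF
aws --endpoint=http://RGW-ENDPOINT --profile=ceph s3api put-bucket-lifecycle-configuration \
    --bucket BIGBUCKET --lifecycle-configuration file://expire-all.json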

Regards





-----Original Message-----
From: ceph-users  On Behalf Of Gregory Farnum
Sent: Thursday, 1 August 2019 22:48
To: Eric Ivancich 
Cc: Ceph Users ; d...@ceph.io
Subject: Re: [ceph-users] Adventures with large RGW buckets

On Thu, Aug 1, 2019 at 12:06 PM Eric Ivancich  wrote:
>
> Hi Paul,
>
> I’ll interleave responses below.
>
> On Jul 31, 2019, at 2:02 PM, Paul Emmerich  wrote:
>
> How could the bucket deletion of the future look like? Would it be 
> possible to put all objects in buckets into RADOS namespaces and 
> implement some kind of efficient namespace deletion on the OSD level 
> similar to how pool deletions are handled at a lower level?
>
> I’ll raise that with other RGW developers. I’m unfamiliar with how RADOS 
> namespaces are handled.

I expect RGW could do this, but unfortunately deleting namespaces at the RADOS 
level is not practical. People keep asking and maybe in some future world it 
will be cheaper, but a namespace is effectively just part of the object name 
(and I don't think it's even the first thing they sort by for the key entries 
in metadata tracking!), so deleting a namespace would be equivalent to deleting 
a snapshot[1] but with the extra cost that namespaces can be created 
arbitrarily on every write operation (so our solutions for handling snapshots 
without it being ludicrously expensive wouldn't apply). Deleting a namespace 
from the OSD-side using map updates would require the OSD to iterate through 
just about all the objects they have and examine them for deletion.

Is it cheaper than doing over the network? Sure. Is it cheap enough we're 
willing to let a single user request generate that kind of cluster IO on an 
unconstrained interface? Absolutely not.
-Greg
[1]: Deleting snapshots is only feasible because every OSD maintains a sorted 
secondary index from snapid -> set of objects. This is only possible because 
snapids are issued by the monitors and clients cooperate in making sure they 
can't get reused after being deleted.
Namespaces are generated by clients and there are no constraints on their use, 
reuse, or relationship to each other. We could maybe work around these 
problems, but it'd be building a fundamentally different interface than what 
namespaces currently are.


Re: [ceph-users] Balancer in HEALTH_ERR

2019-08-01 Thread EDH - Manuel Rios Fernandez
Hi Eric,

 

CEPH006 is the node that we're evacuating; for that task we added CEPH005.

 

Thanks 

 

From: Smith, Eric
Sent: Thursday, 1 August 2019 20:12
To: EDH - Manuel Rios Fernandez ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Balancer in HEALTH_ERR

 

>From your pastebin data – it appears you need to change the crush weight of 
>the OSDs on CEPH006? They all have crush weight of 0, when other OSDs seem to 
>have a crush weight of 10.91309. You might look into the ceph osd crush 
>reweight-subtree command.

 

Eric

 

From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of EDH - Manuel Rios Fernandez <mrios...@easydatahost.com>
Date: Thursday, August 1, 2019 at 1:52 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Balancer in HEALTH_ERR

 


 

 



[ceph-users] Balancer in HEALTH_ERR

2019-08-01 Thread EDH - Manuel Rios Fernandez
Hi ,

 

Two weeks ago we started a data migration from one old Ceph node to a new one.

For that task we added a 120 TB host to the cluster and evacuated the old one with
ceph osd crush reweight osd.X 0.0, which moves roughly 15 TB per day.
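
For reference, the drain itself is basically this sketch (CEPH006 is the host being
emptied; ceph osd ls-tree lists the OSD ids under that CRUSH node):

for id in $(ceph osd ls-tree CEPH006); do
    ceph osd crush reweight osd.$id 0
done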

 

After a week and a few days we found that the balancer module doesn't work well in this
situation: it doesn't redistribute data between OSDs if the cluster is not in a
HEALTH_OK state.

The current situation: some OSDs are at 96% and others at 75%, causing some pools to
become very nearfull (99%).

I read several posts saying the balancer only works in HEALTHY mode, and that's the
problem, because Ceph doesn't distribute data equally between OSDs on its own, which
causes huge problems in the "evacuate + add" scenario.

 

Info: https://pastebin.com/HuEt5Ukn

 

Right now, to work around it, we are manually changing the weight of the most-used OSDs.

 

Anyone more got this problem?

 

Regards

 

Manuel

 

 



Re: [ceph-users] Ceph Nautilus - can't balance due to degraded state

2019-07-29 Thread EDH - Manuel Rios Fernandez
Same here,

 

Nautilus 14.2.2.

 

Evacuating one host and joining another one at the same time leaves everything
unbalanced.

 

Best 

 

From: ceph-users  On Behalf Of David Herselman
Sent: Monday, 29 July 2019 11:31
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph Nautilus - can't balance due to degraded state

 

Hi,

 

We appear to be stuck in a proverbial chicken and egg situation. Degraded
placement groups won't backfill as OSDs are near full and we can't run the
balancer as some placement groups are degraded.

 

We upgraded Ceph from Luminous 12.2.12 to Nautilus 14.2.1 on a cluster used
for backup services. We are in the process of migrating data (nearly
complete), after which we'll be able to repurpose the old systems as
additional Ceph OSD nodes. Our cluster was subsequently at about 75%
utilisation and the balancer module together with upmap did a great job.
We've historically been very conservative with placement group numbering,
considering that smaller drives generally get replaced with much larger ones
and the PGs per OSD subsequently grow to problematic levels.

 

The upgrade process was so extremely painless that we also enabled the
pg_autoscaler module which subsequently marked 75% of the data as miss
placed, but also degraded various placement groups. The result is now that
we have many placement groups marked as nearfull, but can't run the balancer
as some placement groups are in a degraded state.

 

 

Is there a way we can override the degraded check and force the balancer to
redistribute PGs; or could we manually adjust OSDs to have the same effect?

 

Is there alternatively a way that we can get Ceph to first heal the degraded
PGs and only then work on the miss placed ones?

 

 

There are only 3 RBD images in this cluster, a 80GB operating system image
in a replicated SSD pool, a 150TB erasure coded image and a relatively tiny
replicated SSD caching tier for the EC pool.

 

[admin@kvm7e ~]# ceph osd lspools

1 rbd_ssd

5 cephfs_data

6 cephfs_metadata

7 rbd_hdd

8 ec_hdd

9 rbd_hdd_cache

10 ec_hdd_cache

 

[admin@kvm7e ~]# for f in `ceph osd lspools | cut -d\  -f2`; do ceph osd
pool set $f pg_autoscale_mode on; done;

set pool 1 pg_autoscale_mode to on

set pool 5 pg_autoscale_mode to on

set pool 6 pg_autoscale_mode to on

set pool 7 pg_autoscale_mode to on

set pool 8 pg_autoscale_mode to on

set pool 9 pg_autoscale_mode to on

set pool 10 pg_autoscale_mode to on

 

 

 

Concerning was that Ceph marked OSDs are near full although this is by
default only when an OSD reaches 85% utilisation. I presume Ceph projects
the resulting storage utilisation based on the weighting set by the
balancer?

 

[admin@kvm7e ~]# ceph health detail

HEALTH_ERR noout flag(s) set; 6 nearfull osd(s); 4 pool(s) nearfull; Reduced
data availability: 2 pgs inactive; Degraded data redundancy (low space): 4
pgs backfill_toofull

OSDMAP_FLAGS noout flag(s) set

OSD_NEARFULL 6 nearfull osd(s)

osd.100 is near full

osd.101 is near full

osd.102 is near full

osd.103 is near full

osd.104 is near full

osd.105 is near full

POOL_NEARFULL 4 pool(s) nearfull

pool 'cephfs_data' is nearfull

pool 'cephfs_metadata' is nearfull

pool 'rbd_hdd' is nearfull

pool 'ec_hdd' is nearfull

PG_AVAILABILITY Reduced data availability: 2 pgs inactive

pg 7.1e is stuck inactive for 437.102346, current state
clean+premerge+peered, last acting [303,104,405]

pg 7.3e is stuck inactive for 436.965670, current state
remapped+premerge+backfill_wait+peered, last acting [405,104,301]

PG_DEGRADED_FULL Degraded data redundancy (low space): 4 pgs
backfill_toofull

pg 8.c8 is active+remapped+backfill_wait+backfill_toofull, acting
[305,104,404,504,203]

pg 8.1bb is active+remapped+backfill_wait+backfill_toofull, acting
[505,204,102,304,404]

pg 8.326 is active+remapped+backfill_wait+backfill_toofull, acting
[302,504,402,103,202]

pg 8.3e0 is active+remapped+backfill_wait+backfill_toofull, acting
[202,402,103,305,505]

 

[admin@kvm7e ~]# ceph osd df

IDCLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETA AVAIL
%USE  VAR  PGS STATUS

  100   hdd 1.81929  1.0 1.8 TiB 1.3 TiB 1.3 TiB 9.2 MiB  2.5 GiB 490
GiB 73.71 1.01  59 up

  101   hdd 1.81929  1.0 1.8 TiB 1.3 TiB 1.3 TiB 4.2 MiB  2.5 GiB 489
GiB 73.77 1.01  58 up

  102   hdd 5.45789  1.0 5.5 TiB 4.0 TiB 4.0 TiB  21 MiB  7.4 GiB 1.4
TiB 73.63 1.01 175 up

  103   hdd 5.45789  1.0 5.5 TiB 4.0 TiB 4.0 TiB  20 MiB  7.4 GiB 1.4
TiB 73.49 1.00 176 up

  104   hdd 9.09560  1.0 9.1 TiB 6.9 TiB 6.9 TiB  28 MiB   13 GiB 2.2
TiB 75.68 1.03 304 up

  105   hdd 9.09560  1.0 9.1 TiB 6.9 TiB 6.9 TiB  23 MiB   13 GiB 2.2
TiB 75.63 1.03 301 up

  200   hdd 1.81929  1.0 1.8 TiB 1.3 TiB 1.3 TiB 4.1 MiB  2.5 GiB 492
GiB 73.61 1.01  59 up

  201   hdd 1.81929  1.0 1.8 TiB 1.3 TiB 1.3 TiB 4.2 MiB  2.5 GiB 491
GiB 73.65 1.01  58 up

  202   hdd 

Re: [ceph-users] Nautilus dashboard: crushmap viewer shows only first root

2019-07-24 Thread EDH - Manuel Rios Fernandez
Hi Eugen,

Yes, it's solved; we reported it in 14.2.1 and the team fixed it in 14.2.2.

Regards,
Manuel

-Mensaje original-
De: ceph-users  En nombre de Eugen Block
Enviado el: miércoles, 24 de julio de 2019 15:10
Para: ceph-users@lists.ceph.com
Asunto: [ceph-users] Nautilus dashboard: crushmap viewer shows only first
root

Hi all,

we just upgraded our cluster to:

ceph version 14.2.0-300-gacd2f2b9e1
(acd2f2b9e196222b0350b3b59af9981f91706c7f) nautilus (stable)

When clicking through the dashboard to see what's new we noticed that the
crushmap viewer only shows the first root of our crushmap (we have two
roots). I couldn't find anything in the tracker, and I can't update further
to the latest release 14.2.2 to see if that has been resolved. Is this known
or already fixed?

Regards
Eugen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Repair statsfs fail some osd 14.2.1 to 14.2.2

2019-07-23 Thread EDH - Manuel Rios Fernandez
Hi Ceph,

 

Upgraded last night from 14.2.1 to 14.2.2; 36 OSDs have old stats. We're
still repairing the stats one by one, but one failed.

 

Hope this helps.

 

CentOS Version: Linux CEPH006 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18
15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

 

 

[root@CEPH006 ~]# ceph-bluestore-tool repair --path
/var/lib/ceph/osd/ceph-10

src/central_freelist.cc:333] tcmalloc: allocation failed 8192

terminate called after throwing an instance of 'ceph::buffer::bad_alloc'

  what():  buffer::bad_alloc

*** Caught signal (Aborted) **

in thread 7f823c8e3f00 thread_name:ceph-bluestore-

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus
(stable)

1: (()+0xf5d0) [0x7f8230dab5d0]

2: (gsignal()+0x37) [0x7f822f5762c7]

3: (abort()+0x148) [0x7f822f5779b8]

4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f822fe857d5]

5: (()+0x5e746) [0x7f822fe83746]

6: (()+0x5e773) [0x7f822fe83773]

7: (()+0x5e993) [0x7f822fe83993]

8: (()+0x250478) [0x7f82328c7478]

9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned int,
int)+0x2b1) [0x7f8232bf6791]

10: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x22)
[0x7f8232bf6812]

11: (ceph::buffer::copy(char const*, unsigned int)+0x2c) [0x7f8232bf71cc]

12: (BlueStore::Blob::decode(BlueStore::Collection*,
ceph::buffer::v14_2_0::ptr::iterator_impl&, unsigned long, unsigned
long*, bool)+0x23e) [0x55ba137eafce]

13: (BlueStore::ExtentMap::decode_some(ceph::buffer::v14_2_0::list&)+0x8d6)
[0x55ba137f3536]

14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned
int)+0x2b2) [0x55ba137f3c82]

15: (BlueStore::_fsck(bool, bool)+0x22a5) [0x55ba138577e5]

16: (main()+0x107e) [0x55ba136b3ece]

17: (__libc_start_main()+0xf5) [0x7f822f562495]

18: (()+0x27321f) [0x55ba1379b21f]

2019-07-23 10:14:57.156 7f823c8e3f00 -1 *** Caught signal (Aborted) **

in thread 7f823c8e3f00 thread_name:ceph-bluestore-

 

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus
(stable)

1: (()+0xf5d0) [0x7f8230dab5d0]

2: (gsignal()+0x37) [0x7f822f5762c7]

3: (abort()+0x148) [0x7f822f5779b8]

4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f822fe857d5]

5: (()+0x5e746) [0x7f822fe83746]

6: (()+0x5e773) [0x7f822fe83773]

7: (()+0x5e993) [0x7f822fe83993]

8: (()+0x250478) [0x7f82328c7478]

9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned int,
int)+0x2b1) [0x7f8232bf6791]

10: (ceph::buffer::create_aligned(unsigned int, unsigned int)+0x22)
[0x7f8232bf6812]

11: (ceph::buffer::copy(char const*, unsigned int)+0x2c) [0x7f8232bf71cc]

12: (BlueStore::Blob::decode(BlueStore::Collection*,
ceph::buffer::v14_2_0::ptr::iterator_impl&, unsigned long, unsigned
long*, bool)+0x23e) [0x55ba137eafce]

13: (BlueStore::ExtentMap::decode_some(ceph::buffer::v14_2_0::list&)+0x8d6)
[0x55ba137f3536]

14: (BlueStore::ExtentMap::fault_range(KeyValueDB*, unsigned int, unsigned
int)+0x2b2) [0x55ba137f3c82]

15: (BlueStore::_fsck(bool, bool)+0x22a5) [0x55ba138577e5]

16: (main()+0x107e) [0x55ba136b3ece]

17: (__libc_start_main()+0xf5) [0x7f822f562495]

18: (()+0x27321f) [0x55ba1379b21f]

NOTE: a copy of the executable, or `objdump -rdS ` is needed to
interpret this.

 

terminate called recursively

Aborted

 

---

 

CEPH Startup fail osd 10 fail.

 

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus
(stable)

1: (()+0xf5d0) [0x7f00ee9045d0]

2: (gsignal()+0x37) [0x7f00ed6f42c7]

3: (abort()+0x148) [0x7f00ed6f59b8]

4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f00ee0037d5]

5: (()+0x5e746) [0x7f00ee001746]

6: (()+0x5e773) [0x7f00ee001773]

7: (__cxa_rethrow()+0x49) [0x7f00ee0019e9]

8: (std::_Hashtable, std::allocator
>, std::__detail::_Select1st, std::equal_to,
std::hash, std::__detail::_Mod_range_hashing,
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
std::__detail::_Hashtable_traits
>::_M_insert_unique_node(unsigned long, unsigned long,
std::__detail::_Hash_node,
true>*)+0xfd) [0x55e0c6412a8d]

9: (std::__detail::_Map_base, std::allocator
>, std::__detail::_Select1st, std::equal_to,
std::hash, std::__detail::_Mod_range_hashing,
std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy,
std::__detail::_Hashtable_traits,
true>::operator[](osd_reqid_t const&)+0x99) [0x55e0c64478c9]

10: (PGLog::merge_log_dups(pg_log_t const&)+0x328) [0x55e0c6441d28]

11: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&,
PGLog::LogEntryHandler*, bool&, bool&)+0xf6e) [0x55e0c644353e]

12: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&,
pg_shard_t)+0x64) [0x55e0c63a0804]

13: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&,
pg_missing_set&, pg_shard_t)+0x94) [0x55e0c63d1a54]

14: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog
const&)+0x97) [0x55e0c63ed567]

15: (boost::statechart::simple_state,

[ceph-users] RGW Beast crash 14.2.1

2019-07-11 Thread EDH - Manuel Rios Fernandez
Hi Folks,

 

Last night RGW crashed for no apparent reason while using Beast as the frontend.

We solved it by switching back to civetweb.

 

Should this be reported to the tracker?

 

Regards

Manuel

 

Centos 7.6

Linux ceph-rgw03 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux

 

 

fsid e1ee8086-7cce-43fd-a252-3d677af22428

last_changed 2019-06-17 22:35:18.946810

created 2018-04-17 01:37:27.768960

min_mon_release 14 (nautilus)

0: [v2:172.16.2.5:3300/0,v1:172.16.2.5:6789/0] mon.CEPH-MON01

1: [v2:172.16.2.11:3300/0,v1:172.16.2.11:6789/0] mon.CEPH002

2: [v2:172.16.2.12:3300/0,v1:172.16.2.12:6789/0] mon.CEPH003

3: [v2:172.16.2.10:3300/0,v1:172.16.2.10:6789/0] mon.CEPH001

 

   -18> 2019-07-11 09:05:01.995 7f8441aff700  4 set_mon_vals no callback set

   -17> 2019-07-11 09:05:01.995 7f845f6e47c0 10 monclient: _renew_subs

   -16> 2019-07-11 09:05:01.995 7f845f6e47c0 10 monclient: _send_mon_message
to mon.CEPH003 at v2:172.16.2.12:3300/0

  -15> 2019-07-11 09:05:01.995 7f845f6e47c0  1 librados: init done

   -14> 2019-07-11 09:05:01.995 7f845f6e47c0  5 asok(0x55cd18bac000)
register_command cr dump hook 0x55cd198247a8

   -13> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc handle_mgr_map Got map
version 774

   -12> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc handle_mgr_map Active
mgr is now [v2:172.16.2.10:6858/256331,v1:172.16.2.10:6859/256331]

   -11> 2019-07-11 09:05:01.996 7f8443302700  4 mgrc reconnect Starting new
session with [v2:172.16.2.10:6858/256331,v1:172.16.2.10:6859/256331]

   -10> 2019-07-11 09:05:01.996 7f844c59d700 10 monclient: get_auth_request
con 0x55cd19a62000 auth_method 0

-9> 2019-07-11 09:05:01.997 7f844cd9e700 10 monclient: get_auth_request
con 0x55cd19a62400 auth_method 0

-8> 2019-07-11 09:05:01.997 7f844c59d700 10 monclient: get_auth_request
con 0x55cd19a62800 auth_method 0

-7> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000)
register_command sync trace show hook 0x55cd19846c40

-6> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000)
register_command sync trace history hook 0x55cd19846c40

-5> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000)
register_command sync trace active hook 0x55cd19846c40

-4> 2019-07-11 09:05:01.998 7f845f6e47c0  5 asok(0x55cd18bac000)
register_command sync trace active_short hook 0x55cd19846c40

-3> 2019-07-11 09:05:01.999 7f844d59f700 10 monclient: get_auth_request
con 0x55cd19a62c00 auth_method 0

-2> 2019-07-11 09:05:01.999 7f844cd9e700 10 monclient: get_auth_request
con 0x55cd19a63000 auth_method 0

-1> 2019-07-11 09:05:01.999 7f845f6e47c0  0 starting handler: beast

 0> 2019-07-11 09:05:02.001 7f845f6e47c0 -1 *** Caught signal (Aborted)
**

in thread 7f845f6e47c0 thread_name:radosgw

 

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)

1: (()+0xf5d0) [0x7f845293c5d0]

2: (gsignal()+0x37) [0x7f8451d77207]

3: (abort()+0x148) [0x7f8451d788f8]

4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f84526867d5]

5: (()+0x5e746) [0x7f8452684746]

6: (()+0x5e773) [0x7f8452684773]

7: (()+0x5e993) [0x7f8452684993]

8: (void
boost::throw_exception(boost::system::system_er
ror const&)+0x173) [0x55cd16d9f863]

9: (boost::asio::detail::do_throw_error(boost::system::error_code const&,
char const*)+0x5b) [0x55cd16d9f91b]

10: (()+0x2837fc) [0x55cd16d8b7fc]

11: (main()+0x2873) [0x55cd16d2a8b3]

12: (__libc_start_main()+0xf5) [0x7f8451d633d5]

13: (()+0x24a877) [0x55cd16d52877]

NOTE: a copy of the executable, or `objdump -rdS ` is needed to
interpret this.

 

--- logging levels ---

   0/ 5 none

   0/ 1 lockdep

   0/ 1 context

   1/ 1 crush

   1/ 5 mds

   1/ 5 mds_balancer

   1/ 5 mds_locker

   1/ 5 mds_log

   1/ 5 mds_log_expire

   1/ 5 mds_migrator

   0/ 1 buffer

   0/ 1 timer

   0/ 1 filer

   0/ 1 striper

   0/ 1 objecter

   0/ 5 rados

   0/ 5 rbd

   0/ 5 rbd_mirror

   0/ 5 rbd_replay

   0/ 5 journaler

   0/ 5 objectcacher

   0/ 5 client

   0/ 0 osd

   0/ 5 optracker

   0/ 5 objclass

   1/ 3 filestore

   0/ 0 journal

   0/ 0 ms

   1/ 5 mon

   0/10 monc

   1/ 5 paxos

   0/ 5 tp

   1/ 5 auth

   1/ 5 crypto

   1/ 1 finisher

   1/ 1 reserver

   1/ 5 heartbeatmap

   1/ 5 perfcounter

   1/ 1 rgw

   1/ 5 rgw_sync

   1/10 civetweb

   1/ 5 javaclient

   1/ 5 asok

   1/ 1 throttle

   0/ 0 refs

   1/ 5 xio

   1/ 5 compressor

   1/ 5 bluestore

   1/ 5 bluefs

   1/ 3 bdev

   1/ 5 kstore

   4/ 5 rocksdb

   4/ 5 leveldb

   4/ 5 memdb

   1/ 5 kinetic

   1/ 5 fuse

   1/ 5 mgr

   1/ 5 mgrc

   1/ 5 dpdk

   1/ 5 eventtrace

  -2/-2 (syslog threshold)

  -1/-1 (stderr threshold)

  max_recent 1

  max_new 1000

  log_file /var/log/ceph/ceph-client.rgw.ceph-rgw03.log

--- end dump of recent events ---

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Even more objects in a single bucket?

2019-06-17 Thread EDH - Manuel Rios Fernandez
Hi Harald ,

We saw in our internal Veeam repo that only 4TB of used space created more than
10M objects.

I don't know whether Veeam needs to list the content inside the bucket, but that
would make a 500-million-object bucket a poor fit, at least in our experience
with sharding.

I read someone on IRC saying that they're using 1M objects per shard, which
means the shard limit is a "soft" limit, and that multi-site deployments are
limited to 128 shards.
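
For what it's worth, a sketch of how a bucket could be resharded manually
towards the 5000 shards mentioned in the question below (the bucket name is a
placeholder, and writes to the bucket are typically blocked while the reshard
runs):

radosgw-admin bucket reshard --bucket=<bucket> --num-shards=5000
radosgw-admin reshard status --bucket=<bucket>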

Manuel



-Mensaje original-
De: ceph-users  En nombre de Harald Staub
Enviado el: lunes, 17 de junio de 2019 17:01
Para: Ceph Users 
Asunto: [ceph-users] Even more objects in a single bucket?

There are customers asking for 500 million objects in a single object
storage bucket (i.e. 5000 shards), but also more. But we found some places
that say that there is a limit in the number of shards per bucket, e.g.

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/ob
ject_gateway_guide_for_ubuntu/administration_cli

It says that the maximum number of shards is 7877. But I could not find this
magic number (or any other limit) on http://docs.ceph.com.

Maybe this hard limit no longer applies to Nautilus? Maybe there is a
recommended soft limit?

Background about the application: Veeam (veeam.com) is a backup solution for
VMWare that can embed a cloud storage tier with object storage (only with a
single bucket). Just thinking out loud: maybe this could work with an indexless
bucket. Not sure how manageable this would be, e.g. to monitor how much
space is used. Maybe separate pools would be needed.

  Harry
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can I limit OSD memory usage?

2019-06-07 Thread EDH - Manuel Rios Fernandez
Hi Sergei,

Please add this to your hosts:

For 64GB RAM, reserve 1GB.

vm.min_free_kbytes = 1048576
For 128GB RAM, reserve 2GB.

vm.min_free_kbytes = 2097152
For 256GB RAM, reserve 3GB.

vm.min_free_kbytes = 3145728

This will prevent your OSDs from using ALL of the host's memory and triggering the OOM killer.
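
A sketch of applying it, using the 64GB value above (the sysctl.d file name is
just an example):

sysctl -w vm.min_free_kbytes=1048576                              # apply immediately
echo 'vm.min_free_kbytes = 1048576' > /etc/sysctl.d/99-ceph.conf  # persist across reboots
sysctl --system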

Regards


-Mensaje original-
De: ceph-users  En nombre de Sergei
Genchev
Enviado el: viernes, 7 de junio de 2019 23:35
Para: Ceph Users 
Asunto: [ceph-users] Can I limit OSD memory usage?

 Hi,
 My OSD processes are constantly getting killed by OOM killer. My cluster
has 5 servers, each with 18 spinning disks, running 18 OSD daemons in 48GB
of memory.
 I was trying to limit OSD cache, according to
http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/

[osd]
bluestore_cache_size_ssd = 1G
bluestore_cache_size_hdd = 768M

Yet, my OSDs are using way more memory than that. I have seen as high as
3.2G

KiB Mem : 47877604 total,   310172 free, 45532752 used,  2034680 buff/cache
KiB Swap:  2097148 total,0 free,  2097148 used.   950224 avail Mem

PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
COMMAND
 352516 ceph  20   0 3962504   2.8g   4164 S   2.3  6.1   4:22.98
ceph-osd
 350771 ceph  20   0 3668248   2.7g   4724 S   3.0  6.0   3:56.76
ceph-osd
 352777 ceph  20   0 3659204   2.7g   4672 S   1.7  5.9   4:10.52
ceph-osd
 353578 ceph  20   0 3589484   2.6g   4808 S   4.6  5.8   3:37.54
ceph-osd
 352280 ceph  20   0 3577104   2.6g   4704 S   5.9  5.7   3:44.58
ceph-osd
 350933 ceph  20   0 3421168   2.5g   4140 S   2.6  5.4   3:38.13
ceph-osd
 353678 ceph  20   0 3368664   2.4g   4804 S   4.0  5.3  12:47.12
ceph-osd
 350665 ceph  20   0 3364780   2.4g   4716 S   2.6  5.3   4:23.44
ceph-osd
 353101 ceph  20   0 3304288   2.4g   4676 S   4.3  5.2   3:16.53
ceph-osd
 ...


 Is there any way for me to limit how much memory does OSD use?
Thank you!

ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic
(stable) ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD RAM recommendations

2019-06-07 Thread EDH - Manuel Rios Fernandez
In Nautilus, a minimum of 4GB per disk.
In the case of SSD/NVMe, 6-12GB per disk.

8GB per disk is a good way to get good performance.

Plus 2-4GB for the OS.

Regards,

Manuel

-Mensaje original-
De: ceph-users  En nombre de
jes...@krogh.cc
Enviado el: viernes, 7 de junio de 2019 19:36
Para: Jorge Garcia 
CC: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] OSD RAM recommendations

> I'm a bit confused by the RAM recommendations for OSD servers. I have 
> also seen conflicting information in the lists (1 GB RAM per OSD, 1 GB 
> RAM per TB, 3-5 GB RAM per OSD, etc.). I guess I'm a lot better with a 
> concrete example:

I think it depends on the usage pattern - the more the better.
Once configured, the OSD daemon will use the memory as a disk cache for reads.
I have a similar setup - 7 hosts x 10TB x 12 disks - with 512GB each. This
serves an "active dataset" to an HPC cluster, where it is hugely beneficial
to be able to cache the "hot data", which is 1.5TB-ish.

If your "hot" dataset is smaller, then less will do as well.

Jesper


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cls_rgw.cc:3461: couldn't find tag in name index tag

2019-06-05 Thread EDH - Manuel Rios Fernandez
Hi

 

Checking our cluster logs, we found tons of these lines on the OSDs.

 

One osd


/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/cls/rgw/cls_rgw.cc:3461: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.9710765.5817269

 

Other osd


/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/cls/rgw/cls_rgw.cc:979:
rgw_bucket_complete_op():
entry.name=_multipart_MBS-25c5afb5-f8f1-43cc-91ee-f49a3258012b/CBB_SRVCLASS2
/CBB_DiskImage/Disk_----/Volume_NTFS_000
0----$/20190605210028/102.cbrevision.2~65Mi-_pt5OPiV
6ULDxpScrmPlrD7yEz.208 entry.instance= entry.meta.category=1

 

All 44 SSDs show lines like those, with different details but referring to the
same cls_rgw.cc.

 

Of course, I think it is related to RGW or the RGW index, but I am not certain.

 

Are these entries OK? If so, how can we disable them?

 

Best Regards

Manuel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Lifecycle policy completed but not done

2019-05-30 Thread EDH - Manuel Rios Fernandez
Hi Cephs!

 

Yesterday we set up a lifecycle policy to remove all incomplete partial uploads
in the buckets, because they cause discrepancies between the used space shown by
tools and the bucket stats from Ceph.

 

We set up this policy (s3cmd setlifecycle rule.xml s3://GIB --no-ssl):

 

<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>Expire old logs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>90</Days>
    </Expiration>
  </Rule>
  <Rule>
    <ID>Remove uncompleted uploads</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>1</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>

 

And we waited until 22:00 for the new lifecycle cycle.
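
For reference, lifecycle processing can also be kicked off by hand instead of
waiting for the nightly window; a sketch, assuming the lc subcommands behave
the same in your release:

radosgw-admin lc process
radosgw-admin lc list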

 

Then we checked this morning, and it looks like it completed.

 

[root@CEPH-ADMIN home]# radosgw-admin lc list

[

{

"bucket":
":ControlGroup:48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.8",

"status": "COMPLETE"

},

{

"bucket": ":GIB:48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.17",

"status": "COMPLETE"

}

]

 

 

But after running s3cmd multipart s3://GIB --no-ssl we continue to see thousands
of incomplete uploads like:

 

2019-01-25T20:04:15.559Z
s3://GIB/MBS-f1454908-2af7-4a8b-8b46-7d6d6e86adba/CBB_SERVER/CBB_DiskImage/D
isk_----0002/Volume_NTFS_---
-:/20190125200230/3.cbrevision
2~unJtWv-x9xrPjb2GSOe32PVT0QQ3QGw

2019-01-25T20:53:43.183Z
s3://GIB/MBS-f1454908-2af7-4a8b-8b46-7d6d6e86adba/CBB_SERVER/CBB_DiskImage/D
isk_----0002/Volume_NTFS_---
-:/20190125200230/3.cbrevision
2~yV_enUnhb6ch8GiDUIGCsgaon2Z9dyi

 

Is there any per-cycle limit on cancelling the partial uploads?

 

Regards

 

Manuel

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-21 Thread EDH - Manuel Rios Fernandez
Hi Marc

Is there any scrub / deepscrub running in the affected OSDs?

Best Regards,
Manuel

-Mensaje original-
De: ceph-users  En nombre de Marc Roos
Enviado el: martes, 21 de mayo de 2019 10:01
Para: ceph-users ; Marc Roos

Asunto: Re: [ceph-users] cephfs causing high load on vm, taking down 15 min
later another cephfs vm

 
I have evicted all client connections and have still high load on osd's 

And ceph osd pool stats shows still client activity?

pool fs_data id 20
  client io 565KiB/s rd, 120op/s rd, 0op/s wr




-Original Message-
From: Marc Roos
Sent: dinsdag 21 mei 2019 9:51
To: ceph-users@lists.ceph.com; Marc Roos
Subject: RE: [ceph-users] cephfs causing high load on vm, taking down 15 min
later another cephfs vm


I have got this today again? I cannot unmount the filesystem and looks
like some osd's are having 100% cpu utilization?


-Original Message-
From: Marc Roos
Sent: maandag 20 mei 2019 12:42
To: ceph-users
Subject: [ceph-users] cephfs causing high load on vm, taking down 15 min 
later another cephfs vm



I got my first problem with cephfs in a production environment. Is it 
possible from these logfiles to deduce what happened?

svr1 is connected to ceph client network via switch
svr2 vm is collocated on c01 node.
c01 has osd's and the mon.a colocated. 

svr1 was the first to report errors at 03:38:44. I have no error 
messages reported of a network connection problem by any of the ceph 
nodes. I have nothing in dmesg on c01.

[@c01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[@c01 ~]# uname -a
Linux c01 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 

x86_64 x86_64 x86_64 GNU/Linux
[@c01 ~]# ceph versions
{
"mon": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 3
},
"mgr": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 3
},
"osd": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 32
},
"mds": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 2
},
"rgw": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 2
},
"overall": {
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) 

luminous (stable)": 42
}
}




[0] svr1 messages 
May 20 03:36:01 svr1 systemd: Started Session 308978 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308979 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308979 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308980 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308980 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308981 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308981 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308982 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308982 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308983 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308983 of user root.
May 20 03:38:44 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:44 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:45 svr1 kernel: last message repeated 5 times
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: last message repeated 5 times
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 session 
established
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 session 
established
May 20 03:38:45 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:45 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon1 192.168.x.112:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
established
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
established
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session 
lost, hunting for new mon
May 20 03:38:45 svr1 kernel: libceph: mon2 192.168.x.113:6789 

Re: [ceph-users] Large OMAP Objects in default.rgw.log pool

2019-05-20 Thread EDH - Manuel Rios Fernandez
Hi Arnondh,

 

What's your Ceph version?

 

Regards

 

 

De: ceph-users  En nombre de mr. non non
Enviado el: lunes, 20 de mayo de 2019 12:39
Para: ceph-users@lists.ceph.com
Asunto: [ceph-users] Large OMAP Objects in default.rgw.log pool

 

Hi,

 

I found the same issue as above.

Does anyone know how to fix it?

 

Thanks.

Arnondh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-18 Thread EDH - Manuel Rios Fernandez
Hi Oscar,

We also went back to civetweb; all Beast frontends built up a huge memory load
(98% of the 64GB host memory) and stopped accepting traffic until the daemons
were restarted. It looks unstable right now, at least in 14.2.1.

Now each RGW (civetweb) consumes nearly 6GB of RAM.

We are also still trying to find out why our Ceph RGW listing performance has
become so poor.

We found that 100% of HEAD requests at RGW get a 404 response. Is that OK?

Example:

2019-05-19 07:37:22.756 7f5fc5855700  1 civetweb: 0x55c84595b618: 172.16.2.8 - 
- [19/May/2019:07:37:22 +0200] "HEAD 
/Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5 
2019-05-19 07:37:27.490 7f5fcd064700  1 civetweb: 0x55c845952270: 172.16.2.8 - 
- [19/May/2019:07:37:27 +0200] "HEAD 
/ControlGroup/MBS-e9045ebd-3174-46d4-9ecf-5c2e572a89b5/CBB_SERVERL/CBB_DiskImage/Disk_----/Volume_NTFS_----0001%24/20190505010049/41.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5 
2019-05-19 07:37:32.488 7f5fca85f700  1 civetweb: 0x55c8459553a8: 172.16.2.8 - 
- [19/May/2019:07:37:32 +0200] "HEAD 
/Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5 
2019-05-19 07:37:38.987 7f5fd086b700  1 civetweb: 0x55c84594dd88: 172.16.2.8 - 
- [19/May/2019:07:37:38 +0200] "HEAD 
/Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5 
2019-05-19 07:37:41.178 7f5fd206e700  1 civetweb: 0x55c84594c000: 172.16.2.8 - 
- [19/May/2019:07:37:41 +0200] "HEAD 
/Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5 
2019-05-19 07:37:51.993 7f5fcd865700  1 civetweb: 0x55c845951898: 172.16.2.8 - 
- [19/May/2019:07:37:51 +0200] "HEAD 
/ControlGroup/MBS-e9045ebd-3174-46d4-9ecf-5c2e572a89b5/CBB_SERVERL/CBB_DiskImage/Disk_----/Volume_NTFS_----0001%24/20190505010049/41.cbrevision
 HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 


Regards,
Manuel




-Mensaje original-
De: ceph-users  En nombre de Oscar Tiderman
Enviado el: viernes, 10 de mayo de 2019 10:14
Para: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket


On 10/05/2019 08:42, EDH - Manuel Rios Fernandez wrote:
> Hi
>
> Yesterday night we added 2 Intel Optane Nvme
>
> Generated 4 partitions for get the max performance (Q=32) of those monsters, 
> total 8 Partitions of 50GB.
>
> Move the rgw.index pool got filled near 3GB .
>
> And...
>
> Still the same issue, listing buckets its really slow or deeply slow that 
> make its unable to common use when you need list.
>
> Im still don’t know how we can optimize it more.  Any suggestion/ideas? 
>
> Note: we also upgraded to ceph nautilus 14.2.1 for check if some fixes 
> also help,
>
> With the new RGW now log include in debug level 2 a param "latency" :
> 2019-05-10 08:39:39.793 7f4587482700  1 == req done 
> req=0x55e6163948e0 op status=0 http_status=200 latency=214.109s ==
> 2019-05-10 08:41:38.240 7f451ebb1700  1 == req done 
> req=0x55e6163348e0 op status=0 http_status=200 latency=144.57s ==
>
> Sometimes it get 214 (seconds??)
>
> Best Regards,
>
> Manuel
>
>
> -Mensaje original-
> De: ceph-users  En nombre de EDH - 
> Manuel Rios Fernandez Enviado el: sábado, 4 de mayo de 2019 15:53
> Para: 'Matt Benjamin' 
> CC: 'ceph-users' 
> Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB 
> bucket
>
> Hi Folks,
>
> The user is telling us that their software drops a timeout at 10 min 
> (secs)
>
> Reading documentation I think that we can set param  to 3600 secs as 
> Amazon got it as timeout
>
> rgw op thread timeout
>
> Description:  The timeout in seconds for open threads.
> Type: Integer
> Default:  600
>
> Of course list a bucket with 7M objects is a painfull maybe this help to 
> allow software complete the listing?
>
> Best Regards
> Manuel
>
> -Mensaje original-
> De: Matt Benjamin  Enviado el: viernes, 3 de mayo 
> de 2019 15:47
> Para: EDH - Manuel Rios Fernandez 
> CC: ceph-users 
> Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB 
> bucket
>
> I think I woul

[ceph-users] Scrub Crash OSD 14.2.1

2019-05-17 Thread EDH - Manuel Rios Fernandez
Hi ,

 

Today we had some OSDs crash after a scrub. Version 14.2.1.

 

 

2019-05-17 12:49:40.955 7fd980d8fd80  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1558090180955778, "job": 1, "event": "recovery_finished"}

2019-05-17 12:49:40.967 7fd980d8fd80  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x
86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/r
pm/el7/BUILD/ceph-14.2.1/src/rocksdb/db/db_impl_open.cc:1287] DB pointer
0x55cbfcfc9000

2019-05-17 12:49:40.967 7fd980d8fd80  1 bluestore(/var/lib/ceph/osd/ceph-7)
_open_db opened rocksdb path db options
compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number
_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file
_max_buffer_size=0,compaction_readahead_size=2097152

2019-05-17 12:49:40.967 7fd980d8fd80  1 bluestore(/var/lib/ceph/osd/ceph-7)
_upgrade_super from 2, latest 2

2019-05-17 12:49:40.967 7fd980d8fd80  1 bluestore(/var/lib/ceph/osd/ceph-7)
_upgrade_super done

2019-05-17 12:49:41.090 7fd980d8fd80  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs

2019-05-17 12:49:41.092 7fd980d8fd80  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/cls/hello/cls_hello.cc:296: loading cls_hello

2019-05-17 12:49:41.093 7fd980d8fd80  0 _get_class not permitted to load kvs

2019-05-17 12:49:41.096 7fd980d8fd80  0 _get_class not permitted to load lua

2019-05-17 12:49:41.121 7fd980d8fd80  0 _get_class not permitted to load sdk

2019-05-17 12:49:41.124 7fd980d8fd80  0 osd.7 135670 crush map has features
283675107524608, adjusting msgr requires for clients

2019-05-17 12:49:41.124 7fd980d8fd80  0 osd.7 135670 crush map has features
283675107524608 was 8705, adjusting msgr requires for mons

2019-05-17 12:49:41.124 7fd980d8fd80  0 osd.7 135670 crush map has features
3026702624700514304, adjusting msgr requires for osds

2019-05-17 12:49:50.430 7fd980d8fd80  0 osd.7 135670 load_pgs

2019-05-17 12:50:09.302 7fd980d8fd80  0 osd.7 135670 load_pgs opened 201 pgs

2019-05-17 12:50:09.303 7fd980d8fd80  0 osd.7 135670 using weightedpriority
op queue with priority op cut off at 64.

2019-05-17 12:50:09.324 7fd980d8fd80 -1 osd.7 135670 log_to_monitors
{default=true}

2019-05-17 12:50:09.361 7fd980d8fd80 -1 osd.7 135670
mon_cmd_maybe_osd_create fail: 'osd.7 has already bound to class 'archive',
can not reset class to 'hdd'; use 'ceph osd crush rm-device-class ' to
remove old class first': (16) Device or resource busy

2019-05-17 12:50:09.365 7fd980d8fd80  0 osd.7 135670 done with init,
starting boot process

2019-05-17 12:50:09.371 7fd97339d700 -1 osd.7 135670 set_numa_affinity
unable to identify public interface 'vlan.4094' numa node: (2) No such file
or directory

2019-05-17 12:50:16.443 7fd95f375700 -1 bdev(0x55cbfcec4e00
/var/lib/ceph/osd/ceph-7/block) read_random 0x5428527b5be~15b3 error: (14)
Bad address

2019-05-17 12:50:16.467 7fd95f375700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread
7fd95f375700 time 2019-05-17 12:50:16.445954

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rp
m/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: 1337: FAILED
ceph_assert(r == 0)

 

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x55cbf14e265c]

2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*,
char const*, ...)+0) [0x55cbf14e282a]

3: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long,
char*)+0x71a) [0x55cbf1b8fd6a]

4: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long,
rocksdb::Slice*, char*) const+0x20) [0x55cbf1bb8440]

5: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long,
rocksdb::Slice*, char*) const+0x960) [0x55cbf21e3ba0]

6: (rocksdb::BlockFetcher::ReadBlockContents()+0x3e7) [0x55cbf219dc27]

7: (()+0x11146a4) [0x55cbf218a6a4]

8:
(rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::FilePrefetchBu
ffer*, rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&,
rocksdb::BlockHandle const&, rocksdb::Slice,
rocksdb::BlockBasedTable::CachableEntry*, bool,
rocksdb::GetContext*)+0x2cc) [0x55cbf218c63c]

9: (rocksdb::DataBlockIter*
rocksdb::BlockBasedTable::NewDataBlockIterator(rocks
db::BlockBasedTable::Rep*, 

Re: [ceph-users] openstack with ceph rbd vms IO/erros

2019-05-17 Thread EDH - Manuel Rios Fernandez
Did you check your KVM host RAM usage?

 

We have seen this on hosts that were very heavily loaded with RAM overcommit;
it causes random VM crashes.

 

As you said, to fix it the filesystem must be remounted externally and repaired
with fsck. You can prevent it by disabling the Ceph cache on the OpenStack Nova
hosts, but your VMs will get less performance.

 

What are your Ceph & OpenStack versions?

 

Regards

 

 

De: ceph-users  En nombre de ??
Enviado el: viernes, 17 de mayo de 2019 9:01
Para: ceph-users 
Asunto: [ceph-users] openstack with ceph rbd vms IO/erros

 

hi: 

 I have an OpenStack cluster backed by a Ceph cluster, using RBD; the Ceph
cluster uses an SSD pool tier.

 

Some VMs on OpenStack sometimes crash, in two ways:

 

1. The filesystem becomes read-only. After a reboot, it works fine again.

2. I/O errors. I have to repair the filesystem with fsck, then reboot; it works
fine again.

 

I do not know if this is a Ceph bug or a KVM bug.

 

I need some ideas to resolve this. Can anyone help me?

Look forward to your reply

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-15 Thread EDH - Manuel Rios Fernandez
   0.87299  1.0 894 GiB 602 GiB 598 GiB 2.5 GiB  1.7 GiB 291 GiB 
67.39 0.80 198 up osd.50
 58 ssd   0.43599  1.0 447 GiB 326 GiB 324 GiB 433 MiB  1.0 GiB 121 GiB 
72.92 0.87  92 up osd.58
 59 ssd   0.43599  1.0 447 GiB 321 GiB 320 GiB 1.1 MiB  1.2 GiB 125 GiB 
71.96 0.86 107 up osd.59
 67 ssd   0.43599  1.0 447 GiB 312 GiB 311 GiB 324 MiB  1.2 GiB 134 GiB 
69.91 0.83  97 up osd.67
 68 ssd   0.43599  1.0 447 GiB 303 GiB 301 GiB 809 MiB  1.1 GiB 144 GiB 
67.82 0.81  86 up osd.68
 69 ssd   0.43599  1.0 447 GiB 303 GiB 301 GiB 822 MiB  1.0 GiB 144 GiB 
67.83 0.81 113 up osd.69
 73 ssd   0.87299  1.0 894 GiB 614 GiB 612 GiB 426 MiB  1.5 GiB 280 GiB 
68.67 0.82 177 up osd.73
 TOTAL 684 TiB 575 TiB 574 TiB  54 GiB  1.1 TiB 109 TiB 
84.03
MIN/MAX VAR: 0.00/1.05  STDDEV: 25.52

-Mensaje original-
De: J. Eric Ivancich  
Enviado el: miércoles, 15 de mayo de 2019 18:12
Para: EDH - Manuel Rios Fernandez ; 'Casey Bodley' 
; ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker 
diferent.

Hi Manuel,

My response is interleaved below.

On 5/8/19 3:17 PM, EDH - Manuel Rios Fernandez wrote:
> Eric,
> 
> Yes we do :
> 
> time s3cmd ls s3://[BUCKET]/ --no-ssl and we get near 2min 30 secs for list 
> the bucket.

We're adding an --allow-unordered option to `radosgw-admin bucket list`.
That would likely speed up your listing. If you want to follow the trackers, 
they are:

https://tracker.ceph.com/issues/39637 [feature added to master]
https://tracker.ceph.com/issues/39730 [nautilus backport]
https://tracker.ceph.com/issues/39731 [mimic backport]
https://tracker.ceph.com/issues/39732 [luminous backport]
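
Once the option is available, usage would presumably look something like this
(the bucket name is a placeholder):

radosgw-admin bucket list --bucket=<bucket> --allow-unordered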

> If we instantly hit again the query it normally timeouts.

That's interesting. I don't have an explanation for that behavior. I would 
suggest creating a tracker for the issue, ideally with the minimal steps to 
reproduce the issue. My concern is that your bucket has so many objects, and if 
that's related to the issue, it would not be easy to reproduce.

> Could you explain a little more "
> 
> With respect to your earlier message in which you included the output 
> of `ceph df`, I believe the reason that default.rgw.buckets.index 
> shows as
> 0 bytes used is that the index uses the metadata branch of the object to 
> store its data.
> "

Each object in ceph has three components. The data itself plus two types of 
metadata (omap and xattr). The `ceph df` command doesn't count the metadata.

The bucket indexes that track the objects in each bucket use only the metadata. 
So you won't see that reported in `ceph df`.

> I read in IRC today that in Nautilus release now is well calculated and no 
> show more 0B. Is it correct?

I don't know. I wasn't aware of any changes in nautilus that report metadata in 
`ceph df`.

> Thanks for your response.

You're welcome,

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How do you deal with "clock skew detected"?

2019-05-15 Thread EDH - Manuel Rios Fernandez
We set up 2 monitors as NTP servers, and the other nodes sync from the monitors.

-Mensaje original-
De: ceph-users  En nombre de Richard Hesketh
Enviado el: miércoles, 15 de mayo de 2019 14:04
Para: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] How do you deal with "clock skew detected"?

Another option would be adding a boot time script which uses ntpdate (or
something) to force an immediate sync with your timeservers before ntpd starts 
- this is actually suggested in ntpdate's man page!

Rich

On 15/05/2019 13:00, Marco Stuurman wrote:
> Hi Yenya,
> 
> You could try to synchronize the system clock to the hardware clock 
> before rebooting. Also try chrony, it catches up very fast.
> 
> 
> Kind regards,
> 
> Marco Stuurman
> 
> 
> Op wo 15 mei 2019 om 13:48 schreef Jan Kasprzak  >
> 
> Hello, Ceph users,
> 
> how do you deal with the "clock skew detected" HEALTH_WARN message?
> 
> I think the internal RTC in most x86 servers does have 1 second resolution
> only, but Ceph skew limit is much smaller than that. So every time I 
> reboot
> one of my mons (for kernel upgrade or something), I have to wait for 
> several
> minutes for the system clock to synchronize over NTP, even though ntpd
> has been running before reboot and was started during the system
> boot again.
> 
> Thanks,
> 
> -Yenya


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Using centraliced management configuration drops some unrecognized config option

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi

 

We're moving our configuration to the centralized management configuration with
"ceph config set" and a minimal ceph.conf on all nodes.

 

Several Ceph options are not accepted. Why?

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)

 

ceph config set osd osd_mkfs_type xfs

Error EINVAL: unrecognized config option 'osd_mkfs_type'

ceph config set osd osd_op_threads 12

Error EINVAL: unrecognized config option 'osd_op_threads'

ceph config set osd osd_disk_threads 2

Error EINVAL: unrecognized config option 'osd_disk_threads'

ceph config set osd osd_recovery_threads 4

Error EINVAL: unrecognized config option 'osd_recovery_threads'

ceph config set osd osd_recovery_thread 4

Error EINVAL: unrecognized config option 'osd_recovery_thread'

 

Is this a bug, or a mistake in my CLI usage?

 

Regards

 

Manuel

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi Muthu

 

We hit the same issue: nearly 2000 PGs not deep-scrubbed in time.

 

We're manually forcing scrubs with:

 

ceph health detail | grep -i not | awk '{print $2}' | while read i; do ceph pg 
deep-scrub ${i}; done

 

It launches roughly 20-30 PGs to be deep-scrubbed at a time. You can improve on
it with a sleep of 120 seconds between scrubs to avoid overloading your OSDs.
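
A sketch of the throttled variant:

ceph health detail | grep -i not | awk '{print $2}' | while read pg; do
    ceph pg deep-scrub ${pg}
    sleep 120   # pause between scrubs so the OSDs are not overloaded
done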

 

To disable deep-scrub you can use "ceph osd set nodeep-scrub". You can also
constrain deep-scrub with thresholds:

#Start Scrub 22:00

osd scrub begin hour = 22

#Stop Scrub 8

osd scrub end hour = 8

#Scrub Load 0.5

osd scrub load threshold = 0.5
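
On Nautilus the same thresholds can also be pushed through the centralized
configuration database instead of ceph.conf; a sketch, assuming the option
names match the ceph.conf keys above:

ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 8
ceph config set osd osd_scrub_load_threshold 0.5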

 

Regards,

 

Manuel

 

 

 

 

De: ceph-users  En nombre de nokia ceph
Enviado el: martes, 14 de mayo de 2019 11:44
Para: Ceph Users 
Asunto: [ceph-users] ceph nautilus deep-scrub health error

 

Hi Team,

 

After upgrading from Luminous to Nautilus, we see a "654 pgs not deep-scrubbed
in time" error in ceph status. How can we disable this warning? In our setup we
disable deep-scrubbing for performance reasons.

 

Thanks,

Muthu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MGR CRASH : balancer module

2019-05-14 Thread EDH - Manuel Rios Fernandez
We can confirm that the balancer module works smoothly in 14.2.1.

 

We're balancing by bytes and PGs. Now all OSDs are 100% balanced.
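
For anyone setting it up, a minimal sketch of enabling the module (the mode
shown is only an example; upmap mode requires all clients to be Luminous or
newer):

ceph balancer mode upmap        # or crush-compat
ceph balancer on
ceph balancer status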

 

 

 

De: ceph-users  En nombre de 
xie.xing...@zte.com.cn
Enviado el: martes, 14 de mayo de 2019 9:53
Para: tze...@us.ibm.com
CC: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph MGR CRASH : balancer module

 

Should be fixed by https://github.com/ceph/ceph/pull/27225 

You can simply upgrade to v14.2.1 to get rid of it,

or you can do 'ceph balancer off' to temporarily disable automatic balancing...

 

 

 

 

Original message

From: TarekZegar <tze...@us.ibm.com>

To: ceph-users@lists.ceph.com

Date: 2019-05-14 01:53

Subject: [ceph-users] Ceph MGR CRASH : balancer module

___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hello,

My manager keeps dying; the last crash meta log is below. What is causing this?
I do have two roots in the OSD tree with shared hosts (see below), but I can't
imagine that is causing the balancer to fail?


meta log:
{
   "crash_id": 
"2019-05-11_19:09:17.999875Z_aa7afa7c-bc7e-43ec-b32a-821bd47bd68b",
   "timestamp": "2019-05-11 19:09:17.999875Z",
   "process_name": "ceph-mgr",
   "entity_name": "mgr.pok1-qz1-sr1-rk023-s08",
   "ceph_version": "14.2.0",
   "utsname_hostname": "pok1-qz1-sr1-rk023-s08",
   "utsname_sysname": "Linux",
   "utsname_release": "4.15.0-1014-ibm-gt",
   "utsname_version": "#16-Ubuntu SMP Tue Dec 11 11:19:10 UTC 2018",
   "utsname_machine": "x86_64",
   "os_name": "Ubuntu",
   "os_id": "ubuntu",
   "os_version_id": "18.04",
   "os_version": "18.04.1 LTS (Bionic Beaver)",
   "assert_condition": "osd_weight.count(i.first)",
   "assert_func": "int OSDMap::calc_pg_upmaps(CephContext*, float, int, const 
std::set&, OSDMap::Incremental*)",
   "assert_file": "/build/ceph-14.2.0/src/osd/OSDMap.cc",
   "assert_line": 4743,
   "assert_thread_name": "balancer",
   "assert_msg": "/build/ceph-14.2.0/src/osd/OSDMap.cc: In function 'int 
OSDMap::calc_pg_upmaps(CephContext*, float, int, const std::set&, 
OSDMap::Incremental*)' thread 7fffd6572700 time 2019-05-11 
19:09:17.998114\n/build/ceph-14.2.0/src/osd/OSDMap.cc: 4743: FAILED 
ceph_assert(osd_weight.count(i.first))\n",
   "backtrace": [
   "(()+0x12890) [0x7fffee586890]",
   "(gsignal()+0xc7) [0x7fffed67ee97]",
   "(abort()+0x141) [0x7fffed680801]",
   "(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1a3) [0x7fffef1eb7d3]",
   "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, 
char const*, )+0) [0x7fffef1eb95d]",
   "(OSDMap::calc_pg_upmaps(CephContext*, float, int, std::set, std::allocator > const&, OSDMap::Incremental*)+0x274b) 
[0x7fffef61bb3b]",
   "(()+0x1d52b6) [0x557292b6]",
   "(PyEval_EvalFrameEx()+0x8010) [0x7fffeeab21d0]",
   "(PyEval_EvalCodeEx()+0x7d8) [0x7fffeebe2278]",
   "(PyEval_EvalFrameEx()+0x5bf6) [0x7fffeeaafdb6]",
   "(PyEval_EvalFrameEx()+0x8b5b) [0x7fffeeab2d1b]",
   "(PyEval_EvalFrameEx()+0x8b5b) [0x7fffeeab2d1b]",
   "(PyEval_EvalCodeEx()+0x7d8) [0x7fffeebe2278]",
   "(()+0x1645f9) [0x7fffeeb675f9]",
   "(PyObject_Call()+0x43) [0x7fffeea57333]",
   "(()+0x1abd1c) [0x7fffeebaed1c]",
   "(PyObject_Call()+0x43) [0x7fffeea57333]",
   "(PyObject_CallMethod()+0xc8) [0x7fffeeb7bc78]",
   "(PyModuleRunner::serve()+0x62) [0x55725f32]",
   "(PyModuleRunner::PyModuleRunnerThread::entry()+0x1cf) [0x557265df]",
   "(()+0x76db) [0x7fffee57b6db]",
   "(clone()+0x3f) [0x7fffed76188f]"
   ]
}

OSD TREE:
ID  CLASS WEIGHTTYPE NAME   STATUS REWEIGHT PRI-AFF
-2954.58200 root tzrootthreenodes
-2518.19400 host pok1-qz1-sr1-rk001-s20
 0   ssd   1.81898 osd.0   up  1.0 1.0
122   ssd   1.81898 osd.122 up  1.0 1.0
135   ssd   1.81898 osd.135 up  1.0 1.0
149   ssd   1.81898 osd.149 up  1.0 1.0
162   ssd   1.81898 osd.162 up  1.0 1.0
175   ssd   1.81898 osd.175 up  1.0 1.0
188   ssd   1.81898 osd.188 up  1.0 1.0
200   ssd   1.81898 osd.200 up  1.0 1.0
213   ssd   1.81898 osd.213 up  1.0 1.0
225   ssd   1.81898 osd.225 up  1.0 1.0
-518.19400 host pok1-qz1-sr1-rk002-s05
112   ssd   1.81898 osd.112 up  1.0 1.0
120   ssd   1.81898 osd.120 up  1.0 1.0
132   ssd   1.81898 osd.132 up  1.0 1.0
144   ssd   1.81898 osd.144

[ceph-users] Ceph Health 14.2.1 Dont report slow OPS

2019-05-13 Thread EDH - Manuel Rios Fernandez
Hi

 

The latest version of Ceph no longer reports slow ops in the dashboard
and CLI? Is this a bug, or expected?

 

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)

Linux 3.10.0-957.12.1.el7.x86_64 #1 SMP Mon Apr 29 14:59:59 UTC 2019 x86_64
x86_64 x86_64 GNU/Linux

 

2019-05-13 16:48:27.536 7f38111a4700 -1 osd.5 129902 get_health_metrics
reporting 136 slow ops, oldest is osd_op(client.15648485.0:469762 39.79
39:9e06324d:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18__multipart_MBS
-6736e395-d1ca-43a7-8098-324ef41f3881%2fCBB_BIM-IIS%2fCBB_DiskImage%2fDisk_0
000----%2fVolume_NTFS_----00
01$%2f20190512220203%2f92.cbrevision.2~gmBBOM5CIdaevVmz6Stp2ot8UguMF
7H.5:head [create,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e129417)

2019-05-13 16:48:28.510 7f38111a4700 -1 osd.5 129902 get_health_metrics
reporting 136 slow ops, oldest is osd_op(client.15648485.0:469762 39.79
39:9e06324d:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18__multipart_MBS
-6736e395-d1ca-43a7-8098-324ef41f3881%2fCBB_BIM-IIS%2fCBB_DiskImage%2fDisk_0
000----%2fVolume_NTFS_----00
01$%2f20190512220203%2f92.cbrevision.2~gmBBOM5CIdaevVmz6Stp2ot8UguMF
7H.5:head [create,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e129417)

2019-05-13 16:48:29.508 7f38111a4700 -1 osd.5 129902 get_health_metrics
reporting 136 slow ops, oldest is osd_op(client.15648485.0:469762 39.79
39:9e06324d:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18__multipart_MBS
-6736e395-d1ca-43a7-8098-324ef41f3881%2fCBB_BIM-IIS%2fCBB_DiskImage%2fDisk_0
000----%2fVolume_NTFS_----00
01$%2f20190512220203%2f92.cbrevision.2~gmBBOM5CIdaevVmz6Stp2ot8UguMF
7H.5:head [create,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e129417)

2019-05-13 16:48:30.509 7f38111a4700 -1 osd.5 129902 get_health_metrics
reporting 136 slow ops, oldest is osd_op(client.15648485.0:469762 39.79
39:9e06324d:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18__multipart_MBS
-6736e395-d1ca-43a7-8098-324ef41f3881%2fCBB_BIM-IIS%2fCBB_DiskImage%2fDisk_0
000----%2fVolume_NTFS_----00
01$%2f20190512220203%2f92.cbrevision.2~gmBBOM5CIdaevVmz6Stp2ot8UguMF
7H.5:head [create,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e129417)

2019-05-13 16:48:31.489 7f38111a4700 -1 osd.5 129902 get_health_metrics
reporting 136 slow ops, oldest is osd_op(client.15648485.0:469762 39.79
39:9e06324d:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.18__multipart_MBS
-6736e395-d1ca-43a7-8098-324ef41f3881%2fCBB_BIM-IIS%2fCBB_DiskImage%2fDisk_0
000----%2fVolume_NTFS_----00
01$%2f20190512220203%2f92.cbrevision.2~gmBBOM5CIdaevVmz6Stp2ot8UguMF
7H.5:head [create,writefull 0~4194304] snapc 0=[]
ondisk+write+known_if_redirected e129417)

 

[root@CEPH001 ~]# ceph -s

  cluster:

id: e1ee8086-7cce-43fd-a252-3d677af22428

health: HEALTH_WARN

noscrub,nodeep-scrub flag(s) set

2663 pgs not deep-scrubbed in time

3245 pgs not scrubbed in time

 

  services:

mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003 (age 110m)

mgr: CEPH001(active, since 83m)

osd: 120 osds: 120 up (since 65s), 120 in (since 22h)

 flags noscrub,nodeep-scrub

rgw: 1 daemon active (ceph-rgw03)

 

  data:

pools:   17 pools, 9336 pgs

objects: 112.63M objects, 284 TiB

usage:   541 TiB used, 144 TiB / 684 TiB avail

pgs: 9336 active+clean

 

  io:

client:   666 MiB/s rd, 81 MiB/s wr, 3.30k op/s rd, 969 op/s wr

 

[root@CEPH001 ~]# ceph health

HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 2663 pgs not deep-scrubbed in
time; 3245 pgs not scrubbed in time

[root@CEPH001 ~]#

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests from bluestore osds

2019-05-12 Thread EDH - Manuel Rios Fernandez
Hi Marc,

Try compacting the OSD with slow requests:

ceph tell osd.[ID] compact

This will take the OSD offline for a few seconds (SSD) to minutes (HDD) while
it compacts the OMAP database.
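
If several OSDs are affected, a sketch of a throttled loop over all of them
(the 60-second pause is arbitrary):

for id in $(ceph osd ls); do
    ceph tell osd.${id} compact
    sleep 60
done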

Regards,




-Mensaje original-
De: ceph-users  En nombre de Marc Schöchlin
Enviado el: lunes, 13 de mayo de 2019 6:59
Para: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Slow requests from bluestore osds

Hello cephers,

one week ago we replaced the bluestore cache size by "osd memory target" and 
removed the detail memory settings.
This storage class now runs 42*8GB spinners with a permanent write workload of 
2000-3000 write IOPS, and 1200-8000 read IOPS.

Out new setup is now:
(12.2.10 on Ubuntu 16.04)

[osd]
osd deep scrub interval = 2592000
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub load threshold = 6
osd scrub sleep = 0.3
osd snap trim sleep = 0.4
pg max concurrent snap trims = 1

[osd.51]
osd memory target = 8589934592
...

After that (restarting the entire cluster with these settings) we were very 
happy to not seeany slow request for 7 days.

Unfortunately this night the slow requests returned on one osd without any 
known change of the workload of the last 14 days (according to our detailed 
monitoring)

2019-05-12 22:00:00.000117 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-12 23:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:00.000129 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:44.069793 mon.ceph-mon-s43 [WRN] Health check failed: 416 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:50.151190 mon.ceph-mon-s43 [WRN] Health check update: 439 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:59.750398 mon.ceph-mon-s43 [WRN] Health check update: 452 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:04.750697 mon.ceph-mon-s43 [WRN] Health check update: 283 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:10.419801 mon.ceph-mon-s43 [WRN] Health check update: 230 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:19.751516 mon.ceph-mon-s43 [WRN] Health check update: 362 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:24.751822 mon.ceph-mon-s43 [WRN] Health check update: 324 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:30.675160 mon.ceph-mon-s43 [WRN] Health check update: 341 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:38.759012 mon.ceph-mon-s43 [WRN] Health check update: 390 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:44.858392 mon.ceph-mon-s43 [WRN] Health check update: 366 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:54.753388 mon.ceph-mon-s43 [WRN] Health check update: 352 slow 
requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:59.045220 mon.ceph-mon-s43 [INF] Health check cleared: 
REQUEST_SLOW (was: 168 slow requests are blocked > 32 sec. Implicated osds 51)
2019-05-13 00:01:59.045257 mon.ceph-mon-s43 [INF] Cluster is now healthy
2019-05-13 01:00:00.000114 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 02:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK


The output of a "ceph health detail" loop at the time the problem occurred:

Mon May 13 00:01:27 CEST 2019
HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51 
REQUEST_SLOW 324 slow requests are blocked > 32 sec. Implicated osds 51
324 ops are blocked > 32.768 sec
osd.51 has blocked requests > 32.768 sec

The logfile of the OSD:

2019-05-12 23:57:28.767463 7f38da4e2700  4 rocksdb: (Original Log Time 
2019/05/12-23:57:28.767419) 
[/build/ceph-12.2.10/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] 
Level summary: base level 1 max b ytes base 268435456 files[2 4 21 122 0 0 0] 
max score 0.94

2019-05-12 23:57:28.767511 7f38da4e2700  4 rocksdb: 
[/build/ceph-12.2.10/src/rocksdb/db/db_impl_files.cc:388] [JOB 2991] Try to 
delete WAL files size 256700142, prev total WAL file size 257271487, number of 
live
 WAL files 2.

2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 
34.ac scrub ok
2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 
34.236 scrub starts
2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 
34.236 scrub ok
2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: 
[/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 
from recycle list

2019-05-13 00:04:37.613186 7f38ead03700  4 rocksdb: 
[/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:725] [default] New 
memtable created with log file: #422511. Immutable memtables: 0.

Any hints how to find more details about the origin of this problem?
How can we solve 

[ceph-users] Daemon configuration preference

2019-05-10 Thread EDH - Manuel Rios Fernandez
Hi Cephs

 

We migrated the ceph.conf into the cluster's configuration database.

 

Which source takes precedence when a daemon starts up: ceph.conf or the
configuration database?

 

Is the cluster configuration database read live, or do we still need to restart
daemons for changes to take effect?
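
For reference, the kind of check we have been doing so far (a sketch; osd.0 and
osd_max_backfills are only examples):

ceph config get osd.0 osd_max_backfills                    # value stored in the configuration database
ceph daemon osd.0 config show | grep osd_max_backfills     # value the running daemon is actually using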

 

Regards

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-10 Thread EDH - Manuel Rios Fernandez
Hi 

Last night we added 2 Intel Optane NVMe drives.

We created 4 partitions per card to get the maximum performance (Q=32) out of
those monsters, for a total of 8 partitions of 50 GB each.

We moved the rgw.index pool onto them; it filled to nearly 3 GB.
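
For reference, the move was roughly along these lines (a sketch, assuming the
Optane OSDs are registered under an nvme device class; the rule name is just an
example):

ceph osd crush rule create-replicated nvme-index default host nvme
ceph osd pool set default.rgw.buckets.index crush_rule nvme-index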

And...

Still the same issue: listing buckets is really slow, or so deeply slow that it is
unusable for everyday work whenever you need a listing.

I still don't know how we can optimize it further. Any suggestions/ideas?

Note: we also upgraded to Ceph Nautilus 14.2.1 to check whether some fixes
help.

With the new RGW, the log at debug level 2 now includes a "latency" field:
2019-05-10 08:39:39.793 7f4587482700  1 == req done req=0x55e6163948e0 op 
status=0 http_status=200 latency=214.109s ==
2019-05-10 08:41:38.240 7f451ebb1700  1 == req done req=0x55e6163348e0 op 
status=0 http_status=200 latency=144.57s ==

Sometimes it reaches 214 (seconds?).

Best Regards,

Manuel


-Mensaje original-
De: ceph-users  En nombre de EDH - Manuel 
Rios Fernandez
Enviado el: sábado, 4 de mayo de 2019 15:53
Para: 'Matt Benjamin' 
CC: 'ceph-users' 
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

Hi Folks,

The user is telling us that their software drops a timeout at 10 min (secs)

Reading documentation I think that we can set param  to 3600 secs as Amazon got 
it as timeout

rgw op thread timeout 

Description:The timeout in seconds for open threads.
Type:   Integer
Default:600

Of course list a bucket with 7M objects is a painfull maybe this help to allow 
software complete the listing?

Best Regards
Manuel

-Mensaje original-
De: Matt Benjamin  Enviado el: viernes, 3 de mayo de 2019 
15:47
Para: EDH - Manuel Rios Fernandez 
CC: ceph-users 
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

I think I would not override the default value for "rgw list buckets max 
chunk", I have no experience doing that, though I can see why it might be 
plausible.

Matt

On Fri, May 3, 2019 at 9:39 AM EDH - Manuel Rios Fernandez 
 wrote:
>
> From changes right know we got some other errors...
>
> 2019-05-03 15:37:28.604 7f499a2e8700  1 == starting new request
> req=0x55f326692970 =
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s::GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::initializin
> g for trans_id = tx05c63-005ccc4418-e76558-default
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::getting op
> 0
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:veri
> fying requester
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:norm
> alizing buckets and tenants
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init
> permissions
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:reca
> lculating target
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:read
> ing permissions
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init
> op
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-

Re: [ceph-users] Getting "No space left on device" when reading from cephfs

2019-05-09 Thread EDH - Manuel Rios Fernandez
I'm not sure that setting max backfills to 128 is a good idea. I'm sharing our
config for recovery and backfilling (a runtime sketch follows the list):

 

osd recovery threads = 4

osd recovery op priority = 1

osd recovery max active = 2

osd recovery max single start = 1

osd max backfills = 4

osd backfill scan max = 16

osd backfill scan min = 4

osd client op priority = 63
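
A minimal sketch of applying these limits to running OSDs without a restart
(the values here are only the ones from the config above):

ceph tell osd.* injectargs '--osd_max_backfills 4 --osd_recovery_max_active 2'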

 

Check the fullest OSD: maybe one is full and this prevents use of the other 131 TB
raw, or the distribution is not even across OSDs.

Check mon osd full ratio and mon osd nearfull ratio; sometimes adding just 2% to
the full ratio makes a difference for you.
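
For example, something along these lines (a rough sketch; the ratio values are
only an illustration, and raising them only buys temporary headroom):

ceph osd df tree                    # per-OSD utilization, to spot the fullest OSD
ceph osd dump | grep ratio          # current full / backfillfull / nearfull ratios
ceph osd set-nearfull-ratio 0.87
ceph osd set-full-ratio 0.97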

 

Regards

 

 

De: ceph-users  En nombre de Kári Bertilsson
Enviado el: jueves, 9 de mayo de 2019 14:08
Para: ceph-users 
Asunto: [ceph-users] Getting "No space left on device" when reading from cephfs

 

Hello

 

I am running cephfs with 8/2 erasure coding. I had about 40tb usable free(110tb 
raw), one small disk crashed and i added 2x10tb disks. Now it's backfilling & 
recovering with 0B free and i can't read a single file from the file system...

 

This happend with max-backfilling 4, but i have increased max backfills to 128, 
to hopefully get this over a little faster since system has been unusable for 
12 hours anyway. Not sure yet if that was a good idea.

 

131TB of raw space was somehow not enough to keep things running. Any tips to 
avoid this kind of scenario in the future ?

 

GLOBAL: 
   SIZE   AVAIL  RAW USED %RAW USED  
   489TiB 131TiB   358TiB 73.17  
POOLS: 
   NAMEID USED%USED  MAX AVAIL OBJECTS   
   ec82_pool   41  278TiB 100.000B 28549450  
   cephfs_metadata 42  174MiB   0.04381GiB   666939  
   rbd 51 99.3GiB  20.68381GiB25530 

  data: 
   pools:   3 pools, 704 pgs 
   objects: 29.24M objects, 278TiB 
   usage:   358TiB used, 131TiB / 489TiB avail 
   pgs: 1265432/287571907 objects degraded (0.440%) 
12366014/287571907 objects misplaced (4.300%) 
536 active+clean 
137 active+remapped+backfilling 
27  active+undersized+degraded+remapped+backfilling 
4   active+remapped+backfill_toofull 
 
 io: 
   client:   64.0KiB/s wr, 0op/s rd, 7op/s wr 
   recovery: 1.17GiB/s, 113objects/s

 

Is there anything i can do to restore reading ? I can understand writing not 
working, but why is it blocking reading also ? Any tips ?

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-08 Thread EDH - Manuel Rios Fernandez
Eric,

Yes, we do:

time s3cmd ls s3://[BUCKET]/ --no-ssl and we get nearly 2 min 30 secs to list the
bucket.

If we immediately run the query again, it normally times out.


Could you explain a little more "

With respect to your earlier message in which you included the output of `ceph 
df`, I believe the reason that default.rgw.buckets.index shows as
0 bytes used is that the index uses the metadata branch of the object to store 
its data.
"
I read on IRC today that in the Nautilus release this is now calculated correctly
and no longer shows 0 B. Is that correct?

Thanks for your response.


-Mensaje original-
De: J. Eric Ivancich  
Enviado el: miércoles, 8 de mayo de 2019 21:00
Para: EDH - Manuel Rios Fernandez ; 'Casey Bodley' 
; ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker 
diferent.

Hi Manuel,

My response is interleaved.

On 5/7/19 7:32 PM, EDH - Manuel Rios Fernandez wrote:
> Hi Eric,
> 
> This looks like something the software developer must do, not something than 
> Storage provider must allow no?

True -- so you're using `radosgw-admin bucket list --bucket=XYZ` to list the 
bucket? Currently we do not allow for a "--allow-unordered" flag, but there's 
no reason we could not. I'm working on the PR now, although it might take some 
time before it gets to v13.

> Strange behavior is that sometimes bucket is list fast in less than 30 secs 
> and other time it timeout after 600 secs, the bucket contains 875 folders 
> with a total object number of 6Millions.
> 
> I don’t know how a simple list of 875 folder can timeout after 600 
> secs

Burkhard Linke's comment is on target. The "folders" are a trick using 
delimiters. A bucket is really entirely flat without a hierarchy.

> We bought several NVMe Optane for do 4 partitions in each PCIe card and get 
> up 1.000.000 IOPS for Index. Quite expensive because we calc that our index 
> is just 4GB (100-200M objects),waiting those cards. Any more idea?

With respect to your earlier message in which you included the output of `ceph 
df`, I believe the reason that default.rgw.buckets.index shows as
0 bytes used is that the index uses the metadata branch of the object to store 
its data.

> Regards

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread EDH - Manuel Rios Fernandez
Hi Eric,

This looks like something the software developer must do, not something the
storage provider must allow, no?

The strange behavior is that sometimes the bucket lists quickly, in less than 30
seconds, and other times it times out after 600 seconds. The bucket contains 875
folders with roughly 6 million objects in total.

I don't know how a simple listing of 875 folders can time out after 600 seconds.

We bought several Optane NVMe cards to create 4 partitions on each PCIe card and
get up to 1,000,000 IOPS for the index. Quite expensive, given we calculate that
our index is just 4 GB (100-200M objects); we are waiting for those cards. Any
more ideas?

Regards




-Mensaje original-
De: J. Eric Ivancich  
Enviado el: martes, 7 de mayo de 2019 23:53
Para: EDH - Manuel Rios Fernandez ; 'Casey Bodley' 
; ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker 
diferent.

On 5/7/19 11:24 AM, EDH - Manuel Rios Fernandez wrote:
> Hi Casey
> 
> ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> (stable)
> 
> Reshard is something than don’t allow us customer to list index?
> 
> Regards
Listing buckets with a large number of objects is notoriously slow, because the
entries are not stored in lexical order but the default behavior is to list the
objects in lexical order.

If your use case allows for an unordered listing it would likely perform 
better. You can see some documentation here under the S3 API / GET BUCKET:

http://docs.ceph.com/docs/mimic/radosgw/s3/bucketops/
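
(For reference, the unordered listing is requested with an extra query parameter
on the GET Bucket call, roughly as below; a sketch to be checked against the docs
above, and it needs a client that can sign arbitrary query parameters:)

GET /BUCKETNAME/?allow-unordered=true HTTP/1.1
Host: rgw.example.com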

Are you using S3?

Eric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread EDH - Manuel Rios Fernandez
OK, our last shot is to buy NVMe PCIe cards for the index pool and dedicate them to it.

Checking how many GB/TB are needed for the pool is not clear, since ceph df shows 0:

default.rgw.buckets.index  38 0 B 0   1.8 TiB   
   1056

Any idea for 200M objects?



-Mensaje original-
De: Casey Bodley  
Enviado el: martes, 7 de mayo de 2019 19:13
Para: EDH - Manuel Rios Fernandez ; 
ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker 
diferent.


On 5/7/19 11:24 AM, EDH - Manuel Rios Fernandez wrote:
> Hi Casey
>
> ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> (stable)
>
> Reshard is something than don’t allow us customer to list index?

Reshard does not prevent buckets from being listed, it just spreads the index 
over more rados objects (so more osds). Bucket sharding does have an impact on 
listing performance though, because each request to list the bucket has to read 
from every shard of the bucket index in order to sort the entries. If any of 
those osds have performance issues or slow requests, that would slow down all 
bucket listings.
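
(To check this on our side, something like the following should show the shard
layout and let us reshard manually; a sketch, the bucket name and shard count are
only examples, and a manual reshard should be run off-peak:)

radosgw-admin reshard status --bucket=datos101
radosgw-admin bucket stats --bucket=datos101
radosgw-admin bucket reshard --bucket=datos101 --num-shards=256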

> Regards
>
>
> -Mensaje original-
> De: ceph-users  En nombre de Casey 
> Bodley Enviado el: martes, 7 de mayo de 2019 17:07
> Para: ceph-users@lists.ceph.com
> Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and 
> marker diferent.
>
> When the bucket id is different than the bucket marker, that indicates 
> the bucket has been resharded. Bucket stats shows 128 shards, which is 
> reasonable for that object count. The rgw.none category in bucket 
> stats is nothing to worry about.
>
> What ceph version is this? This reminds me of a fix in 
> https://github.com/ceph/ceph/pull/23940, which I now see never got its 
> backports to mimic or luminous. :(
>
> On 5/7/19 10:20 AM, EDH - Manuel Rios Fernandez wrote:
>> Hi Ceph’s
>>
>> We got an issue that we’re still looking the cause after more than 60 
>> hour searching a misconfiguration.
>>
>> After cheking a lot of documentation and Questions we find 
>> that bucket id and bucket marker are not the same. We compared all 
>> our other bucket and all got the same id and marker.
>>
>> Also found some bucket with the rgw.none section an another not.
>>
>> This bucket is unable to be listed in a fashionable time. Customer 
>> relaxed usage from 120TB to 93TB , from 7Million objects to 5.8M.
>>
>> We isolated a single petition in a RGW server and check some metric, 
>> just try to list this bucket generate 2-3Gbps traffic from RGW to 
>> OSD/MON’s.
>>
>> I asked at IRC if there’re any problem about index pool be in other 
>> root in the same site at crushmap and we think that shouldn’t be.
>>
>> Any idea or suggestion, however crazy, will be proven.
>>
>> Our relevant configuration that may help :
>>
>> CEPH DF:
>>
>> ceph df
>>
>> GLOBAL:
>>
>>  SIZE AVAIL   RAW USED %RAW USED
>>
>>  684 TiB 139 TiB  545 TiB 79.70
>>
>> POOLS:
>>
>>  NAME   ID USED%USED MAX AVAIL 
>> OBJECTS
>>
>> volumes21 3.3 TiB 63.90   1.9 TiB
>> 831300
>>
>> backups22 0 B 0   1.9 TiB
>> 0
>>
>>  images 23 1.8 TiB 49.33   1.9
> TiB
>> 237066
>>
>> vms24 3.4 TiB 64.85   1.9 TiB
>> 811534
>>
>> openstack-volumes-archive  25  30 TiB 47.9232 TiB
>> 7748864
>>
>> .rgw.root  26 1.6 KiB 0   1.9 TiB
>> 4
>>
>> default.rgw.control27 0 B 0   1.9 TiB
>> 100
>>
>> default.rgw.data.root  28  56 KiB 0   1.9 TiB
>> 186
>>
>> default.rgw.gc 29 0 B 0   1.9 TiB
>> 32
>>
>> default.rgw.log30 0 B 0   1.9 TiB
>> 175
>>
>> default.rgw.users.uid  31 4.9 KiB 0   1.9 TiB
>>26
>>
>> default.rgw.users.email3612 B 0   1.9 TiB
>> 1
>>
>> default.rgw.users.keys 37   243 B 0   1.9 TiB
>> 14
>>
>> default.rgw.buckets.index  38 0 B 0   1.9 TiB
>> 1056
>>
>> default.rgw.buckets.data   39 245 TiB 93.8416 TiB
>> 102131428
>>
>> default.rgw.buckets.non-ec 40 0 B 0   1.9 TiB
>> 23046
>>
>> default.rgw.usage  4

Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread EDH - Manuel Rios Fernandez
Hi Casey

ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)

Is resharding something that prevents our customer from listing the bucket index?

Regards


-Mensaje original-
De: ceph-users  En nombre de Casey Bodley
Enviado el: martes, 7 de mayo de 2019 17:07
Para: ceph-users@lists.ceph.com
Asunto: Re: [ceph-users] Ceph Bucket strange issues rgw.none + id and marker
diferent.

When the bucket id is different than the bucket marker, that indicates the
bucket has been resharded. Bucket stats shows 128 shards, which is
reasonable for that object count. The rgw.none category in bucket stats is
nothing to worry about.

What ceph version is this? This reminds me of a fix in
https://github.com/ceph/ceph/pull/23940, which I now see never got its
backports to mimic or luminous. :(

On 5/7/19 10:20 AM, EDH - Manuel Rios Fernandez wrote:
>
> Hi Ceph’s
>
> We got an issue that we’re still looking the cause after more than 60 
> hour searching a misconfiguration.
>
> After cheking a lot of documentation and Questions we find that 
> bucket id and bucket marker are not the same. We compared all our 
> other bucket and all got the same id and marker.
>
> Also found some bucket with the rgw.none section an another not.
>
> This bucket is unable to be listed in a fashionable time. Customer 
> relaxed usage from 120TB to 93TB , from 7Million objects to 5.8M.
>
> We isolated a single petition in a RGW server and check some metric, 
> just try to list this bucket generate 2-3Gbps traffic from RGW to 
> OSD/MON’s.
>
> I asked at IRC if there’re any problem about index pool be in other 
> root in the same site at crushmap and we think that shouldn’t be.
>
> Any idea or suggestion, however crazy, will be proven.
>
> Our relevant configuration that may help :
>
> CEPH DF:
>
> ceph df
>
> GLOBAL:
>
>     SIZE AVAIL   RAW USED %RAW USED
>
>     684 TiB 139 TiB  545 TiB 79.70
>
> POOLS:
>
>     NAME   ID USED    %USED MAX AVAIL OBJECTS
>
> volumes    21 3.3 TiB 63.90   1.9 TiB    
> 831300
>
> backups    22 0 B 0   1.9 TiB 
> 0
>
>     images 23 1.8 TiB 49.33   1.9
TiB    
> 237066
>
> vms    24 3.4 TiB 64.85   1.9 TiB    
> 811534
>
> openstack-volumes-archive  25  30 TiB 47.92    32 TiB   
> 7748864
>
> .rgw.root  26 1.6 KiB 0   1.9 TiB 
> 4
>
> default.rgw.control    27 0 B 0   1.9 TiB   
> 100
>
> default.rgw.data.root  28  56 KiB 0   1.9 TiB   
> 186
>
> default.rgw.gc 29 0 B 0   1.9 TiB    
> 32
>
> default.rgw.log    30 0 B 0   1.9 TiB   
> 175
>
> default.rgw.users.uid  31 4.9 KiB 0   1.9 TiB
>   26
>
> default.rgw.users.email    36    12 B 0   1.9 TiB 
> 1
>
> default.rgw.users.keys 37   243 B 0   1.9 TiB    
> 14
>
> default.rgw.buckets.index  38 0 B 0   1.9 TiB
> 1056
>
> default.rgw.buckets.data   39 245 TiB 93.84    16 TiB
> 102131428
>
> default.rgw.buckets.non-ec 40 0 B 0   1.9 TiB
> 23046
>
> default.rgw.usage  43 0 B 0    1.9
TiB 
> 6
>
> CEPH OSD Distribution:
>
> ceph osd tree
>
> ID  CLASS   WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>
> -41 654.84045 root archive
>
> -37 130.96848 host CEPH-ARCH-R03-07
>
> 100 archive 10.91399 osd.100   up  1.0 1.0
>
> 101 archive 10.91399 osd.101   up  1.0 1.0
>
> 102 archive 10.91399 osd.102   up  1.0 1.0
>
> 103 archive 10.91399 osd.103   up  1.0 1.0
>
> 104 archive 10.91399 osd.104   up  1.0 1.0
>
> 105 archive 10.91399 osd.105   up  1.0 1.0
>
> 106 archive 10.91409 osd.106   up  1.0 1.0
>
> 107 archive 10.91409 osd.107   up  1.0 1.0
>
> 108 archive 10.91409 osd.108   up  1.0 1.0
>
> 109 archive 10.91409 osd.109   up  1.0 1.0
>
> 110 archive 10.91409 osd.110   up  1.0 1.0
>
> 111 archive 10.91409 osd.111   up  1.0 1.0
>
> -23 130.96800 host CEPH005
>
>   4 archive 10.91399 osd.4  

[ceph-users] Ceph Bucket strange issues rgw.none + id and marker diferent.

2019-05-07 Thread EDH - Manuel Rios Fernandez
Hi Ceph's

 

We have an issue whose cause we are still looking for after more than 60 hours of
searching for a misconfiguration.

 

After checking a lot of documentation and questions, we found that the
bucket id and bucket marker are not the same. We compared all our other
buckets and they all have matching id and marker.

 

We also found some buckets with the rgw.none section and others without it.

 

This bucket cannot be listed in a reasonable time. The customer reduced
usage from 120 TB to 93 TB, from 7 million objects to 5.8M.

 

We isolated a single request on an RGW server and checked some metrics; just
trying to list this bucket generates 2-3 Gbps of traffic from the RGW to the OSDs/MONs.

 

I asked on IRC whether there is any problem with the index pool being in a
different root of the crushmap within the same site, and we think there shouldn't be.

 

Any idea or suggestion, however crazy, will be tried.

 

Our relevant configuration that may help :

 

CEPH DF:

 

ceph df

GLOBAL:

SIZEAVAIL   RAW USED %RAW USED

684 TiB 139 TiB  545 TiB 79.70

POOLS:

NAME   ID USED%USED MAX AVAIL
OBJECTS

volumes21 3.3 TiB 63.90   1.9 TiB
831300

backups22 0 B 0   1.9 TiB
0

images 23 1.8 TiB 49.33   1.9 TiB
237066

vms24 3.4 TiB 64.85   1.9 TiB
811534

openstack-volumes-archive  25  30 TiB 47.9232 TiB
7748864

.rgw.root  26 1.6 KiB 0   1.9 TiB
4

default.rgw.control27 0 B 0   1.9 TiB
100

default.rgw.data.root  28  56 KiB 0   1.9 TiB
186

default.rgw.gc 29 0 B 0   1.9 TiB
32

default.rgw.log30 0 B 0   1.9 TiB
175

default.rgw.users.uid  31 4.9 KiB 0   1.9 TiB
26

default.rgw.users.email3612 B 0   1.9 TiB
1

default.rgw.users.keys 37   243 B 0   1.9 TiB
14

default.rgw.buckets.index  38 0 B 0   1.9 TiB
1056

default.rgw.buckets.data   39 245 TiB 93.8416 TiB
102131428

default.rgw.buckets.non-ec 40 0 B 0   1.9 TiB
23046

default.rgw.usage  43 0 B 0   1.9 TiB
6

 

 

CEPH OSD Distribution:

 

ceph osd tree

ID  CLASS   WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF

-41 654.84045 root archive

-37 130.96848 host CEPH-ARCH-R03-07

100 archive  10.91399 osd.100   up  1.0 1.0

101 archive  10.91399 osd.101   up  1.0 1.0

102 archive  10.91399 osd.102   up  1.0 1.0

103 archive  10.91399 osd.103   up  1.0 1.0

104 archive  10.91399 osd.104   up  1.0 1.0

105 archive  10.91399 osd.105   up  1.0 1.0

106 archive  10.91409 osd.106   up  1.0 1.0

107 archive  10.91409 osd.107   up  1.0 1.0

108 archive  10.91409 osd.108   up  1.0 1.0

109 archive  10.91409 osd.109   up  1.0 1.0

110 archive  10.91409 osd.110   up  1.0 1.0

111 archive  10.91409 osd.111   up  1.0 1.0

-23 130.96800 host CEPH005

  4 archive  10.91399 osd.4 up  1.0 1.0

41 archive  10.91399 osd.41up  1.0 1.0

74 archive  10.91399 osd.74up  1.0 1.0

75 archive  10.91399 osd.75up  1.0 1.0

81 archive  10.91399 osd.81up  1.0 1.0

82 archive  10.91399 osd.82up  1.0 1.0

83 archive  10.91399 osd.83up  1.0 1.0

84 archive  10.91399 osd.84up  1.0 1.0

85 archive  10.91399 osd.85up  1.0 1.0

86 archive  10.91399 osd.86up  1.0 1.0

87 archive  10.91399 osd.87up  1.0 1.0

88 archive  10.91399 osd.88up  1.0 1.0

-17 130.96800 host CEPH006

  7 archive  10.91399 osd.7 up  1.0 1.0

  8 archive  10.91399 osd.8 up  1.0 1.0

  9 archive  10.91399 osd.9 up  1.0 1.0

10 archive  10.91399 osd.10up  1.0 1.0

12 archive  10.91399 osd.12up  1.0 1.0

13 archive  10.91399 osd.13up  1.0 1.0

42 archive  10.91399 osd.42

[ceph-users] cls_rgw.cc:3420: couldn't find tag in name index

2019-05-04 Thread EDH - Manuel Rios Fernandez
Hi Ceph's!

 

We're seeing the following logs on some OSDs with excessive CPU usage:

 

2019-05-05 01:40:57.733 7efeb10bc700  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rp
m/el7/BUILD/ceph-13.2.5/src/cls/rgw/cls_rgw.cc:3420: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15164760.295694

 

2019-05-05 01:40:57.733 7efeb10bc700  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rp
m/el7/BUILD/ceph-13.2.5/src/cls/rgw/cls_rgw.cc:3420: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15164760.295694

 

2019-05-05 01:40:57.733 7efeb10bc700  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rp
m/el7/BUILD/ceph-13.2.5/src/cls/rgw/cls_rgw.cc:3420: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15164760.295694

 

2019-05-05 01:41:00.309 7efeb10bc700  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rp
m/el7/BUILD/ceph-13.2.5/src/cls/rgw/cls_rgw.cc:3420: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15197537.7430600

 

2019-05-05 01:41:00.309 7efeb10bc700  0 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x8
6_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rp
m/el7/BUILD/ceph-13.2.5/src/cls/rgw/cls_rgw.cc:3420: couldn't find tag in
name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.15197537.7430600

 

 

ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)

 

 

 

Best Regards,

 

Manuel

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-04 Thread EDH - Manuel Rios Fernandez
Hi Folks,

The user is telling us that their software hits a timeout at 10 min (600 secs).

Reading the documentation, I think we can set the parameter to 3600 secs, since
that is the timeout Amazon uses:

rgw op thread timeout 

Description:The timeout in seconds for open threads.
Type:   Integer
Default:600

Of course, listing a bucket with 7M objects is painful; maybe this helps the
software complete the listing?
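
If we go ahead, the change would look roughly like this in our per-gateway
sections (a sketch; 3600 is only the value we are considering):

[client.rgw.ceph-rgw01]
 rgw op thread timeout = 3600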

Best Regards
Manuel

-Mensaje original-
De: Matt Benjamin  
Enviado el: viernes, 3 de mayo de 2019 15:47
Para: EDH - Manuel Rios Fernandez 
CC: ceph-users 
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

I think I would not override the default value for "rgw list buckets max 
chunk", I have no experience doing that, though I can see why it might be 
plausible.

Matt

On Fri, May 3, 2019 at 9:39 AM EDH - Manuel Rios Fernandez 
 wrote:
>
> From changes right know we got some other errors...
>
> 2019-05-03 15:37:28.604 7f499a2e8700  1 == starting new request 
> req=0x55f326692970 =
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s::GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::initializin
> g for trans_id = tx05c63-005ccc4418-e76558-default
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::getting op 
> 0
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:veri
> fying requester
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:norm
> alizing buckets and tenants
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init 
> permissions
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:reca
> lculating target
> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:read
> ing permissions
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init 
> op
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:veri
> fying op mask
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:veri
> fying op permissions
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:veri
> fying op params
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskIma
> ge/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-a
> f96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:pre-
> executing
> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET 
> /CBERRY/MBS-69e38e26-5516-

[ceph-users] RGW BEAST mimic backport dont show customer IP

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi Folks,

 

We migrated our RGW frontend from Civetweb to Beast (the backport to Mimic); the
performance is impressive compared with the old one.

 

But the Ceph logs don't show the client peer IP; we checked with debug rgw = 1
and 2.

 

We checked the Ceph documentation and it doesn't tell us much more.

 

How can we enable this for auditing?
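
One idea we are considering (a sketch we have not verified against the Beast
backport) is to terminate client connections on a proxy and pass the client
address in a header:

[client.rgw.ceph-rgw01]
 #requires a proxy/load balancer in front that sets X-Forwarded-For
 rgw remote addr param = HTTP_X_FORWARDED_FOR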

 

Any recommendations for improving Beast performance? Our RGW frontends have 40
cores, 64 GB of RAM, and 40 Gbps cards.

 

Best Regards

 

 

Manuel Rios

Keep IT Simple!

 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread EDH - Manuel Rios Fernandez
You can put multiple networks in ceph.conf with commas

 

public network = 172.16.2.0/24, 192.168.0.0/22

 

But remember your servers must be able to reach them; L3 routing and firewall rules are needed.

 

Regards

Manuel

 

 

De: ceph-users  En nombre de Martin Verges
Enviado el: viernes, 3 de mayo de 2019 11:36
Para: Hervé Ballans 
CC: ceph-users 
Asunto: Re: [ceph-users] Ceph cluster available to clients with 2 different 
VLANs ?

 

Hello,

 

configure a gateway on your router or use a good rack switch that can provide 
such features and use layer3 routing to connect different vlans / ip zones.




--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io  
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx

 

 

Am Fr., 3. Mai 2019 um 10:21 Uhr schrieb Hervé Ballans 
mailto:herve.ball...@ias.u-psud.fr> >:

Hi all,

I have a Ceph cluster on Luminous 12.2.10 with 3 mon and 6 osd servers.
My current network settings is a separated public and cluster (private 
IP) network.

I would like my cluster available to clients on another VLAN than the 
default one (which is the public network on ceph.conf)

Is it possible ? How can I achieve that ?
For information, each node still has two unused network cards.

Thanks for any suggestions,

Hervé

___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
-03 15:37:28.959 7f4a68484700  1 == req done req=0x55f2fde20970 op 
status=-104 http_status=206 ==


-Mensaje original-
De: EDH - Manuel Rios Fernandez  
Enviado el: viernes, 3 de mayo de 2019 15:12
Para: 'Matt Benjamin' 
CC: 'ceph-users' 
Asunto: RE: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

Hi Matt,

Thanks for your help,

We have done the changes plus a reboot of MONs and RGW they look like strange 
stucked , now we're able to list  250 directories.

time s3cmd ls s3://datos101 --no-ssl --limit 150
real2m50.854s
user0m0.147s
sys 0m0.042s


Is there any recommendation of max_shard ?

Our main goal is cold storage, normally our usage are backups or customers tons 
of files. This cause that customers in single bucket store millions objetcs.

Its strange because this issue started on Friday without any warning error at 
OSD / RGW logs.

When you should warning customer that will not be able to list their directory 
if they reach X Millions objetcs?

Our current ceph.conf

#Normal-Memory 1/5
debug rgw = 2
#Disable
debug osd = 0
debug journal = 0
debug ms = 0

fsid = e1ee8086-7cce-43fd-a252-3d677af22428
mon_initial_members = CEPH001, CEPH002, CEPH003 mon_host = 
172.16.2.10,172.16.2.11,172.16.2.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default pg num = 128
osd pool default pgp num = 128

public network = 172.16.2.0/24
cluster network = 172.16.1.0/24

osd pool default size = 2
osd pool default min size = 1

rgw_dynamic_resharding = true
#Increment to 128
rgw_override_bucket_index_max_shards = 128

#Default: 1000
rgw list buckets max chunk = 5000



[osd]
osd mkfs type = xfs
osd op threads = 12
osd disk threads = 12

osd recovery threads = 4
osd recovery op priority = 1
osd recovery max active = 2
osd recovery max single start = 1

osd max backfills = 4
osd backfill scan max = 16
osd backfill scan min = 4
osd client op priority = 63


osd_memory_target = 2147483648

osd_scrub_begin_hour = 23
osd_scrub_end_hour = 6
osd_scrub_load_threshold = 0.25 #low load scrubbing osd_scrub_during_recovery = 
false #scrub during recovery

[mon]
mon allow pool delete = true
mon osd min down reporters = 3

[mon.a]
host = CEPH001
public bind addr = 172.16.2.10
mon addr = 172.16.2.10:6789
mon allow pool delete = true

[mon.b]
host = CEPH002
public bind addr = 172.16.2.11
mon addr = 172.16.2.11:6789
mon allow pool delete = true

[mon.c]
host = CEPH003
public bind addr = 172.16.2.12
mon addr = 172.16.2.12:6789
mon allow pool delete = true

[client.rgw]
 rgw enable usage log = true


[client.rgw.ceph-rgw01]
 host = ceph-rgw01
 rgw enable usage log = true
 rgw dns name =
 rgw frontends = "beast port=7480"
 rgw resolve cname = false
 rgw thread pool size = 512
 rgw num rados handles = 1
 rgw op thread timeout = 600


[client.rgw.ceph-rgw03]
 host = ceph-rgw03
 rgw enable usage log = true
 rgw dns name = 
 rgw frontends = "beast port=7480"
 rgw resolve cname = false
 rgw thread pool size = 512
 rgw num rados handles = 1
 rgw op thread timeout = 600


On Thursday customer tell us that listing were instant, and now their programs 
delay until timeout.

Best Regards

Manuel

-Mensaje original-
De: Matt Benjamin  
Enviado el: viernes, 3 de mayo de 2019 14:00
Para: EDH - Manuel Rios Fernandez 
CC: ceph-users 
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

Hi Folks,

Thanks for sharing your ceph.conf along with the behavior.

There are some odd things there.

1. rgw_num_rados_handles is deprecated--it should be 1 (the default), but changing
   it may require you to check and retune the values for objecter_inflight_ops and
   objecter_inflight_op_bytes to be larger
2. you have very different rgw_thread_pool_size values on these two gateways; a
   value between 512 and 1024 is usually best (future rgws will not rely on large
   thread pools)
3. the actual behavior with 128 shards might be assisted by listing in unordered
   mode--HOWEVER, there was a bug in this feature which caused a perf regression
   and masked the benefit--make sure you have applied the fix for
   https://tracker.ceph.com/issues/39393 before evaluating

regards,

Matt

On Fri, May 3, 2019 at 4:57 AM EDH - Manuel Rios Fernandez 
 wrote:
>
> Hi,
>
>
>
> We got a ceph deployment 13.2.5 version, but several bucket with millions of 
> files.
>
>
>
>   services:
>
> mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
>
> mgr: CEPH001(active)
>
> osd: 106 osds: 106 up, 106 in
>
> rgw: 2 daemons active
>
>
>
>   data:
>
> pools:   17 pools, 7120 pgs
>
> objects: 106.8 M objects, 271 TiB
>
> usage:   516 TiB used, 102 TiB / 619 TiB avail
>
> pgs: 7120 active+clean
>
>
>
> We done a test in a spare RGW server for this case.
>
>
>
>
>
>

Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi Matt,

Thanks for your help,

We have made the changes, plus a reboot of the MONs and RGWs (they looked
strangely stuck); now we're able to list 250 directories.

time s3cmd ls s3://datos101 --no-ssl --limit 150
real2m50.854s
user0m0.147s
sys 0m0.042s


Is there any recommendation for max_shards?

Our main use case is cold storage; normally the usage is backups or customers with
tons of files. This leads to customers storing millions of objects in a single
bucket.

It's strange because this issue started on Friday without any warning or error in
the OSD/RGW logs.

When should we warn a customer that they will not be able to list their directory
once they reach X million objects?

Our current ceph.conf

#Normal-Memory 1/5
debug rgw = 2
#Disable
debug osd = 0
debug journal = 0
debug ms = 0

fsid = e1ee8086-7cce-43fd-a252-3d677af22428
mon_initial_members = CEPH001, CEPH002, CEPH003
mon_host = 172.16.2.10,172.16.2.11,172.16.2.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default pg num = 128
osd pool default pgp num = 128

public network = 172.16.2.0/24
cluster network = 172.16.1.0/24

osd pool default size = 2
osd pool default min size = 1

rgw_dynamic_resharding = true
#Increment to 128
rgw_override_bucket_index_max_shards = 128

#Default: 1000
rgw list buckets max chunk = 5000



[osd]
osd mkfs type = xfs
osd op threads = 12
osd disk threads = 12

osd recovery threads = 4
osd recovery op priority = 1
osd recovery max active = 2
osd recovery max single start = 1

osd max backfills = 4
osd backfill scan max = 16
osd backfill scan min = 4
osd client op priority = 63


osd_memory_target = 2147483648

osd_scrub_begin_hour = 23
osd_scrub_end_hour = 6
osd_scrub_load_threshold = 0.25 #low load scrubbing
osd_scrub_during_recovery = false #scrub during recovery

[mon]
mon allow pool delete = true
mon osd min down reporters = 3

[mon.a]
host = CEPH001
public bind addr = 172.16.2.10
mon addr = 172.16.2.10:6789
mon allow pool delete = true

[mon.b]
host = CEPH002
public bind addr = 172.16.2.11
mon addr = 172.16.2.11:6789
mon allow pool delete = true

[mon.c]
host = CEPH003
public bind addr = 172.16.2.12
mon addr = 172.16.2.12:6789
mon allow pool delete = true

[client.rgw]
 rgw enable usage log = true


[client.rgw.ceph-rgw01]
 host = ceph-rgw01
 rgw enable usage log = true
 rgw dns name = 
 rgw frontends = "beast port=7480"
 rgw resolve cname = false
 rgw thread pool size = 512
 rgw num rados handles = 1
 rgw op thread timeout = 600


[client.rgw.ceph-rgw03]
 host = ceph-rgw03
 rgw enable usage log = true
 rgw dns name = 
 rgw frontends = "beast port=7480"
 rgw resolve cname = false
 rgw thread pool size = 512
 rgw num rados handles = 1
 rgw op thread timeout = 600


On Thursday the customer told us that listing was instant, and now their programs
stall until they time out.

Best Regards

Manuel

-Mensaje original-
De: Matt Benjamin  
Enviado el: viernes, 3 de mayo de 2019 14:00
Para: EDH - Manuel Rios Fernandez 
CC: ceph-users 
Asunto: Re: [ceph-users] RGW Bucket unable to list buckets 100TB bucket

Hi Folks,

Thanks for sharing your ceph.conf along with the behavior.

There are some odd things there.

1. rgw_num_rados_handles is deprecated--it should be 1 (the default), but changing
   it may require you to check and retune the values for objecter_inflight_ops and
   objecter_inflight_op_bytes to be larger
2. you have very different rgw_thread_pool_size values on these two gateways; a
   value between 512 and 1024 is usually best (future rgws will not rely on large
   thread pools)
3. the actual behavior with 128 shards might be assisted by listing in unordered
   mode--HOWEVER, there was a bug in this feature which caused a perf regression
   and masked the benefit--make sure you have applied the fix for
   https://tracker.ceph.com/issues/39393 before evaluating
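
(To double-check what our gateways are actually running with, something like this
should work; a sketch, the socket path depends on the local setup:)

ceph daemon /var/run/ceph/ceph-client.rgw.ceph-rgw01.asok config show | egrep 'rgw_thread_pool_size|rgw_num_rados_handles|objecter_inflight'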

regards,

Matt

On Fri, May 3, 2019 at 4:57 AM EDH - Manuel Rios Fernandez 
 wrote:
>
> Hi,
>
>
>
> We got a ceph deployment 13.2.5 version, but several bucket with millions of 
> files.
>
>
>
>   services:
>
> mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
>
> mgr: CEPH001(active)
>
> osd: 106 osds: 106 up, 106 in
>
> rgw: 2 daemons active
>
>
>
>   data:
>
> pools:   17 pools, 7120 pgs
>
> objects: 106.8 M objects, 271 TiB
>
> usage:   516 TiB used, 102 TiB / 619 TiB avail
>
> pgs: 7120 active+clean
>
>
>
> We done a test in a spare RGW server for this case.
>
>
>
>
>
> Customer report us that is unable to list their buckets, we tested in a 
> monitor with the command:
>
>
>
> s3cmd ls s3://[bucket] --no-ssl --limit 20
>
>
>
> Takes 1m and 2 secs.
>
>
>
> RGW log in debug mode = 2
>
>
>
> 2019-05-03 10:40:25.449 7f65f63e1700  1 ==

[ceph-users] RGW Bucket unable to list buckets 100TB bucket

2019-05-03 Thread EDH - Manuel Rios Fernandez
Hi,

 

We have a Ceph 13.2.5 deployment, with several buckets containing millions of
files.

 

  services:

mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003

mgr: CEPH001(active)

osd: 106 osds: 106 up, 106 in

rgw: 2 daemons active

 

  data:

pools:   17 pools, 7120 pgs

objects: 106.8 M objects, 271 TiB

usage:   516 TiB used, 102 TiB / 619 TiB avail

pgs: 7120 active+clean

 

We ran a test on a spare RGW server for this case.

 

 

The customer reports that they are unable to list their buckets; we tested from a
monitor with the command:

 

s3cmd ls s3://[bucket] --no-ssl --limit 20

 

It takes 1 min 2 secs.

 

RGW log in debug mode = 2

 

2019-05-03 10:40:25.449 7f65f63e1700  1 == starting new request
req=0x55eba26e8970 =

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s::GET
/[bucketname]/::initializing for trans_id =
tx00071-005ccbfe79-e6283e-default

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/::getting op 0

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying requester

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:normalizing buckets and tenants

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:init permissions

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:recalculating target

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:reading permissions

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:init op

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op mask

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op permissions

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:verifying op params

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:pre-executing

2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET
/[bucketname]/:list_bucket:executing

2019-05-03 10:40:41.026 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:41:03.026 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:41:25.026 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:41:47.026 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:completing

2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:op status=0

2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET
/[bucketname]/:list_bucket:http status=200

2019-05-03 10:41:49.395 7f65f63e1700  1 == req done req=0x55eba26e8970
op status=0 http_status=200 ==

 

 

time s3cmd ls s3://[bucket] --no-ssl --limit 100

real4m26.318s

 

 

2019-05-03 10:42:36.439 7f65f33db700  1 == starting new request
req=0x55eba26e8970 =

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s::GET
/[bucketname]/::initializing for trans_id =
tx00073-005ccbfefc-e6283e-default

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/::getting op 0

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying requester

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:normalizing buckets and tenants

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:init permissions

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:recalculating target

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:reading permissions

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:init op

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op mask

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op permissions

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:verifying op params

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:pre-executing

2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET
/[bucketname]/:list_bucket:executing

2019-05-03 10:42:53.026 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:43:15.027 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:43:37.028 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:43:59.027 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:44:21.028 7f660e411700  2
RGWDataChangesLog::ChangesRenewThread: start

2019-05-03 10:44:43.027 7f660e411700  2