Re: [ceph-users] how to fix X is an unexpected clone

2018-02-26 Thread Saverio Proto
> ceph-objectstore-tool and the remove operation to remove all leftover files. > > I did this on all OSDs with the problematic pg. After that ceph was able > to fix itself. > > A better approach might be that ceph can recover itself from an > unexpected clone by just deleting it. >
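A rough sketch of that removal with ceph-objectstore-tool, assuming the OSD is stopped first; the OSD id (12), pg id (4.1f) and object name below are placeholders, and the exact flags can differ between releases, so check --help on your version:

    # stop the OSD that holds the problematic pg (init-system dependent)
    sudo service ceph stop osd.12
    # list the objects in the pg to locate the unexpected clone
    sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal --pgid 4.1f --op list
    # remove the leftover clone, then bring the OSD back up
    sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal --pgid 4.1f \
        '<object-as-printed-by-the-list-op>' remove
    sudo service ceph start osd.12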

Re: [ceph-users] OSPF to the host

2016-07-11 Thread Saverio Proto
> I'm looking at the Dell S-ON switches which we can get in a Cumulus > version. Any pros and cons of using Cumulus vs old-school switch OSes you > may have come across? Nothing to declare here. Once configured properly, the hardware works as expected. I never used Dell; I used switches from

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-06-14 Thread Saverio Proto
test it on a ceph version newer than Hammer, you can update the bug :) thank you Saverio 2016-05-12 15:49 GMT+02:00 Yehuda Sadeh-Weinraub : > On Thu, May 12, 2016 at 12:29 AM, Saverio Proto wrote: >>> While I'm usually not fond of blaming the client application, this is >

Re: [ceph-users] hadoop on cephfs

2016-06-09 Thread Saverio Proto
You can also have Hadoop talking to the Rados Gateway (SWIFT API) so that the data is in Ceph instead of HDFS. I wrote this tutorial that might help: https://github.com/zioproto/hadoop-swift-tutorial Saverio 2016-04-30 23:55 GMT+02:00 Adam Tygart : > Supposedly cephfs-hadoop worked and/or works
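If the hadoop-openstack (Swift filesystem) connector is set up as in the tutorial above, access from Hadoop looks roughly like this; the container and provider names are placeholders:

    # list a container exposed through the swift:// filesystem scheme
    hadoop fs -ls swift://mycontainer.myprovider/
    # copy a dataset between the Rados Gateway (Swift API) and HDFS
    hadoop distcp swift://mycontainer.myprovider/dataset hdfs:///user/hadoop/dataset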

Re: [ceph-users] OSPF to the host

2016-06-09 Thread Saverio Proto
> Has anybody had any experience with running the network routed down all the > way to the host? > Hello Nick, yes, at SWITCH.ch we run OSPF unnumbered on the switches and on the hosts. Each server has two NICs and we are able to plug the servers into any port on the fabric and OSPF will make the m
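As a rough sketch of the host side, assuming Quagga/FRR (interface name, addresses and area are placeholders, not our exact configuration): the loopback /32 is borrowed on the fabric-facing interfaces and the links run as OSPF point-to-point, e.g. via vtysh:

    vtysh -c 'configure terminal' \
          -c 'interface eth0' \
          -c 'ip address 10.0.0.11/32' \
          -c 'ip ospf network point-to-point' \
          -c 'exit' \
          -c 'router ospf' \
          -c 'ospf router-id 10.0.0.11' \
          -c 'network 10.0.0.11/32 area 0.0.0.0'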

[ceph-users] Ubuntu Xenial - Ceph repo uses weak digest algorithm (SHA1)

2016-05-27 Thread Saverio Proto
I started to use Xenial... does everyone get this error? W: http://ceph.com/debian-hammer/dists/xenial/InRelease: Signature by key 08B73419AC32B4E966C1A330E84AC2C0460F3994 uses weak digest algorithm (SHA1) Saverio

Re: [ceph-users] The RGW create new bucket instance then delete it at every create bucket OP

2016-05-18 Thread Saverio Proto
Hello, I am not sure I understood the problem. Can you post example steps to reproduce it? Also, what version of Ceph RGW are you running? Saverio 2016-05-18 10:24 GMT+02:00 fangchen sun : > Dear ALL, > > I found a problem that the RGW creates a new bucket instance and deletes > th

Re: [ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-12 Thread Saverio Proto
> Can't you set the ACL on the object when you put it? What do you think of this bug? https://github.com/s3tools/s3cmd/issues/743 Saverio

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-12 Thread Saverio Proto
> While I'm usually not fond of blaming the client application, this is > really the swift command line tool's issue. It tries to be smart by > comparing the md5sum of the object's content with the object's etag, > and it breaks with multipart objects. The etag of multipart objects is calculated > differently (m
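An easy way to see the mismatch the swift client trips over, assuming the usual OS_* auth variables are set; container and object names are placeholders:

    # etag stored by the gateway for the multipart upload
    swift stat mycontainer myobject | grep -i etag
    # md5 of the local copy that was uploaded; for multipart uploads it will not match
    md5sum ./myobject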

Re: [ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-12 Thread Saverio Proto
> Can't you set the ACL on the object when you put it? I could create two tenants: one tenant DATASETADMIN for read/write access, and a tenant DATASETUSERS for read-only access. When I load the dataset into the object store, I need an "s3cmd put" operation and an "s3cmd setacl" operation for each ob
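A sketch of those two operations per object with s3cmd; the bucket, object and user id are placeholders:

    # upload the object, then grant read access to the read-only users
    s3cmd put ./object-0001 s3://dataset-bucket/object-0001
    s3cmd setacl --acl-grant=read:DATASETUSERS_UID s3://dataset-bucket/object-0001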

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-11 Thread Saverio Proto
ecf8427e Saverio 2016-05-11 16:07 GMT+02:00 Saverio Proto : > Thank you. > > It is exactly a problem with multipart. > > So I tried two clients (s3cmd and rclone). When you upload a file in > S3 using multipart, you are no longer able to read this object with > the SWIFT API be

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-11 Thread Saverio Proto
I try to simplify the question to get some feedback. >> >> Is anyone running the RadosGW in production with S3 and SWIFT API active at >> the same time ? >> >> thank you ! >> >> Saverio >> >> >> 2016-05-06 11:39 GMT+02:00 Saverio Proto : >&

[ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-11 Thread Saverio Proto
Hello there, Our setup is with Ceph Hammer (latest release). We want to publish some scientific datasets in our Object Storage. These are collections of around 100K objects with a total size of about 200 TB. For Object Storage we use the RadosGW with S3 API. For the initial testing we are using a

[ceph-users] Mixed versions of Ceph Cluster and RadosGW

2016-05-11 Thread Saverio Proto
Hello, I have a production Ceph cluster running the latest Hammer release. We are not planning the upgrade to Jewel soon. However, I would like to upgrade just the Rados Gateway to Jewel, because I want to test the new SWIFT compatibility improvements. Is it supported to run the system with thi

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-09 Thread Saverio Proto
I try to simplify the question to get some feedback. Is anyone running the RadosGW in production with S3 and SWIFT API active at the same time ? thank you ! Saverio 2016-05-06 11:39 GMT+02:00 Saverio Proto : > Hello, > > We have been running the Rados GW with the S3 API and we did

[ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-06 Thread Saverio Proto
Hello, We have been running the Rados GW with the S3 API and we did not have problems for more than a year. We recently enabled also the SWIFT API for our users. radosgw --version ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) The idea is that each user of the system is free of
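One way to give an existing S3 user Swift credentials as well (not necessarily exactly what we did; the uid is a placeholder):

    radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
    radosgw-admin key create --subuser=johndoe:swift --key-type=swift --gen-secret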

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-25 Thread Saverio Proto
I confirm that the bug is fixed with the 0.94.6 release packages. thank you Saverio 2016-02-22 10:20 GMT+01:00 Saverio Proto : > Hello Jason, > > from this email on ceph-dev > http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29692 > > it looks like 0.94.6 is coming

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-22 Thread Saverio Proto
Dillaman : > Correct -- a v0.94.6 tag on the hammer branch won't be created until the > release. > > -- > > Jason Dillaman > > > - Original Message - >> From: "Saverio Proto" >> To: "Jason Dillaman" >> Cc: ceph-users@lists.

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-19 Thread Saverio Proto
[2] http://docs.ceph.com/docs/master/install/get-packages/ > > -- > > Jason Dillaman > > > - Original Message - >> From: "Saverio Proto" >> To: ceph-users@lists.ceph.com >> Sent: Friday, February 19, 2016 10:11:01 AM >> Subject: [

[ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-19 Thread Saverio Proto
Hello, we are hitting Bug #14373 in our production cluster: http://tracker.ceph.com/issues/14373 Since we introduced the object map feature on our cinder rbd volumes, we are not able to snapshot the volumes unless the VMs are paused. We are running the latest Hammer and so we are reall

Re: [ceph-users] Increasing time to save RGW objects

2016-02-10 Thread Saverio Proto
What kind of authentication do you use against the Rados Gateway? We had a similar problem authenticating against our Keystone server. If the Keystone server is overloaded, the time to read/write RGW objects increases. You will not see anything wrong on the ceph side. Saverio 2016-02-08 17:49 GMT+01:

[ceph-users] What are linger_ops in the output of objecter_requests ?

2015-10-14 Thread Saverio Proto
Hello, debugging the slow-request behaviour of our Rados Gateway, I ran into this linger_ops field and I cannot understand its meaning. I would expect to find slow requests stuck in the "ops" field. Actually, most of the time I have "ops": [], and it looks like ops gets empty very quickly. Howe
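For reference, the output in question comes from the RGW admin socket, e.g. (the socket path depends on your configuration):

    ceph --admin-daemon /var/run/ceph/ceph-client.radosgw.gateway.asok objecter_requests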

Re: [ceph-users] radosgw secret_key

2015-09-01 Thread Saverio Proto
Look at this: https://github.com/ncw/rclone/issues/47 Because this is a JSON dump, the / is encoded as \/. It was a source of confusion for me as well. Best regards Saverio 2015-08-24 16:58 GMT+02:00 Luis Periquito : > When I create a new user using radosgw-admin most of the time the secret
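If jq is installed, a simple way to print the secret key already unescaped (the uid is a placeholder):

    radosgw-admin user info --uid=testuser | jq -r '.keys[0].secret_key'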

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-07-27 Thread Saverio Proto
Hello Jan, I am testing your scripts, because we also want to test OSDs and VMs on the same server. I am new to cgroups, so this might be a very newbie question. Your script always references the file /cgroup/cpuset/libvirt/cpuset.cpus but I have the file at /sys/fs/cgroup/cpuset/libvir
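For what it is worth, the cgroup mount point simply differs per distro; on Ubuntu the same knob is reachable like this (the core list is only an example):

    # read the current cpuset assigned to libvirt guests
    cat /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus
    # restrict libvirt guests to cores 0-3 (example value)
    echo 0-3 | sudo tee /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus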

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Saverio Proto
Romero Junior : > If I have a replica of each object on the other racks, why should I have > to wait for any recovery time? The failure should not impact my virtual > machines. > > > > *From:* Saverio Proto [mailto:ziopr...@gmail.com] > *Sent:* Wednesday, 24 June 2015 14:54 >

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Saverio Proto
Hello Romero, I am still a beginner with Ceph, but as far as I understand, Ceph is not designed to lose 33% of the cluster at once and recover rapidly. What I understand is that you are losing 33% of the cluster by losing 1 rack out of 3. It will take a very long time to recover before you have HE

Re: [ceph-users] Ceph migration to AWS

2015-05-06 Thread Saverio Proto
Why don't you just use AWS S3 directly then? Saverio 2015-04-24 17:14 GMT+02:00 Mike Travis : > To those interested in a tricky problem, > > We have a Ceph cluster running at one of our data centers. One of our > client's requirements is to have them hosted at AWS. My question is: How do > we effecti

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
he lost 22 pgs. But I guess the cluster has thousands of pgs, so the actual data loss is small. Is that correct? thanks Saverio 2015-05-07 4:16 GMT+02:00 Christian Balzer : > > Hello, > > On Thu, 7 May 2015 00:34:58 +0200 Saverio Proto wrote: > >> Hello, >> >> I

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
Hello, I don't get it. You lost just 6 OSDs out of 145 and your cluster is not able to recover? What is the output of ceph -s? Saverio 2015-05-04 9:00 GMT+02:00 Yujian Peng : > Hi, > I'm encountering a data disaster. I have a ceph cluster with 145 osd. The > data center had a power problem ye

Re: [ceph-users] advantages of multiple pools?

2015-04-17 Thread Saverio Proto
For example you can assign different read/write permissions and different keyrings to different pools. 2015-04-17 16:00 GMT+02:00 Chad William Seys : > Hi All, >What are the advantages of having multiple ceph pools (if they use the > whole cluster)? >Thanks! > > C. > __
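For example, a per-pool keyring can be created roughly like this (the client and pool names are placeholders):

    ceph auth get-or-create client.app1 mon 'allow r' osd 'allow rwx pool=app1-pool' \
        -o /etc/ceph/ceph.client.app1.keyring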

Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-04-17 Thread Saverio Proto
> Do you by any chance have your OSDs placed at a local directory path rather > than on a non utilized physical disk? No, I have 18 disks per server. Each OSD is mapped to a physical disk. Here is the output from one server: ansible@zrh-srv-m-cph02:~$ df -h Filesystem Size Used Avail

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
services) I have 16GB RAM, 15GB used but > 5 cached. On the OSD servers I have 3GB RAM, 3GB used but 2 cached. > "ceph -s" tells me nothing about PGs, shouldn't I get an error message in > its output? > > Thanks > Giuseppe > > 2015-04-14 18:20 GMT+02:00 Saver

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check the Ceph wiki guidelines to size the number of PGs correctly. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00 Giuseppe
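The usual rule of thumb (roughly, see the placement-group documentation): total PGs across all pools ~ (number of OSDs * 100) / replica size, rounded to a power of two. For example:

    # 4 OSDs, size=3 pools
    echo $(( 4 * 100 / 3 ))   # ~133 -> round to 128 PGs total, shared by all pools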

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
Yes you can. You have to write your own crushmap. At the end of the crushmap you have rulesets. Write a ruleset that selects only the OSDs you want, then assign the pool to that ruleset. I have seen examples online from people who wanted some pools only on SSD disks and other pools only
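The workflow is roughly the following; the pool name and ruleset id are placeholders:

    ceph osd getcrushmap -o crush.map
    crushtool -d crush.map -o crush.txt
    # edit crush.txt: add a ruleset that only selects the OSDs/buckets you want
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new
    ceph osd pool set mypool crush_ruleset 3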

Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-04-14 Thread Saverio Proto
2015-03-27 18:27 GMT+01:00 Gregory Farnum : > Ceph has per-pg and per-OSD metadata overhead. You currently have 26000 PGs, > suitable for use on a cluster of the order of 260 OSDs. You have placed > almost 7GB of data into it (21GB replicated) and have about 7GB of > additional overhead. > > You mi

Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-03-27 Thread Saverio Proto
> I will start now to push a lot of data into the cluster to see if the > "metadata" grows a lot or stays constant. > > Is there a way to clean up old metadata? I pushed a lot more data to the cluster, then let the cluster sleep for the night. This morning I found these values: 6841 MB data

Re: [ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-03-26 Thread Saverio Proto
> You just need to go look at one of your OSDs and see what data is > stored on it. Did you configure things so that the journals are using > a file on the same storage disk? If so, *that* is why the "data used" > is large. I followed your suggestion and this is the result of my troubleshooting. E
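A quick check for whether the journals are plain files on the data disks (default paths assumed):

    ls -l /var/lib/ceph/osd/ceph-*/journal
    # a symlink to a block device means a separate journal partition;
    # a regular file means the journal lives on the data disk and counts as "used"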

[ceph-users] All pools have size=3 but "MB data" and "MB used" ratio is 1 to 5

2015-03-26 Thread Saverio Proto
ea1335e) Thank you Saverio 2015-03-25 14:55 GMT+01:00 Gregory Farnum : > On Wed, Mar 25, 2015 at 1:24 AM, Saverio Proto wrote: >> Hello there, >> >> I started to push data into my ceph cluster. There is something I >> cannot understand in the output of ceph -w. >

[ceph-users] ceph -w: Understanding "MB data" versus "MB used"

2015-03-25 Thread Saverio Proto
Hello there, I started to push data into my ceph cluster. There is something I cannot understand in the output of ceph -w. When I run ceph -w I get this kind of output: 2015-03-25 09:11:36.785909 mon.0 [INF] pgmap v278788: 26056 pgs: 26056 active+clean; 2379 MB data, 19788 MB used, 33497 GB / 3
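A related breakdown is available from ceph df, which separates raw usage from the logical per-pool data:

    ceph df
    # GLOBAL: raw SIZE / AVAIL / RAW USED (this is where "MB used" comes from,
    #         including replication, journal files and filesystem overhead)
    # POOLS:  logical data stored per pool (what "MB data" sums up)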

Re: [ceph-users] Ceph in Production: best practice to monitor OSD up/down status

2015-03-23 Thread Saverio Proto
at is min_size, I guess the best setting for me is min_size = 1 because I would like to be able to make I/O operations even if only 1 copy is left. Thanks to all for helping! Saverio 2015-03-23 14:58 GMT+01:00 Gregory Farnum : > On Sun, Mar 22, 2015 at 2:55 AM, Saverio Proto wrote: >>
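Setting and checking it per pool (the pool name is a placeholder); note that min_size = 1 keeps I/O going with a single surviving copy, trading safety for availability:

    ceph osd pool set volumes min_size 1
    ceph osd pool get volumes min_size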

[ceph-users] Ceph in Production: best practice to monitor OSD up/down status

2015-03-22 Thread Saverio Proto
Hello, I started to work with Ceph a few weeks ago, so I might ask a very newbie question, but I could not find an answer in the docs or in the ML archive for this. Quick description of my setup: I have a ceph cluster with two servers. Each server has 3 SSD drives that I use for journals only. To map to dif
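For the basic up/down status these commands can be polled (hedged; most monitoring systems just wrap them):

    ceph osd stat          # summary: N osds: N up, N in
    ceph osd tree          # per-OSD up/down status and weight, grouped by host
    ceph health detail     # details on degraded/undersized pgs when something is down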