Re: [ceph-users] how to fix X is an unexpected clone

2018-02-26 Thread Saverio Proto
-tool and the remove operation to remove all left over files. > > I did this on all OSDs with the problematic pg. After that ceph was able > to fix itself. > > A better approach might be that ceph can recover itself from an > unexpected clone by just deleting it. > > Greets,

Re: [ceph-users] OSPF to the host

2016-07-11 Thread Saverio Proto
> I'm looking at the Dell S-ON switches which we can get in a Cumulus > version. Any pros and cons of using Cumulus vs old school switch OS's you > may have come across? Nothing to declare here. Once configured properly the hardware works as expected. I never used Dell, I used switches from

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-06-14 Thread Saverio Proto
to test it on a ceph version newer than Hammer, you can update the bug :) thank you Saverio 2016-05-12 15:49 GMT+02:00 Yehuda Sadeh-Weinraub <yeh...@redhat.com>: > On Thu, May 12, 2016 at 12:29 AM, Saverio Proto <ziopr...@gmail.com> wrote: >>> While I'm usually not fo

Re: [ceph-users] hadoop on cephfs

2016-06-09 Thread Saverio Proto
You can also have Hadoop talking to the Rados Gateway (SWIFT API) so that the data is in Ceph instead of HDFS. I wrote this tutorial that might help: https://github.com/zioproto/hadoop-swift-tutorial Saverio 2016-04-30 23:55 GMT+02:00 Adam Tygart : > Supposedly cephfs-hadoop

Re: [ceph-users] OSPF to the host

2016-06-09 Thread Saverio Proto
> Has anybody had any experience with running the network routed down all the > way to the host? > Hello Nick, yes at SWITCH.ch we run OSPF unnumbered on the switches and on the hosts. Each server has two NICs and we are able to plug the servers into any port on the fabric and OSPF will make the

Re: [ceph-users] The RGW create new bucket instance then delete it at every create bucket OP

2016-05-18 Thread Saverio Proto
Hello, I am not sure I understood the problem. Can you post the example steps to reproduce the problem ? Also what version of Ceph RGW are you running ? Saverio 2016-05-18 10:24 GMT+02:00 fangchen sun : > Dear ALL, > > I found a problem that the RGW create a new bucket

Re: [ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-12 Thread Saverio Proto
> Can't you set the ACL on the object when you put it? What do you think of this bug ? https://github.com/s3tools/s3cmd/issues/743 Saverio ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-12 Thread Saverio Proto
> While I'm usually not fond of blaming the client application, this is > really the swift command line tool issue. It tries to be smart by > comparing the md5sum of the object's content with the object's etag, > and it breaks with multipart objects. The etag of multipart objects is calculated > differently
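
The etag mismatch quoted above can be reproduced without a cluster. The sketch below (an assumption-laden illustration, not RGW's actual code) follows the well-known S3 convention: a single-part upload's ETag is the MD5 of the content, while a multipart ETag is the MD5 of the concatenated binary part digests plus a `-<partcount>` suffix, so a client that compares content md5sum to the ETag will flag every multipart object.

```python
import hashlib

def whole_object_etag(data: bytes) -> str:
    # Single-part upload: the ETag is simply the MD5 of the content.
    return hashlib.md5(data).hexdigest()

def multipart_etag(data: bytes, part_size: int) -> str:
    # Multipart upload convention: MD5 of the concatenated binary MD5
    # digests of each part, followed by "-<number of parts>".
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return hashlib.md5(digests).hexdigest() + "-%d" % len(parts)

data = b"x" * (10 * 1024 * 1024)               # a 10 MiB object
single = whole_object_etag(data)
multi = multipart_etag(data, 5 * 1024 * 1024)  # uploaded as two 5 MiB parts
print(single)
print(multi)
# The two values differ, which is why a client comparing the content
# md5sum against the ETag reports multipart objects as corrupted.
```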

Re: [ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-12 Thread Saverio Proto
> Can't you set the ACL on the object when you put it? I could create two tenants. One tenant DATASETADMIN for read/write access, and a tenant DATASETUSERS for readonly access. When I load the dataset into the object store, I need a "s3cmd put" operation and a "s3cmd setacl" operation for each
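
The two-operations-per-object workflow described above can be sketched as a small command generator. This is a hypothetical illustration: the bucket name, object keys, and the `DATASETUSERS` grantee are made up for the example; only the `s3cmd put` / `s3cmd setacl` pattern comes from the message.

```python
# Hypothetical object keys; the real dataset has around 100K of them.
objects = ["dataset/part-0001.bin", "dataset/part-0002.bin"]
bucket = "s3://mydataset"  # assumed bucket name

def upload_commands(key: str) -> list:
    # Two s3cmd invocations per object: one to upload it, and a second
    # one to grant read access, doubling the number of operations.
    return [
        "s3cmd put %s %s/%s" % (key, bucket, key),
        "s3cmd setacl --acl-grant=read:DATASETUSERS %s/%s" % (bucket, key),
    ]

cmds = [c for key in objects for c in upload_commands(key)]
for c in cmds:
    print(c)
```

For 100K objects this yields 200K API round trips, which is the "nightmare" the subject line refers to.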

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-11 Thread Saverio Proto
+02:00 Saverio Proto <ziopr...@gmail.com>: > Thank you. > > It is exactly a problem with multipart. > > So I tried two clients (s3cmd and rclone). When you upload a file in > S3 using multipart, you are no longer able to read this object with > the SWIFT API because

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-11 Thread Saverio Proto
- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Saverio Proto >> Sent: Monday, May 09, 2016 4:42 PM >> To: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at >> the same

[ceph-users] ACL nightmare on RadosGW for 200 TB dataset

2016-05-11 Thread Saverio Proto
Hello there, Our setup is with Ceph Hammer (latest release). We want to publish in our Object Storage some Scientific Datasets. These are collections of around 100K objects and total size of about 200 TB. For Object Storage we use the RadosGW with S3 API. For the initial testing we are using a

[ceph-users] Mixed versions of Ceph Cluster and RadosGW

2016-05-11 Thread Saverio Proto
Hello, I have a production Ceph cluster running the latest Hammer Release. We are not planning the upgrade to Jewel soon. However, I would like to upgrade just the Rados Gateway to Jewel, because I want to test the new SWIFT compatibility improvements. Is it supported to run the system with

Re: [ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-09 Thread Saverio Proto
I try to simplify the question to get some feedback. Is anyone running the RadosGW in production with S3 and SWIFT API active at the same time ? thank you ! Saverio 2016-05-06 11:39 GMT+02:00 Saverio Proto <ziopr...@gmail.com>: > Hello, > > We have been running the Rados GW

[ceph-users] RadosGW - Problems running the S3 and SWIFT API at the same time

2016-05-06 Thread Saverio Proto
Hello, We have been running the Rados GW with the S3 API and we did not have problems for more than a year. We recently enabled also the SWIFT API for our users. radosgw --version ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) The idea is that each user of the system is free of

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-25 Thread Saverio Proto
I confirm that the bug is fixed with the 0.94.6 release packages. thank you Saverio 2016-02-22 10:20 GMT+01:00 Saverio Proto <ziopr...@gmail.com>: > Hello Jason, > > from this email on ceph-dev > http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29692 > &

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-22 Thread Saverio Proto
Dillaman <dilla...@redhat.com>: > Correct -- a v0.94.6 tag on the hammer branch won't be created until the > release. > > -- > > Jason Dillaman > > > - Original Message - >> From: "Saverio Proto" <ziopr...@gmail.com> >> To: "Jaso

Re: [ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-19 Thread Saverio Proto
tracker.ceph.com/issues/13356 > [2] http://docs.ceph.com/docs/master/install/get-packages/ > > -- > > Jason Dillaman > > > - Original Message - >> From: "Saverio Proto" <ziopr...@gmail.com> >> To: ceph-users@lists.ceph.com >> Sent: Fr

[ceph-users] Cannot reliably create snapshot after freezing QEMU IO

2016-02-19 Thread Saverio Proto
Hello, we are hitting Bug #14373 here in our production cluster http://tracker.ceph.com/issues/14373 Since we introduced the object map feature in our cinder rbd volumes, we are not able to snapshot the volumes unless we pause the VMs. We are running the latest Hammer and so we are

Re: [ceph-users] Increasing time to save RGW objects

2016-02-10 Thread Saverio Proto
What kind of authentication do you use against the Rados Gateway? We had a similar problem authenticating against our Keystone server. If the Keystone server is overloaded, the time to read/write RGW objects increases. You will not see anything wrong on the ceph side. Saverio 2016-02-08 17:49

[ceph-users] What are linger_ops in the output of objecter_requests ?

2015-10-14 Thread Saverio Proto
Hello, while debugging the slow requests behaviour of our Rados Gateway, I ran into this linger_ops field and I cannot understand its meaning. I would expect to find slow requests stuck in the "ops" field. Actually most of the time I have "ops": [], and it looks like ops gets empty very quickly.

Re: [ceph-users] radosgw secret_key

2015-09-01 Thread Saverio Proto
Look at this: https://github.com/ncw/rclone/issues/47 Because this is a json dump, it is encoding the / as \/. It was a source of confusion for me too. Best regards Saverio 2015-08-24 16:58 GMT+02:00 Luis Periquito : > When I create a new user using radosgw-admin most

Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

2015-07-27 Thread Saverio Proto
Hello Jan, I am testing your scripts, because we also want to test OSDs and VMs on the same server. I am new to cgroups, so this might be a very newbie question. In your script you always reference the file /cgroup/cpuset/libvirt/cpuset.cpus but I have the file in

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Saverio Proto
Hello Romero, I am still a beginner with Ceph, but as far as I understood, ceph is not designed to lose 33% of the cluster at once and recover rapidly. What I understand is that you are losing 33% of the cluster by losing 1 rack out of 3. It will take a very long time to recover, before you have

Re: [ceph-users] Unexpected issues with simulated 'rack' outage

2015-06-24 Thread Saverio Proto
Romero Junior r.jun...@global.leaseweb.com: If I have a replica of each object on the other racks why should I have to wait for any recovery time? The failure should not impact my virtual machines. *From:* Saverio Proto [mailto:ziopr...@gmail.com] *Sent:* Wednesday, 24 June 2015 14:54

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
Hello, I don't get it. You lost just 6 osds out of 145 and your cluster is not able to recover? What is the status of ceph -s ? Saverio 2015-05-04 9:00 GMT+02:00 Yujian Peng pengyujian5201...@126.com: Hi, I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center

Re: [ceph-users] xfs corruption, data disaster!

2015-05-06 Thread Saverio Proto
he lost 22 pgs. But I guess the cluster has thousands of pgs so the actual data lost is small. Is that correct ? thanks Saverio 2015-05-07 4:16 GMT+02:00 Christian Balzer ch...@gol.com: Hello, On Thu, 7 May 2015 00:34:58 +0200 Saverio Proto wrote: Hello, I dont get it. You lost just 6

Re: [ceph-users] Ceph migration to AWS

2015-05-06 Thread Saverio Proto
Why don't you use AWS S3 directly then? Saverio 2015-04-24 17:14 GMT+02:00 Mike Travis mike.r.tra...@gmail.com: To those interested in a tricky problem, We have a Ceph cluster running at one of our data centers. One of our client's requirements is to have them hosted at AWS. My question is:

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-04-17 Thread Saverio Proto
Do you by any chance have your OSDs placed at a local directory path rather than on a non utilized physical disk? No, I have 18 Disks per Server. Each OSD is mapped to a physical disk. Here is the output of one server: ansible@zrh-srv-m-cph02:~$ df -h Filesystem Size Used Avail

Re: [ceph-users] advantages of multiple pools?

2015-04-17 Thread Saverio Proto
For example you can assign different read/write permissions and different keyrings to different pools. 2015-04-17 16:00 GMT+02:00 Chad William Seys cws...@physics.wisc.edu: Hi All, What are the advantages of having multiple ceph pools (if they use the whole cluster)? Thanks! C.

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
Yes you can. You have to write your own crushmap. At the end of the crushmap you have rulesets. Write a ruleset that selects only the OSDs you want. Then you have to assign the pool to that ruleset. I have seen examples online where people wanted some pools only on SSD disks and other pools only
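
A hypothetical ruleset of the kind described, in Hammer-era crushmap syntax (the `ssd` root and rule name are illustrative; they must match buckets you defined in your own CRUSH hierarchy):

```
# Place replicas only on hosts under the "ssd" root of the hierarchy.
rule ssd_only {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step chooseleaf firstn 0 type host
    step emit
}
```

After compiling and injecting the edited crushmap, the pool is bound to the rule with `ceph osd pool set <pool> crush_ruleset 1` (pre-Jewel syntax).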

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-04-14 Thread Saverio Proto
2015-03-27 18:27 GMT+01:00 Gregory Farnum g...@gregs42.com: Ceph has per-pg and per-OSD metadata overhead. You currently have 26000 PGs, suitable for use on a cluster of the order of 260 OSDs. You have placed almost 7GB of data into it (21GB replicated) and have about 7GB of additional
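
The figures Greg quotes can be checked with back-of-the-envelope arithmetic. This is only a rough accounting sketch using the numbers reported in this thread; the split of the remainder into per-PG versus per-OSD metadata is not broken down here.

```python
# Numbers reported in the thread.
data_mb = 6841            # "MB data" shown by ceph -w
pool_size = 3             # all pools have size=3
reported_used_mb = 25814  # "MB used" shown by ceph -w

# Pure 3x replication accounts for this much raw space:
replicated_mb = data_mb * pool_size
# The rest is per-PG and per-OSD metadata overhead, which stays
# roughly constant while the stored data grows.
metadata_mb = reported_used_mb - replicated_mb

print(replicated_mb)  # ~20 GB of replicated object data
print(metadata_mb)    # ~5 GB of overhead
```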

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
run other services) I have 16GB RAM, 15GB used but 5GB cached. On the OSD servers I have 3GB RAM, 3GB used but 2GB cached. ceph -s tells me nothing about PGs, shouldn't I get an error message from its output? Thanks Giuseppe 2015-04-14 18:20 GMT+02:00 Saverio Proto ziopr...@gmail.com: You only

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
You only have 4 OSDs ? How much RAM per server ? I think you already have too many PGs. Check your RAM usage. Check the Ceph wiki guidelines to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00
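
The sizing guideline referred to can be sketched as follows. This assumes the common rule of thumb from the Ceph documentation (roughly 100 PGs per OSD across all pools, rounded up to a power of two); the exact target varies between doc versions, so treat the function as illustrative.

```python
def recommended_pg_count(num_osds: int, pool_size: int,
                         target_pgs_per_osd: int = 100) -> int:
    # Rule of thumb: total PGs ~= (OSDs * target) / replica count,
    # rounded up to the next power of two.
    raw = num_osds * target_pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# A 4-OSD cluster needs very few PGs...
print(recommended_pg_count(4, 3))
# ...while ~26000 PGs would only be appropriate for hundreds of OSDs.
print(recommended_pg_count(260, 3))
```

Every PG consumes RAM on each OSD that hosts it, which is why an over-provisioned PG count shows up as memory pressure on small clusters like the one in this thread.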

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-03-27 Thread Saverio Proto
I will start now to push a lot of data into the cluster to see if the metadata grows a lot or stays constant. Is there a way to clean up old metadata ? I pushed a lot more data to the cluster. Then I let the cluster sleep for the night. This morning I find these values: 6841 MB data 25814

[ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-03-26 Thread Saverio Proto
Farnum g...@gregs42.com: On Wed, Mar 25, 2015 at 1:24 AM, Saverio Proto ziopr...@gmail.com wrote: Hello there, I started to push data into my ceph cluster. There is something I cannot understand in the output of ceph -w. When I run ceph -w I get this kinkd of output: 2015-03-25 09:11

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-03-26 Thread Saverio Proto
You just need to go look at one of your OSDs and see what data is stored on it. Did you configure things so that the journals are using a file on the same storage disk? If so, *that* is why the data used is large. I followed your suggestion and this is the result of my trobleshooting. Each

[ceph-users] ceph -w: Understanding MB data versus MB used

2015-03-25 Thread Saverio Proto
Hello there, I started to push data into my ceph cluster. There is something I cannot understand in the output of ceph -w. When I run ceph -w I get this kind of output: 2015-03-25 09:11:36.785909 mon.0 [INF] pgmap v278788: 26056 pgs: 26056 active+clean; 2379 MB data, 19788 MB used, 33497 GB /
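
The gap between the two numbers in that pgmap line can be made explicit. A rough accounting sketch, assuming all pools have size=3 as discussed later in the thread; the attribution of the remainder to journals and metadata is the explanation that emerges in the follow-ups, not something `ceph -w` itself reports.

```python
# Figures taken from the ceph -w line quoted above.
data_mb = 2379    # "MB data": logical object data stored
used_mb = 19788   # "MB used": raw space consumed across all OSD filesystems

# With size=3, pure replication accounts for this much raw usage:
replicated_mb = data_mb * 3
# The remainder is everything else living on the OSD filesystems:
# journal files, filesystem overhead, PG/OSD metadata.
overhead_mb = used_mb - replicated_mb

print(replicated_mb)
print(overhead_mb)
```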

Re: [ceph-users] Ceph in Production: best practice to monitor OSD up/down status

2015-03-23 Thread Saverio Proto
is min_size, I guess the best setting for me is min_size = 1 because I would like to be able to make I/O operations even if only 1 copy is left. Thanks to all for helping ! Saverio 2015-03-23 14:58 GMT+01:00 Gregory Farnum g...@gregs42.com: On Sun, Mar 22, 2015 at 2:55 AM, Saverio Proto ziopr
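
The effect of min_size can be summarized in a deliberately simplified model (it ignores peering, recovery, and the risk of data loss that makes min_size=1 a debated choice):

```python
def pg_accepts_io(replicas_alive: int, min_size: int) -> bool:
    # Simplified model: a PG keeps serving client I/O only while
    # at least min_size replicas of it are up.
    return replicas_alive >= min_size

# size=3 with min_size=1: I/O continues even with a single surviving copy.
print(pg_accepts_io(1, min_size=1))  # True
# With min_size=2, a lone surviving copy blocks I/O until recovery.
print(pg_accepts_io(1, min_size=2))  # False
```

The trade-off is that with min_size=1 a write acknowledged by a single surviving OSD is lost forever if that OSD also fails before recovery completes.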

[ceph-users] Ceph in Production: best practice to monitor OSD up/down status

2015-03-22 Thread Saverio Proto
Hello, I started to work with Ceph a few weeks ago, so I might ask a very newbie question, but I could not find an answer in the docs or in the ml archive for this. Quick description of my setup: I have a ceph cluster with two servers. Each server has 3 SSD drives I use for journal only. To map to