Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
Your solution of prepending the environment name to the bucket was my first choice, but at the moment I can't ask the devs to change the code to do that. For now I have to stick with the zones solution. Should I follow the federated zones docs

[ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Luca Mazzaferro
Dear Users, I'm quite new to Ceph. I completed the tutorial here: http://ceph.com/docs/giant/start/quick-ceph-deploy Afterwards, I turned off the VMs where the OSDs, Monitors and MDS were running. This morning I restarted the machines but the OSDs don't want to restart, while the other services

Re: [ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Antonio Messina
On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro luca.mazzafe...@rzg.mpg.de wrote: Dear Users, Hi Luca, On the admin-node side the ceph health command or ceph -w hangs forever. I'm not a Ceph expert either, but this is usually an indication that the monitors are not running. How many MONs

[ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
Hello, While working at a customer I've run into a 10GbE latency which seems high to me. I have access to a couple of Ceph clusters and I ran a simple ping test: $ ping -s 8192 -c 100 -n ip Two results I got: rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms rtt min/avg/max/mdev =
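To compare the summary lines posted in this thread, it can help to pull the four numbers out of the ping output programmatically. The following is only a minimal sketch, assuming the Linux iputils summary format shown in the message above; the function name `parse_rtt` is made up for illustration and is not part of any Ceph or ping tooling.

```python
import re

# Matches the summary line printed by Linux iputils ping, e.g.
# "rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms"
RTT_LINE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line):
    """Return the four RTT statistics in milliseconds as a dict, or None."""
    match = RTT_LINE.search(line)
    if match is None:
        return None
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(value) for value in match.groups())))


# The first result quoted in the message above.
sample = "rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms"
stats = parse_rtt(sample)
print(stats["avg"])
```

Feeding each reply's summary line through the same function makes it easier to line up the averages reported for different switches and NICs.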

[ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even when there is no read error. While Ceph has built-in tool sets to repair

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Dan van der Ster
Between two hosts on an HP Procurve 6600, no jumbo frames: rtt min/avg/max/mdev = 0.096/0.128/0.151/0.019 ms Cheers, Dan On Thu Nov 06 2014 at 2:19:07 PM Wido den Hollander w...@42on.com wrote: Hello, While working at a customer I've run into a 10GbE latency which seems high to me. I

Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
Hi, I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication fixes the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Luis Periquito
Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS - if I remember correctly there was a lot of

Re: [ceph-users] All OSDs don't restart after shutdown

2014-11-06 Thread Luca Mazzaferro
On 11/06/2014 12:36 PM, Antonio Messina wrote: On Thu, Nov 6, 2014 at 12:00 PM, Luca Mazzaferro luca.mazzafe...@rzg.mpg.de wrote: Dear Users, Hi Luca, On the admin-node side the ceph health command or ceph -w hangs forever. I'm not a Ceph expert either, but this is usually an indication

Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not a filesystem expert here, but I would like to understand what happened behind the EIO and whether it reveals something (e.g. a hardware issue). In our case, we are using 6TB drives so that there are a lot of

Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
What is your version of Ceph? 0.80.0 - 0.80.3 https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b Thu Nov 06 2014 at 16:24:21, GuangYang yguan...@outlook.com: Hello Cephers, Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major

Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
We are using v0.80.4. I would just like to ask for general suggestions here :) Thanks, Guang From: malm...@gmail.com Date: Thu, 6 Nov 2014 13:46:12 + Subject: Re: [ceph-users] PG inconsistency To: yguan...@outlook.com; ceph-de...@vger.kernel.org;

Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
Thu Nov 06 2014 at 16:44:09, GuangYang yguan...@outlook.com: Thanks Dan. By killed/formatted/replaced the OSD, did you replace the disk? Not a filesystem expert here, but I would like to understand what happened behind the EIO and whether it reveals something (e.g. hardware

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread German Anders
Also, between two hosts on a NetGear SW model at 10GbE: rtt min/avg/max/mdev = 0.104/0.196/0.288/0.055 ms German Anders --- Original message --- Subject: [ceph-users] Typical 10GbE latency From: Wido den Hollander w...@42on.com To: ceph-us...@ceph.com Date: Thursday,

Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
IIRC, the EIO we had also correlated with a SMART status that showed the disk was bad enough for a warranty replacement -- so yes, I replaced the disk in these cases. Cheers, Dan On Thu Nov 06 2014 at 2:44:08 PM GuangYang yguan...@outlook.com wrote: Thanks Dan. By killed/formatted/replaced the

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric modes you have configured? How is everything connected? Also you have no information on the OS

Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
By the way, is it possible to run 2 radosgw on the same host? I think I have created the zone, but I'm not sure it was correct, because it used the default pool names even though I had changed them in the JSON file I provided. Now I am trying to run ceph-radosgw with two different entries in the

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Udo Lembke
Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev = 0.075/0.114/0.231/0.037 ms rtt min/avg/max/mdev = 0.088/0.164/0.739/0.072 ms rtt min/avg/max/mdev = 0.081/0.141/0.229/0.030 ms rtt min/avg/max/mdev =

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Luis Periquito
What is the COPP? On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander w...@42on.com wrote: On 11/06/2014 02:38 PM, Luis Periquito wrote: Hi Wido, What is the full topology? Are you using a north-south or east-west? So far I've seen the east-west are slightly slower. What are the fabric

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Irek Fasikhov
Hi, Udo. Good value :) Is there any additional optimization on the host? Thanks. Thu Nov 06 2014 at 16:57:36, Udo Lembke ulem...@polarzone.de: Hi, from one host to five OSD-hosts. NIC Intel 82599EB; jumbo-frames; single Switch IBM G8124 (blade network). rtt min/avg/max/mdev =

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Robert Sander
Hi, 2 LACP bonded Intel Corporation Ethernet 10G 2P X520 Adapters, no jumbo frames, here: rtt min/avg/max/mdev = 0.141/0.207/0.313/0.040 ms rtt min/avg/max/mdev = 0.124/0.223/0.289/0.044 ms rtt min/avg/max/mdev = 0.302/0.378/0.460/0.038 ms rtt min/avg/max/mdev = 0.282/0.389/0.473/0.035 ms All

Re: [ceph-users] buckets and users

2014-11-06 Thread Marco Garcês
Update: I was able to fix the authentication error, and I have 2 radosgw running on the same host. The problem now is that I believe I have created the zone wrong, or I am doing something else wrong, because I can log in with the user I had before, and I can access his buckets. I need to have everything

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Wido den Hollander
On 11/06/2014 02:58 PM, Luis Periquito wrote: What is the COPP? Nothing special, default settings. 200 ICMP packets/second. But we also tested with a direct TwinAx cable between two hosts, so no switch involved. That did not improve the latency. So this seems to be a kernel/driver issue

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad William Seys
Hi Sam, Sounds like you needed osd 20. You can mark osd 20 lost. -Sam Does not work: # ceph osd lost 20 --yes-i-really-mean-it osd.20 is not down or doesn't exist Also, here is an interesting post which I will follow from October:

Re: [ceph-users] buckets and users

2014-11-06 Thread Craig Lewis
You need to tell each radosgw daemon which zone to use. In ceph.conf, I have: [client.radosgw.ceph3c] host = ceph3c rgw socket path = /var/run/ceph/radosgw.ceph3c keyring = /etc/ceph/ceph.client.radosgw.ceph3c.keyring log file = /var/log/ceph/radosgw.log admin socket =
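One way to check that every gateway section names a zone is to scan the config file for it. This is a rough sketch, not Craig's setup: it assumes the zone is set with the `rgw zone` option in each `[client.radosgw.*]` section, and the section names, host names and zone names in `SAMPLE` are hypothetical. Python's `configparser` is only an approximation of how Ceph itself parses ceph.conf.

```python
import configparser


def radosgw_sections_missing_zone(conf_text):
    """List [client.radosgw.*] sections that do not set 'rgw zone'."""
    # Only '=' separates keys from values here, and interpolation is
    # disabled so values are read literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    missing = []
    for section in parser.sections():
        if not section.startswith("client.radosgw."):
            continue
        # Treat "rgw_zone" and "rgw zone" as the same option name.
        keys = {key.replace("_", " ") for key in parser[section]}
        if "rgw zone" not in keys:
            missing.append(section)
    return missing


# Hypothetical fragment: two gateways on one host, only one with a zone.
SAMPLE = """
[client.radosgw.example-a]
host = gateway1
rgw zone = zone-a

[client.radosgw.example-b]
host = gateway1
"""
print(radosgw_sections_missing_zone(SAMPLE))
```

Any section the function reports would fall back to whatever default zone the daemon picks, which may explain a gateway that still sees the old users and buckets.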

[ceph-users] Red Hat/CentOS kernel-ml to get RBD module

2014-11-06 Thread Robert LeBlanc
The maintainers of the kernel-ml[1] package have graciously accepted the request to include the RBD module in the mainline kernel build[2]. This should make it easier to test new kernels with RBD if you have better things to do than build new kernels. Thanks kernel-ml maintainers! Robert

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Udo Lembke
Hi, no special optimizations on the host. In this case the pings are from a proxmox-ve host to ceph-osds (ubuntu + debian). The pings from one osd to the others are comparable. Udo On 06.11.2014 15:00, Irek Fasikhov wrote: Hi, Udo. Good value :) Is there any additional optimization on the

Re: [ceph-users] Typical 10GbE latency

2014-11-06 Thread Robert LeBlanc
rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms IPoIB Mellanox ConnectX-3 MT27500 FDR adapter and Mellanox IS5022 QDR switch MTU set to 65520. CentOS 7.0.1406 running 3.17.2-1.el7.elrepo.x86_64 on Intel(R) Atom(TM) CPU C2750 with 32 GB of RAM. On Thu, Nov 6, 2014 at 9:46 AM, Udo Lembke

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-06 Thread John Spray
Jasper, Thanks for this -- I've reproduced this issue in a development environment. We'll see if this is also an issue on giant, and backport a fix if appropriate. I'll update this thread soon. Cheers, John On Mon, Nov 3, 2014 at 8:49 AM, Jasper Siero jasper.si...@target-holding.nl wrote:

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 There are pretty good reasons for why it works the way it does right now, but it certainly is unexpected. -Sam On Thu, Nov 6, 2014 at 7:18 AM, Chad William Seys cws...@physics.wisc.edu wrote: Hi Sam, Sounds

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Also, are you certain that osd 20 is not up? -Sam On Thu, Nov 6, 2014 at 10:52 AM, Samuel Just sam.j...@inktank.com wrote: Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 There are pretty good reasons for why it works the way it does right now, but it

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Sam, Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 Well, thanks for any bugfixes in advance! :) Also, are you certain that osd 20 is not up? -Sam Yep. # ceph osd metadata 20 Error ENOENT: osd.20 does not exist So part of ceph thinks osd.20

Re: [ceph-users] Basic Ceph Questions

2014-11-06 Thread Craig Lewis
On Wed, Nov 5, 2014 at 11:57 PM, Wido den Hollander w...@42on.com wrote: On 11/05/2014 11:03 PM, Lindsay Mathieson wrote: - Geo Replication - thats done via federated gateways? looks complicated :( * The remote slave, it would be read only? That is only for the RADOS Gateway. Ceph

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-06 Thread John Spray
This is still an issue on master, so a fix will be coming soon. Follow the ticket for updates: http://tracker.ceph.com/issues/10025 Thanks for finding the bug! John On Thu, Nov 6, 2014 at 6:21 PM, John Spray john.sp...@redhat.com wrote: Jasper, Thanks for this -- I've reproduced this issue

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Craig Lewis
On Thu, Nov 6, 2014 at 11:27 AM, Chad Seys Also, are you certain that osd 20 is not up? -Sam Yep. # ceph osd metadata 20 Error ENOENT: osd.20 does not exist So part of ceph thinks osd.20 doesn't exist, but another part (the down_osds_we_would_probe) thinks the osd exists and is down?

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Craig, You'll have trouble until osd.20 exists again. Ceph really does not want to lose data. Even if you tell it the osd is gone, ceph won't believe you. Once ceph can probe any osd that claims to be 20, it might let you proceed with your recovery. Then you'll probably need to use

[ceph-users] RBD Diff based on Timestamp

2014-11-06 Thread Nick Fisk
I have been thinking about the implications of losing the snapshot chain on an RBD when doing export-diff-import-diff between two separate physical locations. As I understand it, in this scenario when you take the first snapshot again on the source, you would in effect end up copying the whole RBD

[ceph-users] Installing CephFs via puppet

2014-11-06 Thread JIten Shah
Hi Guys, I am sure many of you guys have installed cephfs using puppet. I am trying to install “firefly” using the puppet module from https://github.com/ceph/puppet-ceph.git and running into the “ceph_config” file issue where it’s unable to find the config file and I am not sure why.

[ceph-users] installing ceph object gateway

2014-11-06 Thread Michael Kuriger
Is there updated documentation explaining how to install and use the object gateway? http://docs.ceph.com/docs/master/install/install-ceph-gateway/ I attempted this install and quickly ran into problems. Thanks! -M ___ ceph-users mailing list

Re: [ceph-users] osd down

2014-11-06 Thread Shain Miley
I tried restarting all the osd's on that node, osd.70 was the only ceph process that did not come back online. There is nothing in the ceph-osd log for osd.70. However I do see over 13,000 of these messages in the kern.log: Nov 6 19:54:27 hqosd6 kernel: [34042786.392178] XFS (sdl1):

Re: [ceph-users] Installing CephFs via puppet

2014-11-06 Thread Loic Dachary
Hi, At the moment puppet-ceph does not support CephFS. The error you're seeing does not ring a bell. Would you have more context to help diagnose it? Cheers On 06/11/2014 23:44, JIten Shah wrote: Hi Guys, I am sure many of you guys have installed cephfs using puppet. I am trying to

Re: [ceph-users] Installing CephFs via puppet

2014-11-06 Thread JIten Shah
Thanks Loic. What is the recommended puppet module for installing CephFS? I can send more details about puppet-ceph but basically I haven't changed anything in there except for assigning values to the required params in the yaml file. --Jiten On Nov 6, 2014, at 7:24 PM, Loic Dachary

Re: [ceph-users] Ceph Cluster with two radosgw

2014-11-06 Thread lakshmi k s
Any best practices available for Radosgw HA? Please suggest. On Wednesday, November 5, 2014 2:08 PM, lakshmi k s lux...@yahoo.com wrote: Hello - My ceph cluster needs to have two rados gateway nodes eventually interfacing with Openstack haproxy. I have been successful in bringing up one of

Re: [ceph-users] installing ceph object gateway

2014-11-06 Thread M Ranga Swami Reddy
Please share the problem details (error messages, etc.) so we can check and help. Thanks Swami On Fri, Nov 7, 2014 at 4:41 AM, Michael Kuriger mk7...@yp.com wrote: Is there updated documentation explaining how to install and use the object gateway?

[ceph-users] Is it normal that osd's memory exceed 1GB under stress test?

2014-11-06 Thread 谢锐
I set mon_osd_down_out_interval to two days and ran a stress test. The memory of the OSD exceeded 1GB.

Re: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest?

2014-11-06 Thread 谢锐
I also took one OSD down, then ran the stress test with fio. -- Original -- From: 谢锐 xie...@szsandstone.com; Date: Fri, Nov 7, 2014 02:50 PM To: ceph-users ceph-us...@ceph.com; Subject: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest? I set