Re: [ceph-users] Public network faster than cluster network

2018-05-16 Thread Gandalf Corvotempesta
No more advice for a new cluster? Sorry for these multiple posts, but I had some trouble with the ML; I'm getting "Access Denied". On Fri, 11 May 2018 at 10:21 Gandalf Corvotempesta < gandalf.corvotempe...@gmail.com> wrote: > no more advice for a new cluster ?

Re: [ceph-users] Public network faster than cluster network

2018-05-10 Thread Gandalf Corvotempesta
On Thu, 10 May 2018 at 09:48 Christian Balzer wrote: > Without knowing what your use case is (lots of large reads or writes, or > the more typical smallish I/Os) it's hard to give specific advice. 99% VM hosting. Everything else would be negligible and I don't

Re: [ceph-users] Public network faster than cluster network

2018-05-10 Thread Gandalf Corvotempesta
On Thu, 10 May 2018 at 02:30 Christian Balzer wrote: > This cosmic imbalance would clearly lead to the end of the universe. > Seriously, think it through, what do you _think_ will happen? I thought what David said: "For a write on a replicated pool with size 3

[ceph-users] Public network faster than cluster network

2018-05-09 Thread Gandalf Corvotempesta
As per the subject: what would happen?

Re: [ceph-users] Cluster network slower than public network

2017-11-15 Thread Gandalf Corvotempesta
Any idea? I have one 16-port 10Gb switch, two or more 24-port gigabit switches, 5 OSDs (MONs running on them) and 5 hypervisor servers to connect to the storage. At least 10 ports are needed for each network, thus 20 ports for both cluster and public, right? I don't have 20 10Gb ports. On

Re: [ceph-users] Cluster network slower than public network

2017-11-15 Thread Gandalf Corvotempesta
(the one connected to the hypervisors) and two 1Gb switches as the cluster network. On 15 Nov 2017 1:50 PM, "Gandalf Corvotempesta" < gandalf.corvotempe...@gmail.com> wrote: > As 10Gb switches are expensive, what would happen by using a gigabit > cluster network and

[ceph-users] Cluster network slower than public network

2017-11-15 Thread Gandalf Corvotempesta
As 10Gb switches are expensive, what would happen by using a gigabit cluster network and a 10Gb public network? Replication and rebalance would be slow, but what about public I/O? When a client wants to write to a file, it does so over the public network and Ceph automatically replicates it

Re: [ceph-users] Small cluster for VMs hosting

2017-11-07 Thread Gandalf Corvotempesta
2017-11-07 14:49 GMT+01:00 Richard Hesketh : > Read up on > http://docs.ceph.com/docs/master/rados/operations/monitoring-osd-pg/ and > http://docs.ceph.com/docs/master/rados/operations/pg-states/ - understanding > what the different states that PGs and OSDs can be

[ceph-users] Small cluster for VMs hosting

2017-11-07 Thread Gandalf Corvotempesta
Hi to all, I've been away from Ceph for a couple of years (CephFS was still unstable). I would like to test it again; some questions for a production cluster for VM hosting: 1. Is CephFS stable? 2. Can I spin up a 3-node cluster with MONs, MDS and OSDs on the same machines? 3. Hardware

Re: [ceph-users] MDS failover

2017-04-15 Thread Gandalf Corvotempesta
On 15 Apr 2017 5:48 PM, "John Spray" wrote: MDSs do not replicate to one another. They write all metadata to a RADOS pool (i.e. to the OSDs), and when a failover happens, the new active MDS reads the metadata in. Is the MDS atomic? Is a successful ack sent only after data

[ceph-users] MDS failover

2017-04-15 Thread Gandalf Corvotempesta
Hi to all, sorry if this question was already asked but I didn't find anything related. AFAIK MDSs are a fundamental component of CephFS. What happens in case of an MDS crash between replication from the active MDS to the slaves? Are changes made between the crash and the missing replication lost?

Re: [ceph-users] new Open Source Ceph based iSCSI SAN project

2016-10-16 Thread Gandalf Corvotempesta
Really interesting project. On 16 Oct 2016 18:57, "Maged Mokhtar" wrote: > Hello, > > I am happy to announce PetaSAN, an open source scale-out SAN that uses > Ceph storage and LIO iSCSI Target. > visit us at: > www.petasan.org > > your feedback will be much

Re: [ceph-users] 6 Node cluster with 24 SSD per node: Hardwareplanning/ agreement

2016-10-11 Thread Gandalf Corvotempesta
On 11 Oct 2016 3:05 AM, "Christian Balzer" wrote: > 10Gb/s MC-LAG (white box) switches are also widely available and > affordable. > Which models are you referring to? I've never found any 10Gb switches for less than many thousands of euros. The cheapest ones I've found are

Re: [ceph-users] IOPS requirements

2016-06-20 Thread Gandalf Corvotempesta
On 18 Jun 2016 07:10, "Christian Balzer" wrote: > That sounds extremely high, is that more or less consistent? > How many VMs is that for? > What are you looking at, as in are those individual disks/SSDs, a raid > (what kind)? 800-1000 was a peak over about 5 minutes. It was

Re: [ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
2016-06-17 10:03 GMT+02:00 Christian Balzer : > I'm unfamilar with Xen and Xenserver (the later doesn't support RBD, btw), > but if you can see all the combined activity of your VMs on your HW in the > dom0 like with KVM/qemu, a simple "iostat" or "iostat -x" will give you the >

[ceph-users] IOPS requirements

2016-06-17 Thread Gandalf Corvotempesta
As I'm planning a new cluster to which to move all my virtual machines (currently on local storage on each hypervisor), I would like to evaluate the current IOPS on each server. Knowing the current IOPS I'll be able to know how many IOPS I need on Ceph. I'm not an expert; do you know how to get this

Re: [ceph-users] Switches and latency

2016-06-16 Thread Gandalf Corvotempesta
2016-06-16 12:54 GMT+02:00 Oliver Dzombic : > aside from the question of the coolness factor of Infinitiband, > you should always also consider the question of replacing parts and > extending cluster. > > A 10G Network environment is up to date currently, and will be for

Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:59 GMT+02:00 Nick Fisk : > Possibly, but by how much? 20GB of bandwidth is a lot to feed 12x7.2k disks, > particularly if they start doing any sort of non-sequential IO. Assuming 100MB/s for each SATA disk, 12 disks give 1200MB/s = 9600Mbit/s. Why are you talking

Re: [ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 22:13 GMT+02:00 Nick Fisk : > I would reconsider if you need separate switches for each network, vlans > would normally be sufficient. If bandwidth is not an issue, you could even > tag both vlans over the same uplinks. Then there is the discussion around > whether

[ceph-users] Switches and latency

2016-06-15 Thread Gandalf Corvotempesta
Let's assume a fully redundant network. We need 4 switches: 2 for the public network, 2 for the cluster network. 10GBase-T has higher latency than SFP+ but is also cheaper, as many new servers have 10GBase-T integrated onboard and there is no need for twinax cables or transceivers. I think that low

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:58, "Christian Balzer" wrote: > You _do_ know how and where Ceph/RBD store their data? > > Right now that's on disks/SSDs, formatted with a file system. > And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS will. > Wait, I'm new to ceph and

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:42, "Christian Balzer" wrote: > > This is why people are using BTRFS and ZFS for filestore (despite the > problems they in turn create) and why the roadmap for bluestore has > checksums for reads on it as well (or so we've been told). Bitrot happens only on

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 03:27, "Christian Balzer" wrote: > And that makes deep-scrubbing something of quite limited value. This is not true. If you checksum *before* writing to disk (so while the data is still in RAM) then, when reading back from disk, you can do the checksum verification
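A minimal shell sketch of that idea (hypothetical paths and variable, not how Ceph or BlueStore actually implement it): the digest is computed from the in-memory copy at write time, stored alongside the object, and compared on every read.
  printf '%s' "$DATA" > /srv/store/object                                      # data written to disk
  printf '%s' "$DATA" | sha256sum | cut -d' ' -f1 > /srv/store/object.sha256   # digest taken from the RAM copy
  # on read: recompute from disk and compare; a mismatch means bitrot or a bad sector
  [ "$(sha256sum /srv/store/object | cut -d' ' -f1)" = "$(cat /srv/store/object.sha256)" ] || echo "checksum mismatch"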

Re: [ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
On 9 Jun 2016 15:41, "Adam Tygart" wrote: > > If you're > using pure DDR, you may need to tune the broadcast group in your > subnet manager to set the speed to DDR. Do you know how to set this with opensm? I would like to bring up my test cluster again in the next few days
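For what it's worth, the multicast/broadcast group rate is normally pinned in opensm's partition configuration; a sketch from memory (file location and exact syntax may vary per opensm version, so treat this as an assumption to verify against the opensm partition-config documentation), where rate=6 encodes 20 Gb/s, i.e. 4x DDR:
  # /etc/opensm/partitions.conf
  Default=0x7fff, ipoib, rate=6, mtu=4 : ALL=full;
  # restart the subnet manager so the group is recreated with the new rate
  service opensm restart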

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 10:28 GMT+02:00 Christian Balzer : > Define "small" cluster. Max 14 OSD nodes with 12 disks each, replica 3. > Your smallest failure domain both in Ceph (CRUSH rules) and for > calculating how much over-provisioning you need should always be the > node/host. > This is
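A back-of-the-envelope illustration of that host-level over-provisioning rule (illustrative numbers, not a recommendation): with 14 equal hosts, one host carries roughly 1/14, about 7% of the data, so the surviving 13 hosts need at least that much free space to re-replicate after a host failure; combined with the default ~85% nearfull warning threshold, raw utilization should stay below roughly (13/14) x 0.85, about 79%, if the cluster is expected to heal from a full node loss on its own.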

Re: [ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 10:18 GMT+02:00 Christian Balzer : > IPoIB is about half the speed of your IB layer, yes. Ok, so it's normal. I've seen benchmarks on the net stating that IPoIB on DDR should reach about 16-17Gb/s. I'll plan to move to QDR. > And bandwidth is (usually) not the biggest issue,

[ceph-users] RDMA/Infiniband status

2016-06-09 Thread Gandalf Corvotempesta
The last time I used Ceph (around 2014) RDMA/Infiniband support was just a proof of concept and I was using IPoIB with low performance (about 8-10Gb/s on an Infiniband DDR 20Gb/s link). That was 2 years ago. Any news about this? Is RDMA/Infiniband supported like it is with GlusterFS?

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 9:16 GMT+02:00 Christian Balzer : > Neither, a journal failure is lethal for the OSD involved and unless you > have LOTS of money RAID1 SSDs are a waste. Ok, so if a journal failure is lethal, does ceph automatically remove the affected OSD and start rebalancing? >

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
On 9 Jun 2016 02:09, "Christian Balzer" wrote: > Ceph currently doesn't do any (relevant) checksumming at all, so if a > PRIMARY PG suffers from bit-rot this will be undetected until the next > deep-scrub. > > This is one of the longest and gravest outstanding issues with

Re: [ceph-users] Disk failures

2016-06-08 Thread Gandalf Corvotempesta
2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki : > From my own experience with failing HDDs I've seen cases where the drive was > failing silently initially. This manifested itself in repeated deep scrub > failures. Correct me if I'm wrong here, but Ceph keeps

[ceph-users] Disk failures

2016-06-07 Thread Gandalf Corvotempesta
Hi, how does Ceph detect and manage disk failures? What happens if some data is written on a bad sector? Is there any chance of the bad sector getting "distributed" across the cluster due to replication? Is Ceph able to remove the OSD bound to the failed disk automatically?

[ceph-users] Disaster recovery and backups

2016-06-05 Thread Gandalf Corvotempesta
Let's assume that everything went very, very bad and I have to manually recover a cluster with an unconfigured Ceph. 1. How can I recover data directly from the raw disks? Is this possible? 2. How can I restore a ceph cluster (and get the data back) by using the existing disks? 3. How do you manage backups

Re: [ceph-users] Migrate whole clusters

2014-05-13 Thread Gandalf Corvotempesta
2014-05-13 21:21 GMT+02:00 Gregory Farnum g...@inktank.com: You misunderstand. Migrating between machines for incrementally upgrading your hardware is normal behavior and well-tested (likewise for swapping in all-new hardware, as long as you understand the IO requirements involved). So is

[ceph-users] Migrate whole clusters

2014-05-09 Thread Gandalf Corvotempesta
Let's assume a test cluster up and running with real data on it. What is the best way to migrate everything to a production (and larger) cluster? I'm thinking of adding production MONs to the test cluster, then adding production OSDs to the test cluster, waiting for a full rebalance and then

Re: [ceph-users] Replace journals disk

2014-05-09 Thread Gandalf Corvotempesta
2014-05-09 15:55 GMT+02:00 Sage Weil s...@inktank.com: This looks correct to me! Some command to automate this in ceph would be nice, for example skipping the mkjournal step:
ceph-osd -i 30 --mkjournal
ceph-osd -i 31 --mkjournal
ceph should be smart enough to automatically make journals if

Re: [ceph-users] Replace journals disk

2014-05-08 Thread Gandalf Corvotempesta
2014-05-08 18:43 GMT+02:00 Indra Pramana in...@sg.or.id: Since we don't use ceph.conf to indicate the data and journal paths, how can I recreate the journal partitions?
1. Dump the partition scheme: sgdisk --backup=/tmp/journal_table /dev/sdd
2. Replace the journal disk device
3. Restore the
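Pulling the whole procedure together, a sketch assuming the old and new journal SSD both appear as /dev/sdd and the affected OSD is number 30 (device name, OSD id and init syntax are illustrative and vary by release/distro):
  service ceph stop osd.30                          # stop the OSD first
  ceph-osd -i 30 --flush-journal                    # flush pending writes while the old SSD is still readable
  sgdisk --backup=/tmp/journal_table /dev/sdd       # 1. dump the GPT of the old journal disk
  #                                                 # 2. physically swap the journal disk
  sgdisk --load-backup=/tmp/journal_table /dev/sdd  # 3. restore the same layout, including partition GUIDs
  ceph-osd -i 30 --mkjournal                        # 4. recreate an empty journal on the new partition
  service ceph start osd.30
Because sgdisk restores the partition GUIDs along with the layout, the existing by-partuuid journal symlinks keep pointing at the right partitions.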

[ceph-users] Cache tiering

2014-05-07 Thread Gandalf Corvotempesta
Very simple question: what happens if the server bound to the cache pool goes down? For example, a read-only cache could be achieved by using a single server with no redundancy. Is Ceph smart enough to detect that the cache is unavailable and transparently redirect all requests to the main pool as usual?

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 12:39 GMT+02:00 Andrija Panic andrija.pa...@gmail.com: Good question - I'm also interested. Do you want to move the journal to a dedicated disk/partition i.e. on SSD, or just replace the (failed) disk with a new/bigger one? I would like to replace the disk with a bigger one (in fact, my new disk

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 13:08 GMT+02:00 Dan Van Der Ster daniel.vanders...@cern.ch: I've followed this recipe successfully in the past: http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster I'll try but my ceph.conf doesn't have any osd journal setting set

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 14:09 GMT+02:00 Fred Yang frederic.y...@gmail.com: The journal location is not in ceph.conf, check /var/lib/ceph/osd/ceph-X/journal, which is a symlink to the osd's journal device. The symlinks point to the partition UUID; this prevents replacement without manual intervention:

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 16:33 GMT+02:00 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com: The symlinks point to the partition UUID; this prevents replacement without manual intervention: journal -> /dev/disk/by-partuuid/b234da10-dcad-40c7-aa97-92d35099e5a4 Is it not possible to create a symlink pointing
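If the new partition does end up with a different PARTUUID, the symlink can simply be repointed before recreating the journal; a sketch with a placeholder UUID and an illustrative OSD id:
  ln -sf /dev/disk/by-partuuid/<new-partuuid> /var/lib/ceph/osd/ceph-30/journal
  ceph-osd -i 30 --mkjournal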

Re: [ceph-users] Replace journals disk

2014-05-06 Thread Gandalf Corvotempesta
2014-05-06 19:40 GMT+02:00 Craig Lewis cle...@centraldesktop.com: I haven't tried this yet, but I imagine that the process is similar to moving your journal from the spinning disk to an SSD. My journals are on SSD. I have to replace that SSD.

[ceph-users] pgmap version increasing

2014-04-30 Thread Gandalf Corvotempesta
I'm testing an idle ceph cluster. My pgmap version is always increasing; is this normal?
2014-04-30 17:20:41.934127 mon.0 [INF] pgmap v281: 640 pgs: 640 active+clean; 0 bytes data, 333 MB used, 14896 GB / 14896 GB avail
2014-04-30 17:20:42.962033 mon.0 [INF] pgmap v282: 640 pgs: 640

Re: [ceph-users] Unable to bring cluster up

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 22:11 GMT+02:00 Andrey Korolyov and...@xdel.ru: regarding this one and previous you told about memory consumption - there are too much PGs, so memory consumption is so high as you are observing. Dead loop of osd-never-goes-up is probably because of suicide timeout of internal

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 14:18 GMT+02:00 Sage Weil s...@inktank.com: Today we are announcing some very big news: Red Hat is acquiring Inktank. Great news. Any chance of getting native Infiniband support in ceph like in GlusterFS?

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-04-30 22:27 GMT+02:00 Mark Nelson mark.nel...@inktank.com: Check out the xio work that the linuxbox/mellanox folks are working on. Matt Benjamin has posted quite a bit of info to the list recently! Is that usable ?

Re: [ceph-users] Red Hat to acquire Inktank

2014-04-30 Thread Gandalf Corvotempesta
2014-05-01 0:11 GMT+02:00 Mark Nelson mark.nel...@inktank.com: Usable is such a vague word. I imagine it's testable after a fashion. :D Ok, but I'd prefer official support with IB integrated in the main ceph repo.

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-28 Thread Gandalf Corvotempesta
2014-04-27 23:58 GMT+02:00 Andrey Korolyov and...@xdel.ru: Nothing looks wrong, except heartbeat interval which probably should be smaller due to recovery considerations. Try ``ceph osd tell X heap release'' and if it will not change memory consumption, file a bug. What should I look for

Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-26 12:06 GMT+02:00 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com: I've not defined cluster IPs for each OSD server but only the whole subnet. Should I define each IP for each OSD? This is not written in the docs and could be tricky to do in big environments with hundreds

Re: [ceph-users] cluster_network ignored

2014-04-28 Thread Gandalf Corvotempesta
2014-04-28 17:17 GMT+02:00 Kurt Bauer kurt.ba...@univie.ac.at: What do you mean by "I see all OSDs down"? I mean that my OSDs are detected as down:
$ sudo ceph osd tree
# id    weight  type name       up/down reweight
-1      12.74   root default
-2      3.64    host osd13
0       1.82    osd.0   down    0
2       1.82    osd.2   down    0
-3      5.46

Re: [ceph-users] OOM-Killer for ceph-osd

2014-04-27 Thread Gandalf Corvotempesta
So, are you suggesting lowering the PG count? Actually I'm using the suggested number, OSD*100/replicas, and I have just 2 OSDs per server. 2014-04-24 19:34 GMT+02:00 Andrey Korolyov and...@xdel.ru: On 04/24/2014 08:14 PM, Gandalf Corvotempesta wrote: During a recovery, I'm hitting oom
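For reference, the rule of thumb being quoted works out like this (illustrative numbers): total PGs is roughly (number of OSDs x 100) / replica count, rounded up to a power of two, and that total is shared across all pools. For example, 10 OSDs with replica 3 gives 10 x 100 / 3, about 333, so 512 PGs in total.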

Re: [ceph-users] cluster_network ignored

2014-04-26 Thread Gandalf Corvotempesta
have all of the cluster IP's defined in the host file on each OSD server? As I understand it, the mon's do not use a cluster network, only the OSD servers. -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gandalf

[ceph-users] cluster_network ignored

2014-04-24 Thread Gandalf Corvotempesta
I'm trying to configure a small ceph cluster with both public and cluster networks. This is my conf:
[global]
public_network = 192.168.0/24
cluster_network = 10.0.0.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
fsid =
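For comparison, a minimal sketch of a well-formed pair of network settings (example subnets; note the public_network value above is missing an octet, which may just be a paste artefact):
  [global]
  public_network  = 192.168.0.0/24   # client-facing traffic; MONs and clients reach OSDs here
  cluster_network = 10.0.0.0/24      # OSD-to-OSD replication and recovery traffic only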

[ceph-users] Fwd: RadosGW: bad request

2014-04-23 Thread Gandalf Corvotempesta
-- Forwarded message -- From: Gandalf Corvotempesta gandalf.corvotempe...@gmail.com Date: 2014-04-14 16:06 GMT+02:00 Subject: Fwd: [ceph-users] RadosGW: bad request To: ceph-users@lists.ceph.com ceph-users@lists.ceph.com -- Forwarded message -- From: Gandalf

Re: [ceph-users] RadosGW: bad request

2014-04-09 Thread Gandalf Corvotempesta
2014-04-07 20:24 GMT+02:00 Yehuda Sadeh yeh...@inktank.com: Try bumping up logs (debug rgw = 20, debug ms = 1). Not enough info here to say much, note that it takes exactly 30 seconds for the gateway to send the error response, may be some timeout. I'd verify that the correct fastcgi module is

[ceph-users] RadosGW: bad request

2014-04-07 Thread Gandalf Corvotempesta
I'm getting these trying to upload any file:
2014-04-07 14:33:27.084369 7f5268f86700 5 Getting permissions id=testuser owner=testuser perm=2
2014-04-07 14:33:27.084372 7f5268f86700 10 uid=testuser requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2014-04-07 14:33:27.084377

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 9:02 GMT+01:00 Andrey Korolyov and...@xdel.ru: Yes, if you have essentially high amount of commited data in the cluster and/or large number of PG(tens of thousands). I've increased from 64 to 8192 PGs If you have a room to experiment with this transition from scratch you may want

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:19 GMT+01:00 Dan Van Der Ster daniel.vanders...@cern.ch: Do you mean you used PG splitting? You should split PGs by a factor of 2x at a time. So to get from 64 to 8192, do 64-128, then 128-256, ..., 4096-8192. I brutally increased it in one go, no intermediate steps: 64 -> 8192 :-)
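A sketch of the recommended step-wise increase, assuming a pool named rbd (pool name is illustrative); each pg_num bump is followed by the matching pgp_num bump so the new PGs are actually used for placement:
  for pgs in 128 256 512 1024 2048 4096 8192; do
      ceph osd pool set rbd pg_num $pgs
      ceph osd pool set rbd pgp_num $pgs
      # wait for the cluster to settle back to HEALTH_OK before the next step
  done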

Re: [ceph-users] OSD down after PG increase

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 11:32 GMT+01:00 Dan Van Der Ster daniel.vanders...@cern.ch: Do you have any other pools? Remember that you need to include _all_ pools in the PG calculation, not just a single pool. Actually I have only the standard pools (there should be 3). In production I'll also have RGW. So, which is

Re: [ceph-users] clock skew

2014-03-13 Thread Gandalf Corvotempesta
2014-03-13 12:59 GMT+01:00 Joao Eduardo Luis joao.l...@inktank.com: Anyway, most timeouts will hold for 5 seconds. Allowing clock drifts up to 1 second may work, but we don't have hard data to support such claim. Over a second of drift may be problematic if the monitors are under some

Re: [ceph-users] clock skew

2014-03-12 Thread Gandalf Corvotempesta
2014-01-30 18:41 GMT+01:00 Eric Eastman eri...@aol.com: I have this problem on some of my Ceph clusters, and I think it is due to the older hardware the I am using does not have the best clocks. To fix the problem, I setup one server in my lab to be my local NTP time server, and then on each

[ceph-users] Wrong PG nums

2014-03-12 Thread Gandalf Corvotempesta
Hi to all, I have this in my conf:
# grep 'pg num' /etc/ceph/ceph.conf
osd pool default pg num = 5600
But:
# ceph osd pool get data pg_num
pg_num: 64
Is this normal? Why were only 64 PGs created?
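One likely explanation (an assumption based on how the default pools behave): osd pool default pg num only affects pools created after the option is in effect on the monitors, and the data/metadata/rbd pools are created at cluster bootstrap, so they keep the built-in default. Existing pools can be grown, or new pools created with an explicit count:
  ceph osd pool set data pg_num 4096      # grow an existing pool
  ceph osd pool set data pgp_num 4096     # then raise pgp_num so placement actually changes
  ceph osd pool create mypool 4096 4096   # or create a new pool with pg_num/pgp_num up front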

[ceph-users] Add RGW replication

2014-03-01 Thread Gandalf Corvotempesta
Hi, I have a working ceph cluster. Is it possible to add RGW replication across two sites at a later time, or is it a feature that needs to be implemented from the start?

[ceph-users] clock skew

2014-01-30 Thread Gandalf Corvotempesta
Hi. I'm using ntpd on each ceph server and it is syncing properly, but every time I reboot, ceph starts in degraded mode with a clock skew warning. The only way I have to solve this is to manually restart ceph on each node (without resyncing the clock). Any suggestions?

Re: [ceph-users] clock skew

2014-01-30 Thread Gandalf Corvotempesta
2014-01-30 Emmanuel Lacour elac...@easter-eggs.com: here, I just wait until the skew is finished, without touching ceph. It doesn't seems to do anything bad ... I've waited more than 1 hour with no success.

[ceph-users] Chef cookbooks

2014-01-29 Thread Gandalf Corvotempesta
I'm looking at this: https://github.com/ceph/ceph-cookbooks which seems to support the whole ceph stack (rgw, mons, osd, mds). Here: http://wiki.ceph.com/Guides/General_Guides/Deploying_Ceph_with_Chef#Configure_your_Ceph_Environment I can see that I need to configure the environment as in the example and

[ceph-users] ceph-deploy: update ceph.conf

2014-01-29 Thread Gandalf Corvotempesta
Hi, I would like to customize the ceph.conf generated by ceph-deploy. Should I customize the ceph.conf stored on the admin node and then sync it to each ceph node? If yes: 1. can I sync directly from ceph-deploy or do I have to sync manually via scp? 2. I don't see any host definition in ceph.conf, what
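ceph-deploy can push the edited file itself; a sketch assuming three nodes named node1, node2 and node3 (hostnames are illustrative):
  # edit ceph.conf in the ceph-deploy working directory on the admin node, then:
  ceph-deploy --overwrite-conf config push node1 node2 node3
  # restart the ceph daemons on each node so they pick up the new settings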

Re: [ceph-users] Sanity check of deploying Ceph very unconventionally (on top of RAID6, with very few nodes and OSDs)

2013-12-22 Thread Gandalf Corvotempesta
2013/12/17 Christian Balzer ch...@gol.com: Network: Infiniband QDR, 2x 18port switches (interconnected of course), redundant paths everywhere, including to the clients (compute nodes). Are you using IPoIB? How do you interconnect both switches without creating loops? AFAIK, IB switches don't

Re: [ceph-users] ceph-deploy: cluster network and admin node

2013-12-19 Thread Gandalf Corvotempesta
2013/12/17 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com: There isn't anything about how to define a cluster network for OSDs. I don't know how to set a cluster address for each OSD. No help on this? I would like to set a cluster address for each OSD. Is this possible with ceph-deploy

Re: [ceph-users] USB pendrive as boot disk

2013-12-17 Thread Gandalf Corvotempesta
2013/12/16 Gregory Farnum g...@inktank.com: There are log_to_syslog and err_to_syslog config options that will send the ceph log output there. I don't remember all the config stuff you need to set up properly and be aware of, but you should be able to find it by searching the list archives or

[ceph-users] ceph-deploy: cluster network and admin node

2013-12-17 Thread Gandalf Corvotempesta
Hi to all, I'm playing with ceph-deploy for the first time. Some questions: 1. how can I set a cluster network to be used by OSDs? Should I set it manually? 2. does the admin node need to be reachable from every other server, or can I use a NATted workstation?

Re: [ceph-users] ceph-deploy: cluster network and admin node

2013-12-17 Thread Gandalf Corvotempesta
2013/12/17 Alfredo Deza alfredo.d...@inktank.com: The docs have a quick section to do this with ceph-deploy (http://ceph.com/docs/master/start/quick-ceph-deploy/) Have you seen that before? Or do you need something that covers a cluster in more detail? There isn't anything about how to define

Re: [ceph-users] Journal, SSD and OS

2013-12-06 Thread Gandalf Corvotempesta
2013/12/6 Sebastien Han sebastien@enovance.com: @James: I think that Gandalf’s main idea was to save some costs/space on the servers so having dedicated disks is not an option. (that what I understand from your comment “have the OS somewhere else” but I could be wrong) You are right. I

Re: [ceph-users] Journal, SSD and OS

2013-12-05 Thread Gandalf Corvotempesta
2013/12/4 Simon Leinen simon.lei...@switch.ch: I think this is a fine configuration - you won't be writing to the root partition too much, outside journals. We also put journals on the same SSDs as root partitions (not that we're very ambitious about performance...). Do you suggest a RAID1

[ceph-users] Journal, SSD and OS

2013-12-03 Thread Gandalf Corvotempesta
Hi, what do you think of using the same SSD as journal and as root partition? For example: 1x 128GB SSD, 6 OSDs, 15GB for each journal (one per OSD), 5GB as root partition for the OS. This gives me 105GB of used space and 23GB of unused space (I've read somewhere that it is better not to use the whole SSD

Re: [ceph-users] installing OS on software RAID

2013-11-30 Thread Gandalf Corvotempesta
2013/11/25 James Harper james.har...@bendigoit.com.au: Is the OS doing anything apart from ceph? Would booting a ramdisk-only system from USB or compact flash work? This is the same question I asked some time ago. Is it OK to use USB as the standard OS (OS, not OSD!) disk? OSDs and journals will

[ceph-users] Docker

2013-11-28 Thread Gandalf Corvotempesta
Anybody using MONs and RGW inside docker containers? I would like to use a server with two docker containers, one for a MON and one for RGW. This is to achieve better isolation between services and some reusable components (the same container can be exported and used multiple times on multiple

[ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
Hi, what do you think about using a USB pendrive as the boot disk for OSD nodes? Pendrives are cheap and big enough, and doing this will allow me to use all spinning disks and SSDs as OSD storage/journal. Moreover, in the future I'll be able to boot from the net, replacing the pendrive without losing space on

Re: [ceph-users] USB pendrive as boot disk

2013-11-05 Thread Gandalf Corvotempesta
2013/11/5 ja...@peacon.co.uk: It has been reported that the system is heavy on the OS during recovery; Why? Recovery is done from the OSDs/SSDs, so why would ceph be heavy on the OS disks? There is nothing useful to read from those disks during a recovery.

Re: [ceph-users] About use same SSD for OS and Journal

2013-10-26 Thread Gandalf Corvotempesta
2013/10/24 Wido den Hollander w...@42on.com: I have never seen one Intel SSD fail. I've been using them since the X25-M 80GB SSDs and those are still in production without even one wearing out or failing. Which kind of SSD are you using, right now, as journal ?

[ceph-users] MONs numbers, hardware sizing and write ack

2013-09-19 Thread Gandalf Corvotempesta
Hi to all, will increasing the total number of MONs in a cluster, for example growing from 3 to 5, also decrease the hardware requirements (i.e. RAM and CPU) for each MON instance? I'm asking this because our cluster will be made of 5 OSD servers and I can easily put one MON on each

[ceph-users] 10/100 network for Mons?

2013-09-18 Thread Gandalf Corvotempesta
Hi to all. I'm currently building a test cluster with 3 OSD servers connected with IPoIB for the cluster network and 10GbE for the public network. I have to connect these OSDs to some MON servers located in another rack with no gigabit or 10Gb connection. Could I use some 10/100 network ports? Which

[ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
Hi to all. Let's assume a Ceph cluster used to store VM disk images. VMs will be booted directly from RBD. What will happen in case of an OSD failure if the failed OSD is the primary the VM is reading from?

Re: [ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
2013/9/17 Gregory Farnum g...@inktank.com: The VM read will hang until a replica gets promoted and the VM resends the read. In a healthy cluster with default settings this will take about 15 seconds. Thank you.

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-28 Thread Gandalf Corvotempesta
2013/6/20 Matthew Anderson manderson8...@gmail.com: Hi All, I've had a few conversations on IRC about getting RDMA support into Ceph and thought I would give it a quick attempt to hopefully spur some interest. What I would like to accomplish is an RSockets only implementation so I'm able to

[ceph-users] SSD suggestions as journal

2013-07-22 Thread Gandalf Corvotempesta
I'm looking at some SSD drives to be used as journals. The Seagate 600 should be better for write-intensive operations (like a journal): http://www.storagereview.com/seagate_600_pro_enterprise_ssd_review What do you suggest? Is this good enough? Should I look for write-intensive operations when

Re: [ceph-users] SSD recommendations for OSD journals

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Chen, Xiaoxi xiaoxi.c...@intel.com: Imaging you have several writes have been flushed to journal and acked,but not yet write to disk. Now the system crash by kernal panic or power failure,you will lose your data in ram disk,thus lose data that assumed to be successful written.

Re: [ceph-users] SSD suggestions as journal

2013-07-22 Thread Gandalf Corvotempesta
2013/7/22 Mark Nelson mark.nel...@inktank.com: I don't have any in my test lab, but the DC S3700 continues to look like a good option and has a great reputation, but might be a bit pricey. From that article it looks like the Micron P400m might be worth looking at too, but seems to be a bit

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-07-16 Thread Gandalf Corvotempesta
2013/6/20 Matthew Anderson manderson8...@gmail.com: Hi All, I've had a few conversations on IRC about getting RDMA support into Ceph and thought I would give it a quick attempt to hopefully spur some interest. What I would like to accomplish is an RSockets only implementation so I'm able to

[ceph-users] journal size suggestions

2013-07-09 Thread Gandalf Corvotempesta
Hi, I'm planning a new cluster on a 10GbE network. Each storage node will have a maximum of 12 SATA disks and 2 SSDs as journals. What do you suggest as the journal size for each OSD? Is 5GB enough? Should I just consider the SATA write speed when calculating the journal size, or also the network speed?
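For what it's worth, the sizing rule in the docs at the time was roughly: osd journal size = 2 x (expected throughput x filestore max sync interval). A worked example with assumed numbers (12 OSDs behind a single 10GbE link, ~100 MB/s SATA disks, the default 5 s sync interval):
  # per-OSD throughput is min(disk ~100 MB/s, network share ~1250 MB/s / 12, about 104 MB/s), so ~100 MB/s
  # journal size is about 2 x 100 MB/s x 5 s = 1000 MB, so 5 GB per OSD leaves comfortable headroom
  [osd]
  osd journal size = 5120   # value in MB; an example, not a recommendation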

Re: [ceph-users] journal size suggestions

2013-07-09 Thread Gandalf Corvotempesta
frequency is 5 seconds. What do you mean by fine tuning the spinning storage media? Which tuning are you referring to? On 9 Jul 2013 23:45, Andrey Korolyov and...@xdel.ru wrote: On Wed, Jul 10, 2013 at 1:16 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: Hi, i'm

[ceph-users] Fwd: Multi Rack Reference architecture

2013-06-04 Thread Gandalf Corvotempesta
-- Forwarded message -- From: Gandalf Corvotempesta gandalf.corvotempe...@gmail.com Date: 2013/5/31 Subject: Multi Rack Reference architecture To: ceph-users@lists.ceph.com ceph-users@lists.ceph.com In reference architecture PDF, downloadable from your website, there was some

[ceph-users] Clustered FS for RBD

2013-06-04 Thread Gandalf Corvotempesta
Any experiences with clustered FS on top of RBD devices? Which FS do you suggest for more or less 10.000 mailboxes accessed by 10 dovecot nodes ?

Re: [ceph-users] Clustered FS for RBD

2013-06-04 Thread Gandalf Corvotempesta
2013/6/4 Smart Weblications GmbH - Florian Wiessner f.wiess...@smart-weblications.de we use ocfs2 ontop of rbd... the only bad thing is that ocfs2 will fence all nodes if rbd is not responding within defined timeout... if rbd is not responding to all nodes, having all ocfs2 fenced should

[ceph-users] Multi Rack Reference architecture

2013-05-31 Thread Gandalf Corvotempesta
In the reference architecture PDF, downloadable from your website, there was a reference to a multi-rack architecture described in another doc. Is this paper available?

[ceph-users] Fwd: RGW

2013-05-21 Thread Gandalf Corvotempesta
-- Forwarded message -- From: Gandalf Corvotempesta gandalf.corvotempe...@gmail.com Date: 2013/5/20 Subject: RGW To: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Hi, I'm receiving an EntityTooLarge error when trying to upload an object of 100MB. I've already set

[ceph-users] RGW

2013-05-20 Thread Gandalf Corvotempesta
Hi, I'm receiving an EntityTooLarge error when trying to upload an object of 100MB. I've already set LimitRequestBody to 0 in Apache. Anything else to check?

[ceph-users] CRUSH maps for multiple switches

2013-05-08 Thread Gandalf Corvotempesta
Let's assume 20 OSD servers and 4x 12-port switches, 2 for the public network and 2 for the cluster network. No link between the public switches and no link between the cluster switches. The first 10 OSD servers are connected to public switch1 and the other 10 OSDs to public switch2. The same applies for

Re: [ceph-users] Best solution for shared FS on Ceph for web clusters

2013-04-24 Thread Gandalf Corvotempesta
2013/4/24 Maik Kulbe i...@linux-web-development.de: At the moment I'm trying a solution that uses RBD with a normal FS like EXT4 or ZFS and where two server export that block device via NFS(with heartbeat for redundancy and failover) but that involves problems with file system consistency. If

[ceph-users] RDMA

2013-04-18 Thread Gandalf Corvotempesta
Hi, will RDMA be supported in the short term? I'm planning an infrastructure and I don't know whether to start with IB QDR or 10GbE. IB is much cheaper than 10GbE and with RDMA should be 4x faster, but I've read that the IPoIB workaround is very heavy on the CPU and very slow (15Gbit more or less)
