Re: [ceph-users] Impact of a small DB size with Bluestore

2019-11-26 Thread Simon Ironside
Agree this needs tidying up in the docs. New users have little chance of 
getting it right relying on the docs alone. It's been discussed at 
length here several times in various threads, but we don't always seem 
to reach the same conclusion, so reading here doesn't guarantee 
understanding this correctly either, as I'm no doubt about to demonstrate :)


Mattia Belluco said back in May:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/035086.html

"when RocksDB needs to compact a layer it rewrites it
*before* deleting the old data; if you'd like to be sure you db does not
spill over to the spindle you should allocate twice the size of the
biggest layer to allow for compaction."

I didn't spot anyone disagreeing so I used 64GiB DB/WAL partitions on 
the SSDs in my most recent clusters to allow for this and to be certain 
that I definitely had room for the WAL on top and wouldn't get caught 
out by people saying GB (x1000^3 bytes) when they mean GiB (x1024^3 
bytes). I left the rest of the SSD empty to make the most of wear 
leveling, garbage collection etc.


Simon

On 26/11/2019 12:20, Janne Johansson wrote:

It's mentioned here among other places
https://books.google.se/books?id=vuiLDwAAQBAJ=PA79=PA79=rocksdb+sizes+3+30+300+g=bl=TlH4GR0E8P=ACfU3U0QOJQZ05POZL9DQFBVwTapML81Ew=en=X=2ahUKEwiPscq57YfmAhVkwosKHY1bB1YQ6AEwAnoECAoQAQ#v=onepage=rocksdb%20sizes%203%2030%20300%20g=false

The 4% was a quick ballpark figure someone came up with to give early 
adopters a decent start, but later science has shown that L0,L1,L2 
levels make the sizes 3,30,300 "optimal" to not waste SSD space that 
will not be used.
You can set 240, but it will not be better than 30. It will be better 
than 24, so "not super bad, but not optimal".

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside

On 15/11/2019 15:44, Joshua M. Boniface wrote:

Hey All:

I've also quite frequently experienced this sort of issue with my Ceph 
RBD-backed QEMU/KVM
cluster (not OpenStack specifically). Should this workaround of allowing the 
'osd blacklist'
command in the caps help in that scenario as well, or is this an 
OpenStack-specific
functionality?


Yes, my use case is RBD backed QEMU/KVM too, not Openstack. It's 
required for all RBD clients.


Simon

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside

Hi Florian,

On 15/11/2019 12:32, Florian Haas wrote:


I received this off-list but then subsequently saw this message pop up
in the list archive, so I hope it's OK to reply on-list?


Of course, I just clicked the wrong reply button the first time.


So that cap was indeed missing, thanks for the hint! However, I am still
trying to understand how this is related to the issue we saw.


I had exactly the same thing happen to me as happened to you, a week or so ago. 
A compute node lost power and once power was restored the VMs would start booting 
but fail early on when they tried to write.


My key was also missing that cap, adding it and resetting the affected 
VMs was the only action I took to sort things out. I didn't need to go 
around removing locks by hand as you did. As you say, waiting 30 seconds 
didn't do any good so it doesn't appear to be a watcher thing.
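
(For what it's worth, a lingering watcher would have shown up with something 
like "rbd status rbd/myvm-disk1" - the pool/image names there are just examples.)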


This was mentioned in the release notes for Luminous[1], I'd missed it 
too as I redeployed Nautilus instead and skipped these steps:




Verify that all RBD client users have sufficient caps to blacklist other 
client users. RBD client users with only "allow r" monitor caps should 
be updated as follows:


# ceph auth caps client.<id> mon 'allow r, allow command "osd blacklist"' osd '<existing OSD caps for user>'




Simon

[1] 
https://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Global power failure, OpenStack Nova/libvirt/KVM, and Ceph RBD locks

2019-11-15 Thread Simon Ironside

Hi Florian,

Any chance the key your compute nodes are using for the RBD pool is 
missing 'allow command "osd blacklist"' from its mon caps?
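
Something like this should show the key's current caps (client name is just 
an example):

ceph auth get client.libvirt

The mon line should include: allow r, allow command "osd blacklist"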


Simon

On 15/11/2019 08:19, Florian Haas wrote:

Hi everyone,

I'm trying to wrap my head around an issue we recently saw, as it
relates to RBD locks, Qemu/KVM, and libvirt.

Our data center graced us with a sudden and complete dual-feed power
failure that affected both a Ceph cluster (Luminous, 12.2.12), and
OpenStack compute nodes that used RBDs in that Ceph cluster. (Yes, these
things really happen, even in 2019.)

Once nodes were powered back up, the Ceph cluster came up gracefully
with no intervention required — all we saw was some Mon clock skew until
NTP peers had fully synced. Yay! However, our Nova compute nodes, or
rather the libvirt VMs that were running on them, were in not so great a
shape. The VMs booted up fine initially, but then blew up as soon as
they were trying to write to their RBD-backed virtio devices — which, of
course, was very early in the boot sequence as they had dirty filesystem
journals to apply.

Being able to read from, but not write to, RBDs is usually an issue with
exclusive locking, so we stopped one of the affected VMs, checked the
RBD locks on its device, and found (with rbd lock ls) that the lock was
still being held even after the VM was definitely down — both "openstack
server show" and "virsh domstate" agreed on this. We manually cleared
the lock (rbd lock rm), started the VM, and it booted up fine.

Repeat for all VMs, and we were back in business.
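
For reference, the commands involved look roughly like this (pool/image name, 
lock id and locker below are made-up examples):

rbd lock ls rbd/myvm-disk1
rbd lock rm rbd/myvm-disk1 "auto 139987541098880" client.4123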

If I understand correctly, image locks — in contrast to image watchers —
have no timeout, so locks must be always be explicitly released, or they
linger forever.

So that raises a few questions:

(1) Is it correct to assume that the lingering lock was actually from
*before* the power failure?

(2) What, exactly, triggers the lock acquisition and release in this
context? Is it nova-compute that does this, or libvirt, or Qemu/KVM?

(3) Would the same issue be expected essentially in any hard failure of
even a single compute node, and if so, does that mean that what
https://docs.ceph.com/docs/master/rbd/rbd-openstack/ says about "nova
evacuate" (and presumably, by extension also about "nova host-evacuate")
is inaccurate? If so, what would be required to make that work?

(4) If (3), is it correct to assume that the same considerations apply
to the Nova resume_guests_state_on_host_boot feature, i.e. that
automatic guest recovery wouldn't be expected to succeed even if a node
experienced just a hard reboot, as opposed to a catastrophic permanent
failure? And again, what would be required to make that work?  Is it
really necessary to clean all RBD locks manually?

Grateful for any insight that people could share here. I'd volunteer to
add a brief writeup of locking functionality in this context to the docs.

Thanks!

Cheers,
Florian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New best practices for osds???

2019-07-24 Thread Simon Ironside
The RAID0 mode being discussed here means several RAID0 "arrays", each with 
a single physical disk as its only member.

I.e. the number of OSDs is the same whether in RAID0 or JBOD mode.
E.g. 12x physical disks = 12x single-disk RAID0 "arrays" = 12x OSDs, just as 
12x JBOD physical disks = 12x OSDs.


Simon

On 24/07/2019 23:14, solarflow99 wrote:
I can't understand how using RAID0 is better than JBOD, considering 
jbod would be many individual disks, each used as OSDs, instead of a 
single big one used as a single OSD.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] num of objects degraded

2019-06-13 Thread Simon Ironside

Hi,

20067 objects actual data
with 3x replication = 60201 objects

On 13/06/2019 08:36, zhanrzh...@teamsun.com.cn wrote:


And total num of objects are 20067
/[root@ceph-25 src]# ./rados -p rbd ls| wc -l/
/20013/
/[root@ceph-25 src]# ./rados -p cephfs_data ls | wc -l/
/0/
/[root@ceph-25 src]# ./rados -p cephfs_metadata ls | wc -l/
/54/

But the num of objects that ceph -s show  are 60201.
I can't understand it .Can someone explain it to me?
thanks!!!



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2019-03-21 Thread Simon Ironside

This behaviour is still an issue in mimic 13.2.5 and nautilus 14.2.0.
I've logged https://tracker.ceph.com/issues/38841 for this. Apologies if 
this has already been done.


Simon

On 06/03/2019 20:17, Simon Ironside wrote:
Yes, as I said that bug is marked resolved. It's also marked as only 
affecting jewel and luminous.

I'm pointing out that it's still an issue today in mimic 13.2.4.

Simon

On 06/03/2019 16:04, Darius Kasparavičius wrote:

For some reason I didn't notice that number.

But it's most likely you are hitting this or similar bug: 
https://tracker.ceph.com/issues/21803




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd 14.2.0 won't start: Failed to pick public address on IPv6 only cluster

2019-03-20 Thread Simon Ironside

On 20/03/2019 19:53, Ricardo Dias wrote:

Make sure you have the following option in ceph.conf:

ms_bind_ipv4 = false

That will prevent the OSD from trying to find an IPv4 address.



Thank you! I've only ever used ms_bind_ipv6 = true on its own. Adding 
your line solved my problem.
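
For anyone else searching later, the ceph.conf fragment for an IPv6-only 
cluster now looks like this:

[global]
ms_bind_ipv6 = true
ms_bind_ipv4 = false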


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-osd 14.2.0 won't start: Failed to pick public address on IPv6 only cluster

2019-03-20 Thread Simon Ironside

Hi Everyone,

I'm upgrading an IPv6-only cluster from 13.2.5 Mimic to 14.2.0 Nautilus. 
The mon and mgr upgrades went fine, but the first OSD node unfortunately 
fails to restart after updating the packages.


The affected ceph-osd logs show the lines:

Unable to find any IPv4 address in networks 'MY /64' interfaces ''
Failed to pick public address.

Where MY/64 is the correct IPv6 public subnet from ceph.conf.
Should the single quotes after interfaces be blank? It would be on bond0 
in my case, just in case that's relevant.


The upgrade from 13.2.4 to 13.2.5 went without a hitch.
I've obviously not gone any further but any suggestions for how to proceed?

Thanks,
Simon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside
Yes, as I said that bug is marked resolved. It's also marked as only 
affecting jewel and luminous.

I'm pointing out that it's still an issue today in mimic 13.2.4.

Simon

On 06/03/2019 16:04, Darius Kasparavičius wrote:

For some reason I didn't notice that number.

But it's most likely you are hitting this or similar bug: 
https://tracker.ceph.com/issues/21803


On Wed, Mar 6, 2019, 17:30 Simon Ironside <sirons...@caffetine.org> wrote:


That's the misplaced objects, no problem there. Degraded objects are at
153.818%.

Simon

On 06/03/2019 15:26, Darius Kasparavičius wrote:
> Hi,
>
> there it's 1.2% not 1200%.
>
> On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside <sirons...@caffetine.org> wrote:
>> Hi,
>>
>> I'm still seeing this issue during failure testing of a new
Mimic 13.2.4
>> cluster. To reproduce:
>>
>> - Working Mimic 13.2.4 cluster
>> - Pull a disk
>> - Wait for recovery to complete (i.e. back to HEALTH_OK)
>> - Remove the OSD with `ceph osd crush remove`
>> - See greater than 100% degraded objects while it recovers as below
>>
>> It doesn't seem to do any harm, once recovery completes the cluster
>> returns to HEALTH_OK.
>> I can only find bug 21803 on the tracker that seems to cover this
>> behaviour which is marked as resolved.
>>
>> Simon
>>
>>     cluster:
>>       id:     MY ID
>>       health: HEALTH_WARN
>>               709/58572 objects misplaced (1.210%)
>>               Degraded data redundancy: 90094/58572 objects degraded
>> (153.818%), 49 pgs degraded, 51 pgs undersized
>>
>>     services:
>>       mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
>>       mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
>>       osd: 52 osds: 52 up, 52 in; 84 remapped pgs
>>
>>     data:
>>       pools:   16 pools, 2016 pgs
>>       objects: 19.52 k objects, 72 GiB
>>       usage:   7.8 TiB used, 473 TiB / 481 TiB avail
>>       pgs:     90094/58572 objects degraded (153.818%)
>>                709/58572 objects misplaced (1.210%)
>>                1932 active+clean
>>                47   active+recovery_wait+undersized+degraded+remapped
>>                33   active+remapped+backfill_wait
>>                2    active+recovering+undersized+remapped
>>                1    active+recovery_wait+undersized+degraded
>>                1    active+recovering+undersized+degraded+remapped
>>
>>     io:
>>       client:   24 KiB/s wr, 0 op/s rd, 3 op/s wr
>>       recovery: 0 B/s, 126 objects/s



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside
That's the misplaced objects, no problem there. Degraded objects are at 
153.818%.


Simon

On 06/03/2019 15:26, Darius Kasparavičius wrote:

Hi,

there it's 1.2% not 1200%.

On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside  wrote:

Hi,

I'm still seeing this issue during failure testing of a new Mimic 13.2.4
cluster. To reproduce:

- Working Mimic 13.2.4 cluster
- Pull a disk
- Wait for recovery to complete (i.e. back to HEALTH_OK)
- Remove the OSD with `ceph osd crush remove`
- See greater than 100% degraded objects while it recovers as below

It doesn't seem to do any harm, once recovery completes the cluster
returns to HEALTH_OK.
I can only find bug 21803 on the tracker that seems to cover this
behaviour which is marked as resolved.

Simon

cluster:
  id: MY ID
  health: HEALTH_WARN
  709/58572 objects misplaced (1.210%)
  Degraded data redundancy: 90094/58572 objects degraded (153.818%), 49 pgs degraded, 51 pgs undersized

services:
  mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
  mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
  osd: 52 osds: 52 up, 52 in; 84 remapped pgs

data:
  pools:   16 pools, 2016 pgs
  objects: 19.52 k objects, 72 GiB
  usage:   7.8 TiB used, 473 TiB / 481 TiB avail
  pgs: 90094/58572 objects degraded (153.818%)
   709/58572 objects misplaced (1.210%)
   1932 active+clean
   47   active+recovery_wait+undersized+degraded+remapped
   33   active+remapped+backfill_wait
   2    active+recovering+undersized+remapped
   1    active+recovery_wait+undersized+degraded
   1    active+recovering+undersized+degraded+remapped

io:
  client:   24 KiB/s wr, 0 op/s rd, 3 op/s wr
  recovery: 0 B/s, 126 objects/s


On 13/10/2017 18:53, David Zafman wrote:

I improved the code to compute degraded objects during
backfill/recovery.  During my testing it wouldn't result in percentage
above 100%.  I'll have to look at the code and verify that some
subsequent changes didn't break things.

David


On 10/13/17 9:55 AM, Florian Haas wrote:

Okay, in that case I've no idea. What was the timeline for the
recovery
versus the rados bench and cleanup versus the degraded object counts,
then?

1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all  PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.

Please let me know if that information is useful. Thank you!

Hmm, that does leave me a little perplexed.

Yeah exactly, me too. :)


David, do we maybe do something with degraded counts based on the
number of
objects identified in pg logs? Or some other heuristic for number of
objects
that might be stale? That's the only way I can think of to get these
weird
returning sets.

One thing that just crossed my mind: would it make a difference
whether or not the OSD goes out, in the time window between it
going down and being deleted from the crushmap/osdmap? I think it
shouldn't (whether being marked out or just non-existent, it's not
eligible for holding any data so either way), but I'm not really sure
about the mechanics of the internals here.

Cheers,
Florian



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Simon Ironside

Hi,

I'm still seeing this issue during failure testing of a new Mimic 13.2.4 
cluster. To reproduce:


- Working Mimic 13.2.4 cluster
- Pull a disk
- Wait for recovery to complete (i.e. back to HEALTH_OK)
- Remove the OSD with `ceph osd crush remove`
- See greater than 100% degraded objects while it recovers as below

It doesn't seem to do any harm, once recovery completes the cluster 
returns to HEALTH_OK.
I can only find bug 21803 on the tracker that seems to cover this 
behaviour which is marked as resolved.


Simon

  cluster:
    id: MY ID
    health: HEALTH_WARN
    709/58572 objects misplaced (1.210%)
    Degraded data redundancy: 90094/58572 objects degraded (153.818%), 49 pgs degraded, 51 pgs undersized


  services:
    mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
    mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
    osd: 52 osds: 52 up, 52 in; 84 remapped pgs

  data:
    pools:   16 pools, 2016 pgs
    objects: 19.52 k objects, 72 GiB
    usage:   7.8 TiB used, 473 TiB / 481 TiB avail
    pgs: 90094/58572 objects degraded (153.818%)
 709/58572 objects misplaced (1.210%)
 1932 active+clean
 47   active+recovery_wait+undersized+degraded+remapped
 33   active+remapped+backfill_wait
 2    active+recovering+undersized+remapped
 1    active+recovery_wait+undersized+degraded
 1    active+recovering+undersized+degraded+remapped

  io:
    client:   24 KiB/s wr, 0 op/s rd, 3 op/s wr
    recovery: 0 B/s, 126 objects/s


On 13/10/2017 18:53, David Zafman wrote:


I improved the code to compute degraded objects during 
backfill/recovery.  During my testing it wouldn't result in percentage 
above 100%.  I'll have to look at the code and verify that some 
subsequent changes didn't break things.


David


On 10/13/17 9:55 AM, Florian Haas wrote:
Okay, in that case I've no idea. What was the timeline for the 
recovery

versus the rados bench and cleanup versus the degraded object counts,
then?

1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all  PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.

Please let me know if that information is useful. Thank you!


Hmm, that does leave me a little perplexed.

Yeah exactly, me too. :)

David, do we maybe do something with degraded counts based on the 
number of
objects identified in pg logs? Or some other heuristic for number of 
objects
that might be stale? That's the only way I can think of to get these 
weird

returning sets.

One thing that just crossed my mind: would it make a difference
whether or not the OSD goes out, in the time window between it
going down and being deleted from the crushmap/osdmap? I think it
shouldn't (whether being marked out or just non-existent, it's not
eligible for holding any data so either way), but I'm not really sure
about the mechanics of the internals here.

Cheers,
Florian



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backfill_toofull after adding new OSDs

2019-03-06 Thread Simon Ironside

I've just seen this when *removing* an OSD too.
Issue resolved itself during recovery. OSDs were not full, not even 
close, there's virtually nothing on this cluster.
Mimic 13.2.4 on RHEL 7.6. OSDs are all Bluestore HDD with SSD DBs. 
Everything is otherwise default.


  cluster:
    id: MY ID
    health: HEALTH_ERR
    1161/66039 objects misplaced (1.758%)
    Degraded data redundancy: 220095/66039 objects degraded (333.280%), 137 pgs degraded

    Degraded data redundancy (low space): 1 pg backfill_toofull

  services:
    mon: 3 daemons, quorum san2-mon1,san2-mon2,san2-mon3
    mgr: san2-mon1(active), standbys: san2-mon2, san2-mon3
    osd: 53 osds: 52 up, 52 in; 186 remapped pgs

  data:
    pools:   16 pools, 2016 pgs
    objects: 22.01 k objects, 83 GiB
    usage:   7.9 TiB used, 473 TiB / 481 TiB avail
    pgs: 220095/66039 objects degraded (333.280%)
 1161/66039 objects misplaced (1.758%)
 1830 active+clean
 134  active+recovery_wait+undersized+degraded+remapped
 45   active+remapped+backfill_wait
 3    active+recovering+undersized+remapped
 3    active+recovery_wait+undersized+degraded
 1    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   60 KiB/s wr, 0 op/s rd, 5 op/s wr
    recovery: 8.6 MiB/s, 110 objects/s


On 07/02/2019 04:26, Brad Hubbard wrote:

Let's try to restrict discussion to the original thread
"backfill_toofull while OSDs are not full" and get a tracker opened up
for this issue.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph on Azure ?

2018-12-23 Thread Simon Ironside
Equinix's Cloud Exchange service is worth a look too, it hooks in to Azure 
ExpressRoute amongst others.


Simon

On 23/12/2018 16:49, Erik McCormick wrote:
Dedicated links are not that difficult to come by anymore. It's mainly 
done with SDN. I know Megaport, for example, lets you provision 
virtual circuits to dozens of providers including Azure, AWS, and GCP. 
You can run several virtual circuits over a single cross-connect.


I look forward to hearing your performance results running in cloud 
VMs, but I'm fairly confident it will be both sub-optimal and expensive.


Cheers,
Erik

On Sun, Dec 23, 2018, 10:46 AM LuD j  wrote:


Hello Marc,
Unfortunately we can't move from Azure so easily; we plan to open
more and more Azure regions in the future, so this strategy leads us
to the Ceph integration issue.
Even if we had other datacenters near to them, I guess it would
require dedicated network links between the Ceph clients and the
Ceph cluster and we may not have the resources for this kind of
architecture.

We are going to try Ceph on Azure by deploying a small cluster
and keep an eye on any performance issues.



On Sun, 23 Dec 2018 at 14:46, Marc Roos <m.r...@f1-outsourcing.eu> wrote:


What about putting it in a datacenter near them? Or move
everything out
to some provider that allows you to have both.


-Original Message-
From: LuD j [mailto:luds.jer...@gmail.com]
Sent: maandag 17 december 2018 21:38
To: ceph-users@lists.ceph.com 
Subject: [ceph-users] Ceph on Azure ?

Hello,

We are working to integrate the S3 protocol into our web
applications. The objective is to stop storing documents in a
database or on a filesystem and use S3 buckets instead.
We already gave Ceph with the RADOS gateway a try on physical
nodes, and it's working well.

But we are also on Azure, and we can't get bare-metal servers
from them.
We expect a high volume, ~50TB/year growing 20% each year, which
makes ~80K Euro/year per Azure region. The storage cost on Azure
is high and they don't provide any QoS on network latency.
We found a 2016 post from the GitLab infrastructure team about
the network latency issue on Azure which confirms our concern:
https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/678


Is there anyone using ceph in production on a cloud provider
like Azure?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] all vms can not start up when boot all the ceph hosts.

2018-12-04 Thread Simon Ironside

On 04/12/2018 09:37, linghucongsong wrote:

But this was just a case of a sudden power off of all the hosts!


I'm surprised you're seeing I/O errors inside the VM once they're restarted.
Is the cluster healthy? What's the output of ceph status?

Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous v12.2.10 released

2018-11-27 Thread Simon Ironside

On 27/11/2018 14:50, Abhishek Lekshmanan wrote:


We're happy to announce the tenth bug fix release of the Luminous
v12.2.x long term stable release series. The previous release, v12.2.9,
introduced the PG hard-limit patches which were found to cause an issue
in certain upgrade scenarios, and this release was expedited to revert
those patches. If you already successfully upgraded to v12.2.9, you
should **not** upgrade to v12.2.10, but rather **wait** for a release in
which http://tracker.ceph.com/issues/36686 is addressed. All other users
are encouraged to upgrade to this release.


Is it safe for v12.2.9 users upgrade to v13.2.2 Mimic?

http://tracker.ceph.com/issues/36686 suggests a similar revert might be 
on the cards for v13.2.3 so I'm not sure.


Thanks,
Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-08 Thread Simon Ironside


On 08/11/2018 09:17, Marc Roos wrote:
  
And that is why I don't like ceph-deploy. Unless you have maybe hundreds
of disks, I don't see why you cannot install it "manually".


On 07/11/2018 22:22, Ricardo J. Barberis wrote:
  
Also relevant: if you use ceph-deploy like I do on CentOS 7, it
installs the latest version available, so I inadvertently ended up with
12.2.9 on my last four servers.


I use ceph-deploy (on RHEL 7) but with my own repos. I still fell into 
the 12.2.9 trap but that's because that's the package version I'd 
mirrored. If I'd downloaded 12.2.8 instead it would've been fine with 
ceph-deploy.


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Simon Ironside

On 07/11/2018 15:39, Gregory Farnum wrote:
On Wed, Nov 7, 2018 at 5:58 AM Simon Ironside <sirons...@caffetine.org> wrote:




On 07/11/2018 10:59, Konstantin Shalygin wrote:
>> I wonder if there is any release announcement for ceph 12.2.9 that I missed.
>> I just found the new packages on download.ceph.com, is this an official
>> release?
>
> This is because 12.2.9 has several bugs. You should avoid using this
> release and wait for 12.2.10

Argh! What's it doing in the repos then?? I've just upgraded to it!
What are the bugs? Is there a thread about them?


If you’ve already upgraded and have no issues then you won’t have any 
trouble going forward — except perhaps on the next upgrade, if you do 
it while the cluster is unhealthy.


Thanks, the upgrade went fine and I've no known issues. The only warning 
I have is about too many PGs per OSD which is my fault not ceph's. I 
trust that doesn't count as a reason not to proceed to 13.2.2?


I agree that it’s annoying when these issues make it out. We’ve had 
ongoing discussions to try and improve the release process so it’s 
less drawn-out and to prevent these upgrade issues from making it 
through testing, but nobody has resolved it yet. If anybody has 
experience working with deb repositories and handling releases, the 
Ceph upstream could use some help... ;)


Totally, I get that this happens from time to time but once a bad 
release is known why not just delete the affected packages from the 
official repos? That seems to me to be a really easy step to take 
especially if release announcements haven't been sent, docs.ceph.com 
hasn't been updated yet etc. I reposync --newest-only the RPMs from the 
official repos to my own then update my ceph hosts from there which is 
how I ended up with 12.2.9.


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph 12.2.9 release

2018-11-07 Thread Simon Ironside




On 07/11/2018 10:59, Konstantin Shalygin wrote:

I wonder if there is any release announcement for ceph 12.2.9 that I missed.
I just found the new packages on download.ceph.com, is this an official
release?


This is because 12.2.9 has several bugs. You should avoid using this 
release and wait for 12.2.10.


Argh! What's it doing in the repos then?? I've just upgraded to it!
What are the bugs? Is there a thread about them?

Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove host weight 0 from crushmap

2018-08-01 Thread Simon Ironside

On 01/08/18 13:39, Marc Roos wrote:


Is there already a command to remove an host from the crush map (like
ceph osd crush rm osd.23), without having to 'manually' edit the crush
map?


Yes, it's the same: ceph osd crush remove <hostname>
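
E.g. (hostname is just an example - and as far as I know the host bucket 
has to be empty, i.e. its OSDs already removed, before this will work):

ceph osd crush remove ceph-osd2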

Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Simon Ironside

On 19/07/18 07:59, Dietmar Rieder wrote:


We have P840ar controllers with battery backed cache in our OSD nodes
and configured an individual RAID-0 for each OSD (ceph luminous +
bluestore). We have not seen any problems with this setup so far and
performance is great at least for our workload.


I'm doing the same with LSI RAID controllers for the same reason, to 
take advantage of the battery backed cache. No problems with this here 
either. As Troy said, you do need to go through the additional step of 
creating a single disk RAID0 whenever you replace a disk, which you 
wouldn't have to with a regular HBA.


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Delete pool nicely

2018-07-17 Thread Simon Ironside


On 22/05/18 18:28, David Turner wrote:
 From my experience, that would cause you some troubles as it would 
throw the entire pool into the deletion queue to be processed as it 
cleans up the disks and everything.  I would suggest using a pool 
listing from `rados -p .rgw.buckets ls` and iterate on that using some 
scripts around the `rados -p .rgw.buckets rm <object>` command that 
you could stop, restart at a faster pace, slow down, etc.  Once the 
objects in the pool are gone, you can delete the empty pool without any 
problems.  I like this option because it makes it simple to stop it if 
you're impacting your VM traffic.


Just to finish the story here; thanks again for the advice - it worked well.

Generating the list of objects took around 6 hours but didn't cause any 
issues doing so. I had a sleep 0.1 between each rm iteration. Probably a 
bit on the conservative side but didn't cause me any problems either and 
was making acceptable progress so I didn't change it.
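
A rough sketch of the sort of loop described above (pool name as per the 
thread):

rados -p .rgw.buckets ls > objects.txt
while read -r obj; do
    rados -p .rgw.buckets rm "$obj"
    sleep 0.1
done < objects.txt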


3 weeks later and the pool was more or less empty (I avoided ~600 
objects/400KiB with $ characters in the name that I couldn't be bothered 
handling automatically) so I deleted the pools. I did get some slow 
request warnings immediately after deleting the pools but they went away 
in a minute or so.


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Simon Ironside



On 11/07/18 14:26, Simon Ironside wrote:

The 2TB Samsung 850 EVO for example is only rated for 300TBW (terabytes 
written). Over the 5 year warranty period that's only 165GB/day, not 
even 0.01 full drive writes per day. The SM863a part of the same size is 
rated for 12,320TBW, over 3 DWPD.


Sorry, my maths is out above - that should be "not even 0.1 full drive 
writes per day".
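
(The arithmetic: 300 TBW over the 5 year warranty is roughly 164GB/day, 
which against a 2TB drive is about 0.08 drive writes per day, versus 
roughly 3.4 DWPD for the SM863a's 12,320 TBW.)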


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Simon Ironside



On 11/07/18 13:49, Satish Patel wrote:

Prices go way up if I pick the Samsung SM863a for all data drives.

We have many servers running on consumer grade SSD drives and we never 
noticed any performance issue or fault so far (but we never used Ceph before).


I thought that was the whole point of Ceph: to provide high availability 
if a drive goes down, and also parallel reads from multiple OSD nodes.


I wouldn't use consumer drives. They tend not to have power loss 
protection, performance can degrade sharply as queue depth increases and 
the endurance is nowhere near enterprise drives. Depending on your use 
pattern, you may get a real shock at how quickly they'll wear out.


The 2TB Samsung 850 EVO for example is only rated for 300TBW (terabytes 
written). Over the 5 year warranty period that's only 165GB/day, not 
even 0.01 full drive writes per day. The SM863a part of the same size is 
rated for 12,320TBW, over 3 DWPD.


Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journel SSD recommendation

2018-07-10 Thread Simon Ironside


On 10/07/18 19:32, Robert Stanford wrote:


  Do the recommendations apply to both data and journal SSDs equally?



Search the list for "Many concurrent drive failures - How do I activate 
pgs?" to read about the Intel DC S4600 failure story. The OP had several 
2TB models of these fail when used as Bluestore data devices. The 
Samsung SM863a is discussed as a good alternative in the same thread.


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journel SSD recommendation

2018-07-10 Thread Simon Ironside



On 10/07/18 18:59, Satish Patel wrote:


Thanks, I would also like to know about Intel SSD 3700 (Intel SSD SC
3700 Series SSDSC2BA400G3P), price also looking promising, Do you have
opinion on it?

I can't quite tell from Google what exactly that is. If it's the Intel 
DC S3700 then I believe those are discontinued now but if you can still 
get hold of them they were used successfully and recommended by lots of 
cephers, myself included.


Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journel SSD recommendation

2018-07-10 Thread Simon Ironside

Hi,

On 10/07/18 16:25, Satish Patel wrote:

Folks,

I am in middle or ordering hardware for my Ceph cluster, so need some
recommendation from communities.

- What company/Vendor SSD is good ?


Samsung SM863a is the current favourite I believe.

The Intel DC S4600 is one to specifically avoid at the moment unless the 
latest firmware has resolved some of the list member reported issues.



- What size should be good for Journal (for BlueStore)


ceph-disk defaults to a RocksDB partition that is 1% of the main device 
size. That'll get you in the right ball park.
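
If you'd rather size it explicitly than rely on the 1% default, I believe 
ceph-disk honours something like this in ceph.conf (the figure is just an 
example, in bytes):

[osd]
# 30 GiB
bluestore_block_db_size = 32212254720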



I have lots of Samsung 850 EVO but they are consumer, Do you think
consume drive should be good for journal?


No :)

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel/Luminous Filestore/Bluestore for a new cluster

2018-06-06 Thread Simon Ironside

On 05/06/18 01:14, Subhachandra Chandra wrote:

We have not observed any major issues. We have had occasional OSD daemon 
crashes due to an assert which is a known bug but the cluster recovered 
without any intervention each time. All the nodes have been rebooted 2-3 
times due to CoreOS updates and no issues with that either.


Hi Subhachandra,

Thanks for your answer, it's this sort of stuff that worries me. My 
Filestore OSD daemons on Hammer & Jewel don't crash at all so this 
sounds like a regression and I should wait before deploying 
Luminous/Bluestore.


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD recommendation

2018-06-01 Thread Simon Ironside
Thanks for the input, both. I've gone ahead with the SM863As. I've no 
input on the Microns I'm afraid. The specs look good to me, I just can't 
get them easily.


Sean, I didn't know you'd lost 10 in all. I do have 4x 480GB S4600s I'm 
using as Filestore journals in production for a couple of months now 
(purchased before I saw the S4600 thread) without issue. IIRC you were 
using the same 2TB S4600s as OSDs as the OP of the S4600 thread - I'm 
keeping my fingers crossed that if I was going to have the problem you 
experienced, I would've had it by now . . .


Thanks again,
Simon

On 31/05/18 19:12, Sean Redmond wrote:

I know the s4600 thread well as I had over 10 of those drives fail 
before I took them all out of production.


Intel did say a firmware fix was on the way but I could not wait and 
opted for SM863A and never looked back...


I will be sticking with SM863A for now on further orders.

On Thu, 31 May 2018, 15:33 Fulvio Galeazzi wrote:


      I am also about to buy some new hardware and for SATA ~400GB I was
considering Micron 5200 MAX, rated at 5 DWPD, for journaling/FS metadata.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-05-31 Thread Simon Ironside

On 24/05/18 19:21, Lionel Bouton wrote:


Unfortunately I just learned that Supermicro found an incompatibility
between this motherboard and SM863a SSDs (I don't have more information
yet) and they proposed S4600 as an alternative. I immediately remembered
that there were problems and asked for a delay/more information and dug
out this old thread.


In case it helps you, I'm about to go down the same Supermicro EPYC and 
SM863a path as you. I asked about the incompatibility you mentioned and 
they knew what I was referring to. The incompatibility is between the 
on-board SATA controller and the SM863a and has apparently already been 
fixed. Even if not fixed, the incompatibility wouldn't be present if 
you're using a RAID controller instead of the on board SATA (which I 
intend to - don't know if you were?).


HTH,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel/Luminous Filestore/Bluestore for a new cluster

2018-05-30 Thread Simon Ironside

On 30/05/18 20:35, Jack wrote:

Why would you deploy a Jewel cluster, which is almost 3 major versions
away?
Bluestore is also the good answer
It works well, have many advantages, and is simply the future of Ceph


Indeed, and normally I wouldn't even ask, but as I say there's been some 
comments/threads recently that make me doubt the obvious Luminous + 
Bluestore path. A few that stand out in my memory are:


* "Useless due to http://tracker.ceph.com/issues/22102; [1]
* OSD crash with segfault Luminous 12.2.4 [2] [3] [4]

There are others but those two stuck out for me. I realise that people 
will generally only report problems rather than "I installed ceph and 
everything went fine!" stories to this list but it was enough to 
motivate me to ask if Luminous/Bluestore was considered a good choice 
for a fresh install or if I should wait a bit.


Thanks,
Simon.

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026339.html
[2] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025373.html

[3] http://tracker.ceph.com/issues/23431
[4] http://tracker.ceph.com/issues/23352
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Jewel/Luminous Filestore/Bluestore for a new cluster

2018-05-30 Thread Simon Ironside

Hi again,

I've been happily using both Hammer and Jewel with SSD journals and 
spinning disk Filestore OSDs for several years now and, as per my other 
email, I'm about to purchase hardware to build a new (separate) 
production cluster. I intend to use the same mixture of SSD for journals 
(or DB/WAL) and spinning disks for Filestore/Bluestore data as per my 
existing cluster.


* What's the recommendation for what to deploy?

I have a feeling the answer is going to be Luminous (as that's current 
LTS) and Bluestore (since that's the default in Luminous) but several 
recent threads and comments on this list make me doubt whether that 
would be a good choice right now.


* Is using Bluestore and integrated DB/WAL (without SSDs at all) a 
reasonable option for those used to the performance of SSD Journals + 
spinning disk Filestore OSDs?


Thanks very much in advance for any advice.

Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD recommendation

2018-05-30 Thread Simon Ironside

Hi Everyone,

I'm about to purchase hardware for a new production cluster. I was going 
to use 480GB Intel DC S4600 SSDs as either Journal devices for Filestore 
and/or DB/WAL for Bluestore spinning disk OSDs until I saw David 
Herselman's "Many concurrent drive failures" thread which has given me 
the fear.


What's the current go to for Journal and/or DB/WAL SSDs if not the S4600?

I'm planning on using AMD EPYC based Supermicros for OSD nodes with 3x 
10TB SAS 7.2k to each SSD with 10gig networking. Happy to provide more 
info here if it's useful.


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-05-29 Thread Simon Ironside

On 24/05/18 19:21, Lionel Bouton wrote:


Has anyone successfully used Ceph with S4600 ? If so could you share if
you used filestore or bluestore, which firmware was used and
approximately how much data was written on the most used SSDs ?


I have 4 new OSD nodes which have 480GB S4600s (Firmware revision: 
SCV10100) as journals for spinning disk Hammer Filestore OSDs. They're 
relatively new but have been in production for a couple of months 
without issue, touch wood.


My monitors are using relatively new (< 2 months old) 240GB S4500s 
(Firmware revision: SCV10121) again without issue to date.


Was there any conclusion to this? Was the OP just unlucky? I note that 
Red Hat specifically recommend S4600s here so David's story is a heck of 
a shock:


https://www.redhat.com/cms/managed-files/st-ceph-storage-intel-configuration-guide-technology-detail-f11532-201804-en.pdf

Regards,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Delete pool nicely

2018-05-24 Thread Simon Ironside


On 22/05/18 18:28, David Turner wrote:
 From my experience, that would cause you some troubles as it would 
throw the entire pool into the deletion queue to be processed as it 
cleans up the disks and everything.  I would suggest using a pool 
listing from `rados -p .rgw.buckets ls` and iterate on that using some 
scripts around the `rados -p .rgw.buckets rm <object>` command that 
you could stop, restart at a faster pace, slow down, etc.  Once the 
objects in the pool are gone, you can delete the empty pool without any 
problems.  I like this option because it makes it simple to stop it if 
you're impacting your VM traffic.


Brilliant, thanks David. That's exactly the kind of answer I needed.

Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Delete pool nicely

2018-05-22 Thread Simon Ironside

Hi Everyone,

I have an older cluster (Hammer 0.94.7) with a broken radosgw service 
that I'd just like to blow away before upgrading to Jewel after which 
I'll start again with EC pools.


I don't need the data but I'm worried that deleting the .rgw.buckets 
pool will cause performance degradation for the production RBD pool used 
by VMs. .rgw.buckets is a replicated pool (size=3) with ~14TB data in 
5.3M objects. A little over half the data in the whole cluster.


Is deleting this pool simply using ceph osd pool delete likely to cause 
me a performance problem? If so, is there a way I can do it better?


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Simon Ironside

On 25/04/18 10:52, Ranjan Ghosh wrote:

And, yes, we're running a "size:2 min_size:1" because we're on a very 
tight budget. If I understand correctly, this means: Make changes of 
files to one server. *Eventually* copy them to the other server. I hope 
this *eventually* means after a few minutes.

size:2 means there are two replicas of every object. Writes are 
synchronous (i.e. a write isn't complete until it's written to both 
OSDs) so there's no eventually - it's immediate.


min_size:1 means the cluster will still allow access while only one 
replica is available.
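
For reference, both are per-pool settings, e.g. (pool name is an example):

ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2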


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Broken rgw user

2018-04-25 Thread Simon Ironside

Hi Everyone,

I've got a problem with one rgw user on Hammer 0.94.7.

* "radosgw-admin user info" no longer works:

could not fetch user info: no user info saved

* I can still retrieve their stats via "radosgw-admin user stats", 
although the returned data is wrong:


{
    "stats": {
        "total_entries": 0,
        "total_bytes": 0,
        "total_bytes_rounded": 0
    },
    "last_stats_sync": "2018-04-24 15:58:27.354280Z",
    "last_stats_update": "0.00"
}

* The user still shows in metadata list user

* As far as I can see, the user still works, I can access the account ok 
with s3cmd etc.


Does anyone know how to fix the user info and user stats issues?

Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] degraded PGs when adding OSDs

2018-02-11 Thread Simon Ironside

On 09/02/18 09:05, Janne Johansson wrote:
2018-02-08 23:38 GMT+01:00 Simon Ironside <sirons...@caffetine.org>:


Hi Everyone,
I recently added an OSD to an active+clean Jewel (10.2.3) cluster
and was surprised to see a peak of 23% objects degraded. Surely this
should be at or near zero and the objects should show as misplaced?
I've searched and found Chad William Seys' thread from 2015 but
didn't see any conclusion that explains this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html

  I agree. I always viewed it like this: you have three copies of your PG, you 
add a new OSD, and the PG decides one of its copies should now live on that OSD 
instead of on one of the three older ones. It simply stops caring about the old 
copy and creates a new, empty PG on the new OSD. While the sync towards the new 
PG is in progress it is "behind" in the data it contains, but it (and its two 
remaining copies) are correctly placed for the new crush map. Misplaced would 
probably be a more natural way of describing it, at least if the now-abandoned 
copy were still being updated while the sync runs, but I don't think it is. It 
gets orphaned rather quickly once the new OSD kicks in.


I guess this design choice boils down to "being able to handle someone 
adding more OSDs to a cluster that is close to getting full", at the 
expense of "discarding one or more of the old copies and scaring the 
admin as if there was a huge issue when just adding one or many new 
shiny OSDs".


It certainly does scare me, especially as this particular cluster is 
size=2, min_size=1.


My worry is that I could experience a disk failure while adding a new 
OSD and potentially lose data while if the same disk failed when the 
cluster was active+clean I wouldn't. That doesn't seem like a very safe 
design choice but perhaps the real answer is to use size=3.


Reweighting an active OSD to 0 does the same thing on my cluster, causes 
the objects to go degraded instead of misplaced as I'd expect.
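
(i.e. something like "ceph osd crush reweight osd.5 0" - the OSD id there is 
just an example.)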


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] degraded PGs when adding OSDs

2018-02-08 Thread Simon Ironside

Hi Everyone,

I recently added an OSD to an active+clean Jewel (10.2.3) cluster and 
was surprised to see a peak of 23% objects degraded. Surely this should 
be at or near zero and the objects should show as misplaced?


I've searched and found Chad William Seys' thread from 2015 but didn't 
see any conclusion that explains this:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-July/003355.html

Thanks,
Simon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] attempt to access beyond end of device on osd prepare

2016-02-01 Thread Simon Ironside

On 01/02/16 17:47, Wade Holler wrote:

I can at least say that I've seen this. (a lot)

Running Infernalis with Btrfs on Cent 7.2.  I haven't seen any other
issue in the cluster that I would say is related.

Take it for what you will.


Thanks Wade,

That's reassuring at least that it hasn't caused you problems longer term.

I've not been able to reproduce the message myself by mounting and 
dismounting the file systems manually and I note that if I reboot the 
server without starting ceph services (i.e. chkconfig ceph off; reboot) 
the messages still appear for each OSD (xfs) disk even though they were 
never mounted by ceph or fstab:


Feb  1 22:24:34 san1-osd2 kernel: XFS (sdb1): Mounting V4 Filesystem
Feb  1 22:24:34 san1-osd2 kernel: XFS (sdb1): Ending clean mount
Feb  1 22:24:34 san1-osd2 kernel: attempt to access beyond end of device
Feb  1 22:24:34 san1-osd2 kernel: sdb1: rw=0, want=7814035088, 
limit=7814035087


FWIW, the messages seem to only appear once per disk and have not yet 
re-appeared during normal use of the cluster on either OSD host.


Maybe this is a kernel thing rather than a ceph thing. I'm probably 
running the similar if not same version as you: 3.10.0-327


Regards,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] attempt to access beyond end of device on osd prepare

2016-02-01 Thread Simon Ironside

Has nobody else encountered this (or can explain it away) then?

That's a bit of a worry, I wish I'd never spotted it now. I've just 
built the second OSD server and it produces exactly the same "attempt to 
access beyond end of device" message for each osd prepared.


Regards,
Simon.

On 27/01/16 12:26, Simon Ironside wrote:

Hi All,

I'm setting up a new cluster and owing to an unrelated mistake I made
during setup I've been paying particular attention to the system log on
the OSD server.

When I run the ceph-deploy osd prepare step, an 'access beyond end of
device' error is logged just after the file system is mounted where it
seems to be attempting to reach one block beyond the end of the disk:

Jan 26 16:13:47 ceph-osd1 kernel: attempt to access beyond end of device
Jan 26 16:13:47 ceph-osd1 kernel: sde1: rw=0, want=7814035088,
limit=7814035087

This happens with every disk in the OSD host, whether configured for
internal or external journals, and also appears when the OSDs are
remounted if the host is rebooted. The OSDs start up and seem to
function just fine.

I've checked the XFS file system is without the bounds of the partition
with xfs_info, and it is. I've also checked the OSD hosts in my other
live clusters but they don't show this error so I'm reluctant to ignore it.

As this is a new cluster it doesn't yet contain anything important so
it's no problem to flatten it and start again. I'm using hammer 0.94.5
on RHEL 7.2.

Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

Hi,

On 28/01/16 19:51, koukou73gr wrote:


Doesn't discard require the pc-q35-rhel7 (or equivalent) guest machine
type, which in turn shoves a non-removable SATA AHCI device in the guest
which can't be frozen and thus disables guest live migration?


I have no trouble with live migration and using discard. The guest 
machine type is:


hvm

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

On 28/01/16 03:37, Bill WONG wrote:

thank you! Is it possible to show me what packages you installed on the compute
node for ceph?


Sure, here's the package selection from my kickstart script:

@^virtualization-host-environment
@base
@core
@virtualization-hypervisor
@virtualization-tools
@virtualization-platform
@virtualization-client
xorg-x11-xauth
ceph

I've got some other bits and pieces installed (like net-snmp, telnet 
etc) but the above are the core bits.


Simon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

On 28/01/16 08:30, Bill WONG wrote:


without having qcow2, qemu-kvm cannot make snapshots and other
features. Does anyone have ideas or experience with this?
thank you!


I'm using raw too and create snapshots using "rbd snap create"

Cheers,
Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

On 28/01/16 10:59, Bill WONG wrote:


how do you manage to perform snapshots with the raw format in qemu-kvm VMs?


# Create a snapshot called backup of testvm's image in the rbd pool:
rbd snap create --read-only rbd/testvm@backup

# Dump the snapshot to file
rbd export rbd/testvm@backup testvm-backup.img

# Delete the snapshot
rbd snap rm rbd/testvm@backup


Could you please let me know how you use Ceph as backend storage for
qemu-kvm? When I google it, most Ceph applications are used with
OpenStack, not simply pure qemu-kvm.


I'm using pure libvirt/qemu-kvm too. Last I checked, virt-install 
doesn't support using rbd volumes directly. There's two ways I know of 
to get around this:


1. Create the VM using a file-based qcow2 image then convert to rbd

2. Modify the XML produced by virt-install before the VM is started

I can provide detailed steps for these if you need them.

Cheers,
Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

On 28/01/16 12:56, Bill WONG wrote:


you dump the snapshot to a raw img; is it possible to export to qcow2 format?


Yes, I dump to raw because I'm able to get better and faster compression 
of the image myself than using the qcow2 format.


You can export directly to qcow2 with qemu-img convert if you want:
qemu-img convert -c -p -f rbd -O qcow2 \
rbd:rbd/testvm@backup \
testvm.qcow2


and you meant create the VM using qcow2 on local HDD storage, then
convert to rbd?
It would be perfect if you could provide more details... it's highly
appreciated. Thank you!


Ok . . .

1. Create the VM using a file-based qcow2 image then convert to rbd

# Create a VM using a regular file
virt-install --name testvm \
--ram=1024 --vcpus=1 --os-variant=rhel7 \
--controller scsi,model=virtio-scsi \
--disk path=/var/lib/libvirt/images/testvm.qcow2,size=10,bus=scsi,format=qcow2 \
--cdrom=/var/lib/libvirt/images/rhel7.iso \
--nonetworks --graphics vnc

# Complete your VM's setup then shut it down

# Convert qcow2 image to rbd
qemu-img convert -p -f qcow2 -O rbd \
/var/lib/libvirt/images/testvm.qcow2 \
rbd:rbd/testvm

# Delete the qcow2 image, don't need it anymore
rm -f /var/lib/libvirt/images/testvm.qcow2

# Update the VM definition
virsh edit testvm
  # Find the <disk> section referring to your original qcow2 image
  # Delete it and replace with something like this (the monitor host
  # name below is just a placeholder):

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' discard='unmap'/>
    <auth username='CEPH_USERNAME'>
      <secret type='ceph' uuid='SECRET_UUID'/>
    </auth>
    <source protocol='rbd' name='rbd/testvm'>
      <host name='ceph-mon1' port='6789'/>
    </source>
    <target dev='sda' bus='scsi'/>
  </disk>

  # Obviously, use your own ceph monitor host name(s)
  # Also change CEPH_USERNAME and SECRET_UUID to suit

# Restart your VM, it'll now be using ceph storage directly.

Btw, using virtio-scsi devices as above and discard='unmap' above 
enables TRIM support. This means you can use fstrim or mount file 
systems with discard inside the VM to free up unused space in the image.
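
For example, inside the guest (a sketch; the fstab line is just an
illustration of the discard mount option):

# Reclaim unused space on the root file system once
fstrim -v /

# Or mount with discard so space is released as files are deleted,
# e.g. a line like this in /etc/fstab inside the guest:
# /dev/sda1  /  xfs  defaults,discard  0 0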


2. Modify the XML produced by virt-install before the VM is started

The process here is basically the same as above, the trick is to make 
the disk XML change before the VM is started for the first time so that 
it's not necessary to shut down the VM to copy from qcow2 file to rbd image.


# Create RBD image for the VM
qemu-img create -f rbd rbd:rbd/testvm 10G

# Create a VM XML but don't start it
virt-install --name testvm \
--ram=1024 --vcpus=1 --os-variant=rhel7 \
--controller scsi,model=virtio-scsi \
--disk path=/var/lib/libvirt/images/deleteme.img,size=1,bus=scsi,format=raw \
--cdrom=/var/lib/libvirt/images/rhel7.iso \
--nonetworks --graphics vnc \
--dry-run --print-step 1 > testvm.xml

# Define the VM from XML
virsh define testvm.xml

# Update the VM definition
virsh edit testvm
  # Find the <disk> section referring to your original deleteme image
  # Delete it and replace it with RBD disk XML as in procedure 1.

# Restart your VM, it'll now be using ceph storage directly.

I think it's easier to understand what's going on with procedure 1 but 
once you're comfortable I suspect you'll end up using procedure 2, 
mainly because it saves having to shut down the VM and do the conversion 
and also because my compute nodes only have tiny local storage.


It's also possible to script much of the above with the likes of virsh 
detach-disk and virsh attach-device to make the disk XML change.
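
For example, something like this against the inactive definition (the
target name and XML file name are just examples):

# Remove the placeholder disk from the VM definition
virsh detach-disk testvm sda --config

# Attach the RBD disk XML saved to a file (same XML as in procedure 1)
virsh attach-device testvm rbd-disk.xml --config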


Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-28 Thread Simon Ironside

On 28/01/16 14:51, Bill WONG wrote:


perfect! thank you!


You're welcome :)


Do you find a strange issue with rbd snapshots? The snapshot's actual
size is bigger than the original image file if you use rbd du to check
the size.


I'm using the hammer release so "rbd du" doesn't work for me.
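
A workaround I've seen suggested for older releases is to total the extents
reported by rbd diff (a sketch, assuming the image lives in the rbd pool):

# Sum the allocated extents to estimate actual usage
rbd diff rbd/testvm | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'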

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-27 Thread Simon Ironside

Hi Bill,

On 27/01/16 15:37, Bill WONG wrote:


---
for example:
qemu-img create -f rbd rbd:data/foo 10G

Formatting 'rbd:data/foo', fmt=rbd size=10737418240
no monitors specified to connect to.
qemu-img: rbd:data/foo: error connecting



Do you have /etc/ceph/ceph.conf present on this host to specify which 
monitors to use?
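
If not, a minimal one is usually enough for qemu-img to find the cluster
(the monitor address below is just an example):

# Minimal client config, just enough to locate the monitors
cat > /etc/ceph/ceph.conf <<'EOF'
[global]
mon_host = 192.168.0.10:6789
EOF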


Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Libvirt + QEMU-KVM

2016-01-27 Thread Simon Ironside

On 27/01/16 16:51, Bill WONG wrote:


I have the ceph cluster and KVM on different machines. The qemu-kvm host
(CentOS 7) is a dedicated compute node with only qemu-kvm + libvirtd
installed, so there should be no /etc/ceph/ceph.conf


Likewise, my compute nodes are separate machines from the OSDs/monitors 
but the compute nodes still have the ceph package installed and 
/etc/ceph/ceph.conf present. They just aren't running any ceph daemons.


I give the compute nodes their own ceph key with write access to the 
pool for VM storage and read access to the monitors. I can then use ceph 
status, rbd create, qemu-img etc directly on the compute nodes.
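
Something along these lines creates such a key (a sketch; the client name
and pool name are just examples):

# Key for the compute nodes: read access to the monitors,
# read/write access to the VM storage pool
ceph auth get-or-create client.compute mon 'allow r' osd 'allow rwx pool=rbd' \
-o /etc/ceph/ceph.client.compute.keyring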


Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] attempt to access beyond end of device on osd prepare

2016-01-27 Thread Simon Ironside

Hi All,

I'm setting up a new cluster and owing to an unrelated mistake I made 
during setup I've been paying particular attention to the system log on 
the OSD server.


When I run the ceph-deploy osd prepare step, an 'access beyond end of 
device' error is logged just after the file system is mounted where it 
seems to be attempting to reach one block beyond the end of the disk:


Jan 26 16:13:47 ceph-osd1 kernel: attempt to access beyond end of device
Jan 26 16:13:47 ceph-osd1 kernel: sde1: rw=0, want=7814035088, limit=7814035087


This happens with every disk in the OSD host, whether configured for 
internal or external journals, and also appears when the OSDs are 
remounted if the host is rebooted. The OSDs start up and seem to 
function just fine.


I've checked that the XFS file system is within the bounds of the partition 
with xfs_info, and it is. I've also checked the OSD hosts in my other 
live clusters but they don't show this error so I'm reluctant to ignore it.
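
For anyone wanting to repeat the check, something like this compares the file
system size against the partition (the mount point and device are examples):

# File system size = "blocks" x "bsize" from the data line
xfs_info /var/lib/ceph/osd/ceph-0 | grep '^data'

# Partition size in bytes, for comparison
blockdev --getsize64 /dev/sde1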


As this is a new cluster it doesn't yet contain anything important so 
it's no problem to flatten it and start again. I'm using hammer 0.94.5 
on RHEL 7.2.


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-extras for rhel7

2014-07-25 Thread Simon Ironside

Hi again,

I've had a look at the qemu-kvm SRPM and RBD is intentionally disabled 
in the RHEL 7.0 release packages. There's a block in the .spec file that 
reads:


%if %{rhev}
--enable-live-block-ops \
--enable-ceph-support \
%else
--disable-live-block-ops \
--disable-ceph-support \
%endif

rhev is defined as 0 at the top of the file. Setting this to 1 and 
rebuilding after sorting the build dependencies yields some new packages 
with RBD support and a -rhev suffix that install and work on RHEL 7.0 
just fine. I tested with a KVM VM using RBD/cephx storage via 
libvirt/qemu directly. As I was using virtio-scsi, TRIM also worked.
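
Roughly what the rebuild looked like (the SRPM version and paths may differ
on your system):

# Install the source RPM and pull in the build dependencies
rpm -ivh qemu-kvm-1.5.3-60.el7.src.rpm
yum-builddep ~/rpmbuild/SPECS/qemu-kvm.spec

# Edit the spec to enable ceph support, then rebuild
rpmbuild -ba ~/rpmbuild/SPECS/qemu-kvm.spec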


iasl was the only build requirement I wasn't able to satisfy so I 
commented it out (the comments state that it's not a hard requirement). 
This doesn't seem to have had any ill effects for me.


To avoid the -rhev suffix I ultimately made the attached changes to the 
spec file before rebuilding them for myself.


Cheers,
Simon

On 21/07/14 14:23, Simon Ironside wrote:

Hi,

Is there going to be ceph-extras repos for rhel7?

Unless I'm very much mistaken I think the RHEL 7.0 release qemu-kvm
packages don't support RBD.

Cheers,
Simon.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--- qemu-kvm.spec.orig
+++ qemu-kvm.spec
@@ -74,7 +74,7 @@
 Summary: QEMU is a FAST! processor emulator
 Name: %{pkgname}%{?pkgsuffix}
 Version: 1.5.3
-Release: 60%{?dist}
+Release: 61%{?dist}
 # Epoch because we pushed a qemu-1.0 package. AIUI this can't ever be dropped
 Epoch: 10
 License: GPLv2+ and LGPLv2+ and BSD
@@ -2273,7 +2273,7 @@
 # iasl and cpp for acpi generation (not a hard requirement as we can use
 # pre-compiled files, but it's better to use this)
 %ifarch %{ix86} x86_64
-BuildRequires: iasl
+#BuildRequires: iasl
 BuildRequires: cpp
 %endif
 
@@ -3551,14 +3551,14 @@
 --enable-ceph-support \
 %else
 --disable-live-block-ops \
---disable-ceph-support \
+--enable-ceph-support \
 %endif
 --disable-live-block-migration \
 --enable-glusterfs \
 %if %{rhev}
 --block-drv-rw-whitelist=qcow2,raw,file,host_device,nbd,iscsi,gluster,rbd \
 %else
---block-drv-rw-whitelist=qcow2,raw,file,host_device,nbd,iscsi,gluster \
+--block-drv-rw-whitelist=qcow2,raw,file,host_device,nbd,iscsi,gluster,rbd \
 %endif
 --block-drv-ro-whitelist=vmdk,vhdx,vpc \
 $@
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-extras for rhel7

2014-07-21 Thread Simon Ironside

Hi,

Is there going to be ceph-extras repos for rhel7?

Unless I'm very much mistaken I think the RHEL 7.0 release qemu-kvm 
packages don't support RBD.


Cheers,
Simon.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Adding OSD with journal on another disk

2014-07-19 Thread Simon Ironside

Hi there,

OS: RHEL 7.0 x86_64
ceph: 0.80.4
ceph-deploy: 1.5.9

(disks have been zapped first)

I think there might be a typo in the documentation here:
http://ceph.com/docs/master/rados/deployment/ceph-deploy-osd/#prepare-osds

If I follow by doing this:
ceph-deploy osd prepare ceph-osd1:sdc:/dev/sdb1

A data partition on sdc is created but a journal partition on sdb is 
not; sdb remains empty. Instead, a regular file named /dev/sdb1 is 
created on ceph-osd1, which needs to be cleaned up manually.


Dropping the partition number from the journal device name like this:
ceph-deploy osd prepare ceph-osd1:sdc:/dev/sdb
ceph-deploy osd activate ceph-osd1:/dev/sdc1:/dev/sdb1

Appears to work. Both the data and journal partitions are created, the 
data partition is mounted and the OSD shows as up in the cluster.


Adding more disks and journals in the same way also seems to work fine i.e.:

ceph-deploy osd prepare ceph-osd1:sdd:/dev/sdb
ceph-deploy osd prepare ceph-osd1:sde:/dev/sdb
ceph-deploy osd activate ceph-osd1:/dev/sdd1:/dev/sdb2
ceph-deploy osd activate ceph-osd1:/dev/sde1:/dev/sdb3

This all looks good, the data partitions are created on each data disk, 
the journal partitions are added (not overwritten) on the journal disk 
and the journal symlinks on each mounted OSD point to a unique partition id.
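
You can see that by listing the journal symlinks on the OSD host, e.g.:

# Each journal should resolve to a different partition (by-partuuid)
ls -l /var/lib/ceph/osd/ceph-*/journal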


Is this latter way the correct way to create OSDs with separated 
journals? I guess the former way expects you to have already created the 
journal partition and only behaved badly because I didn't?


Thanks,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journal SSD durability

2014-05-22 Thread Simon Ironside

Hi,

Just to revisit this one last time . . .

Is the issue only with the SandForce SF-2281 in the Kingston E50? Or are 
all SandForce controllers considered dodgy, including the SF-2582 in the 
Kingston E100 and a few other manufacturers' enterprise SSDs?


Thanks,
Simon.

On 16/05/14 22:30, Carlos M. Perez wrote:

Unfortunately, the Seagate Pro 600 has been discontinued, 
http://comms.seagate.com/servlet/servlet.FileDownload?file=00P300JHLCCEA5.  
The replacement is the 1200 series which are more than 2x the price but have a SAS 
12gbps interface.  You can still find the 600's out there at around $300/drive. 
 Still a very good price based on specs and backed by the reviews.

The Kingston E100's have a DWPD rating of 11 at the 100/200GB capacity, and similar 
specs to the S3700's (400GB), but are more expensive per GB and PBW than the Intel 
S3700, so I'd probably stick with the S3700s.

Carlos M. Perez
CMP Consulting Services
305-669-1515


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Simon Ironside
Sent: Friday, May 16, 2014 4:08 PM
To: Christian Balzer
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Journal SSD durability

On 16/05/14 16:34, Christian Balzer wrote:

Thanks for bringing that to my attention.
It looks very good until one gets to the Sandforce controller in the specs.

As in, if you're OK with occasional massive spikes in latency, go for
it (same for the Intel 530).
If you prefer consistent performance, avoid.


Cool, that saves me from burning £100 unnecessarily. Thanks.
I've one more suggestion before I just buy an Intel DC S3500 . . .

Seagate 600 Pro 100GB
520/300 Sequential Read/Write
80k/20k Random 4k Read/Write IOPS
Power Loss Protection
280/650TB endurance (two figures, weird, but both high), 5yr warranty and
not a bad price

http://www.seagate.com/www-content/product-content/ssd-fam/600-pro-ssd/en-gb/docs/600-pro-ssd-data-sheet-ds1790-3-1310gb.pdf

It's not a SandForce controller :) It's a LAMD LM87800.

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph deploy on rhel6.5 installs ceph from el6 and fails

2014-05-22 Thread Simon Ironside

On 22/05/14 23:56, Lukac, Erik wrote:

But: this fails because of the dependencies. xfsprogs is in rhel6 repo,
but not in el6.


I hadn't noticed that xfsprogs is included in the ceph repos, I'm using 
the package from the RHEL 6.5 DVD, which is the same version, you'll 
find it in the ScalableFileSystem repo on the Install DVD.


HTH,
Simon.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journal SSD durability

2014-05-16 Thread Simon Ironside

On 13/05/14 13:23, Christian Balzer wrote:

Alas a DC3500 240GB SSD will perform well enough at half the price of
the DC3700 and give me enough breathing room at about 80GB/day writes,
so this is what I will order in the end.

Did you consider DC3700 100G with similar price?


The 3500 is already potentially slower than the actual HDDs when doing
sequential writes, the 100GB 3700 most definitely so.


Hi,

Any thoughts or experience of the Kingston E50 100GB SSD?

The 310TB endurance, power-loss protection and 550/530MBps sequential 
read/write rates seem to be quite suitable for journalling.


Cheers,
Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journal SSD durability

2014-05-16 Thread Simon Ironside

On 16/05/14 22:30, Carlos M. Perez wrote:

Unfortunately, the Seagate Pro 600 has been discontinued, 
http://comms.seagate.com/servlet/servlet.FileDownload?file=00P300JHLCCEA5.  
The replacement is the 1200 series which are more than 2x the price but have a SAS 
12gbps interface.  You can still find the 600's out there at around $300/drive. 
 Still a very good price based on specs and backed by the reviews.


Thanks, that's encouraging. What a shame they're being discontinued.
You can certainly still get them in the UK at ~£100 (~$160) a drive. 
Sounds like it might be worth a shot.


Cheers,
Simon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-noarch firefly repodata

2014-05-11 Thread Simon Ironside

Hi there,

Is there any reason not to use the latest packages from:
ceph.com/rpm-firefly/rhel6/noarch/ ?

I.e. when installing via yum, ceph-deploy-1.4.0 is installed but 1.5.0, 
1.5.1 and 1.5.2 are present in the directory above.


Yum also complains about radosgw-agent-1.2.0-0.noarch.rpm not being 
present on the same repo.


Perhaps the repodata just needs updated?

Cheers,
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com