[ceph-users] Crush Map for test lab

2017-10-11 Thread Ashley Merrick
Hello, Setting up a new test lab: single server, 5 disks/OSDs. I want to run an EC pool that has more shards than available OSDs. Is it possible to force CRUSH to re-use an OSD for another shard? I know this is normally bad practice, but it is for testing only on a single-server setup. Thanks,
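A minimal sketch of the usual single-host workaround, assuming Luminous-era CLI syntax and example profile/pool names: it drops the failure domain to the OSD level rather than re-using an OSD, since stock CRUSH will not pick the same OSD twice within one rule pass; putting two shards on one OSD would need a hand-written CRUSH rule.

    # k+m still has to fit within the 5 OSDs unless a custom rule is written
    ceph osd erasure-code-profile set lab-ec k=3 m=2 crush-failure-domain=osd
    ceph osd pool create ecpool 32 32 erasure lab-ec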

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Adrian Saul
It’s a fair point – in our case we are based on CentOS so self-support only anyway (business does not like paying support costs). At the time we evaluated LIO, SCST and STGT, with a directive to use ALUA support instead of IP failover. In the end we went with SCST as it had more mature

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Samuel Soulard
Yes, I looked at this solution, and it seems interesting. However, one point that often sticks with business requirements is commercial support. With Red Hat or SUSE, you have support provided with the solution. I'm not sure what support channel SCST offers. Sam On Oct 11, 2017 20:05,

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Adrian Saul
As an aside, SCST iSCSI will support ALUA and does PGRs through the use of DLM. We have been using that with Solaris and Hyper-V initiators for RBD backed storage but still have some ongoing issues with ALUA (probably our current config, we need to lab later recommendations). >

Re: [ceph-users] assertion error trying to start mds server

2017-10-11 Thread Bill Sharer
I was wondering whether, if I can't get the second MDS back up, that offline backward scrub check could also salvage what it can of the two pools to a normal filesystem. Is there an option for that, or has someone written some form of salvage tool? On 10/11/2017 07:07 AM,
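The "offline backward scrub check" is presumably the cephfs-data-scan disaster-recovery tooling; a hedged sketch follows (the pool name is an example, and the CephFS disaster-recovery docs should be consulted before running it against real data).

    # run with the MDS offline; cephfs_data is a placeholder data pool name
    cephfs-data-scan scan_extents cephfs_data   # pass 1: recover object extents/sizes
    cephfs-data-scan scan_inodes cephfs_data    # pass 2: re-link recovered inodes into the metadata pool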

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
or this: { "shard_id": 22, "entries": [ { "id": "1_1507761448.758184_10459.1", "section": "data", "name": "testbucket:6a9448d2-bdba-4bec-aad6-aba72cd8eac6.21344646.3/Wireshark-win64-2.2.7.exe",

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
It's 45MB, but it happens with all multipart uploads. sync error list shows { "shard_id": 31, "entries": [ { "id": "1_1507761459.607008_8197.1", "section": "data", "name":

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Yehuda Sadeh-Weinraub
What is the size of the object? Is it only this one? Try this command: 'radosgw-admin sync error list'. Does it show anything related to that object? Thanks, Yehuda On Wed, Oct 11, 2017 at 3:26 PM, Enrico Kern wrote: > if i change permissions the sync status
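For reference, a hedged sketch of the commands being discussed; sync error list is the one Yehuda asks for, and sync status gives the overall multisite picture (run on either zone's RGW node).

    radosgw-admin sync status        # replication state of this zone against its peers
    radosgw-admin sync error list    # per-shard entries for objects that failed to sync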

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
In addition, I noticed that if you delete a bucket that contained multipart-uploaded files which were not replicated, the files are not deleted from the pool: while the bucket is gone, the data still remains in the pool where the multipart upload was initiated. On Thu, Oct 12, 2017 at 12:26 AM,

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
If I change permissions, the sync status shows that it is syncing 1 shard, but no file ends up in the pool (testing with an empty data pool). After a while it shows that data is back in sync, but there is no file. On Wed, Oct 11, 2017 at 11:26 PM, Yehuda Sadeh-Weinraub wrote: >

Re: [ceph-users] RGW flush_read_list error

2017-10-11 Thread Travis Nielsen
To the client they were showing up as a 500 error. Ty, do you know of any client-side issues that could have come up during the test run? And there was only a single GET happening at a time, right? On 10/11/17, 9:27 AM, "ceph-users on behalf of Casey Bodley"

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Yehuda Sadeh-Weinraub
Thanks for your report. We're looking into it. You can try to see if touching the object (e.g., modifying its permissions) triggers the sync. Yehuda On Wed, Oct 11, 2017 at 1:36 PM, Enrico Kern wrote: > Hi David, > > yeah seems you are right, they are stored as
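One hedged way to "touch" an object without changing its data is to rewrite its ACL from the S3 side; the bucket and key below are the ones mentioned earlier in the thread, and s3cmd is assumed to be configured against the master zone.

    # re-applying the ACL rewrites object metadata, which may nudge the sync
    s3cmd setacl s3://testbucket/Wireshark-win64-2.2.7.exe --acl-private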

Re: [ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
Hi David, yeah, it seems you are right: they are stored as different filenames in the data pool when using multipart upload. But it still doesn't get replicated. As an example, I have files like

[ceph-users] Luminous 12.2.1 - RadosGW Multisite doesn't replicate multipart uploads

2017-10-11 Thread Enrico Kern
Hi all, I just set up multisite replication according to the docs at http://docs.ceph.com/docs/master/radosgw/multisite/ and everything works, except that if a client uploads via multipart the files don't get replicated. If I rename, in one zone, a file that was uploaded via multipart, it gets
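A heavily condensed, hedged sketch of the secondary-zone side of such a setup (realm/zonegroup/zone names, endpoints and keys are placeholders; the linked docs remain the authoritative sequence).

    radosgw-admin realm pull --url=http://master-rgw:8080 \
        --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
    radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east-2 \
        --endpoints=http://secondary-rgw:8080 \
        --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
    radosgw-admin period update --commit
    # then restart the local radosgw so it picks up the new period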

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Samuel Soulard
Ahh, so in this case only SUSE Enterprise Storage is able to provide iSCSI connections for MS clusters if HA is required, be it Active/Standby, Active/Active or Active/Failover. On Wed, Oct 11, 2017 at 2:03 PM, Jason Dillaman wrote: > On Wed, Oct 11, 2017 at 1:10 PM,

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Jason Dillaman
On Wed, Oct 11, 2017 at 1:10 PM, Samuel Soulard wrote: > Hmmm, If you failover the identity of the LIO configuration including PGRs > (I believe they are files on disk), this would work no? Using an 2 ISCSI > gateways which have shared storage to store the LIO

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Samuel Soulard
Hmmm, if you fail over the identity of the LIO configuration, including PGRs (I believe they are files on disk), this would work, no? Using 2 iSCSI gateways which have shared storage to store the LIO configuration and PGR data. Also, you said "fails over to another port"; do you mean a

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Jason Dillaman
On Wed, Oct 11, 2017 at 12:31 PM, Samuel Soulard wrote: > Hi to all, > > What if you're using an ISCSI gateway based on LIO and KRBD (that is, RBD > block device mounted on the ISCSI gateway and published through LIO). The > LIO target portal (virtual IP) would failover

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Samuel Soulard
Hi to all, What if you're using an iSCSI gateway based on LIO and KRBD (that is, an RBD block device mounted on the iSCSI gateway and published through LIO)? The LIO target portal (virtual IP) would fail over to another node. This would theoretically provide support for PGRs since LIO does support

Re: [ceph-users] RGW flush_read_list error

2017-10-11 Thread Casey Bodley
Hi Travis, This is reporting an error when sending data back to the client. Generally it means that the client timed out and closed the connection. Are you also seeing failures on the client side? Casey On 10/10/2017 06:45 PM, Travis Nielsen wrote: In Luminous 12.2.1, when running a GET

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread David Disseldorp
Hi Jason, Thanks for the detailed write-up... On Wed, 11 Oct 2017 08:57:46 -0400, Jason Dillaman wrote: > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López > wrote: > > > As far as I am able to understand there are 2 ways of setting iscsi for > > ceph > > > > 1- using

Re: [ceph-users] ceph osd disk full (partition 100% used)

2017-10-11 Thread Webert de Souza Lima
That sounds like it. Thanks David. I wonder if that behavior of ignoring the OSD full_ratio is intentional. Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* On Wed, Oct 11, 2017 at 12:26 PM, David Turner wrote: > The full ratio is based

Re: [ceph-users] ceph osd disk full (partition 100% used)

2017-10-11 Thread David Turner
The full ratio is based on the max bytes. If you say that the cache should have a max bytes of 1TB and that the full ratio is .8, then it will aim to keep it at 800GB. Without a max bytes value set, the ratios are a percentage of unlimited... aka no limit themselves. The full_ratio should be
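A hedged example of the cache-tier settings under discussion ("cache" is a placeholder pool name); the ratios only become meaningful once target_max_bytes (and/or target_max_objects) is set.

    ceph osd pool set cache target_max_bytes 1099511627776   # 1 TiB absolute cap
    ceph osd pool set cache cache_target_full_ratio 0.8      # aim to evict down to ~800 GiB
    ceph osd pool set cache cache_target_dirty_ratio 0.4     # start flushing dirty objects earlier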

[ceph-users] ceph osd disk full (partition 100% used)

2017-10-11 Thread Webert de Souza Lima
Hi, I have a CephFS cluster as follows: 1x 15-HDD data pool (primary CephFS data pool), 1x 2-SSD data pool (linked to a specific dir via xattrs), 1x 2-SSD metadata pool, and 1x 2-SSD cache tier pool. The cache tier pool consists of 2 hosts, with one SSD OSD on each host, with size=2 replicated by host.
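For reference, a hedged sketch of how a directory is typically linked to a second data pool via layout xattrs (filesystem name, pool name and mount path are examples).

    ceph fs add_data_pool cephfs ssd_data                                 # make the SSD pool usable by CephFS
    setfattr -n ceph.dir.layout.pool -v ssd_data /mnt/cephfs/fast_dir     # new files under fast_dir land on SSD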

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Jake Young
On Wed, Oct 11, 2017 at 8:57 AM Jason Dillaman wrote: > On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López > wrote: > >> As far as I am able to understand there are 2 ways of setting iscsi for >> ceph >> >> 1- using kernel (lrbd) only able on SUSE,

Re: [ceph-users] min_size & hybrid OSD latency

2017-10-11 Thread David Turner
Christian is correct that min_size does not affect how many copies need to ACK the write; it controls how many copies need to be available for the PG to be accessible. This is where SSD journals for filestore and SSD DB/WAL partitions come into play. The write is considered ACK'd as soon as
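For context, a hedged example of the two pool settings being contrasted (the pool name is a placeholder): size is how many copies get written, min_size is how many must be up for the PG to keep serving I/O.

    ceph osd pool set rbd size 3       # keep 3 copies of every object
    ceph osd pool set rbd min_size 2   # stay active and accept I/O with at least 2 copies up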

Re: [ceph-users] min_size & hybrid OSD latency

2017-10-11 Thread Reed Dier
Just for the sake of putting this in the public forum: in theory, placing the primary copy of the object on an SSD medium and the replica copies on an HDD medium should still yield some improvement in writes compared to an all-HDD scenario. My logic here is rooted in the idea that
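One hedged, related knob (separate from building a CRUSH rule that takes the first copy from an SSD root): primary affinity biases which OSD acts as primary, so the primary role, and with it reads and client coordination, lands on the SSD OSDs where possible. The OSD IDs are examples, and older clusters may need primary-affinity support explicitly enabled.

    ceph osd primary-affinity osd.10 0     # HDD OSD: avoid being primary where possible
    ceph osd primary-affinity osd.2 1.0    # SSD OSD: preferred primary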

[ceph-users] general protection fault: 0000 [#1] SMP

2017-10-11 Thread Olivier Bonvalet
Hi, I had a "general protection fault: " with the Ceph RBD kernel client. Not sure how to read the call trace; is it Ceph-related? Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: [#1] SMP Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid

Re: [ceph-users] advice on number of objects per OSD

2017-10-11 Thread David Turner
I've managed an RBD cluster that had all of the RBDs configured with 1M objects and filled the cluster up to 75% full with 4TB drives. Other than the collection splitting (subfolder splitting, as I've called it before) we didn't have any problems with object counts. On Wed, Oct 11, 2017 at 9:47 AM
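A hedged example of creating an image with 1 MiB objects (pool/image names are examples); on releases without --object-size, the equivalent is the older --order 20 flag.

    rbd create --size 100G --object-size 1M rbd/testimg
    rbd info rbd/testimg    # the reported "order 20" corresponds to 1 MiB objects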

Re: [ceph-users] right way to recover a failed OSD (disk) when using BlueStore ?

2017-10-11 Thread Alejandro Comisario
David, thanks. I've switched the branch to Luminous and the doc is the same (thankfully). No worries, I'll wait until someone who has hopefully done it already can give me a hint. Thanks! On Wed, Oct 11, 2017 at 11:00 AM, David Turner wrote: > Careful when you're looking at

Re: [ceph-users] right way to recover a failed OSD (disk) when using BlueStore ?

2017-10-11 Thread David Turner
Careful when you're looking at documentation. You're looking at the master branch, which might have unreleased features or changes that your release doesn't have. You'll want to change master in the URL to luminous to make sure that you're looking at the documentation for your version of Ceph. I

Re: [ceph-users] advice on number of objects per OSD

2017-10-11 Thread Gregory Farnum
These limits unfortunately aren’t very well understood or studied right now. The biggest slowdown I’m aware of is that when using FileStore you see an impact as it starts to create more folders internally (this is the “collection splitting”) and requires more cached metadata to do fast lookups.
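A hedged ceph.conf sketch of the FileStore settings that govern when collection splitting happens (the values shown are a commonly cited tuning, not a recommendation; defaults differ by release).

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 8
    # a subdirectory splits at roughly merge_threshold * 16 * split_multiple objects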

Re: [ceph-users] BlueStore Cache Ratios

2017-10-11 Thread Mark Nelson
Hi Jorge, I was sort of responsible for all of this. :) So basically there are different caches in different places:
- rocksdb bloom filter and index cache
- rocksdb block cache (which can be configured to include filters and indexes)
- rocksdb compressed block cache
- bluestore onode cache
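A hedged sketch of the Luminous-era config names that map onto those caches (whether these names and defaults match your exact release should be checked against its documentation).

    [osd]
    bluestore cache size ssd = 3221225472   # total per-OSD cache on SSD-backed OSDs (3 GB)
    bluestore cache kv ratio = 0.99         # share given to the rocksdb block cache
    bluestore cache meta ratio = 0.01       # share given to the bluestore onode cache
    # anything left over is used to cache object data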

Re: [ceph-users] BlueStore Cache Ratios

2017-10-11 Thread Jorge Pinilla López
Okay, thanks for the explanation. So from the 3GB of cache (the default cache for SSD), only 0.5GB is going to K/V and 2.5GB to metadata. Is there a way of knowing how much K/V, metadata and data is being stored, and how full the cache is, so I can adjust my ratios? I was thinking some ratios (like 0.9 K/V,
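One hedged way to see where an OSD's cache memory is actually going is the admin-socket mempool dump (osd.0 is an example; the rocksdb block cache itself is not broken out there).

    ceph daemon osd.0 dump_mempools   # check the bluestore_cache_onode / bluestore_cache_data pools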

Re: [ceph-users] Ceph-ISCSI

2017-10-11 Thread Jason Dillaman
On Wed, Oct 11, 2017 at 6:38 AM, Jorge Pinilla López wrote: > As far as I am able to understand there are 2 ways of setting iscsi for > ceph > > 1- using kernel (lrbd) only able on SUSE, CentOS, fedora... > The target_core_rbd approach is only utilized by SUSE (and its

Re: [ceph-users] BlueStore Cache Ratios

2017-10-11 Thread Mohamad Gebai
Hi Jorge, On 10/10/2017 07:23 AM, Jorge Pinilla López wrote: > Are .99 KV, .01 MetaData and .0 Data ratios right? They seem a little > too disproportionate. Yes, this is correct. > Also .99 KV and a cache of 3GB for SSD means that almost the 3GB would > be used for KV, but there is also another

Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Oh! I put in the wrong link, sorry. The picture which explains stripe_unit and stripe_count is here: https://indico.cern.ch/event/330212/contributions/1718786/attachments/642384/883834/CephPluginForXroot.pdf I tried to attach it to the mail, but it was blocked. On Wed, Oct 11, 2017 at 3:16 PM,

Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Hi, Ian! Thank you for the reference! Could you comment on the following rule: object_size = stripe_unit * stripe_count? Or is it not necessarily so? I refer to page 8 in this report: https://indico.cern.ch/event/531810/contributions/2298934/at

Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Hi, Gregory! You are absolutely right! Thanks! The following sequence solves the problem: rados_striper_set_object_layout_stripe_unit(m_striper, stripe_unit); rados_striper_set_object_layout_stripe_count(m_striper, stripe_count); int stripe_size = stripe_unit * stripe_count;

Re: [ceph-users] assertion error trying to start mds server

2017-10-11 Thread John Spray
On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer wrote: > I've been in the process of updating my gentoo based cluster both with > new hardware and a somewhat postponed update. This includes some major > stuff including the switch from gcc 4.x to 5.4.0 on existing hardware >

[ceph-users] Ceph-ISCSI

2017-10-11 Thread Jorge Pinilla López
As far as I am able to understand, there are 2 ways of setting up iSCSI for Ceph: 1- using the kernel (lrbd), only available on SUSE, CentOS, Fedora... 2- using userspace (tcmu, ceph-iscsi-conf, ceph-iscsi-cli). I don't know which one is better; I see that official support is pointing to tcmu, but I haven't

Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread ian.johnson
Hi Gregory, you're right: when setting the object layout in libradosstriper, one should set all three parameters (the number of stripes, the size of the stripe unit, and the size of the striped object). The Ceph plugin for GridFTP has an example of this at

Re: [ceph-users] A new SSD for journals - everything sucks?

2017-10-11 Thread Piotr Dałek
On 17-10-11 09:50 AM, Josef Zelenka wrote: Hello everyone, lately we've had issues with buying the SSDs that we use for journaling (Kingston stopped making them) - Kingston V300 - so we decided to start using a different model and started researching which one would be the best price/value for

[ceph-users] A new SSD for journals - everything sucks?

2017-10-11 Thread Josef Zelenka
Hello everyone, lately we've had issues with buying the SSDs that we use for journaling (Kingston stopped making them) - Kingston V300 - so we decided to start using a different model and started researching which one would be the best price/value for us. We compared five models to check if they
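For anyone reproducing this kind of evaluation, a hedged sketch of the usual journal-suitability test: single-threaded 4k synchronous writes, roughly what a filestore journal does (destructive if pointed at a raw device; /dev/sdX is a placeholder).

    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 \
        --time_based --group_reporting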

Re: [ceph-users] All replicas of pg 5.b got placed on the same host - how to correct?

2017-10-11 Thread Konrad Riedel
Thanks a lot - problem fixed. On 10.10.2017 16:58, Peter Linder wrote: I think your failure domain within your rules is wrong. step choose firstn 0 type osd Should be: step choose firstn 0 type host On 10/10/2017 5:05 PM, Konrad Riedel wrote: Hello Ceph-users, after switching to
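For anyone applying the same fix, a hedged sketch of the usual decompile/edit/recompile cycle for the CRUSH map (file names are examples); the edit itself is the failure-domain change quoted above, from type osd to type host.

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit the rule: change the failure domain from "type osd" to "type host"
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new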