Re: [ceph-users] ceph-ansible / block-db block-wal

2019-11-01 Thread solarflow99
ceph-ansible is able to find those on its own now; try just not specifying
the devices and dedicated_devices like before. You'll see in the osds.yml
file that it has changed.
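
For example, the relevant part of group_vars/osds.yml can be as small as this
(a sketch, assuming a recent ceph-ansible using the ceph-volume lvm batch
scenario, which places the DB/WAL on the faster drives by itself):

osd_objectstore: bluestore
osd_auto_discovery: true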


On Wed, Oct 30, 2019 at 3:47 AM Lars Täuber  wrote:

> I don't use ansible anymore. But this was my config for the host onode1:
>
> ./host_vars/onode2.yml:
>
> lvm_volumes:
>   - data: /dev/sdb
>     db: '1'
>     db_vg: host-2-db
>   - data: /dev/sdc
>     db: '2'
>     db_vg: host-2-db
>   - data: /dev/sde
>     db: '3'
>     db_vg: host-2-db
>   - data: /dev/sdf
>     db: '4'
>     db_vg: host-2-db
> …
>
> One config file per host. The LVs were created by hand on a PV on top of a
> RAID1 over two SSDs.
> The hosts had empty slots for HDDs to be bought later, so I had to
> "partition" the PV by hand, because ansible would otherwise use the whole
> RAID1 for only the HDDs present.
>
> It is said that only certain sizes of DB & WAL partitions are sensible.
> I now use 58GiB LVs.
> The remaining space in the RAID1 is used for a faster OSD.
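>
> For reference, creating those LVs by hand looks roughly like this (a sketch,
> assuming the RAID1 device is /dev/md0):
>
>   pvcreate /dev/md0
>   vgcreate host-2-db /dev/md0
>   lvcreate -L 58G -n 1 host-2-db
>   lvcreate -L 58G -n 2 host-2-db
>   ...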
>
>
> Lars
>
>
> On Wed, 30 Oct 2019 10:02:23, CUZA Frédéric wrote:
> > Hi Everyone,
> >
> > Does anyone know how to indicate block-db and block-wal to device on
> ansible ?
> > In ceph-deploy it is quite easy :
> > ceph-deploy osd create osd_host08 --data /dev/sdl --block-db /dev/sdm12
> --block-wal /dev/sdn12 --bluestore
> >
> > On my data nodes I have 12 HDDs and 2 SSDs; I use those SSDs for block-db
> and block-wal.
> > How do I indicate, for each OSD, which partition to use?
> >
> > And finally, how do you handle the deployment if you have multiple data
> nodes with different setups?
> > SSDs on sdm and sdn on one host and SSDs on sda and sdb on another?
> >
> > Thank you for your help.
> >
> > Regards,
>
>
> --
> Informationstechnologie
> Berlin-Brandenburgische Akademie der Wissenschaften
> Jägerstraße 22-23  10117 Berlin
> Tel.: +49 30 20370-352   http://www.bbaw.de
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Looking for the best way to utilize 1TB NVMe added to the host with 8x3TB HDD OSDs

2019-09-21 Thread solarflow99
Now, my understanding is that an NVMe drive is recommended to help speed up
bluestore.  If it were to fail then those OSDs would be lost, but assuming
there is 3x replication and enough OSDs I don't see the problem here.
There are other scenarios where a whole server might be lost; it doesn't
mean the total loss of the cluster.


On Sat, Sep 21, 2019 at 5:27 AM Ashley Merrick 
wrote:

> Placing it as a journal / BlueStore DB/WAL will help mostly with writes;
> by the sounds of it you want to increase read performance? How important
> is the data on this Ceph cluster?
>
> If you place it as a Journal DB/WAL any failure of it will cause total
> data loss so I would very much advise against this unless this is totally
> for testing and total data loss is not an issue.
>
> In that case it is worth upgrading to BlueStore by rebuilding each OSD,
> placing the DB/WAL on an SSD partition. You can do this one OSD at a time,
> but there is no migration path, so you would need to wait for the data to
> rebuild after each OSD change before moving on to the next.
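>
> A per-OSD rebuild might look roughly like this (a sketch; the OSD id and the
> device names are placeholders):
>
> # ceph osd out 12
>   (wait for the cluster to become healthy again, then on the OSD host:)
> # systemctl stop ceph-osd@12
> # ceph osd purge 12 --yes-i-really-mean-it
> # ceph-volume lvm zap --destroy /dev/sdX
> # ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1p1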
>
> If you need to make sure your data is safe then you're really limited to
> using it as a read-only cache, but I think even then most setups would
> cause all OSDs to go offline until you manually removed it from the
> read-only cache if the disk failed.
> bcache/dm-cache may support this automatically, but it is still a
> risk that I personally wouldn't want to take.
>
> Also, what the best option is really depends on your use of Ceph and the
> expected I/O activity.
>
>
>
>  On Fri, 20 Sep 2019 14:56:12 +0800, Wladimir Mutel wrote 
>
> Dear everyone,
>
> Last year I set up an experimental Ceph cluster (still single node,
> failure domain = osd, MB Asus P10S-M WS, CPU Xeon E3-1235L, RAM 64 GB,
> HDDs WD30EFRX, Ubuntu 18.04, now with kernel 5.3.0 from Ubuntu mainline
> PPA and Ceph 14.2.4 from download.ceph.com/debian-nautilus/dists/bionic
> ). I set up JErasure 2+1 pool, created some RBDs using that as data pool
> and exported them by iSCSI (using tcmu-runner, gwcli and associated
> packages). But with HDD-only setup their performance was less than
> stellar, not saturating even 1Gbit Ethernet on RBD reads.
>
> This year my experiment was funded with Gigabyte PCIe NVMe 1TB SSD
> (GP-ASACNE2100TTTDR). Now it is plugged in the MB and is visible as a
> storage device to lsblk. Also I can see its 4 interrupt queues in
> /proc/interrupts, and its transfer measured by hdparm -t is about
> 2.3GB/sec.
>
> And now I want to ask your advice on how to best include it into this
> already existing setup. Should I allocate it for OSD journals and
> databases? Is there a way to reconfigure an existing OSD in this way
> without destroying and recreating it? Or are there plans to ease this
> kind of migration? Can I add it as a write-absorbing cache to
> individual RBD images? To individual block devices at the level of
> bcache/dm-cache? What about speeding up RBD reads?
>
> I would appreciate your opinions and recommendations.
> (I just want to warn you that in this situation I don't have the financial
> option of going full-SSD.)
>
> Thank you all in advance for your response
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] increase pg_num error

2019-09-11 Thread solarflow99
You don't have to increase pgp_num first?
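
For reference, a sketch of the classic two-step (on Nautilus the cluster
normally adjusts pgp_num for you once the pg_num change takes effect):

# ceph osd pool set Backup pg_num 512
# ceph osd pool set Backup pgp_num 512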


On Wed, Sep 11, 2019 at 6:23 AM Kyriazis, George 
wrote:

> I have the same problem (nautilus installed), but the proposed command
> gave me an error:
>
> # ceph osd require-osd-release nautilus
> Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_NAUTILUS feature
> #
>
> I created my cluster with mimic and then upgraded to nautilus.
>
> What would be my next step?
>
> Thanks!
>
> George
>
>
> > On Jul 1, 2019, at 9:21 AM, Nathan Fish  wrote:
> >
> > I ran into this recently. Try running "ceph osd require-osd-release
> > nautilus". This drops backwards compat with pre-nautilus and allows
> > changing settings.
> >
> > On Mon, Jul 1, 2019 at 4:24 AM Sylvain PORTIER  wrote:
> >>
> >> Hi all,
> >>
> >> I am using ceph 14.2.1 (Nautilus)
> >>
> >> I am unable to increase the pg_num of a pool.
> >>
> >> I have a pool named Backup, the current pg_num is 64 : ceph osd pool get
> >> Backup pg_num => result pg_num: 64
> >>
> >> And when I try to increase it using the command
> >>
> >> ceph osd pool set Backup pg_num 512 => result "set pool 6 pg_num to 512"
> >>
> >> And then I check with the command : ceph osd pool get Backup pg_num =>
> >> result pg_num: 64
> >>
> >> I don't know how to increase the pg_num of a pool. I also tried the
> >> autoscaler module, but it doesn't work (unable to activate the autoscaler,
> >> it always stays in warn mode).
> >>
> >> Thank you for your help,
> >>
> >>
> >> Cabeur.
> >>
> >>
> >>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] forcing an osd down

2019-09-03 Thread solarflow99
I've noticed this happen before; this time I can't get it to stay down
at all, it just keeps coming back up:

# ceph osd down osd.48
marked down osd.48.

# ceph osd tree |grep osd.48
48   3.64000 osd.48 down0  1.0

# ceph osd tree |grep osd.48
48   3.64000 osd.48   up0  1.0



health HEALTH_WARN
2 pgs backfilling
1 pgs degraded
2 pgs stuck unclean
recovery 18/164089686 objects degraded (0.000%)
recovery 1467405/164089686 objects misplaced (0.894%)
 monmap e1: 3 mons at {0=
192.168.4.10:6789/0,1=192.168.4.11:6789/0,2=192.168.4.12:6789/0}
election epoch 210, quorum 0,1,2 0,1,2
 mdsmap e166: 1/1/1 up {0=0=up:active}, 2 up:standby
 osdmap e25733: 45 osds: 45 up, 44 in; 2 remapped pgs
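
A sketch of ways to keep it down for good, assuming osd.48 really should stay
down ("ceph osd down" only marks it down once, and a daemon that is still
running will report back in and get marked up again):

# systemctl stop ceph-osd@48      (on the OSD's host)
# ceph osd out osd.48
# ceph osd set noup               (cluster-wide flag; undo later with "ceph osd unset noup")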
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New Cluster Failing to Start

2019-08-15 Thread solarflow99
You are using Nautilus right?  Did you use ansible to deploy it?

On Wed, Aug 14, 2019, 10:31 AM  wrote:

> All;
>
> We're working to deploy our first production Ceph cluster, and we've run
> into a snag.
>
> The MONs start, but the "cluster" doesn't appear to come up.  "ceph -s"
> never returns.
>
> These are the last lines in the event log of one of the mons:
> 2019-08-13 16:20:03.706 7f668108f180  0 starting mon.s700034 rank 0 at
> public addrs [v2:10.0.80.10:3330/0,v1:10.0.80.10:6789/0] at bind addrs
> [v2:10.0.80.10:3330/0,v1:10.0.80.10:6789/0] mon_data
> /var/lib/ceph/mon/ceph-s700034 fsid effc5134-e0cc-4628-a079-d67b60071f90
> 2019-08-13 16:20:03.709 7f668108f180  1 mon.s700034@-1(???) e0 preinit
> fsid effc5134-e0cc-4628-a079-d67b60071f90
> 2019-08-13 16:20:03.709 7f668108f180  1 mon.s700034@-1(???) e0
> initial_members s700034,s700035,s700036, filtering seed monmap
> 2019-08-13 16:20:03.713 7f668108f180  0 mon.s700034@-1(probing) e0  my
> rank is now 0 (was -1)
>
> Aside from the address and hostname, the others logs end with the same
> statements.
>
> I'm not seeing the log entries that I would expect as each MON joins the
> cluster, nor am I seeing the "cluster" log files being generated (i.e. I'm
> used to seeing ceph.log, and ceph-audit.log on one of the MONs).
>
> Each machine can ping the others.  Firewall rules are in place for ports
> 330 & 6789.
>
> Any idea what I'm missing?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] WAL/DB size

2019-08-14 Thread solarflow99
> Actually a standalone WAL is required when you have either a very small fast
> device (and don't want the DB to use it) or three devices (different in
> performance) behind an OSD (e.g. HDD, SSD, NVMe). So the WAL is to be located
> on the fastest one.
>
> For the given use case you just have HDD and NVMe, and DB and WAL can
> safely co-locate. Which means you don't need to allocate a specific volume
> for the WAL. Hence no need to answer the question of how much space is
> needed for the WAL. Simply allocate the DB and the WAL will appear there automatically.
>
>
Yes, I'm surprised how often people talk about the DB and WAL separately
for no good reason.  In common setups the BlueStore DB/WAL goes on flash and
the data goes on the HDDs; simple.
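
For reference, co-locating them with ceph-volume is just (a sketch; device
names are examples):

# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1

When only --block.db is given, the WAL automatically lives on the DB volume.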

In the event the flash is 100s of GB and would otherwise be wasted, is there
anything that needs to be done to get RocksDB to use the highest level?  600 GB
I believe.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible firewalld blocking ceph comms

2019-07-25 Thread solarflow99
I used ceph-ansible just fine, never had this problem.

On Thu, Jul 25, 2019 at 1:31 PM Nathan Harper 
wrote:

> Hi all,
>
> We've run into a strange issue with one of our clusters managed with
> ceph-ansible.   We're adding some RGW nodes to our cluster, and so re-ran
> site.yml against the cluster.  The new RGWs were added successfully.
>
> When we did, we started to get slow requests, effectively across the whole
> cluster.   We quickly realised that the firewall was now (apparently)
> blocking Ceph communications.   I say apparently, because the config looks
> correct:
>
> [root@osdsrv05 ~]# firewall-cmd --list-all
>> public (active)
>>   target: default
>>   icmp-block-inversion: no
>>   interfaces:
>>   sources: 172.20.22.0/24 172.20.23.0/24
>>   services: ssh dhcpv6-client ceph
>>   ports:
>>   protocols:
>>   masquerade: no
>>   forward-ports:
>>   source-ports:
>>   icmp-blocks:
>>   rich rules:
>>
>
> If we drop the firewall everything goes back to healthy.   All the clients
> (OpenStack Cinder) are on the 172.20.22.0 network (172.20.23.0 is the
> replication network).  Has anyone seen this?
> --
> *Nathan Harper* // IT Systems Lead
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New best practices for osds???

2019-07-24 Thread solarflow99
I can't understand how using RAID0 is better than JBOD, considering JBOD
would be many individual disks, each used as an OSD, instead of a single big
one used as a single OSD.



On Mon, Jul 22, 2019 at 4:05 AM Vitaliy Filippov  wrote:

> OK, I meant "it may help performance" :) the main point is that we had at
> least one case of data loss due to some Adaptec controller in RAID0 mode
> discussed recently in our ceph chat...
>
> --
> With best regards,
>Vitaliy Filippov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance IOPS

2019-07-05 Thread solarflow99
Just set aside 1 or more SSDs for the BlueStore DB/WAL; as long as you're
within the 4% rule I think it should be enough.
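
As a rough worked example of that rule (an approximation, not from the thread):
with 9 x 3 TB HDD OSDs per node, 4% comes to about 0.04 x 3000 GB = 120 GB of
DB per OSD, i.e. roughly 1.1 TB of SSD capacity per node, however you split it
across the SSDs.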


On Fri, Jul 5, 2019 at 7:15 AM Davis Mendoza Paco 
wrote:

> Hi all,
> I have installed Ceph Luminous, with 5 nodes (45 OSDs); each OSD server
> supports up to 16 HDDs and I'm only using 9.
>
> I wanted to ask for help to improve IOPS performance, since I have about
> 350 virtual machines of approximately 15 GB in size and I/O processes are
> very slow.
> What do you recommend?
>
> The Ceph documentation recommends using SSDs for the journal; my
> question is:
> How many SSDs do I have to enable per server so that the journals of the 9
> OSDs can be separated onto SSDs?
>
> I currently use Ceph with OpenStack, on 11 servers with Debian Stretch as the OS:
> * 3 controller
> * 3 compute
> * 5 ceph-osd
>   network: bond lacp 10GB
>   RAM: 96GB
>   HD: 9 disk SATA-3TB (bluestore)
>
> --
> *Davis Mendoza P.*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How does monitor know OSD is dead?

2019-06-28 Thread solarflow99
The thing I've seen a lot is where an OSD would get marked down because of
a failed drive, and then it would add itself right back again.


On Fri, Jun 28, 2019 at 9:12 AM Robert LeBlanc  wrote:

> I'm not sure why the monitor did not mark it down after 600 seconds
> (default). The reason it is so long is that you don't want to move data
> around unnecessarily if the osd is just being rebooted/restarted. Usually,
> you will still have min_size OSDs available for all PGs that will allow IO
> to continue. Then when the down timeout expires it will start backfilling
> and recovering the PGs that were affected. Double check that size !=
> min_size for your pools.
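>
> A quick way to check that (a sketch): "ceph osd pool ls detail" lists size and
> min_size for every pool, or query a single pool with
> "ceph osd pool get <pool> min_size".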
>
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Jun 27, 2019 at 5:26 PM Bryan Henderson 
> wrote:
>
>> What does it take for a monitor to consider an OSD down which has been
>> dead as
>> a doornail since the cluster started?
>>
>> A couple of times, I have seen 'ceph status' report an OSD was up, when
>> it was
>> quite dead.  Recently, a couple of OSDs were on machines that failed to
>> boot
>> up after a power failure.  The rest of the Ceph cluster came up, though,
>> and
>> reported all OSDs up and in.  I/Os stalled, probably because they were
>> waiting
>> for the dead OSDs to come back.
>>
>> I waited 15 minutes, because the manual says if the monitor doesn't hear a
>> heartbeat from an OSD in that long (default value of
>> mon_osd_report_timeout),
>> it marks it down.  But it didn't.  I did "osd down" commands for the dead
>> OSDs
>> and the status changed to down and I/O started working.
>>
>> And wouldn't even 15 minutes of grace be unacceptable if it means I/Os
>> have to
>> wait that long before falling back to a redundant OSD?
>>
>> --
>> Bryan Henderson   San Jose, California
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD hanging on 12.2.12 by message worker

2019-06-10 Thread solarflow99
Can the bitmap allocator be set in ceph-ansible?  I wonder why it is not the
default in 12.2.12.
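
If it helps, with ceph-ansible those settings can be pushed out through
ceph_conf_overrides (a sketch; exact placement depends on your ceph-ansible
version):

ceph_conf_overrides:
  osd:
    bluestore_allocator: bitmap
    bluefs_allocator: bitmap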


On Thu, Jun 6, 2019 at 7:06 AM Stefan Kooman  wrote:

> Quoting Max Vernimmen (vernim...@textkernel.nl):
> >
> > This is happening several times per day after we made several changes at
> > the same time:
> >
> >- add physical ram to the ceph nodes
> >- move from fixed 'bluestore cache size hdd|sdd' and 'bluestore cache
> kv
> >max' to 'bluestore cache autotune = 1' and 'osd memory target =
> >20401094656'.
> >- update ceph from 12.2.8 to 12.2.11
> >- update clients from 12.2.8 to 12.2.11
> >
> > We have since upgraded the ceph nodes to 12.2.12 but it did not help to
> fix
> > this problem.
>
> Have you tried the new bitmap allocator for the OSDs already (available
> since 12.2.12):
>
> [osd]
>
> # MEMORY ALLOCATOR
> bluestore_allocator = bitmap
> bluefs_allocator = bitmap
>
> The issues you are reporting sound like an issue many of us have seen on
> luminous and mimic clusters and has been identified to be caused by the
> "stupid allocator" memory allocator.
>
> Gr. Stefan
>
>
> --
> | BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
> | GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Failed Disk simulation question

2019-05-24 Thread solarflow99
I think a deep scrub would eventually catch this, right?
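
If you want to poke it directly rather than wait, a sketch:

# ceph osd deep-scrub osd.<id>     (deep-scrub everything that OSD holds)
# ceph pg deep-scrub <pgid>        (deep-scrub a single PG)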


On Wed, May 22, 2019 at 2:56 AM Eugen Block  wrote:

> Hi Alex,
>
> > The cluster has been idle at the moment being new and all.  I
> > noticed some disk related errors in dmesg but that was about it.
> > It looked to me for the next 20 - 30 minutes the failure has not
> > been detected.  All osds were up and in and health was OK. OSD logs
> > had no smoking gun either.
> > After 30 minutes, I restarted the OSD container and it failed to
> > start as expected.
>
> if the cluster doesn't have to read or write to specific OSDs (or
> sectors on that OSD) the failure won't be detected immediately. We had
> an issue last year where one of the SSDs (used for rocksdb and wal)
> had a failure, but that was never reported. We discovered that when we
> tried to migrate the lvm to a new device and got read errors.
>
> > Later on, I performed the same operation during the fio bench mark
> > and OSD failed immediately.
>
> This confirms our experience, if there's data to read/write on that
> disk the failure will be detected.
> Please note that this was in a Luminous cluster, I don't know if and
> how Nautilus has improved in sensing disk failures.
>
> Regards,
> Eugen
>
>
> Zitat von Alex Litvak :
>
> > Hello cephers,
> >
> > I know that there was similar question posted 5 years ago.  However
> > the answer was inconclusive for me.
> > I installed a new Nautilus 14.2.1 cluster and started pre-production
> > testing.  I followed RedHat document and simulated a soft disk
> > failure by
> >
> > #  echo 1 > /sys/block/sdc/device/delete
> >
> > The cluster has been idle at the moment being new and all.  I
> > noticed some disk related errors in dmesg but that was about it.
> > It looked to me for the next 20 - 30 minutes the failure has not
> > been detected.  All osds were up and in and health was OK. OSD logs
> > had no smoking gun either.
> > After 30 minutes, I restarted the OSD container and it failed to
> > start as expected.
> >
> > Later on, I performed the same operation during the fio bench mark
> > and OSD failed immediately.
> >
> > My question is:  Should the disk problem have been detected quick
> > enough even on the idle cluster? I thought Nautilus has the means to
> > sense failure before intensive IO hit the disk.
> > Am I wrong to expect that?
> >
> >
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ansible 2.8 for Nautilus

2019-05-20 Thread solarflow99
Does anyone know the necessary steps to install Ansible 2.8 on RHEL 7? I'm
assuming most people are doing it with pip?
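
A sketch of the pip route (assuming Python 2 and pip from EPEL on RHEL 7):

# yum install -y python-pip
# pip install 'ansible>=2.8,<2.9'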
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread solarflow99
Are you sure you can really use ceph-ansible 3.2 for Nautilus?

On Fri, May 10, 2019 at 7:23 AM Tarek Zegar  wrote:

> Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets
> the flag "norebalance". When there is *no* I/O to the cluster, the upgrade
> works fine. When upgrading with I/O running in the background, some PGs become
> `active+undersized+remapped+backfilling`.
> The norebalance flag prevents them from backfilling / recovering and the
> upgrade fails. I'm uncertain why those OSDs are "backfilling" instead of
> "recovering" but I guess it doesn't matter; norebalance halts the process.
> Setting "ceph tell osd.* injectargs '--osd_max_backfills=2'" made no
> difference.
>
>
> https://github.com/ceph/ceph-ansible/commit/08d94324545b3c4e0f6a1caf6224f37d1c2b36db
> <-- did anyone other than the author verify this?
>
> *Tarek*
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-create-keys loops

2019-05-06 Thread solarflow99
You mention the version of Ansible; that is right.  How about the branch of
ceph-ansible?  It should be stable-3.2. What OS?   I haven't come across this
problem myself, though I've hit a lot of other ones.



On Mon, May 6, 2019 at 3:47 AM ST Wong (ITSC)  wrote:

> Hi all,
>
>
>
> I’ve problem in deploying mimic using ceph-ansible at following step:
>
>
>
> -- cut here ---
>
> TASK [ceph-mon : collect admin and bootstrap keys]
> *
>
> Monday 06 May 2019  17:01:23 +0800 (0:00:00.854)   0:05:38.899
> 
>
> fatal: [cphmon3a]: FAILED! => {"changed": false, "cmd":
> ["ceph-create-keys", "--cluster", "ceph", "-i", "cphmon3a", "-t", "600"],
> "delta": "0:11:24.675833", "end": "2019-05-06 17:12:48.500996", "msg":
> "non-zero return code", "rc": 1, "start": "2019-05-06 17:01:23.825163",
> "stderr": "INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'\n 
> INFO:ceph-create-keys:ceph-mon
> is not in quorum: u'probing'\nINFO:ceph-create-keys:ceph-mon is not in
> quorum: u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum:
> u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum:
> u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum:
> u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum:
> u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum:
> u'probing'\nINFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'\n
>
> -- cut here ---
>
>
>
> There are 2 NIC on all MONs.   The site.yml contains some basic
> configuration:
>
>
>
> --- site.yml 
>
> dummy:
>
>
>
> ceph_origin: repository
>
> ceph_repository: community
>
> ceph_stable_release: mimic
>
> public_network: "123.123.7.0/24"    <-- fake ip
> range of our public network
>
> cluster_network: "192.168.77.0/24"
>
> monitor_interface: p2p1
>
> --- site.yml 
>
>
>
>
>
>
>
> And the ceph.conf created on MONs :
>
>
>
>
>
>
>
>  ceph.conf ---
>
> # Please do not change this file directly since it is managed by Ansible
> and will be overwritten
>
>
>
> [global]
>
> fsid = 17db45c6-b5ac-47e8-b5cb-b3e5215f4af4
>
>
>
>
>
> mon initial members = cphmon1a,cphmon2a,cphmon3a,cphmon4b,cphmon5b,cphmon6b
>
> osd pool default crush rule = -1
>
>
>
> mon host =
> 123.123.7.92,123.123.7.93,123.123.7.94,123.123.7.95,123.123.7.96,123.123.7.97
>
>
>
> public network = 123.123.7.0/24
>
> cluster network = 192.168.77.0/24
>
>  ceph.conf ---
>
>
>
> Also get same error if running following on one of the MONs:
>
>
>
> #ceph-create-keys --cluster ceph --id cphmon1a
>
> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>
> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>
> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>
> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>
>
>
>
>
> We’re using Ansible 2.6 (for deploying Mimic using ceph-ansible).
>
>
>
> Would anyone please help?
>
>
>
> Thanks a lot.
>
> /st wong
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph cluster available to clients with 2 different VLANs ?

2019-05-03 Thread solarflow99
How is this better than using a single public network, routing through an L3
switch?

If I understand the scenario right, this way would require the switch port to
be a trunk carrying all the public VLANs, and you can bridge directly
through the switch so L3 wouldn't be necessary?



On Fri, May 3, 2019 at 11:43 AM EDH - Manuel Rios Fernandez <
mrios...@easydatahost.com> wrote:

> You can put multiple networks in ceph.conf, separated by commas:
>
> public network = 172.16.2.0/24, 192.168.0.0/22
>
> But remember your servers must be able to reach them; L3 routing / firewall
> rules may be needed.
>
>
>
> Regards
>
> Manuel
>
>
>
>
>
> From: ceph-users, on behalf of Martin Verges
> Sent: Friday, 3 May 2019 11:36
> To: Hervé Ballans
> CC: ceph-users
> Subject: Re: [ceph-users] Ceph cluster available to clients with 2
> different VLANs ?
>
>
>
> Hello,
>
> configure a gateway on your router, or use a good rack switch that can
> provide such features, and use layer-3 routing to connect the different
> VLANs / IP zones.
>
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
>
>
>
> Am Fr., 3. Mai 2019 um 10:21 Uhr schrieb Hervé Ballans <
> herve.ball...@ias.u-psud.fr>:
>
> Hi all,
>
> I have a Ceph cluster on Luminous 12.2.10 with 3 mon and 6 osd servers.
> My current network setup is a separate public and cluster (private
> IP) network.
>
> I would like my cluster to be available to clients on another VLAN than the
> default one (which is the public network in ceph.conf).
>
> Is it possible? How can I achieve that?
> For information, each node still has two unused network cards.
>
> Thanks for any suggestions,
>
> Hervé
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] showing active config settings

2019-04-25 Thread solarflow99
It sucks that it's so hard to set/view active settings; this should be a lot
simpler in my opinion.

On Tue, Apr 23, 2019 at 1:58 PM solarflow99  wrote:

> Thanks, but does this not work on Luminous maybe?  I am on the mon hosts
> trying this:
>
>
> # ceph config set osd osd_recovery_max_active 4
> Invalid command: unused arguments: [u'4']
> config set   :  Set a configuration option at runtime (not
> persistent)
> Error EINVAL: invalid command
>
> # ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
> admin_socket: exception getting command descriptions: [Errno 2] No such
> file or directory
>
>
> On Tue, Apr 16, 2019 at 4:04 PM Brad Hubbard  wrote:
>
>> $ ceph config set osd osd_recovery_max_active 4
>> $ ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
>> "osd_recovery_max_active": {
>> "default": 3,
>> "mon": 4,
>> "override": 4,
>> "final": 4
>> },
>>
>> On Wed, Apr 17, 2019 at 5:29 AM solarflow99 
>> wrote:
>> >
>> > I wish there was a way to query the running settings from one of the
>> MGR hosts, and it doesn't help that ansible doesn't even copy the keyring
>> to the OSD nodes so commands there wouldn't work anyway.
>> > I'm still puzzled why it doesn't show any change when I run this no
>> matter what I set it to:
>> >
>> > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
>> > osd_recovery_max_active = 3
>> >
>> > in fact it doesn't matter if I use an OSD number that doesn't exist,
>> same thing if I use ceph get
>> >
>> >
>> >
>> > On Tue, Apr 16, 2019 at 1:18 AM Brad Hubbard 
>> wrote:
>> >>
>> >> On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich 
>> wrote:
>> >> >
>> >> > This works, it just says that it *might* require a restart, but this
>> >> > particular option takes effect without a restart.
>> >>
>> >> We've already looked at changing the wording once to make it more
>> palatable.
>> >>
>> >> http://tracker.ceph.com/issues/18424
>> >>
>> >> >
>> >> > Implementation detail: this message shows up if there's no internal
>> >> > function to be called when this option changes, so it can't be sure
>> if
>> >> > the change is actually doing anything because the option might be
>> >> > cached or only read on startup. But in this case this option is read
>> >> > in the relevant path every time and no notification is required. But
>> >> > the injectargs command can't know that.
>> >>
>> >> Right on all counts. The functions are referred to as observers and
>> >> register to be notified if the value changes, hence "not observed."
>> >>
>> >> >
>> >> > Paul
>> >> >
>> >> > On Mon, Apr 15, 2019 at 11:38 PM solarflow99 
>> wrote:
>> >> > >
>> >> > > Then why doesn't this work?
>> >> > >
>> >> > > # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>> >> > > osd.0: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.1: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.2: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.3: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > > osd.4: osd_recovery_max_active = '4' (not observed, change may
>> require restart)
>> >> > >
>> >> > > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
>> >> > > osd_recovery_max_active = 3
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Wed, Apr 10, 2019 at 7:21 AM Eugen Block  wrote:
>> >> > >>
>> >> > >> > I always end up using "ceph --admin-daemon
>> >> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..."
>> to get what
>> >> > >> > is in effect now for a certain daemon.
>> >> > >> > Needs you to be on the host of the daemon of course.
>> >> > >>
>> >> > >> Me too, I just wanted to try what OP re

Re: [ceph-users] showing active config settings

2019-04-23 Thread solarflow99
Thanks, but does this not work on Luminous maybe?  I am on the mon hosts
trying this:


# ceph config set osd osd_recovery_max_active 4
Invalid command: unused arguments: [u'4']
config set   :  Set a configuration option at runtime (not
persistent)
Error EINVAL: invalid command

# ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
admin_socket: exception getting command descriptions: [Errno 2] No such
file or directory


On Tue, Apr 16, 2019 at 4:04 PM Brad Hubbard  wrote:

> $ ceph config set osd osd_recovery_max_active 4
> $ ceph daemon osd.0 config diff|grep -A5 osd_recovery_max_active
> "osd_recovery_max_active": {
> "default": 3,
> "mon": 4,
> "override": 4,
> "final": 4
> },
>
> On Wed, Apr 17, 2019 at 5:29 AM solarflow99  wrote:
> >
> > I wish there was a way to query the running settings from one of the MGR
> hosts, and it doesn't help that ansible doesn't even copy the keyring to
> the OSD nodes so commands there wouldn't work anyway.
> > I'm still puzzled why it doesn't show any change when I run this no
> matter what I set it to:
> >
> > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> > osd_recovery_max_active = 3
> >
> > in fact it doesn't matter if I use an OSD number that doesn't exist,
> same thing if I use ceph get
> >
> >
> >
> > On Tue, Apr 16, 2019 at 1:18 AM Brad Hubbard 
> wrote:
> >>
> >> On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich 
> wrote:
> >> >
> >> > This works, it just says that it *might* require a restart, but this
> >> > particular option takes effect without a restart.
> >>
> >> We've already looked at changing the wording once to make it more
> palatable.
> >>
> >> http://tracker.ceph.com/issues/18424
> >>
> >> >
> >> > Implementation detail: this message shows up if there's no internal
> >> > function to be called when this option changes, so it can't be sure if
> >> > the change is actually doing anything because the option might be
> >> > cached or only read on startup. But in this case this option is read
> >> > in the relevant path every time and no notification is required. But
> >> > the injectargs command can't know that.
> >>
> >> Right on all counts. The functions are referred to as observers and
> >> register to be notified if the value changes, hence "not observed."
> >>
> >> >
> >> > Paul
> >> >
> >> > On Mon, Apr 15, 2019 at 11:38 PM solarflow99 
> wrote:
> >> > >
> >> > > Then why doesn't this work?
> >> > >
> >> > > # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> >> > > osd.0: osd_recovery_max_active = '4' (not observed, change may
> require restart)
> >> > > osd.1: osd_recovery_max_active = '4' (not observed, change may
> require restart)
> >> > > osd.2: osd_recovery_max_active = '4' (not observed, change may
> require restart)
> >> > > osd.3: osd_recovery_max_active = '4' (not observed, change may
> require restart)
> >> > > osd.4: osd_recovery_max_active = '4' (not observed, change may
> require restart)
> >> > >
> >> > > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> >> > > osd_recovery_max_active = 3
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Apr 10, 2019 at 7:21 AM Eugen Block  wrote:
> >> > >>
> >> > >> > I always end up using "ceph --admin-daemon
> >> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..."
> to get what
> >> > >> > is in effect now for a certain daemon.
> >> > >> > Needs you to be on the host of the daemon of course.
> >> > >>
> >> > >> Me too, I just wanted to try what OP reported. And after trying
> that,
> >> > >> I'll keep it that way. ;-)
> >> > >>
> >> > >>
> >> > >> Zitat von Janne Johansson :
> >> > >>
> >> > >> > Den ons 10 apr. 2019 kl 13:37 skrev Eugen Block :
> >> > >> >
> >> > >> >> > If you don't specify which daemon to talk to, it tells you
> what the
> >> > >> >> > defaults would be for a random daemon started just now using
> the same
> >

Re: [ceph-users] showing active config settings

2019-04-16 Thread solarflow99
I wish there was a way to query the running settings from one of the MGR
hosts, and it doesn't help that ansible doesn't even copy the keyring to
the OSD nodes so commands there wouldn't work anyway.
I'm still puzzled why it doesn't show any change when I run this no matter
what I set it to:

# ceph -n osd.1 --show-config | grep osd_recovery_max_active
osd_recovery_max_active = 3

In fact it doesn't matter if I use an OSD number that doesn't exist; same
thing if I use "ceph get".



On Tue, Apr 16, 2019 at 1:18 AM Brad Hubbard  wrote:

> On Tue, Apr 16, 2019 at 6:03 PM Paul Emmerich 
> wrote:
> >
> > This works, it just says that it *might* require a restart, but this
> > particular option takes effect without a restart.
>
> We've already looked at changing the wording once to make it more
> palatable.
>
> http://tracker.ceph.com/issues/18424
>
> >
> > Implementation detail: this message shows up if there's no internal
> > function to be called when this option changes, so it can't be sure if
> > the change is actually doing anything because the option might be
> > cached or only read on startup. But in this case this option is read
> > in the relevant path every time and no notification is required. But
> > the injectargs command can't know that.
>
> Right on all counts. The functions are referred to as observers and
> register to be notified if the value changes, hence "not observed."
>
> >
> > Paul
> >
> > On Mon, Apr 15, 2019 at 11:38 PM solarflow99 
> wrote:
> > >
> > > Then why doesn't this work?
> > >
> > > # ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
> > > osd.0: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> > > osd.1: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> > > osd.2: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> > > osd.3: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> > > osd.4: osd_recovery_max_active = '4' (not observed, change may require
> restart)
> > >
> > > # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> > > osd_recovery_max_active = 3
> > >
> > >
> > >
> > > On Wed, Apr 10, 2019 at 7:21 AM Eugen Block  wrote:
> > >>
> > >> > I always end up using "ceph --admin-daemon
> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..." to
> get what
> > >> > is in effect now for a certain daemon.
> > >> > Needs you to be on the host of the daemon of course.
> > >>
> > >> Me too, I just wanted to try what OP reported. And after trying that,
> > >> I'll keep it that way. ;-)
> > >>
> > >>
> > >> Zitat von Janne Johansson :
> > >>
> > >> > Den ons 10 apr. 2019 kl 13:37 skrev Eugen Block :
> > >> >
> > >> >> > If you don't specify which daemon to talk to, it tells you what
> the
> > >> >> > defaults would be for a random daemon started just now using the
> same
> > >> >> > config as you have in /etc/ceph/ceph.conf.
> > >> >>
> > >> >> I tried that, too, but the result is not correct:
> > >> >>
> > >> >> host1:~ # ceph -n osd.1 --show-config | grep
> osd_recovery_max_active
> > >> >> osd_recovery_max_active = 3
> > >> >>
> > >> >
> > >> > I always end up using "ceph --admin-daemon
> > >> > /var/run/ceph/name-of-socket-here.asok config show | grep ..." to
> get what
> > >> > is in effect now for a certain daemon.
> > >> > Needs you to be on the host of the daemon of course.
> > >> >
> > >> > --
> > >> > May the most significant bit of your life be positive.
> > >>
> > >>
> > >>
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] showing active config settings

2019-04-15 Thread solarflow99
Then why doesn't this work?

# ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
osd.0: osd_recovery_max_active = '4' (not observed, change may require
restart)
osd.1: osd_recovery_max_active = '4' (not observed, change may require
restart)
osd.2: osd_recovery_max_active = '4' (not observed, change may require
restart)
osd.3: osd_recovery_max_active = '4' (not observed, change may require
restart)
osd.4: osd_recovery_max_active = '4' (not observed, change may require
restart)

# ceph -n osd.1 --show-config | grep osd_recovery_max_active
osd_recovery_max_active = 3



On Wed, Apr 10, 2019 at 7:21 AM Eugen Block  wrote:

> > I always end up using "ceph --admin-daemon
> > /var/run/ceph/name-of-socket-here.asok config show | grep ..." to get
> what
> > is in effect now for a certain daemon.
> > Needs you to be on the host of the daemon of course.
>
> Me too, I just wanted to try what OP reported. And after trying that,
> I'll keep it that way. ;-)
>
>
> Zitat von Janne Johansson :
>
> > Den ons 10 apr. 2019 kl 13:37 skrev Eugen Block :
> >
> >> > If you don't specify which daemon to talk to, it tells you what the
> >> > defaults would be for a random daemon started just now using the same
> >> > config as you have in /etc/ceph/ceph.conf.
> >>
> >> I tried that, too, but the result is not correct:
> >>
> >> host1:~ # ceph -n osd.1 --show-config | grep osd_recovery_max_active
> >> osd_recovery_max_active = 3
> >>
> >
> > I always end up using "ceph --admin-daemon
> > /var/run/ceph/name-of-socket-here.asok config show | grep ..." to get
> what
> > is in effect now for a certain daemon.
> > Needs you to be on the host of the daemon of course.
> >
> > --
> > May the most significant bit of your life be positive.
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] showing active config settings

2019-04-09 Thread solarflow99
I noticed when changing some settings, they appear to stay the same, for
example when trying to set this higher:

ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

It gives the usual warning that it may need a restart, but it still shows the
old value:

# ceph --show-config | grep osd_recovery_max_active
osd_recovery_max_active = 3


restarting the OSDs seems fairly intrusive for every configuration change.
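
For what it's worth, the reliable check (as the replies point out) is the
daemon's admin socket on the host that runs it, e.g.:

# ceph daemon osd.0 config show | grep osd_recovery_max_active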
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub errors

2019-03-28 Thread solarflow99
OK, I tried doing "ceph osd out" on each of the 4 OSDs one by one.  I got it
out of backfill mode but I'm still not sure if it'll fix anything.  pg 10.2a
still shows state active+clean+inconsistent.  Peer 8 is now
remapped+inconsistent+peering, and the other peer is
active+clean+inconsistent.
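
For reference, the repair Brad asked about earlier in the thread is just:

# ceph pg repair 10.2a

and then watching "ceph -w" or the primary OSD's log for the result.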


On Wed, Mar 27, 2019 at 4:13 PM Brad Hubbard  wrote:

> On Thu, Mar 28, 2019 at 8:33 AM solarflow99  wrote:
> >
> > yes, but nothing seems to happen.  I don't understand why it lists OSDs
> 7 in the  "recovery_state": when i'm only using 3 replicas and it seems to
> use 41,38,8
>
> Well, osd 8's state is listed as
> "active+undersized+degraded+remapped+wait_backfill" so it seems to be
> stuck waiting for backfill for some reason. One thing you could try is
> restarting all of the osds including 7 and 17 to see if forcing them
> to peer again has any positive effect. Don't restart them all at once,
> just one at a time waiting until each has peered before moving on.
>
> >
> > # ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
> > pg 10.2a is active+clean+inconsistent, acting [41,38,8]
> > 47 scrub errors
> >
> >
> >
> > As you can see all OSDs are up and in:
> >
> > # ceph osd stat
> >  osdmap e23265: 49 osds: 49 up, 49 in
> >
> >
> >
> >
> > And this just stays the same:
> >
> > "up": [
> > 41,
> > 38,
> > 8
> > ],
> > "acting": [
> > 41,
> > 38,
> > 8
> >
> >  "recovery_state": [
> > {
> > "name": "Started\/Primary\/Active",
> > "enter_time": "2018-09-22 07:07:48.637248",
> > "might_have_unfound": [
> > {
> > "osd": "7",
> > "status": "not queried"
> > },
> > {
> > "osd": "8",
> > "status": "already probed"
> > },
> > {
> >     "osd": "17",
> > "status": "not queried"
> > },
> > {
> > "osd": "38",
> > "status": "already probed"
> > }
> > ],
> >
> >
> > On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard 
> wrote:
> >>
> >>
> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
> >>
> >> Did you try repairing the pg?
> >>
> >>
> >> On Tue, Mar 26, 2019 at 9:08 AM solarflow99 
> wrote:
> >> >
> >> > yes, I know its old.  I intend to have it replaced but thats a few
> months away and was hoping to get past this.  the other OSDs appear to be
> ok, I see them up and in, why do you see something wrong?
> >> >
> >> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard 
> wrote:
> >> >>
> >> >> Hammer is no longer supported.
> >> >>
> >> >> What's the status of osds 7 and 17?
> >> >>
> >> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99 
> wrote:
> >> >> >
> >> >> > hi, thanks.  Its still using Hammer.  Here's the output from the
> pg query, the last command you gave doesn't work at all but be too old.
> >> >> >
> >> >> >
> >> >> > # ceph pg 10.2a query
> >> >> > {
> >> >> > "state": "active+clean+inconsistent",
> >> >> > "snap_trimq": "[]",
> >> >> > "epoch": 23265,
> >> >> > "up": [
> >> >> > 41,
> >> >> > 38,
> >> >> > 8
> >> >> > ],
> >> >> > "acting": [
> >> >> > 41,
> >> >> > 38,
> >> >> > 8
> >> >> > ],
> >> >> > "actingbackfill": [
> >> >> > "8",
> >> >> > "38",
> >> >> > "41"
> >> >> > ],
> >> >> > "in

Re: [ceph-users] scrub errors

2019-03-27 Thread solarflow99
Yes, but nothing seems to happen.  I don't understand why it lists OSD 7
in the "recovery_state" when I'm only using 3 replicas and it seems to
use 41, 38, 8.

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
pg 10.2a is active+clean+inconsistent, acting [41,38,8]
47 scrub errors



As you can see all OSDs are up and in:

# ceph osd stat
 osdmap e23265: 49 osds: 49 up, 49 in




And this just stays the same:

"up": [
41,
38,
8
],
"acting": [
41,
38,
8

 "recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2018-09-22 07:07:48.637248",
"might_have_unfound": [
{
"osd": "7",
"status": "not queried"
},
{
"osd": "8",
"status": "already probed"
},
{
"osd": "17",
"status": "not queried"
},
{
"osd": "38",
"status": "already probed"
}
],


On Tue, Mar 26, 2019 at 4:53 PM Brad Hubbard  wrote:

> http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
>
> Did you try repairing the pg?
>
>
> On Tue, Mar 26, 2019 at 9:08 AM solarflow99  wrote:
> >
> > yes, I know its old.  I intend to have it replaced but thats a few
> months away and was hoping to get past this.  the other OSDs appear to be
> ok, I see them up and in, why do you see something wrong?
> >
> > On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard 
> wrote:
> >>
> >> Hammer is no longer supported.
> >>
> >> What's the status of osds 7 and 17?
> >>
> >> On Tue, Mar 26, 2019 at 8:56 AM solarflow99 
> wrote:
> >> >
> >> > hi, thanks.  Its still using Hammer.  Here's the output from the pg
> query, the last command you gave doesn't work at all but be too old.
> >> >
> >> >
> >> > # ceph pg 10.2a query
> >> > {
> >> > "state": "active+clean+inconsistent",
> >> > "snap_trimq": "[]",
> >> > "epoch": 23265,
> >> > "up": [
> >> > 41,
> >> > 38,
> >> > 8
> >> > ],
> >> > "acting": [
> >> > 41,
> >> > 38,
> >> > 8
> >> > ],
> >> > "actingbackfill": [
> >> > "8",
> >> > "38",
> >> > "41"
> >> > ],
> >> > "info": {
> >> > "pgid": "10.2a",
> >> > "last_update": "23265'20886859",
> >> > "last_complete": "23265'20886859",
> >> > "log_tail": "23265'20883809",
> >> > "last_user_version": 20886859,
> >> > "last_backfill": "MAX",
> >> > "purged_snaps": "[]",
> >> > "history": {
> >> > "epoch_created": 8200,
> >> > "last_epoch_started": 21481,
> >> > "last_epoch_clean": 21487,
> >> > "last_epoch_split": 0,
> >> > "same_up_since": 21472,
> >> > "same_interval_since": 21474,
> >> > "same_primary_since": 8244,
> >> > "last_scrub": "23265'20864209",
> >> > "last_scrub_stamp": "2019-03-22 22:39:13.930673",
> >> > "last_deep_scrub": "23265'20864209",
> >> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673",
> >> > "last_clean_scrub_stamp": "2019-03-15 01:33:21.447438"
> >> > },
> >> > "stats": {
> >> > "version": "23265'20886859",
> >> > "reported_seq": "10109937",
> &

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
Yes, I know it's old.  I intend to have it replaced but that's a few months
away and I was hoping to get past this.  The other OSDs appear to be OK, I
see them up and in; why, do you see something wrong?

On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard  wrote:

> Hammer is no longer supported.
>
> What's the status of osds 7 and 17?
>
> On Tue, Mar 26, 2019 at 8:56 AM solarflow99  wrote:
> >
> > hi, thanks.  Its still using Hammer.  Here's the output from the pg
> query, the last command you gave doesn't work at all but be too old.
> >
> >
> > # ceph pg 10.2a query
> > {
> > "state": "active+clean+inconsistent",
> > "snap_trimq": "[]",
> > "epoch": 23265,
> > "up": [
> > 41,
> > 38,
> > 8
> > ],
> > "acting": [
> > 41,
> > 38,
> > 8
> > ],
> > "actingbackfill": [
> > "8",
> > "38",
> > "41"
> > ],
> > "info": {
> > "pgid": "10.2a",
> > "last_update": "23265'20886859",
> > "last_complete": "23265'20886859",
> > "log_tail": "23265'20883809",
> > "last_user_version": 20886859,
> > "last_backfill": "MAX",
> > "purged_snaps": "[]",
> > "history": {
> > "epoch_created": 8200,
> > "last_epoch_started": 21481,
> > "last_epoch_clean": 21487,
> > "last_epoch_split": 0,
> > "same_up_since": 21472,
> > "same_interval_since": 21474,
> > "same_primary_since": 8244,
> > "last_scrub": "23265'20864209",
> > "last_scrub_stamp": "2019-03-22 22:39:13.930673",
> > "last_deep_scrub": "23265'20864209",
> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673",
> > "last_clean_scrub_stamp": "2019-03-15 01:33:21.447438"
> > },
> > "stats": {
> > "version": "23265'20886859",
> > "reported_seq": "10109937",
> > "reported_epoch": "23265",
> > "state": "active+clean+inconsistent",
> > "last_fresh": "2019-03-25 15:52:53.720768",
> > "last_change": "2019-03-22 22:39:13.931038",
> > "last_active": "2019-03-25 15:52:53.720768",
> > "last_peered": "2019-03-25 15:52:53.720768",
> > "last_clean": "2019-03-25 15:52:53.720768",
> > "last_became_active": "0.00",
> > "last_became_peered": "0.00",
> > "last_unstale": "2019-03-25 15:52:53.720768",
> > "last_undegraded": "2019-03-25 15:52:53.720768",
> > "last_fullsized": "2019-03-25 15:52:53.720768",
> > "mapping_epoch": 21472,
> > "log_start": "23265'20883809",
> > "ondisk_log_start": "23265'20883809",
> > "created": 8200,
> > "last_epoch_clean": 21487,
> > "parent": "0.0",
> > "parent_split_bits": 0,
> > "last_scrub": "23265'20864209",
> > "last_scrub_stamp": "2019-03-22 22:39:13.930673",
> > "last_deep_scrub": "23265'20864209",
> > "last_deep_scrub_stamp": "2019-03-22 22:39:13.930673",
> > "last_clean_scrub_stamp": "2019-03-15 01:33:21.447438",
> > "log_size": 3050,
> > "ondisk_log_size": 3050,
> > "stats_invalid": "0",
> > "stat_sum": {
> > "num_bytes": 8220278746,
> > "num_objects": 345034,
> > "

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
_log_size": 3050,
"stats_invalid": "0",
"stat_sum": {
"num_bytes": 6405126628,
"num_objects": 241711,
"num_object_clones": 0,
"num_object_copies": 725130,
"num_objects_missing_on_primary": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 241711,
"num_whiteouts": 0,
"num_read": 5637862,
"num_read_kb": 48735376,
"num_write": 6789687,
"num_write_kb": 67678402,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 167079,
"num_bytes_recovered": 5191625476,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0
},
"up": [
41,
38,
8
],
"acting": [
41,
38,
8
],
"blocked_by": [],
"up_primary": 41,
"acting_primary": 41
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 21481,
"hit_set_history": {
"current_last_update": "0'0",
"current_last_stamp": "0.00",
"current_info": {
"begin": "0.00",
"end": "0.00",
"version": "0'0",
"using_gmt": "0"
},
"history": []
}
}
],
"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2018-09-22 07:07:48.637248",
"might_have_unfound": [
{
"osd": "7",
"status": "not queried"
},
{
"osd": "8",
"status": "already probed"
},
{
"osd": "17",
"status": "not queried"
},
{
"osd": "38",
"status": "already probed"
}
],
"recovery_progress": {
"backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "-1\/0\/\/0",
"backfill_info": {
"begin": "-1\/0\/\/0",
"end": "-1\/0\/\/0",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "21474",
"scrubber.active": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2018-09-22 07:07:42.138358"
}
],
"agent_state": {}
}


On Mon, Mar 25, 2019 at 3:46 PM Brad Hubbard  wrote:

> It would help to know what version you are running but, to begin with,
> could you post the output of the following?
>
> $ sudo ceph pg 10.2a query
> $ sudo rados list-inconsistent-obj 10.2a --format=json-pretty
>
> Also, have a read of
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/
> (adjust the URl for your release).
>
> On Tue, Mar 26, 2019 at 8:19 AM solarflow99  wrote:
> >
> > I noticed my cluster has scrub errors but the deep-scrub command doesn't
> show any errors.  Is there any way to know what it takes to fix it?
> >
> >
> >
> > # ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
> > pg 10.2a is active+clean+inconsistent, acting [41,38,8]
> > 47 scrub errors
> >
> > # zgrep 10.2a /var/log/ceph/ceph.log*
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 16:20:18.148299 osd.41
> 192.168.4.19:6809/30077 54885 : cluster [INF] 10.2a deep-scrub starts
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024040 osd.41
> 192.168.4.19:6809/30077 54886 : cluster [ERR] 10.2a shard 38 missing
> 10/24083d2a/ec50777d-cc99-46a8-8610-4492213f412f/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024049 osd.41
> 192.168.4.19:6809/30077 54887 : cluster [ERR] 10.2a shard 38 missing
> 10/ff183d2a/fce859b9-61a9-46cb-82f1-4b4af31c10db/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024074 osd.41
> 192.168.4.19:6809/30077 54888 : cluster [ERR] 10.2a shard 38 missing
> 10/34283d2a/4b7c96cb-c494-4637-8669-e42049bd0e1c/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024076 osd.41
> 192.168.4.19:6809/30077 54889 : cluster [ERR] 10.2a shard 38 missing
> 10/df283d2a/bbe61149-99f8-4b83-a42b-b208d18094a8/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024077 osd.41
> 192.168.4.19:6809/30077 54890 : cluster [ERR] 10.2a shard 38 missing
> 10/35383d2a/60e8ed9b-bd04-5a43-8917-6f29eba28a66:0014/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024078 osd.41
> 192.168.4.19:6809/30077 54891 : cluster [ERR] 10.2a shard 38 missing
> 10/d5383d2a/2bdeb186-561b-4151-b87e-fe7c2e217d41/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024080 osd.41
> 192.168.4.19:6809/30077 54892 : cluster [ERR] 10.2a shard 38 missing
> 10/a7383d2a/b6b9d21d-2f4f-4550-8928-52552349db7d/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024081 osd.41
> 192.168.4.19:6809/30077 54893 : cluster [ERR] 10.2a shard 38 missing
> 10/9c383d2a/5b552687-c709-4e87-b773-1cce5b262754/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024082 osd.41
> 192.168.4.19:6809/30077 54894 : cluster [ERR] 10.2a shard 38 missing
> 10/5d383d2a/cb1a2ea8-0872-4de9-8b93-5ea8d9d8e613/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024083 osd.41
> 192.168.4.19:6809/30077 54895 : cluster [ERR] 10.2a shard 38 missing
> 10/8f483d2a/74c7a2b9-f00a-4c89-afbd-c1b8439234ac/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024085 osd.41
> 192.168.4.19:6809/30077 54896 : cluster [ERR] 10.2a shard 38 missing
> 10/b1583d2a/b3f00768-82a2-4637-91d1-164f3a51312a/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024086 osd.41
> 192.168.4.19:6809/30077 54897 : cluster [ERR] 10.2a shard 38 missing
> 10/35583d2a/e347aff4-7b71-476e-863a-310e767e4160/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024088 osd.41
> 192.168.4.19:6809/30077 54898 : cluster [ERR] 10.2a shard 38 missing
> 10/69583d2a/0805d07a-49d1-44cb-87c7-3bd73a0ce692/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024122 osd.41
> 192.168.4.19:6809/30077 54899 : cluster [ERR] 10.2a shard 38 missing
> 10/1a583d2a/d65bcf6a-9457-46c3-8fbc-432ebbaad89a/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024123 osd.41
> 192.168.4.19:6809/30077 54900 : cluster [ERR] 10.2a shard 38 missing
> 10/6d583d2a/5592f7d6-a131-4eb2-a3dd-b2d96691dd7e/head
> > /var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024124 osd.41
> 192.168.4.19:6809/30077 54901 : cluster [ERR] 10.2a shard 38 missing
> 10/f0683d2a/81897399-4cb0-59b3-b9ae-bf043a272137:0003/head
> >
> >
> >
> > # ceph pg deep-scrub 10.2a
> > instructing pg 10.2a on osd.41 to deep-scrub
> >
> >
> > # ceph -w | grep 10.2a
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
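
Following Brad's suggestion above, a rough sketch of the inspect-then-repair
flow (the pg and osd ids are taken from the logs in this thread; only run the
repair once the nature of the inconsistency is understood):

# refresh the scrub information for the PG if it has gone stale
ceph pg deep-scrub 10.2a

# list the objects the scrub flagged, with the per-shard errors
rados list-inconsistent-obj 10.2a --format=json-pretty

# optional last step, after reviewing the output above
ceph pg repair 10.2a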


[ceph-users] scrub errors

2019-03-25 Thread solarflow99
I noticed my cluster has scrub errors but the deep-scrub command doesn't
show any errors.  Is there any way to know what it takes to fix it?



# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
pg 10.2a is active+clean+inconsistent, acting [41,38,8]
47 scrub errors

# zgrep 10.2a /var/log/ceph/ceph.log*
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 16:20:18.148299 osd.41
192.168.4.19:6809/30077 54885 : cluster [INF] 10.2a deep-scrub starts
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024040 osd.41
192.168.4.19:6809/30077 54886 : cluster [ERR] 10.2a shard 38 missing
10/24083d2a/ec50777d-cc99-46a8-8610-4492213f412f/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024049 osd.41
192.168.4.19:6809/30077 54887 : cluster [ERR] 10.2a shard 38 missing
10/ff183d2a/fce859b9-61a9-46cb-82f1-4b4af31c10db/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024074 osd.41
192.168.4.19:6809/30077 54888 : cluster [ERR] 10.2a shard 38 missing
10/34283d2a/4b7c96cb-c494-4637-8669-e42049bd0e1c/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024076 osd.41
192.168.4.19:6809/30077 54889 : cluster [ERR] 10.2a shard 38 missing
10/df283d2a/bbe61149-99f8-4b83-a42b-b208d18094a8/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024077 osd.41
192.168.4.19:6809/30077 54890 : cluster [ERR] 10.2a shard 38 missing
10/35383d2a/60e8ed9b-bd04-5a43-8917-6f29eba28a66:0014/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024078 osd.41
192.168.4.19:6809/30077 54891 : cluster [ERR] 10.2a shard 38 missing
10/d5383d2a/2bdeb186-561b-4151-b87e-fe7c2e217d41/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024080 osd.41
192.168.4.19:6809/30077 54892 : cluster [ERR] 10.2a shard 38 missing
10/a7383d2a/b6b9d21d-2f4f-4550-8928-52552349db7d/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024081 osd.41
192.168.4.19:6809/30077 54893 : cluster [ERR] 10.2a shard 38 missing
10/9c383d2a/5b552687-c709-4e87-b773-1cce5b262754/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024082 osd.41
192.168.4.19:6809/30077 54894 : cluster [ERR] 10.2a shard 38 missing
10/5d383d2a/cb1a2ea8-0872-4de9-8b93-5ea8d9d8e613/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024083 osd.41
192.168.4.19:6809/30077 54895 : cluster [ERR] 10.2a shard 38 missing
10/8f483d2a/74c7a2b9-f00a-4c89-afbd-c1b8439234ac/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024085 osd.41
192.168.4.19:6809/30077 54896 : cluster [ERR] 10.2a shard 38 missing
10/b1583d2a/b3f00768-82a2-4637-91d1-164f3a51312a/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024086 osd.41
192.168.4.19:6809/30077 54897 : cluster [ERR] 10.2a shard 38 missing
10/35583d2a/e347aff4-7b71-476e-863a-310e767e4160/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024088 osd.41
192.168.4.19:6809/30077 54898 : cluster [ERR] 10.2a shard 38 missing
10/69583d2a/0805d07a-49d1-44cb-87c7-3bd73a0ce692/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024122 osd.41
192.168.4.19:6809/30077 54899 : cluster [ERR] 10.2a shard 38 missing
10/1a583d2a/d65bcf6a-9457-46c3-8fbc-432ebbaad89a/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024123 osd.41
192.168.4.19:6809/30077 54900 : cluster [ERR] 10.2a shard 38 missing
10/6d583d2a/5592f7d6-a131-4eb2-a3dd-b2d96691dd7e/head
/var/log/ceph/ceph.log-20190323.gz:2019-03-22 18:29:02.024124 osd.41
192.168.4.19:6809/30077 54901 : cluster [ERR] 10.2a shard 38 missing
10/f0683d2a/81897399-4cb0-59b3-b9ae-bf043a272137:0003/head



# ceph pg deep-scrub 10.2a
instructing pg 10.2a on osd.41 to deep-scrub


# ceph -w | grep 10.2a
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3-node cluster with 3 x Intel Optane 900P - very low benchmarked performance (200 IOPS)?

2019-03-11 Thread solarflow99
how about adding:  --sync=1 --numjobs=1  to the command as well?
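
Something like this, as a sketch (pool and image names as in the commands
quoted below; note that --sync may be a no-op for the rbd engine in some fio
versions, in which case --fsync=1 is the closer equivalent):

fio -ioengine=rbd -direct=1 -sync=1 -numjobs=1 -name=test -bs=4k \
    -iodepth=1 -rw=randwrite -runtime=60 -pool=bench -rbdname=testimg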



On Sat, Mar 9, 2019 at 12:09 PM Vitaliy Filippov  wrote:

> There are 2:
>
> fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite
> -pool=bench -rbdname=testimg
>
> fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite
> -pool=bench -rbdname=testimg
>
> The first measures your min possible latency - it does not scale with the
> number of OSDs at all, but it's usually what real applications like
> DBMSes
> need.
>
> The second measures your max possible random write throughput which you
> probably won't be able to utilize if you don't have enough VMs all
> writing
> in parallel.
>
> --
> With best regards,
>Vitaliy Filippov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 14.1.0, No dashboard module

2019-03-06 Thread solarflow99
sounds right to me


On Wed, Mar 6, 2019 at 7:35 AM Kai Wagner  wrote:

> Hi all,
>
> I think this change really late in the game just results into confusion.
>
> I would be in favor to make the ceph-mgr-dashboard package a dependency of
> the ceph-mgr so that people just need to enable the dashboard without the
> need to install another package separately. This way we could also use the
> current documentation and we don't need to update everything.
>
> My thought is that we would like to encourage people to use the dashboard
> but without the dependency this will result to the opposite.
>
> Note: Shouldn't we also rename the ceph-mgr-dashboard package to just
> ceph-dashboard as this is now the official name of it?
>
> Thoughts?
>
> Kai
> On 3/5/19 3:37 PM, Laura Paduano wrote:
>
> Hi Ashley,
>
> thanks for pointing this out! I've created a tracker issue [1] and we will
> take care of updating the documentation accordingly.
>
> Thanks,
> Laura
>
>
> [1] https://tracker.ceph.com/issues/38584
>
> On 05.03.19 10:16, Ashley Merrick wrote:
>
> As a follow up seems the dashboard is a separate package not installed by
> default called "ceph-mgr-dashboard"
>
> Seems this is currently missing off the RC notes, and the master doc for
> ceph dashboard.
>
> Cheers
>
> On Tue, Mar 5, 2019 at 10:54 AM Ashley Merrick 
> wrote:
>
>> I have just spun up a small test environment to give the first RC a test
>> run.
>>
>> Have managed to get a MON / MGR running fine on latest .dev packages on
>> Ubuntu 18.04, however when I go to try enable the dashboard I get the
>> following error.
>>
>> ceph mgr module enable dashboard
>> Error ENOENT: all mgr daemons do not support module 'dashboard', pass
>> --force to force enablement
>>
>> Trying with --force does nothing, checking the mgr log during boot the
>> dashboard plugin is not listed along all the plugins available.
>>
>> I had a look through the tracker and commits since the RC1, however can't
>> see this already mentioned, not sure if this is expected for RC1 or a bug.
>>
>> Thanks,
>> Ashley
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
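
For anyone hitting the same ENOENT error, the practical fix that follows from
the thread above is to install the separate package before enabling the
module (pick the line matching your distro):

apt install ceph-mgr-dashboard     # Debian/Ubuntu
yum install ceph-mgr-dashboard     # RHEL/CentOS
ceph mgr module enable dashboard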


Re: [ceph-users] rbd unmap fails with error: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy

2019-03-01 Thread solarflow99
It has to be mounted from somewhere. If that server goes offline, you need
to mount it from somewhere else, right?


On Thu, Feb 28, 2019 at 11:15 PM David Turner  wrote:

> Why are you mapping the same rbd to multiple servers?
>
> On Wed, Feb 27, 2019, 9:50 AM Ilya Dryomov  wrote:
>
>> On Wed, Feb 27, 2019 at 12:00 PM Thomas <74cmo...@gmail.com> wrote:
>> >
>> > Hi,
>> > I have noticed an error when writing to a mapped RBD.
>> > Therefore I unmounted the block device.
>> > Then I tried to unmap it w/o success:
>> > ld2110:~ # rbd unmap /dev/rbd0
>> > rbd: sysfs write failed
>> > rbd: unmap failed: (16) Device or resource busy
>> >
>> > The same block device is mapped on another client and there are no
>> issues:
>> > root@ld4257:~# rbd info hdb-backup/ld2110
>> > rbd image 'ld2110':
>> > size 7.81TiB in 2048000 objects
>> > order 22 (4MiB objects)
>> > block_name_prefix: rbd_data.3cda0d6b8b4567
>> > format: 2
>> > features: layering
>> > flags:
>> > create_timestamp: Fri Feb 15 10:53:50 2019
>> > root@ld4257:~# rados -p hdb-backup  listwatchers
>> rbd_data.3cda0d6b8b4567
>> > error listing watchers hdb-backup/rbd_data.3cda0d6b8b4567: (2) No such
>> > file or directory
>> > root@ld4257:~# rados -p hdb-backup  listwatchers
>> rbd_header.3cda0d6b8b4567
>> > watcher=10.76.177.185:0/1144812735 client.21865052 cookie=1
>> > watcher=10.97.206.97:0/4023931980 client.18484780
>> > cookie=18446462598732841027
>> >
>> >
>> > Question:
>> > How can I force to unmap the RBD on client ld2110 (= 10.76.177.185)?
>>
>> Hi Thomas,
>>
>> It appears that /dev/rbd0 is still open on that node.
>>
>> Was the unmount successful?  Which filesystem (ext4, xfs, etc)?
>>
>> What is the output of "ps aux | grep rbd" on that node?
>>
>> Try lsof, fuser, check for LVM volumes and multipath -- these have been
>> reported to cause this issue previously:
>>
>>   http://tracker.ceph.com/issues/12763
>>
>> Thanks,
>>
>> Ilya
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
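
Summarising Ilya's checklist above as a sketch (device name from the thread;
the forced unmap needs a reasonably recent kernel and is a last resort):

# find processes that still hold the device open
lsof /dev/rbd0
fuser -vm /dev/rbd0

# check for LVM or multipath layers sitting on top of the rbd
ls /sys/block/rbd0/holders/
dmsetup ls

# once nothing holds it any more
rbd unmap /dev/rbd0

# last resort on newer kernels
rbd unmap -o force /dev/rbd0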


Re: [ceph-users] rbd space usage

2019-02-28 Thread solarflow99
yes, but:

# rbd showmapped
id pool image snap device
0  rbd  nfs1  -    /dev/rbd0
1  rbd  nfs2  -    /dev/rbd1


# df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/rbd0   8.0T  4.8T  3.3T  60% /mnt/nfsroot/rbd0
/dev/rbd1   9.8T   34M  9.8T   1% /mnt/nfsroot/rbd1


only 5T is taken up
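
One way to dig further is to compare what the images have actually allocated
with what the filesystems report; as suggested below, rbd du shows the
provisioned versus used size per image:

rbd du -p rbd          # all images in the pool
rbd du rbd/nfs1        # a single image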


On Thu, Feb 28, 2019 at 2:26 PM Jack  wrote:

> Are not you using 3-replicas pool ?
>
> (15745GB + 955GB + 1595M) * 3 ~= 51157G (there is overhead involved)
>
> Best regards,
>
> On 02/28/2019 11:09 PM, solarflow99 wrote:
> > thanks, I still can't understand whats taking up all the space 27.75
> >
> > On Thu, Feb 28, 2019 at 7:18 AM Mohamad Gebai  wrote:
> >
> >> On 2/27/19 4:57 PM, Marc Roos wrote:
> >>> They are 'thin provisioned' meaning if you create a 10GB rbd, it does
> >>> not use 10GB at the start. (afaik)
> >>
> >> You can use 'rbd -p rbd du' to see how much of these devices is
> >> provisioned and see if it's coherent.
> >>
> >> Mohamad
> >>
> >>>
> >>>
> >>> -Original Message-
> >>> From: solarflow99 [mailto:solarflo...@gmail.com]
> >>> Sent: 27 February 2019 22:55
> >>> To: Ceph Users
> >>> Subject: [ceph-users] rbd space usage
> >>>
> >>> using ceph df it looks as if RBD images can use the total free space
> >>> available of the pool it belongs to, 8.54% yet I know they are created
> >>> with a --size parameter and thats what determines the actual space.  I
> >>> can't understand the difference i'm seeing, only 5T is being used but
> >>> ceph df shows 51T:
> >>>
> >>>
> >>> /dev/rbd0   8.0T  4.8T  3.3T  60% /mnt/nfsroot/rbd0
> >>> /dev/rbd1   9.8T   34M  9.8T   1% /mnt/nfsroot/rbd1
> >>>
> >>>
> >>>
> >>> # ceph df
> >>> GLOBAL:
> >>> SIZE AVAIL RAW USED %RAW USED
> >>> 180T  130T   51157G 27.75
> >>> POOLS:
> >>> NAMEID USED   %USED MAX AVAIL
> >>> OBJECTS
> >>> rbd 0  15745G  8.543G
> >>> 4043495
> >>> cephfs_data 1   0 03G
> >>> 0
> >>> cephfs_metadata 21962 03G
> >>>20
> >>> spider_stage 9   1595M 03G47835
> >>> spider   10   955G  0.523G
> >>> 42541237
> >>>
> >>>
> >>>
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd space usage

2019-02-28 Thread solarflow99
Thanks, I still can't understand what's taking up all the space (27.75% raw used)

On Thu, Feb 28, 2019 at 7:18 AM Mohamad Gebai  wrote:

> On 2/27/19 4:57 PM, Marc Roos wrote:
> > They are 'thin provisioned' meaning if you create a 10GB rbd, it does
> > not use 10GB at the start. (afaik)
>
> You can use 'rbd -p rbd du' to see how much of these devices is
> provisioned and see if it's coherent.
>
> Mohamad
>
> >
> >
> > -Original Message-
> > From: solarflow99 [mailto:solarflo...@gmail.com]
> > Sent: 27 February 2019 22:55
> > To: Ceph Users
> > Subject: [ceph-users] rbd space usage
> >
> > using ceph df it looks as if RBD images can use the total free space
> > available of the pool it belongs to, 8.54% yet I know they are created
> > with a --size parameter and thats what determines the actual space.  I
> > can't understand the difference i'm seeing, only 5T is being used but
> > ceph df shows 51T:
> >
> >
> > /dev/rbd0   8.0T  4.8T  3.3T  60% /mnt/nfsroot/rbd0
> > /dev/rbd1   9.8T   34M  9.8T   1% /mnt/nfsroot/rbd1
> >
> >
> >
> > # ceph df
> > GLOBAL:
> > SIZE AVAIL RAW USED %RAW USED
> > 180T  130T   51157G 27.75
> > POOLS:
> > NAMEID USED   %USED MAX AVAIL
> > OBJECTS
> > rbd 0  15745G  8.543G
> > 4043495
> > cephfs_data 1   0 03G
> > 0
> > cephfs_metadata 21962 03G
> >20
> > spider_stage 9   1595M 03G47835
> > spider   10   955G  0.523G
> > 42541237
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd space usage

2019-02-27 Thread solarflow99
Using ceph df it looks as if RBD images can use the total free space
available in the pool they belong to (8.54% used), yet I know they are
created with a --size parameter and that's what determines the actual space.
I can't understand the difference I'm seeing: only 5T is being used but
ceph df shows 51T:


/dev/rbd0   8.0T  4.8T  3.3T  60% /mnt/nfsroot/rbd0
/dev/rbd1   9.8T   34M  9.8T   1% /mnt/nfsroot/rbd1


# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
180T  130T   51157G 27.75
POOLS:
NAMEID USED   %USED MAX AVAIL
OBJECTS
rbd 0  15745G  8.543G
4043495
cephfs_data 1   0 0
3G0
cephfs_metadata 21962 03G
20
spider_stage 9   1595M 03G47835
spider   10   955G  0.523G 42541237
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Configuration about using nvme SSD

2019-02-26 Thread solarflow99
I saw Intel had a demo of a luminous cluster running on top-of-the-line
hardware; they used 2 OSD partitions per NVMe for the best performance.  I was
interested that they would split them like that, and asked the demo person
how they came to that number.  I never got a really good answer except that
it would provide better performance.  So I guess this must be why.
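
In case anyone wants to try the same split, one way is simply to carve the
NVMe into two LVs and hand each to ceph-volume; newer ceph-volume releases
can also do it in one go with "lvm batch --osds-per-device 2" (check that
your version has that flag). A rough sketch, device and VG names assumed:

vgcreate ceph-nvme0 /dev/nvme0n1
lvcreate -l 50%VG -n osd-a ceph-nvme0
lvcreate -l 100%FREE -n osd-b ceph-nvme0
ceph-volume lvm create --bluestore --data ceph-nvme0/osd-a
ceph-volume lvm create --bluestore --data ceph-nvme0/osd-b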



On Mon, Feb 25, 2019 at 8:30 PM  wrote:

> I create 2-4 RBD images sized 10GB or more with --thick-provision, then
> run
>
> fio -ioengine=rbd -direct=1 -invalidate=1 -name=test -bs=4k -iodepth=128
> -rw=randwrite -pool=rpool -runtime=60 -rbdname=testimg
>
> For each of them at the same time.
>
> > How do you test what total 4Kb random write iops (RBD) you have?
> >
> > -Original Message-
> > From: Vitaliy Filippov [mailto:vita...@yourcmc.ru]
> > Sent: 24 February 2019 17:39
> > To: David Turner
> > Cc: ceph-users; 韦皓诚
> > Subject: *SPAM* Re: [ceph-users] Configuration about using nvme
> > SSD
> >
> > I've tried 4x OSD on fast SAS SSDs in a test setup with only 2 such
> > drives in cluster - it increased CPU consumption a lot, but total 4Kb
> > random write iops (RBD) only went from ~11000 to ~22000. So it was 2x
> > increase, but at a huge cost.
> >
> >> One thing that's worked for me to get more out of nvmes with Ceph is
> >> to create multiple partitions on the nvme with an osd on each
> > partition.
> >> That
> >> way you get more osd processes and CPU per nvme device. I've heard of
> >> people using up to 4 partitions like this.
> >
> > --
> > With best regards,
> >Vitaliy Filippov
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intel P4600 3.2TB U.2 form factor NVMe firmware problems causing dead disks

2019-02-26 Thread solarflow99
I knew it.  FW updates are very important for SSDs
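
For anyone wanting to check what they are running before a drive disappears,
the firmware revision can be read out with nvme-cli or smartmontools
(whichever is installed; device names are examples):

nvme list                                  # shows model, serial and FW rev
smartctl -a /dev/nvme0 | grep -i firmware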

On Sat, Feb 23, 2019 at 8:35 PM Michel Raabe  wrote:

> On Monday, February 18, 2019 16:44 CET, David Turner <
> drakonst...@gmail.com> wrote:
> > Has anyone else come across this issue before?  Our current theory is
> that
> > Bluestore is accessing the disk in a way that is triggering a bug in the
> > older firmware version that isn't triggered by more traditional
> > filesystems.  We have a scheduled call with Intel to discuss this, but
> > their preliminary searches into the bugfixes and known problems between
> > firmware versions didn't indicate the bug that we triggered.  It would be
> > good to have some more information about what those differences for disk
> > accessing might be to hopefully get a better answer from them as to what
> > the problem is.
> >
> >
> > [1]
> >
> https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p4600-series/dc-p4600-3-2tb-2-5inch-3d1.html
>
>  Yes and no. We got a same issue with the P4500 4TB. 3 disks in one day.
> In the end it was a firmware bug.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Bluestore] Some of my osd's uses BlueFS slow storage for db - why?

2019-02-22 Thread solarflow99
Aren't you undersized at only 30GB?  I thought the DB should be 4% of the
OSD's size.


On Fri, Feb 22, 2019 at 3:10 PM Nick Fisk  wrote:

> >On 2/16/19 12:33 AM, David Turner wrote:
> >> The answer is probably going to be in how big your DB partition is vs
> >> how big your HDD disk is.  From your output it looks like you have a
> >> 6TB HDD with a 28GB Blocks.DB partition.  Even though the DB used
> >> size isn't currently full, I would guess that at some point since
> >> this OSD was created that it did fill up and what you're seeing is
> >> the part of the DB that spilled over to the data disk. This is why
> >> the official recommendation (that is quite cautious, but cautious
> >> because some use cases will use this up) for a blocks.db partition is
> >> 4% of the data drive.  For your 6TB disks that's a recommendation of
> >> 240GB per DB partition.  Of course the actual size of the DB needed
> >> is dependent on your use case.  But pretty much every use case for a
> >> 6TB disk needs a bigger partition than 28GB.
> >
> >
> >My current db size of osd.33 is 7910457344 bytes, and osd.73 is
> >2013265920+4685037568 bytes. 7544Mbyte (24.56% of db_total_bytes) vs
> >6388Mbyte (6.69% of db_total_bytes).
> >
> >Why osd.33 is not used slow storage at this case?
>
> Bluestore/RocksDB will only put the next level up size of DB on flash if
> the whole size will fit.
> These sizes are roughly 3GB,30GB,300GB. Anything in-between those sizes
> are pointless. Only ~3GB of SSD will ever be used out of a
> 28GB partition. Likewise a 240GB partition is also pointless as only ~30GB
> will be used.
>
> I'm currently running 30GB partitions on my cluster with a mix of 6,8,10TB
> disks. The 10TB's are about 75% full and use around 14GB,
> this is on mainly 3x Replica RBD(4MB objects)
>
> Nick
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
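
A quick way to check whether a given OSD has already spilled its DB onto the
slow device is to look at the bluefs counters on the admin socket (osd id
from the thread; counter names can vary slightly between releases):

ceph daemon osd.33 perf dump bluefs    # or plain "perf dump" and find the bluefs section
# relevant fields: db_total_bytes, db_used_bytes, slow_used_bytes
# a non-zero slow_used_bytes means part of the DB lives on the data disk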


Re: [ceph-users] Intel P4600 3.2TB U.2 form factor NVMe firmware problems causing dead disks

2019-02-19 Thread solarflow99
No, but I know that if the wear leveling isn't right I wouldn't expect
them to last long. FW updates on SSDs are very important.


On Mon, Feb 18, 2019 at 7:44 AM David Turner  wrote:

> We have 2 clusters of [1] these disks that have 2 Bluestore OSDs per disk
> (partitioned), 3 disks per node, 5 nodes per cluster.  The clusters are
> 12.2.4 running CephFS and RBDs.  So in total we have 15 NVMe's per cluster
> and 30 NVMe's in total.  They were all built at the same time and were
> running firmware version QDV10130.  On this firmware version we early on
> had 2 disks failures, a few months later we had 1 more, and then a month
> after that (just a few weeks ago) we had 7 disk failures in 1 week.
>
> The failures are such that the disk is no longer visible to the OS.  This
> holds true beyond server reboots as well as placing the failed disks into a
> new server.  With a firmware upgrade tool we got an error that pretty much
> said there's no way to get data back and to RMA the disk.  We upgraded all
> of our remaining disks' firmware to QDV101D1 and haven't had any problems
> since then.  Most of our failures happened while rebalancing the cluster
> after replacing dead disks and we tested rigorously around that use case
> after upgrading the firmware.  This firmware version seems to have resolved
> whatever the problem was.
>
> We have about 100 more of these scattered among database servers and other
> servers that have never had this problem while running the
> QDV10130 firmware as well as firmwares between this one and the one we
> upgraded to.  Bluestore on Ceph is the only use case we've had so far with
> this sort of failure.
>
> Has anyone else come across this issue before?  Our current theory is that
> Bluestore is accessing the disk in a way that is triggering a bug in the
> older firmware version that isn't triggered by more traditional
> filesystems.  We have a scheduled call with Intel to discuss this, but
> their preliminary searches into the bugfixes and known problems between
> firmware versions didn't indicate the bug that we triggered.  It would be
> good to have some more information about what those differences for disk
> accessing might be to hopefully get a better answer from them as to what
> the problem is.
>
>
> [1]
> https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/data-center-ssds/dc-p4600-series/dc-p4600-3-2tb-2-5inch-3d1.html
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] upgrading

2019-02-05 Thread solarflow99
Does ceph-ansible support upgrading a cluster to the latest minor versions
(e.g. mimic 13.2.2 to 13.2.4)?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
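
Minor point upgrades are normally driven with the rolling_update playbook
shipped with ceph-ansible, which restarts mons, mgrs and OSDs one at a time.
A sketch, assuming the usual install location and an existing inventory
(paths and the confirmation flag can differ between ceph-ansible versions):

cd /usr/share/ceph-ansible
ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes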


Re: [ceph-users] Optane still valid

2019-02-04 Thread solarflow99
I think one limitation would be the 375GB since bluestore needs a larger
amount of space than filestore did.

On Mon, Feb 4, 2019 at 10:20 AM Florian Engelmann <
florian.engelm...@everyware.ch> wrote:

> Hi,
>
> we have built a 6 Node NVMe only Ceph Cluster with 4x Intel DC P4510 8TB
> each and one Intel DC P4800X 375GB Optane each. Up to 10x P4510 can be
> installed in each node.
> WAL and RocksDBs for all P4510 should be stored on the Optane (approx.
> 30GB per RocksDB incl. WAL).
> Internally, discussions arose whether the Optane would become a
> bottleneck from a certain number of P4510 on.
> For us, the lowest possible latency is very important. Therefore the
> Optane NVMes were bought. In view of the good performance of the P4510,
> the question arises whether the Optanes still have a noticeable effect
> or whether they are actually just SPOFs?
>
>
> All the best,
> Florian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD default pool

2019-02-01 Thread solarflow99
I thought a new cluster would have the 'rbd' pool already created; has this
changed?  I'm using mimic.


# rbd ls
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool
name.
rbd: list: (2) No such file or directory
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
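
That is expected on recent releases: the default 'rbd' pool is no longer
created automatically, so it has to be created and initialised by hand,
roughly like this (the PG count is a placeholder, size it for your cluster):

ceph osd pool create rbd 128
rbd pool init rbd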


Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread solarflow99
Can you do HA on the NFS shares?

On Wed, Jan 30, 2019 at 9:10 AM David C  wrote:

> Hi Patrick
>
> Thanks for the info. If I did multiple exports, how does that work in
> terms of the cache settings defined in ceph.conf, are those settings per
> CephFS client or a shared cache? I.e if I've definied client_oc_size, would
> that be per export?
>
> Cheers,
>
> On Tue, Jan 15, 2019 at 6:47 PM Patrick Donnelly 
> wrote:
>
>> On Mon, Jan 14, 2019 at 7:11 AM Daniel Gryniewicz 
>> wrote:
>> >
>> > Hi.  Welcome to the community.
>> >
>> > On 01/14/2019 07:56 AM, David C wrote:
>> > > Hi All
>> > >
>> > > I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
>> > > filesystem, it seems to be working pretty well so far. A few
>> questions:
>> > >
>> > > 1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a
>> > > libcephfs client,..." [1]. For arguments sake, if I have ten top level
>> > > dirs in my Cephfs namespace, is there any value in creating a separate
>> > > export for each directory? Will that potentially give me better
>> > > performance than a single export of the entire namespace?
>> >
>> > I don't believe there are any advantages from the Ceph side.  From the
>> > Ganesha side, you configure permissions, client ACLs, squashing, and so
>> > on on a per-export basis, so you'll need different exports if you need
>> > different settings for each top level directory.  If they can all use
>> > the same settings, one export is probably better.
>>
>> There may be performance impact (good or bad) with having separate
>> exports for CephFS. Each export instantiates a separate instance of
>> the CephFS client which has its own bookkeeping and set of
>> capabilities issued by the MDS. Also, each client instance has a
>> separate big lock (potentially a big deal for performance). If the
>> data for each export is disjoint (no hard links or shared inodes) and
>> the NFS server is expected to have a lot of load, breaking out the
>> exports can have a positive impact on performance. If there are hard
>> links, then the clients associated with the exports will potentially
>> fight over capabilities which will add to request latency.)
>>
>> --
>> Patrick Donnelly
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
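
For reference, splitting the namespace into separate FSAL_CEPH exports looks
roughly like this in ganesha.conf (path, pseudo path, id and squash settings
are placeholders; each EXPORT block gets its own libcephfs client, as Patrick
describes above):

EXPORT {
    Export_Id = 1;
    Path = /projects;
    Pseudo = /projects;
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL { Name = CEPH; }
}

Repeat the block with a different Export_Id, Path and Pseudo for each top
level directory that needs its own settings.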


Re: [ceph-users] io-schedulers

2018-11-05 Thread solarflow99
I'm interested to know about this too.


On Mon, Nov 5, 2018 at 10:45 AM Bastiaan Visser  wrote:

>
> There are lots of rumors around about the benefit of changing
> io-schedulers for OSD disks.
> Even some benchmarks can be found, but they are all more than a few years
> old.
> Since ceph is moving forward with quite a pace, i am wondering what the
> common practice is to use as io-scheduler on OSD's.
>
> And since blk-mq is around these days, are the multi-queue schedules
> already being used in production clusters?
>
> Regards,
>  Bastiaan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
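
Not an answer to which scheduler is best, but for anyone who wants to
experiment, checking and switching the scheduler at runtime is just sysfs
(make it persistent with a udev rule or kernel command line; blk-mq kernels
show none/mq-deadline/kyber/bfq instead of the legacy names):

cat /sys/block/sda/queue/scheduler       # the active one is shown in brackets
echo noop > /sys/block/sda/queue/scheduler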


Re: [ceph-users] Drive for Wal and Db

2018-10-22 Thread solarflow99
Why didn't you just install the DB + WAL on the NVMe?  Is this "data disk"
still an ssd?



On Mon, Oct 22, 2018 at 3:34 PM David Turner  wrote:

> And by the data disk I mean that I didn't specify a location for the DB
> partition.
>
> On Mon, Oct 22, 2018 at 4:06 PM David Turner 
> wrote:
>
>> Track down where it says they point to?  Does it match what you expect?
>> It does for me.  I have my DB on my data disk and my WAL on a separate NVMe.
>>
>> On Mon, Oct 22, 2018 at 3:21 PM Robert Stanford 
>> wrote:
>>
>>>
>>>  David - is it ensured that wal and db both live where the symlink
>>> block.db points?  I assumed that was a symlink for the db, but necessarily
>>> for the wal, because it can live in a place different than the db.
>>>
>>> On Mon, Oct 22, 2018 at 2:18 PM David Turner 
>>> wrote:
>>>
 You can always just go to /var/lib/ceph/osd/ceph-{osd-num}/ and look at
 where the symlinks for block and block.wal point to.

 On Mon, Oct 22, 2018 at 12:29 PM Robert Stanford <
 rstanford8...@gmail.com> wrote:

>
>  That's what they say, however I did exactly this and my cluster
> utilization is higher than the total pool utilization by about the number
> of OSDs * wal size.  I want to verify that the wal is on the SSDs too but
> I've asked here and no one seems to know a way to verify this.  Do you?
>
>  Thank you, R
>
> On Mon, Oct 22, 2018 at 5:22 AM Maged Mokhtar 
> wrote:
>
>>
>> If you specify a db on ssd and data on hdd and not explicitly specify
>> a
>> device for wal, wal will be placed on same ssd partition with db.
>> Placing only wal on ssd or creating separate devices for wal and db
>> are
>> less common setups.
>>
>> /Maged
>>
>> On 22/10/18 09:03, Fyodor Ustinov wrote:
>> > Hi!
>> >
>> > For sharing SSD between WAL and DB what should be placed on SSD?
>> WAL or DB?
>> >
>> > - Original Message -
>> > From: "Maged Mokhtar" 
>> > To: "ceph-users" 
>> > Sent: Saturday, 20 October, 2018 20:05:44
>> > Subject: Re: [ceph-users] Drive for Wal and Db
>> >
>> > On 20/10/18 18:57, Robert Stanford wrote:
>> >
>> >
>> >
>> >
>> > Our OSDs are BlueStore and are on regular hard drives. Each OSD has
>> a partition on an SSD for its DB. Wal is on the regular hard drives. 
>> Should
>> I move the wal to share the SSD with the DB?
>> >
>> > Regards
>> > R
>> >
>> >
>> > ___
>> > ceph-users mailing list [ mailto:ceph-users@lists.ceph.com |
>> ceph-users@lists.ceph.com ] [
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
>> >
>> > you should put wal on the faster device, wal and db could share the
>> same ssd partition,
>> >
>> > Maged
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
 ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
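
To verify where the WAL and DB actually ended up, the symlink check David
mentions looks like this, and ceph-volume lvm list is another way to see the
same mapping (osd id is a placeholder; a block.wal symlink only exists when a
dedicated WAL device was specified):

ls -l /var/lib/ceph/osd/ceph-0/block /var/lib/ceph/osd/ceph-0/block.db \
      /var/lib/ceph/osd/ceph-0/block.wal
ceph-volume lvm list       # lists data/db/wal devices for every OSD on the host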


Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread solarflow99
I think the answer is yes.  I'm pretty sure only the OSDs require very
long-life, enterprise-grade SSDs.

On Mon, Oct 15, 2018 at 4:16 AM ST Wong (ITSC)  wrote:

> Hi all,
>
>
>
> We’ve got some servers with some small size SSD but no hard disks other
> than system disks.  While they’re not suitable for OSD, will the SSD be
> useful for running MON/MGR/MDS?
>
>
>
> Thanks a lot.
>
> Regards,
>
> /st wong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread solarflow99
I had the same thing happen when I built a ceph cluster on a single VM
for testing. I wasn't concerned, though, because I knew the slow speed was
likely the cause.


On Mon, Oct 15, 2018 at 7:34 AM Kisik Jeong 
wrote:

> Hello,
>
> I successfully deployed Ceph cluster with 16 OSDs and created CephFS
> before.
> But after rebooting due to mds slow request problem, when creating CephFS,
> Ceph mds goes creating status and never changes.
> Seeing Ceph status, there is no other problem I think. Here is 'ceph -s'
> result:
>
> csl@hpc1:~$ ceph -s
>   cluster:
> id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> health: HEALTH_OK
>
>   services:
> mon: 1 daemons, quorum hpc1
> mgr: hpc1(active)
> mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
> osd: 16 osds: 16 up, 16 in
>
>   data:
> pools:   2 pools, 640 pgs
> objects: 7 objects, 124B
> usage:   34.3GiB used, 116TiB / 116TiB avail
> pgs: 640 active+clean
>
> However, CephFS still works in case of 8 OSDs.
>
> If there is any doubt of this phenomenon, please let me know. Thank you.
>
> PS. I attached my ceph.conf contents:
>
> [global]
> fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> mon_initial_members = hpc1
> mon_host = 192.168.40.10
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> public_network = 192.168.40.0/24
> cluster_network = 192.168.40.0/24
>
> [osd]
> osd journal size = 1024
> osd max object name len = 256
> osd max object namespace len = 64
> osd mount options f2fs = active_logs=2
>
> [osd.0]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.1]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.2]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.3]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.4]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.5]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.6]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.7]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.8]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.9]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.10]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.11]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.12]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.13]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.14]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.15]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> --
> Kisik Jeong
> Ph.D. Student
> Computer Systems Laboratory
> Sungkyunkwan University
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD to pool ratio

2018-10-11 Thread solarflow99
I think PGs have more to do with this; the docs were pretty good at
explaining it.  Hope this helps.

On Thu, Oct 11, 2018, 6:20 PM ST Wong (ITSC)  wrote:

> Hi all,  we’re new to CEPH.  We’ve some old machines redeployed for
> setting up CEPH cluster for our testing environment.
>
> There are over 100 disks for OSDs.   Will use replication with 2 copies.
> We wonder if it’s better to create pools on all OSDs, or using some OSDs
> for particular pools, for better performance and reliability ? Thanks a
> lot.
>
>
>
> Regards
>
> /st wong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs set quota without mount

2018-10-11 Thread solarflow99
Can't you route to your ceph public network?  That would avoid having to
put the hosts on the same VLAN; I think that's how most shops would do it.


On Thu, Oct 11, 2018 at 2:07 PM Felix Stolte  wrote:

> Our ceph cluster is mainly used for openstack but we also need to provide
> storage to linux workstations via nfs and smb for our windows clients. Even
> though our linux workstations could talk to cephs directly we don't want
> them to be in our ceph public network. Ceph public network is only
> connected to openstack and ceph nodes. In addition we are implementing two
> gateway servers to export cephfs via nfs/smb using ctdb as HA
>
> On 10/11/2018 08:42 PM, solarflow99 wrote:
>
> I am just interested to know more about your use case for NFS as opposed
> to just using cephfs directly, and what are you using for HA?
>
>
> On Thu, Oct 11, 2018 at 1:54 AM Felix Stolte 
> wrote:
>
>> Hey folks,
>>
>> I use nfs-ganesha to export cephfs to nfs. nfs-ganesha can talk to
>> cephfs via libcephfs so there is no need for mounting cephfs manually. I
>> also like to use directory quotas from cephfs. Anyone knows a way to set
>> quota on directories without the need to mount it first?
>>
>> I was thinking about an admin socket, but the mds does not seem to offer
>> this functionality.
>>
>> Regards Felix
>>
>> Forschungszentrum Jülich GmbH
>> 52425 Jülich
>> Sitz der Gesellschaft: Jülich
>> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
>> Vorsitzender des Aufsichtsrats: MinDir. Dr. Karl Eugen Huthmacher
>> Geschäftsführung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
>> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
>> Prof. Dr. Sebastian M. Schmidt
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> Felix Stolte
> IT-Services
> Tel.: +49 2461 61-9243
> Email: f.sto...@fz-juelich.de
>
> Forschungszentrum Jülich GmbH
> 52425 Jülich
> Sitz der Gesellschaft: Jülich
> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir. Dr. Karl Eugen Huthmacher
> Geschäftsführung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
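
For completeness, the usual way to set a directory quota is via the
ceph.quota xattrs on a mounted path, so some client (even a temporary,
admin-only mount on the gateway) has to see the directory; a sketch with a
placeholder path:

setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/somedir   # 100 GiB
getfattr -n ceph.quota.max_bytes /mnt/cephfs/somedir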


Re: [ceph-users] cephfs set quota without mount

2018-10-11 Thread solarflow99
I am just interested to know more about your use case for NFS as opposed to
just using cephfs directly, and what are you using for HA?


On Thu, Oct 11, 2018 at 1:54 AM Felix Stolte  wrote:

> Hey folks,
>
> I use nfs-ganesha to export cephfs to nfs. nfs-ganesha can talk to
> cephfs via libcephfs so there is no need for mounting cephfs manually. I
> also like to use directory quotas from cephfs. Anyone knows a way to set
> quota on directories without the need to mount it first?
>
> I was thinking about an admin socket, but the mds does not seem to offer
> this functionality.
>
> Regards Felix
>
> Forschungszentrum Jülich GmbH
> 52425 Jülich
> Sitz der Gesellschaft: Jülich
> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir. Dr. Karl Eugen Huthmacher
> Geschäftsführung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] daahboard

2018-10-08 Thread solarflow99
OK, thanks for the clarification. I guess I had assumed ansible was
supposed to take care of all that; now I've got it working.
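
For anyone else who lands here, the missing step was the SSL certificate; a
minimal sequence on mimic looks roughly like this (a self-signed certificate
is fine for testing):

ceph dashboard create-self-signed-cert
ceph mgr module disable dashboard
ceph mgr module enable dashboard      # reload the module so it picks up the cert
ceph mgr services                     # should now show an https URL for the dashboard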


On Mon, Oct 8, 2018 at 3:07 PM Jonas Jelten  wrote:

> You need to add or generate a certificate, without it the dashboard
> doesn't start.
> The procedure is described in the documentation.
>
> -- JJ
>
> On 09/10/2018 00.05, solarflow99 wrote:
> > seems like it did, yet I don't see anything listening on the port it
> should be for dashboard.
> >
> > # ceph mgr module ls
> > {
> > "enabled_modules": [
> > "dashboard",
> > "status"
> > ],
> >
> >
> >
> > # ceph status
> >   cluster:
> > id: d36fd17c-174e-40d6-95b9-86bdd196b7d2
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum cephmgr101,cephmgr102,cephmgr103
> > mgr: cephmgr103(active), standbys: cephmgr102, cephmgr101
> > mds: cephfs-1/1/1 up  {0=cephmgr103=up:active}, 2 up:standby
> > osd: 3 osds: 3 up, 3 in
> >
> >   data:
> > pools:   3 pools, 192 pgs
> > objects: 2.02 k objects, 41 MiB
> > usage:   6.5 GiB used, 86 GiB / 93 GiB avail
> > pgs: 192 active+clean
> >
> >
> >
> > # netstat -tlpn | grep ceph
> > tcp        0      0 172.20.3.23:6789        0.0.0.0:*   LISTEN  8422/ceph-mon
> > tcp        0      0 172.20.3.23:6800        0.0.0.0:*   LISTEN  21250/ceph-mds
> > tcp        0      0 172.20.3.23:6801        0.0.0.0:*   LISTEN  16562/ceph-mgr
> >
> >
> > On Mon, Oct 8, 2018 at 2:48 AM John Spray wrote:
> >
> > Assuming that ansible is correctly running "ceph mgr module enable
> > dashboard", then the next place to look is in "ceph status" (any
> > errors?) and "ceph mgr module ls" (any reports of the module unable
> to
> > run?)
> >
> > John
> > On Sat, Oct 6, 2018 at 1:53 AM solarflow99 wrote:
> > >
> > > I enabled the dashboard module in ansible but I don't see ceph-mgr
> listening on a port for it.  Is there something
> > else I missed?
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] daahboard

2018-10-08 Thread solarflow99
Seems like it did, yet I don't see anything listening on the port the
dashboard should be using.

# ceph mgr module ls
{
"enabled_modules": [
"dashboard",
"status"
],



# ceph status
  cluster:
id: d36fd17c-174e-40d6-95b9-86bdd196b7d2
health: HEALTH_OK

  services:
mon: 3 daemons, quorum cephmgr101,cephmgr102,cephmgr103
mgr: cephmgr103(active), standbys: cephmgr102, cephmgr101
mds: cephfs-1/1/1 up  {0=cephmgr103=up:active}, 2 up:standby
osd: 3 osds: 3 up, 3 in

  data:
pools:   3 pools, 192 pgs
objects: 2.02 k objects, 41 MiB
usage:   6.5 GiB used, 86 GiB / 93 GiB avail
pgs: 192 active+clean



# netstat -tlpn | grep ceph
tcp        0      0 172.20.3.23:6789        0.0.0.0:*    LISTEN  8422/ceph-mon
tcp        0      0 172.20.3.23:6800        0.0.0.0:*    LISTEN  21250/ceph-mds
tcp        0      0 172.20.3.23:6801        0.0.0.0:*    LISTEN  16562/ceph-mgr


On Mon, Oct 8, 2018 at 2:48 AM John Spray  wrote:

> Assuming that ansible is correctly running "ceph mgr module enable
> dashboard", then the next place to look is in "ceph status" (any
> errors?) and "ceph mgr module ls" (any reports of the module unable to
> run?)
>
> John
> On Sat, Oct 6, 2018 at 1:53 AM solarflow99  wrote:
> >
> > I enabled the dashboard module in ansible but I don't see ceph-mgr
> listening on a port for it.  Is there something else I missed?
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot write to cephfs if some osd's are not available on the client network

2018-10-06 Thread solarflow99
Now this goes against what I thought I learned about cephfs.  You should
be able to read and write to/from all OSDs; how can it be limited to only a
single OSD?
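
For reference, the primary-affinity workaround Christopher mentions below is
a one-liner (the osd id is whichever OSD is local to the client; pre-luminous
clusters may also need "mon osd allow primary affinity = true"):

ceph osd primary-affinity osd.23 0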


On Sat, Oct 6, 2018 at 4:30 AM Christopher Blum 
wrote:

> I wouldn't recommend you pursuit this any further, but if this is the only
> client that would reside on the same VM as the OSD, one thing you could try
> is to decrease the primary affinity to 0 [1] for the local OSD .
> That way that single OSD would never become a primary OSD ;)
>
> Disclaimer: This is more like a hack.
>
>
> [1] https://ceph.com/geen-categorie/ceph-primary-affinity/
>
> On Fri, Oct 5, 2018 at 10:23 PM Gregory Farnum  wrote:
>
>> On Fri, Oct 5, 2018 at 3:13 AM Marc Roos 
>> wrote:
>>
>>>
>>>
>>> I guess then this waiting "quietly" should be looked at again, I am
>>> having load of 10 on this vm.
>>>
>>> [@~]# uptime
>>>  11:51:58 up 4 days,  1:35,  1 user,  load average: 10.00, 10.01, 10.05
>>>
>>> [@~]# uname -a
>>> Linux smb 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018
>>> x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> [@~]# cat /etc/redhat-release
>>> CentOS Linux release 7.5.1804 (Core)
>>>
>>> [@~]# dmesg
>>> [348948.927734] libceph: osd23 192.168.10.114:6810 socket closed (con
>>> state CONNECTING)
>>> [348957.120090] libceph: osd27 192.168.10.114:6802 socket closed (con
>>> state CONNECTING)
>>> [349010.370171] libceph: osd26 192.168.10.114:6806 socket closed (con
>>> state CONNECTING)
>>> [349114.822301] libceph: osd24 192.168.10.114:6804 socket closed (con
>>> state CONNECTING)
>>> [349141.447330] libceph: osd29 192.168.10.114:6812 socket closed (con
>>> state CONNECTING)
>>> [349278.668658] libceph: osd25 192.168.10.114:6800 socket closed (con
>>> state CONNECTING)
>>> [349440.467038] libceph: osd28 192.168.10.114:6808 socket closed (con
>>> state CONNECTING)
>>> [349465.043957] libceph: osd23 192.168.10.114:6810 socket closed (con
>>> state CONNECTING)
>>> [349473.236400] libceph: osd27 192.168.10.114:6802 socket closed (con
>>> state CONNECTING)
>>> [349526.486408] libceph: osd26 192.168.10.114:6806 socket closed (con
>>> state CONNECTING)
>>> [349630.938498] libceph: osd24 192.168.10.114:6804 socket closed (con
>>> state CONNECTING)
>>> [349657.563561] libceph: osd29 192.168.10.114:6812 socket closed (con
>>> state CONNECTING)
>>> [349794.784936] libceph: osd25 192.168.10.114:6800 socket closed (con
>>> state CONNECTING)
>>> [349956.583300] libceph: osd28 192.168.10.114:6808 socket closed (con
>>> state CONNECTING)
>>> [349981.160225] libceph: osd23 192.168.10.114:6810 socket closed (con
>>> state CONNECTING)
>>> [349989.352510] libceph: osd27 192.168.10.114:6802 socket closed (con
>>> state CONNECTING)
>>>
>>
>> Looks like in this case the client is spinning trying to establish the
>> network connections it expects to be available. There's not really much
>> else it can do — we expect and require full routing. The monitors are
>> telling the clients that the OSDs are up and available, and it is doing
>> data IO that requires them. So it tries to establish a connection, sees the
>> network fail, and tries again.
>>
>> Unfortunately the restricted-network use case you're playing with here is
>> just not supported by Ceph.
>> -Greg
>>
>>
>>> ..
>>> ..
>>> ..
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: John Spray [mailto:jsp...@redhat.com]
>>> Sent: donderdag 27 september 2018 11:43
>>> To: Marc Roos
>>> Cc: ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Cannot write to cephfs if some osd's are not
>>> available on the client network
>>>
>>> On Thu, Sep 27, 2018 at 10:16 AM Marc Roos 
>>> wrote:
>>> >
>>> >
>>> > I have a test cluster and on a osd node I put a vm. The vm is using a
>>> > macvtap on the client network interface of the osd node. Making access
>>>
>>> > to local osd's impossible.
>>> >
>>> > the vm of course reports that it cannot access the local osd's. What I
>>>
>>> > am getting is:
>>> >
>>> > - I cannot reboot this vm normally, need to reset it.
>>>
>>> When linux tries to shut down cleanly, part of that is flushing buffers
>>> from any mounted filesystem back to disk.  If you have a network
>>> filesystem mounted, and the network is unavailable, that can cause the
>>> process to block.  You can try forcibly unmounting before rebooting.
>>>
>>> > - vm is reporting very high load.
>>>
>>> The CPU load part is surprising -- in general Ceph clients should wait
>>> quietly when blocked, rather than spinning.
>>>
>>> > I guess this should not be happening not? Because it should choose an
>>> > other available osd of the 3x replicated pool and just write the data
>>> > to that one?
>>>
>>> No -- writes always go through the primary OSD for the PG being written
>>> to.  If an OSD goes down, then another OSD will become the primary.  In
>>> your case, the primary OSD is not going down, it's just being cut off
>>> from the client by the network, so the writes are blocking indefinitely.
>>>
>>> John
>>>
>>> >
>>> >
>>> >
>>> >

[ceph-users] daahboard

2018-10-05 Thread solarflow99
I enabled the dashboard module in ansible but I don't see ceph-mgr
listening on a port for it.  Is there something else I missed?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-05 Thread solarflow99
Oh my... yes, 2TB enterprise-class SSDs. That's a much higher requirement
than filestore needed, and it would be cost-prohibitive for any lower-end
ceph cluster.



On Thu, Oct 4, 2018 at 11:19 PM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Argg !!
> With 10x10TB SATA DB and 2 SSD disks this would mean 2 TB for each SSD !
> If this is really required I am afraid I will keep using filestore ...
>
> Cheers, Massimo
>
> On Fri, Oct 5, 2018 at 7:26 AM  wrote:
>
>> Hello
>>
>> Am 4. Oktober 2018 02:38:35 MESZ schrieb solarflow99 <
>> solarflo...@gmail.com>:
>> >I use the same configuration you have, and I plan on using bluestore.
>> >My
>> >SSDs are only 240GB and it worked with filestore all this time, I
>> >suspect
>> >bluestore should be fine too.
>> >
>> >
>> >On Wed, Oct 3, 2018 at 4:25 AM Massimo Sgaravatto <
>> >massimo.sgarava...@gmail.com> wrote:
>> >
>> >> Hi
>> >>
>> >> I have a ceph cluster, running luminous, composed of 5 OSD nodes,
>> >which is
>> >> using filestore.
>> >> Each OSD node has 2 E5-2620 v4 processors, 64 GB of RAM, 10x6TB SATA
>> >disk
>> >> + 2x200GB SSD disk (then I have 2 other disks in RAID for the OS), 10
>> >Gbps.
>> >> So each SSD disk is used for the journal for 5 OSDs. With this
>> >> configuration everything is running smoothly ...
>> >>
>> >>
>> >> We are now buying some new storage nodes, and I am trying to buy
>> >something
>> >> which is bluestore compliant. So the idea is to consider a
>> >configuration
>> >> something like:
>> >>
>> >> - 10 SATA disks (8TB / 10TB / 12TB each. TBD)
>> >> - 2 processor (~ 10 core each)
>> >> - 64 GB of RAM
>> >> - 2 SSD to be used for WAL+DB
>> >> - 10 Gbps
>> >>
>> >> For what concerns the size of the SSD disks I read in this mailing
>> >list
>> >> that it is suggested to have at least 10GB of SSD disk/10TB of SATA
>> >disk.
>> >>
>> >>
>> >> So, the questions:
>> >>
>> >> 1) Does this hardware configuration seem reasonable ?
>> >>
>> >> 2) Are there problems to live (forever, or until filestore
>> >deprecation)
>> >> with some OSDs using filestore (the old ones) and some OSDs using
>> >bluestore
>> >> (the old ones) ?
>> >>
>> >> 3) Would you suggest to update to bluestore also the old OSDs, even
>> >if the
>> >> available SSDs are too small (they don't satisfy the "10GB of SSD
>> >disk/10TB
>> >> of SATA disk" rule) ?
>>
>> AFAIR should the db size 4% of the osd in question.
>>
>> So
>>
>> For example, if the block size is 1TB, then block.db shouldn’t be less
>> than 40GB
>>
>> See:
>> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
>>
>> Hth
>> - Mehmet
>>
>> >>
>> >> Thanks, Massimo
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume: recreate OSD with same ID after drive replacement

2018-10-03 Thread solarflow99
That's strange; I recall only deleting the OSD from the crushmap, then
auth del, then osd rm.
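
For the record, the flow that keeps the ID (as Alfredo explains below) looks
roughly like this; device paths are placeholders and ceph-volume needs to be
12.2.8 or newer for --osd-id to be honoured:

ceph osd destroy 747 --yes-i-really-mean-it
ceph-volume lvm prepare --bluestore --osd-id 747 --data <data device> --block.db <db device>
ceph-volume lvm activate --all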


On Wed, Oct 3, 2018 at 2:54 PM Alfredo Deza  wrote:

> On Wed, Oct 3, 2018 at 3:52 PM Andras Pataki
>  wrote:
> >
> > Ok, understood (for next time).
> >
> > But just as an update/closure to my investigation - it seems this is a
> > feature of ceph-volume (that it can't just create an OSD from scratch
> > with a given ID), not of base ceph.  The underlying ceph command (ceph
> > osd new) very happily accepts an osd-id as an extra optional argument
> > (after the fsid), and creates and osd with the given ID.  In fact, a
> > quick change to ceph_volume (create_id function in prepare.py) will make
> > ceph-volume recreate the OSD with a given ID.  I'm not a ceph-volume
> > expert, but a feature to create an OSD with a given ID from scratch
> > would be nice (given that the underlying raw ceph commands already
> > support it).
>
> That is something that I wasn't aware of, thanks for bringing it up.
> I've created an issue on the tracker to accommodate for that behavior:
>
> http://tracker.ceph.com/issues/36307
>
> >
> > Andras
> >
> > On 10/3/18 11:41 AM, Alfredo Deza wrote:
> > > On Wed, Oct 3, 2018 at 11:23 AM Andras Pataki
> > >  wrote:
> > >> Thanks - I didn't realize that was such a recent fix.
> > >>
> > >> I've now tried 12.2.8, and perhaps I'm not clear on what I should have
> > >> done to the OSD that I'm replacing, since I'm getting the error "The
> osd
> > >> ID 747 is already in use or does not exist.".  The case is clearly the
> > >> latter, since I've completely removed the old OSD (osd crush remove,
> > >> auth del, osd rm, wipe disk).  Should I have done something different
> > >> (i.e. not remove the OSD completely)?
> > > Yeah, you completely removed it so now it can't be re-used. This is
> > > the proper way if wanting to re-use the ID:
> > >
> > >
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#rados-replacing-an-osd
> > >
> > > Basically:
> > >
> > >  ceph osd destroy {id} --yes-i-really-mean-it
> > >
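Put together, a sketch of the replace-and-keep-the-ID flow on 12.2.8+, reusing
the names from this thread and assuming the H901D44 VG/LV has been recreated on
the replacement drive:

ceph osd destroy 747 --yes-i-really-mean-it
ceph-volume lvm prepare --bluestore --osd-id 747 --data H901D44/H901D44 \
    --block.db /dev/disk/by-partlabel/H901J44
ceph-volume lvm activate --all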
> > >> Searching the docs I see a command 'ceph osd destroy'.  What does that
> > >> do (compared to my removal procedure, osd crush remove, auth del, osd
> rm)?
> > >>
> > >> Thanks,
> > >>
> > >> Andras
> > >>
> > >>
> > >> On 10/3/18 10:36 AM, Alfredo Deza wrote:
> > >>> On Wed, Oct 3, 2018 at 9:57 AM Andras Pataki
> > >>>  wrote:
> >  After replacing failing drive I'd like to recreate the OSD with the
> same
> >  osd-id using ceph-volume (now that we've moved to ceph-volume from
> >  ceph-disk).  However, I seem to not be successful.  The command I'm
> using:
> > 
> >  ceph-volume lvm prepare --bluestore --osd-id 747 --data
> H901D44/H901D44
> >  --block.db /dev/disk/by-partlabel/H901J44
> > 
> >  But it created an OSD with the ID 601, which was the lowest it could
> allocate
> >  and ignored the 747 apparently.  This is with ceph 12.2.7. Any
> ideas?
> > >>> Yeah, this was a problem that was fixed and released as part of
> 12.2.8
> > >>>
> > >>> The tracker issue is: http://tracker.ceph.com/issues/24044
> > >>>
> > >>> The Luminous PR is https://github.com/ceph/ceph/pull/23102
> > >>>
> > >>> Sorry for the trouble!
> >  Andras
> > 
> >  ___
> >  ceph-users mailing list
> >  ceph-users@lists.ceph.com
> >  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some questions concerning filestore --> bluestore migration

2018-10-03 Thread solarflow99
I use the same configuration you have, and I plan on using bluestore.  My
SSDs are only 240GB and it worked with filestore all this time, I suspect
bluestore should be fine too.


On Wed, Oct 3, 2018 at 4:25 AM Massimo Sgaravatto <
massimo.sgarava...@gmail.com> wrote:

> Hi
>
> I have a ceph cluster, running luminous, composed of 5 OSD nodes, which is
> using filestore.
> Each OSD node has 2 E5-2620 v4 processors, 64 GB of RAM, 10x6TB SATA disk
> + 2x200GB SSD disk (then I have 2 other disks in RAID for the OS), 10 Gbps.
> So each SSD disk is used for the journal for 5 OSDs. With this
> configuration everything is running smoothly ...
>
>
> We are now buying some new storage nodes, and I am trying to buy something
> which is bluestore compliant. So the idea is to consider a configuration
> something like:
>
> - 10 SATA disks (8TB / 10TB / 12TB each. TBD)
> - 2 processors (~10 cores each)
> - 64 GB of RAM
> - 2 SSD to be used for WAL+DB
> - 10 Gbps
>
> For what concerns the size of the SSD disks I read in this mailing list
> that it is suggested to have at least 10GB of SSD disk/10TB of SATA disk.
>
>
> So, the questions:
>
> 1) Does this hardware configuration seem reasonable ?
>
> 2) Are there problems with running (forever, or until filestore deprecation)
> some OSDs using filestore (the old ones) alongside some OSDs using bluestore
> (the new ones) ?
>
> 3) Would you suggest updating the old OSDs to bluestore as well, even if the
> available SSDs are too small (they don't satisfy the "10GB of SSD disk/10TB
> of SATA disk" rule) ?
>
> Thanks, Massimo
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] too few PGs per OSD

2018-10-01 Thread solarflow99
I have a new deployment and it always has this problem; even if I increase
the size of the OSD, pg_num stays at 8.  I saw examples where others had this
problem, but that was with the RBD pool; I don't have an RBD pool, and I just
deployed it fresh with ansible.


health: HEALTH_WARN
1 MDSs report slow metadata IOs
Reduced data availability: 16 pgs inactive
Degraded data redundancy: 16 pgs undersized
too few PGs per OSD (16 < min 30)
 data:
pools:   2 pools, 16 pgs
objects: 0  objects, 0 B
usage:   2.0 GiB used, 39 GiB / 41 GiB avail
pgs: 100.000% pgs not active
 16 undersized+peered


# ceph osd pool ls
cephfs_data
cephfs_metadata


# ceph osd tree
ID CLASS WEIGHT  TYPE NAMESTATUS REWEIGHT PRI-AFF
-1   0.03989 root default
-3   0.03989 host mytesthost104
 0   hdd 0.03989 osd.0up  1.0 1.0


# ceph osd pool set cephfs_data pgp_num 64
Error EINVAL: specified pgp_num 64 > pg_num 8
# ceph osd pool set cephfs_data pgp_num 256
Error EINVAL: specified pgp_num 256 > pg_num 8
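For the record, a likely way forward (a sketch): pg_num has to be raised before
pgp_num can follow it, and the undersized/inactive PGs are expected as long as a
single OSD has to hold all the replicas. For example:

ceph osd pool set cephfs_data pg_num 32
ceph osd pool set cephfs_data pgp_num 32
ceph osd pool set cephfs_metadata pg_num 32
ceph osd pool set cephfs_metadata pgp_num 32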
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible

2018-09-27 Thread solarflow99
Thanks guys, installing this package did the trick, it works now.



On Mon, Sep 24, 2018 at 8:39 AM Ken Dreyer  wrote:

> Hi Alfredo,
>
> I've packaged the latest version in Fedora, but I didn't update EPEL.
> I've submitted the update for EPEL now at
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-7f8d3be3e2 .
> solarflow99, you can test this package and report "+1" in Bodhi there.
>
> It's also in the CentOS Storage SIG
> (http://cbs.centos.org/koji/buildinfo?buildID=23004) . Today I've
> tagged that build in CBS into storage7-ceph-luminous-testing and
> storage7-ceph-mimic-testing, so it will show up at
> https://buildlogs.centos.org/centos/7/storage/x86_64/ceph-luminous/
> soon. solarflow99, you could test this as well (although CentOS does
> not have a feedback mechanism like Fedora's Bodhi yet)
> On Fri, Sep 21, 2018 at 4:43 AM Alfredo Deza  wrote:
> >
> > On Thu, Sep 20, 2018 at 7:04 PM solarflow99 
> wrote:
> > >
> > > oh, was that all it was...  git clone
> https://github.com/ceph/ceph-ansible/
> > > I installed the notario  package from EPEL,
> python2-notario-0.0.11-2.el7.noarch  and that's the newest they have
> >
> > Hey Ken, I thought the latest versions were being packaged, is there
> > something I've missed? The tags have changed format it seems, from
> > 0.0.11
> > >
> > >
> > >
> > >
> > > On Thu, Sep 20, 2018 at 3:57 PM Alfredo Deza  wrote:
> > >>
> > >> Not sure how you installed ceph-ansible, the requirements mention a
> > >> version of a dependency (the notario module) which needs to be 0.0.13
> > >> or newer, and you seem to be using an older one.
> > >>
> > >>
> > >> On Thu, Sep 20, 2018 at 6:53 PM solarflow99 
> wrote:
> > >> >
> > >> > Hi, trying to get this to do a simple deployment, and I'm getting a
> strange error, has anyone seen this?  I'm using CentOS 7, rel 5   ansible
> 2.5.3  python version = 2.7.5
> > >> >
> > >> > I've tried with mimic, luminous and even jewel, no luck at all.
> > >> >
> > >> >
> > >> >
> > >> > TASK [ceph-validate : validate provided configuration]
> **
> > >> > task path:
> /home/jzygmont/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
> > >> > Thursday 20 September 2018  14:05:18 -0700 (0:00:05.734)
>  0:00:37.439 
> > >> > The full traceback is:
> > >> > Traceback (most recent call last):
> > >> >   File
> "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
> 138, in run
> > >> > res = self._execute()
> > >> >   File
> "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
> 561, in _execute
> > >> > result = self._handler.run(task_vars=variables)
> > >> >   File
> "/home/jzygmont/ansible/ceph-ansible/plugins/actions/validate.py", line 43,
> in run
> > >> > notario.validate(host_vars, install_options, defined_keys=True)
> > >> > TypeError: validate() got an unexpected keyword argument
> 'defined_keys'
> > >> >
> > >> > fatal: [172.20.3.178]: FAILED! => {
> > >> > "msg": "Unexpected failure during module execution.",
> > >> > "stdout": ""
> > >> > }
> > >> >
> > >> > NO MORE HOSTS LEFT
> **
> > >> >
> > >> > PLAY RECAP
> **
> > >> > 172.20.3.178   : ok=25   changed=0unreachable=0
> failed=1
> > >> >
> > >> > ___
> > >> > ceph-users mailing list
> > >> > ceph-users@lists.ceph.com
> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [slightly OT] XFS vs. BTRFS vs. others as root/usr/var/tmp filesystems ?

2018-09-23 Thread solarflow99
Yeah, sadly it looks like btrfs will never materialize as the filesystem of the
future.  Red Hat, for example, has even dropped it from its roadmap, as others
probably will or already have.


On Sun, Sep 23, 2018 at 11:28 AM mj  wrote:

> Hi,
>
> Just a very quick and simple reply:
>
> XFS has *always* treated us nicely, and we have been using it for a VERY
> long time, ever since the pre-2000 suse 5.2 days on pretty much all our
> machines.
>
> We have seen only very few corruptions on xfs, and the few times we
> tried btrfs, (almost) always 'something' happened. (same for the few
> times we tried reiserfs, btw)
>
> So, while my story may be very anecdotical (and you will probably find
> many others here claiming the opposite) our own conclusion is very
> clear: we love xfs, and do not like btrfs very much.
>
> MJ
>
> On 09/22/2018 10:58 AM, Nicolas Huillard wrote:
> > Hi all,
> >
> > I don't have a good track record with XFS since I got rid of ReiserFS a
> > long time ago. I decided XFS was a good idea on servers, while I tested
> > BTRFS on various less important devices.
> > So far, XFS betrayed me far more often (a few times) than BTRFS
> > (never).
> > Last time was yesterday, on a root filesystem with "Block out of range:
> > block 0x17b9814b0, EOFS 0x12a000" "I/O Error Detected. Shutting down
> > filesystem" (shutting down the root filesystem is pretty hard).
> >
> > Some threads on this ML discuss a similar problem, related to
> > partitioning and logical sectors located just after the end of the
> > partition. The problem here does not seem to be the same, as the
> > requested block is very far out of bound (2 orders of magnitude too
> > far), and I use a recent Debian stock kernel with every security patch.
> >
> > My question is : should I trust XFS for small root filesystems (/,
> > /tmp, /var on LVM sitting within md-RAID1 smallish partition), or is
> > BTRFS finally trusty enough for a general purpose cluster (still root
> > et al. filesystems), or do you guys just use the distro-recommended
> > setup (typically Ext4 on plain disks) ?
> >
> > Debian stretch with 4.9.110-3+deb9u4 kernel.
> > Ceph 12.2.8 on bluestore (not related to the question).
> >
> > Partial output of lsblk /dev/sdc /dev/nvme0n1:
> > NAME  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
> > sdc 8:32   0 447,1G  0 disk
> > ├─sdc1  8:33   0  55,9G  0 part
> > │ └─md0 9:00  55,9G  0 raid1
> > │   ├─oxygene_system-root 253:40   9,3G  0 lvm   /
> > │   ├─oxygene_system-tmp  253:50   9,3G  0 lvm   /tmp
> > │   └─oxygene_system-var  253:60   4,7G  0 lvm   /var
> > └─sdc2  8:34   0  29,8G  0 part  [SWAP]
> > nvme0n1   259:00   477G  0 disk
> > ├─nvme0n1p1   259:10  55,9G  0 part
> > │ └─md0 9:00  55,9G  0 raid1
> > │   ├─oxygene_system-root 253:40   9,3G  0 lvm   /
> > │   ├─oxygene_system-tmp  253:50   9,3G  0 lvm   /tmp
> > │   └─oxygene_system-var  253:60   4,7G  0 lvm   /var
> > ├─nvme0n1p2   259:20  29,8G  0 part  [SWAP]
> >
> > TIA !
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-ansible

2018-09-20 Thread solarflow99
oh, was that all it was...  git clone https://github.com/ceph/ceph-ansible/
I installed the notario  package from EPEL,
python2-notario-0.0.11-2.el7.noarch  and that's the newest they have
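A sketch of one way around that, assuming the ceph-ansible checkout ships a
requirements.txt pinning notario >= 0.0.13 (paths and versions illustrative):

cd ceph-ansible
pip install -r requirements.txt       # pulls in notario>=0.0.13 among others
# or just the one module:
pip install 'notario>=0.0.13'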




On Thu, Sep 20, 2018 at 3:57 PM Alfredo Deza  wrote:

> Not sure how you installed ceph-ansible, the requirements mention a
> version of a dependency (the notario module) which needs to be 0.0.13
> or newer, and you seem to be using an older one.
>
>
> On Thu, Sep 20, 2018 at 6:53 PM solarflow99  wrote:
> >
> > Hi, trying to get this to do a simple deployment, and I'm getting a
> strange error, has anyone seen this?  I'm using CentOS 7, rel 5   ansible
> 2.5.3  python version = 2.7.5
> >
> > I've tried with mimic, luminous and even jewel, no luck at all.
> >
> >
> >
> > TASK [ceph-validate : validate provided configuration]
> **
> > task path:
> /home/jzygmont/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
> > Thursday 20 September 2018  14:05:18 -0700 (0:00:05.734)
>  0:00:37.439 
> > The full traceback is:
> > Traceback (most recent call last):
> >   File
> "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
> 138, in run
> > res = self._execute()
> >   File
> "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
> 561, in _execute
> > result = self._handler.run(task_vars=variables)
> >   File
> "/home/jzygmont/ansible/ceph-ansible/plugins/actions/validate.py", line 43,
> in run
> > notario.validate(host_vars, install_options, defined_keys=True)
> > TypeError: validate() got an unexpected keyword argument 'defined_keys'
> >
> > fatal: [172.20.3.178]: FAILED! => {
> > "msg": "Unexpected failure during module execution.",
> > "stdout": ""
> > }
> >
> > NO MORE HOSTS LEFT
> **
> >
> > PLAY RECAP
> **
> > 172.20.3.178   : ok=25   changed=0unreachable=0
> failed=1
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-ansible

2018-09-20 Thread solarflow99
Hi, trying to get this to do a simple deployment, and I'm getting a strange
error, has anyone seen this?  I'm using CentOS 7, rel 5   ansible 2.5.3
python version = 2.7.5

I've tried with mimic, luminous and even jewel, no luck at all.



TASK [ceph-validate : validate provided configuration]
**
task path:
/home/jzygmont/ansible/ceph-ansible/roles/ceph-validate/tasks/main.yml:2
Thursday 20 September 2018  14:05:18 -0700 (0:00:05.734)   0:00:37.439

The full traceback is:
Traceback (most recent call last):
  File
"/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
138, in run
res = self._execute()
  File
"/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line
561, in _execute
result = self._handler.run(task_vars=variables)
  File "/home/jzygmont/ansible/ceph-ansible/plugins/actions/validate.py",
line 43, in run
notario.validate(host_vars, install_options, defined_keys=True)
TypeError: validate() got an unexpected keyword argument 'defined_keys'

fatal: [172.20.3.178]: FAILED! => {
"msg": "Unexpected failure during module execution.",
"stdout": ""
}

NO MORE HOSTS LEFT
**

PLAY RECAP
**
172.20.3.178   : ok=25   changed=0unreachable=0failed=1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread solarflow99
Thanks for the replies. I don't think cephFS clients go through the MONs;
they reach the OSDs directly.  When I mentioned NFS, I meant NFS clients
(i.e. not cephFS clients). This should have been pretty straightforward.
Is anyone doing HA on the MONs?  How do you mount the cephFS shares? Surely
you'd have a VIP?
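For the cephFS mounts themselves a VIP shouldn't be needed: the kernel client
accepts a comma-separated monitor list and fails over between the MONs on its
own. A sketch with placeholder hostnames:

mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret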



On Tue, Sep 18, 2018 at 12:37 PM Jean-Charles Lopez 
wrote:

> > On Sep 17, 2018, at 16:13, solarflow99  wrote:
> >
> > Hi, I read through the various documentation and had a few questions:
> >
> > - From what I understand cephFS clients reach the OSDs directly, does
> the cluster network need to be opened up as a public network?
> Client traffic only goes over the public network. Only OSD-to-OSD traffic
> (replication, rebalancing, recovery) goes over the cluster network.
> >
> > - Is it still necessary to have a public and cluster network when
> using cephFS since the clients all reach the OSDs directly?
> Separating the networks is a plus for troubleshooting and for sizing
> bandwidth.
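A minimal ceph.conf sketch of that public/cluster split (the subnets are
placeholders):

[global]
public network = 192.168.100.0/24
cluster network = 192.168.200.0/24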
> >
> > - Simplest way to do HA on the mons for providing NFS, etc?
> Don’t really understand the question (NFS vs CephFS).
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] network architecture questions

2018-09-18 Thread solarflow99
Hi, anyone able to answer these few questions?



On Mon, Sep 17, 2018 at 4:13 PM solarflow99  wrote:

> Hi, I read through the various documentation and had a few questions:
>
> - From what I understand cephFS clients reach the OSDs directly, does the
> cluster network need to be opened up as a public network?
>
> - Is it still necessary to have a public and cluster network when using
> cephFS since the clients all reach the OSDs directly?
>
> - Simplest way to do HA on the mons for providing NFS, etc?
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] network architecture questions

2018-09-17 Thread solarflow99
Hi, I read through the various documentation and had a few questions:

- From what I understand cephFS clients reach the OSDs directly, does the
cluster network need to be opened up as a public network?

- Is it still necessary to have a public and cluster network when using
cephFS since the clients all reach the OSDs directly?

- Simplest way to do HA on the mons for providing NFS, etc?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com