Re: [ceph-users] Ceph-Deploy error on 15/71 stage
Hi Eugen.

Just tried everything again here, removing the /dev/sda4 partitions and leaving it so that either salt-run proposal-populate or salt-run state.orch ceph.stage.configure could try to find the free space on the partitions to work with: unsuccessful again. :(

Just to make things clear: are you telling me that it is completely impossible to have a ceph "volume" on non-dedicated devices, sharing space with, for instance, the nodes' swap, boot or main partition? And so the only possible way to have a functioning ceph distributed filesystem would be to have, in each node, at least one disk dedicated to the operating system and another, independent disk dedicated to the ceph filesystem? That would be an awful drawback for our plans if true, but if there is no other way, we will have to just give up.

Just, please, answer these two questions clearly before we capitulate? :(

Anyway, thanks a lot, once again,

Jones

On Mon, Sep 3, 2018 at 5:39 AM Eugen Block wrote:
> Hi Jones,
>
> I still don't think creating an OSD on a partition will work. The reason is that SES creates an additional partition per OSD, resulting in something like this:
>
> vdb      253:16   0    5G  0 disk
> ├─vdb1   253:17   0  100M  0 part /var/lib/ceph/osd/ceph-1
> └─vdb2   253:18   0  4,9G  0 part
>
> Even with external block.db and wal.db on additional devices you would still need two partitions for the OSD. I'm afraid with your setup this can't work.
>
> Regards,
> Eugen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
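For reference, outside of DeepSea the upstream tooling can usually consume a bare partition directly. The sketch below assumes a Luminous-or-later node with the bootstrap-osd keyring already under /var/lib/ceph/bootstrap-osd/ and an empty /dev/sda4; it bypasses stage.3 entirely, so the resulting OSD would not be managed by DeepSea/SES:

###
# Manually create a BlueStore OSD on the spare partition (destructive for /dev/sda4)
ceph-volume lvm create --bluestore --data /dev/sda4

# Verify the OSD came up and joined the CRUSH map
ceph-volume lvm list
ceph osd tree
###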
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
-03:00 polar kernel: [3.036222] ata2.00: configured for UDMA/133
2018-08-30T10:21:18.787469-03:00 polar kernel: [3.043916] scsi 1:0:0:0: CD-ROM PLDS DVD+-RW DU-8A5LH 6D1M PQ: 0 ANSI: 5
2018-08-30T10:21:18.787470-03:00 polar kernel: [3.052087] usb 1-6: new low-speed USB device number 2 using xhci_hcd
2018-08-30T10:21:18.787471-03:00 polar kernel: [3.063179] scsi 1:0:0:0: Attached scsi generic sg1 type 5
2018-08-30T10:21:18.787472-03:00 polar kernel: [3.083566] sda: sda1 sda2 sda3 sda4
2018-08-30T10:21:18.787472-03:00 polar kernel: [3.084238] sd 0:0:0:0: [sda] Attached SCSI disk
2018-08-30T10:21:18.787473-03:00 polar kernel: [3.113065] sr 1:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
2018-08-30T10:21:18.787475-03:00 polar kernel: [3.113068] cdrom: Uniform CD-ROM driver Revision: 3.20
2018-08-30T10:21:18.787476-03:00 polar kernel: [3.113272] sr 1:0:0:0: Attached scsi CD-ROM sr0
2018-08-30T10:21:18.787477-03:00 polar kernel: [3.213133] usb 1-6: New USB device found, idVendor=413c, idProduct=2113
###

I'm trying to run deploy again here; however, I'm having some connection issues today (possibly due to the heavy rain) affecting its initial stages. If it succeeds, I'll send the outputs from /var/log/messages on the minions right away.

Thanks a lot,

Jones

On Fri, Aug 31, 2018 at 4:00 AM Eugen Block wrote:
> Hi,
>
> I'm not sure if there's a misunderstanding. You need to track the logs during the OSD deployment step (stage.3), that is where it fails, and this is where /var/log/messages could be useful. Since the deployment failed you have no systemd units (ceph-osd@.service) to log anything.
>
> Before running stage.3 again try something like
>
> grep -C5 ceph-disk /var/log/messages (or messages-201808*.xz)
>
> or
>
> grep -C5 sda4 /var/log/messages (or messages-201808*.xz)
>
> If that doesn't reveal anything, run stage.3 again and watch the logs.
>
> Regards,
> Eugen
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > OK, I edited the file /etc/salt/minion, uncommented the "log_level_logfile" line and set it to "debug" level.
> >
> > Turned off the computer, waited a few minutes so that the time frame would stand out in the /var/log/messages file, and restarted the computer.
> >
> > Using vi I "grepped out" (awful wording) the reboot section. From that, I also removed most of what seemed totally unrelated to ceph, salt, minions, grafana, prometheus, whatever.
> >
> > I got the lines below. It does not seem to complain about anything that I can see. :(
> >
> > 2018-08-30T15:41:46.455383-03:00 torcello systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid)
> > 2018-08-30T15:41:46.456330-03:00 torcello systemd[1]: Detected architecture x86-64.
> > 2018-08-30T15:41:46.456350-03:00 torcello systemd[1]: nss-lookup.target: Dependency Before=nss-lookup.target dropped
> > 2018-08-30T15:41:46.456357-03:00 torcello systemd[1]: Started Load Kernel Modules.
> > 2018-08-30T15:41:46.456369-03:00 torcello systemd[1]: Starting Apply Kernel Variables...
> > 2018-08-30T15:41:46.457230-03:00 torcello systemd[1]: Started Alertmanager for prometheus.
> > 2018-08-30T15:41:46.457237-03:00 torcello systemd[1]: Started Monitoring system and time series database.
> > 2018-08-30T15:41:46.457403-03:00 torcello systemd[1]: Starting NTP client/server...
> > 2018-08-30T15:41:46.457425-03:00 torcello systemd[1]: Started Prometheus exporter for machine metrics.
> > 2018-08-30T15:41:46.457706-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.797896888Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=non-git, revision=non-git)"
> > 2018-08-30T15:41:46.457712-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.797969232Z caller=main.go:226 build_context="(go=go1.9.4, user=abuild@lamb69, date=20180513-03:46:03)"
> > 2018-08-30T15:41:46.457719-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.798008802Z caller=main.go:227 host_details="(Linux 4.12.14-lp150.12.4-default #1 SMP Tue May 22 05:17:22 UTC 2018 (66b2eda) x86_64 torcello (none))"
> > 2018-08-30T15:41:46.457726-03:00 torcello pr
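A quick way to run Eugen's grep suggestion on all storage minions at once from the master is sketched below; it only assumes the same salt targeting already used in this thread (-I 'roles:storage') and that /var/log/messages is the syslog on the minions:

###
# Run the suggested greps on every storage minion from the salt master
salt -I 'roles:storage' cmd.run 'grep -C5 ceph-disk /var/log/messages | tail -n 50'
salt -I 'roles:storage' cmd.run 'grep -C5 sda4 /var/log/messages | tail -n 50'
###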
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
o systemd[2295]: Reached target Timers.
2018-08-30T15:44:15.511664-03:00 torcello systemd[2295]: Reached target Paths.
2018-08-30T15:44:15.517873-03:00 torcello systemd[2295]: Listening on D-Bus User Message Bus Socket.
2018-08-30T15:44:15.518060-03:00 torcello systemd[2295]: Reached target Sockets.
2018-08-30T15:44:15.518216-03:00 torcello systemd[2295]: Reached target Basic System.
2018-08-30T15:44:15.518373-03:00 torcello systemd[2295]: Reached target Default.
2018-08-30T15:44:15.518501-03:00 torcello systemd[2295]: Startup finished in 31ms.
2018-08-30T15:44:15.518634-03:00 torcello systemd[1]: Started User Manager for UID 1000.
2018-08-30T15:44:15.518759-03:00 torcello systemd[1792]: Received SIGRTMIN+24 from PID 2300 (kill).
2018-08-30T15:44:15.537634-03:00 torcello systemd[1]: Stopped User Manager for UID 464.
2018-08-30T15:44:15.538422-03:00 torcello systemd[1]: Removed slice User Slice of sddm.
2018-08-30T15:44:15.613246-03:00 torcello systemd[2295]: Started D-Bus User Message Bus.
2018-08-30T15:44:15.623989-03:00 torcello dbus-daemon[2311]: [session uid=1000 pid=2311] Successfully activated service 'org.freedesktop.systemd1'
2018-08-30T15:44:16.447162-03:00 torcello kapplymousetheme[2350]: kcm_input: Using X11 backend
2018-08-30T15:44:16.901642-03:00 torcello node_exporter[807]: time="2018-08-30T15:44:16-03:00" level=error msg="ERROR: ntp collector failed after 0.000205s: couldn't get SNTP reply: read udp 127.0.0.1:53434-> 127.0.0.1:123: read: connection refused" source="collector.go:123"

Any ideas?

Thanks a lot,

Jones

On Thu, Aug 30, 2018 at 4:14 AM Eugen Block wrote:
> Hi,
>
> > So, it only contains logs concerning the node itself (is that correct? Since node01 is also the master, I was expecting it to have logs from the others too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have available, and nothing "shines out" (sorry for my poor English) as a possible error.
>
> the logging is not configured to be centralised per default, you would have to configure that yourself.
>
> Regarding the OSDs, if there are OSD logs created, they're created on the OSD nodes, not on the master. But since the OSD deployment fails, there probably are no OSD-specific logs yet. So you'll have to take a look into the syslog (/var/log/messages), that's where the salt-minion reports its attempts to create the OSDs. Chances are high that you'll find the root cause in here.
>
> If the output is not enough, set the log-level to debug:
>
> osd-1:~ # grep -E "^log_level" /etc/salt/minion
> log_level: debug
>
> Regards,
> Eugen
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Sorry for the delay in answering.
> >
> > Just looked in the /var/log/ceph/ directory. It only contains the following files (for example on node01):
> >
> > ###
> > # ls -lart
> > total 3864
> > -rw--- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
> > drwxr-xr-x 1 root root     898 ago 28 10:07 ..
> > -rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
> > -rw--- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
> > -rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
> > -rw--- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
> > drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
> > -rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
> > -rw--- 1 ceph ceph  175229 ago 29 12:48 ceph.log
> > -rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
> > ###
> >
> > So, it only contains logs concerning the node itself (is that correct? Since node01 is also the master, I was expecting it to have logs from the others too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have available, and nothing "shines out" (sorry for my poor English) as a possible error.
> >
> > Any suggestion on how to proceed?
> >
> > Thanks a lot in advance,
> >
> > Jones
> >
> > On Mon, Aug 27, 2018 at 5:29 AM Eugen Block wrote:
> >
> >> Hi Jones,
> >>
> >> all ceph logs are in the directory /var/log/ceph/, each daemon has its own log file, e.g. OSD logs are named ceph-osd.*.
> >>
> >> I haven't tried it but I don't think SUSE Enterprise Storage deploys OSDs on partitioned disks. Is there a way to attach a second disk to the OSD nodes, maybe via USB or something?
> >>
> >> Although this thread is ceph related it is referring to a specific produ
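To follow Eugen's debug-logging suggestion on all nodes and then re-run the failing step, a minimal sketch, assuming the stock /etc/salt/minion layout, the DeepSea stage name ceph.stage.3, and the default salt-minion log path /var/log/salt/minion:

###
# On each OSD node (or pushed out via salt): enable debug logging and restart the minion
sed -i 's/^#\?log_level_logfile:.*/log_level_logfile: debug/' /etc/salt/minion
systemctl restart salt-minion

# On the master: re-run the OSD deployment stage
salt-run state.orch ceph.stage.3

# Meanwhile, on an OSD node, watch what the minion does during the run
tail -f /var/log/salt/minion /var/log/messages
###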
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
Hi Eugen.

Sorry for the delay in answering.

Just looked in the /var/log/ceph/ directory. It only contains the following files (for example on node01):

###
# ls -lart
total 3864
-rw--- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root     898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw--- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw--- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw--- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
###

So, it only contains logs concerning the node itself (is that correct? Since node01 is also the master, I was expecting it to have logs from the others too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have available, and nothing "shines out" (sorry for my poor English) as a possible error.

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones

On Mon, Aug 27, 2018 at 5:29 AM Eugen Block wrote:
> Hi Jones,
>
> all ceph logs are in the directory /var/log/ceph/, each daemon has its own log file, e.g. OSD logs are named ceph-osd.*.
>
> I haven't tried it but I don't think SUSE Enterprise Storage deploys OSDs on partitioned disks. Is there a way to attach a second disk to the OSD nodes, maybe via USB or something?
>
> Although this thread is ceph related it is referring to a specific product, so I would recommend to post your question in the SUSE forum [1].
>
> Regards,
> Eugen
>
> [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Thanks for the suggestion. I'll look for the logs (since it's our first attempt with ceph, I'll have to discover where they are, but no problem).
> >
> > One thing called my attention in your response, however: I hadn't made myself clear, but one of the problems we encountered was that the files now containing:
> >
> > node02:
> >     --
> >     storage:
> >         --
> >         osds:
> >             --
> >             /dev/sda4:
> >                 --
> >                 format:
> >                     bluestore
> >                 standalone:
> >                     True
> >
> > were originally empty, and we filled them by hand following a model found elsewhere on the web. It was necessary so that we could continue, but the model indicated that, for example, it should have the path for /dev/sda here, not /dev/sda4. We chose to include the specific partition identification because we won't have dedicated disks here, rather just the very same partition, as all disks were partitioned exactly the same.
> >
> > While that was enough for the procedure to continue at that point, now I wonder if it was the right call and, if it indeed was, if it was done properly. As such, I wonder: what do you mean by "wipe" the partition here? /dev/sda4 is created, but is both empty and unmounted: should a different operation be performed on it, should I remove it first, or should I have written the files above with only /dev/sda as target?
> >
> > I know that I probably wouldn't run into these issues with dedicated disks, but unfortunately that is absolutely not an option.
> >
> > Thanks a lot in advance for any comments and/or extra suggestions.
> >
> > Sincerely yours,
> >
> > Jones
> >
> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block wrote:
> >
> >> Hi,
> >>
> >> take a look into the logs, they should point you in the right direction. Since the deployment stage fails at the OSD level, start with the OSD logs. Something's not right with the disks/partitions, did you wipe the partition from previous attempts?
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Jones de Andrade :
> >>
> >>> (Please forgive my previous email: I was using another message and completely forgot to update the subject)
> >>>
> >>> Hi all.
> >>>
> >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a w
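As for what "wiping" the partition usually means in practice: the sketch below clears any leftover signatures from previous attempts. It assumes /dev/sda4 really is the spare partition and that nothing on it is needed (the commands are destructive):

###
# Remove old filesystem/RAID/ceph signatures from the spare partition
wipefs --all /dev/sda4

# Zero the first part of the partition to clear any leftover metadata
dd if=/dev/zero of=/dev/sda4 bs=1M count=100 oflag=direct

# (Only if a whole disk were dedicated to Ceph would you zap the full disk,
#  e.g. sgdisk --zap-all /dev/sdX)
###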
Re: [ceph-users] Ceph-Deploy error on 15/71 stage
Hi Eugen.

Thanks for the suggestion. I'll look for the logs (since it's our first attempt with ceph, I'll have to discover where they are, but no problem).

One thing called my attention in your response, however: I hadn't made myself clear, but one of the problems we encountered was that the files now containing:

node02:
    --
    storage:
        --
        osds:
            --
            /dev/sda4:
                --
                format:
                    bluestore
                standalone:
                    True

were originally empty, and we filled them by hand following a model found elsewhere on the web. It was necessary so that we could continue, but the model indicated that, for example, it should have the path for /dev/sda here, not /dev/sda4. We chose to include the specific partition identification because we won't have dedicated disks here, rather just the very same partition, as all disks were partitioned exactly the same.

While that was enough for the procedure to continue at that point, now I wonder if it was the right call and, if it indeed was, if it was done properly. As such, I wonder: what do you mean by "wipe" the partition here? /dev/sda4 is created, but is both empty and unmounted: should a different operation be performed on it, should I remove it first, or should I have written the files above with only /dev/sda as target?

I know that I probably wouldn't run into these issues with dedicated disks, but unfortunately that is absolutely not an option.

Thanks a lot in advance for any comments and/or extra suggestions.

Sincerely yours,

Jones

On Sat, Aug 25, 2018 at 5:46 PM Eugen Block wrote:
> Hi,
>
> take a look into the logs, they should point you in the right direction. Since the deployment stage fails at the OSD level, start with the OSD logs. Something's not right with the disks/partitions, did you wipe the partition from previous attempts?
>
> Regards,
> Eugen
>
> Zitat von Jones de Andrade :
>
> > (Please forgive my previous email: I was using another message and completely forgot to update the subject)
> >
> > Hi all.
> >
> > I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :)
> >
> > When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going up to here:
> >
> > ###
> > [14/71] ceph.sysctl on
> >          node01... ✓ (0.5s)
> >          node02 ✓ (0.7s)
> >          node03... ✓ (0.6s)
> >          node04. ✓ (0.5s)
> >          node05... ✓ (0.6s)
> >          node06.. ✓ (0.5s)
> >
> > [15/71] ceph.osd on
> >          node01.. ❌ (0.7s)
> >          node02 ❌ (0.7s)
> >          node03... ❌ (0.7s)
> >          node04. ❌ (0.6s)
> >          node05... ❌ (0.6s)
> >          node06.. ❌ (0.7s)
> >
> > Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
> >
> > Failures summary:
> >
> > ceph.osd (/srv/salt/ceph/osd):
> >   node02:
> >     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
> >   node03:
> >     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
> >   node01:
> >     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
> >   node04:
> >     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
> >   node05:
> >     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
> >   node06:
> >     deploy OSDs: Module function osd.deploy threw an exception.
> >     Exception: Mine on node06 for cephdisks.list
> > ###
> >
> > Since this is a first attempt on 6 simple test machines, we are going to put the mon, osds, etc., on all nodes at first. Only the master is left on a single machine (node01) for now.
> >
> > As they are simple machines, they have a single hdd, which is par
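For comparison, a hand-written DeepSea storage profile normally points at a whole, empty disk rather than a partition. The sketch below shows both forms, assuming the minion profile files under profile-default/stack/default/ceph/minions/ use the same keys that the pillar.get output in this thread shows; the /dev/sdb entry is purely illustrative, since there is no second disk in this setup:

###
# profile-default/stack/default/ceph/minions/node02.yml (sketch)
ceph:
  storage:
    osds:
      # whole-disk form DeepSea expects:
      /dev/sdb:
        format: bluestore
      # partition form used in this thread (may not be supported by stage.3):
      /dev/sda4:
        format: bluestore
        standalone: True
###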
[ceph-users] Ceph-Deploy error on 15/71 stage
(Please forgive my previous email: I was using another message and completely forgot to update the subject)

Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going up to here:

###
[14/71] ceph.sysctl on
         node01... ✓ (0.5s)
         node02 ✓ (0.7s)
         node03... ✓ (0.6s)
         node04. ✓ (0.5s)
         node05... ✓ (0.6s)
         node06.. ✓ (0.5s)

[15/71] ceph.osd on
         node01.. ❌ (0.7s)
         node02 ❌ (0.7s)
         node03... ❌ (0.7s)
         node04. ❌ (0.6s)
         node05... ❌ (0.6s)
         node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
  node03:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
  node01:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
  node04:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
  node05:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
  node06:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
###

Since this is a first attempt on 6 simple test machines, we are going to put the mon, osds, etc., on all nodes at first. Only the master is left on a single machine (node01) for now.

As they are simple machines, they have a single hdd, which is partitioned as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
    --
    storage:
        --
        osds:
            --
            /dev/sda4:
                --
                format:
                    bluestore
                standalone:
                    True

(and so on for all 6 machines)
###

Finally, and just in case, my policy.cfg file reads:

###
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
###

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
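The "Mine on nodeXX for cephdisks.list" exception, together with the empty cephdisks.list output above, suggests the salt mine simply has no usable disk data to report for these minions. A quick way to confirm, assuming standard salt mine behaviour (cephdisks.list itself is the DeepSea module already used above):

###
# Refresh the mine data on the storage minions, then see what it holds
salt -I 'roles:storage' mine.update
salt -I 'roles:storage' mine.get '*' cephdisks.list

# Compare with what the module reports when called directly
salt -I 'roles:storage' cephdisks.list
###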
Re: [ceph-users] Mimic prometheus plugin -no socket could be created
Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and 2 that I could solve myself, now it seems that I have hit a wall harder than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it going up to here:

###
[14/71] ceph.sysctl on
         node01... ✓ (0.5s)
         node02 ✓ (0.7s)
         node03... ✓ (0.6s)
         node04. ✓ (0.5s)
         node05... ✓ (0.6s)
         node06.. ✓ (0.5s)

[15/71] ceph.osd on
         node01.. ❌ (0.7s)
         node02 ❌ (0.7s)
         node03... ❌ (0.7s)
         node04. ❌ (0.6s)
         node05... ❌ (0.6s)
         node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
  node03:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
  node01:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
  node04:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
  node05:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
  node06:
    deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
###

Since this is a first attempt on 6 simple test machines, we are going to put the mon, osds, etc., on all nodes at first. Only the master is left on a single machine (node01) for now.

As they are simple machines, they have a single hdd, which is partitioned as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
    --
    storage:
        --
        osds:
            --
            /dev/sda4:
                --
                format:
                    bluestore
                standalone:
                    True

(and so on for all 6 machines)
###

Finally, and just in case, my policy.cfg file reads:

###
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
###

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones

On Thu, Aug 23, 2018 at 2:46 PM John Spray wrote:
> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia wrote:
> >
> > Hi All,
> >
> > I am trying to enable the prometheus plugin, with no success, due to "no socket could be created".
> >
> > The instructions for enabling the plugin are very straightforward and simple.
> >
> > Note: my ultimate goal is to use Prometheus with Cephmetrics. Some of you suggested deploying ceph-exporter, but why do we need to do that when there is a plugin already?
> >
> > How can I troubleshoot this further?
> > Unhandled exception from module 'prometheus' while running on mgr.mon01: error('No socket could be created',)
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 prometheus.serve:
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1 Traceback (most recent call last):
> > Aug 23 12:03:06 mon01 ceph-mgr:   File "/usr/lib64/ceph/mgr/prometheus/module.py", line 720, in serve
> > Aug 23 12:03:06 mon01 ceph-mgr:     cherrypy.engine.start()
> > Aug 23 12:03:06 mon01 ceph-mgr:   File "/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in start
> > Aug 23 12:03:06 mon01 ceph-mgr:     raise e_info
> > Aug 23 12:03:06 mon01 ceph-mgr: ChannelFailures: error('No socket could be created',)
>
> The things I usually check if a process can't create a socket are:
> - is
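In case it helps later readers of this thread: "No socket could be created" from the mgr prometheus module is often just a bind problem (the port is already in use, or the configured address cannot be bound on that host). A sketch of the knobs involved, assuming a Mimic-style config database and the default port 9283:

###
# Bind explicitly to all IPv4 addresses on the default port
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283

# Restart the module and check that something is now listening
ceph mgr module disable prometheus
ceph mgr module enable prometheus
ss -tlnp | grep 9283
###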
[ceph-users] Looking for some advise on distributed FS: Is Ceph the right option for me?
Hi all.

I'm looking for some information on several distributed filesystems for our application. It looks like it has finally come down to two candidates, Ceph being one of them. But there are still a few questions about it that I would really like to clarify, if possible.

Our plan, initially on 6 workstations, is to have them hosting a distributed file system that can withstand two simultaneous computer failures without data loss (something reminiscent of RAID 6, but over the network). This file system will also need to be remotely mounted (NFS server with fallbacks) by 5+ other computers. Students will be working on all 11+ computers at the same time (different requirements from different software: some use many small files, others a few really big files, 100s of GB), and absolutely no hardware modifications are allowed. This initial test bed is for undergraduate student usage, but if successful it will also be employed for our small clusters. The connection is simple GbE.

Our actual concerns are:

1) Data resilience: it seems that two copies of each block is the standard setting, is that correct? As such, will it stripe/parity data among three computers for each block?

2) Metadata resilience: we have seen that we can now have more than a single metadata server (which was a show-stopper in previous versions). However, do they have to be dedicated boxes, or can they share boxes with the data servers? Can it be configured in such a way that even if two metadata server computers fail, the whole system's data will still be accessible from the remaining computers without interruptions, or do they serve different data, aiming only at performance?

3) Compatibility with other software: we have seen that there is some NFS incompatibility, is that correct? Also, any POSIX issues?

4) No single (or double) point of failure: every single possible instance has to be able to endure a *double* failure (yes, things can take time to be fixed here). Does Ceph need a single master server for any of its activities? Can it endure a double failure? How long would any sort of "fallback" take to complete, and would users need to wait to regain access?

I think that covers the initial questions we have. Sorry if this is the wrong list, however.

Looking forward to any answer or suggestion,

Regards,

Jones

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
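On the data-resilience question specifically, the relevant knobs in Ceph are the pool's replica count (or erasure-code profile) and the CRUSH failure domain. A sketch of how "survive two simultaneous host failures" is usually expressed, with illustrative pool names and PG counts (64):

###
# Replicated pool: 3 copies spread across hosts; data survives the loss of two hosts
ceph osd pool create cephfs_data 64 64 replicated
ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_data min_size 2

# Erasure-coded alternative: k=4, m=2 tolerates two lost hosts (needs at least 6 hosts)
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ec42
###

Note that with size 3 / min_size 2, data stays safe through two simultaneous failures, but I/O on the affected placement groups pauses while only one copy remains, until recovery restores the second copy.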