Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-09-04 Thread Jones de Andrade
Hi Eugen.

Just tried everything again here, this time removing the sda4 partitions and
leaving the space unallocated so that either salt-run proposal-populate or
salt-run state.orch ceph.stage.configure could find the free space on the
disks to work with: unsuccessful again. :(
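
For reference, a minimal sketch of that sequence (the proposals path is
assumed to be the DeepSea default under /srv/pillar/ceph/proposals, matching
the policy.cfg entries shown earlier in this thread; adjust node names as
needed):

###
## re-run discovery so the disks are re-scanned on every minion
# salt-run state.orch ceph.stage.discovery

## check what storage profile was actually generated for one node
# ls /srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/
# cat /srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/node02*.yml

## then re-apply the configuration before trying the deploy stage again
# salt-run state.orch ceph.stage.configure
###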

Just to make things clear: are you telling me that it is completely
impossible to have a ceph "volume" on non-dedicated devices, sharing space
with, for instance, the nodes' swap, boot or main partition?

And that the only possible way to have a functioning distributed ceph
filesystem would be to have, in each node, at least one disk dedicated to
the operating system and another, independent disk dedicated to the ceph
filesystem?

That would be an awful setback for our plans if true, but if there is no
other way, we will just have to give up. Please just answer these two
questions clearly before we capitulate?  :(

Anyway, thanks a lot, once again,

Jones

On Mon, Sep 3, 2018 at 5:39 AM Eugen Block  wrote:

> Hi Jones,
>
> I still don't think creating an OSD on a partition will work. The
> reason is that SES creates an additional partition per OSD resulting
> in something like this:
>
> vdb   253:16   05G  0 disk
> ├─vdb1253:17   0  100M  0 part /var/lib/ceph/osd/ceph-1
> └─vdb2253:18   0  4,9G  0 part
>
> Even with external block.db and wal.db on additional devices you would
> still need two partitions for the OSD. I'm afraid with your setup this
> can't work.
>
> Regards,
> Eugen
>
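
For reference, a minimal sketch of how that two-partition layout can be
checked on a node where an OSD did get deployed (device and OSD id are only
examples, assuming a ceph-disk style bluestore OSD like the one in the lsblk
output above):

###
## the small partition is mounted as the OSD directory...
# lsblk
# mount | grep /var/lib/ceph/osd

## ...and its "block" entry points at the second, larger data partition
# ls -l /var/lib/ceph/osd/ceph-1/
# readlink -f /var/lib/ceph/osd/ceph-1/block
###
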
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-31 Thread Jones de Andrade
-03:00 polar kernel: [3.036222] ata2.00:
configured for UDMA/133
2018-08-30T10:21:18.787469-03:00 polar kernel: [3.043916] scsi 1:0:0:0:
CD-ROMPLDS DVD+-RW DU-8A5LH 6D1M PQ: 0 ANSI: 5
2018-08-30T10:21:18.787470-03:00 polar kernel: [3.052087] usb 1-6: new
low-speed USB device number 2 using xhci_hcd
2018-08-30T10:21:18.787471-03:00 polar kernel: [3.063179] scsi 1:0:0:0:
Attached scsi generic sg1 type 5
2018-08-30T10:21:18.787472-03:00 polar kernel: [3.083566]  sda: sda1
sda2 sda3 sda4
2018-08-30T10:21:18.787472-03:00 polar kernel: [3.084238] sd 0:0:0:0:
[sda] Attached SCSI disk
2018-08-30T10:21:18.787473-03:00 polar kernel: [3.113065] sr 1:0:0:0:
[sr0] scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
2018-08-30T10:21:18.787475-03:00 polar kernel: [3.113068] cdrom:
Uniform CD-ROM driver Revision: 3.20
2018-08-30T10:21:18.787476-03:00 polar kernel: [3.113272] sr 1:0:0:0:
Attached scsi CD-ROM sr0
2018-08-30T10:21:18.787477-03:00 polar kernel: [3.213133] usb 1-6: New
USB device found, idVendor=413c, idProduct=2113
###

I'm trying to run the deploy again here, but I'm having some connection
issues today (possibly due to the heavy rain) that are affecting its
initial stages. If it succeeds, I'll send the output from
/var/log/messages on the minions right away.

Thanks a lot,

Jones

On Fri, Aug 31, 2018 at 4:00 AM Eugen Block  wrote:

> Hi,
>
> I'm not sure if there's a misunderstanding. You need to track the logs
> during the osd deployment step (stage.3), that is where it fails, and
> this is where /var/log/messages could be useful. Since the deployment
> failed you have no systemd-units (ceph-osd@.service) to log
> anything.
>
> Before running stage.3 again try something like
>
> grep -C5 ceph-disk /var/log/messages (or messages-201808*.xz)
>
> or
>
> grep -C5 sda4 /var/log/messages (or messages-201808*.xz)
>
> If that doesn't reveal anything run stage.3 again and watch the logs.
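
A minimal sketch of watching this live, in case it helps (two shells; the
grep pattern is only an example):

###
## on one of the OSD nodes, follow the syslog while the stage runs
# tail -f /var/log/messages | grep -iE 'ceph-disk|osd|sda4'
## (or follow the salt-minion journal instead: journalctl -fu salt-minion.service)

## on the admin node, in a second shell, re-run the failing stage
# salt-run state.orch ceph.stage.3
###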
>
> Regards,
> Eugen
>
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Ok, edited the file /etc/salt/minion, uncommented the "log_level_logfile"
> > line and set it to "debug" level.
> >
> > Turned off the computer, waited a few minutes so that the time frame
> would
> > stand out in the /var/log/messages file, and restarted the computer.
> >
> > Using vi I "grepped out" (awful wording) the reboot section. From that, I
> > also removed most of what seemed totally unrelated to ceph, salt,
> > minions, grafana, prometheus, whatever.
> >
> > I got the lines below. It does not seem to complain about anything that I
> > can see. :(
> >
> > 
> > 2018-08-30T15:41:46.455383-03:00 torcello systemd[1]: systemd 234 running
> > in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT
> +UTMP
> > +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS
> > +KMOD -IDN2 -IDN default-hierarchy=hybrid)
> > 2018-08-30T15:41:46.456330-03:00 torcello systemd[1]: Detected
> architecture
> > x86-64.
> > 2018-08-30T15:41:46.456350-03:00 torcello systemd[1]: nss-lookup.target:
> > Dependency Before=nss-lookup.target dropped
> > 2018-08-30T15:41:46.456357-03:00 torcello systemd[1]: Started Load Kernel
> > Modules.
> > 2018-08-30T15:41:46.456369-03:00 torcello systemd[1]: Starting Apply
> Kernel
> > Variables...
> > 2018-08-30T15:41:46.457230-03:00 torcello systemd[1]: Started
> Alertmanager
> > for prometheus.
> > 2018-08-30T15:41:46.457237-03:00 torcello systemd[1]: Started Monitoring
> > system and time series database.
> > 2018-08-30T15:41:46.457403-03:00 torcello systemd[1]: Starting NTP
> > client/server...
> >
> >
> >
> >
> >
> >
> > 2018-08-30T15:41:46.457425-03:00 torcello systemd[1]: Started Prometheus
> > exporter for machine metrics.
> > 2018-08-30T15:41:46.457706-03:00 torcello prometheus[695]: level=info
> > ts=2018-08-30T18:41:44.797896888Z caller=main.go:225 msg="Starting
> > Prometheus" version="(version=2.1.0, branch=non-git, revision=non-git)"
> > 2018-08-30T15:41:46.457712-03:00 torcello prometheus[695]: level=info
> > ts=2018-08-30T18:41:44.797969232Z caller=main.go:226
> > build_context="(go=go1.9.4, user=abuild@lamb69, date=20180513-03:46:03)"
> > 2018-08-30T15:41:46.457719-03:00 torcello prometheus[695]: level=info
> > ts=2018-08-30T18:41:44.798008802Z caller=main.go:227
> > host_details="(Linux 4.12.14-lp150.12.4-default #1 SMP Tue May 22
> > 05:17:22 UTC 2018 (66b2eda) x86_64 torcello (none))"
> > 2018-08-30T15:41:46.457726-03:00 torcello pr

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-30 Thread Jones de Andrade
o systemd[2295]: Reached target
Timers.
2018-08-30T15:44:15.511664-03:00 torcello systemd[2295]: Reached target
Paths.
2018-08-30T15:44:15.517873-03:00 torcello systemd[2295]: Listening on D-Bus
User Message Bus Socket.
2018-08-30T15:44:15.518060-03:00 torcello systemd[2295]: Reached target
Sockets.
2018-08-30T15:44:15.518216-03:00 torcello systemd[2295]: Reached target
Basic System.
2018-08-30T15:44:15.518373-03:00 torcello systemd[2295]: Reached target
Default.
2018-08-30T15:44:15.518501-03:00 torcello systemd[2295]: Startup finished
in 31ms.
2018-08-30T15:44:15.518634-03:00 torcello systemd[1]: Started User Manager
for UID 1000.
2018-08-30T15:44:15.518759-03:00 torcello systemd[1792]: Received
SIGRTMIN+24 from PID 2300 (kill).
2018-08-30T15:44:15.537634-03:00 torcello systemd[1]: Stopped User Manager
for UID 464.
2018-08-30T15:44:15.538422-03:00 torcello systemd[1]: Removed slice User
Slice of sddm.
2018-08-30T15:44:15.613246-03:00 torcello systemd[2295]: Started D-Bus User
Message Bus.
2018-08-30T15:44:15.623989-03:00 torcello dbus-daemon[2311]: [session
uid=1000 pid=2311] Successfully activated service 'org.freedesktop.systemd1'
2018-08-30T15:44:16.447162-03:00 torcello kapplymousetheme[2350]:
kcm_input: Using X11 backend
2018-08-30T15:44:16.901642-03:00 torcello node_exporter[807]:
time="2018-08-30T15:44:16-03:00" level=error msg="ERROR: ntp collector
failed after 0.000205s: couldn't get SNTP reply: read udp 127.0.0.1:53434->
127.0.0.1:123: read: connection refused" source="collector.go:123"


Any ideas?

Thanks a lot,

Jones

On Thu, Aug 30, 2018 at 4:14 AM Eugen Block  wrote:

> Hi,
>
> > So, it only contains logs concerning the node itself (is that correct?
> > since node01 is also the master, I was expecting it to have logs from the
> > others too) and, moreover, no ceph-osd* files. Also, I'm looking at the
> > logs I have available, and nothing "shines out" (sorry for my poor
> > english) as a possible error.
>
> the logging is not configured to be centralised per default, you would
> have to configure that yourself.
>
> Regarding the OSDs, if there are OSD logs created, they're created on
> the OSD nodes, not on the master. But since the OSD deployment fails,
> there probably are no OSD specific logs yet. So you'll have to take a
> look into the syslog (/var/log/messages), that's where the salt-minion
> reports its attempts to create the OSDs. Chances are high that you'll
> find the root cause in here.
>
> If the output is not enough, set the log-level to debug:
>
> osd-1:~ # grep -E "^log_level" /etc/salt/minion
> log_level: debug
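
As a sketch of the full sequence on a minion (log_level_logfile is the
matching option for the log file itself, and /var/log/salt/minion is Salt's
default minion log path; both are assumptions to verify on your install):

###
## raise the log level in /etc/salt/minion (either or both options)
# grep -E '^log_level' /etc/salt/minion
log_level: debug
log_level_logfile: debug

## restart the minion so the change takes effect, then watch its own log
## alongside /var/log/messages
# systemctl restart salt-minion.service
# tail -f /var/log/salt/minion
###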
>
>
> Regards,
> Eugen
>
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Sorry for the delay in answering.
> >
> > Just looked in the /var/log/ceph/ directory. It only contains the
> following
> > files (for example on node01):
> >
> > ###
> > # ls -lart
> > total 3864
> > -rw--- 1 ceph ceph 904 ago 24 13:11 ceph.audit.log-20180829.xz
> > drwxr-xr-x 1 root root 898 ago 28 10:07 ..
> > -rw-r--r-- 1 ceph ceph  189464 ago 28 23:59
> ceph-mon.node01.log-20180829.xz
> > -rw--- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
> > -rw-r--r-- 1 ceph ceph   48584 ago 29 00:00
> ceph-mgr.node01.log-20180829.xz
> > -rw--- 1 ceph ceph   0 ago 29 00:00 ceph.audit.log
> > drwxrws--T 1 ceph ceph 352 ago 29 00:00 .
> > -rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
> > -rw--- 1 ceph ceph  175229 ago 29 12:48 ceph.log
> > -rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
> > ###
> >
> > So, it only contains logs concerning the node itself (is that correct?
> > since node01 is also the master, I was expecting it to have logs from the
> > others too) and, moreover, no ceph-osd* files. Also, I'm looking at the
> > logs I have available, and nothing "shines out" (sorry for my poor
> > english) as a possible error.
> >
> > Any suggestion on how to proceed?
> >
> > Thanks a lot in advance,
> >
> > Jones
> >
> >
> > On Mon, Aug 27, 2018 at 5:29 AM Eugen Block  wrote:
> >
> >> Hi Jones,
> >>
> >> all ceph logs are in the directory /var/log/ceph/, each daemon has its
> >> own log file, e.g. OSD logs are named ceph-osd.*.
> >>
> >> I haven't tried it but I don't think SUSE Enterprise Storage deploys
> >> OSDs on partitioned disks. Is there a way to attach a second disk to
> >> the OSD nodes, maybe via USB or something?
> >>
> >> Although this thread is ceph related it is referring to a specific
> >> produ

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-29 Thread Jones de Andrade
Hi Eugen.

Sorry for the delay in answering.

Just looked in the /var/log/ceph/ directory. It only contains the following
files (for example on node01):

###
# ls -lart
total 3864
-rw--- 1 ceph ceph 904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root 898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw--- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw--- 1 ceph ceph   0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph 352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw--- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
###

So, it only contains logs concerning the node itself (is that correct? since
node01 is also the master, I was expecting it to have logs from the others
too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have
available, and nothing "shines out" (sorry for my poor english) as a
possible error.

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones


On Mon, Aug 27, 2018 at 5:29 AM Eugen Block  wrote:

> Hi Jones,
>
> all ceph logs are in the directory /var/log/ceph/, each daemon has its
> own log file, e.g. OSD logs are named ceph-osd.*.
>
> I haven't tried it but I don't think SUSE Enterprise Storage deploys
> OSDs on partitioned disks. Is there a way to attach a second disk to
> the OSD nodes, maybe via USB or something?
>
> Although this thread is ceph related it is referring to a specific
> product, so I would recommend to post your question in the SUSE forum
> [1].
>
> Regards,
> Eugen
>
> [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>
> Zitat von Jones de Andrade :
>
> > Hi Eugen.
> >
> > Thanks for the suggestion. I'll look for the logs (since it's our first
> > attempt with ceph, I'll have to discover where they are, but no problem).
> >
> > One thing called my attention on your response however:
> >
> > I haven't made myself clear, but one of the failures we encountered was
> > that the files now containing:
> >
> > node02:
> >--
> >storage:
> >--
> >osds:
> >--
> >/dev/sda4:
> >--
> >format:
> >bluestore
> >standalone:
> >True
> >
> > Were originally empty, and we filled them by hand following a model found
> > elsewhere on the web. It was necessary so that we could continue, but the
> > model indicated that, for example, it should have the path for /dev/sda
> > here, not /dev/sda4. We chose to include the specific partition
> > identification because we won't have dedicated disks here, rather just
> > the very same partition, as all disks were partitioned exactly the same.
> >
> > While that was enough for the procedure to continue at that point, now I
> > wonder if it was the right call and, if it indeed was, if it was done
> > properly.  As such, I wonder: what do you mean by "wipe" the partition here?
> > /dev/sda4 is created, but is both empty and unmounted: Should a different
> > operation be performed on it, should I remove it first, should I have
> > written the files above with only /dev/sda as target?
> >
> > I know that I probably wouldn't run into these issues with dedicated
> > disks, but unfortunately that is absolutely not an option.
> >
> > Thanks a lot in advance for any comments and/or extra suggestions.
> >
> > Sincerely yours,
> >
> > Jones
> >
> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> take a look into the logs, they should point you in the right direction.
> >> Since the deployment stage fails at the OSD level, start with the OSD
> >> logs. Something's not right with the disks/partitions, did you wipe
> >> the partition from previous attempts?
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Jones de Andrade :
> >>
> >>> (Please forgive my previous email: I was using another message and
> >>> completely forgot to update the subject)
> >>>
> >>> Hi all.
> >>>
> >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1
> >> and
> >>> 2 that I could solve myself, now it seems that I have hit a w

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-26 Thread Jones de Andrade
Hi Eugen.

Thanks for the suggestion. I'll look for the logs (since it's our first
attempt with ceph, I'll have to discover where they are, but no problem).

One thing called my attention on your response however:

I haven't made myself clear, but one of the failures we encountered was
that the files now containing:

node02:
--
storage:
--
osds:
--
/dev/sda4:
--
format:
bluestore
standalone:
True

Were originally empty, and we filled them by hand following a model found
elsewhere on the web. It was necessary so that we could continue, but the
model indicated that, for example, it should have the path for /dev/sda
here, not /dev/sda4. We chose to include the specific partition
identification because we won't have dedicated disks here, rather just the
very same partition, as all disks were partitioned exactly the same.
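
For clarity, a sketch of what one of those hand-filled files looks like as
plain YAML (the path is assumed from the policy.cfg entry
profile-default/stack/default/ceph/minions/*yml and the usual
/srv/pillar/ceph/proposals base; the exact file name and the presence of the
top-level ceph: key may differ):

###
# cat /srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/node02.yml
ceph:
  storage:
    osds:
      /dev/sda4:
        format: bluestore
        standalone: True
###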

While that was enough for the procedure to continue at that point, now I
wonder if it was the right call and, if it indeed was, if it was done
properly.  As such, I wonder: what do you mean by "wipe" the partition here?
/dev/sda4 is created, but is both empty and unmounted: Should a different
operation be performed on it, should I remove it first, should I have
written the files above with only /dev/sda as target?
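
For reference, a minimal sketch of what "wiping" usually amounts to, i.e.
clearing any leftover partition/filesystem signatures (destructive to
/dev/sda4, so double-check the device first; the ceph-volume variant is only
an option on Luminous or newer):

###
## remove any leftover filesystem / LVM / ceph signatures from the partition
# wipefs --all /dev/sda4

## or zero out the beginning of the partition
# dd if=/dev/zero of=/dev/sda4 bs=1M count=100 oflag=direct

## or let ceph do it
# ceph-volume lvm zap /dev/sda4
###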

I know that I probably wouldn't run into these issues with dedicated disks,
but unfortunately that is absolutely not an option.

Thanks a lot in advance for any comments and/or extra suggestions.

Sincerely yours,

Jones

On Sat, Aug 25, 2018 at 5:46 PM Eugen Block  wrote:

> Hi,
>
> take a look into the logs, they should point you in the right direction.
> Since the deployment stage fails at the OSD level, start with the OSD
> logs. Something's not right with the disks/partitions, did you wipe
> the partition from previous attempts?
>
> Regards,
> Eugen
>
> Zitat von Jones de Andrade :
>
> > (Please forgive my previous email: I was using another message and
> > completely forgot to update the subject)
> >
> > Hi all.
> >
> > I'm new to ceph, and after having serious problems in ceph stages 0, 1
> and
> > 2 that I could solve myself, now it seems that I have hit a wall harder
> > than my head. :)
> >
> > When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it
> > going up to here:
> >
> > ###
> > [14/71]   ceph.sysctl on
> >   node01... ✓ (0.5s)
> >   node02 ✓ (0.7s)
> >   node03... ✓ (0.6s)
> >   node04. ✓ (0.5s)
> >   node05... ✓ (0.6s)
> >   node06.. ✓ (0.5s)
> >
> > [15/71]   ceph.osd on
> >   node01.. ❌ (0.7s)
> >   node02 ❌ (0.7s)
> >   node03... ❌ (0.7s)
> >   node04. ❌ (0.6s)
> >   node05... ❌ (0.6s)
> >   node06.. ❌ (0.7s)
> >
> > Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
> >
> > Failures summary:
> >
> > ceph.osd (/srv/salt/ceph/osd):
> >   node02:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node02 for cephdisks.list
> >   node03:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node03 for cephdisks.list
> >   node01:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node01 for cephdisks.list
> >   node04:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node04 for cephdisks.list
> >   node05:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node05 for cephdisks.list
> >   node06:
> > deploy OSDs: Module function osd.deploy threw an exception.
> Exception:
> > Mine on node06 for cephdisks.list
> > ###
> >
> > Since this is a first attempt on 6 simple test machines, we are going to
> > put the mon, osds, etc. on all nodes at first. Only the master is kept on
> > a single machine (node01) for now.
> >
> > As they are simple machines, they have a single hdd, which is par

[ceph-users] Ceph-Deploy error on 15/71 stage

2018-08-24 Thread Jones de Andrade
(Please forgive my previous email: I was using another message and
completely forgot to update the subject)

Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
2 that I could solve myself, now it seems that I have hit a wall harder
than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it
going up to here:

###
[14/71]   ceph.sysctl on
  node01... ✓ (0.5s)
  node02 ✓ (0.7s)
  node03... ✓ (0.6s)
  node04. ✓ (0.5s)
  node05... ✓ (0.6s)
  node06.. ✓ (0.5s)

[15/71]   ceph.osd on
  node01.. ❌ (0.7s)
  node02 ❌ (0.7s)
  node03... ❌ (0.7s)
  node04. ❌ (0.6s)
  node05... ❌ (0.6s)
  node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node02 for cephdisks.list
  node03:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node03 for cephdisks.list
  node01:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node01 for cephdisks.list
  node04:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node04 for cephdisks.list
  node05:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node05 for cephdisks.list
  node06:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node06 for cephdisks.list
###

Since this is a first attempt on 6 simple test machines, we are going to
put the mon, osds, etc. on all nodes at first. Only the master is kept on a
single machine (node01) for now.

As they are simple machines, they have a single hdd, which is partitioned
as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00 465,8G  0 disk
├─sda1   8:10   500M  0 part /boot/efi
├─sda2   8:2016G  0 part [SWAP]
├─sda3   8:30  49,3G  0 part /
└─sda4   8:40   400G  0 part
sr0 11:01   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
--
storage:
--
osds:
--
/dev/sda4:
--
format:
bluestore
standalone:
True
(and so on for all 6 machines)
##
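
Since cephdisks.list above comes back empty for every node, a hedged sketch
of one thing worth checking is whether the Salt mine data that the error
message complains about is populated at all (standard Salt commands, not a
guaranteed fix):

###
## sync modules and refresh mine/pillar data on all minions
# salt '*' saltutil.sync_all
# salt '*' mine.update
# salt '*' saltutil.refresh_pillar

## then query the mine and the disk list again
# salt '*' mine.get '*' cephdisks.list
# salt -I 'roles:storage' cephdisks.list
###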

Finally and just in case, my policy.cfg file reads:

#
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mimic prometheus plugin -no socket could be created

2018-08-24 Thread Jones de Andrade
Hi all.

I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
2 that I could solve myself, now it seems that I have hit a wall harder
than my head. :)

When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it
going up to here:

###
[14/71]   ceph.sysctl on
  node01... ✓ (0.5s)
  node02 ✓ (0.7s)
  node03... ✓ (0.6s)
  node04. ✓ (0.5s)
  node05... ✓ (0.6s)
  node06.. ✓ (0.5s)

[15/71]   ceph.osd on
  node01.. ❌ (0.7s)
  node02 ❌ (0.7s)
  node03... ❌ (0.7s)
  node04. ❌ (0.6s)
  node05... ❌ (0.6s)
  node06.. ❌ (0.7s)

Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s

Failures summary:

ceph.osd (/srv/salt/ceph/osd):
  node02:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node02 for cephdisks.list
  node03:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node03 for cephdisks.list
  node01:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node01 for cephdisks.list
  node04:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node04 for cephdisks.list
  node05:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node05 for cephdisks.list
  node06:
deploy OSDs: Module function osd.deploy threw an exception. Exception:
Mine on node06 for cephdisks.list
###

Since this is a first attempt on 6 simple test machines, we are going to
put the mon, osds, etc. on all nodes at first. Only the master is kept on a
single machine (node01) for now.

As they are simple machines, they have a single hdd, which is partitioned
as follows (the sda4 partition is unmounted and left for the ceph system):

###
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda  8:00 465,8G  0 disk
├─sda1   8:10   500M  0 part /boot/efi
├─sda2   8:2016G  0 part [SWAP]
├─sda3   8:30  49,3G  0 part /
└─sda4   8:40   400G  0 part
sr0 11:01   3,7G  0 rom

# salt -I 'roles:storage' cephdisks.list
node01:
node02:
node03:
node04:
node05:
node06:

# salt -I 'roles:storage' pillar.get ceph
node02:
--
storage:
--
osds:
--
/dev/sda4:
--
format:
bluestore
standalone:
True
(and so on for all 6 machines)
##

Finally and just in case, my policy.cfg file reads:

#
#cluster-unassigned/cluster/*.sls
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##

Please, could someone help me and shed some light on this issue?

Thanks a lot in advance,

Regards,

Jones



On Thu, Aug 23, 2018 at 2:46 PM John Spray  wrote:

> On Thu, Aug 23, 2018 at 5:18 PM Steven Vacaroaia  wrote:
> >
> > Hi All,
> >
> > I am trying to enable prometheus plugin with no success due to "no
> socket could be created"
> >
> > The instructions for enabling the plugin are very straightforward and
> simple
> >
> > Note
> > My ultimate goal is to use Prometheus with Cephmetrics
> > Some of you suggested to deploy ceph-exporter but why do we need to do
> that when there is a plugin already ?
> >
> >
> > How can I troubleshoot this further ?
> >
> > Unhandled exception from module 'prometheus' while running on mgr.mon01:
> error('No socket could be created',)
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
> prometheus.serve:
> > Aug 23 12:03:06 mon01 ceph-mgr: 2018-08-23 12:03:06.615 7fadab50e700 -1
> Traceback (most recent call last):
> > Aug 23 12:03:06 mon01 ceph-mgr: File
> "/usr/lib64/ceph/mgr/prometheus/module.py", line 720, in serve
> > Aug 23 12:03:06 mon01 ceph-mgr: cherrypy.engine.start()
> > Aug 23 12:03:06 mon01 ceph-mgr: File
> "/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in
> start
> > Aug 23 12:03:06 mon01 ceph-mgr: raise e_info
> > Aug 23 12:03:06 mon01 ceph-mgr: ChannelFailures: error('No socket could
> be created',)
>
> The things I usually check if a process can't create a socket are:
>  - is 
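
As a generic, hedged checklist for a "No socket could be created" error from
the mgr prometheus module (assuming Mimic's config interface and the default
port 9283), the usual suspects are a bind address that does not exist on the
host or a port that is already in use:

###
## is something already listening on the module's port?
# ss -tlnp | grep 9283

## what address/port is the module configured to bind to?
# ceph config dump | grep prometheus

## point it at a reachable address (or 0.0.0.0) and bounce the module
# ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
# ceph config set mgr mgr/prometheus/server_port 9283
# ceph mgr module disable prometheus
# ceph mgr module enable prometheus
###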

[ceph-users] Looking for some advise on distributed FS: Is Ceph the right option for me?

2018-07-10 Thread Jones de Andrade
Hi all.

I'm looking for some information on several distributed filesystems for our
application.

It looks like it finally came down to two candidates, Ceph being one of
them. But there are still a few questions about it that I would really like
to clarify, if possible.

Our plan, initially on 6 workstations, is to have them hosting a distributed
file system that can withstand two simultaneous computer failures without
data loss (something reminiscent of RAID 6, but over the network).
This file system will also need to be remotely mounted (NFS server with
fallbacks) by 5+ other computers. Students will be working on all 11+
computers at the same time (different requirements from different software:
some use many small files, others a few really big files, 100s of GB), and
absolutely no hardware modifications are allowed. This initial test bed is
for undergraduate student usage, but if successful it will also be employed
for our small clusters. The connection is simple GbE.

Our actual concerns are:
1) Data Resilience: It seems that keeping two copies of each block is the
standard setting, is that correct? And, as such, will it stripe data plus
parity across three computers for each block (see the sketch after this
list)?

2) Metadata Resilience: We have seen that we can now have more than a single
Metadata Server (the lack of which was a show-stopper in previous versions).
However, do they have to be dedicated boxes, or can they share boxes with
the Data Servers? Can it be configured in such a way that even if two
metadata server computers fail the whole system's data will still be
accessible from the remaining computers, without interruptions, or do they
each hold different data, aiming only at performance?

3) Compatibility with other software: We have seen that there are NFS
incompatibilities, is that correct? Also, are there any POSIX issues?

4) No single (or double) point of failure: every single possible instance
has to be able to endure a *double* failure (yes, things can take time to
get fixed here). Does Ceph need a single master server for any of its
activities? Can it endure a double failure? How long would it take for any
sort of "fallback" to complete, and would users need to wait to regain
access?
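
On points 1 and 4, a minimal sketch of how tolerance for two simultaneous
failures is usually expressed in Ceph (assuming a Luminous-or-newer cluster
and 6 OSD hosts; pool names, PG counts and profile names are only examples):

###
## replicated pool: three copies, so data survives the loss of two hosts
## (writes pause below min_size until recovery, but nothing is lost)
# ceph osd pool create data_rep 64 64 replicated
# ceph osd pool set data_rep size 3

## or an erasure-coded pool: k=3 data + m=2 parity chunks per object,
## which also survives two failed hosts at lower space overhead
# ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
# ceph osd pool create data_ec 64 64 erasure ec32
# ceph osd pool set data_ec allow_ec_overwrites true

## monitors: an odd number (e.g. 5) keeps quorum with two of them down
# ceph quorum_status --format json-pretty
###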

I think that covers the initial questions we have. Sorry if this is the
wrong list, however.

Looking forward for any answer or suggestion,

Regards,

Jones
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com