Re: [ceph-users] Ceph + SAMBA (vfs_ceph)

2019-08-28 Thread Bill Sharer
Your Windows client is failing to authenticate when it tries to mount 
the share.  That could be a simple fix or hideously complicated 
depending on what type of Windows network you are running in.  Is this 
lab environment using a Windows server running as an Active Directory 
domain controller, or have you just been working with standalone installs 
of Linux and Windows in your lab?  Are your Windows installs simply 
based on a retail version of Windows Home, or do you have the Pro or 
Enterprise versions licensed?


If you are stuck with a Home-only version or simply want to do ad-hoc 
stuff without much further ado (probably why you have the security = USER 
stanza in your conf), then just look at using smbpasswd to create the 
password hashes necessary for SMB mounting.  This is necessary because 
Windows and Unix/Linux have different hashing schemes.  This Samba wiki 
link will probably be a good starting point for you.


https://wiki.samba.org/index.php/Setting_up_Samba_as_a_Standalone_Server
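As a rough sketch (assuming the Unix account is literally called "samba", to match 
your valid users line, and that you are on the default tdbsam backend):

useradd -M -s /sbin/nologin samba    # skip if the account already exists
smbpasswd -a samba                   # create the SMB password hash for that user
smbpasswd -e samba                   # make sure the account is enabled

smbclient -U samba //10.17.6.68/cephfs   # quick sanity check against your [cephfs] share

If that smbclient test works, the Windows side should get past 
NT_STATUS_LOGON_FAILURE with the same credentials.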

If you are on an Active Directory network, you will end up mucking around 
in a lot more config files in order to get your Linux boxes to join the 
directory as members and then authenticate against the domain 
controllers.  That can also be a somewhat simple thing, but it can get 
hairy if your organization has infosec in mind and has hardening 
procedures that it applied.  That's when you might be breaking out 
Wireshark and analyzing the exchanges between Linux and the DC to figure 
out what sort of insanity is going on in your IT department.  If you 
aren't the domain admin or aren't good friends with one who also knows 
Unix/Linux, you may never get anywhere.
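If it does turn out to be AD, the happy path (ignoring hardened environments) looks 
roughly like this - a sketch with placeholder realm/workgroup names, not a recipe:

# in smb.conf [global], instead of security = USER:
#   security = ADS
#   realm = EXAMPLE.LOCAL
#   workgroup = EXAMPLE
net ads join -U Administrator            # join the box to the domain
systemctl restart smbd nmbd winbindd     # service/unit names vary by distro
wbinfo -t                                # verify the machine trust account works

After that you would normally sort out idmap ranges and winbind in nsswitch.conf, 
which is where the Samba wiki's AD member pages come in.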


Bill Sharer



On 8/28/19 2:32 PM, Salsa wrote:

This is the result:

# testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section "[homes]"
Processing section "[cephfs]"
Processing section "[printers]"
Processing section "[print$]"
Loaded services file OK.
Server role: ROLE_STANDALONE

# Global parameters
[global]
load printers = No
netbios name = SAMBA-CEPH
printcap name = cups
security = USER
workgroup = CEPH
smbd: backgroundqueue = no
idmap config * : backend = tdb
cups options = raw
valid users = samba
...
[cephfs]
create mask = 0777
directory mask = 0777
guest ok = Yes
guest only = Yes
kernel share modes = No
path = /
read only = No
vfs objects = ceph
ceph: user_id = samba
ceph:config_file = /etc/ceph/ceph.conf


I cut off some parts I thought were not relevant.

--
Salsa

Sent with ProtonMail <https://protonmail.com> Secure Email.

‐‐‐ Original Message ‐‐‐
On Wednesday, August 28, 2019 3:09 AM, Konstantin Shalygin 
 wrote:





I'm running a ceph installation in a lab to evaluate it for production and I have 
a cluster running, but I need to mount it on different Windows servers and 
desktops. I created an NFS share and was able to mount it on my Linux desktop, 
but not on a Win 10 desktop. Since it seems that Windows Server 2016 is required 
to mount the NFS share, I quit that route and decided to try Samba.

I compiled a version of Samba that has this vfs_ceph module, but I can't set it 
up correctly. It seems I'm missing some user configuration as I've hit this 
error:

"
~$ smbclient -U samba.gw //10.17.6.68/cephfs_a
WARNING: The "syslog" option is deprecated
Enter WORKGROUP\samba.gw's password:
session setup failed: NT_STATUS_LOGON_FAILURE
"
Does anyone know of any good setup tutorial to follow?

This is my smb config so far:

# Global parameters
[global]
load printers = No
netbios name = SAMBA-CEPH
printcap name = cups
security = USER
workgroup = CEPH
smbd: backgroundqueue = no
idmap config * : backend = tdb
cups options = raw
valid users = samba

[cephfs]
create mask = 0777
directory mask = 0777
guest ok = Yes
guest only = Yes
kernel share modes = No
path = /
read only = No
vfs objects = ceph
ceph: user_id = samba
ceph:config_file = /etc/ceph/ceph.conf

Thanks



Your configuration seems correct, but the conf may or may not have special 
characters such as spaces or lower-case options. The first thing you should 
do is run `testparm -s` and paste the output here.




k




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] clock skew

2019-04-25 Thread Bill Sharer
If you are just syncing to the outside pool, the three hosts may end up 
latching on to different outside servers as their definitive sources.  
You might want to make one of the three a higher-priority source for the 
other two, and have only that one sync against the outside pool.  
Also, for hardware newer than about five years old, you might want to 
look at enabling the NIC clocks using LinuxPTP to keep clock jitter down 
inside your LAN.  I wrote this article on the Gentoo wiki on enabling 
PTP in chrony.


https://wiki.gentoo.org/wiki/Chrony_with_hardware_timestamping
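A sketch of what that might look like in chrony.conf - the 10.10.89.x addresses 
are placeholders for your three nodes:

# on the node you pick as the local reference (say ceph1):
pool 2.pool.ntp.org iburst
allow 10.10.89.0/24
local stratum 10

# on the other two nodes:
server 10.10.89.1 iburst prefer
pool 2.pool.ntp.org iburst

That way all three agree with each other even when their view of the outside 
pool drifts a little.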

Bill Sharer


On 4/25/19 6:33 AM, mj wrote:

Hi all,

On our three-node cluster, we have setup chrony for time sync, and 
even though chrony reports that it is synced to ntp time, at the same 
time ceph occasionally reports time skews that can last several hours.


See for example:


root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) 
luminous (stable)

root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
    mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s 
(latency 0.000591877s)

root@ceph2:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time : 0.00133 seconds slow of NTP time
Last offset : -0.00524 seconds
RMS offset  : 0.00524 seconds
Frequency   : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew    : 0.000 ppm
Root delay  : 0.00 seconds
Root dispersion : 0.00 seconds
Update interval : 1.4 seconds
Leap status : Normal
root@ceph2:~# 


For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced 
similarly with NTP on the two other nodes.


We don't understand this...

I have now injected mon_clock_drift_allowed 0.7, so at least we have 
HEALTH_OK again. (to stop upsetting my monitoring system)


But two questions:

- can anyone explain why this is happening? It looks as if ceph and 
NTP/chrony disagree on just how time-synced the servers are...


- how to determine the current clock skew from ceph's perspective? 
Because "ceph health detail" does not show it when the cluster is HEALTH_OK.
(I want to start monitoring it continuously, to see if I can find some 
sort of pattern)


Thanks!

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] assertion error trying to start mds server

2017-10-12 Thread Bill Sharer
After your comment about the dual mds servers I decided to just give up
trying to get the second restarted.  After eyeballing what I had on one
of the new Ryzen boxes for drive space, I decided to just dump the
filesystem.  That will also make things go faster if and when I flip
everything over to bluestore.  So far so good...  I just took a peek and
saw the files being owned by Mr root though.  Is there going to be an
ownership reset at some point or will I have to resolve that by hand?


On 10/12/2017 06:09 AM, John Spray wrote:
> On Thu, Oct 12, 2017 at 12:23 AM, Bill Sharer <bsha...@sharerland.com> wrote:
>> I was wondering if I can't get the second mds back up That offline
>> backward scrub check sounds like it should be able to also salvage what
>> it can of the two pools to a normal filesystem.  Is there an option for
>> that or has someone written some form of salvage tool?
> Yep, cephfs-data-scan can do that.
>
> To scrape the files out of a CephFS data pool to a local filesystem, do this:
> cephfs-data-scan scan_extents <data pool>   # this is discovering
> all the file sizes
> cephfs-data-scan scan_inodes --output-dir /tmp/my_output <data pool>
>
> The time taken by both these commands scales linearly with the number
> of objects in your data pool.
>
> This tool may not see the correct filename for recently created files
> (any file whose metadata is in the journal but not flushed), these
> files will go into a lost+found directory, named after their inode
> number.
>
> John
>
>> On 10/11/2017 07:07 AM, John Spray wrote:
>>> On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsha...@sharerland.com> wrote:
>>>> I've been in the process of updating my gentoo based cluster both with
>>>> new hardware and a somewhat postponed update.  This includes some major
>>>> stuff including the switch from gcc 4.x to 5.4.0 on existing hardware
>>>> and using gcc 6.4.0 to make better use of AMD Ryzen on the new
>>>> hardware.  The existing cluster was on 10.2.2, but I was going to
>>>> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
>>>> transitioning to bluestore on the osd's.
>>>>
>>>> The Ryzen units are slated to be bluestore based OSD servers if and when
>>>> I get to that point.  Up until the mds failure, they were simply cephfs
>>>> clients.  I had three OSD servers updated to 10.2.7-r1 (one is also a
>>>> MON) and had two servers left to update.  Both of these are also MONs
>>>> and were acting as a pair of dual active MDS servers running 10.2.2.
>>>> Monday morning I found out the hard way that an UPS one of them was on
>>>> has a dead battery.  After I fsck'd and came back up, I saw the
>>>> following assertion error when it was trying to start it's mds.B server:
>>>>
>>>>
>>>>  mdsbeacon(64162/B up:replay seq 3 v4699) v7  126+0+0 (709014160
>>>> 0 0) 0x7f6fb4001bc0 con 0x55f94779d
>>>> 8d0
>>>>  0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
>>>> function 'virtual void EImportStart::r
>>>> eplay(MDSRank*)' thread 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
>>>> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>>>>
>>>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x82) [0x55f93d64a122]
>>>>  2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>>>>  3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>>>>  4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>>>>  5: (()+0x74a4) [0x7f6fd009b4a4]
>>>>  6: (clone()+0x6d) [0x7f6fce5a598d]
>>>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>>>> needed to interpret this.
>>>>
>>>> --- logging levels ---
>>>>0/ 5 none
>>>>0/ 1 lockdep
>>>>0/ 1 context
>>>>1/ 1 crush
>>>>1/ 5 mds
>>>>1/ 5 mds_balancer
>>>>1/ 5 mds_locker
>>>>1/ 5 mds_log
>>>>1/ 5 mds_log_expire
>>>>1/ 5 mds_migrator
>>>>0/ 1 buffer
>>>>0/ 1 timer
>>>>0/ 1 filer
>>>>0/ 1 striper
>>>>0/ 1 objecter
>>>>0/ 5 rados
>>>>0/ 5 rbd
>>>>0/ 5 rbd_mirror
>>>>0/ 5 rbd_replay
>>>>0/ 5 journaler
>>>>0/ 5 objectcacher

Re: [ceph-users] assertion error trying to start mds server

2017-10-11 Thread Bill Sharer
I was wondering if I can't get the second mds back up.  That offline
backward scrub check sounds like it should be able to also salvage what
it can of the two pools to a normal filesystem.  Is there an option for
that or has someone written some form of salvage tool?

On 10/11/2017 07:07 AM, John Spray wrote:
> On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsha...@sharerland.com> wrote:
>> I've been in the process of updating my gentoo based cluster both with
>> new hardware and a somewhat postponed update.  This includes some major
>> stuff including the switch from gcc 4.x to 5.4.0 on existing hardware
>> and using gcc 6.4.0 to make better use of AMD Ryzen on the new
>> hardware.  The existing cluster was on 10.2.2, but I was going to
>> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
>> transitioning to bluestore on the osd's.
>>
>> The Ryzen units are slated to be bluestore based OSD servers if and when
>> I get to that point.  Up until the mds failure, they were simply cephfs
>> clients.  I had three OSD servers updated to 10.2.7-r1 (one is also a
>> MON) and had two servers left to update.  Both of these are also MONs
>> and were acting as a pair of dual active MDS servers running 10.2.2.
>> Monday morning I found out the hard way that an UPS one of them was on
>> has a dead battery.  After I fsck'd and came back up, I saw the
>> following assertion error when it was trying to start it's mds.B server:
>>
>>
>>  mdsbeacon(64162/B up:replay seq 3 v4699) v7  126+0+0 (709014160
>> 0 0) 0x7f6fb4001bc0 con 0x55f94779d
>> 8d0
>>  0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
>> function 'virtual void EImportStart::r
>> eplay(MDSRank*)' thread 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
>> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x82) [0x55f93d64a122]
>>  2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>>  3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>>  4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>>  5: (()+0x74a4) [0x7f6fd009b4a4]
>>  6: (clone()+0x6d) [0x7f6fce5a598d]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>>
>> --- logging levels ---
>>0/ 5 none
>>0/ 1 lockdep
>>0/ 1 context
>>1/ 1 crush
>>1/ 5 mds
>>1/ 5 mds_balancer
>>1/ 5 mds_locker
>>1/ 5 mds_log
>>1/ 5 mds_log_expire
>>1/ 5 mds_migrator
>>0/ 1 buffer
>>0/ 1 timer
>>0/ 1 filer
>>0/ 1 striper
>>0/ 1 objecter
>>0/ 5 rados
>>0/ 5 rbd
>>0/ 5 rbd_mirror
>>0/ 5 rbd_replay
>>0/ 5 journaler
>>0/ 5 objectcacher
>>0/ 5 client
>>0/ 5 osd
>>0/ 5 optracker
>>0/ 5 objclass
>>1/ 3 filestore
>>1/ 3 journal
>>0/ 5 ms
>>1/ 5 mon
>>0/10 monc
>>1/ 5 paxos
>>0/ 5 tp
>>1/ 5 auth
>>1/ 5 crypto
>>1/ 1 finisher
>>1/ 5 heartbeatmap
>>1/ 5 perfcounter
>>1/ 5 rgw
>>1/10 civetweb
>>1/ 5 javaclient
>>1/ 5 asok
>>1/ 1 throttle
>>0/ 0 refs
>>1/ 5 xio
>>1/ 5 compressor
>>1/ 5 newstore
>>1/ 5 bluestore
>>1/ 5 bluefs
>>1/ 3 bdev
>>1/ 5 kstore
>>4/ 5 rocksdb
>>4/ 5 leveldb
>>1/ 5 kinetic
>>1/ 5 fuse
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent 1
>>   max_new 1000
>>   log_file /var/log/ceph/ceph-mds.B.log
>>
>>
>>
>> When I was googling around, I ran into this Cern presentation and tried
>> out the offline backware scrubbing commands on slide 25 first:
>>
>> https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf
>>
>>
>> Both ran without any messages, so I'm assuming I have sane contents in
>> the cephfs_data and cephfs_metadata pools.  Still no luck getting things
>> restarted, so I tried the cephfs-journal-tool journal reset on slide
>> 23.  That didn't work either.  Just for giggles, I tried setting up the
>> two Ryzen boxes as new mds.C and mds.D servers which would run on
>> 10.2.7-r1 instead of using mds.A and mds.B (10.2.2).  The D server fails
>> with the same assert as

[ceph-users] assertion error trying to start mds server

2017-10-10 Thread Bill Sharer
I've been in the process of updating my gentoo based cluster both with
new hardware and a somewhat postponed update.  This includes some major
stuff including the switch from gcc 4.x to 5.4.0 on existing hardware
and using gcc 6.4.0 to make better use of AMD Ryzen on the new
hardware.  The existing cluster was on 10.2.2, but I was going to
10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
transitioning to bluestore on the osd's.

The Ryzen units are slated to be bluestore based OSD servers if and when
I get to that point.  Up until the mds failure, they were simply cephfs
clients.  I had three OSD servers updated to 10.2.7-r1 (one is also a
MON) and had two servers left to update.  Both of these are also MONs
and were acting as a pair of dual active MDS servers running 10.2.2. 
Monday morning I found out the hard way that a UPS one of them was on
has a dead battery.  After I fsck'd and came back up, I saw the
following assertion error when it was trying to start its mds.B server:


 mdsbeacon(64162/B up:replay seq 3 v4699) v7  126+0+0 (709014160
0 0) 0x7f6fb4001bc0 con 0x55f94779d
8d0
 0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
function 'virtual void EImportStart::r
eplay(MDSRank*)' thread 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x82) [0x55f93d64a122]
 2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
 3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
 4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
 5: (()+0x74a4) [0x7f6fd009b4a4]
 6: (clone()+0x6d) [0x7f6fce5a598d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-mds.B.log



When I was googling around, I ran into this Cern presentation and tried
out the offline backward scrubbing commands on slide 25 first:

https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf


Both ran without any messages, so I'm assuming I have sane contents in
the cephfs_data and cephfs_metadata pools.  Still no luck getting things
restarted, so I tried the cephfs-journal-tool journal reset on slide
23.  That didn't work either.  Just for giggles, I tried setting up the
two Ryzen boxes as new mds.C and mds.D servers which would run on
10.2.7-r1 instead of using mds.A and mds.B (10.2.2).  The D server fails
with the same assert as follows:


=== 132+0+1979520 (4198351460 0 1611007530) 0x7fffc4000a70 con
0x7fffe0013310
 0> 2017-10-09 13:01:31.571195 7fffd99f5700 -1 mds/journal.cc: In
function 'virtual void EImportStart::replay(MDSRank*)' thread
7fffd99f5700 time 2017-10-09 13:01:31.570608
mds/journal.cc: 2949: FAILED assert(mds->sessionmap.get_version() == cmapv)

 ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x80) [0x55b7ebc8]
 2: (EImportStart::replay(MDSRank*)+0x9ea) [0x55a5674a]
 3: (MDLog::_replay_thread()+0xe51) [0x559cef21]
 4: (MDLog::ReplayThread::entry()+0xd) [0x557778cd]
 5: (()+0x7364) [0x77bc5364]
 6: (clone()+0x6d) [0x76051ccd]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
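
For reference, the recovery sequence I was cribbing from (the jewel 
disaster-recovery docs plus slide 23 of that deck) goes roughly like this - 
treat it as a sketch, and export the journal before resetting anything:

cephfs-journal-tool journal export backup.bin       # keep a copy of the journal first
cephfs-journal-tool event recover_dentries summary  # salvage what it can into the metadata pool
cephfs-journal-tool journal reset                   # the step from slide 23
cephfs-table-tool all reset session                 # drop stale client sessions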


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Bill Sharer
If all the system disk does is handle the o/s (i.e. osd journals are on 
dedicated drives or on the osd drives themselves), no problem.  Just rebuild 
the system and copy the ceph.conf back in when you re-install ceph.  Keep a spare 
copy of your original fstab to keep your osd filesystem mounts straight.
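
A hedged sketch of the sort of thing worth stashing off-box ahead of time 
(paths assume a stock install):

tar czf /somewhere/safe/ceph-host-cfg.tgz /etc/ceph /etc/fstab /var/lib/ceph/bootstrap-osd
# after the OS reinstall and ceph package install:
tar xzf /somewhere/safe/ceph-host-cfg.tgz -C /
mount -a                       # bring the osd filesystems back per the old fstab
systemctl start ceph.target    # or /etc/init.d/ceph start, depending on your init system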


Just keep in mind that you are down 11 osds while that system drive gets 
rebuilt though.  It's safer to do 10 osds and then have a mirror set for 
the system disk.
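
If you go the mirror route, it boils down to something like this (device and 
partition names are placeholders):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.ext4 /dev/md0    # or whatever filesystem you prefer for the OS volume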


Bill Sharer


On 08/12/2016 03:33 PM, Ronny Aasen wrote:

On 12.08.2016 13:41, Félix Barbeira wrote:

Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At 
this moment we have ~10 servers DELL R730xd with 12x4TB SATA disks. 
The official ceph docs says:


"We recommend using a dedicated drive for the operating system and 
software, and one drive for each Ceph OSD Daemon you run on the host."


I could use for example 1 disk for the OS and 11 for OSD data. In the 
operating system I would run 11 daemons to control the OSDs. 
But...what happen to the cluster if the disk with the OS fails?? 
maybe the cluster thinks that 11 OSD failed and try to replicate all 
that data over the cluster...that sounds no good.


Should I use 2 disks for the OS making a RAID1? in this case I'm 
"wasting" 8TB only for ~10GB that the OS needs.


In all the docs that i've been reading says ceph has no unique single 
point of failure, so I think that this scenario must have a optimal 
solution, maybe somebody could help me.


Thanks in advance.

--
Félix Barbeira.

if you do not have dedicated slots on the back for OS disks, then I 
would recommend using SATADOM flash modules plugged directly into an 
internal SATA port in the machine. That saves you 2 slots for osd's, and 
they are quite reliable. You could even use 2 SD cards if your machine has 
the internal SD slot.


http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf

kind regards
Ronny Aasen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-07-28 Thread Bill Sharer
Removing osd.4 and still getting the scrub problems removes its drive 
from consideration as the culprit.  Try the same thing again for osd.16 
and then osd.28.


smartctl may not show anything out of sorts until the marginally bad 
sector or sectors finally go bad and get remapped.  The only hint may 
be buried in the raw read error rate, seek error rate or other error 
counts like ECC or CRC errors.  The long test you are running may or may 
not show any new information.
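
Something like this pulls the counters I'd be eyeballing - attribute names 
vary by vendor, so treat the grep pattern as approximate:

smartctl -A /dev/sdX | egrep -i 'raw_read|seek_err|crc|realloc|pending|uncorrect'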



Bill Sharer

On 07/28/2016 11:46 AM, c wrote:

Am 2016-07-28 15:26, schrieb Bill Sharer:

I suspect the data for one or more shards on this osd's underlying
filesystem has a marginally bad sector or sectors.  A read from the
deep scrub may be causing the drive to perform repeated seeks and
reads of the sector until it gets a good read from the filesystem.
You might want to look at the SMART info on the drive or drives in the
RAID set to see what the error counts suggest about this.  You may
also be looking at a drive that's about to fail.

Bill Sharer


Hello Bill,

thank you for reading and answering my eMail :)

As I wrote, I have already checked the disks via "smartctl"

- osd.4: http://slexy.org/view/s2LR5ncr8G
- osd.16: http://slexy.org/view/s2LH6FBcYP
- osd.28: http://slexy.org/view/s21Yod9dUw

There is now a long test (" smartctl --test long /dev/DISK ") running on 
all disks - to be really on the safe side. This will take a while.


There is no RAID used for the OSDs!

I forgot to mention that for a test I had removed (completely) 
"osd.4" from the cluster and ran " ceph pg deep-scrub 0.223 " 
again, with the same result (nearly all of my VMs stop working for a 
while).


- Mehmet



On 07/28/2016 08:46 AM, c wrote:

Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with "deep-scrub".

Always when ceph - or I manually - run a " ceph pg deep-scrub 0.223 
", this leads to many "slow/block requests" so that nearly all of my 
VMs stop working for a while.


This happens only to this one PG 0.223 and in combination with 
deep-scrub (!). All other Placement Groups where a deep-scrub occurs 
are fine. The mentioned PG also works fine when a "normal scrub" 
occurs.


These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes 
approximately 12 Minutes (End: 2016-07-28 12:56:31.891165)

- ceph.log: http://pastebin.com/FSY45VtM

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for the 
related OSDs 4,16 and 28


LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked the disks 4, 16 and 28 with smartctl and could not find any 
issues - also there are no odd "dmesg" messages.


*ceph -s*
cluster 98a410bf-b823-47e4-ad17-4543afa24992
 health HEALTH_OK
 monmap e2: 3 mons at 
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}

election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
 osdmap e7047: 30 osds: 30 up, 30 in
flags sortbitwise
  pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
8577 GB used, 96256 GB / 102 TB avail
1024 active+clean
  client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Software/OS*

- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"

#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


*Server*

3x OSD Server, each with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading

- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device 
for 10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
Cores, no Hyper-Threading

- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Each Server and Client has an active connection @ 1x 10GB; A 
second connection is also connected via 10GB but provides only a 
Backup connection when the active Switch fails - no LACP possible.

- We do not use Jumbo Frames yet..
- Public and Cluster-Network related Ceph traffic is going through 
this one active 10GB Interface on each Server.


Any ideas what is going o

Re: [ceph-users] ONE pg deep-scrub blocks cluster

2016-07-28 Thread Bill Sharer
I suspect the data for one or more shards on this osd's underlying 
filesystem has a marginally bad sector or sectors.  A read from the deep 
scrub may be causing the drive to perform repeated seeks and reads of 
the sector until it gets a good read from the filesystem.  You might 
want to look at the SMART info on the drive or drives in the RAID set to 
see what the error counts suggest about this.  You may also be looking 
at a drive that's about to fail.


Bill Sharer

On 07/28/2016 08:46 AM, c wrote:

Hello Ceph alikes :)

i have a strange issue with one PG (0.223) combined with "deep-scrub".

Always when ceph - or I manually - run a " ceph pg deep-scrub 0.223 ", 
this leads to many "slow/block requests" so that nearly all of my VMs 
stop working for a while.


This happens only to this one PG 0.223 and in combination with 
deep-scrub (!). All other Placement Groups where a deep-scrub occurs 
are fine. The mentioned PG also works fine when a "normal scrub" occurs.


These OSDs are involved:

#> ceph pg map 0.223
osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]

*The LogFiles*

"deep-scrub" starts @ 2016-07-28 12:44:00.588542 and takes 
approximately 12 Minutes (End: 2016-07-28 12:56:31.891165)

- ceph.log: http://pastebin.com/FSY45VtM

I have done " ceph tell osd injectargs '--debug-osd = 5/5' " for the 
related OSDs 4,16 and 28


LogFile - osd.4
- ceph-osd.4.log: http://slexy.org/view/s20zzAfxFH

LogFile - osd.16
- ceph-osd.16.log: http://slexy.org/view/s25H3Zvkb0

LogFile - osd.28
- ceph-osd.28.log: http://slexy.org/view/s21Ecpwd70

I have checked the disks 4, 16 and 28 with smartctl and could not find any 
issues - also there are no odd "dmesg" messages.


*ceph -s*
cluster 98a410bf-b823-47e4-ad17-4543afa24992
 health HEALTH_OK
 monmap e2: 3 mons at 
{monitor1=172.16.0.2:6789/0,monitor3=172.16.0.4:6789/0,monitor2=172.16.0.3:6789/0}

election epoch 38, quorum 0,1,2 monitor1,monitor2,monitor3
 osdmap e7047: 30 osds: 30 up, 30 in
flags sortbitwise
  pgmap v3253519: 1024 pgs, 1 pools, 2858 GB data, 692 kobjects
8577 GB used, 96256 GB / 102 TB avail
1024 active+clean
  client io 396 kB/s rd, 3141 kB/s wr, 55 op/s rd, 269 op/s wr

This is my Setup:

*Software/OS*

- Jewel
#> ceph tell osd.* version | grep version | uniq
"version": "ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374)"

#> ceph tell mon.* version
[...] ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

- Ubuntu 16.04 LTS on all OSD and MON Server
#> uname -a
Linux galawyn 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 
2016 x86_64 x86_64 x86_64 GNU/Linux


*Server*

3x OSD Server, each with
- 2x Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz ==> 12 Cores, no 
Hyper-Threading

- 64GB RAM
- 10x 4TB HGST 7K4000 SAS2 (6GB/s) Disks as OSDs
- 1x INTEL SSDPEDMD400G4 (Intel DC P3700 NVMe) as Journaling Device 
for 10-12 Disks

- 1x Samsung SSD 840/850 Pro only for the OS

3x MON Server
- Two of them with 1x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (4 
Cores, 8 Threads)
- The third one has 2x Intel(R) Xeon(R) CPU L5430  @ 2.66GHz ==> 8 
Cores, no Hyper-Threading

- 32 GB RAM
- 1x Raid 10 (4 Disks)

*Network*
- Each Server and Client has an active connection @ 1x 10GB; A second 
connection is also connected via 10GB but provides only a Backup 
connection when the active Switch fails - no LACP possible.

- We do not use Jumbo Frames yet..
- Public and Cluster-Network related Ceph traffic is going through 
this one active 10GB Interface on each Server.


Any ideas what is going on?
Can I provide more input to find a solution?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





Re: [ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY

2016-07-08 Thread Bill Sharer
Just for giggles I tried the rolling upgrade to 10.2.2 again today.  
This time I rolled mon.0 and osd.0 first while keeping the mds servers 
up and then rolled them before moving on to the other three.  No 
assertion failure this time since I guess I always had an mds active.  I 
wonder if I will have a problem though if I do a complete cold start of 
the cluster.


Bill Sharer


On 07/06/2016 04:19 PM, Bill Sharer wrote:
Manual downgrade to 10.2.0 put me back in business.  I'm going to mask 
10.2.2 and then try to let 10.2.1 emerge.


Bill Sharer

On 07/06/2016 02:16 PM, Bill Sharer wrote:
I noticed on that USE list that the 10.2.2 ebuild introduced a new 
cephfs emerge flag, so I enabled that and emerged everywhere again.  
The active mon is still crashing on the assertion though.



Bill Sharer

On 07/05/2016 08:14 PM, Bill Sharer wrote:

Relevant USE flags FWIW

# emerge -pv ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-cluster/ceph-10.2.2::gentoo  USE="fuse gtk 
jemalloc ldap libaio libatomic nss radosgw static-libs xfs 
-babeltrace -cephfs -cryptopp -debug -lttng -tcmalloc {-test} -zfs" 
PYTHON_TARGETS="python2_7 python3_4 -python3_5" 11,271 KiB



Bill Sharer


On 07/05/2016 01:45 PM, Gregory Farnum wrote:

Thanks for the report; created a ticket and somebody will get on it
shortly. http://tracker.ceph.com/issues/16592
-Greg

On Sun, Jul 3, 2016 at 5:55 PM, Bill Sharer 
<bsha...@sharerland.com> wrote:
I was working on  a rolling upgrade on Gentoo to Jewel 10.2.2 from 
10.2.0.
However now I can't get a monitor quorum going again because as 
soon as I

get one, the mon which wins the election blows out with an assertion
failure.  Here's my status at the moment

kroll1    10.2.2    ceph mon.0 and ceph osd.0    normally my lead mon
kroll2    10.2.2    ceph mon 1 and ceph osd 2
kroll3    10.2.2    ceph osd 1
kroll4    10.2.2    ceph mon 3 and ceph osd 3
kroll5    10.2.2    ceph mon 4 and ceph mds 2    normally my active mds
kroll6    10.2.0    ceph mon 5 and ceph mds B    normally standby mds

I had done rolling upgrade of everything but kroll6 and had 
rebooted the
first three osd and mon servers.  mds 2 went down during gentoo 
update of
kroll5 because of memory scarcity so mds B was the active mds 
server.  After

rebooting kroll4 I found that mon 0 had gone done with the assertion
failure.  I ended up stopping all ceph processes but desktops with 
client
mounts were all still up for the moment and basically would be 
stuck on

locks if I tried to access cephfs.

After trying to restart mons only beginning with mon 0 initially, the
following happened to mon.0 after enough mons were up for a quorum:

2016-07-03 16:34:26.555728 7fbff22f8480  1 leveldb: Recovering log 
#2592390
2016-07-03 16:34:26.555762 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

started
2016-07-03 16:34:26.558788 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

192 bytes OK
2016-07-03 16:34:26.562263 7fbff22f8480  1 leveldb: Delete type=3 
#2592388


2016-07-03 16:34:26.562364 7fbff22f8480  1 leveldb: Delete type=0 
#2592390


2016-07-03 16:34:26.563126 7fbff22f8480 -1 wrote monmap to
/etc/ceph/tmpmonmap
2016-07-03 17:09:25.753729 7f8291dff480  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), pro
cess ceph-mon, pid 20842
2016-07-03 17:09:25.762588 7f8291dff480  1 leveldb: Recovering log 
#2592398
2016-07-03 17:09:25.767722 7f8291dff480  1 leveldb: Delete type=0 
#2592398


2016-07-03 17:09:25.767803 7f8291dff480  1 leveldb: Delete type=3 
#2592396


2016-07-03 17:09:25.768600 7f8291dff480  0 starting mon.0 rank 0 at
192.168.2.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid
1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769066 7f8291dff480  1 mon.0@-1(probing) e10 
preinit

fsid 1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769923 7f8291dff480  1
mon.0@-1(probing).paxosservice(pgmap 17869652..17870289) refresh 
upgraded,

format 0 -> 1
2016-07-03 17:09:25.769947 7f8291dff480  1 mon.0@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2016-07-03 17:09:25.776148 7f8291dff480  0 mon.0@-1(probing).mds 
e1532

print_map
e1532
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor 
table}


Filesystem 'cephfs' (0)
fs_name cephfs
epoch   1530
flags   0
modified2016-05-19 01:21:31.953710
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure1478
last_failure_osd_epoch  26431
compat  compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor 
table}

max_mds 1
in  0
up  {0=11

Re: [ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY

2016-07-06 Thread Bill Sharer
Manual downgrade to 10.2.0 put me back in business.  I'm going to mask 
10.2.2 and then try to let 10.2.1 emerge.
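
For anyone following along, the mask amounts to something like this (a sketch; 
the exact atoms and keywording on your box may differ):

# /etc/portage/package.mask
=sys-cluster/ceph-10.2.2

emerge --ask --oneshot =sys-cluster/ceph-10.2.1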


Bill Sharer

On 07/06/2016 02:16 PM, Bill Sharer wrote:
I noticed on that USE list that the 10.2.2 ebuild introduced a new 
cephfs emerge flag, so I enabled that and emerged everywhere again.  
The active mon is still crashing on the assertion though.



Bill Sharer

On 07/05/2016 08:14 PM, Bill Sharer wrote:

Relevant USE flags FWIW

# emerge -pv ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-cluster/ceph-10.2.2::gentoo  USE="fuse gtk 
jemalloc ldap libaio libatomic nss radosgw static-libs xfs 
-babeltrace -cephfs -cryptopp -debug -lttng -tcmalloc {-test} -zfs" 
PYTHON_TARGETS="python2_7 python3_4 -python3_5" 11,271 KiB



Bill Sharer


On 07/05/2016 01:45 PM, Gregory Farnum wrote:

Thanks for the report; created a ticket and somebody will get on it
shortly. http://tracker.ceph.com/issues/16592
-Greg

On Sun, Jul 3, 2016 at 5:55 PM, Bill Sharer <bsha...@sharerland.com> 
wrote:
I was working on  a rolling upgrade on Gentoo to Jewel 10.2.2 from 
10.2.0.
However now I can't get a monitor quorum going again because as 
soon as I

get one, the mon which wins the election blows out with an assertion
failure.  Here's my status at the moment

kroll1    10.2.2    ceph mon.0 and ceph osd.0    normally my lead mon
kroll2    10.2.2    ceph mon 1 and ceph osd 2
kroll3    10.2.2    ceph osd 1
kroll4    10.2.2    ceph mon 3 and ceph osd 3
kroll5    10.2.2    ceph mon 4 and ceph mds 2    normally my active mds
kroll6    10.2.0    ceph mon 5 and ceph mds B    normally standby mds

I had done rolling upgrade of everything but kroll6 and had 
rebooted the
first three osd and mon servers.  mds 2 went down during gentoo 
update of
kroll5 because of memory scarcity so mds B was the active mds 
server.  After

rebooting kroll4 I found that mon 0 had gone done with the assertion
failure.  I ended up stopping all ceph processes but desktops with 
client
mounts were all still up for the moment and basically would be 
stuck on

locks if I tried to access cephfs.

After trying to restart mons only beginning with mon 0 initially, the
following happened to mon.0 after enough mons were up for a quorum:

2016-07-03 16:34:26.555728 7fbff22f8480  1 leveldb: Recovering log 
#2592390
2016-07-03 16:34:26.555762 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

started
2016-07-03 16:34:26.558788 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

192 bytes OK
2016-07-03 16:34:26.562263 7fbff22f8480  1 leveldb: Delete type=3 
#2592388


2016-07-03 16:34:26.562364 7fbff22f8480  1 leveldb: Delete type=0 
#2592390


2016-07-03 16:34:26.563126 7fbff22f8480 -1 wrote monmap to
/etc/ceph/tmpmonmap
2016-07-03 17:09:25.753729 7f8291dff480  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), pro
cess ceph-mon, pid 20842
2016-07-03 17:09:25.762588 7f8291dff480  1 leveldb: Recovering log 
#2592398
2016-07-03 17:09:25.767722 7f8291dff480  1 leveldb: Delete type=0 
#2592398


2016-07-03 17:09:25.767803 7f8291dff480  1 leveldb: Delete type=3 
#2592396


2016-07-03 17:09:25.768600 7f8291dff480  0 starting mon.0 rank 0 at
192.168.2.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid
1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769066 7f8291dff480  1 mon.0@-1(probing) e10 
preinit

fsid 1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769923 7f8291dff480  1
mon.0@-1(probing).paxosservice(pgmap 17869652..17870289) refresh 
upgraded,

format 0 -> 1
2016-07-03 17:09:25.769947 7f8291dff480  1 mon.0@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2016-07-03 17:09:25.776148 7f8291dff480  0 mon.0@-1(probing).mds e1532
print_map
e1532
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds

uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}

Filesystem 'cephfs' (0)
fs_name cephfs
epoch   1530
flags   0
modified2016-05-19 01:21:31.953710
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure1478
last_failure_osd_epoch  26431
compat  compat={},rocompat={},incompat={1=base v0.20,2=client 
writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds

uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in  0
up  {0=1190233}
failed
damaged
stopped
data_pools  0
metadata_pool   1
inline_data disabled
1190233:192.168.2.6:6800/5437 'B' mds.0.1526 up:active seq 
103145



Standby daemons:

1190222:192.168.2.5:6801/5871 '2' mds.-1.0 up:standby seq 
135114


2016-07-03 17:09:25.776444 7f8291dff480  0 mon.0@-1(probing).osd 
e26460

crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776450 7f8291df

Re: [ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY

2016-07-06 Thread Bill Sharer
I noticed on that USE list that the 10.2.2 ebuild introduced a new 
cephfs emerge flag, so I enabled that and emerged everywhere again.  The 
active mon is still crashing on the assertion though.



Bill Sharer

On 07/05/2016 08:14 PM, Bill Sharer wrote:

Relevant USE flags FWIW

# emerge -pv ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-cluster/ceph-10.2.2::gentoo  USE="fuse gtk 
jemalloc ldap libaio libatomic nss radosgw static-libs xfs -babeltrace 
-cephfs -cryptopp -debug -lttng -tcmalloc {-test} -zfs" 
PYTHON_TARGETS="python2_7 python3_4 -python3_5" 11,271 KiB



Bill Sharer


On 07/05/2016 01:45 PM, Gregory Farnum wrote:

Thanks for the report; created a ticket and somebody will get on it
shortly. http://tracker.ceph.com/issues/16592
-Greg

On Sun, Jul 3, 2016 at 5:55 PM, Bill Sharer <bsha...@sharerland.com> 
wrote:
I was working on  a rolling upgrade on Gentoo to Jewel 10.2.2 from 
10.2.0.
However now I can't get a monitor quorum going again because as soon 
as I

get one, the mon which wins the election blows out with an assertion
failure.  Here's my status at the moment

kroll1    10.2.2    ceph mon.0 and ceph osd.0    normally my lead mon
kroll2    10.2.2    ceph mon 1 and ceph osd 2
kroll3    10.2.2    ceph osd 1
kroll4    10.2.2    ceph mon 3 and ceph osd 3
kroll5    10.2.2    ceph mon 4 and ceph mds 2    normally my active mds
kroll6    10.2.0    ceph mon 5 and ceph mds B    normally standby mds

I had done rolling upgrade of everything but kroll6 and had rebooted 
the
first three osd and mon servers.  mds 2 went down during gentoo 
update of
kroll5 because of memory scarcity so mds B was the active mds 
server.  After

rebooting kroll4 I found that mon 0 had gone done with the assertion
failure.  I ended up stopping all ceph processes but desktops with 
client

mounts were all still up for the moment and basically would be stuck on
locks if I tried to access cephfs.

After trying to restart mons only beginning with mon 0 initially, the
following happened to mon.0 after enough mons were up for a quorum:

2016-07-03 16:34:26.555728 7fbff22f8480  1 leveldb: Recovering log 
#2592390
2016-07-03 16:34:26.555762 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

started
2016-07-03 16:34:26.558788 7fbff22f8480  1 leveldb: Level-0 table 
#2592397:

192 bytes OK
2016-07-03 16:34:26.562263 7fbff22f8480  1 leveldb: Delete type=3 
#2592388


2016-07-03 16:34:26.562364 7fbff22f8480  1 leveldb: Delete type=0 
#2592390


2016-07-03 16:34:26.563126 7fbff22f8480 -1 wrote monmap to
/etc/ceph/tmpmonmap
2016-07-03 17:09:25.753729 7f8291dff480  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), pro
cess ceph-mon, pid 20842
2016-07-03 17:09:25.762588 7f8291dff480  1 leveldb: Recovering log 
#2592398
2016-07-03 17:09:25.767722 7f8291dff480  1 leveldb: Delete type=0 
#2592398


2016-07-03 17:09:25.767803 7f8291dff480  1 leveldb: Delete type=3 
#2592396


2016-07-03 17:09:25.768600 7f8291dff480  0 starting mon.0 rank 0 at
192.168.2.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid
1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769066 7f8291dff480  1 mon.0@-1(probing) e10 
preinit

fsid 1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769923 7f8291dff480  1
mon.0@-1(probing).paxosservice(pgmap 17869652..17870289) refresh 
upgraded,

format 0 -> 1
2016-07-03 17:09:25.769947 7f8291dff480  1 mon.0@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2016-07-03 17:09:25.776148 7f8291dff480  0 mon.0@-1(probing).mds e1532
print_map
e1532
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds

uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}

Filesystem 'cephfs' (0)
fs_name cephfs
epoch   1530
flags   0
modified2016-05-19 01:21:31.953710
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure1478
last_failure_osd_epoch  26431
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds

uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in  0
up  {0=1190233}
failed
damaged
stopped
data_pools  0
metadata_pool   1
inline_data disabled
1190233:192.168.2.6:6800/5437 'B' mds.0.1526 up:active seq 
103145



Standby daemons:

1190222:192.168.2.5:6801/5871 '2' mds.-1.0 up:standby seq 
135114


2016-07-03 17:09:25.776444 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776450 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776453 7f8291dff480  0 mon.0@-1(probing).osd 

Re: [ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY

2016-07-05 Thread Bill Sharer

Relevant USE flags FWIW

# emerge -pv ceph

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-cluster/ceph-10.2.2::gentoo  USE="fuse gtk jemalloc 
ldap libaio libatomic nss radosgw static-libs xfs -babeltrace -cephfs 
-cryptopp -debug -lttng -tcmalloc {-test} -zfs" 
PYTHON_TARGETS="python2_7 python3_4 -python3_5" 11,271 KiB



Bill Sharer


On 07/05/2016 01:45 PM, Gregory Farnum wrote:

Thanks for the report; created a ticket and somebody will get on it
shortly. http://tracker.ceph.com/issues/16592
-Greg

On Sun, Jul 3, 2016 at 5:55 PM, Bill Sharer <bsha...@sharerland.com> wrote:

I was working on  a rolling upgrade on Gentoo to Jewel 10.2.2 from 10.2.0.
However now I can't get a monitor quorum going again because as soon as I
get one, the mon which wins the election blows out with an assertion
failure.  Here's my status at the moment

kroll1    10.2.2    ceph mon.0 and ceph osd.0    normally my lead mon
kroll2    10.2.2    ceph mon 1 and ceph osd 2
kroll3    10.2.2    ceph osd 1
kroll4    10.2.2    ceph mon 3 and ceph osd 3
kroll5    10.2.2    ceph mon 4 and ceph mds 2    normally my active mds
kroll6    10.2.0    ceph mon 5 and ceph mds B    normally standby mds

I had done rolling upgrade of everything but kroll6 and had rebooted the
first three osd and mon servers.  mds 2 went down during gentoo update of
kroll5 because of memory scarcity so mds B was the active mds server.  After
rebooting kroll4 I found that mon 0 had gone done with the assertion
failure.  I ended up stopping all ceph processes but desktops with client
mounts were all still up for the moment and basically would be stuck on
locks if I tried to access cephfs.

After trying to restart mons only beginning with mon 0 initially, the
following happened to mon.0 after enough mons were up for a quorum:

2016-07-03 16:34:26.555728 7fbff22f8480  1 leveldb: Recovering log #2592390
2016-07-03 16:34:26.555762 7fbff22f8480  1 leveldb: Level-0 table #2592397:
started
2016-07-03 16:34:26.558788 7fbff22f8480  1 leveldb: Level-0 table #2592397:
192 bytes OK
2016-07-03 16:34:26.562263 7fbff22f8480  1 leveldb: Delete type=3 #2592388

2016-07-03 16:34:26.562364 7fbff22f8480  1 leveldb: Delete type=0 #2592390

2016-07-03 16:34:26.563126 7fbff22f8480 -1 wrote monmap to
/etc/ceph/tmpmonmap
2016-07-03 17:09:25.753729 7f8291dff480  0 ceph version 10.2.2
(45107e21c568dd033c2f0a3107dec8f0b0e58374), pro
cess ceph-mon, pid 20842
2016-07-03 17:09:25.762588 7f8291dff480  1 leveldb: Recovering log #2592398
2016-07-03 17:09:25.767722 7f8291dff480  1 leveldb: Delete type=0 #2592398

2016-07-03 17:09:25.767803 7f8291dff480  1 leveldb: Delete type=3 #2592396

2016-07-03 17:09:25.768600 7f8291dff480  0 starting mon.0 rank 0 at
192.168.2.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid
1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769066 7f8291dff480  1 mon.0@-1(probing) e10 preinit
fsid 1798897a-f0c9-422d-86b3-d4933a12c7ac
2016-07-03 17:09:25.769923 7f8291dff480  1
mon.0@-1(probing).paxosservice(pgmap 17869652..17870289) refresh upgraded,
format 0 -> 1
2016-07-03 17:09:25.769947 7f8291dff480  1 mon.0@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2016-07-03 17:09:25.776148 7f8291dff480  0 mon.0@-1(probing).mds e1532
print_map
e1532
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}

Filesystem 'cephfs' (0)
fs_name cephfs
epoch   1530
flags   0
modified2016-05-19 01:21:31.953710
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure1478
last_failure_osd_epoch  26431
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in  0
up  {0=1190233}
failed
damaged
stopped
data_pools  0
metadata_pool   1
inline_data disabled
1190233:192.168.2.6:6800/5437 'B' mds.0.1526 up:active seq 103145


Standby daemons:

1190222:192.168.2.5:6801/5871 '2' mds.-1.0 up:standby seq 135114

2016-07-03 17:09:25.776444 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776450 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776453 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776454 7f8291dff480  0 mon.0@-1(probing).osd e26460
crush map has features 2200130813952, adjusting msgr requires
2016-07-03 17:09:25.776696 7f8291dff480  1
mon.0@-1(probing).paxosservice(auth 19251..19344) refr

[ceph-users] Active MON aborts on Jewel 10.2.2 with FAILED assert(info.state == MDSMap::STATE_STANDBY

2016-07-03 Thread Bill Sharer
TH_ERR; 187 pgs are stuck inactive for more than 300 seconds; 114 pgs 
degraded; 138 pgs peering; 49 pgs stale; 138 pgs stuck inactive; 49 pgs 
stuck stale; 114 pgs stuck unclean; 114 pgs undersized; recovery 
1019372/17575834 objects degraded (5.800%); too many PGs per OSD (449 > 
max 300); 1/4 in osds are down; noout flag(s) set; 2 mons down, quorum 
0,2,3 0,3,4
2016-07-03 17:12:55.193094 7f828388e700  0 log_channel(cluster) log 
[INF] : monmap e10: 5 mons at 
{0=192.168.2.1:6789/0,1=192.168.2.2:6789/0,3=192.168.2.4:6789/0,4=192.168.2.5:6789/0,5=192.168.2.6:6789/0}
2016-07-03 17:12:55.193254 7f828388e700  0 log_channel(cluster) log 
[INF] : pgmap v17870289: 768 pgs: 49 stale+active+clean, 114 
active+undersized+degraded, 138 peering, 467 active+clean; 7128 GB data, 
16620 GB used, 4824 GB / 22356 GB avail; 1019372/17575834 objects 
degraded (5.800%)
2016-07-03 17:12:55.195553 7f828388e700 -1 mon/MDSMonitor.cc: In 
function 'bool 
MDSMonitor::maybe_promote_standby(std::shared_ptr)' thread 
7f828388e700 time 2016-07-03 17:12:55.193360

mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x82) [0x556e001f1e12]
 2: 
(MDSMonitor::maybe_promote_standby(std::shared_ptr)+0x97f) 
[0x556dffede53f]

 3: (MDSMonitor::tick()+0x3b6) [0x556dffee0866]
 4: (MDSMonitor::on_active()+0x28) [0x556dffed9038]
 5: (PaxosService::_active()+0x66a) [0x556dffe5968a]
 6: (Context::complete(int)+0x9) [0x556dffe249a9]
 7: (void finish_contexts(CephContext*, std::list<Context*, 
std::allocator<Context*> >&, int)+0xac) [0x556dffe2ba7c]

 8: (Paxos::finish_round()+0xd0) [0x556dffe50460]
 9: (Paxos::handle_last(std::shared_ptr)+0x103d) 
[0x556dffe51acd]
 10: (Paxos::dispatch(std::shared_ptr)+0x38c) 
[0x556dffe5254c]
 11: (Monitor::dispatch_op(std::shared_ptr)+0xd3b) 
[0x556dffe2245b]

 12: (Monitor::_ms_dispatch(Message*)+0x581) [0x556dffe22b91]
 13: (Monitor::ms_dispatch(Message*)+0x23) [0x556dffe41393]
 14: (DispatchQueue::entry()+0x7ba) [0x556e002e722a]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x556e001d62cd]
 16: (()+0x74a4) [0x7f8290f904a4]
 17: (clone()+0x6d) [0x7f828f29298d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


If asked, I'll dump the rest of the log.



Bill Sharer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Disk failures

2016-06-14 Thread Bill Sharer
This is why I use btrfs mirror sets underneath ceph, and I hopefully more 
than make up for the space loss by going with 2 replicas instead of 3 
plus on-the-fly lzo compression.  The ceph deep scrubs replace any need 
for btrfs scrubs, but I still get the benefit of self healing when btrfs 
finds bit rot.
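
Roughly, each OSD sits on something like this (device names and the osd path 
are placeholders):

mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc    # two-disk mirror for data and metadata
mount -o noatime,compress=lzo /dev/sdb /var/lib/ceph/osd/ceph-0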


The only errors I've run into are from hard shutdowns and possible ecc 
errors due to working with consumer hardware and memory.  I've been on 
top of btrfs using gentoo since Firefly.


Bill Sharer


On 06/14/2016 09:27 PM, Christian Balzer wrote:

Hello,

On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:


Hi,
bit rot is not "bit rot" per se - nothing is rotting on the drive
platter.

Never mind that I used the wrong terminology (according to Wiki) and that
my long experience with "laser-rot" probably caused me to choose that
term, there are data degradation scenarios that are caused by
undetected media failures or by the corruption happening in the write
path, thus making them quite reproducible.


It occurs during reads (mostly, anyway), and it's random. You
can happily read a block and get the correct data, then read it again
and get garbage, then get correct data again. This could be caused by a
worn out cell on SSD but firmwares look for than and rewrite it if the
signal is attentuated too much. On spinners there are no cells to
refresh so rewriting it doesn't help either.

You can't really "look for" bit rot due to the reasons above, strong
checksumming/hash verification during reads is the only solution.


Which is what I've been saying in the mail below and for years on this ML.

And that makes deep-scrubbing something of quite limited value.

Christian

And trust me, bit rot is a very real thing and very dangerous as well -
do you think companies like Seagate or WD would lie about bit rot if
it's not real? I'd buy a drive with BER 10^999 over one with 10^14,
wouldn't everyone? And it is especially dangerous when something like
Ceph handles much larger blocks of data than the client does. While the
client (or an app) has some knowledge of the data _and_ hopefully throws
an error if it read garbage, Ceph will (if for example snapshots are
used and FIEMAP is off) actually have to read the whole object (say
4MiB) and write it elsewhere, without any knowledge whether what it read
(and wrote) made any sense to the app. This way corruption might spread
silently into your backups if you don't validate the data somehow (or
dump it from a database for example, where it's likely to get detected).

Btw just because you think you haven't seen it doesn't mean you haven't
seen it - never seen artefacting in movies? Just a random bug in the
decoder, is it? VoD guys would tell you...

For things like databases this is somewhat less impactful - bit rot
doesn't "flip a bit" but affects larger blocks of data (like one
sector), so databases usually catch this during read and err instead of
returning garbage to the client.

Jan




On 09 Jun 2016, at 09:16, Christian Balzer <ch...@gol.com> wrote:


Hello,

On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:


Il 09 giu 2016 02:09, "Christian Balzer" <ch...@gol.com> ha scritto:

Ceph currently doesn't do any (relevant) checksumming at all, so if a
PRIMARY PG suffers from bit-rot this will be undetected until the
next deep-scrub.

This is one of the longest and gravest outstanding issues with Ceph
and supposed to be addressed with bluestore (which currently doesn't
have checksum verified reads either).

So if bit rot happens on the primary PG, is Ceph spreading the corrupted
data across the cluster?

No.

You will want to re-read the Ceph docs and the countless posts here
about how replication within Ceph works.
http://docs.ceph.com/docs/hammer/architecture/#smart-daemons-enable-hyperscale

A client write goes to the primary OSD/PG and will not be ACK'ed to the
client until it has reached all replica OSDs.
This happens while the data is in flight (in RAM); it is not read back
from the journal or filestore.
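
(You can check the replica counts and see which OSDs hold a given PG
yourself; the pool name and PG id below are only examples:)

ceph osd pool get rbd size        # replica count for the pool
ceph osd pool get rbd min_size    # minimum copies that must be up for I/O
ceph pg map 0.1a                  # shows the up/acting OSD set for that PG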


What would be sent to the replica, the original data or the saved
one?

When bit rot happens I'll have 1 corrupted object and 2 good ones.
How do you manage this between deep scrubs?  Which data would be used
by Ceph? I think that bit rot on a huge VM block device could lead
to a mess, like the whole device being corrupted.
Would a VM affected by bit rot be able to stay up and running?
And what about bit rot in a qcow2 file?


Bitrot is a bit hyped; I haven't seen any on the Ceph clusters I run,
nor on other systems here where I (can) actually check for it.

As to how it would affect things, that very much depends.

If it's something like a busy directory inode that gets corrupted, the
data in question will be in RAM (SLAB) and the next update will
correct things.

If it's a logfile, you're likely to never notice until deep-scrub
detects it eventually.
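
(You don't have to wait for the scheduled scrub either; a PG can be
checked and repaired by hand, the PG id here is just an example:)

ceph pg deep-scrub 0.1a    # force a deep scrub of this one PG
ceph health detail         # lists any PGs flagged inconsistent
ceph pg repair 0.1a        # rewrites the bad copy, but note that with
                           # filestore the primary copy is trusted, so
                           # check which replica is actually bad first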

This isn't a Ceph specific question; on all systems that aren't backe
Re: [ceph-users] Deploying ceph by hand: a few omissions

2016-05-01 Thread Bill Sharer
I have an active and a standby setup.  The failover takes less than a 
minute if you manually stop the active service.  Add whatever the 
timeout is for the failover to happen if things go pear-shaped on the box.
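
The standby is nothing fancy; it's just a second ceph-mds daemon defined
in ceph.conf.  A minimal sketch (the names and hosts are only examples):

[mds.a]
host = node1

[mds.b]
host = node2

ceph mds stat    # should report one mds up:active and one up:standby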


Things are back to letters now for mds servers.  I had started with 
letters on firefly as recommended.  Then somewhere (giant?), I was 
getting prodded to use numbers instead.  Now with later hammer and 
infernalis, I'm back to getting scolded for not using letters :-)


I'm holding off on jewel for the moment until I get things straightened 
out with the kde4 to plasma upgrade.  I think that one got stabilized 
before it was quite ready for prime time.  Even then I'll probably take 
a good long time to back up some stuff before I try out the shiny new 
fsck utility.


On 05/01/2016 07:13 PM, Stuart Longland wrote:

Hi Bill,
On 02/05/16 04:37, Bill Sharer wrote:

Actually you didn't need to do a udev rule for raw journals.  Disk
devices in gentoo have their group ownership set to 'disk'.  I only
needed to drop ceph into that in /etc/group when going from hammer to
infernalis.

Yeah, I recall trying that on the Ubuntu-based Ceph cluster at work, and
Ceph still wasn't happy, hence I've gone the route of making the
partition owned by the ceph user.


Did you poke around any of the ceph howto's on the gentoo wiki? It's
been a while since I wrote this guide when I first rolled out with firefly:

https://wiki.gentoo.org/wiki/Ceph/Guide

That used to be https://wiki.gentoo.org/wiki/Ceph before other people
came in behind me and expanded on things.

No, hadn't looked at that.


I've pretty much had these bookmarks sitting around forever for adding
and removing mons and osds

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

For the MDS server I think I originally went to this blog which also has
other good info.

http://www.sebastien-han.fr/blog/2013/05/13/deploy-a-ceph-mds-server/

That might be my next step, depending on how stable CephFS is now.  One
thing that has worried me is that since you can only deploy one MDS, what
happens if that MDS goes down?

If it's simply a case of spin up another one, then fine, I can put up
with a little downtime.  If there's data loss though, then no, that's
not good.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying ceph by hand: a few omissions

2016-05-01 Thread Bill Sharer
Actually you didn't need to do a udev rule for raw journals.  Disk 
devices in gentoo have their group ownership set to 'disk'.  I only 
needed to drop ceph into that in /etc/group when going from hammer to 
infernalis.


Did you poke around any of the ceph howto's on the gentoo wiki? It's 
been a while since I wrote this guide when I first rolled out with firefly:


https://wiki.gentoo.org/wiki/Ceph/Guide

That used to be https://wiki.gentoo.org/wiki/Ceph before other people 
came in behind me and expanded on things.


I've pretty much had these bookmarks sitting around forever for adding 
and removing mons and osds


http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

For the MDS server I think I originally went to this blog which also has 
other good info.


http://www.sebastien-han.fr/blog/2013/05/13/deploy-a-ceph-mds-server/


On 05/01/2016 06:46 AM, Stuart Longland wrote:

Hi all,

This evening I was in the process of deploying a ceph cluster by hand.
I did it by hand because to my knowledge, ceph-deploy doesn't support
Gentoo, and my cluster here runs that.

The instructions I followed are these ones:
http://docs.ceph.com/docs/master/install/manual-deployment and I'm
running the 10.0.2 release of Ceph:

ceph version 10.0.2 (86764eaebe1eda943c59d7d784b893ec8b0c6ff9)

Things went okay bootstrapping the monitors.  I'm running a 3-node
cluster, with OSDs and monitors co-located.  Each node has a 1TB 2.5"
HDD and a 40GB partition on SSD for the journal.

Things went pear shaped however when I tried bootstrapping the OSDs.
All was going fine until it came time to activate my first OSD.

ceph-disk activate barfed because I didn't have the bootstrap-osd key.
No one told me I needed to create one, or how to do it.  There's a brief
note about using --activate-key, but no word on what to pass as the
argument.  I tried passing in my admin keyring in /etc/ceph, but it
didn't like that.

In the end, I muddled my way through the manual OSD deployment steps,
which worked fine.  After correcting permissions for the ceph user, I
found the OSDs came up.  As an added bonus, I now know how to work
around the journal permission issue at work since I've reproduced it
here, using a UDEV rules file like the following:

SUBSYSTEM=="block", KERNEL=="sda7", OWNER="ceph", GROUP="ceph", MODE="0600"

The cluster seems to be happy enough now, but some notes on how one
generates the OSD activation keys to use with `ceph-disk activate` would
be a big help.
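
(In case it helps whoever writes those notes, my best guess from the docs
is something along these lines; the device is only an example and I
haven't re-tested this end to end:)

# create the bootstrap-osd key on a monitor and copy it into place
ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' \
    -o /var/lib/ceph/bootstrap-osd/ceph.keyring

# then activation can point at it explicitly
ceph-disk activate /dev/sda1 --activate-key /var/lib/ceph/bootstrap-osd/ceph.keyring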

Regards,


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] hadoop on cephfs

2016-04-29 Thread Bill Sharer
Actually this guy is already a fan of Hadoop.  I was just wondering 
whether anyone has been playing around with it on top of cephfs lately.  
It seems like the last round of papers was from around Cuttlefish.


On 04/28/2016 06:21 AM, Oliver Dzombic wrote:

Hi,

bad idea :-)

It's of course nice and important to draw developers towards a
new/promising technology.

But if the technology does not match the individual requirements, you
will just risk showing this developer how bad the new/promising
technology is.

So you will just achieve the opposite of what you want.

So before you build something usually big, like Hadoop, on top of
unstable software, maybe you should not use it at all.

That's for the good of the developer, for your own good, and for the
good of the reputation of the new/promising technology you favour.

Forcing a penguin to somehow live in the Sahara might be possible (at
least for some time), but it's usually not a good idea ;-)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] hadoop on cephfs

2016-04-28 Thread Bill Sharer
Just got into a discussion today where I may have a chance to do work 
with a db guy who wants hadoop, and I want to steer him to running it on 
cephfs.  While I'd really like to run gentoo with either infernalis or 
jewel (when it becomes stable in portage), odds are that I will be 
required to use rhel/centos 6.7 and thus be stuck back at Hammer.  Any thoughts?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Infernalis OSD errored out on journal permissions without mentioning anything in its log

2016-03-31 Thread Bill Sharer
This took a little head scratching until I figured out why my osd 
daemons were not restarting under Infernalis on Gentoo.


I had just upgraded from Hammer to Infernalis and had reset ownership 
from root:root to ceph:ceph on the files of each OSD in 
/var/lib/ceph/osd/ceph-n.  However, I forgot to take into account the 
ownership of the journals, which I have set up as raw partitions.  Under 
Gentoo, I needed to put the ceph user into the "disk" group to allow it 
to have write access to the device files.
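
Concretely that boiled down to something like this (the journal partition
is just an example):

usermod -a -G disk ceph    # let the ceph user write to raw block devices
ls -l /dev/sda7            # the journal partition; group should show "disk"

and then restarting the osd daemons.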


The osd startup init script reported the osd as started OK, but the 
actual process would exit without writing anything to its 
/var/log/ceph/ceph-osd.n.log.  I would have thought there might have 
been some sort of permission error logged, but nope :-)


Bill Sharer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Bill Sharer
I think I got over 10% improvement when I changed from a cooked journal 
file on the btrfs-based system SSD to a raw partition on the same SSD.  
The cluster I've been testing with is all consumer grade stuff running 
on top of AMD Piledriver and Kaveri based mobos with the on-board 
SATA.  My SSDs are a hodgepodge of OCZ Vertex 4 and Samsung 840 and 850 
(non-pro).  I'm also seeing a performance win by merging individual osds 
into btrfs mirror sets after doing that and dropping the replica count 
from 3 to 2.  I also consider this a better defense-in-depth strategy 
since btrfs self-heals when it hits bit rot on the mirrors and raid sets.


That boost was probably aio and dio kicking in because of the raw versus 
cooked journal.  Note that I'm running Hammer on gentoo and my current 
WIP is moving kernels from 3.8 to 4.0.5 everywhere.  It will be 
interesting to see what happens with that.


Regards
Bill

On 09/29/2015 07:32 AM, Jiri Kanicky wrote:

Hi Lionel.

Thank you for your reply. In this case I am considering creating a 
separate partition on the SSD drive for each disk. It would be good to 
know what the performance difference is, because creating partitions 
is kind of a waste of space.


One more question: is it a good idea to move the journals for 3 OSDs to a 
single SSD, considering that if the SSD fails the whole node with 3 HDDs 
will be down? Thinking about it, leaving the journal on each OSD might be 
safer, because a journal on one disk does not affect the other disks 
(OSDs). Or do you think that having the journals on the SSD is the better 
trade-off?


Thank you
Jiri

On 29/09/2015 21:10, Lionel Bouton wrote:

Le 29/09/2015 07:29, Jiri Kanicky a écrit :

Hi,

Is it possible to create the journal in a directory as explained here:
http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster 


Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
start) is valid for moving your journal wherever you want.
That said, it probably won't perform as well on a filesystem (LVM has
lower overhead than a filesystem).
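
For reference, that sequence for a single OSD looks roughly like this
(the OSD id and target path are just examples):

service ceph stop osd.0
ceph-osd -i 0 --flush-journal
# edit ceph.conf:  osd journal = /srv/ceph/journal/osd0/journal
ceph-osd -i 0 --mkjournal
service ceph start osd.0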


1. Create BTRFS over /dev/sda6 (assuming this is SSD partition alocate
for journal) and mount it to /srv/ceph/journal

BTRFS is probably the worst idea for hosting journals. If you must use
BTRFS, you'll have to make sure that the journals are created NoCoW
before the first byte is ever written to them.
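
The usual trick, since chattr +C only takes effect on empty files, is to
create the journal file, mark it, and only then let ceph write to it,
e.g. (path is just an example):

touch /srv/ceph/journal/osd0/journal
chattr +C /srv/ceph/journal/osd0/journal
ceph-osd -i 0 --mkjournal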


2. Add OSD: ceph-deploy osd create --fs-type btrfs
ceph1:sdb:/srv/ceph/journal/osd$id/journal

I've no experience with ceph-deploy...

Best regards,

Lionel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com