Re: [ceph-users] New user on Ubuntu 16.04

2016-09-08 Thread Alex Evonosky
disregard--

Found the issue: the remote hostname did not match the local hostname.

Thank you.
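
For anyone hitting the same thing, this is roughly how I would check it (host name taken from this thread, adjust as needed):

# the short hostname reported by the remote host...
ssh alex-desktop 'hostname -s'
# ...must match the name passed to "ceph-deploy mon create", and resolve consistently
getent hosts alex-desktop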

On Thu, Sep 8, 2016 at 10:26 PM, Alex Evonosky 
wrote:

> Hey group-
>
> I am a new CEPH user on Ubuntu and notice this when creating a brand new
> monitor following the documentation:
>
> storage@alex-desktop:~/ceph$ ceph-deploy --overwrite-conf mon create
> alex-desktop
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/storage/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.34): /usr/bin/ceph-deploy
> --overwrite-conf mon create alex-desktop
> [ceph_deploy.cli][INFO  ] ceph-deploy options:
> [ceph_deploy.cli][INFO  ]  username  : None
> [ceph_deploy.cli][INFO  ]  verbose   : False
> [ceph_deploy.cli][INFO  ]  overwrite_conf: True
> [ceph_deploy.cli][INFO  ]  subcommand: create
> [ceph_deploy.cli][INFO  ]  quiet : False
> [ceph_deploy.cli][INFO  ]  cd_conf   :
> 
> [ceph_deploy.cli][INFO  ]  cluster   : ceph
> [ceph_deploy.cli][INFO  ]  mon   : ['alex-desktop']
> [ceph_deploy.cli][INFO  ]  func  : <function mon at 0x7f834118c1b8>
> [ceph_deploy.cli][INFO  ]  ceph_conf : None
> [ceph_deploy.cli][INFO  ]  keyrings  : None
> [ceph_deploy.cli][INFO  ]  default_release   : False
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts alex-desktop
> [ceph_deploy.mon][DEBUG ] detecting platform for host alex-desktop ...
> [alex-desktop][DEBUG ] connection detected need for sudo
> [alex-desktop][DEBUG ] connected to host: alex-desktop
> [alex-desktop][DEBUG ] detect platform information from remote host
> [alex-desktop][DEBUG ] detect machine type
> [alex-desktop][DEBUG ] find the location of an executable
> [ceph_deploy.mon][INFO  ] distro info: Ubuntu 16.04 xenial
> [alex-desktop][DEBUG ] determining if provided host has same hostname in
> remote
> [alex-desktop][DEBUG ] get remote short hostname
> [alex-desktop][DEBUG ] deploying mon to alex-desktop
> [alex-desktop][DEBUG ] get remote short hostname
> [alex-desktop][DEBUG ] remote hostname: alex-desktop
> [alex-desktop][DEBUG ] write cluster configuration to
> /etc/ceph/{cluster}.conf
> [alex-desktop][DEBUG ] create the mon path if it does not exist
> [alex-desktop][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-alex-desktop/done
> [alex-desktop][DEBUG ] create a done file to avoid re-doing the mon
> deployment
> [alex-desktop][DEBUG ] create the init path if it does not exist
> [alex-desktop][INFO  ] Running command: sudo systemctl enable ceph.target
> [alex-desktop][INFO  ] Running command: sudo systemctl enable
> ceph-mon@alex-desktop
> [alex-desktop][INFO  ] Running command: sudo systemctl start
> ceph-mon@alex-desktop
> [alex-desktop][INFO  ] Running command: sudo ceph --cluster=ceph
> --admin-daemon /var/run/ceph/ceph-mon.alex-desktop.asok mon_status
> [alex-desktop][ERROR ] no valid command found; 10 closest matches:
> [alex-desktop][ERROR ] config set   [...]
> [alex-desktop][ERROR ] version
> [alex-desktop][ERROR ] git_version
> [alex-desktop][ERROR ] help
> [alex-desktop][ERROR ] config show
> [alex-desktop][ERROR ] get_command_descriptions
> [alex-desktop][ERROR ] config get 
> [alex-desktop][ERROR ] perfcounters_dump
> [alex-desktop][ERROR ] 2
> [alex-desktop][ERROR ] config diff
> [alex-desktop][ERROR ] admin_socket: invalid command
> [alex-desktop][WARNIN] monitor: mon.alex-desktop, might not be running yet
> [alex-desktop][INFO  ] Running command: sudo ceph --cluster=ceph
> --admin-daemon /var/run/ceph/ceph-mon.alex-desktop.asok mon_status
> [alex-desktop][ERROR ] no valid command found; 10 closest matches:
> [alex-desktop][ERROR ] config set   [...]
> [alex-desktop][ERROR ] version
> [alex-desktop][ERROR ] git_version
> [alex-desktop][ERROR ] help
> [alex-desktop][ERROR ] config show
> [alex-desktop][ERROR ] get_command_descriptions
> [alex-desktop][ERROR ] config get 
> [alex-desktop][ERROR ] perfcounters_dump
> [alex-desktop][ERROR ] 2
> [alex-desktop][ERROR ] config diff
> [alex-desktop][ERROR ] admin_socket: invalid command
> [alex-desktop][WARNIN] monitor alex-desktop does not exist in monmap
> [alex-desktop][WARNIN] neither `public_addr` nor `public_network` keys are
> defined for monitors
> [alex-desktop][WARNIN] monitors may not be able to form quorum
>
>
> This is a brand new install of ceph just testing on two nodes.
>
> Thank you,
> Alex
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph-deploy not creating osd's

2016-09-08 Thread Shain Miley
I ended up starting from scratch and doing a purge and purgedata on that 
host using ceph-deploy; after that, things seemed to go better.
The osd is up and in at this point; however, when the osd was added to 
the cluster, no data was being moved to the new osd.


Here is a copy of my current crush map:

http://pastebin.com/PMk3xZ0a

As you can see from the entry for osd number 108 (the last osd added to 
the cluster), the crush map does not contain a host entry for 
hqosd10, which is the host for osd #108.


Any ideas on how to resolve this?
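
I believe something along these lines would place it manually (sketch only; the weight is a placeholder and it assumes the default root), but I'd rather confirm before touching the map:

ceph osd crush add-bucket hqosd10 host                     # create the missing host bucket
ceph osd crush move hqosd10 root=default                   # hang it under the default root
ceph osd crush create-or-move osd.108 3.64 host=hqosd10    # weight (TB) is a placeholder

I also assume the OSD would normally do this itself on start when "osd crush update on start" is left at its default of true.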

Thanks,
Shain


On 9/8/16 2:20 PM, Shain Miley wrote:

Hello,

I am trying to use ceph-deploy to add some new osd's to our cluster.  
I have used this method over the last few years to add all of our 107 
osd's and things have seemed to work quite well.


One difference this time is that we are going to use a pci nvme card 
to journal the 16 disks in this server (Dell R730xd).


As you can see below it appears as though things complete 
successfully, however the osd count never increases, and when I look 
at hqosd10, there are no osd's mounted, and nothing in 
'/var/lib/ceph/osd', no ceph daemons running, etc.


I created the partitions on the nvme card by hand using parted (I was 
not sure if ceph-deploy should take care of this part or not).


I have zapped the disk and re-run this command several times, and I 
have gotten the same result every time.


We are running Ceph version 0.94.9  on Ubuntu 14.04.5

Here is the output from my attempt:

root@hqceph1:/usr/local/ceph-deploy# ceph-deploy --verbose osd create 
hqosd10:sdb:/dev/nvme0n1p1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.36): /usr/local/bin/ceph-deploy 
--verbose osd create hqosd10:sdb:/dev/nvme0n1p1

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  disk  : 
[('hqosd10', '/dev/sdb', '/dev/nvme0n1p1')]

[ceph_deploy.cli][INFO  ]  dmcrypt   : False
[ceph_deploy.cli][INFO  ]  verbose   : True
[ceph_deploy.cli][INFO  ]  bluestore : None
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : 
/etc/ceph/dmcrypt-keys

[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  fs_type   : xfs
[ceph_deploy.cli][INFO  ]  func  : <function osd at 0x7f6ba750cc80>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  zap_disk  : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
hqosd10:/dev/sdb:/dev/nvme0n1p1

[hqosd10][DEBUG ] connected to host: hqosd10
[hqosd10][DEBUG ] detect platform information from remote host
[hqosd10][DEBUG ] detect machine type
[hqosd10][DEBUG ] find the location of an executable
[hqosd10][INFO  ] Running command: /sbin/initctl version
[hqosd10][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to hqosd10
[hqosd10][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host hqosd10 disk /dev/sdb journal 
/dev/nvme0n1p1 activate True

[hqosd10][DEBUG ] find the location of an executable
[hqosd10][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare 
--cluster ceph --fs-type xfs -- /dev/sdb /dev/nvme0n1p1
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=osd_journal_size
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_dmcrypt_type

[hqosd10][WARNIN] DEBUG:ceph-disk:Journal /dev/nvme0n1p1 is a partition
[hqosd10][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -o 
udev /dev/nvme0n1p1
[hqosd10][WARNIN] 

[ceph-users] New user on Ubuntu 16.04

2016-09-08 Thread Alex Evonosky
Hey group-

I am a new CEPH user on Ubuntu and notice this when creating a brand new
monitor following the documentation:

storage@alex-desktop:~/ceph$ ceph-deploy --overwrite-conf mon create
alex-desktop
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/storage/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.34): /usr/bin/ceph-deploy
--overwrite-conf mon create alex-desktop
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: True
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   :

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  mon   : ['alex-desktop']
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts alex-desktop
[ceph_deploy.mon][DEBUG ] detecting platform for host alex-desktop ...
[alex-desktop][DEBUG ] connection detected need for sudo
[alex-desktop][DEBUG ] connected to host: alex-desktop
[alex-desktop][DEBUG ] detect platform information from remote host
[alex-desktop][DEBUG ] detect machine type
[alex-desktop][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 16.04 xenial
[alex-desktop][DEBUG ] determining if provided host has same hostname in
remote
[alex-desktop][DEBUG ] get remote short hostname
[alex-desktop][DEBUG ] deploying mon to alex-desktop
[alex-desktop][DEBUG ] get remote short hostname
[alex-desktop][DEBUG ] remote hostname: alex-desktop
[alex-desktop][DEBUG ] write cluster configuration to
/etc/ceph/{cluster}.conf
[alex-desktop][DEBUG ] create the mon path if it does not exist
[alex-desktop][DEBUG ] checking for done path:
/var/lib/ceph/mon/ceph-alex-desktop/done
[alex-desktop][DEBUG ] create a done file to avoid re-doing the mon
deployment
[alex-desktop][DEBUG ] create the init path if it does not exist
[alex-desktop][INFO  ] Running command: sudo systemctl enable ceph.target
[alex-desktop][INFO  ] Running command: sudo systemctl enable
ceph-mon@alex-desktop
[alex-desktop][INFO  ] Running command: sudo systemctl start
ceph-mon@alex-desktop
[alex-desktop][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.alex-desktop.asok mon_status
[alex-desktop][ERROR ] no valid command found; 10 closest matches:
[alex-desktop][ERROR ] config set   [...]
[alex-desktop][ERROR ] version
[alex-desktop][ERROR ] git_version
[alex-desktop][ERROR ] help
[alex-desktop][ERROR ] config show
[alex-desktop][ERROR ] get_command_descriptions
[alex-desktop][ERROR ] config get 
[alex-desktop][ERROR ] perfcounters_dump
[alex-desktop][ERROR ] 2
[alex-desktop][ERROR ] config diff
[alex-desktop][ERROR ] admin_socket: invalid command
[alex-desktop][WARNIN] monitor: mon.alex-desktop, might not be running yet
[alex-desktop][INFO  ] Running command: sudo ceph --cluster=ceph
--admin-daemon /var/run/ceph/ceph-mon.alex-desktop.asok mon_status
[alex-desktop][ERROR ] no valid command found; 10 closest matches:
[alex-desktop][ERROR ] config set   [...]
[alex-desktop][ERROR ] version
[alex-desktop][ERROR ] git_version
[alex-desktop][ERROR ] help
[alex-desktop][ERROR ] config show
[alex-desktop][ERROR ] get_command_descriptions
[alex-desktop][ERROR ] config get 
[alex-desktop][ERROR ] perfcounters_dump
[alex-desktop][ERROR ] 2
[alex-desktop][ERROR ] config diff
[alex-desktop][ERROR ] admin_socket: invalid command
[alex-desktop][WARNIN] monitor alex-desktop does not exist in monmap
[alex-desktop][WARNIN] neither `public_addr` nor `public_network` keys are
defined for monitors
[alex-desktop][WARNIN] monitors may not be able to form quorum


This is a brand new install of ceph just testing on two nodes.
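
Regarding the public_network warning at the end of the output, I assume it wants something like this in ceph.conf (the subnet is just an example; I have not added it yet):

[global]
public network = 192.168.1.0/24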

Thank you,
Alex
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Memory leak with latest ceph code

2016-09-08 Thread Zhiyuan Wang
Hi guys

I am testing the performance of the BlueStore backend on a PCIe SSD with Ceph Jewel 
10.2.2, and I found that the 4K random write performance is poor because of large 
write amplification.

After finding some recent changes related to this issue, I upgraded to the latest 
code and was pleasantly surprised that the performance was much better. 
However, the process runs out of memory after a few tens of minutes. 
Does anyone have an idea about this issue? I know the code is not stable, but I 
am eager for the better performance.

Thanks a lot
Email Disclaimer & Confidentiality Notice
This message is confidential and intended solely for the use of the recipient 
to whom they are addressed. If you are not the intended recipient you should 
not deliver, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail and delete this e-mail from your system. Copyright © 2016 
by Istuary Innovation Labs, Inc. All rights reserved.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] non-effective new deep scrub interval

2016-09-08 Thread Christian Balzer

Hello,

On Thu, 8 Sep 2016 17:09:27 +0200 (CEST) David DELON wrote:

> 
> First, thanks for your answer Christian.
>
It's nothing.
 
> - On 8 Sep 16, at 13:30, Christian Balzer ch...@gol.com wrote:
> 
> > Hello,
> > 
> > On Thu, 8 Sep 2016 09:48:46 +0200 (CEST) David DELON wrote:
> > 
> >> 
> >> Hello,
> >> 
> >> i'm using ceph jewel.
> >> I would like to schedule the deep scrub operations on my own.
> > 
> > Welcome to the club, alas the ride isn't for the faint of heart.
> > 
> > You will want to (re-)search the ML archive (google) and in particular the
> > recent "Spreading deep-scrubbing load" thread.
> 
> It is not exactly what I would like to do; that's why I posted.
> I wanted to trigger the deep scrubbing myself on Sundays with a cron 
> script... 
>
If you look at that thread (and others) that's what I do, too.
And ideally, not even needing a cron script after the first time,
provided your scrubs can fit into the time frame permitted.
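
If you do go the cron route, a minimal (untested) sketch that kicks off a deep scrub of every PG would be something like:

#!/bin/sh
# e.g. from cron:  0 1 * * 0  /usr/local/bin/deep-scrub-all.sh
# note: this fires all requests at once; pacing/batching is left out
for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}'); do
    ceph pg deep-scrub "$pg"
done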
 
> 
> >> First of all, i have tried to change the interval value for 30 days:
> >> In each /etc/ceph/ceph.conf, i have added:
> >> 
> >> [osd]
> >> #30*24*3600
> >> osd deep scrub interval = 2592000
> >> I have restarted all the OSD daemons.
> > 
> > This could have been avoided by an "inject" for all OSDs.
> > Restarting (busy) OSDs isn't particularly nice for a cluster.
> 
> I first injected the new value. But as it did not do the trick 
> after some hours, and the "injectargs" command returned
> "(unchangeable)",
> I thought OSD restarts were needed... 
>
I keep forgetting about that, annoying.
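
For reference, the injection itself looks like this (whether a particular option really takes effect at runtime is another matter, as you found out):

# all OSDs at once; 2592000 s = 30 days
ceph tell osd.* injectargs '--osd_deep_scrub_interval 2592000'
# or per daemon, via the admin socket on the OSD host
ceph daemon osd.0 config set osd_deep_scrub_interval 2592000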
 
> 
> >> The new value has been taken into account as for each OSD:
> >> 
> >> ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep
> >> deep_scrub_interval
> >> "osd_deep_scrub_interval": "2.592e+06",
> >> 
> >> 
> >> I have checked the last_deep_scrub value for each pg with
> >> ceph pg dump
> >> And each pg has been deep scrubbed during the last 7 days (which is the 
> >> default
> >> behavior).
> >> 
> > See the above thread.
> > 
> >> Since i have made the changes 2 days ago, it keeps on deep scrubbing.
> >> Do i miss something?
> >> 
> > At least 2 things, maybe more.
> > 
> > Unless you changed the "osd_scrub_max_interval" as well, that will enforce
> > things, by default after a week.
> 
> Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve it.
> 

osd_scrub_min_interval has no impact on deep scrubs,
osd_scrub_max_interval interestingly and unexpectedly does.

Meaning it's the next one:

> > And with Jewel you get that well meaning, but turned on by default and
> > ill-documented "osd_scrub_interval_randomize_ratio", which will spread
> > things out happily and not when you want them.
> > 

If you set osd_scrub_interval_randomize_ratio to 0, scrubs should become
fixed-interval and deterministic again.
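
Something like this, assuming your version accepts it at runtime (otherwise put it in ceph.conf under [osd] and restart):

ceph tell osd.* injectargs '--osd_scrub_interval_randomize_ratio 0'

# or persistently, in ceph.conf:
[osd]
osd scrub interval randomize ratio = 0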

Christian

> > Again, read the above thread.
> > 
> > Also your cluster _should_ be able to endure deep scrubs even when busy,
> > otherwise you're looking at trouble when you lose an OSD and the
> > resulting balancing as well.
> > 
> > Setting these to something sensible:
> >"osd_scrub_begin_hour": "0",
> >"osd_scrub_end_hour": "6",
> > 
> > and especially this:
> >"osd_scrub_sleep": "0.1",
> 
> 
> OK, i will consider this solution.
> 
> > will minimize the impact of scrub as well.
> > 
> > Christian
> > --
> > Christian Balzer    Network/Systems Engineer
> > ch...@gol.com   Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer    Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FW: Multiple public networks and ceph-mon daemons listening

2016-09-08 Thread Jim Kilborn
Thanks for the clarification Greg. The private network was a NAT network, but I 
got rid of the NAT and set the head node to straight routing. I went 
ahead and set all the daemons to the private network, and it's working fine now. 
I was hoping to avoid routing the outside traffic, but it's no big deal.

I’m completely new to cephfs and ceph, so I’m in that steep learning-curve 
phase.

Thanks again

Sent from Windows Mail

From: Gregory Farnum
Sent: ‎Thursday‎, ‎September‎ ‎8‎, ‎2016 ‎6‎:‎05‎ ‎PM
To: Jim Kilborn
Cc: Wido den Hollander, 
ceph-users@lists.ceph.com

On Thu, Sep 8, 2016 at 7:13 AM, Jim Kilborn  wrote:
> Thanks for the reply.
>
>
>
> When I said the compute nodes mounted the cephfs volume, I am referring to a 
> real linux cluster of physical machines. OpenStack VM / compute nodes are not 
> involved in my setup. We are transitioning from an older linux cluster using 
> nfs from the head node/san to the new cluster using cephfs. All physical 
> systems mount the shared volume, storing home directories and data.
>
>
>
> http://oi63.tinypic.com/2ljp72v.jpg
>
>
>
>
>
> The linux cluster is in a NAT private network, where the only systems 
> attached to the corporate network are the ceph servers and our main linux 
> head node. They are dual connected.
>
> You're saying I can't have ceph volumes mounted and the traffic to the osds 
> coming in on more than one interface? It is limited to one interface?

Well, obviously clients connect to OSDs on the "public" network,
right? The "cluster" network is used by the OSDs for replication. And
as you've noticed, the monitors only use one address, and that needs
to be accessible/routable for everybody.

I presume you *have* a regular IP network on the OSDs that the clients
can route? Otherwise they won't be able to access any data at all. So
I think you just want to set up the monitors and the OSDs on the same
TCP network...

Otherwise there's a bit of a misunderstanding, probably because of the
names. Consider "cluster" network to mean "OSD replication traffic"
and "public" to mean "everything else, including all client IO".
-Greg
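
For illustration, a minimal ceph.conf sketch of that split (addresses are examples only):

[global]
# "everything else, including all client IO" - must be reachable by clients and mons
public network  = 192.168.10.0/24
# OSD replication traffic only
cluster network = 10.10.10.0/24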
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Client XXX failing to respond to cache pressure

2016-09-08 Thread Gregory Farnum
On Thu, Sep 8, 2016 at 5:59 AM, Georgi Chorbadzhiyski
 wrote:
> Today I was surprised to find our cluster in HEALTH_WARN condition, and
> searching the documentation was no help at all.
>
> Does anybody have an idea how to cure the dreaded "failing to respond
> to cache pressure" message? As I understand it, it tells me that a
> client is not responding to an MDS request to prune its cache, but
> I have no idea what is causing the problem and how to cure it.
>
> I'm using kernel cephfs driver on kernel 4.4.14.

You probably want to search the list archives at gmane et al for this.
You should check to see how many files the clients are actually using
compared to what they hold caps on (you can check caps via the admin
socket); it might just be that the amount of in-use data is higher
than your MDS cache size (100k by default, but you probably have
enough memory to increase it by one or two orders of magnitude).
-Greg
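
For illustration, the checks mentioned above look roughly like this (the daemon name is a placeholder and the exact fields vary by version):

# caps held per client session, on the active MDS host
ceph daemon mds.<name> session ls
# current / new MDS cache size (default 100000 inodes)
ceph daemon mds.<name> config get mds_cache_size
ceph daemon mds.<name> config set mds_cache_size 1000000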
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-08 Thread John Spray
On Thu, Sep 8, 2016 at 3:42 PM, John Spray  wrote:
> On Thu, Sep 8, 2016 at 2:06 AM, Gregory Farnum  wrote:
>> On Wed, Sep 7, 2016 at 7:44 AM, Michael Sudnick
>>  wrote:
>>> I've had to force recreate some PGs on my cephfs data pool due to some
>>> cascading disk failures in my homelab cluster. Is there a way to easily
>>> determine which files I need to restore from backup? My metadata pool is
>>> completely intact.
>>
>> Assuming you're on Jewel, run a recursive "scrub" on the MDS root via
>> the admin socket, and all the missing files should get logged in the
>> local MDS log.
>
> This isn't quite accurate -- the forward scrub is only checking for
> the first object in the file (which contains the backtrace), so it
> won't identify any files where other objects may have been in the lost
> PGs.
>
> Also, it turns out that the MDS doesn't actually log anything in this
> case; the issue is noted in the scrub result object for the inode, but
> that doesn't go anywhere unless you were explicitly doing "scrub_path
> /" in which case you get the detailed results on the command
> line.
>
> Anyway -- currently there isn't an efficient tool for answering the
> question "which files have objects in this PG?".  The only way to work
> it out is to scan through every possible object in every file in the
> system.  You can sort of do this by writing a script, but it'll be
> very slow if you have to call into "ceph osd map" for each object ID.
> It may well be faster to do a full restore from your backup.

Follow up... we've talked about writing this tool before and it felt
like the time had come, so this will go into Kraken:
https://github.com/ceph/ceph/pull/11026

John
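
For reference, the admin-socket invocation discussed above is along these lines (flag names may differ slightly between versions):

# on the active MDS host; detailed results come back on the command line
ceph daemon mds.<name> scrub_path / recursive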

>
>> (I'm surprised at this point to discover we don't seem to have any
>> documentation about how scrubbing works. It's a regular admin socket
>> command and "ceph daemon mds. help" should get you going where
>> you need.)
>> -Greg
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FW: Multiple public networks and ceph-mon daemons listening

2016-09-08 Thread Gregory Farnum
On Thu, Sep 8, 2016 at 7:13 AM, Jim Kilborn  wrote:
> Thanks for the reply.
>
>
>
> When I said the compute nodes mounted the cephfs volume, I am referring to a 
> real linux cluster of physical machines. OpenStack VM / compute nodes are not 
> involved in my setup. We are transitioning from an older linux cluster using 
> nfs from the head node/san to the new cluster using cephfs. All physical 
> systems mount the shared volume, storing home directories and data.
>
>
>
> http://oi63.tinypic.com/2ljp72v.jpg
>
>
>
>
>
> The linux cluster is in a NAT private network, where the only systems 
> attached to the corporate network are the ceph servers and our main linux 
> head node. They are dual connected.
>
> You're saying I can't have ceph volumes mounted and the traffic to the osds 
> coming in on more than one interface? It is limited to one interface?

Well, obviously clients connect to OSDs on the "public" network,
right? The "cluster" network is used by the OSDs for replication. And
as you've noticed, the monitors only use one address, and that needs
to be accessible/routable for everybody.

I presume you *have* a regular IP network on the OSDs that the clients
can route? Otherwise they won't be able to access any data at all. So
I think you just want to set up the monitors and the OSDs on the same
TCP network...

Otherwise there's a bit of a misunderstanding, probably because of the
names. Consider "cluster" network to mean "OSD replication traffic"
and "public" to mean "everything else, including all client IO".
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-09-08 Thread Ronny Aasen
ceph-dash is VERY easy to set up and get working 
https://github.com/Crapworks/ceph-dash


It gives you a nice webpage to observe manually.
The page is also easily read by any alerting software you might have, 
and you should configure it to alert on anything besides HEALTH_OK.
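
If you only need something scriptable, a tiny sketch that alerts on anything besides HEALTH_OK (the mail command and recipient are placeholders):

#!/bin/sh
# run from cron every few minutes
STATUS=$(ceph health 2>&1)
case "$STATUS" in
    HEALTH_OK*) : ;;    # all good, stay quiet
    *) echo "$STATUS" | mail -s "ceph health: $STATUS" ops@example.com ;;
esac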




Kind regards
Ronny Aasen


On 20.07.2016 15:52, M Ranga Swami Reddy wrote:

Do we have any tool to monitor the OSDs usage with help of UI?

Thanks
Swami

[snip]
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and calculation of directory size

2016-09-08 Thread John Spray
On Thu, Sep 8, 2016 at 6:59 PM, Ilya Moldovan  wrote:
> Hello!
>
> How does CephFS calculate the directory size? As far as I know there are two
> implementations:
>
> 1. Recursive directory traversal, as in EXT4 and NTFS
> 2. Calculation of the directory size by the file system driver, saved as an
> attribute. In this case the driver catches file additions, deletions and
> edits on the fly and updates the size of the directory, so no recursive
> directory traversal is needed.
>
> The directory whose size we are requesting can potentially contain
> thousands of files at different levels of nesting.
>
> Our components will query the directory size using the POSIX API. The
> number of such calls will be high, so recursive directory
> traversal is not suitable for us.

CephFS does not calculate the recursive statistics (rstats) every time
you stat the directory -- accessing them is fast (although they are
updated a little bit lazily).

John
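
For example, the rstats are exposed as virtual extended attributes on directories, so they can be read without any traversal (attribute names as far as I recall, on a mounted CephFS):

getfattr -n ceph.dir.rbytes   /mnt/cephfs/some/dir    # recursive byte count
getfattr -n ceph.dir.rfiles   /mnt/cephfs/some/dir    # recursive file count
getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir    # files + subdirs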

>
> Thanks for the answers!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph-deploy not creating osd's

2016-09-08 Thread Shain Miley

Hello,

I am trying to use ceph-deploy to add some new osd's to our cluster.  I 
have used this method over the last few years to add all of our 107 
osd's and things have seemed to work quite well.


One difference this time is that we are going to use a pci nvme card to 
journal the 16 disks in this server (Dell R730xd).


As you can see below it appears as though things complete successfully, 
however the osd count never increases, and when I look at hqosd10, there 
are no osd's mounted, and nothing in '/var/lib/ceph/osd', no ceph 
daemons running, etc.


I created the partitions on the nvme card by hand using parted (I was 
not sure if ceph-deploy should take care of this part or not).
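
For comparison, my understanding is that passing the whole NVMe device instead of a pre-made partition lets ceph-disk create and tag the journal partition itself (sketch only, I have not tried it here):

ceph-deploy --verbose osd create hqosd10:sdb:/dev/nvme0n1
# or directly on the OSD host:
ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb /dev/nvme0n1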


I have zapped the disk and re-run this command several times, and I have 
gotten the same result every time.


We are running Ceph version 0.94.9  on Ubuntu 14.04.5

Here is the output from my attempt:

root@hqceph1:/usr/local/ceph-deploy# ceph-deploy --verbose osd create 
hqosd10:sdb:/dev/nvme0n1p1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.36): /usr/local/bin/ceph-deploy 
--verbose osd create hqosd10:sdb:/dev/nvme0n1p1

[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  disk  : [('hqosd10', 
'/dev/sdb', '/dev/nvme0n1p1')]

[ceph_deploy.cli][INFO  ]  dmcrypt   : False
[ceph_deploy.cli][INFO  ]  verbose   : True
[ceph_deploy.cli][INFO  ]  bluestore : None
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: create
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : 
/etc/ceph/dmcrypt-keys

[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 


[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  fs_type   : xfs
[ceph_deploy.cli][INFO  ]  func  : <function osd at 0x7f6ba750cc80>

[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  zap_disk  : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
hqosd10:/dev/sdb:/dev/nvme0n1p1

[hqosd10][DEBUG ] connected to host: hqosd10
[hqosd10][DEBUG ] detect platform information from remote host
[hqosd10][DEBUG ] detect machine type
[hqosd10][DEBUG ] find the location of an executable
[hqosd10][INFO  ] Running command: /sbin/initctl version
[hqosd10][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to hqosd10
[hqosd10][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host hqosd10 disk /dev/sdb journal 
/dev/nvme0n1p1 activate True

[hqosd10][DEBUG ] find the location of an executable
[hqosd10][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare 
--cluster ceph --fs-type xfs -- /dev/sdb /dev/nvme0n1p1
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=fsid
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd 
--cluster=ceph --show-config-value=osd_journal_size
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf 
--cluster=ceph --name=osd. --lookup osd_dmcrypt_type

[hqosd10][WARNIN] DEBUG:ceph-disk:Journal /dev/nvme0n1p1 is a partition
[hqosd10][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if 
journal is not the same device as the osd data
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -o udev 
/dev/nvme0n1p1
[hqosd10][WARNIN] WARNING:ceph-disk:Journal /dev/nvme0n1p1 was not 
prepared with ceph-disk. Symlinking directly.

[hqosd10][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdb
[hqosd10][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk 
--largest-new=1 --change-name=1:ceph data 
--partition-guid=1:1541833e-1513-4446-9779-7dcb61a95a07 
--typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdb

[hqosd10][DEBUG ] The operation has completed successfully.
[hqosd10][WARNIN] DEBUG:ceph-disk:Calling partprobe on created device 
/dev/sdb

[hqosd10][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdb

[ceph-users] CephFS and calculation of directory size

2016-09-08 Thread Ilya Moldovan
Hello!

How does CephFS calculate the directory size? As far as I know there are two
implementations:

1. Recursive directory traversal, as in EXT4 and NTFS
2. Calculation of the directory size by the file system driver, saved as an
attribute. In this case the driver catches file additions, deletions and
edits on the fly and updates the size of the directory, so no recursive
directory traversal is needed.

The directory whose size we are requesting can potentially contain
thousands of files at different levels of nesting.

Our components will query the directory size using the POSIX API. The
number of such calls will be high, so recursive directory
traversal is not suitable for us.

Thanks for the answers!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot start the Ceph daemons using upstart after upgrading to Jewel 10.2.2

2016-09-08 Thread David
AFAIK, the daemons are managed by systemd now on most distros, e.g.:

systemctl start ceph-osd@0.service
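
A few more that may help when converting from upstart (unit names as of Jewel, from memory):

systemctl start  ceph.target          # everything on this node
systemctl enable ceph.target
systemctl status ceph-osd@0
systemctl start  ceph-mon@$(hostname -s)
systemctl list-units 'ceph*'          # see which units the packages installed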



On Thu, Sep 8, 2016 at 3:36 PM, Simion Marius Rad  wrote:

> Hello,
>
> Today I upgraded an Infernalis 9.2.1 cluster to Jewel 10.2.2.
> All went well until I wanted to restart the daemons using upstart (initctl
> ).
> Any upstart invocation fails to start the daemons.
> In order to keep the cluster up I started the daemons by myself using the
> commands invoked usually by upstart.
>
>
> The cluster runs on Ubuntu 14.04 LTS (kernel 3.19 ).
>
> Did someone else have a similar issue after upgrade ?
>
> Thanks,
> Simion Rad
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Excluding buckets in RGW Multi-Site Sync

2016-09-08 Thread Casey Bodley


On 09/08/2016 08:35 AM, Wido den Hollander wrote:

Hi,

I've been setting up a RGW Multi-Site [0] configuration in 6 VMs. 3 VMs per 
cluster and one RGW per cluster.

Works just fine, I can create a user in the master zone, create buckets and 
upload data using s3cmd (S3).

What I see is that ALL data is synced between the two zones. While I understand 
that's indeed the purpose of it, is there a way to disable the sync for 
specific buckets/users?

Wido

[0]: http://docs.ceph.com/docs/master/radosgw/multisite/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Wido,

This came up recently on the ceph-devel list (see [rgw multisite] 
disable specified bucket data sync), and there's an initial PR to do 
this at https://github.com/ceph/ceph/pull/10995.


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] non-effective new deep scrub interval

2016-09-08 Thread David DELON

First, thanks for your answer Christian.

- On 8 Sep 16, at 13:30, Christian Balzer ch...@gol.com wrote:

> Hello,
> 
> On Thu, 8 Sep 2016 09:48:46 +0200 (CEST) David DELON wrote:
> 
>> 
>> Hello,
>> 
>> i'm using ceph jewel.
>> I would like to schedule the deep scrub operations on my own.
> 
> Welcome to the club, alas the ride isn't for the faint of heart.
> 
> You will want to (re-)search the ML archive (google) and in particular the
> recent "Spreading deep-scrubbing load" thread.

It is not exactly what I would like to do; that's why I posted.
I wanted to trigger the deep scrubbing myself on Sundays with a cron 
script... 


>> First of all, i have tried to change the interval value for 30 days:
>> In each /etc/ceph/ceph.conf, i have added:
>> 
>> [osd]
>> #30*24*3600
>> osd deep scrub interval = 2592000
>> I have restarted all the OSD daemons.
> 
> This could have been avoided by an "inject" for all OSDs.
> Restarting (busy) OSDs isn't particularly nice for a cluster.

I first injected the new value. But as it did not do the trick 
after some hours, and the "injectargs" command returned
"(unchangeable)",
I thought OSD restarts were needed... 


>> The new value has been taken into account as for each OSD:
>> 
>> ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep
>> deep_scrub_interval
>> "osd_deep_scrub_interval": "2.592e+06",
>> 
>> 
>> I have checked the last_deep_scrub value for each pg with
>> ceph pg dump
>> And each pg has been deep scrubbed during the last 7 days (which is the 
>> default
>> behavior).
>> 
> See the above thread.
> 
>> Since i have made the changes 2 days ago, it keeps on deep scrubbing.
>> Do i miss something?
>> 
> At least 2 things, maybe more.
> 
> Unless you changed the "osd_scrub_max_interval" as well, that will enforce
> things, by default after a week.

Increasing osd_scrub_max_interval and osd_scrub_min_interval does not solve it.

> And with Jewel you get that well meaning, but turned on by default and
> ill-documented "osd_scrub_interval_randomize_ratio", which will spread
> things out happily and not when you want them.
> 
> Again, read the above thread.
> 
> Also your cluster _should_ be able to endure deep scrubs even when busy,
> otherwise you're looking at trouble when you lose an OSD and the
> resulting balancing as well.
> 
> Setting these to something sensible:
>"osd_scrub_begin_hour": "0",
>"osd_scrub_end_hour": "6",
> 
> and especially this:
>"osd_scrub_sleep": "0.1",


OK, I will consider this solution.
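
i.e. something along these lines in ceph.conf (values taken from the suggestion above):

[osd]
# only (deep-)scrub between midnight and 06:00, and pause between chunks
osd scrub begin hour = 0
osd scrub end hour = 6
osd scrub sleep = 0.1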

> will minimize the impact of scrub as well.
> 
> Christian
> --
> Christian Balzer    Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Memory leak with latest ceph code

2016-09-08 Thread Wangzhiyuan
Hi guys

I am testing the performance of the BlueStore backend on a PCIe SSD with Ceph Jewel 
10.2.2, and I found that the 4K random write performance is poor because of large 
write amplification.

After finding some recent changes related to this issue, I upgraded to the latest 
code and was pleasantly surprised that the performance was much better. 
However, the process runs out of memory after a few tens of minutes. 
Does anyone have an idea about this issue? I know the code is not stable, but I 
am eager for the better performance.

Thanks a lot
Email Disclaimer & Confidentiality Notice
This message is confidential and intended solely for the use of the recipient 
to whom they are addressed. If you are not the intended recipient you should 
not deliver, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail and delete this e-mail from your system. Copyright © 2016 
by Istuary Innovation Labs, Inc. All rights reserved.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-08 Thread John Spray
On Thu, Sep 8, 2016 at 2:06 AM, Gregory Farnum  wrote:
> On Wed, Sep 7, 2016 at 7:44 AM, Michael Sudnick
>  wrote:
>> I've had to force recreate some PGs on my cephfs data pool due to some
>> cascading disk failures in my homelab cluster. Is there a way to easily
>> determine which files I need to restore from backup? My metadata pool is
>> completely intact.
>
> Assuming you're on Jewel, run a recursive "scrub" on the MDS root via
> the admin socket, and all the missing files should get logged in the
> local MDS log.

This isn't quite accurate -- the forward scrub is only checking for
the first object in the file (which contains the backtrace), so it
won't identify any files where other objects may have been in the lost
PGs.

Also, it turns out that the MDS doesn't actually log anything in this
case; the issue is noted in the scrub result object for the inode, but
that doesn't go anywhere unless you were explicitly doing "scrub_path
/" in which case you get the detailed results on the command
line.

Anyway -- currently there isn't an efficient tool for answering the
question "which files have objects in this PG?".  The only way to work
it out is to scan through every possible object in every file in the
system.  You can sort of do this by writing a script, but it'll be
very slow if you have to call into "ceph osd map" for each object ID.
It may well be faster to do a full restore from your backup.

John
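
For what it's worth, a very rough sketch of such a script (slow, as noted, since it shells out to "ceph osd map" once per object; the pool name, the lost PG ids and the default 4 MB object size are assumptions):

#!/bin/sh
LOST_PGS="1.a3 1.f1"            # the recreated PGs (placeholders)
POOL=cephfs_data                # the cephfs data pool (placeholder)
OBJ_SIZE=$((4 * 1024 * 1024))   # default object size

find /mnt/cephfs -type f | while read -r f; do
    ino=$(stat -c %i "$f")
    size=$(stat -c %s "$f")
    nobj=$(( (size + OBJ_SIZE - 1) / OBJ_SIZE ))
    [ "$nobj" -eq 0 ] && nobj=1
    i=0
    while [ "$i" -lt "$nobj" ]; do
        obj=$(printf '%x.%08x' "$ino" "$i")
        pg=$(ceph osd map "$POOL" "$obj" | sed -n 's/.*pg [^ ]* (\([^)]*\)).*/\1/p')
        for lost in $LOST_PGS; do
            [ "$pg" = "$lost" ] && { echo "$f"; break 2; }
        done
        i=$((i + 1))
    done
done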

> (I'm surprised at this point to discover we don't seem to have any
> documentation about how scrubbing works. It's a regular admin socket
> command and "ceph daemon mds. help" should get you going where
> you need.)
> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore crashes

2016-09-08 Thread thomas.swindells
You are right, we are running ceph version 10.2.2 
(45107e21c568dd033c2f0a3107dec8f0b0e58374) at the moment; if master has had 
substantial work done on it then it sounds like it is worth us retesting on 
that.
I fully appreciate that it is still a work in progress and that future 
performance may go down as well as up. That said, other than this one issue we 
have been impressed with the stability and have not come across other significant 
bugs.
Thanks,
Thomas


  From: Mark Nelson 
 To: ceph-users@lists.ceph.com 
 Sent: Thursday, 8 September 2016, 15:15
 Subject: Re: [ceph-users] Bluestore crashes
  
It's important to keep in mind that bluestore is still rapidly being 
developed.  At any given commit it might crash, eat data, be horribly 
slow, destroy your computer, etc.  It's very much wild west territory. 
Jewel's version of bluestore is quite different than what is in master 
right now.  Please take any benchmarks with a big grain of salt since 
there are several known issues we are working through (especially around 
encode/decode).  Having said that, I'm glad to hear it's faster! :)

Mark

On 09/08/2016 08:19 AM, Wido den Hollander wrote:
>
>> On 8 September 2016 at 14:58, thomas.swinde...@yahoo.com wrote:
>>
>>
>> We've been doing some performance testing on Bluestore to see whether it 
>> could be viable to use in the future.
>> The good news we are seeing significant performance improvements on using 
>> it, so thank you for all the work that has gone into it.
>> The bad news is we keep encountering crashes and corruption requiring the 
>> rebuild: Example log extract looks like the following:
>> 2016-09-03 17:56:57.337756 7f593fe9b700 -1 freelist release bad release 
>> 564521787392~4096 overlaps with 564521787392~40962016-09-03 17:56:57.340169 
>> 7f593fe9b700 -1 os/bluestore/FreelistManager.cc: In function 'int 
>> FreelistManager::release(uint64_t, uint64_t, KeyValueDB::Transaction)' 
>> thread 7f593fe9b700 time 2016-09-03 
>> 17:56:57.338393os/bluestore/FreelistManager.cc: 245: FAILED assert(0 == "bad 
>> release overlap")
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) 1: 
>>(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) 
>>[0x7f59643945b5] 2: (FreelistManager::release(unsigned long, unsigned long, 
>>std::shared_ptr)+0x533) [0x7f596402c963] 3: 
>>(BlueStore::_txc_update_fm(BlueStore::TransContext*)+0x317) [0x7f5963fda0d7] 
>>4: (BlueStore::_kv_sync_thread()+0x9b0) [0x7f5964003690] 5: 
>>(BlueStore::KVSyncThread::entry()+0xd) [0x7f59640299dd] 6: (()+0x7dc5) 
>>[0x7f59622c4dc5] 7: (clone()+0x6d) [0x7f596095021d] NOTE: a copy of the 
>>executable, or `objdump -rdS ` is needed to interpret this.
>> --- begin dump of recent events 1> 2016-09-03 17:43:23.940910 
>> 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70561038 len 23 
>> crc 696327790 -> 2016-09-03 17:43:23.965781 7f593fe9b700  5 rocksdb: 
>> EmitPhysicalRecord: log 37 offset 70561061 len 2178 crc 2109984297 -9998> 
>> 2016-09-03 17:43:23.965833 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 
>> 37 offset 70563239 len 2175 crc 493419836 -9997> 2016-09-03 17:43:23.965867 
>> 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70565414 len 23 
>> crc 1766806723...
>>
>
> Seems you are running Jewel (10.2.2) and the BlueStore version in there is 
> not the best.
>
> If you want to test with BlueStore I recommend that you run with code from 
> the master branch.
>
> Wido
>
>> This appears to match this existing bug: http://tracker.ceph.com/issues/15659
>> Are there any known work-arounds to prevent the issue from happening? What 
>> information and support can we provide to help the fix on the issue to be 
>> progressed? We seem to be able to reliably reproduce the issue after about 6 
>> hours or so of test running so would be able to test any proposed fixes if 
>> that would be helpful,
>> Thanks,
>> Thomas___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

[ceph-users] Cannot start the Ceph daemons using upstart after upgrading to Jewel 10.2.2

2016-09-08 Thread Simion Marius Rad
Hello,

Today I upgraded an Infernalis 9.2.1 cluster to Jewel 10.2.2.
All went well until I wanted to restart the daemons using upstart (initctl
).
Any upstart invocation fails to start the daemons.
In order to keep the cluster up I started the daemons by myself using the
commands invoked usually by upstart.


The cluster runs on Ubuntu 14.04 LTS (kernel 3.19 ).

Did someone else have a similar issue after upgrade ?

Thanks,
Simion Rad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel 10.2.2 - Error when flushing journal

2016-09-08 Thread Alexey Sheplyakov
Hi,

> root@:~# ceph-osd -i 12 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect

As far as I understand, these lines are an hdparm warning (the OSD uses the hdparm
command to query the journal device's write cache state).
The message means hdparm is unable to reliably figure out whether the drive
write cache is enabled. This might indicate a hardware problem.

> ceph-osd -i 12 --flush-journal

I think it's a good idea to a) check the journal drive (smartctl), b)
capture a more verbose log,
i.e. add this to ceph.conf

[osd]
debug filestore = 20/20
debug journal = 20/20

and try flushing the journal once more (note: this won't fix the problem,
the point is to get a useful log)
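
i.e. roughly this sequence (OSD id taken from your mail; the device name is an example):

smartctl -a /dev/sdX                 # check the journal drive first
systemctl stop ceph-osd@12
ceph-osd -i 12 --flush-journal       # with the debug options above in ceph.conf
# the verbose log ends up in /var/log/ceph/ceph-osd.12.log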

Best regards,
  Alexey


On Wed, Sep 7, 2016 at 6:48 PM, Mehmet  wrote:

> Hey again,
>
> now i have stopped my osd.12 via
>
> root@:~# systemctl stop ceph-osd@12
>
> and when i am flush the journal...
>
> root@:~# ceph-osd -i 12 --flush-journal
> SG_IO: questionable sense data, results may be incorrect
> SG_IO: questionable sense data, results may be incorrect
> *** Caught signal (Segmentation fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
>  0> 2016-09-07 17:42:58.128839 7f421d49d700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f421d49d700 thread_name:ceph-osd
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (()+0x96bdde) [0x564545e65dde]
>  2: (()+0x113d0) [0x7f422277e3d0]
>  3: [0x56455055a3c0]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
>
> Segmentation fault
>
> The logfile with further information
> - http://slexy.org/view/s2T8AohMfU
>
> I guess i will get same message when i flush the other journals.
>
> - Mehmet
>
>
> Am 2016-09-07 13:23, schrieb Mehmet:
>
>> Hello ceph people,
>>
>> yesterday i stopped one of my OSDs via
>>
>> root@:~# systemctl stop ceph-osd@10
>>
>> and tried to flush the journal for this osd via
>>
>> root@:~# ceph-osd -i 10 --flush-journal
>>
>> but getting this output on the screen:
>>
>> SG_IO: questionable sense data, results may be incorrect
>> SG_IO: questionable sense data, results may be incorrect
>> *** Caught signal (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>>
>>  0> 2016-09-06 22:12:51.850739 7fd846333700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7fd846333700 thread_name:ceph-osd
>>
>>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x96bdde) [0x55f33b862dde]
>>  2: (()+0x113d0) [0x7fd84b6143d0]
>>  3: [0x55f345bbff80]
>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>> needed to interpret this.
>>
>> Segmentation fault
>>
>> This is the logfile from my osd.10 with further informations
>> - http://slexy.org/view/s21tfwQ1fZ
>>
>> Today i stopped another OSD (osd.11)
>>
>> root@:~# systemctl stop ceph-osd@11
>>
>> I did not not get the above mentioned error - but this
>>
>> root@:~# ceph-osd -i 11 --flush-journal
>> SG_IO: questionable sense data, results may be incorrect
>> SG_IO: questionable sense data, results may be incorrect
>> 2016-09-07 13:19:39.729894 7f3601a298c0 -1 flushed journal
>> /var/lib/ceph/osd/ceph-11/journal for object store
>> /var/lib/ceph/osd/ceph-11
>>
>> This is the logfile from my osd.11 with further informations
>> - http://slexy.org/view/s2AlEhV38m
>>
>> This is not really an issue for me, actually, because I will set up the journal
>> partitions again with 20GB (instead of the current 5GB) and then bring the OSD
>> back up.
>> But I thought I should mail this error to the mailing list.
>>
>> This is my Setup:
>>
>> *Software/OS*
>> - Jewel
>> #> ceph tell osd.* version | grep version | uniq
>> "version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec
>> 8f0b0e58374)"
>>
>> #> ceph tell mon.* version
>> [...] ceph version 

Re: [ceph-users] Bluestore crashes

2016-09-08 Thread Mark Nelson
It's important to keep in mind that bluestore is still rapidly being 
developed.  At any given commit it might crash, eat data, be horribly 
slow, destroy your computer, etc.  It's very much wild west territory. 
Jewel's version of bluestore is quite different than what is in master 
right now.  Please take any benchmarks with a big grain of salt since 
there are several known issues we are working through (especially around 
encode/decode).  Having said that, I'm glad to hear it's faster! :)


Mark

On 09/08/2016 08:19 AM, Wido den Hollander wrote:



On 8 September 2016 at 14:58, thomas.swinde...@yahoo.com wrote:


We've been doing some performance testing on Bluestore to see whether it could 
be viable to use in the future.
The good news we are seeing significant performance improvements on using it, 
so thank you for all the work that has gone into it.
The bad news is we keep encountering crashes and corruption requiring the 
rebuild: Example log extract looks like the following:
2016-09-03 17:56:57.337756 7f593fe9b700 -1 freelist release bad release 564521787392~4096 
overlaps with 564521787392~40962016-09-03 17:56:57.340169 7f593fe9b700 -1 
os/bluestore/FreelistManager.cc: In function 'int FreelistManager::release(uint64_t, 
uint64_t, KeyValueDB::Transaction)' thread 7f593fe9b700 time 2016-09-03 
17:56:57.338393os/bluestore/FreelistManager.cc: 245: FAILED assert(0 == "bad release 
overlap")
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) 
[0x7f59643945b5] 2: (FreelistManager::release(unsigned long, unsigned long, 
std::shared_ptr)+0x533) [0x7f596402c963] 3: 
(BlueStore::_txc_update_fm(BlueStore::TransContext*)+0x317) [0x7f5963fda0d7] 4: 
(BlueStore::_kv_sync_thread()+0x9b0) [0x7f5964003690] 5: 
(BlueStore::KVSyncThread::entry()+0xd) [0x7f59640299dd] 6: (()+0x7dc5) [0x7f59622c4dc5] 7: 
(clone()+0x6d) [0x7f596095021d] NOTE: a copy of the executable, or `objdump -rdS 
` is needed to interpret this.
--- begin dump of recent events 1> 2016-09-03 17:43:23.940910 7f593fe9b700  5 
rocksdb: EmitPhysicalRecord: log 37 offset 70561038 len 23 crc 696327790 -> 
2016-09-03 17:43:23.965781 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 
70561061 len 2178 crc 2109984297 -9998> 2016-09-03 17:43:23.965833 7f593fe9b700  5 
rocksdb: EmitPhysicalRecord: log 37 offset 70563239 len 2175 crc 493419836 -9997> 
2016-09-03 17:43:23.965867 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 
70565414 len 23 crc 1766806723...



Seems you are running Jewel (10.2.2) and the BlueStore version in there is not 
the best.

If you want to test with BlueStore I recommend that you run with code from the 
master branch.

Wido


This appears to match this existing bug: http://tracker.ceph.com/issues/15659
Are there any known work-arounds to prevent the issue from happening? What 
information and support can we provide to help the fix on the issue to be 
progressed? We seem to be able to reliably reproduce the issue after about 6 
hours or so of test running so would be able to test any proposed fixes if that 
would be helpful,
Thanks,
Thomas___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore crashes

2016-09-08 Thread Wido den Hollander

> On 8 September 2016 at 14:58, thomas.swinde...@yahoo.com wrote:
> 
> 
> We've been doing some performance testing on Bluestore to see whether it 
> could be viable to use in the future.
> The good news is we are seeing significant performance improvements using 
> it, so thank you for all the work that has gone into it.
> The bad news is we keep encountering crashes and corruption requiring a 
> rebuild. An example log extract looks like the following:
> 2016-09-03 17:56:57.337756 7f593fe9b700 -1 freelist release bad release 
> 564521787392~4096 overlaps with 564521787392~40962016-09-03 17:56:57.340169 
> 7f593fe9b700 -1 os/bluestore/FreelistManager.cc: In function 'int 
> FreelistManager::release(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 
> 7f593fe9b700 time 2016-09-03 17:56:57.338393os/bluestore/FreelistManager.cc: 
> 245: FAILED assert(0 == "bad release overlap")
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) 1: 
> (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) 
> [0x7f59643945b5] 2: (FreelistManager::release(unsigned long, unsigned long, 
> std::shared_ptr)+0x533) [0x7f596402c963] 3: 
> (BlueStore::_txc_update_fm(BlueStore::TransContext*)+0x317) [0x7f5963fda0d7] 
> 4: (BlueStore::_kv_sync_thread()+0x9b0) [0x7f5964003690] 5: 
> (BlueStore::KVSyncThread::entry()+0xd) [0x7f59640299dd] 6: (()+0x7dc5) 
> [0x7f59622c4dc5] 7: (clone()+0x6d) [0x7f596095021d] NOTE: a copy of the 
> executable, or `objdump -rdS ` is needed to interpret this.
> --- begin dump of recent events 1> 2016-09-03 17:43:23.940910 
> 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70561038 len 23 
> crc 696327790 -> 2016-09-03 17:43:23.965781 7f593fe9b700  5 rocksdb: 
> EmitPhysicalRecord: log 37 offset 70561061 len 2178 crc 2109984297 -9998> 
> 2016-09-03 17:43:23.965833 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 
> 37 offset 70563239 len 2175 crc 493419836 -9997> 2016-09-03 17:43:23.965867 
> 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70565414 len 23 
> crc 1766806723...
> 

Seems you are running Jewel (10.2.2) and the BlueStore version in there is not 
the best.

If you want to test with BlueStore I recommend that you run with code from the 
master branch.

Wido

> This appears to match this existing bug: http://tracker.ceph.com/issues/15659
> Are there any known work-arounds to prevent the issue from happening? What 
> information and support can we provide to help progress a fix for the issue? 
> We can reliably reproduce the issue after about 6 hours or so of test 
> running, so we would be able to test any proposed fixes if that would be 
> helpful.
> Thanks,
> Thomas
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FW: Multiple public networks and ceph-mon daemons listening

2016-09-08 Thread Wido den Hollander

> On 8 September 2016 at 15:02, Jim Kilborn wrote:
> 
> 
> Hello all…
> 
> I am setting up a ceph cluster (jewel) on a private network. The compute 
> nodes are all running centos 7 and mounting the cephfs volume using the 
> kernel driver. The ceph storage nodes are dual connected to the private 
> network, as well as our corporate network, as some users need to mount the 
> volume to their workstations (also centos 7) from the corporate network.
> 

Compute via CephFS? I would highly recommend using RBD for block devices; don't 
put CephFS in between.
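As a rough illustration of what I mean (pool and image names below are just 
examples, and older kernel clients may only support the layering feature):

rbd create rbd/vm-disk-01 --size 102400 --image-feature layering  # 100 GB image
rbd map rbd/vm-disk-01        # appears as /dev/rbd0 (and /dev/rbd/rbd/vm-disk-01)
mkfs.xfs /dev/rbd/rbd/vm-disk-01
mount /dev/rbd/rbd/vm-disk-01 /mnt/vm-disk-01
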

> The private network is infiniband, so I have that set as the cluster network, 
> and have both networks listed in the private networks in the ceph.conf.
> 
> However, the mon daemons only listen on the private network, and if I want to 
> mount the volume from the corporate network, it has to mount via the private 
> network address of the ceph storage nodes, which means that the cluster head 
> node (linux) has to route that traffic.
> 
> I would like to know if there is a way to have the monitors listen on both 
> their interfaces, like the osd/mds daemons do, so I could use the appropriate 
> address in the fstab of the clients, depending on which network they are on.
> 

No, not really. The monitors have one IP and that's the only IP they can work 
with.

Your setup isn't really going to work either. Although the OSDs seem to listen 
on */[::], they don't. The OSDMap contains the IPs of the OSDs that clients 
will connect to, and that can't be dual-homed.

You would have to come up with a routed network in this case so people can 
reach the IPs of all Ceph nodes.

Wido

> Alternatively, I could have one of the mon daemons added with its private 
> network address, as all ceph storage nodes are dual connected, but I would 
> lose some fault tolerance I think (if that monitor goes down)
> 
> Just thought there must be a better way. I have 3 monitor nodes (dual 
> functioning as osd nodes). They are all brand new Dell 730xd with 12GB RAM 
> and dual xeons. I also have an SSD cache in front of an erasure-coded pool.
> 
> Any suggestions?
> 
> Thanks for taking the time…
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FW: Multiple public networks and ceph-mon daemons listening

2016-09-08 Thread Jim Kilborn
Hello all…

I am setting up a ceph cluster (jewel) on a private network. The compute nodes 
are all running centos 7 and mounting the cephfs volume using the kernel 
driver. The ceph storage nodes are dual connected to the private network, as 
well as our corporate network, as some users need to mount the volume to their 
workstations (also centos 7) from the corporate network.

The private network is infiniband, so I have that set as the cluster network, 
and have both networks listed in the private networks in the ceph.conf.
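
Roughly, the network part of my ceph.conf looks like the following (the subnets 
below are placeholders, not our real ones):

[global]
public network  = 10.10.0.0/16, 192.168.100.0/24   # corporate + private (placeholders)
cluster network = 192.168.100.0/24                 # infiniband-only replication traffic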

However, the mon daemons only listen on the private network, and if I want to 
mount the volume from the corporate network, it has to mount via the private 
network address of the ceph storage nodes, which means that the cluster head 
node (linux) has to route that traffic.

I would like to know if there is a way to have the monitors listen on both 
their interfaces, like the osd/mds daemons do, so I could use the appropriate 
address in the fstab of the clients, depending on which network they are on.

Alternatively, I could have one of the mon daemons added with its private 
network address, as all ceph storage nodes are dual connected, but I would lose 
some fault tolerance I think (if that monitor goes down)

Just thought there must be a better way. I have 3 monitor nodes (dual 
functioning as osd nodes). They are all brand new Dell 730xd with 12GB RAM and 
dual xeons. I also have an SSD cache in front of an erasure-coded pool.

Any suggestions?

Thanks for taking the time…

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Client XXX failing to respond to cache pressure

2016-09-08 Thread Georgi Chorbadzhiyski
Today I was surprised to find our cluster in HEALTH_WARN condition, and
searching the documentation was no help at all.

Does anybody have an idea how to cure the dreaded "failing to respond
to cache pressure" message? As I understand it, it tells me that a
client is not responding to an MDS request to prune its cache, but
I have no idea what is causing the problem or how to cure it.

I'm using kernel cephfs driver on kernel 4.4.14.
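
For reference, this is roughly what I can pull from the active MDS (HOST7, per 
the status below) if it helps with debugging; I am assuming the admin socket 
commands are the same on hammer:

ceph daemon mds.HOST7 session ls   # per-client session info, including caps held
ceph daemon mds.HOST7 perf dump    # MDS counters (inodes, caps, etc.)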

# ceph --version
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)

# ceph -s
cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb1
 health HEALTH_WARN
mds0: Client HOST1 failing to respond to cache pressure
mds0: Client HOST2 failing to respond to cache pressure
mds0: Client HOST3 failing to respond to cache pressure
mds0: Client HOST4 failing to respond to cache pressure
 monmap e2: 5 mons at 
{HOST10=1.2.3.10:6789/0,HOST5=1.2.3.5:6789/0,HOST6=1.2.3.6:6789/0,HOST7=1.2.3.7:6789/0,HOST11=1.2.3.11:6789/0}
election epoch 188, quorum 0,1,2,3,4 HOST10,HOST5,HOST6,HOST7,HOST11
 mdsmap e777: 1/1/1 up {0=HOST7=up:active}, 2 up:standby
 osdmap e8293: 61 osds: 61 up, 60 in
  pgmap v3149484: 6144 pgs, 3 pools, 706 GB data, 650 kobjects
1787 GB used, 88434 GB / 90264 GB avail
6144 active+clean
  client io 31157 kB/s rd, 1567 kB/s wr, 647 op/s
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore crashes

2016-09-08 Thread thomas.swindells
We've been doing some performance testing on Bluestore to see whether it could 
be viable to use in the future.
The good news is we are seeing significant performance improvements using it, 
so thank you for all the work that has gone into it.
The bad news is we keep encountering crashes and corruption requiring a 
rebuild. An example log extract looks like the following:
2016-09-03 17:56:57.337756 7f593fe9b700 -1 freelist release bad release 
564521787392~4096 overlaps with 564521787392~40962016-09-03 17:56:57.340169 
7f593fe9b700 -1 os/bluestore/FreelistManager.cc: In function 'int 
FreelistManager::release(uint64_t, uint64_t, KeyValueDB::Transaction)' thread 
7f593fe9b700 time 2016-09-03 17:56:57.338393os/bluestore/FreelistManager.cc: 
245: FAILED assert(0 == "bad release overlap")
 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) 
[0x7f59643945b5] 2: (FreelistManager::release(unsigned long, unsigned long, 
std::shared_ptr)+0x533) [0x7f596402c963] 3: 
(BlueStore::_txc_update_fm(BlueStore::TransContext*)+0x317) [0x7f5963fda0d7] 4: 
(BlueStore::_kv_sync_thread()+0x9b0) [0x7f5964003690] 5: 
(BlueStore::KVSyncThread::entry()+0xd) [0x7f59640299dd] 6: (()+0x7dc5) 
[0x7f59622c4dc5] 7: (clone()+0x6d) [0x7f596095021d] NOTE: a copy of the 
executable, or `objdump -rdS ` is needed to interpret this.
--- begin dump of recent events 1> 2016-09-03 17:43:23.940910 
7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70561038 len 23 crc 
696327790 -> 2016-09-03 17:43:23.965781 7f593fe9b700  5 rocksdb: 
EmitPhysicalRecord: log 37 offset 70561061 len 2178 crc 2109984297 -9998> 
2016-09-03 17:43:23.965833 7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 
offset 70563239 len 2175 crc 493419836 -9997> 2016-09-03 17:43:23.965867 
7f593fe9b700  5 rocksdb: EmitPhysicalRecord: log 37 offset 70565414 len 23 crc 
1766806723...

This appears to match this existing bug: http://tracker.ceph.com/issues/15659
Are there any known work-arounds to prevent the issue from happening? What 
information and support can we provide to help progress a fix for the issue? 
We can reliably reproduce the issue after about 6 hours or so of test running, 
so we would be able to test any proposed fixes if that would be helpful.
Thanks,
Thomas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Excluding buckets in RGW Multi-Site Sync

2016-09-08 Thread Wido den Hollander
Hi,

I've been setting up a RGW Multi-Site [0] configuration in 6 VMs. 3 VMs per 
cluster and one RGW per cluster.

Works just fine, I can create a user in the master zone, create buckets and 
upload data using s3cmd (S3).
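
The test itself is nothing fancy; roughly the following, with an example bucket 
name, and then watching the objects appear in the second zone:

s3cmd mb s3://wido-sync-test
s3cmd put some-large-file.bin s3://wido-sync-test
s3cmd ls s3://wido-sync-test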

What I see is that ALL data is synced between the two zones. While I understand 
that's indeed the purpose of it, is there a way to disable the sync for 
specific buckets/users?

Wido

[0]: http://docs.ceph.com/docs/master/radosgw/multisite/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw meta pool

2016-09-08 Thread Pavan Rallabhandi
Trying it one more time on the users list.

In our clusters running Jewel 10.2.2, I see the default.rgw.meta pool running 
into a large number of objects, potentially in the same range as the number of 
objects contained in the data pool.

I understand that the immutable metadata entries are now stored in this heap 
pool, but I couldn’t reason out why the metadata objects are left in this pool 
even after the actual bucket/object/user deletions.

The put_entry() call seems to promptly store these entries in the heap pool 
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_metadata.cc#L880), but I 
never see them being reaped. Are they left there for some reason?
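
For the record, this is roughly how I am looking at it (the pool name is the 
default one from our setup):

rados -p default.rgw.meta ls | wc -l   # the object count keeps growing
rados df                               # compare object counts across the rgw pools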

Thanks,
-Pavan.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] non-effective new deep scrub interval

2016-09-08 Thread Christian Balzer

Hello,

On Thu, 8 Sep 2016 09:48:46 +0200 (CEST) David DELON wrote:

> 
> Hello, 
> 
> i'm using ceph jewel. 
> I would like to schedule the deep scrub operations on my own. 

Welcome to the club, alas the ride isn't for the faint of heart.

You will want to (re-)search the ML archive (google) and in particular the
recent "Spreading deep-scrubbing load" thread.

> First of all, i have tried to change the interval value for 30 days: 
> In each /etc/ceph/ceph.conf, i have added: 
> 
> [osd] 
> #30*24*3600 
> osd deep scrub interval = 2592000 
> I have restarted all the OSD daemons. 

This could have been avoided with an "inject" for all OSDs.
Restarting (busy) OSDs isn't particularly nice for a cluster.
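
Something along these lines should apply it to all OSDs at runtime (value taken
from your example); injected values don't survive a restart, so keep the
ceph.conf change as well:

ceph tell osd.* injectargs '--osd_deep_scrub_interval 2592000'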

> The new value has been taken into account as for each OSD: 
> 
> ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep 
> deep_scrub_interval 
> "osd_deep_scrub_interval": "2.592e+06", 
> 
> 
> I have checked the last_deep_scrub value for each pg with 
> ceph pg dump 
> And each pg has been deep scrubbed during the last 7 days (which is the 
> default behavior). 
> 
See the above thread.

> Since i have made the changes 2 days ago, it keeps on deep scrubbing. 
> Do i miss something? 
> 
At least 2 things, maybe more.

Unless you changed "osd_scrub_max_interval" as well, that one will kick in and
force scrubs, by default after a week.

And with Jewel you get the well-meaning, but turned on by default and
ill-documented, "osd_scrub_interval_randomize_ratio", which will happily spread
things out, and not when you want them.

Again, read the above thread.

Also, your cluster _should_ be able to endure deep scrubs even when busy,
otherwise you're looking at trouble when you lose an OSD and the resulting
rebalancing kicks in as well.

Setting these to something sensible:
"osd_scrub_begin_hour": "0",
"osd_scrub_end_hour": "6",

and especially this:
"osd_scrub_sleep": "0.1",

will minimize the impact of scrub as well.
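
As a rough ceph.conf sketch of the above (pick a window that matches your quiet
hours):

[osd]
osd scrub begin hour    = 0
osd scrub end hour      = 6
osd scrub sleep         = 0.1
osd deep scrub interval = 2592000   # your 30 days from above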

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] experiences in upgrading Infernalis to Jewel

2016-09-08 Thread Arvydas Opulskis
Hi,

if you are using RGW, you can experience similar problems to ours when
creating a bucket. You'll find what went wrong and how we solved it in my
older email. Subject of topic is "Can't create bucket (ERROR: endpoints not
configured for upstream zone)"

Cheers,
Arvydas

On Thu, Sep 8, 2016 at 12:10 PM, felderm  wrote:

>
> Thanks Alexandre!!
>
> We plan to proceed as follows for upgrading infernalis to jewel:
>
> Enable Repo on all Nodes
> #cat /etc/apt/sources.list.d/ceph_com_jewel.list
> deb http://download.ceph.com/debian-jewel trusty main
>
> For all Monitors (one after the other)
> 1) sudo apt-get update && sudo apt-get install ceph-common
> 2) service ceph stop
> 3) service ceph start
>
> For all OSDs (one after the other)
> 1) sudo apt-get update && sudo apt-get install ceph-common
> 2) service ceph stop
> 3) service ceph start
>
> based on the documentation there are no "release specific" procedures
> for upgrading infernalis to jewel
> http://docs.ceph.com/docs/jewel/install/upgrading-ceph/
>
> Other opinions/Recommendations are very welcome! Is it really that
> simple and unproblematic ??
>
> Thanks and regards
> felder
>
>
>
>
>
> On 09/07/2016 02:33 PM, Alexandre DERUMIER wrote:
> > Hi,
> >
> > I think it's more simple to
> >
> > 1) change the repository
> > 2) apt-get dist-upgrade
> >
> > 3) restart mon on each node
> > 4) restart osd on each node
> >
> > done
> >
> >
> > I have upgraded 4 clusters like this without any problem.
> > I have never used ceph-deploy for upgrades.
> >
> >
> > - Original message -
> > From: "felderm" 
> > To: "ceph-users" 
> > Sent: Wednesday, 7 September 2016 14:27:02
> > Subject: [ceph-users] experiences in upgrading Infernalis to Jewel
> >
> > Hi All
> >
> > We are preparing an upgrade from Ceph Infernalis 9.2.0 to Ceph Jewel
> > 10.2.2. Based on the Upgrade procedure documentation
> > http://docs.ceph.com/docs/jewel/install/upgrading-ceph/ it sounds easy.
> > But it often fails when you think it's easy. Therefore I would like to
> > know your opinion on the following questions:
> >
> > 1) We plan to upgrade the monitors one after another. If we break one
> > monitor, the 2 others are still operational.
> >
> > In the documentation they propose
> > ceph-deploy install --release jewel mon1 mon2 mon3
> >
> > Wouldn't it be wise to upgrade one after the other?
> > ceph-deploy install --release jewel mon1
> > ceph-deploy install --release jewel mon2
> > ceph-deploy install --release jewel mon3
> >
> > Same procedure for Monitor Nodes ??
> >
> > 2) On which node would you recommend running the ceph-deploy command?
> > which one is the admin node?
> >
> > 3) If we are using ceph-deploy for upgrading, do we need to change the
> > apt repository?
> > #cat /etc/apt/sources.list.d/ceph_com_debian_infernalis.list
> > deb http://ceph.com/debian-infernalis trusty main
> >
> > 4) There is no possibility to revert the upgrade. Is there any plan B
> > if the upgrade fails? Sorry for being so pessimistic.
> >
> > 5) General experiences in upgrading Ceph? Does it fail often? What was
> > your plan B in case of upgrade failures?
> >
> > Your feedbacks are highly appreciated!
> > Thanks
> > felder
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] experiences in upgrading Infernalis to Jewel

2016-09-08 Thread felderm

Thanks Alexandre!!

We plan to proceed as follows for upgrading infernalis to jewel:

Enable Repo on all Nodes
#cat /etc/apt/sources.list.d/ceph_com_jewel.list
deb http://download.ceph.com/debian-jewel trusty main

For all Monitors (one after the other)
1) sudo apt-get update && sudo apt-get install ceph-common
2) service ceph stop
3) service ceph start

For all OSDs (one after the other)
1) sudo apt-get update && sudo apt-get install ceph-common
2) service ceph stop
3) service ceph start
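
For what it's worth, a minimal sanity check after each node before moving on
(the admin socket "version" command should be available on both releases):

ceph -s                                  # wait for HEALTH_OK
ceph daemon mon.$(hostname -s) version   # on a monitor node: confirm 10.2.x
ceph tell osd.* version                  # confirm which OSDs report the new version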

based on the documentation there are no "release specific" procedures
for upgrading infernalis to jewel
http://docs.ceph.com/docs/jewel/install/upgrading-ceph/

Other opinions/Recommendations are very welcome! Is it really that
simple and unproblematic ??

Thanks and regards
felder





On 09/07/2016 02:33 PM, Alexandre DERUMIER wrote:
> Hi,
>
> I think it's more simple to 
>
> 1) change the repository
> 2) apt-get dist-upgrade
>
> 3) restart mon on each node
> 4) restart osd on each node
>
> done
>
>
> I have upgraded 4 clusters like this without any problem.
> I have never used ceph-deploy for upgrades.
>
>
> - Original message -
> From: "felderm" 
> To: "ceph-users" 
> Sent: Wednesday, 7 September 2016 14:27:02
> Subject: [ceph-users] experiences in upgrading Infernalis to Jewel
>
> Hi All 
>
> We are preparing an upgrade from Ceph Infernalis 9.2.0 to Ceph Jewel 
> 10.2.2. Based on the Upgrade procedure documentation 
> http://docs.ceph.com/docs/jewel/install/upgrading-ceph/ it sounds easy. 
> But it often fails when you think it's easy. Therefore I would like to 
> know your opinion on the following questions: 
>
> 1) We plan to upgrade the monitors one after another. If we break one 
> monitor, the 2 others are still operational. 
>
> In the documentation they propose 
> ceph-deploy install --release jewel mon1 mon2 mon3 
>
> Wouldn't it be wise to upgrade one after the other? 
> ceph-deploy install --release jewel mon1 
> ceph-deploy install --release jewel mon2 
> ceph-deploy install --release jewel mon3 
>
> Same procedure for Monitor Nodes ?? 
>
> 2) On which node would you recommend running the ceph-deploy command? 
> which one is the admin node? 
>
> 3) If we are using ceph-deploy for upgrading, do we need to change the 
> apt repository? 
> #cat /etc/apt/sources.list.d/ceph_com_debian_infernalis.list 
> deb http://ceph.com/debian-infernalis trusty main 
>
> 4) There is no possibility to revert the upgrade. Is there any plan B 
> if the upgrade fails? Sorry for being so pessimistic. 
>
> 5) General experiences in upgrading Ceph? Does it fail often? What was 
> your plan B in case of upgrade failures? 
>
> Your feedbacks are highly appreciated! 
> Thanks 
> felder 
>
>
>
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] non-effective new deep scrub interval

2016-09-08 Thread David DELON

Hello, 

i'm using ceph jewel. 
I would like to schedule the deep scrub operations on my own. 
First of all, i have tried to change the interval value for 30 days: 
In each /etc/ceph/ceph.conf, i have added: 

[osd] 
#30*24*3600 
osd deep scrub interval = 2592000 
I have restarted all the OSD daemons. 
The new value has been taken into account as for each OSD: 

ceph --admin-daemon /var/run/ceph/ceph-osd.X.asok config show | grep 
deep_scrub_interval 
"osd_deep_scrub_interval": "2.592e+06", 


I have checked the last_deep_scrub value for each pg with 
ceph pg dump 
And each pg has been deep scrubbed during the last 7 days (which is the default 
behavior). 

Since i have made the changes 2 days ago, it keeps on deep scrubbing. 
Do i miss something? 

Thanks. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench output question

2016-09-08 Thread mj

Hi Christian,

Thanks a lot for all your information!

(especially the bit that Ceph never reads from the journal, but writes to the 
OSD from memory, was new to me)


MJ

On 09/07/2016 03:20 AM, Christian Balzer wrote:


hello,

On Tue, 6 Sep 2016 13:38:45 +0200 lists wrote:


Hi Christian,

Thanks for your reply.


What SSD model (be precise)?

Samsung 480GB PM863 SSD


So that's not your culprit then (they are supposed to handle sync writes
at full speed).


Only one SSD?

Yes. With a 5GB partition based journal for each osd.


A bit small, but in normal scenarios that shouldn't be a problem.
Read:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg28003.html


During the 0 MB/sec, there is NO increased cpu usage: it is usually
around 15 - 20% for the four ceph-osd processes.


Watch your node(s) with atop or iostat.

Ok, I will do.


Best results will be had with 3 large terminals (one per node) running
atop, interval set to at least 5, down from default 10 seconds.
Same diff with iostat, parameters "-x 2".


Do we have an issue..? And if yes: Anyone with a suggestions where to
look at?


You will find that either your journal SSD is overwhelmed (and a single SSD
peaking around 500MB/s wouldn't be that surprising), or that your HDDs can't
scribble away at more than the speed above, which is the more likely reason.
Even a combination of both.
Even a combination of both.

Ceph needs to flush data to the OSDs eventually (and that is usually more
or less immediately with default parameters), so for a sustained,
sequential write test you're looking at the speed of your HDDs.
And that will be spiky of sorts, due to FS journals, seeks for other
writes (replicas), etc.

But would we expect the MB/sec to drop to ZERO, during journal-to-osd
flushes?


A common misconception when people start up with Ceph and probably
something that should be better explained in the docs. Or not, given that
BlueStore is on the shimmering horizon.

Ceph never reads from the journals, unless there has been a crash.
(Now would be a good time to read that link above if you haven't yet)

What happens is that (depending on the various filestore and journal
parameters) Ceph starts flushing the still in memory data to the OSD
(disk, FS) after the journal has been written, as I mentioned above.

The logic here is to not create an I/O storm after letting things pile up
for a long time.
People with fast storage subsystems and/or SSDs/NVMes as OSDs tend to tune
these parameters.
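
For illustration, these are the kind of knobs meant here (the values are only
placeholders, not recommendations; the defaults are in the filestore/journal
config reference):

[osd]
filestore min sync interval = 0.01
filestore max sync interval = 5
filestore queue max ops     = 500
journal max write bytes     = 1048576000
journal queue max ops       = 3000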

So now think about what happens during that rados bench run:
A 4MB object gets written (created, then filled), so the client talks to
the OSD that holds the primary PG for that object.
That OSD writes the data to the journal and sends it to the other OSDs
(replicas).
Once all journals have been written, the primary OSD acks the write to
the client.

And this happens with 16 threads by default, making things nicely busy.
Now keeping in mind the above description and the fact that you have a
small cluster, a single OSD that gets too busy will block the whole
cluster basically.

So things dropping to zero means that at least one OSD was so busy (not
CPU in your case, but IOwait) that it couldn't take in more data.
The fact that your drops happen at a rather predictable, roughly 9-second
interval also suggests the possibility that the actual journal got full,
but that's not conclusive.

Christian


Thanks for the quick feedback, and I'll dive into atop and iostat next.

Regards,
MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Scrub and deep-scrub repeating over and over

2016-09-08 Thread Arvydas Opulskis
Hi Goncalo, there it is:

# ceph pg 11.34a query
{
"state": "active+clean+scrubbing",
"snap_trimq": "[]",
"epoch": 6547,
"up": [
24,
3
],
"acting": [
24,
3
],
"actingbackfill": [
"3",
"24"
],
"info": {
"pgid": "11.34a",
"last_update": "6547'85045",
"last_complete": "6547'85045",
"log_tail": "6215'81998",
"last_user_version": 85045,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": "[]",
"history": {
"epoch_created": 5178,
"last_epoch_started": 5241,
"last_epoch_clean": 5241,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 5184,
"same_interval_since": 5240,
"same_primary_since": 5096,
"last_scrub": "6547'85045",
"last_scrub_stamp": "2016-09-08 09:20:06.804646",
"last_deep_scrub": "6547'85045",
"last_deep_scrub_stamp": "2016-09-08 09:18:22.582767",
"last_clean_scrub_stamp": "2016-09-08 09:20:06.804646"
},
"stats": {
"version": "6547'85045",
"reported_seq": "219744",
"reported_epoch": "6547",
"state": "active+clean+scrubbing",
"last_fresh": "2016-09-08 09:20:13.712725",
"last_change": "2016-09-08 09:20:13.712725",
"last_active": "2016-09-08 09:20:13.712725",
"last_peered": "2016-09-08 09:20:13.712725",
"last_clean": "2016-09-08 09:20:13.712725",
"last_became_active": "2016-07-27 18:46:25.926150",
"last_became_peered": "2016-07-27 18:46:25.926150",
"last_unstale": "2016-09-08 09:20:13.712725",
"last_undegraded": "2016-09-08 09:20:13.712725",
"last_fullsized": "2016-09-08 09:20:13.712725",
"mapping_epoch": 5185,
"log_start": "6215'81998",
"ondisk_log_start": "6215'81998",
"created": 5178,
"last_epoch_clean": 5241,
"parent": "0.0",
"parent_split_bits": 10,
"last_scrub": "6547'85045",
"last_scrub_stamp": "2016-09-08 09:20:06.804646",
"last_deep_scrub": "6547'85045",
"last_deep_scrub_stamp": "2016-09-08 09:18:22.582767",
"last_clean_scrub_stamp": "2016-09-08 09:20:06.804646",
"log_size": 3047,
"ondisk_log_size": 3047,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": true,
"stat_sum": {
"num_bytes": 6225173162,
"num_objects": 2688,
"num_object_clones": 0,
"num_object_copies": 5376,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 2688,
"num_whiteouts": 0,
"num_read": 3416,
"num_read_kb": 710270,
"num_write": 16467,
"num_write_kb": 2275320,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 0,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0
},
"up": [
24,
3
],
"acting": [
24,
3
],
"blocked_by": [],
"up_primary": 24,
"acting_primary": 24
},
"empty": 0,
"dne": 0,
"incomplete": 0,
"last_epoch_started": 5241,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
},
"peer_info": [
{
"peer": "3",
"pgid": "11.34a",
"last_update": "6547'85045",
"last_complete": "6547'85045",
"log_tail": "4988'75612",
"last_user_version": 0,
"last_backfill": "MAX",

Re: [ceph-users] Scrub and deep-scrub repeating over and over

2016-09-08 Thread Goncalo Borges

Can you please share the result of

ceph pg 11.34a query

?

On 09/08/2016 05:03 PM, Arvydas Opulskis wrote:

2016-09-08 08:45:01.441945 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:45:03.585039 osd.24 [INF] 11.34a scrub ok


--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Scrub and deep-scrub repeating over and over

2016-09-08 Thread Arvydas Opulskis
Hi all,

we have several PGs with repeating scrub tasks. As soon as a scrub is
complete, it starts again. You can get an idea from the log below:

$ ceph -w | grep -i "11.34a"
2016-09-08 08:28:33.346798 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:28:37.319018 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:28:39.363732 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:28:41.319834 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:28:43.411455 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:28:45.320538 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:28:47.308737 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:28:55.322159 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:28:57.362063 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:00.322918 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:29:02.418139 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:07.324022 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:29:09.469796 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:12.324752 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:29:14.353026 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:17.325801 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:29:19.446962 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:22.326297 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:29:24.389610 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:29:29.327707 osd.24 [INF] 11.34a deep-scrub starts
2016-09-08 08:37:13.887668 osd.24 [INF] 11.34a deep-scrub ok
2016-09-08 08:37:18.383127 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:37:20.700806 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:37:27.385027 osd.24 [INF] 11.34a deep-scrub starts
2016-09-08 08:44:36.073670 osd.24 [INF] 11.34a deep-scrub ok
2016-09-08 08:44:44.438164 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:44:47.017694 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:44:58.441510 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:45:00.524666 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:45:01.441945 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:45:03.585039 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:45:07.443524 osd.24 [INF] 11.34a deep-scrub starts
2016-09-08 08:52:16.020630 osd.24 [INF] 11.34a deep-scrub ok
2016-09-08 08:52:18.494388 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:20.519264 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:23.495231 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:25.514784 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:29.496117 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:31.505832 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:34.496818 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:36.475993 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:36.497652 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:38.483388 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:41.498299 osd.24 [INF] 11.34a scrub starts
2016-09-08 08:52:43.509776 osd.24 [INF] 11.34a scrub ok
2016-09-08 08:52:45.498929 osd.24 [INF] 11.34a deep-scrub starts

Some options from the cluster:
# ceph daemon /var/run/ceph/ceph-osd.24.asok config show | grep -i "scrub"
"mon_warn_not_scrubbed": "0",
"mon_warn_not_deep_scrubbed": "0",
"mon_scrub_interval": "86400",
"mon_scrub_timeout": "300",
"mon_scrub_max_keys": "100",
"mon_scrub_inject_crc_mismatch": "0",
"mon_scrub_inject_missing_keys": "0",
"mds_max_scrub_ops_in_progress": "5",
"osd_scrub_invalid_stats": "true",
"osd_max_scrubs": "1",
"osd_scrub_begin_hour": "23",
"osd_scrub_end_hour": "7",
"osd_scrub_load_threshold": "10",
"osd_scrub_min_interval": "86400",
"osd_scrub_max_interval": "604800",
"osd_scrub_interval_randomize_ratio": "0.5",
"osd_scrub_chunk_min": "5",
"osd_scrub_chunk_max": "25",
"osd_scrub_sleep": "0",
"osd_scrub_auto_repair": "false",
"osd_scrub_auto_repair_num_errors": "5",
"osd_deep_scrub_interval": "604800",
"osd_deep_scrub_randomize_ratio": "0.15",
"osd_deep_scrub_stride": "524288",
"osd_deep_scrub_update_digest_min_age": "7200",
"osd_debug_scrub_chance_rewrite_digest": "0",
"osd_scrub_priority": "5",
"osd_scrub_cost": "52428800",


Deep-scrub and scrub dates are updated after each operation in pg dump.
Does anyone have ideas why this is happening and how to solve it? Pool
size is 2, if that matters.
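
As a stop-gap while debugging, I assume the cluster-wide flags would at least
pause the loop:

ceph osd set noscrub
ceph osd set nodeep-scrub
# and later, once the cause is found:
ceph osd unset noscrub
ceph osd unset nodeep-scrub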

Thanks for any ideas!

Arvydas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs lost from cephfs data pool, how to determine which files to restore from backup?

2016-09-08 Thread Goncalo Borges

Hi Greg...




I've had to force recreate some PGs on my cephfs data pool due to some
cascading disk failures in my homelab cluster. Is there a way to easily
determine which files I need to restore from backup? My metadata pool is
completely intact.

Assuming you're on Jewel, run a recursive "scrub" on the MDS root via
the admin socket, and all the missing files should get logged in the
local MDS log.


The data file is striped into different objects (according to the 
selected layout) that are then stored in different PGs and OSDs.


So, if a few PGs are lost, it means that some files may be totally lost 
(if all of their objects were stored in the lost PGs) or that some files 
may only be partially lost (if some of their objects were stored in the 
lost PGs).


Does this method properly take the second case into account?




(I'm surprised at this point to discover we don't seem to have any
documentation about how scrubbing works. It's a regular admin socket
command and "ceph daemon mds.<id> help" should get you going where
you need.)


Indeed. Only found some references to it on John's CephFS update Feb 
2016 talk: http://www.slideshare.net/JohnSpray1/cephfs-update-february-2016


Cheers
Goncalo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com