[ceph-users] Checking cephfs compression is working

2018-11-16 Thread Rhian Resnick
How do you confirm that cephfs files and rados objects are being compressed?

I don't see how in the docs.
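
The closest I have found so far is poking at the BlueStore perf counters and
the pool options, e.g. (the osd id and pool name below are just examples, and
I am not sure these counters are the authoritative answer):

# non-zero bluestore compressed / compressed_original counters suggest data
# is actually being stored compressed on this OSD
ceph daemon osd.0 perf dump | grep -i compress

# per-pool compression settings, only shown if they have been set
ceph osd pool get cephfs_data all | grep -i compression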
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover files from cephfs data pool

2018-11-05 Thread Rhian Resnick
Gotcha. Yeah, I think we are going to continue the scanning to build a new
metadata pool. I am making some progress on a script to extract files from
the data store. I just need to find the exact format of the xattrs and the
object layout for large files. If I end up taking the script to the finish
line, this will be something I post for the community. So I am reading the C
source code at the moment to see what CephFS is doing.
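
In case it is useful, here is the very rough direction the script is taking,
sketched in shell for now (the pool name and inode number are just examples;
it assumes the usual data object naming of <inode in hex>.<8-hex-digit stripe
index>, and it ignores sparse files and the true file size, which only the
metadata knows):

pool=cephfs_data        # example data pool name
ino=10006cdc2c5         # example inode number, in hex

# the encoded backtrace (ancestor dentries) should live in the "parent"
# xattr of the file's first object; it may be decodable with ceph-dencoder
rados -p "$pool" getxattr "${ino}.00000000" parent > "${ino}.parent"

# fetch every stripe object for this inode and concatenate them in order
for obj in $(rados -p "$pool" ls | grep "^${ino}\." | sort); do
    rados -p "$pool" get "$obj" "stripe.${obj}"
done
cat $(ls stripe.${ino}.* | sort) > "recovered.${ino}"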


On Mon, Nov 5, 2018 at 8:10 PM Sergey Malinin  wrote:

> With cppool you got a bunch of useless zero-sized objects because, unlike
> "export", cppool does not copy the omap data which actually holds all the
> inode info.
> I suggest truncating journals only as an effort to reduce downtime,
> followed by an immediate backup of available files to a fresh fs. After
> resetting journals, the part of your fs covered by unflushed "UPDATE"
> entries *will* become inconsistent. MDS may start to occasionally segfault,
> but that can be avoided by setting forced read-only mode (in this mode the
> MDS journal will not flush, so you will need extra disk space).
> If you want to get the original fs recovered and fully functional - you
> need to somehow replay the journal (I'm unsure whether cephfs-data-scan
> tool operates on journal entries).
>
>
>
> On 6.11.2018, at 03:43, Rhian Resnick  wrote:
>
> Workload is mixed.
>
> We ran a rados cpool to backup the metadata pool.
>
> So you're thinking that truncating the journal and purge queue (we are on
> Luminous) with a reset could bring us online, missing just data from that
> day (mostly from when the issue started)?
>
> If so we could continue our scan into our recovery partition and give it a
> try tomorrow after discussions with our recovery team.
>
>
>
>
> On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin  wrote:
>
>> What was your recent workload? There are chances not to lose much if it
>> was mostly read ops. If so, you *must back up your metadata pool via
>> "rados export" in order to preserve omap data*, then try truncating
>> journals (along with purge queue if supported by your ceph version), wiping
>> session table, and resetting the fs.
>>
>>
>> On 6.11.2018, at 03:26, Rhian Resnick  wrote:
>>
>> That was our original plan. So we migrated to bigger disks and have space
>> but recover dentry uses up all our memory (128 GB) and crashes out.
>>
>> On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin  wrote:
>>
>>> I had the same problem with multi-mds. I solved it by freeing up a
>>> little space on OSDs, doing "recover dentries", truncating the journal, and
>>> then "fs reset". After that I was able to revert to single-active MDS and
>>> kept on running for a year until it failed on 13.2.2 upgrade :))
>>>
>>>
>>> On 6.11.2018, at 03:18, Rhian Resnick  wrote:
>>>
>>> Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used
>>> all space on OSD and now 2 ranks report damage. The recovery tools on the
>>> journal fail as they run out of memory leaving us with the option of
>>> truncating the journal and losing data or recovering using the scan tools.
>>>
>>> Any ideas on solutions are welcome. I posted all the logs and
>>> cluster design previously but am happy to do so again. We are not desperate
>>> but we are hurting with this long downtime.
>>>
>>> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin  wrote:
>>>
>>>> What kind of damage have you had? Maybe it is worth trying to get MDS
>>>> to start and backup valuable data instead of doing long running recovery?
>>>>
>>>>
>>>> On 6.11.2018, at 02:59, Rhian Resnick  wrote:
>>>>
>>>> Sounds like I get to have some fun tonight.
>>>>
>>>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >>>
>>>>> inode linkage (i.e. folder hierarchy) and file names are stored in
>>>>> omap data of objects in metadata pool. You can write a script that would
>>>>> traverse through all the metadata pool to find out file names correspond 
>>>>> to
>>>>> objects in data pool and fetch required files via 'rados get' command.
>>>>>
>>>>> > On 6.11.2018, at 02:26, Sergey Malinin  wrote:
>>>>> >
>>>>> > Yes, 'rados -h'.
>>>>> >
>>>>> >
>>>>> >> On 6.11.2018, at 02:25, Rhian Resnick  wrote:
>>>>> >>
>>>>> >> Does a tool exist to recover files from a cephfs data partition? We
>>>>> are rebuilding metadata but have a user who needs data asap.
>>>>> >> ___
>>>>> >> ceph-users mailing list
>>>>> >> ceph-users@lists.ceph.com
>>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>
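
For anyone following along, a small illustration of the omap traversal Sergey
describes above (the metadata pool name is an example; directory objects are
named <directory inode in hex>.<fragment id>, and the root directory is
inode 0x1):

# a quick look at the object names in the metadata pool
rados -p cephfs_metadata ls | head

# the omap keys of a directory fragment object are its dentry names
# (suffixed with _head); the values encode the child inode numbers
rados -p cephfs_metadata listomapkeys 1.00000000
rados -p cephfs_metadata listomapvals 1.00000000

Those child inode numbers are what map to <inode hex>.<stripe index> objects
in the data pool.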
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover files from cephfs data pool

2018-11-05 Thread Rhian Resnick
Workload is mixed.

We ran a rados cpool to backup the metadata pool.

So you're thinking that truncating the journal and purge queue (we are on
Luminous) with a reset could bring us online, missing just data from that
day (mostly from when the issue started)?

If so we could continue our scan into our recovery partition and give it a
try tomorrow after discussions with our recovery team.




On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin  wrote:

> What was your recent workload? There are chances not to lose much if it
> was mostly read ops. If so, you *must back up your metadata pool via
> "rados export" in order to preserve omap data*, then try truncating
> journals (along with purge queue if supported by your ceph version), wiping
> session table, and resetting the fs.
>
>
> On 6.11.2018, at 03:26, Rhian Resnick  wrote:
>
> That was our original plan. So we migrated to bigger disks and have space
> but recover dentry uses up all our memory (128 GB) and crashes out.
>
> On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin  wrote:
>
>> I had the same problem with multi-mds. I solved it by freeing up a little
>> space on OSDs, doing "recover dentries", truncating the journal, and then
>> "fs reset". After that I was able to revert to single-active MDS and kept
>> on running for a year until it failed on 13.2.2 upgrade :))
>>
>>
>> On 6.11.2018, at 03:18, Rhian Resnick  wrote:
>>
>> Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used
>> all space on OSD and now 2 ranks report damage. The recovery tools on the
>> journal fail as they run out of memory leaving us with the option of
>> truncating the journal and losing data or recovering using the scan tools.
>>
>> Any ideas on solutions are welcome. I posted all the logs and cluster
>> design previously but am happy to do so again. We are not desperate but we
>> are hurting with this long downtime.
>>
>> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin  wrote:
>>
>>> What kind of damage have you had? Maybe it is worth trying to get MDS to
>>> start and backup valuable data instead of doing long running recovery?
>>>
>>>
>>> On 6.11.2018, at 02:59, Rhian Resnick  wrote:
>>>
>>> Sounds like I get to have some fun tonight.
>>>
>>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >>
>>>> inode linkage (i.e. folder hierarchy) and file names are stored in omap
>>>> data of objects in metadata pool. You can write a script that would
>>>> traverse through all the metadata pool to find out file names correspond to
>>>> objects in data pool and fetch required files via 'rados get' command.
>>>>
>>>> > On 6.11.2018, at 02:26, Sergey Malinin  wrote:
>>>> >
>>>> > Yes, 'rados -h'.
>>>> >
>>>> >
>>>> >> On 6.11.2018, at 02:25, Rhian Resnick  wrote:
>>>> >>
>>>> >> Does a tool exist to recover files from a cephfs data partition? We
>>>> are rebuilding metadata but have a user who needs data asap.
>>>> >> ___
>>>> >> ceph-users mailing list
>>>> >> ceph-users@lists.ceph.com
>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >
>>>>
>>>>
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover files from cephfs data pool

2018-11-05 Thread Rhian Resnick
That was our original plan. So we migrated to bigger disks and have space,
but recover_dentries uses up all our memory (128 GB) and crashes out.

On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin  wrote:

> I had the same problem with multi-mds. I solved it by freeing up a little
> space on OSDs, doing "recover dentries", truncating the journal, and then
> "fs reset". After that I was able to revert to single-active MDS and kept
> on running for a year until it failed on 13.2.2 upgrade :))
>
>
> On 6.11.2018, at 03:18, Rhian Resnick  wrote:
>
> Our metadata pool went from 700 MB to 1 TB in size in a few hours. Used
> all space on OSD and now 2 ranks report damage. The recovery tools on the
> journal fail as they run out of memory leaving us with the option of
> truncating the journal and losing data or recovering using the scan tools.
>
> Any ideas on solutions are welcome. I posted all the logs and cluster
> design previously but am happy to do so again. We are not desperate but we
> are hurting with this long downtime.
>
> On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin  wrote:
>
>> What kind of damage have you had? Maybe it is worth trying to get MDS to
>> start and backup valuable data instead of doing long running recovery?
>>
>>
>> On 6.11.2018, at 02:59, Rhian Resnick  wrote:
>>
>> Sounds like I get to have some fun tonight.
>>
>> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin >
>>> inode linkage (i.e. folder hierarchy) and file names are stored in omap
>>> data of objects in metadata pool. You can write a script that would
>>> traverse through all the metadata pool to find out file names correspond to
>>> objects in data pool and fetch required files via 'rados get' command.
>>>
>>> > On 6.11.2018, at 02:26, Sergey Malinin  wrote:
>>> >
>>> > Yes, 'rados -h'.
>>> >
>>> >
>>> >> On 6.11.2018, at 02:25, Rhian Resnick  wrote:
>>> >>
>>> >> Does a tool exist to recover files from a cephfs data partition? We
>>> are rebuilding metadata but have a user who needs data asap.
>>> >> ___
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>>
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover files from cephfs data pool

2018-11-05 Thread Rhian Resnick
Our metadata pool went from 700 MB to 1 TB in size in a few hours. It used
all the space on the OSDs, and now 2 ranks report damage. The recovery tools
on the journal fail as they run out of memory, leaving us with the option of
truncating the journal and losing data, or recovering using the scan tools.

Any ideas on solutions are welcome. I posted all the logs and cluster
design previously but am happy to do so again. We are not desperate but we
are hurting with this long downtime.

On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin  wrote:

> What kind of damage have you had? Maybe it is worth trying to get MDS to
> start and backup valuable data instead of doing long running recovery?
>
>
> On 6.11.2018, at 02:59, Rhian Resnick  wrote:
>
> Sounds like I get to have some fun tonight.
>
> On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin 
>> inode linkage (i.e. folder hierarchy) and file names are stored in omap
>> data of objects in metadata pool. You can write a script that would
>> traverse through all the metadata pool to find out file names correspond to
>> objects in data pool and fetch required files via 'rados get' command.
>>
>> > On 6.11.2018, at 02:26, Sergey Malinin  wrote:
>> >
>> > Yes, 'rados -h'.
>> >
>> >
>> >> On 6.11.2018, at 02:25, Rhian Resnick  wrote:
>> >>
>> >> Does a tool exist to recover files from a cephfs data partition? We
>> are rebuilding metadata but have a user who needs data asap.
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Recover files from cephfs data pool

2018-11-05 Thread Rhian Resnick
Does a tool exist to recover files from a cephfs data partition? We are
rebuilding metadata but have a user who needs data asap.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] speeding up ceph

2018-11-05 Thread Rhian Resnick
What type of bandwidth did you see during the recovery process? We are
seeing around 2 Mbps per box, with each box running 20 processes.

On Mon, Nov 5, 2018 at 11:31 AM Sergey Malinin  wrote:

> Although I was advised not to use caching during recovery, I didn't notice
> any improvements after disabling it.
>
>
> > On 5.11.2018, at 17:32, Rhian Resnick  wrote:
> >
> > We are running cephfs-data-scan to rebuild metadata. Would changing the
> cache tier mode of our cephfs data partition improve performance? If so
> what should we switch to?
> >
> > Thanks
> >
> > Rhian
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] speeding up ceph

2018-11-05 Thread Rhian Resnick
We are running cephfs-data-scan to rebuild metadata. Would changing the
cache tier mode of our cephfs data partition improve performance? If so
what should we switch to?

Thanks

Rhian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs-data-scan

2018-11-03 Thread Rhian Resnick
Sounds like we are going to restart with 20 threads on each storage node.
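
Roughly what we have in mind per node, as a sketch (filesystem and pool names
are ours; worker_m is the total worker count and worker_n must be unique per
worker, so with more than one node the seq range would be offset and worker_m
raised accordingly):

# 20 workers on a single node
for n in $(seq 0 19); do
  cephfs-data-scan scan_extents --worker_n "$n" --worker_m 20 \
      --filesystem cephfs cephfs-cold &
done
wait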

On Sat, Nov 3, 2018 at 8:26 PM Sergey Malinin  wrote:

> scan_extents using 8 threads took 82 hours for my cluster holding 120M
> files on 12 OSDs with 1 Gbps between nodes. I would have gone with a lot
> more threads if I had known it only operated on the data pool and the only
> problem was network latency. If I recall correctly, each worker used up to
> 800 MB of RAM, so beware the OOM killer.
> scan_inodes runs several times faster but I don't remember exact timing.
> In your case I believe scan_extents & scan_inodes can be done in a few
> hours by running the tool on each OSD node, but scan_links will be
> painfully slow due to its single-threaded nature.
> In my case I ended up getting MDS to start and copied all data to a fresh
> filesystem ignoring errors.
> On Nov 4, 2018, 02:22 +0300, Rhian Resnick , wrote:
>
> For a 150TB file system with 40 Million files how many cephfs-data-scan
> threads should be used? Or what is the expected run time. (we have 160 osd
> with 4TB disks.)
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs-data-scan

2018-11-03 Thread Rhian Resnick
For a 150TB file system with 40 Million files how many cephfs-data-scan
threads should be used? Or what is the expected run time. (we have 160 osd
with 4TB disks.)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Snapshot cephfs data pool from ceph cmd

2018-11-03 Thread Rhian Resnick
Is it possible to snapshot the CephFS data pool?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage

2018-11-03 Thread Rhian Resnick
Having attempted to recover using the journal tool and having that fail, we
are going to rebuild our metadata using a separate metadata pool.

We have the following procedure we are going to use. The issue I haven't
figured out yet (likely lack of sleep) is how to replace the original
metadata pool in the cephfs so we can continue to use the default name, and
then how we remove the secondary file system.

# ceph fs

ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph osd pool create recovery 512 replicated replicated_ruleset
ceph fs new recovery-fs recovery cephfs-cold
--allow-dangerous-metadata-overlay
cephfs-data-scan init --force-init --filesystem recovery-fs
--alternate-pool recovery
ceph fs reset recovery-fs --yes-i-really-mean-it


# create structure
cephfs-table-tool recovery-fs:all reset session
cephfs-table-tool recovery-fs:all reset snap
cephfs-table-tool recovery-fs:all reset inode

# build new metadata

# scan_extents

cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2
--worker_m 4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3
--worker_m 4 --filesystem cephfs cephfs-cold

# scan inodes
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3
--worker_m 4 --filesystem cephfs --force-corrupt --force-init cephfs-cold

cephfs-data-scan scan_links --filesystem recovery-fs

# need help

Thanks

Rhian

On Fri, Nov 2, 2018 at 9:47 PM Rhian Resnick  wrote:

> I was posting with my office account but I think it is being blocked.
>
> Our cephfs's metadata pool went from 1GB to 1TB in a matter of hours and
> after using all storage on the OSD's reports two damaged ranks.
>
> The cephfs-journal-tool crashes when performing any operations due to
> memory utilization.
>
> We tried a backup which crashed (we then did a rados cppool to backup our
> metadata).
> I then tried to run a dentry recovery which failed due to memory usage.
>
> Any recommendations for the next step?
>
> Data from our config and status
>
>
>
>
> Combined logs (after marking things as repaired to see if that would rescue 
> us):
>
>
> Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
> -1 mds.4.purge_queue operator(): Error -108 loading Journaler
> Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
> -1 mds.4.purge_queue operator(): Error -108 loading Journaler
> Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
> (MDS_DAMAGE)
> Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
> (MDS_DAMAGE)
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 
> 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from 
> _is_readable
> Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 
> 1 : Error loading MDS rank 1: (22) Invalid argument
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 
> 7f6dacd69700  0 mds.1.log _replay journaler got error -22, aborting
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 
> 7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from 
> _is_readable
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 
> 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: 
> (22) Invalid argument
> Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 
> 7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: 
> (22) Invalid argument
> Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons 
> damaged (MDS_DAMAGE)
> Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
> -1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons 
> damaged (MDS_DAMAGE)
> Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 
> -1 log_channel(clust

Re: [ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage

2018-11-03 Thread Rhian Resnick
Morning,


Having attempted to recover using the journal tool and having that fail, we are 
going to rebuild our metadata using a separate metadata pool.


We have the following procedure we are going to use. The issue I haven't found 
yet (likely lack of sleep) is how to replace the original metadata pool in the 
cephfs so we can continue to use the default name, and then how we remove the 
secondary file system.


# ceph fs

ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph osd pool create recovery 512 replicated replicated_ruleset
ceph fs new recovery-fs recovery cephfs-cold --allow-dangerous-metadata-overlay
cephfs-data-scan init --force-init --filesystem recovery-fs --alternate-pool 
recovery
ceph fs reset recovery-fs --yes-i-really-mean-it


# create structure
cephfs-table-tool recovery-fs:all reset session
cephfs-table-tool recovery-fs:all reset snap
cephfs-table-tool recovery-fs:all reset inode

# build new metadata

# scan_extents
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 0 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 1 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 2 --worker_m 
4 --filesystem cephfs cephfs-cold
cephfs-data-scan scan_extents --alternate-pool recovery --worker_n 3 --worker_m 
4 --filesystem cephfs cephfs-cold

# scan inodes
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 0 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 1 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 2 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold
cephfs-data-scan scan_inodes --alternate-pool recovery --worker_n 3 --worker_m 
4 --filesystem cephfs --force-corrupt --force-init cephfs-cold

cephfs-data-scan scan_links --filesystem recovery-fs

# need help

How do we move the new metadata pool to the original filesystem?

How do we remove the new cephfs so the original mounts work?
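
For the second question, our rough (unverified) guess is something like the
following once recovery-fs is no longer needed; these are standard ceph
commands, but we have not confirmed this sequence is safe here:

ceph fs set recovery-fs cluster_down true          # stop new MDS activity on it
ceph mds fail recovery-fs:0                        # fail its active rank(s)
ceph fs rm recovery-fs --yes-i-really-mean-it      # drop the filesystem definition
# the recovery pool could then be deleted separately, e.g.
# ceph osd pool rm recovery recovery --yes-i-really-really-mean-it
# (requires mon_allow_pool_delete=true)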


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: ceph-users  on behalf of Rhian Resnick 

Sent: Friday, November 2, 2018 9:47 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] cephfs-journal-tool event recover_dentries summary killed 
due to memory usage

I was posting with my office account but I think it is being blocked.

Our cephfs's metadata pool went from 1 GB to 1 TB in a matter of hours, and after 
using all the storage on the OSDs it now reports two damaged ranks.

The cephfs-journal-tool crashes when performing any operations due to memory 
utilization.

We tried a backup which crashed (we then did a rados cppool to backup our 
metadata).
I then tried to run a dentry recovery which failed due to memory usage.

Any recommendations for the next step?

Data from our config and status




Combined logs (after marking things as repaired to see if that would rescue us):


Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1 
10.141.255.202:6898/1492854021 1 : Error 
loading MDS rank 1: (22) Invalid argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 
 0 mds.1.log _replay journaler got error -22, aborting
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-p-mon2 ceph-

[ceph-users] cephfs-journal-tool event recover_dentries summary killed due to memory usage

2018-11-02 Thread Rhian Resnick
I was posting with my office account but I think it is being blocked.

Our cephfs's metadata pool went from 1 GB to 1 TB in a matter of hours, and
after using all the storage on the OSDs it now reports two damaged ranks.

The cephfs-journal-tool crashes when performing any operations due to
memory utilization.

We tried a backup which crashed (we then did a rados cppool to backup our
metadata).
I then tried to run a dentry recovery which failed due to memory usage.

Any recommendations for the next step?

Data from our config and status




Combined logs (after marking things as repaired to see if that would rescue us):


Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499
7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading
Journaler
Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499
7f68db7a3700 -1 mds.4.purge_queue operator(): Error -108 loading
Journaler
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
1 mds daemon damaged (MDS_DAMAGE)
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
1 mds daemon damaged (MDS_DAMAGE)
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934
7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error
from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1
10.141.255.202:6898/1492854021 1 : Error loading MDS rank 1: (22)
Invalid argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949
7f6dacd69700  0 mds.1.log _replay journaler got error -22, aborting
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934
7f6dacd69700 -1 mds.1.journaler.mdlog(ro) try_read_entry: decode error
from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745
7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS
rank 1: (22) Invalid argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745
7f6dacd69700 -1 log_channel(cluster) log [ERR] : Error loading MDS
rank 1: (22) Invalid argument
Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
2 mds daemons damaged (MDS_DAMAGE)
Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
2 mds daemons damaged (MDS_DAMAGE)
Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
1 mds daemon damaged (MDS_DAMAGE)
Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231
7fa3b57ce700 -1 log_channel(cluster) log [ERR] : Health check update:
1 mds daemon damaged (MDS_DAMAGE)

Ceph OSD Status: (The missing and out OSDs are in a different pool
from all data; these were the bad SSDs that caused the issue)


  cluster:
id: 6a2e8f21-bca2-492b-8869-eecc995216cc
health: HEALTH_ERR
1 filesystem is degraded
2 mds daemons damaged

  services:
mon: 3 daemons, quorum ceph-p-mon2,ceph-p-mon1,ceph-p-mon3
mgr: ceph-p-mon1(active), standbys: ceph-p-mon2
mds: cephfs-3/5/5 up
{0=ceph-storage3=up:resolve,2=ceph-p-mon3=up:resolve,4=ceph-p-mds1=up:resolve},
3 up:standby, 2 damaged
osd: 170 osds: 167 up, 158 in

  data:
pools:   7 pools, 7520 pgs
objects: 188.46M objects, 161TiB
usage:   275TiB used, 283TiB / 558TiB avail
pgs: 7511 active+clean
             9    active+clean+scrubbing+deep

  io:
client:   0B/s rd, 17.2KiB/s wr, 0op/s rd, 1op/s wr



Ceph OSD Tree:

ID  CLASS WEIGHTTYPE NAME  STATUS REWEIGHT PRI-AFF
-10   0 root deefault
 -9 5.53958 root ssds
-11 1.89296 host ceph-cache1
 35   hdd   1.09109 osd.35 up0 1.0
181   hdd   0.26729 osd.181up0 1.0
182   hdd   0.26729 osd.182  down0 1.0
183   hdd   0.26729 osd.183  down0 1.0
-12 1.75366 host ceph-cache2
 46   hdd   1.09109 osd.46 up0 1.0
185   hdd   0.26729 osd.185  down0 1.0
186   hdd   0.12799 osd.186up0 1.0
187   hdd   0.26729 osd.187up0 1.0
-13 1.89296 host ceph-cache3
 60   hdd   1.09109 osd.60 up0 1.0
189   hdd   0.26729 osd.189up0 1.0
190   hdd   0.26729 osd.190up0 1.0
191   hdd   0.26729 osd.191up0 1.0
 -5 4.33493 root ssds-ro
 -6 1.44498 host ceph-storage1-ssd
 85   ssd   0.72249 osd.85 up  1.0 1.0
 89   ssd   0.72249 osd.89 up  

[ceph-users] Damaged MDS Ranks will not start / recover

2018-11-02 Thread Rhian Resnick
 up  1.0 1.0
 77   hdd   3.63199 osd.77 up  1.0 1.0
 82   hdd   3.63199 osd.82 up  1.0 1.0
 86   hdd   3.63199 osd.86 up  1.0 1.0
 88   hdd   3.63199 osd.88 up  1.0 1.0
 95   hdd   3.63199 osd.95 up  1.0 1.0
103   hdd   3.63199 osd.103up  1.0 1.0
109   hdd   3.63199 osd.109up  1.0 1.0
113   hdd   3.63199 osd.113up  1.0 1.0
120   hdd   3.63199 osd.120up  1.0 1.0
127   hdd   3.63199 osd.127up  1.0 1.0
134   hdd   3.63199 osd.134up  1.0 1.0
140   hdd   3.63869 osd.140up  1.0 1.0
141   hdd   3.63199 osd.141up  1.0 1.0
143   hdd   3.63199 osd.143up  1.0 1.0
144   hdd   3.63199 osd.144up  1.0 1.0
145   hdd   3.63199 osd.145up  1.0 1.0
146   hdd   3.63199 osd.146up  1.0 1.0
147   hdd   3.63199 osd.147up  1.0 1.0
148   hdd   3.63199 osd.148up  1.0 1.0
149   hdd   3.63199 osd.149up  1.0 1.0
150   hdd   3.63199 osd.150up  1.0 1.0
151   hdd   3.63199 osd.151up  1.0 1.0
152   hdd   3.63199 osd.152up  1.0 1.0
153   hdd   3.63199 osd.153up  1.0 1.0
154   hdd   3.63199 osd.154up  1.0 1.0
155   hdd   3.63199 osd.155up  1.0 1.0
156   hdd   3.63199 osd.156up  1.0 1.0
157   hdd   3.63199 osd.157up  1.0 1.0
158   hdd   3.63199 osd.158up  1.0 1.0
159   hdd   3.63199 osd.159up  1.0 1.0
161   hdd   3.63199 osd.161up  1.0 1.0
162   hdd   3.63199 osd.162up  1.0 1.0
164   hdd   3.63199 osd.164up  1.0 1.0
165   hdd   3.63199 osd.165up  1.0 1.0
167   hdd   3.63199 osd.167up  1.0 1.0
168   hdd   3.63199 osd.168up  1.0 1.0
169   hdd   3.63199 osd.169up  1.0 1.0
170   hdd   3.63199 osd.170up  1.0 1.0
171   hdd   3.63199 osd.171up  1.0 1.0
172   hdd   3.63199 osd.172up  1.0 1.0
173   hdd   3.63199 osd.173up  1.0 1.0
174   hdd   3.63869 osd.174up  1.0 1.0
177   hdd   3.63199 osd.177up  1.0 1.0



# Ceph configuration shared by all nodes


[global]
fsid = 6a2e8f21-bca2-492b-8869-eecc995216cc
public_network = 10.141.0.0/16
cluster_network = 10.85.8.0/22
mon_initial_members = ceph-p-mon1, ceph-p-mon2, ceph-p-mon3
mon_host = 10.141.161.248,10.141.160.250,10.141.167.237
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx


# Cephfs needs these to be set to support larger directories
mds_bal_frag = true
allow_dirfrags = true

rbd_default_format = 2
mds_beacon_grace = 60
mds session timeout = 120

log to syslog = true
err to syslog = true
clog to syslog = true


[mds]

[osd]
osd op threads = 32
osd max backfills = 32





# Old method of moving ssds to a pool

[osd.85]
host = ceph-storage1
crush_location =  root=ssds host=ceph-storage1-ssd

[osd.89]
host = ceph-storage1
crush_location =  root=ssds host=ceph-storage1-ssd

[osd.160]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.163]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.166]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.5]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd

[osd.68]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd

[osd.87]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd





Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing MDS

2018-11-02 Thread Rhian Resnick
Morning, our backup of the metadata is 75% done (rados cppool, as the metadata 
export fails by using up all server memory). Before we start working on fixing 
our metadata, we wanted our proposed procedure to be reviewed.


Does the following sequence look correct for our environment?


  1.  rados cppool cephfs_metadata cephfs_metadata.bk
  2.  cephfs-journal-tool event recover_dentries summary --rank=0
  3.  cephfs-journal-tool event recover_dentries summary --rank=1
  4.  cephfs-journal-tool event recover_dentries summary --rank=2
  5.  cephfs-journal-tool event recover_dentries summary --rank=3
  6.  cephfs-journal-tool event recover_dentries summary --rank=4
  7.  cephfs-journal-tool journal reset --rank=0
  8.  cephfs-journal-tool journal reset --rank=1
  9.  cephfs-journal-tool journal reset --rank=2
  10. cephfs-journal-tool journal reset --rank=3
  11. cephfs-journal-tool journal reset --rank=4
  12. cephfs-table-tool all reset session
  13. Start metadata servers
  14. Scrub mds:
 *   ceph daemon mds.{hostname} scrub_path / recursive
 *   ceph daemon mds.{hostname} scrub_path / force
  15.





Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Rhian Resnick
Sent: Thursday, November 1, 2018 10:32 AM
To: Patrick Donnelly
Cc: Ceph Users
Subject: Re: [ceph-users] Removing MDS


Morning all,


This has been a rough couple of days. We thought we had resolved all our 
performance issues by moving the Ceph metadata to some write-intensive Intel 
disks, but what we didn't notice was that Ceph labeled them as HDDs 
(thanks, Dell RAID controller).


We believe this caused read lock errors and resulted in the journal increasing 
from 700 MB to 1 TB in 2 hours (basically over lunch). We tried to migrate and 
then stop everything before the OSDs reached full status, but failed.
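
For reference, re-labelling a device whose class the controller reported wrong
looks roughly like this (osd.85 is just one of our SSD OSD ids, used as an
example):

ceph osd crush rm-device-class osd.85
ceph osd crush set-device-class ssd osd.85
ceph osd tree | grep ssd    # verify the class change took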


Over the last 12 hours the data has been migrated from the SDD's back to 
spinning disks but the MDS servers are now reporting that two ranks are damaged.


We are running a backup of the metadata pool but wanted to know what the list 
thinks the next steps should be. I have attached the error's we see in the logs 
as well as our OSD Tree, ceph.conf (comments removed), and ceph fs dump.


Combined logs (after marking things as repaired to see if that would rescue us):


Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:07:02 ceph-p-mds2 ceph-mds: 2018-11-01 10:07:02.045499 7f68db7a3700 
-1 mds.4.purge_queue operator(): Error -108 loading Journaler
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:40 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:40.968143 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: mds.1 10.141.255.202:6898/1492854021 1 
: Error loading MDS rank 1: (22) Invalid argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914949 7f6dacd69700 
 0 mds.1.log _replay journaler got error -22, aborting
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.914934 7f6dacd69700 
-1 mds.1.journaler.mdlog(ro) try_read_entry: decode error from _is_readable
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-storage2 ceph-mds: 2018-11-01 10:26:47.915745 7f6dacd69700 
-1 log_channel(cluster) log [ERR] : Error loading MDS rank 1: (22) Invalid 
argument
Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged 
(MDS_DAMAGE)
Nov  1 10:26:47 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:47.999432 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 2 mds daemons damaged 
(MDS_DAMAGE)
Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)
Nov  1 10:26:55 ceph-p-mon2 ceph-mon: 2018-11-01 10:26:55.026231 7fa3b57ce700 
-1 log_channel(cluster) log [ERR] : Health check update: 1 mds daemon damaged 
(MDS_DAMAGE)

Ceph OSD Status: (The missing and out OSDs are in a different pool from all 
data; these were the bad SSDs that caused the issue)


  cluster:
id: 6a2e8f21-bca2-492

Re: [ceph-users] Removing MDS

2018-11-01 Thread Rhian Resnick
up  1.0 1.0
149   hdd   3.63199 osd.149up  1.0 1.0
150   hdd   3.63199 osd.150up  1.0 1.0
151   hdd   3.63199 osd.151up  1.0 1.0
152   hdd   3.63199 osd.152up  1.0 1.0
153   hdd   3.63199 osd.153up  1.0 1.0
154   hdd   3.63199 osd.154up  1.0 1.0
155   hdd   3.63199 osd.155up  1.0 1.0
156   hdd   3.63199 osd.156up  1.0 1.0
157   hdd   3.63199 osd.157up  1.0 1.0
158   hdd   3.63199 osd.158up  1.0 1.0
159   hdd   3.63199 osd.159up  1.0 1.0
161   hdd   3.63199 osd.161up  1.0 1.0
162   hdd   3.63199 osd.162up  1.0 1.0
164   hdd   3.63199 osd.164up  1.0 1.0
165   hdd   3.63199 osd.165up  1.0 1.0
167   hdd   3.63199 osd.167up  1.0 1.0
168   hdd   3.63199 osd.168up  1.0 1.0
169   hdd   3.63199 osd.169up  1.0 1.0
170   hdd   3.63199 osd.170up  1.0 1.0
171   hdd   3.63199 osd.171up  1.0 1.0
172   hdd   3.63199 osd.172up  1.0 1.0
173   hdd   3.63199 osd.173up  1.0 1.0
174   hdd   3.63869 osd.174up  1.0 1.0
177   hdd   3.63199 osd.177up  1.0 1.0



# Ceph configuration shared by all nodes


[global]
fsid = 6a2e8f21-bca2-492b-8869-eecc995216cc
public_network = 10.141.0.0/16
cluster_network = 10.85.8.0/22
mon_initial_members = ceph-p-mon1, ceph-p-mon2, ceph-p-mon3
mon_host = 10.141.161.248,10.141.160.250,10.141.167.237
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx


# Cephfs needs these to be set to support larger directories
mds_bal_frag = true
allow_dirfrags = true

rbd_default_format = 2
mds_beacon_grace = 60
mds session timeout = 120

log to syslog = true
err to syslog = true
clog to syslog = true


[mds]

[osd]
osd op threads = 32
osd max backfills = 32





# Old method of moving ssds to a pool

[osd.85]
host = ceph-storage1
crush_location =  root=ssds host=ceph-storage1-ssd

[osd.89]
host = ceph-storage1
crush_location =  root=ssds host=ceph-storage1-ssd

[osd.160]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.163]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.166]
host = ceph-storage3
crush_location =  root=ssds host=ceph-storage3-ssd

[osd.5]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd

[osd.68]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd

[osd.87]
host = ceph-storage2
crush_location =  root=ssds host=ceph-storage2-ssd






Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Patrick Donnelly 
Sent: Tuesday, October 30, 2018 8:40 PM
To: Rhian Resnick
Cc: Ceph Users
Subject: Re: [ceph-users] Removing MDS

On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick  wrote:
> We are running into issues deactivating mds ranks. Is there a way to safely 
> forcibly remove a rank?

No, there's no "safe" way to force the issue. The rank needs to come
back, flush its journal, and then complete its deactivation. To get
more help, you need to describe your environment, version of Ceph in
use, relevant log snippets, etc.

--
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing MDS

2018-10-30 Thread Rhian Resnick
That is what I thought. I am increasing debug to see where we are getting stuck. 
I am not sure if it is an issue deactivating or an rdlock issue.


Thanks if we discover more we will post a question with details.


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Patrick Donnelly 
Sent: Tuesday, October 30, 2018 8:40 PM
To: Rhian Resnick
Cc: Ceph Users
Subject: Re: [ceph-users] Removing MDS

On Tue, Oct 30, 2018 at 4:05 PM Rhian Resnick  wrote:
> We are running into issues deactivating mds ranks. Is there a way to safely 
> forcibly remove a rank?

No, there's no "safe" way to force the issue. The rank needs to come
back, flush its journal, and then complete its deactivation. To get
more help, you need to describe your environment, version of Ceph in
use, relevant log snippets, etc.

--
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Removing MDS

2018-10-30 Thread Rhian Resnick
Evening,


We are running into issues deactivating mds ranks. Is there a way to safely 
forcibly remove a rank?


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reducing Max_mds

2018-10-30 Thread Rhian Resnick
John,


Thanks!


Rhian Resnick

Associate Director Research Computing

Enterprise Systems

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: John Spray 
Sent: Tuesday, October 30, 2018 5:26 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reducing Max_mds

On Tue, Oct 30, 2018 at 6:36 AM Rhian Resnick  wrote:
>
> Evening,
>
>
> I am looking to decrease our max mds servers as we had a server failure and 
> need to remove a node.
>
>
> When we attempt to decrease the number of mds servers from 5 to 4 (or any 
> other number) they never transition to standby. They just stay active.
>
>
> ceph fs set cephfs max_mds X

After you decrease max_mds, use "ceph mds deactivate " to bring
the actual number of active daemons in line with your new intended
maximum.

From Ceph 13.x that happens automatically, but since you're on 12.x it
needs doing by hand.
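
Roughly, with the fs name from this thread (the rank argument got eaten by the
list archive; on Luminous the role is <fs_name>:<rank>, and the rank number
below is illustrative):

ceph fs set cephfs max_mds 4
ceph mds deactivate cephfs:4   # repeat for each rank at or above the new max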

John

>
> Nothing looks useful in the mds or mon logs and I was wondering what you 
> recommend looking at?
>
>
> We are on 12.2.9 running Centos.
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Reducing Max_mds

2018-10-30 Thread Rhian Resnick
Evening,


I am looking to decrease our max mds servers as we had a server failure and 
need to remove a node.


When we attempt to decrease the number of mds servers from 5 to 4 (or any other 
number) they never transition to standby. They just stay active.


ceph fs set cephfs max_mds X


Nothing looks useful in the mds or mon logs and I was wondering what you 
recommend looking at?


We are on 12.2.9 running Centos.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error Creating OSD

2018-04-14 Thread Rhian Resnick
Afternoon,


Happily, I resolved this issue.


Running vgdisplay showed that ceph-volume had tried to create a volume group on 
a failed disk. (We didn't know we had a bad disk, so this was new information 
to us.) When the command failed it left three bad volume groups behind. Since 
you cannot rename them, you need to use the following command to delete them.


vgdisplay to find the bad volume groups

vgremove --select vg_uuid=your uuid -f # -f forces it to be removed
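
Concretely, something along these lines (the UUID below is a placeholder for
whatever vgs/vgdisplay reports for the stale group):

# list the colliding ceph volume groups and their UUIDs
vgs -o vg_name,vg_uuid,pv_name | grep ceph
# then remove the stale one by UUID
vgremove --select vg_uuid=<uuid-of-the-stale-group> -f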


Rhian Resnick

Associate Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Rhian Resnick
Sent: Saturday, April 14, 2018 12:47 PM
To: Alfredo Deza
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD


Thanks all,


Here is a link to our command being executed: https://pastebin.com/iy8iSaKH



Here are the results from the command


Executed with debug enabled (after a zap with destroy)


[root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
664894a8-530a-4557-b2f4-1af5b391f2b7
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
Traceback (most recent call last):
  File "/sbin/ceph-volume", line 6, in 
main.Volume()
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in 
__init__
self.main(self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, 
in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 
38, in main
terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 74, in main
self.create(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 26, in create
prepare_step.safe_prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 217, in safe_prepare
self.prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 283, in prepare
block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 193, in prepare_device
if api.get_vg(vg_name=vg_name):
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in 
get_vg
return vgs.get(vg_name=vg_name, vg_tags=vg_tags)
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in 
get
    raise MultipleVGsError(vg_name)
ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for 
volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc




Rhian Resnick

Associate Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Alfredo Deza <ad...@redhat.com>
Sent: Saturday, April 14, 2018 8:45 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD



On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick 
<rresn...@fau.edu> wrote:

Evening,

When attempting to create an OSD we receive the following error.

[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data 
/dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
c8cb8cff-dad9-48b8-8d77-6f130a4b629d
--> Was unable to complete a new OSD, will rollback changes

Re: [ceph-users] Error Creating OSD

2018-04-14 Thread Rhian Resnick
Thanks all,


Here is a link to our command being executed: https://pastebin.com/iy8iSaKH



Here are the results from the command


Executed with debug enabled (after a zap with destroy)


[root@ceph-storage3 ~]# ceph-volume lvm create --bluestore --data /dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
664894a8-530a-4557-b2f4-1af5b391f2b7
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
Traceback (most recent call last):
  File "/sbin/ceph-volume", line 6, in 
main.Volume()
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 37, in 
__init__
self.main(self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, 
in newfunc
return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 153, in main
terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 
38, in main
terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 74, in main
self.create(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/create.py", 
line 26, in create
prepare_step.safe_prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 217, in safe_prepare
self.prepare(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 283, in prepare
block_lv = self.prepare_device(args.data, 'block', cluster_fsid, osd_fsid)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/prepare.py", 
line 193, in prepare_device
if api.get_vg(vg_name=vg_name):
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 334, in 
get_vg
return vgs.get(vg_name=vg_name, vg_tags=vg_tags)
  File "/usr/lib/python2.7/site-packages/ceph_volume/api/lvm.py", line 429, in 
get
raise MultipleVGsError(vg_name)
ceph_volume.exceptions.MultipleVGsError: Got more than 1 result looking for 
volume group: ceph-6a2e8f21-bca2-492b-8869-eecc995216cc




Rhian Resnick

Associate Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Alfredo Deza <ad...@redhat.com>
Sent: Saturday, April 14, 2018 8:45 AM
To: Rhian Resnick
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Error Creating OSD



On Fri, Apr 13, 2018 at 8:20 PM, Rhian Resnick 
<rresn...@fau.edu> wrote:

Evening,

When attempting to create an OSD we receive the following error.

[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data 
/dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
c8cb8cff-dad9-48b8-8d77-6f130a4b629d
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
-->  MultipleVGsError: Got more than 1 result looking for volume group: 
ceph-6a2e8f21-bca2-492b-8869-eecc995216cc

Any hints on what to do? This occurs when we attempt to create osd's on this 
node.

Can you use a paste site and get the /var/log/ceph/ceph-volume.log contents? 
Also, if you could try the same command but with:

CEPH_VOLUME_DEBUG=1

I think you are hitting two issues here:

1) Somehow `osd new` is not completing and failing
2) The `purge` command to wipe out the LV is getting multiple LV's and cannot 
make sure to match the one it used.

#2 definitely looks like something we are doing wrong, and #1 can have a lot of 
different causes. The logs would be tremendously helpful!


Rhian Resnick

Associate Director Middleware and HPC

Office of Inf

[ceph-users] Error Creating OSD

2018-04-13 Thread Rhian Resnick
Evening,

When attempting to create an OSD we receive the following error.

[ceph-admin@ceph-storage3 ~]$ sudo ceph-volume lvm create --bluestore --data 
/dev/sdu
Running command: ceph-authtool --gen-print-key
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
c8cb8cff-dad9-48b8-8d77-6f130a4b629d
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.140 --yes-i-really-mean-it
 stderr: purged osd.140
-->  MultipleVGsError: Got more than 1 result looking for volume group: 
ceph-6a2e8f21-bca2-492b-8869-eecc995216cc

Any hints on what to do? This occurs when we attempt to create osd's on this 
node.


Rhian Resnick

Associate Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs increase max file size

2017-08-04 Thread Rhian Resnick
Morning,


We ran into an issue with the default max file size of a cephfs file. Is it 
possible to increase this value to 20 TB from 1 TB without recreating the file 
system?
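
A minimal sketch of the setting involved, assuming the filesystem is named
cephfs; max_file_size is a per-filesystem value and can be raised at runtime:

  # show the current limit in bytes (the default is 1 TB)
  ceph fs get cephfs | grep max_file_size
  # raise it to 20 TB without recreating the filesystem
  ceph fs set cephfs max_file_size 21990232555520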


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-07-03 Thread Rhian Resnick
I didn't see any guidance online on how to resolve the checksum error. Any
hints?


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Gregory Farnum <gfar...@redhat.com>
Sent: Monday, July 3, 2017 11:49 AM
To: Rhian Resnick
Cc: ceph-users
Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

On Mon, Jul 3, 2017 at 6:02 AM, Rhian Resnick <rresn...@fau.edu> wrote:
>
> Sorry to bring up an old post, but on Kraken I am unable to repair a PG that
> is inconsistent in a cache tier. We removed the bad object but are still
> seeing the following error in the OSD's logs.

It's possible, but the digest error means they checksum differently,
rather than having different sizes (and the size check precedes the
digest one).
The part where all three of them are exactly the same is interesting
and actually makes me suspect that something just went wrong in
calculating the checksum...
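
For digging further, a minimal sketch of how such an inconsistency is usually
inspected and the repair retried, assuming the PG ID 1.15f from the logs below:

  # list the objects and shards that scrub flagged as inconsistent in this PG
  rados list-inconsistent-obj 1.15f --format=json-pretty
  # kick off another repair once the offending copy has been dealt with
  ceph pg repair 1.15f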

>
>
>
> Prior to removing invalid object:
>
> /var/log/ceph/ceph-osd.126.log:928:2017-07-03 08:07:55.331479 7f95a73eb700 -1 
> log_channel(cluster) log [ERR] : 1.15f shard 63:  soid 
> 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
> 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
> client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
> alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:929:2017-07-03 08:07:55.331483 7f95a73eb700 -1 
> log_channel(cluster) log [ERR] : 1.15f shard 126: soid 
> 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
> 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
> client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
> alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:930:2017-07-03 08:07:55.331487 7f95a73eb700 -1 
> log_channel(cluster) log [ERR] : 1.15f shard 143: soid 
> 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
> 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
> client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
> alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:931:2017-07-03 08:07:55.331491 7f95a73eb700 -1 
> log_channel(cluster) log [ERR] : 1.15f soid 
> 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
> /var/log/ceph/ceph-osd.126.log:932:2017-07-03 08:08:27.605139 7f95a4be6700 -1 
> log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed
>
>
>
> Post Removing invalid object:
> /var/log/ceph/ceph-osd.126.log:3433:2017-07-03 08:37:03.780584 7f95a73eb700 
> -1 log_channel(cluster) log [ERR] : 1.15f shard 63:  soid 
> 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
> 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
> client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
> alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:3434:2017-07-03 08:37:03.780591 7f95a73eb700 
> -1 log_channel(cluster) log [ERR] : 1.15f shard 126: soid 
> 1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
> 0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
> client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
> alloc_hint [0 0 0])
> /var/log/ceph/ceph-osd.126.log:3435:2017-07-03 08:37:03.780593 7f95a73eb700 
> -1 log_channel(cluster) log [ERR] : 1.15f shard 143  missing 
> 1:fa86fe35:::10006cdc2c5.:head
> /var/log/ceph/ceph-osd.126.log:3436:2017-07-03 08:37:03.780594 7f95a73eb700 
> -1 log_channel(cluster) log [ERR] : 1.15f soid 
> 1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
> /var/log/ceph/ceph-osd.126.log:3437:2017-07-03 08:37:39.278991 7f95a4be6700 
> -1 log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed
>
>
>
> Is it possible this thread is related to the error we are seeing?
>
>
> Rhian Resnick
>
> Assistant Director Middleware and HPC
>
> Office of Information Technology
>
>
> Florida Atlantic University
>
> 777 Glades Road, CM22, Rm 173B
>
> Boca Raton, FL 33431
>
> Phone 561.297.2647
>
> Fax 561.297.0222
>
>
>
>
>
> 
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Gregory 
> Farnum <gfar...@redhat.com>
> Sent: Monday, May 15, 2017 6:28 PM
> To: Lincoln Bryant; Weil, Sage
> Cc: ceph-users
> Subjec

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-07-03 Thread Rhian Resnick
Sorry to bring up an old post, but on Kraken I am unable to repair a PG that is
inconsistent in a cache tier. We removed the bad object but are still seeing
the following error in the OSD's logs.



Prior to removing invalid object:

/var/log/ceph/ceph-osd.126.log:928:2017-07-03 08:07:55.331479 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 63:  soid 
1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:929:2017-07-03 08:07:55.331483 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 126: soid 
1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:930:2017-07-03 08:07:55.331487 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 143: soid 
1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:931:2017-07-03 08:07:55.331491 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f soid 
1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
/var/log/ceph/ceph-osd.126.log:932:2017-07-03 08:08:27.605139 7f95a4be6700 -1 
log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed



Post Removing invalid object:
/var/log/ceph/ceph-osd.126.log:3433:2017-07-03 08:37:03.780584 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 63:  soid 
1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:3434:2017-07-03 08:37:03.780591 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 126: soid 
1:fa86fe35:::10006cdc2c5.:head data_digest 0x931041e9 != data_digest 
0xcd130b55 from auth oi 1:fa86fe35:::10006cdc2c5.:head(25726'1664129 
client.8168902.0:607753 dirty|data_digest s 1713351 uv 1664129 dd cd130b55 
alloc_hint [0 0 0])
/var/log/ceph/ceph-osd.126.log:3435:2017-07-03 08:37:03.780593 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f shard 143  missing 
1:fa86fe35:::10006cdc2c5.:head
/var/log/ceph/ceph-osd.126.log:3436:2017-07-03 08:37:03.780594 7f95a73eb700 -1 
log_channel(cluster) log [ERR] : 1.15f soid 
1:fa86fe35:::10006cdc2c5.:head: failed to pick suitable auth object
/var/log/ceph/ceph-osd.126.log:3437:2017-07-03 08:37:39.278991 7f95a4be6700 -1 
log_channel(cluster) log [ERR] : 1.15f repair 3 errors, 0 fixed



Is it possible this thread is related to the error we are seeing?



Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Gregory 
Farnum <gfar...@redhat.com>
Sent: Monday, May 15, 2017 6:28 PM
To: Lincoln Bryant; Weil, Sage
Cc: ceph-users
Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi



On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant 
<linco...@uchicago.edu<mailto:linco...@uchicago.edu>> wrote:
Hi Greg,

Curiously, some of these scrub errors went away on their own. The example pg in
the original post is now active+clean, with nothing interesting in the logs:

# zgrep "36.277b" ceph-osd.244*gz
ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170510.gz:2017-05-09 06:58:01.872484 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170511.gz:2017-05-10 20:40:47.536974 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170511.gz:2017-05-10 20:41:38.399614 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170514.gz:2017-05-13 20:49:47.063789 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170514.gz:2017-05-13 20:50:42.085718 7f0186e28700  0 
log_channel(cluster) log [INF] : 36.277b scrub ok
ceph-osd.244.log-20170515.gz:2017-05-15 00:10:39.417578 7f0184623700  0 
log_channel(cluster) log [INF] : 36.277b scrub starts
ceph-osd.244.log-20170515.gz:2017-05-15 00:11:26.189777 7f0186

Re: [ceph-users] Odd latency numbers

2017-03-16 Thread Rhian Resnick
Regarding OpenNebula, it is working; we do find the network functionality less
than flexible. We would prefer that the orchestration layer allow each primary
group to create its own internal network infrastructure to meet its needs and
then automatically provide NAT from one or more public IP addresses (think AWS
and Azure). This doesn't seem to be implemented at this time and will likely
require manual intervention per group of users to resolve. Otherwise we like
the software and find it much more lightweight than OpenStack. We need a tool
that can be managed by a very small team, and OpenNebula meets that goal.




Thanks for checking out this data for our test cluster. It isn't production, so
yes, we are throwing spaghetti at the wall to make sure we are able to handle
issues as they come up.


We already planned to increase the pg count and have done so. (thanks)


Here is our osd tree. As this is a test cluster, we are currently sharing the
OSD disks between the cache tier (replica 3) and the data pool (erasure coded);
some more hardware is on the way so we can test using SSDs.


We have been reviewing atop, iostat, sar, and our SNMP monitoring (not granular
enough) and have confirmed the disks on this particular node are under a higher
load than the others. We will likely take the time to deploy Graphite since it
will help with another project as well. One speculation discussed this morning
is a bad cache battery on the PERC card in ceph-mon1, which could explain the
+10 ms latency we see on all four drives. (That wouldn't be Ceph at all in this
case.)
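
A minimal sketch of the kind of per-disk spot check described above, assuming
the sysstat package is installed on the suspect node; high await/%util across
every device would fit a controller-cache problem rather than a single bad
disk:

  # extended per-device statistics, refreshed every 5 seconds, three samples
  iostat -x 5 3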


ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.12685 root default
-2 1.08875 host ceph-mon1
 0 0.27219 osd.0   up  1.0  1.0
 1 0.27219 osd.1   up  1.0  1.0
 2 0.27219 osd.2   up  1.0  1.0
 4 0.27219 osd.4   up  1.0  1.0
-3 0.94936 host ceph-mon2
 3 0.27219 osd.3   up  1.0  1.0
 5 0.27219 osd.5   up  1.0  1.0
 7 0.27219 osd.7   up  1.0  1.0
 9 0.13280 osd.9   up  1.0  1.0
-4 1.08875 host ceph-mon3
 6 0.27219 osd.6   up  1.0  1.0
 8 0.27219 osd.8   up  1.0  1.0
10 0.27219 osd.10  up  1.0  1.0
11 0.27219 osd.11  up  1.0  1.0




Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Christian Balzer <ch...@gol.com>
Sent: Wednesday, March 15, 2017 8:31 PM
To: ceph-users@lists.ceph.com
Cc: Rhian Resnick
Subject: Re: [ceph-users] Odd latency numbers


Hello,

On Wed, 15 Mar 2017 16:49:00 + Rhian Resnick wrote:

> Morning all,
>
>
> We are starting to apply load to our test cephfs system and are noticing some
> odd latency numbers. We are using erasure coding for the cold data pools and
> replication for our cache tiers (not on SSD yet). We noticed the following
> high latency on one node and it seems to be slowing down writes and reads on
> the cluster.
>
The pg dump below was massive overkill at this point in time, whereas a
"ceph osd tree" would have probably shown us the topology (where is your
tier, where your EC pool(s)?).
Same for a "ceph osd pool ls detail".

So if we were to assume that node is your cache tier (replica 1?), then the
latencies would make sense.
But that's guesswork, so describe your cluster in more detail.

And yes, a single slow OSD (stealthily failing drive, etc) can bring a
cluster to its knees.
This is why many people here tend to get every last bit of info with
collectd and feed it into carbon and graphite/grafana, etc.
This will immediately indicate culprits and allow you to correlate this
with other data, like actual disk/network/cpu load, etc.

For the time being, run atop on that node and see if you can reduce the issue
to something like "all disks are busy all the time" or "CPU meltdown".
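
A minimal sketch of spotting a slow OSD from the Ceph side as well; the OSD ID
is a placeholder, pick one hosted on the suspect node:

  # per-OSD commit/apply latency as reported by the cluster
  ceph osd perf
  # detailed latency counters from one daemon, run on the host carrying osd.0
  sudo ceph daemon osd.0 perf dump | grep -A 2 op_latency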

>
> Our next step is break out mds, mgr, and mons to different machines but we 
> wanted to start the discussion here.
>

If your nodes (not a single iota of HW/NW info from you) are powerful enough,
breaking out stuff isn't likely to help, nor is it a necessity.

More below.

>
> Here is a bunch of information you may find useful.
>
>
> ceph.conf
>
> [global]
> fsid = X
> mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
> mon_host = 10.141.167.238,10.141.160.251,10.141.161.249
> auth_cluster_required = cephx
> auth_service_requ

[ceph-users] Odd latency numbers

2017-03-15 Thread Rhian Resnick
[truncated `ceph pg dump` excerpt: PGs 0.33 through 0.3f, all empty (0 objects)
and active+clean, listing state timestamps, up/acting sets, and last scrub
times]

  5        0 0 0 0 0            0      0      0
  4        0 0 0 0 0            0      0      0
  3    52275 0 0 0 0     48371113  36638  36638
  2   650158 0 0 0 0 426305171964  36623  36623
  1   466451 0 0 0 0  79835701754  36540  36540
  0        0 0 0 0 0            0      0      0

sum  1168884 0 0 0 0 506189244831 109801 109801
OSD_STAT USED    AVAIL TOTAL HB_PEERS                PG_SUM
11       56284M  223G  278G  [0,1,2,3,4,5,7,8,9,10]  28
10       97062M  183G  278G  [0,1,2,4,5,6,7,8,9,11]  30
0        41564M  238G  278G  [1,3,4,5,6,7,8,9,10,11] 22
1        123G    154G  278G  [0,2,3,5,6,7,8,9,10,11] 42
2        112G    166G  278G  [1,3,4,5,6,7,8,9,10,11] 28
4        47643M  232G  278G  [0,1,3,5,6,7,8,9,10,11] 32
3        97557M  183G  278G  [0,1,2,4,6,7,8,9,10,11] 32
5        127G    151G  278G  [0,1,2,4,6,7,8,9,10,11] 31
6        71151M  209G  278G  [1,2,3,4,5,7,8,9,10,11] 32
7        79459M  201G  278G  [0,1,2,4,5,6,8,9,10,11] 40
9        23961M  112G  136G  [0,1,2,4,5,6,7,8,10,11] 21
8        104G    174G  278G  [0,1,2,3,4,5,7,9,10,11] 34
sum      970G    2231G 3202G




Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road

Re: [ceph-users] cephfs and erasure coding

2017-03-09 Thread Rhian Resnick
Thanks everyone for the input. We are online in our test environment and are 
running user workflows to make sure everything is running as expected.



Rhian

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rhian 
Resnick
Sent: Thursday, March 9, 2017 8:31 AM
To: Maxime Guyot <maxime.gu...@elits.com>
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] cephfs and erasure coding

Thanks for the confirmations of what is possible.

We plan on creating a new file system, rsyncing the data over, and deleting the
old one.

Rhian

On Mar 9, 2017 2:27 AM, Maxime Guyot 
<maxime.gu...@elits.com<mailto:maxime.gu...@elits.com>> wrote:

Hi,



>“The answer as to how to move an existing cephfs pool from replication to 
>erasure coding (and vice versa) is to create the new pool and rsync your data 
>between them.”

Shouldn’t it be possible to just do the “ceph osd tier add  ecpool cachepool && 
ceph osd tier cache-mode cachepool writeback” and let Ceph redirect the 
requests (CephFS or other) to the cache pool?



Cheers,

Maxime
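
A minimal sketch of the full tiering sequence being referred to, using the
placeholder pool names from the quote; a real deployment would also need
hit_set and target-size parameters on the cache pool:

  # put a replicated cache pool in front of an erasure-coded base pool
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  # route client traffic through the cache tier
  ceph osd tier set-overlay ecpool cachepool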



From: ceph-users 
<ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>> 
on behalf of David Turner 
<david.tur...@storagecraft.com<mailto:david.tur...@storagecraft.com>>
Date: Wednesday 8 March 2017 22:27
To: Rhian Resnick <rresn...@fau.edu<mailto:rresn...@fau.edu>>, 
"ceph-us...@ceph.com<mailto:ceph-us...@ceph.com>" 
<ceph-us...@ceph.com<mailto:ceph-us...@ceph.com>>
Subject: Re: [ceph-users] cephfs and erasure coding



I use CephFS on erasure coding at home using a cache tier.  It works fine for 
my use case, but we know nothing about your use case to know if it will work 
well for you.

The answer as to how to move an existing cephfs pool from replication to 
erasure coding (and vice versa) is to create the new pool and rsync your data 
between them.
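
A minimal sketch of that copy step, assuming both the old and the new
filesystem are mounted side by side; the mount points are placeholders:

  # preserve permissions, ownership, hard links and xattrs while copying
  rsync -aHAX --progress /mnt/cephfs-old/ /mnt/cephfs-new/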





David Turner | Cloud Operations Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943




If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


____
____

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Rhian Resnick 
[rresn...@fau.edu]
Sent: Wednesday, March 08, 2017 12:54 PM
To: ceph-us...@ceph.com<mailto:ceph-us...@ceph.com>
Subject: [ceph-users] cephfs and erasure coding

Two questions on Cephfs and erasure coding that Google couldn't answer.





1) How well does cephfs work with erasure coding?



2) How would you move an existing cephfs pool that uses replication to erasure 
coding?



Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology



Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs and erasure coding

2017-03-08 Thread Rhian Resnick
Two questions on Cephfs and erasure coding that Google couldn't answer.



1) How well does cephfs work with erasure coding?


2) How would you move an existing cephfs pool that uses replication to erasure 
coding?


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs with large numbers of files per directory

2017-02-21 Thread Rhian Resnick
Logan,


Thank you for the feedback.


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>



From: Logan Kuhn <log...@wolfram.com>
Sent: Tuesday, February 21, 2017 8:42 AM
To: Rhian Resnick
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Cephfs with large numbers of files per directory

We had a very similar configuration at one point.

I was fairly new when we started to move away from it, but what happened to us
is that anytime a directory needed a stat, backup, ls, rsync, etc., it would
take minutes to return, and while it was waiting, CPU load would spike due to
iowait. The difference between what you've said and what we did was that we
used a gateway machine; the actual cluster never had any issues with it. This
was also on Infernalis, so things have probably changed in Jewel and Kraken.

Regards,
Logan

- On Feb 21, 2017, at 7:37 AM, Rhian Resnick <rresn...@fau.edu> wrote:

Good morning,


We are currently investigating using Ceph for a KVM farm, block storage, and
possibly file systems (cephfs with ceph-fuse, and ceph hadoop). Our cluster
will be composed of 4 nodes, ~240 OSDs, and 4 monitors providing mon and mds
services as required.


What experience has the community had with large numbers of files in a single
directory (500,000 - 5 million)? We know that directory fragmentation will be
required, but we are concerned about the stability of the implementation.


Your opinions and suggestions are welcome.


Thank you


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs with large numbers of files per directory

2017-02-21 Thread Rhian Resnick
Good morning,


We are currently investigating using Ceph for a KVM farm, block storage, and
possibly file systems (cephfs with ceph-fuse, and ceph hadoop). Our cluster
will be composed of 4 nodes, ~240 OSDs, and 4 monitors providing mon and mds
services as required.


What experience has the community had with large numbers of files in a single
directory (500,000 - 5 million)? We know that directory fragmentation will be
required, but we are concerned about the stability of the implementation.


Your opinions and suggestions are welcome.


Thank you


Rhian Resnick

Assistant Director Middleware and HPC

Office of Information Technology


Florida Atlantic University

777 Glades Road, CM22, Rm 173B

Boca Raton, FL 33431

Phone 561.297.2647

Fax 561.297.0222

 [image] <https://hpc.fau.edu/wp-content/uploads/2015/01/image.jpg>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com