Would it be beneficial for anyone to have an archive copy of an osd
that took more than 4 days to export? All but an hour of that time was
spent exporting 1 pg (that ended up being 197MB). I can even send
along the extracted pg for analysis...
--
Adam
On Fri, Jun 3, 2016 at 2:39 PM, Adam Tygart wrote:
With regards to this export/import process, I've been exporting a pg
from an osd for more than 24 hours now. The entire OSD only has 8.6GB
of data. 3GB of that is in omap. The export for this particular PG is
only 108MB in size right now, after more than 24 hours. How is it
possible that a fragment
Nice catch. That was a copy-paste error. Sorry, it should have read:
3. Flush the journal and export the primary version of the PG. This took
1 minute on a well-behaved PG and 4 hours on the misbehaving PG
i.e. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16
--journal-path /va
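For the archives, a complete invocation would look something like the
following (the pgid and output file here are examples only):

  # flush the journal, then export the primary copy of the PG
  ceph-osd -i 16 --flush-journal
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal \
      --pgid 20.c1 --op export --file /root/20.c1.export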
Is there any way we could have a "leveldb_defrag_on_mount" option for
the osds similar to the "leveldb_compact_on_mount" option?
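For reference, the existing option can be turned on in ceph.conf (it
compacts the OSD's leveldb at startup; a defrag analogue doesn't exist
as far as I know):

  [osd]
      leveldb_compact_on_mount = true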
Also, I've got at least one user who is creating and deleting
thousands of files at a time in some of their directories (keeping
1-2% of them). Could that cause this fragmentation?
I'm still exporting pgs out of some of the downed osds, but things are
definitely looking promising.
Marginally related to this thread, as these seem to be most of the
hanging objects when exporting pgs, what are inodes in the 600 range
used for within the metadata pool? I know the 200 range is us
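If anyone wants to poke at those objects, listing them is easy enough
(the pool name here is a guess; adjust to your metadata pool):

  # list metadata-pool objects whose names fall in the 600.* range
  rados -p metadata ls | grep -E '^6[0-9a-f]*\.' | head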
On Thu, Jun 2, 2016 at 9:07 AM, Brandon Morris, PMP wrote:
> The only way that I was able to get back to Health_OK was to export/import.
> * Please note, any time you use the ceph_objectstore_tool you risk data
> loss if not done carefully. Never remove a PG until you have a known good
I suspect the problem is that ReplicatedBackend::build_push_op assumes
that osd_recovery_max_chunk (defaults to 8MB) of omap entries is about
the same amount of work to get as 8MB of normal object data. The fix
would be to add another config osd_recovery_max_omap_entries_per_chunk
with a sane default.
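Until something like that lands, a possible stopgap (untested, and the
value here is arbitrary) would be to shrink the generic chunk size so
each push op carries fewer omap entries:

  # drop the recovery chunk size from the 8MB default to 1MB, cluster-wide
  ceph tell 'osd.*' injectargs '--osd_recovery_max_chunk 1048576'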
On Thu, Jun 2, 2016 at 9:49 AM, Adam Tygart wrote:
> Okay,
>
> Exporting, removing and importing the pgs seems to be working
> (slowly). The question now becomes, why does an export/import work?
> That would make me think there is a bug somewhere in the pg loading
> code. Or does it have
Okay,
Exporting, removing and importing the pgs seems to be working
(slowly). The question now becomes, why does an export/import work?
That would make me think there is a bug somewhere in the pg loading
code. Or does it have to do with re-creating the leveldb
databases? The same number
I concur with Greg.
The only way that I was able to get back to Health_OK was to
export/import. * Please note, any time you use the
ceph_objectstore_tool you risk data loss if not done carefully. Never
remove a PG until you have a known good export *
Here are the steps I used:
1. set
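For anyone following along, the general shape of that procedure is
something like this (my reconstruction, not necessarily the exact
steps; the pgid and paths are examples, and never remove a PG until
you have a known good export):

  # 1. stop rebalancing while the OSD is down
  ceph osd set noout
  # 2. stop the affected OSD
  systemctl stop ceph-osd@16
  # 3. flush the journal and export the primary copy of the PG
  ceph-osd -i 16 --flush-journal
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal \
      --pgid 20.c1 --op export --file /root/20.c1.export
  # 4. only after verifying the export, remove the bad copy
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal \
      --pgid 20.c1 --op remove
  # 5. import it again, restart, and let the cluster settle
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal \
      --pgid 20.c1 --op import --file /root/20.c1.export
  systemctl start ceph-osd@16
  ceph osd unset noout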
On Wed, Jun 1, 2016 at 2:47 PM, Adam Tygart wrote:
> I tried to compact the leveldb on osd 16 and the osd is still hitting
> the suicide timeout. I know I've got some users with more than 1
> million files in a single directory.
>
> Now that I'm in this situation, can I get some pointers on how I can
> use either of your options?
On Wed, Jun 1, 2016 at 9:13 AM, Adam Tygart wrote:
> Hello all,
>
> I'm running into an issue with ceph osds crashing over the last 4
> days. I'm running Jewel (10.2.1) on CentOS 7.2.1511.
>
> A little setup information:
> 26 hosts
> 2x 400GB Intel DC P3700 SSDs
> 12x6TB spinning disks
> 4x4TB spinning disks.
I tried to compact the leveldb on osd 16 and the osd is still hitting
the suicide timeout. I know I've got some users with more than 1
million files in a single directory.
Now that I'm in this situation, can I get some pointers on how I can
use either of your options?
Thanks,
Adam
On Wed, Jun 1,
If that pool is your metadata pool, it looks at a quick glance like
it's timing out somewhere while reading and building up the omap
contents (i.e., the contents of a directory). Which might make sense if,
say, you have very fragmented leveldb stores combined with very large
CephFS directories. Tryin
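A quick way to gauge whether a directory object is oversized is to
count its omap keys; the pool and object name here are made up:

  # each omap key on a directory object is one dentry
  rados -p metadata listomapkeys 10000000123.00000000 | wc -l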
I've been attempting to work through this, finding the pgs that are
causing hangs, determining if they are "safe" to remove, and removing
them with ceph-objectstore-tool on osd 16.
I'm now getting hangs (followed by suicide timeouts) referencing pgs
that I've just removed, so this doesn't seem to
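One sanity check that might help: confirm the pg is actually gone from
the store before restarting the OSD (paths are examples):

  # list the pgs still present in the OSD's filestore
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --journal-path /var/lib/ceph/osd/ceph-16/journal --op list-pgs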
Adam,
We ran into similar issues when we got too many objects in a bucket
(around 300 million). The .rgw.buckets.index pool became unable to
complete backfill operations. The only way we were able to get past it
was to export the offending placement group with the
ceph-objectstore-tool and
Hello all,
I'm running into an issue with ceph osds crashing over the last 4
days. I'm running Jewel (10.2.1) on CentOS 7.2.1511.
A little setup information:
26 hosts
2x 400GB Intel DC P3700 SSDs
12x6TB spinning disks
4x4TB spinning disks.
The SSDs are used for both journals and as an OSD (for t