Another ML thread currently happening is "[ceph-users] Cluster network
slower than public network", and it has some good information that might be
useful for you.
On Thu, Nov 16, 2017 at 10:32 AM David Turner wrote:
> That depends on another question. Does the client
> On 16 November 2017 at 16:32, Robert Stanford wrote:
>
>
> Once 'osd max write size' (90MB by default I believe) is exceeded, does
> Ceph reject the object (which is coming in through RGW), or does it break
> it up into smaller objects (of max 'osd max write
On Thu, Nov 16, 2017 at 3:32 PM, David Turner wrote:
> That depends on another question. Does the client write all 3 copies or
> does the client send the copy to the primary OSD and then the primary OSD
> sends the write to the secondaries? Someone asked this recently,
Hi,
On 11/16/2017 01:36 PM, Jogi Hofmüller wrote:
Dear all,
for about a month we have been experiencing something strange in our small cluster.
Let me first describe what happened along the way.
On Oct 4th smartmon told us that the journal SSDs in one of our two
ceph nodes would fail. Since getting
Thank you all for your time and support.
I don't see any backfilling in the logs, and the numbers of
"active+degraded" as well as "active+remapped" and "active+clean"
PGs have stayed the same for some time now. The only thing I see is
"scrubbing".
Wido, I cannot do anything with the data in osd.0
There is another thread in the ML right now covering this exact topic. The
general consensus is that for most deployments, a separate network for
public and cluster is wasted complexity.
On Thu, Nov 16, 2017 at 9:59 AM Jake Young wrote:
> On Wed, Nov 15, 2017 at 1:07 PM
That depends on another question. Does the client write all 3 copies or
does the client send the copy to the primary OSD and then the primary OSD
sends the write to the secondaries? Someone asked this recently, but I
don't recall if an answer was given. I'm not actually certain which is the
The filestore_split_multiple setting does indeed need a restart of the OSD
daemon to take effect. Same with the filestore_merge_threshold. These
settings also only affect filestore. If you're using bluestore, then they
don't mean anything.
You can utilize the ceph-objectstore-tool to split
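For reference, a hedged sketch of that offline split (the OSD must be stopped
first; the data path and pool name here are illustrative):

    systemctl stop ceph-osd@0
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op apply-layout-settings --pool rbd
    systemctl start ceph-osd@0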
No,
What test parameters (iodepth/file size/numjobs) would make sense for 3
node/27OSD@4TB?
- Rado
-----Original Message-----
From: Mark Nelson [mailto:mnel...@redhat.com]
Sent: Thursday, November 16, 2017 10:56 AM
To: Milanov, Radoslav Nikiforov ; David Turner
Thanks. Does this mean that I can send >90MB objects to my RGWs and they
will break them up into manageable (<=90MB) chunks before storing them? Or,
if I'm going to store objects > 90MB, do I need to change this parameter?
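For what it's worth, one client-side way to keep individual parts small is S3
multipart upload; a hedged sketch with s3cmd (bucket and file names are
illustrative):

    s3cmd put --multipart-chunk-size-mb=15 bigfile.bin s3://mybucket/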
I don't know if we'll be able to use libradosstriper, but thanks for
Did you happen to have a chance to try with a higher io depth?
Mark
On 11/16/2017 09:53 AM, Milanov, Radoslav Nikiforov wrote:
FYI
Having 50GB block.db made no difference on the performance.
- Rado
*From:*David Turner [mailto:drakonst...@gmail.com]
*Sent:* Tuesday, November 14, 2017 6:13
Hi
I have searched the threads for a resolution to this problem, but so far have
had no success.
First – my setup. I am trying to replicate the setup on the quick ceph-deploy
pages. I have 4 virtual machines (virtualbox running SL7.3 – a CentOS clone).
Iptables is not running on any nodes.
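For reference, if a firewall were in play, a quick hedged check (these are the
Ceph default ports):

    # monitors listen on tcp/6789, OSDs on tcp/6800-7300
    ss -tlnp | grep ceph
    iptables -L -n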
> On 16 November 2017 at 16:20, David Turner wrote:
>
>
> There is another thread in the ML right now covering this exact topic. The
> general consensus is that for most deployments, a separate network for
> public and cluster is wasted complexity.
>
Indeed. Just for
Once 'osd max write size' (90MB by default I believe) is exceeded, does
Ceph reject the object (which is coming in through RGW), or does it break
it up into smaller objects (of max 'osd max write size' size)? If it
breaks them up, does it read the fragments in parallel when they're
requested by
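A quick way to check the live value on a running OSD, as a hedged sketch
(the daemon id is illustrative):

    ceph daemon osd.0 config get osd_max_write_size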
I would like to thank all of you very much for your assistance, help,
support and time.
I have to say that I totally agree with you regarding the number of
replicas and probably this is the best time to switch to 3 replicas
since all services have been stopped due to this emergency.
After I
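For reference, a minimal sketch of the switch itself (the pool name 'rbd' is
illustrative):

    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2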
Thanks for your reply and information. Yes, we are using filestore. Will it
still work in Luminous?
http://docs.ceph.com/docs/master/rados/configuration/filestore-config-ref/ :
"filestore merge threshold
Description: Min number of files in a subdir before merging into parent
NOTE: A negative
The first step is to make sure that it is out of the cluster. Does `ceph
osd stat` show the same number of OSDs as in (it's the same as a line from
`ceph status`)? It should show 1 less for up, but if it's still
registering the OSD as in then the backfilling won't start. `ceph osd out
0` should
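A hedged sketch of the checks described above (the osd id is illustrative):

    ceph osd stat    # e.g. '20 osds: 19 up, 20 in' means the OSD is down but still in
    ceph osd out 0   # mark it out so backfilling can start
    ceph -w          # watch recovery/backfill progress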
FYI
Having 50GB block.db made no difference on the performance.
- Rado
From: David Turner [mailto:drakonst...@gmail.com]
Sent: Tuesday, November 14, 2017 6:13 PM
To: Milanov, Radoslav Nikiforov
Cc: Mark Nelson ; ceph-users@lists.ceph.com
Subject: Re:
> On 16 November 2017 at 16:46, Robert Stanford wrote:
>
>
> Thanks. Does this mean that I can send >90MB objects to my RGWs and they
> will break them up into manageable (<=90MB) chunks before storing them? Or,
> if I'm going to store objects > 90MB, do I need to
On 17-11-16 05:34 PM, Jaroslaw Owsiewski wrote:
Thanks for your reply and information. Yes, we are using filestore. Will it
still work in Luminous?
http://docs.ceph.com/docs/master/rados/configuration/filestore-config-ref/ :
|"filestore merge threshold|
Description: Min number of files in
Dear cephers,
I have an emergency on a rather small ceph cluster.
My cluster consists of 2 OSD nodes with 10 disks x4TB each and 3
monitor nodes.
The version of ceph running is Firefly v.0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
The cluster originally was built with "Replicated
Hi cephers.
Some thoughts...
At this time my cluster is on Kraken 11.2.0 and works smoothly with FileStore
and RBD only.
I want to upgrade to Luminous 12.2.1 and go to Bluestore, because this
cluster will soon grow to double its size with new disks, so it is the best
opportunity to migrate to Bluestore.
In the ML I found two
My cluster (55 OSDs) runs 12.2.x since the release, and bluestore too
All good so far
On 16/11/2017 15:14, Konstantin Shalygin wrote:
> Hi cephers.
> Some thoughts...
> At this time my cluster is on Kraken 11.2.0 and works smoothly with FileStore
> and RBD only.
> I want to upgrade to Luminous 12.2.1 and
We are still working on establishing an official repo for these
packages. I, unfortunately, keep forgetting to continuously pester the
responsible team. I've opened a tracker ticket [1] for the request,
but in the meantime, you can access RPMs from here [2][3].
[1]
On Wed, Nov 15, 2017 at 8:31 AM, Wei Jin wrote:
> I tried to do purge/purgedata and then redo the deploy command a few
> times, and it still fails to start the osd.
> And there is no error log; does anyone know what the problem is?
Seems like this is OSD 0, right? Have you checked
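For reference, a hedged sketch of the purge/redeploy cycle being described
(the hostname and device are illustrative, and the exact 'osd create' syntax
varies between ceph-deploy releases):

    ceph-deploy purge node1
    ceph-deploy purgedata node1
    ceph-deploy install node1
    ceph-deploy osd create node1:sdb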
I was told at the Openstack Summit that 12.2.2 should drop "In a few days."
That was a week ago yesterday. If you have a little leeway, it may be
best to wait. I know I am, but I'm paranoid.
There was also a performance regression mentioned recently that's supposed
to be fixed.
-Erik
On Nov
> On 16 November 2017 at 14:40, Georgios Dimitrakakis wrote:
>
>
> @Sean Redmond: No I don't have any unfound objects. I only have "stuck
> unclean" with "active+degraded" status
> @Caspar Smit: The cluster is scrubbing ...
>
> @All: My concern is that only one
Dear all,
for about a month we have been experiencing something strange in our small cluster.
Let me first describe what happened along the way.
On Oct 4th smartmon told us that the journal SSDs in one of our two
ceph nodes would fail. Since getting replacements took way longer than
expected, we decided to
Have created a ticket
http://tracker.ceph.com/issues/22144
Feel free to add anything extra you have seen.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ashley
Merrick
Sent: 16 November 2017 17:27
To: Eric Nelson
Cc: ceph-us...@ceph.com
Subject:
Hi,
what exactly does this message mean:
filestore_split_multiple = '24' (not observed, change may require restart)
This happened after the command:
# ceph tell osd.0 injectargs '--filestore-split-multiple 24'
Do I really need to restart the OSD for the change to take effect?
ceph version 12.2.1 ()
On Wed, Nov 15, 2017 at 1:07 PM Ronny Aasen
wrote:
> On 15.11.2017 13:50, Gandalf Corvotempesta wrote:
>
> As 10gb switches are expensive, what would happen by using a gigabit
> cluster network and a 10gb public network?
>
> Replication and rebalance should be slow,
Here is my crushmap. You can see our general setup. We are using the bottom
rule for the EC pool.
We are trying to get to the point where we can lose an entire host and the
cluster will continue to work.
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
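For reference, a minimal sketch of what an EC rule with a host failure domain
can look like (the rule name and ruleset number are illustrative):

    rule ecpool_host {
            ruleset 1
            type erasure
            min_size 3
            max_size 10
            step set_chooseleaf_tries 5
            step take default
            step chooseleaf indep 0 type host
            step emit
    }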
Currently experiencing a nasty bug http://tracker.ceph.com/issues/21142
I would say wait a while for the next point release.
,Ashley
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jack
Sent: 16 November 2017 22:22
To:
Hi,
We intend to build a new Ceph cluster with 6 Ceph OSD hosts, 10 SAS disks
per host, using a 10Gbps NIC for the client network; objects are replicated 3x.
So how should I size the cluster network for best performance?
As I have read, 3x replication means 3x the client network bandwidth = 30 Gbps;
is it
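A rough sketch of the other half of the arithmetic, assuming ~150 MB/s
sustained per SAS spinner (an assumption; check your drive specs):

    10 disks x ~150 MB/s = ~1.5 GB/s = ~12 Gbps of raw disk bandwidth per host

so the disks of a single host can already saturate a 10Gbps NIC before any
replication traffic is counted.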
@Sean Redmond: No I don't have any unfound objects. I only have "stuck
unclean" with "active+degraded" status
@Caspar Smit: The cluster is scrubbing ...
@All: My concern is that only one copy is left of the data on the failed
disk.
If I just remove osd.0 from the crush map, does that copy all
> On 16 November 2017 at 14:46, Caspar Smit wrote:
>
>
> 2017-11-16 14:43 GMT+01:00 Wido den Hollander :
>
> >
> > > On 16 November 2017 at 14:40, Georgios Dimitrakakis <gior...@acmac.uoc.gr> wrote:
> > >
> > >
> > > @Sean Redmond: No I don't
On Wed, Nov 15, 2017 at 5:03 AM, Ragan, Tj (Dr.)
wrote:
> $ cat /etc/yum.repos.d/ceph.repo
> [Ceph]
> name=Ceph packages for $basearch
> baseurl=http://download.ceph.com/rpm-jewel/el7/$basearch
> enabled=1
> gpgcheck=1
> type=rpm-md
>
On 17-11-16 02:44 PM, Jaroslaw Owsiewski wrote:
Hi,
what exactly does this message mean:
filestore_split_multiple = '24' (not observed, change may require restart)
This happened after the command:
# ceph tell osd.0 injectargs '--filestore-split-multiple 24'
It means that "filestore split multiple"
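To make the change persistent across restarts, a minimal sketch (section
placement and daemon id are illustrative):

    # in ceph.conf on the OSD host:
    [osd]
    filestore split multiple = 24

    # then restart the daemon:
    systemctl restart ceph-osd@0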
2017-11-16 14:05 GMT+01:00 Georgios Dimitrakakis :
> Dear cephers,
>
> I have an emergency on a rather small ceph cluster.
>
> My cluster consists of 2 OSD nodes with 10 disks x4TB each and 3 monitor
> nodes.
>
> The version of ceph running is Firefly v.0.80.9
>
2017-11-16 14:43 GMT+01:00 Wido den Hollander :
>
> > On 16 November 2017 at 14:40, Georgios Dimitrakakis <gior...@acmac.uoc.gr> wrote:
> >
> >
> > @Sean Redmond: No I don't have any unfound objects. I only have "stuck
> > unclean" with "active+degraded" status
> > @Caspar
Hi,
Am Donnerstag, den 16.11.2017, 13:44 +0100 schrieb Burkhard Linke:
> > What remains is the growth of used data in the cluster.
> >
> > I put background information of our cluster and some graphs of
> > different metrics on a wiki page:
> >
> >
We upgraded from firefly to 12.2.1. We cannot use our RadosGW S3 endpoints
anymore since multipart uploads do not get replicated. So we are also waiting
for 12.2.2 to finally allow usage of our S3 endpoints again
On Thu, Nov 16, 2017 at 3:33 PM, Ashley Merrick
wrote:
>
On Thu, Nov 16, 2017 at 6:33 AM, Ashley Merrick wrote:
>
> Currently experiencing a nasty bug http://tracker.ceph.com/issues/21142
Can you add more info to the tracker about `ceph osd tree` (node/memory info),
what the version of ceph was before, and whether it was in a healthy state
What type of SAS disks, spinners or SSDs? You really need to specify
the sustained write throughput of your OSD nodes if you want to figure
out whether your network is sufficient/appropriate.
At 3x replication if you want to sustain e.g. 1 GB/s of write traffic
from clients then you will need 2
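A rough sketch of that arithmetic, assuming the client writes only to the
primary OSD and the primary fans out the replicas:

    client -> primary:     1 GB/s on the public network
    primary -> 2 replicas: 2 x 1 GB/s = 2 GB/s on the cluster network

so the cluster network needs roughly twice the client write bandwidth.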
Hello,
Any input from anyone else who may have experienced this assert and has a
workaround or something that can be done to get the OSDs online?
Currently we have 14 OSDs that just loop for this reason, causing us to have a
partial outage on PGs.
Thanks,
Ashley
It depends on what you expect your typical workload to be like. Ceph
(and distributed storage in general) likes high io depths so writes can
hit all of the drives at the same time. There are tricks (like
journals, writeahead logs, centralized caches, etc.) that can help
mitigate this, but I
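As a hedged example of a higher-io-depth test against RBD (the pool/image names
and all parameters are illustrative, and fio needs its rbd engine compiled in):

    fio --name=rbdtest --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=testimg --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based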
We upgraded to it a few weeks ago in order to get some of the new indexing
features, but have also had a few nasty bugs in the process (including this
one) as we have been upgrading osds from filestore to bluestore. Currently
these are isolated to our SSD cache tier so I've been evicting
Hi all,
I made a pretty big mistake doing our upgrade from hammer to luminous,
skipping the jewel release.
When I realized and tried to switch back to jewel, it was too late -
the cluster now won't start, complaining about "The disk uses features
unsupported by the executable.":
2017-11-17
Our cluster here having troubles is primarily for object storage, at
somewhere around 650M objects and 600T. The majority of objects are small
jpgs; the large objects are big movie .ts and .mp4 files.
This was upgraded from jewel on xenial last month. The majority of bugs are
in ceph-osd on SSDs for us. We've
Hello Vasu,
Sorry I linked to the wrong bug, the one that's causing me large issues is :
http://tracker.ceph.com/issues/22144
,Ashley
-----Original Message-----
From: Vasu Kulkarni [mailto:vakul...@redhat.com]
Sent: 17 November 2017 03:36
To: Ashley Merrick
Cc: Jack
On Wed, 15 Nov 2017 19:46:48 +, Shawn Edwards wrote:
> On Wed, Nov 15, 2017, 11:07 David Turner
> wrote:
>
> > I'm not going to lie. This makes me dislike Bluestore quite a
> > bit. Using multiple OSDs to an SSD journal allowed for you to
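For what it's worth, Bluestore can still share an SSD for the DB; a hedged
sketch with ceph-volume (device paths are illustrative, flags as in recent
Luminous releases):

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1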
We are running RHCS 2.3 (jewel) with ganesha 2.4.2 and the cephfs fsal, compiled
from srpm, experimenting with CTDB for controlling ganesha HA since we run
samba on the same servers.
Haven't done much functionality/stress testing, but at face value basic
stuff seems to work well (file operations).
In
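For reference, a minimal sketch of a cephfs fsal export in ganesha.conf (the
export id and paths are illustrative):

    EXPORT {
        Export_Id = 1;
        Path = "/";
        Pseudo = "/cephfs";
        Access_Type = RW;
        FSAL {
            Name = CEPH;
        }
    }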
We have a small prod cluster with 12.2.1 and bluestore, running just cephfs,
for HPC use. It's been in prod for about 7 weeks now, and pretty stable.
From: ceph-users on behalf of Eric Nelson
Sent:
We upgraded from firefly to 12.2.1
You still on FileStore?
My cluster (55 OSDs) runs 12.2.x since the release, and bluestore too
All good so far
Is this a cleanly deployed cluster or an upgrade from some version?
I was told at the Openstack Summit that 12.2.2 should drop "In a few days."
That was a week ago yesterday. If you have a little leeway, it may be
best to wait. I know I am, but I'm paranoid.
There was also a performance regression mentioned recently that's supposed
to be fixed.
As we can see
Hello,
Good to hear it's not just me; however, I have a cluster basically offline due
to too many OSDs dropping because of this issue.
Anybody have any suggestions?
,Ashley
From: Eric Nelson
Sent: 16 November 2017 00:06:14
To: Ashley Merrick