ok, thank you all.
2014-09-16 0:52 GMT+08:00 Yehuda Sadeh yeh...@redhat.com:
I agree with Greg. When dealing with the latencies that we deal with due
to different IO operations (networking, storage), it's mostly not worth the
trouble. I think the main reason we didn't actually put it to use
Hello fellow cephalopods,
every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
that we could use some help with diagnosing.
I understand there used to be a data corruption issue before .80.3 so we
made sure that all the nodes were upgraded to .80.5 and all the daemons
were
On 15/09/14 17:28, Sage Weil wrote:
rule myrule {
ruleset 1
type replicated
min_size 1
max_size 10
step take default
step choose firstn 2 type rack
step chooseleaf firstn 2 type host
step emit
}
That will give you 4 osds, spread across 2
Hi All,
I am trying to configure multi-site data replication. I am getting this error
continuously.
INFO:urllib3.connectionpool:Starting new HTTP connection (1): cephog1
ERROR:radosgw_agent.sync:finding number of shards failed
WARNING:radosgw_agent.sync:error preparing for sync, will retry.
Hello,
We have a machine that mounts a rbd image as a block device, then rsync files
from another server to this mount.
As this rsync traffic will have to share bandwidth with the writes to the RBD,
I wonder if it is possible to specify which NIC to mount the RBD through?
We are using 0.85.5
- Message from Gregory Farnum g...@inktank.com -
Date: Mon, 15 Sep 2014 10:37:07 -0700
From: Gregory Farnum g...@inktank.com
Subject: Re: [ceph-users] OSD troubles on FS+Tiering
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: ceph-users ceph-users@lists.ceph.com
Hi,
Thanks for keeping us updated on this subject.
dsync is definitely killing the ssd…
I don’t have much to add; I’m just surprised that you’re only getting 5299 with
0.85, since I’ve been able to get 6.4K. Well, I was using the 200GB model, that
might explain this.
On 12 Sep 2014, at 16:32,
Heh, you'll have to talk to Haomai about issues with the
KeyValueStore, but I know he's found a number of issues in the version
of it that went to 0.85.
In future please flag when you're running with experimental stuff; it
helps direct attention to the right places! ;)
-Greg
Software Engineer #42
http://tracker.ceph.com/issues/4137 contains links to all the tasks we
have so far. You can also search any of the ceph-devel list archives
for "forward scrub".
On Mon, Sep 15, 2014 at 10:16 PM, brandon li brandon.li@gmail.com wrote:
Great to know you are working on it!
I am new to the
Did you follow this ceph.com/docs/master/rbd/rbd-openstack/ to configure your
env?
On 12 Sep 2014, at 14:38, m.channappa.nega...@accenture.com wrote:
Hello Team,
I have configured ceph as a multibackend for openstack.
I have created 2 pools .
1. Volumes (replication size =3 )
Hi Daniel,
When I run
crushtool --outfn crushmap --build --num_osds 100 host straw 2 rack straw 10
default straw 0
crushtool -d crushmap -o crushmap.txt
cat >> crushmap.txt <<EOF
rule myrule {
ruleset 1
type replicated
min_size 1
max_size 10
step take default
Hi Greg,
just picked up this one from the archive while researching a different
issue and thought I'd follow up.
On Tue, Aug 19, 2014 at 6:24 PM, Gregory Farnum g...@inktank.com wrote:
The sst files are files used by leveldb to store its data; you cannot
remove them. Are you running on a very
Hi.
I'm new to ceph and have been going through the setup phase. I was able to
set up a couple of proof-of-concept clusters. I have some general questions
that I thought the community would be able to clarify.
1. I've been using ceph-deploy for deployment. In a 3 Monitor and 3 OSD
configuration, one of the
Hi,
On 16 Sep 2014, at 16:46, shiva rkreddy
shiva.rkre...@gmail.commailto:shiva.rkre...@gmail.com wrote:
2. Has any one used SSD devices for Monitors. If so, can you please share the
details ? Any specific changes to the configuration files?
We use SSDs on our monitors — a spinning disk was
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16, 2014, Florian Haas flor...@hastexo.com wrote:
Hi
On 09/16/2014 04:35 PM, Gregory Farnum wrote:
I don't really know; Joao has handled all these cases. I *think* they've
been tied to a few bad versions of LevelDB, but I'm not certain. (There
were a number of discussions about it on the public mailing lists.)
-Greg
On Tuesday, September 16,
Dear Karan and rest of the followers,
since I haven't received anything from Mellanox regarding this webinar
I 've decided to look for it myself.
You can find the webinar here:
http://www.mellanox.com/webinars/2014/inktank_ceph/
Best,
G.
On Mon, 14 Jul 2014 15:47:39 +0300, Karan Singh
Hi Loic,
Thanks for providing a detailed example. I'm able to run the example
that you provide, and also got my own live crushmap to produce some
results, when I appended the --num-rep 3 option to the command.
Without that option, even your example is throwing segfaults - maybe a
bug in
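For anyone else following along, a minimal sketch of the kind of test invocation being discussed (the map filename and rule number are placeholders; `crushtool --test` needs `--num-rep` to know how many replicas to map):

```shell
# Hypothetical sketch: exercise rule 1 of a compiled CRUSH map and print
# the OSD mapping chosen for each input, with 3 replicas requested.
crushtool -i crushmap --test --rule 1 --num-rep 3 --show-mappings
```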
Replying to myself, and for the benefit of other caffeine-starved people:
Setting the last rule to chooseleaf firstn 0 does not generate the
desired results, and ends up sometimes putting all replicas in the same
zone.
I'm slowly getting the hang of customised crushmaps ;-)
On 16/09/14 18:39,
Hi Daniel,
Can you provide your exact crush map and exact crushtool command
that results in segfaults?
Johnu
On 9/16/14, 10:23 AM, Daniel Swarbrick
daniel.swarbr...@profitbricks.com wrote:
Replying to myself, and for the benefit of other caffeine-starved people:
Setting the last
Hi Daniel,
I see the core dump now, thank you. http://tracker.ceph.com/issues/9490
Cheers
On 16/09/2014 18:39, Daniel Swarbrick wrote:
Hi Loic,
Thanks for providing a detailed example. I'm able to run the example
that you provide, and also got my own live crushmap to produce some
results,
Assuming you're using the kernel client?
In any case, Ceph generally doesn't do anything to select between
different NICs; it just asks for a connection to a given IP. So you
should just be able to set up a route for that IP.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue,
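As a concrete sketch of the routing approach Greg describes (the address and device name here are hypothetical, and the commands need root):

```shell
# Hypothetical: steer traffic for one Ceph endpoint (192.0.2.10) out of the
# second NIC by adding a host route, instead of relying on the default route.
ip route add 192.0.2.10/32 dev eth1
# Confirm which route the kernel would now use for that address:
ip route get 192.0.2.10
```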
On Tue, Sep 16, 2014 at 12:03 AM, Marc m...@shoowin.de wrote:
Hello fellow cephalopods,
every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
that we could use some help with diagnosing.
I understand there used to be a data corruption issue before .80.3 so we
made sure that
Hi,
I saw that the development snapshot 0.85 was released last week, and
have been patiently waiting for packages to appear, so that I can
upgrade a test cluster here.
Can we still expect packages (wheezy, in my case) of 0.85 to be published?
Thanks!
Hi Greg,
I believe Marc is referring to the corruption triggered by set_extsize on xfs.
That option was disabled by default in 0.80.4... See the thread firefly scrub
error.
Cheers,
Dan
From: Gregory Farnum g...@inktank.com
Sent: Sep 16, 2014 8:15 PM
To: Marc
Cc: ceph-users@lists.ceph.com
Ah, you're right — it wasn't popping up in the same searches and I'd
forgotten that was so recent.
In that case, did you actually deep scrub *everything* in the cluster,
Marc? You'll need to run and fix every PG in the cluster, and the
background deep scrubbing doesn't move through the data very
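A sketch of the kind of loop being discussed, assuming an admin keyring is available and that `ceph osd ls` prints one OSD id per line:

```shell
# Hypothetical sketch: ask every OSD in the cluster to deep-scrub its PGs.
# This generates a lot of disk I/O; consider pacing it on a busy cluster.
for id in $(ceph osd ls); do
    ceph osd deep-scrub "$id"
done
```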
Thanks for the poke; looks like something went wrong during the
release build last week. We're investigating now.
-Greg
On Tue, Sep 16, 2014 at 11:08 AM, Daniel Swarbrick
daniel.swarbr...@profitbricks.com wrote:
Hi,
I saw that the development snapshot 0.85 was released last week, and
have
No noise. I ran into the /var/local/osd0/journal issue myself. I will
add notes shortly.
On Fri, Apr 4, 2014 at 6:18 AM, Brian Candler b.cand...@pobox.com wrote:
On 04/04/2014 14:11, Alfredo Deza wrote:
Have you set passwordless sudo on the remote host?
No. Ah... I missed this bit:
echo
We ran a for-loop to tell all the OSDs to deep scrub (since * still
doesn't work) after the upgrade. The deep scrub this week that produced
these errors is the weekly scheduled one though. I shall go investigate
the mentioned thread...
On 16/09/2014 20:36, Gregory Farnum wrote:
Ah, you're right
On Tue, Sep 16, 2014 at 6:15 PM, Joao Eduardo Luis
joao.l...@inktank.com wrote:
Forcing the monitor to compact on start and restarting the mon is the
current workaround for overgrown ssts. This happens on a regular basis with
some clusters and I've not been able to track down the source. It
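For reference, a sketch of that workaround (the mon id "a" is a placeholder): either add `mon compact on start = true` to the `[mon]` section of ceph.conf and restart the monitor, or trigger a compaction on the running daemon:

```shell
# Hypothetical: compact the store of a running monitor without a config change.
ceph tell mon.a compact
```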
Hi,
I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been
able to get 6.4K, well I was using the 200GB model
Your model is the DC S3700; mine is the DC S3500, which has lower write
performance, so that could explain the difference.
BTW, I'll be at the Ceph Days in Paris on Thursday, could be
Is it using any CPU or Disk I/O during the 15 minutes?
On Sun, Sep 14, 2014 at 11:34 AM, Christopher Thorjussen
christopher.thorjus...@onlinebackupcompany.com wrote:
I'm waiting for my cluster to recover from a crashed disk and a second osd
that has been taken out (crushmap, rm, stopped).
On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah jshah2...@me.com wrote:
1. If we need to modify those numbers, do we need to update the values in
ceph.conf and restart every OSD, or can we run a command on the MON that will
overwrite it?
That will work. You can also update the values without a
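A sketch of the runtime update being alluded to (the option name is just an example; values injected this way do not survive a daemon restart):

```shell
# Hypothetical example: change a runtime option on all OSDs without restarts.
ceph tell osd.\* injectargs '--osd_max_backfills 1'
# To persist the value across restarts, also set it in ceph.conf:
#   [osd]
#   osd max backfills = 1
```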
I've got several osds that are spinning at 100%.
I've retained some professional services to have a look. Its out of my
newbie reach..
/Christopher
On Tue, Sep 16, 2014 at 11:23 PM, Craig Lewis cle...@centraldesktop.com
wrote:
Is it using any CPU or Disk I/O during the 15 minutes?
On Sun,
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz franc...@ctrlaltdel.ch
wrote:
XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
All logs from before the disaster are still there, do you have any
advise on what would be relevant?
This is a problem. It's not
I ran into a similar issue before. I was having a lot of OSD crashes
caused by XFS memory allocation deadlocks. My OSDs crashed so many times
that they couldn't replay the OSD Map before they would be marked
unresponsive.
See if this sounds familiar:
Thanks Craig. That’s exactly what I was looking for.
—Jiten
On Sep 16, 2014, at 2:42 PM, Craig Lewis cle...@centraldesktop.com wrote:
On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah jshah2...@me.com wrote:
1. If we need to modify those numbers, do we need to update the values in
ceph.conf
I've been through your post many times (google likes it ;)
I've been trying all the noout/nodown/noup.
But I will look into the XFS issue you are talking about. And read all of
the post one more time..
/C
On Wed, Sep 17, 2014 at 12:01 AM, Craig Lewis cle...@centraldesktop.com
wrote:
I ran
On 17/09/14 08:39, Alexandre DERUMIER wrote:
Hi,
I’m just surprised that you’re only getting 5299 with 0.85 since I’ve been able
to get 6.4K, well I was using the 200GB model
Your model is the DC S3700; mine is the DC S3500, which has lower write
performance, so that could explain the difference.
Interesting -
Hi Guys,
We have a cluster with 1000 OSD nodes, 5 MON nodes, and 1 MDS node. In order
to be able to lose quite a few OSDs and still survive the load, we were
thinking of raising the replication factor to 50.
Is that too big of a number? What are the performance implications, and any other
On Tue, Sep 16, 2014 at 5:10 PM, JIten Shah jshah2...@me.com wrote:
Hi Guys,
We have a cluster with 1000 OSD nodes, 5 MON nodes, and 1 MDS node. In
order to be able to lose quite a few OSDs and still survive the load, we
were thinking of raising the replication factor to 50.
Is that
Is that
Hi Mark/Alexandre,
Are the results with the journal and data configured on the same SSD?
Also, how are you configuring your journal device? Is it a block device?
If the journal and data are not on the same device, the results may change.
BTW, there are SSDs like the SanDisk Optimus drives that use capacitor
Yeah, so generally those will be correlated with some failure domain,
and if you spread your replicas across failure domains you won't hit
any issues. And if hosts are down for any length of time the OSDs will
re-replicate data to keep it at proper redundancy.
-Greg
Software Engineer #42 @
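For context, replication is set per pool rather than cluster-wide; a sketch (the pool name is a placeholder, and note that size 50 would multiply every write fifty-fold, which is why 2 or 3 is the usual choice):

```shell
# Hypothetical: set the replica count and the minimum replicas required
# for I/O on one pool.
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2
```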
Hi Kenneth,
This problem is much like your last reported problem. The fix hasn't been
backported to 0.85, so only the master branch has it.
On Tue, Sep 16, 2014 at 9:58 PM, Gregory Farnum g...@inktank.com wrote:
Heh, you'll have to talk to Haomai about issues with the
KeyValueStore, but I
Hi
I'm getting the below error while installing ceph in admin node. Please let
me know how to resolve the same.
[ceph@ceph-admin ceph-cluster]$* ceph-deploy mon create-initial ceph-admin*
[ceph_deploy.conf][DEBUG ] found configuration file at:
/home/ceph/.cephdeploy.conf
Thanks Dan. Is there any preferred filesystem for the leveldb files? I
understand that the filesystem should be of the same type on both the /var
and SSD partitions.
Should it be ext4, xfs, something else, or does it not matter?
On Tue, Sep 16, 2014 at 10:15 AM, Dan Van Der Ster
Hello Sebastien,
Thanks for your reply. I fixed the error; it was a configuration mistake on
my end.
Regards,
Malleshi CN
-Original Message-
From: Sebastien Han [mailto:sebastien@enovance.com]
Sent: Tuesday, September 16, 2014 7:43 PM
To: Channappa Negalur, M.
Cc: