to 777 and user sam2 cannot write either.
On 11/5/2019 7:10 PM, Yan, Zheng wrote:
On Wed, Nov 6, 2019 at 5:47 AM Alex Litvak wrote:
Hello Cephers,
I am trying to understand how uid and gid are handled on a shared cephfs
mount. I am using 14.2.2 and the cephfs kernel-based client.
I have 2
I changed /webcluster/data this way
rwxrwxrwx dev dev /webcluster/data
Still, newly added users can't write to it.
On 11/5/2019 7:04 PM, Alex Litvak wrote:
Plot thickens.
I created a new user sam2 and group sam2, both with uid and gid = 1501. User sam2 is
a member of group dev. When I switch
It makes no sense. Old users from group dev (created and connected a long
time ago) can write into the data dir and new ones cannot.
On 11/5/2019 3:07 PM, Alex Litvak wrote:
Hello Cephers,
I am trying to understand how uid and gid are handled on a shared cephfs
mount. I am using 14.2.2 and cephfs
Hello Cephers,
I am trying to understand how uid and gid are handled on a shared cephfs
mount. I am using 14.2.2 and the cephfs kernel-based client.
I have 2 client VMs with the following uid/gid:
vm1 user dev (uid=500) group dev (gid=500)
vm2 user dev (uid=500) group dev (gid=500)
vm1 user
Hello cephers,
So I am having trouble with new hardware systems showing strange OSD behavior,
and I want to replace a disk with a brand-new one to test the theory.
I run all daemons in containers, and on one of the nodes I have a mon, a mgr, and 6
OSDs. So following
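For reference, a rough sketch of the usual Nautilus replacement flow (the OSD id and device below are hypothetical, and the daemon stop step depends on how the containers are run):

    ceph osd out osd.17                         # drain PGs off the failing OSD
    # stop the OSD container, wait for rebalancing, then:
    ceph osd purge 17 --yes-i-really-mean-it    # drop it from CRUSH, auth, and the osdmap
    ceph-volume lvm create --data /dev/sdX      # provision the replacement disk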
Hello everyone,
Can you shed some light on the cause of the crash? Could a client request
actually trigger it?
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867
7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit
retries 16
Sep 30 22:52:58
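"aio_submit retries" means io_submit() kept returning EAGAIN and BlueStore retried the submission. Two limits worth inspecting, as a sketch of checks rather than a confirmed fix for this crash:

    sysctl fs.aio-max-nr                              # kernel-wide async I/O limit
    ceph config get osd.17 bdev_aio_max_queue_depth   # BlueStore's per-device AIO queue depth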
Hello everyone,
I am running a number of parallel benchmark tests against the cluster that
should be ready to go to production.
I enabled prometheus to monitor various metrics, and while the cluster stays
healthy through the tests with no errors or slow requests,
I noticed an apply / commit
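For reference, the mgr prometheus module exposes those OSD latencies directly; a minimal sketch (the host name is a placeholder, 9283 is the module's default port):

    ceph mgr module enable prometheus
    curl -s http://mgr-host:9283/metrics | grep -E 'ceph_osd_(apply|commit)_latency_ms'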
If it is a release, it broke my ansible installation because librados2 is
missing:
https://download.ceph.com/rpm-nautilus/el7/x86_64/librados2-14.2.3-0.el7.x86_64.rpm
(404 Not Found).
Please fix it one way or another.
On 9/3/2019 8:31 PM, Sasha Litvak wrote:
Is there an actual release or
components rather than turning everything up to 20. Sorry for hijacking the post, I will create a new one
when I have more information.
On 7/23/2019 9:50 PM, Alex Litvak wrote:
I just had an OSD crash with no logs (debug was not enabled). It happened 24 hours after the actual upgrade from 14.2.1
I just had an OSD crash with no logs (debug was not enabled). It happened 24 hours after the actual upgrade from 14.2.1 to 14.2.2. Nothing else changed as far as environment or load. The disk is OK.
I restarted the OSD and it came back. The cluster had been up for 2 months until the upgrade without an issue.
I was planning to upgrade 14.2.1 to 14.2.2 next week. Since there are a few reports of crashes, does anyone know if the upgrade somehow triggers the issue? If not, then what does? Since this has been
reported by some before the upgrade, I am just wondering if the upgrade to 14.2.2 makes the problem worse.
The issue should have been resolved by the backport
https://tracker.ceph.com/issues/40424 in nautilus; was it merged into 14.2.2?
Also, do you think it is safe to upgrade from 14.2.1 to 14.2.2?
On 7/19/2019 1:05 PM, Paul Emmerich wrote:
I've also encountered a crash just like that after
rek Zegar
Senior SDS Engineer
Email tze...@us.ibm.com
Mobile 630.974.7172
Alex Litvak ---06/24/2019 01:07:28 PM---Jason, Here you go:
From: Alex Litvak <alexander.v.litva
Jason,
What are you suggesting to do ? Removing this line from the config database
and keeping in config files instead?
On 6/24/2019 1:12 PM, Jason Dillaman wrote:
On Mon, Jun 24, 2019 at 2:05 PM Alex Litvak
wrote:
Jason,
Here you go:
WHO      MASK     LEVEL    OPTION
On 6/24/2019 11:50 AM, Jason Dillaman wrote:
On Sun, Jun 23, 2019 at 4:27 PM Alex Litvak
wrote:
Hello everyone,
I encounter this with the nautilus client and not with mimic. Removing the admin socket
entry from the config on the client makes no difference.
Error:
rbd ls -p one
2019-06-23 12:58:29.344
Hello everyone,
I encounter this with the nautilus client and not with mimic. Removing the admin socket
entry from the config on the client makes no difference.
Error:
rbd ls -p one
2019-06-23 12:58:29.344 7ff2710b0700 -1 set_mon_vals failed to set admin_socket
= /var/run/ceph/$name.$pid.asok: Configuration
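The direction this thread takes is moving the option out of the mon config database and keeping it in local config files instead; a sketch (the "client" section here is an assumption, the WHO column of the dump shows the real one):

    ceph config dump | grep admin_socket   # confirm where the override is stored
    ceph config rm client admin_socket     # remove the centrally stored entry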
Hello cephers,
I know that a similar question was posted 5 years ago. However, the answer
was inconclusive for me.
I installed a new Nautilus 14.2.1 cluster and started pre-production testing.
I followed a RedHat document and simulated a soft disk failure by
# echo 1 >
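The exact target is elided above; the RedHat procedure in question typically deletes the device at the SCSI layer, something like this (the device name is hypothetical):

    echo 1 > /sys/block/sdX/device/delete   # make sdX disappear, as if it had died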
From what I see, the message is generated by a mon container on each node.
Does the mon issue a manual compaction of rocksdb at some point (the debug
message is a rocksdb one)?
On 3/18/2019 12:33 AM, Konstantin Shalygin wrote:
I am getting a huge number of messages on one out of three nodes showing Manual
Konstantin,
I am not sure I understand. You mean something in the container does a manual
compaction job sporadically? What would be doing that? I am confused.
On 3/18/2019 12:33 AM, Konstantin Shalygin wrote:
I am getting a huge number of messages on one out of three nodes showing Manual
those things.
Thank you again,
On 3/17/2019 4:11 AM, Alex Litvak wrote:
Hello everyone,
I am getting a huge number of messages on one out of three nodes showing Manual
compaction starting all the time. I see no such log entries on the other
nodes in the cluster.
Mar 16 06:40:11 storage1n1
Hello everyone,
As I am troubleshooting an issue, I see logs literally littered with messages such as those below. I searched the documentation and couldn't find a specific debug knob to turn. I see some debugging is on by
default, but I don't need to see the stuff below, especially the mgr and client messages repeating.
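There is no single knob; in practice each noisy subsystem is turned down on its own. A sketch (the subsystem picks and levels are assumptions matching the mgr/client noise described, not settings from the thread):

    ceph tell mon.* injectargs '--debug_mgrc=0/0 --debug_client=0/0'   # runtime only
    ceph config set global debug_mgrc 0/0                              # persists across restarts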
Hello everyone,
I am getting a huge number of messages on one out of three nodes showing Manual
compaction starting all the time. I see no such log entries on the other
nodes in the cluster.
Mar 16 06:40:11 storage1n1-chi docker[24502]: debug 2019-03-16 06:40:11.441 7f6967af4700 4
"time": "2019-03-08 07:53:37.002282",
"event": "done"
}
]
}
},
It just tells me throttled, nothing else. What does throttled mean in this case?
I see so
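For context, event lists like the fragment above come from the OSD's historic-ops dump; "throttled" marks the time an op waited on the OSD's throttle limits before being processed, which points at queue admission rather than the disk itself. A sketch of pulling it (the OSD id is hypothetical):

    ceph daemon osd.0 dump_historic_ops | jq '.ops[].type_data.events'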
Hello Cephers,
I am trying to find the cause of multiple slow ops that happened on my
small cluster. I have a 3-node cluster with 9 OSDs:
Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
128 GB RAM
Each OSD is SSD Intel DC-S3710 800GB
It runs mimic 13.2.2 in containers.
The cluster was operating normally for 4
Dear Cephers,
In mimic 13.2.2
ceph tell mgr.* injectargs --log-to-stderr=false
Returns an error (no valid command found ...). What is the correct way to
inject mgr configuration values?
The same command works on mon
ceph tell mon.* injectargs --log-to-stderr=false
Thank you in advance,
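The mgr does not accept the mon/osd-style "*" target in mimic; two workarounds that come up instead (a sketch, assuming the mgr id matches the short hostname as ceph-ansible sets it):

    ceph tell mgr injectargs '--log-to-stderr=false'               # addresses the active mgr
    ceph daemon mgr.$(hostname -s) config set log_to_stderr false  # via the admin socket, on the mgr node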
Hello everyone,
I am running a mimic 13.2.2 cluster in containers and noticed that docker
logs ate all of my local disk space after a while. So I changed the
debug levels of rocksdb, leveldb, and memdb to 1/5 (default 4/5) and changed
mon logging as follows:
ceph tell mon.* injectargs
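Independent of the Ceph debug levels, capping Docker's json-file driver keeps container logs from filling the disk again (a generic Docker setting, not something from this thread):

    cat > /etc/docker/daemon.json <<'EOF'
    { "log-driver": "json-file", "log-opts": { "max-size": "100m", "max-file": "3" } }
    EOF
    systemctl restart docker   # existing containers must be recreated to pick this up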
It is true for all distros. It doesn't happen the first time either. I
think it is a bit dangerous.
On 1/3/19 12:25 AM, Ashley Merrick wrote:
Have just run an apt update and have noticed there are some CEPH
packages now available for update on my mimic cluster / ubuntu.
Have yet to install
Hello everyone,
I am running mds + mon on 3 nodes. Recently, due to increased cache pressure
and a NUMA non-interleave effect, we decided to double the memory on the nodes
from 32 GB to 64 GB.
We wanted to upgrade a standby node first to be able to test the new memory vendor.
So without much
Sorry for hijacking the thread, but do you have an idea of what to watch for:
I monitor the admin sockets of the OSDs and occasionally I see a burst of both
op_w_process_latency and op_w_latency to near 150 - 200 ms on 7200 RPM
enterprise SAS drives.
For example, load average on the node jumps up while idle is at 97
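For context, this is roughly how such counters are sampled from the admin socket (the OSD id is hypothetical; the counters are cumulative sum/avgcount pairs, so bursts show up by diffing successive samples):

    ceph daemon osd.3 perf dump | jq '.osd | {op_w_latency, op_w_process_latency}'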
John,
If I go with write through, shouldn't disk cache be enabled?
On 11/20/2018 6:12 AM, John Petrini wrote:
I would disable cache on the controller for your journals. Use write through
and no read ahead. Did you make sure the disk cache is disabled?
On Tuesday, November 20, 2018, Alex
I went through a RAID controller firmware update. I replaced a pair of SSDs with new ones. Nothing has changed. The controller card utility shows that no patrol read happens and the battery
backup is in good shape. The cache policy is WriteBack. I am aware of the bad battery effect, but it
you
On Mon, 19 Nov 2018 at 12:28 AM, Alex Litvak mailto:alexander.v.lit...@gmail.com>> wrote:
All machines state the same.
/opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -DskCache -Lall -a0
Adapter 0-VD 0(target id: 0): Disk Write Cache : Disk's Default
Adapter 0-VD 1(target
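"Disk's Default" leaves the drive-cache policy up to the drive itself; forcing it off removes the ambiguity. Standard MegaCli syntax (apply with care, this changes live controller settings):

    /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -DisDskCache -LAll -a0   # disable disk cache on all LDs
    /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -DskCache -LAll -a0      # verify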
Check the SSD disk caches.
On Sun, Nov 18, 2018 at 11:40 AM Alex Litvak
wrote:
All 3 nodes have this status for SSD mirror. Controller cache is on for all 3.
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write
write cache on SSDs enabled on three servers? Can you check them?
On Sun, Nov 18, 2018 at 9:05 AM Alex Litvak
wrote:
The RAID card for the journal disks is a Perc H730 (Megaraid), RAID 1; the battery-backed cache
is on.
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache
card installed on this system? What is the raid mode?
On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak
wrote:
Here is another snapshot. I wonder if this write I/O wait is too big.
Device:  rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
                                                               0.09      11.75  0.00     11.75    2.25   1.80
dm-25    0.00    0.00    0.00  12.00  0.00   160.00  26.67     0.06      5.08   0.00     5.08     1.25   1.50
On 11/17/2018 10:19 PM, Alex Litvak wrote:
I stand corrected, I looked at the device iostat, but it was partitioned. Here
is a more
I stand corrected, I looked at the device iostat, but it was partitioned. Here
is a more correct picture of what is going on now.
Device:  rrqm/s  wrqm/s  r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14    0.00    0.00    0.00
Plot thickens:
I checked C-states and apparently I am operating in C1 with all CPUs on.
The servers were apparently tuned to use the latency-performance profile:
tuned-adm active
Current active profile: latency-performance
turbostat shows
Package  Core  CPU  Avg_MHz  %Busy  Bzy_MHz  TSC_MHz  SMI
John,
Thank you for the suggestions.
I looked into the journal SSDs. They are close to 3 years old, showing 5.17%
wear (352941 GB written to disk, against a 3.6 PB endurance spec over 5 years).
It could be that SMART is not telling everything, but that is what I see.
Vendor Specific SMART Attributes with Thresholds:
I am evaluating bluestore on a separate cluster. Unfortunately,
upgrading this one is out of the question at the moment for multiple
reasons. That is why I am trying to find a possible root cause.
On 11/17/2018 2:14 PM, Paul Emmerich wrote:
Are you running FileStore? (The config options
the spikes, however it is
still small.
On 11/17/2018 1:40 PM, Kees Meijs wrote:
Hi Alex,
What kind of clients do you use? Is it KVM (QEMU) using NBD driver,
kernel, or...?
Regards,
Kees
On 17-11-18 20:17, Alex Litvak wrote:
Hello everyone,
I am trying to troubleshoot a cluster exhibiting huge
Hello everyone,
I am trying to troubleshoot a cluster exhibiting huge spikes of latency.
I cannot quite catch it because it happens during light activity and
randomly affects one OSD node out of the 3 in the pool.
This is a FileStore deployment.
I see some OSDs exhibit an applied latency of 400 ms, 1
with upmap. If
you can, it is hands down the best way to balance your cluster.
On Sat, Oct 27, 2018, 9:14 PM Alex Litvak <mailto:alexander.v.lit...@gmail.com>> wrote:
I have a cluster using 2 roots. I attempted to reweight the OSDs under the
"default" root used by pool rbd,
I have a cluster using 2 roots. I attempted to reweight the OSDs under the
"default" root, used by the pools rbd, cephfs-data, and cephfs-meta, using the
CERN script crush-reweight-by-utilization.py. I ran it first and it showed
4 candidates (the script default); it shows the final weight and a single
step
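The upmap route suggested above boils down to three commands (a sketch; the first one makes the cluster refuse pre-luminous clients, which upmap requires):

    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on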
, maybe less when it is
not in replay mode.
Anyway, we've deactivated CephFS there for now. I'll try older versions in
a test environment
On Mon, Oct 8, 2018 at 5:21, Alex Litvak ()
wrote:
How is this not an emergency announcement? Also, I wonder if I can
downgrade at all. I am
How is this not an emergency announcement? Also, I wonder if I can
downgrade at all. I am using ceph in docker, deployed with
ceph-ansible. I wonder if I should push a downgrade or basically wait for
the fix. I believe a fix needs to be provided.
Thank you,
On 10/7/2018 9:30 PM, Yan,
gor
On 10/2/2018 5:04 PM, Alex Litvak wrote:
I am sorry for interrupting the thread, but my understanding always
was that bluestore on a single device should not care about the DB
size, i.e. it would use the data part for all operations if the DB is
full. And if that is not true, what would be sens
I am sorry for interrupting the thread, but my understanding always was
that bluestore on a single device should not care about the DB size,
i.e. it would use the data part for all operations if the DB is full. And
if that is not true, what would be sensible defaults on an 800 GB SSD? I
used
not be modified at runtime
On 08/30/2018 09:06 PM, Alex Litvak wrote:
I keep getting the following error message:
2018-08-30 18:52:37.882 7fca9df7c700 -1 asok(0x7fca98000fe0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed
to bind the UNIX domain socket to
'/var/run/ceph/ceph
I keep getting the following error message:
2018-08-30 18:52:37.882 7fca9df7c700 -1 asok(0x7fca98000fe0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed
to bind the UNIX domain socket to
'/var/run/ceph/ceph-client.admin.asok': (17) File exists
Otherwise things seem
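"(17) File exists" means two clients raced for the same socket path. A common way out (an assumption, not something stated in the thread) is making the path unique per process in the client section:

    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        admin socket = /var/run/ceph/$cluster-$name.$pid.asok
    EOF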
Are there plans to release Jewel 10.2.11 before the end of support?
Hammer RPMs for 0.94.8 are still not available for EL6. Can this
please be addressed?
Thank you in advance,
On 08/27/2016 06:25 PM,
alexander.v.lit...@gmail.com wrote:
RPMs are not available at the distro side.
On Fri, 26 Aug 2016 21:31:45 + (UTC), Sage Weil
wrote: