Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-24 Thread Wade Holler
> > > > On 6/24/16, 10:23 AM, "Wade Holler" wrote: > >>On the vm.vfs_cache_pressure = 1 : We had this initially and I still >>think it is the best choice for most configs. However with our large >>memory footprint, vfs_cache_pressure=1 increased the likel
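A minimal sketch of inspecting and reverting this sysctl (100 is the kernel default; 1 makes the kernel hold on to dentry/inode caches far more aggressively, which is what can inflate memory use on large-RAM OSD nodes; the sysctl.d file name is just an example):

  sysctl vm.vfs_cache_pressure                                      # show the current value
  sysctl -w vm.vfs_cache_pressure=100                               # revert to the default at runtime
  echo 'vm.vfs_cache_pressure = 100' > /etc/sysctl.d/99-ceph.conf   # persist across reboots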

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-24 Thread Wade Holler
or so to test. Best Regards, Wade On Thu, Jun 23, 2016 at 8:09 PM, Somnath Roy wrote: > Oops , typo , 128 GB :-)... > > -Original Message- > From: Christian Balzer [mailto:ch...@gol.com] > Sent: Thursday, June 23, 2016 5:08 PM > To: ceph-users@lists.ceph.com > Cc: So

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-22 Thread Wade Holler
No. Our application writes very small objects. On Wed, Jun 22, 2016 at 10:01 PM, Blair Bethwaite wrote: > On 23 June 2016 at 11:41, Wade Holler wrote: >> Workload is native librados with python. ALL 4k objects. > > Was that meant to be 4MB? > > -

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-22 Thread Wade Holler
r for certain workloads (e.g. > RBD) it's better to increase default object size somewhat before > pushing the split/merge up a lot... > > Cheers, > > On 23 June 2016 at 11:26, Wade Holler wrote: >> Based on everyone's suggestions; The first modification to 50 / 16

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-22 Thread Wade Holler
ctory is under the >> calculated threshold and a write occurs (maybe a read, I forget). >> > If it's a read a plain scrub might do the trick. > > Christian >> Warren >> >> >> From: ceph-users >> mailto:ceph-users-boun...@lists.ceph.com>>

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-20 Thread Wade Holler
Thanks everyone for your replies. I sincerely appreciate it. We are testing with different pg_num and filestore_split_multiple settings. Early indications are, well, not great. Regardless, it is nice to understand the symptoms better so we can try to design around it. Best Regards, Wade On Mon,
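A hedged sketch of checking what a running OSD actually has for the split/merge knobs being tested (osd.0 is an example id; the ceph.conf values shown are illustrative, not a recommendation). FileStore splits a PG subdirectory once it holds roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, which is why a large object count can trigger splitting across many PGs at once:

  ceph daemon osd.0 config show | grep -E 'filestore_(split_multiple|merge_threshold)'
  # ceph.conf, [osd] section -- illustrative values only, then restart the OSDs:
  #   filestore split multiple = 8
  #   filestore merge threshold = 40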

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-16 Thread Wade Holler
0 On Thu, Jun 16, 2016 at 8:48 AM, Blair Bethwaite wrote: > Hi Wade, > > What IO are you seeing on the OSD devices when this happens (see e.g. > iostat), are there short periods of high read IOPS where (almost) no > writes occur? What does your memory usage look like (including
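A minimal sketch of the checks being suggested here, assuming sysstat and procps are installed (device choice and intervals are examples):

  iostat -x 1                  # per-OSD-device r/s, w/s, await, %util; look for read bursts with (almost) no writes
  free -m                      # overall memory, buffers/cache
  slabtop -o | head -20        # biggest slab consumers (dentry, xfs/btrfs inode caches)
  grep -i slab /proc/meminfo   # total slab usage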

Re: [ceph-users] Dramatic performance drop at certain number of objects in pool

2016-06-16 Thread Wade Holler
S where (almost) no > writes occur? What does your memory usage look like (including slab)? > > Cheers, > > On 16 June 2016 at 22:14, Wade Holler wrote: > > Hi All, > > > > I have a repeatable condition when the object count in a pool gets to > > 320-

[ceph-users] Performance drop when object count in a pool hits a threshold

2016-06-16 Thread Wade Holler
Hi All, I have a repeatable condition: when the object count in a pool gets to 320-330 million, the object write time dramatically and almost instantly increases by as much as 10X, exhibited by fs_apply_latency going from 10ms to 100s of ms. Can someone point me in a direction / offer an explanation?
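A minimal sketch of watching the latency counter mentioned above while the condition reproduces (osd.0 and the dump path are examples):

  ceph osd perf                                       # fs_commit_latency / fs_apply_latency per OSD, in ms
  watch -n 5 ceph osd perf                            # watch apply latency climb as the object count crosses the threshold
  ceph daemon osd.0 perf dump > /tmp/osd0-perf.json   # full counter dump over the admin socket, for before/after comparison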

Re: [ceph-users] RadosGW performance s3 many objects

2016-05-23 Thread Wade Holler
We (my customer) are trying to test at Jewel now but I can say that the above behavior was also observed by my customer at Infernalis. After 300 million or so objects in a single bucket the cluster basically fell down as described above. A few hundred OSDs in this cluster. We are concerned that thi

[ceph-users] Unsubscribe

2016-05-20 Thread Wade Holler

Re: [ceph-users] Incorrect crush map

2016-05-05 Thread Wade Holler
md: Unit ceph-osd@43.service entered >> failed state. >> May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@43.service failed. >> May 4 22:13:26 sm-cld-mtl-013 systemd: ceph-osd@42.service: main >> process exited, code=exited, status=1/FAILURE >> May 4 22:13:26 sm-cld-mt

Re: [ceph-users] Incorrect crush map

2016-05-03 Thread Wade Holler
Hi Ben, What OS+Version ? Best Regards, Wade On Tue, May 3, 2016 at 2:44 PM Ben Hines wrote: > My crush map keeps putting some OSDs on the wrong node. Restarting them > fixes it temporarily, but they eventually hop back to the other node that > they aren't really on. > > Is there anything tha

[ceph-users] pg to RadosGW object list

2016-03-08 Thread Wade Holler
Hi All, I searched google and what not but haven't found this yet. Does anyone know how to do PG -> applicable RadosGW Object mapping? Best Regards, Wade
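One brute-force way to get that mapping, sketched with a Jewel-style RGW data pool name and an example PG id (both are placeholders; this walks every object in the pool, so it is slow at large object counts):

  rados -p default.rgw.buckets.data ls | while read obj; do
      ceph osd map default.rgw.buckets.data "$obj"   # prints which PG each object maps to
  done | grep '(10\.f4)'                             # keep only objects that land in PG 10.f4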

Re: [ceph-users] Increasing time to save RGW objects

2016-02-09 Thread Wade Holler
Hi there, What is the best way to "look at the rgw admin socket" to see what operations are taking a long time? Best Regards, Wade On Mon, Feb 8, 2016 at 12:16 PM Gregory Farnum wrote: > On Mon, Feb 8, 2016 at 8:49 AM, Kris Jurka wrote: > > > > I've been testing the performance of ceph by sto
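A minimal sketch of poking that socket, assuming the default socket directory and an example instance name (adjust to whatever the ls shows on your gateway host):

  ls /var/run/ceph/                                                            # find the rgw .asok file
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok help                 # list supported commands
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok perf dump            # latency/queue counters
  ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok objecter_requests    # RADOS ops currently in flight and their age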

Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-04 Thread Wade Holler
24 Cheers Wade On Thu, Feb 4, 2016 at 7:24 AM Sascha Vogt wrote: > Hi, > > On 04.02.2016 at 12:59, Wade Holler wrote: > > You referenced parallel writes for journal and data. Which is the default > > for btrfs but not XFS. Now you are mentioning multiple parallel writes

Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-04 Thread Wade Holler
On 03.02.2016 at 17:24, Wade Holler wrote: > > AFAIK when using XFS, parallel write as you described is not enabled. > Not sure I'm getting this. If I have multiple OSDs on the same NVMe > (separated by different data-partitions) I have multiple parallel writes > (one "stream

Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Wade Holler
y but if your KVM instances are really that short-lived, could you get away with size=2 on the cache pool from an availability perspective? On Wed, Feb 3, 2016 at 7:44 AM Sascha Vogt wrote: > Hi Wade, > > On 03.02.2016 at 13:26, Wade Holler wrote: > > What is your file syste

Re: [ceph-users] Optimal OSD count for SSDs / NVMe disks

2016-02-03 Thread Wade Holler
Hi Sascha, What is your file system type, XFS or Btrfs ? Thanks Wade On Wed, Feb 3, 2016 at 7:01 AM Sascha Vogt wrote: > Hi all, > > we recently tried adding a cache tier to our ceph cluster. We had 5 > spinning disks per hosts with a single journal NVMe disk, hosting the 5 > journals (1 OSD pe

Re: [ceph-users] ceph random read performance is better than sequential read?

2016-02-02 Thread Wade Holler
Could you share the fio command and your read_ahead_kb setting for the OSD devices ? "performance is better" is a little too general. I understand that we usually mean higher IOPS or higher aggregate throughput when we say performance is better. However, application random read performance "gene
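For concreteness, a sketch of the two things being asked for, with placeholder device/image names (a 4k random-read job; swap --rw=read to compare sequential):

  cat /sys/block/sdb/queue/read_ahead_kb     # read-ahead for one OSD data device (sdb is an example)
  fio --name=randread --filename=/dev/rbd0 --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based --group_reporting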

Re: [ceph-users] attempt to access beyond end of device on osd prepare

2016-02-01 Thread Wade Holler
327.3.1 as I recall. Thank for the additional information / confirmation. On Mon, Feb 1, 2016 at 6:05 PM Simon Ironside wrote: > On 01/02/16 17:47, Wade Holler wrote: > > I can at least say that I've seen this. (a lot) > > > > Running Infernalis with Btrfs on Cent 7.2.

Re: [ceph-users] attempt to access beyond end of device on osd prepare

2016-02-01 Thread Wade Holler
I can at least say that I've seen this. (a lot) Running Infernalis with Btrfs on Cent 7.2. I haven't seen any other issue in the cluster that I would say is related. Take it for what you will. Best Regards, Wade On Mon, Feb 1, 2016 at 12:39 PM Simon Ironside wrote: > Has nobody else encount

Re: [ceph-users] SSD OSDs - more Cores or more GHz

2016-01-20 Thread Wade Holler
Great commentary. While it is fundamentally true that higher clock speed equals lower latency, in my practical experience we are more often interested in latency at the concurrency profile of the applications. So in this regard I favor more cores when I have to choose, such that we can support m

Re: [ceph-users] CentOS 7.2, Infernalis, preparing osd's and partprobe issues.

2016-01-13 Thread Wade Holler
and I was able to start all osd just running systemctl > start ceph.target > > Cheers > Goncalo > > > -- > *From:* Wade Holler [wade.hol...@gmail.com] > *Sent:* 08 January 2016 01:15 > *To:* Goncalo Borges; Loic Dachary > *Cc:* ceph-users@lists.

Re: [ceph-users] ceph osd tree output

2016-01-11 Thread Wade Holler
appropriate ? Thank you ahead of time for your help! Best Regards, Wade On Mon, Jan 11, 2016 at 5:43 PM John Spray wrote: > On Mon, Jan 11, 2016 at 10:32 PM, Wade Holler > wrote: > > Does anyone else have any suggestions here? I am increasingly concerned > > about my config if

Re: [ceph-users] ceph osd tree output

2016-01-11 Thread Wade Holler
at 11:12 AM Wade Holler wrote: > It is not set in the conf file. So why do I still have this behavior? > > On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: > >> Yeah, this setting cannot be seen in the asok config. >> You just set it in ceph.conf and restart the mon and osd servi

Re: [ceph-users] using cache-tier with writeback mode, raods bench result degrade

2016-01-08 Thread Wade Holler
My experience is that performance degrades dramatically when dirty objects are flushed. Best Regards, Wade On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: > Hi guys, > Recently, I am testing a cache tier using writeback mode, but I found a > strange thing. > The performance using rados bench degr
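If flushing is the bottleneck, the knobs that control when a cache tier starts flushing and evicting are per-pool settings; a hedged example with a placeholder pool name and illustrative ratios (a lower dirty ratio means earlier, more gradual background flushing):

  ceph osd pool get cachepool cache_target_dirty_ratio       # current flush threshold
  ceph osd pool set cachepool cache_target_dirty_ratio 0.4   # start flushing dirty objects earlier
  ceph osd pool set cachepool cache_target_full_ratio 0.8    # point at which eviction kicks in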

Re: [ceph-users] ceph osd tree output

2016-01-08 Thread Wade Holler
> Where I use this config is when I have changed the crushmap manually and I do not > want the service init script to rebuild the crushmap in the default way. > > Maybe this does not suit your problem. Just have a try. > > On Fri, 08 Jan 2016 21:51:32 +0800, Wade Holler wrote: > > That is not se
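The setting this advice appears to refer to is osd crush update on start; a minimal sketch of checking and pinning it (osd.0 is an example id, and the ceph.conf lines are illustrative):

  ceph daemon osd.0 config get osd_crush_update_on_start   # what the running daemon is actually using
  # ceph.conf, [osd] section, then restart the OSDs:
  #   osd crush update on start = false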

Re: [ceph-users] ceph osd tree output

2016-01-08 Thread Wade Holler
hmap, even if the crushmap does not reflect > reality. > > Regards, > > Mart > > > > > > On 01/08/2016 02:16 AM, Wade Holler wrote: > > Sure. Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per > node, but I will only include a sample: >

Re: [ceph-users] ceph osd tree output

2016-01-07 Thread Wade Holler
} host cpn4 { Thank you for your review ! Wade On Thu, Jan 7, 2016 at 6:03 PM Shinobu Kinjo wrote: > Can you share the output with us? > > Rgds, > Shinobu > > - Original Message ----- > From: "Wade Holler" > To: "ceph-users"

[ceph-users] ceph osd tree output

2016-01-07 Thread Wade Holler
Sometimes my ceph osd tree output is wrong, i.e. OSDs show up under the wrong hosts. Anyone else have this issue? I have seen this at Infernalis and Jewel. Thanks Wade

Re: [ceph-users] CentOS 7.2, Infernalis, preparing osd's and partprobe issues.

2016-01-07 Thread Wade Holler
I commented out partprobe and everything seems to work just fine. *If someone has experience with why this is very bad please advise. Make sure you know about http://tracker.ceph.com/issues/13833 also. *ps we are running btrfs in the test jig and had to add the "-f" to the btrfs_args for ceph-dis
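For anyone reproducing this: one place that "-f" can live is the mkfs options that ceph-disk reads from ceph.conf; a hedged sketch with an example device (values illustrative, and note that -f will happily clobber an existing filesystem):

  # ceph.conf, [osd] section -- illustrative:
  #   osd mkfs type = btrfs
  #   osd mkfs options btrfs = -f
  ceph-disk prepare --fs-type btrfs /dev/sdb   # /dev/sdb is an example device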

[ceph-users] Combo for Reliable SSD testing

2016-01-04 Thread Wade Holler
All, I am testing an all-SSD and NVMe (journal) config for a customer's first endeavor investigating Ceph for performance-oriented workloads. Can someone recommend a good-performing and reliable (under high load) combination? Terrible high-level question, I know, but we have had a number of issu

Re: [ceph-users] ceph-deploy create-initial errors out with "Some monitors have still not reached quorum"

2015-12-31 Thread Wade Holler
I assume you have tested with firewalld disabled ? Best Regards Wade On Thu, Dec 31, 2015 at 9:13 PM Maruthi Seshidhar < maruthi.seshid...@gmail.com> wrote: > hi fellow users, > > I am setting up a ceph cluster with 3 monitors, 4 osds on CentOS 7.1 > > Each of the nodes have 2 NICs. > 10.31.141.0
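A quick sketch of both approaches on CentOS 7 (disable entirely for a test, or keep firewalld and open the standard Ceph ports of that era instead), to be run on every node:

  systemctl stop firewalld && systemctl disable firewalld   # for the test only
  # or open the Ceph ports:
  firewall-cmd --permanent --add-port=6789/tcp              # monitors
  firewall-cmd --permanent --add-port=6800-7300/tcp         # OSDs / MDS
  firewall-cmd --reload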

Re: [ceph-users] more performance issues :(

2015-12-24 Thread Wade Holler
Have a look at the iostat -x 1 1000 output to see what the drives are doing. On Wed, Dec 23, 2015 at 4:35 PM Florian Rommel < florian.rom...@datalounges.com> wrote: > Ah, totally forgot the additional details :) > > OS is SUSE Enterprise Linux 12.0 with all patches, > Ceph version 0.94.3 > 4 node

Re: [ceph-users] requests are blocked

2015-12-22 Thread Wade Holler
on OSDs ? this > will be nasty if it goes into our production! > > > > -- > > Dan > > > > *From:* Wade Holler [mailto:wade.hol...@gmail.com] > *Sent:* Tuesday, December 22, 2015 4:36 PM > *To:* Dan Nica ; ceph-users@lists.ceph.com > *Subject:* Re: [ceph-us

Re: [ceph-users] requests are blocked

2015-12-22 Thread Wade Holler
I had major host stability problems under load with -327. Repeatable test cases under high load with XFS or Btrfs would result in hung kernel tasks and of course the sympathetic behavior you mention. "requests are blocked" means that the op tracker in ceph hasn't received a timely response from the
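A minimal sketch of digging into blocked requests when they appear (osd.12 is an example id, taken from whichever OSDs ceph health detail blames):

  ceph health detail                      # names the OSDs with blocked/slow requests
  ceph daemon osd.12 dump_ops_in_flight   # the stuck ops, how long they've waited, and at which step
  ceph daemon osd.12 dump_historic_ops    # recently completed slow ops, with per-stage timings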

Re: [ceph-users] rbd image mount on multiple clients

2015-12-21 Thread Wade Holler
Hi Dan, When we say "mount" we are usually referring to a file system. Mounting a non-shared filesystem on multiple hosts concurrently will certainly break things, since each host thinks it has exclusive access to that non-shared filesystem. Of course this is not true if a shared / clustered filesystem is

Re: [ceph-users] radosgw bucket index sharding tips?

2015-12-16 Thread Wade Holler
I'm interested in this too. Should start testing next week at 1B+ objects and I sure would like a recommendation of what config to start with. We learned the hard way that not sharding is very bad at scales like this. On Wed, Dec 16, 2015 at 2:06 PM Florian Haas wrote: > Hi Ben & everyone, > > j
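For what it's worth, the knob in question appears to be rgw_override_bucket_index_max_shards; a hedged sketch (instance name and shard count are illustrative, and at this point it only affects buckets created after the change):

  # ceph.conf on the RGW hosts, then restart radosgw -- illustrative:
  #   [client.rgw.gateway1]
  #   rgw override bucket index max shards = 64
  radosgw-admin bucket stats --bucket=mybucket   # per-bucket object counts, to see how big the index already is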

Re: [ceph-users] recommendations for file sharing

2015-12-15 Thread Wade Holler
Keep it simple is my approach: #1, if needed, add rudimentary HA with Pacemaker. http://linux-ha.org/wiki/Samba Cheers Wade On Tue, Dec 15, 2015 at 5:45 AM Alex Leake wrote: > Good Morning, > > > I have a production Ceph cluster at the University I work at, which runs > brilliantly. > > > Howeve