[ceph-users] Receiving "failed to parse date for auth header"

2015-09-04 Thread Ramon Marco Navarro
Good day everyone! I'm having a problem using aws-java-sdk to connect to Ceph using radosgw. I am reading a "NOTICE: failed to parse date for auth header" message in the logs. HTTP_DATE is "Fri, 04 Sep 2015 09:25:33 +00:00", which I think is a valid RFC 1123 date... Here's a link to the related

[ceph-users] Nova fails to download image from Glance backed with Ceph

2015-09-04 Thread Vasiliy Angapov
Hi all, Not sure whether this bug belongs to OpenStack or Ceph, but I'm writing here in the humble hope that someone else has faced this issue too. I configured a test OpenStack instance with Glance images stored in Ceph 0.94.3. Nova has local storage. But when I'm trying to launch an instance from

Re: [ceph-users] CephFS and caching

2015-09-04 Thread Les
Cephfs can use fscache. I am testing it at the moment. Some lines from my deployment process: sudo apt-get install linux-generic-lts-utopic cachefilesd sudo reboot sudo mkdir /mnt/cephfs sudo mkdir /mnt/ceph_cache sudo mkfs -t xfs /dev/md3 # A 100gb local raid partition sudo bash -c "echo
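
A sketch of how the remaining pieces would typically fit together, assuming /dev/md3 is mounted at /mnt/ceph_cache and the kernel CephFS client is used (the fsc option and the paths below are assumptions, not taken from the post):

    # /etc/cachefilesd.conf - point the cache daemon at the local cache dir
    dir /mnt/ceph_cache
    # on Ubuntu, also set RUN=yes in /etc/default/cachefilesd
    sudo service cachefilesd start
    # mount CephFS with fscache enabled via the fsc option
    sudo mount -t ceph ceph1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,fsc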

[ceph-users] Ceph Client parallized access?

2015-09-04 Thread Alexander Walker
Hi, I've configured a CephFS and mounted it in fstab: ceph1:6789,ceph2:6789,ceph3:6789:/ /cephfs ceph name=admin,secret=AQDVOOhVxEI7IBAAM+4el6WYbCwKvFxmW7ygcA==,noatime 0 2 This means: 1. Can the Ceph client write data to all three servers at the same time? 2. Does the client access the second

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Nick Fisk
Actually, just thinking about this some more, shouldn't the PGs-per-OSD "golden rule" also depend on the size of the OSD? If this directory splitting is a big deal, then an 8TB OSD is going to need a lot more PGs than, say, a 1TB OSD. Any thoughts Mark? > -Original Message- > From:

[ceph-users] Deep scrubbing OSD

2015-09-04 Thread Межов Игорь Александрович
Hi! Just one simple question: how can we see when a deep scrub of an OSD has completed, after we execute the 'ceph osd deep-scrub ' command? Megov Igor CIO, Yuterra
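
As far as I know there is no direct completion notification for that command; a rough way to follow it (a sketch, assuming default cluster logging) is to watch the cluster log for the per-PG messages, or to compare deep-scrub timestamps before and after:

    # per-PG completion messages show up in the cluster log
    ceph -w | grep 'deep-scrub ok'
    # check when a given PG was last deep-scrubbed (1.2f is a placeholder pgid)
    ceph pg 1.2f query | grep last_deep_scrub_stamp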

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Nick Fisk
I've just made the same change (4 and 40 for now) on my cluster, which is a similar size to yours. I didn't see any merging happening, although most of the directories I looked at had more files in them than the new merge threshold, so I guess this is to be expected. I'm currently splitting my PGs
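
For reference, a sketch of where those two values would normally live, assuming "4 and 40" map to filestore split multiple and filestore merge threshold (they only take effect after an OSD restart or injectargs):

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 4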

Re: [ceph-users] How to disable object-map and exclusive features ?

2015-09-04 Thread Jason Dillaman
> I have a coredump with the size of 1200M compressed . > > Where shall i put the dump ? > I believe you can use the ceph-post-file utility [1] to upload the core and your current package list to ceph.com. Jason [1] http://ceph.com/docs/master/man/8/ceph-post-file/
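
A sketch of the upload (the description and file path are placeholders):

    ceph-post-file -d "librbd crash coredump, hammer 0.94.x" /tmp/core.gz
    # the command should print an upload tag to quote back in the thread or tracker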

Re: [ceph-users] Nova fails to download image from Glance backed with Ceph

2015-09-04 Thread Jan Schermer
Didn't you run out of space? Happened to me when a customer tried to create a 1TB image... Z. > On 04 Sep 2015, at 15:15, Sebastien Han wrote: > > Just to take away a possible issue from infra (LBs etc). > Did you try to download the image on the compute node? Something like

Re: [ceph-users] maximum number of mapped rbds?

2015-09-04 Thread Ilya Dryomov
On Fri, Sep 4, 2015 at 4:44 PM, Ilya Dryomov wrote: > On Fri, Sep 4, 2015 at 4:30 PM, Sebastien Han wrote: >> Which Kernel are you running on? >> These days, the theoretical limit is 65536 AFAIK. >> >> Ilya would know the kernel needed for that. > > 3.14 or

[ceph-users] Impact add PG

2015-09-04 Thread Jimmy Goffaux
English version: Hello everyone, Recently we increased the number of PGs in a pool. We had a big performance problem: the Ceph cluster dropped to 0 IOPS while production workloads were running on top of it. So we did this: ceph tell osd.* injectargs '--osd_max_backfills 1' ceph tell osd.*
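
The usual runtime throttle looks roughly like this (a sketch; exact values depend on how much client I/O headroom is needed):

    ceph tell osd.* injectargs '--osd_max_backfills 1'
    ceph tell osd.* injectargs '--osd_recovery_max_active 1'
    ceph tell osd.* injectargs '--osd_recovery_op_priority 1'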

Re: [ceph-users] maximum number of mapped rbds?

2015-09-04 Thread Sebastien Han
Which Kernel are you running on? These days, the theoretical limit is 65536 AFAIK. Ilya would know the kernel needed for that. > On 03 Sep 2015, at 15:05, Jeff Epstein wrote: > > Hello, > > In response to an rbd map command, we are getting a "Device or resource

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Mark Nelson
There are a lot of factors that play into all of this. The more PGs you have, the more total objects you can store before you hit the thresholds. More PGs also mean slightly better random distribution across OSDs (not really affected by the size of the OSD, assuming all OSDs are uniform). You
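
Rough arithmetic behind that, using the commonly documented split point (treat the constants as approximate):

    split point ≈ filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects per subdirectory
    defaults (2, 10):  2 * 10 * 16 ≈ 320 objects
    raised  (4, 40):   4 * 40 * 16 ≈ 2560 objects

Since objects are spread across pg_num directories, more PGs keep each directory further from the threshold for longer.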

Re: [ceph-users] maximum number of mapped rbds?

2015-09-04 Thread Ilya Dryomov
On Fri, Sep 4, 2015 at 4:30 PM, Sebastien Han wrote: > Which Kernel are you running on? > These days, the theoretical limit is 65536 AFAIK. > > Ilya would know the kernel needed for that. 3.14 or later, and, if you are loading your kernel modules by hand or have your distro load
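
The module option being referred to is presumably single_major (a sketch; only relevant on 3.14+ kernels):

    sudo modprobe rbd single_major=Y
    # or persist it so the distro loads it that way
    echo "options rbd single_major=Y" | sudo tee /etc/modprobe.d/rbd.conf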

Re: [ceph-users] crash on rbd bench-write

2015-09-04 Thread Jason Dillaman
Any particular reason why you have the image mounted via the kernel client while performing a benchmark? Not to say this is the reason for the crash, but it's strange, because 'rbd bench-write' will not test the kernel IO speed since it uses the user-mode library. Are you able to test bench-write
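
For comparison, a user-space benchmark sketch (pool and image names are placeholders):

    rbd bench-write rbd/test-image --io-size 4096 --io-threads 16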

Re: [ceph-users] Impact add PG

2015-09-04 Thread Wang, Warren
Sadly, this is one of those things that people find out after running their first production Ceph cluster. Never run with the defaults. I know it's been recently reduced to 3 and 1 or 1 and 3, I forget, but I would advocate 1 and 1. Even that will cause a tremendous amount of traffic with any
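
To make the conservative values survive daemon restarts, the same settings can also go into ceph.conf (a sketch):

    [osd]
    osd max backfills = 1
    osd recovery max active = 1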

Re: [ceph-users] Nova fails to download image from Glance backed with Ceph

2015-09-04 Thread Sebastien Han
Just to take away a possible issue from infra (LBs etc). Did you try to download the image on the compute node? Something like rbd export? > On 04 Sep 2015, at 11:56, Vasiliy Angapov wrote: > > Hi all, > > Not sure actually where does this bug belong to - OpenStack or Ceph
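
A sketch of that direct test, run on a compute node with Nova's own credentials (pool name, image id and keyring path are assumptions):

    rbd export images/<glance-image-uuid> /tmp/test.raw \
        --id nova --keyring /etc/ceph/ceph.client.nova.keyring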

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Jan Schermer
Mark could you please elaborate on this? "use larger directory splitting thresholds to at least balance that part of the equation out" Thanks Jan > On 04 Sep 2015, at 15:31, Mark Nelson wrote: > > There's a lot of factors that play into all of this. The more PGs you have,

[ceph-users] Best layout for SSD & SAS OSDs

2015-09-04 Thread German Anders
Hi cephers, I've the following scheme: 7x OSD servers with: 4x 800GB SSD Intel DC S3510 (OSD-SSD) 3x 120GB SSD Intel DC S3500 (Journals) 5x 3TB SAS disks (OSD-SAS) The OSD servers are located on two separate Racks with two power circuits each. I would like to know what is the

Re: [ceph-users] Best layout for SSD & SAS OSDs

2015-09-04 Thread Nick Fisk
I wouldn't advise upgrading yet if this cluster is going into production. I think several people got bitten last time round when they upgraded to pre-Hammer development releases. Here is a good example of how to create separate roots for SSDs and HDDs
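
A rough sketch of the usual Hammer-era approach (the bucket, rule, host and OSD names here are made up for illustration):

    ceph osd crush add-bucket ssd root
    ceph osd crush add-bucket node01-ssd host
    ceph osd crush move node01-ssd root=ssd
    ceph osd crush set osd.28 1.0 host=node01-ssd
    ceph osd crush rule create-simple ssd-rule ssd host
    ceph osd pool create ssd-pool 1024 1024
    ceph osd pool set ssd-pool crush_ruleset <id from 'ceph osd crush rule dump'>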

Re: [ceph-users] Receiving "failed to parse date for auth header"

2015-09-04 Thread Ilya Dryomov
On Fri, Sep 4, 2015 at 12:42 PM, Ramon Marco Navarro wrote: > Good day everyone! > > I'm having a problem using aws-java-sdk to connect to Ceph using radosgw. I > am reading a " NOTICE: failed to parse date for auth header" message in the > logs. HTTP_DATE is "Fri, 04 Sep
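
If the numeric offset is what radosgw's parser rejects (an assumption, not confirmed in the excerpt above), a quick sanity check is to compare against the form most S3 clients send, which spells out the zone name:

    date -u '+%a, %d %b %Y %H:%M:%S GMT'
    # e.g. Fri, 04 Sep 2015 09:25:33 GMT

If a proxy or wrapper in front of radosgw is rewriting the Date header into the "+00:00" form, that would be consistent with the parse failure.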

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Ben Hines
Yeah, i'm not seeing stuff being moved at all. Perhaps we should file a ticket to request a way to tell an OSD to rebalance its directory structure. On Fri, Sep 4, 2015 at 5:08 AM, Nick Fisk wrote: > I've just made the same change ( 4 and 40 for now) on my cluster which is a >

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread James (Fei) Liu-SSI
Hi Quentin and Andrija, Thanks so much for reporting the problems with Samsung. Would it be possible to learn the configuration of your systems? What kind of workload are you running? You use the Samsung SSDs as separate journaling disks, right? Thanks so much. James From: ceph-users

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Quentin, try fio or dd with the O_DIRECT and O_DSYNC flags, and you will see less than 1MB/s - that is common for most "home" drives - check the post below to understand. We removed all Samsung 850 Pro 256GB from our new CEPH installation and replaced them with Intel S3500 (18.000 (4Kb) IOPS constant
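
The test being described is, in sketch form, something like the following (sdX is a placeholder; the dd variant destroys whatever is on the target device):

    dd if=/dev/zero of=/dev/sdX bs=4k count=10000 oflag=direct,dsync
    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60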

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Quentin Hartman
Yeah, we've ordered some S3700's since we can't afford to have these sorts of failures and haven't been able to find any of the DC-rated Samsung drives anywhere. fwiw, we didn't have any performance problems with the samsungs, it's exclusively this sudden failure that's making us look elsewhere.

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Hi James, yes CEPH with CloudStack. All 6 SSDs (2 SSDs in each of 3 nodes) vanished in 2-3 weeks total time, and yes, brand new Samsung 850 Pro 128GB - I also checked the wear_level attribute via smartctl prior to all drives dying - no indication wear_level is low or anything...also all other

Re: [ceph-users] ESXi/LIO/RBD repeatable problem, hang when cloning VM

2015-09-04 Thread Alex Gorbachev
On Thu, Sep 3, 2015 at 3:20 AM, Nicholas A. Bellinger wrote: > (RESENDING) > > On Wed, 2015-09-02 at 21:14 -0400, Alex Gorbachev wrote: >> We have experienced a repeatable issue when performing the following: >> >> Ceph backend with no issues, we can repeat any time at will

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread James (Fei) Liu-SSI
Andrija, In your email thread, "18.000 (4Kb) IOPS constant write speed" stands for 18K IOPS with a 4K block size, right? However, you can only achieve 200 IOPS with the Samsung 850 Pro, right? Theoretically, the Samsung 850 Pro can get up to 100,000 IOPS with 4K random reads under certain workloads. It is a

Re: [ceph-users] OSD respawning -- FAILED assert(clone_size.count(clone))

2015-09-04 Thread David Zafman
Chris, I see that you have stack traces that indicate some OSDs are running v0.94.2 (osd.23) and some running v0.94.3 (osd.30). They should be running the same release except briefly while upgrading. I see some snapshot/cache tiering fixes went into 0.94.3. So an OSD running v0.94.2
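
A quick way to confirm which release each daemon is actually running:

    # version reported by every running OSD daemon
    ceph tell osd.* version
    # installed package on each node
    ceph --version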

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Quentin Hartman
Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in there just because I couldn't find 14 pros at the time we were ordering hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves as the boot drive and the journal for 3 OSDs. And similarly, mine just started

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Quentin Hartman
Oh, I forgot to mention, these drives have been in service for about 9 months. If it's useful / interesting at all, here is the smartctl -a output from one of the 840's I installed about the same time as the ones that failed recently, but it has not yet failed: smartctl 6.2 2013-07-26 r3841

[ceph-users] Cannot add/create new monitor on ceph v0.94.3

2015-09-04 Thread Chang, Fangzhe (Fangzhe)
Hi, I’m trying to add a second monitor using ‘ceph-deploy mon new ’. However, the log file shows the following error: 2015-09-04 16:13:54.863479 7f4cbc3f7700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption 2015-09-04 16:13:54.863491 7f4cbc3f7700 0 --
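
That cephx decrypt error usually comes down to clock skew or a keyring mismatch between the monitors; a hedged first pass at checking both (the hostname is a placeholder):

    # clocks on the monitor hosts should agree to well under a second
    ntpq -p
    # re-fetch the cluster keys from a working monitor before retrying
    ceph-deploy gatherkeys <existing-mon-host>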

[ceph-users] XFS and nobarriers on Intel SSD

2015-09-04 Thread Richard Bade
Hi Everyone, We have a Ceph pool that is entirely made up of Intel S3700/S3710 enterprise SSD's. We are seeing some significant I/O delays on the disks causing a “SCSI Task Abort” from the OS. This seems to be triggered by the drive receiving a “Synchronize cache command”. My current thinking
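
If the idea is to stop the drives from seeing cache-flush requests, the change presumably amounts to something like the line below; only a sketch, and generally only considered on drives with full power-loss protection such as the S3700/S3710 (the mount point is a placeholder):

    mount -o remount,nobarrier /var/lib/ceph/osd/ceph-<id>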

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread James (Fei) Liu-SSI
Hi Andrija, Your feedback is greatly appreciated. Regards, James From: Andrija Panic [mailto:andrija.pa...@gmail.com] Sent: Friday, September 04, 2015 12:39 PM To: James (Fei) Liu-SSI Cc: Quentin Hartman; ceph-users Subject: Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Quentin Hartman
I just went through and ran this on all my currently running SSDs: echo "$(smartctl -a /dev/sda | grep Total_LBAs_Written | awk '{ print $NF }') * 512 /1024/1024/1024/1024" | bc which is showing about 32TB written on the oldest nodes, about 20 on the newer ones, and 1 on the first one I've RMA'd

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-04 Thread Jan Schermer
>> We are seeing some significant I/O delays on the disks causing a “SCSI Task >> Abort” from the OS. This seems to be triggered by the drive receiving a >> “Synchronize cache command”. >> >> How exactly do you know this is the cause? This is usually just an effect of something going wrong

Re: [ceph-users] Nova fails to download image from Glance backed with Ceph

2015-09-04 Thread Vasiliy Angapov
Thanks for response! The free space on /var/lib/nova/instances is very large on every compute host. Glance image-download works as expected. 2015-09-04 21:27 GMT+08:00 Jan Schermer : > Didn't you run out of space? Happened to me when a customer tried to create a > 1TB image...

Re: [ceph-users] Best layout for SSD & SAS OSDs

2015-09-04 Thread Christian Balzer
Hello, On Fri, 4 Sep 2015 12:30:12 -0300 German Anders wrote: > Hi cephers, > >I've the following scheme: > > 7x OSD servers with: > Is this a new cluster, total initial deployment? What else are these nodes made of, CPU/RAM/network? While uniform nodes have some appeal

[ceph-users] ceph osd prepare btrfs

2015-09-04 Thread German Anders
Trying to do a prepare on a osd with btrfs, and getting this error: [cibosd04][INFO ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type btrfs -- /dev/sdc [cibosd04][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid

Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Andrija Panic
Hi James, I had 3 CEPH nodes as follows: 12 OSDs (HDD) and 2 SSDs (6 journal partitions on each SSD) - the SSDs just vanished with no warning, no smartctl errors, nothing... so 2 SSDs in each of 3 servers vanished in...2-3 weeks, after 3-4 months of being in production (VMs/KVM/CloudStack). Mine were

Re: [ceph-users] Best layout for SSD & SAS OSDs

2015-09-04 Thread Nick Fisk
Hi German, Are the power feeds completely separate (ie 4 feeds in total), or just each rack has both feeds? If it’s the latter I don’t see any benefit from including this into the crushmap and would just create a “rack” bucket. Also assuming your servers have dual PSU’s, this also changes

Re: [ceph-users] Best layout for SSD & SAS OSDs

2015-09-04 Thread German Anders
Thanks a lot Nick. Regarding the power feeds, we only have two circuits for all the racks, so I'll create the "rack" bucket in the CRUSH map and separate the OSD servers across the rack buckets. Regarding the SSD pools, I've installed the Hammer version and am wondering whether to upgrade to Infernalis v9.0.3 and

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Shinobu Kinjo
> IIRC, it only triggers the move (merge or split) when that folder is hit by a > request, so most likely it happens gradually. Do you know what causes this? I would like to be clearer on what "gradually" means here. Shinobu - Original Message - From: "GuangYang" To: "Ben Hines"

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread GuangYang
> Date: Fri, 4 Sep 2015 20:31:59 -0400 > From: ski...@redhat.com > To: yguan...@outlook.com > CC: bhi...@gmail.com; n...@fisk.me.uk; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Ceph performance, empty vs part full > >> IIRC, it only triggers the

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread Shinobu Kinjo
Very nice. You're my hero! Shinobu - Original Message - From: "GuangYang" To: "Shinobu Kinjo" Cc: "Ben Hines" , "Nick Fisk" , "ceph-users" Sent: Saturday, September 5, 2015 9:40:06

Re: [ceph-users] Re: which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Christian Balzer
Hello, On Fri, 4 Sep 2015 22:37:06 + Межов Игорь Александрович wrote: > Hi! > > > Have worked with Intel DC S3700 200Gb. Due to budget restrictions, one > > ssd hosts a system volume and 1:12 OSD journals. 6 nodes, 120Tb raw > space. > Meaning you're limited to 360MB/s writes per node at

[ceph-users] Re: which SSD / experiences with Samsung 843T vs. Intel s3700

2015-09-04 Thread Межов Игорь Александрович
Hi! Have worked with Intel DC S3700 200Gb. Due to budget restrictions, one SSD hosts a system volume and 1:12 OSD journals. 6 nodes, 120Tb raw space. Cluster serves as RBD storage for ~100VM. Not a single failure per year - all devices are healthy. The remaining resource (by SMART) is ~92%.

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-04 Thread Richard Bade
Hi Jan, Thanks for your response. > How exactly do you know this is the cause? This is usually just an effect > of something going wrong and part of error recovery process. Preceding > this event should be the real error/root cause... We have been working with LSI/Avago to resolve this. We

Re: [ceph-users] Ceph performance, empty vs part full

2015-09-04 Thread GuangYang
IIRC, it only triggers the move (merge or split) when that folder is hit by a request, so most likely it happens gradually. Another thing that might be helpful (and that we have had good experience with) is to do the folder splitting at pool creation time, so that we avoid the performance
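
The pre-splitting at pool creation is, as far as I know, driven by the expected_num_objects argument combined with a negative merge threshold; a sketch with made-up names and numbers:

    # in ceph.conf before creating the pool (negative value disables merging and enables pre-split)
    [osd]
    filestore merge threshold = -10
    # create the pool with an expected object count so directories are split up front
    ceph osd pool create mypool 2048 2048 replicated replicated_ruleset 100000000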

Re: [ceph-users] libvirt rbd issue

2015-09-04 Thread Rafael Lopez
We don't have thousands but these RBDs are in a pool backed by ~600ish. I can see the fd count is up well past 10k, closer to 15k when I use a decent number of RBDs (eg. 16 or 32) and seems to increase more the bigger the file I write. Procs are almost 30k when writing a 50GB file across that
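
One commonly raised knob in this situation (an assumption here, not something stated in the message) is the per-process file descriptor limit for libvirt-managed qemu, since librbd keeps a socket open to every OSD it talks to:

    # /etc/libvirt/qemu.conf (requires a reasonably recent libvirt)
    max_files = 32768
    # then restart libvirtd and the affected guests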

Re: [ceph-users] high density machines

2015-09-04 Thread Gurvinder Singh
On 09/04/2015 02:31 AM, Wang, Warren wrote: > In the minority on this one. We have a number of the big SM 72 drive units w/ > 40 Gbe. Definitely not as fast as even the 36 drive units, but it isn't awful > for our average mixed workload. We can exceed all available performance with > some