Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
I just pushed a patch to wip-dumpling-log-assert (based on current
dumpling head).  I had disabled most of the code in PGLog::check() but
left an (I thought) innocuous assert.  It seems that with (at least)
g++ 4.6.3, stl list::size() is linear in the size of the list, so that
assert actually traverses the pg log on each operation.  The patch in
wip-dumpling-log-assert should disable that assert as well by default.
Let me know if it helps.

It should be built within an hour of this email.
-Sam
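
Once the build lands, a quick way to confirm an OSD is actually running the
rebuilt package is to compare the commit sha in its version string against the
tip of wip-dumpling-log-assert; a minimal sketch:

ceph-osd --version   # prints e.g. "ceph version 0.67.2-xx-g<sha> (...)"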

On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
manderson8...@gmail.com wrote:
 Hi Guys,

 I'm having the same problem as Oliver with 0.67.2. CPU usage is around
 double that of the 0.61.8 OSD's in the same cluster which appears to
 be causing the performance decrease.

 I did a perf comparison (not sure if I did it right but it seems ok).
 Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
 journal and osd data is on an SSD, OSD's are in the same pool with the
 same weight and the perf tests were run at the same time on a
 realworld load consisting of RBD traffic only.

 Dumpling -

 Events: 332K cycles
  17.93%  ceph-osd  libc-2.15.so   [.] 0x15d523
  17.03%  ceph-osd  ceph-osd   [.] 0x5c2897
   4.66%  ceph-osd  ceph-osd   [.]
 leveldb::InternalKeyComparator::Compare(leveldb::Slice const, level
   3.46%  ceph-osd  ceph-osd   [.] leveldb::Block::Iter::Next()
   2.70%  ceph-osd  libstdc++.so.6.0.16[.]
 std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
   2.60%  ceph-osd  ceph-osd   [.] PGLog::check()
   2.57%  ceph-osd  [kernel.kallsyms]  [k] __ticket_spin_lock
   2.49%  ceph-osd  ceph-osd   [.] ceph_crc32c_le_intel
   1.93%  ceph-osd  libsnappy.so.1.1.2 [.]
 snappy::RawUncompress(snappy::Source*, char*)
   1.53%  ceph-osd  libstdc++.so.6.0.16[.] std::string::append(char
 const*, unsigned long)
   1.47%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
   1.33%  ceph-osd  [kernel.kallsyms]  [k] copy_user_generic_string
   0.98%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
   0.90%  ceph-osd  libstdc++.so.6.0.16[.] std::string::assign(char
 const*, unsigned long)
   0.75%  ceph-osd  libstdc++.so.6.0.16[.]
 std::string::_M_replace_safe(unsigned long, unsigned long, char cons
   0.58%  ceph-osd  [kernel.kallsyms]  [k] wait_sb_inodes
   0.55%  ceph-osd  ceph-osd   [.]
 leveldb::Block::Iter::Valid() const
   0.51%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
   0.50%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 tcmalloc::CentralFreeList::FetchFromSpans()
   0.47%  ceph-osd  libstdc++.so.6.0.16[.] 0x9ebc8
   0.46%  ceph-osd  libc-2.15.so   [.] vfprintf
   0.45%  ceph-osd  [kernel.kallsyms]  [k] find_busiest_group
   0.45%  ceph-osd  libstdc++.so.6.0.16[.]
 std::string::resize(unsigned long, char)
   0.43%  ceph-osd  libpthread-2.15.so [.] pthread_mutex_unlock
   0.41%  ceph-osd  [kernel.kallsyms]  [k] iput_final
   0.40%  ceph-osd  ceph-osd   [.]
 leveldb::Block::Iter::Seek(leveldb::Slice const)
   0.39%  ceph-osd  libc-2.15.so   [.] _IO_vfscanf
   0.39%  ceph-osd  ceph-osd   [.] leveldb::Block::Iter::key() 
 const
   0.39%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 tcmalloc::CentralFreeList::ReleaseToSpans(void*)
   0.37%  ceph-osd  libstdc++.so.6.0.16[.] std::basic_ostreamchar,
 std::char_traitschar  std::__ostream_in


 Cuttlefish -

 Events: 160K cycles
   7.53%  ceph-osd  [kernel.kallsyms]  [k] __ticket_spin_lock
   6.26%  ceph-osd  libc-2.15.so   [.] 0x89115
   3.06%  ceph-osd  ceph-osd   [.] ceph_crc32c_le
   2.66%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
   2.46%  ceph-osd  [kernel.kallsyms]  [k] find_busiest_group
   1.80%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
   1.42%  ceph-osd  [kernel.kallsyms]  [k] try_to_wake_up
   1.27%  ceph-osd  ceph-osd   [.] 0x531fb6
   1.21%  ceph-osd  libstdc++.so.6.0.16[.] 0x9ebc8
   1.14%  ceph-osd  [kernel.kallsyms]  [k] wait_sb_inodes
   1.02%  ceph-osd  libc-2.15.so   [.] _IO_vfscanf
   1.01%  ceph-osd  [kernel.kallsyms]  [k] update_shares
   0.98%  ceph-osd  [kernel.kallsyms]  [k] filemap_fdatawait_range
   0.90%  ceph-osd  libstdc++.so.6.0.16[.] std::basic_ostreamchar,
 std::char_traitschar  std
   0.89%  ceph-osd  [kernel.kallsyms]  [k] iput_final
   0.79%  ceph-osd  libstdc++.so.6.0.16[.] std::basic_stringchar,
 std::char_traitschar, std::a
   0.79%  ceph-osd  [kernel.kallsyms]  [k] copy_user_generic_string
   0.78%  ceph-osd  libc-2.15.so   [.] vfprintf
   0.70%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc:
   0.69%  ceph-osd  [kernel.kallsyms]  [k] __d_lookup_rcu
   0.69%  ceph-osd  

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Matthew Anderson
Hi Sam,

It looks like that has dropped the CPU usage a fair bit. CPU usage
still seems a bit higher than Cuttlefish but that might just be due to
the levelDB changes.

Here's the updated perf report -

Events: 80K cycles
 17.25%  ceph-osd  libc-2.15.so           [.] 0x15d534
 14.63%  ceph-osd  ceph-osd               [.] 0x5c801b
  3.87%  ceph-osd  ceph-osd               [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, leveldb::Slice const&) const
  2.91%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::Next()
  2.58%  ceph-osd  [kernel.kallsyms]      [k] __ticket_spin_lock
  2.45%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
  2.02%  ceph-osd  ceph-osd               [.] ceph_crc32c_le_intel
  1.80%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
  1.38%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::append(char const*, unsigned long)
  1.15%  ceph-osd  libsnappy.so.1.1.2     [.] snappy::RawUncompress(snappy::Source*, char*)
  1.04%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
  1.03%  ceph-osd  [kernel.kallsyms]      [k] copy_user_generic_string
  0.77%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long)
  0.72%  ceph-osd  libstdc++.so.6.0.16    [.] 0x9ebc8
  0.68%  ceph-osd  libstdc++.so.6.0.16    [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  0.67%  ceph-osd  [kernel.kallsyms]      [k] find_busiest_group
  0.61%  ceph-osd  [kernel.kallsyms]      [k] tg_load_down
  0.57%  ceph-osd  libc-2.15.so           [.] vfprintf
  0.54%  ceph-osd  libc-2.15.so           [.] _IO_vfscanf
  0.53%  ceph-osd  libtcmalloc.so.0.1.0   [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
  0.51%  ceph-osd  [kernel.kallsyms]      [k] wait_sb_inodes
  0.47%  ceph-osd  libpthread-2.15.so     [.] pthread_mutex_unlock
  0.47%  ceph-osd  libstdc++.so.6.0.16    [.] std::string::assign(char const*, unsigned long)
  0.47%  ceph-osd  ceph-osd               [.] leveldb::Block::Iter::Valid() const
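
For anyone who wants to reproduce a report like the one above, a minimal
sketch (assuming perf is installed and only a single ceph-osd runs on the
host, so pidof returns one pid):

sudo perf record -g -p "$(pidof ceph-osd)" -- sleep 60   # sample the OSD for 60s
sudo perf report                                         # then browse the result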

On Tue, Aug 27, 2013 at 2:33 PM, Samuel Just sam.j...@inktank.com wrote:
 I just pushed a patch to wip-dumpling-log-assert (based on current
 dumpling head).  I had disabled most of the code in PGLog::check() but
 left an (I thought) innocuous assert.  It seems that with (at least)
 g++ 4.6.3, stl list::size() is linear in the size of the list, so that
 assert actually traverses the pg log on each operation.  The patch in
 wip-dumpling-log-assert should disable that assert as well by default.
  Let me know if it helps.

 It should be built within an hour of this email.
 -Sam

 On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
 manderson8...@gmail.com wrote:
 Hi Guys,

 I'm having the same problem as Oliver with 0.67.2. CPU usage is around
 double that of the 0.61.8 OSD's in the same cluster which appears to
 be causing the performance decrease.

 I did a perf comparison (not sure if I did it right but it seems ok).
 Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
 journal and osd data is on an SSD, OSD's are in the same pool with the
 same weight and the perf tests were run at the same time on a
 realworld load consisting of RBD traffic only.

 Dumpling -

 Events: 332K cycles
  17.93%  ceph-osd  libc-2.15.so   [.] 0x15d523
  17.03%  ceph-osd  ceph-osd   [.] 0x5c2897
   4.66%  ceph-osd  ceph-osd   [.]
 leveldb::InternalKeyComparator::Compare(leveldb::Slice const, level
   3.46%  ceph-osd  ceph-osd   [.] leveldb::Block::Iter::Next()
   2.70%  ceph-osd  libstdc++.so.6.0.16[.]
 std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
   2.60%  ceph-osd  ceph-osd   [.] PGLog::check()
   2.57%  ceph-osd  [kernel.kallsyms]  [k] __ticket_spin_lock
   2.49%  ceph-osd  ceph-osd   [.] ceph_crc32c_le_intel
   1.93%  ceph-osd  libsnappy.so.1.1.2 [.]
 snappy::RawUncompress(snappy::Source*, char*)
   1.53%  ceph-osd  libstdc++.so.6.0.16[.] std::string::append(char
 const*, unsigned long)
   1.47%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
   1.33%  ceph-osd  [kernel.kallsyms]  [k] copy_user_generic_string
   0.98%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator delete(void*)
   0.90%  ceph-osd  libstdc++.so.6.0.16[.] std::string::assign(char
 const*, unsigned long)
   0.75%  ceph-osd  libstdc++.so.6.0.16[.]
 std::string::_M_replace_safe(unsigned long, unsigned long, char cons
   0.58%  ceph-osd  [kernel.kallsyms]  [k] wait_sb_inodes
   0.55%  ceph-osd  ceph-osd   [.]
 leveldb::Block::Iter::Valid() const
   0.51%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::
   0.50%  ceph-osd  libtcmalloc.so.0.1.0   [.]
 

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Oliver Daudey
Hey Samuel,

The PGLog::check() is now no longer visible in profiling, so it helped
for that.  Unfortunately, it doesn't seem to have helped to bring down
the OSD's CPU-loading much.  Leveldb still uses much more CPU than in
Cuttlefish.  On my test-cluster, I didn't notice any difference in the
RBD bench-results, either, so I have to assume that it didn't help
performance much.

Here's the `perf top' I took just now on my production-cluster with your
new version, under regular load.  Also note the memcmp and memcpy,
which also don't show up when running a Cuttlefish-OSD:
 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and
  0.87%  [kernel]             [k] __schedule
  0.85%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] apic_timer_interrupt
  0.80%  [kernel]             [k] fget_light
  0.79%  [kernel]             [k] native_write_msr_safe
  0.76%  [kernel]             [k] _raw_spin_lock_irqsave
  0.66%  libc-2.11.3.so       [.] 0xdc6d8
  0.61%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.61%  [kernel]             [k] tg_load_down
  0.59%  [kernel]             [k] reschedule_interrupt
  0.59%  libsnappy.so.1.1.2   [.] snappy::RawUncompress(snappy::Source*,
  0.56%  libstdc++.so.6.0.13  [.] std::string::append(char const*, unsig
  0.54%  [kvm_intel]          [k] vmx_vcpu_run
  0.53%  [kernel]             [k] copy_user_generic_string
  0.53%  [kernel]             [k] load_balance
  0.50%  [kernel]             [k] rcu_needs_cpu
  0.45%  [kernel]             [k] fput


   Regards,

 Oliver
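
For reference, one way to produce the kind of RBD/RADOS bench numbers
mentioned above is the rados bench tool; a sketch, assuming a scratch pool
named "bench" that may be filled with test objects:

rados bench -p bench 60 write --no-cleanup   # 60s write benchmark, keep the objects
rados bench -p bench 60 seq                  # sequential-read pass over them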

On Mon, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
 I just pushed a patch to wip-dumpling-log-assert (based on current
 dumpling head).  I had disabled most of the code in PGLog::check() but
 left an (I thought) innocuous assert.  It seems that with (at least)
 g++ 4.6.3, stl list::size() is linear in the size of the list, so that
 assert actually traverses the pg log on each operation.  The patch in
 wip-dumpling-log-assert should disable that assert as well by default.
  Let me know if it helps.
 
 It should be built within an hour of this email.
 -Sam
 
 On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
 manderson8...@gmail.com wrote:
  Hi Guys,
 
  I'm having the same problem as Oliver with 0.67.2. CPU usage is around
  double that of the 0.61.8 OSD's in the same cluster which appears to
  be causing the performance decrease.
 
  I did a perf comparison (not sure if I did it right but it seems ok).
  Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
  journal and osd data is on an SSD, OSD's are in the same pool with the
  same weight and the perf tests were run at the same time on a
  realworld load consisting of RBD traffic only.
 
  Dumpling -
 
  Events: 332K cycles
   17.93%  ceph-osd  libc-2.15.so   [.] 0x15d523
   17.03%  ceph-osd  ceph-osd   [.] 0x5c2897
4.66%  ceph-osd  ceph-osd   [.]
  leveldb::InternalKeyComparator::Compare(leveldb::Slice const, level
3.46%  ceph-osd  ceph-osd   [.] leveldb::Block::Iter::Next()
2.70%  ceph-osd  libstdc++.so.6.0.16[.]
  

Re: [ceph-users] lvm for a quick ceph lab cluster test

2013-08-27 Thread Robert Sander
On 26.08.2013 23:07, Samuel Just wrote:
 Seems reasonable to me.  I'm not sure I've heard anything about using
 LVM under ceph.  Let us know how it goes!

We are currently using it on a test cluster distributed on our desktops.
Loïc Dachary visited us and wrote a small article:
http://dachary.org/?p=2269

One thing with LVM volumes is that you have to manually create the
filesystem (mkfs.xfs) and mount it somewhere and then point ceph-deploy
to that directory. It then creates a symlink under /var/lib/ceph/osd.
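
A minimal sketch of that flow (the VG/LV names, mount point and hostname are
hypothetical, and ceph-deploy syntax may differ between versions):

lvcreate -L 100G -n osd0 vg0                     # carve an LV from an existing VG
mkfs.xfs /dev/vg0/osd0                           # create the filesystem by hand
mkdir -p /srv/ceph-osd0
mount /dev/vg0/osd0 /srv/ceph-osd0               # mount it somewhere
ceph-deploy osd prepare myhost:/srv/ceph-osd0    # point ceph-deploy at the directory
ceph-deploy osd activate myhost:/srv/ceph-osd0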

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph + Xen - RBD io hang

2013-08-27 Thread James Dingwall

Hi,

I am doing some experimentation with Ceph and Xen (on the same host) and 
I'm experiencing some problems with the rbd device that I'm using as the 
block device.  My environment is:


2 node Ceph 0.67.2 cluster, 4x OSD (btrfs) and 1x mon
Xen 4.3.0
Kernel 3.10.9

The domU I'm trying to build is from the Ubuntu 13.04 desktop release.  
When I pass through the rbd (format 1 or 2) device as 
phy:/dev/rbd/rbd/ubuntu-test then the domU has no problems reading data 
from it, the test I ran was:


for i in $(seq 0 1023) ; do
dd if=/dev/xvda of=/dev/null bs=4k count=1024 skip=$(($i * 4))
done

However, writing data causes the domU to hang while i is still in
single figures, but it doesn't seem consistent about the exact value.

for i in $(seq 0 1023) ; do
dd if=/dev/zero of=/dev/xvda bs=4k count=1024 seek=$(($i * 4))
done

eventually the kernel in the domU will print a hung task warning.  I 
have tried the domU as pv and hvm (with xen_platform_pci = 1 and 0) but 
have the same behaviour in both cases.  Once this state is triggered on 
the rbd device then any interaction with it in dom0 will result in the 
same hang.  I'm assuming that there is some unfavourable interaction 
between ceph/rbd and blkback but I haven't found anything in the dom0 
logs so I would like to know if anyone has some suggestions about where 
to start trying to hunt this down.


Thanks,
James
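
For context, a phy: path like the one above is what the kernel rbd client
exposes after mapping the image; a sketch of how that device typically comes
to exist (pool/image name taken from the mail):

modprobe rbd
rbd map rbd/ubuntu-test   # udev also adds the /dev/rbd/rbd/ubuntu-test symlink
rbd showmapped            # list mapped images and their /dev/rbdN devices
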
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph + Xen - RBD io hang

2013-08-27 Thread Olivier Bonvalet
Hi,

I use Ceph 0.61.8 and Xen 4.2.2 (Debian) in production, and can't use
kernel 3.10.* on dom0, which hangs very soon. The hang is visible in the
kernel logs of the dom0, not the domU.

Anyway, you should probably re-try with kernel 3.9.11 for the dom0 (I
also use 3.10.9 in domU).

Olivier

On Tuesday, 27 August 2013 at 11:46 +0100, James Dingwall wrote:
 Hi,
 
 I am doing some experimentation with Ceph and Xen (on the same host) and 
 I'm experiencing some problems with the rbd device that I'm using as the 
 block device.  My environment is:
 
 2 node Ceph 0.67.2 cluster, 4x OSD (btrfs) and 1x mon
 Xen 4.3.0
 Kernel 3.10.9
 
 The domU I'm trying to build is from the Ubuntu 13.04 desktop release.  
 When I pass through the rbd (format 1 or 2) device as 
 phy:/dev/rbd/rbd/ubuntu-test then the domU has no problems reading data 
 from it, the test I ran was:
 
 for i in $(seq 0 1023) ; do
  dd if=/dev/xvda of=/dev/null bs=4k count=1024 skip=$(($i * 4))
 done
 
 However, writing data causes the domU to hang while i is still in
 single figures, but it doesn't seem consistent about the exact value.
 for i in $(seq 0 1023) ; do
  dd if=/dev/zero of=/dev/xvda bs=4k count=1024 seek=$(($i * 4))
 done
 
 eventually the kernel in the domU will print a hung task warning.  I 
 have tried the domU as pv and hvm (with xen_platform_pci = 1 and 0) but 
 have the same behaviour in both cases.  Once this state is triggered on 
 the rbd device then any interaction with it in dom0 will result in the 
 same hang.  I'm assuming that there is some unfavourable interaction 
 between ceph/rbd and blkback but I haven't found anything in the dom0 
 logs so I would like to know if anyone has some suggestions about where 
 to start trying to hunt this down.
 
 Thanks,
 James
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with keyrings during deployment

2013-08-27 Thread Francesc Alted
Hi again,

I continue to try debugging the problem reported before.  Now, I have been
trying to use a couple of VM for doing this (one with Ubuntu 12.04 64-bit,
and the other with Ubuntu 12.10 64-bit, and I use the ceph.com repos for
installing the Ceph libraries).  And, unfortunately, I am getting into the
same problem: the keyrings do not appear where they should (i.e. in
bootstrap-mds and bootstrap-osd under /var/lib/ceph).

I have followed the preflight check list (
http://ceph.com/docs/next/start/quick-start-preflight/), and the ceph user
in the admin box can login perfectly well on the server box, so not sure
what's going on here.

I have even tried to use a single ceph server for installing everything
(adding the 'osd crush chooseleaf type = 0' line into the ceph conf file)
but then again the keyrings do not appear.

Is nobody having the same problems as me (using the latest Ceph Dumpling
0.67.2 release here)?

Thanks for any insight!

Francesc
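
Since the bootstrap keyrings are only written once the monitor has formed
quorum, two checks that usually narrow this down (a sketch; the admin-socket
name assumes a mon id of "ceph-server"):

ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-server.asok mon_status
ps aux | grep ceph-create-keys   # the helper that writes the bootstrap keyrings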

On Mon, Aug 26, 2013 at 1:55 PM, Francesc Alted franc...@continuum.io wrote:

 Hi,

 I am a newcomer to Ceph.  After having a look at the docs (BTW, it is nice
 to see its concepts being implemented), I am trying to do some tests,
 mainly to check the Python APIs to access RADOS and RBD components.  I am
 following this quick guide:

 http://ceph.com/docs/next/start/quick-ceph-deploy/

 But after adding a monitor (ceph-deploy mon create ceph-server), I see
 that the subdirectories bootstrap-mds and bootstrap-osd (in /var/lib/ceph)
 do not contain keyrings.  I have tried to create the monitor again (as
 suggested in the docs), but the keyrings continue to not appear there:

 $ ceph-deploy gatherkeys ceph-server
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /etc/ceph/ceph.client.admin.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /etc/ceph/ceph.client.admin.keyring on ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /var/lib/ceph/bootstrap-osd/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /var/lib/ceph/bootstrap-osd/ceph.keyring on ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /var/lib/ceph/bootstrap-mds/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /var/lib/ceph/bootstrap-mds/ceph.keyring on ['ceph-server']

 My admin node (the machine from where I issue the ceph commands) is an
 openSUSE 12.3 where I compiled the ceph-0.67.1 tarball.  The server node is
 a Debian Precise 64-bit (using vagrant w/ VirtaulBox), and Ceph
 installation seems to have gone well, as per the logs:

 [ceph-server][INFO  ] Running command: ceph --version
 [ceph-server][INFO  ] ceph version 0.67.2
 (eb4380dd036a0b644c6283869911d615ed729ac8)

 Any hints on what is going on there?  Thanks!

 --
 Francesc Alted




-- 
Francesc Alted
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with keyrings during deployment

2013-08-27 Thread Oliver Daudey
Hey Francesc,

I encountered these while playing with ceph-deploy a couple of days
earlier.  Haven't done any troubleshooting on it yet.  I encountered the
error with the gatherkeys-option, just like you did.


   Regards,

  Oliver

On Tue, 2013-08-27 at 13:18 +0200, Francesc Alted wrote:
 Hi again,
 
 
 I continue to try debugging the problem reported before.  Now, I have
 been trying to use a couple of VM for doing this (one with Ubuntu
 12.04 64-bit, and the other with Ubuntu 12.10 64-bit, and I use the
 ceph.com repos for installing the Ceph libraries).  And,
 unfortunately, I am getting into the same problem: the keyring do not
 appear where they should (i.e. bootstrap-mds and bootstrap-osd
 in /var/lib/ceph).
 
 
 I have followed the preflight check list
 (http://ceph.com/docs/next/start/quick-start-preflight/), and the ceph
 user in the admin box can login perfectly well on the server box, so
 not sure what's going on here.
 
 
 I have even tried to use a single ceph server for installing
 everything (adding the 'osd crush chooseleaf type = 0' line into the
 ceph conf file) but then again the keyrings do not appear.
 
 
 Nobody is having the same problems than me (using latest Ceph Dumpling
 0.67.2 release here)? 
 
 
 Thanks for any insight!
 
 
 Francesc
 
 On Mon, Aug 26, 2013 at 1:55 PM, Francesc Alted
 franc...@continuum.io wrote:
 Hi,
 
 
 I am a newcomer to Ceph.  After having a look at the docs
 (BTW, it is nice to see its concepts being implemented), I am
 trying to do some tests, mainly to check the Python APIs to
 access RADOS and RBD components.  I am following this quick
 guide:
 
 
 http://ceph.com/docs/next/start/quick-ceph-deploy/
 
 
 But after adding a monitor (ceph-deploy mon create
 ceph-server), I see that the subdirectories bootstrap-mds and
 bootstrap-osd (in /var/lib/ceph) do not contain keyrings.  I
 have tried to create the monitor again (as suggested in the
 docs), but the keyrings continue to not appear there:
 
 
 $ ceph-deploy gatherkeys ceph-server
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server
 for /etc/ceph/ceph.client.admin.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to
 find /etc/ceph/ceph.client.admin.keyring on ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server
 for /var/lib/ceph/bootstrap-osd/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to
 find /var/lib/ceph/bootstrap-osd/ceph.keyring on
 ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server
 for /var/lib/ceph/bootstrap-mds/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to
 find /var/lib/ceph/bootstrap-mds/ceph.keyring on
 ['ceph-server']
 
 
 My admin node (the machine from where I issue the ceph
 commands) is an openSUSE 12.3 where I compiled the ceph-0.67.1
 tarball.  The server node is a Debian Precise 64-bit (using
 vagrant w/ VirtaulBox), and Ceph installation seems to have
 gone well, as per the logs:
 
 
 [ceph-server][INFO  ] Running command: ceph --version
 [ceph-server][INFO  ] ceph version 0.67.2
 (eb4380dd036a0b644c6283869911d615ed729ac8)
 
 
 Any hints on what is going on there?  Thanks!
 
 
 -- 
 Francesc Alted
 
 
 
 
 -- 
 Francesc Alted 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Mark Nelson

Hi Olver/Matthew,

Ignoring CPU usage, has speed remained slower as well?

Mark

On 08/27/2013 03:08 AM, Oliver Daudey wrote:

Hey Samuel,

The PGLog::check() is now no longer visible in profiling, so it helped
for that.  Unfortunately, it doesn't seem to have helped to bring down
the OSD's CPU-loading much.  Leveldb still uses much more than in
Cuttlefish.  On my test-cluster, I didn't notice any difference in the
RBD bench-results, either, so I have to assume that it didn't help
performance much.

Here's the `perf top' I took just now on my production-cluster with your
new version, under regular load.  Also note the memcmp and memcpy,
which also don't show up when running a Cuttlefish-OSD:
  15.65%  [kernel] [k]
intel_idle
   7.20%  libleveldb.so.1.9[.]
0x3ceae
   6.28%  libc-2.11.3.so   [.]
memcmp
   5.22%  [kernel] [k]
find_busiest_group
   3.92%  kvm  [.]
0x2cf006
   2.40%  libleveldb.so.1.9[.]
leveldb::InternalKeyComparator::Compar
   1.95%  [kernel] [k]
_raw_spin_lock
   1.69%  [kernel] [k]
default_send_IPI_mask_sequence_phys
   1.46%  libc-2.11.3.so   [.]
memcpy
   1.17%  libleveldb.so.1.9[.]
leveldb::Block::Iter::Next()
   1.16%  [kernel] [k]
hrtimer_interrupt
   1.07%  [kernel] [k]
native_write_cr0
   1.01%  [kernel] [k]
__hrtimer_start_range_ns
   1.00%  [kernel] [k]
clockevents_program_event
   0.93%  [kernel] [k]
find_next_bit
   0.93%  libstdc++.so.6.0.13  [.]
std::string::_M_mutate(unsigned long,
   0.89%  [kernel] [k]
cpumask_next_and
   0.87%  [kernel] [k]
__schedule
   0.85%  [kernel] [k]
_raw_spin_unlock_irqrestore
   0.85%  [kernel] [k]
do_select
   0.84%  [kernel] [k]
apic_timer_interrupt
   0.80%  [kernel] [k]
fget_light
   0.79%  [kernel] [k]
native_write_msr_safe
   0.76%  [kernel] [k]
_raw_spin_lock_irqsave
   0.66%  libc-2.11.3.so   [.]
0xdc6d8
   0.61%  libpthread-2.11.3.so [.]
pthread_mutex_lock
   0.61%  [kernel] [k]
tg_load_down
   0.59%  [kernel] [k]
reschedule_interrupt
   0.59%  libsnappy.so.1.1.2   [.]
snappy::RawUncompress(snappy::Source*,
   0.56%  libstdc++.so.6.0.13  [.] std::string::append(char
const*, unsig
   0.54%  [kvm_intel]  [k]
vmx_vcpu_run
   0.53%  [kernel] [k]
copy_user_generic_string
   0.53%  [kernel] [k]
load_balance
   0.50%  [kernel] [k]
rcu_needs_cpu
   0.45%  [kernel] [k] fput


Regards,

  Oliver

On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:

I just pushed a patch to wip-dumpling-log-assert (based on current
dumpling head).  I had disabled most of the code in PGLog::check() but
left an (I thought) innocuous assert.  It seems that with (at least)
g++ 4.6.3, stl list::size() is linear in the size of the list, so that
assert actually traverses the pg log on each operation.  The patch in
wip-dumpling-log-assert should disable that assert as well by default.
  Let me know if it helps.

It should be built within an hour of this email.
-Sam

On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
manderson8...@gmail.com wrote:

Hi Guys,

I'm having the same problem as Oliver with 0.67.2. CPU usage is around
double that of the 0.61.8 OSD's in the same cluster which appears to
be causing the performance decrease.

I did a perf comparison (not sure if I did it right but it seems ok).
Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
journal and osd data is on an SSD, OSD's are in the same pool with the
same weight and the perf tests were run at the same time on a
realworld load consisting of RBD traffic only.

Dumpling -

Events: 332K cycles
  17.93%  ceph-osd  libc-2.15.so   [.] 0x15d523
  17.03%  ceph-osd  ceph-osd   [.] 0x5c2897
   4.66%  ceph-osd  ceph-osd   [.]
leveldb::InternalKeyComparator::Compare(leveldb::Slice const, level
   3.46%  ceph-osd  ceph-osd   [.] leveldb::Block::Iter::Next()
   2.70%  ceph-osd  libstdc++.so.6.0.16[.]
std::string::_M_mutate(unsigned long, unsigned long, unsigned long)
   2.60%  ceph-osd  ceph-osd   [.] PGLog::check()
   2.57%  ceph-osd  [kernel.kallsyms]  [k] __ticket_spin_lock
   2.49%  ceph-osd  ceph-osd   [.] ceph_crc32c_le_intel
   1.93%  ceph-osd  libsnappy.so.1.1.2 [.]
snappy::RawUncompress(snappy::Source*, char*)
   1.53%  ceph-osd  libstdc++.so.6.0.16[.] std::string::append(char
const*, unsigned long)
   1.47%  ceph-osd  libtcmalloc.so.0.1.0   [.] operator new(unsigned long)
   1.33%  ceph-osd  [kernel.kallsyms]  

[ceph-users] Ceph-OSD on compute nodes?

2013-08-27 Thread Mark Chaney
How does the community feel about running OSDs on the same node as OpenStack 
compute? What if it's only 3 SATA disks? Isn't ceph-osd a bit too CPU- and 
RAM-hungry for such a thing, leaving little left over for VM instances? 
Just curious, as I just saw someone in a forum say they were going to do 
that, and I always thought it was not recommended by the Ceph developers.

- Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 1 particular ceph-mon never jobs on 0.67.2

2013-08-27 Thread James Page
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 26/08/13 19:31, Sage Weil wrote:
 I'm wondering what kind of delay, or additional start-on
 logic I can add to the upstart script to work around this.
 Hmm, this is beyond my upstart-fu, unfortunately.  This has come up
  before, actually.  Previously we would wait for any interface to
 come up and then start, but that broke with multi-nic machines, and
 I ended up just making things start in runlevel [2345].
 
 James, do you know what should be done to make the job wait for
 *all* network interfaces to be up?  Is that even the right solution
 here?

This is actually really tricky; runlevel [2345] should cover most use
cases, as this will ensure that interfaces configured in
/etc/network/interfaces are brought up first. But it sounds like that
might not be the case; Travis - in your example are all network
interfaces configured using /etc/network/interfaces?
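
One candidate event for "all configured interfaces are up" is
static-network-up, which the hook that Ubuntu's ifupdown ships in
/etc/network/if-up.d/upstart emits once every "auto" interface in
/etc/network/interfaces has come up; a hedged sketch for the override:

start on (local-filesystems and static-network-up)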


- -- 
James Page
Ubuntu and Debian Developer
james.p...@ubuntu.com
jamesp...@debian.org
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQIcBAEBCAAGBQJSHG6KAAoJEL/srsug59jDHJ4P/RR0HOUC8awoKlH1f92GRqqa
bOo/vszIY/4c2NhCTXRX3jWxBCexlNLlwYExsbX9hzP3DDOMzOMdXh2rMM9o3zaD
98z+o1jC+hUYf27UmK+ZbZGqr4Xh9bi07g6jF2/u3OmbCcxQUaRQdzDp4dbf1MK4
Q2iigJhLBSiZw+OX0+2210+7Cmz9lKKNeuuUcsqT0jdagPYJIQlIbA9v7aNzsxlI
AmEShkCBoI9lzedyFsBIZ10gtMDrvBPJHyDf3VySW/ZhLlZeAnPhRZaZ/AkcrToX
1x6quQvheWyr52bbe0gnAAoIZUpLyCG0+Xkp9+jw11HWLTdGMsn3nMI7BUZ6MHrB
8rIdBGc9gxuKsZyqP/QRBVWDWjACHckjAl0ORJdeXkfm6ZmruRTEB2CNgXZF+Wl5
h0InmcdjMTIwgxV5wgJ4d6Lom45AKaTIumpBiGvmMjuVm08V0xftkPpNbpIsbbol
fmmpqTlxJtVrsd1CZd1nN74z+EOgrCDRJg4bzSPVRjkYIJc6by3udLSRFlQfz5qd
8pm7PsbSEBEY873HPZMuxAMfXQKf/EMNZTq6bbrA61sgIXUEr/YFmG9EiA8ptnAp
1cy4zRgrnL1KI9rSrjKi19wFeYEn0HRLlPqlA8likUTaGbNpXppohpt7RyE1eA6t
vdMkd1v47yZuNoEsEA8e
=iLfL
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy pushy dependency problem

2013-08-27 Thread Kevin Weiler
Hey ceph devs,

I think there is a problem with your dependencies in the current version of 
ceph-deploy for FC19 (and probably other Red Hat variants). The package 
ceph-deploy explicitly requires python-pushy >= 0.5.3, but the pushy package is 
simply named pushy (and is the correct version). The spec file looks fine in 
the ceph-deploy git repo; maybe you just need to rerun the package/repo 
generation? Thanks!

--
Kevin Weiler
IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
kevin.wei...@imc-chicago.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy pushy dependency problem

2013-08-27 Thread Nico Massenberg
I'm having the same issue on Debian 7.1. When reinstalling ceph-deploy after 
purge I get the following:

root@vl0181:~# aptitude install ceph-deploy
The following NEW packages will be installed:
  ceph-deploy{b} 
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 36.5 kB of archives. After unpacking 360 kB will be used.
The following packages have unmet dependencies:
 ceph-deploy : Depends: python-pushy (>= 0.5.3) but 0.5.1-1 is installed.
The following actions will resolve these dependencies:

 Keep the following packages at their current version:
1) ceph-deploy [Not Installed] 

Aptitude updated and upgraded.
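
A quick way to see which repository the stale 0.5.1 package comes from, and
whether any configured repo actually carries 0.5.3 (a sketch):

apt-get update
apt-cache policy python-pushy   # lists candidate versions per repository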
Regards

On 26.08.2013 at 23:57, Kevin Weiler kevin.wei...@imc-chicago.com wrote:

 Hey ceph devs,
 
 I think there is a problem with your dependencies in the current version of 
 ceph-deploy for FC19 (and probably other redhat variants). The package 
 ceph-deploy explicitly requires python-pushy >= 0.5.3, but the pushy package 
 is simply named pushy (and is the correct version). The spec file looks 
 fine in the ceph-deploy git repo, maybe you just need to rerun the 
 package/repo generation? Thanks!
 
 -- 
 Kevin Weiler
 IT
 
  
 IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
 http://imc-chicago.com/
 Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
 kevin.wei...@imc-chicago.com
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Storage, File Systems and Data Scrubbing

2013-08-27 Thread Johannes Klarenbeek
 in
 most cases. It seems that most people choose this system because of its
 journaling feature and XFS for its additional attribute storage which has a
 64kb limit which should be sufficient for most operations.

 But when you look at file system benchmarks btrfs is really, really slow.
 Then comes XFS, then EXT4, but EXT2 really dwarfs all other throughput
 results. On journaling systems (like XFS, EXT4 and btrfs) disabling
 journaling actually helps throughput as well. Sometimes more than 2 times
 for write actions.

 The preferred configuration for OSD's is one OSD per disk. Each object is
 striped among all Object Storage Daemons in a cluster. So if I would take
 one disk for the cluster and check its data, chances are slim that I will
 find a complete object there (a non-striped, full object I mean).

 When a client issues an object write (I assume a full object/file write in
 this case) it is the client's responsibility to stripe it among the object
 storage daemons. When a stripe is successfully stored by the daemon an ACK
 signal is sent to (?) the client and all participating OSDs. When all
 participating OSDs for the object have completed, the client assumes all is
 well and returns control to the application.

 If I'm not mistaken, then journaling is meant for the rare occasions that a
 hardware failure will occur and the data is corrupted. Ceph does this too in
 another way of course. But ceph should be able to notice when a block/stripe
 is correct or not. In the rare occasion that a node is failing while doing a
 write; an ACK signal is not sent to the caller and therefore the client can
 resend the block/stripe to another OSD. Therefore I fail to see the purpose
 of this extra journaling feature.

 Also ceph schedules a data scrubbing process every day (or however it is
 configured) that should be able to tackle bad sectors or other errors on the
 file system and accordingly repair them on the same daemon or flag the whole
 block as bad. Since everything is replicated the block is still in the
 storage cluster so no harm is done.

 In a normal/single file system I truly see the value of journaling and the
 potential for btrfs (although it's still very slow). However in a system
 like ceph, journaling seems to me more like a paranoid super fail-safe.

 Did anyone experiment with file systems that disabled journaling and how did
 it perform?

 Regards,
 Johannes
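
For anyone who wants to try that experiment, ext4's journal can simply be left
out at mkfs time (a sketch; the device name is hypothetical and mkfs destroys
its contents):

mkfs.ext4 -O ^has_journal /dev/sdX1   # create ext4 without a journal
tune2fs -O ^has_journal /dev/sdX1     # or strip it from an existing, unmounted fs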






 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some help needed with ceph deployment

2013-08-27 Thread Johannes Klarenbeek
Hi,

It seems that all my pgs are stuck somewhat. I'm not sure what to do from here. 
I waited a day in the hope that ceph would find a way to deal with this... but 
nothing happened.
I'm testing on a single ubuntu server 13.04 with dumpling 0.67.2. Below is my 
ceph status.

root@cephnode2:/root# ceph -s
  cluster 9087eb7a-abe1-4d38-99dc-cb6b266f0f84
   health HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
   monmap e1: 1 mons at {cephnode2=172.16.1.2:6789/0}, election epoch 1, quorum 
0 cephnode2
   osdmap e38: 6 osds: 6 up, 6 in
pgmap v65: 192 pgs: 155 active+remapped, 37 active+degraded; 0 bytes data, 
213 MB used, 11172 GB / 11172 GB avail
   mdsmap e1: 0/0/1 up

root@cephnode2:/root# ceph osd tree
# idweight  type name   up/down reweight
-1  10.92   root default
-2  10.92   host cephnode2
0   1.82osd.0   up  1
1   1.82osd.1   up  1
2   1.82osd.2   up  1
3   1.82osd.3   up  1
4   1.82osd.4   up  1
5   1.82osd.5   up  1

root@cephnode2:/root#ceph health detail
HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
pg 0.3f is stuck unclean since forever, current state active+remapped, last acting [2,0]
pg 1.3e is stuck unclean since forever, current state active+remapped, last acting [2,0]
pg 2.3d is stuck unclean since forever, current state active+remapped, last acting [2,0]
pg 0.3e is stuck unclean since forever, current state active+remapped, last acting [4,0]
pg 1.3f is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 2.3c is stuck unclean since forever, current state active+remapped, last acting [4,0]
pg 0.3d is stuck unclean since forever, current state active+degraded, last acting [0]
pg 1.3c is stuck unclean since forever, current state active+degraded, last acting [0]
pg 2.3f is stuck unclean since forever, current state active+remapped, last acting [4,1]
pg 0.3c is stuck unclean since forever, current state active+remapped, last acting [3,1]
pg 1.3d is stuck unclean since forever, current state active+remapped, last acting [4,0]
pg 2.3e is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 0.3b is stuck unclean since forever, current state active+degraded, last acting [0]
pg 1.3a is stuck unclean since forever, current state active+degraded, last acting [0]
pg 2.39 is stuck unclean since forever, current state active+degraded, last acting [0]
pg 0.3a is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 1.3b is stuck unclean since forever, current state active+remapped, last acting [3,1]
pg 2.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 0.39 is stuck unclean since forever, current state active+degraded, last acting [0]
pg 1.38 is stuck unclean since forever, current state active+degraded, last acting [0]
pg 2.3b is stuck unclean since forever, current state active+degraded, last acting [0]
pg 0.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 1.39 is stuck unclean since forever, current state active+remapped, last acting [1,0]
pg 2.3a is stuck unclean since forever, current state active+remapped, last acting [3,1]
pg 0.37 is stuck unclean since forever, current state active+remapped, last acting [3,2]
[...] and many more.

I found one entry on the mailing list from someone that had a similar issue and 
he fixed it with the following commands:

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --enable-unsafe-tunables \
    --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 \
    --set-choose-total-tries 50 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

but I'm not sure what he is trying to do here. Especially 
--enable-unsafe-tunables seems a little ... unsafe.
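
Those flags relax how hard CRUSH retries before giving up on a placement, but
given the ceph osd tree above (all six OSDs under a single host) the more
direct problem is that the default rule places replicas on distinct hosts,
which a one-host cluster can never satisfy. A sketch of changing the rule to
choose OSDs instead of hosts (the same idea as the 'osd crush chooseleaf type
= 0' setting mentioned in other threads):

ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt
# in /tmp/crush.txt change:  step chooseleaf firstn 0 type host
#                       to:  step chooseleaf firstn 0 type osd
crushtool -c /tmp/crush.txt -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new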

I also read this 
http://eu.ceph.com/docs/wip-3060/ops/manage/failures/osd/#failures-osd-unfound 
link. But it doesn't detail any actions one can take in order to get back 
to a HEALTH_OK status.


Regards,
Johannes


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Ian Colle

On Aug 27, 2013, at 2:08, Oliver Daudey oli...@xs4all.nl wrote:

 Hey Samuel,
 
 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.
 
 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:

memcpy is in fact also present in your Cuttlefish OSD, just a bit further down 
the list (increased from .7% to 1.4%). 

memcmp definitely looks suspicious and is something we're looking into.

 15.65%  [kernel] [k]
 intel_idle
  7.20%  libleveldb.so.1.9[.]
 0x3ceae   
  6.28%  libc-2.11.3.so   [.]
 memcmp
  5.22%  [kernel] [k]
 find_busiest_group
  3.92%  kvm  [.]
 0x2cf006  
  2.40%  libleveldb.so.1.9[.]
 leveldb::InternalKeyComparator::Compar
  1.95%  [kernel] [k]
 _raw_spin_lock
  1.69%  [kernel] [k]
 default_send_IPI_mask_sequence_phys   
  1.46%  libc-2.11.3.so   [.]
 memcpy
  1.17%  libleveldb.so.1.9[.]
 leveldb::Block::Iter::Next()  
  1.16%  [kernel] [k]
 hrtimer_interrupt 
  1.07%  [kernel] [k]
 native_write_cr0  
  1.01%  [kernel] [k]
 __hrtimer_start_range_ns  
  1.00%  [kernel] [k]
 clockevents_program_event 
  0.93%  [kernel] [k]
 find_next_bit 
  0.93%  libstdc++.so.6.0.13  [.]
 std::string::_M_mutate(unsigned long, 
  0.89%  [kernel] [k]
 cpumask_next_and  
  0.87%  [kernel] [k]
 __schedule
  0.85%  [kernel] [k]
 _raw_spin_unlock_irqrestore   
  0.85%  [kernel] [k]
 do_select 
  0.84%  [kernel] [k]
 apic_timer_interrupt  
  0.80%  [kernel] [k]
 fget_light
  0.79%  [kernel] [k]
 native_write_msr_safe 
  0.76%  [kernel] [k]
 _raw_spin_lock_irqsave
  0.66%  libc-2.11.3.so   [.]
 0xdc6d8   
  0.61%  libpthread-2.11.3.so [.]
 pthread_mutex_lock
  0.61%  [kernel] [k]
 tg_load_down  
  0.59%  [kernel] [k]
 reschedule_interrupt  
  0.59%  libsnappy.so.1.1.2   [.]
 snappy::RawUncompress(snappy::Source*,
  0.56%  libstdc++.so.6.0.13  [.] std::string::append(char
 const*, unsig
  0.54%  [kvm_intel]  [k]
 vmx_vcpu_run  
  0.53%  [kernel] [k]
 copy_user_generic_string  
  0.53%  [kernel] [k]
 load_balance  
  0.50%  [kernel] [k]
 rcu_needs_cpu 
  0.45%  [kernel] [k] fput
 
 
   Regards,
 
 Oliver
 
 On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
 I just pushed a patch to wip-dumpling-log-assert (based on current
 dumpling head).  I had disabled most of the code in PGLog::check() but
 left an (I thought) innocuous assert.  It seems that with (at least)
 g++ 4.6.3, stl list::size() is linear in the size of the list, so that
 assert actually traverses the pg log on each operation.  The patch in
 wip-dumpling-log-assert should disable that assert as well by default.
 Let me know if it helps.
 
 It should be built within an hour of this email.
 -Sam
 
 On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
 manderson8...@gmail.com wrote:
 Hi Guys,
 
 I'm having the same problem as Oliver with 0.67.2. CPU usage is around
 double that of the 0.61.8 OSD's in the same cluster which appears to
 be causing the performance decrease.
 
 I did a perf comparison (not sure if I did it right but it seems ok).
 Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
 journal and osd data is on an SSD, OSD's are in the same pool with the
 same weight and the perf tests were run at the same time on a
 realworld load consisting of RBD traffic only.
 
 Dumpling -
 
 Events: 332K cycles
 17.93%  ceph-osd  libc-2.15.so   [.] 0x15d523
 17.03%  ceph-osd  ceph-osd  

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Oliver Daudey
Hey Ian, Samuel,

FYI: I still had some attempted optimization-options in place on the
production-cluster, which might have skewed my results a bit.  OSD
version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
configuration that I did all other tests in.  I haven't yet verified
sufficiently if this is accompanied by a speed-increase as well.  On the
test-cluster, I didn't see any difference in speed, but that may not
mean much, as the load-pattern on production is totally different.
Sorry for that mixup.

Updated `perf top'-output, extra options removed, under current load,
which should be higher than in my previous mail:
 18.08%  [kernel]             [k] intel_idle
  5.87%  [kernel]             [k] find_busiest_group
  4.92%  kvm                  [.] 0xcefe2
  3.24%  [kernel]             [k] native_write_msr_safe
  2.92%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  2.66%  [kernel]             [k] _raw_spin_lock
  1.50%  [kernel]             [k] native_write_cr0
  1.36%  libleveldb.so.1.9    [.] 0x3cebc
  1.27%  [kernel]             [k] __hrtimer_start_range_ns
  1.17%  [kernel]             [k] hrtimer_interrupt
  1.10%  libc-2.11.3.so       [.] memcmp
  1.07%  [kernel]             [k] apic_timer_interrupt
  1.00%  [kernel]             [k] find_next_bit
  0.99%  [kernel]             [k] cpumask_next_and
  0.99%  [kernel]             [k] __schedule
  0.97%  [kernel]             [k] clockevents_program_event
  0.97%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.90%  [kernel]             [k] fget_light
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] reschedule_interrupt
  0.83%  [kvm_intel]          [k] vmx_vcpu_run
  0.79%  [kernel]             [k] _raw_spin_lock_irqsave
  0.78%  [kernel]             [k] try_to_wake_up
  0.70%  libc-2.11.3.so       [.] memcpy
  0.66%  [kernel]             [k] copy_user_generic_string
  0.63%  [kernel]             [k] sync_inodes_sb
  0.61%  [kernel]             [k] load_balance
  0.61%  [kernel]             [k] tg_load_down
  0.56%  [kernel]             [k] irq_entries_start
  0.56%  libc-2.11.3.so       [.] 0x73612
  0.54%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.51%  [kernel]             [k] rcu_needs_cpu

   Regards,

  Oliver

On 27-08-13 16:04, Ian Colle wrote:
 
 On Aug 27, 2013, at 2:08, Oliver Daudey oli...@xs4all.nl wrote:
 
 Hey Samuel,

 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.

 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:
 
 memcpy is in fact also present in your Cuttlefish OSD, just a bit further 
 down the list (increased from .7% to 1.4%). 
 
 memcmp definitely looks suspicious and is something we're looking into.
 
 15.65%  [kernel] [k]
 intel_idle
  7.20%  libleveldb.so.1.9[.]
 0x3ceae   
  6.28%  libc-2.11.3.so   [.]
 memcmp
  5.22%  [kernel] [k]
 find_busiest_group
  3.92%  kvm  [.]
 0x2cf006  
  2.40%  libleveldb.so.1.9[.]
 leveldb::InternalKeyComparator::Compar
  1.95%  [kernel] [k]
 _raw_spin_lock
  1.69%  [kernel] [k]
 default_send_IPI_mask_sequence_phys   
  1.46%  libc-2.11.3.so   [.]
 memcpy
  1.17%  libleveldb.so.1.9[.]
 leveldb::Block::Iter::Next()  
  1.16%  [kernel] [k]
 hrtimer_interrupt 
  1.07%  [kernel] [k]
 native_write_cr0  
  1.01%  [kernel] [k]
 __hrtimer_start_range_ns  
  1.00%  [kernel] [k]
 clockevents_program_event 
  0.93%  [kernel] [k]
 find_next_bit 
  0.93%  libstdc++.so.6.0.13  [.]
 std::string::_M_mutate(unsigned long, 
  0.89%  [kernel] [k]
 cpumask_next_and  
  0.87%  [kernel] [k]
 __schedule
  0.85%  [kernel] [k]
 

Re: [ceph-users] 1 particular ceph-mon never jobs on 0.67.2

2013-08-27 Thread Travis Rhoden
Hi James,

Yes, all configured using the interfaces file.  Only two interfaces, eth0
and eth1:

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet dhcp

I took a single node and rebooted it several times, and it really was about
50/50 whether or not the OSDs showed up under 'localhost' or n0.  I tried
a few different things last night with no luck.  I modified when ceph-all
starts by writing different 'start on' values to
/etc/init/ceph-all.override.  I was grasping for straws a bit, as I just
kept adding (and'ing) events, hoping to find something that works.  I tried:

start on (local-filesystems and net-device-up IFACE=eth0)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1)
start on (local-filesystems and net-device-up IFACE=eth0 and net-device-up IFACE=eth1 and started network-services)

Oddly, the last one seemed to work at first.  When I added 'started
network-services' to the list, the OSDs came up correctly each time!  But
the monitor never started.  If I started it directly with 'start ceph-mon
id=n0', it came up fine, but not during boot.  I spent a couple hours
trying to debug *that* before I gave up and switched to static hostnames.
=/  I had even thrown --verbose in the kernel command line so I could see
all the upstart events happening, but didn't see anything obvious.

So now I'm back to the stock upstart scripts, using static hostnames,
and I don't have any issues with OSDs moving in the crushmap, or any new
problems with the monitors.  Sage, I do think I still saw a weird issue
with my third mon not starting (same as the original email -- even now with
static hostnames), but it was late, and I lost access to the cluster right
about then and haven't regained it.  I'll double-check that when I get
access again and hopefully will find that problem has gone away too.

 - Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
What options were you using?
-Sam

On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey Ian, Samuel,

 FYI: I still had some attempted optimization-options in place on the
 production-cluster, which might have skewed my results a bit.  OSD
 version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
 configuration that I did all other tests in.  I haven't yet verified
 sufficiently if this is accompanied by a speed-increase as well.  On the
 test-cluster, I didn't see any difference in speed, but that may not
 mean much, as the load-pattern on production is totally different.
 Sorry for that mixup.

 Updated `perf top'-output, extra options removed, under current load,
 which should be higher than in my previous mail:
  18.08%  [kernel] [k] intel_idle

   5.87%  [kernel] [k] find_busiest_group

   4.92%  kvm  [.] 0xcefe2

   3.24%  [kernel] [k] native_write_msr_safe

   2.92%  [kernel] [k]
 default_send_IPI_mask_sequence_phys
   2.66%  [kernel] [k] _raw_spin_lock

   1.50%  [kernel] [k] native_write_cr0

   1.36%  libleveldb.so.1.9[.] 0x3cebc

   1.27%  [kernel] [k] __hrtimer_start_range_ns

   1.17%  [kernel] [k] hrtimer_interrupt

   1.10%  libc-2.11.3.so   [.] memcmp

   1.07%  [kernel] [k] apic_timer_interrupt

   1.00%  [kernel] [k] find_next_bit

   0.99%  [kernel] [k] cpumask_next_and

   0.99%  [kernel] [k] __schedule

   0.97%  [kernel] [k] clockevents_program_event

   0.97%  [kernel] [k] _raw_spin_unlock_irqrestore

   0.90%  [kernel] [k] fget_light

   0.85%  [kernel] [k] do_select

   0.84%  [kernel] [k] reschedule_interrupt

   0.83%  [kvm_intel]  [k] vmx_vcpu_run

   0.79%  [kernel] [k] _raw_spin_lock_irqsave

   0.78%  [kernel] [k] try_to_wake_up

   0.70%  libc-2.11.3.so   [.] memcpy

   0.66%  [kernel] [k] copy_user_generic_string

   0.63%  [kernel] [k] sync_inodes_sb

   0.61%  [kernel] [k] load_balance

   0.61%  [kernel] [k] tg_load_down

   0.56%  [kernel] [k] irq_entries_start

   0.56%  libc-2.11.3.so   [.] 0x73612

   0.54%  libpthread-2.11.3.so [.] pthread_mutex_lock

   0.51%  [kernel] [k] rcu_needs_cpu


Regards,

   Oliver

 On 27-08-13 16:04, Ian Colle wrote:

 On Aug 27, 2013, at 2:08, Oliver Daudey oli...@xs4all.nl wrote:

 Hey Samuel,

 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.

 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:

 memcpy is in fact also present in your Cuttlefish OSD, just a bit further 
 down the list (increased from .7% to 1.4%).

 memcmp definitely looks suspicious and is something we're looking into.

 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and
  0.87%  [kernel]             [k] __schedule
  0.85%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] apic_timer_interrupt
  0.80%  [kernel]             [k] fget_light
  0.79%  [kernel]             [k] native_write_msr_safe

Re: [ceph-users] ceph-deploy pushy dependency problem

2013-08-27 Thread Gary Lowell
Hi Kevin -

The latest version, ceph-deploy 1.2.2 in the ceph rpm-dumpling repo, should 
have the correct dependency (see below).  If it doesn't work for you, or if you 
are using a different repo, please let me know.

Thanks,
Gary


ubuntu@jenkins:~/repos/rpm-dumpling/fc19/noarch$ rpm -q --requires -p 
ceph-deploy-1.2.2-0.noarch.rpm
warning: ceph-deploy-1.2.2-0.noarch.rpm: Header V4 RSA/SHA1 Signature, key ID 
17ed316d: NOKEY
/usr/bin/env
gdisk
pushy >= 0.5.3
python(abi) = 2.7
python-argparse
python-distribute
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1





On Aug 26, 2013, at 2:57 PM, Kevin Weiler kevin.wei...@imc-chicago.com wrote:

 Hey ceph devs,
 
 I think there is a problem with your dependencies in the current version of 
 ceph-deploy for FC19 (and probably other redhat variants). The package 
 ceph-deploy explicitly requires python-pushy >= 0.5.3, but the pushy package 
 is simply named pushy (and is the correct version). The spec file looks 
 fine in the ceph-deploy git repo, maybe you just need to rerun the 
 package/repo generation? Thanks!
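 
 For what it's worth, a quick way to confirm what the resolver actually
 sees (a sketch using standard rpm queries; nothing here is specific to
 ceph-deploy beyond the package names):
 
 rpm -q --provides pushy                        # what the installed pushy claims to provide
 rpm -q --requires ceph-deploy | grep -i pushy  # what ceph-deploy asks for
 # The expected spec line would be something like:
 #   Requires: pushy >= 0.5.3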
 
 -- 
 Kevin Weiler
 IT
 
  
 IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | 
 http://imc-chicago.com/
 Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
 kevin.wei...@imc-chicago.com
 
 


Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Oliver Daudey
Hey Samuel,

These:
osd op threads = 8
osd disk threads = 2
filestore op threads = 8

They increased performance on my test-cluster, but seemed to have the
opposite effect on the much more heavily loaded production environment.
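
In case anyone wants to reproduce this, the same options can also be
toggled on a running OSD without a restart (a sketch; injectargs is
standard, but double-check the option spellings against your release,
and osd.0 is just an example id):

ceph tell osd.0 injectargs '--osd-op-threads 8 --osd-disk-threads 2 --filestore-op-threads 8'
# and back to the defaults to compare:
ceph tell osd.0 injectargs '--osd-op-threads 2 --osd-disk-threads 1 --filestore-op-threads 2'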


   Regards,

  Oliver

On 27-08-13 16:37, Samuel Just wrote:
 What options were you using?
 -Sam
 
 On Tue, Aug 27, 2013 at 7:35 AM, Oliver Daudey oli...@xs4all.nl wrote:
 Hey Ian, Samuel,

 FYI: I still had some attempted optimization-options in place on the
 production-cluster, which might have skewed my results a bit.  OSD
 version 0.67.2-16-geeb1f86 seems to be a lot less hard on the CPU in the
 configuration that I did all other tests in.  I haven't yet verified
 sufficiently if this is accompanied by a speed-increase as well.  On the
 test-cluster, I didn't see any difference in speed, but that may not
 mean much, as the load-pattern on production is totally different.
 Sorry for that mixup.

 Updated `perf top'-output, extra options removed, under current load,
 which should be higher than in my previous mail:
 18.08%  [kernel]             [k] intel_idle
  5.87%  [kernel]             [k] find_busiest_group
  4.92%  kvm                  [.] 0xcefe2
  3.24%  [kernel]             [k] native_write_msr_safe
  2.92%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  2.66%  [kernel]             [k] _raw_spin_lock
  1.50%  [kernel]             [k] native_write_cr0
  1.36%  libleveldb.so.1.9    [.] 0x3cebc
  1.27%  [kernel]             [k] __hrtimer_start_range_ns
  1.17%  [kernel]             [k] hrtimer_interrupt
  1.10%  libc-2.11.3.so       [.] memcmp
  1.07%  [kernel]             [k] apic_timer_interrupt
  1.00%  [kernel]             [k] find_next_bit
  0.99%  [kernel]             [k] cpumask_next_and
  0.99%  [kernel]             [k] __schedule
  0.97%  [kernel]             [k] clockevents_program_event
  0.97%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.90%  [kernel]             [k] fget_light
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] reschedule_interrupt
  0.83%  [kvm_intel]          [k] vmx_vcpu_run
  0.79%  [kernel]             [k] _raw_spin_lock_irqsave
  0.78%  [kernel]             [k] try_to_wake_up
  0.70%  libc-2.11.3.so       [.] memcpy
  0.66%  [kernel]             [k] copy_user_generic_string
  0.63%  [kernel]             [k] sync_inodes_sb
  0.61%  [kernel]             [k] load_balance
  0.61%  [kernel]             [k] tg_load_down
  0.56%  [kernel]             [k] irq_entries_start
  0.56%  libc-2.11.3.so       [.] 0x73612
  0.54%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.51%  [kernel]             [k] rcu_needs_cpu


Regards,

   Oliver

 On 27-08-13 16:04, Ian Colle wrote:

 On Aug 27, 2013, at 2:08, Oliver Daudey oli...@xs4all.nl wrote:

 Hey Samuel,

 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.

 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:

 memcpy is in fact also present in your Cuttlefish OSD, just a bit further 
 down the list (increased from .7% to 1.4%).

 memcmp definitely looks suspicious and is something we're looking into.

 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Oliver Daudey
Hey Mark,

That will take a day or so for me to know with enough certainty.  With
the low CPU-usage and preliminary results today, I'm confident enough to
upgrade all OSDs in production and test the cluster all-Dumpling
tomorrow.  For now, I only upgraded a single OSD and measured CPU-usage
and whatever performance-effects that had on the cluster, so if I would
lose that OSD, I could recover. :-)  Will get back to you.
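
For anyone repeating the single-OSD canary approach, the rough sequence
looks like this (a sketch; the osd id and the package version string are
just examples, not the exact build name):

stop ceph-osd id=12                        # stop only the canary OSD
apt-get install ceph=0.67.2-16-geeb1f86-1  # hypothetical test-build version string
start ceph-osd id=12
ceph -w                                    # wait for peering/recovery to settle
top -p $(pgrep -f 'ceph-osd -i 12')        # compare CPU against the untouched OSDs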


   Regards,

   Oliver

On 27-08-13 15:04, Mark Nelson wrote:
 Hi Oliver/Matthew,
 
 Ignoring CPU usage, has speed remained slower as well?
 
 Mark
 
 On 08/27/2013 03:08 AM, Oliver Daudey wrote:
 Hey Samuel,

 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.

 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:
 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and
  0.87%  [kernel]             [k] __schedule
  0.85%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] apic_timer_interrupt
  0.80%  [kernel]             [k] fget_light
  0.79%  [kernel]             [k] native_write_msr_safe
  0.76%  [kernel]             [k] _raw_spin_lock_irqsave
  0.66%  libc-2.11.3.so       [.] 0xdc6d8
  0.61%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.61%  [kernel]             [k] tg_load_down
  0.59%  [kernel]             [k] reschedule_interrupt
  0.59%  libsnappy.so.1.1.2   [.] snappy::RawUncompress(snappy::Source*,
  0.56%  libstdc++.so.6.0.13  [.] std::string::append(char const*, unsig
  0.54%  [kvm_intel]          [k] vmx_vcpu_run
  0.53%  [kernel]             [k] copy_user_generic_string
  0.53%  [kernel]             [k] load_balance
  0.50%  [kernel]             [k] rcu_needs_cpu
  0.45%  [kernel]             [k] fput


 Regards,

   Oliver

 On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:
 I just pushed a patch to wip-dumpling-log-assert (based on current
 dumpling head).  I had disabled most of the code in PGLog::check() but
 left an (I thought) innocuous assert.  It seems that with (at least)
 g++ 4.6.3, stl list::size() is linear in the size of the list, so that
 assert actually traverses the pg log on each operation.  The patch in
 wip-dumpling-log-assert should disable that assert as well by default.
   Let me know if it helps.

 It should be built within an hour of this email.
 -Sam

 On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
 manderson8...@gmail.com wrote:
 Hi Guys,

 I'm having the same problem as Oliver with 0.67.2. CPU usage is around
 double that of the 0.61.8 OSD's in the same cluster which appears to
 be causing the performance decrease.

 I did a perf comparison (not sure if I did it right but it seems ok).
 Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
 journal and osd data is on an SSD, OSD's are in the same pool with the
 same weight and the perf tests were run at the same time on a
 realworld load consisting of RBD traffic only.

 Dumpling -

 Events: 332K cycles
   17.93%  ceph-osd  libc-2.15.so        [.] 0x15d523
   17.03%  ceph-osd  ceph-osd            [.] 0x5c2897
    4.66%  ceph-osd  ceph-osd            [.] leveldb::InternalKeyComparator::Compare(leveldb::Slice const&, level
    3.46%  ceph-osd  ceph-osd            [.] leveldb::Block::Iter::Next()

Re: [ceph-users] ceph-deploy pushy dependency problem

2013-08-27 Thread Gary Lowell
Hi Mike -

We released a 1.2.2 version recently that should have the correct dependency.  
You may have to do an aptitude update to refresh the cache.  I did a test 
install on our debian wheezy system and that correctly updated the python-pushy 
package.  Let me know if you continue to have problems.

Thanks,
Gary


root@gitbuilder-cdep-deb-wheezy-amd64-basic:~# aptitude install ceph-deploy
The following NEW packages will be installed:
  ceph-deploy 
The following packages will be upgraded:
  python-pushy 
1 packages upgraded, 1 newly installed, 0 to remove and 6 not upgraded.
Need to get 0 B/69.1 kB of archives. After unpacking 417 kB will be used.
Do you want to continue? [Y/n/?] y
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 42202 files and directories currently installed.)
Preparing to replace python-pushy 0.5.1-1 (using 
.../python-pushy_0.5.3-1~bpo70+1.ceph_amd64.deb) ...
Unpacking replacement python-pushy ...
Selecting previously unselected package ceph-deploy.
Unpacking ceph-deploy (from .../ceph-deploy_1.2.2~bpo70+1_all.deb) ...
Setting up python-pushy (0.5.3-1~bpo70+1.ceph) ...
Setting up ceph-deploy (1.2.2~bpo70+1) ...
 
Current status: 6 updates [-1].




On Aug 27, 2013, at 6:13 AM, Nico Massenberg nico.massenb...@kontrast.de 
wrote:

 I'm having the same issue on debian 7.1. When reinstalling ceph-deploy after 
 purge I get the following:
 
 root@vl0181:~# aptitude install ceph-deploy
 The following NEW packages will be installed:
   ceph-deploy{b} 
 0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
 Need to get 36.5 kB of archives. After unpacking 360 kB will be used.
 The following packages have unmet dependencies:
 ceph-deploy : Depends: python-pushy (>= 0.5.3) but 0.5.1-1 is installed.
 The following actions will resolve these dependencies:
 
  Keep the following packages at their current version:
 1) ceph-deploy [Not Installed] 
 
 Aptitude updated and upgraded.
 Regards
 
 Am 26.08.2013 um 23:57 schrieb Kevin Weiler kevin.wei...@imc-chicago.com:
 
 Hey ceph devs,
 
  I think there is a problem with your dependencies in the current version of 
  ceph-deploy for FC19 (and probably other redhat variants). The package 
  ceph-deploy explicitly requires python-pushy >= 0.5.3, but the pushy package 
  is simply named pushy (and is the correct version). The spec file looks 
  fine in the ceph-deploy git repo, maybe you just need to rerun the 
  package/repo generation? Thanks!
 
 -- 
 Kevin Weiler
 IT
 
  
 IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 
 | http://imc-chicago.com/
 Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: 
 kevin.wei...@imc-chicago.com
 
 


Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Mark Nelson
Ok, definitely let us know how it goes!  For what it's worth, I'm 
testing Sam's wip-dumpling-perf branch with the wbthrottle code disabled 
and comparing it both to the same branch with it enabled and to 0.67.1. 
I don't have any perf data, but quite a bit of other data to look 
through, both in terms of RADOS bench and RBD.
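
The RADOS bench half of that comparison is just the stock tool, along
these lines (the pool name is assumed; repeat the passes per build):

rados bench -p testpool 60 write --no-cleanup   # 60s write pass, keep the objects
rados bench -p testpool 60 seq                  # matching sequential-read pass
rados -p testpool cleanup                       # remove the benchmark objects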


Mark

On 08/27/2013 10:07 AM, Oliver Daudey wrote:

Hey Mark,

That will take a day or so for me to know with enough certainty.  With
the low CPU-usage and preliminary results today, I'm confident enough to
upgrade all OSDs in production and test the cluster all-Dumpling
tomorrow.  For now, I only upgraded a single OSD and measured CPU-usage
and whatever performance-effects that had on the cluster, so if I would
lose that OSD, I could recover. :-)  Will get back to you.


Regards,

Oliver

On 27-08-13 15:04, Mark Nelson wrote:

 Hi Oliver/Matthew,

Ignoring CPU usage, has speed remained slower as well?

Mark

On 08/27/2013 03:08 AM, Oliver Daudey wrote:

Hey Samuel,

The PGLog::check() is now no longer visible in profiling, so it helped
for that.  Unfortunately, it doesn't seem to have helped to bring down
the OSD's CPU-loading much.  Leveldb still uses much more than in
Cuttlefish.  On my test-cluster, I didn't notice any difference in the
RBD bench-results, either, so I have to assume that it didn't help
performance much.

Here's the `perf top' I took just now on my production-cluster with your
new version, under regular load.  Also note the memcmp and memcpy,
which also don't show up when running a Cuttlefish-OSD:
 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and
  0.87%  [kernel]             [k] __schedule
  0.85%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] apic_timer_interrupt
  0.80%  [kernel]             [k] fget_light
  0.79%  [kernel]             [k] native_write_msr_safe
  0.76%  [kernel]             [k] _raw_spin_lock_irqsave
  0.66%  libc-2.11.3.so       [.] 0xdc6d8
  0.61%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.61%  [kernel]             [k] tg_load_down
  0.59%  [kernel]             [k] reschedule_interrupt
  0.59%  libsnappy.so.1.1.2   [.] snappy::RawUncompress(snappy::Source*,
  0.56%  libstdc++.so.6.0.13  [.] std::string::append(char const*, unsig
  0.54%  [kvm_intel]          [k] vmx_vcpu_run
  0.53%  [kernel]             [k] copy_user_generic_string
  0.53%  [kernel]             [k] load_balance
  0.50%  [kernel]             [k] rcu_needs_cpu
  0.45%  [kernel]             [k] fput


 Regards,

   Oliver

On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:

I just pushed a patch to wip-dumpling-log-assert (based on current
dumpling head).  I had disabled most of the code in PGLog::check() but
left an (I thought) innocuous assert.  It seems that with (at least)
g++ 4.6.3, stl list::size() is linear in the size of the list, so that
assert actually traverses the pg log on each operation.  The patch in
wip-dumpling-log-assert should disable that assert as well by default.
   Let me know if it helps.

It should be built within an hour of this email.
-Sam

On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
manderson8...@gmail.com wrote:

Hi Guys,

I'm having the same problem as Oliver with 0.67.2. CPU usage is around
double that of the 0.61.8 OSD's in the same cluster which appears to
be causing the performance decrease.

I did a perf comparison (not sure if I did it right but it seems ok).
Both hosts are the same spec running Ubuntu 12.04.1 (3.2 kernel),
journal and osd data is on an SSD, OSD's are in the same pool with the
same weight and the perf tests were run at the same time on a
realworld load consisting of RBD traffic only.

Dumpling -

Events: 332K 

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
Oliver,

This patch isn't in dumpling head yet; you may want to wait for a
dumpling point release.
-Sam

On Tue, Aug 27, 2013 at 8:13 AM, Mark Nelson mark.nel...@inktank.com wrote:
 Ok, definitely let us know how it goes!  For what it's worth, I'm testing
 Sam's wip-dumpling-perf branch with the wbthrottle code disabled and
 comparing it both to the same branch with it enabled and to 0.67.1.
 I don't have any perf data, but quite a bit of other data to look through,
 both in terms of RADOS bench and RBD.

 Mark


 On 08/27/2013 10:07 AM, Oliver Daudey wrote:

 Hey Mark,

 That will take a day or so for me to know with enough certainty.  With
 the low CPU-usage and preliminary results today, I'm confident enough to
 upgrade all OSDs in production and test the cluster all-Dumpling
 tomorrow.  For now, I only upgraded a single OSD and measured CPU-usage
 and whatever performance-effects that had on the cluster, so if I would
 lose that OSD, I could recover. :-)  Will get back to you.


 Regards,

 Oliver

 On 27-08-13 15:04, Mark Nelson wrote:

 Hi Oliver/Matthew,

 Ignoring CPU usage, has speed remained slower as well?

 Mark

 On 08/27/2013 03:08 AM, Oliver Daudey wrote:

 Hey Samuel,

 The PGLog::check() is now no longer visible in profiling, so it helped
 for that.  Unfortunately, it doesn't seem to have helped to bring down
 the OSD's CPU-loading much.  Leveldb still uses much more than in
 Cuttlefish.  On my test-cluster, I didn't notice any difference in the
 RBD bench-results, either, so I have to assume that it didn't help
 performance much.

 Here's the `perf top' I took just now on my production-cluster with your
 new version, under regular load.  Also note the memcmp and memcpy,
 which also don't show up when running a Cuttlefish-OSD:
 15.65%  [kernel]             [k] intel_idle
  7.20%  libleveldb.so.1.9    [.] 0x3ceae
  6.28%  libc-2.11.3.so       [.] memcmp
  5.22%  [kernel]             [k] find_busiest_group
  3.92%  kvm                  [.] 0x2cf006
  2.40%  libleveldb.so.1.9    [.] leveldb::InternalKeyComparator::Compar
  1.95%  [kernel]             [k] _raw_spin_lock
  1.69%  [kernel]             [k] default_send_IPI_mask_sequence_phys
  1.46%  libc-2.11.3.so       [.] memcpy
  1.17%  libleveldb.so.1.9    [.] leveldb::Block::Iter::Next()
  1.16%  [kernel]             [k] hrtimer_interrupt
  1.07%  [kernel]             [k] native_write_cr0
  1.01%  [kernel]             [k] __hrtimer_start_range_ns
  1.00%  [kernel]             [k] clockevents_program_event
  0.93%  [kernel]             [k] find_next_bit
  0.93%  libstdc++.so.6.0.13  [.] std::string::_M_mutate(unsigned long,
  0.89%  [kernel]             [k] cpumask_next_and
  0.87%  [kernel]             [k] __schedule
  0.85%  [kernel]             [k] _raw_spin_unlock_irqrestore
  0.85%  [kernel]             [k] do_select
  0.84%  [kernel]             [k] apic_timer_interrupt
  0.80%  [kernel]             [k] fget_light
  0.79%  [kernel]             [k] native_write_msr_safe
  0.76%  [kernel]             [k] _raw_spin_lock_irqsave
  0.66%  libc-2.11.3.so       [.] 0xdc6d8
  0.61%  libpthread-2.11.3.so [.] pthread_mutex_lock
  0.61%  [kernel]             [k] tg_load_down
  0.59%  [kernel]             [k] reschedule_interrupt
  0.59%  libsnappy.so.1.1.2   [.] snappy::RawUncompress(snappy::Source*,
  0.56%  libstdc++.so.6.0.13  [.] std::string::append(char const*, unsig
  0.54%  [kvm_intel]          [k] vmx_vcpu_run
  0.53%  [kernel]             [k] copy_user_generic_string
  0.53%  [kernel]             [k] load_balance
  0.50%  [kernel]             [k] rcu_needs_cpu
  0.45%  [kernel]             [k] fput


  Regards,

Oliver

 On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote:

 I just pushed a patch to wip-dumpling-log-assert (based on current
 dumpling head).  I had disabled most of the code in PGLog::check() but
 left an (I thought) innocuous assert.  It seems that with (at least)
 g++ 4.6.3, stl list::size() is linear in the size of the list, so that
 assert actually traverses the pg log on each operation.  The patch in
 wip-dumpling-log-assert should disable that assert as well by default.
Let me know if it helps.

 It should be built within an hour of this email.
 -Sam

 On Mon, Aug 26, 2013 at 10:46 PM, Matthew Anderson
 manderson8...@gmail.com wrote:

 Hi Guys,

 I'm having the same problem as Oliver with 0.67.2. CPU usage is around
 double that of the 0.61.8 OSD's in the same cluster which appears to
 be causing the performance decrease.

 I did a perf comparison (not sure if I did it 

Re: [ceph-users] Ceph + Xen - RBD io hang

2013-08-27 Thread Sage Weil
Hi James,

Can you post the contents of the hung task warning so we can see where it 
is stuck?

Thanks!
sage


On Tue, 27 Aug 2013, James Dingwall wrote:

 Hi,
 
 I am doing some experimentation with Ceph and Xen (on the same host) and I'm
 experiencing some problems with the rbd device that I'm using as the block
 device.  My environment is:
 
 2 node Ceph 0.67.2 cluster, 4x OSD (btrfs) and 1x mon
 Xen 4.3.0
 Kernel 3.10.9
 
 The domU I'm trying to build is from the Ubuntu 13.04 desktop release.  When I
 pass through the rbd (format 1 or 2) device as phy:/dev/rbd/rbd/ubuntu-test
 then the domU has no problems reading data from it, the test I ran was:
 
 for i in $(seq 0 1023) ; do
 dd if=/dev/xvda of=/dev/null bs=4k count=1024 skip=$(($i * 4))
 done
 
 However, writing data causes the domU to hang while i is still in single
 figures, though it doesn't seem consistent about the exact value:
 for i in $(seq 0 1023) ; do
 dd if=/dev/zero of=/dev/xvda bs=4k count=1024 seek=$(($i * 4))
 done
 
 eventually the kernel in the domU will print a hung task warning.  I have
 tried the domU as pv and hvm (with xen_platform_pci = 1 and 0) but have the
 same behaviour in both cases.  Once this state is triggered on the rbd device
 then any interaction with it in dom0 will result in the same hang.  I'm
 assuming that there is some unfavourable interaction between ceph/rbd and
 blkback but I haven't found anything in the dom0 logs so I would like to know
 if anyone has some suggestions about where to start trying to hunt this down.
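 
 For the record, a few places to gather state from when it wedges (a
 sketch; the debugfs path assumes the kernel rbd client with debugfs
 mounted):
 
 dmesg | grep -B2 -A15 'blocked for more than'   # the hung task stack trace
 cat /sys/kernel/debug/ceph/*/osdc               # in-flight requests from the kernel client
 echo w > /proc/sysrq-trigger                    # dump all blocked tasks to dmesg
 ceph -s                                         # rule out cluster-side blockage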
 
 Thanks,
 James


Re: [ceph-users] Problems with keyrings during deployment

2013-08-27 Thread Sage Weil
On Tue, 27 Aug 2013, Francesc Alted wrote:
 Hi again,
 
 I continue to try debugging the problem reported before.  Now, I have been
 trying to use a couple of VMs for doing this (one with Ubuntu 12.04 64-bit
 and the other with Ubuntu 12.10 64-bit, and I use the ceph.com repos for
 installing the Ceph libraries).  And, unfortunately, I am running into the
 same problem: the keyrings do not appear where they should (i.e.
 bootstrap-mds and bootstrap-osd in /var/lib/ceph).
 
 I have followed the preflight checklist
 (http://ceph.com/docs/next/start/quick-start-preflight/), and the ceph user
 on the admin box can log in perfectly well on the server box, so I'm not sure
 what's going on here.
 
 I have even tried to use a single ceph server for installing everything
 (adding the 'osd crush chooseleaf type = 0' line into the ceph conf file)
 but then again the keyrings do not appear.
 
 Is nobody else having the same problems as me (using the latest Ceph Dumpling
 0.67.2 release here)?
 
 Thanks for any insight!

There are several possible pitfalls here; the missing keys are just the 
most visible symptom of the monitors not forming an initial quorum.

Can you post the contents of your ceph.conf and output from 'ceph daemon 
mon.`hostname` mon_status' on each of the mon nodes?
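
Concretely, on each mon host that is (assuming the default admin socket
path):

ceph daemon mon.$(hostname) mon_status
# or via the admin socket explicitly:
ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname).asok mon_status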

thanks!
sage


 
 Francesc
 
 On Mon, Aug 26, 2013 at 1:55 PM, Francesc Alted franc...@continuum.io
 wrote:
   Hi,
 
 I am a newcomer to Ceph.  After having a look at the docs (BTW, it is
 nice to see its concepts being implemented), I am trying to do some
 tests, mainly to check the Python APIs to access RADOS and RDB
 components.  I am following this quick guide:
 
 http://ceph.com/docs/next/start/quick-ceph-deploy/
 
 But after adding a monitor (ceph-deploy mon create ceph-server), I see
 that the subdirectories bootstrap-mds and bootstrap-osd (in
 /var/lib/ceph) do not contain keyrings.  I have tried to create the
 monitor again (as suggested in the docs), but the keyrings continue to
 not appear there:
 
 $ ceph-deploy gatherkeys ceph-server
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /etc/ceph/ceph.client.admin.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /etc/ceph/ceph.client.admin.keyring on ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /var/lib/ceph/bootstrap-osd/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /var/lib/ceph/bootstrap-osd/ceph.keyring on ['ceph-server']
 [ceph_deploy.gatherkeys][DEBUG ] Checking ceph-server for
 /var/lib/ceph/bootstrap-mds/ceph.keyring
 [ceph_deploy.gatherkeys][WARNIN] Unable to find
 /var/lib/ceph/bootstrap-mds/ceph.keyring on ['ceph-server']
 
 My admin node (the machine from where I issue the ceph commands) is an
 openSUSE 12.3 where I compiled the ceph-0.67.1 tarball.  The server
 node is a Debian Precise 64-bit (using vagrant w/ VirtaulBox), and
 Ceph installation seems to have gone well, as per the logs:
 
 [ceph-server][INFO  ] Running command: ceph --version
 [ceph-server][INFO  ] ceph version 0.67.2
 (eb4380dd036a0b644c6283869911d615ed729ac8)
 
 Any hints on what is going on there?  Thanks!
 
 --
 Francesc Alted
 
 
 
 
 --
 Francesc Alted


[ceph-users] problem uploading 1GB and up file

2013-08-27 Thread Marc-Andre Jutras

Hello,

I got a weird upload issue with Ceph dumpling (0.67.2) and I don't 
know if someone can help me pinpoint my problem...


Basically, if I try to upload a 1 GB file, as soon as my upload is 
completed, Apache returns a 500 error... no problem if I upload a 900 MB 
file or less; I just get that specific problem with any file bigger than 
1 GB!


I also have 2 Apache servers in place, one with the modified fastcgi module 
and the other one without -- both servers show the same issue/behavior...


One thing that I noticed: with a 1 GB file or larger, radosgw throws me a 
bunch of these errors (I'm not getting these with a 900 MB or smaller file 
size...):


2013-08-27 11:18:09.850700 7f1e490ec700 20 get_obj_state: s->obj_tag was 
set empty
2013-08-27 11:18:09.850705 7f1e490ec700 20 
prepare_atomic_for_write_impl: state is not atomic. state=0x7f1dc40ad938
2013-08-27 11:18:09.859304 7f1e490ec700 20 get_obj_state: 
rctx=0x7f1dc40028b0 obj=aaa:_shadow__ivTDZIBHCbNVE4p-DDeXDJHni8SArBZ_435 
state=0x7f1dc40adbf8 s->prefetch_data=0

(...)
2013-08-27 11:18:09.878697 7f1e490ec700  0 WARNING: set_req_state_err 
err_no=27 resorting to 500


Not seeing any other errors -- no OSD errors, no MDS errors... Can't 
find anything on Google related to this warning msg: set_req_state_err 
err_no=27


any clue ??

Thanks
M-A

= = =

ceph -v
ceph version 0.67.2 (eb4380dd036a0b644c6283869911d615ed729ac8)

 ceph status
  cluster eb16413a----f23fddd6a5f6
   health HEALTH_OK
   monmap e1: 2 mons at 
{coe-w1-stor-db01=10.150.2.101:6789/0,coe-w1-stor-db02=10.150.2.102:6789/0}, 
election epoch 30, quorum 0,1 coe-w1-stor-db01,coe-w1-stor-db02

   osdmap e518: 101 osds: 101 up, 101 in
pgmap v8991: 288 pgs: 288 active+clean; 2633 MB data, 11312 MB 
used, 250 TB / 250 TB avail

   mdsmap e64: 1/1/1 up {0=coe-w1-stor-db02=up:active}, 1 up:standby

apache:
172.16.11.118 - - [27/Aug/2013:11:15:21 -0400] GET 
/aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1 200 1596 - 
Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] GET / HTTP/1.1 200 1673 
- Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] GET 
/aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1 200 573 - 
Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] PUT 
/aaa/XenServer-6.2-binpkg.iso HTTP/1.1 500 377 - Cyberduck/4.3.1 
(Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:18:09 -0400] PUT 
/aaa/XenServer-6.2-binpkg.iso HTTP/1.1 200 315 - Cyberduck/4.3.1 
(Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:18:27 -0400] GET 
/aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1 200 944 - 
Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)
172.16.11.118 - - [27/Aug/2013:11:18:27 -0400] HEAD 
/aaa/XenServer-6.2-binpkg.iso HTTP/1.1 404 248 - Cyberduck/4.3.1 
(Mac OS X/10.8.4) (i386)



radosgw.log
2013-08-27 11:15:23.634295 7f1e490ec700  2 req 4:0.000457:s3:PUT 
/aaa/XenServer-6.2-binpkg.iso:put_obj:verifying op params
2013-08-27 11:15:23.634297 7f1e490ec700  2 req 4:0.000459:s3:PUT 
/aaa/XenServer-6.2-binpkg.iso:put_obj:executing
2013-08-27 11:15:31.797706 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:15:53.797852 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:15.798007 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:37.798146 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:59.798282 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:17:21.798415 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:17:43.798551 7f1e42ffd700  2 
RGWDataChangesLog::ChangesRenewThread: start

2013-08-27 11:18:05.495773 7f1e490ec700 10 x x-amz-acl:private
2013-08-27 11:18:05.495932 7f1e490ec700 20 get_obj_state: 
rctx=0x7f1dc40028b0 obj=aaa:XenServer-6.2-binpkg.iso 
state=0x7f1dc405f3a8 s->prefetch_data=0
2013-08-27 11:18:05.497367 7f1e490ec700  0 setting object 
write_tag=default.15804.4
2013-08-27 11:18:05.508505 7f1e490ec700 20 get_obj_state: 
rctx=0x7f1dc40028b0 obj=aaa:_shadow__ivTDZIBHCbNVE4p-DDeXDJHni8SArBZ_1 
state=0x7f1dc4037f88 s->prefetch_data=0
2013-08-27 11:18:05.509832 7f1e490ec700 20 get_obj_state: s->obj_tag was 
set empty


( bunch of those messages  )


2013-08-27 11:18:09.840315 7f1e490ec700 20 
prepare_atomic_for_write_impl: state is not atomic. state=0x7f1dc40ad4a8
2013-08-27 11:18:09.849362 7f1e490ec700 20 get_obj_state: 
rctx=0x7f1dc40028b0 obj=aaa:_shadow__ivTDZIBHCbNVE4p-DDeXDJHni8SArBZ_434 
state=0x7f1dc40ad938 s->prefetch_data=0
2013-08-27 11:18:09.850700 7f1e490ec700 20 get_obj_state: s->obj_tag was 
set empty
2013-08-27 11:18:09.850705 7f1e490ec700 20 
prepare_atomic_for_write_impl: state is not atomic. state=0x7f1dc40ad938
2013-08-27 11:18:09.859304 7f1e490ec700 20 get_obj_state: 
rctx=0x7f1dc40028b0 obj=aaa:_shadow__ivTDZIBHCbNVE4p-DDeXDJHni8SArBZ_435 
state=0x7f1dc40adbf8 

Re: [ceph-users] Storage, File Systems and Data Scrubbing

2013-08-27 Thread Sage Weil
On Tue, 27 Aug 2013, ker can wrote:
 This was very helpful - thanks.  However, I'm still trying to reconcile this
 with something that Sage mentioned a while back on a similar topic.
 Apparently you can disable the journal if you're using btrfs.  Is that
 possible because btrfs takes care of things like atomic object writes and
 updates to the osd metadata?

It's because with btrfs we take snapshots that are consistent checkpoints.  
You *can* disable the journal, but it means that writes only commit when 
a new checkpoint is made (i.e., a snapshot), which is an infrequent and 
relatively expensive operation.. so in general the write latency is 
terrible.  This is useful only for workloads where you are doing bulk data 
injection (for example) and write latency is not important.
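
As a sketch, the btrfs checkpointing side of that is driven from the
[osd] section of ceph.conf (the option name here is from memory -- verify
it against the docs for your release before relying on it; running
journal-less amounts to not configuring a journal for the OSD at all):

cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
filestore btrfs snap = true    ; checkpoint via btrfs snapshots
EOF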

sage



 
 
 -Original Message-
 From: ceph-users-boun...@lists.ceph.com
 [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage Weil
 Sent: Thursday, July 11, 2013 8:39 PM
 To: Mark Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Turning off ceph journaling with xfs ?
 
  
 
 Note that you *can* disable the journal if you use btrfs, but your write
 latency will tend to be pretty terrible.  This is only viable for
 bulk-storage use cases where throughput trumps all and latency is not an
 issue at all (it may be seconds).
 
  
 
 We are planning on eliminating the double-write for at least large writes
 when using btrfs by cloning data out of the journal and into the target
 file.  This is not a hugely complex task (although it is non-trivial) but it
 hasn't made it to the top of the priority list yet.
 
  
 
 sage
 
 
 
 On Mon, Aug 26, 2013 at 4:05 PM, Samuel Just sam.j...@inktank.com wrote:
   ceph-osd builds a transactional interface on top of the usual
   posix
   operations so that we can do things like atomically perform an
   object
   write and update the osd metadata.  The current implementation
   requires our own journal and some metadata ordering (which is
   provided
   by the backing filesystem's own journal) to implement our own
   atomic
   operations.  It's true that in some cases you might be able to
   get
   away with having the client replay the operation (which we do
   anyway
   for other reasons), but that wouldn't be enough to ensure
   consistency
   of the filesystem's own internal structures.  It also wouldn't
   be
   enough to ensure that the OSD's internal structure remain
   consistent
   in the case of a crash.  Also, if the client is unavailable to
   do the
   replay, you'd have a problem.
 
   In summary, it's actually really hard to to detect
   partial/corrupted
   writes after a crash without journaling of some form.
   -Sam
 
 
 


[ceph-users] Limitations of Ceph

2013-08-27 Thread Guido Winkelmann
Hi,

I have been running a small Ceph cluster for experimentation for a while, and 
now my employer has asked me to do a little talk about my findings, and one 
important part is, of course, going to be the practical limitations of Ceph.

Here is my list so far:

- Ceph is not supported by VMWare ESX. That may change in the future, but 
seeing how VMWare is now owned by EMC, they might make it a political decision 
to not support Ceph.
Apparently, you can import an RBD volume on linux server and then reexport it 
to a VMWare host as an iSCSI target, but doing so would introduce a bottleneck 
and a single point of failure, which kind of defeats the purpose of having a 
Ceph cluster in the first place.

- Ceph is not supported by Windows clients, or even, as far as I can tell, 
anything that isn't a very recent version of Linux. (User space only clients 
work in some cases.)

- There is no dynamic tiered storage, and there probably never will be, if I 
understand the architecture correctly.
You can have different pools with different performance characteristics (like 
one on cheap and large 7200 RPM disks, and another on SSDs), but once you have 
put a given bunch of data on one pool, it is pretty much stuck there. (I.e. 
you cannot move it to another pool without very tight and very manual 
coordination with all clients using it.)

- There is no active data deduplication, and, again, if I understand the 
architecture correctly, there probably never will be.
There is, however, sparse allocation and COW-cloning for RBD volumes, which 
does something similar. Under certain conditions, it is even possible to use 
the discard option of modern filesystems to automatically keep unused regions 
of an RBD volume sparse.
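
For example, with a filesystem on a kernel-mapped RBD volume (the device
and mountpoint are assumed):

mount -o discard /dev/rbd0 /mnt/vol   # discard extents online as files are deleted
fstrim /mnt/vol                       # or trim in batches instead of per-delete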

- Bad support for multiple customers accessing the same cluster.
This is assuming that, if you have multiple customers, it is imperative that 
any one given customer must be unable to access or even modify the data of any 
other customer. You can have authorization on the pool layer, but it has been 
reported that Ceph reacts badly to defining a large number of pools.
Multi-customer support in CephFS is non-existent.
RadosGW probably supports multi-customer, but I haven't tried it.

- No dynamic partitioning for CephFS
The original paper talked about dynamic partitioning of the CephFS namespace, so 
that multiple Metadata Servers could share the workload of a large number of 
CephFS clients. This isn't implemented yet (or implemented but not working 
properly?), and the only currently supported multi-MDS configuration is 1 active 
/ n standby. This limits the scalability of CephFS. It looks to me like CephFS 
is not a major focus of the development team at this time.

Can you give me some comments on that? Am I totally in the wrong on some of 
those points? Have I forgotten some important limitation?

Regards,

Guido


Re: [ceph-users] Problems with keyrings during deployment

2013-08-27 Thread Alfredo Deza
On Tue, Aug 27, 2013 at 12:30 PM, Francesc Alted franc...@continuum.io wrote:

 On Tue, Aug 27, 2013 at 6:25 PM, Alfredo Deza alfredo.d...@inktank.com wrote:




 On Tue, Aug 27, 2013 at 12:04 PM, Francesc Alted 
 franc...@continuum.io wrote:

 On Tue, Aug 27, 2013 at 5:29 PM, Sage Weil s...@inktank.com wrote:

 On Tue, 27 Aug 2013, Francesc Alted wrote:
  Hi again,
 
  I continue to try debugging the problem reported before.  Now, I have
 been
  trying to use a couple of VM for doing this (one with Ubuntu 12.04
 64-bit,
  and the other with Ubuntu 12.10 64-bit, and I use the ceph.com repos
 for
  installing the Ceph libraries).  And, unfortunately, I am getting
 into the
  same problem: the keyring do not appear where they should (i.e.
  bootstrap-mds and bootstrap-osd in /var/lib/ceph).
 
  I have followed the preflight check list
  (http://ceph.com/docs/next/start/quick-start-preflight/), and the
 ceph user
  in the admin box can login perfectly well on the server box, so not
 sure
  what's going on here.
 
  I have even tried to use a single ceph server for installing
 everything
  (adding the 'osd crush chooseleaf type = 0' line into the ceph conf
 file)
  but then again the keyrings do not appear.
 
  Nobody is having the same problems than me (using latest Ceph Dumpling
  0.67.2 release here)?
 
  Thanks for any insight!

 There are several possible pitfalls here; the missing keys are just the
 most visible symptom of the monitors not forming an initial quorum.

 Can you post the contents of your ceph.conf and output from 'ceph daemon
 mon.`hostname` mon_status' on each of the mon nodes?


 Okay, I tracked down my problem.  It turned out that I was setting
 different names for the ceph servers in /etc/hosts than their own
 `hostname`.  These log lines when creating the monitor gave me the clue:

 [ceph-server2][INFO  ] creating keyring file:
 /var/lib/ceph/tmp/ceph-vagrant.mon.keyring
 [ceph-server2][INFO  ] create the monitor keyring file
 [ceph-server2][INFO  ] Running command: ceph-mon --cluster ceph --mkfs
 -i vagrant --keyring /var/lib/ceph/tmp/ceph-vagrant.mon.keyring
 [ceph-server2][INFO  ] ceph-mon: mon.noname-a 192.168.33.11:6789/0 is
 local, renaming to mon.vagrant
 [ceph-server2][INFO  ] ceph-mon: set fsid to
 253c5a74-699b-44ef-a071-5883716fa620

 I was calling this 'vagrant' hostname 'ceph-server2' in my /etc/hosts
 and I realized this was fooling ceph.  So I changed all my /etc/hosts to
 follow the original hostnames (changed to 'quantal64'), and boom! everything
 works as intended:

 [quantal64][INFO  ] creating keyring file:
 /var/lib/ceph/tmp/ceph-quantal64.mon.keyring
 [quantal64][INFO  ] create the monitor keyring file
 [quantal64][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i
 quantal64 --keyring /var/lib/ceph/tmp/ceph-quantal64.mon.keyring
 [quantal64][INFO  ] ceph-mon: mon.noname-a 192.168.33.11:6789/0 is
 local, renaming to mon.quantal64
 [quantal64][INFO  ] ceph-mon: set fsid to
 96c48ec5-7dd5-4f76-81f9-4fdc711a76f0

 Now I can gather the keys normally:

 $ ceph-deploy gatherkeys quantal64
 [ceph_deploy.gatherkeys][DEBUG ] Checking quantal64 for
 /etc/ceph/ceph.client.admin.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Got ceph.client.admin.keyring key from
 quantal64.
 [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Checking quantal64 for
 /var/lib/ceph/bootstrap-osd/ceph.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-osd.keyring key from
 quantal64.
 [ceph_deploy.gatherkeys][DEBUG ] Checking quantal64 for
 /var/lib/ceph/bootstrap-mds/ceph.keyring
 [ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-mds.keyring key from
 quantal64.

 Well, thanks anyway.  Now it is time to make some more progress and
 create some OSDs :)


 Francesc, thanks for pasting this log info, it is useful to know what
 worked for you :) I will update the docs for ceph-deploy
 with things to watch out for so that there is *something* users can try when
 this comes up.


 No problem.  A possible idea for enhancing the self-detection of
 problems would be to implement a check in ceph-deploy (or in
 another place) that warns (or just gives an error) when it detects that the
 hostname differs depending on whether you do a DNS lookup or read the
 `hostname` output.
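 
 Something as simple as this sketch would catch the mismatch before the
 monitor is created:
 
 h=$(hostname)
 ip=$(hostname -i)
 dns=$(getent hosts "$ip" | awk '{print $2}')
 if [ "$h" != "$dns" ]; then
     echo "WARNING: hostname '$h' does not match DNS name '$dns' for $ip" >&2
 fi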


I went ahead and created http://tracker.ceph.com/issues/6132 to track this.

Thanks again.


 --
 Francesc Alted


Re: [ceph-users] Limitations of Ceph

2013-08-27 Thread Sage Weil
Hi Guido!

On Tue, 27 Aug 2013, Guido Winkelmann wrote:
 Hi,
 
 I have been running a small Ceph cluster for experimentation for a while, and 
 now my employer has asked me to do little talk about my findings, and one 
 important part is, of course, going to be practical limitations of Ceph.
 
 Here is my list so far:
 
 - Ceph is not supported by VMWare ESX. That may change in the future, but 
 seeing how VMWare is now owned by EMC, they might make it a political 
 decision 
 to not support Ceph.
 Apparently, you can import an RBD volume on linux server and then reexport it 
 to a VMWare host as an iSCSI target, but doing so would introduce a 
 bottleneck 
 and a single point of failure, which kind of defeats the purpose of having a 
 Ceph cluster in the first place.

It will be a challenge to make ESX natively support RBD as RBD is open 
source (ESX is proprietary), ESX is (I think) based on a *BSD kernel, and 
VMWare just announced a possibly competitive product.  Inktank is doing 
what it can.

Meanwhile, we are pursuing a robust iSCSI solution. Sadly this will 
require a traditional HA failover setup, but that's how the cookie 
crumbles when you use legacy protocols.

 - Ceph is not supported by Windows clients, or even, as far as I can tell, 
 anything that isn't a very recent version of Linux. (User space only clients 
 work in some cases.)

There is ongoing work here; nothing to announce yet.

 - There is no dynamic tiered storage, and there probably never will be, if I 
 understand the architecture correctly.
 You can have different pools with different perfomance characteristics (like 
 one on cheap and large 7200 RPM disks, and another on SSDs), but once you 
 have 
 put a given bunch of data on one pool, it is pretty much stuck there. (I.e. 
 you cannot move it to another pool without very tight and very manual 
 coordination with all clients using it.)

This is a key item on the roadmap for Emperor (nov) and Firefly (feb).  
We are building two capabilities: 'cache pools' that let you put fast 
storage in front of your main data pool, and a tiered 'cold' pool that 
lets you bleed cold objects off to a cheaper, slower tier (probably using 
erasure coding.. which is also coming in firefly).

 - There is no active data deduplication, and, again, if I understand the 
 architecture correctly, there probably never will be.
 There is, however, sparse allocation and COW-cloning for RBD volumes, which 
 does something similar. Under certain conditions, it is even possible to use 
 the discard option of modern filesystems to automatically keep unused regions 
 of an RBD volume sparse.

You can do two things:

- Do dedup inside an osd.  Btrfs is growing this capability, and ZFS 
already has it.  This is not ideal because data is randomly distributed 
across nodes.

- You can build dedup on top of rados, for example by naming objects after 
a hash of their content.  This will never be a 'magic and transparent 
dedup for all rados apps' because CAS is based on naming objects from 
content, and rados fundamentally places data based on name and eschews 
metadata.  That means there isn't normally a way to point to the content 
unless there is some MDS on top of rados.  Someday CephFS will get this, 
but raw librados users and RBD won't get it for free.

 - Bad support for multiple customers accessing the same cluster.
 This is assuming that, if you have multiple customers, it is imperative that 
 any one given customer must be unable to access or even modify the data of 
 any 
 other customer. You can have authorization on the pool layer, but it has been 
 reported that Ceph reacts badly to defining a large number of pools.
 Multi-customer support in CephFS is non-existant.
 RadosGW probably supports multi-customer, but I haven't tried it.

The just-released Dumpling included support for rados namespaces, which 
are designed to address exactly this issue.  Namespaces exist inside 
pools, and the auth capabilities can restrict access to a specific 
namespace.
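
In cap form that restriction looks roughly like this (the namespace
clause is from memory -- check the exact syntax against the docs for your
release; the client and pool names are just examples):

ceph auth get-or-create client.customer1 \
    mon 'allow r' \
    osd 'allow rw pool=shared namespace=customer1'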

 - No dynamic partitioning for CephFS
 The original paper talked about dynamic partioning of the CephFS namespace, 
 so 
 that multiple Metadata Servers could share the workload of a large number of 
 CephFS clients. This isn't implemented yet (or implemented but not working 
 properly?), and the only currently support multi-MDS configuration is 1 
 active 
 / n standby. This limits the scalability of CephFS. It looks to me like 
 CephFS 
 is not a major focus of the development team at this time.

This has been implemented since ~2006.  We do not recommend it for 
production because it has not had the QA attention it deserves.  That 
said, Zheng Yan has been doing a lot of great work here recently and 
things have improved considerably.  Please try it!  You just need to do 
'ceph mds set_max_mds 3' (or whatever) to tell ceph how many active 
ceph-mds daemons you want.

Hope that helps!

sage

Re: [ceph-users] Administering a ceph cluster

2013-08-27 Thread John Wilkins
This is an error in the docs.  Upstart jobs apply to each node. I've
updated the docs to reflect this understanding. When deployed as a
service with the -a option, ceph would start daemons across nodes.
With upstart, you need to start and stop by invoking upstart on each
node.
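
So, per node, it is along these lines (the daemon ids are assumed):

sudo start ceph-all               # start every ceph daemon configured on this node
sudo stop ceph-all                # and stop them again
sudo start ceph-osd id=1          # or drive an individual daemon
sudo status ceph-mon id=quantal64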

On Tue, Aug 27, 2013 at 10:03 AM, Francesc Alted franc...@continuum.io wrote:
 Hi,

 So I have already set up a shiny new Ceph cluster (in one single machine,
 quantal64, administered from another machine, precise64). Now, for operating
 the cluster, I am a bit unsure on how to interpret the docs in
 http://ceph.com/docs/next/rados/operations/operating/.  My interpretation is
 that I should start the cluster from the *admin* node, right?  But once I
 have done this in precise64 (via `sudo start ceph-all`), I try to see the
 status of it with the `ceph` command and I am getting this:

 $ ceph
 2013-08-27 16:50:35.946904 7f43d44c6700  1 -- :/0 messenger.start
 2013-08-27 16:50:35.947392 7f43d44c6700 -1 monclient(hunting): ERROR:
 missing keyring, cannot use cephx for authentication
 2013-08-27 16:50:35.947410 7f43d44c6700  0 librados: client.admin
 initialization error (2) No such file or directory
 2013-08-27 16:50:35.947444 7f43d44c6700  1 -- :/1020622 mark_down_all
 2013-08-27 16:50:35.947604 7f43d44c6700  1 -- :/1020622 shutdown complete.
 Error connecting to cluster: ObjectNotFound

 Then, I tried to start the cluster right on the 'cluster' machine (quantal64),
 but I am getting the same error on the admin machine.  Here are the
 contents of my 'my-cluster' directory on the admin machine:

 vagrant@precise64:~/my-cluster$ ls
 ceph.bootstrap-mds.keyring  ceph.bootstrap-osd.keyring
 ceph.client.admin.keyring  ceph.conf  ceph.log  ceph.mon.keyring

 and my ceph.conf contents:

 $ cat ceph.conf
 [global]
 fsid = 64b3090b-a692-4993-98a0-ba3e0bedd7db
 mon initial members = quantal64
 mon host = 192.168.33.11
 auth supported = cephx
 osd journal size = 1024
 filestore xattr use omap = true

 [osd.1]
 host = quantal64

 Am I doing something wrong?

 Thanks,

 --
 Francesc Alted




-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com


Re: [ceph-users] Limitations of Ceph

2013-08-27 Thread Guido Winkelmann
Hi Sage,

Thanks for your comments, much appreciated.

Am Dienstag, 27. August 2013, 10:19:46 schrieb Sage Weil:
 Hi Guido!
 
 On Tue, 27 Aug 2013, Guido Winkelmann wrote:
[...]
  - There is no dynamic tiered storage, and there probably never will be, if
  I understand the architecture correctly.
  You can have different pools with different perfomance characteristics
  (like one on cheap and large 7200 RPM disks, and another on SSDs), but
  once you have put a given bunch of data on one pool, it is pretty much
  stuck there. (I.e. you cannot move it to another pool without very tight
  and very manual coordination with all clients using it.)
 
 This is a key item on the roadmap for Emperor (nov) and Firefly (feb).
 We are building two capabilities: 'cache pools' that let you put fast
 storage in front of your main data pool, and a tiered 'cold' pool that
 lets you bleed cold objects off to a cheaper, slower tier

Sounds interesting.
Will that work on entire PGs or on single objects? How do you keep track of 
which object lies on what pool without resorting to a lookup step before every 
operation? Will that feature retain backwards compatibility with older Ceph 
clients?

 (probably using erasure coding.. which is also coming in firefly).

... which happens to address another issue I forgot to mention

  - There is no active data deduplication, and, again, if I understand the
  architecture correctly, there probably never will be.
  There is, however, sparse allocation and COW-cloning for RBD volumes,
  which does something similar. Under certain conditions, it is even
  possible to use the discard option of modern filesystems to automatically
  keep unused regions of an RBD volume sparse.
 
 You can do two things:
 
 - Do dedup inside an osd.  Btrfs is growing this capability, and ZFS
 already has it.  This is not ideal because data is random distributed
 across nodes.
 
 - You can build dedup on top of rados, for example by naming objects after
 a hash of their content.  This will never be a 'magic and transparent
 dedup for all rados apps' because CAS is based on naming objects from
 content, and rados fundamentally places data based on name and eschews
 metadata.  That means there isn't normally a way to point to the content
 unless there is some MDS on top of rados.  Someday CephFS will get this,
 but raw librados users and RBD won't get it for free.

I read that as TL;DR: No real deduplication.
 
  - Bad support for multiple customers accessing the same cluster.
  This is assuming that, if you have multiple customers, it is imperative
  that any one given customer must be unable to access or even modify the
  data of any other customer. You can have authorization on the pool layer,
  but it has been reported that Ceph reacts badly to defining a large
  number of pools. Multi-customer support in CephFS is non-existent.
  RadosGW probably supports multi-customer, but I haven't tried it.
 
 The just-released Dumpling included support for rados namespaces, which
 are designed to address exactly this issue.  Namespaces exist inside
 pools, and the auth capabilities can restrict access to a specific
 namespace.

I'm having some trouble finding this in the documentation. Can you give me a 
pointer here?
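
From the release notes, I would expect the capability syntax to look roughly 
like this (assumed and untested on my side):

 ceph auth get-or-create client.tenant1 \
     mon 'allow r' \
     osd 'allow rw pool=data namespace=tenant1'

i.e. the key can only touch objects inside its own namespace of the shared 
pool.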
 
  - No dynamic partitioning for CephFS
  The original paper talked about dynamic partitioning of the CephFS
  namespace, so that multiple Metadata Servers could share the workload of
  a large number of CephFS clients. This isn't implemented yet (or
  implemented but not working properly?), and the only currently supported
  multi-MDS configuration is 1 active / n standby. This limits the
  scalability of CephFS. It looks to me like CephFS is not a major focus of
  the development team at this time.
 
 This has been implemented since ~2006.  We do not recommend it for
 production because it has not had the QA attention it deserves.  That
 said, Zheng Yan has been doing a lot of great work here recently and
 things have improved considerably.  Please try it!  You just need to do
 'ceph mds set_max_mds 3' (or whatever) to tell ceph how many active
 ceph-mds daemons you want.

Okay, I think I will try this.
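
Presumably the test is just (sketch):

 ceph mds set_max_mds 3
 ceph mds stat    # should eventually report three active MDS daemons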

Guido

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fedora 18 Qemu

2013-08-27 Thread Joe Ryner
Does anyone have the best patches for Fedora 18 qemu that fix the aio issues?  I
have built my own but am having mixed results.

It's qemu 1.2.2.

Or would it be better to jump to Fedora 19?  I am running Fedora 18 in the hope
that RHEL 7 will be based on it.

Thanks,
Joe

-- 
Joe Ryner
Center for the Application of Information Technologies (CAIT)
Production Coordinator
P: (309) 298-1804
F: (309) 298-2806
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem uploading 1GB and up file

2013-08-27 Thread Craig Lewis
Since you mention a problem starting at 1GB, check to see if you have a 
LimitRequestBody directive 
(http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody)


   The LimitRequestBody directive allows the user to set a limit on the
   allowed size of an HTTP request message body within the context in
   which the directive is given.



To me, this doesn't look like an Apache problem.  If it were 
LimitRequestBody, Apache would deny the request, and RadosGW would 
never see it.


But it's quick and easy to verify that this isn't a problem.  I'd check 
the other LimitRequest parameters too.
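
Something like this in the gateway vhost would rule it out (a sketch; 0 
disables the body-size check entirely, and 2147483647 is the directive's 
maximum value):

<VirtualHost *:80>
    ServerName gw.example.com      # hypothetical
    LimitRequestBody 0
</VirtualHost>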





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com mailto:cle...@centraldesktop.com

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website http://www.centraldesktop.com/  | Twitter 
http://www.twitter.com/centraldesktop  | Facebook 
http://www.facebook.com/CentralDesktop  | LinkedIn 
http://www.linkedin.com/groups?gid=147417  | Blog 
http://cdblog.centraldesktop.com/


On 8/27/13 08:54 , Marc-Andre Jutras wrote:

Hello,

I got a weird upload issue with Ceph dumpling (0.67.2), and I don't 
know if someone can help me pinpoint my problem...


Basically, if I try to upload a 1 GB file, as soon as the upload is 
completed Apache returns a 500 error. There is no problem if I upload a 
900 MB file or less; I only get this specific problem with files bigger 
than 1 GB!


I also have two Apache servers in place, one with the modified fastcgi 
module and the other one without. Both servers show the same 
issue/behavior...


One thing I noticed: with a file of 1 GB or more, radosgw throws me a 
bunch of these errors (I am not getting these with a file size of 
900 MB or less):


2013-08-27 11:18:09.850700 7f1e490ec700 20 get_obj_state: s->obj_tag was set empty
2013-08-27 11:18:09.850705 7f1e490ec700 20 prepare_atomic_for_write_impl: state is not atomic. state=0x7f1dc40ad938
2013-08-27 11:18:09.859304 7f1e490ec700 20 get_obj_state: rctx=0x7f1dc40028b0 obj=aaa:_shadow__ivTDZIBHCbNVE4p-DDeXDJHni8SArBZ_435 state=0x7f1dc40adbf8 s->prefetch_data=0

(...)
2013-08-27 11:18:09.878697 7f1e490ec700  0 WARNING: set_req_state_err 
err_no=27 resorting to 500


I am not seeing any other errors: no OSD errors, no MDS errors... 
I can't find anything on Google related to this warning msg: 
set_req_state_err err_no=27
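
The only lead I have: errno 27 on Linux is EFBIG (File too large), so I 
suspect a size cap is being hit somewhere on the rados/rgw side rather than 
in Apache. For decoding errno values (a sketch, python2):

python -c 'import errno, os; print errno.errorcode[27], os.strerror(27)'
# EFBIG File too large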


any clue ??

Thanks
M-A

= = =

ceph -v
ceph version 0.67.2 (eb4380dd036a0b644c6283869911d615ed729ac8)

 ceph status
  cluster eb16413a----f23fddd6a5f6
   health HEALTH_OK
   monmap e1: 2 mons at {coe-w1-stor-db01=10.150.2.101:6789/0,coe-w1-stor-db02=10.150.2.102:6789/0}, election epoch 30, quorum 0,1 coe-w1-stor-db01,coe-w1-stor-db02
   osdmap e518: 101 osds: 101 up, 101 in
   pgmap v8991: 288 pgs: 288 active+clean; 2633 MB data, 11312 MB used, 250 TB / 250 TB avail
   mdsmap e64: 1/1/1 up {0=coe-w1-stor-db02=up:active}, 1 up:standby

apache:
172.16.11.118 - - [27/Aug/2013:11:15:21 -0400] "GET /aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1" 200 1596 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] "GET / HTTP/1.1" 200 1673 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] "GET /aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1" 200 573 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:15:23 -0400] "PUT /aaa/XenServer-6.2-binpkg.iso HTTP/1.1" 500 377 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:18:09 -0400] "PUT /aaa/XenServer-6.2-binpkg.iso HTTP/1.1" 200 315 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:18:27 -0400] "GET /aaa/?delimiter=%2F&max-keys=1000&prefix HTTP/1.1" 200 944 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"
172.16.11.118 - - [27/Aug/2013:11:18:27 -0400] "HEAD /aaa/XenServer-6.2-binpkg.iso HTTP/1.1" 404 248 "-" "Cyberduck/4.3.1 (Mac OS X/10.8.4) (i386)"



radosgw.log
2013-08-27 11:15:23.634295 7f1e490ec700  2 req 4:0.000457:s3:PUT /aaa/XenServer-6.2-binpkg.iso:put_obj:verifying op params
2013-08-27 11:15:23.634297 7f1e490ec700  2 req 4:0.000459:s3:PUT /aaa/XenServer-6.2-binpkg.iso:put_obj:executing
2013-08-27 11:15:31.797706 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:15:53.797852 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:15.798007 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:37.798146 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:16:59.798282 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:17:21.798415 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-08-27 11:17:43.798551 7f1e42ffd700  2 RGWDataChangesLog::ChangesRenewThread: start

2013-08-27 11:18:05.495773 7f1e490ec700 10 x x-amz-acl:private
2013-08-27 11:18:05.495932 7f1e490ec700 20 get_obj_state: 

Re: [ceph-users] Limitations of Ceph

2013-08-27 Thread Neil Levine
On Tue, Aug 27, 2013 at 10:19 AM, Sage Weil s...@inktank.com wrote:

 Hi Guido!

 On Tue, 27 Aug 2013, Guido Winkelmann wrote:
  Hi,
 
  I have been running a small Ceph cluster for experimentation for a while,
  and now my employer has asked me to do a little talk about my findings,
  and one important part is, of course, going to be practical limitations
  of Ceph.
 
  Here is my list so far:
 
  - Ceph is not supported by VMWare ESX. That may change in the future, but
  seeing how VMWare is now owned by EMC, they might make it a political
  decision to not support Ceph.
  Apparently, you can import an RBD volume on a Linux server and then
  re-export it to a VMWare host as an iSCSI target, but doing so would
  introduce a bottleneck and a single point of failure, which kind of
  defeats the purpose of having a Ceph cluster in the first place.

 It will be a challenge to make ESX natively support RBD as RBD is open
 source (ESX is proprietary), ESX is (I think) based on a *BSD kernel, and
 VMWare just announced a possibly competitive product.  Inktank is doing
 what it can.

To add some context to this, my current understanding is that VMware do
provide mechanisms to add plugins to ESX but a formal partnership is needed
for those plugins to be signed and certified. As such, the challenge is more
commercial than technical. Inktank are in conversations with VMware, but if
you are interested in seeing support, please tell your VMware account rep
and let us know so we can demonstrate the customer demand for this.

VMware partner with multiple storage companies (as evidenced by the number
of storage vendors at VMWorld this week) so the fact that they have
launched vSAN and are owned by EMC is not a commercial barrier. The ESX
business unit want to sell as many licenses as possible and so a good
storage ecosystem is critical to them.

On the Windows side, as Sage said, watch this space. :-)

Neil
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Geek On Duty: Hours Update

2013-08-27 Thread Ross David Turk

Hey all!  We’ve moved our Friday “Geek on Duty” shift from 1pm PT to
8am PT.  You can see the new calendar here:

http://ceph.com/help/community/

We’re a bit light on US-friendly shifts at the moment.  If anyone wants
to volunteer to be a Geek on Duty, tell commun...@ceph.com and we’ll set
you up!

In exchange for your time,
  your company gets: a nice, cozy spot on ceph.com
  you get: to build your skills by helping people :)

Cheers,
Ross

--
Ross Turk
Community, Inktank

@rossturk @inktank @ceph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some help needed with ceph deployment

2013-08-27 Thread Sage Weil
 ]},
     { "pgid": "2.31", "osds": [3, 1]},
     { "pgid": "2.32", "osds": [2, 1]},
     { "pgid": "2.33", "osds": [3, 0]},
     { "pgid": "2.35", "osds": [3, 2]},
     { "pgid": "2.36", "osds": [1, 0]},
     { "pgid": "2.38", "osds": [1, 0]},
     { "pgid": "2.3a", "osds": [3, 1]},
     { "pgid": "2.3c", "osds": [4, 0]},
     { "pgid": "2.3d", "osds": [2, 0]},
     { "pgid": "2.3e", "osds": [1, 0]},
     { "pgid": "2.3f", "osds": [4, 1]}],
   "blacklist": []}
 
  
 
 
 
 
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Could you add me to the subscribe list?

2013-08-27 Thread sriram

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] xfsprogs not found in RHEL

2013-08-27 Thread sriram
I am trying to install CEPH and I get the following error -

--- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
-- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
--- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed
--- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6
will be installed
--- Package python-docutils.noarch 0:0.6-1.el6 will be installed
-- Processing Dependency: python-imaging for package:
python-docutils-0.6-1.el6.noarch
--- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed
--- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed
--- Package python-six.noarch 0:1.1.0-2.el6 will be installed
-- Running transaction check
--- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
-- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
--- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
-- Finished Dependency Resolution
Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph)
   Requires: xfsprogs


Machine Info -

Linux version 2.6.32-131.4.1.el6.x86_64 (
mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red
Hat 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xfsprogs not found in RHEL

2013-08-27 Thread Lincoln Bryant
Hi,

xfsprogs should be included in the EL6 base.

Perhaps run yum clean all and try again?
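
If that doesn't help, this should show which repo (if any) is expected to 
carry the package (a sketch):

yum provides xfsprogs
yum provides '*/mkfs.xfs'

If I remember right, on RHEL proper xfsprogs comes from the Scalable File 
System add-on channel, while CentOS ships it in base.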

Cheers,
Lincoln

On Aug 27, 2013, at 9:16 PM, sriram wrote:

 I am trying to install CEPH and I get the following error - 
 
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed
 --- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6 will 
 be installed
 --- Package python-docutils.noarch 0:0.6-1.el6 will be installed
 -- Processing Dependency: python-imaging for package: 
 python-docutils-0.6-1.el6.noarch
 --- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed
 --- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed
 --- Package python-six.noarch 0:1.1.0-2.el6 will be installed
 -- Running transaction check
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
 -- Finished Dependency Resolution
 Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph)
Requires: xfsprogs
 
 
 Machine Info - 
 
 Linux version 2.6.32-131.4.1.el6.x86_64 
 (mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 
 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xfsprogs not found in RHEL

2013-08-27 Thread sriram
Tried

yum clean all followed by
yum install ceph

and the same result.


On Tue, Aug 27, 2013 at 7:44 PM, Lincoln Bryant linco...@uchicago.eduwrote:

 Hi,

 xfsprogs should be included in the EL6 base.

 Perhaps run yum clean all and try again?

 Cheers,
 Lincoln

 On Aug 27, 2013, at 9:16 PM, sriram wrote:

 I am trying to install CEPH and I get the following error -

 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-babel.noarch 0:0.9.4-5.1.el6 will be installed
 --- Package python-backports-ssl_match_hostname.noarch 0:3.2-0.3.a3.el6
 will be installed
 --- Package python-docutils.noarch 0:0.6-1.el6 will be installed
 -- Processing Dependency: python-imaging for package:
 python-docutils-0.6-1.el6.noarch
 --- Package python-jinja2.x86_64 0:2.2.1-1.el6 will be installed
 --- Package python-pygments.noarch 0:1.1.1-1.el6 will be installed
 --- Package python-six.noarch 0:1.1.0-2.el6 will be installed
 -- Running transaction check
 --- Package ceph.x86_64 0:0.67.2-0.el6 will be installed
 -- Processing Dependency: xfsprogs for package: ceph-0.67.2-0.el6.x86_64
 --- Package python-imaging.x86_64 0:1.1.6-19.el6 will be installed
 -- Finished Dependency Resolution
 Error: Package: ceph-0.67.2-0.el6.x86_64 (ceph)
Requires: xfsprogs


 Machine Info -

 Linux version 2.6.32-131.4.1.el6.x86_64 (
 mockbu...@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red
 Hat 4.4.5-6) (GCC) ) #1 SMP Fri Jun 10 10:54:26 EDT 2011
  ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com