[ceph-users] How do you replace an OSD?

2013-08-13 Thread Dmitry Postrigan

I just got my small Ceph cluster running. I run 6 OSDs on the same server to 
basically replace mdraid.

I have tried to simulate a hard drive (OSD) failure: removed the OSD 
(out+stop), zapped it, and then
prepared and activated it. It worked, but I ended up with one extra OSD (and 
the old one still showing in the ceph -w output).
I guess this is not how I am supposed to do it?
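For reference, the sequence described above was roughly the following (a hedged
sketch: osd.5, ceph001 and /dev/sdf are made-up placeholders, and the exact
service command depends on the init system):

ceph osd out 5                        # stop placing data on the OSD
sudo service ceph stop osd.5          # on Ubuntu upstart: sudo stop ceph-osd id=5
ceph-deploy disk zap ceph001:sdf      # wipe the disk
ceph-deploy osd prepare ceph001:sdf   # prepare/activate it again

Note that prepare registers a brand-new OSD id; the old id stays in the CRUSH map
and osdmap until it is removed explicitly (ceph osd crush remove osd.5, ceph auth
del osd.5, ceph osd rm 5), which is likely why an extra OSD shows up in ceph -w.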

The documentation recommends manually editing the configuration; however, there
are no osd entries in my /etc/ceph/ceph.conf.

So what would be the best way to replace a failed OSD?

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy and journal on separate disk

2013-08-13 Thread Pavel Timoschenkov
Hi.
Yes, I zapped all the disks beforehand.

More about my situation:
sdaa - one of the data disks: 3 TB, with a GPT partition table.
sda - an SSD with manually created partitions (10 GB each) for journals, with an
MBR partition table.
===
fdisk -l /dev/sda

Disk /dev/sda: 480.1 GB, 480103981056 bytes
255 heads, 63 sectors/track, 58369 cylinders, total 937703088 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00033624

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048    19531775     9764864   83  Linux
/dev/sda2        19531776    39061503     9764864   83  Linux
/dev/sda3        39061504    58593279     9765888   83  Linux
/dev/sda4        78125056    97656831     9765888   83  Linux

===

If I execute ceph-deploy osd prepare without the journal option, it's OK:


ceph@ceph-admin:~$ ceph-deploy disk zap ceph001:sdaa ceph001:sda1
[ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
[ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on ceph001

ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaa:
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
[ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal None 
activate False

root@ceph001:~# gdisk -l /dev/sdaa
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdaa: 5860533168 sectors, 2.7 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 575ACF17-756D-47EC-828B-2E0A0B8ED757
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 5860533134
Partitions will be aligned on 2048-sector boundaries
Total free space is 4061 sectors (2.0 MiB)

Number  Start (sector)    End (sector)    Size         Code  Name
   1           2099200      5860533134    2.7 TiB            ceph data
   2              2048         2097152    1023.0 MiB         ceph journal

Problems start when I try to create the journal on a separate drive:

ceph@ceph-admin:~$ ceph-deploy disk zap ceph001:sdaa ceph001:sda1
[ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
[ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on ceph001

ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa:sda1
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
ceph001:/dev/sdaa:/dev/sda1
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
[ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
[ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal 
/dev/sda1 activate False
[ceph_deploy.osd][ERROR ] ceph-disk-prepare -- /dev/sdaa /dev/sda1 returned 1
Information: Moved requested sector from 34 to 2048 in
order to align on 2048-sector boundaries.
The operation has completed successfully.
meta-data=/dev/sdaa1 isize=2048   agcount=32, agsize=22892700 blks
 =   sectsz=512   attr=2, projid32bit=0
data =   bsize=4096   blocks=732566385, imaxpct=5
 =   sunit=0  swidth=0 blks
naming   =version 2  bsize=4096   ascii-ci=0
log  =internal log   bsize=4096   blocks=357698, version=2
 =   sectsz=512   sunit=0 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same 
device as the osd data
mount: /dev/sdaa1: more filesystems detected. This should not happen,
   use -t type to explicitly specify the filesystem type or
   use wipefs(8) to clean up the device.

mount: you must specify the filesystem type
ceph-disk: Mounting filesystem failed: Command '['mount', '-o', 'noatime', 
'--', '/dev/sdaa1', '/var/lib/ceph/tmp/mnt.fZQxiz']' returned non-zero exit 
status 32

ceph-deploy: Failed to create 1 OSDs
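The mount error above points at stale filesystem signatures left on the freshly
created data partition. A possible way to check and clean that up before retrying
(a sketch, using /dev/sdaa1 from the output above; wipefs -a is destructive):

sudo wipefs /dev/sdaa1       # list any leftover filesystem/RAID signatures
sudo wipefs -a /dev/sdaa1    # erase all detected signatures (destructive!)
sudo dd if=/dev/zero of=/dev/sdaa1 bs=1M count=10   # or simply zero the first few MB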

-Original Message-
From: Samuel Just [mailto:sam.j...@inktank.com] 
Sent: Monday, August 12, 2013 11:39 PM
To: Pavel Timoschenkov
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] ceph-deploy and journal on separate disk

Did you try using ceph-deploy disk zap ceph001:sdaa first?
-Sam

On Mon, Aug 12, 2013 at 6:21 AM, Pavel Timoschenkov 
pa...@bayonetteas.onmicrosoft.com wrote:
 Hi.

 I have some problems with create journal on separate disk, using 
 ceph-deploy osd prepare command.

 When I try execute next command:

 ceph-deploy osd prepare ceph001:sdaa:sda1

 where:

 sdaa - disk for ceph data

 sda1 - partition on ssd drive for journal

 I get next errors:

 ==
 ==

 ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa:sda1

 

Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Wolfgang Hennerbichler


On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
 Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will 
 be local, so I could simply create
 6 local OSDs + a monitor, right? Is there anything I need to watch out for 
 in such configuration?

 You can do that. Although it's nice to play with and everything, I
 wouldn't recommend doing it. It will give you more pain than pleasure.
 
 How so? Care to elaborate?

Ceph is a complex system, built for clusters. It does some stuff in
software that is otherwise done in hardware (raid controllers). The
nature of the complexity of a cluster system is a lot of overhead
compared to a local raid [whatever] system, and latency of disk i/o will
naturally suffer a bit. An OSD needs about 300 MB of RAM (may vary with
your PGs), times 6 is a waste of nearly 2 GB of RAM (compared to a
local RAID). Also ceph is young, and it does indeed have some bugs. RAID
is old, and very mature. Although I rely on ceph on a production
cluster, too, it is way harder to maintain than a simple local raid.
When a disk fails in ceph you don't have to worry about your data, which
is a good thing, but you have to worry about the rebuilding (which isn't
too hard, but at least you need to know SOMETHING about ceph); with
(hardware) RAID you simply replace the disk, and it will be rebuilt.

Others will find more reasons why this is not the best idea for a
production system.

Don't get me wrong, I'm a big supporter of ceph, but only for clusters,
not for single systems.

wogri

 -jf
 
 
 --
 He who settles on the idea of the intelligent man as a static entity
 only shows himself to be a fool.
 
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.
 --Richard Stallman
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Dmitry Postrigan
 This will be a single server configuration, the goal is to replace mdraid, 
 hence I tried to use localhost
 (nothing more will be added to the cluster). Are you saying it will be less 
 fault tolerant than a RAID-10?

 Ceph is a distributed object store. If you stay within a single machine,
 keep using a local RAID solution (hardware or software).

 Why would you want to make this switch?

I do not think RAID-10 on six 3 TB disks is going to be reliable at all. I have
simulated several failures, and it looks like a rebuild will take a lot of time.
Funnily enough, during one of these experiments another drive failed, and I lost
the entire array. Good luck recovering from that...

I feel that Ceph is better than mdraid because:
1) When the ceph cluster is far from full, 'rebuilding' will be much faster
than with mdraid.
2) You can easily change the number of replicas (see the sketch after this
list).
3) When multiple disks have bad sectors, I suspect it will be much easier to
recover data with ceph than with mdraid, which may simply never finish
rebuilding.
4) If we need to migrate data over to a different server with no downtime, we
just add more OSDs, wait, and then remove the old ones :-)
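As an illustration of point 2, the replica count is a per-pool setting; a hedged
sketch (the pool name 'rbd' is just an example):

ceph osd pool set rbd size 3       # keep 3 copies of each object in pool 'rbd'
ceph osd pool set rbd min_size 2   # keep serving I/O while at least 2 copies exist
ceph osd dump | grep pool          # inspect the current per-pool replication settings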

This is my initial observation though, so please correct me if I am wrong.

Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Wolfgang Hennerbichler


On 08/13/2013 09:47 AM, Dmitry Postrigan wrote:

 Why would you want to make this switch?
 
 I do not think RAID-10 on 6 3TB disks is going to be reliable at all. I have 
 simulated several failures, and
 it looks like a rebuild will take a lot of time. Funnily, during one of these 
 experiments, another drive
 failed, and I had lost the entire array. Good luck recovering from that...

good point.

 I feel that Ceph is better than mdraid because:
 1) When ceph cluster is far from being full, 'rebuilding' will be much faster 
 vs mdraid

true

 2) You can easily change the number of replicas

true

 3) When multiple disks have bad sectors, I suspect ceph will be much easier 
 to recover data from than from
 mdraid which will simply never finish rebuilding.

maybe not true. also if you have one disk that is starting to be slow
(because of upcoming failure), ceph will slow down drastically, and you
need to find the failing disk.

 4) If we need to migrate data over to a different server with no downtime, we 
 just add more OSDs, wait, and
 then remove the old ones :-)

true. but maybe not as easy and painless as you would expect it to be.
also bear in mind that ceph needs a monitor up and running all the time.

 This is my initial observation though, so please correct me if I am wrong.

ceph is easier to maintain than most distributed systems I know, but
still harder than a local RAID. Keep that in mind.

 Dmitry

Wolfgang

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 


-- 
DI (FH) Wolfgang Hennerbichler
Software Development
Unit Advanced Computing Technologies
RISC Software GmbH
A company of the Johannes Kepler University Linz

IT-Center
Softwarepark 35
4232 Hagenberg
Austria

Phone: +43 7236 3343 245
Fax: +43 7236 3343 250
wolfgang.hennerbich...@risc-software.at
http://www.risc-software.at
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mounting a pool via fuse

2013-08-13 Thread Dzianis Kahanovich
Georg Höllrigl writes:

 I'm using ceph 0.61.7.
 
 When using ceph-fuse, I couldn't find a way, to only mount one pool.
 
 Is there a way to mount a pool - or is it simply not supported?

Do you mean mounting it as a filesystem?
It is the same as kernel-level cephfs (fuse & cephfs = same instance). You cannot
mount a pool, but you can mount the filesystem and map a pool to any point of the
filesystem (a file or directory), including the root.

First, mount ceph via the kernel client - mount -t ceph (just for cephfs tool
syntax compatibility) - for example to /mnt/ceph. Then run ceph df and look up the
pool number (not the name!); say, for example, the pool number is 10. And last:
mkdir -p /mnt/ceph/pools/pool1
cephfs /mnt/ceph/pools/pool1 set_layout -p 10

or just (for ceph's root):

cephfs /mnt/ceph set_layout -p 10

Next you can unmount the kernel-level mount and mount this point via fuse.
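In practice that last step might look something like this (a sketch; the -r
root-directory option of ceph-fuse is an assumption on my part and its exact
spelling may differ by version):

umount /mnt/ceph
mkdir -p /mnt/pool1
ceph-fuse -r /pools/pool1 /mnt/pool1   # mount only the directory carrying the pool layout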

PS for ceph developers: trying this for quotas (with ceph osd pool set-quota) is
only semi-working: on quota overflow nothing is limited, but ceph health shows a
warning. If there is no other way to enforce quotas, this may qualify as a bug,
and it stays relevant as long as a large number of pools remains a performance
limitation. So, FYI.
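For reference, the quota commands being referred to look like this (a sketch;
the pool name and limits are arbitrary):

ceph osd pool set-quota pool1 max_bytes 10737418240   # cap the pool at roughly 10 GB
ceph osd pool set-quota pool1 max_objects 100000      # and/or cap the object count
ceph health                                           # exceeding the quota currently only raises a warning here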

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Wheezy machine died with problems on osdmap

2013-08-13 Thread Giuseppe 'Gippa' Paterno'
Hi all,
my Debian 7 wheezy machine died with the following in the logs:
http://pastebin.ubuntu.com/5981058/

It's using kvm and ceph as an rbd device.
ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)

Can you please give me some advice?
Thanks,
Giuseppe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] one pg stuck with 2 unfound pieces

2013-08-13 Thread Jens-Christian Fischer
We have a cluster with 10 servers, 64 OSDs and 5 mons on them. The OSDs are 3 TB
disks, formatted with btrfs, and the servers are either on Ubuntu 12.10 or 13.04.

Recently one of the servers (13.04) stood still (due to problems with btrfs -
something we have seen a few times). I decided not to try to recover the disks,
but to reformat them with XFS. I removed the OSDs, reformatted, and re-created
them (they got the same OSD numbers).

I redid this twice (because I wrongly partitioned the disks in the first place)
and I ended up with 2 unfound pieces in one pg:

root@s2:~# ceph health details
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck unclean; recovery 
4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)
pg 0.cfa is stuck unclean for 1004252.309704, current state 
active+recovering+degraded+remapped, last acting [23,50]
pg 0.cfa is active+recovering+degraded+remapped, acting [23,50], 2 unfound
recovery 4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)


root@s2:~# ceph pg 0.cfa query

{ state: active+recovering+degraded+remapped,
  epoch: 28197,
  up: [
23,
50,
18],
  acting: [
23,
50],
  info: { pgid: 0.cfa,
  last_update: 28082'7774,
  last_complete: 23686'7083,
  log_tail: 14360'4061,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 1,
  last_epoch_started: 28197,
  last_epoch_clean: 24810,
  last_epoch_split: 0,
  same_up_since: 28195,
  same_interval_since: 28196,
  same_primary_since: 26036,
  last_scrub: 20585'6801,
  last_scrub_stamp: 2013-07-28 15:40:53.298786,
  last_deep_scrub: 20585'6801,
  last_deep_scrub_stamp: 2013-07-28 15:40:53.298786,
  last_clean_scrub_stamp: 2013-07-28 15:40:53.298786},
  stats: { version: 28082'7774,
  reported: 28197'41950,
  state: active+recovering+degraded+remapped,
  last_fresh: 2013-08-13 14:34:33.057271,
  last_change: 2013-08-13 14:34:33.057271,
  last_active: 2013-08-13 14:34:33.057271,
  last_clean: 2013-08-01 23:50:18.414082,
  last_became_active: 2013-05-29 13:10:51.366237,
  last_unstale: 2013-08-13 14:34:33.057271,
  mapping_epoch: 28195,
  log_start: 14360'4061,
  ondisk_log_start: 14360'4061,
  created: 1,
  last_epoch_clean: 24810,
  parent: 0.0,
  parent_split_bits: 0,
  last_scrub: 20585'6801,
  last_scrub_stamp: 2013-07-28 15:40:53.298786,
  last_deep_scrub: 20585'6801,
  last_deep_scrub_stamp: 2013-07-28 15:40:53.298786,
  last_clean_scrub_stamp: 2013-07-28 15:40:53.298786,
  log_size: 0,
  ondisk_log_size: 0,
  stats_invalid: 0,
  stat_sum: { num_bytes: 145307402,
  num_objects: 2234,
  num_object_clones: 0,
  num_object_copies: 0,
  num_objects_missing_on_primary: 0,
  num_objects_degraded: 0,
  num_objects_unfound: 0,
  num_read: 744,
  num_read_kb: 410184,
  num_write: 7774,
  num_write_kb: 1155438,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 3998,
  num_bytes_recovered: 278803622,
  num_keys_recovered: 0},
  stat_cat_sum: {},
  up: [
23,
50,
18],
  acting: [
23,
50]},
  empty: 0,
  dne: 0,
  incomplete: 0,
  last_epoch_started: 28197},
  recovery_state: [
{ name: Started\/Primary\/Active,
  enter_time: 2013-08-13 14:34:33.026698,
  might_have_unfound: [
{ osd: 9,
  status: querying},
{ osd: 18,
  status: querying},
{ osd: 50,
  status: already probed}],
  recovery_progress: { backfill_target: 50,
  waiting_on_backfill: 0,
  backfill_pos: 96220cfa\/1799e82.\/head\/\/0,
  backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  peer_backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  backfills_in_flight: [],
  pull_from_peer: [],
  pushing: []},
  scrub: { scrubber.epoch_start: 0,
  scrubber.active: 0,
  scrubber.block_writes: 0,
  scrubber.finalizing: 0,
  scrubber.waiting_on: 0,
  scrubber.waiting_on_whom: []}},
{ name: Started,
  enter_time: 2013-08-13 14:34:32.024282}]}

I have tried to mark those two pieces as lost, but ceph wouldn't let me (due to 
the fact that it is 

Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Mark Nelson

On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:

I am currently installing some backup servers with 6x3TB drives in them. I 
played with RAID-10 but I was not
impressed at all with how it performs during a recovery.

Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will be 
local, so I could simply create
6 local OSDs + a monitor, right? Is there anything I need to watch out for in 
such configuration?



You can do that. Although it's nice to play with and everything, I
wouldn't recommend doing it. It will give you more pain than pleasure.


Any specific reason? I just got it up and running, and after simulating some
failures, I like it much better than mdraid. Again, this only applies to large
arrays (6x 3TB in my case). I would not use ceph to replace a RAID-1 array of
course, but it looks like a good idea to replace a large RAID-10 array with a
local ceph installation.

The only thing I do not enjoy about ceph is performance. Probably need to do 
more tweaking, but so far numbers
are not very impressive. I have two exactly same servers running same OS, 
kernel, etc. Each server has 6x 3TB
drives (same model and firmware #).

Server 1 runs ceph (2 replicas)
Server 2 runs mdraid (raid-10)

I ran some very basic benchmarks on both servers:

dd if=/dev/zero of=/storage/test.bin bs=1M count=10
Ceph: 113 MB/s
mdraid: 467 MB/s


dd if=/storage/test.bin of=/dev/null bs=1M
Ceph: 114 MB/s
mdraid: 550 MB/s


As you can see, mdraid is by far faster than ceph. It could be by design, or 
perhaps I am not doing it
right. Even despite such difference in speed, I would still go with ceph 
because *I think* it is more reliable.


couple of things:

1) Ceph is doing full data journal writes, so it is going to eat (at least)
half of your write performance right there.


2) Ceph tends to like lots of concurrency.  You'll probably see higher
numbers with multiple dd reads/writes going at once (see the sketch after
these points).


3) Ceph is a lot more complex than something like mdraid.  It gives you 
a lot more power and flexibility but the cost is greater complexity. 
There are probably things you can tune to get your numbers up, but it 
could take some work.
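To see the effect of point 2, a few concurrent streams can be generated along
these lines (a rough sketch; file names and sizes are arbitrary):

# four concurrent sequential writers instead of one
for i in 1 2 3 4; do
    dd if=/dev/zero of=/storage/test$i.bin bs=1M count=4096 &
done
wait   # then compare the aggregate MB/s with the single-stream numbers above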


Having said all of this, my primary test box is a single server and I 
can get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I was 
building a production box and never planned to expand to multiple 
servers, I'd certainly be looking into zfs or btrfs RAID.


Mark



Dmitry

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Wheezy machine died with problems on osdmap

2013-08-13 Thread Sage Weil
On Tue, 13 Aug 2013, Giuseppe 'Gippa' Paterno' wrote:
 Hi all,
 my Debian 7 wheezy machine died with the following in the logs:
 http://pastebin.ubuntu.com/5981058/
 
 It's using kvm and ceph as an rbd device.
 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
 
 Can you give me please some advices?

What kernel version is this?  It looks like an old kernel bug.  Generally
speaking you should be using 3.4 at the very least if you are using
the kernel client.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mounting a pool via fuse

2013-08-13 Thread Georg Höllrigl

Thank you for the explanation.

By mounting as filesystem I'm talking about something similar to this:
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/

Using the kernel module, I can mount a subdirectory into my directory 
tree - a directory, where I have assigned a pool.

Using fuse, I can't mount a subdirectory?

By the way setting the layout seems to have a bug:

# cephfs /mnt/macm01 set_layout -p 4
Error setting layout: Invalid argument

I have to add the -u option, then it works:

# cephfs /mnt/mailstore set_layout -p 5 -u 4194304

Kind Regards,
Georg





On 13.08.2013 12:09, Dzianis Kahanovich wrote:

Georg Höllrigl writes:


I'm using ceph 0.61.7.

When using ceph-fuse, I couldn't find a way, to only mount one pool.

Is there a way to mount a pool - or is it simply not supported?


This mean mount as fs?
Same as kernel-level cephfs (fuse  cephfs = same instance). You cannot mount
pool, but can mount filesystem and can map pool to any point of filesystem
(file or directory), include root.

First, mount ceph via kernel - mount -t ceph (just for cephfs tool syntax
compatibility). For example - to /mnt/ceph. Then say ceph df and lookup pool
number (not name!), for example pool number is 10. And last:
mkdir -p /mnt/ceph/pools/pool1
cephfs /mnt/ceph/pools/pool1 set_layout -p 10

or just (for ceph's root):

cephfs /mnt/ceph set_layout -p 10

Next you can unmount kernel-level and mount this point via fuse.

PS For ceph developers: trying this for qouta (with ceph osd pool set-quota)
semi-working: on quota overflow - nothing limited, but ceph health show
warning. In case of no other ways to quota, it may qualified as bug and not
too actual only while big number of pools performance limitation. So, FYI.



--
Dipl.-Ing. (FH) Georg Höllrigl
Technik



Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 - 30505
Fax: +43 (0) 2983 201 - 930505
Email:   georg.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024



VERTRAULICHE INFORMATIONEN!
Diese eMail enthält vertrauliche Informationen und ist nur für den 
berechtigten
Empfänger bestimmt. Wenn diese eMail nicht für Sie bestimmt ist, bitten 
wir Sie,

diese eMail an uns zurückzusenden und anschließend auf Ihrem Computer und
Mail-Server zu löschen. Solche eMails und Anlagen dürfen Sie weder nutzen,
noch verarbeiten oder Dritten zugänglich machen, gleich in welcher Form.
Wir danken für Ihre Kooperation!

CONFIDENTIAL!
This email contains confidential information and is intended for the 
authorised
recipient only. If you are not an authorised recipient, please return 
the email

to us and then delete it from your computer and mail-server. You may neither
use nor edit any such emails including attachments, nor make them accessible
to third parties in any manner whatsoever.
Thank you for your cooperation

 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mounting a pool via fuse

2013-08-13 Thread Sage Weil
On Tue, 13 Aug 2013, Georg Höllrigl wrote:
 Thank you for the explanation.
 
 By mounting as filesystem I'm talking about something similar to this:
 http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/
 
 Using the kernel module, I can mount a subdirectory into my directory 
 tree - a directory, where I have assigned a pool.
 Using fuse, I can't mount a subdirectory?

ceph-fuse --mount-root /some/path /mnt/ceph

should do the trick.

 By the way setting the layout seems to have a bug:
 
 # cephfs /mnt/macm01 set_layout -p 4
 Error setting layout: Invalid argument
 
 I have to add the -u option, then it works:
 
 # cephfs /mnt/mailstore set_layout -p 5 -u 4194304

Curious.  Opened a bug!

sage

 
 Kind Regards,
 Georg
 
 
 
 
 
 On 13.08.2013 12:09, Dzianis Kahanovich wrote:
  Georg Höllrigl writes:
 
  I'm using ceph 0.61.7.
 
  When using ceph-fuse, I couldn't find a way, to only mount one pool.
 
  Is there a way to mount a pool - or is it simply not supported?
 
  This mean mount as fs?
  Same as kernel-level cephfs (fuse  cephfs = same instance). You cannot
 mount
  pool, but can mount filesystem and can map pool to any point of filesystem
  (file or directory), include root.
 
  First, mount ceph via kernel - mount -t ceph (just for cephfs tool
 syntax
  compatibility). For example - to /mnt/ceph. Then say ceph df and lookup
 pool
  number (not name!), for example pool number is 10. And last:
  mkdir -p /mnt/ceph/pools/pool1
  cephfs /mnt/ceph/pools/pool1 set_layout -p 10
 
  or just (for ceph's root):
 
  cephfs /mnt/ceph set_layout -p 10
 
  Next you can unmount kernel-level and mount this point via fuse.
 
  PS For ceph developers: trying this for qouta (with ceph osd pool
 set-quota)
  semi-working: on quota overflow - nothing limited, but ceph health show
  warning. In case of no other ways to quota, it may qualified as bug and
 not
  too actual only while big number of pools performance limitation. So, FYI.
 
 
 -- 
 Dipl.-Ing. (FH) Georg Höllrigl
 Technik
 
 
 
 Xidras GmbH
 Stockern 47
 3744 Stockern
 Austria
 
 Tel: +43 (0) 2983 201 - 30505
 Fax: +43 (0) 2983 201 - 930505
 Email:   georg.hoellr...@xidras.com
 Web: http://www.xidras.com
 
 FN 317036 f | Landesgericht Krems | ATU64485024
 
 
 
 
 
  
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Thanks Joao,

Is there a doc somewhere on the dependencies? I assume I'll need to set up the
toolchain to compile?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Is there an easy way I can find the age and/or expiration of the service ticket 
on a particular osd? Is that a file or just kept in ram?
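For context, the ttl discussed in the quoted exchange below is an ordinary
ceph.conf option; a minimal sketch of raising it from the default of one hour
(set on the monitors and followed by a ceph-mon restart):

[global]
    auth service ticket ttl = 172800    # 48 hours; the default is 3600 seconds (1 hour)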


-Original Message-
From: Sage Weil [mailto:s...@inktank.com] 
Sent: Tuesday, August 13, 2013 9:01 AM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Why is my mon store.db is 220GB?

On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
 Interesting,
 
 So if I change ' auth service ticket ttl' to 172,800, in theory I could go 
 without a monitor for 48 hours?

If there are no up/down events, no new clients need to start, no osd recovery 
going on, then I *think* so.  I may be forgetting something.

sage


 
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com]
 Sent: Monday, August 12, 2013 9:50 PM
 To: Jeppesen, Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Why is my mon store.db is 220GB?
 
 On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
  Joao,
  
  (log file uploaded to http://pastebin.com/Ufrxn6fZ)
  
  I had some good luck and some bad luck. I copied the store.db to a new 
  monitor, injected a modified monmap and started it up (This is all on the 
  same host.) Very quickly it reached quorum (as far as I can tell) but 
  didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same 
  thing when restarting an OSD.
  
  The last lines of the log file   '...ms_verify_authorizer..' are from 'ceph 
  -w' attempts.
  
  I restarted everything again and it sat there synchronizing. IO stat 
  reported about 100MB/s, but just reads. I let it sit there for 7 min but 
  nothing happened.
 
 Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as though 
 the main dispatch thread is blocked (7f71a1aa5700 does nothing after winning 
 the election).  It would also be helpful to gdb attach to the running 
 ceph-mon and capture the output from 'thread apply all bt'.
 
  Side question, how long can a ceph cluster run without a monitor? I 
  was able to upload files via rados gateway without issue even when 
  the monitor was down.
 
 Quite a while, as long as no new processes need to authenticate, and no nodes 
 go up or down.  Eventually the authentication keys are going to time out, 
 though (1 hour is the default).
 
 sage
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Sage Weil
On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
 Is there an easy way I can find the age and/or expiration of the service 
 ticket on a particular osd? Is that a file or just kept in ram?

It's just in ram.  If you crank up debug auth = 10 you will periodically 
see it dump the rotating keys and expirations.  Ideally the middle one 
will remain valid, but things won't grind to a halt until they are all 
expired.
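A possible way to do that on a live daemon without editing ceph.conf and
restarting (a sketch; osd.0 is a placeholder):

ceph tell osd.0 injectargs '--debug-auth 10'   # raise auth logging at runtime
tail -f /var/log/ceph/ceph-osd.0.log           # watch the OSD log for the key dumps
ceph tell osd.0 injectargs '--debug-auth 0'    # turn it back down afterwards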

sage

  
 
 -Original Message-
 From: Sage Weil [mailto:s...@inktank.com] 
 Sent: Tuesday, August 13, 2013 9:01 AM
 To: Jeppesen, Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: RE: [ceph-users] Why is my mon store.db is 220GB?
 
 On Tue, 13 Aug 2013, Jeppesen, Nelson wrote:
  Interesting,
  
  So if I change ' auth service ticket ttl' to 172,800, in theory I could go 
  without a monitor for 48 hours?
 
 If there are no up/down events, no new clients need to start, no osd recovery 
 going on, then I *think* so.  I may be forgetting something.
 
 sage
 
 
  
  
  -Original Message-
  From: Sage Weil [mailto:s...@inktank.com]
  Sent: Monday, August 12, 2013 9:50 PM
  To: Jeppesen, Nelson
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Why is my mon store.db is 220GB?
  
  On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
   Joao,
   
   (log file uploaded to http://pastebin.com/Ufrxn6fZ)
   
   I had some good luck and some bad luck. I copied the store.db to a new 
   monitor, injected a modified monmap and started it up (This is all on the 
   same host.) Very quickly it reached quorum (as far as I can tell) but 
   didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same 
   thing when restarting an OSD.
   
   The last lines of the log file   '...ms_verify_authorizer..' are from 
   'ceph -w' attempts.
   
   I restarted everything again and it sat there synchronizing. IO stat 
   reported about 100MB/s, but just reads. I let it sit there for 7 min but 
   nothing happened.
  
  Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as though 
  the main dispatch thread is blocked (7f71a1aa5700 does nothing after 
  winning the election).  It would also be helpful to gdb attach to the 
  running ceph-mon and capture the output from 'thread apply all bt'.
  
   Side question, how long can a ceph cluster run without a monitor? I 
   was able to upload files via rados gateway without issue even when 
   the monitor was down.
  
  Quite a while, as long as no new processes need to authenticate, and no 
  nodes go up or down.  Eventually the authentication keys are going to time 
  out, though (1 hour is the default).
  
  sage
  
  
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 09:19, Jeppesen, Nelson wrote:

Thanks Joao,

Is there a doc somewhere on the dependencies? I assume I’ll need to
setup the tool chain to compile?




README on the ceph repo has the dependencies.

You could also try getting it from the gitbuilders [1], but I'm not sure 
how you'd go about doing that without installing other packages.


[1] - http://gitbuilder.ceph.com/

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Start Stop OSD

2013-08-13 Thread Dan Mick
Adding back ceph-users; try not to turn public threads into private ones 
when the problem hasn't been resolved.


On 08/13/2013 04:42 AM, Joshua Young wrote:

So I put the journals on their own partitions and they worked just
fine. All night they were up doing normal operations. When running
initctl list | grep ceph I would get ...

ceph-mds-all-starter stop/waiting
ceph-mds-all start/running
ceph-osd-all start/running
ceph-osd-all-starter stop/waiting
ceph-all start/running
ceph-mon-all start/running
ceph-mon-all-starter stop/waiting
ceph-mon (ceph/cloud3) start/running, process 1864
ceph-create-keys stop/waiting
ceph-osd (ceph/8) start/running, process 2136
ceph-osd (ceph/20) start/running, process 5281
ceph-osd (ceph/15) start/running, process 5292
ceph-osd (ceph/14) start/running, process 2135
ceph-mds stop/waiting



This is correct. There are 4 OSDs on this server. Now I have come in
today, and running ceph -s still says all of my OSDs are up. When I run
the same command as above I only see OSD 14. When I go into the logs of
one of the others (OSD 15) I see this...


Does ps agree that only one OSD is left running?
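A quick way to cross-check, using osd.15 from the log below (a sketch; the
id= syntax applies to the Ubuntu upstart jobs used here):

ps aux | grep [c]eph-osd        # which ceph-osd processes are actually running?
sudo status ceph-osd id=15      # what does upstart think osd.15 is doing?
sudo start ceph-osd id=15       # try to start it explicitly and watch its log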


2013-08-13 06:37:48.414775 7ffa2099a7c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16597
2013-08-13 06:37:48.421208 7ffa2099a7c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.421246 7ffa2099a7c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.421274 7ffa2099a7c0 -1 ^[[0;31m ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy^[[0m
2013-08-13 06:37:48.445927 7f0fbb6687c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16659
2013-08-13 06:37:48.447470 7f0fbb6687c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.447480 7f0fbb6687c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.447500 7f0fbb6687c0 -1 ^[[0;31m ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy^[[0m
2013-08-13 06:37:48.474852 7f28f332c7c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16752
2013-08-13 06:37:48.476695 7f28f332c7c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.476707 7f28f332c7c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.476728 7f28f332c7c0 -1 ^[[0;31m ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy^[[0m
2013-08-13 06:37:48.501723 7f84618467c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16845
2013-08-13 06:37:48.503919 7f84618467c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.503932 7f84618467c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.503955 7f84618467c0 -1 ^[[0;31m ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy^[[0m
2013-08-13 06:37:48.529665 7f29c2a367c0  0 ceph version 0.61.7 
(8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 16944
2013-08-13 06:37:48.531227 7f29c2a367c0  0 filestore(/var/lib/ceph/osd/ceph-15) 
lock_fsid failed to lock /var/lib/ceph/osd/ceph-15/fsid, is another ceph-osd 
still running? (11) Resource temporarily unavailable
2013-08-13 06:37:48.531239 7f29c2a367c0 -1 filestore(/var/lib/ceph/osd/ceph-15) 
FileStore::mount: lock_fsid failed
2013-08-13 06:37:48.531260 7f29c2a367c0 -1 ^[[0;31m ** ERROR: error converting 
store /var/lib/ceph/osd/ceph-15: (16) Device or resource busy^[[0m



So the OSD can't get a lock on its data.  You aren't attempting to share 
devices/partitions for OSD storage as well, are you?


What is your cluster configuration?



Any idea? Thanks



-Original Message-
From: Dan Mick [mailto:dan.m...@inktank.com]
Sent: Monday, August 12, 2013 5:50 PM
To: Joshua Young
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Start Stop OSD



On 08/12/2013 04:49 AM, Joshua Young wrote:

I have 2 issues that I can not find a solution to.

First: I am unable to stop / start any osd by command. I have deployed
with ceph-deploy on Ubuntu 13.04 and everything seems to be working
fine. I have 5 hosts, 5 mons and 20 osds.

Using initctl list | grep ceph gives me



ceph-osd (ceph/15) start/running, process 2122


The fact that only one is output means 

Re: [ceph-users] Ceph instead of RAID

2013-08-13 Thread Martin B Nielsen
Hi,

I'd just like to echo what Wolfgang said about ceph being a complex system.

I initially started out testing ceph with a setup much like yours. And
while it overall performed ok, it was not as good as sw raid on the same
machine.

Also, as Mark said you'll have at very best half write speeds because of
how the journaling works if you do larger continuous writes.

Ceph really shines with multiple servers and multiple concurrency.

My test machine was running for ½ a year+ (going from argonaut to
cuttlefish) and in that process I came to realize that mixing types of disk
(and size) was a bad idea (some enterprise SATA, some fast desktop and some
green disks) - as speed will be determined by the slowest drive in your
setup (that's why they're advocating using similar hw if at all possible I
guess).

I also experienced all the challenging issues of having to deal with a very
young technology: osds suddenly refusing to start, pg's going into various
incomplete/down/inconsistent states, the monitor leveldb running full, the
monitor dying at weird times. And, well - I think it is good for a learning
experience, but like Wolfgang said I think it is too much hassle for too
little gain when you have something like raid10/zfs around.

But, by all means, don't let us discourage you if you want to go this route
- ceph's unique self-healing ability was what drew me into running a single
machine in the first place.

Cheers,
Martin



On Tue, Aug 13, 2013 at 9:32 AM, Wolfgang Hennerbichler 
wolfgang.hennerbich...@risc-software.at wrote:



 On 08/13/2013 09:23 AM, Jeffrey 'jf' Lim wrote:
  Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks
 will be local, so I could simply create
  6 local OSDs + a monitor, right? Is there anything I need to watch out
 for in such configuration?
 
  You can do that. Although it's nice to play with and everything, I
  wouldn't recommend doing it. It will give you more pain than pleasure.
 
  How so? Care to elaborate?

 Ceph is a complex system, built for clusters. It does some stuff in
 software that is otherwise done in hardware (raid controllers). The
 nature of the complexity of a cluster system is a lot of overhead
 compared to a local raid [whatever] system, and latency of disk i/o will
 naturally suffer a bit. An OSD needs about 300 MB of RAM (may vary with
 your PGs), times 6 is a waste of nearly 2 GB of RAM (compared to a
 local RAID). Also ceph is young, and it does indeed have some bugs. RAID
 is old, and very mature. Although I rely on ceph on a production
 cluster, too, it is way harder to maintain than a simple local raid.
 When a disk fails in ceph you don't have to worry about your data, which
 is a good thing, but you have to worry about the rebuilding (which isn't
 too hard, but at least you need to know SOMETHING about ceph); with
 (hardware) RAID you simply replace the disk, and it will be rebuilt.

 Others will find more reasons why this is not the best idea for a
 production system.

 Don't get me wrong, I'm a big supporter of ceph, but only for clusters,
 not for single systems.

 wogri

  -jf
 
 
  --
  He who settles on the idea of the intelligent man as a static entity
  only shows himself to be a fool.
 
  Every nonfree program has a lord, a master --
  and if you use the program, he is your master.
  --Richard Stallman
 


 --
 DI (FH) Wolfgang Hennerbichler
 Software Development
 Unit Advanced Computing Technologies
 RISC Software GmbH
 A company of the Johannes Kepler University Linz

 IT-Center
 Softwarepark 35
 4232 Hagenberg
 Austria

 Phone: +43 7236 3343 245
 Fax: +43 7236 3343 250
 wolfgang.hennerbich...@risc-software.at
 http://www.risc-software.at
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] one pg stuck with 2 unfound pieces

2013-08-13 Thread Samuel Just
You can run 'ceph pg 0.cfa mark_unfound_lost revert'. (Revert Lost
section of http://ceph.com/docs/master/rados/operations/placement-groups/).
-Sam

On Tue, Aug 13, 2013 at 6:50 AM, Jens-Christian Fischer
jens-christian.fisc...@switch.ch wrote:
 We have a cluster with 10 servers, 64 OSDs and 5 Mons on them. The OSDs are
 3TB disk, formatted with btrfs and the servers are either on Ubuntu 12.10 or
 13.04.

 Recently one of the servers (13.04) stood still (due to problems with btrfs
 - something we have seen a few times). I decided to not try to recover the
 disks, but reformat them with XFS. I removed the OSDs, reformatted, and
 re-created them (they got the same OSD numbers)

 I redid this twice (because I wrongly partitioned the disks in the first
 place) and I ended up with 2 unfound pieces in one pg:

 root@s2:~# ceph health details
 HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck unclean; recovery
 4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)
 pg 0.cfa is stuck unclean for 1004252.309704, current state
 active+recovering+degraded+remapped, last acting [23,50]
 pg 0.cfa is active+recovering+degraded+remapped, acting [23,50], 2 unfound
 recovery 4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)


 root@s2:~# ceph pg 0.cfa query

 { state: active+recovering+degraded+remapped,
   epoch: 28197,
   up: [
 23,
 50,
 18],
   acting: [
 23,
 50],
   info: { pgid: 0.cfa,
   last_update: 28082'7774,
   last_complete: 23686'7083,
   log_tail: 14360'4061,
   last_backfill: MAX,
   purged_snaps: [],
   history: { epoch_created: 1,
   last_epoch_started: 28197,
   last_epoch_clean: 24810,
   last_epoch_split: 0,
   same_up_since: 28195,
   same_interval_since: 28196,
   same_primary_since: 26036,
   last_scrub: 20585'6801,
   last_scrub_stamp: 2013-07-28 15:40:53.298786,
   last_deep_scrub: 20585'6801,
   last_deep_scrub_stamp: 2013-07-28 15:40:53.298786,
   last_clean_scrub_stamp: 2013-07-28 15:40:53.298786},
   stats: { version: 28082'7774,
   reported: 28197'41950,
   state: active+recovering+degraded+remapped,
   last_fresh: 2013-08-13 14:34:33.057271,
   last_change: 2013-08-13 14:34:33.057271,
   last_active: 2013-08-13 14:34:33.057271,
   last_clean: 2013-08-01 23:50:18.414082,
   last_became_active: 2013-05-29 13:10:51.366237,
   last_unstale: 2013-08-13 14:34:33.057271,
   mapping_epoch: 28195,
   log_start: 14360'4061,
   ondisk_log_start: 14360'4061,
   created: 1,
   last_epoch_clean: 24810,
   parent: 0.0,
   parent_split_bits: 0,
   last_scrub: 20585'6801,
   last_scrub_stamp: 2013-07-28 15:40:53.298786,
   last_deep_scrub: 20585'6801,
   last_deep_scrub_stamp: 2013-07-28 15:40:53.298786,
   last_clean_scrub_stamp: 2013-07-28 15:40:53.298786,
   log_size: 0,
   ondisk_log_size: 0,
   stats_invalid: 0,
   stat_sum: { num_bytes: 145307402,
   num_objects: 2234,
   num_object_clones: 0,
   num_object_copies: 0,
   num_objects_missing_on_primary: 0,
   num_objects_degraded: 0,
   num_objects_unfound: 0,
   num_read: 744,
   num_read_kb: 410184,
   num_write: 7774,
   num_write_kb: 1155438,
   num_scrub_errors: 0,
   num_shallow_scrub_errors: 0,
   num_deep_scrub_errors: 0,
   num_objects_recovered: 3998,
   num_bytes_recovered: 278803622,
   num_keys_recovered: 0},
   stat_cat_sum: {},
   up: [
 23,
 50,
 18],
   acting: [
 23,
 50]},
   empty: 0,
   dne: 0,
   incomplete: 0,
   last_epoch_started: 28197},
   recovery_state: [
 { name: Started\/Primary\/Active,
   enter_time: 2013-08-13 14:34:33.026698,
   might_have_unfound: [
 { osd: 9,
   status: querying},
 { osd: 18,
   status: querying},
 { osd: 50,
   status: already probed}],
   recovery_progress: { backfill_target: 50,
   waiting_on_backfill: 0,
   backfill_pos: 96220cfa\/1799e82.\/head\/\/0,
   backfill_info: { begin: 0\/\/0\/\/-1,
   end: 0\/\/0\/\/-1,
   objects: []},
   peer_backfill_info: { begin: 0\/\/0\/\/-1,
   end: 0\/\/0\/\/-1,
   objects: []},
   backfills_in_flight: [],
   pull_from_peer: [],
   pushing: []},
   scrub: { scrubber.epoch_start: 0,
   

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Samuel Just
Cool!
-Sam

On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow j...@rtr.com wrote:
 Sam,

 Thanks that did it :-)

health HEALTH_OK
monmap e17: 5 mons at
 {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0},
 election epoch 9794, quorum 0,1,2,3,4 a,b,c,d,e
osdmap e23445: 14 osds: 13 up, 13 in
 pgmap v13552855: 2102 pgs: 2102 active+clean; 531 GB data, 1564 GB used,
 9350 GB / 10914 GB avail; 13104KB/s rd, 4007KB/s wr, 560op/s
mdsmap e3: 0/0/1 up


 --

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] basic single node set up issue on rhel6

2013-08-13 Thread Alfredo Deza
Hi Sijo


On Mon, Aug 12, 2013 at 12:26 PM, Mathew, Sijo (KFRM 1) 
sijo.mat...@credit-suisse.com wrote:

  Hi,


 I have been trying to get ceph installed on a single node. But I’m stuck
 with the following error.


 [host]$ ceph-deploy -v mon create ceph-server-299

 Deploying mon, cluster ceph hosts ceph-server-299

 Deploying mon to ceph-server-299

 Distro RedHatEnterpriseServer codename Santiago, will use sysvinit

 Traceback (most recent call last):

   File /usr/bin/ceph-deploy, line 21, in <module>

 main()

   File /usr/lib/python2.6/site-packages/ceph_deploy/cli.py, line 112, in
 main

 return args.func(args)

   File /usr/lib/python2.6/site-packages/ceph_deploy/mon.py, line 234, in
 mon

 mon_create(args)

   File /usr/lib/python2.6/site-packages/ceph_deploy/mon.py, line 138, in
 mon_create

 init=init,

   File /usr/lib/python2.6/site-packages/pushy/protocol/proxy.py, line
 255, in <lambda>

 (conn.operator(type_, self, args, kwargs))

   File /usr/lib/python2.6/site-packages/pushy/protocol/connection.py,
 line 66, in operator

 return self.send_request(type_, (object, args, kwargs))

   File
 /usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py, line
 323, in send_request

 return self.__handle(m)

   File
 /usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py, line
 639, in __handle

 raise e

 pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory


This looks like a very old version of ceph-deploy. Can you attempt to
install a newer version?

We released version 1.2 to the Python Package Index and to our repos here:
http://ceph.com/packages/ceph-extras/rpm/

If you are familiar with Python install tools you could simply do: `sudo
pip install ceph-deploy`, otherwise could you try with the RPM packages?

But, you mention the lack of internet connection, so that would mean that
for `pip` it would be quite the headache to meet all of ceph-deploy's
dependencies.

Can you try with the RPMs for version 1.2 and run again? 1.2 had a massive
amount of bug fixes and it includes much better logging output.

Once you do, paste back the output here so I can take a look.

 


 I saw a similar thread in the archives, but the solution given there
 doesn’t seem to be that clear.


 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-June/002344.html


 I had to install all the rpms separately as the machines that I work with
 don’t have internet access and “ceph-deploy install“ needs internet access.
 Could someone suggest what might be wrong here?


 Environment: RHEL 6.4, ceph 0.61


 Thanks,

 Sijo Mathew 





 ==
 Please access the attached hyperlink for an important electronic
 communications disclaimer:
 http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html

 ==
 


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
I built the wip-monstore-copy branch with './configure --with-rest-bench 
--with-debug' and 'make'. It worked and I get all the usual stuff but 
ceph-monstore-tool is missing. I see code in ./src/tools/. Did I miss something?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy and journal on separate disk

2013-08-13 Thread Alfredo Deza
On Tue, Aug 13, 2013 at 3:21 AM, Pavel Timoschenkov 
pa...@bayonetteas.onmicrosoft.com wrote:

 Hi.
 Yes, i'm zapped all disks before.

 More about my situation:
 sdaa - one of disk for data: 3 TB with GPT partition table.
 sda - ssd drive with manual created partitions (10 GB) for journal with
 MBR partition table.
 ===
 fdisk -l /dev/sda

 Disk /dev/sda: 480.1 GB, 480103981056 bytes
 255 heads, 63 sectors/track, 58369 cylinders, total 937703088 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 Disk identifier: 0x00033624

    Device Boot      Start         End      Blocks   Id  System
 /dev/sda1            2048    19531775     9764864   83  Linux
 /dev/sda2        19531776    39061503     9764864   83  Linux
 /dev/sda3        39061504    58593279     9765888   83  Linux
 /dev/sda4        78125056    97656831     9765888   83  Linux

 ===

 If i'm executed ceph-deploy osd prepare without journal options - it's
 ok:


 ceph@ceph-admin:~$ ceph-deploy disk zap ceph001:sdaa ceph001:sda1
 [ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
 [ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on ceph001

 ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph001:/dev/sdaa:
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal
 None activate False

 root@ceph001:~# gdisk -l /dev/sdaa
 GPT fdisk (gdisk) version 0.8.1

 Partition table scan:
   MBR: protective
   BSD: not present
   APM: not present
   GPT: present

 Found valid GPT with protective MBR; using GPT.
 Disk /dev/sdaa: 5860533168 sectors, 2.7 TiB
 Logical sector size: 512 bytes
 Disk identifier (GUID): 575ACF17-756D-47EC-828B-2E0A0B8ED757
 Partition table holds up to 128 entries
 First usable sector is 34, last usable sector is 5860533134
 Partitions will be aligned on 2048-sector boundaries
 Total free space is 4061 sectors (2.0 MiB)

 Number  Start (sector)    End (sector)    Size         Code  Name
    1           2099200      5860533134    2.7 TiB            ceph data
    2              2048         2097152    1023.0 MiB         ceph journal

 Problems start, when i'm try create journal on separate drive:

 ceph@ceph-admin:~$ ceph-deploy disk zap ceph001:sdaa ceph001:sda1
 [ceph_deploy.osd][DEBUG ] zapping /dev/sdaa on ceph001
 [ceph_deploy.osd][DEBUG ] zapping /dev/sda1 on ceph001

 ceph@ceph-admin:~$ ceph-deploy osd prepare ceph001:sdaa:sda1
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
 ceph001:/dev/sdaa:/dev/sda1
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph001
 [ceph_deploy.osd][DEBUG ] Host ceph001 is now ready for osd use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph001 disk /dev/sdaa journal
 /dev/sda1 activate False
 [ceph_deploy.osd][ERROR ] ceph-disk-prepare -- /dev/sdaa /dev/sda1
 returned 1
 Information: Moved requested sector from 34 to 2048 in
 order to align on 2048-sector boundaries.
 The operation has completed successfully.
 meta-data=/dev/sdaa1 isize=2048   agcount=32, agsize=22892700
 blks
  =   sectsz=512   attr=2, projid32bit=0
 data =   bsize=4096   blocks=732566385, imaxpct=5
  =   sunit=0  swidth=0 blks
 naming   =version 2  bsize=4096   ascii-ci=0
 log  =internal log   bsize=4096   blocks=357698, version=2
  =   sectsz=512   sunit=0 blks, lazy-count=1
 realtime =none   extsz=4096   blocks=0, rtextents=0

 WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same
 device as the osd data
 mount: /dev/sdaa1: more filesystems detected. This should not happen,
use -t type to explicitly specify the filesystem type or
use wipefs(8) to clean up the device.

 mount: you must specify the filesystem type
 ceph-disk: Mounting filesystem failed: Command '['mount', '-o', 'noatime',
 '--', '/dev/sdaa1', '/var/lib/ceph/tmp/mnt.fZQxiz']' returned non-zero exit
 status 32

 ceph-deploy: Failed to create 1 OSDs

 It looks like at some point the filesystem is not passed to the options.
Would you mind running the `ceph-disk-prepare` command again but with
the --verbose flag?

I think that from the output above (correct it if I am mistaken) that would
be something like:

ceph-disk-prepare --verbose -- /dev/sdaa /dev/sda1


And paste the results back so we can take a look?

 -Original Message-
 From: Samuel Just [mailto:sam.j...@inktank.com]
 Sent: Monday, August 12, 2013 11:39 PM
 To: Pavel Timoschenkov
 Cc: ceph-us...@ceph.com
 Subject: Re: [ceph-users] ceph-deploy and journal on separate disk

 Did you try using ceph-deploy disk zap ceph001:sdaa first?
 -Sam

 On Mon, Aug 12, 2013 

Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Mandell Degerness
Hmm.  This sounds very similar to the problem I reported (with
debug-mon = 20 and debug ms = 1 logs as of today) on our support site
(ticket #438) - Sage, please take a look.

On Mon, Aug 12, 2013 at 9:49 PM, Sage Weil s...@inktank.com wrote:
 On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
 Joao,

 (log file uploaded to http://pastebin.com/Ufrxn6fZ)

 I had some good luck and some bad luck. I copied the store.db to a new 
 monitor, injected a modified monmap and started it up (This is all on the 
 same host.) Very quickly it reached quorum (as far as I can tell) but didn't 
 respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when 
 restarting an OSD.

 The last lines of the log file   '...ms_verify_authorizer..' are from 'ceph 
 -w' attempts.

 I restarted everything again and it sat there synchronizing. IO stat 
 reported about 100MB/s, but just reads. I let it sit there for 7 min but 
 nothing happened.

 Can you do this again with --debug-mon 20 --debug-ms 1?  It looks as
 though the main dispatch thread is blocked (7f71a1aa5700 does nothing
 after winning the election).  It would also be helpful to gdb attach to
 the running ceph-mon and capture the output from 'thread apply all bt'.

 Side question, how long can a ceph cluster run without a monitor? I was
 able to upload files via rados gateway without issue even when the
 monitor was down.

 Quite a while, as long as no new processes need to authenticate, and no
 nodes go up or down.  Eventually the authentication keys are going to time
 out, though (1 hour is the default).

 sage
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Never mind, I removed --with-rest-bench and it worked.

 I built the wip-monstore-copy branch with './configure --with-rest-bench 
 --with-debug' and 'make'. It worked and I get all the usual stuff but ceph- 
 monstore-tool is missing. I see code in ./src/tools/. Did I miss something?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Designing an application with Ceph

2013-08-13 Thread Nulik Nol
Hi,
I am planning to use Ceph as a database storage for a webmail
client/server application, and I am thinking to store the data as
key/value pair instead of using any RDBMSs, for speed. The webmail
will manage companies, and each company will have many users; users
will send/receive emails and store them in their inboxes, kind of like
Gmail, but per company. The server will be developed in C, client code
in HTML/Javascript and binary client (standalone app) in C++
So, my question is, how would you recommend me to design the backend ?

I have thought of these choices:

1. Use Ceph as filesystem and BerkeleyDB as the database engine.
Berkeley DB uses 2 files per table, so I will have 1 directory per
company and 2 files per table; I think there will be no more
than 20 tables in my whole app. Ceph will be used here as a remote
filesystem where BerkeleyDB will do all the data organization. The
RADOS interface of Ceph (to store key/value pairs) will not be used,
since Berkeley DB will write and read to the OSDs directly and
Berkeley DB is itself a key/value pair database. But I have never used a DB
on a remote filesystem, so I am not sure if it will work well. Advantages of
this architecture: quick & easy.
Disadvantages: lower performance (overhead in CephFS and BerkeleyDB),
also I will not be able to write plugins for RADOS in C++ to combine
many data modifications in a single call to the server.

2. Use librados C api and write all the 'queries' hardcoded in C
specifically for the
application. Since the application is pretty standard and is not
supposed to change
much, I can do this. I would create a RADOS object for each
application object (like for example 'user' record, 'email' record,
'chat message' record, etc...).
Advantages: high performance. Disadvantages: a bit more to code,
especially the data search functions.

I am interested in performance, so I am thinking to go for the option
2, what do you think? Can RADOS fully replace a database engine ? (I
mean, NoSQL engine, like Berkeley for example)

Will appreciate very much your comments.
TIA
Nulik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-13 Thread Sage Weil
On Mon, 5 Aug 2013, Mike Dawson wrote:
 Josh,
 
 Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock.
 
 - At about 2013-08-05 19:46 or 47, we hit the issue, traffic went to 0
 - At about 2013-08-05 19:53:51, ran a 'virsh screenshot'
 
 
 Environment is:
 
 - Ceph 0.61.7 (client is co-mingled with three OSDs)
 - rbd cache = true and cache=writeback
 - qemu 1.4.0 1.4.0+dfsg-1expubuntu4
 - Ubuntu Raring with 3.8.0-25-generic
 
 This issue is reproducible in my environment, and I'm willing to run any wip
 branch you need. What else can I provide to help?

This looks like a different issue than Oliver's.  I see one anomaly in the 
log, where an rbd io completion is triggered a second time for no apparent 
reason.  I opened a separate bug 

http://tracker.ceph.com/issues/5955

and pushed wip-5955 that will hopefully shine some light on the weird 
behavior I saw.  Can you reproduce with this branch and

 debug objectcacher = 20
 debug ms = 1
 debug rbd = 20
 debug finisher = 20

Thanks!
sage
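
For reference, a minimal sketch of where those debug options can live on the
client side, assuming qemu/librbd reads the usual /etc/ceph/ceph.conf and the
guest is restarted so the new settings are picked up; the log file path is
only an example:

    [client]
        debug objectcacher = 20
        debug ms = 1
        debug rbd = 20
        debug finisher = 20
        log file = /var/log/ceph/client.$pid.log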


 
 Thanks,
 Mike Dawson
 
 
 On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote:
  On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
   Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs as
expected. At that point we can examine the guest. Each time we'll see:
  
  If virsh screenshot works then this confirms that QEMU itself is still
  responding.  Its main loop cannot be blocked since it was able to
  process the screendump command.
  
  This supports Josh's theory that a callback is not being invoked.  The
  virtio-blk I/O request would be left in a pending state.
  
  Now here is where the behavior varies between configurations:
  
  On a Windows guest with 1 vCPU, you may see the symptom that the guest no
  longer responds to ping.
  
  On a Linux guest with multiple vCPUs, you may see the hung task message
  from the guest kernel because other vCPUs are still making progress.
  Just the vCPU that issued the I/O request and whose task is in
  UNINTERRUPTIBLE state would really be stuck.
  
  Basically, the symptoms depend not just on how QEMU is behaving but also
  on the guest kernel and how many vCPUs you have configured.
  
  I think this can explain how both problems you are observing, Oliver and
  Mike, are a result of the same bug.  At least I hope they are :).
  
  Stefan
  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-13 Thread Sage Weil
Hi Oliver,

(Posted this on the bug too, but:)

Your last log revealed a bug in the librados aio flush.  A fix is pushed 
to wip-librados-aio-flush (bobtail) and wip-5919 (master); can you retest 
please (with caching off again)?

Thanks!
sage
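
For the caching-off part of the retest, a minimal sketch of the two places the
cache setting usually comes from, assuming a qemu/librbd client with the stock
config paths; the pool/image name is a placeholder:

    # ceph.conf on the client host
    [client]
        rbd cache = false

    # and cache=none on the rbd drive, e.g.:
    -drive file=rbd:rbd/vm-disk-1,cache=none

In recent qemu versions the drive's cache= flag generally overrides ceph.conf,
so setting both keeps the test unambiguous.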


On Fri, 9 Aug 2013, Oliver Francke wrote:
 Hi Josh,
 
 just opened
 
 http://tracker.ceph.com/issues/5919
 
 with all collected information incl. debug-log.
 
 Hope it helps,
 
 Oliver.
 
 On 08/08/2013 07:01 PM, Josh Durgin wrote:
  On 08/08/2013 05:40 AM, Oliver Francke wrote:
   Hi Josh,
   
   I have a session logged with:
   
debug_ms=1:debug_rbd=20:debug_objectcacher=30
   
   as you requested from Mike, even if I think, we do have another story
   here, anyway.
   
   Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
   3.2.0-51-amd...
   
   Do you want me to open a ticket for that stuff? I have about 5MB
   compressed logfile waiting for you ;)
  
  Yes, that'd be great. If you could include the time when you saw the guest
  hang that'd be ideal. I'm not sure if this is one or two bugs,
  but it seems likely it's a bug in rbd and not qemu.
  
  Thanks!
  Josh
  
   Thnx in advance,
   
   Oliver.
   
   On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:
On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
  We can un-wedge the guest by opening a NoVNC session or running a
  'virsh screenshot' command. After that, the guest resumes and runs
  as expected. At that point we can examine the guest. Each time we'll
  see:
If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest
no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan
   
   
  
 
 
 -- 
 
 Oliver Francke
 
 filoo GmbH
  Moltkestraße 25a
  0 Gütersloh
  HRB4355 AG Gütersloh
 
  Geschäftsführer: J.Rehpöhler | C.Kunz
 
 Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out 
/var/lib/ceph/mon/ceph-1  --command store-copy
is running now. It hit 52MB very quickly, then nothing more, with lots of disk reads, 
which is what I'd expect. It's reading fast and I expect it to finish in 35min.

Just to make sure, this won't add a new monitor, just clean it up. So, when 
it's done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old
mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2
service ceph start mon.2



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 14:46, Jeppesen, Nelson wrote:

Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out
/var/lib/ceph/mon/ceph-1  --command store-copy

is running now. It hit 52MB very quickly, then nothing more, with lots of disk
reads, which is what I’d expect. It's reading fast and I expect it to finish
in 35min.

Just to make sure, this won’t add a new monitor, just clean it up. So,
when it’s done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old

mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2

service ceph start mon.2


Correct.  The tool just extracts whatever is on one mon store and copies 
it to another store.  The contents should be the same and the monitor 
should come back to life as if nothing had happened.


If for some reason that is not the case, you'll still have the original 
store ready to be used.  Let me know if that happens and I'll be happy 
to help.


  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 14:46, Jeppesen, Nelson wrote:

Joao,

ceph-monstore-tool --mon-store-path /var/lib/ceph/mon/ceph-2 --out
/var/lib/ceph/mon/ceph-1  --command store-copy

is running now. It hit 52MB very quickly, then nothing more, with lots of disk
reads, which is what I’d expect. It's reading fast and I expect it to finish
in 35min.

Just to make sure, this won’t add a new monitor, just clean it up. So,
when it’s done I should do the following:

mv /var/lib/ceph/mon/ceph-2 /var/lib/ceph/mon/ceph-2.old

mv /var/lib/ceph/mon/ceph-1 /var/lib/ceph/mon/ceph-2

service ceph start mon.2


Sage pointed out that you'll also need to copy the 'keyring' file from 
the original mon data dir to the new mon data dir.


So that would be 'cp /var/lib/ceph/mon/ceph-2/keyring 
/var/lib/ceph/mon/ceph-1/'


You should be good to go then.

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Designing an application with Ceph

2013-08-13 Thread Samuel Just
2 is certainly an intriguing option.  RADOS isn't really a database
engine (even a nosql one), but should be able to serve your needs
here.  Have you seen the omap api available in librados?  It allows
you to efficiently store key/value pairs attached to a librados object
(uses leveldb on the OSDs to actually handle the key/value mapping).
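
To get a feel for omap before writing any C, the rados command line tool
exposes the same key/value calls; a rough sketch, where 'mail' is a
hypothetical pool and 'user.1001.inbox' a hypothetical object name:

    rados -p mail create user.1001.inbox
    rados -p mail setomapval user.1001.inbox msg.0001 'Subject: hello'
    rados -p mail getomapval user.1001.inbox msg.0001
    rados -p mail listomapvals user.1001.inbox

Each object can hold many keys, and individual keys can be read or written
without touching the object's byte data, which maps nicely onto per-mailbox
metadata.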

One caveat is that the C api is somewhat less complete than the C++
api.  That would be pretty easily remedied if there were demand
though.
-Sam

On Tue, Aug 13, 2013 at 2:01 PM, Nulik Nol nulik...@gmail.com wrote:
 Hi,
 I am planning to use Ceph as a database storage for a webmail
 client/server application, and I am thinking to store the data as
 key/value pair instead of using any RDBMSs, for speed. The webmail
 will manage companies, and each company will have many users; users
 will send/receive emails and store them in their inboxes, kind of like
 Gmail, but per company. The server will be developed in C, client code
 in HTML/Javascript and binary client (standalone app) in C++
 So, my question is, how would you recommend me to design the backend ?

 I have thought of these choices:

 1. Use Ceph as filesystem and BerkeleyDB as the database engine.
 Berkeley DB uses 2 files per table, so I will have 1 directory per
 company and 2 files per table; I think there will be no more
 than 20 tables in my whole app. Ceph will be used here as a remote
 filesystem where BerkeleyDB will do all the data organization. The
 RADOS interface of Ceph (to store key/value pairs) will not be used,
 since Berkeley DB will write and read to the OSDs directly and
 Berkeley DB is itself a key/value pair database. But I have never used a DB
 on a remote filesystem, so I am not sure if it will work well. Advantages of
 this architecture: quick & easy.
 Disadvantages: lower performance (overhead in CephFS and BerkeleyDB),
 also I will not be able to write plugins for RADOS in C++ to combine
 many data modifications in a single call to the server.

 2. Use librados C api and write all the 'queries' hardcoded in C
 specifically for the
 application. Since the application is pretty standard and is not
 supposed to change
 much, I can do this. I would create a RADOS object for each
 application object (like for example 'user' record, 'email' record,
 'chat message' record, etc...).
 Advantages: high performance. Disadvantages: a bit more to code,
 especially the data search functions.

 I am interested in performance, so I am thinking to go for the option
 2, what do you think? Can RADOS fully replace a database engine ? (I
 mean, NoSQL engine, like Berkeley for example)

 Will appreciate very much your comments.
 TIA
 Nulik
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Jeppesen, Nelson
Success! It was pretty quick too, maybe 20-30min. It’s now at 100MB.

In a matter of min I was able to add two monitors and now I’m back to three 
monitors.

Thank you again, Joao and Sage! I can sleep at night now knowing that a single 
node won't take down the cluster anymore ☺
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why is my mon store.db is 220GB?

2013-08-13 Thread Joao Eduardo Luis

On 13/08/13 16:13, Jeppesen, Nelson wrote:

Success! It was pretty quick too, maybe 20-30min. It’s now at 100MB.

In a matter of min I was able to add two monitors and now I’m back to three 
monitors.

Thank you again, Joao and Sage! I can sleep at night now knowing that a single 
node won't take down the cluster anymore ☺


Hooray!  Glad to know everything worked out! :-)

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd map issues: no such file or directory (ENOENT) AND map wrong image

2013-08-13 Thread David Zafman

On Aug 12, 2013, at 7:41 PM, Josh Durgin josh.dur...@inktank.com wrote:

 On 08/12/2013 07:18 PM, PJ wrote:
 
  If the target rbd device is only mapped on one virtual machine, format it as
 ext4 and mount to two places
   mount /dev/rbd0 /nfs -- for nfs server usage
   mount /dev/rbd0 /ftp  -- for ftp server usage
 nfs and ftp servers run on the same virtual machine. Will file system
 (ext4) help to handle the simultaneous access from nfs and ftp?
 
 I doubt that'll work perfectly on a normal disk, although rbd should
  behave the same in this case. There are going to be some
  issues when the same files are modified at once by the ftp and nfs
  servers. You could run ftp on an nfs client on a different machine
 safely.
 


Modern Linux kernels will do a bind mount when a block device is mounted on 2 
different directories.   Think directory hard links.  Simultaneous access will 
NOT corrupt ext4, but as Josh said, modifying the same file at once by ftp and 
nfs isn't going to produce good results.  With advisory file locking, 2 nfs 
clients could coordinate access.
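
A tiny illustration of that coordination from the shell, assuming both writers
see the file under the same path and that the nfs lock manager is working;
paths and filenames are placeholders, and advisory locks only help if every
writer takes them:

    # writer A holds an exclusive lock while appending
    flock /srv/share/inbox.mbox -c 'cat incoming.eml >> /srv/share/inbox.mbox'

    # writer B blocks until A releases the lock
    flock /srv/share/inbox.mbox -c 'cp /srv/share/inbox.mbox /tmp/inbox.bak'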

David Zafman
Senior Developer
http://www.inktank.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] basic single node set up issue on rhel6

2013-08-13 Thread Alfredo Deza
On Tue, Aug 13, 2013 at 4:20 PM, Mathew, Sijo (KFRM 1) 
sijo.mat...@credit-suisse.com wrote:

  Hi,


  Installed ceph-deploy_1.2.1 via rpm but it looks like it needs
  pushy>=0.5.2, which I couldn’t find in the repository. Please advise.


Can you try again? It seems we left the new pushy requirement out and
that should be fixed.
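
A rough sketch of picking up the fixed dependency, assuming the ceph-extras
repo is already configured on the host; the pushy package name is a guess and
may differ (pushy vs python-pushy):

    sudo yum clean all
    sudo yum update ceph-deploy pushy
    # or, on a machine with internet access:
    sudo pip install --upgrade 'pushy>=0.5.2'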



 


 [host]$ ceph-deploy mon create ceph-server-299

 Traceback (most recent call last):

   File /usr/bin/ceph-deploy, line 21, in module

 main()

   File /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py,
 line 83, in newfunc

 return f(*a, **kw)

   File /usr/lib/python2.6/site-packages/ceph_deploy/cli.py, line 85, in
 main

 args = parse_args(args=args, namespace=namespace)

   File /usr/lib/python2.6/site-packages/ceph_deploy/cli.py, line 54, in
 parse_args

 for ep in pkg_resources.iter_entry_points('ceph_deploy.cli')

   File /usr/lib/python2.6/site-packages/pkg_resources.py, line 1947, in
 load

 if require: self.require(env, installer)

   File /usr/lib/python2.6/site-packages/pkg_resources.py, line 1960, in
 require

  working_set.resolve(self.dist.requires(self.extras),env,installer))

   File /usr/lib/python2.6/site-packages/pkg_resources.py, line 550, in
 resolve

 raise VersionConflict(dist,req) # XXX put more info here

 pkg_resources.VersionConflict: (pushy 0.5.1
  (/usr/lib/python2.6/site-packages), Requirement.parse('pushy>=0.5.2'))


 Thanks,

 Sijo Mathew


  From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
  Sent: Tuesday, August 13, 2013 3:33 PM
  To: Mathew, Sijo (KFRM 1)
  Cc: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] basic single node set up issue on rhel6


 Hi Sijo


 On Mon, Aug 12, 2013 at 12:26 PM, Mathew, Sijo (KFRM 1) 
 sijo.mat...@credit-suisse.com wrote:

 Hi,

  

 I have been trying to get ceph installed on a single node. But I’m stuck
 with the following error.

  

 [host]$ ceph-deploy -v mon create ceph-server-299

 Deploying mon, cluster ceph hosts ceph-server-299

 Deploying mon to ceph-server-299

 Distro RedHatEnterpriseServer codename Santiago, will use sysvinit

 Traceback (most recent call last):

   File /usr/bin/ceph-deploy, line 21, in module

 main()

   File /usr/lib/python2.6/site-packages/ceph_deploy/cli.py, line 112, in
 main

 return args.func(args)

   File /usr/lib/python2.6/site-packages/ceph_deploy/mon.py, line 234, in
 mon

 mon_create(args)

   File /usr/lib/python2.6/site-packages/ceph_deploy/mon.py, line 138, in
 mon_create

 init=init,

   File /usr/lib/python2.6/site-packages/pushy/protocol/proxy.py, line
 255, in lambda

 (conn.operator(type_, self, args, kwargs))

   File /usr/lib/python2.6/site-packages/pushy/protocol/connection.py,
 line 66, in operator

 return self.send_request(type_, (object, args, kwargs))

   File
 /usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py, line
 323, in send_request

 return self.__handle(m)

   File
 /usr/lib/python2.6/site-packages/pushy/protocol/baseconnection.py, line
 639, in __handle

 raise e

  pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory


 This looks like a very old version of ceph-deploy. Can you attempt to
 install a newer version?

 We released version 1.2 to the Python Package Index and to our repos
 here:  http://ceph.com/packages/ceph-extras/rpm/

 If you are familiar with Python install tools you could simply do: `sudo
  pip install ceph-deploy`, otherwise could you try with the RPM packages?

 But, you mention the lack of internet connection, so that would mean that
 for `pip` it would be quite the headache to meet all of ceph-deploy's
 dependencies.

 Can you try with the RPMs for version 1.2 and run again? 1.2 had a massive
 amount of bug fixes and it includes much better logging output.

 Once you do, paste back the output here so I can take a look.

   

 I saw a similar thread in the archives, but the solution given there
 doesn’t seem to be that clear.

  

  http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-June/002344.html

  

 I had to install all the rpms separately as the machines that I work with
  don’t have internet access and “ceph-deploy install” needs internet access.
 Could someone suggest what might be wrong here?

  

 Environment: RHEL 6.4, ceph 0.61

  

 Thanks,

 Sijo Mathew 

  



 ==
 Please access the attached hyperlink for an important electronic
 communications disclaimer:
 http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html

 

[ceph-users] v0.67 Dumpling released

2013-08-13 Thread Sage Weil
Another three months have gone by, and the next stable release of Ceph is 
ready: Dumpling!  Thank you to everyone who has contributed to this 
release!

This release focuses on a few major themes since v0.61 (Cuttlefish):

 * rgw: multi-site, multi-datacenter support for S3/Swift object storage
 * new RESTful API endpoint for administering the cluster, based on a new 
   and improved management API and updated CLI
 * mon: stability and performance
 * osd: stability and performance
 * cephfs: open-by-ino support (for improved NFS reexport)
 * improved support for Red Hat platforms
 * use of the Intel CRC32c instruction when available

As with previous stable releases, you can upgrade from previous versions 
of Ceph without taking the entire cluster online, as long as a few simple 
guidelines are followed.

 * For Dumpling, we have tested upgrades from both Bobtail and Cuttlefish.  
   If you are running Argonaut, please upgrade to Bobtail and then to 
   Dumpling.
 * Please upgrade daemons/hosts in the following order:
   1. Upgrade ceph-common on all nodes that will use the command line ceph 
  utility.
   2. Upgrade all monitors (upgrade ceph package, restart ceph-mon 
  daemons). This can happen one daemon or host at a time. Note that 
   because cuttlefish and dumpling monitors can't talk to each other, 
  all monitors should be upgraded in relatively short succession to
  minimize the risk that an untimely failure will reduce availability.
   3. Upgrade all osds (upgrade ceph package, restart ceph-osd daemons). 
  This can happen one daemon or host at a time.
   4. Upgrade radosgw (upgrade radosgw package, restart radosgw daemons).
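
As a rough illustration of steps 2 and 3 on a single Debian/Ubuntu host (a
sketch only: it assumes the apt sources already point at the dumpling repo,
that the sysvinit helper script is in use, and mon.a / osd.0 are placeholder
daemon ids):

    sudo apt-get update && sudo apt-get install ceph ceph-common
    sudo service ceph restart mon.a    # each monitor, in quick succession
    sudo service ceph restart osd.0    # then each osd, one at a time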

There are several small compatibility changes between Cuttlefish and 
Dumpling, particularly with the CLI interface.  Please see the complete 
release notes for a summary of the changes since v0.66 and v0.61 
Cuttlefish, and other possible issues that should be considered before 
upgrading:

   http://ceph.com/docs/master/release-notes/#v0-67-dumpling

Dumpling is the second Ceph release on our new three-month stable release 
cycle.  We are very pleased to have pulled everything together on 
schedule.  The next stable release, which will be code-named Emperor, is 
slated for three months from now (beginning of November).

You can download v0.67 Dumpling from the usual locations:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.67.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com