Monitor crash

2012-12-20 Thread Eric_YH_Chen
Hi All: I ran into this crash once. You can download the full log here: https://dl.dropbox.com/u/35107741/ceph-mon.log ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe) 1: /usr/bin/ceph-mon() [0x52569a] 2: (()+0xfcb0) [0x7ffad0949cb0] 3: (gsignal()+0x35)

OSD crash on 0.48.2argonaut

2012-11-14 Thread Eric_YH_Chen
Dear All: I ran into this issue on one of the OSD nodes. Is this a known issue? Thanks! ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe) 1: /usr/bin/ceph-osd() [0x6edaba] 2: (()+0xfcb0) [0x7f08b112dcb0] 3: (gsignal()+0x35) [0x7f08afd09445] 4: (abort()+0x17b)

Limitations of CephFS

2012-10-30 Thread Eric_YH_Chen
Hi all: I have some questions about the limitations of CephFS. Would you please help answer them? Thanks! 1. Max file size 2. Max number of files 3. Max filename length 4. Filename character set, ex: any byte, except null, / 5. Max pathname length And one question about RBD: 1. max

RE: RBD boot from volume weirdness in OpenStack

2012-10-25 Thread Eric_YH_Chen
Dear Josh and Travis: I am trying to set up the OpenStack+Ceph environment too, but I am not using devstack. I deployed glance, cinder, nova, and keystone onto different servers. All the basic functions work fine; I can import images, create volumes, and create virtual machines. It seems the glance

Re: rbd map error with new rbd format

2012-10-18 Thread Eric_YH_Chen
Hi, Josh: Yeah, format 2 and layering support is in progress for kernel rbd, but not ready yet. The userspace side is all ready in the master branch, but it takes more time to implement in the kernel. Btw, instead of --new-format you should use --format 2. It's in the man page in the master
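
[Note: for readers landing here, creating a format 2 image uses the --format flag named above; a minimal sketch, with pool and image names as placeholders:]

    rbd create mypool/myimage --size 1024 --format 2
    rbd info mypool/myimage    # should report the image format on versions that support it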

Accidental corruption in osdmap

2012-08-08 Thread Eric_YH_Chen
Dear all: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel: 3.2.0-27. We created a Ceph cluster with 24 OSDs and 3 monitors. osd.0 ~ osd.11 are on server1, osd.12 ~ osd.23 are on server2. mon.0 is on server1, mon.1 is on server2, mon.2 is on

RE: Cannot start up one of the OSDs

2012-08-01 Thread Eric_YH_Chen
Hi, Samuel: And the Ceph cluster stays in an unhealthy state. How can we fix it? There are 230 unfound objects and we cannot access some rbd devices now; it hangs at rbd info image_name. root@ubuntu:~$ ceph -s health HEALTH_WARN 96 pgs backfill; 96 pgs degraded; 96 pgs recovering; 96
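
[Note: unfound objects are inspected and, as a last resort, abandoned per placement group. A sketch using the commands documented for this situation; the pgid 2.4 is a placeholder, and mark_unfound_lost discards data:]

    ceph health detail                    # lists the PGs with unfound objects
    ceph pg 2.4 list_missing              # show which objects are unfound and why
    ceph pg 2.4 mark_unfound_lost revert  # last resort: revert or forget the unfound objects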

RE: High-availability testing of ceph

2012-07-31 Thread Eric_YH_Chen
Hi, Josh: Thanks for your reply. However, I had asked a question about the replica setting before: http://www.spinics.net/lists/ceph-devel/msg07346.html If the performance of an rbd device is n MB/s under replica=2, then the total I/O throughput on the hard disks is over 3 * n MB/s. Because I

The cluster is not aware that some OSDs have disappeared

2012-07-31 Thread Eric_YH_Chen
Dear All: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel: 3.2.0-27. We created a Ceph cluster with 24 OSDs and 3 monitors. osd.0 ~ osd.11 are on server1, osd.12 ~ osd.23 are on server2. mon.0 is on server1, mon.1 is on server2, mon.2 is on server3
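
[Note: whether the monitors notice a dead OSD is driven by peer heartbeats plus two tunables. A hedged sketch of what to check; option names are from the docs of that era and defaults vary by release:]

    ceph osd tree    # shows which OSDs the cluster currently believes are up or down
    # Relevant ceph.conf tunables:
    #   osd heartbeat grace        - seconds without a heartbeat before peers report an OSD down
    #   mon osd down out interval  - seconds a down OSD waits before being marked out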

Cannot start up one of the OSDs

2012-07-31 Thread Eric_YH_Chen
Hi, all: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel: 3.2.0-27. We created a Ceph cluster with 24 OSDs and 3 monitors. osd.0 ~ osd.11 are on server1, osd.12 ~ osd.23 are on server2. mon.0 is on server1, mon.1 is on server2, mon.2 is on server3

RE: Cannot start up one of the OSDs

2012-07-31 Thread Eric_YH_Chen
Hi, Samuel: It happens on every startup; I cannot fix it for now. -Original Message- From: Samuel Just [mailto:sam.j...@inktank.com] Sent: Wednesday, August 01, 2012 1:36 AM To: Eric YH Chen/WYHQ/Wiwynn Cc: ceph-devel@vger.kernel.org; Chris YT Huang/WYHQ/Wiwynn; Victor CY Chang/WYHQ/Wiwynn

High-availability testing of ceph

2012-07-30 Thread Eric_YH_Chen
Hi, all: I am testing the high availability of Ceph. Environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel: 3.2.0-27. We created a Ceph cluster with 24 OSDs. osd.0 ~ osd.11 are on server1, osd.12 ~ osd.23 are on server2. The crush rule uses the default

RE: rbd map fails when the crushmap algorithm is changed to tree

2012-07-08 Thread Eric_YH_Chen
Hi, Gregory: OS: Ubuntu 12.04, kernel: 3.2.0-26, ceph: 0.48, filesystem: ext4. My steps to assign the new crush map: 1. ceph osd getcrushmap -o curmap 2. crushtool -d curmap -o curmap.txt 3. modify curmap.txt and rename it to newmap.txt 4. service ceph -a stop (i.e., tear down the cluster) 5. mkcephfs -a -c
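
[Note: steps 4-5 rebuild the whole cluster with mkcephfs. A modified map can usually be injected into a running cluster instead; a sketch of the standard decompile/edit/recompile cycle, with file names as placeholders:]

    ceph osd getcrushmap -o curmap       # dump the compiled map
    crushtool -d curmap -o curmap.txt    # decompile to editable text
    $EDITOR curmap.txt                   # change bucket algorithms, rules, ...
    crushtool -c curmap.txt -o newmap    # recompile
    ceph osd setcrushmap -i newmap       # inject into the live cluster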

rbd map fails when the crushmap algorithm is changed to tree

2012-07-06 Thread Eric_YH_Chen
Hi all: Here is the original crushmap. I changed the algorithm of the host buckets to tree and set the map back into the Ceph cluster. However, when I try to map an image to a rados block device (RBD), it hangs with no response until I press ctrl-c (rbd map, then hang). Is there anything wrong in the crushmap?
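
[Note: a modified map can be sanity-checked before injection with crushtool's test mode, on builds that have it; the rule number and replica count below are assumptions:]

    crushtool -c newmap.txt -o newmap
    crushtool -i newmap --test --rule 0 --num-rep 2 --show-statistics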

What does replica size mean?

2012-07-04 Thread Eric_YH_Chen
Hi, all: Just want to make sure of one thing: if I set the replica size to 2, that means one piece of data with 2 copies, right? Therefore, if I measure rbd performance at 100 MB/s, I imagine the actual I/O throughput on the hard disks is over 100 MB/s * 3 = 300 MB/s. Am I correct? Thanks!
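
[Note on the arithmetic, assuming FileStore-style journaling where each OSD writes every byte twice, once to the journal and once to the data store: replica size 2 stores each client byte on 2 OSDs, so a 100 MB/s client stream becomes 100 * 2 = 200 MB/s of disk writes without journals and 100 * 2 * 2 = 400 MB/s with them; the * 3 estimate above falls between these two bounds.]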

RE: Performance benchmark of rbd

2012-06-18 Thread Eric_YH_Chen
Hi, Mark and all: I think you may have missed this mail before, so I am sending it again. == I forgot to mention one thing: I create the rbd on the same machine I test it from. That means the network latency may be lower than in the normal case. 1. I use ext4 as the backend filesystem with the following

Performance benchmark of rbd

2012-06-13 Thread Eric_YH_Chen
Hi, all: I am doing some benchmarks of rbd. The platform is a NAS storage server. CPU: Intel E5640 2.67 GHz. Memory: 192 GB. Hard disks: SATA 250 GB * 1, 7200 rpm (H0) + SATA 1 TB * 12, 7200 rpm (H1~H12). RAID card: LSI 9260-4i. OS: Ubuntu 12.04 with kernel 3.2.0-24. Network:
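
[Note: a common companion to rbd benchmarks is benchmarking the object store directly, so rbd overhead can be separated from raw RADOS throughput; a sketch, with pool name and duration as placeholders:]

    rados -p rbd bench 60 write    # 60-second write benchmark against the pool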

RE: Performance benchmark of rbd

2012-06-13 Thread Eric_YH_Chen
Hi, Mark: I forgot to mention one thing: I create the rbd on the same machine I test it from. That means the network latency may be lower than in the normal case. 1. I use ext4 as the backend filesystem with the following attributes: data=writeback,noatime,nodiratime,user_xattr 2. I use the default

Journal size of each disk

2012-06-11 Thread Eric_YH_Chen
Dear all: I would like to know whether the journal size influences disk performance. If each of my disks is 1 TB, how much space should I set aside for the journal? Thanks for any comment.
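
[Note: the sizing rule that later appeared in the Ceph docs derives the journal from throughput, not disk capacity: journal size >= 2 * expected throughput * filestore max sync interval. As a hedged example, a 100 MB/s disk with a 5-second sync interval wants at least 2 * 100 * 5 = 1000 MB:]

    [osd]
    osd journal size = 1000    ; megabytes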

How to turn on async writes for rbd?

2012-06-10 Thread Eric_YH_Chen
Dear All: I saw that rbd has supported async writes since 0.36 (http://ceph.com/2011/09/), but I cannot find any document on how to turn it on. Should I just write 'enabled' to /sys/devices/rbd/0/power/async? One more thing: if I want to implement iSCSI multipath with RBD, just like
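
[Note: the sysfs path above is a guess in the original mail. For librbd clients, writeback caching is a ceph.conf switch on recent-enough versions; it does not apply to the in-kernel rbd driver:]

    [client]
    rbd cache = true
    rbd cache size = 33554432    ; bytes; optional, this is the usual default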

Cannot restart the OSD successfully after rebooting the machine

2012-06-07 Thread Eric_YH_Chen
Dear All: In my testing environment, we deployed a Ceph cluster at version 0.43, kernel 3.2.0. (We deployed it several months ago, so the version is not the latest one.) There are 5 MONs and 8 OSDs in the cluster. We have 5 servers for the monitors, and two storage servers with 4 OSDs each. We met a

Snapshot/Clone in RBD

2012-05-22 Thread Eric_YH_Chen
Hi all: According to the documentation, snapshots in RBD are read-only. That is to say, if I want to clone an image, I should use rbd_copy, right? I am curious whether this function is optimized, e.g. with copy-on-write, to speed up performance. What I want to do is to integrate
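
[Note: copy-on-write cloning did arrive later with format 2 images; a sketch of that eventual workflow for readers on a layering-capable version, with names as placeholders:]

    rbd snap create mypool/base@snap1          # take a snapshot
    rbd snap protect mypool/base@snap1         # protect it so clones may depend on it
    rbd clone mypool/base@snap1 mypool/child   # copy-on-write clone, near-instant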

OSD id in configuration file

2012-02-13 Thread Eric_YH_Chen
Hi, all: For scalability reasons, we would like to name the first hard disk 00101 on the first server, and the first hard disk 00201 on the second server. The ceph.conf looks like this: [osd] osd data = /srv/osd.$id osd journal = /srv/osd.$id.journal osd journal size = 1000

Ceph based on ext4

2011-12-29 Thread Eric_YH_Chen
Hi, all: We want to test the stability and performance of Ceph on the ext4 file system. Here is the information from mount; did we set all the attributes correctly? /dev/sda1 on /srv/osd.0 type ext4 (rw,noatime,nodiratime,errors=remount-ro,data=writeback,user_xattr) And
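
[Note: one ext4-specific caveat from the docs of that era is ext4's small inline xattr space; FileStore can spill large xattrs into omap. The option below is taken from that documentation and may not exist on all releases:]

    [osd]
    filestore xattr use omap = true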

Why does a Ceph cluster only support an odd number of monitors?

2011-12-22 Thread Eric_YH_Chen
Hi, all: I am curious why a Ceph cluster only supports an odd number of monitors. If we lose 1 monitor (leaving an even number), would it cause any problem if we do not handle the situation quickly? Thanks!
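
[Note on the arithmetic behind the odd-number advice: the monitors form a Paxos quorum of floor(n/2) + 1, so 3 monitors tolerate 1 failure (quorum 2) while 4 monitors also tolerate only 1 (quorum 3); an extra even monitor adds failure surface without adding tolerance. Losing 1 of 3 monitors still leaves a quorum, but losing a second halts the cluster, so a lost monitor should be restored promptly.]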

Do not understand some terms about cluster health

2011-12-21 Thread Eric_YH_Chen
Hi, All: When I type 'ceph health' to get the status of the cluster, it shows some information. Would you please explain the terms? Ex: HEALTH_WARN 3/54 degraded (5.556%). What does degraded mean? Is it a serious error, and how do I fix it? Ex: HEALTH_WARN 264 pgs
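
[Note: "degraded" counts object copies that currently exist in fewer replicas than the pool's replica size, typically while an OSD is down or recovery is in flight. On versions that support it, the warning can be expanded per placement group:]

    ceph health detail    # breaks HEALTH_WARN down into the affected PGs and OSDs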

Hang at 'rbd info image'

2011-12-18 Thread Eric_YH_Chen
Hi, all: I ran into a situation where the Ceph system hangs at 'rbd info image'. It can only be reproduced on one image. How can I find out what happened? Is there any log I can provide to you for analysis? Thanks!

Hang when mapping an rbd image with a long name

2011-12-16 Thread Eric_YH_Chen
Hi, all: My Ceph version is 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83). When I try to map an rbd image with a long name to a device, it hangs for a long time. For example: sudo rbd map iqn.2012-01.com.sample:storage.ttt --secret /etc/ceph/secretfile sudo rbd map

How to know the status of the monitors?

2011-12-13 Thread Eric_YH_Chen
Hi, all: Is there any command or API that can retrieve the status of all monitors? I use 'ceph mon stat' to get this information. However, mon.e is not available (the server is shut down). 2011-12-13 17:08:35.499022 mon - [mon,stat] 2011-12-13 17:08:35.499716 mon.1 - 'e3: 5 mons at

RE: rbd devices disappear after re-boot

2011-12-07 Thread Eric_YH_Chen
Hi, Tommi: What I see is like this: lrwxrwxrwx 1 root root 10 2011-12-07 16:48 foo1:0 -> ../../rbd0 lrwxrwxrwx 1 root root 10 2011-12-07 16:50 foo2:1 -> ../../rbd1 The extra numbers (:0 and :1) behind the image names mean the problem still exists. -Original Message- From: Tommi Virtanen

How to sync data on different servers with the same image

2011-12-06 Thread Eric_YH_Chen
Dear All: I map the same rbd image to an rbd device on two different servers. For example: 1. create an rbd image named foo 2. map foo to /dev/rbd0 on server A, mount /dev/rbd0 at /mnt 3. map foo to /dev/rbd0 on server B, mount /dev/rbd0 at /mnt If I add a

rbd devices disappear after re-boot

2011-12-06 Thread Eric_YH_Chen
Dear All: Another question about rbd devices: all rbd devices disappear after a server re-boot. Do you have any plan to implement a feature where the server re-maps all the devices during initialization? If not, do you have any suggestions (on the technical side) if we want to
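
[Note: later Ceph releases ship an rbdmap init script that does exactly this; a minimal boot-time workaround sketch in the spirit of this thread, where the list file and its path are hypothetical:]

    # /etc/ceph/rbd-images.list (hypothetical): one "pool/image" per line
    while read img; do
        rbd map "$img"
    done < /etc/ceph/rbd-images.list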

RE: How to sync data on different servers with the same image

2011-12-06 Thread Eric_YH_Chen
Hi, Wido: This is a preliminary experiment before implementing iSCSI high-availability multipath: http://ceph.newdream.net/wiki/ISCSI Therefore, we use Ceph as an rbd device rather than as a file system. -Original Message- From: Wido den Hollander [mailto:w...@widodh.nl] Sent:

RE: How to sync data on different servers with the same image

2011-12-06 Thread Eric_YH_Chen
Hi, Brian: I would like to use an SCST target or LIO target, but I found they are not supported on Ubuntu 11.10 server. (LIO iSCSI was released in the 3.1 kernel, but 11.10 server uses a 3.0 kernel.) Could you kindly share how to use an SCST or LIO target on Ubuntu? I

RE: rbd devices disappear after re-boot

2011-12-06 Thread Eric_YH_Chen
Hi, Tommi: Thanks for your suggestion. We will try to implement a workaround for our internal experiment. I have another small question: can I map a specific image to a specific device? For example, before re-boot: id pool image snap device 0 rbd foo1 -

Suggestion: return [] when there is no image in the pool

2011-11-07 Thread Eric_YH_Chen
Hi, developers: When I used the API in rbd.py, I found RBD().list(ioctx) would not return [] when there is no image in the pool. I suggest it should return [] in this case; it would avoid some programming problems. Thanks! regards, Eric/Pjack

RE: Cannot execute rados.py as a sudoer

2011-11-03 Thread Eric_YH_Chen
Hi, Tommi: Here is my ceph.conf. The /var/log/ceph folder was created by me, because the script in 0.37 didn't create it. Maybe the problem is that I did not set the correct permissions on the folder. ; global [global] auth supported = cephx max open files = 131072 log file

Cannot execute rados.py as a sudoer

2011-11-02 Thread Eric_YH_Chen
Hi, all: When I use rados.py, I run into problems even though the user is in sudoers. I found it accesses /etc/ceph/client.admin.keyring and /var/log/ceph/client.admin.log, which are only available to root. Do you have any suggestions? I cannot execute the Python program with 'root'

RE: Cannot execute rados.py as a sudoer

2011-11-02 Thread Eric_YH_Chen
Hi, Greg: The log is generated by the Ceph service at runtime. Even if I change the permissions, they will be overwritten by the service someday. I am afraid there will be other permission problems when I execute other commands. Ex: I need to modify more files' permissions. Ex: the library uses any

Some questions about the Ceph system (release 0.37)

2011-10-25 Thread Eric_YH_Chen
Hi, everyone: Nice to meet you. I would like to run some experiments on the Ceph system. After I installed the latest release, 0.37, the Ceph system seems to work well (per ceph -s). However, I found some error messages in dmesg that I never saw in release 0.34. My environment is Ubuntu 11.04 with