Monitor crash

2012-12-20 Thread Eric_YH_Chen
Hi All: I met this crash once. You can download the full log here: https://dl.dropbox.com/u/35107741/ceph-mon.log
ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: /usr/bin/ceph-mon() [0x52569a]
 2: (()+0xfcb0) [0x7ffad0949cb0]
 3: (gsignal()+0x35) [0x7ffacf725425]

OSD crash on 0.48.2argonaut

2012-11-14 Thread Eric_YH_Chen
Dear All: I hit this issue on one of the OSD nodes. Is this a known issue? Thanks!
ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: /usr/bin/ceph-osd() [0x6edaba]
 2: (()+0xfcb0) [0x7f08b112dcb0]
 3: (gsignal()+0x35) [0x7f08afd09445]
 4: (abort()+0x17b) [0x7f08afd0cbab

Limitation of CephFS

2012-10-30 Thread Eric_YH_Chen
Hi all: I have some questions about the limitations of CephFS. Would you please help answer them? Thanks!
1. Max file size
2. Max number of files
3. Max filename length
4. Filename character set, e.g. any byte except null and "/"
5. Max pathname length
And one question about RBD: 1. max

RE: RBD boot from volume weirdness in OpenStack

2012-10-25 Thread Eric_YH_Chen
Dear Josh and Travis: I am trying to set up the OpenStack + Ceph environment too, but I am not using devstack. I deploy glance, cinder, nova, and keystone onto different servers. All the basic functions work fine: I can import images, create volumes, and create virtual machines. It seems the glance and
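For anyone wiring this up by hand, the pieces that usually matter are the RBD backend settings in glance and cinder. The excerpt below is a hedged, Folsom-era sketch (the pool names, user name, and the libvirt secret UUID are assumptions, not values from this thread):

    ; glance-api.conf
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance

    ; cinder.conf
    volume_driver = cinder.volume.driver.RBDDriver
    rbd_pool = volumes
    rbd_secret_uuid = <uuid of the libvirt secret holding the cephx key>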

RE: rbd map error with new rbd format

2012-10-22 Thread Eric_YH_Chen
> At this point format 2 is understood by the kernel, and the > infrastructure for opening parent images and the I/O path > for clones is in progress. We estimate about 4-8 weeks for this, > but you should check back then. > Kernel 3.6 was already released, so this would probably go into 3.8 or >
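For context, the userspace workflow that layering enables looks roughly like the sketch below (a hedged example against the master-branch rbd tool of that era; pool and image names are made up):

    rbd create --format 2 --size 1024 rbd/parent
    rbd snap create rbd/parent@snap1
    rbd snap protect rbd/parent@snap1
    rbd clone rbd/parent@snap1 rbd/child

Until the kernel-side clone support lands, clones like these are usable through librbd (e.g. qemu) before they are usable via rbd map.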

Re: rbd map error with new rbd format

2012-10-18 Thread Eric_YH_Chen
Hi, Josh: > Yeah, format 2 and layering support is in progress for kernel rbd, > but not ready yet. The userspace side is all ready in the master > branch, but it takes more time to implement in the kernel. > Btw, instead of --new-format you should use --format 2. It's in > the man page in the mas

RE: accidental corruption in osdmap

2012-08-09 Thread Eric_YH_Chen
Dear Sage and Dan: Thanks for the kind reply. If we want to apply the patch from http://tracker.newdream.net/issues/2446 to kernel 3.2, can we just modify the file as below? We are afraid it may cause other side effects.
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 2592f3c..011a3a9 100644
---

RE: cannot start up one of the OSDs

2012-08-08 Thread Eric_YH_Chen
Dear Samuel: I hit a similar issue in a different scenario. Way to reproduce:
Step 1. Remove one disk directly (hot plug) and wait for the cluster to remove the OSD.
Step 2. Put the original disk back into the system and reboot.
You can download the osd map / osd dump / pg dump via the link below. https://d

accidental corruption in osdmap

2012-08-08 Thread Eric_YH_Chen
Dear all: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel 3.2.0-27. We created a ceph cluster with 24 OSDs and 3 monitors.
Osd.0 ~ osd.11 are on server1
Osd.12 ~ osd.23 are on server2
Mon.0 is on server1
Mon.1 is on server2
Mon.2 is on ser

RE: The cluster does not notice that some OSDs have disappeared

2012-08-02 Thread Eric_YH_Chen
Hi, Tommi: I used these two commands to get the crush map; how should I modify it?
ceph osd getcrushmap -o curmap
crushtool -d curmap -o curmap.txt
# begin crush map
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 os
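For reference, the usual round-trip to edit and re-inject a crush map (standard ceph/crushtool usage; the edit step itself is whatever change is needed):

    ceph osd getcrushmap -o curmap
    crushtool -d curmap -o curmap.txt
    # edit curmap.txt, then recompile and inject it:
    crushtool -c curmap.txt -o newmap
    ceph osd setcrushmap -i newmap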

RE: cannot start up one of the OSDs

2012-08-01 Thread Eric_YH_Chen
Hi, Samuel: And the ceph cluster stays in an unhealthy state. How can we fix it? There are 230 objects unfound, and we cannot access some rbd devices now. It hangs at "rbd info ". root@ubuntu:~$ ceph -s health HEALTH_WARN 96 pgs backfill; 96 pgs degraded; 96 pgs recovering; 96 pgs stu
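When objects are genuinely unfound, the usual triage of that era was to locate the affected PGs and then, only if the data truly cannot be recovered, explicitly give up on it (a hedged sketch; 2.5 is a hypothetical PG id, and revert discards the lost writes):

    ceph health detail                      # lists the PGs with unfound objects
    ceph pg 2.5 list_missing                # show what is missing in one PG
    ceph pg 2.5 mark_unfound_lost revert    # last resort: roll back to older versions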

RE: The cluster does not notice that some OSDs have disappeared

2012-07-31 Thread Eric_YH_Chen
Hi, Josh: I did not assign the crushmap myself; I use the default setting. And after I rebooted the server, I could not reproduce this situation. The heartbeat check works fine when one of the servers is not available.

RE: cannot start up one of the OSDs

2012-07-31 Thread Eric_YH_Chen
Hi, Samuel: It happens on every startup; I cannot fix it for now.

cannot start up one of the OSDs

2012-07-31 Thread Eric_YH_Chen
Hi, all: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel 3.2.0-27. We created a ceph cluster with 24 OSDs and 3 monitors.
Osd.0 ~ osd.11 are on server1
Osd.12 ~ osd.23 are on server2
Mon.0 is on server1
Mon.1 is on server2
Mon.2 is on server3
wh

The cluster does not notice that some OSDs have disappeared

2012-07-31 Thread Eric_YH_Chen
Dear All: My environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel 3.2.0-27. We created a ceph cluster with 24 OSDs and 3 monitors.
Osd.0 ~ osd.11 are on server1
Osd.12 ~ osd.23 are on server2
Mon.0 is on server1
Mon.1 is on server2
Mon.2 is on server3
w

RE: High-availability testing of ceph

2012-07-31 Thread Eric_YH_Chen
Hi, Josh: Thanks for your reply. However, I had asked a question about the replica setting before: http://www.spinics.net/lists/ceph-devel/msg07346.html If the performance of an rbd device is n MB/s under replica=2, that means the total IO throughput on the hard disks is over 3 * n MB/s, because I th

High-availability testing of ceph

2012-07-30 Thread Eric_YH_Chen
Hi, all: I am testing the high availability of ceph. Environment: two servers, with 12 hard disks on each. Version: Ceph 0.48, kernel 3.2.0-27. We created a ceph cluster with 24 OSDs.
Osd.0 ~ osd.11 are on server1
Osd.12 ~ osd.23 are on server2
The crush rule is using the default rul
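For HA testing across two hosts, the property that matters is that replicas land on different servers. A default-style rule of that era that enforces this looks roughly like the snippet below (a hedged reconstruction, not the poster's actual map):

    rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

The "chooseleaf firstn 0 type host" step is what places each replica on a different host, so one server going down still leaves a usable copy.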

RE: rbd map fails when the crushmap algorithm is changed to tree

2012-07-08 Thread Eric_YH_Chen
Hi, Gregory: OS: Ubuntu 12.04, kernel: 3.2.0-26, ceph: 0.48, filesystem: ext4. My steps to assign a new crush map:
1. ceph osd getcrushmap -o curmap
2. crushtool -d curmap -o curmap.txt
3. modify curmap.txt and rename it to newmap.txt
4. service ceph -a stop => destruct the cluster
5. mkcephfs -a -c c

rbd map fails when the crushmap algorithm is changed to tree

2012-07-06 Thread Eric_YH_Chen
Hi all: Here is the original crushmap. I changed the algorithm of the host buckets to "tree" and set the map back into the ceph cluster. However, when I try to map one image to a rados block device (RBD), it hangs with no response until I press Ctrl-C (rbd map => then hang). Is there anything wrong in the crushmap? Th
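For reference, the alg line lives inside each bucket of the decompiled map; a hypothetical host bucket with the tree algorithm would look like this (bucket name, id, and weights are made up):

    host server1 {
        id -2
        alg tree       # changed from the default (straw)
        hash 0         # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
    }

Whether a given kernel client handles every bucket type is version-dependent, which is worth checking when a map change coincides with a hang.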

What does replica size mean?

2012-07-04 Thread Eric_YH_Chen
Hi, all: Just want to make sure of one thing. If I set the replica size to 2, that means one piece of data with 2 copies, right? Therefore, if I measure the rbd performance at 100 MB/s, I imagine the actual IO throughput on the hard disks is over 100 MB/s * 3 = 300 MB/s. Am I correct? Thanks!
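For what it's worth, the pool's size parameter counts total copies, not extra copies, so size 2 means two copies of each object in the cluster overall. Checking and setting it (standard ceph CLI usage; rbd is the pool name here):

    ceph osd pool get rbd size
    ceph osd pool set rbd size 2

With co-located journals, each replica is also written twice (journal + data), which is the other multiplier to keep in mind when comparing client throughput to raw disk throughput.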

RE: Performance benchmark of rbd

2012-06-18 Thread Eric_YH_Chen
Hi, Mark and all: I think you may have missed this mail before, so I am sending it again. == I forgot to mention one thing: I created the rbd on the same machine and tested it there, which means the network latency may be lower than in the normal case.
1. I use ext4 as the backend filesystem with the following attribu

RE: Performance benchmark of rbd

2012-06-13 Thread Eric_YH_Chen
Hi, Mark: I forgot to mention one thing: I created the rbd on the same machine and tested it there, which means the network latency may be lower than in the normal case.
1. I use ext4 as the backend filesystem with the following attributes: data=writeback,noatime,nodiratime,user_xattr
2. I use the default repli

Performance benchmark of rbd

2012-06-13 Thread Eric_YH_Chen
Hi, all: I am doing some benchmarks of rbd. The platform is a NAS storage box.
CPU: Intel E5640 2.67GHz
Memory: 192 GB
Hard Disk: SATA 250G * 1, 7200 rpm (H0) + SATA 1T * 12, 7200 rpm (H1~H12)
RAID Card: LSI 9260-4i
OS: Ubuntu 12.04 with kernel 3.2.0-24
Network:
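For a comparable baseline, two common measurements of that era were the built-in RADOS benchmark and direct IO against a mapped device (a hedged sketch; pool and image names are made up):

    # raw object-store throughput, 16 concurrent ops for 60 s
    rados bench -p rbd 60 write -t 16

    # block-level throughput on a mapped image, bypassing the page cache
    rbd map test --pool rbd
    dd if=/dev/zero of=/dev/rbd0 bs=4M count=1024 oflag=direct

Running the client on a cluster node, as noted above, removes most network latency from the numbers.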

Journal size of each disk

2012-06-10 Thread Eric_YH_Chen
Dear all: I would like to know whether the journal size influences disk performance. If each of my disks is 1 TB, how much space should I reserve for the journal? Thanks for any comments.
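The sizing guidance in the ceph docs ties the journal to throughput rather than disk capacity: size it to absorb roughly two filestore sync intervals of writes. A sketch in ceph.conf terms (the 100 MB/s figure is an assumed disk speed, not from this thread):

    [osd]
    ; rule of thumb: osd journal size =
    ;   2 * expected throughput (MB/s) * filestore max sync interval (s)
    ; e.g. 2 * 100 MB/s * 5 s = 1000 MB
    osd journal size = 1000

So a 1 TB disk does not need a bigger journal than a smaller disk of the same speed; what matters is how fast data can arrive between filestore syncs.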

How to turn on async writes in rbd?

2012-06-10 Thread Eric_YH_Chen
Dear All: I saw that rbd has supported "async writes" since 0.36 (http://ceph.com/2011/09/), but I cannot find any document on how to turn it on. Should I just write "enabled" to /sys/devices/rbd/0/power/async? One more thing: if I want to implement iSCSI multipath with RBD, just like http:/
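As a hedged pointer rather than an answer from this thread: the documented write-caching knob is the librbd writeback cache, configured in ceph.conf rather than sysfs, and it applies to librbd users such as qemu, not to the kernel rbd driver. A sketch (the sizes are illustrative, not recommendations):

    [client]
    rbd cache = true
    rbd cache size = 33554432        ; 32 MB of cache
    rbd cache max dirty = 25165824   ; bytes allowed to be dirty before writeback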

Cannot restart the OSD successfully after rebooting the machine

2012-06-07 Thread Eric_YH_Chen
Dear All: In my testing environment we deployed a ceph cluster with version 0.43, kernel 3.2.0. (We deployed it several months ago, so the version is not the latest one.) There are 5 MONs and 8 OSDs in the cluster. We have 5 servers for the monitors and two storage servers with 4 OSDs each. We met a sit

Can we mount cephfs on CentOS 5.7?

2012-06-05 Thread Eric_YH_Chen
Dear All: My ceph cluster is installed on Ubuntu 12.04 with kernel 3.2.0, but I have a situation where I may need to mount cephfs on CentOS 5.7, which ships with kernel 2.6.18. (That is to say, I only need to execute mount.ceph on the CentOS 5.7 box, not run the whole ceph system.) I want to
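For context: the in-kernel ceph client was only merged in Linux 2.6.34, so a stock 2.6.18 kernel cannot use mount.ceph at all; the FUSE client is the usual fallback on old distributions. A hedged sketch of both (the monitor address is made up and the secret is elided):

    # kernel client - needs a ceph-capable kernel (>= 2.6.34)
    mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,secret=...

    # FUSE client - avoids the kernel dependency, if ceph-fuse can be built there
    ceph-fuse -m 192.168.0.1:6789 /mnt/ceph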

Snapshot/Clone in RBD

2012-05-22 Thread Eric_YH_Chen
Hi all: According to the documentation, a snapshot in RBD is read-only. That is to say, if I want to clone an image, I should use "rbd_copy", right? I am curious whether this function is optimized, e.g. copy-on-write, to speed things up. What I want to do is to integr
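For reference, the CLI counterparts (syntax as in later rbd tools; image names are made up): rbd cp performs a full, object-by-object copy rather than a copy-on-write clone; copy-on-write cloning only arrived later, with format 2 images and layering.

    rbd snap create rbd/src@snap1   # read-only point-in-time snapshot
    rbd cp rbd/src rbd/dst          # full copy of the image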

OSD id in configuration file

2012-02-13 Thread Eric_YH_Chen
Hi, all: For scalability reasons, we would like to name the first hard disk "00101" on the first server and the first hard disk "00201" on the second server. The ceph.conf looks like this:
[osd]
osd data = /srv/osd.$id
osd journal = /srv/osd.$id.journal
osd journal size = 1000
[osd
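One caveat worth flagging (my assumption about the intent, not something resolved in this snippet): OSD ids are parsed as integers, so a leading-zero name like "00101" may not round-trip; encoding server and slot as plain numbers is the safer variant. The per-OSD sections then pick up $id automatically:

    [osd.101]        ; server 1, disk 01 -> /srv/osd.101
    host = server1

    [osd.201]        ; server 2, disk 01 -> /srv/osd.201
    host = server2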

RE: Hang at 'rbd info '

2011-12-29 Thread Eric_YH_Chen
Hi, Tommi: The health of ceph is 'HEALTH_OK'. Do you want to see any other information about the ceph cluster? I also tried these two commands, but they did not help: `ceph osd repair *` `ceph osd scrub *` And I also cannot map the rbd image to an rbd device. Is there any solution to fix

Ceph based on ext4

2011-12-29 Thread Eric_YH_Chen
Hi, all: We want to test the stability and performance of ceph on the ext4 file system. Here is the output of `mount`; did we set all the attributes correctly? /dev/sda1 on /srv/osd.0 type ext4 (rw,noatime,nodiratime,errors=remount-ro,data=writeback,user_xattr) And fro
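For comparison, a hedged sketch of setting those options by hand (device and mount point are from the mail; the tune2fs line is an alternative way to persist data=writeback in the superblock rather than passing it at mount time):

    tune2fs -o journal_data_writeback /dev/sda1
    mount -o rw,noatime,nodiratime,data=writeback,user_xattr /dev/sda1 /srv/osd.0

The option ceph of this era actually depends on is user_xattr; the rest are performance trade-offs (data=writeback in particular weakens crash-consistency guarantees for file data).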

Why are only odd numbers of monitors supported in a ceph cluster?

2011-12-22 Thread Eric_YH_Chen
Hi, all: I am curious about why a ceph cluster only supports an odd number of monitors. If we lose 1 monitor (leaving an even number), would it cause any problem if we do not handle the situation quickly? Thanks!
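For reference, the arithmetic behind the odd-number advice (standard majority-quorum reasoning, not specific to this thread): the monitors need a strict majority, floor(n/2) + 1, to form a quorum.

    3 monitors: quorum = 2, tolerates 1 failure
    4 monitors: quorum = 3, tolerates 1 failure
    5 monitors: quorum = 3, tolerates 2 failures

So an even count buys no extra failure tolerance over the next-lower odd count. Losing one monitor out of three still leaves a quorum of two; the cluster keeps running, but one more monitor failure would block it, so the dead monitor should be restored reasonably soon.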

Do not understand some terms about cluster health

2011-12-21 Thread Eric_YH_Chen
Hi, All: When I type 'ceph health' to get the status of the cluster, it shows some information. Would you please explain these terms? E.g. HEALTH_WARN 3/54 degraded (5.556%). What does "degraded" mean? Is it a serious error, and how do I fix it? E.g. HEALTH_WARN 264 pgs degr

Hang at 'rbd info '

2011-12-18 Thread Eric_YH_Chen
Hi, all: I ran into a situation where the ceph system hangs at 'rbd info '. It can only be reproduced on one image. How can I find out what happened? Is there any log I can provide to you for analysis? Thanks!

Hang when mapping an rbd image with a long name

2011-12-16 Thread Eric_YH_Chen
Hi, all: My ceph version is 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83). When I try to map an rbd image with a long name to a device, it hangs for a long time. For example:
sudo rbd map iqn.2012-01.com.sample:storage.ttt --secret /etc/ceph/secretfile
sudo rbd map iqn.2012-01.

How to know the status of the monitors?

2011-12-13 Thread Eric_YH_Chen
Hi, all: Is there any command or API that can retrieve the status of all monitors? I use 'ceph mon stat' to get this information; however, mon.e is not available (the server is shut down). 2011-12-13 17:08:35.499022 mon <- [mon,stat] 2011-12-13 17:08:35.499716 mon.1 -> 'e3: 5 mons at

RE: rbd devices disappear after reboot

2011-12-07 Thread Eric_YH_Chen
Hi, Tommi: What I see is like this:
lrwxrwxrwx 1 root root 10 2011-12-07 16:48 foo1:0 -> ../../rbd0
lrwxrwxrwx 1 root root 10 2011-12-07 16:50 foo2:1 -> ../../rbd1
The extra numbers (:0 and :1) after the image names mean the problem still exists.

RE: rbd devices disappear after reboot

2011-12-06 Thread Eric_YH_Chen
Hi, Tommi: Thanks for your suggestion. We will try to implement a workaround for our internal experiment. I have another small question: could I map a specific image to a specific device? For example, before re-boot:
id pool image snap device
0  rbd  foo1  -    /d

RE: How to sync data on different servers using the same image

2011-12-06 Thread Eric_YH_Chen
Hi, Brian: I would like to use an SCST target or LIO target, but I found they are not supported on Ubuntu 11.10 server. (LIO iSCSI was released in the 3.1 kernel, but the 11.10 server uses a 3.0 kernel.) Could you kindly share how to use an SCST or LIO target on Ubuntu? I tr

RE: How to sync data on different servers using the same image

2011-12-06 Thread Eric_YH_Chen
Hi, Wido: This is a preliminary experiment before implementing iSCSI high-availability multipath: http://ceph.newdream.net/wiki/ISCSI Therefore, we use Ceph as an rbd device instead of as a file system.

rbd devices disappear after reboot

2011-12-06 Thread Eric_YH_Chen
Dear All: Another question about the rbd device: all rbd devices disappear after a server reboot. Do you have any plan to implement a feature whereby the server re-maps all the devices during initialization? If not, do you have any suggestions (about the technical part) if we want to implem
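In the absence of built-in remapping, a boot-time script is the obvious workaround. A hedged sketch, reusing the --secret invocation that appears elsewhere in these threads (the image list and secret path are site-specific assumptions):

    #!/bin/sh
    # re-map known rbd images at boot, e.g. from rc.local
    for img in foo1 foo2; do
        rbd map $img --pool rbd --secret /etc/ceph/secretfile
    done

Note that device numbers (/dev/rbd0, /dev/rbd1, ...) follow mapping order, so anything that depends on a particular device should map images in a fixed order or resolve them through the by-name symlinks.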

How to sync data on different servers using the same image

2011-12-06 Thread Eric_YH_Chen
Dear All: I map the same rbd image to an rbd device on two different servers. For example:
1. create an rbd image named foo
2. map foo to /dev/rbd0 on server A, mount /dev/rbd0 on /mnt
3. map foo to /dev/rbd0 on server B, mount /dev/rbd0 on /mnt
If I add a fil

No function to list pools in rados.py

2011-12-01 Thread Eric_YH_Chen
Hi, ceph developers: Could we add one more function to rados.py to list the pools? I have already implemented the function. Thanks a lot!
class Rados(object):
    def list(self):
        size = c_size_t(512)
        while True:
            c_names = create_string_buffer(size.value)
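For comparison, the CLI already exposes the same operation, which is handy for checking what the binding should return (standard rados usage):

    rados lspools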

Suggestion: return [] when there is no image in the pool

2011-11-07 Thread Eric_YH_Chen
Hi, developers: When I used the API in rbd.py, I found that RBD().list(ioctx) returns [""] when there is no image in the pool. I suggest it should return [] in this case; that would avoid some programming problems. Thanks! regards, Eric/Pjack

RE: Cannot execute rados.py as a sudoer

2011-11-03 Thread Eric_YH_Chen
Hi, Tommi: Here is my ceph.conf. I created the /var/log/ceph folder myself, because the script in 0.37 didn't create it. Maybe the problem is that I did not set the correct permissions on the folder.
; global
[global]
auth supported = cephx
max open files = 131072
log file

RE: Cannot execute rados.py as a sudoer

2011-11-02 Thread Eric_YH_Chen
Hi, Greg: The log is generated by the ceph service at runtime. Even if I change the permissions, they will be overwritten by the service someday. I am afraid there may be other permission problems when I execute other commands. E.g. I need to modify more files' permissions. E.g. the library uses any AP

Cannot execute rados.py as a sudoer

2011-11-02 Thread Eric_YH_Chen
Hi, all: When I use rados.py, I hit some problems even though the user is a sudoer. I found it accesses /etc/ceph/client.admin.keyring and /var/log/ceph/client.admin.log, which are only available to root. Do you have any suggestions? I cannot execute the python program with the "root" acco
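A hedged workaround sketch for exactly those two paths (the adm group is an arbitrary choice, and a group-readable admin keyring weakens security, so a dedicated cephx user with limited caps would be the cleaner fix):

    # let a trusted group read the admin key and write the client log
    sudo chgrp adm /etc/ceph/client.admin.keyring /var/log/ceph
    sudo chmod 640 /etc/ceph/client.admin.keyring
    sudo chmod 775 /var/log/ceph
    sudo usermod -a -G adm youruser   # hypothetical user name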

Some questions about the Ceph system (release 0.37)

2011-10-25 Thread Eric_YH_Chen
Hi, everyone: Nice to meet you. I would like to run some experiments on the ceph system. After I installed the latest release, 0.37, the ceph system seems to work well (per ceph -s). However, I found some error messages in "dmesg" that I never saw in release 0.34. My environment is Ubuntu 11.04 wit