[Gluster-users] glusterfs - git clone is very slow
Hi, We are facing a slowness issue when cloning a repository from GitHub into a GlusterFS (latest v3.7.0) shared directory. Please see the following timing difference for the git clone command.

GlusterFS shared directory:

test@test:~/gluster$ time git clone https://github.com/elastic/elasticsearch.git
Cloning into 'elasticsearch'...
remote: Counting objects: 359724, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649
Receiving objects: 100% (359724/359724), 129.04 MiB | 569.00 KiB/s, done.
Resolving deltas: 100% (203986/203986), done.
Checking out files: 100% (5272/5272), done.

real    9m1.972s
user    0m27.063s
sys     0m18.974s

Normal machine, without a GlusterFS shared directory:

test@test:~/s$ time git clone https://github.com/elastic/elasticsearch.git
Cloning into 'elasticsearch'...
remote: Counting objects: 359724, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649
Receiving objects: 100% (359724/359724), 129.04 MiB | 2.12 MiB/s, done.
Resolving deltas: 100% (203986/203986), done.
Checking connectivity... done
Checking out files: 100% (5272/5272), done.

real    1m56.895s
user    0m12.974s
sys     0m4.972s

Can you please check the same and let us know what configuration changes should be made to get better performance? Thanks.

Siva
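No one answers Siva with specific settings in this digest, but a commonly tried starting point for small-file-heavy workloads such as a git checkout is the set of client-side performance translators. This is a hedged sketch using real "gluster volume set" option names; "gitvol" is a placeholder volume name and the values are illustrative starting points, not a confirmed fix for this report:

# "gitvol" is a placeholder; values are starting points to profile, not tuned answers
gluster volume set gitvol performance.write-behind on
gluster volume set gitvol performance.quick-read on
gluster volume set gitvol performance.io-thread-count 32
gluster volume set gitvol performance.cache-size 256MB
gluster volume set gitvol cluster.lookup-optimize on
gluster volume set gitvol client.event-threads 4
gluster volume set gitvol server.event-threads 4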
Re: [Gluster-users] Re: Re: Re: Gluster peer rejected and failed to start
On 06/02/2015 12:23 PM, vyyy杨雨阳 wrote:

Attached glusterd log files for 10 and 11.

[2015-06-02 06:33:42.668268] E [glusterd-handshake.c:972:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer 10.8.230.212:1002

From the above log it looks like node 11 tried to handshake with 12, but how come 12 is part of the cluster? Could you run gluster peer status on 12 and share its glusterd log? I am still of the same opinion I had earlier: you are trying to add a node which is already a member of another cluster.

Best Regards
Yuyang Yang

-----Original Message-----
From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: Tuesday, June 02, 2015 2:42 PM
To: vyyy杨雨阳; Gluster-users@gluster.org
Subject: Re: Re: Re: [Gluster-users] Gluster peer rejected and failed to start

On 06/02/2015 12:04 PM, vyyy杨雨阳 wrote:

Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3. Glusterfs11~glusterfs14 are new nodes that need to join the cluster.

On glusterfs09:
[root@SH02SVR5952 ~]# gluster peer status
Number of Peers: 6

Hostname: glusterfs06.sh2.ctripcorp.com
Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
State: Peer in Cluster (Connected)

Hostname: glusterfs07.sh2.ctripcorp.com
Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
State: Peer in Cluster (Connected)

Hostname: glusterfs08.sh2.ctripcorp.com
Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
State: Peer in Cluster (Connected)

Hostname: 10.8.230.209
Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
State: Peer in Cluster (Connected)

Hostname: glusterfs10.sh2.ctripcorp.com
Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
State: Peer in Cluster (Disconnected)

Hostname: glusterfs11.sh2.ctripcorp.com
Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
State: Peer Rejected (Connected)

[root@SH02SVR5952 ~]#

Can you attach glusterd log files for 10 and 11?
[root@SH02SVR5952 ~]# gluster volume status
Status of volume: JQStore2
Gluster process                                        Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick  49152  Y       2782
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick  49152  Y       2744
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick  49152  Y       5307
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick  49152  Y       3986
NFS Server on localhost                                2049   Y       51697
Self-heal Daemon on localhost                          N/A    Y       51710
NFS Server on glusterfs07.sh2.ctripcorp.com            2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com      N/A    Y       110905
NFS Server on glusterfs06.sh2.ctripcorp.com            2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com      N/A    Y       22192
NFS Server on 10.8.230.209                             2049   Y       4091
Self-heal Daemon on 10.8.230.209                       N/A    Y       4104

Task Status of Volume JQStore2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: Webresource
Gluster process                                        Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick3 49155  Y       2787
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick3 49155  Y       2753
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick3 49155  Y       5313
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick3 49155  Y       3992
NFS Server on localhost                                2049   Y       51697
Self-heal Daemon on localhost                          N/A    Y       51710
NFS Server on 10.8.230.209                             2049   Y       4091
Self-heal Daemon on 10.8.230.209                       N/A    Y       4104
NFS Server on glusterfs06.sh2.ctripcorp.com            2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com      N/A    Y       22192
NFS Server on glusterfs07.sh2.ctripcorp.com            2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com      N/A    Y       110905

Task Status of Volume Webresource
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ccim
Gluster process                                        Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick2 49154  Y       2793
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick2 49154  Y       2745
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick2 49154  Y       5320
Brick
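For readers who land on this thread with the same "Peer Rejected (Connected)" state, the commonly documented recovery is to wipe the rejected node's local glusterd state while keeping its UUID, then re-probe a healthy peer. This is a hedged sketch of that generic procedure, not something Atin prescribes in this thread; paths assume a default install, and use systemctl instead of service on systemd distributions:

# on the rejected node only
service glusterd stop
cd /var/lib/glusterd
# remove everything except glusterd.info, which holds this node's UUID
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
service glusterd start
gluster peer probe glusterfs09.sh2.ctripcorp.com   # any healthy peer in the pool
service glusterd restart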
[Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem
Hi all, I have two question for glusterfs-3.7 on fedora-22 I used to have a glusterfs cluster version 3.6.2. The following configuration can be work in version-3.6.2, but not in version-3.7 There is 2 node for glusterfs. OS: fedora 22 Gluster: 3.7 on https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/ #gluster peer probe n1 #gluster volume create ganesha n1:/data/brick1/gv0 n2:/data/brick1/gv0 Volume Name: ganesha Type: Distribute Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8 Status: Created Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: n1:/data/brick1/gv0 Brick2: n2:/data/brick1/gv0 Options Reconfigured: performance.readdir-ahead: on The problem to start the volume ganesha #gluster volume start ganesha volume start: ganesha: failed: Commit failed on localhost. Please check the log file for more details. LOG in /var/log/glusterfs/bricks/data-brick1-gv0.log [2015-06-02 08:02:55.232923] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s n2 --volfile-id ganesha.n2.data-brick1-gv0 -p /var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S /var/run/gluster/73ea8a39514304f5ebd440321d784386.socket --brick-name /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option *-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd --brick-port 49157 --xlator-option ganesha-server.listen-port=49157) [2015-06-02 08:02:55.284206] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so: undefined symbol: changelog_select_event [2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser: Volume 'ganesha-changelog', line 30: type 'features/changelog' is not valid or not found on this machine [2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser: type not specified for volume ganesha-changelog [2015-06-02 08:02:55.398214] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph [2015-06-02 08:02:55.398423] W [glusterfsd.c:1219:cleanup_and_exit] (-- 0-: received signum (0), shutting down I cannot google method to resolve it. Does anyone have across this problem? Another question is the feature in nfs-ganesha(version 2.2) The volume command I cannot turn on this feature. I try to copy the demo glusterfs-ganesha video but cannot work. Demo link: https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o [root@n1 brick1]# gluster nfs-ganesha enable Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y nfs-ganesha: failed: Commit failed on localhost. Please check the log file for more details. Does anyone have the detail configuration? THANKS for giving advice. Regards, Ben ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
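A quick diagnostic for the undefined-symbol error above is to inspect the failing translator's dynamic symbol table directly. This is a hedged sketch using the path from the log; nm and ldd are standard binutils/glibc tools:

# an unresolved symbol shows up flagged "U" in the dynamic symbol table
nm -D /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so | grep changelog_select_event
# ldd shows which libraries the translator actually resolves against
ldd /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so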
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
hi Geoffrey,
Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. you need.
3) Enable profiling using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"
Repeat the steps above on the new and old versions you are comparing. That should give us insight into what could be causing the slowness.

Pranith

On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,

I have a crash test cluster where I've tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show me very very low performances.

For my benches, as you can read below, I do some actions (untar, du, find, tar, rm) with the Linux kernel sources, dropping caches, each on distributed, replicated, distributed-replicated, single (single brick) volumes and the native FS of one brick.

# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)

And here are the process times:

---------------------------------------------------------------
|             |  UNTAR  |  DU   |  FIND  |  TAR   |   RM    |
---------------------------------------------------------------
| single      | ~3m45s  | ~43s  |  ~47s  | ~3m10s | ~3m15s  |
| replicated  | ~5m10s  | ~59s  | ~1m6s  | ~1m19s | ~1m49s  |
| distributed | ~4m18s  | ~41s  |  ~57s  | ~2m24s | ~1m38s  |
| dist-repl   | ~8m18s  | ~1m4s | ~1m11s | ~1m24s | ~2m40s  |
| native FS   |  ~11s   |  ~4s  |   ~2s  |  ~56s  |  ~10s   |
---------------------------------------------------------------

I get the same results with default configurations as with custom configurations. Looking at the ifstat command, I note my I/O write processes never exceed 3MB/s. The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than the XFS one.

My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb ethernet).

My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks in the same server, and replica 2

All seems to be OK in the gluster status command line. Do you have an idea why I obtain such bad results?

Thanks in advance.
Geoffrey
---
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr
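Rendered as concrete commands, Pranith's five steps might look like the following sketch; the volume name, server names, and brick paths are placeholders rather than Geoffrey's actual layout:

# 1-2) create a dist-repl volume and set whatever options you need
gluster volume create testvol replica 2 srv1:/bricks/b1 srv2:/bricks/b1 srv1:/bricks/b2 srv2:/bricks/b2
gluster volume start testvol
# 3) enable profiling
gluster volume profile testvol start
# 4) run the untar/du/find/tar/rm workload on a client mount
# 5) capture the per-FOP latencies and call counts
gluster volume profile testvol info > profile-3.7.txt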
Re: [Gluster-users] Features - Object Count
Hi,
That is exactly what I was looking for. Thanks a lot.
Regards, Asen Asenov

On Mon, Jun 1, 2015 at 2:51 PM, Sachin Pandit span...@redhat.com wrote:

- Original Message -
From: M S Vishwanath Bhat msvb...@gmail.com
To: aasenov1989 aasenov1...@gmail.com
Cc: Gluster-users@gluster.org List gluster-users@gluster.org
Sent: Monday, June 1, 2015 3:02:08 PM
Subject: Re: [Gluster-users] Features - Object Count

On 29 May 2015 at 18:11, aasenov1989 aasenov1...@gmail.com wrote:

Hi, So is there a way to find how many files I have on each brick of the volume?

I don't think gluster provides a way to get the exact number of files in a brick or volume. Sorry if my solution is very obvious, but I generally use find to count the files in a particular brick:

find /brick/path ! -path '/brick/path/.glusterfs*' | wc -l

Hi,
You can also do:

getfattr -d -m . -e hex <brick_path>

This command gets the extended attributes of a directory. When you issue it after enabling quota, you can see an extended attribute named trusted.glusterfs.quota.size. That basically holds the size, file count and directory count. The extended attribute consists of 48 hexadecimal digits: the first 16 give you the size, the next 16 the file count, and the last 16 the directory count.
Hope this helps.
Thanks, Sachin Pandit.

Best Regards, Vishwanath

Regards, Asen Asenov

On Fri, May 29, 2015 at 3:33 PM, Atin Mukherjee atin.mukherje...@gmail.com wrote:

Sent from Samsung Galaxy S4

On 29 May 2015 17:59, aasenov1989 aasenov1...@gmail.com wrote:

Hi, Thanks for the help. I was able to retrieve the number of objects for the entire volume, but I didn't figure out how to set a quota for a particular brick. I have a replicated volume with 2 bricks on 2 nodes:
Bricks:
Brick1: host1:/dataDir
Brick2: host2:/dataDir
Both bricks are up and files are replicated. But when I try to set quota on a particular brick:

IIUC, you won't be able to set quota at brick level, as multiple bricks comprise a volume which is exposed to the user. The quota team can correct me if I am wrong.

gluster volume quota TestVolume limit-objects /dataDir/ 9223372036854775807
quota command failed : Failed to get trusted.gfid attribute on path /dataDir/. Reason : No such file or directory
please enter the path relative to the volume

What should be the path to brick directories relative to the volume?
Regards, Asen Asenov

On Fri, May 29, 2015 at 12:35 PM, Sachin Pandit span...@redhat.com wrote:

- Original Message -
From: aasenov1989 aasenov1...@gmail.com
To: Humble Devassy Chirammal humble.deva...@gmail.com
Cc: Gluster-users@gluster.org List gluster-users@gluster.org
Sent: Friday, May 29, 2015 12:22:43 AM
Subject: Re: [Gluster-users] Features - Object Count

Thanks Humble, but as far as I understand the object count is connected with the quotas set per folder. What I want is to get the number of files I have in the entire volume, even when the volume is distributed across multiple computers. I think the purpose of this feature: http://gluster.readthedocs.org/en/latest/Feature%20Planning/GlusterFS%203.7/Object%20Count/

Hi,
You are absolutely correct. You can retrieve the number of files in the entire volume if you have limit-objects set on the root. If limit-objects is set on a directory present in a mount point, then it will only show the number of files and directories of that particular directory. In your case, if you want to retrieve the number of files and directories present in the entire volume, you might have to set the object limit on the root.
Thanks, Sachin Pandit.
is to provide such functionality. Am I right, or is there no way to retrieve the number of files for the entire volume?
Regards, Asen Asenov

On Thu, May 28, 2015 at 8:09 PM, Humble Devassy Chirammal humble.deva...@gmail.com wrote:

Hi Asen,
https://gluster.readthedocs.org/en/latest/Features/quota-object-count/ - hope this helps.
--Humble

On Thu, May 28, 2015 at 8:38 PM, aasenov1989 aasenov1...@gmail.com wrote:

Hi,
I wanted to ask how to use this feature in gluster 3.7.0, as I was unable to find anything. How can I retrieve the number of objects in a volume, and the number of objects in a particular brick?
Thanks in advance.
Regards, Asen Asenov
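A small decoder for the 48-hex-digit trusted.glusterfs.quota.size layout Sachin describes above (16 digits each for size, file count and directory count); the brick path is a placeholder, and this assumes quota is enabled on the volume:

# read the raw hex value and split it into the three 64-bit fields
v=$(getfattr --only-values -n trusted.glusterfs.quota.size -e hex /brick/path/dir | tr -d '\n' | sed 's/^0x//')
echo "size:  $((16#${v:0:16})) bytes"
echo "files: $((16#${v:16:16}))"
echo "dirs:  $((16#${v:32:16}))"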
Re: [Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume
Cheers, that's a great help. I am assuming the extra trusted.afr.<volname>-client-<n> entries are left over from the removed peers; can I expect they will disappear after glusterfsd gets restarted?

On 1 June 2015 at 23:49, Ravishankar N ravishan...@redhat.com wrote:

On 06/01/2015 08:15 PM, Alastair Neil wrote:

I have a replica 3 volume I am using to serve my home directory. I have noticed a couple of split-brains recently on files used by browsers (for the most recent see below; I had an earlier one on .config/google-chrome/Default/Session Storage/). When I was running replica 2, I don't recall seeing more than two entries of the form trusted.afr.<volname>-client-<n>. I did have two other servers that I removed from service recently, but I am curious to know if there is some way to map what the server reports as trusted.afr.<volname>-client-<n> to a hostname?

Your volfile (/var/lib/glusterd/vols/<volname>/trusted-<volname>.tcp-fuse.vol) should contain which brick (remote-subvolume + remote-host) a given trusted.afr* maps to.
Hope that helps,
Ravi

Thanks, Alastair

# gluster volume heal homes info
Brick gluster-2:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

Brick gluster1:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

Brick gluster0:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

# getfattr -d -m . -e hex /export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
getfattr: Removing leading '/' from absolute path names
# file: export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.afr.homes-client-2=0x
trusted.afr.homes-client-3=0x0002
trusted.afr.homes-client-4=0x
trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200
trusted.glusterfs.quota.size=0x162d2200
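Ravi's tip can be applied mechanically with grep; the volume name "homes" comes from this thread, but the exact layout of the client sections in the volfile may differ per setup:

# each protocol/client section in the volfile names the brick host and export it maps to
grep -E '^volume homes-client-|remote-host|remote-subvolume' /var/lib/glusterd/vols/homes/trusted-homes.tcp-fuse.vol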
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:

Jun 2 15:23:14 gqac006 kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0001 0 21999 1 0x0080
Jun 2 15:23:14 gqac006 kernel: 880611321cc8 0082 880611321c18 a027236e
Jun 2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 88052bd1e040 880611321c78
Jun 2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 880625addaf8 880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [a027236e] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [811242d0] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [81124543] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [8109eb80] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [8113a525] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [8112496b] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [81124b38] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [811c07ce] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [811c08bd] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [811c08fe] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [811c0950] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [8100b072] system_call_fastpath+0x16/0x1b

Do you see a perf problem with just a simple DD, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple DD reads/writes or if we need to do some sort of dir/metadata access as well.

-b

- Original Message -
From: Geoffrey Letessier geoffrey.letess...@cnrs.fr
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: gluster-users@gluster.org
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

Hi Pranith,

I'm sorry but I cannot bring you any comparison, because the comparison would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).

Concerning your request, in the attachments you can find all expected results, hoping it can help you solve this serious performance issue (maybe I need to play with glusterfs parameters?).

Thank you very much in advance,
Geoffrey
--
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr

Le 2 juin 2015 à 10:09, Pranith Kumar Karampuri pkara...@redhat.com a écrit :

hi Geoffrey,
Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. you need.
3) Enable profiling using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"
Repeat the steps above on the new and old versions you are comparing.
That should give us insight into what could be causing the slowness.
Pranith

On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,
I have a crash test cluster where I've tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show me very very low performances. For my benches, as you can read below, I do some actions (untar, du, find, tar, rm) with the Linux kernel sources, dropping caches, each on distributed, replicated, distributed-replicated, single (single brick) volumes and the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3
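For anyone wanting to reproduce Ben's "simple DD" check, a minimal sketch (mount point, file size, and block size are illustrative; conv=fdatasync makes the write timing include the flush to the volume):

dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=10240 conv=fdatasync
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/test/ddfile of=/dev/null bs=1M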
[Gluster-users] can't remove brick - wrong operating-version
Hi All,

I'm hitting what seems to be a known but unresolved bug, very similar to these:
Most recently: https://bugzilla.redhat.com/show_bug.cgi?id=1168897
Similar, from some time ago: https://bugzilla.redhat.com/show_bug.cgi?id=1127328

Essentially the upshot is that the remove-brick operation reports:
volume remove-brick commit force: failed: One or more nodes do not support the required op-version. Cluster op-version must atleast be 30600.

I'm on CentOS 6.6 with GlusterFS 3.6.3 from glusterfs-epel. The operating-version in /var/lib/glusterd/glusterd.info is set to 2 on all hosts participating in the volume. I see that some recommend manually changing that setting in glusterd.info to something higher than 30600, but that does not seem particularly safe, and a Ubuntu 14.04 user reported that glusterd wouldn't actually start when that setting was changed.

Is there any workaround to this? I can't imagine everyone in the world running Gluster is unable to remove bricks at the moment...

Thanks in advance for any insight you can provide.
Re: [Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume
On 06/03/2015 01:14 AM, Alastair Neil wrote:

Cheers, that's a great help. I am assuming the extra trusted.afr.<volname>-client-<n> entries are left over from the removed peers,

Correct.

can I expect they will disappear after glusterfsd gets restarted?

They will remain, but it should not affect normal operation in any way.

On 1 June 2015 at 23:49, Ravishankar N ravishan...@redhat.com wrote:

On 06/01/2015 08:15 PM, Alastair Neil wrote:

I have a replica 3 volume I am using to serve my home directory. I have noticed a couple of split-brains recently on files used by browsers (for the most recent see below; I had an earlier one on .config/google-chrome/Default/Session Storage/). When I was running replica 2, I don't recall seeing more than two entries of the form trusted.afr.<volname>-client-<n>. I did have two other servers that I removed from service recently, but I am curious to know if there is some way to map what the server reports as trusted.afr.<volname>-client-<n> to a hostname?

Your volfile (/var/lib/glusterd/vols/<volname>/trusted-<volname>.tcp-fuse.vol) should contain which brick (remote-subvolume + remote-host) a given trusted.afr* maps to.
Hope that helps,
Ravi

Thanks, Alastair

# gluster volume heal homes info
Brick gluster-2:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

Brick gluster1:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

Brick gluster0:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1

# getfattr -d -m . -e hex /export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
getfattr: Removing leading '/' from absolute path names
# file: export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.afr.homes-client-2=0x
trusted.afr.homes-client-3=0x0002
trusted.afr.homes-client-4=0x
trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200
trusted.glusterfs.quota.size=0x162d2200
Re: [Gluster-users] can't remove brick - wrong operating-version
Sent from Samsung Galaxy S4

On 3 Jun 2015 01:17, Branden Timm bt...@wisc.edu wrote:

Hi All, I'm hitting what seems to be a known but unresolved bug, very similar to these:
Most recently: https://bugzilla.redhat.com/show_bug.cgi?id=1168897
Similar, from some time ago: https://bugzilla.redhat.com/show_bug.cgi?id=1127328

Essentially the upshot is that the remove-brick operation reports:
volume remove-brick commit force: failed: One or more nodes do not support the required op-version. Cluster op-version must atleast be 30600.

I'm on CentOS 6.6 with GlusterFS 3.6.3 from glusterfs-epel. The operating-version in /var/lib/glusterd/glusterd.info is set to 2 on all hosts participating in the volume. I see that some recommend manually changing that setting in glusterd.info to something higher than 30600, but that does not seem particularly safe, and a Ubuntu 14.04 user reported that glusterd wouldn't actually start when that setting was changed.

Is there any workaround to this? I can't imagine everyone in the world running Gluster is unable to remove bricks at the moment...

Thanks in advance for any insight you can provide.

Could you execute "gluster volume set all cluster.op-version 30600"? This should bump up the cluster op-version, which will ideally persist the value in the glusterd.info file. After that you should be able to execute the remove-brick command.

HTH,
Atin
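Atin's suggestion, written out with a verification step added here as an assumption (the grep path is the glusterd.info file named earlier in the thread):

# run once, from any node in the trusted pool
gluster volume set all cluster.op-version 30600
# verify on every node that the new op-version was persisted
grep operating-version /var/lib/glusterd/glusterd.info
# then retry the failed operation
gluster volume remove-brick <volname> <brick> commit force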
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
Hi Ben,

I just checked my messages log files, both on client and server, and I don't find any hung tasks like the ones you noticed on yours.

As you can read below, I don't see the performance issue with a simple DD, but I think my issue concerns sets of small files (tens of thousands, if not more)…

[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
10240MiB    KiB/s   CPU%
Write       114770  4
Read        40675   4

for info: /mnt/test concerns the single v2 GlFS volume

[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
10240MiB    KiB/s   CPU%
Write       102591  1
Read        98079   2

Do you have an idea how to tune/optimize the performance settings? And/or the TCP settings (MTU, etc.)?

---------------------------------------------------------------
|              |  UNTAR  |  DU   |  FIND  |  TAR   |   RM    |
---------------------------------------------------------------
| single       | ~3m45s  | ~43s  |  ~47s  | ~3m10s | ~3m15s  |
| replicated   | ~5m10s  | ~59s  | ~1m6s  | ~1m19s | ~1m49s  |
| distributed  | ~4m18s  | ~41s  |  ~57s  | ~2m24s | ~1m38s  |
| dist-repl    | ~8m18s  | ~1m4s | ~1m11s | ~1m24s | ~2m40s  |
| native FS    |  ~11s   |  ~4s  |   ~2s  |  ~56s  |  ~10s   |
| BeeGFS       | ~3m43s  | ~15s  |   ~3s  | ~1m33s |  ~46s   |
| single (v2)  |  ~3m6s  | ~14s  |  ~32s  |  ~1m2s |  ~44s   |
---------------------------------------------------------------

for info:
- BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
- single (v2): simple gluster volume with default settings

I also note I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks to be OK.

Thank you very much for your reply and help.
Geoffrey
---
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr

Le 2 juin 2015 à 21:53, Ben Turner btur...@redhat.com a écrit :

I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:

Jun 2 15:23:14 gqac006 kernel: echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0001 0 21999 1 0x0080
Jun 2 15:23:14 gqac006 kernel: 880611321cc8 0082 880611321c18 a027236e
Jun 2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 88052bd1e040 880611321c78
Jun 2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 880625addaf8 880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [a027236e] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [811242d0] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [81124543] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [8109eb80] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [8113a525] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [8112496b] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [81124b38] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [811c07ce] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [811c08bd] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [811c08fe] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [811c0950] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [8100b072] system_call_fastpath+0x16/0x1b

Do you see a perf problem with just a simple DD, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down, let me know if you can see the
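Geoffrey's MTU question goes unanswered in this digest; as a general sketch, checking and testing a larger MTU on a Gb ethernet link could look like the following (the interface and host names are illustrative, and jumbo frames only help if every NIC and switch port on the path is configured for them):

ip link show eth0                      # current MTU
ping -M do -s 8972 gluster-server      # 8972 = 9000 minus 28 bytes of IP+ICMP headers
ip link set dev eth0 mtu 9000          # on each host, only if the whole path supports it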
[Gluster-users] GlusterFS 3.7.1 released
All,

GlusterFS 3.7.1 has been released. The packages for CentOS, Debian, Fedora and RHEL are available at http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.1/ in their respective directories.

A total of 58 patches were merged after v3.7.0. The following is the distribution of patches among components/features:

12 tests (regression test-suite)
8 tier
5 glusterd
5 bitrot
4 geo-rep
3 afr
21 'everywhere else'

The list of known bugs for 3.7.1 is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1219955. Testing feedback and patches would be welcome.

~kp
Re: [Gluster-users] Client load high (300) using fuse mount
On 02. 06. 2015 07:33, Pranith Kumar Karampuri wrote:

hi Mitja,
Could you please give the output of the following commands:
1) gluster volume info

Volume Name: gvol-splet
Type: Replicate
Volume ID: FAKE-ID
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.setup.tld:/gluster/gvol-splet/brick0/brick
Brick2: gluster2.setup.tld:/gluster/gvol-splet/brick0/brick
Brick3: gluster3.setup.tld:/gluster/gvol-splet/brick0/brick
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
auth.allow: WEBNODE-IP1,WEBNODE-IP2
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on

2) gluster volume profile <volname> start
3) Wait while the CPU is high for 5-10 minutes
4) gluster volume profile <volname> info > output-you-need-to-attach-to-this-mail.txt

I cannot give you the results from the production system, because the web server was unresponsive and I switched back to local storage. The attached file contains results from the setup that was briefly in production and will be again when this is solved. The load is synthetic, generated by jmeter. During the test, iotop on the GlusterFS peers showed practically zero disk activity, pretty much the same as under a real-world load. The average load on the web node was constantly a bit above 50. I will try to get the results from the production setup.

Regards, Mitja

The 4th command tells us what operations are issued a lot.
Pranith

On 06/01/2015 04:41 PM, Mitja Mihelič wrote:

Hi!
I am trying to set up a Wordpress cluster using GlusterFS for storage. Web nodes will access the same Wordpress install on a volume mounted via FUSE from a 3-peer GlusterFS TSP.

I started with one web node and Wordpress on local storage. The load average was constantly about 5. iotop showed about 300kB/s disk reads or less. The load average was below 6. When I mounted the GlusterFS volume on the web node, the 1-minute load average went over 300. Each of the 3 peers is transmitting about 10MB/s to my web node regardless of the load. The TSP peers are on 10Gbit NICs and the web node is on a 1Gbit NIC.

I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client?

Options set on the TSP:
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on

Regards, Mitja

Brick: gluster1.setup.tld:/gluster/gvol-splet/brick0/brick
----------------------------------------------------------
Cumulative Stats:
Block Size:            1b+        2b+        4b+
No. of Reads:          0          0          1
No. of Writes:         2866       180

Block Size:            8b+        16b+       32b+
No. of Reads:          2          94354      131
No. of Writes:         2124251

Block Size:            64b+       128b+      256b+
No. of Reads:          71         2160815208832
No. of Writes:         631        372        386

Block Size:            512b+      1024b+     2048b+
No. of Reads:          2481316    1414880    1377502
No. of Writes:         147        111        261

Block Size:            4096b+     8192b+     16384b+
No. of Reads:          2753313    2770744    3389212
No. of Writes:         17604      2566       996

Block Size:            32768b+    65536b+    131072b+
No. of Reads:          1284591803165390224
No. of Writes:         721        1035       11387

%-latency  Avg-latency  Min-Latency  Max-Latency  No. of calls  Fop
---------  -----------  -----------  -----------  ------------  ---
0.00       0.00 us      0.00 us      0.00 us      3569          FORGET
0.00       0.00 us      0.00 us      0.00 us      6622013       RELEASE
0.00       0.00 us      0.00 us      0.00 us      5019505       RELEASEDIR
0.00       35.00 us     35.00 us     35.00 us     1             SETXATTR
0.00       67.00 us     67.00 us     67.00 us     1
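Mitja asks whether the network could be the culprit; no one answers that here, but a quick way to rule raw bandwidth in or out between the web node and a peer is an iperf run (iperf3 assumed installed; the hostname is taken from the volume info above):

iperf3 -s                              # on one gluster peer
iperf3 -c gluster1.setup.tld -t 30     # on the web node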
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
Hi Pranith,

I'm sorry but I cannot bring you any comparison, because the comparison would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).

Concerning your request, in the attachments you can find all expected results, hoping it can help you solve this serious performance issue (maybe I need to play with glusterfs parameters?).

Thank you very much in advance,
Geoffrey
--
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr

Le 2 juin 2015 à 10:09, Pranith Kumar Karampuri pkara...@redhat.com a écrit :

hi Geoffrey,
Since you are saying it happens on all types of volumes, lets do the following:
1) Create a dist-repl volume
2) Set the options etc you need.
3) enable gluster volume profile using "gluster volume profile <volname> start"
4) run the work load
5) give output of "gluster volume profile <volname> info"
Repeat the steps above on new and old version you are comparing this with.
That should give us insight into what could be causing the slowness.
Pranith

On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,

I have a crash test cluster where I've tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show me very very low performances.

For my benches, as you can read below, I do some actions (untar, du, find, tar, rm) with the Linux kernel sources, dropping caches, each on distributed, replicated, distributed-replicated, single (single brick) volumes and the native FS of one brick.

# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)

And here are the process times:

---------------------------------------------------------------
|             |  UNTAR  |  DU   |  FIND  |  TAR   |   RM    |
---------------------------------------------------------------
| single      | ~3m45s  | ~43s  |  ~47s  | ~3m10s | ~3m15s  |
| replicated  | ~5m10s  | ~59s  | ~1m6s  | ~1m19s | ~1m49s  |
| distributed | ~4m18s  | ~41s  |  ~57s  | ~2m24s | ~1m38s  |
| dist-repl   | ~8m18s  | ~1m4s | ~1m11s | ~1m24s | ~2m40s  |
| native FS   |  ~11s   |  ~4s  |   ~2s  |  ~56s  |  ~10s   |
---------------------------------------------------------------

I get the same results with default configurations as with custom configurations. Looking at the ifstat command, I note my I/O write processes never exceed 3MB/s. The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than the XFS one.

My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb ethernet).

My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks in the same server, and replica 2

All seems to be OK in the gluster status command line. Do you have an idea why I obtain such bad results?

Thanks in advance.
Geoffrey
---
Geoffrey Letessier
Re: [Gluster-users] gfapi access not working with 3.7.0
On 05/31/2015 01:02 AM, Alessandro De Salvo wrote:

Thanks again Pranith!

Unfortunately the fixes missed the window for 3.7.1. These fixes will be available in the next release.
Pranith

Alessandro

Il giorno 30/mag/2015, alle ore 03:16, Pranith Kumar Karampuri pkara...@redhat.com ha scritto:

Alessandro,
Same issue as the bug you talked about in the "gluster volume heal info" thread. http://review.gluster.org/11002 should address this (not the same fix you patched for glfsheal). I will backport this one to 3.7.1 as well.
Pranith

On 05/30/2015 12:23 AM, Alessandro De Salvo wrote:

Hi,
I'm trying to access a volume using gfapi and gluster 3.7.0. This was working with 3.6.3, but is not working anymore after the upgrade. The volume has snapshots enabled, and it's configured in the following way:

# gluster volume info adsnet-vm-01
Volume Name: adsnet-vm-01
Type: Replicate
Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
Options Reconfigured:
server.allow-insecure: on
features.file-snapshot: on
features.barrier: disable
nfs.disable: true

Also, my /etc/glusterfs/glusterd.vol has the needed option:

# cat /etc/glusterfs/glusterd.vol
# This file is managed by puppet, do not change
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

However, when I try for example to access an image via qemu-img, it segfaults:

# qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[2015-05-29 18:39:41.436951] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:39:41.438234] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
[2015-05-29 18:39:41.438484] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
Segmentation fault (core dumped)

The volume is fine:

# gluster volume status adsnet-vm-01
Status of volume: adsnet-vm-01
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gwads02.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       27878
Brick gwads03.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       24638
Self-heal Daemon on localhost                   N/A       N/A        Y       28031
Self-heal Daemon on gwads03.sta.adsnet.it       N/A       N/A        Y       24667

Task Status of Volume adsnet-vm-01
------------------------------------------------------------------------------
There are no active volume tasks

Running with the debugger I see the following:

(gdb) r
Starting program: /usr/bin/qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
[New Thread 0x7176a700 (LWP 30027)]
[New Thread 0x70f69700 (LWP 30028)]
[New Thread 0x7fffe99ab700 (LWP 30029)]
[New Thread 0x7fffe8fa7700 (LWP 30030)]
[New Thread 0x7fffe3fff700 (LWP 30031)]
[New Thread 0x7fffdbfff700 (LWP 30032)]
[New Thread 0x7fffdb2dd700 (LWP 30033)]
[2015-05-29 18:51:25.656014] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:51:25.657338] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x748bcf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x773775a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7737a8ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x748b9791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x748b9725] ) 0-rpc_transport: invalid argument: this
[2015-05-29 18:51:25.657619] E
[Gluster-users] Minutes from todays Gluster Community Bug Triage meeting
On Tue, Jun 02, 2015 at 12:51:37PM +0200, Niels de Vos wrote:

Hi all,
This meeting is scheduled for anyone that is interested in learning more about, or assisting with, the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC (https://webchat.freenode.net/?channels=gluster-meeting)
- date: every Tuesday
- time: 12:00 UTC (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug or topic to discuss, please add it to the agenda.
Appreciate your participation.

Minutes: http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.html
Minutes (text): http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.txt
Log: http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.log.html

Meeting summary
1. a. Agenda: https://public.pad.fsfe.org/p/gluster-bug-triage (ndevos, 12:06:40)
2. Roll Call (ndevos, 12:07:01)
3. Action Items from last week (ndevos, 12:08:55)
4. ndevos needs to look into building nightly debug rpms that can be used for testing (ndevos, 12:09:35)
5. Group Triage (ndevos, 12:11:13)
   a. 0 bugs are waiting on feedback from b...@gluster.org (ndevos, 12:11:52)
   b. 20 new bugs that have not been (completely) triaged yet: http://goo.gl/WuDQun (ndevos, 12:12:43)
6. Open Floor (ndevos, 12:46:08)

Meeting ended at 12:47:35 UTC (full logs).

Action items
1. (none)

People present (lines said)
1. ndevos (43)
2. soumya (15)
3. rjoseph (3)
4. zodbot (2)

Generated by MeetBot 0.1.4.
Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem
On 06/02/2015 04:38 PM, Anoop C S wrote:

On 06/02/2015 01:42 PM, 莊尚豪 wrote:

Hi all,
I have two questions about glusterfs-3.7 on Fedora 22. I used to have a glusterfs cluster at version 3.6.2. The following configuration works in version 3.6.2, but not in version 3.7. There are 2 nodes for glusterfs.
OS: Fedora 22
Gluster: 3.7 from https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/

#gluster peer probe n1
#gluster volume create ganesha n1:/data/brick1/gv0 n2:/data/brick1/gv0

Volume Name: ganesha
Type: Distribute
Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: n1:/data/brick1/gv0
Brick2: n2:/data/brick1/gv0
Options Reconfigured:
performance.readdir-ahead: on

The first problem is starting the volume ganesha:

#gluster volume start ganesha
volume start: ganesha: failed: Commit failed on localhost. Please check the log file for more details.

LOG in /var/log/glusterfs/bricks/data-brick1-gv0.log:

[2015-06-02 08:02:55.232923] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s n2 --volfile-id ganesha.n2.data-brick1-gv0 -p /var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S /var/run/gluster/73ea8a39514304f5ebd440321d784386.socket --brick-name /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option *-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd --brick-port 49157 --xlator-option ganesha-server.listen-port=49157)
[2015-06-02 08:02:55.284206] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so: undefined symbol: changelog_select_event

This particular error for the undefined symbol changelog_select_event was identified recently, and the corresponding fix [http://review.gluster.org/#/c/11004/] is already in master and will hopefully be available with v3.7.1.

[2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser: Volume 'ganesha-changelog', line 30: type 'features/changelog' is not valid or not found on this machine
[2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser: type not specified for volume ganesha-changelog
[2015-06-02 08:02:55.398214] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-06-02 08:02:55.398423] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down

I cannot find a way to resolve this by searching. Has anyone come across this problem?

The other question is about the nfs-ganesha (version 2.2) feature. I cannot turn this feature on with the volume command. I tried to follow the glusterfs-ganesha demo video but could not make it work. Demo link: https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o

[root@n1 brick1]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Commit failed on localhost. Please check the log file for more details.

As you may have seen in the demo video, there are many pre-requisites to be followed before enabling nfs-ganesha. Can you please re-check that you have all those steps taken care of? Also look at the logs '/var/log/ganesha.log' and '/var/log/messages' for any specific errors logged.

Thanks,
Soumya

Adding ganesha folks to the thread.

Does anyone have the detailed configuration? Thanks for any advice.
Regards,
Ben
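Soumya's log-checking suggestion, as commands (the ganesha.log path is the one she names; /var/log/messages may be thin on Fedora 22, where journalctl holds the same data):

tail -n 100 /var/log/ganesha.log
grep -i ganesha /var/log/messages | tail -n 50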
Re: [Gluster-users] gfapi access not working with 3.7.0
OK, Thanks Pranith. Do you have a timeline for that?
Cheers,
Alessandro

Il giorno 02/giu/2015, alle ore 15:12, Pranith Kumar Karampuri pkara...@redhat.com ha scritto:

On 05/31/2015 01:02 AM, Alessandro De Salvo wrote:

Thanks again Pranith!

Unfortunately the fixes missed the window for 3.7.1. These fixes will be available in the next release.
Pranith

Alessandro

Il giorno 30/mag/2015, alle ore 03:16, Pranith Kumar Karampuri pkara...@redhat.com ha scritto:

Alessandro,
Same issue as the bug you talked about in the "gluster volume heal info" thread. http://review.gluster.org/11002 should address this (not the same fix you patched for glfsheal). I will backport this one to 3.7.1 as well.
Pranith

On 05/30/2015 12:23 AM, Alessandro De Salvo wrote:

Hi,
I'm trying to access a volume using gfapi and gluster 3.7.0. This was working with 3.6.3, but is not working anymore after the upgrade. The volume has snapshots enabled, and it's configured in the following way:

# gluster volume info adsnet-vm-01
Volume Name: adsnet-vm-01
Type: Replicate
Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
Options Reconfigured:
server.allow-insecure: on
features.file-snapshot: on
features.barrier: disable
nfs.disable: true

Also, my /etc/glusterfs/glusterd.vol has the needed option:

# cat /etc/glusterfs/glusterd.vol
# This file is managed by puppet, do not change
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

However, when I try for example to access an image via qemu-img, it segfaults:

# qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[2015-05-29 18:39:41.436951] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:39:41.438234] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
[2015-05-29 18:39:41.438484] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
Segmentation fault (core dumped)

The volume is fine:

# gluster volume status adsnet-vm-01
Status of volume: adsnet-vm-01
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gwads02.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       27878
Brick gwads03.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       24638
Self-heal Daemon on localhost                   N/A       N/A        Y       28031
Self-heal Daemon on gwads03.sta.adsnet.it       N/A       N/A        Y       24667

Task Status of Volume adsnet-vm-01
------------------------------------------------------------------------------
There are no active volume tasks

Running with the debugger I see the following:

(gdb) r
Starting program: /usr/bin/qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
[New Thread 0x7176a700 (LWP 30027)]
[New Thread 0x70f69700 (LWP 30028)]
[New Thread 0x7fffe99ab700 (LWP 30029)]
[New Thread 0x7fffe8fa7700 (LWP 30030)]
[New Thread 0x7fffe3fff700 (LWP 30031)]
[New Thread 0x7fffdbfff700 (LWP 30032)]
[New Thread 0x7fffdb2dd700 (LWP 30033)]
[2015-05-29 18:51:25.656014] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:51:25.657338] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x748bcf16] (--
Re: [Gluster-users] gfapi access not working with 3.7.0
On 06/02/2015 06:52 PM, Alessandro De Salvo wrote:

OK, Thanks Pranith. Do you have a timeline for that?

It will be discussed in tomorrow's weekly community developers meeting. Then we may have some estimate.
Pranith

Cheers,
Alessandro

Il giorno 02/giu/2015, alle ore 15:12, Pranith Kumar Karampuri pkara...@redhat.com ha scritto:

On 05/31/2015 01:02 AM, Alessandro De Salvo wrote:

Thanks again Pranith!

Unfortunately the fixes missed the window for 3.7.1. These fixes will be available in the next release.
Pranith

Alessandro

Il giorno 30/mag/2015, alle ore 03:16, Pranith Kumar Karampuri pkara...@redhat.com ha scritto:

Alessandro,
Same issue as the bug you talked about in the "gluster volume heal info" thread. http://review.gluster.org/11002 should address this (not the same fix you patched for glfsheal). I will backport this one to 3.7.1 as well.
Pranith

On 05/30/2015 12:23 AM, Alessandro De Salvo wrote:

Hi,
I'm trying to access a volume using gfapi and gluster 3.7.0. This was working with 3.6.3, but is not working anymore after the upgrade. The volume has snapshots enabled, and it's configured in the following way:

# gluster volume info adsnet-vm-01
Volume Name: adsnet-vm-01
Type: Replicate
Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
Options Reconfigured:
server.allow-insecure: on
features.file-snapshot: on
features.barrier: disable
nfs.disable: true

Also, my /etc/glusterfs/glusterd.vol has the needed option:

# cat /etc/glusterfs/glusterd.vol
# This file is managed by puppet, do not change
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
    option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

However, when I try for example to access an image via qemu-img, it segfaults:

# qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[2015-05-29 18:39:41.436951] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:39:41.438234] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
[2015-05-29 18:39:41.438484] E [rpc-transport.c:512:rpc_transport_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16] (--> /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3] (--> /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec] (--> /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791] (--> /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] ) 0-rpc_transport: invalid argument: this
Segmentation fault (core dumped)

The volume is fine:

# gluster volume status adsnet-vm-01
Status of volume: adsnet-vm-01
Gluster process                                 TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gwads02.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       27878
Brick gwads03.sta.adsnet.it:/gluster/vm01/data  49159     0          Y       24638
Self-heal Daemon on localhost                   N/A       N/A        Y       28031
Self-heal Daemon on gwads03.sta.adsnet.it       N/A       N/A        Y       24667

Task Status of Volume adsnet-vm-01
------------------------------------------------------------------------------
There are no active volume tasks

Running with the debugger I see the following:

(gdb) r
Starting program: /usr/bin/qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
[New Thread 0x7176a700 (LWP 30027)]
[New Thread 0x70f69700 (LWP 30028)]
[New Thread 0x7fffe99ab700 (LWP 30029)]
[New Thread 0x7fffe8fa7700 (LWP 30030)]
[New Thread 0x7fffe3fff700 (LWP 30031)]
[New Thread 0x7fffdbfff700 (LWP 30032)]
[New Thread 0x7fffdb2dd700 (LWP 30033)]
[2015-05-29 18:51:25.656014] E [MSGID: 108006] [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-05-29 18:51:25.657338] E [rpc-transport.c:512:rpc_transport_unref] (--
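For reference, when qemu-img faults like this under gdb, the most useful artifact for the developers is a full backtrace of every thread, taken right after the SIGSEGV. A minimal session sketch, assuming a yum-based system and that the relevant debuginfo packages are available (exact package names are an assumption and may differ per distro):

# install debug symbols first so the backtrace has function names
debuginfo-install glusterfs glusterfs-api qemu-img

gdb --args /usr/bin/qemu-img info gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
(gdb) run
# ... wait for "Program received signal SIGSEGV" ...
(gdb) thread apply all bt full
(gdb) quit

Attaching that output (and the core file) to a bug report makes it much easier to pinpoint which unref in the rpc-transport teardown is faulting.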
Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem
Can you please attach the glusterd logs here? You are having trouble even starting the volume here, right? Also, an HA configuration is mandatory to use NFS-Ganesha in this release. Once you have the volume started, I can help you with the remaining steps in detail.
Thanks, Meghana

- Original Message -
From: Anoop C S achir...@redhat.com
To: gluster-users@gluster.org
Cc: Meghana Madhusudhan mmadh...@redhat.com, Soumya Koduri skod...@redhat.com
Sent: Tuesday, June 2, 2015 4:38:39 PM
Subject: Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem

On 06/02/2015 01:42 PM, 莊尚豪 wrote:
Hi all, I have two questions about glusterfs-3.7 on Fedora 22. I used to have a glusterfs cluster on version 3.6.2. The following configuration worked in version 3.6.2 but does not in version 3.7. There are 2 nodes running glusterfs.
OS: Fedora 22
Gluster: 3.7 from https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/

# gluster peer probe n1
# gluster volume create ganesha n1:/data/brick1/gv0 n2:/data/brick1/gv0

Volume Name: ganesha
Type: Distribute
Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: n1:/data/brick1/gv0
Brick2: n2:/data/brick1/gv0
Options Reconfigured:
performance.readdir-ahead: on

The problem is starting the volume ganesha:

# gluster volume start ganesha
volume start: ganesha: failed: Commit failed on localhost. Please check the log file for more details.

Log in /var/log/glusterfs/bricks/data-brick1-gv0.log:

[2015-06-02 08:02:55.232923] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s n2 --volfile-id ganesha.n2.data-brick1-gv0 -p /var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S /var/run/gluster/73ea8a39514304f5ebd440321d784386.socket --brick-name /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option *-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd --brick-port 49157 --xlator-option ganesha-server.listen-port=49157)
[2015-06-02 08:02:55.284206] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so: undefined symbol: changelog_select_event

This particular error for the undefined symbol changelog_select_event was identified recently and the corresponding fix [ http://review.gluster.org/#/c/11004/ ] is already in master and hopefully will be available with v3.7.1.

[2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser: Volume 'ganesha-changelog', line 30: type 'features/changelog' is not valid or not found on this machine
[2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser: type not specified for volume ganesha-changelog
[2015-06-02 08:02:55.398214] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-06-02 08:02:55.398423] W [glusterfsd.c:1219:cleanup_and_exit] (-- 0-: received signum (0), shutting down

I could not find a solution by googling. Has anyone come across this problem? Another question concerns the NFS-Ganesha feature (version 2.2): I cannot turn this feature on with the volume command. I tried to follow the glusterfs-ganesha demo video but it does not work.
Demo link: https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o

[root@n1 brick1]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Commit failed on localhost. Please check the log file for more details.

Adding ganesha folks to the thread.

Does anyone have a detailed configuration? Thanks for giving advice.
Regards, Ben

___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
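An "undefined symbol" failure like the changelog_select_event one above usually means the xlator shared object and the gluster libraries on disk come from mismatched builds. Until the fix referenced above ships, the mismatch can at least be confirmed locally; a rough sketch, assuming stock RPM paths (and assuming libgfchangelog is the library expected to provide the symbol, which is a guess here):

# which dynamic symbols the xlator leaves unresolved ('U' entries)
nm -D /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so | grep 'U changelog'

# which libraries the xlator is linked against, and whether they all resolve
ldd /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so

# does any installed gluster library actually export the symbol?
for lib in /usr/lib64/libgfchangelog.so* /usr/lib64/libglusterfs.so*; do
    echo "== $lib"; nm -D "$lib" 2>/dev/null | grep changelog_select_event
done

If no installed library exports the symbol, only updated packages (or the patch above) will help.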
Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem
Forwarded Message
Subject: Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem
Date: Tue, 2 Jun 2015 09:01:38 -0400 (EDT)
From: Meghana Madhusudhan mmadh...@redhat.com
To: Anoop C S achir...@redhat.com
CC: gluster-users@gluster.org, Soumya Koduri skod...@redhat.com

Hi Anoop, can you add the ID of the person who asked this question and forward the same?

Can you please attach the glusterd logs here? You are having trouble even starting the volume here, right? Also, an HA configuration is mandatory to use NFS-Ganesha in this release. Once you have the volume started, I can help you with the remaining steps in detail.
Thanks, Meghana

___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync
Sure, https://dl.dropboxusercontent.com/u/2663552/logs.tgz

Yesterday I restarted the geo-rep (and reset the changelog.changelog option). Today it looks converged and changelog keeps doing its job. BUT hybrid crawl doesn't seem to update symlinks if they changed on master.

From master:
ll -n /usr/global/images/3.2/latest
lrwxrwxrwx 1 499 499 3 Jun 1 21:40 /usr/global/images/3.2/latest -> S22

On slave:
ls /usr/global/images/3.2/latest
lrwxrwxrwx 1 root root 2 May 9 07:01 /usr/global/images/3.2/latest -> S3

The point is I can't get the gfid from the symlink because it resolves to the target folder. And by the way, all data synced in hybrid crawl is root.root on the slave (it should keep the owner from the master, which also exists on the slave). So:

1) I will need to remove the symlinks from the slave and retrigger a hybrid crawl (again).
2) I will need to update the permissions on the slave according to the permissions on master (which will be long and difficult).
3) Or I missed something here.

Thanks!
-- Cyril Peponnet

On Jun 1, 2015, at 10:20 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Hi Cyril, could you please attach the geo-replication logs?
Thanks and Regards, Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users gluster-users@gluster.org
Sent: Monday, June 1, 2015 10:34:42 PM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync

Some news: it looks like changelog is not working anymore. When I touch a file on master it doesn't propagate to the slave. The .processing folder contains thousands of changelogs that were not processed. I had to stop the geo-rep, reset changelog.changelog on the volume and restart the geo-rep. It's now sending the missing files using hybrid crawl. So geo-rep is not working as expected.

Another thing: we use symlinks to point to the latest release build, and it seems that symlinks are not synced from master to slave when they change. Any idea on how I can debug this?
-- Cyril Peponnet

On May 29, 2015, at 3:01 AM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Yes, geo-rep internally uses a fuse mount. I will explore further and get back to you if there is a way.
Thanks and Regards, Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users gluster-users@gluster.org
Sent: Thursday, May 28, 2015 10:12:57 PM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync

One more thing: nfs.volume-access read-only works only for NFS clients; glusterfs clients still have write access. features.read-only on needs a volume restart and sets read-only for everyone, but in this case geo-rep goes faulty.
[2015-05-28 09:42:27.917897] E [repce(/export/raid/usr_global):188:__call__] RepceClient: call 8739:139858642609920:1432831347.73 (keep_alive) failed on peer with OSError
[2015-05-28 09:42:27.918102] E [syncdutils(/export/raid/usr_global):240:log_raise_exception] top: FAIL:
Traceback (most recent call last):
  File /usr/libexec/glusterfs/python/syncdaemon/syncdutils.py, line 266, in twrap
    tf(*aa)
  File /usr/libexec/glusterfs/python/syncdaemon/master.py, line 391, in keep_alive
    cls.slave.server.keep_alive(vi)
  File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 204, in __call__
    return self.ins(self.meth, *a)
  File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 189, in __call__
    raise res
OSError: [Errno 30] Read-

So there is no proper way to protect the slave against writes.
-- Cyril Peponnet

On May 28, 2015, at 8:54 AM, Cyril Peponnet cyril.pepon...@alcatel-lucent.com wrote:
Hi Kotresh, inline. Again, thank you for your time.
-- Cyril Peponnet

On May 27, 2015, at 10:47 PM, Kotresh Hiremath Ravishankar khire...@redhat.com wrote:
Hi Cyril, replies inline.
Thanks and Regards, Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users
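Before deciding between options 1 and 2 above, it may help to enumerate exactly which symlinks and owners diverge between the two sides. A rough sketch, assuming both volumes are FUSE-mounted somewhere for inspection (/mnt/master and /mnt/slave are placeholder paths, not from the thread):

# symlinks whose targets differ between master and slave
cd /mnt/master
find . -type l | while read -r l; do
    m=$(readlink "$l"); s=$(readlink "/mnt/slave/$l" 2>/dev/null)
    [ "$m" != "$s" ] && printf 'DIFFERS: %s master->%s slave->%s\n' "$l" "$m" "$s"
done

# files whose numeric owner:group differ (GNU find)
find . -printf '%U:%G %p\n' | sort -k2 > /tmp/owners.master
(cd /mnt/slave && find . -printf '%U:%G %p\n' | sort -k2) > /tmp/owners.slave
diff /tmp/owners.master /tmp/owners.slave

That at least turns "updating permissions will be long and difficult" into a concrete worklist that could be replayed with chown on the slave.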
Re: [Gluster-users] Re: Re: Gluster peer rejected and failed to start
On 06/02/2015 12:04 PM, vyyy杨雨阳 wrote:
Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3. Glusterfs11~glusterfs14 are new nodes that need to join the cluster.

On glusterfs09:

[root@SH02SVR5952 ~]# gluster peer status
Number of Peers: 6

Hostname: glusterfs06.sh2.ctripcorp.com
Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
State: Peer in Cluster (Connected)

Hostname: glusterfs07.sh2.ctripcorp.com
Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
State: Peer in Cluster (Connected)

Hostname: glusterfs08.sh2.ctripcorp.com
Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
State: Peer in Cluster (Connected)

Hostname: 10.8.230.209
Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
State: Peer in Cluster (Connected)

Hostname: glusterfs10.sh2.ctripcorp.com
Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
State: Peer in Cluster (Disconnected)

Hostname: glusterfs11.sh2.ctripcorp.com
Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
State: Peer Rejected (Connected)

[root@SH02SVR5952 ~]#

Can you attach the glusterd log files for 10 and 11?

[root@SH02SVR5952 ~]# gluster volume status
Status of volume: JQStore2
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       2782
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       2744
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       5307
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       3986
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104

Task Status of Volume JQStore2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: Webresource
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       2787
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       2753
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       5313
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       3992
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905

Task Status of Volume Webresource
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ccim
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       2793
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       2745
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       5320
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       3999
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104

Task Status of Volume ccim
------------------------------------------------------------------------------
There are no active volume
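One more thing worth checking alongside those logs: glusterd keeps one small state file per peer under /var/lib/glusterd/peers/, named by the peer's UUID, and the state recorded there should agree with what gluster peer status prints on every node. A quick sketch to dump them (the mapping of state=3 to "Peer in Cluster" is an assumption here, not something confirmed in the thread):

# run on each node and compare
for f in /var/lib/glusterd/peers/*; do
    echo "== $f"; cat "$f"
done
# typical contents look like:
#   uuid=2d703550-92b5-4f5e-af90-ff2fbf3366f0
#   state=3
#   hostname1=glusterfs11.sh2.ctripcorp.com

A node missing from another node's peers directory, or listed with an unexpected state, narrows down which handshake is being rejected.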
Re: [Gluster-users] Re: Gluster peer rejected and failed to start
On 06/02/2015 11:33 AM, vyyy杨雨阳 wrote:
Actually I have 2 problems:
1. New nodes can't be added to the cluster. I cleaned /var/lib/glusterd; the status is now State: Accepted peer request (Connected)
2. One of the clustered nodes shows 'Peer Rejected' and glusterd failed to start. The log is attached in the previous mail. This is a production cluster, so this problem is more urgent.

From the log I can clearly see the problematic node is glusterfs09.sh2.ctripcorp.com. Your existing cluster configuration has bricks hosted on glusterfs09.sh2.ctripcorp.com, however the same is not part of the cluster. Could you paste the output of gluster peer status and gluster volume status?

Best Regards
Yuyang Yang

-Original Message-
From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: Tuesday, June 02, 2015 12:52 PM
To: vyyy杨雨阳; Gluster-users@gluster.org
Subject: Re: [Gluster-users] Gluster peer rejected and failed to start

On 06/02/2015 10:00 AM, vyyy杨雨阳 wrote:
Hi, we have a gluster (version 3.6.3) cluster with 6 nodes. I tried to add 4 more nodes, but they ended up 'Peer Rejected'. I then tried to resolve it by cleaning out /var/lib/glusterd and probing again, without success; this is one question. But the strange thing is: a node already in the cluster also shows "Peer Rejected". I tried to restart glusterd and it failed. I found that /var/lib/glusterd/peers is empty; I copied the files from other nodes, but still can't start glusterd.

It seems like you are trying to peer probe nodes which are already part of some other cluster (uncleaned nodes). Could you check whether the nodes which you are adding have an empty /var/lib/glusterd? If not, clean them and retry.
~Atin

etc-glusterfs-glusterd.vol.log shows the cluster member as an "unknown peer":

[2015-06-02 01:52:14.650635] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 04f22ee8-8e00-4c32-a924-b40a0e413aa6
[2015-06-02 01:52:14.650786] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 674a78b5-0590-48d4-8752-d4608832ed1d
[2015-06-02 01:52:14.657881] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 83e1a9db-3134-45e4-acd2-387b12b5b207
[2015-06-02 01:52:17.747865] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 04f22ee8-8e00-4c32-a924-b40a0e413aa6 doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:17.747908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-02 01:52:40.338885] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:40.338929] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-02 01:52:41.310451] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:41.310486] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

Debug info from /usr/sbin/glusterd is as follows:

[root@SH02SVR5951 peers]# /usr/sbin/glusterd --debug
[2015-06-02 04:09:24.626690] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.3 (args: /usr/sbin/glusterd --debug)
[2015-06-02 04:09:24.626739] D [logging.c:1763:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now.
Timeout = 120, current buf size = 5
[2015-06-02 04:09:24.627052] D [MSGID: 0] [glusterfsd.c:613:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2015-06-02 04:09:24.629683] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-06-02 04:09:24.629706] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-06-02 04:09:24.629764] D [glusterd.c:391:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128
[2015-06-02 04:09:24.629895] D [rpcsvc.c:2198:rpcsvc_init] 0-rpc-service: RPC service inited.
[2015-06-02 04:09:24.629904] D [rpcsvc.c:1801:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2015-06-02 04:09:24.629930] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so
[2015-06-02 04:09:24.631989] D [socket.c:3807:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2015-06-02 04:09:24.632005] D [socket.c:3810:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2015-06-02 04:09:24.632013] D [socket.c:3827:socket_init] 0-socket.management: using system polling thread
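If the rejected node does turn out to carry stale configuration, the usual recovery is to wipe its local glusterd state while preserving its identity file, then let it resync from the pool. A hedged outline of that procedure, to be run on the rejected node only and double-checked against the docs for your version before touching a production cluster:

service glusterd stop
cp -a /var/lib/glusterd/glusterd.info /root/glusterd.info.bak   # keep the node's UUID safe
# remove everything under /var/lib/glusterd except glusterd.info
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
service glusterd start
# then, from a healthy node:
gluster peer probe glusterfs11.sh2.ctripcorp.com
# and back on the recovered node, restart once more so it pulls the volume configs:
service glusterd restart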
[Gluster-users] Re: Gluster peer rejected and failed to start
Actually I have 2 problems:
1. New nodes can't be added to the cluster. I cleaned /var/lib/glusterd; the status is now State: Accepted peer request (Connected)
2. One of the clustered nodes shows 'Peer Rejected' and glusterd failed to start. The log is attached in the previous mail. This is a production cluster, so this problem is more urgent.

Best Regards
Yuyang Yang

-Original Message-
From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: Tuesday, June 02, 2015 12:52 PM
To: vyyy杨雨阳; Gluster-users@gluster.org
Subject: Re: [Gluster-users] Gluster peer rejected and failed to start

On 06/02/2015 10:00 AM, vyyy杨雨阳 wrote:
Hi, we have a gluster (version 3.6.3) cluster with 6 nodes. I tried to add 4 more nodes, but they ended up 'Peer Rejected'. I then tried to resolve it by cleaning out /var/lib/glusterd and probing again, without success; this is one question. But the strange thing is: a node already in the cluster also shows "Peer Rejected". I tried to restart glusterd and it failed. I found that /var/lib/glusterd/peers is empty; I copied the files from other nodes, but still can't start glusterd.

It seems like you are trying to peer probe nodes which are already part of some other cluster (uncleaned nodes). Could you check whether the nodes which you are adding have an empty /var/lib/glusterd? If not, clean them and retry.
~Atin

etc-glusterfs-glusterd.vol.log shows the cluster member as an "unknown peer":

[2015-06-02 01:52:14.650635] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 04f22ee8-8e00-4c32-a924-b40a0e413aa6
[2015-06-02 01:52:14.650786] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 674a78b5-0590-48d4-8752-d4608832ed1d
[2015-06-02 01:52:14.657881] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 83e1a9db-3134-45e4-acd2-387b12b5b207
[2015-06-02 01:52:17.747865] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 04f22ee8-8e00-4c32-a924-b40a0e413aa6 doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:17.747908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-02 01:52:40.338885] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:40.338929] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-06-02 01:52:41.310451] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request.
[2015-06-02 01:52:41.310486] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

Debug info from /usr/sbin/glusterd is as follows:

[root@SH02SVR5951 peers]# /usr/sbin/glusterd --debug
[2015-06-02 04:09:24.626690] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.3 (args: /usr/sbin/glusterd --debug)
[2015-06-02 04:09:24.626739] D [logging.c:1763:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now.
Timeout = 120, current buf size = 5
[2015-06-02 04:09:24.627052] D [MSGID: 0] [glusterfsd.c:613:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
[2015-06-02 04:09:24.629683] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-06-02 04:09:24.629706] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-06-02 04:09:24.629764] D [glusterd.c:391:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128
[2015-06-02 04:09:24.629895] D [rpcsvc.c:2198:rpcsvc_init] 0-rpc-service: RPC service inited.
[2015-06-02 04:09:24.629904] D [rpcsvc.c:1801:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2015-06-02 04:09:24.629930] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so
[2015-06-02 04:09:24.631989] D [socket.c:3807:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled
[2015-06-02 04:09:24.632005] D [socket.c:3810:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled
[2015-06-02 04:09:24.632013] D [socket.c:3827:socket_init] 0-socket.management: using system polling thread
[2015-06-02 04:09:24.632024] D [name.c:550:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet
[2015-06-02 04:09:24.632072] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.6.3/rpc-transport/rdma.so
[2015-06-02 04:09:24.632102]
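A frequent root cause for "Peer Rejected" right after an upgrade is a per-volume configuration checksum mismatch: glusterd stores a cksum file for every volume and refuses the handshake when peers disagree on it. Comparing those files across nodes can identify which volume is out of sync; a sketch, assuming root ssh access between the nodes named in this thread:

for h in glusterfs05 glusterfs06 glusterfs07 glusterfs08 glusterfs09 glusterfs10 glusterfs11; do
    echo "== $h"
    ssh root@$h.sh2.ctripcorp.com \
        'for v in /var/lib/glusterd/vols/*/cksum; do echo "$v: $(cat $v)"; done'
done

Any node whose cksum differs for a given volume is the one holding stale or divergent volume info.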
[Gluster-users] Re: Re: Gluster peer rejected and failed to start
Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3. Glusterfs11~glusterfs14 are new nodes that need to join the cluster.

On glusterfs09:

[root@SH02SVR5952 ~]# gluster peer status
Number of Peers: 6

Hostname: glusterfs06.sh2.ctripcorp.com
Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
State: Peer in Cluster (Connected)

Hostname: glusterfs07.sh2.ctripcorp.com
Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
State: Peer in Cluster (Connected)

Hostname: glusterfs08.sh2.ctripcorp.com
Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
State: Peer in Cluster (Connected)

Hostname: 10.8.230.209
Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
State: Peer in Cluster (Connected)

Hostname: glusterfs10.sh2.ctripcorp.com
Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
State: Peer in Cluster (Disconnected)

Hostname: glusterfs11.sh2.ctripcorp.com
Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
State: Peer Rejected (Connected)

[root@SH02SVR5952 ~]#

[root@SH02SVR5952 ~]# gluster volume status
Status of volume: JQStore2
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       2782
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       2744
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       5307
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick   49152  Y       3986
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104

Task Status of Volume JQStore2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: Webresource
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       2787
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       2753
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       5313
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick3  49155  Y       3992
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905

Task Status of Volume Webresource
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ccim
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       2793
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       2745
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       5320
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick2  49154  Y       3999
NFS Server on localhost                                 2049   Y       51697
Self-heal Daemon on localhost                           N/A    Y       51710
NFS Server on glusterfs06.sh2.ctripcorp.com             2049   Y       22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com       N/A    Y       22192
NFS Server on glusterfs07.sh2.ctripcorp.com             2049   Y       110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com       N/A    Y       110905
NFS Server on 10.8.230.209                              2049   Y       4091
Self-heal Daemon on 10.8.230.209                        N/A    Y       4104

Task Status of Volume ccim
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: cloudimage
Gluster process                                         Port   Online  Pid
------------------------------------------------------------------------------
Brick