[Gluster-users] glusterfs - git clone is very slow

2015-06-02 Thread siva kumar
Hi ,

We are facing a slowness issue while cloning a repository from GitHub onto a
GlusterFS (latest v3.7.0) shared directory.

 Please refer to the following time difference for the git clone command.

GlusterFS -  shared directory :

test@test:~/gluster$ time git clone
https://github.com/elastic/elasticsearch.git
Cloning into 'elasticsearch'...
remote: Counting objects: 359724, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649
Receiving objects: 100% (359724/359724), 129.04 MiB | 569.00 KiB/s, done.
Resolving deltas: 100% (203986/203986), done.
Checking out files: 100% (5272/5272), done.

*real    9m1.972s*
user    0m27.063s
sys     0m18.974s



Normal machine - without glusterfs shared directory

test@test:~/s$ time git clone https://github.com/elastic/elasticsearch.git
Cloning into 'elasticsearch'...
remote: Counting objects: 359724, done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 359724 (delta 59), reused 20 (delta 20), pack-reused 359649
Receiving objects: 100% (359724/359724), 129.04 MiB | 2.12 MiB/s, done.
Resolving deltas: 100% (203986/203986), done.
Checking connectivity... done
Checking out files: 100% (5272/5272), done.

*real    1m56.895s*
user    0m12.974s
sys     0m4.972s


Can you please check this and let us know what configuration changes should be
made to get better performance?
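
For reference, these are the kinds of settings we could try if you recommend
them; the volume name "myvol" below is a placeholder, and these are general
small-file tunables gathered from the docs, not yet verified on our setup:

# more network event threads on client and server (new in 3.7)
gluster volume set myvol client.event-threads 4
gluster volume set myvol server.event-threads 4
# larger io-thread pool and write-behind window on the bricks
gluster volume set myvol performance.io-thread-count 32
gluster volume set myvol performance.write-behind-window-size 4MB
# metadata/data caching for the many small stat()/open() calls git makes
gluster volume set myvol performance.cache-size 1GB
gluster volume set myvol performance.stat-prefetch on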

Thanks.
Siva
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Re: Re: Re: Gluster peer rejected and failed to start

2015-06-02 Thread Atin Mukherjee


On 06/02/2015 12:23 PM, vyyy杨雨阳 wrote:
 Attached glusterd log files for 10 and 11
[2015-06-02 06:33:42.668268] E
[glusterd-handshake.c:972:gd_validate_mgmt_hndsk_req] 0-management:
Rejecting management handshake request from unknown peer 10.8.230.212:1002

From the above log it looks like node 11 tried to handshake with 12, but
how come 12 is part of the cluster? Could you run gluster peer status in
12 and share glusterd log? I am still under the same opinion which I had
earlier - you are trying to add node which is already a member of
another cluster.
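
If it helps to verify that, you can compare what each node believes about the
cluster (standard glusterd paths, nothing version specific):

# run on nodes 11 and 12 and compare the output
gluster peer status
# the peers glusterd has persisted; an unexpected UUID here usually means the
# node was probed into a different cluster at some point
ls -l /var/lib/glusterd/peers/
cat /var/lib/glusterd/glusterd.info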
 
 Best Regards 
 Yuyang Yang
 
 -----Original Message-----
 From: Atin Mukherjee [mailto:amukh...@redhat.com] 
 Sent: Tuesday, June 02, 2015 2:42 PM
 To: vyyy杨雨阳; Gluster-users@gluster.org
 Subject: Re: Re: Re: [Gluster-users] Gluster peer rejected and failed to start
 
 
 
 On 06/02/2015 12:04 PM, vyyy杨雨阳 wrote:
 Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3
 Glusterfs11~glusterfs14 are new nodes that need to join the cluster

 On glusterfs09:

 [root@SH02SVR5952 ~]# gluster peer status
 Number of Peers: 6

 Hostname: glusterfs06.sh2.ctripcorp.com
 Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
 State: Peer in Cluster (Connected)

 Hostname: glusterfs07.sh2.ctripcorp.com
 Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
 State: Peer in Cluster (Connected)

 Hostname: glusterfs08.sh2.ctripcorp.com
 Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
 State: Peer in Cluster (Connected)

 Hostname: 10.8.230.209
 Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
 State: Peer in Cluster (Connected)

 Hostname: glusterfs10.sh2.ctripcorp.com
 Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
 State: Peer in Cluster (Disconnected)

 Hostname: glusterfs11.sh2.ctripcorp.com
 Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
 State: Peer Rejected (Connected)
 [root@SH02SVR5952 ~]#
 Can you attach glusterd log files for 10 and 11?

 [root@SH02SVR5952 ~]# gluster volume status
 Status of volume: JQStore2
 Gluster process                                          Port    Online  Pid
 ------------------------------------------------------------------------------
 Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick    49152   Y       2782
 Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick    49152   Y       2744
 Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick    49152   Y       5307
 Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick    49152   Y       3986
 NFS Server on localhost                                  2049    Y       51697
 Self-heal Daemon on localhost                            N/A     Y       51710
 NFS Server on glusterfs07.sh2.ctripcorp.com              2049    Y       110894
 Self-heal Daemon on glusterfs07.sh2.ctripcorp.com        N/A     Y       110905
 NFS Server on glusterfs06.sh2.ctripcorp.com              2049    Y       22185
 Self-heal Daemon on glusterfs06.sh2.ctripcorp.com        N/A     Y       22192
 NFS Server on 10.8.230.209                               2049    Y       4091
 Self-heal Daemon on 10.8.230.209                         N/A     Y       4104
  
 Task Status of Volume JQStore2
 --
 There are no active volume tasks
  
 Status of volume: Webresource
 Gluster process                                          Port    Online  Pid
 ------------------------------------------------------------------------------
 Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick3   49155   Y       2787
 Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick3   49155   Y       2753
 Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick3   49155   Y       5313
 Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick3   49155   Y       3992
 NFS Server on localhost                                  2049    Y       51697
 Self-heal Daemon on localhost                            N/A     Y       51710
 NFS Server on 10.8.230.209                               2049    Y       4091
 Self-heal Daemon on 10.8.230.209                         N/A     Y       4104
 NFS Server on glusterfs06.sh2.ctripcorp.com              2049    Y       22185
 Self-heal Daemon on glusterfs06.sh2.ctripcorp.com        N/A     Y       22192
 NFS Server on glusterfs07.sh2.ctripcorp.com              2049    Y       110894
 Self-heal Daemon on glusterfs07.sh2.ctripcorp.com        N/A     Y       110905
  
 Task Status of Volume Webresource
 --
 There are no active volume tasks
  
 Status of volume: ccim
 Gluster process                                          Port    Online  Pid
 ------------------------------------------------------------------------------
 Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick2   49154   Y       2793
 Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick2   49154   Y       2745
 Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick2   49154   Y       5320
 Brick 

[Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem

2015-06-02 Thread 莊尚豪
Hi all,

I have two questions about glusterfs-3.7 on Fedora 22.

I used to have a glusterfs cluster version 3.6.2. 

The following configuration worked in version 3.6.2, but does not work in
version 3.7.

 

There are 2 nodes for glusterfs.

OS: fedora 22

Gluster: 3.7 on
https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/

 

#gluster peer probe n1

#gluster volume create ganesha n1:/data/brick1/gv0 n2:/data/brick1/gv0

 

Volume Name: ganesha

Type: Distribute

Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8

Status: Created

Number of Bricks: 2

Transport-type: tcp

Bricks:

Brick1: n1:/data/brick1/gv0

Brick2: n2:/data/brick1/gv0

Options Reconfigured:

performance.readdir-ahead: on

 

The problem appears when starting the volume ganesha:

#gluster volume start ganesha

 

volume start: ganesha: failed: Commit failed on localhost. Please check the
log file for more details.

 

LOG in /var/log/glusterfs/bricks/data-brick1-gv0.log

 

[2015-06-02 08:02:55.232923] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.0
(args: /usr/sbin/glusterfsd -s n2 --volfile-id ganesha.n2.data-brick1-gv0 -p
/var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S
/var/run/gluster/73ea8a39514304f5ebd440321d784386.socket --brick-name
/data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log
--xlator-option *-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd
--brick-port 49157 --xlator-option ganesha-server.listen-port=49157)

[2015-06-02 08:02:55.284206] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1

[2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload] 0-xlator:
/usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so: undefined symbol:
changelog_select_event

[2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser: Volume
'ganesha-changelog', line 30: type 'features/changelog' is not valid or not
found on this machine

[2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser: type not
specified for volume ganesha-changelog

[2015-06-02 08:02:55.398214] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the
graph

[2015-06-02 08:02:55.398423] W [glusterfsd.c:1219:cleanup_and_exit] (-- 0-:
received signum (0), shutting down

 

I could not find a way to resolve this by searching online.

Has anyone come across this problem?
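
A quick check, using the library path from the log above, would be to confirm
whether the symbol really is missing from the installed changelog translator,
and whether all glusterfs packages on the node come from the same 3.7.0 build:

nm -D /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so | grep changelog_select_event
rpm -q glusterfs glusterfs-server glusterfs-libs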

 

Another question is about the nfs-ganesha feature (version 2.2).

I cannot turn this feature on with the gluster volume command.

I tried to follow the glusterfs-ganesha demo video, but it does not work.

Demo link: https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o

 

[root@n1 brick1]# gluster nfs-ganesha enable

Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted
pool. Do you still want to continue? (y/n) y

nfs-ganesha: failed: Commit failed on localhost. Please check the log file
for more details.

 

Does anyone have a detailed configuration?

Thanks for any advice.

 

Regards,

Ben

 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-02 Thread Pranith Kumar Karampuri

hi Geoffrey,
 Since you are saying it happens on all types of volumes, 
let's do the following:

1) Create a dist-repl volume
2) Set the options etc you need.
3) enable gluster volume profile using "gluster volume profile <volname> start"
4) run the work load
5) give output of "gluster volume profile <volname> info"

Repeat the steps above on new and old version you are comparing this 
with. That should give us insight into what could be causing the slowness.
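
As a concrete sketch of steps 3-5 (the volume name "testvol" is just a 
placeholder):

gluster volume profile testvol start
# ... run the untar/du/find/tar/rm workload on the mount ...
gluster volume profile testvol info > profile-testvol.txt
gluster volume profile testvol stop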


Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,

I have a crash test cluster where I’ve tested the new version of 
GlusterFS (v3.7) before upgrading my HPC cluster in production.

But… all my tests show very, very low performance.

For my benches, as you can read below, I do some actions (untar, du, 
find, tar, rm) with linux kernel sources, dropping cache, each on 
distributed, replicated, distributed-replicated, single (single brick) 
volumes and the native FS of one brick.


# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)


And here are the process times:

---
| |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
---
| single  |  ~3m45s |   ~43s |~47s | ~3m10s | ~3m15s |
---
| replicated  |  ~5m10s |   ~59s |   ~1m6s | ~1m19s | ~1m49s |
---
| distributed |  ~4m18s |   ~41s |~57s | ~2m24s | ~1m38s |
---
| dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s | ~1m24s | ~2m40s |
---
| native FS   |~11s |~4s | ~2s | ~56s |   ~10s |
---

I get the same results, whether with default configurations or with 
custom configurations.


If I look at the output of the ifstat command, I can see that my I/O write 
throughput never exceeds 3 MB/s...


The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than 
the XFS one.


My [test] storage cluster is composed of 2 identical servers 
(bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no-RAID) and Gb ethernet)


My volume settings:
single: 1 server 1 brick
replicated: 2 servers 1 brick each
distributed: 2 servers 2 bricks each
dist-repl: 2 bricks in the same server and replica 2

Everything seems to be OK in the gluster status command output.

Do you have an idea why I obtain such bad results?
Thanks in advance.
Geoffrey
---
Geoffrey Letessier

Responsable informatique  ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr 




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Features - Object Count

2015-06-02 Thread aasenov1989
Hi,
That is exactly what I was looking for. Thanks a lot.

Regards,
Asen Asenov

On Mon, Jun 1, 2015 at 2:51 PM, Sachin Pandit span...@redhat.com wrote:



 - Original Message -
  From: M S Vishwanath Bhat msvb...@gmail.com
  To: aasenov1989 aasenov1...@gmail.com
  Cc: Gluster-users@gluster.org List gluster-users@gluster.org
  Sent: Monday, June 1, 2015 3:02:08 PM
  Subject: Re: [Gluster-users] Features - Object Count
 
 
 
  On 29 May 2015 at 18:11, aasenov1989  aasenov1...@gmail.com  wrote:
 
 
 
  Hi,
  So is there a way to find how many files I have on each brick of the
 volume?
  I don't think gluster provides a way to exactly get the number of files
 in a
  brick or volume.
 
  Sorry if my solution is very obvious. But I generally use find to get the
  number of files in a particular brick.
 
  find /brick/path ! -path "/brick/path/.glusterfs*" | wc -l
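
 A variant of the same idea that prunes the .glusterfs directory and counts
 only regular files (just a sketch, adjust the brick path):

 find /brick/path -path /brick/path/.glusterfs -prune -o -type f -print | wc -l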

 Hi,

 You can also do getfattr -d -m . -e hex <brick_path>
 This command is to get the extended attributes of a directory.
 When you issue this command after enabling quota then
 you can see an extended attribute with name trusted.glusterfs.quota.size
 That basically holds the size, file count and directory count.

 The extended attribute consists of 48 hexadecimal numbers. First 16 will
 give
 you the size, next 16 the file count and last 16 the directory count.
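
 As a small decoding sketch (the value below is made up purely to show the
 layout; read the real one from the brick with the getfattr command above):

 val=0x00000000162d22000000000000000d210000000000000485   # hypothetical 48-digit value
 hex=${val#0x}
 printf 'size=%d files=%d dirs=%d\n' "0x${hex:0:16}" "0x${hex:16:16}" "0x${hex:32:16}"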

 Hope this helps.

 Thanks,
 Sachin Pandit.


 
 
  Best Regards,
  Vishwanath
 
 
 
 
 
  Regards,
  Asen Asenov
 
  On Fri, May 29, 2015 at 3:33 PM, Atin Mukherjee 
 atin.mukherje...@gmail.com
   wrote:
 
 
 
 
 
 
  Sent from Samsung Galaxy S4
  On 29 May 2015 17:59, aasenov1989  aasenov1...@gmail.com  wrote:
  
   Hi,
    Thanks for the help. I was able to retrieve number of objects for
 entire
   volume. But I didn't figure out how to set quota for particular brick.
 I
   have replicated volume with 2 bricks on 2 nodes:
   Bricks:
   Brick1: host1:/dataDir
   Brick2: host2:/dataDir
   Both bricks are up and files are replicated. But when I try to set
 quota on
   a particular brick:
  IIUC, You won't be able to set quota at brick level as multiple bricks
  comprise a volume which is exposed to the user. Quota team can correct
 me if
  I am wrong.
 
  
   gluster volume quota TestVolume limit-objects /dataDir/
 9223372036854775807
   quota command failed : Failed to get trusted.gfid attribute on path
   /dataDir/. Reason : No such file or directory
   please enter the path relative to the volume
  
   What should be the path to brick directories relative to the volume?
  
   Regards,
   Asen Asenov
  
  
   On Fri, May 29, 2015 at 12:35 PM, Sachin Pandit  span...@redhat.com 
   wrote:
  
   - Original Message -
From: aasenov1989  aasenov1...@gmail.com 
To: Humble Devassy Chirammal  humble.deva...@gmail.com 
Cc:  Gluster-users@gluster.org List  gluster-users@gluster.org 
Sent: Friday, May 29, 2015 12:22:43 AM
Subject: Re: [Gluster-users] Features - Object Count
   
Thanks Humble,
But as far as I understand the object count is connected with the
 quotas
set
per folders. What I want is to get number of files I have in entire
volume -
even when volume is distributed across multiple computers. I think
 the
purpose of this feature:
   
 http://gluster.readthedocs.org/en/latest/Feature%20Planning/GlusterFS%203.7/Object%20Count/
  
   Hi,
  
   You are absolutely correct. You can retrieve number of files in the
 entire
   volume if you have the limit-objects set on the root. If
 limit-objects
   is set on the directory present in a mount point then it will only
 show
   the number of files and directories of that particular directory.
  
   In your case, if you want to retrieve number of files and directories
   present in the entire volume then you might have to set the object
 limit
   on the root.
  
  
   Thanks,
   Sachin Pandit.
  
  
is to provide such functionality. Am I right or there is no way to
retrieve
number of files for entire volume?
   
Regards,
Asen Asenov
   
On Thu, May 28, 2015 at 8:09 PM, Humble Devassy Chirammal 
humble.deva...@gmail.com  wrote:
   
   
   
Hi Asen,
   
   
 https://gluster.readthedocs.org/en/latest/Features/quota-object-count/ ,
hope
this helps.
   
--Humble
   
   
On Thu, May 28, 2015 at 8:38 PM, aasenov1989 
 aasenov1...@gmail.com 
wrote:
   
   
   
Hi,
I wanted to ask how to use this feature in gluster 3.7.0, as I was
unable to
find anything. How can I retrieve number of objects in volume and
 number
of
objects in particular brick?
   
Thanks in advance.
   
Regards,
Asen Asenov
   
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
   
   
   
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume

2015-06-02 Thread Alastair Neil
Cheers, that's a great help.  I am assuming the extra
trusted.afr.<volname>-client-* entries are left over from the removed peers;
can I expect they will disappear after glusterfsd gets restarted?



On 1 June 2015 at 23:49, Ravishankar N ravishan...@redhat.com wrote:



 On 06/01/2015 08:15 PM, Alastair Neil wrote:


  I have a replica 3 volume I am using to serve my home directory.  I have
 noticed a couple of split-brains recently on files used by browsers (for the
 most recent see below, I had an earlier one on
 .config/google-chrome/Default/Session Storage/) .  When I was running
 replica 2 I don't recall seeing more than two entries of the form:
 trusted.afr.volname.client-?.  I did have two other servers that I have
 removed from service recently but I am curious to know if there is some way
 to map  what the server reports as trusted.afr.volname-client-? to a
 hostname?



 Your volfile
 (/var/lib/glusterd/vols/volname/trusted-volname.tcp-fuse.vol) should
 contain which brick (remote-subvolume + remote-host) a given trusted.afr*
 maps to.
 Hope that helps,
 Ravi
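
 For example, something along these lines (volume name and client index taken
 from the output below; the exact option order in the volfile may differ):

 # each trusted.afr.homes-client-N has a matching "volume homes-client-N" block;
 # its remote-host and remote-subvolume options identify the brick
 grep -A 6 'volume homes-client-3' /var/lib/glusterd/vols/homes/trusted-homes.tcp-fuse.vol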


  Thanks, Alastair


  # gluster volume heal homes info
 Brick gluster-2:/export/brick2/home/
 /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
 Number of entries: 1
 Brick gluster1:/export/brick2/home/
 /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
 Number of entries: 1
 Brick gluster0:/export/brick2/home/
 /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
 Number of entries: 1
 # getfattr -d -m . -e hex
 /export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
 getfattr: Removing leading '/' from absolute path names
 # file:
 export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair

 security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
 trusted.afr.dirty=0x
 trusted.afr.homes-client-0=0x
 trusted.afr.homes-client-1=0x
 trusted.afr.homes-client-2=0x
 trusted.afr.homes-client-3=0x0002
 trusted.afr.homes-client-4=0x
 trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5
 trusted.glusterfs.dht=0x0001
 trusted.glusterfs.quota.dirty=0x3000

 trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200
 trusted.glusterfs.quota.size=0x162d2200




 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-02 Thread Ben Turner
I am seeing problems on 3.7 as well.  Can you check /var/log/messages on both 
the clients and servers for hung tasks like:

Jun  2 15:23:14 gqac006 kernel: echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs disables this message.
Jun  2 15:23:14 gqac006 kernel: iozoneD 0001 0 21999
  1 0x0080
Jun  2 15:23:14 gqac006 kernel: 880611321cc8 0082 
880611321c18 a027236e
Jun  2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 
88052bd1e040 880611321c78
Jun  2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 
880625addaf8 880611321fd8
Jun  2 15:23:14 gqac006 kernel: Call Trace:
Jun  2 15:23:14 gqac006 kernel: [a027236e] ? 
rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 
[sunrpc]
Jun  2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0
Jun  2 15:23:14 gqac006 kernel: [811242d0] ? sync_page+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0
Jun  2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50
Jun  2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90
Jun  2 15:23:14 gqac006 kernel: [81124543] wait_on_page_bit+0x73/0x80
Jun  2 15:23:14 gqac006 kernel: [8109eb80] ? 
wake_bit_function+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [8113a525] ? 
pagevec_lookup_tag+0x25/0x40
Jun  2 15:23:14 gqac006 kernel: [8112496b] 
wait_on_page_writeback_range+0xfb/0x190
Jun  2 15:23:14 gqac006 kernel: [81124b38] 
filemap_write_and_wait_range+0x78/0x90
Jun  2 15:23:14 gqac006 kernel: [811c07ce] vfs_fsync_range+0x7e/0x100
Jun  2 15:23:14 gqac006 kernel: [811c08bd] vfs_fsync+0x1d/0x20
Jun  2 15:23:14 gqac006 kernel: [811c08fe] do_fsync+0x3e/0x60
Jun  2 15:23:14 gqac006 kernel: [811c0950] sys_fsync+0x10/0x20
Jun  2 15:23:14 gqac006 kernel: [8100b072] 
system_call_fastpath+0x16/0x1b

Do you see a perf problem with just a simple DD or do you need a more complex 
workload to hit the issue?  I think I saw an issue with metadata performance 
that I am trying to run down, let me know if you can see the problem with 
simple DD reads / writes or if we need to do some sort of dir / metadata access 
as well.

-b

- Original Message -
 From: Geoffrey Letessier geoffrey.letess...@cnrs.fr
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Tuesday, June 2, 2015 8:09:04 AM
 Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
 
 Hi Pranith,
 
 I’m sorry but I cannot give you any comparison, because the comparison would be
 distorted by the fact that in my HPC cluster in production the network technology
 is InfiniBand QDR and my volumes are quite different (bricks in RAID6
 (12x2TB), 2 bricks per server and 4 servers in my pool).
 
 Concerning your request, in the attachments you can find all the expected results,
 hoping it can help you to solve this serious performance issue (maybe I need
 to play with glusterfs parameters?).
 
 Thank you very much in advance,
 Geoffrey
 --
 Geoffrey Letessier
 Responsable informatique  ingénieur système
 UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
 Institut de Biologie Physico-Chimique
 13, rue Pierre et Marie Curie - 75005 Paris
 Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr
 
 
 
 
 On 2 June 2015 at 10:09, Pranith Kumar Karampuri  pkara...@redhat.com  wrote:
 
 hi Geoffrey,
 Since you are saying it happens on all types of volumes, lets do the
 following:
 1) Create a dist-repl volume
 2) Set the options etc you need.
 3) enable gluster volume profile using "gluster volume profile <volname> start"
 4) run the work load
 5) give output of "gluster volume profile <volname> info"
 
 Repeat the steps above on new and old version you are comparing this with.
 That should give us insight into what could be causing the slowness.
 
 Pranith
 On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
 
 
 Dear all,
 
 I have a crash test cluster where I’ve tested the new version of GlusterFS
 (v3.7) before upgrading my HPC cluster in production.
 But… all my tests show very, very low performance.
 
 For my benches, as you can read below, I do some actions (untar, du, find,
 tar, rm) with linux kernel sources, dropping cache, each on distributed,
 replicated, distributed-replicated, single (single brick) volumes and the
 native FS of one brick.
 
 # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz;
 sync; echo 3 > /proc/sys/vm/drop_caches)
 # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 >
 /proc/sys/vm/drop_caches)
 # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3
 > /proc/sys/vm/drop_caches)
 # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz
 linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
 # time (echo 3 > 

[Gluster-users] can't remove brick - wrong operating-version

2015-06-02 Thread Branden Timm
Hi All,

I'm hitting what seems to be a known but unresolved bug, exactly similar to 
these:


Most recently:

https://bugzilla.redhat.com/show_bug.cgi?id=1168897


Similar from some time ago:

https://bugzilla.redhat.com/show_bug.cgi?id=1127328


Essentially the upshot is that the remove-brick operation reports:


volume remove-brick commit force: failed: One or more nodes do not support the 
required op-version. Cluster op-version must atleast be 30600.


I'm on CentOS 6.6 with GlusterFS 3.6.3 from glusterfs-epel. The 
operating-version in /var/lib/glusterd/glusterd.info is set to 2 on all hosts 
participating in the volume.


I see that some recommend manually changing that setting in glusterd.info to 
something higher than 30600, but that does not seem particularly safe, and a 
Ubuntu 14.04 user reported that glusterd wouldn't actually start when that 
setting was changed.


Is there any workaround to this? I can't imagine everyone in the world running 
Gluster is unable to remove bricks at the moment ...


Thanks in advance for any insight you can provide.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume

2015-06-02 Thread Ravishankar N



On 06/03/2015 01:14 AM, Alastair Neil wrote:
Cheers, that's a great help.  I am assuming the extra 
trusted.afr.<volname>-client-* entries are left over from the removed peers,


Correct.

can I expect they will disappear after glusterfsd gets restarted?


They will remain, but it should not affect normal operation in any way.



On 1 June 2015 at 23:49, Ravishankar N ravishan...@redhat.com wrote:




On 06/01/2015 08:15 PM, Alastair Neil wrote:


I have a replica 3 volume I am using to serve my home directory. 
I have noticed a couple of split-brains recently on files used by

browsers(for the most recent see below, I had an earlier one on
.config/google-chrome/Default/Session Storage/) .  When I was
running replica 2 I don't recall seeing more than two entries of
the form: trusted.afr.volname.client-?.  I did have two other
servers that I have removed from service recently but I am
curious to know if there is some way to map  what the server
reports as trusted.afr.volname-client-? to a hostname?




Your volfile
(/var/lib/glusterd/vols/volname/trusted-volname.tcp-fuse.vol)
should contain which brick (remote-subvolume + remote-host) a
given trusted.afr* maps to.
Hope that helps,
Ravi



Thanks, Alastair


# gluster volume heal homes info
Brick gluster-2:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in
split-brain
Number of entries: 1
Brick gluster1:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in
split-brain
Number of entries: 1
Brick gluster0:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in
split-brain
Number of entries: 1
# getfattr -d -m . -e hex
/export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
getfattr: Removing leading '/' from absolute path names
# file:
export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair

security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.afr.homes-client-2=0x
trusted.afr.homes-client-3=0x0002
trusted.afr.homes-client-4=0x
trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000

trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200
trusted.glusterfs.quota.size=0x162d2200




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users





___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] can't remove brick - wrong operating-version

2015-06-02 Thread Atin Mukherjee
Sent from Samsung Galaxy S4
On 3 Jun 2015 01:17, Branden Timm bt...@wisc.edu wrote:

 Hi All,

 I'm hitting what seems to be a known but unresolved bug, exactly similar
to these:


 Most recently:

 https://bugzilla.redhat.com/show_bug.cgi?id=1168897


 Similar from some time ago:

 https://bugzilla.redhat.com/show_bug.cgi?id=1127328


 Essentially the upshot is that the remove-brick operation reports:


 volume remove-brick commit force: failed: One or more nodes do not
support the required op-version. Cluster op-version must atleast be 30600.


 I'm on CentOS 6.6 with GlusterFS 3.6.3 from glusterfs-epel. The
operating-version in /var/lib/glusterd/glusterd.info is set to 2 on all
hosts participating in the volume.


 I see that some recommend manually changing that setting in glusterd.info
to something higher than 30600, but that does not seem particularly safe,
and a Ubuntu 14.04 user reported that glusterd wouldn't actually start when
that setting was changed.


 Is there any workaround to this? I can't imagine everyone in the world
running Gluster is unable to remove bricks at the moment ...


 Thanks in advance for any insight you can provide.

Could you execute "gluster volume set all cluster.op-version 30600"? This
should bump up the cluster op-version, which will ideally persist the value
in the glusterd.info file. Post that you should be able to execute the remove-brick
command.

HTH,
Atin



 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-02 Thread Geoffrey Letessier
Hi Ben,

I just checked my messages log files, both on client and server, and I don't find 
any of the hung tasks you noticed on yours. 

As you can read below, I don't see the performance issue with a simple DD, but I 
think my issue concerns sets of small files (tens of thousands, maybe more)…

[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
10240MiBKiB/s  CPU%
Write  114770 4
Read40675 4

for info: /mnt/test concerns the single v2 GlFS volume

[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
10240MiBKiB/s  CPU%
Write  102591 1
Read98079 2

Do you have an idea how to tune/optimize the performance settings and/or the TCP 
settings (MTU, etc.)?
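
(For reference, the network-side checks that seem relevant; interface name and 
server name are placeholders:)

ip link show eth0 | grep mtu          # current MTU on the storage interface
iperf -c server1 -t 30                # raw TCP throughput between client and brick server
# jumbo frames (MTU 9000) are a common suggestion, but every NIC and switch port
# on the path must be configured for them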

---
| |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
---
| single  |  ~3m45s |   ~43s |~47s |  ~3m10s | ~3m15s |
---
| replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
---
| distributed |  ~4m18s |   ~41s |~57s |  ~2m24s | ~1m38s |
---
| dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
---
| native FS   |~11s |~4s | ~2s |~56s |   ~10s |
---
| BeeGFS  |  ~3m43s |   ~15s | ~3s |  ~1m33s |   ~46s |
---
| single (v2) |   ~3m6s |   ~14s |~32s |   ~1m2s |   ~44s |
---
for info: 
- BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
- single (v2): simple gluster volume with default settings

I also note I get the same tar/untar performance issue with FhGFS/BeeGFS, but 
the rest (DU, FIND, RM) looks OK.

Thank you very much for your reply and help.
Geoffrey
---
Geoffrey Letessier

Responsable informatique  ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr

On 2 June 2015 at 21:53, Ben Turner btur...@redhat.com wrote:

 I am seeing problems on 3.7 as well.  Can you check /var/log/messages on both 
 the clients and servers for hung tasks like:
 
 Jun  2 15:23:14 gqac006 kernel: echo 0 > 
 /proc/sys/kernel/hung_task_timeout_secs disables this message.
 Jun  2 15:23:14 gqac006 kernel: iozoneD 0001 0 21999  
 1 0x0080
 Jun  2 15:23:14 gqac006 kernel: 880611321cc8 0082 
 880611321c18 a027236e
 Jun  2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 
 88052bd1e040 880611321c78
 Jun  2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 
 880625addaf8 880611321fd8
 Jun  2 15:23:14 gqac006 kernel: Call Trace:
 Jun  2 15:23:14 gqac006 kernel: [a027236e] ? 
 rpc_make_runnable+0x7e/0x80 [sunrpc]
 Jun  2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 
 [sunrpc]
 Jun  2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0
 Jun  2 15:23:14 gqac006 kernel: [811242d0] ? sync_page+0x0/0x50
 Jun  2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0
 Jun  2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50
 Jun  2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90
 Jun  2 15:23:14 gqac006 kernel: [81124543] 
 wait_on_page_bit+0x73/0x80
 Jun  2 15:23:14 gqac006 kernel: [8109eb80] ? 
 wake_bit_function+0x0/0x50
 Jun  2 15:23:14 gqac006 kernel: [8113a525] ? 
 pagevec_lookup_tag+0x25/0x40
 Jun  2 15:23:14 gqac006 kernel: [8112496b] 
 wait_on_page_writeback_range+0xfb/0x190
 Jun  2 15:23:14 gqac006 kernel: [81124b38] 
 filemap_write_and_wait_range+0x78/0x90
 Jun  2 15:23:14 gqac006 kernel: [811c07ce] 
 vfs_fsync_range+0x7e/0x100
 Jun  2 15:23:14 gqac006 kernel: [811c08bd] vfs_fsync+0x1d/0x20
 Jun  2 15:23:14 gqac006 kernel: [811c08fe] do_fsync+0x3e/0x60
 Jun  2 15:23:14 gqac006 kernel: [811c0950] sys_fsync+0x10/0x20
 Jun  2 15:23:14 gqac006 kernel: [8100b072] 
 system_call_fastpath+0x16/0x1b
 
 Do you see a perf problem with just a simple DD or do you need a more complex 
 workload to hit the issue?  I think I saw an issue with metadata performance 
 that I am trying to run down, let me know if you can see the 

[Gluster-users] GlusterFS 3.7.1 released

2015-06-02 Thread Krishnan Parthasarathi
All,

GlusterFS 3.7.1 has been released. The packages for CentOS, Debian, Fedora and 
RHEL
are available at http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.1/ 
in their
respective directories.

A total of 58 patches were merged after v3.7.0. The following is the 
distribution
of patches among components/features.

   12 tests (regression test-suite)
8 tier
5 glusterd
5 bitrot
4 geo-rep
3 afr
   21 'everywhere else'

List of known bugs for 3.7.1 is being tracked at 
https://bugzilla.redhat.com/show_bug.cgi?id=1219955.
Testing feedback and patches would be welcome.

~kp
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Client load high (300) using fuse mount

2015-06-02 Thread Mitja Mihelič


On 02. 06. 2015 07:33, Pranith Kumar Karampuri wrote:

hi Mitja,
 Could you please give output of the following commands:
1) gluster volume info

Volume Name: gvol-splet
Type: Replicate
Volume ID: FAKE-ID
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.setup.tld:/gluster/gvol-splet/brick0/brick
Brick2: gluster2.setup.tld:/gluster/gvol-splet/brick0/brick
Brick3: gluster3.setup.tld:/gluster/gvol-splet/brick0/brick
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
auth.allow: WEBNODE-IP1,WEBNODE-IP2
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on

2) gluster volume profile <volname> start
3) Wait while the CPU is high for 5-10 minutes
4) gluster volume profile <volname> info > 
output-you-need-to-attach-to-this-mail.txt
I cannot give you the results from the production system, because the 
web server was unresponsive and I switched back to local storage.
The attached file contains results from the setup that was briefly in 
production and will be again when this is solved. The load is sythetic, 
generated by jmeter. During the test iotop on GlusterFS peers showed 
practically zero disk activity. Pretty much the same as under a real 
world load. Average load on the web node was a bit above 50 constantly.


I will try to get the results from the production setup.
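
In the meantime, would it make sense to raise the FUSE caching timeouts on the 
client mount? A sketch of what I mean (the mount point is a placeholder and the 
timeout values are just examples):

mount -t glusterfs -o attribute-timeout=30,entry-timeout=30,negative-timeout=10 \
    gluster1.setup.tld:/gvol-splet /mnt/gvol-splet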

Regards, Mitja




4th command tells what are the operations that are issued a lot.

Pranith

On 06/01/2015 04:41 PM, Mitja Mihelič wrote:

Hi!

I am trying to set up a Wordpress cluster using GlusterFS used for 
storage. Web nodes will access the same Wordpress install on a volume 
mounted via FUSE from a 3 peer GlusterFS TSP.


I started with one web node and Wordpress on local storage. The load 
average was constantly about 5. iotop showed about 300kB/s disk reads 
or less. The load average was below 6.


When I mounted the GlusterFS volume to the web node the 1min load 
average went over 300. Each of the 3 peers is transmitting about 
10MB/s to my web node regardless of the load.

TSP peers are on 10Gbit NICs and the web node is on a 1Gbit NIC.

I'm out of ideas here... Could it be the network?
What should I look at for optimizing the network stack on the client?

Options set on TSP:
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on

Regards, Mitja



Brick: gluster1.setup.tld:/gluster/gvol-splet/brick0/brick
-
Cumulative Stats:
   Block Size:  1b+   2b+   4b+ 
 No. of Reads:0 0 1 
No. of Writes:   2866   180 
 
   Block Size:  8b+  16b+  32b+ 
 No. of Reads:2 94354   131 
No. of Writes:  2124251 
 
   Block Size: 64b+ 128b+ 256b+ 
 No. of Reads:   71   2160815208832 
No. of Writes:  631   372   386 
 
   Block Size:512b+1024b+2048b+ 
 No. of Reads:  2481316   1414880   1377502 
No. of Writes:  147   111   261 
 
   Block Size:   4096b+8192b+   16384b+ 
 No. of Reads:  2753313   2770744   3389212 
No. of Writes:17604  2566   996 
 
   Block Size:  32768b+   65536b+  131072b+ 
 No. of Reads:  1284591803165390224 
No. of Writes:  721  1035 11387 
 
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls Fop
 -   ---   ---   ---   
  0.00   0.00 us   0.00 us   0.00 us   3569  FORGET
  0.00   0.00 us   0.00 us   0.00 us6622013 RELEASE
  0.00   0.00 us   0.00 us   0.00 us5019505  RELEASEDIR
  0.00  35.00 us  35.00 us  35.00 us  1SETXATTR
  0.00  67.00 us  67.00 us  67.00 us  1 

Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

2015-06-02 Thread Geoffrey Letessier
Hi Pranith,

I’m sorry but I cannot give you any comparison, because the comparison would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).

Concerning your request, in the attachments you can find all the expected results, hoping it can help you to solve this serious performance issue (maybe I need to play with glusterfs parameters?).

Thank you very much in advance,
Geoffrey
--
Geoffrey Letessier
Responsable informatique  ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr

On 2 June 2015 at 10:09, Pranith Kumar Karampuri pkara...@redhat.com wrote:
  

  
  
hi Geoffrey,
     Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc you need.
3) enable gluster volume profile using "gluster volume profile <volname> start"
4) run the work load
5) give output of "gluster volume profile <volname> info"

Repeat the steps above on new and old version you are comparing this with. That
should give us insight into what could be causing the slowness.

Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:

Dear all,

I have a crash test cluster where I’ve tested the new version of GlusterFS (v3.7)
before upgrading my HPC cluster in production.
But… all my tests show very, very low performance.

For my benches, as you can read below, I do some actions (untar, du, find, tar, rm)
with linux kernel sources, dropping cache, each on distributed, replicated,
distributed-replicated, single (single brick) volumes and the native FS of one brick.

# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)

And here are the process times:

---
| |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
---
| single  |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
---
| replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
---
| distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
---
| dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
---
| native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
---

I get the same results, whether with default configurations or with custom configurations.

If I look at the output of the ifstat command, I can see that my I/O write throughput never exceeds 3 MB/s...

The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than the XFS one.

My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no-RAID) and Gb ethernet)

My volume settings:
single: 1 server 1 brick
replicated: 2 servers 1 brick each
distributed: 2 servers 2 bricks each
dist-repl: 2 bricks in the same server and replica 2

Everything seems to be OK in the gluster status command output.

Do you have an idea why I obtain such bad results?
Thanks in advance.
Geoffrey
---
Geoffrey Letessier

Re: [Gluster-users] gfapi access not working with 3.7.0

2015-06-02 Thread Pranith Kumar Karampuri



On 05/31/2015 01:02 AM, Alessandro De Salvo wrote:

Thanks again Pranith!
Unfortunately the fixes missed the window for 3.7.1. These fixes will be 
available in the next release.


Pranith


Alessandro


 On 30 May 2015, at 03:16, Pranith Kumar Karampuri 
 pkara...@redhat.com wrote:

Alessandro,
  Same issue as the bug you talked about in gluster volume heal info 
thread. http://review.gluster.org/11002 should address this (Not the same fix you patched 
for glfsheal). I will backport this one to 3.7.1 as well.

Pranith
On 05/30/2015 12:23 AM, Alessandro De Salvo wrote:

Hi,
I'm trying to access a volume using gfapi and gluster 3.7.0. This was
working with 3.6.3, but not working anymore after the upgrade.
The volume has snapshots enabled, and it's configured in the following
way:

# gluster volume info adsnet-vm-01
  Volume Name: adsnet-vm-01
Type: Replicate
Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
Options Reconfigured:
server.allow-insecure: on
features.file-snapshot: on
features.barrier: disable
nfs.disable: true

Also, my /etc/glusterfs/glusterd.vol has the needed option:

# cat /etc/glusterfs/glusterd.vol
# This file is managed by puppet, do not change
volume management
 type mgmt/glusterd
 option working-directory /var/lib/glusterd
 option transport-type socket,rdma
 option transport.socket.keepalive-time 10
 option transport.socket.keepalive-interval 2
 option transport.socket.read-fail-log off
 option ping-timeout 30
 option rpc-auth-allow-insecure on
#   option base-port 49152
end-volume

However, when I try for example to access an image via qemu-img it
segfaults:

# qemu-img info
gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[2015-05-29 18:39:41.436951] E [MSGID: 108006]
[afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All
subvolumes are down. Going offline until atleast one of them comes back
up.
[2015-05-29 18:39:41.438234] E [rpc-transport.c:512:rpc_transport_unref]
(-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16]
(-- /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3]
(-- /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec]
(-- /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791]
(-- /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] )
0-rpc_transport: invalid argument: this
[2015-05-29 18:39:41.438484] E [rpc-transport.c:512:rpc_transport_unref]
(-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16]
(-- /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3]
(-- /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec]
(-- /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791]
(-- /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] )
0-rpc_transport: invalid argument: this
Segmentation fault (core dumped)

The volume is fine:

# gluster volume status adsnet-vm-01
Status of volume: adsnet-vm-01
Gluster process TCP Port  RDMA Port  Online
Pid
--
Brick gwads02.sta.adsnet.it:/gluster/vm01/d
ata 49159 0  Y
27878
Brick gwads03.sta.adsnet.it:/gluster/vm01/d
ata 49159 0  Y
24638
Self-heal Daemon on localhost   N/A   N/AY
28031
Self-heal Daemon on gwads03.sta.adsnet.it   N/A   N/AY
24667
  Task Status of Volume adsnet-vm-01
--
There are no active volume tasks


Running with the debugger I see the following:

(gdb) r
Starting program: /usr/bin/qemu-img info
gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
[Thread debugging using libthread_db enabled]
Using host libthread_db library /lib64/libthread_db.so.1.
[New Thread 0x7176a700 (LWP 30027)]
[New Thread 0x70f69700 (LWP 30028)]
[New Thread 0x7fffe99ab700 (LWP 30029)]
[New Thread 0x7fffe8fa7700 (LWP 30030)]
[New Thread 0x7fffe3fff700 (LWP 30031)]
[New Thread 0x7fffdbfff700 (LWP 30032)]
[New Thread 0x7fffdb2dd700 (LWP 30033)]
[2015-05-29 18:51:25.656014] E [MSGID: 108006]
[afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All
subvolumes are down. Going offline until atleast one of them comes back
up.
[2015-05-29 18:51:25.657338] E [rpc-transport.c:512:rpc_transport_unref]
(-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x748bcf16]
(-- /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x773775a3]
(-- /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7737a8ec]
(-- /lib64/libglusterfs.so.0(+0x21791)[0x748b9791]
(-- /lib64/libglusterfs.so.0(+0x21725)[0x748b9725] )
0-rpc_transport: invalid argument: this
[2015-05-29 18:51:25.657619] E 

[Gluster-users] Minutes from todays Gluster Community Bug Triage meeting

2015-06-02 Thread Niels de Vos
On Tue, Jun 02, 2015 at 12:51:37PM +0200, Niels de Vos wrote:
 Hi all,
 
 This meeting is scheduled for anyone that is interested in learning more
 about, or assisting with the Bug Triage.
 
 Meeting details:
 - location: #gluster-meeting on Freenode IRC
 ( https://webchat.freenode.net/?channels=gluster-meeting )
 - date: every Tuesday
 - time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
 - agenda: https://public.pad.fsfe.org/p/gluster-bug-triage
 
 Currently the following items are listed:
 * Roll Call
 * Status of last weeks action items
 * Group Triage
 * Open Floor
 
 The last two topics have space for additions. If you have a suitable bug
 or topic to discuss, please add it to the agenda.
 
 Appreciate your participation.


Minutes: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.html
Minutes (text): 
http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.txt
Log: 
http://meetbot.fedoraproject.org/gluster-meeting/2015-06-02/gluster-meeting.2015-06-02-12.06.log.html


  Meeting summary

1.   a. Agenda: https://public.pad.fsfe.org/p/gluster-bug-triage (ndevos, 
12:06:40)
2. Roll Call (ndevos, 12:07:01)
3. Action Items from last week (ndevos, 12:08:55)
4. ndevos needs to look into building nightly debug rpms that can be used 
for testing (ndevos, 12:09:35)
5. Group Triage (ndevos, 12:11:13)
 a. 0 bugs are waiting on feedback from b...@gluster.org (ndevos, 
12:11:52)
 b. 20 new bugs that have not been (completely) triaged yet: 
http://goo.gl/WuDQun (ndevos, 12:12:43)
6. Open Floor (ndevos, 12:46:08)

   Meeting ended at 12:47:35 UTC (full logs).

  Action items

1. (none)

  People present (lines said)

1. ndevos (43)
2. soumya (15)
3. rjoseph (3)
4. zodbot (2)

   Generated by MeetBot 0.1.4.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem

2015-06-02 Thread Soumya Koduri



On 06/02/2015 04:38 PM, Anoop C S wrote:



On 06/02/2015 01:42 PM, 莊尚豪 wrote:

Hi all,

I have two questions about glusterfs-3.7 on Fedora 22.

I used to have a glusterfs cluster version 3.6.2.

The following configuration worked in version 3.6.2, but does not work
in version 3.7.



There are 2 nodes for glusterfs.

OS: fedora 22

Gluster: 3.7 on
https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/



#gluster peer probe n1

#gluster volume create ganesha n1:/data/brick1/gv0
n2:/data/brick1/gv0



Volume Name: ganesha

Type: Distribute

Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8

Status: Created

Number of Bricks: 2

Transport-type: tcp

Bricks:

Brick1: n1:/data/brick1/gv0

Brick2: n2:/data/brick1/gv0

Options Reconfigured:

performance.readdir-ahead: on



The problem appears when starting the volume ganesha:

#gluster volume start ganesha



volume start: ganesha: failed: Commit failed on localhost. Please
check the log file for more details.



LOG in /var/log/glusterfs/bricks/data-brick1-gv0.log



[2015-06-02 08:02:55.232923] I [MSGID: 100030]
[glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running
/usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s
n2 --volfile-id ganesha.n2.data-brick1-gv0 -p
/var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S
/var/run/gluster/73ea8a39514304f5ebd440321d784386.socket
--brick-name /data/brick1/gv0 -l
/var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option
*-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd
--brick-port 49157 --xlator-option
ganesha-server.listen-port=49157)

[2015-06-02 08:02:55.284206] I
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started
thread with index 1

[2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload]
0-xlator: /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so:
undefined symbol: changelog_select_event



This particular error for the undefined symbol changelog_select_event
was identified recently and the corresponding fix [
http://review.gluster.org/#/c/11004/ ] is already in master and will
hopefully be available with v3.7.1.


[2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser:
Volume 'ganesha-changelog', line 30: type 'features/changelog' is
not valid or not found on this machine

[2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser:
type not specified for volume ganesha-changelog

[2015-06-02 08:02:55.398214] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct
the graph

[2015-06-02 08:02:55.398423] W [glusterfsd.c:1219:cleanup_and_exit]
(-- 0-: received signum (0), shutting down



I could not find a way to resolve this by searching online.

Has anyone come across this problem?



Another question is about the nfs-ganesha feature (version 2.2).

I cannot turn this feature on with the gluster volume command.

I tried to follow the glusterfs-ganesha demo video, but it does not work.

Demo link:
https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o



[root@n1 brick1]# gluster nfs-ganesha enable

Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the
trusted pool. Do you still want to continue? (y/n) y

nfs-ganesha: failed: Commit failed on localhost. Please check the
log file for more details.




As you may have seen in the demo video, there are many pre-requisites to 
be followed before enabling nfs-ganesha. Can you please re-check that you 
have taken care of all those steps? Also look at the logs 
'/var/log/ganesha.log' and '/var/log/messages' for any specific errors 
logged.
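
From memory (please double-check against the 3.7 documentation, as details may 
differ), the usual pre-requisites look roughly like this:

# nfs-ganesha, its gluster FSAL and the gluster HA scripts on all nodes
dnf install nfs-ganesha nfs-ganesha-gluster glusterfs-ganesha
# the shared storage volume used by the HA scripts
gluster volume set all cluster.enable-shared-storage enable
# /etc/ganesha/ganesha-ha.conf must list the HA cluster name, the participating
# nodes and one virtual IP per node, and pacemaker/corosync must be running
# before "gluster nfs-ganesha enable" is issued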


Thanks,
Soumya


Adding ganesha folks to the thread.



Does anyone have a detailed configuration?

Thanks for any advice.



Regards,

Ben






___ Gluster-users
mailing list Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gfapi access not working with 3.7.0

2015-06-02 Thread Alessandro De Salvo
OK, Thanks Pranith.
Do you have a timeline for that?
Cheers,

Alessandro


  On 2 June 2015, at 15:12, Pranith Kumar Karampuri 
  pkara...@redhat.com wrote:
 
 
 
 On 05/31/2015 01:02 AM, Alessandro De Salvo wrote:
 Thanks again Pranith!
 Unfortunately the fixes missed the window for 3.7.1. These fixes will be 
 available in the next release.
 
 Pranith
 
  Alessandro
 
 On 30 May 2015, at 03:16, Pranith Kumar Karampuri 
 pkara...@redhat.com wrote:
 
 Alessandro,
  Same issue as the bug you talked about in gluster volume heal info 
 thread. http://review.gluster.org/11002 should address this (Not the same 
 fix you patched for glfsheal). I will backport this one to 3.7.1 as well.
 
 Pranith
 On 05/30/2015 12:23 AM, Alessandro De Salvo wrote:
 Hi,
 I'm trying to access a volume using gfapi and gluster 3.7.0. This was
 working with 3.6.3, but not working anymore after the upgrade.
 The volume has snapshots enabled, and it's configured in the following
 way:
 
 # gluster volume info adsnet-vm-01
  Volume Name: adsnet-vm-01
 Type: Replicate
 Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
 Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
 Options Reconfigured:
 server.allow-insecure: on
 features.file-snapshot: on
 features.barrier: disable
 nfs.disable: true
 
 Also, my /etc/glusterfs/glusterd.vol has the needed option:
 
 # cat /etc/glusterfs/glusterd.vol
 # This file is managed by puppet, do not change
 volume management
 type mgmt/glusterd
 option working-directory /var/lib/glusterd
 option transport-type socket,rdma
 option transport.socket.keepalive-time 10
 option transport.socket.keepalive-interval 2
 option transport.socket.read-fail-log off
 option ping-timeout 30
 option rpc-auth-allow-insecure on
 #   option base-port 49152
 end-volume
 
 However, when I try for example to access an image via qemu-img it
 segfaults:
 
 # qemu-img info
 gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
 [2015-05-29 18:39:41.436951] E [MSGID: 108006]
 [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All
 subvolumes are down. Going offline until atleast one of them comes back
 up.
 [2015-05-29 18:39:41.438234] E [rpc-transport.c:512:rpc_transport_unref]
 (-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16]
 (-- /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3]
 (-- /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec]
 (-- /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791]
 (-- /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] )
 0-rpc_transport: invalid argument: this
 [2015-05-29 18:39:41.438484] E [rpc-transport.c:512:rpc_transport_unref]
 (-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7fc3851caf16]
 (-- /lib64/libgfrpc.so.0(rpc_transport_unref+0xa3)[0x7fc387c855a3]
 (-- /lib64/libgfrpc.so.0(rpc_clnt_unref+0x5c)[0x7fc387c888ec]
 (-- /lib64/libglusterfs.so.0(+0x21791)[0x7fc3851c7791]
 (-- /lib64/libglusterfs.so.0(+0x21725)[0x7fc3851c7725] )
 0-rpc_transport: invalid argument: this
 Segmentation fault (core dumped)
 
 The volume is fine:
 
 # gluster volume status adsnet-vm-01
 Status of volume: adsnet-vm-01
 Gluster process TCP Port  RDMA Port  Online
 Pid
 --
 Brick gwads02.sta.adsnet.it:/gluster/vm01/d
 ata 49159 0  Y
 27878
 Brick gwads03.sta.adsnet.it:/gluster/vm01/d
 ata 49159 0  Y
 24638
 Self-heal Daemon on localhost   N/A   N/AY
 28031
 Self-heal Daemon on gwads03.sta.adsnet.it   N/A   N/AY
 24667
  Task Status of Volume adsnet-vm-01
 --
 There are no active volume tasks
 
 
 Running with the debugger I see the following:
 
 (gdb) r
 Starting program: /usr/bin/qemu-img info
 gluster://gwads03.sta.adsnet.it/adsnet-vm-01/images/foreman7.vm.adsnet.it.qcow2
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library /lib64/libthread_db.so.1.
 [New Thread 0x7176a700 (LWP 30027)]
 [New Thread 0x70f69700 (LWP 30028)]
 [New Thread 0x7fffe99ab700 (LWP 30029)]
 [New Thread 0x7fffe8fa7700 (LWP 30030)]
 [New Thread 0x7fffe3fff700 (LWP 30031)]
 [New Thread 0x7fffdbfff700 (LWP 30032)]
 [New Thread 0x7fffdb2dd700 (LWP 30033)]
 [2015-05-29 18:51:25.656014] E [MSGID: 108006]
 [afr-common.c:3919:afr_notify] 0-adsnet-vm-01-replicate-0: All
 subvolumes are down. Going offline until atleast one of them comes back
 up.
 [2015-05-29 18:51:25.657338] E [rpc-transport.c:512:rpc_transport_unref]
 (-- /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x748bcf16]
 (-- 

Re: [Gluster-users] gfapi access not working with 3.7.0

2015-06-02 Thread Pranith Kumar Karampuri



On 06/02/2015 06:52 PM, Alessandro De Salvo wrote:

OK, Thanks Pranith.
Do you have a timeline for that?
It will be discussed in tomorrow's weekly community developers meeting. 
Then we may have some estimate.


Pranith


Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem

2015-06-02 Thread Anoop C S

Can you please attach the glusterd logs here? You are having trouble
even starting the volume here, right?
Also, HA configuration is mandatory to use NFS-Ganesha in this
release. Once you have the volume
started, I can help you with the remaining steps in detail.

Thanks
Meghana
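For reference, the HA setup that gluster nfs-ganesha enable expects in 3.7 is driven by /etc/ganesha/ganesha-ha.conf on every node, on top of pacemaker/corosync/pcs. A rough sketch, with node names and VIPs as placeholders; the exact format and prerequisites are in the 3.7 NFS-Ganesha documentation:

# /etc/ganesha/ganesha-ha.conf (same file on every node; all values below are placeholders)
HA_NAME="ganesha-ha-demo"
HA_VOL_SERVER="n1"
HA_CLUSTER_NODES="n1,n2"
VIP_n1="192.168.1.101"
VIP_n2="192.168.1.102"

# then, with pacemaker/corosync/pcs installed and the volume started:
gluster nfs-ganesha enable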

- Original Message -
From: Anoop C S achir...@redhat.com
To: gluster-users@gluster.org
Cc: Meghana Madhusudhan mmadh...@redhat.com, Soumya Koduri
skod...@redhat.com
Sent: Tuesday, June 2, 2015 4:38:39 PM
Subject: Re: [Gluster-users] gluster-3.7 cannot start volume  ganesha
feature cannot turn on problem


On 06/02/2015 01:42 PM, 莊尚豪 wrote:
 Hi all,
 
 I have two questions about glusterfs-3.7 on Fedora 22
 
 I used to have a glusterfs cluster version 3.6.2.
 
 The following configuration worked in version 3.6.2, but does not work 
 in version 3.7
 
 
 
 There are 2 nodes for glusterfs.
 
 OS: fedora 22
 
 Gluster: 3.7 on 
 https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/
 
 
 
 #gluster peer probe n1
 
 #gluster volume create ganesha n1:/data/brick1/gv0 
 n2:/data/brick1/gv0
 
 
 
 Volume Name: ganesha
 
 Type: Distribute
 
 Volume ID: cbb8d360-0025-419c-a12b-b29e4b91d7f8
 
 Status: Created
 
 Number of Bricks: 2
 
 Transport-type: tcp
 
 Bricks:
 
 Brick1: n1:/data/brick1/gv0
 
 Brick2: n2:/data/brick1/gv0
 
 Options Reconfigured:
 
 performance.readdir-ahead: on
 
 
 
 The problem appears when starting the volume ganesha:
 
 #gluster volume start ganesha
 
 
 
 volume start: ganesha: failed: Commit failed on localhost. Please 
 check the log file for more details.
 
 
 
 LOG in /var/log/glusterfs/bricks/data-brick1-gv0.log
 
 
 
 [2015-06-02 08:02:55.232923] I [MSGID: 100030] 
 [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfsd: Started running 
 /usr/sbin/glusterfsd version 3.7.0 (args: /usr/sbin/glusterfsd -s 
 n2 --volfile-id ganesha.n2.data-brick1-gv0 -p 
 /var/lib/glusterd/vols/ganesha/run/n2-data-brick1-gv0.pid -S 
 /var/run/gluster/73ea8a39514304f5ebd440321d784386.socket 
 --brick-name /data/brick1/gv0 -l 
 /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option 
 *-posix.glusterd-uuid=35547067-d343-4fee-802a-0e911b5a07cd 
 --brick-port 49157 --xlator-option 
 ganesha-server.listen-port=49157)
 
 [2015-06-02 08:02:55.284206] I 
 [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started 
 thread with index 1
 
 [2015-06-02 08:02:55.397923] W [xlator.c:192:xlator_dynload] 
 0-xlator: /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so: 
 undefined symbol: changelog_select_event
 

This particular error for undefined symbol changelog_select_event
was identified recently and corresponding fix [
http://review.gluster.org/#/c/11004/ ] is already in master and
hopefully will be available with v3.7.1.
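To confirm that the installed changelog translator is the affected build, the unresolved symbol can be checked directly on the node (the path is the one from the log above):

# report unresolved symbols in the changelog translator
ldd -r /usr/lib64/glusterfs/3.7.0/xlator/features/changelog.so 2>&1 | grep -i undefined

# and make sure every glusterfs package on the node is the same 3.7.0 build
rpm -qa | grep -i glusterfs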

 [2015-06-02 08:02:55.397963] E [graph.y:212:volume_type] 0-parser: 
 Volume 'ganesha-changelog', line 30: type 'features/changelog' is 
 not valid or not found on this machine
 
 [2015-06-02 08:02:55.397992] E [graph.y:321:volume_end] 0-parser: 
 type not specified for volume ganesha-changelog
 
 [2015-06-02 08:02:55.398214] E [MSGID: 100026] 
 [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to
 construct the graph
 
 [2015-06-02 08:02:55.398423] W
 [glusterfsd.c:1219:cleanup_and_exit] (-- 0-: received signum (0),
 shutting down
 
 
 
 I cannot find a way to resolve it by googling.
 
 Has anyone come across this problem?
 
 
 
 Another question is about the NFS-Ganesha (version 2.2) feature.
 
 I cannot turn this feature on with the volume command.
 
 I tried to follow the glusterfs-ganesha demo video, but it does not work.
 
 Demo link: 
 https://plus.google.com/events/c9omal6366f2cfkcd0iuee5ta1o
 
 
 
 [root@n1 brick1]# gluster nfs-ganesha enable
 
 Enabling NFS-Ganesha requires Gluster-NFS to be disabled across
 the trusted pool. Do you still want to continue? (y/n) y
 
 nfs-ganesha: failed: Commit failed on localhost. Please check the 
 log file for more details.
 
 

Adding ganesha folks to the thread.

 
 Does anyone have the detailed configuration?
 
 Thanks for giving advice.
 
 
 
 Regards,
 
 Ben
 
 
 
 
 
 
 ___ Gluster-users 
 mailing list Gluster-users@gluster.org 
 http://www.gluster.org/mailman/listinfo/gluster-users
 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gluster-3.7 cannot start volume ganesha feature cannot turn on problem

2015-06-02 Thread Anoop C S



 Forwarded Message 
Subject: Re: [Gluster-users] gluster-3.7 cannot start volume  ganesha
feature cannot turn on problem
Date: Tue, 2 Jun 2015 09:01:38 -0400 (EDT)
From: Meghana Madhusudhan mmadh...@redhat.com
To: Anoop C S achir...@redhat.com
CC: gluster-users@gluster.org, Soumya Koduri skod...@redhat.com

Hi Anoop,
Can you add the ID of the person who asked this question and forward the
same?


Can you please attach the glusterd logs here? You are having trouble
even starting the volume here, right?
Also, HA configuration is mandatory to use NFS-Ganesha in this
release. Once you have the volume
started, I can help you with the remaining steps in detail.

Thanks
Meghana


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync

2015-06-02 Thread PEPONNET, Cyril N (Cyril)
Sure,

https://dl.dropboxusercontent.com/u/2663552/logs.tgz

Yesterday I restarted the geo-rep (and reset the changelog.changelog option). 
Today it looks converged and changelog keeps doing its job.

BUT

hybrid crawl doesn’t seem to update symlinks if they changed on the master:

From master:

ll -n /usr/global/images/3.2/latest
lrwxrwxrwx 1 499 499 3 Jun  1 21:40 /usr/global/images/3.2/latest -> S22

On slave:

ls /usr/global/images/3.2/latest
lrwxrwxrwx 1 root root 2 May  9 07:01 /usr/global/images/3.2/latest -> S3

The point is I can’t get the gfid from the symlink because it resolves to the 
target folder.

And by the way, all data synced in hybrid crawl is root.root on the slave (it 
should keep the owner from the master, as that owner also exists on the slave).

So.

1/ I will need to remove symlinks from the slave and retrigger a hybrid crawl 
(again)
2/ I will need to update permissions on the slave according to permissions on the 
master (will be long and difficult); a quick way to see what differs is sketched below
3/ Or I missed something here.
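One way to see how far the slave has drifted before deciding between 1/ and 2/ is to compare symlink targets and ownership directly over two FUSE mounts; /mnt/master and /mnt/slave below are placeholder mount points of the master and slave volumes:

# list symlinks and their targets on each side, then diff the two lists
find /mnt/master -type l -printf '%p -> %l\n' | sort > /tmp/master-links.txt
find /mnt/slave -type l -printf '%p -> %l\n' | sed 's|^/mnt/slave|/mnt/master|' | sort > /tmp/slave-links.txt
diff /tmp/master-links.txt /tmp/slave-links.txt

# spot the root:root entries left behind by the hybrid crawl
find /mnt/slave -printf '%u:%g %p\n' | awk '$1 == "root:root"' | head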

Thanks!


--
Cyril Peponnet

On Jun 1, 2015, at 10:20 PM, Kotresh Hiremath Ravishankar 
khire...@redhat.com wrote:

Hi Cyril,

Could you please attach the geo-replication logs?

Thanks and Regards,
Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users gluster-users@gluster.org
Sent: Monday, June 1, 2015 10:34:42 PM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present 
- Falling back to xsync

Some news,

Looks like changelog is not working anymore. When I touch a file on the master it
doesn't propagate to the slave…

The .processing folder contains about a thousand unprocessed changelogs.

I had to stop the geo-rep, reset changelog.changelog on the volume and
restart the geo-rep. It's now sending missing files using hybrid crawl.

So geo-rep is not working as expected.

Another thing: we use symlinks to point to the latest release build, and it seems
that symlinks are not synced when they change from master to slave.

Any idea on how I can debug this ?
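A few things worth checking when changelog-based sync stalls like this; MASTERVOL, slavehost and slavevol are placeholders for the actual session, and the brick path is whatever backs the master volume:

# is the session really using changelog, or has it fallen back to xsync?
gluster volume geo-replication MASTERVOL slavehost::slavevol config change_detector
gluster volume geo-replication MASTERVOL slavehost::slavevol status detail

# is the changelog option still enabled on the master volume?
gluster volume info MASTERVOL | grep -i changelog

# are changelogs still being rolled over on the bricks? (default on-brick location)
ls -lt /path/to/brick/.glusterfs/changelogs/ | head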

--
Cyril Peponnet

On May 29, 2015, at 3:01 AM, Kotresh Hiremath Ravishankar
khire...@redhat.com 
wrote:

Yes, geo-rep internally uses fuse mount.
I will explore further and get back to you
if there is a way.

Thanks and Regards,
Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril)
cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users gluster-users@gluster.org
Sent: Thursday, May 28, 2015 10:12:57 PM
Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not
present - Falling back to xsync

One more thing:

nfs.volume-access read-only works only for NFS clients; glusterfs clients still
have write access.

features.read-only on needs a volume restart and sets RO for everyone, but in this
case geo-rep goes faulty.

[2015-05-28 09:42:27.917897] E [repce(/export/raid/usr_global):188:__call__]
RepceClient: call 8739:139858642609920:1432831347.73 (keep_alive) failed on
peer with OSError
[2015-05-28 09:42:27.918102] E
[syncdutils(/export/raid/usr_global):240:log_raise_exception] top: FAIL:
Traceback (most recent call last):
File /usr/libexec/glusterfs/python/syncdaemon/syncdutils.py, line 266, in
twrap
  tf(*aa)
File /usr/libexec/glusterfs/python/syncdaemon/master.py, line 391, in
keep_alive
  cls.slave.server.keep_alive(vi)
File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 204, in
__call__
  return self.ins(self.meth, *a)
File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 189, in
__call__
  raise res
OSError: [Errno 30] Read-

So there is no proper way to protect the slave against writes.

--
Cyril Peponnet

On May 28, 2015, at 8:54 AM, Cyril Peponnet
cyril.pepon...@alcatel-lucent.com
wrote:

Hi Kotresh,

Inline.

Again, thanks for your time.

--
Cyril Peponnet

On May 27, 2015, at 10:47 PM, Kotresh Hiremath Ravishankar
khire...@redhat.com
wrote:

Hi Cyril,

Replies inline.

Thanks and Regards,
Kotresh H R

- Original Message -
From: Cyril N PEPONNET (Cyril)
cyril.pepon...@alcatel-lucent.com
To: Kotresh Hiremath Ravishankar khire...@redhat.com
Cc: gluster-users

Re: [Gluster-users] Re: Re: Gluster peer rejected and failed to start

2015-06-02 Thread Atin Mukherjee


On 06/02/2015 12:04 PM, vyyy杨雨阳 wrote:
 Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3
 Glusterfs11~glusterfs14 are new nodes that need to join the cluster
 
 On glusterfs09:
 
 [root@SH02SVR5952 ~]# gluster peer status
 Number of Peers: 6
 
 Hostname: glusterfs06.sh2.ctripcorp.com
 Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
 State: Peer in Cluster (Connected)
 
 Hostname: glusterfs07.sh2.ctripcorp.com
 Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
 State: Peer in Cluster (Connected)
 
 Hostname: glusterfs08.sh2.ctripcorp.com
 Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
 State: Peer in Cluster (Connected)
 
 Hostname: 10.8.230.209
 Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
 State: Peer in Cluster (Connected)
 
 Hostname: glusterfs10.sh2.ctripcorp.com
 Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
 State: Peer in Cluster (Disconnected)
 
 Hostname: glusterfs11.sh2.ctripcorp.com
 Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
 State: Peer Rejected (Connected)
 [root@SH02SVR5952 ~]#
Can you attach glusterd log files for 10  11?
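For reference, with stock packaging the pieces usually needed for this kind of peer-state debugging live in the default locations below:

# glusterd log on each node
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

# the local node's UUID, and the peers this node currently knows about
cat /var/lib/glusterd/glusterd.info
ls /var/lib/glusterd/peers/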
 

Re: [Gluster-users] Re: Gluster peer rejected and failed to start

2015-06-02 Thread Atin Mukherjee


On 06/02/2015 11:33 AM, vyyy杨雨阳 wrote:
 Actually I have 2 problems
 1. New nodes can't be added to the cluster
   I cleaned /var/lib/glusterd, now the status is 
   State: Accepted peer request (Connected)
 
 2. One of the clustered nodes shows 'Peer Rejected' and glusterd fails to start.
   The log is attached in the previous mail.
   This is a production cluster, so this problem is more urgent.
From the log I can clearly see the problematic node is
glusterfs09.sh2.ctripcorp.com. Your existing cluster configuration has
bricks hosted on glusterfs09.sh2.ctripcorp.com; however, the same is not
part of the cluster. Could you paste the output of gluster peer status
and gluster volume status?
   
 
 

[Gluster-users] Re: Gluster peer rejected and failed to start

2015-06-02 Thread vyyy杨雨阳
Actually I have 2 problems
1. New nodes can't be added to the cluster
I cleaned /var/lib/glusterd, now the status is 
State: Accepted peer request (Connected)

2. One of the clustered nodes shows 'Peer Rejected' and glusterd fails to start.
The log is attached in the previous mail.
This is a production cluster, so this problem is more urgent.
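For the rejected node that is already part of the cluster, the commonly used recovery is sketched below, under the assumption that /var/lib/glusterd/glusterd.info (the node's UUID) is still intact; back everything up first, and glusterfs05 stands in for any healthy peer:

# on the rejected node
systemctl stop glusterd          # or: service glusterd stop
cp -a /var/lib/glusterd /root/glusterd.bak
# keep glusterd.info, remove the rest of the local state
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd

# re-learn the cluster configuration from a healthy peer, then restart once more
gluster peer probe glusterfs05.sh2.ctripcorp.com
systemctl restart glusterd
gluster peer status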



Best Regards 
Yuyang Yang

-----Original Message-----
From: Atin Mukherjee [mailto:amukh...@redhat.com] 
Sent: Tuesday, June 02, 2015 12:52 PM
To: vyyy杨雨阳; Gluster-users@gluster.org
Subject: Re: [Gluster-users] Gluster peer rejected and failed to start



On 06/02/2015 10:00 AM, vyyy杨雨阳 wrote:
 Hi
 
 We have a gluster (version 3.6.3) cluster with 6 nodes. I tried to add 4 more 
 nodes, but they show ‘Peer Rejected’. I tried to resolve it by dumping 
 /var/lib/glusterd and probing again, without success. That is one question, but the 
 strange thing is:
 
 A node already in the cluster also shows “Peer Rejected”
 
 I tried to restart glusterd, and it failed
 
 I found that /var/lib/glusterd/peers is empty; I copied the files from the other 
 nodes, but still can’t start glusterd
It seems like you are trying to peer probe nodes which are either
part of some other clusters (uncleaned nodes). Could you check whether
the nodes which you are adding have an empty /var/lib/glusterd? If not,
clean them and retry.

~Atin
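A sketch of that clean-up on a brand-new node (this wipes any local gluster state, so run it only on nodes that do not yet host bricks; glusterfs11 is used as the example new node and the service commands depend on your init system):

# on the new node only
systemctl stop glusterd      # or: service glusterd stop
rm -rf /var/lib/glusterd/*
systemctl start glusterd

# then, from a node that is already a healthy member of the cluster
gluster peer probe glusterfs11.sh2.ctripcorp.com
gluster peer status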
 
 
 
 etc-glusterfs-glusterd.vol.log shows the cluster members as “unknown peer”
 
 [2015-06-02 01:52:14.650635] C 
 [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend 
 update request from unknown peer 04f22ee8-8e00-4c32-a924-b40a0e413aa6
 [2015-06-02 01:52:14.650786] C 
 [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend 
 update request from unknown peer 674a78b5-0590-48d4-8752-d4608832ed1d
 [2015-06-02 01:52:14.657881] C 
 [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend 
 update request from unknown peer 83e1a9db-3134-45e4-acd2-387b12b5b207
 [2015-06-02 01:52:17.747865] W 
 [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 
 04f22ee8-8e00-4c32-a924-b40a0e413aa6 doesn't belong to the cluster. Ignoring 
 request.
 [2015-06-02 01:52:17.747908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 
 0-rpcsvc: rpc actor failed to complete successfully
 [2015-06-02 01:52:40.338885] W 
 [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 
 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring 
 request.
 [2015-06-02 01:52:40.338929] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 
 0-rpcsvc: rpc actor failed to complete successfully
 [2015-06-02 01:52:41.310451] W 
 [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 
 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring 
 request.
 [2015-06-02 01:52:41.310486] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 
 0-rpcsvc: rpc actor failed to complete successfully
 
 
 
 Debug info is as following,
 
 
 
 /usr/sbin/glusterd
 [root@SH02SVR5951 peers]# /usr/sbin/glusterd --debug
 [2015-06-02 04:09:24.626690] I [MSGID: 100030] [glusterfsd.c:2018:main] 
 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.3 (args: 
 /usr/sbin/glusterd --debug)
 [2015-06-02 04:09:24.626739] D [logging.c:1763:__gf_log_inject_timer_event] 
 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
 [2015-06-02 04:09:24.627052] D [MSGID: 0] [glusterfsd.c:613:get_volfp] 
 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
 [2015-06-02 04:09:24.629683] I [glusterd.c:1214:init] 0-management: Maximum 
 allowed open file descriptors set to 65536
 [2015-06-02 04:09:24.629706] I [glusterd.c:1259:init] 0-management: Using 
 /var/lib/glusterd as working directory
 [2015-06-02 04:09:24.629764] D [glusterd.c:391:glusterd_rpcsvc_options_build] 
 0-: listen-backlog value: 128
 [2015-06-02 04:09:24.629895] D [rpcsvc.c:2198:rpcsvc_init] 0-rpc-service: RPC 
 service inited.
 [2015-06-02 04:09:24.629904] D [rpcsvc.c:1801:rpcsvc_program_register] 
 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port:  0
 [2015-06-02 04:09:24.629930] D [rpc-transport.c:262:rpc_transport_load] 
 0-rpc-transport: attempt to load file 
 /usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so
 [2015-06-02 04:09:24.631989] D [socket.c:3807:socket_init] 
 0-socket.management: SSL support on the I/O path is NOT enabled
 [2015-06-02 04:09:24.632005] D [socket.c:3810:socket_init] 
 0-socket.management: SSL support for glusterd is NOT enabled
 [2015-06-02 04:09:24.632013] D [socket.c:3827:socket_init] 
 0-socket.management: using system polling thread
 [2015-06-02 04:09:24.632024] D [name.c:550:server_fill_address_family] 
 0-socket.management: option address-family not specified, defaulting to inet
 [2015-06-02 04:09:24.632072] D [rpc-transport.c:262:rpc_transport_load] 
 0-rpc-transport: attempt to load file 
 /usr/lib64/glusterfs/3.6.3/rpc-transport/rdma.so
 [2015-06-02 04:09:24.632102] 

[Gluster-users] Re: Re: Gluster peer rejected and failed to start

2015-06-02 Thread vyyy杨雨阳
Glusterfs05~glusterfs10 have been clustered for 2 years and were recently upgraded to 3.6.3
Glusterfs11~glusterfs14 are new nodes that need to join the cluster

On glusterfs09:

[root@SH02SVR5952 ~]# gluster peer status
Number of Peers: 6

Hostname: glusterfs06.sh2.ctripcorp.com
Uuid: 2cb15023-28b0-4d0d-8a43-b8c6e570776f
State: Peer in Cluster (Connected)

Hostname: glusterfs07.sh2.ctripcorp.com
Uuid: 5357c40d-7e34-41f0-a96b-9aa76e52ad23
State: Peer in Cluster (Connected)

Hostname: glusterfs08.sh2.ctripcorp.com
Uuid: 83e1a9db-3134-45e4-acd2-387b12b5b207
State: Peer in Cluster (Connected)

Hostname: 10.8.230.209
Uuid: 04f22ee8-8e00-4c32-a924-b40a0e413aa6
State: Peer in Cluster (Connected)

Hostname: glusterfs10.sh2.ctripcorp.com
Uuid: ea17d7f9-d737-4472-ab9a-feed3cfac57c
State: Peer in Cluster (Disconnected)

Hostname: glusterfs11.sh2.ctripcorp.com
Uuid: 2d703550-92b5-4f5e-af90-ff2fbf3366f0
State: Peer Rejected (Connected)
[root@SH02SVR5952 ~]#

[root@SH02SVR5952 ~]# gluster volume status
Status of volume: JQStore2
Gluster process PortOnline  Pid
--
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick   49152   Y   2782
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick   49152   Y   2744
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick   49152   Y   5307
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick   49152   Y   3986
NFS Server on localhost 2049Y   51697
Self-heal Daemon on localhost   N/A Y   51710
NFS Server on glusterfs07.sh2.ctripcorp.com 2049Y   110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com   N/A Y   110905
NFS Server on glusterfs06.sh2.ctripcorp.com 2049Y   22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com   N/A Y   22192
NFS Server on 10.8.230.209  2049Y   4091
Self-heal Daemon on 10.8.230.209N/A Y   4104
 
Task Status of Volume JQStore2
--
There are no active volume tasks
 
Status of volume: Webresource
Gluster process PortOnline  Pid
--
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick3  49155   Y   2787
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick3  49155   Y   2753
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick3  49155   Y   5313
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick3  49155   Y   3992
NFS Server on localhost 2049Y   51697
Self-heal Daemon on localhost   N/A Y   51710
NFS Server on 10.8.230.209  2049Y   4091
Self-heal Daemon on 10.8.230.209N/A Y   4104
NFS Server on glusterfs06.sh2.ctripcorp.com 2049Y   22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com   N/A Y   22192
NFS Server on glusterfs07.sh2.ctripcorp.com 2049Y   110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com   N/A Y   110905
 
Task Status of Volume Webresource
--
There are no active volume tasks
 
Status of volume: ccim
Gluster process PortOnline  Pid
--
Brick glusterfs05.sh2.ctripcorp.com:/export/sdb/brick2  49154   Y   2793
Brick glusterfs06.sh2.ctripcorp.com:/export/sdb/brick2  49154   Y   2745
Brick glusterfs07.sh2.ctripcorp.com:/export/sdb/brick2  49154   Y   5320
Brick glusterfs09.sh2.ctripcorp.com:/export/sdb/brick2  49154   Y   3999
NFS Server on localhost 2049Y   51697
Self-heal Daemon on localhost   N/A Y   51710
NFS Server on glusterfs06.sh2.ctripcorp.com 2049Y   22185
Self-heal Daemon on glusterfs06.sh2.ctripcorp.com   N/A Y   22192
NFS Server on glusterfs07.sh2.ctripcorp.com 2049Y   110894
Self-heal Daemon on glusterfs07.sh2.ctripcorp.com   N/A Y   110905
NFS Server on 10.8.230.209  2049Y   4091
Self-heal Daemon on 10.8.230.209N/A Y   4104
 
Task Status of Volume ccim
--
There are no active volume tasks
 
Status of volume: cloudimage
Gluster process PortOnline  Pid
--
Brick