[Gluster-users] Gluster 3.2.6 and InfiniBand

2012-06-07 Thread bxma...@gmail.com
Hello,

I have a problem with Gluster 3.2.6 and InfiniBand. With Gluster 3.3
it works fine, but with 3.2.6 I have the following problem:

when I try to mount an RDMA volume using the command
mount -t glusterfs 192.168.100.1:/atlas1.rdma mount
I get:

[2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main]
0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs
version 3.2.6
[2012-06-07 04:30:18.907499] E
[glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get
the 'volume file' from server
[2012-06-07 04:30:18.907592] E
[glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch
volume file (key:/atlas1.rdma)
[2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit]
(--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)
[0x7f784e2c8bc9] (--/usr/local/
lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975]
(--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b])))
0-: received signum (0)
, shutting down
[2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse:
Unmounting 'mount'.

The same command without the .rdma suffix works OK.

thanks

Matus
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.2.6 and InfiniBand

2012-06-07 Thread Amar Tumballi

On 06/07/2012 02:04 PM, bxma...@gmail.com wrote:

Hello,

i have a problem with gluster 3.2.6 and infiniband. With gluster 3.3
its working ok but with 3.2.6 i have following problems:

when i'm trying to mount rdma volume using command mount -t glusterfs
192.168.100.1:/atlas1.rdma mount  i get:

[2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main]
0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs
version 3.2.6
[2012-06-07 04:30:18.907499] E
[glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get
the 'volume file' from server
[2012-06-07 04:30:18.907592] E
[glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch
volume file (key:/atlas1.rdma)
[2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit]
(--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)
[0x7f784e2c8bc9] (--/usr/local/
lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975]
(--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b])))
0-: received signum (0)
, shutting down
[2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse:
Unmounting 'mount'.

Same command without .rdma works ok.



Is the volume's transport type only 'rdma', or 'tcp,rdma'? If it is only 
'rdma', then appending .rdma to the volume name is not required. Appending 
.rdma is only needed when both transport types are enabled on the volume 
(i.e. 'tcp,rdma'), because in that case the client can choose which 
transport it wants to use for the mount.


The plain volume name points to the 'tcp' transport, and appending .rdma 
points to the 'rdma' transport.
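
As a minimal sketch (the brick path /export/atlas1-brick is only an 
example), a volume created with both transports:

  gluster volume create atlas1 transport tcp,rdma \
      192.168.100.1:/export/atlas1-brick

can then be mounted either way from the client:

  mount -t glusterfs 192.168.100.1:/atlas1 /mnt/atlas1        # tcp
  mount -t glusterfs 192.168.100.1:/atlas1.rdma /mnt/atlas1   # rdma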


Hope that is clear now.

Regards,
Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.2.6 and InfiniBand

2012-06-07 Thread bxma...@gmail.com
Hello,

At first the volume was tcp only, then tcp,rdma.

You are right that without the tcp transport defined, .rdma does not work. But
now I have another problem.
Whether I mount with tcp or rdma, and even when I use tcp/rdma over the normal
network card (the plain 1 Gbit NIC, not the InfiniBand IP), I still get the
same speed: uploads of about 30 MB/s and downloads of about 200 MB/s, so I'm
not sure RDMA is even working.

Native InfiniBand gives me about 3500 MB/s in benchmark tests (ib_rdma_bw).
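
One way I can sanity-check this (a sketch; the mount point /mnt/atlas1 and the
file size are only examples) is to time a large sequential write and read
through the Gluster mount and set that against the raw ib_rdma_bw figure:

  # throughput through the Gluster mount
  dd if=/dev/zero of=/mnt/atlas1/ddtest bs=1M count=4096 conv=fdatasync
  dd if=/mnt/atlas1/ddtest of=/dev/null bs=1M

  # raw RDMA reference (no argument on the server,
  # then the server's IB address on the client)
  ib_rdma_bw
  ib_rdma_bw 192.168.100.1

If the dd numbers stay pinned at roughly GbE speeds regardless of transport,
the client is most likely not using the RDMA path at all.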

thanks

Matus

2012/6/7 Amar Tumballi ama...@redhat.com:
 On 06/07/2012 02:04 PM, bxma...@gmail.com wrote:

 Hello,

 i have a problem with gluster 3.2.6 and infiniband. With gluster 3.3
 its working ok but with 3.2.6 i have following problems:

 when i'm trying to mount rdma volume using command mount -t glusterfs
 192.168.100.1:/atlas1.rdma mount  i get:

 [2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main]
 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs
 version 3.2.6
 [2012-06-07 04:30:18.907499] E
 [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get
 the 'volume file' from server
 [2012-06-07 04:30:18.907592] E
 [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch
 volume file (key:/atlas1.rdma)
 [2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit]
 (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)
 [0x7f784e2c8bc9] (--/usr/local/
 lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975]
 (--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b])))
 0-: received signum (0)
 , shutting down
 [2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse:
 Unmounting 'mount'.

 Same command without .rdma works ok.


 Is the volume's transport type only 'rdma' ? or 'tcp,rdma' ? If its only
 'rdma', then appending .rdma to volume name is not required. The appending
 of .rdma is only required when there are both type of transports on a
 volume (ie, 'tcp,rdma'), as from the client you can decide which transport
 you want to mount.

 default volume name would point to 'tcp' transport type, and appending
 .rdma, will point to rdma transport type.

 Hope that is clear now.

 Regards,
 Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Fernando Frediani (Qube)
Hi,
Sorry, this reply won't be of any help with your problem, but I am too curious 
to understand how it can be even slower when mounting with the Gluster client, 
which I would expect to always be quicker than NFS or anything else.
If you find the reason, please report it back to the list and share it with us. 
I think this directory-listing issue has already been reported for systems with 
many files.

Regards,

Fernando

From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of olav johansen
Sent: 07 June 2012 03:32
To: gluster-users@gluster.org
Subject: [Gluster-users] Performance optimization tips Gluster 3.3? (small 
files / directory listings)

Hi,

I'm using Gluster 3.3.0-1.el6.x86_64 on two storage nodes in replicated mode 
(fs1, fs2).
Node specs: CentOS 6.2, Intel quad-core 2.8 GHz, 4 GB RAM, 3ware RAID, 2x500 GB 
SATA 7200 rpm (RAID1 for the OS), 6x1 TB SATA 7200 rpm (RAID10 for /data), 
1 Gbit network.

I've mounted the data partition on web1 (dual quad-core 2.8 GHz, 8 GB RAM) using 
glusterfs. (I also tried an NFS mount of the Gluster volume.)

We have 50 GB of files, ~800,000 files in 3 levels of directories (max 2000 
directories in one folder).

My main problem is the speed of directory listings: ls -alR on the gluster mount 
takes 23 minutes every time.

There doesn't seem to be any caching of directory listing information; with 
regular NFS (not Gluster) between web1 and fs1, this takes 6m13s the first time 
and 5m13s thereafter.

The Gluster mount is 4+ times slower for directory listing performance than pure 
NFS to a single server -- is this expected?
I understand there are a lot more calls involved in checking both nodes, but I'm 
just looking for a reality check on this.

Any suggestions for how I can speed this up?

Thanks,

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Brian Candler
On Thu, Jun 07, 2012 at 10:10:03AM +, Fernando Frediani (Qube) wrote:
Sorry this reply won’t be of any help to your problem, but I am too
curious to understand how it can be even slower if monting using
Gluster client which I would expect always be quicker than NFS or
anything else.

(1) Try it with ls -aR or find . instead of ls -alR

(2) Try it on a gluster non-replicated volume (for fair comparison with
direct NFS access)

With a replicated volume, many accesses involve sending queries to *both*
servers to check they are in sync - even read accesses.  This in turn can
cause disk seeks on both machines, so the latency you'll get is the larger
of the two.  If you are doing lots of accesses sequentially then the
latencies will all add up.

A stat() is one of those accesses which touches both machines, and ls -l
forces a stat() of each file found.

In fact, a quick test suggests ls -l does stat, lstat, getxattr and
lgetxattr:

$ strace ls -laR . >/dev/null 2>ert; cut -f1 -d'(' <ert | sort | uniq -c
 13 access
  1 arch_prctl
  5 brk
395 close
  4 connect
  1 execve
  1 exit_group
  2 fcntl
391 fstat
  3 futex
702 getdents
  1 getrlimit
   1719 getxattr
  3 ioctl
   1721 lgetxattr
  9 lseek
   1721 lstat
 58 mmap
 24 mprotect
 12 munmap
424 open
 19 read
  2 readlink
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  4 socket
   1719 stat
  1 statfs
 29 write

Looking at the detail in the strace output, I see these are actually

lstat("target-file", ...)
lgetxattr("target-file", "security.selinux", ...)
getxattr("target-file", "system.posix_acl_access", ...)
stat("/etc/localtime", ...)

Compare without -l:

$ strace ls -aR . >/dev/null 2>ert; cut -f1 -d'(' <ert | sort | uniq -c
  9 access
  1 arch_prctl
  4 brk
377 close
  1 execve
  1 exit_group
  1 fcntl
376 fstat
  3 futex
702 getdents
  1 getrlimit
  3 ioctl
 39 mmap
 16 mprotect
  4 munmap
388 open
 11 read
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  1 stat
  1 statfs
  9 write

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Christian Meisinger
Hello there.


That's really interesting, because we are thinking about using GlusterFS too, 
with a similar setup/scenario.

I read about a rather unusual setup with a GlusterFS native client mount on the 
web servers and an NFS mount on top of that, so you get GlusterFS failover plus 
NFS caching (rough sketch below).
Can't find the link right now.
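
Roughly, as I understand that setup (a sketch only; hostnames and paths are
illustrative, and the local export still needs an /etc/exports entry with an
fsid= option, which FUSE mounts require):

  # on each web server: the FUSE mount provides the failover
  mount -t glusterfs fs1:/data-storage /mnt/gluster

  # ...export /mnt/gluster via the kernel NFS server, then loop it back:
  mount -t nfs -o vers=3,nolock localhost:/mnt/gluster /var/www/shared

The NFS layer on top is only there for the kernel NFS client's attribute and
directory caching.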


- Original Message -
From: olav johansen luxis2...@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, June 7, 2012 8:02:14 AM
Subject: [Gluster-users] Performance optimization tips Gluster 3.3? (small
files / directory listings)


Hi, 

I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes, replicated mode
(fs1, fs2) Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram, 3ware
raid, 2x500GB sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10 for
/data), 1Gbit network 

I've it mounted data partition to web1 a Dual Quad 2.8Ghz, 8Gb ram, using
glusterfs. (also tried NFS - Gluster mount) 

We have 50Gb of files, ~800'000 files in 3 levels of directories (max 2000
directories in one folder) 

My main problem is speed of directory indexes ls -alR on the gluster mount
takes 23 minutes every time. 

It don't seem like any directory listing information cache, with regular NFS
(not gluster) between web1-fs1, this takes 6m13s first time, and 5m13s
there after. 

Gluster mount is 4+ times slower for directory indexing performance vs pure
NFS to single server, is this as expected? 
I understand there is a lot more calls involved checking both nodes but I'm
just looking for a reality check regarding this. 

Any suggestions of how I can speed this up? 

Thanks, 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Gerald Brandt
Here's the link:

http://community.gluster.org/a/nfs-performance-with-fuse-client-redundancy/

Sent again with a reply to all.

Gerald


- Original Message -
 From: Christian Meisinger em_got...@gmx.net
 To: olav johansen luxis2...@gmail.com
 Cc: gluster-users@gluster.org
 Sent: Thursday, June 7, 2012 7:00:14 AM
 Subject: Re: [Gluster-users] Performance optimization tips Gluster3.3?
 (small  files / directory listings)
 
 Hello there.
 
 
 That's really interesting, because we think about using GlusterFS too
 with a
 similar setup/scenario.
 
 I read about a really strange setup with GlusterFS native client
 mount on
 the web servers and NFS mount on top of that so you get GlusterFS
 failover +
 NFS caching.
 Can't find the link right now.
 
 
 - Original Message -
 From: olav johansen luxis2...@gmail.com
 To: gluster-users@gluster.org
 Sent: Thursday, June 7, 2012 8:02:14 AM
 Subject: [Gluster-users] Performance optimization tips Gluster 3.3?
 (small
 files / directory listings)
 
 
 Hi,
 
 I'm using Gluster 3.3.0-1.el6.x86_64, on two storage nodes,
 replicated mode
 (fs1, fs2) Node specs: CentOS 6.2 Intel Quad Core 2.8Ghz, 4Gb ram,
 3ware
 raid, 2x500GB sata 7200rpm (RAID1 for os), 6x1TB sata 7200rpm (RAID10
 for
 /data), 1Gbit network
 
 I've it mounted data partition to web1 a Dual Quad 2.8Ghz, 8Gb ram,
 using
 glusterfs. (also tried NFS - Gluster mount)
 
 We have 50Gb of files, ~800'000 files in 3 levels of directories (max
 2000
 directories in one folder)
 
 My main problem is speed of directory indexes ls -alR on the
 gluster mount
 takes 23 minutes every time.
 
 It don't seem like any directory listing information cache, with
 regular NFS
 (not gluster) between web1-fs1, this takes 6m13s first time, and
 5m13s
 there after.
 
 Gluster mount is 4+ times slower for directory indexing performance
 vs pure
 NFS to single server, is this as expected?
 I understand there is a lot more calls involved checking both nodes
 but I'm
 just looking for a reality check regarding this.
 
 Any suggestions of how I can speed this up?
 
 Thanks,
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.2.6 and InfiniBand

2012-06-07 Thread Sabuj Pattanayek
To make a long story short, I made RDMA client volfiles and mounted with them
directly via /etc/fstab:

#/etc/glusterd/vols/pirdist/pirdist.rdma-fuse.vol     /pirdist     glusterfs   transport=rdma  0 0
#/etc/glusterd/vols/pirstripe/pirstripe.rdma-fuse.vol /pirstripe   glusterfs   transport=rdma  0 0

The transport=rdma mount option does nothing here, since the parameters are
read from the .vol files. However, you'll see the entries are now commented out,
because RDMA has been very unstable for us: servers lose their connections to
each other, which somehow causes the GbE clients to lose their connections too.
IP over IB, however, is working great, although at the expense of some
performance versus RDMA; it's still much better than GbE.
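
If it helps anyone else, one quick way to check whether a client actually came
up on RDMA (a sketch; the volume name and log file name depend on your setup,
the client log is named after the mount point) is to look at the transport in
the volume definition and in the volfile the client fetched:

  gluster volume info pirdist | grep -i transport
  grep -i 'transport-type' /var/log/glusterfs/<mountpoint>.log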

On Thu, Jun 7, 2012 at 4:25 AM, bxma...@gmail.com bxma...@gmail.com wrote:
 Hello,

 at first it was tcp then tcp,rdma.

 You are right that without tcp definition .rdma is not working. But
 now i have another problem.
 I'm trying tcp / rdma, im trying even tcp/rdma using normal network
 card ( not using infiniband IP but normal 1gbit network
 card and i have still same speed, upload about 30mb/s and download
 about 200mb/s .. so i'm not sure if rdma is even working.

 Native infiniband is giving me 3500mb/s speed with benchmark tests
 (ib_rdma_bw ).

 thanks

 Matus

 2012/6/7 Amar Tumballi ama...@redhat.com:
 On 06/07/2012 02:04 PM, bxma...@gmail.com wrote:

 Hello,

 i have a problem with gluster 3.2.6 and infiniband. With gluster 3.3
 its working ok but with 3.2.6 i have following problems:

 when i'm trying to mount rdma volume using command mount -t glusterfs
 192.168.100.1:/atlas1.rdma mount  i get:

 [2012-06-07 04:30:18.894337] I [glusterfsd.c:1493:main]
 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs
 version 3.2.6
 [2012-06-07 04:30:18.907499] E
 [glusterfsd-mgmt.c:628:mgmt_getspec_cbk] 0-glusterfs: failed to get
 the 'volume file' from server
 [2012-06-07 04:30:18.907592] E
 [glusterfsd-mgmt.c:695:mgmt_getspec_cbk] 0-mgmt: failed to fetch
 volume file (key:/atlas1.rdma)
 [2012-06-07 04:30:18.907995] W [glusterfsd.c:727:cleanup_and_exit]
 (--/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xc9)
 [0x7f784e2c8bc9] (--/usr/local/
 lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7f784e2c8975]
 (--/usr/local/sbin/glusterfs(mgmt_getspec_cbk+0x28b) [0x40861b])))
 0-: received signum (0)
 , shutting down
 [2012-06-07 04:30:18.908049] I [fuse-bridge.c:3727:fini] 0-fuse:
 Unmounting 'mount'.

 Same command without .rdma works ok.


 Is the volume's transport type only 'rdma' ? or 'tcp,rdma' ? If its only
 'rdma', then appending .rdma to volume name is not required. The appending
 of .rdma is only required when there are both type of transports on a
 volume (ie, 'tcp,rdma'), as from the client you can decide which transport
 you want to mount.

 default volume name would point to 'tcp' transport type, and appending
 .rdma, will point to rdma transport type.

 Hope that is clear now.

 Regards,
 Amar
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Pranith Kumar Karampuri
Brian,
  Small correction regarding 'sending queries to *both* servers to check they are 
in sync - even read accesses': read fops like stat/getxattr etc. are sent to only 
one brick.

Pranith.
- Original Message -
From: Brian Candler b.cand...@pobox.com
To: Fernando Frediani (Qube) fernando.fredi...@qubenet.net
Cc: olav johansen luxis2...@gmail.com, gluster-users@gluster.org 
gluster-users@gluster.org
Sent: Thursday, June 7, 2012 4:24:37 PM
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small  
files / directory listings)

On Thu, Jun 07, 2012 at 10:10:03AM +, Fernando Frediani (Qube) wrote:
Sorry this reply won’t be of any help to your problem, but I am too
curious to understand how it can be even slower if monting using
Gluster client which I would expect always be quicker than NFS or
anything else.

(1) Try it with ls -aR or find . instead of ls -alR

(2) Try it on a gluster non-replicated volume (for fair comparison with
direct NFS access)

With a replicated volume, many accesses involve sending queries to *both*
servers to check they are in sync - even read accesses.  This in turn can
cause disk seeks on both machines, so the latency you'll get is the larger
of the two.  If you are doing lots of accesses sequentially then the
latencies will all add up.

A stat() is one of those accesses which touches both machines, and ls -l
forces a stat() of each file found.

In fact, a quick test suggests ls -l does stat, lstat, getxattr and
lgetxattr:

$ strace ls -laR . >/dev/null 2>ert; cut -f1 -d'(' <ert | sort | uniq -c
 13 access
  1 arch_prctl
  5 brk
395 close
  4 connect
  1 execve
  1 exit_group
  2 fcntl
391 fstat
  3 futex
702 getdents
  1 getrlimit
   1719 getxattr
  3 ioctl
   1721 lgetxattr
  9 lseek
   1721 lstat
 58 mmap
 24 mprotect
 12 munmap
424 open
 19 read
  2 readlink
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  4 socket
   1719 stat
  1 statfs
 29 write

Looking at the detail in the strace output, I see these are actually

lstat("target-file", ...)
lgetxattr("target-file", "security.selinux", ...)
getxattr("target-file", "system.posix_acl_access", ...)
stat("/etc/localtime", ...)

Compare without -l:

$ strace ls -aR . >/dev/null 2>ert; cut -f1 -d'(' <ert | sort | uniq -c
  9 access
  1 arch_prctl
  4 brk
377 close
  1 execve
  1 exit_group
  1 fcntl
376 fstat
  3 futex
702 getdents
  1 getrlimit
  3 ioctl
 39 mmap
 16 mprotect
  4 munmap
388 open
 11 read
  2 rt_sigaction
  1 rt_sigprocmask
  1 set_robust_list
  1 set_tid_address
  1 stat
  1 statfs
  9 write

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Export Gluster backed volume with standard NFS

2012-06-07 Thread Scot Kreienkamp
Hey everyone,

I currently have an NFS server that I need to make highly available.  I was 
thinking I would use Gluster, but since there's no way to match Gluster's built-in 
NFS server to my current NFS exports file, I can't use the Gluster NFS server.  So 
I was thinking I could have two nodes serving a replicated Gluster volume, have 
them mount the Gluster volume from themselves with the FUSE client (which would 
give them automatic failover), and then use standard NFS to re-export the mounts I 
need, roughly as sketched below.  Is anyone already doing that or anything like 
it?  Any problems or performance issues?
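
A minimal sketch of what I have in mind (the volume name, network, and fsid
value are illustrative only):

  # /etc/fstab on each storage node: mount its own replicated volume via FUSE
  localhost:/myvol   /mnt/myvol   glusterfs   defaults,_netdev   0 0

  # /etc/exports: re-export it with the kernel NFS server; fsid= is needed
  # when exporting a FUSE mount point
  /mnt/myvol   192.168.0.0/24(rw,fsid=10,no_subtree_check)

  # keep Gluster's built-in NFS server from grabbing the NFS ports
  gluster volume set myvol nfs.disable on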

Thanks

Scot Kreienkamp
skre...@la-z-boy.com




This message is intended only for the individual or entity to which it is 
addressed. It may contain privileged, confidential information which is exempt 
from disclosure under applicable laws. If you are not the intended recipient, 
please note that you are strictly prohibited from disseminating or distributing 
this information (other than to the intended recipient) or copying this 
information. If you have received this communication in error, please notify us 
immediately by e-mail or by telephone at the above number. Thank you.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Brian Candler
On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
 Brian,
   Small correction: 'sending queries to *both* servers to check they are in 
 sync - even read accesses.' Read fops like stat/getxattr etc are sent to only 
 one brick.

Is that new behaviour for 3.3? My understanding was that stat() was a
healing operation.
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate

If this is no longer true, then I'd like to understand what happens after a
node has been down and comes up again.  I understand there's a self-healing
daemon in 3.3, but what if you try to access a file which has not yet been
healed?

I'm interested in understanding this, especially the split-brain scenarios
(better to understand them *before* you're stuck in a problem :-)

BTW I'm in the process of building a 2-node 3.3 test cluster right now.

Cheers,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread Pranith Kumar Karampuri
hi Brian,
'stat' command comes as fop (File-operation) 'lookup' to the gluster mount 
which triggers self-heal. So the behavior is still same.
I was referring to the fop 'stat' which will be performed only on one of the 
bricks.
Unfortunately most of the commands and fops have same name.
Following are some of the examples of read-fops:
.access
.stat
.fstat
.readlink
.getxattr
.fgetxattr
.readv

Pranith.
- Original Message -
From: Brian Candler b.cand...@pobox.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: olav johansen luxis2...@gmail.com, gluster-users@gluster.org, Fernando 
Frediani (Qube) fernando.fredi...@qubenet.net
Sent: Thursday, June 7, 2012 7:06:26 PM
Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small  
files / directory listings)

On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
 Brian,
   Small correction: 'sending queries to *both* servers to check they are in 
 sync - even read accesses.' Read fops like stat/getxattr etc are sent to only 
 one brick.

Is that new behaviour for 3.3? My understanding was that stat() was a
healing operation.
http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate

If this is no longer true, then I'd like to understand what happens after a
node has been down and comes up again.  I understand there's a self-healing
daemon in 3.3, but what if you try to access a file which has not yet been
healed?

I'm interested in understanding this, especially the split-brain scenarios
(better to understand them *before* you're stuck in a problem :-)

BTW I'm in the process of building a 2-node 3.3 test cluster right now.

Cheers,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Gluster 3.3.0 and VMware ESXi 5

2012-06-07 Thread Atha Kouroussis
Hi everybody,
we are testing Gluster 3.3 as an alternative to our current Nexenta-based 
storage. With the introduction of granular locking, Gluster seems like a 
viable alternative for VM storage.

Regrettably, we cannot get it to work even for the most rudimentary tests. We 
have a two-brick setup with two ESXi 5 servers. We created both distributed and 
replicated volumes. We can mount the volumes via NFS on the ESXi servers 
without any issues, but that is as far as we can go.

When we try to migrate a VM to the Gluster-backed datastore, there is no 
activity on the bricks and eventually the operation times out on the ESXi side. 
The nfs.log shows messages like these (distributed volume):

[2012-06-07 00:00:16.992649] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
Unable to resolve FH: (192.168.11.11:646) vmvol : 
7d25cb9a-b9c8-440d-bbd8-973694ccad17
[2012-06-07 00:00:17.027559] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 3bb48d69: 
/TEST = -1 (Directory not empty)
[2012-06-07 00:00:17.066276] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 3bb48d90: 
/TEST = -1 (Directory not empty)
[2012-06-07 00:00:17.097118] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
Unable to resolve FH: (192.168.11.11:646) vmvol : 
----0001


When the volume is mounted on the ESXi servers, we get messages like these in 
nfs.log:

[2012-06-06 23:57:34.697460] W [socket.c:195:__socket_rwv] 0-socket.nfs-server: 
readv failed (Connection reset by peer)


The same volumes mounted via NFS on a Linux box work fine, and we did a couple 
of benchmarks with bonnie++ with very promising results.
Curiously, if we ssh into the ESXi boxes and go to the mount point of the 
volume, we can see its contents and write to it.

Any clues of what might be going on? Thanks in advance.

Cheers,
Atha


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Issue recreating volumes

2012-06-07 Thread Brian Candler
Here are a couple of wrinkles I have come across while trying gluster 3.3.0
under ubuntu-12.04.

(1) At one point I decided to delete some volumes and recreate them. But
it would not let me recreate them:

root@dev-storage2:~# gluster volume create fast 
dev-storage1:/disk/storage1/fast dev-storage2:/disk/storage2/fast
/disk/storage2/fast or a prefix of it is already part of a volume

This is even though gluster volume info showed no volumes.

Restarting glusterd didn't help either. Nor indeed did a complete reinstall
of glusterfs, even with apt-get remove --purge and rm -rf'ing the state
directories.

Digging around, I found some hidden state files:

# ls -l /disk/storage1/*/.glusterfs/00/00
/disk/storage1/fast/.glusterfs/00/00:
total 0
lrwxrwxrwx 1 root root 8 Jun  7 14:23 ----0001 -> ../../..

/disk/storage1/safe/.glusterfs/00/00:
total 0
lrwxrwxrwx 1 root root 8 Jun  7 14:21 ----0001 -> ../../..

I deleted them on both machines:

rm -rf /disk/*/.glusterfs

Problem solved? No, not even with glusterd restart :-(

root@dev-storage2:~# gluster volume create safe replica 2 
dev-storage1:/disk/storage1/safe dev-storage2:/disk/storage2/safe
/disk/storage2/safe or a prefix of it is already part of a volume

In the end, what I needed was to delete the actual data bricks themselves:

rm -rf /disk/*/fast
rm -rf /disk/*/safe

That allowed me to recreate the volumes.

This is probably an understanding/documentation issue. I'm sure there's a
lot of magic going on in the gluster 3.3 internals (is that long ID some
sort of replica update sequence number?) which if it were fully documented
would make it easier to recover from these situations.


(2) Minor point: the FUSE client no longer seems to understand or need the
_netdev option, however it still invokes it if you use defaults in
/etc/fstab, and so you get a warning about an unknown option:

root@dev-storage1:~# grep gluster /etc/fstab
storage1:/safe /gluster/safe glusterfs defaults,nobootwait 0 0
storage1:/fast /gluster/fast glusterfs defaults,nobootwait 0 0

root@dev-storage1:~# mount /gluster/safe
unknown option _netdev (ignored)

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Issue recreating volumes

2012-06-07 Thread Pranith Kumar Karampuri
Brian,
 The first point (1) is working as intended. Allowing something like that can 
get the volume into a very complicated state.
Please go through the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=812214

Pranith

- Original Message -
From: Brian Candler b.cand...@pobox.com
To: gluster-users@gluster.org
Sent: Thursday, June 7, 2012 8:57:16 PM
Subject: [Gluster-users] Issue recreating volumes

Here are a couple of wrinkles I have come across while trying gluster 3.3.0
under ubuntu-12.04.

(1) At one point I decided to delete some volumes and recreate them. But
it would not let me recreate them:

root@dev-storage2:~# gluster volume create fast 
dev-storage1:/disk/storage1/fast dev-storage2:/disk/storage2/fast
/disk/storage2/fast or a prefix of it is already part of a volume

This is even though gluster volume info showed no volumes.

Restarting glusterd didn't help either. Nor indeed did a complete reinstall
of glusterfs, even with apt-get remove --purge and rm -rf'ing the state
directories.

Digging around, I found some hidden state files:

# ls -l /disk/storage1/*/.glusterfs/00/00
/disk/storage1/fast/.glusterfs/00/00:
total 0
lrwxrwxrwx 1 root root 8 Jun  7 14:23 ----0001 
- ../../..

/disk/storage1/safe/.glusterfs/00/00:
total 0
lrwxrwxrwx 1 root root 8 Jun  7 14:21 ----0001 
- ../../..

I deleted them on both machines:

rm -rf /disk/*/.glusterfs

Problem solved? No, not even with glusterd restart :-(

root@dev-storage2:~# gluster volume create safe replica 2 
dev-storage1:/disk/storage1/safe dev-storage2:/disk/storage2/safe
/disk/storage2/safe or a prefix of it is already part of a volume

In the end, what I needed was to delete the actual data bricks themselves:

rm -rf /disk/*/fast
rm -rf /disk/*/safe

That allowed me to recreate the volumes.

This is probably an understanding/documentation issue. I'm sure there's a
lot of magic going on in the gluster 3.3 internals (is that long ID some
sort of replica update sequence number?) which if it were fully documented
would make it easier to recover from these situations.


(2) Minor point: the FUSE client no longer seems to understand or need the
_netdev option, however it still invokes it if you use defaults in
/etc/fstab, and so you get a warning about an unknown option:

root@dev-storage1:~# grep gluster /etc/fstab
storage1:/safe /gluster/safe glusterfs defaults,nobootwait 0 0
storage1:/fast /gluster/fast glusterfs defaults,nobootwait 0 0

root@dev-storage1:~# mount /gluster/safe
unknown option _netdev (ignored)

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster 3.3.0 and VMware ESXi 5

2012-06-07 Thread Fernando Frediani (Qube)
Hi Atha,

I have a very similar setup and behaviour here.
I have two bricks with replication and I am able to mount over NFS and deploy a 
machine there, but when I try to power it on it simply doesn't work and gives a 
message saying that it couldn't find some files.

I wonder if anyone has actually got it working with VMware ESXi and can share 
their setup with us. Here I have two CentOS 6.2 nodes and Gluster 3.3.0.

Fernando

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Atha Kouroussis
Sent: 07 June 2012 15:29
To: gluster-users@gluster.org
Subject: [Gluster-users] Gluster 3.3.0 and VMware ESXi 5

Hi everybody,
we are testing Gluster 3.3 as an alternative to our current Nexenta based 
storage. With the introduction of granular based locking gluster seems like a 
viable alternative for VM storage.

Regrettably we cannot get it to work even for the most rudimentary tests. We 
have a two brick setup with two ESXi 5 servers. We created both distributed and 
replicated volumes. We can mount the volumes via NFS on the ESXi servers 
without any issues but that is as far as we can go.

When we try to migrate a VM to the gluster backed datastore there is no 
activity on the bricks and eventually the operation times out on the ESXi side. 
The nfs.log shows messages like these (distributed volume):

[2012-06-07 00:00:16.992649] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
Unable to resolve FH: (192.168.11.11:646) vmvol : 
7d25cb9a-b9c8-440d-bbd8-973694ccad17
[2012-06-07 00:00:17.027559] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 3bb48d69: 
/TEST = -1 (Directory not empty)
[2012-06-07 00:00:17.066276] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 3bb48d90: 
/TEST = -1 (Directory not empty)
[2012-06-07 00:00:17.097118] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
Unable to resolve FH: (192.168.11.11:646) vmvol : 
----0001


When the volume is mounted on the ESXi servers, we get messages like these in 
nfs.log:

[2012-06-06 23:57:34.697460] W [socket.c:195:__socket_rwv] 0-socket.nfs-server: 
readv failed (Connection reset by peer)


The same volumes mounted via NFS on a linux box work fine and we did a couple 
of benchmarks with bonnie++ with very promising results.
Curiously, if we ssh into the ESXi boxes and go to the mount point of the 
volume, we can see it contents and write.

Any clues of what might be going on? Thanks in advance.

Cheers,
Atha


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Suggestions for Gluster 3.4

2012-06-07 Thread Fernando Frediani (Qube)
There was mention in previous emails of collecting suggestions on how to improve 
Gluster in the development of the next version, 3.4.
I guess we can all put up a list, see what is most popular and useful to most 
people, and then send it to the developers for consideration.

My list starts with:

- RAID 1E-style clusters (the number of nodes doesn't need to be a multiple of 
the 'replica' or 'stripe' count, so the cluster can grow by adding a single node)
- Server/brick awareness (avoids replicating data onto two bricks running on the 
same server; very useful when running multiple logical drives under the same RAID 
controller for improved performance)
- Rack awareness (for very large clusters: avoids replicating data onto servers 
in the same rack)

Regards,

Fernando
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Troubleshooting Unified Object and File Storage in 3.3

2012-06-07 Thread Jason Brooks
On Wed 06 Jun 2012 10:25:38 PM PDT, Vijay Bellur wrote:
 On 06/07/2012 03:22 AM, Jason Brooks wrote:
 I've been testing on CentOS 6.2. The only command from the Admin guide
 I've run successfully has been: curl -v -H 'X-Storage-User: test:tester'
 -H 'X-Storage-Pass:testing' -k http://127.0.0.1:8080/auth/v1.0.

 I started out with a centos machine running gluster-swift, which I was
 connecting to a four node gluster cluster. It wasn't clear to me from
 the admin guide where I was supposed to mount my gluster volume,

 You will need to mount the gluster volume at
 /mnt/gluster-object/account. For the example in admin guide,
 /mnt/gluster-object/AUTH_test needs to be the mountpoint for your
 gluster volume.
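
For reference, a minimal sketch of that layout (the volume name "test" and the
server "gluster-node1" are just placeholders for this example):

  mkdir -p /mnt/gluster-object/AUTH_test
  mount -t glusterfs gluster-node1:/test /mnt/gluster-object/AUTH_test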

Thanks -- that helps a lot.

Another question on the admin guide: under "12.4.4. Configuring Authentication 
System" the guide says the proxy server must be configured to authenticate 
using tempauth.

Is this the only supported auth method? I'm experimenting with keystone.

Thanks, Jason


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Troubleshooting Unified Object and File Storage in 3.3

2012-06-07 Thread Jason Brooks
On 06/06/2012 10:25 PM, Vijay Bellur wrote:
 On 06/07/2012 03:22 AM, Jason Brooks wrote:
 I've been testing on CentOS 6.2. The only command from the Admin guide
 I've run successfully has been: curl -v -H 'X-Storage-User: test:tester'
 -H 'X-Storage-Pass:testing' -k http://127.0.0.1:8080/auth/v1.0.

 I started out with a centos machine running gluster-swift, which I was
 connecting to a four node gluster cluster. It wasn't clear to me from
 the admin guide where I was supposed to mount my gluster volume,
 
 You will need to mount the gluster volume at
 /mnt/gluster-object/account. For the example in admin guide,
 /mnt/gluster-object/AUTH_test needs to be the mountpoint for your
 gluster volume.

There's something I'm confused about -- if I mount my gluster volume at
AUTH_test, I am able to work with it, but is the idea that users should
manually create a gluster volume and mountpoint for every account?

I've been working through this Fedora 17 openstack howto:
https://fedoraproject.org/wiki/Getting_started_with_OpenStack_on_Fedora_17#Configure_swift_with_keystone.
I thought I'd bring gluster into the mix, but it's not clear to me how
the setup directions I see here and elsewhere for swift ought to
interact with the gluster-swift packages.

The gluster-swift-plugin places a set of configuration files into
/etc/swift -- the 1.conf files and the ring configurations. The admin
guide doesn't mention any swift-ring-builder operations -- are these not
required with UFO?

Thanks, Jason
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)

2012-06-07 Thread olav johansen
Hi All,

Thanks for the great feedback. I had changed IPs, and when checking the log I
noticed one server wasn't connecting correctly.

To make sure I hadn't done anything wrong, I've re-done the bricks from scratch
with clean configurations (mount info attached below), and it is still not
performing 'great' compared to a single NFS mount.

In the application we're running, the files don't change -- we only add and
delete files -- so I'd like to get directory / file information cached as much
as possible.


Config info:
gluster volume info data-storage

Volume Name: data-storage
Type: Replicate
Volume ID: cc91c107-bdbb-4179-a097-cdd3e9d5ac93
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs1:/data/storage
Brick2: fs2:/data/storage
gluster


On my web1 node I mounted:
# mount -t glusterfs fs1:/data-storage /storage

I've copied my data over to it again; running ls several times takes
~0.5 seconds each time:
[@web1 files]# time ls -all|wc -l
1989

real0m0.485s
user0m0.022s
sys 0m0.109s
[@web1 files]# time ls -all|wc -l
1989

real0m0.489s
user0m0.016s
sys 0m0.116s
[@web1 files]# time ls -all|wc -l
1989

real0m0.493s
user0m0.018s
sys 0m0.115s

Doing the same thing on the raw OS files on one node takes 0.021s:
[@fs2 files]# time ls -all|wc -l
1989

real0m0.021s
user0m0.007s
sys 0m0.015s
[@fs2 files]# time ls -all|wc -l
1989

real0m0.020s
user0m0.008s
sys 0m0.013s


Now a full directory listing seems even slower...:
[@web1 files]# time ls -alR|wc -l
2242956

real74m0.660s
user0m20.117s
sys 1m24.734s
[@web1 files]# time ls -alR|wc -l
2242956

real26m27.159s
user0m17.387s
sys 1m11.217s
[@web1 files]# time ls -alR|wc -l
2242956

real27m38.163s
user0m18.333s
sys 1m19.824s


Just as a crazy reference, on another single server with SSD (RAID 10)
drives I get:
files# time ls -alR|wc -l
2260484

real0m15.761s
user0m5.170s
sys 0m7.670s
for the same operation (and this server even has more files...).

My goal is to get this directory listing as fast as possible. I don't have
the hardware/budget to test an SSD configuration, but would an SSD setup give
me a ~1 minute directory listing time (assuming it is 4 times slower than a
single node)?

If I added two more bricks to the replicated cluster, would that double the
read speed?
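
For concreteness, what I mean by adding two more bricks is growing the current
1 x 2 replicated volume into a 2 x 2 distributed-replicate volume, roughly like
this (fs3 and fs4 being hypothetical new nodes; bricks have to be added in
multiples of the replica count):

  gluster volume add-brick data-storage fs3:/data/storage fs4:/data/storage
  gluster volume rebalance data-storage start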

Thanks for any insight!


 storage.log from web1 on mount -
[2012-06-07 20:47:45.584320] I [glusterfsd.c:1666:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.0
[2012-06-07 20:47:45.624548] I [io-cache.c:1549:check_cache_size_ok]
0-data-storage-quick-read: Max cache size is 8252092416
[2012-06-07 20:47:45.624612] I [io-cache.c:1549:check_cache_size_ok]
0-data-storage-io-cache: Max cache size is 8252092416
[2012-06-07 20:47:45.628148] I [client.c:2142:notify]
0-data-storage-client-0: parent translators are ready, attempting connect
on transport
[2012-06-07 20:47:45.631059] I [client.c:2142:notify]
0-data-storage-client-1: parent translators are ready, attempting connect
on transport
Given volfile:
+--+
  1: volume data-storage-client-0
  2: type protocol/client
  3: option remote-host fs1
  4: option remote-subvolume /data/storage
  5: option transport-type tcp
  6: end-volume
  7:
  8: volume data-storage-client-1
  9: type protocol/client
 10: option remote-host fs2
 11: option remote-subvolume /data/storage
 12: option transport-type tcp
 13: end-volume
 14:
 15: volume data-storage-replicate-0
 16: type cluster/replicate
 17: subvolumes data-storage-client-0 data-storage-client-1
 18: end-volume
 19:
 20: volume data-storage-write-behind
 21: type performance/write-behind
 22: subvolumes data-storage-replicate-0
 23: end-volume
 24:
 25: volume data-storage-read-ahead
 26: type performance/read-ahead
 27: subvolumes data-storage-write-behind
 28: end-volume
 29:
 30: volume data-storage-io-cache
 31: type performance/io-cache
 32: subvolumes data-storage-read-ahead
 33: end-volume
34:
 35: volume data-storage-quick-read
 36: type performance/quick-read
 37: subvolumes data-storage-io-cache
 38: end-volume
 39:
 40: volume data-storage-md-cache
 41: type performance/md-cache
 42: subvolumes data-storage-quick-read
 43: end-volume
 44:
 45: volume data-storage
 46: type debug/io-stats
 47: option latency-measurement off
 48: option count-fop-hits off
 49: subvolumes data-storage-md-cache
 50: end-volume

+--+
[2012-06-07 20:47:45.642625] I [rpc-clnt.c:1660:rpc_clnt_reconfig]
0-data-storage-client-0: changing port to 24009 (from 0)
[2012-06-07 20:47:45.648604] I [rpc-clnt.c:1660:rpc_clnt_reconfig]
0-data-storage-client-1: changing port to 24009 (from 0)
[2012-06-07 20:47:49.592729] I

Re: [Gluster-users] Gluster 3.3.0 and VMware ESXi 5

2012-06-07 Thread Atha Kouroussis
Hi Fernando, 
thanks for the reply. I'm seeing exactly the same behavior, and I'm wondering if 
it somehow has to do with locking. I read here 
(http://community.gluster.org/q/can-not-mount-nfs-share-without-nolock-option/) 
that NFS locking was not implemented in 3.2.x but is now in 3.3. I tested 3.2.x 
with ESXi a few months ago and it seemed to work fine, but the lack of granular 
locking made it a no-go back then.

Would anybody care to chime in with suggestions? Is there a way to revert NFS to 
the 3.2.x behavior to test?
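
One thing I can try from a Linux box is an NFSv3 mount with client-side locking
disabled (a sketch only, reusing the server address and volume name from the
logs above; the ESXi NFS client doesn't expose mount options in the same way):

  mount -t nfs -o vers=3,nolock,tcp 192.168.11.11:/vmvol /mnt/vmvol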

Cheers,
Atha

On Thursday, June 7, 2012 at 11:52 AM, Fernando Frediani (Qube) wrote:

 Hi Atha,
 
 I have a very similar setup and behaviour here.
 I have two bricks with replication and I am able to mount the NFS, deploy a 
 machine there, but when I try to Power it On it simply doesn't work and gives 
 a different message saying that it couldn't find some files.
 
 I wonder if anyone actually got it working with VMware ESXi and can share 
 with us their scenario setup. Here I have two CentOS 6.2 and Gluster 3.3.0.
 
 Fernando
 
 -Original Message-
 From: gluster-users-boun...@gluster.org 
 [mailto:gluster-users-boun...@gluster.org] On Behalf Of Atha Kouroussis
 Sent: 07 June 2012 15:29
 To: gluster-users@gluster.org (mailto:gluster-users@gluster.org)
 Subject: [Gluster-users] Gluster 3.3.0 and VMware ESXi 5
 
 Hi everybody,
 we are testing Gluster 3.3 as an alternative to our current Nexenta based 
 storage. With the introduction of granular based locking gluster seems like a 
 viable alternative for VM storage.
 
 Regrettably we cannot get it to work even for the most rudimentary tests. We 
 have a two brick setup with two ESXi 5 servers. We created both distributed 
 and replicated volumes. We can mount the volumes via NFS on the ESXi servers 
 without any issues but that is as far as we can go.
 
 When we try to migrate a VM to the gluster backed datastore there is no 
 activity on the bricks and eventually the operation times out on the ESXi 
 side. The nfs.log shows messages like these (distributed volume):
 
 [2012-06-07 00:00:16.992649] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
 Unable to resolve FH: (192.168.11.11:646) vmvol : 
 7d25cb9a-b9c8-440d-bbd8-973694ccad17
 [2012-06-07 00:00:17.027559] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 
 3bb48d69: /TEST = -1 (Directory not empty)
 [2012-06-07 00:00:17.066276] W [nfs3.c:3525:nfs3svc_rmdir_cbk] 0-nfs: 
 3bb48d90: /TEST = -1 (Directory not empty)
 [2012-06-07 00:00:17.097118] E [nfs3.c:3551:nfs3_rmdir_resume] 0-nfs-nfsv3: 
 Unable to resolve FH: (192.168.11.11:646) vmvol : 
 ----0001
 
 
 When the volume is mounted on the ESXi servers, we get messages like these in 
 nfs.log:
 
 [2012-06-06 23:57:34.697460] W [socket.c:195:__socket_rwv] 
 0-socket.nfs-server: readv failed (Connection reset by peer)
 
 
 The same volumes mounted via NFS on a linux box work fine and we did a couple 
 of benchmarks with bonnie++ with very promising results.
 Curiously, if we ssh into the ESXi boxes and go to the mount point of the 
 volume, we can see it contents and write.
 
 Any clues of what might be going on? Thanks in advance.
 
 Cheers,
 Atha
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org (mailto:Gluster-users@gluster.org)
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Issue recreating volumes

2012-06-07 Thread Amar Tumballi

Hi Brian,

Answers inline.


Here are a couple of wrinkles I have come across while trying gluster 3.3.0
under ubuntu-12.04.

(1) At one point I decided to delete some volumes and recreate them. But
it would not let me recreate them:

 root@dev-storage2:~# gluster volume create fast 
dev-storage1:/disk/storage1/fast dev-storage2:/disk/storage2/fast
 /disk/storage2/fast or a prefix of it is already part of a volume

This is even though gluster volume info showed no volumes.

Restarting glusterd didn't help either. Nor indeed did a complete reinstall
of glusterfs, even with apt-get remove --purge and rm -rf'ing the state
directories.

Digging around, I found some hidden state files:

 # ls -l /disk/storage1/*/.glusterfs/00/00
 /disk/storage1/fast/.glusterfs/00/00:
 total 0
 lrwxrwxrwx 1 root root 8 Jun  7 14:23 ----0001 
-  ../../..

 /disk/storage1/safe/.glusterfs/00/00:
 total 0
 lrwxrwxrwx 1 root root 8 Jun  7 14:21 ----0001 
-  ../../..

I deleted them on both machines:

 rm -rf /disk/*/.glusterfs

Problem solved? No, not even with glusterd restart :-(

 root@dev-storage2:~# gluster volume create safe replica 2 
dev-storage1:/disk/storage1/safe dev-storage2:/disk/storage2/safe
 /disk/storage2/safe or a prefix of it is already part of a volume

In the end, what I needed was to delete the actual data bricks themselves:

 rm -rf /disk/*/fast
 rm -rf /disk/*/safe

That allowed me to recreate the volumes.

This is probably an understanding/documentation issue. I'm sure there's a
lot of magic going on in the gluster 3.3 internals (is that long ID some
sort of replica update sequence number?) which if it were fully documented
would make it easier to recover from these situations.



Preventing the 'recreation' of a volume (actually, internally it just prevents 
you from 're-using' the bricks; you can create a volume with the same name using 
different bricks) is very much intentional, to prevent disasters (like data loss) 
from happening.

We treat data separately from the volume's config information. Hence, when a 
volume is 'delete'd, only the configuration details of the volume are lost; the 
data belonging to the volume is still present on its bricks as-is. It is at the 
admin's discretion to handle that data later.

Given that, if we allowed 're-using' a brick which was part of some volume 
earlier, it could lead to data being placed on the wrong brick, internal inode 
number clashes, etc., which could trigger 'healing' of that data from the 
client's perspective and end up deleting files which may be important.

If the admin is aware of this, and knows that there is no data inside the brick, 
then the easier option is to delete the export directory; it gets recreated by 
'gluster volume create'. If you want to fix it without deleting the export 
directory, that is also possible, by deleting the extended attributes on the 
brick like below:

bash# setfattr -x trusted.glusterfs.volume-id $brickdir
bash# setfattr -x trusted.gfid $brickdir

And now, creating the volume should succeed.
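
Putting your earlier steps and the above together, a re-use sequence for the 
bricks from this thread would look roughly like this (a sketch): on every 
server, for each brick directory,

  brickdir=/disk/storage1/fast      # and likewise for the other bricks
  setfattr -x trusted.glusterfs.volume-id $brickdir
  setfattr -x trusted.gfid $brickdir
  rm -rf $brickdir/.glusterfs

and then recreate the volume once:

  gluster volume create fast dev-storage1:/disk/storage1/fast \
      dev-storage2:/disk/storage2/fast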



(2) Minor point: the FUSE client no longer seems to understand or need the
_netdev option, however it still invokes it if you use defaults in
/etc/fstab, and so you get a warning about an unknown option:

 root@dev-storage1:~# grep gluster /etc/fstab
 storage1:/safe /gluster/safe glusterfs defaults,nobootwait 0 0
 storage1:/fast /gluster/fast glusterfs defaults,nobootwait 0 0

 root@dev-storage1:~# mount /gluster/safe
 unknown option _netdev (ignored)



Will look into this.

Regards,
Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Issue recreating volumes

2012-06-07 Thread Rajesh Amaravathi
One can use the clear_xattrs.sh script, with the bricks as arguments, to remove 
all the xattrs set on the bricks. It recursively deletes all 
xattrs from the bricks' files. After running this script on the bricks, we can 
re-use them.
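
For example (a sketch; the path to the script depends on where your
installation placed it):

  /path/to/clear_xattrs.sh /disk/storage1/fast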

Regards, 
Rajesh Amaravathi, 
Software Engineer, GlusterFS 
RedHat Inc. 

- Original Message -
From: Amar Tumballi ama...@redhat.com
To: Brian Candler b.cand...@pobox.com
Cc: gluster-users@gluster.org
Sent: Friday, June 8, 2012 10:34:08 AM
Subject: Re: [Gluster-users] Issue recreating volumes

Hi Brian,

Answers inline.

 Here are a couple of wrinkles I have come across while trying gluster 3.3.0
 under ubuntu-12.04.

 (1) At one point I decided to delete some volumes and recreate them. But
 it would not let me recreate them:

  root@dev-storage2:~# gluster volume create fast 
 dev-storage1:/disk/storage1/fast dev-storage2:/disk/storage2/fast
  /disk/storage2/fast or a prefix of it is already part of a volume

 This is even though gluster volume info showed no volumes.

 Restarting glusterd didn't help either. Nor indeed did a complete reinstall
 of glusterfs, even with apt-get remove --purge and rm -rf'ing the state
 directories.

 Digging around, I found some hidden state files:

  # ls -l /disk/storage1/*/.glusterfs/00/00
  /disk/storage1/fast/.glusterfs/00/00:
  total 0
  lrwxrwxrwx 1 root root 8 Jun  7 14:23 
 ----0001 -  ../../..

  /disk/storage1/safe/.glusterfs/00/00:
  total 0
  lrwxrwxrwx 1 root root 8 Jun  7 14:21 
 ----0001 -  ../../..

 I deleted them on both machines:

  rm -rf /disk/*/.glusterfs

 Problem solved? No, not even with glusterd restart :-(

  root@dev-storage2:~# gluster volume create safe replica 2 
 dev-storage1:/disk/storage1/safe dev-storage2:/disk/storage2/safe
  /disk/storage2/safe or a prefix of it is already part of a volume

 In the end, what I needed was to delete the actual data bricks themselves:

  rm -rf /disk/*/fast
  rm -rf /disk/*/safe

 That allowed me to recreate the volumes.

 This is probably an understanding/documentation issue. I'm sure there's a
 lot of magic going on in the gluster 3.3 internals (is that long ID some
 sort of replica update sequence number?) which if it were fully documented
 would make it easier to recover from these situations.


Preventing of 'recreating' of a volume (actually internally, it just 
prevents you from 're-using' the bricks, you can create same volume name 
with different bricks), is very much intentional to prevent disasters 
(like data loss) from happening.

We treat data separate from volume's config information. Hence, when a 
volume is 'delete'd, only the configuration details of the volume is 
lost, but data belonging to the volume is present on its brick as is. It 
is admin's discretion to handle the data later.

Considering above point, now, if we allow 're-using' of the same brick 
which was part of some volume earlier, it could lead to issues of data 
placement in wrong brick, internal inode number clashes etc, which could 
lead to 'heal' the data from client perspective, leading to deleting 
some files which would be important.

If admin is aware of the case, and knows that there is no 'data' inside 
the brick, then easier option is to delete the export dir and it gets 
created by 'gluster volume create'. If you want to fix it without 
deleting the export directory, then it is also possible, by deleting the 
extended attributes on the brick like below.

bash# setfattr -x trusted.glusterfs.volume-id $brickdir
bash# setfattr -x trusted.gfid $brickdir


And now, creating the brick should succeed.


 (2) Minor point: the FUSE client no longer seems to understand or need the
 _netdev option, however it still invokes it if you use defaults in
 /etc/fstab, and so you get a warning about an unknown option:

  root@dev-storage1:~# grep gluster /etc/fstab
  storage1:/safe /gluster/safe glusterfs defaults,nobootwait 0 0
  storage1:/fast /gluster/fast glusterfs defaults,nobootwait 0 0

  root@dev-storage1:~# mount /gluster/safe
  unknown option _netdev (ignored)


Will look into this.

Regards,
Amar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users