Re: [Gluster-users] Is read cache a file cache or a block cache?

2015-03-12 Thread Anand Avati
The cache works by remembering 128KB "pages" within files. Effectively
"blocks" in your terminology.

Thanks

On Wed, 11 Mar 2015 at 12:36 Jon Heese  wrote:

> Hello,
>
> I have a two-server, two-brick (one brick per server) replicated Gluster
> 3.6.2 volume, and I'm interested in the 'performance.cache-size' option
> and how that read cache works.
>
> My volume currently stores a handful of ~500GB image files, which are
> then fed to an iSCSI daemon to serve up datastores and other
> miscellaneous iSCSI disks to servers over an iSCSI network.
>
> I have about 14GB of unutilized (minus system cache/buffers) memory on
> the gluster servers (which are also the gluster clients, in this case)
> which I'd like to utilize to improve the read performance of this volume.
>
> So since my files are well over the "tens of GB" mark, I'm curious:
> Does the Gluster read cache work at the block level -- i.e. caching
> *blocks* that are likely to be read -- or does it work at the file level
> -- caching *files* that are likely to be read?  Obviously, the latter
> might work well for me, but the former is likely not very useful.
>
> I've tried searching around for details on how this works, but short of
> diving into the code itself (which is likely beyond my skill level and
> time allowance), I haven't been able to find the answer to this question.
>
> If I've misunderstood how any of this is supposed to work, please feel
> free to correct me.  Thanks in advance!
>
> Regards,
> Jon Heese

Re: [Gluster-users] O_DIRECT (I think) again

2015-02-13 Thread Anand Avati
O_DIRECT support in fuse has been available for quite some time now, since
well before kernel 3.4.

On Fri, Feb 13, 2015, 02:37 Pedro Serotto  wrote:

> Dear All,
>
> I am actually using the following software stack:
>
> debian wheezy with kernel 3.2.0-4-amd64, glusterfs 3.6.2, openstack Juno,
> libvirt 1.2.9.
>
> If I try to attach a block storage to a running vm, Openstack shows the
> following error:
> "DeviceIsBusy: The supplied device (vdc) is busy".
>
> If I try to attach a block storage to a running vm, Libvirt shows the
> following error:
>  "qemuMonitorTextAddDrive:2621 : operation failed: open disk image file
> failed"
>
> Looking up this issue on the web, I found out that Libvirt tries to
> open the block device with the O_DIRECT flag on; this flag is
> supported by fuse only for kernels >3.4.
> Therefore, I tried to apply some options (
> http://www.gluster.org/documentation/use_cases/Virt-store-usecase/) to
> Gluster, but the problem has not been solved.
> I also found https://github.com/avati/liboindirect but it is old and not
> maintained.
>
> Has somebody found himself in the same situation? If yes, could you
> please show me how to solve it while maintaining the same versions in my
> software stack?
>
>
> Thanks&Regards
>
> Pedro
>
>

Re: [Gluster-users] Why is Gluster 3.4+ limited to 32 groups when Gluster 3.0 was not?

2015-02-02 Thread Anand Avati
Hi Barry!

Your observation is right. Sometime after 3.0 (not sure which exact
version, probably 3.1) Gluster introduced POSIX ACL support on the server
side. Until then, if fuse let a request through into Gluster, the server
assumed the request to be authenticated - however, fuse does not support
POSIX ACLs. Now that auth was introduced on the server side for POSIX ACL
support, the GID transfer size limit inevitably shows up.
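
A quick illustrative check of the limit being discussed (not part of the
original reply): the kernel exposes at most 32 supplementary groups through
/proc, which is all a FUSE-based server gets to see.

# Number of groups the current user actually belongs to:
id -G | wc -w

# Groups visible through /proc for the current shell (capped at 32):
grep '^Groups:' /proc/self/status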

HTH

Re: [Gluster-users] Atomic rename in Gluster 3.6

2015-01-26 Thread Anand Avati
Atomicity of rename() has two aspects. One is the back end view (for crash
consistency), of having an unambiguous single point in time when the rename
is declared complete. DHT does quite a few tricks to make this atomicity
work well in practice. The other is the effect on the API, in particular the
effect on open() of a file which is getting replaced by a rename of another
file over it. The nature of fuse operations (i.e., split into lookup()
followed by open() on the result of lookup) makes it almost impossible to
handle the case when lookup happens before the backend rename, and open()
arrives on the file handle which is now gone (deleted due to replace)
during rename.

For theoretical correctness, we would need fuse to forward the open-intent
flag in lookup, so that Gluster can keep the file pre-open in anticipation
of the imminent arrival of the open() fop - guarding against getting replaced
by a different file in the meantime. However, the recent upstream kernel
patchset handling ESTALE with a retry, along with fixes in Gluster to be
careful about returning ESTALE vs. ENOENT, minimizes the race window. This may
be good enough for a lot of use cases (though still possible to hit the
race if you are creating temp files and renaming them to the filename of
interest in a tight loop).

Extending fuse with an open-intent flag, and its proper usage in Gluster, is
the foolproof fix, but I don't think anyone is working on this yet.
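
For reference, the option quoted in the question below is set per volume as
shown here (an illustrative sketch; the volume name is a placeholder). As I
understand it, it only influences where DHT places a ".tmp" file relative to
its rename target; it does not close the lookup()/open() race described
above:

# Hash "name.tmp" the same way as "name", so the temp file and its rename
# target land on the same DHT subvolume:
gluster volume set gv0 cluster.extra-hash-regex '(.*)\.tmp'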

Thanks

On Mon, Jan 26, 2015, 02:21 Sebastien Cote  wrote:

> Hi,
>
>
> I was wondering if there was a way to make the rename operation atomic
> with Gluster 3.6. I have seen older posts on this list suggesting the use
> of the following configuration parameter:
>
>  cluster.extra-hash-regex: "(.*)\\.tmp"
>
>
>
> I did not see comments on the success or failure of that method, or a
> proposition for an alternative. So is atomic rename a dead end, even with
> Gluster 3.6 ?
>
> Thank you,
>
> Sebastien

Re: [Gluster-users] [Gluster-devel] Appending time to snap name in USS

2015-01-08 Thread Anand Avati
It would be convenient if the time were appended to the snap name on the fly
(when receiving the list of snap names from glusterd?) so that the timezone
can be applied dynamically (which is what users would expect).
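
For context, an illustrative example (not from the original message; the
volume name, mount path and the features.uss switch are my assumptions about
the usual setup) of where these names become visible to users:

# Enable user-serviceable snapshots on the volume:
gluster volume set gv0 features.uss enable

# The snapshot names under discussion appear in the virtual .snaps
# directory of the client mount:
ls /mnt/gv0/.snaps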

Thanks

On Thu Jan 08 2015 at 3:21:15 AM Poornima Gurusiddaiah 
wrote:

> Hi,
>
> Windows has a feature called shadow copy. This is widely used by all
> windows users to view the previous versions of a file.
> For shadow copy to work with glusterfs backend, the problem was that
> the clients expect snapshots to contain some format
> of time in their name.
>
> After evaluating the possible ways (asking the user to create the
> snapshot with some format of time in its name, and having a rename-snapshot
> command for existing snapshots), the following method seemed simpler.
>
> If the USS is enabled, then the creation time of the snapshot is
> appended to the snapname and is listed in the .snaps directory.
> The actual name of the snapshot is left unmodified. i.e. the  snapshot
> list/info/restore etc. commands work with the original snapname.
> The patch for the same can be found @http://review.gluster.org/#/c/9371/
>
> The impact is that users would see snap names in the ".snaps" folder that
> differ from what they created. Also, the current patch does not take care
> of the scenario where the snap name already has a time in its name.
>
> Eg:
> Without this patch:
> drwxr-xr-x 4 root root 110 Dec 26 04:14 snap1
> drwxr-xr-x 4 root root 110 Dec 26 04:14 snap2
>
> With this patch
> drwxr-xr-x 4 root root 110 Dec 26 04:14 snap1@GMT-2014.12.30-05.07.50
> drwxr-xr-x 4 root root 110 Dec 26 04:14 snap2@GMT-2014.12.30-23.49.02
>
> Please let me know if you have any suggestions or concerns on the same.
>
> Thanks,
> Poornima

Re: [Gluster-users] [Gluster-devel] Chaning position of md-cache in xlator graph

2014-10-21 Thread Anand Avati
On Tue, Oct 21, 2014 at 2:58 AM, Raghavendra Gowdappa 
wrote:

>
> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1138970



Posted a comment in the BZ

Re: [Gluster-users] Fwd: License option for the GlusterFS

2014-10-16 Thread Anand Avati
GlusterFS did undergo a few license changes over its history. We have finally
settled on a dual license, "GPL v2 / LGPL v3 or later", for all the code in
glusterfs.git outside contrib/.

Thanks

On Thu, Oct 16, 2014 at 12:36 AM, Zhou Ganhong  wrote:

>
>
> Hi, all
>
>I am confused by the license used by GlusterFS. There
> are different descriptions on the web site. Which one is correct? Thanks a
> lot.
>
>
>
>Here is the link that claims that the AGPL is applied.
>
>
> http://www.gluster.org/documentation/community/GNU_Affero_General_Public_License/
>
>
>
> But In these links,
> http://gluster.org/community/documentation/index.php/Developers and
> https://forge.gluster.org/glusterfs-core/pages/Home, the below is
> mentioned.  It seems that the client library is Dual License.
>
>
>
> License Change 
> - we recently changed the client library code to a dual license under the
> GPL v2 and the LGPL v3 or later
>
>
>
>
>
>// Gulf
>
>

Re: [Gluster-users] file corruption on Gluster 3.5.1 and Ubuntu 14.04

2014-09-07 Thread Anand Avati
The only reason O_APPEND gets stripped on the server side is because of
one of the following xlators:

- stripe
- quiesce
- crypt

If you have any of these, please try unloading/reconfiguring without these
features and try again.
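
An illustrative way to check whether any of those translators is loaded (not
from the original reply; the volume name is a placeholder and the glusterd
state directory is the usual default, which may differ on your system):

# Look for the three translators in the generated volfiles on a server:
grep -E 'stripe|quiesce|crypt' /var/lib/glusterd/vols/gv0/*.vol

# The volume type and reconfigured options are also visible here:
gluster volume info gv0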

Thanks


On Sat, Sep 6, 2014 at 3:31 PM, mike  wrote:

> I was able to narrow it down to smallish python script.
>
> I've attached that to the bug.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1138970
>
>
> On Sep 6, 2014, at 1:05 PM, Justin Clift  wrote:
>
> > Thanks Mike, this is good stuff. :)
> >
> > + Justin
> >
> >
> > On 06/09/2014, at 8:19 PM, mike wrote:
> >> I upgraded the client to Gluster 3.5.2, but there is no difference.
> >>
> >> The bug is almost certainly in the Fuse client. If I remount the
> filesystem with NFS, the problem is no longer observable.
> >>
> >> I spent a little time looking through the xlator/fuse-bridge to see
> where the offsets are coming from, but I'm really not familiar enough with
> the code, so it is slow going.
> >>
> >> Unfortunately, I'm still having trouble reproducing this in a python
> script that could be readily attached to a bug report.
> >>
> >> I'll take a crack at that again, but I will file a bug anyway for
> completeness.
> >>
> >> On Sep 5, 2014, at 7:10 PM, mike  wrote:
> >>
> >>> I have narrowed down the source of the bug.
> >>>
> >>> Here is an strace of glusterfsd http://fpaste.org/131455/40996378/
> >>>
> >>> The first line represents a write that does *not* make it into the
> underlying file.
> >>>
> >>> The last line is the write that stomps the earlier write.
> >>>
> >>> As I said, the client file is opened in O_APPEND mode, but on the
> glusterfsd side, the file is just O_CREAT|O_WRONLY. That means the offsets
> to pwrite() need to be valid.
> >>>
> >>> I correlated this to a tcpdump I took and I can see that in fact, the
> RPCs being sent have the wrong offset.  Interestingly,
> glusterfs.write-is-append = 0, which I wouldn't have expected.
> >>>
> >>> I think the bug lies in the glusterfs fuse client.
> >>>
> >>> As to your question about Gluster 3.5.2, I may be able to do that if I
> am unable to find the bug in the source.
> >>>
> >>> -Mike
> >>>
> >>> On Sep 5, 2014, at 6:16 PM, Justin Clift  wrote:
> >>>
>  On 06/09/2014, at 12:10 AM, mike wrote:
> > I have found that the O_APPEND flag is key to this failure - I had
> overlooked that flag when reading the strace and trying to cobble up a
> minimal reproduction.
> >
> > I now have a small pair of python scripts that can reliably
> reproduce this failure.
> 
> 
>  As a thought, is there a reasonable way you can test this on
> GlusterFS 3.5.2?
> 
>  There were some important bug fixes in 3.5.2 (from 3.5.1).
> 
>  Note I'm not saying yours is one of them, I'm just asking if it's
>  easy to test and find out. :)
> 
>  Regards and best wishes,
> 
>  Justin Clift
> 
>  --
>  GlusterFS - http://www.gluster.org
> 
>  An open source, distributed file system scaling to several
>  petabytes, and handling thousands of clients.
> 
>  My personal twitter: twitter.com/realjustinclift
> 
> >>>
> >>
> >
> > --
> > GlusterFS - http://www.gluster.org
> >
> > An open source, distributed file system scaling to several
> > petabytes, and handling thousands of clients.
> >
> > My personal twitter: twitter.com/realjustinclift
> >
>

Re: [Gluster-users] [Gluster-devel] Transparent encryption in GlusterFS: Implications on manageability

2014-08-13 Thread Anand Avati
+1 for all the points.


On Wed, Aug 13, 2014 at 11:22 AM, Jeff Darcy  wrote:

> > I.1 Generating the master volume key
> >
> >
> > Master volume key should be generated by user on the trusted machine.
> > Recommendations on master key generation provided at section 6.2 of
> > the manpages [1]. Generating of master volume key is in user's
> > competence.
>
> That was fine for an initial implementation, but it's still the single
> largest obstacle to adoption of this feature.  Looking forward, we need
> to provide full CLI support for generating keys in the necessary format,
> specifying their location, etc.
>
> >I.2 Location of the master volume key when mounting a
> >volume
> >
> >
> > At mount time the crypt translator searches for a master volume key on
> > the client machine at the location specified by the respective
> > translator option. If there is no any key at the specified location,
> > or the key at specified location is in improper format, then mount
> > will fail. Otherwise, the crypt translator loads the key to its
> > private memory data structures.
> >
> > Location of the master volume key can be specified at volume creation
> > time (see option "master-key", section 6.7 of the man pages [1]).
> > However, this option can be overridden by user at mount time to
> > specify another location, see section 7 of manpages [1], steps 6, 7,
> > 8.
>
> Again, we need to improve on this.  We should support this as a volume
> or mount option in its own right, not rely on the generic
> --xlator-option mechanism.  Adding options to mount.glusterfs isn't
> hard.  Alternatively, we could make this look like a volume option
> settable once through the CLI, even though the path is stored locally on
> the client.  Or we could provide a separate special-purpose
> command/script, which again only needs to be run once.  It would even be
> acceptable to treat the path to the key file (not its contents!) as a
> true volume option, stored on the servers.  Any of these would be better
> than requiring the user to understand our volfile format and
> construction so that they can add the necessary option by hand.
>
> >II. Check graph of translators on your client machine
> >after mount!
> >
> >
> > During mount your client machine receives configuration info from the
> > non-trusted server. In particular, this info contains the graph of
> > translators, which can be subjected to tampering, so that encryption
> > won't be invoked for your volume at all. So it is highly important to
> > verify this graph. After successful mount make sure that the graph of
> > translators contains the crypt translator with proper options (see
> > FAQ#1, section 11 of the manpages [1]).
>
> It is important to verify the graph, but not by poking through log files
> and not without more information about what to look for.  So we got a
> volfile that includes the crypt translator, with some options.  The
> *code* should ensure that the master-key option has the value from the
> command line or local config, and not some other.  If we have to add
> special support for this in otherwise-generic graph initialization code,
> that's fine.

Re: [Gluster-users] performance/writebehind behavior

2014-07-28 Thread Anand Avati
On Mon, Jul 28, 2014 at 10:43 AM, Richard van der Hoff <
rich...@swiftserve.com> wrote:

> On 28/07/14 18:05, Anand Avati wrote:
>
>> Whether flush-behind is enabled or not, close() will guarantee all
>> previous write()s on that fd have been acknowledged by server.
>>
>
> Thanks Anand. So can you explain why the 'wc' in my example doesn't see
> all of the data written by the dd?
>
>
I'm wondering if it is because of the attribute cache. Maybe the attribute
cache (either in fuse or gluster, I don't know yet) is not getting
invalidated for some reason. Try each of the following and check if either
of them makes the test work right:

#1 mount glusterfs with --attribute-timeout=0

#2 disable stat prefetch with: gluster volume set $name
performance.stat-prefetch off


That should help diagnose the problem further.
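
To make those two checks concrete, an illustrative sketch (the server, volume
and mount point names are placeholders):

# 1: remount the client with the FUSE attribute cache disabled:
mount -t glusterfs -o attribute-timeout=0 server1:/gv0 /mnt/gv0

# 2: turn off stat-prefetch (md-cache) on the volume:
gluster volume set gv0 performance.stat-prefetch off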

Thanks

Re: [Gluster-users] performance/writebehind behavior

2014-07-28 Thread Anand Avati
Whether flush-behind is enabled or not, close() will guarantee that all
previous write()s on that fd have been acknowledged by the server. It is just
the post-processing of close() itself which is performed in the background
when flush-behind is enabled. The word "flush" here is probably confusing, as
it is specific to FUSE terminology, where an app-triggered close() appears as
a flush fop.

Thanks


On Mon, Jul 28, 2014 at 9:58 AM, Richard van der Hoff <
rich...@swiftserve.com> wrote:

> Would anyone be able to help out with this question?
>
> Thanks
>
> Richard
>
>
>
> On 11/07/14 00:08, Pranith Kumar Karampuri wrote:
>
>> CC write-behind Dev
>> On 07/10/2014 11:59 PM, Richard van der Hoff wrote:
>>
>>> Hi folks,
>>>
>>> Just wondering if anyone could clear up a question about expected
>>> behavior for the performance/writebehind translator.
>>>
>>> I'm using Gluster 3.3, with a single volume which is distributed to
>>> two bricks on a pair of servers. I have performance.flush-behind=off;
>>> the documentation [1] leads me to expect (but doesn't say explicitly)
>>> that this will make close() block until the write has been flushed -
>>> but that isn't consistent with the behaviour I'm seeing:
>>>
>>> $ dd if=/dev/zero of=/shared/vod/zero bs=1024 count=1000; wc -c
>>> /shared/vod/zero
>>> 1000+0 records in
>>> 1000+0 records out
>>> 1024000 bytes (1.0 MB) copied, 0.133975 s, 7.6 MB/s
>>> 1016832 /shared/vod/zero
>>>
>>> As you can see, the wc doesn't see all of the data which has been
>>> written by dd.
>>>
>>> If anyone could clear up whether or not this is expected, I'd be
>>> grateful.
>>>
>>> Thanks
>>>
>>> Richard
>>>
>>> [1]
>>> http://gluster.org/community/documentation/index.php/
>>> Translators/performance/writebehind
>>>

Re: [Gluster-users] [Gluster-devel] Addition of GlusterFS Port Maintainers

2014-06-24 Thread Anand Avati
On Tue, Jun 24, 2014 at 10:43 AM, Justin Clift  wrote:

> On 24/06/2014, at 6:34 PM, Vijay Bellur wrote:
> > Hi All,
> >
> > Since there has been traction for ports of GlusterFS to other unix
> distributions, we thought of adding maintainers for the various ports that
> are around. I am glad to announce that the following individuals who have
> been chugging GlusterFS along on those distributions have readily agreed to
> be port maintainers. Please welcome:
> >
> > 1. Emmanuel Dreyfus as maintainer for NetBSD
> >
> > 2. Harshavardhana and Dennis Schafroth for Mac OS X
> >
> > 3. Harshavardhana as interim maintainer for FreeBSD
> >
> > All port maintainers will have commit access to GlusterFS repository and
> will manage patches in gerrit that are necessary for keeping the ports
> functional. We believe that this effort will help in keeping releases on
> various ports up to date.
> >
> > Let us extend our co-operation to port maintainers and help evolve a
> broader, more vibrant community for GlusterFS!
>
>
> Excellent stuff. :)
>
> + Justin
>

+1

Re: [Gluster-users] Painfully slow volume actions

2014-05-21 Thread Anand Avati
Is it possible that each of your bricks is in its own VM, and the VM system
drives (where /var/lib/glusterd resides) are all placed on the same host
drive? Glusterd updates happen synchronously even in the latest release, and
the change to use buffered writes + fsync went into master only recently.
 On May 21, 2014 1:25 AM, "Benjamin Kingston"  wrote:

> I'm trying to get gluster working on a test lab and had excellent success
> setting up a volume and 14 bricks on the first go around. However I
> realized the reasoning behind using a subdirectory in each brick and
> decommissioned the whole volume to start over. I also deleted the
> /var/lib/glusterd directory and removed/installed the necessary gluster
> packages. I'm running Scientific linux 6.4 with all recent updates.
>
> Upon recreating the new volume with the new brick, I found the process
> very very slow, about 2-3 minutes to process the change. Adding additional
> bricks also takes the same amount of time as well as simple set parameter
> actions. This was with a distributed volume with only one host involved.
>
> I notice in the cli log file, it constantly complains about not being able
> to guess the transport family, which an online search for the error or
> parts of the error only brought up issues that apply to older versions of
> gluster.
>
> One thing of note: I'm currently trying a distributed volume with a 2nd
> host, and actions are still very slow on the host containing the batch of
> bricks I'm trying to add; however, the other host, with no volumes at
> this time, runs gluster vol info volname very quickly. I will be trying to
> add bricks shortly, but since I had very quick response the first time
> around I'm hoping someone may be able to shed some light for me.
>

Re: [Gluster-users] 32 group limit

2014-05-19 Thread Anand Avati
David,
http://review.gluster.org/7501 will "fix" the problem (in 3.5.x when it
gets backported) by providing a workaround where you (the admin) will have
to set up user accounts on the server side which are capable of resolving
the list of group IDs for a user through a getpwuid() call, and it is the
admin's responsibility to keep these credentials synchronized (either using
a centralized directory/LDAP service or manually/by scripting).

In the last email I was referring to another approach where such an "extra
setup" is not necessary and the gluster native client and server can fetch
all the required groupIDs without any outside help.

There are benefits and problems with both approaches.
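
An illustrative check of that prerequisite (not from the original message;
the user name is a placeholder): each gluster server must be able to resolve
the same account and its full group list locally.

# Run on every gluster server:
getent passwd someuser
id -G someuser | wc -w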
HTH


On Mon, May 19, 2014 at 11:57 AM, David F. Robinson <
david.robin...@corvidtec.com> wrote:

>  Avati,
>
> I am slightly confused.  Should we be able to utilize 93-groups with the
> current (3.5.0-2) release of gluster and fuse?  Or, are we stuck with
> 32-groups until the fixes are released in the next version?
> It wasn't clear if the fixes were to take you up to the 93-group limit or
> beyond it...
>
> David
>
>
> -- Original Message --
> From: "Anand Avati" 
> To: "Niels de Vos" 
> Cc: "gluster-users" 
> Sent: 5/19/2014 2:50:54 PM
> Subject: Re: [Gluster-users] 32 group limit
>
>
>  On Mon, May 19, 2014 at 8:39 AM, Niels de Vos  wrote:
>>
>> The 32 limit you are hitting is caused by FUSE. The Linux kernel module
>> provides the groups of the process that accesses the FUSE-mountpoint
>> through /proc/$PID/status (line starting with 'Groups:'). The kernel
>> does not pass more groups than 32, this limit is hardcoded in the FUSE
>> kernel module.
>>
>
> Minor nit - 32 groupid limit is independent of FUSE. It is just the
> limited number of groupIDs the kernel exposes through proc. FUSE
> filesystems's only option is to use this sub-optimal interface to get group
> IDs. It is not unreasonable to actually implement a new reverse call in
> FUSE over /dev/fuse to resolve group-ids of a given request/process. Even
> then, the 400byte RPC limit of 93 would apply (we could at some point in
> the future move away from RPC)
>
> Avati
>
>

Re: [Gluster-users] 32 group limit

2014-05-19 Thread Anand Avati
On Mon, May 19, 2014 at 8:39 AM, Niels de Vos  wrote:
>
> The 32 limit you are hitting is caused by FUSE. The Linux kernel module
> provides the groups of the process that accesses the FUSE-mountpoint
> through /proc/$PID/status (line starting with 'Groups:'). The kernel
> does not pass more groups than 32, this limit is hardcoded in the FUSE
> kernel module.
>

Minor nit - the 32 group ID limit is independent of FUSE. It is just the
limited number of group IDs the kernel exposes through proc. A FUSE
filesystem's only option is to use this sub-optimal interface to get group
IDs. It is not unreasonable to actually implement a new reverse call in FUSE
over /dev/fuse to resolve the group IDs of a given request/process. Even
then, the 400-byte RPC limit of 93 groups would apply (we could at some
point in the future move away from RPC).

Avati

Re: [Gluster-users] [Gluster-devel] User-serviceable snapshots design

2014-05-08 Thread Anand Avati
On Thu, May 8, 2014 at 12:20 PM, Jeff Darcy  wrote:

> > They were: a) snap view generation requires privileged ops to
> > glusterd. So moving this task to the server side solves a lot of those
> > challenges.
>
> Not really.  A server-side component issuing privileged requests
> whenever a client asks it to is no more secure than a client-side
> component issuing them directly.


The client cannot ask the server-side component to perform any privileged
requests on its behalf. If it has the right to connect to the volume, then
it can issue a readdir() request and get served with whatever is served to
it. If it presents an unknown file handle, snapview-server returns ESTALE.


>  There needs to be some sort of
> authentication and authorization at the glusterd level (the only place
> these all converge).  This is a more general problem that we've had with
> glusterd for a long time.  If security is a sincere concern for USS,
> shouldn't we address it by trying to move the general solution forward?
>

The goal was to not make the security problem harder or worse. With this
design the privileged operation is still contained within the server side.
If clients were to issue RPCs to glusterd (to get list of snaps, their
volfiles etc.), it would have been a challenge for the general glusterd
security problem.

Re: [Gluster-users] [Gluster-devel] User-serviceable snapshots design

2014-05-08 Thread Anand Avati
On Thu, May 8, 2014 at 11:48 AM, Jeff Darcy  wrote:

> > client graph is not dynamically modified. the snapview-client and
> > protocol/server are inserted by volgen and no further changes are made on
> > the client side. I believe Anand was referring to " Adding a
> protocol/client
> > instance to connect to protocol/server at the daemon" as an action being
> > performed by volgen.
>
> OK, so let's say we create a new volfile including connections for a
> snapshot
> that didn't even exist when the client first mounted.  Are you saying we do
> a full graph switch to that new volfile?


No graph changes happen on either the client side or the server side. The
snapview-server will detect the availability of a new snapshot from glusterd,
spin up a new glfs_t for the corresponding snap, and start returning the new
list of "names" in readdir(), etc.


>  That still seems dynamic.  Doesn't
> that still mean we need to account for USS state when we regenerate the
> next volfile after an add-brick (for example)?  One way or another the
> graph's going to change, which creates a lot of state-management issues.
>

No volfile/graph changes at all. Creation/removal of snapshots is handled
in the form of a dynamic list of glfs_t's on the server side.

Re: [Gluster-users] [Gluster-devel] User-serviceable snapshots design

2014-05-08 Thread Anand Avati
On Thu, May 8, 2014 at 4:53 AM, Jeff Darcy  wrote:

> > > * How do clients find it?  Are we dynamically changing the client
> > >side graph to add new protocol/client instances pointing to new
> > >snapview-servers, or is snapview-client using RPC directly?  Are
> > >the snapview-server ports managed through the glusterd portmapper
> > >interface, or patched in some other way?
> > Adding a protocol/client instance to connect to protocol/server at the
> > daemon.
>
> So now the client graph is being dynamically modified, in ways that
> make it un-derivable from the volume configuration (because they're
> based in part on user activity since then)?  What happens if a normal
> graph switch (e.g. due to add-brick) happens?  I'll need to think some
> more about what this architectural change really means.


The client graph is not dynamically modified. The snapview-client and
protocol/server are inserted by volgen, and no further changes are made on
the client side. I believe Anand was referring to "Adding a
protocol/client instance to connect to protocol/server at the daemon" as an
action being performed by volgen.

Re: [Gluster-users] [Gluster-devel] User-serviceable snapshots design

2014-05-08 Thread Anand Avati
On Thu, May 8, 2014 at 4:48 AM, Jeff Darcy  wrote:

>
> If snapview-server runs on all servers, how does a particular client
> decide which one to use?  Do we need to do something to avoid hot spots?
>
> Overall, it seems like having clients connect *directly* to the snapshot
> volumes once they've been started might have avoided some complexity or
> problems.  Was this considered?
>

Yes, this was considered. I have mentioned the two reasons why this was
dropped in the other mail. They were: a) snap view generation requires
privileged ops to glusterd, so moving this task to the server side solves a
lot of those challenges; b) keeping a tab on the total number of connections
in the system, so they don't explode with more clients (given that there can
be lots of snapshots).


> > > * How does snapview-server manage user credentials for connecting
> > >to snap bricks?  What if multiple users try to use the same
> > >snapshot at the same time?  How does any of this interact with
> > >on-wire or on-disk encryption?
> >
> > No interaction with on-disk or on-wire encryption. Multiple users can
> > always access the same snapshot (volume) at the same time. Why do you
> > see any restrictions there?
>
> If we're using either on-disk or on-network encryption, client keys and
> certificates must remain on the clients.  They must not be on servers.
> If the volumes are being proxied through snapview-server, it needs
> those credentials, but letting it have them defeats both security
> mechanisms.
>

The encryption xlator sits on top of snapview-client on the client side,
and should be able to decrypt file content whether it comes from a snap view
or the main volume. Keys and certs remain on the client. But thanks for
mentioning this; we need to spin up an instance of the locks xlator on top
of snapview-server to satisfy the locking requests from crypt.

Avati

Re: [Gluster-users] [Gluster-devel] User-serviceable snapshots design

2014-05-08 Thread Anand Avati
On Thu, May 8, 2014 at 4:45 AM, Ira Cooper  wrote:

> Also inline.
>
> - Original Message -
>
> > The scalability factor I mentioned simply had to do with the core
> > infrastructure (depending on very basic mechanisms like the epoll wait
> > thread, the entire end-to-end flow of a single fop like say, a lookup()
> > here). Even though this was contained to an extent by the introduction
> > of the io-threads xlator in snapd, it is still a complex path that is
> > not exactly about high performance design. That wasn't the goal to begin
> > with.
>
> Yes, if you get rid of the daemon it doesn't have those issues ;).
>
> > I am not sure what the linear range versus a non-linear one has to do
> > with the design? Maybe you are seeing something that I miss. A random
> > gfid is generated in the snapview-server xlator on lookups. The
> > snapview-client is a kind of a basic redirector that detects when a
> > reference is made to a "virtual" inode (based on stored context) and
> > simply redirects to the snapd daemon. It stores the info returned from
> > snapview-server, capturing the essential inode info in the inode context
> > (note this is the client side inode we are talking abt).
>
> That last note, is merely a warning against changing the properties of the
> UUID generator, please ignore it.
>
> > In the daemon there is another level of translation which needs to
> > associate this gfid with an inode in the context of the protocol-server
> > xlator. The next step of the translation is that this inode needs to be
> > translated to the actual gfid on disk - that is the only on-disk gfid
> > which exists in one of the snapshotted gluster volumes. To that extent
> > the snapview-s xlator needs to know which is the glfs_t structure to
> > access so it can get to the right gfapi graph. Once it knows that, it
> > can access any object in that gfapi graph using the glfs_object (which
> > has the real inode info from the gfapi world and the actual on-disk
> gfid).
>
> No daemon!  SCRAP IT!  Throw it in the bin, and don't let it climb back
> out.
>
> What you are proposing: random gfid -> real gfid ; as the mapping the
> daemon must maintain.
>
> What I am proposing: real gfid + offset -> real gfid ; offset is a per
> snapshot value, local to the client.
>
> Because the lookup table is now trivial, a single integer per snapshot.
>  You don't need all that complex infrastructure.
>

The purpose for the existence of the daemon is twofold:

- client cannot perform privileged ops to glusterd regarding listing of
snaps etc.

- limit the total number of connections coming to bricks. If each client
has a new set of connections to each of the snapshot bricks, the total
number of connections in the system will become a function of the total
number of clients * total number of snapshots.

GFID management is completely orthogonal; we can use the current random GFID
or a more deterministic one (which is going to require a LOT more changes to
make GFIDs deterministic, and what about already assigned ones, etc.),
whether the .snaps view is generated on the client side or the server side.

Re: [Gluster-users] [Gluster-devel] Status on Gluster on OS X (10.9)

2014-04-04 Thread Anand Avati
I did now. I'd recommend adding a check for libintl.h in configure.ac and
failing gracefully, suggesting that gettext be installed.

Thanks


On Fri, Apr 4, 2014 at 10:59 PM, Dennis Schafroth wrote:

>
> On 05 Apr 2014, at 07:38 , Anand Avati  wrote:
>
> And here:
>
> ./gf-error-codes.h:12:10: fatal error: 'libintl.h' file not found
>
>
> I guess I was wrong that gettext / libintl.h was not required. It seems to
> be in use in logging.c
>
> Until I figure out if this is the case, I would suggest installing gettext.
>
> cheers,
> :-Dennis
>
>
>
> On Fri, Apr 4, 2014 at 10:15 PM, Dennis Schafroth wrote:
>
>>
>> Pushed a fix to make it work without gettext / libintl header.
>>
>> I compiled without the CFLAGS and LDFLAGS
>>
>
> Hmm. Apparently not.
>
>
>
>> cheers,
>> :-Dennis
>>
>> On 05 Apr 2014, at 07:04 , Dennis Schafroth  wrote:
>>
>>
>> Bummer.
>>
>> That is from gettext which I thought was only optional.
>>
>> I got it using either Homebrew (http://brew.sh/) or macports
>>
>> Homebrew seems quite good these days I would prob. recommend that.
>>
>> It will install using a one-liner in /usr/local, but will require sudo
>> rights along the way to set permissions
>>
>> brew install gettext
>>
>> It will require setting some CFLAGS / LDFLAGS when ./configure:
>> LDFLAGS=-L/usr/local/opt/gettext/lib
>> CPPFLAGS=-I/usr/local/opt/gettext/include
>>
>> cheers,
>> :-Dennis
>>
>> On 05 Apr 2014, at 06:56 , Anand Avati  wrote:
>>
>> Build fails for me:
>>
>> Making all in libglusterfs
>> Making all in src
>>   CC   libglusterfs_la-dict.lo
>>   CC   libglusterfs_la-xlator.lo
>>   CC   libglusterfs_la-logging.lo
>> logging.c:26:10: fatal error: 'libintl.h' file not found
>> #include <libintl.h>
>>  ^
>> 1 error generated.
>> make[4]: *** [libglusterfs_la-logging.lo] Error 1
>> make[3]: *** [all] Error 2
>> make[2]: *** [all-recursive] Error 1
>> make[1]: *** [all-recursive] Error 1
>> make: *** [all] Error 2
>>
>>
>> How did you get libintl.h in your system? Also, please add a check for it
>> in configure.ac and report the missing package.
>>
>> Thanks,
>>
>>
>> On Fri, Apr 4, 2014 at 6:08 PM, Dennis Schafroth 
>> wrote:
>>
>>>
>>> It's been quiet on this topic, but actually Harshavardhana and I have
>>> been quite busy off-line working on this. Since my initial "success" we
>>> have been able to get it  to compile with clang (almost as clean as with
>>> gcc) and actually run. The later was a bit tricky because clang has more
>>> strict strategy about exporting functions with inline, which ended with
>>> many runs with missing functions.
>>>
>>> So right now I can run everything, but there is an known issue with
>>> NFS/NLM4, but this should not matter for people trying to run the client
>>> with OSX FUSE.
>>>
>>> Anyone brave enough wanting to try the *client* can check out:
>>>
>>> Still need Xcode + command line tools (clang, make)
>>> A installed OSXFUSE (FUSE for OS X)
>>>
>>> $ git clone g...@forge.gluster.org
>>> :~schafdog/glusterfs-core/osx-glusterfs.git
>>> $ cd osx-glusterfs
>>>
>>> Either
>>> $ ./configure.osx
>>> Or
>>> - $ ./autogen.sh (requires aclocal, autoconf, automake)
>>> - $ ./configure
>>>
>>> $ make
>>> $ sudo make install
>>>
>>> You should be able to mount using: sudo glusterfs --volfile=<volume file>.vol <mount point>
>>>
>>> And yes this is very much bleeding edge. My mac did kernel panic
>>> yesterday, when it was running both client and server.
>>>
>>> I would really like to get feed back from anyone trying this out.
>>>
>>> cheers,
>>> :-Dennis Schafroth
>>>
>>>

Re: [Gluster-users] [Gluster-devel] Status on Gluster on OS X (10.9)

2014-04-04 Thread Anand Avati
And here:

./gf-error-codes.h:12:10: fatal error: 'libintl.h' file not found


On Fri, Apr 4, 2014 at 10:15 PM, Dennis Schafroth wrote:

>
> Pushed a fix to make it work without gettext / libintl header.
>
> I compiled without the CFLAGS and LDFLAGS
>
> cheers,
> :-Dennis
>
> On 05 Apr 2014, at 07:04 , Dennis Schafroth  wrote:
>
>
> Bummer.
>
> That is from gettext which I thought was only optional.
>
> I got it using either Homebrew (http://brew.sh/) or macports
>
> Homebrew seems quite good these days I would prob. recommend that.
>
> It will install using a one-liner in /usr/local, but will require sudo
> rights along the way to set permissions
>
> brew install gettext
>
> It will require setting some CFLAGS / LDFLAGS when ./configure:
> LDFLAGS=-L/usr/local/opt/gettext/lib
> CPPFLAGS=-I/usr/local/opt/gettext/include
>
> cheers,
> :-Dennis
>
> On 05 Apr 2014, at 06:56 , Anand Avati  wrote:
>
> Build fails for me:
>
> Making all in libglusterfs
> Making all in src
>   CC   libglusterfs_la-dict.lo
>   CC   libglusterfs_la-xlator.lo
>   CC   libglusterfs_la-logging.lo
> logging.c:26:10: fatal error: 'libintl.h' file not found
> #include <libintl.h>
>  ^
> 1 error generated.
> make[4]: *** [libglusterfs_la-logging.lo] Error 1
> make[3]: *** [all] Error 2
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all] Error 2
>
>
> How did you get libintl.h in your system? Also, please add a check for it
> in configure.ac and report the missing package.
>
> Thanks,
>
>
> On Fri, Apr 4, 2014 at 6:08 PM, Dennis Schafroth 
> wrote:
>
>>
>> It's been quiet on this topic, but actually Harshavardhana and I have
>> been quite busy off-line working on this. Since my initial "success" we
>> have been able to get it  to compile with clang (almost as clean as with
>> gcc) and actually run. The later was a bit tricky because clang has more
>> strict strategy about exporting functions with inline, which ended with
>> many runs with missing functions.
>>
>> So right now I can run everything, but there is an known issue with
>> NFS/NLM4, but this should not matter for people trying to run the client
>> with OSX FUSE.
>>
>> Anyone brave enough wanting to try the *client* can check out:
>>
>> Still need Xcode + command line tools (clang, make)
>> A installed OSXFUSE (FUSE for OS X)
>>
>> $ git clone g...@forge.gluster.org
>> :~schafdog/glusterfs-core/osx-glusterfs.git
>> $ cd osx-glusterfs
>>
>> Either
>> $ ./configure.osx
>> Or
>> - $ ./autogen.sh (requires aclocal, autoconf, automake)
>> - $ ./configure
>>
>> $ make
>> $ sudo make install
>>
>> You should be able to mount using: sudo glusterfs --volfile=<volume file>.vol <mount point>
>>
>> And yes this is very much bleeding edge. My mac did kernel panic
>> yesterday, when it was running both client and server.
>>
>> I would really like to get feed back from anyone trying this out.
>>
>> cheers,
>> :-Dennis Schafroth
>>
>>

Re: [Gluster-users] Status on Gluster on OS X (10.9)

2014-04-04 Thread Anand Avati
Build fails for me:

Making all in libglusterfs
Making all in src
  CC   libglusterfs_la-dict.lo
  CC   libglusterfs_la-xlator.lo
  CC   libglusterfs_la-logging.lo
logging.c:26:10: fatal error: 'libintl.h' file not found
#include <libintl.h>
 ^
1 error generated.
make[4]: *** [libglusterfs_la-logging.lo] Error 1
make[3]: *** [all] Error 2
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2


How did you get libintl.h in your system? Also, please add a check for it
in configure.ac and report the missing package.

Thanks,


On Fri, Apr 4, 2014 at 6:08 PM, Dennis Schafroth wrote:

>
> It's been quiet on this topic, but actually Harshavardhana and I have been
> quite busy off-line working on this. Since my initial "success" we have
> been able to get it to compile with clang (almost as clean as with gcc)
> and actually run. The latter was a bit tricky because clang has a stricter
> strategy about exporting inline functions, which ended with many runs
> failing due to missing functions.
>
> So right now I can run everything, but there is a known issue with
> NFS/NLM4; this should not matter for people trying to run the client
> with OSX FUSE.
>
> Anyone brave enough wanting to try the *client* can check out:
>
> Still need Xcode + command line tools (clang, make)
> A installed OSXFUSE (FUSE for OS X)
>
> $ git clone g...@forge.gluster.org
> :~schafdog/glusterfs-core/osx-glusterfs.git
> $ cd osx-glusterfs
>
> Either
> $ ./configure.osx
> Or
> - $ ./autogen.sh (requires aclocal, autoconf, automake)
> - $ ./configure
>
> $ make
> $ sudo make install
>
> You should be able to mount using: sudo glusterfs --volfile=<volume file>.vol <mount point>
>
> And yes this is very much bleeding edge. My mac did kernel panic
> yesterday, when it was running both client and server.
>
> I would really like to get feed back from anyone trying this out.
>
> cheers,
> :-Dennis Schafroth
>
>

Re: [Gluster-users] How to compile gluster 3.4 on Mac OS X 10.8.4?

2014-03-19 Thread Anand Avati
(moving to gluster-devel@)

That is great progress! Please keep posting the intermediate work upstream
(into gerrit) as you move along.

Regarding the hang: does cli.log print anything at all (typically
/var/log/glusterfs/cli.log)?

Avati



On Wed, Mar 19, 2014 at 5:07 PM, Dennis Schafroth wrote:

>
> I now have a branch of HEAD compiling under OS X 10.9, when I disable the
> qemu-block and fusermount options.
>
> Still having a build issue with libtool and libspl, which I have only
> hacked my way around.
>
> Actually both glusterd and gluster run, but using gluster (OS X)
> hangs on both pool list and peer probe. However, probing
> glusterd from a Linux box succeeds. But glusterd's log does indicate some issue.
>
> cheers,
> :-Dennis Schafroth
>
>

Re: [Gluster-users] gfid files which are not hard links anymore

2014-03-12 Thread Anand Avati
The most likely reason is that someone deleted these files manually from the
brick directories. You must never access/modify data in the brick
directories directly; all modifications must happen through a gluster
client mount point. You may inspect the file contents to figure out whether
you still need those files. In their current position, those files can never
be absorbed back into the gluster volume.
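
As an illustrative check (the paths are placeholders, and this is read-only
inspection, not a modification of the brick): the trusted.gfid xattr of a
regular file names its hard link under .glusterfs/xx/yy/.

# Read the gfid of a file via its brick path:
getfattr -n trusted.gfid -e hex /data/brick1/path/to/file

# A gfid of 0xb187389f06884828b02ae6f6e1daa4ea corresponds to the hard link
# .glusterfs/b1/87/b187389f-0688-4828-b02a-e6f6e1daa4ea on the same brick.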

Avati


On Wed, Mar 12, 2014 at 3:48 AM, Chiku  wrote:

> Hello,
>
> I have a few questions about the .glusterfs/xx/xx/ folders.
> As I understand it, inside these folders there are files which are
> hard links to regular files in the volume, and the filename matches the
> trusted.gfid of the regular file linked.
> Am I missing anything else?
>
> I have a replicated volume with 3 nodes and right now the self-heal info
> doesn't know.
> with find . -path -type f -links -2, I find for :
> node 1 : 0
> node 2 : 2 files
> ./.glusterfs/b1/87/b187389f-0688-4828-b02a-e6f6e1daa4ea
> ./.glusterfs/c0/06/c006fe90-43ce-43d1-a1ea-ae2db0f04637
>
> -rw-r--r-- 1 1019 abc 206 Feb 17 02:50 ./.glusterfs/b1/87/b187389f-
> 0688-4828-b02a-e6f6e1daa4ea
>
> node 3 : 644 files
>
> I check for node 2, these 2 gfid files don't match any trusted.gfid.
> find . -noleaf -ignore_readdir_race -path ./.glusterfs -prune -o -type f
> -print0 |xargs -0 getfattr -m . -n trusted.gfid -e hex |grep 'e6f6e1daa4ea'
> On node 3, those gfid files don't match any trusted.gfid
>
> Can I remove those gfid files?
> Did that happen because someone removed the regular files without removing
> the gfid files?

Re: [Gluster-users] iSCSI and Gluster

2014-03-05 Thread Anand Avati
Can you please post some logs (the logs of the client which is exporting
iSCSI)? It is hard to diagnose issues without logs.

thanks,
Avati


On Wed, Mar 5, 2014 at 9:28 AM, Carlos Capriotti  wrote:

> Hi all. Again.
>
> I am still fighting that "VMware esxi cannot use striped gluster volumes"
> thing, and a couple of crazy ideas are coming to mind.
>
> One of them is using iSCSI WITH gluster, and esxi connecting via iSCSI.
>
> My experience with iSCSI is limited to a couple of FreeNAS test installs,
> and some tuning on FreeNAS and esxi in order to implement multipathing, but
> nothing dead serious.
>
> I remember that after creating a volume and formatting it (zvol), THEN
> space was allocated to iSCSI. Makes some sense, since iSCSI is a block
> device, and after it is available, the operating system will actually use
> it. But it is a bit foggy.
>
> I am trying to bypass the present limitation on Gluster, which refuses to
> talk to esxi using a striped volume.
>
> So, here is the question: anyone here uses gluster and iSCSI ?
>
> Would anyone care to comment on performance of this kind of solution, pros
> and cons ?
>
> Thanks.
>

Re: [Gluster-users] libgfapi consistency model

2014-02-05 Thread Anand Avati
Jay,
there are a few parts to consistency.

- file data consistency: libgfapi by itself does not perform any file data
caching; it is entirely dependent on the set of translators (write-behind,
io-cache, read-ahead, quick-read) that are loaded, and the effect of those
xlators is the same in both FUSE and libgfapi.

- inode attribute/xattr (metadata) consistency (the thing you tune with
--attribute-timeout=N in FUSE): again, libgfapi does not perform any
metadata caching by itself; it depends on whether you have loaded the
md-cache (stat-prefetch) translator.

- entry consistency: this is remembering dentries (e.g., "the name
'file.txt' under the directory having gfid 12345 maps to the file with gfid
48586", or "the name 'cat.jpg' under the directory having gfid 456346 does
not exist or map to any inode", etc.) and is similar to the thing you tune
with --entry-timeout=N in FUSE. libgfapi remembers such dentries in an
optimistic way, such that the path resolver re-uses the knowledge for the
next path resolution call. However, the last component of a path is always
resolved "uncached" (even if the entry is available in the cache), and upon
any ESTALE error the entire path resolution + fop is re-attempted in a
purely uncached mode. This approach is very similar to the retry-based
optimistic path resolution in the more recent Linux kernel VFS.

HTH
Avati


On Wed, Feb 5, 2014 at 8:31 AM, Jay Vyas  wrote:

> Hi gluster !
>
> How does libgfapi enforce filesystem consistency? Is it better at doing
> this than existing FUSE mounts, which require the *timeout parameters to be
> set to 0?
>
> Thanks!
>
> This is important to us in hadoopland.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>

Re: [Gluster-users] Passing noforget option to glusterfs native client mounts

2013-12-24 Thread Anand Avati
Hi,
Allowing the noforget option to FUSE will not help your cause. Gluster
presents the address of the inode_t as the nodeid to FUSE. In turn, FUSE
creates a filehandle using this nodeid for knfsd to export to the NFS
client. When knfsd fails over to another server, FUSE will decode the handle
encoded by the other NFS server and try to use the nodeid of the other
server - which will obviously not work, as the virtual address from the
glusterfs process on the other server is not valid here.

Short version: the file handle generated through FUSE is not durable. The
"noforget" option in FUSE is a hack to avoid ESTALE messages caused by
dcache pruning. If you have enough inodes in your volume, your system will
go OOM at some point. "noforget" is NOT a solution for providing NFS
failover to a different server.

For reasons such as these, we ended up implementing our own NFS server
where we encode a filehandle using the GFID (which is durable across
reboots and server failovers). I would strongly recommend NOT using knfsd
with any FUSE-based filesystem (not just glusterfs) for serious
production use, and it will just not work if you are designing for NFS high
availability/fail-over.
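
For what it's worth, a sketch of the alternative suggested above (the server,
volume and mount point names are placeholders): mount through Gluster's own
NFS server, which speaks NFSv3, instead of re-exporting a FUSE mount via
knfsd. Since nfs.disable is OFF on your volumes, that server should already
be running.

# Mount the volume via the built-in Gluster NFS server:
mount -t nfs -o vers=3 server1:/myvolume /mnt/myvolume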

Thanks,
Avati


On Sat, Dec 21, 2013 at 8:52 PM, Anirban Ghoshal <
chalcogen_eg_oxy...@yahoo.com> wrote:

> If somebody has an idea on how this could be done, could you please help
> out? I am still stuck on this, apparently...
>
> Thanks,
> Anirban
>
>
>   On Thursday, 19 December 2013 1:40 AM, Chalcogen <
> chalcogen_eg_oxy...@yahoo.com> wrote:
>   P.s. I think I need to clarify this:
>
> I am only reading from the mounts, and not modifying anything on the
> server, so the commonest causes of stale file handles do not apply.
>
> Anirban
>
> On Thursday 19 December 2013 01:16 AM, Chalcogen wrote:
>
> Hi everybody,
>
> A few months back I joined a project where people want to replace their
> legacy fuse-based (twin-server) replicated file-system with GlusterFS. They
> also have a high-availability NFS server code tagged with the kernel NFSD
> that they would wish to retain (the nfs-kernel-server, I mean). The reason
> they wish to retain the kernel NFS and not use the NFS server that comes
> with GlusterFS is mainly because there's this bit of code that allows NFS
> IP's to be migrated from one host server to the other in the case that one
> happens to go down, and tweaks on the export server configuration allow the
> file-handles to remain identical on the new host server.
>
> The solution was to mount gluster volumes using the mount.glusterfs native
> client program and then export the directories over the kernel NFS server.
> This seems to work most of the time, but on rare occasions, 'stale file
> handle' is reported off certain clients, which really puts a damper over
> the 'high-availability' thing. After suitably instrumenting the nfsd/fuse
> code in the kernel, it seems that decoding of the file-handle fails on the
> server because the inode record corresponding to the nodeid in the handle
> cannot be looked up. Combining this with the fact that a second attempt by
> the client to execute lookup on the same file passes, one might suspect
> that the problem is identical to what many people attempting to export fuse
> mounts over the kernel's NFS server are facing; viz, fuse 'forgets' the
> inode records thereby causing ilookup5() to fail. Miklos and other fuse
> developers/hackers would point towards '-o noforget' while mounting their
> fuse file-systems.
>
> I tried passing  '-o noforget' to mount.glusterfs, but it does not seem to
> recognize it. Could somebody help me out with the correct syntax to pass
> noforget to gluster volumes? Or, something we could pass to glusterfs that
> would instruct fuse to allocate a bigger cache for our inodes?
>
> Additionally, should you think that something else might be behind our
> problems, please do let me know.
>
> Here's my configuration:
>
> Linux kernel version: 2.6.34.12
> GlusterFS version: 3.4.0
> nfs.disable option for volumes: OFF on all volumes
>
> Thanks a lot for your time!
> Anirban
>
> P.s. I found quite a few pages on the web that admonish users that
> GlusterFS is not compatible with the kernel NFS server, but do not really
> give much detail. Is this one of the reasons for saying so?
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gerrit doesn't use HTTPS

2013-12-14 Thread Anand Avati
On Sat, Dec 14, 2013 at 5:58 AM, James  wrote:

> On Sat, Dec 14, 2013 at 3:28 AM, Vijay Bellur  wrote:
> > On 12/13/2013 04:05 AM, James wrote:
> >>
> >> I just noticed that the Gluster Gerrit [1] doesn't use HTTPS!
> >>
> >> Can this be fixed ASAP?
> >>
> >
> > Configured now, thanks!
> Thanks for looking into this promptly!
>
> >
> > Please check and let us know if you encounter any problems with https.
> 1) None of the CN information (name, location, etc) has been filled
> in... Either that or I'm hitting a MITM (less likely).
>
> 2) Ideally the certificate would be signed. If it's not signed, you
> should at least publish the "correct" signature somewhere we trust.
>
> If you need help wrangling any of the SSL, I'm happy to help!
>

IIRC we should have a CA-signed cert for *.gluster.org. Copying JM.
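
For reference, a quick way to inspect the certificate that is actually being
served (an illustrative check, not part of the original exchange):

  echo | openssl s_client -connect review.gluster.org:443 \
      -servername review.gluster.org 2>/dev/null \
      | openssl x509 -noout -subject -issuer -dates -fingerprint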

Avati

>
> > -Vijay
> >
> Thanks!
>
> James
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gluster fails under heavy array job load load

2013-12-12 Thread Anand Avati
Please provide the full client and server logs (in a bug report). The
snippets give some hints, but are not very meaningful without the full
context/history since mount time (they have after-the-fact symptoms, but
not the part that shows why the disconnects happened).

Even before looking into the full logs here are some quick observations:

- write-behind-window-size = 1024MB seems *excessively* high. Please set
this to 1MB (default) and check if the stability improves.

- I see RDMA is enabled on the volume. Are you mounting clients through
RDMA? If so, for the purpose of diagnostics can you mount through TCP and
check whether the stability improves? If you are using RDMA with such a high
write-behind-window-size, spurious ping-timeouts are an almost certainty
during heavy writes. The RDMA driver has limited flow control, and setting
such a high window-size can easily congest all the RDMA buffers resulting
in spurious ping-timeouts and disconnections.
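
A minimal sketch of those two diagnostic changes (volume and brick host names
are taken from the volume info below; the mount point is illustrative):

  # drop write-behind back to the 1MB default and confirm it took effect
  gluster volume set gl performance.write-behind-window-size 1MB
  gluster volume info gl | grep write-behind
  # remount a client over TCP instead of RDMA for the test
  mount -t glusterfs -o transport=tcp bs1:/gl /mnt/gl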

Avati


On Thu, Dec 12, 2013 at 5:03 PM, harry mangalam wrote:

>  Hi All,
>
> (Gluster Volume Details at bottom)
>
>
>
> I've posted some of this previously, but even after various upgrades,
> attempted fixes, etc, it remains a problem.
>
>
>
>
>
> Short version: Our gluster fs (~340TB) provides scratch space for a
> ~5000core academic compute cluster.
>
> Much of our load is streaming IO, doing a lot of genomics work, and that
> is the load under which we saw this latest failure.
>
> Under heavy batch load, especially array jobs, where there might be
> several 64core nodes doing I/O on the 4servers/8bricks, we often get job
> failures that have the following profile:
>
>
>
> Client POV:
>
> Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all
> compute nodes that indicated interaction with the user's files
>
> 
>
>
>
> Here are some client Info logs that seem fairly serious:
>
> 
>
>
>
> The errors that referenced this user were gathered from all the nodes that
> were running his code (in compute*) and agglomerated with:
>
>
>
> cut -f2,3 -d']' compute* |cut -f1 -dP | sort | uniq -c | sort -gr
>
>
>
> and placed here to show the profile of errors that his run generated.
>
> 
>
>
>
> so 71 of them were:
>
> W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote
> operation failed: Transport endpoint is not connected.
>
> etc
>
>
>
> We've seen this before and previously discounted it bc it seems to have
> been related to the problem of spurious NFS-related bugs, but now I'm
> wondering whether it's a real problem.
>
> Also the 'remote operation failed: Stale file handle. ' warnings.
>
>
>
> There were no Errors logged per se, tho some of the W's looked fairly
> nasty, like the 'dht_layout_dir_mismatch'
>
>
>
> From the server side, however, during the same period, there were:
>
> 0 Warnings about this user's files
>
> 0 Errors
>
> 458 Info lines
>
> of which only 1 line was not a 'cleanup' line like this:
>
> ---
>
> 10.2.7.11:[2013-12-12 21:22:01.064289] I
> [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on
> /path/to/file
>
> ---
>
> it was:
>
> ---
>
> 10.2.7.14:[2013-12-12 21:00:35.209015] I
> [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server:
> 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030
> (c9488341-c063-4175-8492-75e2e282f690) ==> trusted.glusterfs.dht
>
> ---
>
>
>
> We're losing about 10% of these kinds of array jobs bc of this, which is
> just not supportable.
>
>
>
>
>
>
>
> Gluster details
>
>
>
> servers and clients running gluster 3.4.0-8.el6 over QDR IB, IPoIB, thru 2
> Mellanox, 1 Voltaire switches, Mellanox cards, CentOS 6.4
>
>
>
> $ gluster volume info
>
>  Volume Name: gl
>
> Type: Distribute
>
> Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
>
> Status: Started
>
> Number of Bricks: 8
>
> Transport-type: tcp,rdma
>
> Bricks:
>
> Brick1: bs2:/raid1
>
> Brick2: bs2:/raid2
>
> Brick3: bs3:/raid1
>
> Brick4: bs3:/raid2
>
> Brick5: bs4:/raid1
>
> Brick6: bs4:/raid2
>
> Brick7: bs1:/raid1
>
> Brick8: bs1:/raid2
>
> Options Reconfigured:
>
> performance.write-behind-window-size: 1024MB
>
> performance.flush-behind: on
>
> performance.cache-size: 268435456
>
> nfs.disable: on
>
> performance.io-cache: on
>
> performance.quick-read: on
>
> performance.io-thread-count: 64
>
> auth.allow: 10.2.*.*,10.1.*.*
>
>
>
>
>
> 'gluster volume status gl detail':
>
> 
>
>
>
> ---
>
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
>
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>
> ---
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-

Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Anand Avati
Looks like your issue was fixed by patch http://review.gluster.org/4989/ in
the master branch. Backporting this to release-3.4 now.

Thanks!
Avati


On Thu, Dec 12, 2013 at 1:26 PM, Anand Avati  wrote:

> I have the same question. Do you have an excessively high --entry-timeout
> parameter on your FUSE mount? In any case, the "Structure needs cleaning" error
> should not surface up to FUSE and that is still a bug.
>
>
> On Thu, Dec 12, 2013 at 12:46 PM, Maik Kulbe <
> i...@linux-web-development.de> wrote:
>
>> How do you mount your Client? FUSE? I had similar problems when playing
>> around with the timeout options for the FUSE mount. If they are too high
>> they cache the metadata for too long. When you move the file the inode
>> should stay the same and on the second node the path should stay in cache
>> for a while so it still knows the inode for that moved files old path thus
>> can act on the file without knowing it's path.
>>
>> The problems kick in when you delete a file and recreate it - the cache
>> tries to access the old inode, which was deleted, thus throwing errors. If
>> I recall correctly the "structure needs cleaning" is one of two error
>> messages I got, depending on which of the timeout mount options was set to
>> a higher value.
>>
>> -Original Mail-
>> From: Johan Huysmans [johan.huysm...@inuits.be]
>> Sent: 12.12.13 - 14:51:35
>> To: gluster-users@gluster.org [gluster-users@gluster.org]
>>
>> Subject: Re: [Gluster-users] Structure needs cleaning on some files
>>
>>
>>  I created a bug for this issue:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1041109
>>>
>>> gr.
>>> Johan
>>>
>>> On 10-12-13 12:52, Johan Huysmans wrote:
>>>
>>> Hi All,
>>>
>>> It seems I can easily reproduce the problem.
>>>
>>> * on node 1 create a file (touch , cat , ...).
>>> * on node 2 take md5sum of direct file (md5sum /path/to/file)
>>> * on node 1 move file to other name (mv file file1)
>>> * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
>>> still working although the file is not really there
>>> * on node 1 change file content
>>> * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
>>> still working and has a changed md5sum
>>>
>>> This is really strange behaviour.
>>> Is this normal? Can this be altered with a setting?
>>>
>>> Thanks for any info,
>>> gr.
>>> Johan
>>>
>>> On 10-12-13 10:02, Johan Huysmans wrote:
>>>
>>> I could reproduce this problem while my mount point is running in
>>> debug mode.
>>> logfile is attached.
>>>
>>> gr.
>>> Johan Huysmans
>>>
>>> On 10-12-13 09:30, Johan Huysmans wrote:
>>>
>>> Hi All,
>>>
>>> When reading some files we get this error:
>>> md5sum: /path/to/file.xml: Structure needs cleaning
>>>
>>> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
>>> [2013-12-10 08:07:32.256910] W
>>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
>>> remote operation failed: No such file or directory
>>> [2013-12-10 08:07:32.257436] W
>>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
>>> remote operation failed: No such file or directory
>>> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
>>> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure
>>> needs cleaning)
>>>
>>> We are using gluster 3.4.1-3 on CentOS6.
>>> Our servers are 64-bit, our clients 32-bit (we are already using
>>> --enable-ino32 on the mountpoint)
>>>
>>> This is my gluster configuration:
>>> Volume Name: testvolume
>>> Type: Replicate
>>> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: SRV-1:/gluster/brick1
>>> Brick2: SRV-2:/gluster/brick2
>>> Options Reconfigured:
>>> performance.force-readdirp: on
>>> performance.stat-prefetch: off
>>> network.ping-timeout: 5
>>>
>>> And this is how the applications work:
>>> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
>>> On 1 client node we have an application which writes files.
>>> On the other client node we have a applic

Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Anand Avati
I have the same question. Do you have an excessively high --entry-timeout
parameter on your FUSE mount? In any case, the "Structure needs cleaning" error
should not surface up to FUSE and that is still a bug.
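
One way to rule the FUSE metadata caches in or out is to remount with the
timeouts set to zero (server and volume names are from the configuration
quoted below; the mount point is inferred from the log path and is
illustrative):

  mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 \
      SRV-1:/testvolume /mnt/sharedfs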


On Thu, Dec 12, 2013 at 12:46 PM, Maik Kulbe
wrote:

> How do you mount your Client? FUSE? I had similar problems when playing
> around with the timeout options for the FUSE mount. If they are too high
> they cache the metadata for too long. When you move the file the inode
> should stay the same and on the second node the path should stay in cache
> for a while so it still knows the inode for that moved files old path thus
> can act on the file without knowing it's path.
>
> The problems kick in when you delete a file and recreate it - the cache
> tries to access the old inode, which was deleted, thus throwing errors. If
> I recall correctly the "structure needs cleaning" is one of two error
> messages I got, depending on which of the timeout mount options was set to
> a higher value.
>
> -Original Mail-
> From: Johan Huysmans [johan.huysm...@inuits.be]
> Sent: 12.12.13 - 14:51:35
> To: gluster-users@gluster.org [gluster-users@gluster.org]
>
> Subject: Re: [Gluster-users] Structure needs cleaning on some files
>
>
>  I created a bug for this issue:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1041109
>>
>> gr.
>> Johan
>>
>> On 10-12-13 12:52, Johan Huysmans wrote:
>>
>> Hi All,
>>
>> It seems I can easily reproduce the problem.
>>
>> * on node 1 create a file (touch , cat , ...).
>> * on node 2 take md5sum of direct file (md5sum /path/to/file)
>> * on node 1 move file to other name (mv file file1)
>> * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
>> still working although the file is not really there
>> * on node 1 change file content
>> * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
>> still working and has a changed md5sum
>>
>> This is really strange behaviour.
>> Is this normal? Can this be altered with a setting?
>>
>> Thanks for any info,
>> gr.
>> Johan
>>
>> On 10-12-13 10:02, Johan Huysmans wrote:
>>
>> I could reproduce this problem while my mount point is running in
>> debug mode.
>> logfile is attached.
>>
>> gr.
>> Johan Huysmans
>>
>> On 10-12-13 09:30, Johan Huysmans wrote:
>>
>> Hi All,
>>
>> When reading some files we get this error:
>> md5sum: /path/to/file.xml: Structure needs cleaning
>>
>> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
>> [2013-12-10 08:07:32.256910] W
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.257436] W
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
>> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure
>> needs cleaning)
>>
>> We are using gluster 3.4.1-3 on CentOS6.
>> Our servers are 64-bit, our clients 32-bit (we are already using
>> --enable-ino32 on the mountpoint)
>>
>> This is my gluster configuration:
>> Volume Name: testvolume
>> Type: Replicate
>> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: SRV-1:/gluster/brick1
>> Brick2: SRV-2:/gluster/brick2
>> Options Reconfigured:
>> performance.force-readdirp: on
>> performance.stat-prefetch: off
>> network.ping-timeout: 5
>>
>> And this is how the applications work:
>> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
>> On 1 client node we have an application which writes files.
>> On the other client node we have an application which reads these
>> files.
>> On the node where the files are written we don't see any problem,
>> and can read that file without problems.
>> On the other node we have problems (error messages above) reading
>> that file.
>> The problem occurs when we perform a md5sum on the exact file, when
>> perform a md5sum on all files in that directory there is no problem.
>>
>> How can we solve this problem as this is annoying.
>> The problem occurs after some time (can be days), an umount and
>> mount of the mountpoint solves it for some days.
>> Once it occurs (and we don't remount) it occurs every time.
>>
>> I hope someone can help me with this problems.
>>
>> Thanks,
>> Johan Huysmans
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
> _

Re: [Gluster-users] Is Gluster the wrong solution for us?

2013-12-11 Thread Anand Avati
Scott,
It is really unfortunate that you were bitten by that bug. I am hoping to
convince you to at least not abandon the deployment this early with some
responses:

- Note that you typically don't have to proactively rebalance your volume.
If your new data comes in the form of new directories, they naturally
spread out. Even old directories will start using the new servers once
min-free-disk is reached.

- Rebalance algorithm has a layout-overlap-maximizer function to minimize
the amount of data moved. The diagram in the blog post you linked is
describing old behavior. The overlap maximizer can be found here:
https://github.com/gluster/glusterfs/blob/master/xlators/cluster/dht/src/dht-selfheal.c#L633

- Other than the overlap maximizer, there are enhancements to cancel
negative moves (moving to a server with less free space), which also
contribute significantly towards minimizing "churn".

- There have been a lot of bug fixes in rebalance in the master branch, and
we are actively backporting them into 3.4.2. I am fairly confident you will
have a much smoother experience with 3.4.2.
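
For reference, a sketch of the expansion sequence implied by the first point
(volume, server and brick names are illustrative):

  gluster volume add-brick VOLNAME server5:/brick server6:/brick
  # fix-layout only recomputes directory layouts so new files can land on the
  # new bricks; it does not move existing data
  gluster volume rebalance VOLNAME fix-layout start
  gluster volume rebalance VOLNAME status
  # full data migration, only if/when it is actually needed
  gluster volume rebalance VOLNAME start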

Hope that helps!
Avati



On Wed, Dec 11, 2013 at 5:15 PM, Scott Smith wrote:

>  We are about to abandon GlusterFS as a solution for our object storage
> needs.  I’m hoping to get some feedback to tell me whether we have missed
> something and are making the wrong decision.  We’re already a year into
> this project after evaluating a number of solutions.  I’d like not to
> abandon GlusterFS if we just misunderstand how it works.
>
>
>
> Our use case is fairly straight forward.  We need to save a bunch of
> somewhat large files (1MB-100MB).  For the most part, these files are write
> once, read several times.  Our initial store is 80TB, but we expect to go
> to roughly 320TB fairly quickly.  After that, we expect to be adding
> another 80TB every few months.  We are using some COTS servers which we add
> in pairs; each server has 40TB of usable storage.  We intend to keep two
> copies of each file.  We currently run 4TB bricks
>
>
>
> In our somewhat limited test environment, GlusterFS seemed to work well.
> And, our initial introduction of GlusterFS into our production environment
> went well.  We had our initial 2 server (80TB) cluster about 50% full and
> things seemed to be going well.
>
>
>
> Then we added another pair of servers (for a total of 160TB).  This went
> fine until we did the rebalance.  We were running 3.3.1.  We ran into the
> handle leak problem (which unfortunately we didn’t know about beforehand).
> We also found that if any of the bricks went offline while the rebalance
> was going on, then files were lost or they lost their permissions.  We
> still don’t know why some of the bricks went offline, but they did and we
> have verified in our test environment that this is sufficient to cause the
> corruption problem.
>
>
>
> The good news is that we think both of these problems got fixed in 3.4.1.
> So why are we leaving?
>
>
>
> In trying to figure out what was going on with our GlusterFS system after
> the disastrous rebalance, we ran across two posts.  The first one was
> http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/.
> If we understand it correctly, anytime you add new storage servers to your
> cluster, you have to do a rebalance and that rebalance will require a
> minimum of 50% of the data in the cluster to be moved to make the hashing
> algorithms work.  This means that when we have a 320TB cluster and add
> another 80TB, we have to move at least 160TB just to get things back into
> balance.  Our estimate is that that will take months.  It probably won’t
> finish before we need to add another 80TB.
>
>
>
> The other post we ran across was
> http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick.
> This post seems to confirm our understanding of the rebalance.  It appears
> to be a discussion of the rebalance problem and a possible solution.  It
> was apparently discussed for 3.4, but didn’t make the cut.
>
>
>
> I’d be happy to find out that we just got it wrong.  Tell me that
> rebalancing doesn’t work the way we think.  Or maybe we should configure
> things different or something.
>
>
>
> My problem is that if GlusterFS isn’t good for starting with a small
> cluster (80TB) and growing over time to half a petabyte, what is the use
> case it is intended for?  Do you really have to start out with the amount
> of storage you think you’ll need in the long-run and just fill it up as you
> go?  That’s why I’m nervous about our understanding of the rebalance.  It’s
> hard to believe it works this way (at least from our perspective).
>
>
>
> We have a lot of man hours into writing code and putting infrastructure in
> for GlusterFS.  We can likely reuse much of it for another system.  I would
> just like to know that we really do understand the rebalance and that it
> really works the way I described it before we start evaluating other object
> store solutions.
>
>
>

Re: [Gluster-users] Mechanisms for automatic management of Gluster

2013-12-11 Thread Anand Avati
James,
This is the right way to think about the problem. I have more specific
comments in the script, but just wanted to let you know this is a great
start.
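
As a concrete target for the brick-ordering problem described in the quoted
message below, this is the kind of interleaved order such an algorithm should
emit for replica 2 across two servers with two bricks each (names are
illustrative): consecutive bricks form a replica set, so each pair spans both
servers.

  gluster volume create myvol replica 2 \
      server1:/bricks/b1 server2:/bricks/b1 \
      server1:/bricks/b2 server2:/bricks/b2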

Thanks!


On Wed, Nov 27, 2013 at 7:42 AM, James  wrote:

> Hi,
>
> This is along the lines of "tools for sysadmins". I plan on using
> these algorithms for puppet-gluster, but will try to maintain them
> separately as a standalone tool.
>
> The problem: Given a set of bricks and servers, if they have a logical
> naming convention, can an algorithm decide the ideal order. This could
> allow parameters such as replica count, and
> chained=true/false/offset#.
>
> The second problem: Given a set of bricks in a volume, if someone adds
> X bricks and removes Y bricks, is this valid, and what is the valid
> sequence of add/remove brick commands.
>
> I've written some code with test cases to try and figure this all out.
> I've left out a lot of corner cases, but the boilerplate is there to
> make it happen. Hopefully it's self explanatory. (gluster.py) Read and
> run it.
>
> Once this all works, the puppet-gluster use case is magic. It will be
> able to take care of these operations for you (if you want).
>
> For non puppet users, this will give admins the confidence to know
> what commands they should _probably_ run in what order. I say probably
> because we assume that if there's an error, they'll stop and inspect
> first.
>
> I haven't yet tried to implement the chained cases, or anything
> involving striping. There are also some corner cases with some of the
> current code. Once you add chaining and striping, etc, I realized it
> was time to step back and ask for help :)
>
> I hope this all makes sense. Comments, code, test cases are appreciated!
>
> Cheers,
>
> James
> @purpleidea (irc/twitter)
> https://ttboj.wordpress.com/
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Why does NUFA not allow mounts even from trusted peers without a subvolume?

2013-12-04 Thread Anand Avati
Oops, you are right. I got misled by git-describe. Can you please add
http://review.gluster.org/5414/ to the backport wishlist (for 3.4.2) -
http://www.gluster.org/community/documentation/index.php/Backport_Wishlist

Thanks!
Avati


On Wed, Dec 4, 2013 at 2:34 PM, Michael Lampe wrote:

> Just checked nufa.c from the official 3.4.1 sources and things look very
> much different from
>
> http://review.gluster.org/#/c/5414/2/xlators/cluster/dht/src/nufa.c
>
> There is surely no such fallback.
>
> -Michael
>
>
> Michael Lampe wrote:
>
>> I'm using the 3.4.1-3.el5 packages from gluster.org.
>>
>> Am I perhaps misinterpreting the problem?
>>
>> -Michael
>>
>> Anand Avati wrote:
>>
>>> You are probably using 3.3 or older? This has been fixed in 3.4 (
>>> http://review.gluster.org/5414)
>>>
>>> Thanks,
>>> Avati
>>>
>>>
>>> On Wed, Dec 4, 2013 at 2:05 PM, Michael Lampe
>>> wrote:
>>>
>>>  I've installed GlusterFS on our 23-node Beowulf cluster. Each node has a
>>>> disc which provides a brick for the GlusterFS volume and every node
>>>> mounts
>>>> this volume. The NUFA translator is ideal for the code we run and
>>>>> everything works fine so far.
>>>>
>>>> Only problem is the frontend node, which does not have a subvolume:
>>>> After
>>>> I've turned on NUFA, it no longer can mount the volume. :(
>>>>
>>>> [2013-12-04 21:35:44.262738] E [nufa.c:641:init] 0-gv0-dht: Could not
>>>> find
>>>> specified or local subvol
>>>> [2013-12-04 21:35:44.262766] E [xlator.c:390:xlator_init] 0-gv0-dht:
>>>> Initialization of volume 'gv0-dht' failed, review your volfile again
>>>> [2013-12-04 21:35:44.262783] E [graph.c:292:glusterfs_graph_init]
>>>> 0-gv0-dht: initializing translator failed
>>>> [2013-12-04 21:35:44.262799] E [graph.c:479:glusterfs_graph_activate]
>>>> 0-graph: init failed
>>>>
>>>> Is there any technical reason for treating things this way? Why cannot
>>>> NUFA fall back to distribute, like it does when the local subvolume
>>>> has not
>>>> enough free space?
>>>>
>>>> Or is there a better way to include a frontend node?
>>>>
>>>> -Michael
>>>> ___
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Why does NUFA not allow mounts even from trusted peers without a subvolume?

2013-12-04 Thread Anand Avati
You are probably using 3.3 or older? This has been fixed in 3.4 (
http://review.gluster.org/5414)

Thanks,
Avati


On Wed, Dec 4, 2013 at 2:05 PM, Michael Lampe wrote:

> I've installed GlusterFS on our 23-node Beowulf cluster. Each node has a
> disc which provides a brick for the GlusterFS volume and every node mounts
> this volume. The NUFA translator is ideal for the code we run and
> everything works fine so far.
>
> Only problem is the frontend node, which does not have a subvolume: After
> I've turned on NUFA, it no longer can mount the volume. :(
>
> [2013-12-04 21:35:44.262738] E [nufa.c:641:init] 0-gv0-dht: Could not find
> specified or local subvol
> [2013-12-04 21:35:44.262766] E [xlator.c:390:xlator_init] 0-gv0-dht:
> Initialization of volume 'gv0-dht' failed, review your volfile again
> [2013-12-04 21:35:44.262783] E [graph.c:292:glusterfs_graph_init]
> 0-gv0-dht: initializing translator failed
> [2013-12-04 21:35:44.262799] E [graph.c:479:glusterfs_graph_activate]
> 0-graph: init failed
>
> Is there any technical reason for treating things this way? Why cannot
> NUFA fall back to distribute, like it does when the local subvolume has not
> enough free space?
>
> Or is there a better way to include a frontend node?
>
> -Michael
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS was removed from Fedora EPEL

2013-12-03 Thread Anand Avati
Sorry, I misread. Please provide the gluster client logs from the session
in which you hit the failure shown in the previously attached log.

Thanks,
Avati


On Tue, Dec 3, 2013 at 4:48 PM, Nguyen Viet Cuong wrote:

> Hi Avati,
>
> I do not use RDMA for transport. I use tcp transport over the IPoIB
> interface.
>
> I have already re-installed 3.2.7 on that server for shipping. I will
> install 3.4.1 on other servers and send you client logs, but please wait
> for a while.
>
> Regards,
> Cuong
>
>
> On Wed, Dec 4, 2013 at 9:35 AM, Anand Avati  wrote:
>
>> Nguyen,
>> I did not realize you were using RDMA. Can you paste the gluster client
>> logs as well?
>>
>> Thanks,
>> Avati
>>
>>
>> On Tue, Dec 3, 2013 at 4:16 PM, Nguyen Viet Cuong wrote:
>>
>>> Hi Keithley,
>>>
>>> Please find the bug in the attached log file. I experienced this bug on
>>> both 3.4.0 and 3.4.1. There is no problem with GlusterFS 3.2.7.
>>>
>>> I use IOR 3.0.1 for the test. The connection is IPoIB. The OFED stack
>>> comes from Mellanox (1.5.3). The OS is CentOS 6.4.
>>>
>>> About GlusterFS on RHEL 6.5, I wonder why there are no
>>> glusterfs-server and glusterfs-geo-replication packages?
>>>
>>>
>>> On Tue, Dec 3, 2013 at 1:03 AM, Kaleb S. KEITHLEY 
>>> wrote:
>>>
>>>> On 12/02/2013 10:52 AM, Nguyen Viet Cuong wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Actually, I have had a very bad experience with GlusterFS 3.3.x and 3.4.x
>>>>> under very high pressure (> 64 processes writing in parallel for more than
>>>>> 10 minutes, for example).
>>>>>
>>>>
>>>> Have you filed a bug?
>>>>
>>>>
>>>>  GlusterFS 3.2.7 from EPEL is really stable and
>>>>> we use it for production.
>>>>>
>>>>> Unfortunately, there is no official build of GlusterFS 3.2.x on
>>>>> Gluster's repo.
>>>>>
>>>>
>>>> You can get the glusterfs RPMs that were built in the Fedora Koji build
>>>> system at
>>>>
>>>> https://koji.fedoraproject.org/koji/packageinfo?packageID=5443
>>>>
>>>> and in particular the 3.2.7 el6 RPMs are at
>>>>
>>>> https://koji.fedoraproject.org/koji/buildinfo?buildID=323952
>>>>
>>>>
>>>> --
>>>>
>>>> Kaleb
>>>>
>>>
>>>
>>>
>>> --
>>> Nguyen Viet Cuong
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>
>
> --
> Nguyen Viet Cuong
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS was removed from Fedora EPEL

2013-12-03 Thread Anand Avati
Nguyen,
I did not realize you were using RDMA. Can you paste the gluster client
logs as well?

Thanks,
Avati


On Tue, Dec 3, 2013 at 4:16 PM, Nguyen Viet Cuong wrote:

> Hi Keithley,
>
> Please find the bug in the attached log file. I experienced this bug on
> both 3.4.0 and 3.4.1. There is no problem with GlusterFS 3.2.7.
>
> I use IOR 3.0.1 for the test. The connection is IPoIB. The OFED stack
> comes from Mellanox (1.5.3). The OS is CentOS 6.4.
>
> About GlusterFS on RHEL 6.5, I wonder why there are no
> glusterfs-server and glusterfs-geo-replication packages?
>
>
> On Tue, Dec 3, 2013 at 1:03 AM, Kaleb S. KEITHLEY wrote:
>
>> On 12/02/2013 10:52 AM, Nguyen Viet Cuong wrote:
>>
>>> Hi,
>>>
>>> Actually, I have had a very bad experience with GlusterFS 3.3.x and 3.4.x
>>> under very high pressure (> 64 processes writing in parallel for more than
>>> 10 minutes, for example).
>>>
>>
>> Have you filed a bug?
>>
>>
>>  GlusterFS 3.2.7 from EPEL is really stable and
>>> we use it for production.
>>>
>>> Unfortunately, there is no official build of GlusterFS 3.2.x on
>>> Gluster's repo.
>>>
>>
>> You can get the glusterfs RPMs that were built in the Fedora Koji build
>> system at
>>
>> https://koji.fedoraproject.org/koji/packageinfo?packageID=5443
>>
>> and in particular the 3.2.7 el6 RPMs are at
>>
>> https://koji.fedoraproject.org/koji/buildinfo?buildID=323952
>>
>>
>> --
>>
>> Kaleb
>>
>
>
>
> --
> Nguyen Viet Cuong
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Block size reported from FUSE-client

2013-11-26 Thread Anand Avati
You are seeing a side-effect of http://review.gluster.com/3631. This means
that if your backend filesystem uses 4KB blocks, the st_blocks value reported
by gluster will be at worst 7 blocks smaller (4KB / 512 - 1).


On Tue, Nov 26, 2013 at 3:13 AM, Maik Kulbe
wrote:

> So st_blocks on FUSE mount is different from st_blocks on backend for the
>> same file?
>>
>
> Yes. Just a quick example: I create a file with 5 bytes in size. In theory
> Gluster should report 8 x 512 Byte blocks, because the underlying XFS uses
> a 4K block size. Instead, it reports the minimum count of blocks the file
> size would fit in:
>
> client> echo test > /gluster/tmp/test.txt
>
> client> stat /gluster/tmp/test.txt
>  File: `/gluster/tmp/test.txt'
>  Size: 5Blocks: 1  IO Block: 131072 regular file
> Device: 14h/20d Inode: 12072747239032953097  Links: 1
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2013-11-26 12:09:22.960828776 +0100
> Modify: 2013-11-26 12:09:22.964828962 +0100
> Change: 2013-11-26 12:09:22.964828962 +0100
> Birth: -
>
> gluster> stat /bricks/0/tmp/test.txt
>  File: `/bricks/0/tmp/test.txt'
>  Size: 5Blocks: 8  IO Block: 4096   regular file
> Device: ca03h/51715dInode: 859069733   Links: 2
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2013-11-26 12:09:22.957683891 +0100
> Modify: 2013-11-26 12:09:22.961684089 +0100
> Change: 2013-11-26 12:09:22.961684089 +0100
> Birth: -
>
>
>
>> On Nov 25, 2013 8:50 AM, "Maik Kulbe" 
>> wrote:
>>
>> From man (2) stat:
>>
>> blksize_t st_blksize; /* blocksize for file system I/O */
>> blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
>>
>> The 128K  you are seeing is "st_blksize" which is the recommended I/O
>> transfer size. The number of consumed blocks is always reported as 512
>> byte blocks. The actual block size with which storage allocation
>> happens
>> depends on the backend filesystem.
>>
>> This is what was confusing me. On the file systems one of our
>> programmers tested, the latter was always showing the blocks
>> allocated. So if you had a 1k file and a 4k block size it would report 8
>> 512-byte blocks, gluster just reports 2 blocks.
>>
>> Avati
>>
>> On Mon, Nov 25, 2013 at 7:18 AM, Maik Kulbe
>>  wrote:
>>
>> Hi,
>>
>> I've come to notice that the file system block size reported from stat
>> on a client is 128k, which is pretty high for the small files I use.
>> On
>> the other hand, I tested copying smaller files to the volume and it
>> seems those 128k are not the real block size - when I copy two 64k
>> files
>> to the volume `df` reports only a change after both files have been
>> copied.
>>
>> So my question would be what is the real block size for the Gluster
>> volume? The block size of the underlying xfs? Or something else? And
>> is
>> it possible to read the real block size? We wanted to use the block
>> size
>> reported by stat to calculate the real file size use on disk but it
>> seems that is not possible with Gluster..
>>
>> Thank you in advance,
>> Maik
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Block size reported from FUSE-client

2013-11-25 Thread Anand Avati
From man (2) stat:

   blksize_t st_blksize; /* blocksize for file system I/O */
   blkcnt_t  st_blocks;  /* number of 512B blocks allocated */


The 128K you are seeing is "st_blksize", which is the recommended I/O
transfer size. The number of consumed blocks (st_blocks) is always reported
in 512-byte units. The actual block size with which storage allocation happens depends
on the backend filesystem.
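
GNU stat can show all three fields at once if you want to compare a file on
the FUSE mount with the same file on a brick (the path is illustrative):

  stat -c 'size=%s bytes  st_blocks=%b (x %B bytes)  st_blksize=%o' /mnt/gluster/some/file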

Avati

On Mon, Nov 25, 2013 at 7:18 AM, Maik Kulbe
wrote:

> Hi,
>
> I've come to notice that the file system block size reported from stat on
> a client is 128k, which is pretty high for the small files I use. On the
> other hand, I tested copying smaller files to the volume and it seems those
> 128k are not the real block size - when I copy two 64k files to the volume
> `df` reports only a change after both files have been copied.
>
> So my question would be what is the real block size for the Gluster
> volume? The block size of the underlying xfs? Or something else? And is it
> possible to read the real block size? We wanted to use the block size
> reported by stat to calculate the real file size use on disk but it seems
> that is not possible with Gluster..
>
> Thank you in advance,
> Maik
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Maildir issue.

2013-11-24 Thread Anand Avati
Or actually, is it a 32-bit binary? (Running "file /usr/bin/binary" on the
pop3 daemon should reveal this.) If it is, try mounting the FUSE client with
-o enable-ino32 and retry the pop3 daemon.
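
Roughly (the qmail binary path is assumed; server, volume and mount point are
the ones from this thread):

  file /var/qmail/bin/qmail-pop3d      # look for "ELF 32-bit" in the output
  mount -t glusterfs -o enable-ino32 gluster1:/mailtest /homegluster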

Avati


On Sun, Nov 24, 2013 at 12:40 AM, Anand Avati  wrote:

> The "problematic" line is:
>
> 9096  open("tmp", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
>
> ...
>
> 9096  getdents(3, 0x8295024, 32768) = -1 EINVAL (Invalid argument)
>
>
> Is the client a 32-bit system?
>
> Avati
>
>
> On Sun, Nov 24, 2013 at 12:02 AM, W K  wrote:
>
>>  On 11/23/13, 11:13 PM, Anand Avati wrote:
>>
>> Can you provide the following details from the time of the pop3 test on
>> FUSE mount:
>>
>>  1. mount FUSE client with -LTRACE and logs from that session
>>
>>
>> I'm unable to do that
>>
>> #mount -t glusterfs -LTRACE gluster1:/mailtest /homegluster
>> mount: no such partition found
>>
>> maybe I am doing things incorrectly.
>>
>> here is the client log.
>>
>> [2013-11-24 07:40:59.285098] I [glusterfsd.c:1910:main] 
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 
>> (/usr/sbin/glusterfs --volfile-id=/mailtest --volfile-server=gluster1 
>> /homegluster)
>> [2013-11-24 07:40:59.289602] I [socket.c:3480:socket_init] 0-glusterfs: SSL 
>> support is NOT enabled
>> [2013-11-24 07:40:59.289662] I [socket.c:3495:socket_init] 0-glusterfs: 
>> using system polling thread
>> [2013-11-24 07:40:59.300344] I [socket.c:3480:socket_init] 
>> 0-mailtest-client-1: SSL support is NOT enabled
>> [2013-11-24 07:40:59.300408] I [socket.c:3495:socket_init] 
>> 0-mailtest-client-1: using system polling thread
>> [2013-11-24 07:40:59.301555] I [socket.c:3480:socket_init] 
>> 0-mailtest-client-0: SSL support is NOT enabled
>> [2013-11-24 07:40:59.301583] I [socket.c:3495:socket_init] 
>> 0-mailtest-client-0: using system polling thread
>> [2013-11-24 07:40:59.301642] I [client.c:2154:notify] 0-mailtest-client-0: 
>> parent translators are ready, attempting connect on transport
>> [2013-11-24 07:40:59.305024] I [client.c:2154:notify] 0-mailtest-client-1: 
>> parent translators are ready, attempting connect on transport
>> Given volfile:
>> +--+
>>   1: volume mailtest-client-0
>>   2: type protocol/client
>>   3: option transport-type tcp
>>   4: option remote-subvolume /gluster1/mailtest/brick1
>>   5: option remote-host gluster2
>>   6: end-volume
>>   7:
>>   8: volume mailtest-client-1
>>   9: type protocol/client
>>  10: option transport-type tcp
>>  11: option remote-subvolume /gluster1/mailtest/brick1
>>  12: option remote-host gluster1
>>  13: end-volume
>>  14:
>>  15: volume mailtest-replicate-0
>>  16: type cluster/replicate
>>  17: subvolumes mailtest-client-0 mailtest-client-1
>>  18: end-volume
>>  19:
>>  20: volume mailtest-dht
>>  21: type cluster/distribute
>>  22: subvolumes mailtest-replicate-0
>>  23: end-volume
>>  24:
>>  25: volume mailtest-write-behind
>>  26: type performance/write-behind
>>  27: subvolumes mailtest-dht
>>  28: end-volume
>>  29:
>>  30: volume mailtest-read-ahead
>>  31: type performance/read-ahead
>>  32: subvolumes mailtest-write-behind
>>  33: end-volume
>>  34:
>>  35: volume mailtest-io-cache
>>  36: type performance/io-cache
>>  37: subvolumes mailtest-read-ahead
>>  38: end-volume
>>  39:
>>  40: volume mailtest-quick-read
>>  41: type performance/quick-read
>>  42: subvolumes mailtest-io-cache
>>  43: end-volume
>>  44:
>>  45: volume mailtest-open-behind
>>  46: type performance/open-behind
>>  47: subvolumes mailtest-quick-read
>>  48: end-volume
>>  49:
>>  50: volume mailtest-md-cache
>>  51: type performance/md-cache
>>  52: subvolumes mailtest-open-behind
>>  53: end-volume
>>  54:
>>  55: volume mailtest
>>  56: type debug/io-stats
>>  57: option count-fop-hits off
>>  58: option latency-measurement off
>>  59: subvolumes mailtest-md-cache
>>  60: end-volume
>>
>> +--+
>> [2013-11-24 07:40:59.308657] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 
>> 0-mailtest-client-1: changing port to 49154 (from 0)
>> [2013-11-24 07:40:59.30

Re: [Gluster-users] Maildir issue.

2013-11-24 Thread Anand Avati
The "problematic" line is:

9096  open("tmp", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3

...

9096  getdents(3, 0x8295024, 32768) = -1 EINVAL (Invalid argument)


Is the client a 32-bit system?

Avati


On Sun, Nov 24, 2013 at 12:02 AM, W K  wrote:

>  On 11/23/13, 11:13 PM, Anand Avati wrote:
>
> Can you provide the following details from the time of the pop3 test on
> FUSE mount:
>
>  1. mount FUSE client with -LTRACE and logs from that session
>
>
> I'm unable to do that
>
> #mount -t glusterfs -LTRACE gluster1:/mailtest /homegluster
> mount: no such partition found
>
> maybe I am doing things incorrectly.
>
> here is the client log.
>
> [2013-11-24 07:40:59.285098] I [glusterfsd.c:1910:main] 
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.1 
> (/usr/sbin/glusterfs --volfile-id=/mailtest --volfile-server=gluster1 
> /homegluster)
> [2013-11-24 07:40:59.289602] I [socket.c:3480:socket_init] 0-glusterfs: SSL 
> support is NOT enabled
> [2013-11-24 07:40:59.289662] I [socket.c:3495:socket_init] 0-glusterfs: using 
> system polling thread
> [2013-11-24 07:40:59.300344] I [socket.c:3480:socket_init] 
> 0-mailtest-client-1: SSL support is NOT enabled
> [2013-11-24 07:40:59.300408] I [socket.c:3495:socket_init] 
> 0-mailtest-client-1: using system polling thread
> [2013-11-24 07:40:59.301555] I [socket.c:3480:socket_init] 
> 0-mailtest-client-0: SSL support is NOT enabled
> [2013-11-24 07:40:59.301583] I [socket.c:3495:socket_init] 
> 0-mailtest-client-0: using system polling thread
> [2013-11-24 07:40:59.301642] I [client.c:2154:notify] 0-mailtest-client-0: 
> parent translators are ready, attempting connect on transport
> [2013-11-24 07:40:59.305024] I [client.c:2154:notify] 0-mailtest-client-1: 
> parent translators are ready, attempting connect on transport
> Given volfile:
> +--+
>   1: volume mailtest-client-0
>   2: type protocol/client
>   3: option transport-type tcp
>   4: option remote-subvolume /gluster1/mailtest/brick1
>   5: option remote-host gluster2
>   6: end-volume
>   7:
>   8: volume mailtest-client-1
>   9: type protocol/client
>  10: option transport-type tcp
>  11: option remote-subvolume /gluster1/mailtest/brick1
>  12: option remote-host gluster1
>  13: end-volume
>  14:
>  15: volume mailtest-replicate-0
>  16: type cluster/replicate
>  17: subvolumes mailtest-client-0 mailtest-client-1
>  18: end-volume
>  19:
>  20: volume mailtest-dht
>  21: type cluster/distribute
>  22: subvolumes mailtest-replicate-0
>  23: end-volume
>  24:
>  25: volume mailtest-write-behind
>  26: type performance/write-behind
>  27: subvolumes mailtest-dht
>  28: end-volume
>  29:
>  30: volume mailtest-read-ahead
>  31: type performance/read-ahead
>  32: subvolumes mailtest-write-behind
>  33: end-volume
>  34:
>  35: volume mailtest-io-cache
>  36: type performance/io-cache
>  37: subvolumes mailtest-read-ahead
>  38: end-volume
>  39:
>  40: volume mailtest-quick-read
>  41: type performance/quick-read
>  42: subvolumes mailtest-io-cache
>  43: end-volume
>  44:
>  45: volume mailtest-open-behind
>  46: type performance/open-behind
>  47: subvolumes mailtest-quick-read
>  48: end-volume
>  49:
>  50: volume mailtest-md-cache
>  51: type performance/md-cache
>  52: subvolumes mailtest-open-behind
>  53: end-volume
>  54:
>  55: volume mailtest
>  56: type debug/io-stats
>  57: option count-fop-hits off
>  58: option latency-measurement off
>  59: subvolumes mailtest-md-cache
>  60: end-volume
>
> +--+
> [2013-11-24 07:40:59.308657] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 
> 0-mailtest-client-1: changing port to 49154 (from 0)
> [2013-11-24 07:40:59.308714] W [socket.c:514:__socket_rwv] 
> 0-mailtest-client-1: readv failed (No data available)
> [2013-11-24 07:40:59.311861] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 
> 0-mailtest-client-0: changing port to 49156 (from 0)
> [2013-11-24 07:40:59.311963] W [socket.c:514:__socket_rwv] 
> 0-mailtest-client-0: readv failed (No data available)
> [2013-11-24 07:40:59.315123] I 
> [client-handshake.c:1658:select_server_supported_programs] 
> 0-mailtest-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2013-11-24 07:40:59.315357] I 
> [client-handshake.c:1658:select_server_supported_programs] 
> 0-mailtest-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2013-11-24

Re: [Gluster-users] Maildir issue.

2013-11-23 Thread Anand Avati
Can you provide the following details from the time of the pop3 test on
FUSE mount:

1. mount the FUSE client with -LTRACE (log level TRACE) and provide the logs from that session

2. strace -f -p  -o /tmp/pop3-strace.log
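
For example (the log level can also be passed as a mount option; the PID
lookup is illustrative):

  mount -t glusterfs -o log-level=TRACE gluster1:/mailtest /homegluster
  strace -f -p "$(pgrep -f qmail-pop3d | head -1)" -o /tmp/pop3-strace.log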

Thanks,
Avati


On Sat, Nov 23, 2013 at 3:56 PM, W K  wrote:

> We brought up a test cluster to investigate GlusterFS.
>
> Using the Quick Start instructions, we brought up a 2 server 1 brick
> replicating setup and mounted to it from a third box with the fuse mount
> (all ver 3.4.1)
>
> # gluster volume info
>
> Volume Name: mailtest
> Type: Replicate
> Volume ID: 9e412774-b8c9-4135-b7fb-bc0dd298d06a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster2:/gluster1/mailtest/brick1
> Brick2: gluster1:/gluster1/mailtest/brick1
> Options Reconfigured:
> server.root-squash: no
>
> We then began loading the cluster with data (from a nearby Mailserver) by
> rsyncing Maildirs onto the mount. We chose maildirs because small files are
> supposed to be a worst case scenario in Gluster.
>
> During our testing, everything worked great. Speed was acceptable (we are
> familiar with other Distributed File Systems, so we don't have unrealistic
> expectations).
>
> We yanked cords, turned off machines and generally tortured the setup, all
> to good effect. Everything performs as advertised, though you have to do a
> LOT of googling to get some of the answers when 'recovering'. There appears
> to be a lot of 'secret' recipes to get things done and I think the doc site
> should link to JoeJulians blog, among others .
>
> We then decided to see what email Maildir performance was like using a
> pop3 tester program.
>
> So we quickly installed qmail which has the qmail-pop3d daemon.
>
> The result was that the pop daemon cant see the email
>
> # telnet localhost 110
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> +OK <20481.1385248027@mailtest>
> USER glusttest
> +OK
> PASS somepass
> +OK
> LIST
> +OK
> .
>
> If we copy that SAME directory over to /users (not on the gluster mount),
> then the LIST command shows that email. So we know that the qmail-pop3d
> setup is working fine.
>
> LIST
> +OK
> 1 3120
> 2 11157
> 3 3267
> etc.
>
> So since we normally use POP/IMAP over NFS, we decided to try the gluster
> NFS mount.
>
> That the NFS mount gave us an even stranger result. It doesn't even see
> the Maildir.
>
> # telnet localhost 110
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> +OK <20488.1385248223@mailtest>
> USER glusttest
> +OK
> PASS somepass
> -ERR this user has no $HOME/Maildir
> Connection closed by foreign host.
>
>
> We have verified that the NTP daemon and thus the time settings are
> correct.
> There are NO errors corresponding to this activity in any of the log files
> on either the data servers or the mount (logs only refer to the
> mount/unmount ops).
> Using the command line, we can manually manipulate files on the mount to
> our hearts content (as root or as an appropriate user) with no errors.
> mkdir, rm, touch, all work fine.
>
> However, using the VmailMGR command line tools, we are unable to add new
> users when mounted under NFS
>
> # su - glusttest
>
> $ vadduser test2
> Enter the user's new password:
> Please type it again for verification:
> vadduser: error adding the virtual user:
>   Can't create the mail directory './users/test2'
>
> then when its manually created qmail refused to deliver to that Maildir
> because it cant see the install
> (in this case .qmail file)
>
> 1385249260.729221 delivery 146770: failure: Sorry,_no_mailbox_here_by_
> that_name._(#5.1.1)/
>
>
> Under the FUSE mount, VmailMGR tools work fine and email is able to be
> delivered to the Maildir.
>
> $ vadduser test3
> Enter the user's new password:
> Please type it again for verification:
> vadduser: user 'test3' successfully added.
>
> but  of course the pop3 daemon doesn't see the email that qmail process
> just was just delivered.
>
> Finally, you can see from the info provided, that we deliberately disabled
> root_squash (since thats a new 3.4 feature) but that made no difference to
> any of the above results.
>
> SELINUX is disabled on all 3 machines.
>
> So
>
> What is going on here?
>
> Its not a mission critical, as GlusterFS is probably inappropriate for a
> mailserver, but I really need to understand what is going on so I can
> recommend GlusterFS for other situations.
>
> Is this a MailDir thing or is something else going on?
>
>
> Sincerely,
>
> -bill
>
>
>
>
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: bug in 3.4.1 when creating symlinks

2013-11-22 Thread Anand Avati
Thanks for testing the patch, Peter. I would appreciate it if you could vote
based on your results on http://review.gluster.org/6319/. That will help get it
merged sooner and into a release.

Thanks again,
Avati


On Thu, Nov 21, 2013 at 8:17 AM, Peter Drake  wrote:

> Great, after initial testing that patch appears to have addressed the
> problem.  I will put it through our full system tests, but at least my
> example script can no longer reproduce the problem.  Thank you.
>
>
> On Wed, Nov 20, 2013 at 10:25 PM, Anand Avati  wrote:
>
>> Peter,
>> Thanks, this was helpful. Can you please try out the following patch:
>>
>> http://review.gluster.org/6319
>>
>>
>> Thanks,
>> Avati
>>
>>
>> On Wed, Nov 20, 2013 at 6:35 PM, Peter Drake wrote:
>>
>>> I've included straces from both successful and unsuccessful exections,
>>> as well as the PHP error information below.  Let me know if there is
>>> anything else I can provide which would be helpful.
>>>
>>> PHP Error (as provided by error_get_last()):
>>>
>>> Array
>>> (
>>> [type] => 2
>>> [message] => symlink(): No such file or directory
>>> [file] => /tmp/symlink-test.php
>>> [line] => 78
>>> )
>>>
>>> Straces on both clients for symlink creation which was unsuccessful on
>>> one client:
>>>
>>> Strace on unsuccessful client web-1:
>>> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> lstat("/tmp/test1385000751", 0x7fff2b5eb2e0) = -1 ENOENT (No such file
>>> or directory)
>>> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> readlink("/mnt/gfs/test1385000751", 0x7fff2b5eb3f0, 4096) = -1 EINVAL
>>> (Invalid argument)
>>> lstat("/tmp/test1385000751", 0x7fff2b5ef2d0) = -1 ENOENT (No such file
>>> or directory)
>>> write(1, "Failed to create local link: /tm"..., 50) = 50
>>>
>>> Strace on successful client web-2:
>>> lstat("/mnt/gfs/test1385000751", 0x7fff3171b720) = -1 ENOENT (No such
>>> file or directory)
>>> lstat("/mnt/gfs/test1385000751", 0x7fff31717730) = -1 ENOENT (No such
>>> file or directory)
>>> symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000751") = 0
>>> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> lstat("/tmp/test1385000751", 0x7fff31717730) = -1 ENOENT (No such file
>>> or directory)
>>> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> readlink("/mnt/gfs/test1385000751", "/mnt/gfs/test-target"..., 4096) = 20
>>> symlink("/mnt/gfs/test1385000751", "/tmp/test1385000751") = 0
>>> lstat("/tmp/test1385000751", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
>>>
>>>
>>> Straces on both clients for symlink creation which was successful on
>>> both clients:
>>>
>>> Strace on successful client web-1:
>>> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> lstat("/tmp/test1385000727", 0x7fff31717730) = -1 ENOENT (No such file
>>> or directory)
>>> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target"..., 4096) = 20
>>> symlink("/mnt/gfs/test1385000727", "/tmp/test1385000727") = 0
>>> lstat("/tmp/test1385000727", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
>>>
>>> Strace on successful client web-2:
>>> lstat("/mnt/gfs/test1385000727", 0x7fff2b5ef2d0) = -1 ENOENT (No such
>>> file or directory)
>>> lstat("/mnt/gfs/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such
>>> file or directory)
>>> symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000727") = 0
>>> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> lstat("/tmp/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such file
>>> or directory)
>>> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20,
>>> ...}) = 0
>>> readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target".

Re: [Gluster-users] gluster

2013-11-20 Thread Anand Avati

On 11/20/13, 10:40 PM, Randy Breunling wrote:

Hi.

We met at a storage meetup in SF a couple months ago...and I think
exchanged a couple emails regarding some gluster-related questions I had
(which I can't seem to find at this time).

Anyway...I'm interested in learning a little more about gluster and was
wondering if the gluster community website is the best place to go to
learn about it...and if I have questions...is that a good place for a
non-developer to post questions.   Is there also gluster information on
the RedHat website someplace...or anyplace else that's got good
introductory and use-case-related information.

Thanks in advance for any feedback.

--Randy Breunling

rbreunl...@gmail.com 
Data Architecture consultant, Decision Sciences International


Randy,

gluster.org is the best place. Please join the gluster-users@ mailing list 
(on cc; the link is available on gluster.org), introduce yourself, and ask 
any questions you may have. We have an active and vibrant community on 
the mailing list!


Thanks,
Avati

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Fwd: bug in 3.4.1 when creating symlinks

2013-11-20 Thread Anand Avati
Peter,
Thanks, this was helpful. Can you please try out the following patch:

http://review.gluster.org/6319


Thanks,
Avati


On Wed, Nov 20, 2013 at 6:35 PM, Peter Drake  wrote:

> I've included straces from both successful and unsuccessful exections, as
> well as the PHP error information below.  Let me know if there is anything
> else I can provide which would be helpful.
>
> PHP Error (as provided by error_get_last()):
>
> Array
> (
> [type] => 2
> [message] => symlink(): No such file or directory
> [file] => /tmp/symlink-test.php
> [line] => 78
> )
>
> Straces on both clients for symlink creation which was unsuccessful on one
> client:
>
> Strace on unsuccessful client web-1:
> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> lstat("/tmp/test1385000751", 0x7fff2b5eb2e0) = -1 ENOENT (No such file or
> directory)
> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> readlink("/mnt/gfs/test1385000751", 0x7fff2b5eb3f0, 4096) = -1 EINVAL
> (Invalid argument)
> lstat("/tmp/test1385000751", 0x7fff2b5ef2d0) = -1 ENOENT (No such file or
> directory)
> write(1, "Failed to create local link: /tm"..., 50) = 50
>
> Strace on successful client web-2:
> lstat("/mnt/gfs/test1385000751", 0x7fff3171b720) = -1 ENOENT (No such file
> or directory)
> lstat("/mnt/gfs/test1385000751", 0x7fff31717730) = -1 ENOENT (No such file
> or directory)
> symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000751") = 0
> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> lstat("/tmp/test1385000751", 0x7fff31717730) = -1 ENOENT (No such file or
> directory)
> lstat("/mnt/gfs/test1385000751", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> readlink("/mnt/gfs/test1385000751", "/mnt/gfs/test-target"..., 4096) = 20
> symlink("/mnt/gfs/test1385000751", "/tmp/test1385000751") = 0
> lstat("/tmp/test1385000751", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
>
>
> Straces on both clients for symlink creation which was successful on both
> clients:
>
> Strace on successful client web-1:
> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> lstat("/tmp/test1385000727", 0x7fff31717730) = -1 ENOENT (No such file or
> directory)
> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target"..., 4096) = 20
> symlink("/mnt/gfs/test1385000727", "/tmp/test1385000727") = 0
> lstat("/tmp/test1385000727", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
>
> Strace on successful client web-2:
> lstat("/mnt/gfs/test1385000727", 0x7fff2b5ef2d0) = -1 ENOENT (No such file
> or directory)
> lstat("/mnt/gfs/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such file
> or directory)
> symlink("/mnt/gfs/test-target", "/mnt/gfs/test1385000727") = 0
> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> lstat("/tmp/test1385000727", 0x7fff2b5eb2e0) = -1 ENOENT (No such file or
> directory)
> lstat("/mnt/gfs/test1385000727", {st_mode=S_IFLNK|0777, st_size=20, ...})
> = 0
> readlink("/mnt/gfs/test1385000727", "/mnt/gfs/test-target"..., 4096) = 20
> symlink("/mnt/gfs/test1385000727", "/tmp/test1385000727") = 0
> lstat("/tmp/test1385000727", {st_mode=S_IFLNK|0777, st_size=23, ...}) = 0
>
>
>
> On Wed, Nov 13, 2013 at 3:24 PM, Anand Avati  wrote:
>
>>
>>
>>
>> On Wed, Nov 13, 2013 at 12:14 PM, Peter Drake wrote:
>>
>>> Thanks for taking the time to look at this and reply.  To clarify, the
>>> script that was running and created the log entries is an internal tool
>>> which does lots of other, unrelated things, but the part that caused the
>>> error takes actions very similar to the gist.  I tried to pull out the
>>> related log entries to the best of my ability.  The script in the gist did
>>> not create those log entries, but it does reliably reproduce the same error
>>> / failure (failure when attempting to create a symlink from the local
>>> filesystem to a symlink on the gluster filesystem).  I would not be
>>> surprised if the PHP version of symlink has behavior that is different than
>>> the symlink syscall.
>>>
>>>
>> So what is the failure (errno) when crea

Re: [Gluster-users] Fencing FOPs on data-split-brained files

2013-11-15 Thread Anand Avati
Ravi,
We should not mix up data and entry operation domains; if a file is in data
split-brain, that should not stop a user from performing rename/link/unlink
operations on the file.

Regarding your concern about complications while healing - we should change
our "manual fixing" instructions to:

- go to backend, access through gfid path or normal path
- rmxattr the afr changelogs
- truncate the file to 0 bytes (like "> filename")

Accessing the path through gfid and truncating to 0 bytes addresses your
concerns about hardlinks/renames.
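
To make those steps concrete, here is a rough standalone sketch (assuming the
changelog xattrs follow the usual trusted.afr.* naming, and that it is run as
root directly on the brick):

/* Illustrative sketch only: remove the AFR changelog xattrs from one
 * backend copy and truncate it to 0 bytes so self-heal repopulates it. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path-on-brick-or-gfid-path>\n", argv[0]);
        return 1;
    }
    const char *path = argv[1];

    char names[8192];
    ssize_t len = listxattr(path, names, sizeof(names));
    if (len < 0) {
        perror("listxattr");
        return 1;
    }
    /* xattr names come back as a NUL-separated list */
    for (char *n = names; n < names + len; n += strlen(n) + 1) {
        if (strncmp(n, "trusted.afr.", strlen("trusted.afr.")) == 0) {
            if (removexattr(path, n) < 0)
                perror("removexattr");
            else
                printf("removed %s\n", n);
        }
    }
    /* empty the stale copy; self-heal will bring the data back */
    if (truncate(path, 0) < 0) {
        perror("truncate");
        return 1;
    }
    return 0;
}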

Avati


On Wed, Nov 13, 2013 at 3:01 AM, Ravishankar N wrote:

> Hi,
>
> Currenly in glusterfs, when there is a data splt-brain (only) on a file,
> we disallow the following operations from the mount-point by returning EIO
> to the application:
> - Writes to the file (truncate, dd, echo, cp etc)
> - Reads to the file (cat)
> - Reading extended attributes (getfattr) [1]
>
> However we do permit the following operations:
> -creating hardlinks
> -creating symlinks
> -mv
> -setattr
> -chmod
> -chown
> --touch
> -ls
> -stat
>
> While it makes sense to allow `ls` and `stat`, is it okay to  add checks
> in the FOPS to disallow the other operations? Allowing creation of links
> and changing file attributes only seems to complicate things before the
> admin can go to the backend bricks and resolve the splitbrain (by deleteing
> all but the healthy copy of the file including hardlinks). More so if the
> file is renamed before addressing the split-brain.
> Please share your thoughs.
>
> Thanks,
> Ravi
>
> [1] http://review.gluster.org/#/c/5988/
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: bug in 3.4.1 when creating symlinks

2013-11-13 Thread Anand Avati
On Wed, Nov 13, 2013 at 12:14 PM, Peter Drake wrote:

> Thanks for taking the time to look at this and reply.  To clarify, the
> script that was running and created the log entries is an internal tool
> which does lots of other, unrelated things, but the part that caused the
> error takes actions very similar to the gist.  I tried to pull out the
> related log entries to the best of my ability.  The script in the gist did
> not create those log entries, but it does reliably reproduce the same error
> / failure (failure when attempting to create a symlink from the local
> filesystem to a symlink on the gluster filesystem).  I would not be
> surprised if the PHP version of symlink has behavior that is different than
> the symlink syscall.
>
>
So what is the failure (errno) when creating a symlink from local fs to
glusterfs? Can you get an strace of the script when the error happens?

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: bug in 3.4.1 when creating symlinks

2013-11-13 Thread Anand Avati
On Wed, Nov 13, 2013 at 9:01 AM, Peter Drake  wrote:

> I have a replicated Gluster setup, 2 servers (fs-1 and fs-2) x 1 brick.  I
> have two clients (web-1 and web-2) which are connected and simultaneously
> execute tasks.  These clients mount the Gluster volume at /mnt/gfs.  One
> task they execute looks like this (note this is pseudocode, the actual task
> is PHP):
>
> 1. @symlink(/mnt/gfs/slow265, /mnt/gfs/slow265.prod);
> 2. if (!is_link(/mnt/gfs/slow265.prod)) {
> 3.   throw Exception;
> 4. }
> 5. symlink(/mnt/gfs/slow265.prod, /home/user/slow265.prod)
>
> Note that line 1 may fail on either client because the link may have been
> created by the other client, but this is suppressed, the link is checked
> and an exception is thrown if the link does not exist.  These two tasks,
> when executed at the same time, usually succeed.  However, in a recent run,
> we saw an error on web-1 in line 5 because the local filesystem symlink
> creation failed, despite line 2 confirming that the target Gluster symlink
> existed.
>

This is strange. The creation of a symlink is completely independent of
whether the destination exists, is readable, or anything else. If creation of
a symlink failed, it cannot be because of anything to do with the
destination - unless symlink() in PHP does something "intelligent" and makes
it behave differently from the symlink(2) syscall.
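
For instance, a trivial standalone test shows symlink(2) happily creating a
link to a target that does not exist at all:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* the target deliberately does not exist */
    if (symlink("/does/not/exist/anywhere", "/tmp/dangling-link") < 0) {
        perror("symlink");
        return 1;
    }
    puts("symlink created even though the target is missing");
    return 0;
}

So whatever fails here has to be something other than the state of the
destination.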



> I've created a PHP script which can be run simultaneously on two clients
> to recreate the error: https://gist.github.com/pdrakeweb/7347198
>
> Running the same test script on a Gluster 3.0.8 setup does not cause the
> error to occur.  Running the same test on a local-only filesystem also does
> not cause the error to occur.  I'd appreciate any insight people might have
> into what is going on here and whether this is a bug in 3.4.1.  Below are
> the related log entries from my Gluster servers and clients.
>
> Entries from web-1's mnt-gfs.log file:
>
> [2013-11-05 05:25:24.686506] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
> 0-test-fs-cluster-1-client-1: remote operation failed: File exists. Path: 
> /slow265.prod (----)
> [2013-11-05 05:25:24.686584] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
> 0-test-fs-cluster-1-client-0: remote operation failed: File exists. Path: 
> /slow265.prod (----)
>
>
So, slow265.prod is supposed to be a symlink? I really wonder why mknod()
is in the picture.


> [2013-11-05 05:25:24.686649] E [dht-helper.c:1052:dht_inode_ctx_get] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
>  [0x7f5e03dd4ff5] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_layout_preset+0x59)
>  [0x7f5e03dc1c89] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34)
>  [0x7f5e03dc3b34]))) 0-test-fs-cluster-1-dht: invalid argument: inode
> [2013-11-05 05:25:24.686687] E [dht-helper.c:1071:dht_inode_ctx_set] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
>  [0x7f5e03dd4ff5] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_layout_preset+0x59)
>  [0x7f5e03dc1c89] 
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4git/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52)
>  [0x7f5e03dc3b52]))) 0-test-fs-cluster-1-dht: invalid argument: inode
> [2013-11-05 05:25:24.689670] W [fuse-bridge.c:1311:fuse_readlink_cbk] 
> 0-glusterfs-fuse: 1736: /slow265.prod => -1 (Invalid argument)
>
>
> Entries from web-2's mnt-gfs.log file:
>
> [2013-11-05 05:25:26.164593] W [client-rpc-fops.c:2469:client3_3_link_cbk] 
> 0-test-fs-cluster-1-client-1: remote operation failed: File exists 
> (---- -> /slow265.prod)
>
>
client3_3_link_cbk is the callback for the link() hardlink system call.
I wonder why that is coming into the picture! Neither should mknod() be.
Going by your script, the only calls should be SYMLINK, LOOKUP and READLINK.


> [2013-11-05 05:25:26.210652] W [client-rpc-fops.c:2469:client3_3_link_cbk] 
> 0-test-fs-cluster-1-client-0: remote operation failed: File exists 
> (---- -> /slow265.prod)
>
>
> Entries from fs-1's brick.log:
>
> [2013-11-05 05:25:24.832262] I [server-rpc-fops.c:575:server_mknod_cbk] 
> 0-test-fs-cluster-1-server: 3337: MKNOD (null) 
> (----0001/slow265.prod) ==> (File exists)
>
> [2013-11-05 05:25:26.391611] I [server-rpc-fops.c:1211:server_link_cbk] 
> 0-test-fs-cluster-1-server: 3301: LINK /slow265.prod 
> (3658314e-7730-4771-8ac3-2d6fb20b1b13) -> 
> ----0001/slow265.prod ==> (File exists)
>
>
>
>
> Entries from fs-2's brick.log:
>
> [2013-11-05 05:25:24.554824] I [server-rpc-fops.c:575:server_mknod_cbk] 
> 0-test-fs-cluster-1-server: 3290: MKNOD (null) 
> (----0001/slow265.prod) ==>

Re: [Gluster-users] Failed rebalance - lost files, inaccessible files, permission issues

2013-11-09 Thread Anand Avati
Shawn,

Thanks for the detailed info. I have not yet looked into your logs, but
will do so soon. There have been patches on rebalance which do fix issues
related to ownership. But I am not (yet) sure about bugs which caused data
loss. One question I have is -

[2013-10-29 23:13:49.611069] I [dht-rebalance.c:647:dht_migrate_file]
0-mdfs-dht: /REDACTED/mdfs/KPA/kpacontentminepix/docs/008/058: attempting
to move from mdfs-replicate-1 to mdfs-replicate-6
[2013-10-29 23:13:49.611582] I [dht-rebalance.c:647:dht_migrate_file]
0-mdfs-dht: /REDACTED/mdfs/KPA/kpacontentminepix/docs/008/058: attempting
to move from mdfs-replicate-1 to mdfs-replicate-6

Are these two lines from the same log file or from separate log files? If
they are from the same log, then it might be that you need
http://review.gluster.org/4300 (available in 3.4).

It might also be that the "permission issues" are a cascading effect of the
same underlying problem - the temporary file created by rebalance has
different permissions while the migration is in progress, and failures might
have left files in that state.

Avati


On Fri, Nov 8, 2013 at 7:23 PM, Shawn Heisey  wrote:

> I'm starting a new thread on this, because I have more concrete
> information than I did the first time around.  The full rebalance log from
> the machine where I started the rebalance can be found at the following
> link.  It is slightly redacted - one search/replace was made to replace an
> identifying word with REDACTED.
>
> https://dl.dropboxusercontent.com/u/97770508/mdfs-rebalance-redacted.zip
>
> The existing servers are running version 3.3.1-11.el6.x86_64, from
> kkeithley's epel repository.  Those servers have CentOS 6.3 on them.
>
> The newer servers are running version 3.3.1-15.el6.x86_64, and are CentOS
> 6.4, fully upgraded as of October 28th, 2013.  Both sets of servers have
> contrib, plus, epel, rpmforge, and kkeithley's glusterfs repo.
>
> One of our developers went through the log linked above and some of the
> source code.  I am reproducing his detailed comments below.  I have not
> looked at the log super-closely, except to produce a list of files that
> failed to migrate.
>
> What I'm hoping is that this information can either point us at some proof
> (bug, committed patch, etc) that the problem is fixed in 3.4.1, or we can
> use it to file a new new bug.  I'm hoping that either an upgrade will fix
> it or that a workaround can be found.
>
> I'm still hoping to hire someone to look things over.  Where can I find
> some good resources for this?  I tried sending a message to Redhat
> Consulting, but something may have gone wrong with that process, because
> it's been two days with no response.
>
> Full quote from our developer:
>
> --
> Preface:  I know what happened to the files and it's not what I thought it
> was.  I don't know the exact cause but we're closer.
>
> Here's where I throw away my vague hunch of two days ago.  I just realized
> that all the ZUMA files I saw on the new bricks were simply links created
> when users tried to access the files.  We did indeed rebalance on the files
> in chronological order of their uploads.  That was a two-day-long
> wrong-tree barking session because I didn't understand the architecture.
>
> When I looked at the individual cases of lost or corrupted files, one
> thing kept staring at me in the face until I recognized it:
>
> [2013-11-02 03:56:36.472170] I [dht-rebalance.c:647:dht_migrate_file]
> 0-mdfs-dht: /REDACTED/mdfs/AKG/akgphotos/docs/000/002: attempting to move
> from mdfs-replicate-2 to mdfs-replicate-12
> [2013-11-02 03:56:36.472186] I [dht-rebalance.c:647:dht_migrate_file]
> 0-mdfs-dht: /REDACTED/mdfs/AKG/akgphotos/docs/000/002: attempting to move
> from mdfs-replicate-2 to mdfs-replicate-12
> [2013-11-02 03:56:36.480567] I [dht-rebalance.c:647:dht_migrate_file]
> 0-mdfs-dht: /REDACTED/mdfs/AKG/akgphotos/docs/000/002: attempting to move
> from mdfs-replicate-2 to mdfs-replicate-12
>
> Three simultaneous processes on the same file!  Of course that would have
> undefined results, and be the cause of all our problems.  NFS may not be
> related after all.
>
> Tediously scrolling through the error log I found mostly errors where it
> refused to copy files from a more empty brick to a fuller brick, which
> makes perfect sense.  The wheels started falling off about 26 hours into
> the rebalance.
>
> [2013-10-29 23:13:17.193108] C 
> [client-handshake.c:126:rpc_client_ping_timer_expired]
> 0-mdfs-client-1: server 10.116.0.22:24025 has not responded in the last
> 42 seconds, disconnecting.
> [2013-10-29 23:13:17.200616] E [rpc-clnt.c:373:saved_frames_unwind]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x36de60f808]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0)
> [0x36de60f4c0] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
> [0x36de60ef2e]))) 0-mdfs-client-1: forced unwinding frame type(GlusterFS
> 3.1) op(STAT(1)) called at 2013-10-29 23:12:20.6417

Re: [Gluster-users] gluster-deploy tool updated

2013-11-01 Thread Anand Avati
Sounds good! URL please? :-)


On Fri, Nov 1, 2013 at 12:54 PM, Paul Cuzner  wrote:

>
> Hi,
>
> Just to let you know that I've updated the deploy tool (aka setup wizard),
> to include the creation/tuning of the 1st volume.
>
> Here's the changelog info;
>
> - Added optparse module for command line arguments. Added -n to bypass
> accesskey checking
> - Added password check code to RequestHandler class, and updated js to use
> xml and ajax request
> - Added globals module to share config across modules
> - http server default 'run' method overridden to enable it to be stopped
> (e.g. when error met)
> - added ability to create a volume after bricks are defined, and apply use
> case tuning/settings
> - some minor UI fixes
> - added initial error page
> - Added help page showing mount option syntax for smb,nfs and native client
> - css split to place theme type elements in the same file, so people can
> play with skinning the interface
>
> If your interested there is a screenshots directory in the archive, so you
> can have a look through that to get a feel for the 'workflow'.
>
> There's still a heap of changes needed for real world deployments - but it
> makes my life easier when testing things out ;o)
>
> Cheers,
>
> Paul C
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] KVM guest I/O errors with xfs backed gluster volumes

2013-10-29 Thread Anand Avati
Looks like what is happening is that qemu performs ioctls() on the backend
to query logical_block_size (for direct IO alignment). That works on XFS,
but fails on FUSE (hence qemu ends up performing IO with default 512
alignment rather than 4k).
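
A rough sketch of such a probe (assuming XFS_IOC_DIOINFO, from the xfsprogs
development headers, is the call that answers on XFS) and of the 512-byte
fallback:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>   /* XFS_IOC_DIOINFO, struct dioattr */

/* Ask the filesystem for its direct-IO alignment; fall back to 512. */
static unsigned int probe_dio_alignment(const char *path)
{
    unsigned int align = 512;            /* default when the probe fails */
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return align;
    }
    struct dioattr da;
    if (ioctl(fd, XFS_IOC_DIOINFO, &da) == 0)
        align = da.d_miniosz;            /* e.g. 4096 on a 4K-sector XFS brick */
    else
        perror("XFS_IOC_DIOINFO");       /* fails (ENOTTY) through FUSE */
    close(fd);
    return align;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
        return 1;
    }
    printf("%s: direct-IO alignment %u bytes\n", argv[1],
           probe_dio_alignment(argv[1]));
    return 0;
}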

Looks like this might be something we can enhance in the gluster driver in qemu.
Note that glusterfs does not have an ioctl() FOP, but we could probably
wire up a virtual xattr call for this purpose.

Copying Bharata to check if he has other solutions in mind.

Avati



On Tue, Oct 29, 2013 at 12:13 AM, Anand Avati  wrote:

> What happens when you try to use KVM on an image directly on XFS, without
> involving gluster?
>
> Avati
>
>
> On Sun, Oct 27, 2013 at 5:53 PM, Jacob Yundt  wrote:
>
>> I think I finally made some progress on this bug!
>>
>> I noticed that all disks in my gluster server(s) have 4K sectors.
>> Using an older disk with 512 byte sectors, I did _not_ get any errors
>> on my gluster client / KVM server.  I switched back to using my newer
>> 4K drives and manually set the XFS sector size (sectsz) to 512.  With
>> the manually set sector size of 512, everything worked as expected.
>>
>> I think I might be hitting some sort of qemu/libvirt bug.  However,
>> all of the bugs I found that sound similar[1][2] have already been
>> fixed in RHEL6.
>>
>> Anyone else using XFS backed bricks on 4K sector drives to host KVM
>> images in RHEL6?
>>
>> -Jacob
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=608548
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=748902
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] KVM guest I/O errors with xfs backed gluster volumes

2013-10-29 Thread Anand Avati
What happens when you try to use KVM on an image directly on XFS, without
involving gluster?

Avati


On Sun, Oct 27, 2013 at 5:53 PM, Jacob Yundt  wrote:

> I think I finally made some progress on this bug!
>
> I noticed that all disks in my gluster server(s) have 4K sectors.
> Using an older disk with 512 byte sectors, I did _not_ get any errors
> on my gluster client / KVM server.  I switched back to using my newer
> 4K drives and manually set the XFS sector size (sectsz) to 512.  With
> the manually set sector size of 512, everything worked as expected.
>
> I think I might be hitting some sort of qemu/libvirt bug.  However,
> all of the bugs I found that sound similar[1][2] have already been
> fixed in RHEL6.
>
> Anyone else using XFS backed bricks on 4K sector drives to host KVM
> images in RHEL6?
>
> -Jacob
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=608548
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=748902
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] metadata for stat : Should it be identical?

2013-10-25 Thread Anand Avati
On Fri, Oct 25, 2013 at 12:51 PM, James  wrote:

> On Fri, Oct 25, 2013 at 3:18 PM, Anand Avati  wrote:
> > In normal operations they will differ as much as the time drift between
> the
> > servers + lag in delivery/issue of write() calls on the servers. This
> delta
> > is "fixed up" by consistently returning the highest of the two mtimes
> > whenever queried.
>
>
> But lets says we have replica == 2.
> On server A this mtime is 4:45
> On server B this mtime is 4:46
> So fuse queries the times, and it returns the max, which is 4:46.
> All is good.
> Suppose now that server B is down, and the query is run again.
> It should now return 4:45, although this means that file has changed mtime.
> This could break client operations, which may care about a change in mtime.
>
> Note that I don't expect differences of 1 minute, but I just chose
> arbitrary values to make understanding the example easier.
>
> So isn't this a bug?
>

This is a known behavior - when a server goes down, files can witness a
changed mtime (but I don't think that is what the original post was about).
We could provide a new feature/enhancement to keep mtimes in sync with an
explicit utimes() call per write() - but that might be too expensive for
most users.

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] metadata for stat : Should it be identical?

2013-10-25 Thread Anand Avati
After a self-heal, mtimes of the two files will be *exactly* the same,
because we explicitly set the mtime using the utime() call.

In normal operations they will differ as much as the time drift between the
servers + lag in delivery/issue of write() calls on the servers. This delta
is "fixed up" by consistently returning the highest of the two mtimes
whenever queried.
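
As a minimal sketch (not the actual self-heal code), syncing the mtime from
the good copy onto the healed copy boils down to something like:

#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Copy src's mtime onto dst, leaving atime untouched. */
static int copy_mtime(const char *src, const char *dst)
{
    struct stat st;
    if (stat(src, &st) < 0) {
        perror("stat");
        return -1;
    }
    struct timespec times[2];
    times[0].tv_sec  = 0;
    times[0].tv_nsec = UTIME_OMIT;   /* do not touch atime */
    times[1]         = st.st_mtim;   /* mtime taken from the source copy */
    if (utimensat(AT_FDCWD, dst, times, 0) < 0) {
        perror("utimensat");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source-copy> <healed-copy>\n", argv[0]);
        return 1;
    }
    return copy_mtime(argv[1], argv[2]) ? 1 : 0;
}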

Avati


On Fri, Oct 25, 2013 at 11:17 AM, James  wrote:

> On Fri, Oct 25, 2013 at 1:46 PM, Anand Avati  wrote:
> > Gluster's replication is synchronous. So writes are done in parallel. If
> a
> > server was down and we self-heal it later, we sync both data and mtime.
>
> Which is why the mtime of the two files should be the "greatest" of
> the mtimes... Is this the case, or rather, what is the case? Will the
> two files have the exact same mtime? I actually don't care what the
> mtime is, but I think it should be consistent.
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.4 Fuse client Performace

2013-10-25 Thread Anand Avati
Also, have you specified a block size for dd? The default (512 bytes) is
too low for the number of context switches it generates in FUSE. Use a
higher block size (64-128KB) and check the throughput.
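
A quick throwaway test along these lines (the mount path below is just a
placeholder) shows the difference by writing the same amount of data with a
512-byte and a 128KB block size:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

/* Write 'total' bytes to 'path' in chunks of 'bs' bytes; return seconds taken. */
static double write_file(const char *path, size_t bs, size_t total)
{
    char *buf = calloc(1, bs);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (!buf || fd < 0) { perror("setup"); exit(1); }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t done = 0; done < total; done += bs)
        if (write(fd, buf, bs) != (ssize_t)bs) { perror("write"); exit(1); }
    fsync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/mnt/glusterfs/blocksize-test";
    size_t total = (size_t)256 << 20;              /* 256 MB */
    printf("512B  blocks: %6.2f s\n", write_file(path, 512, total));
    printf("128KB blocks: %6.2f s\n", write_file(path, 128 << 10, total));
    return 0;
}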

Avati


On Fri, Oct 25, 2013 at 7:53 AM, Joe Julian  wrote:

> Have you brought this up with Red Hat Support? That  is  what you pay them
> for.
>
>
>
> Jung Young Seok  wrote:
> >I've wrote below email. However it seems I missed mail key word rule on
> >subject.
> >So I'm sending it again.
> >Please check the below mail and any response will be helpful.
> >Thanks,
> >2013. 10. 25. 오후 6:01에 "Jung Young Seok" 님이
> >작성:
> >
> >
> >Dear GlusterFS Engineer,
> >
> >I have questions that my glusterfs server and fuse client
> >perform properly on below specification.
> >
> >It can write only *65MB*/s through FUSE client to 1 glusterfs server (1
> >brick and no replica for 1 volume )
> > - NW bandwidth are enough for now. I've check it with iftop
> > - However it can write *120MB*/s when I mount nfs on the same volume.
> >
> >Could anyone check if the glusterfs and fuse client perform properly?
> >
> >
> >Detail explanations are below.
> >===
> >I've set 4 glusterfs servers and 1 fuse client.
> >Each spec is as followings.
> >
> >*Server x 4*
> > - CPU : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz (2 cpu * 4 core)
> > - Memory : 32GB
> > - HDD (3TB 7.2K RPM SATA x 14 )
> >   * RAID6(33T)
> >   * XFS
> > - OS : RHS 2.1
> > - 4 Gluster Server will be used 2 replica x 2 distributed as 1 volume
> > - NW 1G for replica
> > - NW 1G for Storage and management
>
> No need. The fuse client connects to all the servers. Replication happens
> from the client.
>
> > - Current active profile: rhs-high-throughput
> >
> >*FUSE Client (gluster 3.4)*
> > - CPU : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> > - Memory : 32GB
> > - OS : CentOS6.4
> > - NW 2G for Storage (NIC bonding)
> >
> >All server will be in 10G network. (for now 1G network)
> >
> >
> >I've tested to check primitive disk performance.
> > - on first glusterfs server
> >* it can write 870MB/s (dd if=/dev/zero of=./dummy bs=4096 count=1)
> >  * it can read 1GB/s   (cat test_file.23 > /dev/null )
> > - on fuse client  (mount volume : 1 brick(1dist, no-replica)
> >  * it can write 64.8MB/s
> > - on nfs client (mount volume : 1 brick(1dist, no-replica)
> >  * it can write 120MB/s (it reached NW bandwith
>
> My usual question here is how does dd represent your expected use case?
> Are you comparing apples to orchards?
>
> >
> >
> >I wonder why fuse client much slower than nfs client. (it's no-replica
> peer)
> >Is it normal performance?
>
> I always max out my network connection with the fuse client, so no. It's
> not normal.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] metadata for stat : Should it be identical?

2013-10-25 Thread Anand Avati
On Fri, Oct 25, 2013 at 9:14 AM, Brad Childs  wrote:

> If the replica took n+1 day to complete,
>

Gluster's replication is synchronous. So writes are done in parallel. If a
server was down and we self-heal it later, we sync both data and mtime.


> then returning the highest value would show that the file was modified a
> full day after the user considered it modified.  Shouldn't it be the lesser
> value (if both replicas are consistant)?
>
> In support of James statement from a user perspective, I may want to know
> the last time I wrote some data to a file-- timestamp is metadata for the
> user.  Showing the date gluster completed replicating the file to another
> node is confusing.
>
>
As I described above, that is not the case. Delayed replication (healing)
happens for both data and mtime.

Avati



>
> -bc
>
>
> --
>
> *From: *"Anand Avati" 
> *To: *"James" 
> *Cc: *"gluster-users" 
> *Sent: *Thursday, October 24, 2013 7:12:56 PM
> *Subject: *Re: [Gluster-users] metadata for stat : Should it be identical?
>
>
> Gluster does have logic to always show mtime which is the highest in
> value. It is probably a bug if you are witnessing different mtimes at
> different times when no writes have happened in between.
>
> Avati
>
>
>
> On Thu, Oct 24, 2013 at 4:31 PM, James  wrote:
>
>> On Thu, 2013-10-24 at 13:00 -0700, Robert Hajime Lanning wrote:
>> >
>> > Design philosophy...
>> >
>> > There is no metadata server.  When you look at timestamps in stat,
>> > you
>> > are seeing the real stat of the file.
>> So this raises an interesting point...
>>
>> >
>> > If you have "replica 2" then you have two files.  The stat can come
>> > from
>> > either one.  Mtime will be the modification time of the file
>> If the replica N files all have slightly different mtimes (it seems they
>> usually will because they weren't written at exactly the same time),
>> then isn't this a point of inconsistency for a script running on a fuse
>> mount which expects the same mtime on a file?
>>
>> Shouldn't gluster somehow coordinate to set all the files mtimes to be
>> consistent to say the last mtime in the replica set?
>>
>>
>> >  referenced
>> > by the time on the server (not the client.)
>> >
>> > One of the strengths of GlusterFS is that it does not have the
>> > bottleneck/single point of failure of a single metadata server.
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] metadata for stat : Should it be identical?

2013-10-24 Thread Anand Avati
Gluster does have logic to always show the mtime which is the highest in value.
It is probably a bug if you are witnessing different mtimes at different
times when no writes have happened in between.

Avati



On Thu, Oct 24, 2013 at 4:31 PM, James  wrote:

> On Thu, 2013-10-24 at 13:00 -0700, Robert Hajime Lanning wrote:
> >
> > Design philosophy...
> >
> > There is no metadata server.  When you look at timestamps in stat,
> > you
> > are seeing the real stat of the file.
> So this raises an interesting point...
>
> >
> > If you have "replica 2" then you have two files.  The stat can come
> > from
> > either one.  Mtime will be the modification time of the file
> If the replica N files all have slightly different mtimes (it seems they
> usually will because they weren't written at exactly the same time),
> then isn't this a point of inconsistency for a script running on a fuse
> mount which expects the same mtime on a file?
>
> Shouldn't gluster somehow coordinate to set all the files mtimes to be
> consistent to say the last mtime in the replica set?
>
>
> >  referenced
> > by the time on the server (not the client.)
> >
> > One of the strengths of GlusterFS is that it does not have the
> > bottleneck/single point of failure of a single metadata server.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Advice for building samba-glusterfs-vfs

2013-10-15 Thread Anand Avati
A very likely reason for getting ENODATA for the posix_acl_default key is
that your backend is not mounted with -o acl.
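
A quick way to check on the brick itself is to ask for that xattr directly
and look at the errno; a minimal sketch:

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <directory-on-brick>\n", argv[0]);
        return 1;
    }
    /* size query only: >= 0 means a default ACL is there to be read */
    ssize_t len = lgetxattr(argv[1], "system.posix_acl_default", NULL, 0);
    if (len >= 0)
        printf("default ACL present (%zd bytes)\n", len);
    else
        printf("no default ACL: %s\n", strerror(errno)); /* ENODATA, EOPNOTSUPP, ... */
    return 0;
}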

Avati


On Mon, Oct 14, 2013 at 12:41 AM, Vijay Bellur  wrote:

> On 10/11/2013 10:20 AM, Dan Mons wrote:
>
>> Following up on this:
>>
>> * GlusterFS 3.4.1 solves the problem of renaming over CIFS from a
>> Windows client (via Samba3 and vfs_glusterfs/libgfapi).  Happy days!
>>
>> * I still see 4 lines of this sort of thing for each file read in
>> /var/log/glusterfs/bricks/*.log:
>>
>> "[2013-10-11 04:01:37.554826] E [posix.c:2668:posix_getxattr]
>> 0-prodbackup-posix: getxattr failed on /gback_brick0/vfx:
>> system.posix_acl_default (No data available)"
>>
>
> Have sent a patch to lower the severity of this log message to DEBUG when
> the error number maps to "No data available":
>
> http://review.gluster.org/#/c/6084/
>
> -Vijay
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Phasing out replace-brick for data migration in favor of remove-brick.

2013-10-10 Thread Anand Avati
http://review.gluster.org/#/c/6031/ (patch to remove replace-brick data
migration) is slated for merge before 3.5. Review comments (on gerrit)
welcome.

Thanks,
Avati


On Thu, Oct 3, 2013 at 9:27 AM, Anand Avati  wrote:

> On Thu, Oct 3, 2013 at 8:57 AM, KueiHuan Chen wrote:
>
>> Hi, Avati
>>
>>   In your chained configuration, how to replace whole h1 without
>> replace-brick ? Is there has a better way than replace brick in this
>> situation ?
>>
>>   h0:/b1 h1:/b2 h1:/b1 h2:/b2 h2:/b1 h0:/b2 (A new h3 want to replace old
>> h1.)
>>
>
>
> You have a couple of options,
>
> A)
>
> replace-brick h1:/b1 h3:/b1
> replace-brick h1:/b2 h3:/b2
>
> and let self-heal bring the disks up to speed, or
>
> B)
>
> add-brick replica 2 h3:/b1 h2:/b2a
> add-brick replica 2 h3:/b2 h0:/b1a
>
> remove-brick h0:/b1 h1:/b2 start .. commit
> remove-brick h2:/b2 h1:/b1 start .. commit
>
> Let me know if you still have questions.
>
> Avati
>
>
>> Thanks.
>> Best Regards,
>>
>> KueiHuan-Chen
>> Synology Incorporated.
>> Email: khc...@synology.com
>> Tel: +886-2-25521814 ext.827
>>
>>
>> 2013/9/30 Anand Avati :
>> >
>> >
>> >
>> > On Fri, Sep 27, 2013 at 1:56 AM, James  wrote:
>> >>
>> >> On Fri, 2013-09-27 at 00:35 -0700, Anand Avati wrote:
>> >> > Hello all,
>> >> Hey,
>> >>
>> >> Interesting timing for this post...
>> >> I've actually started working on automatic brick addition/removal. (I'm
>> >> planning to add this to puppet-gluster of course.) I was hoping you
>> >> could help out with the algorithm. I think it's a bit different if
>> >> there's no replace-brick command as you are proposing.
>> >>
>> >> Here's the problem:
>> >> Given a logically optimal initial volume:
>> >>
>> >> volA: rep=2; h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2
>> >>
>> >> suppose I know that I want to add/remove bricks such that my new volume
>> >> (if I had created it new) looks like:
>> >>
>> >> volB: rep=2; h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2
>> >> h5:/b2 h6:/b2
>> >>
>> >> What is the optimal algorithm for determining the correct sequence of
>> >> transforms that are needed to accomplish this task. Obviously there are
>> >> some simpler corner cases, but I'd like to solve the general case.
>> >>
>> >> The transforms are obviously things like running the add-brick {...}
>> and
>> >> remove-brick {...} commands.
>> >>
>> >> Obviously we have to take into account that it's better to add bricks
>> >> and rebalance before we remove bricks and risk the file system if a
>> >> replica is missing. The algorithm should work for any replica N. We
>> want
>> >> to make sure the new layout makes sense to replicate the data on
>> >> different servers. In many cases, this will require creating a circular
>> >> "chain" of bricks as illustrated in the bottom of this image:
>> >> http://joejulian.name/media/uploads/images/replica_expansion.png
>> >> for example. I'd like to optimize for safety first, and then time, I
>> >> imagine.
>> >>
>> >> Many thanks in advance.
>> >>
>> >
>> > I see what you are asking. First of all, when running a 2-replica
>> volume you
>> > almost pretty much always want to have an even number of servers, and
>> add
>> > servers in even numbers. Ideally the two "sides" of the replicas should
>> be
>> > placed in separate failures zones - separate racks with separate power
>> > supplies or separate AZs in the cloud. Having an odd number of servers
>> with
>> > an 2 replicas is a very "odd" configuration. In all these years I am
>> yet to
>> > come across a customer who has a production cluster with 2 replicas and
>> an
>> > odd number of servers. And setting up replicas in such a chained manner
>> > makes it hard to reason about availability, especially when you are
>> trying
>> > recover from a disaster. Having clear and separate "pairs" is definitely
>> > what is recommended.
>> >
>> > That being said, nothing prevents one from setting up a chain like
>> above as
>> > long as you are comfortable with the complexity of the c

Re: [Gluster-users] Phasing out replace-brick for data migration in favor of remove-brick.

2013-10-03 Thread Anand Avati
On Thu, Oct 3, 2013 at 8:57 AM, KueiHuan Chen wrote:

> Hi, Avati
>
>   In your chained configuration, how to replace whole h1 without
> replace-brick ? Is there has a better way than replace brick in this
> situation ?
>
>   h0:/b1 h1:/b2 h1:/b1 h2:/b2 h2:/b1 h0:/b2 (A new h3 want to replace old
> h1.)
>


You have a couple of options,

A)

replace-brick h1:/b1 h3:/b1
replace-brick h1:/b2 h3:/b2

and let self-heal bring the disks up to speed, or

B)

add-brick replica 2 h3:/b1 h2:/b2a
add-brick replica 2 h3:/b2 h0:/b1a

remove-brick h0:/b1 h1:/b2 start .. commit
remove-brick h2:/b2 h1:/b1 start .. commit

Let me know if you still have questions.

Avati


> Thanks.
> Best Regards,
>
> KueiHuan-Chen
> Synology Incorporated.
> Email: khc...@synology.com
> Tel: +886-2-25521814 ext.827
>
>
> 2013/9/30 Anand Avati :
> >
> >
> >
> > On Fri, Sep 27, 2013 at 1:56 AM, James  wrote:
> >>
> >> On Fri, 2013-09-27 at 00:35 -0700, Anand Avati wrote:
> >> > Hello all,
> >> Hey,
> >>
> >> Interesting timing for this post...
> >> I've actually started working on automatic brick addition/removal. (I'm
> >> planning to add this to puppet-gluster of course.) I was hoping you
> >> could help out with the algorithm. I think it's a bit different if
> >> there's no replace-brick command as you are proposing.
> >>
> >> Here's the problem:
> >> Given a logically optimal initial volume:
> >>
> >> volA: rep=2; h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2
> >>
> >> suppose I know that I want to add/remove bricks such that my new volume
> >> (if I had created it new) looks like:
> >>
> >> volB: rep=2; h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2
> >> h5:/b2 h6:/b2
> >>
> >> What is the optimal algorithm for determining the correct sequence of
> >> transforms that are needed to accomplish this task. Obviously there are
> >> some simpler corner cases, but I'd like to solve the general case.
> >>
> >> The transforms are obviously things like running the add-brick {...} and
> >> remove-brick {...} commands.
> >>
> >> Obviously we have to take into account that it's better to add bricks
> >> and rebalance before we remove bricks and risk the file system if a
> >> replica is missing. The algorithm should work for any replica N. We want
> >> to make sure the new layout makes sense to replicate the data on
> >> different servers. In many cases, this will require creating a circular
> >> "chain" of bricks as illustrated in the bottom of this image:
> >> http://joejulian.name/media/uploads/images/replica_expansion.png
> >> for example. I'd like to optimize for safety first, and then time, I
> >> imagine.
> >>
> >> Many thanks in advance.
> >>
> >
> > I see what you are asking. First of all, when running a 2-replica volume
> you
> > almost pretty much always want to have an even number of servers, and add
> > servers in even numbers. Ideally the two "sides" of the replicas should
> be
> > placed in separate failures zones - separate racks with separate power
> > supplies or separate AZs in the cloud. Having an odd number of servers
> with
> > an 2 replicas is a very "odd" configuration. In all these years I am yet
> to
> > come across a customer who has a production cluster with 2 replicas and
> an
> > odd number of servers. And setting up replicas in such a chained manner
> > makes it hard to reason about availability, especially when you are
> trying
> > recover from a disaster. Having clear and separate "pairs" is definitely
> > what is recommended.
> >
> > That being said, nothing prevents one from setting up a chain like above
> as
> > long as you are comfortable with the complexity of the configuration. And
> > phasing out replace-brick in favor of add-brick/remove-brick does not
> make
> > the above configuration impossible either. Let's say you have a chained
> > configuration of N servers, with pairs formed between every:
> >
> > h(i):/b1 h((i+1) % N):/b2 | i := 0 -> N-1
> >
> > Now you add N+1th server.
> >
> > Using replace-brick, you have been doing thus far:
> >
> > 1. add-brick hN:/b1 h0:/b2a # because h0:/b2 was "part of a previous
> brick"
> > 2. replace-brick h0:/b2 hN:/b2 start ... commit
> >
> > In case you are doing an add-brick/remove-brick approach, you would no

Re: [Gluster-users] [Gluster-devel] Phasing out replace-brick for data migration in favor of remove-brick.

2013-09-30 Thread Anand Avati
On Fri, Sep 27, 2013 at 10:15 AM, Amar Tumballi  wrote:

>
> I plan to send out patches to remove all traces of replace-brick data
>> migration code by 3.5 branch time.
>>
>> Thanks for the initiative, let me know if you need help.
>

I could use help here - do you have free cycles to pick up this task?

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Phasing out replace-brick for data migration in favor of remove-brick.

2013-09-29 Thread Anand Avati
On Fri, Sep 27, 2013 at 1:56 AM, James  wrote:

> On Fri, 2013-09-27 at 00:35 -0700, Anand Avati wrote:
> > Hello all,
> Hey,
>
> Interesting timing for this post...
> I've actually started working on automatic brick addition/removal. (I'm
> planning to add this to puppet-gluster of course.) I was hoping you
> could help out with the algorithm. I think it's a bit different if
> there's no replace-brick command as you are proposing.
>
> Here's the problem:
> Given a logically optimal initial volume:
>
> volA: rep=2; h1:/b1 h2:/b1 h3:/b1 h4:/b1 h1:/b2 h2:/b2 h3:/b2 h4:/b2
>
> suppose I know that I want to add/remove bricks such that my new volume
> (if I had created it new) looks like:
>
> volB: rep=2; h1:/b1 h3:/b1 h4:/b1 h5:/b1 h6:/b1 h1:/b2 h3:/b2 h4:/b2
> h5:/b2 h6:/b2
>
> What is the optimal algorithm for determining the correct sequence of
> transforms that are needed to accomplish this task. Obviously there are
> some simpler corner cases, but I'd like to solve the general case.
>
> The transforms are obviously things like running the add-brick {...} and
> remove-brick {...} commands.
>
> Obviously we have to take into account that it's better to add bricks
> and rebalance before we remove bricks and risk the file system if a
> replica is missing. The algorithm should work for any replica N. We want
> to make sure the new layout makes sense to replicate the data on
> different servers. In many cases, this will require creating a circular
> "chain" of bricks as illustrated in the bottom of this image:
> http://joejulian.name/media/uploads/images/replica_expansion.png
> for example. I'd like to optimize for safety first, and then time, I
> imagine.
>
> Many thanks in advance.
>
>
I see what you are asking. First of all, when running a 2-replica volume
you almost always want to have an even number of servers, and to add
servers in even numbers. Ideally the two "sides" of the replicas should be
placed in separate failure zones - separate racks with separate power
supplies, or separate AZs in the cloud. Having an odd number of servers
with 2 replicas is a very "odd" configuration. In all these years I have
yet to come across a customer who has a production cluster with 2 replicas
and an odd number of servers. And setting up replicas in such a chained
manner makes it hard to reason about availability, especially when you are
trying to recover from a disaster. Having clear and separate "pairs" is
definitely what is recommended.

That being said, nothing prevents one from setting up a chain like above as
long as you are comfortable with the complexity of the configuration. And
phasing out replace-brick in favor of add-brick/remove-brick does not make
the above configuration impossible either. Let's say you have a chained
configuration of N servers, with pairs formed between every:

h(i):/b1 h((i+1) % N):/b2 | i := 0 -> N-1
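
As a purely illustrative sketch, a few lines are enough to print that
chained layout for a given N, for example to sanity-check a planned
configuration:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int n = (argc > 1) ? atoi(argv[1]) : 4;   /* number of servers */
    /* each server i pairs its /b1 with the next server's /b2 */
    for (int i = 0; i < n; i++)
        printf("h%d:/b1 h%d:/b2\n", i, (i + 1) % n);
    return 0;
}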

Now you add the (N+1)th server.

Using replace-brick, you have been doing thus far:

1. add-brick hN:/b1 h0:/b2a # because h0:/b2 was "part of a previous brick"
2. replace-brick h0:/b2 hN:/b2 start ... commit

In case you are doing an add-brick/remove-brick approach, you would now
instead do:

1. add-brick h(N-1):/b1a hN:/b2
2. add-brick hN:/b1 h0:/b2a
3. remove-brick h(N-1):/b1 h0:/b2 start ... commit

You will not be left with only 1 copy of a file at any point in the
process, and you achieve the same "end result" as you would with
replace-brick. As mentioned before, I once again request you to consider
whether you really want to deal with the configuration complexity of
chained replication, instead of just adding servers in pairs.

Please ask if there are any more questions or concerns.

Avati



> James
>
> Some comments below, although I'm a bit tired so I hope I said it all
> right.
>
> > DHT's remove-brick + rebalance has been enhanced in the last couple of
> > releases to be quite sophisticated. It can handle graceful
> decommissioning
> > of bricks, including open file descriptors and hard links.
> Sweet
>
> >
> > This in a way is a feature overlap with replace-brick's data migration
> > functionality. Replace-brick's data migration is currently also used for
> > planned decommissioning of a brick.
> >
> > Reasons to remove replace-brick (or why remove-brick is better):
> >
> > - There are two methods of moving data. It is confusing for the users and
> > hard for developers to maintain.
> >
> > - If server being replaced is a member of a replica set, neither
> > remove-brick nor replace-brick data migration is necessary, because
> > self-healing itself will recreate the data (replace-brick actually uses
>

[Gluster-users] Phasing out replace-brick for data migration in favor of remove-brick.

2013-09-27 Thread Anand Avati
Hello all,
DHT's remove-brick + rebalance has been enhanced in the last couple of
releases to be quite sophisticated. It can handle graceful decommissioning
of bricks, including open file descriptors and hard links.

This in a way is a feature overlap with replace-brick's data migration
functionality. Replace-brick's data migration is currently also used for
planned decommissioning of a brick.

Reasons to remove replace-brick (or why remove-brick is better):

- There are two methods of moving data. It is confusing for the users and
hard for developers to maintain.

- If the server being replaced is a member of a replica set, neither
remove-brick nor replace-brick data migration is necessary, because
self-healing itself will recreate the data (replace-brick actually uses
self-heal internally).

- In a non-replicated config, if a server is getting replaced by a new one,
add-brick <new-brick> + remove-brick <old-brick> "start" achieves the same
goal as replace-brick <old-brick> <new-brick> "start".

- In a non-replicated config, replace-brick data migration is NOT glitch
free (applications witness ENOTCONN if they are accessing data), whereas
add-brick <new-brick> + remove-brick <old-brick> is completely transparent.

- Replace-brick strictly requires a server with enough free space to hold
the data of the old brick, whereas remove-brick will evenly spread out the
data of the brick being removed amongst the remaining servers.

- Replace-brick code is complex and messy (the real reason :p).

- No clear reason why replace-brick's data migration is better in any way
than remove-brick's data migration.

I plan to send out patches to remove all traces of replace-brick data
migration code by 3.5 branch time.

NOTE that the replace-brick command itself will still exist, and you can
replace one server with another in case a server dies. It is only the data
migration functionality that is being phased out.

Please do ask any questions / raise concerns at this stage :)

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] NFS crashes on Gluster 3.4.0

2013-09-25 Thread Anand Avati
On Wed, Sep 25, 2013 at 12:47 PM, John Mark Walker wrote:

>
> - Original Message -
> >
> > Sadly this won't help, but thanks for your effort. We use the Ubuntu
> > repository for Gluster, so sadly this is no option. Also I don't think
> I'd
> > be too happy with Packages built from Git on a production system. And
> > neither would my boss. ;)
>
> :) understood.
>
> >
> > So I guess we'll have to wait until the next 3.4.0 release. From your
> > experience, what would your estimates be when the next release would come
> > out?
>
> This week - I was expecting it today. Perhaps tomorrow?
>

I'm about to fire off an RC release including the ACL fix backported (the
one reported by Lukas and Lubomir). That should be the last patch for 3.4.1.

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] glusterfs-3.4.1qa2 released

2013-09-20 Thread Anand Avati
Thanks Lukas. Copying Lubomir. Can you confirm that
http://review.gluster.org/5693 is no longer needed then? Also, can you please
vote on http://review.gluster.org/5979?

Thanks,
Avati



On Fri, Sep 20, 2013 at 4:39 AM, Lukáš Bezdička  wrote:

> I was unable to reproduce the issue with patch #2 from
> http://review.gluster.org/#/c/5979/
>
> Thank you.
>
>
> On Fri, Sep 20, 2013 at 11:52 AM, Anand Avati  wrote:
>
>> Please pick #2 resubmission, that is fine.
>>
>> Avati
>>
>>
>> On Fri, Sep 20, 2013 at 2:48 AM, Lukáš Bezdička <
>> lukas.bezdi...@gooddata.com> wrote:
>>
>>> Will take about 2 hours to setup test env, also build seems to be failed
>>> but does not seem to be caused by the patch :/
>>>
>>>
>>> On Fri, Sep 20, 2013 at 11:38 AM, Anand Avati  wrote:
>>>
>>>> Can you please confirm if http://review.gluster.org/5979 fixes the
>>>> problem of #998967 for you? If so we will backport and include the
>>>> patch in 3.4.1.
>>>>
>>>> Thanks,
>>>> Avati
>>>>
>>>>
>>>> On Fri, Sep 20, 2013 at 2:03 AM, Anand Avati  wrote:
>>>>
>>>>> I have a theory for #998967 (that posix-acl is not doing the right
>>>>> thing after chmod/setattr). Preparing a patch, will appreciate if you can
>>>>> test it quickly.
>>>>>
>>>>> Avati
>>>>>
>>>>>
>>>>> On Fri, Sep 20, 2013 at 1:26 AM, Lukáš Bezdička <
>>>>> lukas.bezdi...@gooddata.com> wrote:
>>>>>
>>>>>> No, I see issues reported in
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=998967 which is probably
>>>>>> related to BZ#991035.
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 19, 2013 at 7:40 PM, Vijay Bellur wrote:
>>>>>>
>>>>>>> On 09/18/2013 02:45 PM, Lukáš Bezdička wrote:
>>>>>>>
>>>>>>>> Tested with glusterfs-3.4.1qa2-1.el6.x86_64 issue with ACL is
>>>>>>>> still
>>>>>>>> there, unless one applies patch from http://review.gluster.org/#/c/
>>>>>>>> 5693/ <http://review.gluster.org/#/c/5693/>
>>>>>>>> which shoots through the caches and takes ACLs from server or sets
>>>>>>>> entry-timeout=0 it returns wrong values. This is probably because
>>>>>>>> ACL
>>>>>>>> mask being applied incorrectly in posix_acl_inherit_mode, but I'm
>>>>>>>> no C
>>>>>>>> expert to say so :(
>>>>>>>>
>>>>>>>>
>>>>>>> Checking again. Are you seeing issues reported in both BZ#991035 and
>>>>>>> BZ#990830 with 3.4.1qa2?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vijay
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ___
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@gluster.org
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] glusterfs-3.4.1qa2 released

2013-09-20 Thread Anand Avati
Please pick the #2 resubmission; that is fine.

Avati


On Fri, Sep 20, 2013 at 2:48 AM, Lukáš Bezdička  wrote:

> Will take about 2 hours to setup test env, also build seems to be failed
> but does not seem to be caused by the patch :/
>
>
> On Fri, Sep 20, 2013 at 11:38 AM, Anand Avati  wrote:
>
>> Can you please confirm if http://review.gluster.org/5979 fixes the
>> problem of #998967 for you? If so we will backport and include the patch
>> in 3.4.1.
>>
>> Thanks,
>> Avati
>>
>>
>> On Fri, Sep 20, 2013 at 2:03 AM, Anand Avati  wrote:
>>
>>> I have a theory for #998967 (that posix-acl is not doing the right thing
>>> after chmod/setattr). Preparing a patch, will appreciate if you can test it
>>> quickly.
>>>
>>> Avati
>>>
>>>
>>> On Fri, Sep 20, 2013 at 1:26 AM, Lukáš Bezdička <
>>> lukas.bezdi...@gooddata.com> wrote:
>>>
>>>> No, I see issues reported in
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=998967 which is probably
>>>> related to BZ#991035.
>>>>
>>>>
>>>> On Thu, Sep 19, 2013 at 7:40 PM, Vijay Bellur wrote:
>>>>
>>>>> On 09/18/2013 02:45 PM, Lukáš Bezdička wrote:
>>>>>
>>>>> Tested with glusterfs-3.4.1qa2-1.el6.x86_64 issue with ACL is still
>>>>> there, unless one applies patch from http://review.gluster.org/#/c/
>>>>>> 5693/ <http://review.gluster.org/#/c/5693/>
>>>>>> which shoots through the caches and takes ACLs from server or sets
>>>>>> entry-timeout=0 it returns wrong values. This is probably because ACL
>>>>>> mask being applied incorrectly in posix_acl_inherit_mode, but I'm no C
>>>>>> expert to say so :(
>>>>>>
>>>>>>
>>>>> Checking again. Are you seeing issues reported in both BZ#991035 and
>>>>> BZ#990830 with 3.4.1qa2?
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>>
>>>>
>>>> ___
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] glusterfs-3.4.1qa2 released

2013-09-20 Thread Anand Avati
Can you please confirm if http://review.gluster.org/5979 fixes the problem
of #998967 for you? If so we will backport and include the patch in 3.4.1.

Thanks,
Avati


On Fri, Sep 20, 2013 at 2:03 AM, Anand Avati  wrote:

> I have a theory for #998967 (that posix-acl is not doing the right thing
> after chmod/setattr). Preparing a patch, will appreciate if you can test it
> quickly.
>
> Avati
>
>
> On Fri, Sep 20, 2013 at 1:26 AM, Lukáš Bezdička <
> lukas.bezdi...@gooddata.com> wrote:
>
>> No, I see issues reported in
>> https://bugzilla.redhat.com/show_bug.cgi?id=998967 which is probably
>> related to BZ#991035.
>>
>>
>> On Thu, Sep 19, 2013 at 7:40 PM, Vijay Bellur  wrote:
>>
>>> On 09/18/2013 02:45 PM, Lukáš Bezdička wrote:
>>>
>>>> Tested with glusterfs-3.4.1qa2-1.el6.x86_64 issue with ACL is still
>>>> there, unless one applies patch from http://review.gluster.org/#/c/
>>>> 5693/ <http://review.gluster.org/#/c/5693/>
>>>> which shoots through the caches and takes ACLs from server or sets
>>>> entry-timeout=0 it returns wrong values. This is probably because ACL
>>>> mask being applied incorrectly in posix_acl_inherit_mode, but I'm no C
>>>> expert to say so :(
>>>>
>>>>
>>> Checking again. Are you seeing issues reported in both BZ#991035 and
>>> BZ#990830 with 3.4.1qa2?
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] glusterfs-3.4.1qa2 released

2013-09-20 Thread Anand Avati
I have a theory for #998967 (that posix-acl is not doing the right thing
after chmod/setattr). I am preparing a patch and would appreciate it if you
could test it quickly.

Avati


On Fri, Sep 20, 2013 at 1:26 AM, Lukáš Bezdička  wrote:

> No, I see issues reported in
> https://bugzilla.redhat.com/show_bug.cgi?id=998967 which is probably
> related to BZ#991035.
>
>
> On Thu, Sep 19, 2013 at 7:40 PM, Vijay Bellur  wrote:
>
>> On 09/18/2013 02:45 PM, Lukáš Bezdička wrote:
>>
>>> Tested with glusterfs-3.4.1qa2-1.el6.x86_64 issue with ACL is still
>>> there, unless one applies patch from http://review.gluster.org/#/c/
>>> 5693/ 
>>> which shoots through the caches and takes ACLs from server or sets
>>> entry-timeout=0 it returns wrong values. This is probably because ACL
>>> mask being applied incorrectly in posix_acl_inherit_mode, but I'm no C
>>> expert to say so :(
>>>
>>>
>> Checking again. Are you seeing issues reported in both BZ#991035 and
>> BZ#990830 with 3.4.1qa2?
>>
>> Thanks,
>> Vijay
>>
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] samba-glusterfs-vfs does not build

2013-09-19 Thread Anand Avati
On Thu, Sep 19, 2013 at 11:28 AM, Nux!  wrote:

> On 18.09.2013 19:04, Nux! wrote:
>
>> Hi,
>>
>> I'm trying to build and test samba-glusterfs-vfs, but problems appear
>> from the start:
>> http://fpaste.org/40562/95274621/
>>
>> Any pointers?
>>
>
> Anyone from devel has any ideas?
>
> Thanks,
> Lucian


Have you ./configure'd in the samba tree? --with-samba-source must point to
a "built" samba tree (not just extracted)
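
Roughly, and with purely illustrative paths (adjust to wherever your samba
source and glusterfs headers actually live):

  # configure and build the samba source tree first, so the generated
  # headers the vfs module build expects are present
  cd /path/to/samba-3.6.x && ./configure && make

  # then build the vfs module against that built tree
  cd /path/to/samba-glusterfs-vfs
  ./configure --with-samba-source=/path/to/samba-3.6.x --with-glusterfs=/usr
  make && make install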

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster samba vfs read performance slow

2013-09-17 Thread Anand Avati
Can you get the volume profile dumps for both the runs and compare them?
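
For example (volume name taken from the volume info quoted below; the output
file names are just placeholders):

  gluster volume profile soul start
  # run the iozone job over the Samba/VFS mount, then capture:
  gluster volume profile soul info > profile-vfs.txt
  # repeat the same iozone run over the FUSE mount, then capture:
  gluster volume profile soul info > profile-fuse.txt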

Avati



On Tue, Sep 17, 2013 at 10:46 PM, kane  wrote:

> I have already used "kernel oplocks = no" in the smb.conf, next is my
> original smb.conf file global settings:
> [global]
> workgroup = MYGROUP
> server string = DCS Samba Server
> log file = /var/log/samba/log.vfs
> max log size = 50
> aio read size = 262144
> aio write size = 262144
> aio write behind = true
> security = user
> passdb backend = tdbsam
> load printers = yes
> cups options = raw
> read raw = yes
> write raw = yes
> max xmit = 262144
> socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=262144
> SO_SNDBUF=262144
> #   max protocol = SMB2
> kernel oplocks = no
>     stat cache = no
>
> thank you
> -Kane
> On 2013-9-18, at 1:38 PM, Anand Avati  wrote:
>
> > On 9/17/13 10:34 PM, kane wrote:
> >> Hi Anand,
> >>
> >> I use 2 gluster server , this is my volume info:
> >> Volume Name: soul
> >> Type: Distribute
> >> Volume ID: 58f049d0-a38a-4ebe-94c0-086d492bdfa6
> >> Status: Started
> >> Number of Bricks: 2
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: 192.168.101.133:/dcsdata/d0
> >> Brick2: 192.168.101.134:/dcsdata/d0
> >>
> >> each brick use a raid 5 logic disk with 8*2TSATA hdd.
> >>
> >> smb.conf:
> >> [gvol]
> >> comment = For samba export of volume  test
> >> vfs objects = glusterfs
> >> glusterfs:volfile_server = localhost
> >> glusterfs:volume = soul
> >> path = /
> >> read only = no
> >> guest ok = yes
> >>
> >> this my testparm result:
> >> [global]
> >> workgroup = MYGROUP
> >> server string = DCS Samba Server
> >> log file = /var/log/samba/log.vfs
> >> max log size = 50
> >> max xmit = 262144
> >> socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=262144
> >> SO_SNDBUF=262144
> >> stat cache = No
> >> kernel oplocks = No
> >> idmap config * : backend = tdb
> >> aio read size = 262144
> >> aio write size = 262144
> >> aio write behind = true
> >> cups options = raw
> >>
> >> in client mount the smb share with cifs to dir /mnt/vfs,
> >> then use iozone executed in the cifs mount dir "/mnt/vfs":
> >> $ ./iozone -s 10G -r 128k -i0 -i1 -t 4
> >> File size set to 10485760 KB
> >> Record Size 128 KB
> >> Command line used: ./iozone -s 10G -r 128k -i0 -i1 -t 4
> >> Output is in Kbytes/sec
> >> Time Resolution = 0.01 seconds.
> >> Processor cache size set to 1024 Kbytes.
> >> Processor cache line size set to 32 bytes.
> >> File stride size set to 17 * record size.
> >> Throughput test with 4 processes
> >> Each process writes a 10485760 Kbyte file in 128 Kbyte records
> >>
> >> Children see throughput for  4 initial writers =  534315.84 KB/sec
> >> Parent sees throughput for  4 initial writers =  519428.83 KB/sec
> >> Min throughput per process =  133154.69 KB/sec
> >> Max throughput per process =  134341.05 KB/sec
> >> Avg throughput per process =  133578.96 KB/sec
> >> Min xfer = 10391296.00 KB
> >>
> >> Children see throughput for  4 rewriters =  536634.88 KB/sec
> >> Parent sees throughput for  4 rewriters =  522618.54 KB/sec
> >> Min throughput per process =  133408.80 KB/sec
> >> Max throughput per process =  134721.36 KB/sec
> >> Avg throughput per process =  134158.72 KB/sec
> >> Min xfer = 10384384.00 KB
> >>
> >> Children see throughput for  4 readers =   77403.54 KB/sec
> >> Parent sees throughput for  4 readers =   77402.86 KB/sec
> >> Min throughput per process =   19349.42 KB/sec
> >> Max throughput per process =   19353.42 KB/sec
> >> Avg throughput per process =   19350.88 KB/sec
> >> Min xfer = 10483712.00 KB
> >>
> >> Children see throughput for 4 re-readers =   77424.40 KB/sec
> >> Parent sees throughput for 4 re-readers =   77423.89 KB/sec
> >> Min throughput per process =   19354.75 KB/sec
> >> Max throughput per process =   19358.50 KB/sec
> >> Avg throughput per process =   19356.10 KB/sec
> >> Min xfer = 10483840.00 KB
> >>
>> then use the same command to test in the 

Re: [Gluster-users] Gluster samba vfs read performance slow

2013-09-17 Thread Anand Avati
On 9/17/13 10:34 PM, kane wrote:
> Hi Anand,
> 
> I use 2 gluster server , this is my volume info:
> Volume Name: soul
> Type: Distribute
> Volume ID: 58f049d0-a38a-4ebe-94c0-086d492bdfa6
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.101.133:/dcsdata/d0
> Brick2: 192.168.101.134:/dcsdata/d0
> 
> each brick use a raid 5 logic disk with 8*2TSATA hdd.
> 
> smb.conf:
> [gvol]
>  comment = For samba export of volume  test
>  vfs objects = glusterfs
>  glusterfs:volfile_server = localhost
>  glusterfs:volume = soul
>  path = /
>  read only = no
>  guest ok = yes
> 
> this my testparm result:
> [global]
> workgroup = MYGROUP
> server string = DCS Samba Server
> log file = /var/log/samba/log.vfs
> max log size = 50
> max xmit = 262144
> socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=262144 
> SO_SNDBUF=262144
> stat cache = No
> kernel oplocks = No
> idmap config * : backend = tdb
> aio read size = 262144
> aio write size = 262144
> aio write behind = true
> cups options = raw
> 
> in client mount the smb share with cifs to dir /mnt/vfs,
> then use iozone executed in the cifs mount dir "/mnt/vfs":
> $ ./iozone -s 10G -r 128k -i0 -i1 -t 4
> File size set to 10485760 KB
> Record Size 128 KB
> Command line used: ./iozone -s 10G -r 128k -i0 -i1 -t 4
> Output is in Kbytes/sec
> Time Resolution = 0.01 seconds.
> Processor cache size set to 1024 Kbytes.
> Processor cache line size set to 32 bytes.
> File stride size set to 17 * record size.
> Throughput test with 4 processes
> Each process writes a 10485760 Kbyte file in 128 Kbyte records
> 
> Children see throughput for  4 initial writers =  534315.84 KB/sec
> Parent sees throughput for  4 initial writers =  519428.83 KB/sec
> Min throughput per process =  133154.69 KB/sec
> Max throughput per process =  134341.05 KB/sec
> Avg throughput per process =  133578.96 KB/sec
> Min xfer = 10391296.00 KB
> 
> Children see throughput for  4 rewriters =  536634.88 KB/sec
> Parent sees throughput for  4 rewriters =  522618.54 KB/sec
> Min throughput per process =  133408.80 KB/sec
> Max throughput per process =  134721.36 KB/sec
> Avg throughput per process =  134158.72 KB/sec
> Min xfer = 10384384.00 KB
> 
> Children see throughput for  4 readers =   77403.54 KB/sec
> Parent sees throughput for  4 readers =   77402.86 KB/sec
> Min throughput per process =   19349.42 KB/sec
> Max throughput per process =   19353.42 KB/sec
> Avg throughput per process =   19350.88 KB/sec
> Min xfer = 10483712.00 KB
> 
> Children see throughput for 4 re-readers =   77424.40 KB/sec
> Parent sees throughput for 4 re-readers =   77423.89 KB/sec
> Min throughput per process =   19354.75 KB/sec
> Max throughput per process =   19358.50 KB/sec
> Avg throughput per process =   19356.10 KB/sec
> Min xfer = 10483840.00 KB
> 
> then use the same command to test in the dir mounted with gluster fuse:
> File size set to 10485760 KB
> Record Size 128 KB
> Command line used: ./iozone -s 10G -r 128k -i0 -i1 -t 4
> Output is in Kbytes/sec
> Time Resolution = 0.01 seconds.
> Processor cache size set to 1024 Kbytes.
> Processor cache line size set to 32 bytes.
> File stride size set to 17 * record size.
> Throughput test with 4 processes
> Each process writes a 10485760 Kbyte file in 128 Kbyte records
> 
> Children see throughput for  4 initial writers =  887534.72 KB/sec
> Parent sees throughput for  4 initial writers =  848830.39 KB/sec
> Min throughput per process =  220140.91 KB/sec
> Max throughput per process =  223690.45 KB/sec
> Avg throughput per process =  221883.68 KB/sec
> Min xfer = 10319360.00 KB
> 
> Children see throughput for  4 rewriters =  892774.92 KB/sec
> Parent sees throughput for  4 rewriters =  871186.83 KB/sec
> Min throughput per process =  222326.44 KB/sec
> Max throughput per process =  223970.17 KB/sec
> Avg throughput per process =  223193.73 KB/sec
> Min xfer = 10431360.00 KB
> 
> Children see throughput for  4 readers =  605889.12 KB/sec
> Parent sees throughput for  4 readers =  601767.96 KB/sec
> Min throughput per process =  143133.14 KB/sec
> Max throughput per process =  159550.88 KB/sec
> Avg throughput per process =  151472.28 KB/sec
> Min xfer = 9406848.00 KB
> 
> it shows much higher perf.
> 
> is there anything I did wrong?
> 
> 
> thank you
> -Kane
> 
> On 2013-9-18, at 1:19 PM, Anand Avati wrote:
> 
>> How are you testing this? What tool are you using?
>>
>> Avati
>>
>>
>> On Tue, Sep 17, 2013 at 9:02 PM, kane > 

Re: [Gluster-users] Gluster samba vfs read performance slow

2013-09-17 Thread Anand Avati
How are you testing this? What tool are you using?

Avati


On Tue, Sep 17, 2013 at 9:02 PM, kane  wrote:

> Hi Vijay
>
> I used the code in https://github.com/gluster/glusterfs.git with
> the latest commit:
> commit de2a8d303311bd600cb93a775bc79a0edea1ee1a
> Author: Anand Avati 
> Date:   Tue Sep 17 16:45:03 2013 -0700
>
> Revert "cluster/distribute: Rebalance should also verify free inodes"
>
> This reverts commit 215fea41a96479312a5ab8783c13b30ab9fe00fa
>
> Realized soon after merging, ….
>
> which includes the patch you mentioned last time to improve read perf, written
> by Anand.
>
> but the read perf was still slow:
> write: 500MB/s
> read: 77MB/s
>
> while via fuse :
> write 800MB/s
> read 600MB/s
>
> any advice?
>
>
> Thank you.
> -Kane
>
> On 2013-9-13, at 10:37 PM, kane  wrote:
>
> > Hi Vijay,
> >
> >   thank you for post this message, i will try it soon
> >
> > -kane
> >
> >
> >
> >> On 2013-9-13, at 9:21 PM, Vijay Bellur  wrote:
> >
> >> On 09/13/2013 06:10 PM, kane wrote:
> >>> Hi
> >>>
> >>> We use gluster samba vfs to test IO, but the read performance via vfs is
> >>> half of the write performance,
> >>> but via fuse the read and write performance is almost the same.
> >>>
> >>> this is our smb.conf:
> >>> [global]
> >>>workgroup = MYGROUP
> >>>server string = DCS Samba Server
> >>>log file = /var/log/samba/log.vfs
> >>>max log size = 50
> >>> #   use sendfile = true
> >>>aio read size = 262144
> >>>aio write size = 262144
> >>>aio write behind = true
> >>>min receivefile size = 262144
> >>>write cache size = 268435456
> >>>security = user
> >>>passdb backend = tdbsam
> >>>load printers = yes
> >>>cups options = raw
> >>>read raw = yes
> >>>write raw = yes
> >>>max xmit = 262144
> >>>socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=262144
> >>> SO_SNDBUF=262144
> >>>kernel oplocks = no
> >>>stat cache = no
> >>>
> >>> any advises helpful?
> >>>
> >>
> >> This patch has shown improvement in read performance with libgfapi:
> >>
> >> http://review.gluster.org/#/c/5897/
> >>
> >> Would it be possible for you to try this patch and check if it improves
> performance in your case?
> >>
> >> -Vijay
> >>
> >
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: [Nfs-ganesha-devel] Announce: Push of next pre-2.0-dev_49

2013-09-14 Thread Anand Avati
Anand,

This is a great first step. Looking forward to the integration maturing
soon. This is a big step for supporting NFSv4 and pNFS for GlusterFS.

Thanks!
Avati



On Sat, Sep 14, 2013 at 3:18 AM, Anand Subramanian wrote:

> FYI, the FSAL (File System Abstraction Layer) for Gluster is now available
> in the upstream nfs-ganesha community (details of branch, tag and commit
> below). This enables users to export Gluster volumes through nfs-ganesha
> and for use by both nfs v3 and v4 clients. Please note that this is an
> on-going effort.
>
> More details wrt configuration, building etc. will follow.
>
> Anand
>
>
> - Forwarded Message -
> From: Jim Lieb 
> To: nfs-ganesha-de...@lists.sourceforge.net
> Sent: Fri, 13 Sep 2013 22:20:43 -0400 (EDT)
> Subject: [Nfs-ganesha-devel] Announce: Push of next pre-2.0-dev_49
>
> Pushed to the project repo:
>
>   git://github.com/nfs-ganesha/nfs-ganesha.git branch next
>
> Branch: next
> Tag: pre-2.0-dev_49
>
> This week's merge is big.  It also took a little extra effort to file and
> fit
> some of the pieces to get them to slide into place.
>
> The Red Hat Gluster FS team has submitted their fsal.  I have built it
> but have not tested it.  It requires the glfsapi library and a header
> which I can supply to anyone else who wants to play.  They will be testing
> with us at BAT in Boston this month.  It is built by default but the build
> will be disabled if the build cannot find the header or libary.
>
> IBM has also submitted the Protectier fsal.  I have not built this but we
> expect a report from their team once they have tested the merge.
> Its build is off by default.
>
> The Pseudo filesystem handle for v4 has been reworked.  This was done
> to get the necessary handle changes in for V2.0.  Further work on pseudo
> file
> system infrastructure will build on this in 2.1.
>
> Frank and the IBM team submitted a large set of 1.5 to 2.0 bugfix ports.
>  This
> is almost all of them.  Frank has updated the port document reflecting
> current
> state.  Please feel free to grab some patches and port them.
>
> As usual, there have been bugfixes in multiple places.
>
> We tried to get the 1.5 log rotation and compression code in but found some
> bugs that will take more than a few line fix to get working in 2.0.  As a
> result, it has been reverted.
>
> Highlights:
>
> * FSAL_GLUSTER is a new fsal to export Gluster FS
>
> * FSAL_PT is a new fsal for the Protectier file system
>
> * Rework of the PseudoFS file handle format (NVFv4+ only)
>
> * More 1.5 to 2.0 bugfix ports
>
> * Lots of bugfixes
>
> Enjoy
>
> Jim
> --
> Jim Lieb
> Linux Systems Engineer
> Panasas Inc.
>
> "If ease of use was the only requirement, we would all be riding tricycles"
> - Douglas Engelbart 1925–2013
>
> Short log from pre-2.0-dev_47
> --
> commit b2a927948e627367d87af04892afbb031ed85d75
> Author: Jeremy Bongio 
>
> Don't access export in SAVEFH request when FH is for pseudofs and fix
> up
> references
>
> commit 03228228ab64f8d004b864ae7829b51707bfc068
> Author: Jim Lieb 
>
> Revert "Added support for rotation and compression of log files."
>
> commit 0f8690df03a57243d65f20d23c53f86a9e0b17cc
> Merge: cca7875 9483a7d
> Author: Jim Lieb 
>
> Merge remote-tracking branch 'ffilz/porting-doc' into merge_next
>
> commit cca787542d85112cb3e0706caf5ae007b8cd5285
> Merge: 2f0118d af03de5
> Author: Jim Lieb 
>
> Merge remote-tracking branch 'martinetd/for_dev_49' into merge_next
>
> commit 9483a7d7ab54a5e6e6daf4521928b147fa7329b8
> Author: Frank S. Filz 
>
> Clean up porting doc
>
> commit d19cadcf4069976c299e968e890efc8d0ccf001a
> Author: Frank S. Filz 
>
> Update porting doc for dev_49
>
> commit 2f0118d2eb9a3f95cff08070ff3453ca7ce0d4a2
> Merge: a75665a 9530440
> Author: Jim Lieb 
>
> Merge branch 'glusterfs' into merge_next
>
> commit a75665ac75c01e767780cea023c2a8f74b46e2a0
> Merge: 3c7578c 183e044
> Author: Jim Lieb 
>
> Merge remote-tracking branch 'sachin/next' into merge_next
>
> commit 3c7578cde4d47344b0dac2264e9990de3b029ba6
> Merge: c0aa16f 75d81d1
> Author: Jim Lieb 
>
> Merge remote-tracking branch 'linuxbox2/next' into merge_next
>
> commit c0aa16f8ea25c3dae059b349302083291ea7af9d
> Author: Jim Lieb 
>
> Fixups to logging macros and display logic
>
> commit 183e0440d2d8a9f1ef0513807829fd7c15e568d4
> Author: Sachin Bhamare 
>
> Fix the order in which credentials are set in fsal_set_credentials().
>
> commit 0af11c7592092825098215733fc9a14cbc9bcfe3
> Author: Sachin Bhamare 
>
> Fix bugs in FreeBSD version of setuser() and setgroup().
>
> commit b9ca8bddbe140f90c216aeb6611465060607420e
> Merge: 9629e2a 5eeb095
> Author: Jim Lieb 
>
> Merge remote-tracking branch 'ganltc/ibm_next_20' into merge_next
>
> commit 953044057566c7d9013b276a14879a3f226d6972
> Author: Jim Lieb 
>
> Fixups to glusterfs build
>
> commit 5eeb095abfc07819426f09f70e455f8f17cbff48
> Merge: 751ac7b da47438
> 

Re: [Gluster-users] compiling samba vfs module

2013-09-11 Thread Anand Avati
Daniel's problem is probably not ./configure params. He is trying to get
the module working with Samba 4.0. Currently the module exists on samba.git
master branch (4.1.x?) and in forge.gluster.org for samba-3.6.x. There is
no vfs module version available for Samba 4.0.

Avati


On Wed, Sep 11, 2013 at 12:02 PM, Jose Rivera  wrote:

> Hello Daniel,
>
> I think your --with-glusterfs DIR should be
>
> /data/gluster/glusterfs-3.4.0final/debian/tmp/usr/
>
> since we look for things in $DIR/include, $DIR/lib, etc.
>
> It might be more accurate for our docs to call this the glusterfs
> installation prefix instead. :) Let me know if this works.
>
> Cheers,
> --Jose
>
> - Original Message -
> > From: "Lalatendu Mohanty" 
> > To: muel...@tropenklinik.de
> > Cc: gluster-users@gluster.org
> > Sent: Wednesday, September 11, 2013 2:20:50 AM
> > Subject: Re: [Gluster-users] compiling samba vfs module
> >
> > On 09/11/2013 12:43 PM, Lalatendu Mohanty wrote:
> > > On 09/11/2013 11:34 AM, Daniel Müller wrote:
> > >> Hi again,
> > >>
> > >> I did not manage to bring samba 4 up, with sambas glusterfs vfs
> > >> module on
> > >> centos 6.4 with glusterfs-3.4.0, too.
> > >> I think it is only working with a special version.
> > >>
> > >> Greetings
> > >> Daniel
> > > Hi Daniel,
> > >
> > > You can try getting the samba source  from
> > > https://forge.gluster.org/samba-glusterfs. Though I dont expect it is
> > > different then upstream samba but it is worth a try.
> > Oops!! it only contain the VFS plug-in code, not the whole samba source
> > code. So it would not serve your purpose.
> >
> > >
> > > Thanks,
> > > Lala
> > >>
> > >> ---
> > >> EDV Daniel Müller
> > >>
> > >> Leitung EDV
> > >> Tropenklinik Paul-Lechler-Krankenhaus
> > >> Paul-Lechler-Str. 24
> > >> 72076 Tübingen
> > >>
> > >> Tel.: 07071/206-463, Fax: 07071/206-499
> > >> eMail: muel...@tropenklinik.de
> > >> Internet: www.tropenklinik.de
> > >> ---
> > >>
> > >> -Ursprüngliche Nachricht-
> > >> Von: gluster-users-boun...@gluster.org
> > >> [mailto:gluster-users-boun...@gluster.org] Im Auftrag von Tamas Papp
> > >> Gesendet: Dienstag, 10. September 2013 23:57
> > >> An: gluster-users@gluster.org
> > >> Betreff: [Gluster-users] compiling samba vfs module
> > >>
> > >> hi All,
> > >>
> > >> The system is Ubuntu 12.04
> > >>
> > >> I download and extracted source packages of samba and glusterfs and I
> > >> built
> > >> glusterfs, so I get the right necessary structure:
> > >> glusterfs version is 3.4 and it's from ppa.
> > >>
> > >> # ls
> > >>
> /data/gluster/glusterfs-3.4.0final/debian/tmp/usr/include/glusterfs/api/glfs
> > >>
> > >> .h
> > >>
> /data/gluster/glusterfs-3.4.0final/debian/tmp/usr/include/glusterfs/api/glfs
> > >>
> > >> .h
> > >>
> > >>
> > >> Unfortunately I'm getting this error:
> > >>
> > >> # ./configure --with-samba-source=/data/samba/samba-3.6.3/
> > >>
> --with-glusterfs=/data/gluster/glusterfs-3.4.0final/debian/tmp/usr/include/g
> > >>
> > >> lusterfs/
> > >> checking for gcc... gcc
> > >> checking whether the C compiler works... yes checking for C compiler
> > >> default
> > >> output file name... a.out checking for suffix of executables...
> > >> checking whether we are cross compiling... no checking for suffix of
> > >> object
> > >> files... o checking whether we are using the GNU C compiler... yes
> > >> checking
> > >> whether gcc accepts -g... yes checking for gcc option to accept ISO
> > >> C89...
> > >> none needed checking for a BSD-compatible install... /usr/bin/install
> -c
> > >> checking build system type... x86_64-unknown-linux-gnu checking host
> > >> system
> > >> type... x86_64-unknown-linux-gnu checking how to run the C
> > >> preprocessor...
> > >> gcc -E checking for grep that handles long lines and -e... /bin/grep
> > >> checking for egrep... /bin/grep -E checking for ANSI C header
> > >> files... yes
> > >> checking for sys/types.h... yes checking for sys/stat.h... yes
> > >> checking for
> > >> stdlib.h... yes checking for string.h... yes checking for memory.h...
> > >> yes
> > >> checking for strings.h... yes checking for inttypes.h... yes checking
> > >> for
> > >> stdint.h... yes checking for unistd.h... yes checking api/glfs.h
> > >> usability... no checking api/glfs.h presence... no checking for
> > >> api/glfs.h... no
> > >>
> > >> Cannot find api/glfs.h. Please specify --with-glusterfs=dir if
> necessary
> > >>
> > >>
> > >>
> > >> I don't see, what its problem is?
> > >>
> > >>
> > >> Thanks,
> > >> tamas
> > >> ___
> > >> Gluster-users mailing list
> > >> Gluster-users@gluster.org
> > >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >>
> > >> ___
> > >> Gluster-users mailing list
> > >> Gluster-users@gluster.org
> > >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >
> > > __

Re: [Gluster-users] compiling samba vfs module

2013-09-11 Thread Anand Avati
>
>
> checking api/glfs.h usability... yes
> checking api/glfs.h presence... yes
> checking for api/glfs.h... yes
> checking for glfs_init... no
>
> Cannot link to gfapi (glfs_init). Please specify --with-glusterfs=dir if
> necessary
>

I see what might be causing this. configure.ac has a hardcoded
"-L$with_glusterfs/lib64" on line 74. That should be changed to
"-L$with_glusterfs/lib -L$with_glusterfs/lib64" to allow for both dirs. I
suspect you have libglusterfs installed under $prefix/usr/lib? For now
you can create a symlink called lib64 pointing to lib under
/data/gluster/glusterfs-3.4.0final/debian/tmp/usr.
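
A sketch of that workaround, using the paths from your configure line (this
is just a stopgap until configure.ac is fixed):

  cd /data/gluster/glusterfs-3.4.0final/debian/tmp/usr
  ln -s lib lib64   # lets the hardcoded -L$with_glusterfs/lib64 find the libraries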

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] compiling samba vfs module

2013-09-10 Thread Anand Avati
On Tue, Sep 10, 2013 at 2:57 PM, Tamas Papp  wrote:

>
> --with-glusterfs=/data/gluster/glusterfs-3.4.0final/debian/tmp/usr/include/glusterfs
>

Make that --with-glusterfs=/data/gluster/glusterfs-3.4.0final/debian/tmp/usr/
(exclude /include/glusterfs suffix).
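
In other words, something like this, keeping the --with-samba-source you
already pass:

  ./configure --with-samba-source=/data/samba/samba-3.6.3/ \
      --with-glusterfs=/data/gluster/glusterfs-3.4.0final/debian/tmp/usr/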

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Feedback requested: Re-design of the Gluster.org web site

2013-09-06 Thread Anand Avati
On Tue, Sep 3, 2013 at 10:08 PM, John Mark Walker wrote:

> Greetings,
>
> As you probably know, the gluster.org web site has been using the same
> design for over two years now, and we still have the problem of a web site
> that uses multiple applications, none of which are integrated. We are
> fixing that - soon.
>
> Take a look at the latest site design posted on the gluster-site project
> on the Gluster Forge:
>
> -
> https://forge.gluster.org/gluster-site/gluster-site/blobs/raw/master/_site/index.html


Love the new ant logo ;) Replaced github.com/gluster with the new logo too.

Avati


>
>
> The text on these pages will change as we change the language and add more
> documents. Be patient as the site is under heavy development.
>
> We are producing a site that is consistent across all pages, including the
> Gluster Forge, and will have a single user database across all of
> gluster.org. To do this, we opted for a flat-file model powered by
> Awestruct, which you can read about at http://awestruct.org/ , and using
> asciidoc as the default document standard.
>
> New text and documents will continue to be imported into the new structure
> over the next couple of weeks, so please bear this in mind if you visit the
> staging server while it's broken.
>
> At the moment, the site design is hosted via a git repository on the
> forge. Once this is pushed to a proper staging server, I'll send out the
> new URL.
>
> Thanks,
> JM
> ___
> Announce mailing list
> annou...@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/announce
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] [FEEDBACK] Governance of GlusterFS project

2013-09-06 Thread Anand Avati
Good point, Amar. Noted.

Avati


On Fri, Sep 6, 2013 at 1:40 AM, Amar Tumballi  wrote:

> One of the other things we missed in this thread is how to handle bugs in
> bugzilla, and who should own the triage for high/urgent priority bugs.
>
> -Amar
>
>
> On Fri, Jul 26, 2013 at 10:56 PM, Anand Avati wrote:
>
>> Hello everyone,
>>
>>   We are in the process of formalizing the governance model of the
>> GlusterFS project. Historically, the governance of the project has been
>> loosely structured. This is an invitation to all of you to participate in
>> this discussion and provide your feedback and suggestions on how we should
>> evolve a formal model. Feedback from this thread will be considered to the
>> extent possible in formulating the draft (which will be sent out for review
>> as well).
>>
>>   Here are some specific topics to seed the discussion:
>>
>> - Core team formation
>>   - what are the qualifications for membership (e.g contributions of
>> code, doc, packaging, support on irc/lists, how to quantify?)
>>   - what are the responsibilities of the group (e.g direction of the
>> project, project roadmap, infrastructure, membership)
>>
>> - Roadmap
>>   - process of proposing features
>>   - process of selection of features for release
>>
>> - Release management
>>   - timelines and frequency
>>   - release themes
>>   - life cycle and support for releases
>>   - project management and tracking
>>
>> - Project maintainers
>>   - qualification for membership
>>   - process and evaluation
>>
>> There are a lot more topics which need to be discussed, I just named some
>> to get started. I am sure our community has members who belong and
>> participate (or at least are familiar with) other open source project
>> communities. Your feedback will be valuable.
>>
>> Looking forward to hearing from you!
>>
>> Avati
>>
>>
>> ___
>> Gluster-devel mailing list
>> gluster-de...@nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Enabling Apache Hadoop on GlusterFS: glusterfs-hadoop 2.1 released

2013-09-05 Thread Anand Avati
On Thu, Sep 5, 2013 at 2:53 PM, Stephen Watt  wrote:

> Hi Folks
>
> We are pleased to announce a major update to the glusterfs-hadoop project
> with the release of version 2.1. The glusterfs-hadoop project, available at
> The glusterfs-hadoop project team, provides an Apache licensed Hadoop
> FileSystem plugin which enables Apache Hadoop 1.x and 2.x to run directly
> on top of GlusterFS. This release includes a re-architected plugin which
> now extends existing functionality within Hadoop to run on local and POSIX
> File Systems.
>
> -- Overview --
>
> Apache Hadoop has a pluggable FileSystem Architecture. This means that if
> you have a filesystem or object store that you would like to use with
> Hadoop, you can create a Hadoop FileSystem plugin for it which will act as
> a mediator between the generic Hadoop FileSystem interface and your
> filesystem of choice. A popular example would be that over a million Hadoop
> clusters are spun up on Amazon every year, a lot of which use Amazon S3 as
> the Hadoop FileSystem.
>
> In order to configure the plugin, a specific deployment configuration is
> required. Firstly, it is required that the Hadoop JobTracker and
> TaskTrackers (or the Hadoop 2.x equivalents) are installed on servers
> within the gluster trusted storage pool for a given gluster volume. The
> JobTracker uses the plugin to query the extended attributes for job input
> files in gluster to ascertain file placement as well as the distribution of
> file replicas across the cluster. The TaskTrackers use the plugin to
> leverage a local fuse mount of the gluster volume in order to access the
> data required for the tasks. When the JobTracker receives a Hadoop job, it
> uses the locality information it ascertains via the plugin to send the
> tasks for the Hadoop Job to Hadoop TaskTrackers on servers that have the
> data required for the task within their local bricks. This ensures data is
> read from disk and not over the network. Please see the attached diagram
> which provides an overview of the entire solution for a Hadoop 1.x
> deployment.
>
> The community project, along with the documentation and available
> releases, is hosted within the Gluster Forge at
> http://forge.gluster.org/hadoop. The glusterfs-hadoop project will also
> be available within the Fedora 20 release later this year, alongside fellow
> Fedora newcomer Apache Hadoop and the already available gluster project.
> The glusterfs-hadoop project team welcomes contributions and participation
> from the broader community.
>
> Stay tuned for upcoming posts around GlusterFS integration into the Apache
> Ambari and Fedora projects.
>
> Regards
> The glusterfs-hadoop project team
> ___
> Announce mailing list
> annou...@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/announce
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>

Congratulations! This is great news!!

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] NFS cann't use by esxi

2013-09-05 Thread Anand Avati
That's odd, because IIRC ESXi does not even use the NLM protocol.

Avati


On Wed, Sep 4, 2013 at 10:44 PM, higkoohk  wrote:

> Thanks Vijay !
>
> It ran successfully after 'volume set images-stripe nfs.nlm off'.
>
> Now I can use Esxi with Glusterfs's nfs export .
>
> Many thanks!
>
>
> 2013/9/5 Anand Avati 
>
>> This looks like it might be because you need -
>>
>> http://review.gluster.org/4591
>>
>>
>> If you can confirm, we can backport it to 3.4.1.
>>
>>
>> Thanks,
>>
>> Avati
>>
>>
>>
>> On Wed, Sep 4, 2013 at 9:03 PM, Vijay Bellur  wrote:
>>
>>>  On 09/05/2013 04:55 AM, higkoohk wrote:
>>> > yes,I'm using GlusterFS 3.4.0
>>> >
>>>
>>> Can you try by turning off acl with nfs. Some performance numbers have
>>> been better with acl turned off. This can be done through:
>>>
>>> volume set  nfs.acl off
>>>
>>> If there is no improvement, opening a bug and attaching gluster logs to
>>> that would be useful.
>>>
>>> -Vijay
>>>
>>> > On 2013-9-5, at 3:12 AM, "Vijay Bellur"  wrote:
>>> >
>>> > On 09/04/2013 08:38 AM, higkoohk wrote:
>>> >
>>> > Hello everyone:
>>> >
>>> >   When I use esxi mount nfs storage to gluster, mount
>>> success!
>>> >   But , It is very very slowly when create vm, then failed
>>> !
>>> >   I can see the vm's dir and files create successed. but
>>> the
>>> > size is
>>> > zero.
>>> >
>>> >   Does anyone hit this ? What happened and how does me do ?
>>> >
>>> >
>>> > Is this behavior being observed with GlusterFS 3.4.0?
>>> >
>>> > -Vijay
>>> >
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] NFS cann't use by esxi

2013-09-04 Thread Anand Avati
This looks like it might be because you need -

http://review.gluster.org/4591


If you can confirm, we can backport it to 3.4.1.


Thanks,

Avati



On Wed, Sep 4, 2013 at 9:03 PM, Vijay Bellur  wrote:

> On 09/05/2013 04:55 AM, higkoohk wrote:
> > yes,I'm using GlusterFS 3.4.0
> >
>
> Can you try by turning off acl with nfs. Some performance numbers have
> been better with acl turned off. This can be done through:
>
> volume set  nfs.acl off
>
> If there is no improvement, opening a bug and attaching gluster logs to
> that would be useful.
>
> -Vijay
>
> > On 2013-9-5, at 3:12 AM, "Vijay Bellur"  wrote:
> >
> > On 09/04/2013 08:38 AM, higkoohk wrote:
> >
> > Hello everyone:
> >
> >   When I use esxi mount nfs storage to gluster, mount
> success!
> >   But , It is very very slowly when create vm, then failed !
> >   I can see the vm's dir and files create successed. but the
> > size is
> > zero.
> >
> >   Does anyone hit this ? What happened and how does me do ?
> >
> >
> > Is this behavior being observed with GlusterFS 3.4.0?
> >
> > -Vijay
> >
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Rebalance question

2013-09-04 Thread Anand Avati
Fred,
Questions regarding RHS (Red Hat Storage) are best directed to Red Hat
Support rather than the community. That being said, upgrading from one
version to another does not alter hash values / layouts and no new
linkfiles will be created because of the upgrade. You will see new hash
values and linkfiles if you add servers and run a simple rebalance. The
recommended way to eliminate linkfiles is to perform a full rebalance which
involves data migration.
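
For reference, a sketch of the relevant commands (VOLNAME and the brick path
are placeholders; the linkfile heuristic is an assumption based on how DHT
linkfiles appear on disk, not something stated in this thread):

  # rough count of dht linkfiles on one brick: zero-length files with the sticky bit set
  find /path/to/brick -type f -perm -1000 -size 0 | wc -l

  gluster volume rebalance VOLNAME fix-layout start   # recalculates layouts only
  gluster volume rebalance VOLNAME start              # full rebalance: migrates data, removes linkfiles
  gluster volume rebalance VOLNAME status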

Avati


On Wed, Sep 4, 2013 at 5:17 AM, Fred van Zwieten wrote:

> Nobody?
>
>
> Met vriendelijke groeten,
> *
> Fred van Zwieten
> *
> *Enterprise Open Source Services*
> *
> Consultant*
> *(woensdags afwezig)*
>
> *VX Company IT Services B.V.*
> *T* (035) 539 09 50 mobiel (06) 41 68 28 48
> *F* (035) 539 09 08
> *E* fvzwie...@vxcompany.com
> *I*  www.vxcompany.com
>
> Seeing, contrary to popular wisdom, isn’t believing. It’s where belief
> stops, because it isn’t needed any more.. (Terry Pratchett)
>
>
> On Mon, Sep 2, 2013 at 1:06 PM, Fred van Zwieten 
> wrote:
>
>> Hi,
>>
>>>
>>> I have a question. I am on RHS 2.0 Update 4 (will soon go to 2.1 when
>>> it's out) and I have a distributed volume (across 7 nodes) where a fair
>>> amount of directory moves take place (or directory renames). AFAIK this
>>> will most likely give new hash values and so all data needs to be moved
>>> around the pool. Gluster does not do this immediately, but instead creates
>>> a link from the new location to the old location.
>>>
>>> Questions:
>>> 1. Is there a gluster command to investigate the amount of these links
>>> (How "dirty" is my volume)
>>> 2. What command to I use to fix this. I think a "simple" rebalance would
>>> do it.
>>>
>>> I would like to create a cron job that checks the amount of links and,
>>> if above a certain level, it will do a rebalance. Alternatively I could do
>>> a fixed periodic rebalance. And I have as yet no idea on what level (what
>>> amount of found links) a rebalance is necessary.
>>>
>>> Any thoughts, hints, etc?
>>>
>>> Fred
>>>
>>> Seeing, contrary to popular wisdom, isn’t believing. It’s where belief
>>> stops, because it isn’t needed any more.. (Terry Pratchett)
>>>
>>
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Feedback Requested: Gluster Community Governance

2013-09-04 Thread Anand Avati
John Mark,
Thanks for sending this out. It would be good to have more clarification
around some of these points/observations:

- Scope of the governance overview. At one end the overview document
has a "high level" organizational theme, and at the other it jumps all
the way down to the specifics of how code patches must be voted on, with lots
of missing details in between. The proposer / maintainer of the individual
project knows best how to accept patches. We already have multi-member
projects other than glusterfs in the forge (e.g. gluster-swift,
gluster-hadoop, gluster-samba) which have formed their own governance
models based on the initiators. Voting on code patches is probably
best left out of the scope of such a high-level document.

- Membership criteria - The first criterion listed for membership is code
contributions. I understand that code contributions alone should not be the
criterion for membership, but the ratio of non-code contributors to code
contributors on the current board seems a little unbalanced. After
all, open source communities grow around developers, and expecting
developers to be at least close to a majority in the seed is both natural and
normal. Of course, over time and elections the distribution may organically
evolve into different forms.

- Initial members - It would be good to highlight the contributions and
reasons for selection of the initial members. Some of the names are very
familiar, while some are not known to everyone. I'm sure everybody there
has been selected having qualified based on the criteria.
Mentioning those criteria against respective members seems important for
clarity. E.g., X: for super awesome support on IRC, Y: for building a
massive community in their country, etc. Once seeded openly, reasons for
further members would be clear based on the election proceedings.

Thanks,
Avati


On Tue, Sep 3, 2013 at 11:26 PM, John Mark Walker wrote:

> Greetings, Gluster people.
>
>
> tl;dr
>  - This is a long description designed to elicit constructive discussion
> of some recent Gluster Community governance initiatives. For all things
> related to Gluster Community Governance, see
> http://www.gluster.org/Governance
>
>
> The recent initiatives around GlusterFS development and project governance
> have been quite amazing to witness - we have been making steady progress
> towards a "real" open source model for over a year now, and the 3.5
> planning meetings are a testament to that.
>
> You may have also noticed recent announcements about organizations joining
> the Gluster Community and the formation of a Gluster Community Board. This
> is part of the same process of opening up and making a better, more active
> community, but there is a need to define some of the new (and potentially
> confusing) terminology.
>
> - Gluster Community: What is the Gluster Community? It is a group of
> developers, users and organizations dedicated to the development of
> GlusterFS and related projects. GlusterFS is the flagship project of the
> Gluster Community, but it is not the only one - see
> http://forge.gluster.org/ to get a sense of the scope of the entire
> ecosystem. Gluster Community governance is different from GlusterFS project
> governance.
>
> - Gluster Community Board: This consists of individuals from the Gluster
> Community, as well as representatives of organizations that have signed
> letters of intent to contribute to the Gluster Community.
>
> - Letter of Intent: document signed by organizations who wish to make
> material contributions to the Gluster Community. These contributions may
> take many forms, including code contributions, event coordination,
> documentation, testing, and more. How organizations may contribute is
> listed at gluster.org/governance
>
> - Gluster Software Distribution: with so many projects filling out the
> Gluster Community, there is a need for an incubation process, as well as a
> need for criteria that determine eligibility for graduating from incubation
> into the GSD. We don't yet know how we will do this and are looking for
> your input.
>
> We realized some time ago that there was quite a demand for contributing
> to and growing the community, but there was no structure in place to do it.
> The above is our attempt to create an inclusive community that is not
> solely dependent on Red Hat and enlists the services of those who view the
> Gluster Community as a valuable part of their business.
>
> All of this is in-process but not yet finalized. There is an upcoming
> board meeting on September 18 where we will vote on parts or all of this.
>
> If you questions or just want to discuss this initiative here, reply to
> this email substituting "gluster-users@gluster.org" for "announce".
>
> For all links and documents regarding Gluster Community governance, you
> can always find the latest here: http://www.gluster.org/Governance
>
> Thanks,
> John Mark Walker
> Gluster Community Leader
>
> _

Re: [Gluster-users] GlusterFS 3.4.1 planning

2013-08-28 Thread Anand Avati
For those interested in what are the possible patches, here is a short list
of commits which are available in master but not yet backported to
release-3.4 (note the actual list > 500, this is a short list of patches
which fix some kind of an "issue" - crash, leak, incorrect behavior,
failure)

http://www.gluster.org/community/documentation/index.php/Release_341_backport_candidates

Some of the patches towards the end fix some nasty issues. Many of them are
covered in the bugs listed in
http://www.gluster.org/community/documentation/index.php/Backport_Wishlist.
If there are specific patches you would like to see backported, please
copy/paste those lines from the Release_341_backport_candidates page into
the Backport_Wishlist page. For the others, we will be using a best
judgement call based on severity and patch impact.

Avati



On Fri, Aug 9, 2013 at 2:23 AM, Vijay Bellur  wrote:

> Hi All,
>
> We are considering 3.4.1 to be released in the last week of this month. If
> you are interested in seeing specific bugs addressed or patches included in
> 3.4.1, can you please update them here:
>
> http://www.gluster.org/community/documentation/index.php/Backport_Wishlist
>
> Thanks,
> Vijay
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Volume creation fails with "prefix of it is already part of a volume"

2013-08-27 Thread Anand Avati
What about getfattr -d -m . -e hex /mnt/brick1 and brick2?

Avati


On Tue, Aug 27, 2013 at 2:09 AM, Stroppa Daniele (strp) wrote:

>  Hi Anand,
>
>  See output below. gluster-node4 is where I execute the create volume
> command.
>
>  On gluster-node1:
>  > getfattr -d -e hex -m . /mnt
> getfattr: Removing leading '/' from absolute path names
> # file: mnt
> security.selinux=0x73797374656d5f753a6f626a6563745f723a6d6e745f743a733000
>
>  > getfattr -d -e hex -m . /mnt/brick1
> > getfattr -d -e hex -m . /mnt/brick1/vol_icclab
> > getfattr -d -e hex -m . /mnt/brick2
> > getfattr -d -e hex -m . /mnt/brick2/vol_icclab
>
>  On gluster-node4:
>  > getfattr -d -e hex -m . /mnt
> getfattr: Removing leading '/' from absolute path names
> # file: mnt
> security.selinux=0x73797374656d5f753a6f626a6563745f723a6d6e745f743a733000
>
>  > getfattr -d -e hex -m . /mnt/brick1
> > getfattr -d -e hex -m . /mnt/brick1/vol_icclab/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/brick1/vol_icclab/
> trusted.glusterfs.volume-id=0xa2b943d2271a464da2ae7e29ede15552
>
>  > getfattr -d -e hex -m . /mnt/brick2
> > getfattr -d -e hex -m . /mnt/brick2/vol_icclab/
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/brick2/vol_icclab/
> trusted.glusterfs.volume-id=0xa2b943d2271a464da2ae7e29ede15552
>
>  Cheers,
>   --
> Daniele Stroppa
> Researcher
> Institute of Information Technology
> Zürich University of Applied Sciences
> http://www.cloudcomp.ch
>
>
>   From: Anand Avati 
> Date: Fri, 23 Aug 2013 11:02:20 -0700
> To: Daniele Stroppa 
> Cc: Vijay Bellur , "gluster-users@gluster.org" <
> gluster-users@gluster.org>
> Subject: Re: [Gluster-users] Volume creation fails with "prefix of it is
> already part of a volume"
>
>  Please provide the output of the following commands on the respective
> nodes:
>
>  on gluster-node1 and gluster-node4:
>
>  getfattr -d -e hex -m . /mnt
> getfattr -d -e hex -m . /mnt/brick1
> getfattr -d -e hex -m . /mnt/brick1/vol_icclab
> getfattr -d -e hex -m . /mnt/brick2
> getfattr -d -e hex -m . /mnt/brick2/vol_icclab
>
>  Thanks,
> Avati
>
>
>
> On Wed, Aug 21, 2013 at 11:59 PM, Stroppa Daniele (strp) wrote:
>
>> Hi Vijay,
>>
>> I did saw the link you posted, but as mentioned earlier I get this error
>> when creating the volume for the first time, not when I try to remove and
>> then re-add a brick to a volume.
>>
>> Thanks,
>> --
>> Daniele Stroppa
>> Researcher
>> Institute of Information Technology
>> Zürich University of Applied Sciences
>> http://www.cloudcomp.ch <http://www.cloudcomp.ch/>
>>
>>
>>
>>
>>
>>
>>
>>  On 22/08/2013 08:19, "Vijay Bellur"  wrote:
>>
>> >On 08/21/2013 08:21 PM, Stroppa Daniele (strp) wrote:
>> >> Thanks Daniel.
>> >>
>> >> I'm indeed running Gluster 3.4 on CentOS 6.4.
>> >>
>> >> I've tried your suggestion, it does work for me too, but it's not an
>> >> optimal solution.
>> >>
>> >> Maybe someone could shed some light on this behaviour?
>> >
>> >This might be of help in understanding the behavior:
>> >
>> >
>> http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-par
>> >t-of-a-volume/
>> >
>> >Please do let us know if you need more clarity on this.
>> >
>> >-Vijay
>> >>
>> >> Thanks,
>> >> --
>> >> Daniele Stroppa
>> >> Researcher
>> >> Institute of Information Technology
>> >> Zürich University of Applied Sciences
>> >> http://www.cloudcomp.ch <http://www.cloudcomp.ch/>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 21/08/2013 09:00, "Daniel Müller"  wrote:
>> >>
>> >>> Are you running with gluster 3.4?
>> >>> I had the same issue. I solved it by deleting my subfolders again and
>> >>>then
>> >>> create new ones. In your case brick1 and brick2.
>> >>> Then create new subfolders in the place,ex.: mkdir /mnt/bricknew1  and
>> >>> /mnt/bricknew2 .
>> >>> This solved the problem for me, not knowing why gluster 3.4 behave
>> like
>> >>> this.
>> >>> Good Luck
>> >>>
>> >>>
>> >>

Re: [Gluster-users] Problems with data integrity

2013-08-26 Thread Anand Avati
Michael,
The problem looks very strange. We haven't come across such an issue (in
glusterfs) so far. However I do recall seeing such bit flips at a customer
site in the past, and in the end it was diagnosed to be a hardware issue.
Can you retry a few runs of same rsync directly to the backends through nfs
or rsyncd (without involving gluster or ssh), from the same client and same
data set, to both the servers and see if you can reproduce such a
md5sum/sha1sum mismatch?
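
A sketch of one such run, assuming a scratch directory (the export path and
mount point below are made up) is exported from one server over plain kernel
NFS, reusing the rsync flags and checksum files from your script:

  mount -t nfs bkupc1-a:/export/a/scratch /mnt/scratch-a
  rsync -a --stats --inplace ./ /mnt/scratch-a/
  ( cd /mnt/scratch-a && md5sum -c --quiet md5sums && sha1sum -c --quiet sha1sums )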

Avati


On Mon, Aug 26, 2013 at 7:03 AM, Michael Peek  wrote:

>  Hi gurus,
>
> This is a follow-up to a previous report about data integrity problems
> with Gluster 3.4.0.  I will be as thorough as I can, but this is already a
> pretty long post.  So feel free to see my previous post for more
> information specific to my previous run of tests.
>
>
>1. I am running a fully up-to-date version of Ubuntu 12.04, with
>Gluster 3.4.0final-ubuntu1~precise1.
>
> 2. My cluster consists of four nodes.  Each node consists of:
>   1. Hostnames: bkupc1-a -to- bkupc1-d
>   2. Bricks: Each host has /export/a/glusterfs/ and
>   /export/b/glusterfs/, which are 4TB ext4 drives
>   3. Clients: I have a client that mounts the volume as /data/bkupc1/
>   using the fuse driver.
>
>3. My volume was created with:
>/usr/sbin/gluster peer probe bkupc1-a
>/usr/sbin/gluster peer probe bkupc1-b
>/usr/sbin/gluster peer probe bkupc1-c
>/usr/sbin/gluster peer probe bkupc1-d
>/usr/sbin/gluster volume create bkupc1 replica 2 transport tcp \
>bkupc1-a:/export/a/glusterfs   bkupc1-b:/export/a/glusterfs \
>bkupc1-c:/export/a/glusterfs   bkupc1-d:/export/a/glusterfs \
>bkupc1-a:/export/b/glusterfs   bkupc1-b:/export/b/glusterfs \
>bkupc1-c:/export/b/glusterfs   bkupc1-d:/export/b/glusterfs
>/usr/sbin/gluster volume set bkupc1 auth.allow {list of IP addresses}
>
> 4. On the client I have a 1TB drive filled with 900+GB of data in
>156,554 test files.  These files are encrypted backups that are dispersed
>throughout many subdirectories.  They are ugly to look at.  Here's an
>example:
>
>data/
>884b9a38-0443-11e3-b8fb-f46d04e15793/
>884a7040-0443-11e3-b8fb-f46d04e15793/
>8825c6c8-0443-11e3-b8fb-f46d04e15793/
>880f8f0c-0443-11e3-b8fb-f46d04e15793/
>iMmV,UqdiqZRie5QUu341iRS7s,-OK7PzXSuPgr0o30yNDXNG6uvqA0Wyr7RRR3MBE4
>
>
>
>I have pre-calculated MD5 and SHA1 checksums for all of these files,
>and I have verified that the checksums are correct on the client drive.
>
> 5. My first set of runs involved using rsync.  Nothing fancy here:
>   1. The volume is empty when I begin
>   2. I create /data/bkupc1/BACKUPS-rsync.${timestamp}/
>   3. Use rsync to copy files from the client to the volume
>   4. Here's my script:
>   #!/bin/bash -x
>
>   timestamp="${1}"
>
>   /bin/date
>
>   mkdir /data/bkupc1/BACKUPS-rsync.${timestamp}
>
>   rsync \
>   -a \
>   -v \
>   --delete \
>   --delete-excluded \
>   --force \
>   --ignore-errors \
>   --one-file-system \
>   --stats \
>   --inplace \
>   ./ \
>   /data/bkupc1/BACKUPS-rsync.${timestamp}/ \
>   #
>
>   /bin/date
>
>   (\
>   cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \
>   && md5sum -c --quiet md5sums \
>   )
>
>   /bin/date
>
>   (\
>   cd /data/bkupc1/BACKUPS-rsync.${timestamp}/ \
>   && sha1sum -c --quiet sha1sums \
>   )
>
>   /bin/date
>
>   /usr/bin/diff -r -q ./ /data/bkupc1/BACKUPS-rsync.${timestamp}/
>
>   /bin/date
>
>5. As you can see from the script, after rsyncing, I check the
>   files on the volume
>  1. Against their MD5 checksums
>  2. Then against their SHA1 checksums
>  3. Then, just to beat a dead horse, I use diff to do a
>  byte-for-byte check between the files on the client and the files on 
> the
>  volume.  (Note to self: I should replace diff with cmp, as I have 
> run into
>  "out of memory" errors with diff on files that cmp can handle just 
> fine.)
>
>   6. What I have found is that about 50% of the time, there will
>   be one or two files out of those 156,554 that differ.  I documented my
>   findings in more detail in my previous email.
>
>6. One though that occurred to me is that this could be the fault
>of rsync.  So I have repeated the tests using plain old /bin/cp.  Here's my
>(very similar) script:
>#!/bin/bash -x
>
>timestamp="${1}"
>
>/bin/date
>
>mkdir /data/bkupc1/BACKUPS-cp.${timestamp}
>
>/bin/cp -ar ./ /data/bkupc1/BACKUPS-cp.${timestamp}/
>
>/bin/date
>
>(\
>cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \
>&& md5sum -c --quiet md5sums \
>)
>
>/bin/date
>
>(\
>cd /data/bkupc1/BACKUPS-cp.${timestamp}/ \
> 

Re: [Gluster-users] Fwd: FileSize changing in GlusterNodes

2013-08-26 Thread Anand Avati
On Mon, Aug 26, 2013 at 9:40 AM, Vijay Bellur  wrote:

> On 08/26/2013 10:04 PM, Anand Avati wrote:
>
>> On Sun, Aug 25, 2013 at 11:23 PM, Vijay Bellur wrote:
>>
>> File size as reported on the mount point and the bricks can vary
>> because of this code snippet in iatt_from_stat():
>>
>>  {
>>  uint64_t maxblocks;
>>
>>  maxblocks = (iatt->ia_size + 511) / 512;
>>
>>  if (iatt->ia_blocks > maxblocks)
>>  iatt->ia_blocks = maxblocks;
>>  }
>>
>>
>> This snippet was brought in to improve accounting behaviour for
>> quota that would fail with disk file systems that perform
>> speculative pre-allocation.
>>
>> If this aids only specific use cases, I think we should make the
>> behaviour configurable.
>>
>> Thoughts?
>>
>> -Vijay
>>
>>
>>
>> This is very unlikely the problem. st_blocks field values do not
>> influence md5sum behavior in any way. The file size (st_size) would, but
>> both du -k and the above code snippet only deal with st_blocks.
>>
>
> I was referring to du -k as seen on the bricks and the mount point. I was
> certainly not referring to the md5sum difference.
>
> -Vijay
>

I thought he was comparing du -k between the two bricks (the sentence felt
that way). In any case the above code snippet should do something
meaningful only when the file is still held open. XFS should discard the
extra allocations after close() anyway.



>
>> Bobby, it would help if you can identify the mismatching file and
>> inspect and see what is the difference between the two files?
>>
>> Avati
>>
>>
>>
>>
>>
>>  Original Message 
>> Subject:[Gluster-users] FileSize changing in GlusterNodes
>> Date:   Wed, 21 Aug 2013 05:35:40 +
>> From:   Bobby Jacob 
>> To: gluster-users@gluster.org
>>
>>
>>
>> Hi,
>>
>> When I upload files into the gluster volume, it replicates all the
>> files
>> to both gluster nodes. But the file size slightly varies by (4-10KB),
>> which changes the md5sum of the file.
>>
>> Command to check file size : du –k *. I’m using glusterFS 3.3.1 with
>> Centos 6.4
>>
>> This is creating inconsistency between the files on both the bricks. ?
>> What is the reason for this changed file size and how can it be
>> avoided. ?
>>
>> Thanks & Regards,
>>
>> *Bobby Jacob*
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fwd: FileSize changing in GlusterNodes

2013-08-26 Thread Anand Avati
On Sun, Aug 25, 2013 at 11:23 PM, Vijay Bellur  wrote:

> File size as reported on the mount point and the bricks can vary because
> of this code snippet in iatt_from_stat():
>
> {
> uint64_t maxblocks;
>
> maxblocks = (iatt->ia_size + 511) / 512;
>
> if (iatt->ia_blocks > maxblocks)
> iatt->ia_blocks = maxblocks;
> }
>
>
> This snippet was brought in to improve accounting behaviour for quota that
> would fail with disk file systems that perform speculative pre-allocation.
>
> If this aids only specific use cases, I think we should make the
> behaviour configurable.
>
> Thoughts?
>
> -Vijay
>
>

This is very unlikely the problem. st_blocks field values do not influence
md5sum behavior in any way. The file size (st_size) would, but both du -k
and the above code snippet only deal with st_blocks.

Bobby, it would help if you can identify the mismatching file and inspect
and see what is the difference between the two files?
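
A sketch of how to narrow that down (brick paths are placeholders, since your
brick layout isn't shown in this thread): on each gluster node, against the
same file path under the brick, compare

  stat -c 'size=%s blocks=%b %n' /path/to/brick/FILE
  du -k /path/to/brick/FILE
  md5sum /path/to/brick/FILE

If the sizes (%s) match but the checksums differ, cmp -l on the two copies
will show which byte offsets disagree.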

Avati



>
>
>
>
>  Original Message 
> Subject:[Gluster-users] FileSize changing in GlusterNodes
> Date:   Wed, 21 Aug 2013 05:35:40 +
> From:   Bobby Jacob 
> To: gluster-users@gluster.org 
>
>
>
> Hi,
>
> When I upload files into the gluster volume, it replicates all the files
> to both gluster nodes. But the file size slightly varies by (4-10KB),
> which changes the md5sum of the file.
>
> Command to check file size : du –k *. I’m using glusterFS 3.3.1 with
> Centos 6.4
>
> This is creating inconsistency between the files on both the bricks. ?
> What is the reason for this changed file size and how can it be avoided. ?
>
> Thanks & Regards,
>
> *Bobby Jacob*
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Volume creation fails with "prefix of it is already part of a volume"

2013-08-23 Thread Anand Avati
Please provide the output of the following commands on the respective nodes:

on gluster-node1 and gluster-node4:

getfattr -d -e hex -m . /mnt
getfattr -d -e hex -m . /mnt/brick1
getfattr -d -e hex -m . /mnt/brick1/vol_icclab
getfattr -d -e hex -m . /mnt/brick2
getfattr -d -e hex -m . /mnt/brick2/vol_icclab

Thanks,
Avati



On Wed, Aug 21, 2013 at 11:59 PM, Stroppa Daniele (strp) wrote:

> Hi Vijay,
>
> I did saw the link you posted, but as mentioned earlier I get this error
> when creating the volume for the first time, not when I try to remove and
> then re-add a brick to a volume.
>
> Thanks,
> --
> Daniele Stroppa
> Researcher
> Institute of Information Technology
> Zürich University of Applied Sciences
> http://www.cloudcomp.ch 
>
>
>
>
>
>
>
> On 22/08/2013 08:19, "Vijay Bellur"  wrote:
>
> >On 08/21/2013 08:21 PM, Stroppa Daniele (strp) wrote:
> >> Thanks Daniel.
> >>
> >> I'm indeed running Gluster 3.4 on CentOS 6.4.
> >>
> >> I've tried your suggestion, it does work for me too, but it's not an
> >> optimal solution.
> >>
> >> Maybe someone could shed some light on this behaviour?
> >
> >This might be of help in understanding the behavior:
> >
> >
> http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-par
> >t-of-a-volume/
> >
> >Please do let us know if you need more clarity on this.
> >
> >-Vijay
> >>
> >> Thanks,
> >> --
> >> Daniele Stroppa
> >> Researcher
> >> Institute of Information Technology
> >> Zürich University of Applied Sciences
> >> http://www.cloudcomp.ch 
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 21/08/2013 09:00, "Daniel Müller"  wrote:
> >>
> >>> Are you running with gluster 3.4?
> >>> I had the same issue. I solved it by deleting my subfolders again and
> >>>then
> >>> create new ones. In your case brick1 and brick2.
> >>> Then create new subfolders in the place,ex.: mkdir /mnt/bricknew1  and
> >>> /mnt/bricknew2 .
> >>> This solved the problem for me, not knowing why gluster 3.4 behave like
> >>> this.
> >>> Good Luck
> >>>
> >>>
> >>> EDV Daniel Müller
> >>>
> >>> Leitung EDV
> >>> Tropenklinik Paul-Lechler-Krankenhaus
> >>> Paul-Lechler-Str. 24
> >>> 72076 Tübingen
> >>> Tel.: 07071/206-463, Fax: 07071/206-499
> >>> eMail: muel...@tropenklinik.de
> >>> Internet: www.tropenklinik.de
> >>>
> >>> Von: gluster-users-boun...@gluster.org
> >>> [mailto:gluster-users-boun...@gluster.org] Im Auftrag von Stroppa
> >>>Daniele
> >>> (strp)
> >>> Gesendet: Dienstag, 20. August 2013 21:51
> >>> An: gluster-users@gluster.org
> >>> Betreff: [Gluster-users] Volume creation fails with "prefix of it is
> >>> already
> >>> part of a volume"
> >>>
> >>> Hi All,
> >>>
> >>> I'm setting up a small test cluster: 2 nodes (gluster-node1 and
> >>> gluster-node4) with 2 bricks each (/mnt/brick1 and /mnt/brick2) and one
> >>> volume (vol_icclab). When I issue the create volume command I get the
> >>> following error:
> >>>
> >>> # gluster volume create vol_icclab replica 2 transport tcp
> >>> gluster-node4.test:/mnt/brick1/vol_icclab
> >>> gluster-node1.test:/mnt/brick1/vol_icclab
> >>> gluster-node4.test:/mnt/brick2/vol_icclab
> >>> gluster-node1.test:/mnt/brick2/vol_icclab
> >>> volume create: vol_icclab: failed: /mnt/brick1/vol_icclab or a prefix
> >>>of
> >>> it
> >>> is already part of a volume
> >>>
> >>> I checked and found this [1], but in my case the issue it's happening
> >>>when
> >>> creating the volume for the first time, not after removing/adding a
> >>>brick
> >>> to
> >>> a volume.
> >>>
> >>> Any suggestions?
> >>>
> >>> [1]
> >>>
> >>>
> http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-p
> >>> art-of-a-volume/
> >>>
> >>> Thanks,
> >>> --
> >>> Daniele Stroppa
> >>> Researcher
> >>> Institute of Information Technology
> >>> Zürich University of Applied Sciences
> >>> http://www.cloudcomp.ch
> >>>
> >>>
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Volume creation fails with "prefix of it is already part of a volume"

2013-08-21 Thread Anand Avati
This is intentional behavior. We specifically introduced this check because
creating volumes with directories that are subdirectories of other bricks, or
that contain another brick as a subdirectory, can result in dangerous corruption
of your data.

Please create volumes with brick directories that are cleanly separated in the
namespace.
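A quick pre-flight check before running volume create can catch this early; a
sketch, using the mounts from this thread:

# None of the prospective brick paths (or their parents) should already carry
# gluster volume xattrs from a previous or existing volume.
for d in /mnt /mnt/brick1 /mnt/brick1/vol_icclab /mnt/brick2 /mnt/brick2/vol_icclab; do
    echo "== $d"
    getfattr -d -e hex -m . "$d" 2>/dev/null | grep -iE 'trusted.glusterfs|trusted.gfid'
done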

Avati


On Wed, Aug 21, 2013 at 7:51 AM, Stroppa Daniele (strp) wrote:

> Thanks Daniel.
>
> I'm indeed running Gluster 3.4 on CentOS 6.4.
>
> I've tried your suggestion, it does work for me too, but it's not an
> optimal solution.
>
> Maybe someone could shed some light on this behaviour?
>
> Thanks,
> --
> Daniele Stroppa
> Researcher
> Institute of Information Technology
> Zürich University of Applied Sciences
> http://www.cloudcomp.ch 
>
>
>
>
>
>
>
> On 21/08/2013 09:00, "Daniel Müller"  wrote:
>
> >Are you running with gluster 3.4?
> >I had the same issue. I solved it by deleting my subfolders again and then
> >create new ones. In your case brick1 and brick2.
> >Then create new subfolders in the place,ex.: mkdir /mnt/bricknew1  and
> >/mnt/bricknew2 .
> >This solved the problem for me, not knowing why gluster 3.4 behave like
> >this.
> >Good Luck
> >
> >
> >EDV Daniel Müller
> >
> >Leitung EDV
> >Tropenklinik Paul-Lechler-Krankenhaus
> >Paul-Lechler-Str. 24
> >72076 Tübingen
> >Tel.: 07071/206-463, Fax: 07071/206-499
> >eMail: muel...@tropenklinik.de
> >Internet: www.tropenklinik.de
> >
> >Von: gluster-users-boun...@gluster.org
> >[mailto:gluster-users-boun...@gluster.org] Im Auftrag von Stroppa Daniele
> >(strp)
> >Gesendet: Dienstag, 20. August 2013 21:51
> >An: gluster-users@gluster.org
> >Betreff: [Gluster-users] Volume creation fails with "prefix of it is
> >already
> >part of a volume"
> >
> >Hi All,
> >
> >I'm setting up a small test cluster: 2 nodes (gluster-node1 and
> >gluster-node4) with 2 bricks each (/mnt/brick1 and /mnt/brick2) and one
> >volume (vol_icclab). When I issue the create volume command I get the
> >following error:
> >
> ># gluster volume create vol_icclab replica 2 transport tcp
> >gluster-node4.test:/mnt/brick1/vol_icclab
> >gluster-node1.test:/mnt/brick1/vol_icclab
> >gluster-node4.test:/mnt/brick2/vol_icclab
> >gluster-node1.test:/mnt/brick2/vol_icclab
> >volume create: vol_icclab: failed: /mnt/brick1/vol_icclab or a prefix of
> >it
> >is already part of a volume
> >
> >I checked and found this [1], but in my case the issue it's happening when
> >creating the volume for the first time, not after removing/adding a brick
> >to
> >a volume.
> >
> >Any suggestions?
> >
> >[1]
> >http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-p
> >art-of-a-volume/
> >
> >Thanks,
> >--
> >Daniele Stroppa
> >Researcher
> >Institute of Information Technology
> >Zürich University of Applied Sciences
> >http://www.cloudcomp.ch
> >
> >
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow on writing

2013-08-17 Thread Anand Avati
On Sat, Aug 17, 2013 at 5:20 AM, Jeff Darcy  wrote:

> On 08/16/2013 11:21 PM, Alexey Shalin wrote:
>
>> I wrote small script :
>> #!/bin/bash
>>
>> for i in {1..1000}; do
>> size=$((RANDOM%5+1))
>> dd if=/dev/zero of=/storage/test/bigfile${i} count=1024 bs=${size}k
>> done
>>
>> This script creates files with different size on volume
>>
>> here is output:
>> 2097152 bytes (2.1 MB) copied, 0.120632 s, 17.4 MB/s
>> 1024+0 records in
>> 1024+0 records out
>> 1048576 bytes (1.0 MB) copied, 0.14548 s, 7.2 MB/s
>> 1024+0 records in
>> 1024+0 records out
>>
>
> It looks like you're doing small writes (1-6KB) from a single thread.
>  That means network latency is going to be your primary limiting factor.
>  20MB/s at 4KB is 5000 IOPS or 0.2ms per network round trip.  You don't say
> what kind of network you're using, but if it's Plain Old GigE that doesn't
> seem too surprising.  BTW, the NFS numbers are likely to be better because
> the NFS client does more caching and you're not writing enough to fill
> memory, so you're actually getting less durability than in the
> native-protocol test and therefore the numbers aren't directly comparable.
>
> I suggest trying larger block sizes and higher I/O thread counts (with
> iozone you can do this in a single command instead of a script).  You
> should see a pretty marked improvement.
>
>
Also, small block size writes kill performance on FUSE because of the
context switches (and lack of write caching in FUSE). Larger block size (>=
64KB) should start showing good performance.
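For a quick before/after comparison, something along these lines (a sketch;
adjust the paths, and the iozone invocation assumes the throughput mode of a
stock iozone build) should show the difference:

# single stream, 128KB blocks, flushed at the end
dd if=/dev/zero of=/storage/test/bigfile bs=128k count=8192 conv=fdatasync

# four writer threads at a 128KB record size
iozone -i 0 -r 128k -s 1g -t 4 -F /storage/test/f1 /storage/test/f2 /storage/test/f3 /storage/test/f4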

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Files losing permissions

2013-08-01 Thread Anand Avati
Justin,
What you are seeing are internal DHT linkfiles. They are zero byte files
with mode 01000. Changing their mode forcefully in the backend to something
else WILL render your files inaccessible from the mount point. I am
assuming that you have seen these files only in the backend and not from
the mount point. And accessing/modifying files like this directly from the
backend is very dangerous for your data, as explained in this very example.
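If you want to double-check that the zero-byte entries on your bricks really are
linkfiles and not damaged copies, a read-only sketch like this (run directly
against a brick, using the brick path from your listing) should do it; genuine
linkfiles carry a trusted.glusterfs.dht.linkto xattr pointing at the subvolume
that holds the data:

BRICK=/export/brick2/vol1    # brick root from your listing
find "$BRICK" -type f -perm -1000 -size 0 \
    -exec getfattr -n trusted.glusterfs.dht.linkto --absolute-names {} \; 2>/dev/null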

Avati


On Thu, Aug 1, 2013 at 2:25 PM, Justin Dossey  wrote:

> One thing I do see with the issue we're having is that the files which
> have lost their permissions have "bad" versions on multiple bricks.  Since
> the replica count is 2 for any given file, there should be only two copies
> of each, no?
>
> For example, the file below has zero-length, zero-permission versions on
> uds06/brick2 and uds-07/brick2, but good versions on uds-05/brick1 and
> uds-06/brick1.
>
> FILE is /09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
> uds-05 -rw-r--r-- 2 apache apache 2233 Jul 6 2008
> /export/brick1/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
> uds-06 -rw-r--r-- 2 apache apache 2233 Jul 6 2008
> /export/brick1/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
> uds-06 -T 2 apache apache 0 Jul 23 03:11
> /export/brick2/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
> uds-07 -T 2 apache apache 0 Jul 23 03:11
> /export/brick2/vol1/09/38/1f/eastar/mail/entries/trash/2008-07-06T13_41_56-07_00.dump
>
> Is it acceptable for me to just delete the zero-length copies?
>
>
>
> On Thu, Aug 1, 2013 at 12:57 PM, Justin Dossey  wrote:
>
>> Do you know whether it's acceptable to modify permissions on the brick
>> itself (as opposed to over NFS or via the fuse client)?  It seems that as
>> long as I don't modify the xattrs, the permissions I set on files on the
>> bricks are passed through.
>>
>>
>> On Thu, Aug 1, 2013 at 12:32 PM, Joel Young  wrote:
>>
>>> I am not seeing exactly that, but I am experiencing the permission for
>>> the root directory of a gluster volume reverting from a particular
>>> user.user to root.root ownership.  I have to periodically do a "cd
>>> /share; chown user.user . "
>>>
>>> On Thu, Aug 1, 2013 at 12:25 PM, Justin Dossey 
>>> wrote:
>>> > Hi all,
>>> >
>>> > I have a relatively-new GlusterFS 3.3.2 4-node cluster in
>>> > distributed-replicated mode running in a production environment.
>>> >
>>> > After adding bricks from nodes 3 and 4 (which changed the cluster type
>>> from
>>> > simple replicated-2 to distributed-replicated-2), I've discovered that
>>> files
>>> > are randomly losing their permissions.  These are files that aren't
>>> being
>>> > accessed by our clients-- some of them haven't been touched for years.
>>> >
>>> > When I say "losing their permissions", I mean that regular files are
>>> going
> >>> > from 0644 to 0000 or 1000.
>>> >
>>> > Since this is a real production issue, I run a parallel find process to
>>> > correct them every ten minutes.  It has corrected approximately 40,000
>>> files
>>> > in the past 18 hours.
>>> >
>>> > Is anyone else seeing this kind of issue?  My searches have turned up
>>> > nothing so far.
>>> >
>>> > --
>>> > Justin Dossey
>>> > CTO, PodOmatic
>>> >
>>> >
>>> > ___
>>> > Gluster-users mailing list
>>> > Gluster-users@gluster.org
>>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Justin Dossey
>> CTO, PodOmatic
>>
>>
>
>
> --
> Justin Dossey
> CTO, PodOmatic
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Extremely slow NFS access from Windows

2013-07-31 Thread Anand Avati
On Wed, Jul 31, 2013 at 8:57 AM, Nux!  wrote:

> On 31.07.2013 16:21, Nux! wrote:
>
>> On 31.07.2013 12:29, Nux! wrote:
>>
>>> Hello,
>>> I'm trying to use a volume on Windows via NFS and every operation is
>>> very slow and in the nfs.log I see the following:
>>> [2013-07-31 11:26:22.644794] W [socket.c:514:__socket_rwv]
>>> 0-socket.nfs-server: writev failed (Invalid argument)
>>> [2013-07-31 11:26:34.738955] W [socket.c:514:__socket_rwv]
>>> 0-socket.nfs-server: writev failed (Invalid argument)
>>> [2013-07-31 11:26:46.816790] W [socket.c:514:__socket_rwv]
>>> 0-socket.nfs-server: writev failed (Invalid argument)
>>> [2013-07-31 11:26:56.466939] W [rpcsvc.c:180:rpcsvc_program_actor]
>>> 0-rpc-service: RPC program version not available (req 13 2)
>>> [2013-07-31 11:26:56.466993] E
>>> [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
>>> to complete successfully
>>
>>

Looks like Windows is trying to connect with NFSv2. Gluster supports NFSv3
only. The "Invalid argument" errors showing up earlier also look suspicious.
Can you get trace logs?
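To confirm what the server side registers (and that the client has to speak v3),
a quick check on the gluster node:

# The gluster NFS server should register NFS version 3 only (plus mountd/nlm);
# a client probing for v2 will hit the "RPC program version not available"
# errors shown above.
rpcinfo -p localhost | grep -E 'nfs|mountd'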

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] new glusterfs logging framework

2013-07-30 Thread Anand Avati
On Tue, Jul 30, 2013 at 11:39 PM, Balamurugan Arumugam
wrote:

>
>
> - Original Message -
> > From: "Joe Julian" 
> > To: "Pablo" , "Balamurugan Arumugam" <
> b...@gluster.com>
> > Cc: gluster-users@gluster.org, gluster-de...@nongnu.org
> > Sent: Tuesday, July 30, 2013 9:26:55 PM
> > Subject: Re: [Gluster-users] new glusterfs logging framework
> >
> > Configuration files should be under /etc per FSH standards. Move the
> > logger.conf to /etc/glusterfs.
> >
>
> This will be done.
>
>
> > I, personally, like json logs since I'm shipping to logstash. :-) My one
> > suggestion would be to ensure the timestamps are in rfc3164.
> >
>
> rsyslog supports rfc3339 (a profile of ISO8601) and we use this.  Let me
> know your thoughts on continue using it.
>
>
> > Yes, those are complex steps, but the rpm/deb packaging should take care
> of
> > dependencies and setting up logical defaults.
> >
>
> Yes. I am planning to add rsyslog configuration for gluster at install
> time.
>
>
> > IMHO, since this is a departure from the way it's been before now, the
> config
> > file should enable this new behavior, not disable it, to avoid breaking
> > existing monitoring installations.
> >
>
> Do you mean to continue current logging in addition to syslog way?
>
>
This means that unless syslog is explicitly configured, we should by default
keep logging to the gluster log files as before.
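For reference, the run-time opt-in would then look roughly like this (a sketch
based on the switch described earlier in this thread; the file may move under
/etc/glusterfs as discussed above):

# enable syslog-based logging for gluster processes at run time
touch /var/log/glusterd/logger.conf
service glusterd restart    # restart the other gluster services as needed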

Avati



> Regards,
> Bala
>
>
> > Pablo  wrote:
> > >I think that adding all that 'rsyslog' configuration only to see logs
> > >is
> > >too much. (I admit it, I don't know how to configure rsyslog at that
> > >level so that may influence my opinion)
> > >
> > >Regards,
> > >
> > >
> > >El 30/07/2013 06:29 a.m., Balamurugan Arumugam escribió:
> > >> Hi All,
> > >>
> > >> Recently new logging framework was introduced [1][2][3] in glusterfs
> > >master branch.  You could read more about this on doc/logging.txt.  In
> > >brief, current log target is moved to syslog and user has an option to
> > >this new logging at compile time (passing '--disable-syslog' to
> > >./configure or '--without syslog' to rpmbuild) and run time (having a
> > >file /var/log/glusterd/logger.conf and restarting gluster services).
> > >>
> > >> As rsyslog is used as syslog server in Fedora and CentOS/RHEL and
> > >default configuration of rsyslog does not have any rule specific to
> > >gluster logs, you see all logs are in /var/log/messages in JSON format.
> > >>
> > >> Below is the way to make them neat and clean.
> > >>
> > >> For fedora users:
> > >> 1. It requires to install rsyslog-mmjsonparse rpm (yum -y install
> > >rsyslog-mmjsonparse)
> > >> 2. Place below configuration under /etc/rsyslog.d/gluster.conf file.
> > >>
> > >> #$RepeatedMsgReduction on
> > >>
> > >> $ModLoad mmjsonparse
> > >> *.* :mmjsonparse:
> > >>
> > >> template (name="GlusterLogFile" type="string"
> > >string="/var/log/gluster/%app-name%.log")
> > >> template (name="GlusterPidLogFile" type="string"
> > >string="/var/log/gluster/%app-name%-%procid%.log")
> > >>
> > >> template(name="GLFS_template" type="list") {
> > >> property(name="$!mmcount")
> > >> constant(value="/")
> > >> property(name="syslogfacility-text" caseConversion="upper")
> > >> constant(value="/")
> > >> property(name="syslogseverity-text" caseConversion="upper")
> > >> constant(value=" ")
> > >> constant(value="[")
> > >> property(name="timereported" dateFormat="rfc3339")
> > >> constant(value="] ")
> > >> constant(value="[")
> > >> property(name="$!gf_code")
> > >> constant(value="] ")
> > >> constant(value="[")
> > >> property(name="$!gf_message")
> > >> constant(value="] ")
> > >> property(name="$!msg")
> > >> constant(value="\n")
> > >> }
> > >>
> > >> if $app-name == 'gluster' or $app-name == 'glusterd' then {
> > >> action(type="omfile"
> > >>DynaFile="GlusterLogFile"
> > >>Template="GLFS_template")
> > >> stop
> > >> }
> > >>
> > >> if $app-name contains 'gluster' then {
> > >> action(type="omfile"
> > >>DynaFile="GlusterPidLogFile"
> > >>Template="GLFS_template")
> > >> stop
> > >> }
> > >>
> > >>
> > >> 3. Restart rsyslog (service rsyslog restart)
> > >> 4. Done. All gluster process specific logs are separated into
> > >/var/log/gluster/ directory
> > >>
> > >>
> > >> Note: Fedora 19 users
> > >> There is a bug in rsyslog of fedora 19 [4], so its required to
> > >recompile rsyslog source rpm downloaded from fedora repository
> > >('rpmbuild --rebuild rsyslog-7.2.6-1.fc19.src.rpm' works fine) and use
> > >generated rsyslog and rsyslog-mmjsonparse binary rpms
> > >>
> > >> For CentOS/RHEL users:
> > >> Current rsyslog available in CentOS/RHEL does not have json support.
> > >I have added the support which requires some testing.  I will update
> > >once done.
> > >>
> > >>
> > >> TODO:
> > >> 1. need to add volume:brick specific tag to logging so that those
> > >logs can be separated out than pid.
> > 

Re: [Gluster-users] uWSGI plugin and some question

2013-07-30 Thread Anand Avati
On Tue, Jul 30, 2013 at 7:47 AM, Roberto De Ioris  wrote:

>
> > On Mon, Jul 29, 2013 at 10:55 PM, Anand Avati 
> > wrote:
> >
> >
> > I am assuming the module in question is this -
> > https://github.com/unbit/uwsgi/blob/master/plugins/glusterfs/glusterfs.c
> .
> > I
> > see that you are not using the async variants of any of the glfs calls so
> > far. I also believe you would like these "synchronous" calls to play
> > nicely
> > with Coro:: by yielding in a compatible way (and getting woken up when
> > response arrives in a compatible way) - rather than implementing an
> > explicit glfs_stat_async(). The ->request() method does not seem to be be
> > naturally allowing the use of "explictly asynchronous" calls within.
> >
> > Can you provide some details of the event/request management in use? If
> > possible, I would like to provide hooks for yield and wakeup primitives
> in
> > gfapi (which you can wire with Coro:: or anything else) such that these
> > seemingly synchronous calls (glfs_open, glfs_stat etc.) don't starve the
> > app thread without yielding.
> >
> > I can see those hooks having a benefit in the qemu gfapi driver too,
> > removing a bit of code there which integrates callbacks into the event
> > loop
> > using pipes.
> >
> > Avati
> >
> >
>
> This is a prototype of the async way:
>
>
> https://github.com/unbit/uwsgi/blob/master/plugins/glusterfs/glusterfs.c#L43
>
> basically, once the async request is sent, the uWSGI core (it can be a
> coroutine, a greenthread or another callback) waits for a signal (via pipe
> [could be eventfd() on linux]) of the callback completion:
>
>
> https://github.com/unbit/uwsgi/blob/master/plugins/glusterfs/glusterfs.c#L78
>
> the problem is that this approach is racy with respect to the
> uwsgi_glusterfs_async_io structure.


It is probably OK, since you are waiting for the completion of each AIO
request before issuing the next. One question about your usage: who is
draining the "\1" written to the pipe in uwsgi_glusterfs_read_async_cb()?
Since the same pipe is re-used for the next read chunk, won't you get an
immediate wake-up if you try polling on the pipe without draining it?


> Can i assume after glfs_close() all of
> the pending callbacks are cleared ?


With the way you are using the _async() calls, you do have the guarantee -
because you are waiting for the completion of each AIO request right after
issuing.

The enhancement to gfapi I was proposing was to expose hooks at yield() and
wake() points for external consumers to wire in their own ways of switching
out of the stack. This is still a half baked idea, but it will let you use
only glfs_read(), glfs_stat() etc. (and NOT the explicit async variants),
and the hooks will let you do wait_read_hook() and write(pipefd, '\1')
respectively in a generic way independent of the actual call.


> In such a way i could simply
> deallocate it (now it is on the stack) at the end of the request.
>

You probably need to do all that in case you want to have multiple
outstanding AIOs at the same time. From what I see, you just need
co-operative waiting till call completion.

Also note that the ideal block size for performing IO is 128KB. 8KB is too
little for a distributed filesystem.

Avati
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster NFS - no permission to write from Windows

2013-07-30 Thread Anand Avati
It really depends on how Windows is assigning a UNIX uid during NFS access.
Gluster just takes the Unix UID encoded in the auth header as-is. In this
case that number is actually the unsigned 32-bit representation of -2.
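As a quick sanity check of that mapping, and the workaround that falls out of it
(the mount path is a placeholder):

echo $(( 2**32 - 2 ))    # prints 4294967294, i.e. the unsigned form of -2

# make the target directory writable by the uid/gid Windows is mapped to
chown 4294967294:4294967294 /mnt/glustervol/windows-share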

Regarding performance, please look for any odd messages in the NFS server
logs.

Avati


On Tue, Jul 30, 2013 at 3:21 AM, Nux!  wrote:

> On 30.07.2013 11:03, Anand Avati wrote:
>
>> What unix uid is the windows client mapping the access to? I guess the
>> permission issue boils down to that. You can create a file under the mode
>> 777 dir, and check the uid/gid from a linux client. Then make sure the
>> dirs
>> you create can be writeable by that uid/gid.
>>
>
> Yep, it's some weird UID:
> drwxr-xr-x 2 4294967294 4294967294 10 Jul 30 11:14 New folder
>
> So the solution would be to "chown 4294967294:4294967294 directory"? Was
> hoping for something more elegant, but it'll have to do I guess.
>
> Also, another problem I notice is that doing anything on this NFS mount in
> windows is extremely slow and I frequently get a "Not Responding" Explorer
> window. Any thoughts on why this might happen?
>
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster NFS - no permission to write from Windows

2013-07-30 Thread Anand Avati
What UNIX uid is the Windows client mapping the access to? I suspect the
permission issue boils down to that. You can create a file under the mode-777
directory and check the uid/gid from a Linux client. Then make sure the
directories you create are writable by that uid/gid.
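Concretely, that check could look like this from the Linux side (the mount point
and directory name are placeholders):

mkdir /mnt/glustervol/wintest && chmod 777 /mnt/glustervol/wintest
# ... now create a file inside it from the Windows NFS mount, then:
ls -ln /mnt/glustervol/wintest    # numeric uid/gid shows what Windows access maps to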

Avati


On Tue, Jul 30, 2013 at 1:43 AM, Nux!  wrote:

> Hi,
>
> I have successfully mounted a Glusterfs NFS volume on Windows (7), but I
> can't write anything to it. If I create from linux a directory on this
> volume and give it perms 777 then I can write from Windows as well.
> Any pointers on how to make it work out of the box? I do have NFS client
> installed on the Windows box.
>
> Lucian
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] [FEEDBACK] Governance of GlusterFS project

2013-07-30 Thread Anand Avati
On Sun, Jul 28, 2013 at 11:32 PM, Bryan Whitehead wrote:

> Weekend activities kept me away from watching this thread, wanted to
> add in more of my 2 cents... :)
>
> Major releases would be great to happen more often - but keeping
> current releases "more current" is really what I was talking about.
> Example, 3.3.0 was a pretty solid release but some annoying bugs got
> fixed and it felt like 3.3.1 was reasonably quick to come. But that
> release seemed to be a step back for rdma (forgive me if I was wrong -
> but I think it wasn't even possible to fuse/mount over rdma with 3.3.1
> while 3.3.0 worked). But 3.3.2 release took a pretty long time to come
> and fix that regression. I think I also recall seeing a bunch of nfs
> fixes coming and regressing (but since I don't use gluster/nfs I don't
> follow closely).
>

Bryan - yes, point well taken. I believe a dedicated release maintainer
role will help in this case. I would like to hear other suggestions or
thoughts on how you/others think this can be implemented.


>
> What I'd like to see:
> In the -devel maillinglist right now I see someone is showing brick
> add / brick replace in 3.4.0 is causing a segfault in apps using
> libgfapi (in this case qemu/libvirt) to get at gluster volumes. It
> looks like some patches were provided to fix the issue. Assuming those
> patches work I think a 3.4.1 release might be worth being pushed out.
> Basic stuff like that on something that a lot of people are going to
> care about (qemu/libvirt integration - or plain libgfapi). So if there
> was a scheduled release for say - every 1-3 months - then I think that
> might be worth doing. Ref:
> http://lists.gnu.org/archive/html/gluster-devel/2013-07/msg00089.html
>

Right, thanks for highlighting. These fixes will be backported. I have
already submitted the backport of one of them for review at
http://review.gluster.org/5427. The other will be backported once reviewed
and accepted in master.

Thanks again!
Avati

> The front page of gluster.org says 3.4.0 has "Virtual Machine Image
> Storage improvements". If 1-3 months from now more traction with
> CloudStack/OpenStack or just straight up libvirtd/qemu with gluster
> gets going. I'd much rather tell someone "make sure to use 3.4.1" than
> "be careful when doing an add-brick - all your VM's will segfault".
>
> On Sun, Jul 28, 2013 at 5:10 PM, Emmanuel Dreyfus  wrote:
> > Harshavardhana  wrote:
> >
> >> What is good for GlusterFS as a whole is highly debatable - since there
> >> are no module owners/subsystem maintainers as of yet at-least on paper.
> >
> > Just my two cents on that: you need to make clear if a module maintainer
> > is a dictator or a steward for the module: does he has the last word on
> > anything touching his module, or is there some higher instance to settle
> > discussions that do not reach consensus?
> >
> > IMO the first approach creates two problems:
> >
> > - having just one responsible person for a module is a huge bet that
> > this person will have good judgments. Be careful to let a maintainer
> > position open instead of assigning it to the wrong person.
> >
> > - having many different dictators each ruling over a module can create
> > difficult situations when a proposed change impacts many modules.
> >
> > --
> > Emmanuel Dreyfus
> > http://hcpnet.free.fr/pubz
> > m...@netbsd.org
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [FEEDBACK] Governance of GlusterFS project

2013-07-30 Thread Anand Avati
On Mon, Jul 29, 2013 at 2:17 PM, Joe Julian  wrote:

> As one of the guys supporting this software, I agree that I would like
> bugfix releases to happen more. Critical and security bugs should trigger
> an immediate test release. Other bug fixes should go out on a reasonable
> schedule (monthly?). The relatively new CI testing should make this a lot
> more feasible.
>

Joe, we will certainly be increasing the frequency of releases to push out
bug fixes sooner. Though this has been a consistent theme in everybody's
comments, your feedback in particular weighs in heavily because of your
level of involvement in guiding our users :-)

Avati


>
> If there weren't hundreds of bugs to examine between releases, I would
> happily participate in the evaluation process.
>
>
> On 07/26/2013 05:16 PM, Bryan Whitehead wrote:
>
>> I would really like to see releases happen regularly and more
>> aggressively. So maybe this plan needs a community QA guy or the
>> release manager needs to take up that responsibility to say "this code
>> is good for including in the next version". (Maybe this falls under
>> process and evaluation?)
>>
>> For example, I think the ext4 patches had long been available but they
>> just took forever to get pushed out into an official release.
>>
>> I'm in favor of closing some bugs and risking introducing new bugs for
>> the sake of releases happening often.
>>
>>
>>
>> On Fri, Jul 26, 2013 at 10:26 AM, Anand Avati 
>> wrote:
>>
>>> Hello everyone,
>>>
>>>We are in the process of formalizing the governance model of the
>>> GlusterFS
>>> project. Historically, the governance of the project has been loosely
>>> structured. This is an invitation to all of you to participate in this
>>> discussion and provide your feedback and suggestions on how we should
>>> evolve
>>> a formal model. Feedback from this thread will be considered to the
>>> extent
>>> possible in formulating the draft (which will be sent out for review as
>>> well).
>>>
>>>Here are some specific topics to seed the discussion:
>>>
>>> - Core team formation
>>>- what are the qualifications for membership (e.g contributions of
>>> code,
>>> doc, packaging, support on irc/lists, how to quantify?)
>>>- what are the responsibilities of the group (e.g direction of the
>>> project, project roadmap, infrastructure, membership)
>>>
>>> - Roadmap
>>>- process of proposing features
>>>- process of selection of features for release
>>>
>>> - Release management
>>>- timelines and frequency
>>>- release themes
>>>- life cycle and support for releases
>>>- project management and tracking
>>>
>>> - Project maintainers
>>>- qualification for membership
>>>- process and evaluation
>>>
>>> There are a lot more topics which need to be discussed, I just named
>>> some to
>>> get started. I am sure our community has members who belong and
>>> participate
>>> (or at least are familiar with) other open source project communities.
>>> Your
>>> feedback will be valuable.
>>>
>>> Looking forward to hearing from you!
>>>
>>> Avati
>>>
>>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

  1   2   3   4   5   >