Re: [Gluster-devel] epel-7 mock broken in rpm.t (due to ftp.redhat.com change?)
At a guess, it'll be somewhere on: https://git.centos.org No idea of specifics though. + Justin On 13/06/2014, at 7:21 AM, Harshavardhana wrote: > Interesting - looks like all the sources have been moved? do we know where? > > On Thu, Jun 12, 2014 at 10:48 PM, Justin Clift wrote: >> Hi Kaleb, >> >> This just started showing up in rpm.t test output: >> >> ERROR: >> Exception(/home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/glusterfs-3.5qa2-0.621.gita22a2f0.el6.src.rpm) >> Config(epel-7-x86_64) 0 minutes 2 seconds >> INFO: Results and/or logs in: >> /home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/mock.d/epel-7-x86_64 >> INFO: Cleaning up build root ('clean_on_failure=True') >> Start: lock buildroot >> Start: clean chroot >> INFO: chroot (/var/lib/mock/epel-7-x86_64) unlocked and deleted >> Finish: clean chroot >> Finish: lock buildroot >> ERROR: Command failed: >> # ['/usr/bin/yum', '--installroot', '/var/lib/mock/epel-7-x86_64/root/', >> 'install', '@buildsys-build'] >> >> http://ftp.redhat.com/pub/redhat/rhel/beta/7/x86_64/os/repodata/repomd.xml >> : [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not >> Found" >> Trying other mirror. >> Error: Cannot retrieve repository metadata (repomd.xml) for repository: el. >> Please verify its path and try again >> >> Seems to be due to ftp.redhat.com changing their layout or something, which >> seems to have broken mock. >> >> Guessing we'll need to disable epel-7 testing until this gets >> fixed. >> >> + Justin >> >> -- >> GlusterFS - http://www.gluster.org >> >> An open source, distributed file system scaling to several >> petabytes, and handling thousands of clients. 
>> >> My personal twitter: twitter.com/realjustinclift >> >> ___ >> Gluster-devel mailing list >> Gluster-devel@gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-devel > > > > -- > Religious confuse piety with mere ritual, the virtuous confuse > regulation with outcomes -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] epel-7 mock broken in rpm.t (due to ftp.redhat.com change?)
Interesting - looks like all the sources have been moved? do we know where? On Thu, Jun 12, 2014 at 10:48 PM, Justin Clift wrote: > Hi Kaleb, > > This just started showing up in rpm.t test output: > > ERROR: > Exception(/home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/glusterfs-3.5qa2-0.621.gita22a2f0.el6.src.rpm) > Config(epel-7-x86_64) 0 minutes 2 seconds > INFO: Results and/or logs in: > /home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/mock.d/epel-7-x86_64 > INFO: Cleaning up build root ('clean_on_failure=True') > Start: lock buildroot > Start: clean chroot > INFO: chroot (/var/lib/mock/epel-7-x86_64) unlocked and deleted > Finish: clean chroot > Finish: lock buildroot > ERROR: Command failed: ># ['/usr/bin/yum', '--installroot', '/var/lib/mock/epel-7-x86_64/root/', > 'install', '@buildsys-build'] > > http://ftp.redhat.com/pub/redhat/rhel/beta/7/x86_64/os/repodata/repomd.xml > : [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not > Found" > Trying other mirror. > Error: Cannot retrieve repository metadata (repomd.xml) for repository: el. > Please verify its path and try again > > Seems to be due to ftp.redhat.com changing their layout or something, which > seems to have broken mock. > > Guessing we'll need to disable epel-7 testing until this gets > fixed. > > + Justin > > -- > GlusterFS - http://www.gluster.org > > An open source, distributed file system scaling to several > petabytes, and handling thousands of clients. > > My personal twitter: twitter.com/realjustinclift > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-devel -- Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
I've got no logs so I can't confirm it. But it is most likely the same issue we found.

~kaushal

On Thu, Jun 12, 2014 at 10:49 PM, Pranith Kumar Karampuri wrote:
> Kaushal,
>     Could you check if this is the same rebalance failure we discovered?
>
> Pranith
>
> On 06/12/2014 10:35 PM, Justin Clift wrote:
>> This one seems like a "proper" failure. Is it on your radar?
>>
>>   Test Summary Report
>>   ---
>>   ./tests/bugs/bug-857330/normal.t    (Wstat: 0 Tests: 24 Failed: 1)
>>     Failed test: 13
>>
>>   http://build.gluster.org/job/rackspace-regression/123/console
>>
>> I've disconnected that slave from Jenkins, so it can be logged
>> into remotely (via SSH) and checked, if that's helpful.
>>
>> Let me know, and I'll send you the ssh password.
>>
>> + Justin
>>
>> --
>> GlusterFS - http://www.gluster.org
>>
>> An open source, distributed file system scaling to several
>> petabytes, and handling thousands of clients.
>>
>> My personal twitter: twitter.com/realjustinclift
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] epel-7 mock broken in rpm.t (due to ftp.redhat.com change?)
Hi Kaleb,

This just started showing up in rpm.t test output:

ERROR: Exception(/home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/glusterfs-3.5qa2-0.621.gita22a2f0.el6.src.rpm) Config(epel-7-x86_64) 0 minutes 2 seconds
INFO: Results and/or logs in: /home/jenkins/root/workspace/rackspace-regression-2GB/rpmbuild-mock.d/mock.d/epel-7-x86_64
INFO: Cleaning up build root ('clean_on_failure=True')
Start: lock buildroot
Start: clean chroot
INFO: chroot (/var/lib/mock/epel-7-x86_64) unlocked and deleted
Finish: clean chroot
Finish: lock buildroot
ERROR: Command failed:
 # ['/usr/bin/yum', '--installroot', '/var/lib/mock/epel-7-x86_64/root/', 'install', '@buildsys-build']
http://ftp.redhat.com/pub/redhat/rhel/beta/7/x86_64/os/repodata/repomd.xml : [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: el. Please verify its path and try again

Seems to be due to ftp.redhat.com changing their layout or something, which seems to have broken mock.

Guessing we'll need to disable epel-7 testing until this gets fixed.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfs split-brain problem
hi,
Could you let us know what exact problem you are running into?

Pranith

On 06/13/2014 09:27 AM, Krishnan Parthasarathi wrote:
> Hi,
>
> Pranith, who is the AFR maintainer, would be the best person to answer
> this question. CC'ing Pranith and gluster-devel.
>
> Krish
>
> ----- Original Message -----
>> hi Krishnan Parthasarathi
>>
>> Could you tell me which glusterfs version has significant improvements
>> for the glusterfs split-brain problem?
>> Can you point me to the relevant links?
>>
>> thank you very much!
>>
>> justgluste...@gmail.com

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfs split-brain problem
Hi,

Pranith, who is the AFR maintainer, would be the best person to answer
this question. CC'ing Pranith and gluster-devel.

Krish

----- Original Message -----
> hi Krishnan Parthasarathi
>
> Could you tell me which glusterfs version has significant improvements
> for the glusterfs split-brain problem?
> Can you point me to the relevant links?
>
> thank you very much!
>
> justgluste...@gmail.com

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Thank you Pranith :) On Fri, Jun 13, 2014 at 9:18 AM, Justin Clift wrote: > Thanks. :) > > + Justin > > > On 13/06/2014, at 4:46 AM, Pranith Kumar Karampuri wrote: > > Found the issue. May take a while to get to it. May be we should > redesign rename in afr for this. For now reverted it > > > > Pranith > > On 06/13/2014 02:16 AM, Justin Clift wrote: > >> This one seems to be happening a lot now. The last 3 failures (across > >> different nodes) were from this test. > >> > >> Log files here: > >> > >> > http://slave2.cloud.gluster.org/logs/glusterfs-logs-20140612%3a18%3a53%3a06.tgz > >> > >> (am installing Nginx on the slaves now, for easy log retrieval as > recommended > >> by Kaushal M) > >> > >> + Justin > > -- > GlusterFS - http://www.gluster.org > > An open source, distributed file system scaling to several > petabytes, and handling thousands of clients. > > My personal twitter: twitter.com/realjustinclift > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-devel > -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Thanks. :) + Justin On 13/06/2014, at 4:46 AM, Pranith Kumar Karampuri wrote: > Found the issue. May take a while to get to it. May be we should redesign > rename in afr for this. For now reverted it > > Pranith > On 06/13/2014 02:16 AM, Justin Clift wrote: >> This one seems to be happening a lot now. The last 3 failures (across >> different nodes) were from this test. >> >> Log files here: >> >> >> http://slave2.cloud.gluster.org/logs/glusterfs-logs-20140612%3a18%3a53%3a06.tgz >> >> (am installing Nginx on the slaves now, for easy log retrieval as recommended >> by Kaushal M) >> >> + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Found the issue. May take a while to get to it. Maybe we should redesign
rename in afr for this. For now, reverted it.

Pranith

On 06/13/2014 02:16 AM, Justin Clift wrote:
> This one seems to be happening a lot now. The last 3 failures (across
> different nodes) were from this test.
>
> Log files here:
>
>   http://slave2.cloud.gluster.org/logs/glusterfs-logs-20140612%3a18%3a53%3a06.tgz
>
> (am installing Nginx on the slaves now, for easy log retrieval as
> recommended by Kaushal M)
>
> + Justin
>
> On 12/06/2014, at 1:23 PM, Pranith Kumar Karampuri wrote:
>> Thanks for reporting. Will take a look.
>>
>> Pranith
>>
>> On 06/12/2014 05:52 PM, Raghavendra Talur wrote:
>>> Hi Pranith,
>>>
>>> This test failed for my patch set today and seems to be a spurious failure.
>>> Here is the console output for the run.
>>> http://build.gluster.org/job/rackspace-regression/107/consoleFull
>>>
>>> Could you please have a look at it?
>>>
>>> --
>>> Thanks!
>>> Raghavendra Talur | Red Hat Storage Developer | Bangalore | +918039245176
>
> --
> GlusterFS - http://www.gluster.org
>
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 12/06/2014, at 6:47 PM, Pranith Kumar Karampuri wrote:
> On 06/12/2014 11:16 PM, Anand Avati wrote:
>> The client can actually be fixed to be compatible with both old and new
>> servers. We can change the errno from ESTALE to ENOENT before doing the
>> GFID mismatch check in client_lookup_cbk.
> But this will require a two hop upgrade. Is that normal/acceptable?

Thoughts on this from Corvid Tech:

> From: "David F. Robinson"
> Subject: Re: Rolling upgrades discussion on mailing list
> Date: 12 June 2014 11:01:27 PM GMT+01:00
>
> Read through the discussion. Not sure I fully understood it, but here are
> my thoughts regardless...
>
> - We have roughly 1000 nodes (HPC setup) and 100 workstations that
>   hopefully will be using gluster as the primary storage system (if we can
>   resolve the few outstanding issues)... If these all had gluster-3.4.x
>   rpms and I wanted to upgrade to 3.5.x rpms, I don't particularly care if
>   I had to incrementally update the packages (3.4.5 --> 3.4.6 --> 3.5.0 -->
>   3.5.1). Requiring a minimum version seems reasonable and workable to me
>   as long as there is a documented process to update.
> - What I wouldn't want to do, if it was avoidable, was to have to stop all
>   i/o on all of the hpc-nodes and workstations for the update. If I could
>   do incremental 'rpm -Uvh' for the various versions "without" killing the
>   currently ongoing i/o, that is what would be most important. It wasn't
>   clear to me if this was not possible at all, or not possible unless you
>   upgraded all of the clients before upgrading the server.
> - If the problem is that you have to update all of the clients BEFORE
>   updating the server, that doesn't sound unworkable as long as there is a
>   mechanism to list all clients that are connected and display the gluster
>   version each client is using.
> - Also, a clearly documented process to update would be great. We have
>   lots of questions, and taking the entire storage device offline to do an
>   upgrade isn't really feasible. My assumption after reading the link you
>   sent is that we would have to upgrade all of the client software (assumes
>   that a newer client can still write to an older server version... i.e.
>   client 3.5.2 can write to a 3.4.5 server). This isn't a big deal and
>   seems reasonable, if there is some method to find all of the connected
>   clients.
>
> - From the server side, I would like to be able to take one of the
>   nodes/bricks offline nicely (i.e. have the current i/o finish, accept no
>   more i/o requests, and then take the node offline for writes) and do the
>   update. Then bring that node back up, let the heal finish, and move on to
>   the next node/brick. We are told that simply taking a node offline will
>   not interrupt the i/o if you are using a replica. We have experienced
>   mixed results with this. I believe that the active i/o to the brick fails
>   but additional i/o fails over to the other brick. A nicer process to take
>   a brick offline would be desirable.
> - The other suggestion I have seen for server software updates is to
>   migrate all of the data from a node/brick to the other bricks of the
>   system (gluster remove-brick xxx), remove this brick from the pool, do
>   the software update, and then add it back to the pool. The issue here is
>   that each of our bricks is roughly 100TB. Moving all of that data to the
>   other bricks in the system will take a very long time. Not a practical
>   solution for us...
>
> I don't mind having to use a process (incrementally upgrade all clients,
> take bricks offline, update bricks), as long as I didn't have to turn off
> all i/o to the primary storage system for the entire company in order to
> execute an upgrade... The number one requirement for us would be to have a
> process to do an upgrade on a "live" system... Who cares if the process is
> painful... That is the IT guys' problem :)...

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
This one seems to be happening a lot now. The last 3 failures (across different nodes) were from this test. Log files here: http://slave2.cloud.gluster.org/logs/glusterfs-logs-20140612%3a18%3a53%3a06.tgz (am installing Nginx on the slaves now, for easy log retrieval as recommended by Kaushal M) + Justin On 12/06/2014, at 1:23 PM, Pranith Kumar Karampuri wrote: > Thanks for reporting. Will take a look. > > Pranith > > On 06/12/2014 05:52 PM, Raghavendra Talur wrote: >> Hi Pranith, >> >> This test failed for my patch set today and seems to be a spurious failure. >> Here is the console output for the run. >> http://build.gluster.org/job/rackspace-regression/107/consoleFull >> >> Could you please have a look at it? >> >> -- >> Thanks! >> Raghavendra Talur | Red Hat Storage Developer | Bangalore |+918039245176 >> > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-devel -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Please don't start jobs on rackspace-regression yet
Hi all, Please don't start rackspace-regression testing jobs in Jenkins yet. I'm very manually changing settings on them, running a job, then rebooting each node (not fun) while I try to make them more reliable. If you want a Gerrit CR run on one of the nodes, let me know which one(s) and I'll get it done. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On Thu, Jun 12, 2014 at 10:47 AM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > On 06/12/2014 11:16 PM, Anand Avati wrote: > > > > > On Thu, Jun 12, 2014 at 10:39 AM, Ravishankar N > wrote: > >> On 06/12/2014 08:19 PM, Justin Clift wrote: >> >>> On 12/06/2014, at 2:22 PM, Ravishankar N wrote: >>> >>> But we will still hit the problem when rolling upgrade is performed from 3.4 to 3.5, unless the clients are also upgraded to 3.5 >>> >>> Could we introduce a client side patch into (say) 3.4.5 that helps >>> with this? >>> >> But the client side patch is needed only if Avati's server (posix) fix >> is present. And that is present only in 3.5 and not 3.4 . > > > > The client can actually be fixed to be compatible with both old and new > servers. We can change the errno from ESTALE to ENOENT before doing the > GFID mismatch check in client_lookup_cbk. > > But this will require a two hop upgrade. Is that normal/acceptable? > One hop is always better than two hops. But at least we have a way out. It is not at all unreasonable to have clients be of a minimum version for rolling upgrades to work (upgrade still works no matter what the client versions are if they are doing read-only access). ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 06/12/2014 11:16 PM, Anand Avati wrote:
> On Thu, Jun 12, 2014 at 10:39 AM, Ravishankar N wrote:
>> On 06/12/2014 08:19 PM, Justin Clift wrote:
>>> On 12/06/2014, at 2:22 PM, Ravishankar N wrote:
>>>> But we will still hit the problem when rolling upgrade is performed
>>>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5
>>>
>>> Could we introduce a client side patch into (say) 3.4.5 that helps
>>> with this?
>>
>> But the client side patch is needed only if Avati's server (posix) fix
>> is present. And that is present only in 3.5 and not 3.4 .
>
> The client can actually be fixed to be compatible with both old and new
> servers. We can change the errno from ESTALE to ENOENT before doing the
> GFID mismatch check in client_lookup_cbk.

But this will require a two hop upgrade. Is that normal/acceptable?

Pranith

> Thanks

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 12/06/2014, at 6:39 PM, Ravishankar N wrote: > On 06/12/2014 08:19 PM, Justin Clift wrote: >> On 12/06/2014, at 2:22 PM, Ravishankar N wrote: >> >>> But we will still hit the problem when rolling upgrade is performed >>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5 >> >> Could we introduce a client side patch into (say) 3.4.5 that helps >> with this? > But the client side patch is needed only if Avati's server (posix) fix is > present. And that is present only in 3.5 and not 3.4 . No worries. Was a thought. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On Thu, Jun 12, 2014 at 10:39 AM, Ravishankar N wrote: > On 06/12/2014 08:19 PM, Justin Clift wrote: > >> On 12/06/2014, at 2:22 PM, Ravishankar N wrote: >> >> >>> But we will still hit the problem when rolling upgrade is performed >>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5 >>> >> >> Could we introduce a client side patch into (say) 3.4.5 that helps >> with this? >> > But the client side patch is needed only if Avati's server (posix) fix is > present. And that is present only in 3.5 and not 3.4 . The client can actually be fixed to be compatible with both old and new servers. We can change the errno from ESTALE to ENOENT before doing the GFID mismatch check in client_lookup_cbk. Thanks ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On Thu, Jun 12, 2014 at 10:33 AM, Vijay Bellur wrote:
> On 06/12/2014 06:52 PM, Ravishankar N wrote:
>> Hi Vijay,
>>
>> Since glusterfs 3.5, posix_lookup() sends ESTALE instead of ENOENT [1]
>> when a parent gfid (entry) is not present on the brick. In a replicate
>> set up, this causes a problem because AFR gives more priority to ESTALE
>> than ENOENT, causing IO to fail [2]. The fix is in progress at [3] and
>> is client-side specific, and would make it to 3.5.2.
>>
>> But we will still hit the problem when rolling upgrade is performed
>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5. To
>> elaborate, an example:
>>
>> 0) Create a 1x2 volume using 2 nodes and mount it from the client. All
>>    machines are glusterfs 3.4.
>> 1) Perform: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i & done
>> 2) While this is going on, kill one of the nodes in the replica pair
>>    and upgrade it to glusterfs 3.5 (simulating rolling upgrade).
>> 3) After a while, kill all tar processes.
>> 4) Create a backup directory and move all 1..30 dirs inside 'backup'.
>> 5) Start the untar processes in 1) again.
>> 6) Bring up the upgraded node. Tar fails with estale errors.
>>
>> Essentially the errors occur because [3] is a client side fix. But
>> rolling upgrades are targeted at servers while the older clients still
>> need to access them without issues.
>>
>> A solution is to have a fix in the posix translator wherein the newer
>> client passes its version (3.5) to posix_lookup(), which then sends
>> ESTALE if the version is 3.5 or newer but sends ENOENT instead if it
>> is an older client. Does this seem okay?
>
> Cannot think of a better solution to this. Seamless rolling upgrades are
> necessary for us and the proposed fix does seem okay for that reason.
>
> Thanks,
> Vijay

I also like Justin's proposal, of having fixes in 3.4.X and requiring
clients to be at least 3.4.X in order to have rolling upgrade to 3.5.Y.
This way we can add the "special fix" in the 3.4.X client (just like the
3.5.2 client). Ravi's proposal "works", but all LOOKUPs will have an extra
xattr, and we will be carrying forward the compat code burden for a very
long time. Whereas a 3.4.X client fix will remain in the 3.4 branch.

Thanks

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 06/12/2014 11:09 PM, Ravishankar N wrote:
> On 06/12/2014 08:19 PM, Justin Clift wrote:
>> On 12/06/2014, at 2:22 PM, Ravishankar N wrote:
>>> But we will still hit the problem when rolling upgrade is performed
>>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5
>>
>> Could we introduce a client side patch into (say) 3.4.5 that helps
>> with this?
>
> But the client side patch is needed only if Avati's server (posix) fix
> is present. And that is present only in 3.5 and not 3.4 .
>
>> Then mandate that 3.4 -> 3.5 rolling upgrades have to be on 3.4.5 first?
>
> The idea of a rolling upgrade is that customers don't have to stop their
> (hundreds of?) clients from accessing the gluster volume, and the replica
> configuration ensures HA during rolling upgrade of servers. When they can
> afford downtime of their application, they stop the clients and upgrade
> them as well. So the rolling upgrade solution has to be independent of
> client side features.
>
>> 3.4.x -> 3.4.5 should be "no issues" yeah?

After the fix on the server side, i.e. the posix xlator, no issue.

Pranith

>> + Justin
>>
>> --
>> GlusterFS - http://www.gluster.org
>>
>> An open source, distributed file system scaling to several
>> petabytes, and handling thousands of clients.
>>
>> My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 06/12/2014 08:19 PM, Justin Clift wrote:
> On 12/06/2014, at 2:22 PM, Ravishankar N wrote:
>> But we will still hit the problem when rolling upgrade is performed
>> from 3.4 to 3.5, unless the clients are also upgraded to 3.5
>
> Could we introduce a client side patch into (say) 3.4.5 that helps
> with this?

But the client side patch is needed only if Avati's server (posix) fix
is present. And that is present only in 3.5 and not 3.4 .

> Then mandate that 3.4 -> 3.5 rolling upgrades have to be on 3.4.5 first?

The idea of a rolling upgrade is that customers don't have to stop their
(hundreds of?) clients from accessing the gluster volume, and the replica
configuration ensures HA during rolling upgrade of servers. When they can
afford downtime of their application, they stop the clients and upgrade
them as well. So the rolling upgrade solution has to be independent of
client side features.

> 3.4.x -> 3.4.5 should be "no issues" yeah?
>
> + Justin
>
> --
> GlusterFS - http://www.gluster.org
>
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 06/12/2014 06:52 PM, Ravishankar N wrote:
> Hi Vijay,
>
> Since glusterfs 3.5, posix_lookup() sends ESTALE instead of ENOENT [1]
> when a parent gfid (entry) is not present on the brick. In a replicate
> set up, this causes a problem because AFR gives more priority to ESTALE
> than ENOENT, causing IO to fail [2]. The fix is in progress at [3] and
> is client-side specific, and would make it to 3.5.2.
>
> But we will still hit the problem when rolling upgrade is performed
> from 3.4 to 3.5, unless the clients are also upgraded to 3.5. To
> elaborate, an example:
>
> 0) Create a 1x2 volume using 2 nodes and mount it from the client. All
>    machines are glusterfs 3.4.
> 1) Perform: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i & done
> 2) While this is going on, kill one of the nodes in the replica pair
>    and upgrade it to glusterfs 3.5 (simulating rolling upgrade).
> 3) After a while, kill all tar processes.
> 4) Create a backup directory and move all 1..30 dirs inside 'backup'.
> 5) Start the untar processes in 1) again.
> 6) Bring up the upgraded node. Tar fails with estale errors.
>
> Essentially the errors occur because [3] is a client side fix. But
> rolling upgrades are targeted at servers while the older clients still
> need to access them without issues.
>
> A solution is to have a fix in the posix translator wherein the newer
> client passes its version (3.5) to posix_lookup(), which then sends
> ESTALE if the version is 3.5 or newer but sends ENOENT instead if it
> is an older client. Does this seem okay?

Cannot think of a better solution to this. Seamless rolling upgrades are
necessary for us and the proposed fix does seem okay for that reason.

Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
Kaushal,
    Could you check if this is the same rebalance failure we discovered?

Pranith

On 06/12/2014 10:35 PM, Justin Clift wrote:
> This one seems like a "proper" failure. Is it on your radar?
>
>   Test Summary Report
>   ---
>   ./tests/bugs/bug-857330/normal.t    (Wstat: 0 Tests: 24 Failed: 1)
>     Failed test: 13
>
>   http://build.gluster.org/job/rackspace-regression/123/console
>
> I've disconnected that slave from Jenkins, so it can be logged
> into remotely (via SSH) and checked, if that's helpful.
>
> Let me know, and I'll send you the ssh password.
>
> + Justin
>
> --
> GlusterFS - http://www.gluster.org
>
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] bug-857330/normal.t failure
This one seems like a "proper" failure. Is it on your radar?

  Test Summary Report
  ---
  ./tests/bugs/bug-857330/normal.t    (Wstat: 0 Tests: 24 Failed: 1)
    Failed test: 13

  http://build.gluster.org/job/rackspace-regression/123/console

I've disconnected that slave from Jenkins, so it can be logged
into remotely (via SSH) and checked, if that's helpful.

Let me know, and I'll send you the ssh password.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Spurious regression of tests/basic/mgmt_v3-locks.t
Avra/Poornima, Please look into this.

Patch ==> http://review.gluster.com/#/c/6483/9
Author ==> Poornima pguru...@redhat.com
Build triggered by ==> amarts
Build-url ==> http://build.gluster.org/job/regression/4847/consoleFull
Download-log-at ==> http://build.gluster.org:443/logs/regression/glusterfs-logs-20140612:08:37:44.tgz
Test written by ==> Author: Avra Sengupta

./tests/basic/mgmt_v3-locks.t [11, 12, 13]

0 #!/bin/bash
1
2 . $(dirname $0)/../include.rc
3 . $(dirname $0)/../cluster.rc
4
5 function check_peers {
6     $CLI_1 peer status | grep 'Peer in Cluster (Connected)' | wc -l
7 }
8
9 function volume_count {
10     local cli=$1;
11     if [ $cli -eq '1' ] ; then
12         $CLI_1 volume info | grep 'Volume Name' | wc -l;
13     else
14         $CLI_2 volume info | grep 'Volume Name' | wc -l;
15     fi
16 }
17
18 function volinfo_field()
19 {
20     local vol=$1;
21     local field=$2;
22
23     $CLI_1 volume info $vol | grep "^$field: " | sed 's/.*: //';
24 }
25
26 function two_diff_vols_create {
27     # Both volume creates should be successful
28     $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0 $H3:$B3/$V0 &
29     PID_1=$!
30
31     $CLI_2 volume create $V1 $H1:$B1/$V1 $H2:$B2/$V1 $H3:$B3/$V1 &
32     PID_2=$!
33
34     wait $PID_1 $PID_2
35 }
36
37 function two_diff_vols_start {
38     # Both volume starts should be successful
39     $CLI_1 volume start $V0 &
40     PID_1=$!
41
42     $CLI_2 volume start $V1 &
43     PID_2=$!
44
45     wait $PID_1 $PID_2
46 }
47
48 function two_diff_vols_stop_force {
49     # Force stop, so that if rebalance from the
50     # remove bricks is in progress, stop can
51     # still go ahead. Both volume stops should
52     # be successful
53     $CLI_1 volume stop $V0 force &
54     PID_1=$!
55
56     $CLI_2 volume stop $V1 force &
57     PID_2=$!
58
59     wait $PID_1 $PID_2
60 }
61
62 function same_vol_remove_brick {
63
64     # Running two same vol commands at the same time can result in
65     # two successes, two failures, or one success and one failure, all
66     # of which are valid. The only thing that shouldn't happen is a
67     # glusterd crash.
68
69     local vol=$1
70     local brick=$2
71     $CLI_1 volume remove-brick $1 $2 start &
72     $CLI_2 volume remove-brick $1 $2 start
73 }
74
75 cleanup;
76
77 TEST launch_cluster 3;
78 TEST $CLI_1 peer probe $H2;
79 TEST $CLI_1 peer probe $H3;
80
81 EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers
82
83 two_diff_vols_create
84 EXPECT 'Created' volinfo_field $V0 'Status';
85 EXPECT 'Created' volinfo_field $V1 'Status';
86
87 two_diff_vols_start
88 EXPECT 'Started' volinfo_field $V0 'Status';
89 EXPECT 'Started' volinfo_field $V1 'Status';
90
91 same_vol_remove_brick $V0 $H2:$B2/$V0
92 # Checking glusterd crashed or not after same volume remove brick
93 # on both nodes.
94 EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers
95
96 same_vol_remove_brick $V1 $H2:$B2/$V1
97 # Checking glusterd crashed or not after same volume remove brick
98 # on both nodes.
99 EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers
100
101 $CLI_1 volume set $V0 diagnostics.client-log-level DEBUG &
102 $CLI_1 volume set $V1 diagnostics.client-log-level DEBUG
103 kill_glusterd 3
104 $CLI_1 volume status $V0
105 $CLI_2 volume status $V1
106 $CLI_1 peer status
**107 EXPECT_WITHIN $PROBE_TIMEOUT 1 check_peers
**108 EXPECT 'Started' volinfo_field $V0 'Status';
**109 EXPECT 'Started' volinfo_field $V1 'Status';
110
111 TEST $glusterd_3
112 $CLI_1 volume status $V0
113 $CLI_2 volume status $V1
114 $CLI_1 peer status
115 #EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers
116 #EXPECT 'Started' volinfo_field $V0 'Status';
117 #EXPECT 'Started' volinfo_field $V1 'Status';
118 #two_diff_vols_stop_force
119 #EXPECT_WITHIN $PROBE_TIMEOUT 2 check_peers
120 cleanup;

Pranith
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
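The property same_vol_remove_brick asserts is worth spelling out: when the same remove-brick command races in from two CLIs at once, any mix of success and failure is valid, and the only real bug is a glusterd crash (which check_peers detects afterwards). A generic sketch of that racing-command pattern, with plain mkdir standing in for the gluster CLI (all paths here are hypothetical stand-ins):

```shell
#!/bin/bash
# Two racing commands on the same resource: either may win.
# `mkdir` on a shared path stands in for `$CLI volume remove-brick ... start`.
target=$(mktemp -d)/shared

mkdir "$target" 2>/dev/null &
pid1=$!
mkdir "$target" 2>/dev/null &
pid2=$!

# `wait $pid` returns the exit status of that specific background job,
# so each racer's outcome can be inspected individually.
wait $pid1; rc1=$?
wait $pid2; rc2=$?

# Any mix of success/failure is acceptable for the real command; the
# only invalid outcome is a crashed daemon, checked separately.
echo "rc1=$rc1 rc2=$rc2"
```

For atomic mkdir exactly one racer wins; for the gluster CLI the daemon serializes internally, which is why the test only checks daemon health rather than the individual exit codes.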
Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 12/06/2014, at 2:22 PM, Ravishankar N wrote: > But we will still hit the problem when rolling upgrade is performed > from 3.4 to 3.5, unless the clients are also upgraded to 3.5 Could we introduce a client side patch into (say) 3.4.5 that helps with this? Then mandate that 3.4 -> 3.5 rolling upgrades have to be on 3.4.5 first? 3.4.x -> 3.4.5 should be "no issues" yeah? + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Please use http://build.gluster.org/job/rackspace-regression/
On 12/06/2014, at 10:22 AM, Pranith Kumar Karampuri wrote: > hi Guys, > Rackspace slaves are in action now, thanks to Justin. Please use the URL > in Subject to run the regressions. I already shifted some jobs to rackspace. Good thinking, but please hold off on this for now. The slaves are hugely unreliable (lots of hanging) at the moment. :( Rebooting each slave after each run seems to help, but that's not a real solution. I'll be adjusting and tweaking their settings throughout the day in order to improve things. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
Hi Vijay,

Since glusterfs 3.5, posix_lookup() sends ESTALE instead of ENOENT [1] when a parent gfid (entry) is not present on the brick. In a replicate setup this causes a problem because AFR gives more priority to ESTALE than ENOENT, causing IO to fail [2]. The fix is in progress at [3]; it is client-side specific and should make it to 3.5.2.

But we will still hit the problem when a rolling upgrade is performed from 3.4 to 3.5, unless the clients are also upgraded to 3.5. To elaborate with an example:

0) Create a 1x2 volume using 2 nodes and mount it from a client. All machines are glusterfs 3.4.
1) Perform: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i & done
2) While this is going on, kill one of the nodes in the replica pair and upgrade it to glusterfs 3.5 (simulating a rolling upgrade).
3) After a while, kill all tar processes.
4) Create a backup directory and move all the 1..30 dirs inside 'backup'.
5) Start the untar processes in 1) again.
6) Bring up the upgraded node. Tar fails with ESTALE errors.

Essentially the errors occur because [3] is a client-side fix, but rolling upgrades are targeted at servers while the older clients still need to access them without issues. A solution is to have a fix in the posix translator wherein the newer client passes its version (3.5) to posix_lookup(), which then sends ESTALE if the client version is 3.5 or newer, but sends ENOENT for an older client. Does this seem okay?

[1] http://review.gluster.org/6318
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1106408
[3] http://review.gluster.org/#/c/8015/
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
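Condensed into a script, the client-side portion of the reproducer looks roughly like this. This is only a sketch: the mount point and tarball are local stand-ins created on the fly, the node-kill/upgrade steps are comments, and dir counts are reduced; run the real steps against an actual 1x2 volume to reproduce.

```shell
#!/bin/bash
# Sketch of the reproducer's client-side steps (paths are hypothetical
# stand-ins; $MNT would really be the glusterfs mount point).
set -e
MNT=$(mktemp -d)
TARBALL=$MNT/src.tar.gz

# Seed a small tarball to untar repeatedly (stands in for the 3.5 source).
mkdir -p $MNT/seed && echo data > $MNT/seed/file
tar czf $TARBALL -C $MNT seed

cd $MNT
# Step 1: untar into parallel directories.
for i in {1..5}; do mkdir $i; tar xzf $TARBALL -C $i & done
# (Step 2: kill and upgrade one replica node while this runs.)
wait    # Step 3: let/stop the tar processes finish.

# Step 4: move everything aside into a backup directory.
mkdir backup && mv {1..5} backup/

# Step 5: untar again into fresh directories. With the bug, these fail
# with ESTALE once the upgraded node comes back (step 6).
for i in {1..5}; do mkdir $i; tar xzf $TARBALL -C $i & done
wait
echo "repro-steps-ok"
```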
Re: [Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Thanks for reporting. Will take a look. Pranith On 06/12/2014 05:52 PM, Raghavendra Talur wrote: Hi Pranith, This test failed for my patch set today and seems to be a spurious failure. Here is the console output for the run. http://build.gluster.org/job/rackspace-regression/107/consoleFull Could you please have a look at it? -- Thanks! Raghavendra Talur | Red Hat Storage Developer | Bangalore |+918039245176 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Spurious regression test failure in ./tests/bugs/bug-1101143.t
Hi Pranith, This test failed for my patch set today and seems to be a spurious failure. Here is the console output for the run. http://build.gluster.org/job/rackspace-regression/107/consoleFull Could you please have a look at it? -- Thanks! Raghavendra Talur | Red Hat Storage Developer | Bangalore | +918039245176 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t
Thanks a lot for quick resolution Sachin Pranith On 06/12/2014 04:38 PM, Sachin Pandit wrote: http://review.gluster.org/#/c/8041/ is merged upstream. ~ Sachin. - Original Message - From: "Sachin Pandit" To: "Raghavendra Talur" Cc: "Pranith Kumar Karampuri" , "Gluster Devel" Sent: Thursday, June 12, 2014 12:58:44 PM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Patch link http://review.gluster.org/#/c/8041/. ~ Sachin. - Original Message - From: "Raghavendra Talur" To: "Pranith Kumar Karampuri" Cc: "Sachin Pandit" , "Gluster Devel" Sent: Thursday, June 12, 2014 10:46:14 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Sachin and I looked at the failure. Current guess is that glusterd_2 had not yet completed the handshake with glusterd_1 and hence did not know about the option set. KP suggested that instead of having a sleep before this command, we could get peer status and verify that it is 1 and then get the vol info. Although even this does not make the test fully deterministic, we will be closer to it. Sachin will send out a patch for the same. Raghavendra Talur - Original Message - From: "Pranith Kumar Karampuri" To: "Sachin Pandit" Cc: "Gluster Devel" Sent: Thursday, June 12, 2014 9:54:03 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Check the logs to find the reason. Pranith. On 06/12/2014 09:24 AM, Sachin Pandit wrote: I am not hitting this even after running the test case in a loop. I'll update in this thread once I find out the root cause of the failure. ~ Sachin - Original Message - From: "Sachin Pandit" To: "Pranith Kumar Karampuri" Cc: "Gluster Devel" Sent: Thursday, June 12, 2014 8:50:40 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t I will look into this. 
- Original Message -
From: "Pranith Kumar Karampuri"
To: "Gluster Devel"
Cc: rta...@redhat.com, span...@redhat.com
Sent: Wednesday, June 11, 2014 9:08:44 PM
Subject: spurious regression failure in tests/bugs/bug-1104642.t

Raghavendra/Sachin,
Could one of you guys take a look at this please.

pk1@localhost - ~/workspace/gerrit-repo (master)
21:04:46 :) ⚡ ~/.scripts/regression.py http://build.gluster.org/job/regression/4831/consoleFull
Patch ==> http://review.gluster.com/#/c/7994/2
Author ==> Raghavendra Talur rta...@redhat.com
Build triggered by ==> amarts
Build-url ==> http://build.gluster.org/job/regression/4831/consoleFull
Download-log-at ==> http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz
Test written by ==> Author: Sachin Pandit

./tests/bugs/bug-1104642.t [13]

0 #!/bin/bash
1
2 . $(dirname $0)/../include.rc
3 . $(dirname $0)/../volume.rc
4 . $(dirname $0)/../cluster.rc
5
6
7 function get_value()
8 {
9     local key=$1
10     local var="CLI_$2"
11
12     eval cli_index=\$$var
13
14     $cli_index volume info | grep "^$key"\
15         | sed 's/.*: //'
16 }
17
18 cleanup
19
20 TEST launch_cluster 2
21
22 TEST $CLI_1 peer probe $H2;
23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
24
25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1
26 EXPECT "$V0" get_value 'Volume Name' 1
27 EXPECT "Created" get_value 'Status' 1
28
29 TEST $CLI_1 volume start $V0
30 EXPECT "Started" get_value 'Status' 1
31
32 #Bring down 2nd glusterd
33 TEST kill_glusterd 2
34
35 #set the volume all options from the 1st glusterd
36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80
37
38 #Bring back the 2nd glusterd
39 TEST $glusterd_2
40
41 #Verify whether the value has been synced
42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2
44
45 cleanup;

Pranith
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___ Gluster-devel mailing list
Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
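The fix KP suggested in this thread — poll for the expected state (peer count of 1) instead of sleeping before reading the synced option — is the same idea the harness's EXPECT_WITHIN implements. A generic sketch of that polling pattern (the function name and one-second granularity are illustrative, not the harness's actual implementation):

```shell
#!/bin/bash
# Poll a command until its output matches the expected value, or give
# up after roughly `timeout` seconds. A sketch of the EXPECT_WITHIN /
# peer_count pattern; `wait_for` is a hypothetical helper name.
wait_for() {
    local timeout=$1 expected=$2
    shift 2
    local i
    for ((i = 0; i < timeout; i++)); do
        [ "$("$@")" = "$expected" ] && return 0
        sleep 1
    done
    return 1
}

# Usage: instead of a fixed `sleep` before checking the synced volume
# option, first poll for the handshake to complete, e.g.:
#   wait_for $PROBE_TIMEOUT 1 peer_count
wait_for 3 "1" echo 1 && echo "synced"
```

A fixed sleep is either too short (flaky) or too long (slow); polling bounds the worst case without paying it on every run, though as the thread notes it still cannot make the test fully deterministic.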
Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t
http://review.gluster.org/#/c/8041/ is merged upstream. ~ Sachin. - Original Message - From: "Sachin Pandit" To: "Raghavendra Talur" Cc: "Pranith Kumar Karampuri" , "Gluster Devel" Sent: Thursday, June 12, 2014 12:58:44 PM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Patch link http://review.gluster.org/#/c/8041/. ~ Sachin. - Original Message - From: "Raghavendra Talur" To: "Pranith Kumar Karampuri" Cc: "Sachin Pandit" , "Gluster Devel" Sent: Thursday, June 12, 2014 10:46:14 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Sachin and I looked at the failure. Current guess is that glusterd_2 had not yet completed the handshake with glusterd_1 and hence did not know about the option set. KP suggested that instead of having a sleep before this command, we could get peer status and verify that it is 1 and then get the vol info. Although even this does not make the test fully deterministic, we will be closer to it. Sachin will send out a patch for the same. Raghavendra Talur - Original Message - From: "Pranith Kumar Karampuri" To: "Sachin Pandit" Cc: "Gluster Devel" Sent: Thursday, June 12, 2014 9:54:03 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Check the logs to find the reason. Pranith. On 06/12/2014 09:24 AM, Sachin Pandit wrote: > I am not hitting this even after running the test case in a loop. > I'll update in this thread once I find out the root cause of the failure. > > ~ Sachin > > - Original Message - > From: "Sachin Pandit" > To: "Pranith Kumar Karampuri" > Cc: "Gluster Devel" > Sent: Thursday, June 12, 2014 8:50:40 AM > Subject: Re: [Gluster-devel] spurious regression failure in > tests/bugs/bug-1104642.t > > I will look into this. 
>
> - Original Message -
> From: "Pranith Kumar Karampuri"
> To: "Gluster Devel"
> Cc: rta...@redhat.com, span...@redhat.com
> Sent: Wednesday, June 11, 2014 9:08:44 PM
> Subject: spurious regression failure in tests/bugs/bug-1104642.t
>
> Raghavendra/Sachin,
> Could one of you guys take a look at this please.
>
> pk1@localhost - ~/workspace/gerrit-repo (master)
> 21:04:46 :) ⚡ ~/.scripts/regression.py http://build.gluster.org/job/regression/4831/consoleFull
> Patch ==> http://review.gluster.com/#/c/7994/2
> Author ==> Raghavendra Talur rta...@redhat.com
> Build triggered by ==> amarts
> Build-url ==> http://build.gluster.org/job/regression/4831/consoleFull
> Download-log-at ==> http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz
> Test written by ==> Author: Sachin Pandit
>
> ./tests/bugs/bug-1104642.t [13]
>
> 0 #!/bin/bash
> 1
> 2 . $(dirname $0)/../include.rc
> 3 . $(dirname $0)/../volume.rc
> 4 . $(dirname $0)/../cluster.rc
> 5
> 6
> 7 function get_value()
> 8 {
> 9     local key=$1
> 10     local var="CLI_$2"
> 11
> 12     eval cli_index=\$$var
> 13
> 14     $cli_index volume info | grep "^$key"\
> 15         | sed 's/.*: //'
> 16 }
> 17
> 18 cleanup
> 19
> 20 TEST launch_cluster 2
> 21
> 22 TEST $CLI_1 peer probe $H2;
> 23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
> 24
> 25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1
> 26 EXPECT "$V0" get_value 'Volume Name' 1
> 27 EXPECT "Created" get_value 'Status' 1
> 28
> 29 TEST $CLI_1 volume start $V0
> 30 EXPECT "Started" get_value 'Status' 1
> 31
> 32 #Bring down 2nd glusterd
> 33 TEST kill_glusterd 2
> 34
> 35 #set the volume all options from the 1st glusterd
> 36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80
> 37
> 38 #Bring back the 2nd glusterd
> 39 TEST $glusterd_2
> 40
> 41 #Verify whether the value has been synced
> 42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
> ***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2
> 44
> 45 cleanup;
>
> Pranith
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Request for merging patch
On 06/12/2014 01:35 PM, Pranith Kumar Karampuri wrote: Vijay, Could you merge this patch please. http://review.gluster.org/7928 Done, thanks. -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Please use http://build.gluster.org/job/rackspace-regression/
hi Guys, Rackspace slaves are in action now, thanks to Justin. Please use the URL in Subject to run the regressions. I already shifted some jobs to rackspace. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Request for merging patch
Vijay, Could you merge this patch please. http://review.gluster.org/7928 Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t
Patch link http://review.gluster.org/#/c/8041/. ~ Sachin. - Original Message - From: "Raghavendra Talur" To: "Pranith Kumar Karampuri" Cc: "Sachin Pandit" , "Gluster Devel" Sent: Thursday, June 12, 2014 10:46:14 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Sachin and I looked at the failure. Current guess is that glusterd_2 had not yet completed the handshake with glusterd_1 and hence did not know about the option set. KP suggested that instead of having a sleep before this command, we could get peer status and verify that it is 1 and then get the vol info. Although even this does not make the test fully deterministic, we will be closer to it. Sachin will send out a patch for the same. Raghavendra Talur - Original Message - From: "Pranith Kumar Karampuri" To: "Sachin Pandit" Cc: "Gluster Devel" Sent: Thursday, June 12, 2014 9:54:03 AM Subject: Re: [Gluster-devel] spurious regression failure in tests/bugs/bug-1104642.t Check the logs to find the reason. Pranith. On 06/12/2014 09:24 AM, Sachin Pandit wrote: > I am not hitting this even after running the test case in a loop. > I'll update in this thread once I find out the root cause of the failure. > > ~ Sachin > > - Original Message - > From: "Sachin Pandit" > To: "Pranith Kumar Karampuri" > Cc: "Gluster Devel" > Sent: Thursday, June 12, 2014 8:50:40 AM > Subject: Re: [Gluster-devel] spurious regression failure in > tests/bugs/bug-1104642.t > > I will look into this. > > - Original Message - > From: "Pranith Kumar Karampuri" > To: "Gluster Devel" > Cc: rta...@redhat.com, span...@redhat.com > Sent: Wednesday, June 11, 2014 9:08:44 PM > Subject: spurious regression failure in tests/bugs/bug-1104642.t > > Raghavendra/Sachin, > Could one of you guys take a look at this please. 
>
> pk1@localhost - ~/workspace/gerrit-repo (master)
> 21:04:46 :) ⚡ ~/.scripts/regression.py http://build.gluster.org/job/regression/4831/consoleFull
> Patch ==> http://review.gluster.com/#/c/7994/2
> Author ==> Raghavendra Talur rta...@redhat.com
> Build triggered by ==> amarts
> Build-url ==> http://build.gluster.org/job/regression/4831/consoleFull
> Download-log-at ==> http://build.gluster.org:443/logs/regression/glusterfs-logs-20140611:08:39:04.tgz
> Test written by ==> Author: Sachin Pandit
>
> ./tests/bugs/bug-1104642.t [13]
>
> 0 #!/bin/bash
> 1
> 2 . $(dirname $0)/../include.rc
> 3 . $(dirname $0)/../volume.rc
> 4 . $(dirname $0)/../cluster.rc
> 5
> 6
> 7 function get_value()
> 8 {
> 9     local key=$1
> 10     local var="CLI_$2"
> 11
> 12     eval cli_index=\$$var
> 13
> 14     $cli_index volume info | grep "^$key"\
> 15         | sed 's/.*: //'
> 16 }
> 17
> 18 cleanup
> 19
> 20 TEST launch_cluster 2
> 21
> 22 TEST $CLI_1 peer probe $H2;
> 23 EXPECT_WITHIN $PROBE_TIMEOUT 1 peer_count
> 24
> 25 TEST $CLI_1 volume create $V0 $H1:$B1/${V0}0 $H2:$B2/${V0}1
> 26 EXPECT "$V0" get_value 'Volume Name' 1
> 27 EXPECT "Created" get_value 'Status' 1
> 28
> 29 TEST $CLI_1 volume start $V0
> 30 EXPECT "Started" get_value 'Status' 1
> 31
> 32 #Bring down 2nd glusterd
> 33 TEST kill_glusterd 2
> 34
> 35 #set the volume all options from the 1st glusterd
> 36 TEST $CLI_1 volume set all cluster.server-quorum-ratio 80
> 37
> 38 #Bring back the 2nd glusterd
> 39 TEST $glusterd_2
> 40
> 41 #Verify whether the value has been synced
> 42 EXPECT '80' get_value 'cluster.server-quorum-ratio' 1
> ***43 EXPECT '80' get_value 'cluster.server-quorum-ratio' 2
> 44
> 45 cleanup;
>
> Pranith
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel -- Thanks!
Raghavendra Talur | Red Hat Storage Developer | Bangalore | +918039245176 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] XFS kernel panic bug?
On Thu, Jun 12, 2014 at 07:26:25AM +0100, Justin Clift wrote: > On 12/06/2014, at 6:58 AM, Niels de Vos wrote: > > > If you capture a vmcore (needs kdump installed and configured), we may > > be able to see the cause more clearly. Oh, these seem to be Xen hosts. I don't think kdump (mainly kexec) works on Xen. You would need to run xen-dump (or something like that) on the Dom0; for that, you'll have to call Rackspace support, and I have no idea how they handle such requests... > That does help, and so will Harsha's suggestion too probably. :) That is indeed a solution that can mostly prevent such memory deadlocks. Those options can be used to push the outstanding data out earlier to the loop devices, and on to the underlying XFS filesystem that holds the backing files for the loop devices. Cheers, Niels > I'll look into it properly later on today. > > For the moment, I've rebooted the other slaves which seems to put them into > an ok state for a few runs. > > Also just started some rackspace-regression runs on them, using the ones > queued up in the normal regression queue. > > The results are being updated live into Gerrit now (+1/-1/MERGE CONFLICT). > > So, if you see any regression runs pass on the slaves, it's worth removing > the corresponding job from the main regression queue. That'll help keep > the queue shorter for today at least. :) > > Btw - Happy vacation Niels :) > > /me goes to bed > > + Justin > > -- > GlusterFS - http://www.gluster.org > > An open source, distributed file system scaling to several > petabytes, and handling thousands of clients. > > My personal twitter: twitter.com/realjustinclift > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
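[Editor's note: for readers wondering what "those options" are, the tuning alluded to here is the kernel's dirty-writeback sysctls — lowering them makes the kernel flush outstanding dirty data to the loop devices (and the XFS backing files behind them) sooner, so less of it piles up in memory. A sketch of such a fragment; the values are purely illustrative, not recommendations from this thread:]

```
# /etc/sysctl.d/90-writeback.conf (path and values hypothetical):
# start background writeback earlier, and expire dirty pages sooner.
vm.dirty_background_ratio = 2
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100

# Apply with: sysctl -p /etc/sysctl.d/90-writeback.conf
```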