Re: [Gluster-users] SQLite3 on 3 node cluster FS?
I was able to get the docker containers I'm using to test with to install the latest builds from gluster.org, so the client/server versions are both 3.13.2.

I am testing two main cases, both using sqlite3. With a php program wrapping all database operations with an flock(), it now works as expected. I ran the same test 500 times (or so) yesterday afternoon, and it worked every time. I repeated that same test both with and without performance.flush-behind/write-behind enabled, with the same result. So that's great!

When I ran my other test case, just allowing sqlite3's fcntl() style locks to manage the data, the test fails with either performance setting. So either sqlite3 is not correctly managing its lock and flush operations, or gluster has a data integrity problem when fcntl() style locks are used. I have no way of knowing which is more likely...

I think I've got what I need, so someone else is going to need to pick up the ball if they want a sqlite3 lock to work on its own with gluster. I will say that it is slow if a bunch of writers are trying to update individual records at the same time, since the database ping-pongs all over the cluster as different clients get and hold the lock.

I've updated my github repo with my latest changes if anyone feels like trying it on their own: https://github.com/powool/gluster.git

My summary is: sqlite3's built-in locks don't appear to work nicely with gluster, so you have to put an flock() around the database operations to prevent data loss. You also can't do any caching in your client-side volume mount. The server-side performance settings appear not to matter, provided you're up to date on client/server code.

I hope this helps someone!
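For anyone wanting to try the flock() workaround described above, here is a minimal sketch of the pattern in Python rather than the PHP harness the thread actually used; the file names (`app.db`, `app.db.lock`) and the schema are made up for illustration, but the locking shape - exclusive flock() held around the entire open/update/close cycle - is the one that kept the database intact:

```python
import fcntl
import sqlite3

DB_PATH = "app.db"          # would live on the gluster mount in the real setup
LOCK_PATH = "app.db.lock"   # separate lock file guarding all database access

def locked_write(sql, params=()):
    # Hold an exclusive flock() for the whole open/update/close cycle,
    # so no other client touches the database file mid-transaction.
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            conn = sqlite3.connect(DB_PATH)
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            conn.close()
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

locked_write("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
locked_write("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("a", "1"))
```

The key point is that the lock covers the connection open and close, not just the statement, since SQLite flushes state at close time.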
Paul On Tue, Mar 6, 2018 at 12:32 PM, Raghavendra Gowdappa wrote: > > > On Tue, Mar 6, 2018 at 10:58 PM, Raghavendra Gowdappa > wrote: >> >> >> >> On Tue, Mar 6, 2018 at 10:22 PM, Paul Anderson wrote: >>> >>> Raghavendra, >>> >>> I've commited my tests case to https://github.com/powool/gluster.git - >>> it's grungy, and a work in progress, but I am happy to take change >>> suggestions, especially if it will save folks significant time. >>> >>> For the rest, I'll reply inline below... >>> >>> On Mon, Mar 5, 2018 at 10:39 PM, Raghavendra Gowdappa >>> wrote: >>> > +Csaba. >>> > >>> > On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote: >>> >> >>> >> Raghavendra, >>> >> >>> >> Thanks very much for your reply. >>> >> >>> >> I fixed our data corruption problem by disabling the volume >>> >> performance.write-behind flag as you suggested, and simultaneously >>> >> disabling caching in my client side mount command. >>> > >>> > >>> > Good to know it worked. Can you give us the output of >>> > # gluster volume info >>> >>> [root@node-1 /]# gluster volume info >>> >>> Volume Name: dockerstore >>> Type: Replicate >>> Volume ID: fb08b9f4-0784-4534-9ed3-e01ff71a0144 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: 172.18.0.4:/data/glusterfs/store/dockerstore >>> Brick2: 172.18.0.3:/data/glusterfs/store/dockerstore >>> Brick3: 172.18.0.2:/data/glusterfs/store/dockerstore >>> Options Reconfigured: >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> locks.mandatory-locking: optimal >>> performance.flush-behind: off >>> performance.write-behind: off >>> >>> > >>> > We would like to debug the problem in write-behind. Some questions: >>> > >>> > 1. What version of Glusterfs are you using? 
>>> >>> On the server nodes: >>> >>> [root@node-1 /]# gluster --version >>> glusterfs 3.13.2 >>> Repository revision: git://git.gluster.org/glusterfs.git >>> >>> On the docker container sqlite test node: >>> >>> root@b4055d8547d2:/# glusterfs --version >>> glusterfs 3.8.8 built on Jan 11 2017 14:07:11 >> >> >> I guess this is where client is mounted. If I am correct on where >> glusterfs client is mounted, client is running quite a old version. There >> have been significant number of fixes between 3.8.8 and current master. > > > ... significant number of fixes to write-behind... > >> I would suggest to try out 3.13.2 patched with [1]. If you get a chance to >> try this out, please report back how did the tests go. > > > I would suggest to try out 3.13.2 patched with [1] and run tests with > write-behind turned on. > >> >> [1] https://review.gluster.org/19673 >> >>> >>> I recognize that version skew could be an issue. >>> >>> > 2. Were you able to figure out whether its stale data or metadata that >>> > is >>> > causing the issue? >>> >>> I lean towards stale data based on the only real observation I have: >>> >>> While debugging, I put log messages in as to when the flock() is >>> acquired, and when it is released. There is no instance where two >>> different processes ever hold the same flock()'d file. From what I >>> have read, the locks are considered metadata, and they appear to me to >>> be wor
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
On Tue, Mar 6, 2018 at 10:58 PM, Raghavendra Gowdappa wrote: > > > On Tue, Mar 6, 2018 at 10:22 PM, Paul Anderson wrote: > >> Raghavendra, >> >> I've commited my tests case to https://github.com/powool/gluster.git - >> it's grungy, and a work in progress, but I am happy to take change >> suggestions, especially if it will save folks significant time. >> >> For the rest, I'll reply inline below... >> >> On Mon, Mar 5, 2018 at 10:39 PM, Raghavendra Gowdappa >> wrote: >> > +Csaba. >> > >> > On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote: >> >> >> >> Raghavendra, >> >> >> >> Thanks very much for your reply. >> >> >> >> I fixed our data corruption problem by disabling the volume >> >> performance.write-behind flag as you suggested, and simultaneously >> >> disabling caching in my client side mount command. >> > >> > >> > Good to know it worked. Can you give us the output of >> > # gluster volume info >> >> [root@node-1 /]# gluster volume info >> >> Volume Name: dockerstore >> Type: Replicate >> Volume ID: fb08b9f4-0784-4534-9ed3-e01ff71a0144 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: 172.18.0.4:/data/glusterfs/store/dockerstore >> Brick2: 172.18.0.3:/data/glusterfs/store/dockerstore >> Brick3: 172.18.0.2:/data/glusterfs/store/dockerstore >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> locks.mandatory-locking: optimal >> performance.flush-behind: off >> performance.write-behind: off >> >> > >> > We would like to debug the problem in write-behind. Some questions: >> > >> > 1. What version of Glusterfs are you using? 
>> >> On the server nodes: >> >> [root@node-1 /]# gluster --version >> glusterfs 3.13.2 >> Repository revision: git://git.gluster.org/glusterfs.git >> >> On the docker container sqlite test node: >> >> root@b4055d8547d2:/# glusterfs --version >> glusterfs 3.8.8 built on Jan 11 2017 14:07:11 >> > > I guess this is where client is mounted. If I am correct on where > glusterfs client is mounted, client is running quite a old version. There > have been significant number of fixes between 3.8.8 and current master. > ... significant number of fixes to write-behind... I would suggest to try out 3.13.2 patched with [1]. If you get a chance to > try this out, please report back how did the tests go. > I would suggest to try out 3.13.2 patched with [1] and run tests with write-behind turned on. > [1] https://review.gluster.org/19673 > > >> I recognize that version skew could be an issue. >> >> > 2. Were you able to figure out whether its stale data or metadata that >> is >> > causing the issue? >> >> I lean towards stale data based on the only real observation I have: >> >> While debugging, I put log messages in as to when the flock() is >> acquired, and when it is released. There is no instance where two >> different processes ever hold the same flock()'d file. From what I >> have read, the locks are considered metadata, and they appear to me to >> be working, so that's why I'm inclined to think stale data is the >> issue. >> >> > >> > There have been patches merged in write-behind in recent past and one >> in the >> > works which address metadata consistency. Would like to understand >> whether >> > you've run into any of the already identified issues. >> >> Agreed! >> >> Thanks, >> >> Paul >> >> > >> > regards, >> > Raghavendra >> >> >> >> >> >> In very modest testing, the flock() case appears to me to work well - >> >> before it would corrupt the db within a few transactions. 
>> >> >> >> Testing using built in sqlite3 locks is better (fcntl range locks), >> >> but has some behavioral issues (probably just requires query retry >> >> when the file is locked). I'll research this more, although the test >> >> case is not critical to our use case. >> >> >> >> There are no signs of O_DIRECT use in the sqlite3 code that I can see. >> >> >> >> I intend to set up tests that run much longer than a few minutes, to >> >> see if there are any longer term issues. Also, I want to experiment >> >> with data durability by killing various gluster server nodes during >> >> the tests. >> >> >> >> If anyone would like our test scripts, I can either tar them up and >> >> email them or put them in github - either is fine with me. (they rely >> >> on current builds of docker and docker-compose) >> >> >> >> Thanks again!! >> >> >> >> Paul >> >> >> >> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa >> >> wrote: >> >> > >> >> > >> >> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> tl;dr summary of below: flock() works, but what does it take to make >> >> >> sync()/fsync() work in a 3 node GFS cluster? >> >> >> >> >> >> I am under the impression that POSIX flock, POSIX >> >> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all >> >> >> supported in cluster operations, such that in theory, SQLite3 should >>
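The "query retry when the file is locked" idea mentioned above could look something like the following sketch (not from the thread's test code; it assumes the standard behavior that Python's sqlite3 raises OperationalError containing "locked" when SQLite's fcntl-range locks report the file as busy - the database name `retry_demo.db` is illustrative):

```python
import sqlite3
import time

def execute_with_retry(db_path, sql, params=(), attempts=10, delay=0.1):
    # Retry a statement when SQLite's own fcntl-range locking reports
    # the file as busy, instead of failing the whole transaction.
    for i in range(attempts):
        try:
            conn = sqlite3.connect(db_path, timeout=1.0)
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            conn.close()
            return
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or i == attempts - 1:
                raise
            time.sleep(delay)

execute_with_retry("retry_demo.db", "CREATE TABLE IF NOT EXISTS t (n INTEGER)")
execute_with_retry("retry_demo.db", "INSERT INTO t VALUES (?)", (42,))
```

Retrying only papers over lock contention, of course - it does nothing about the data-integrity failures the thread observed with fcntl() locks on gluster.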
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
On Tue, Mar 6, 2018 at 10:22 PM, Paul Anderson wrote: > Raghavendra, > > I've commited my tests case to https://github.com/powool/gluster.git - > it's grungy, and a work in progress, but I am happy to take change > suggestions, especially if it will save folks significant time. > > For the rest, I'll reply inline below... > > On Mon, Mar 5, 2018 at 10:39 PM, Raghavendra Gowdappa > wrote: > > +Csaba. > > > > On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote: > >> > >> Raghavendra, > >> > >> Thanks very much for your reply. > >> > >> I fixed our data corruption problem by disabling the volume > >> performance.write-behind flag as you suggested, and simultaneously > >> disabling caching in my client side mount command. > > > > > > Good to know it worked. Can you give us the output of > > # gluster volume info > > [root@node-1 /]# gluster volume info > > Volume Name: dockerstore > Type: Replicate > Volume ID: fb08b9f4-0784-4534-9ed3-e01ff71a0144 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: 172.18.0.4:/data/glusterfs/store/dockerstore > Brick2: 172.18.0.3:/data/glusterfs/store/dockerstore > Brick3: 172.18.0.2:/data/glusterfs/store/dockerstore > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > locks.mandatory-locking: optimal > performance.flush-behind: off > performance.write-behind: off > > > > > We would like to debug the problem in write-behind. Some questions: > > > > 1. What version of Glusterfs are you using? > > On the server nodes: > > [root@node-1 /]# gluster --version > glusterfs 3.13.2 > Repository revision: git://git.gluster.org/glusterfs.git > > On the docker container sqlite test node: > > root@b4055d8547d2:/# glusterfs --version > glusterfs 3.8.8 built on Jan 11 2017 14:07:11 > I guess this is where client is mounted. If I am correct on where glusterfs client is mounted, client is running quite a old version. 
There have been significant number of fixes between 3.8.8 and current master. I would suggest to try out 3.13.2 patched with [1]. If you get a chance to try this out, please report back how did the tests go. [1] https://review.gluster.org/19673 > I recognize that version skew could be an issue. > > > 2. Were you able to figure out whether its stale data or metadata that is > > causing the issue? > > I lean towards stale data based on the only real observation I have: > > While debugging, I put log messages in as to when the flock() is > acquired, and when it is released. There is no instance where two > different processes ever hold the same flock()'d file. From what I > have read, the locks are considered metadata, and they appear to me to > be working, so that's why I'm inclined to think stale data is the > issue. > > > > > There have been patches merged in write-behind in recent past and one in > the > > works which address metadata consistency. Would like to understand > whether > > you've run into any of the already identified issues. > > Agreed! > > Thanks, > > Paul > > > > > regards, > > Raghavendra > >> > >> > >> In very modest testing, the flock() case appears to me to work well - > >> before it would corrupt the db within a few transactions. > >> > >> Testing using built in sqlite3 locks is better (fcntl range locks), > >> but has some behavioral issues (probably just requires query retry > >> when the file is locked). I'll research this more, although the test > >> case is not critical to our use case. > >> > >> There are no signs of O_DIRECT use in the sqlite3 code that I can see. > >> > >> I intend to set up tests that run much longer than a few minutes, to > >> see if there are any longer term issues. Also, I want to experiment > >> with data durability by killing various gluster server nodes during > >> the tests. 
> >> > >> If anyone would like our test scripts, I can either tar them up and > >> email them or put them in github - either is fine with me. (they rely > >> on current builds of docker and docker-compose) > >> > >> Thanks again!! > >> > >> Paul > >> > >> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa > >> wrote: > >> > > >> > > >> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: > >> >> > >> >> Hi, > >> >> > >> >> tl;dr summary of below: flock() works, but what does it take to make > >> >> sync()/fsync() work in a 3 node GFS cluster? > >> >> > >> >> I am under the impression that POSIX flock, POSIX > >> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all > >> >> supported in cluster operations, such that in theory, SQLite3 should > >> >> be able to atomically lock the file (or a subset of page), modify > >> >> pages, flush the pages to gluster, then release the lock, and thus > >> >> satisfy the ACID property that SQLite3 appears to try to accomplish > on > >> >> a local filesystem. > >> >> > >> >> In a test we wrote that fires off 10 simple concurrernt SQL insert, > >> >> read, update loops, we d
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
Raghavendra,

I've committed my test case to https://github.com/powool/gluster.git - it's grungy, and a work in progress, but I am happy to take change suggestions, especially if it will save folks significant time.

For the rest, I'll reply inline below...

On Mon, Mar 5, 2018 at 10:39 PM, Raghavendra Gowdappa wrote:
> +Csaba.
>
> On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote:
>>
>> Raghavendra,
>>
>> Thanks very much for your reply.
>>
>> I fixed our data corruption problem by disabling the volume
>> performance.write-behind flag as you suggested, and simultaneously
>> disabling caching in my client side mount command.
>
>
> Good to know it worked. Can you give us the output of
> # gluster volume info

[root@node-1 /]# gluster volume info

Volume Name: dockerstore
Type: Replicate
Volume ID: fb08b9f4-0784-4534-9ed3-e01ff71a0144
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.18.0.4:/data/glusterfs/store/dockerstore
Brick2: 172.18.0.3:/data/glusterfs/store/dockerstore
Brick3: 172.18.0.2:/data/glusterfs/store/dockerstore
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
locks.mandatory-locking: optimal
performance.flush-behind: off
performance.write-behind: off

> We would like to debug the problem in write-behind. Some questions:
>
> 1. What version of Glusterfs are you using?

On the server nodes:

[root@node-1 /]# gluster --version
glusterfs 3.13.2
Repository revision: git://git.gluster.org/glusterfs.git

On the docker container sqlite test node:

root@b4055d8547d2:/# glusterfs --version
glusterfs 3.8.8 built on Jan 11 2017 14:07:11

I recognize that version skew could be an issue.

> 2. Were you able to figure out whether it's stale data or metadata that is
> causing the issue?

I lean towards stale data based on the only real observation I have:

While debugging, I put log messages in as to when the flock() is acquired, and when it is released.
There is no instance where two different processes ever hold the same flock()'d file. From what I have read, the locks are considered metadata, and they appear to me to be working, so that's why I'm inclined to think stale data is the issue. > > There have been patches merged in write-behind in recent past and one in the > works which address metadata consistency. Would like to understand whether > you've run into any of the already identified issues. Agreed! Thanks, Paul > > regards, > Raghavendra >> >> >> In very modest testing, the flock() case appears to me to work well - >> before it would corrupt the db within a few transactions. >> >> Testing using built in sqlite3 locks is better (fcntl range locks), >> but has some behavioral issues (probably just requires query retry >> when the file is locked). I'll research this more, although the test >> case is not critical to our use case. >> >> There are no signs of O_DIRECT use in the sqlite3 code that I can see. >> >> I intend to set up tests that run much longer than a few minutes, to >> see if there are any longer term issues. Also, I want to experiment >> with data durability by killing various gluster server nodes during >> the tests. >> >> If anyone would like our test scripts, I can either tar them up and >> email them or put them in github - either is fine with me. (they rely >> on current builds of docker and docker-compose) >> >> Thanks again!! >> >> Paul >> >> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa >> wrote: >> > >> > >> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: >> >> >> >> Hi, >> >> >> >> tl;dr summary of below: flock() works, but what does it take to make >> >> sync()/fsync() work in a 3 node GFS cluster? 
>> >> >> >> I am under the impression that POSIX flock, POSIX >> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all >> >> supported in cluster operations, such that in theory, SQLite3 should >> >> be able to atomically lock the file (or a subset of page), modify >> >> pages, flush the pages to gluster, then release the lock, and thus >> >> satisfy the ACID property that SQLite3 appears to try to accomplish on >> >> a local filesystem. >> >> >> >> In a test we wrote that fires off 10 simple concurrernt SQL insert, >> >> read, update loops, we discovered that we at least need to use flock() >> >> around the SQLite3 db connection open/update/close to protect it. >> >> >> >> However, that is not enough - although from testing, it looks like >> >> flock() works as advertised across gluster mounted files, sync/fsync >> >> don't appear to, so we end up getting corruption in the SQLite3 file >> >> (pragma integrity_check generally will show a bunch of problems after >> >> a short test). >> >> >> >> Is what we're trying to do achievable? We're testing using the docker >> >> container gluster/gluster-centos as the three servers, with a php test >> >> inside of php-cli using filesystem mounts. If we mount the gluster FS >> >> via sapk/plugi
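The "pragma integrity_check" test referred to above is easy to script; on an uncorrupted database it returns the single row "ok", and on a corrupted one it returns a list of problems. A minimal sketch (the database name `check_demo.db` is illustrative):

```python
import sqlite3

def integrity_ok(db_path):
    # PRAGMA integrity_check walks the whole file and returns one row
    # containing 'ok' when no corruption is found, or a list of problems.
    conn = sqlite3.connect(db_path)
    rows = [r[0] for r in conn.execute("PRAGMA integrity_check")]
    conn.close()
    return rows == ["ok"]

# Build a small healthy database, then check it.
conn = sqlite3.connect("check_demo.db")
conn.execute("CREATE TABLE IF NOT EXISTS t (n INTEGER)")
conn.commit()
conn.close()
print(integrity_ok("check_demo.db"))
```

Running this after each stress test, as the thread does, is a cheap way to detect corruption before the application trips over it.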
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
On 2018-03-06, Amar Tumballi wrote: >> If anyone would like our test scripts, I can either tar them up and >> email them or put them in github - either is fine with me. (they rely >> on current builds of docker and docker-compose) >> >> > Sure, sharing the test cases makes it very easy for us to see what would be > the issue. I would recommend a github repo for the script. > > Regards, > Amar I'm also curious about the tests. Csaba ___ Gluster-users mailing list Gluster-users@gluster.org http://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
Adding csaba On Tue, Mar 6, 2018 at 9:09 AM, Raghavendra Gowdappa wrote: > +Csaba. > > On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote: > >> Raghavendra, >> >> Thanks very much for your reply. >> >> I fixed our data corruption problem by disabling the volume >> performance.write-behind flag as you suggested, and simultaneously >> disabling caching in my client side mount command. >> > > Good to know it worked. Can you give us the output of > # gluster volume info > > We would like to debug the problem in write-behind. Some questions: > > 1. What version of Glusterfs are you using? > 2. Were you able to figure out whether its stale data or metadata that is > causing the issue? > > There have been patches merged in write-behind in recent past and one in > the works which address metadata consistency. Would like to understand > whether you've run into any of the already identified issues. > > regards, > Raghavendra > >> >> In very modest testing, the flock() case appears to me to work well - >> before it would corrupt the db within a few transactions. >> >> Testing using built in sqlite3 locks is better (fcntl range locks), >> but has some behavioral issues (probably just requires query retry >> when the file is locked). I'll research this more, although the test >> case is not critical to our use case. >> >> There are no signs of O_DIRECT use in the sqlite3 code that I can see. >> >> I intend to set up tests that run much longer than a few minutes, to >> see if there are any longer term issues. Also, I want to experiment >> with data durability by killing various gluster server nodes during >> the tests. >> >> If anyone would like our test scripts, I can either tar them up and >> email them or put them in github - either is fine with me. (they rely >> on current builds of docker and docker-compose) >> >> Thanks again!! 
>> >> Paul >> >> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa >> wrote: >> > >> > >> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: >> >> >> >> Hi, >> >> >> >> tl;dr summary of below: flock() works, but what does it take to make >> >> sync()/fsync() work in a 3 node GFS cluster? >> >> >> >> I am under the impression that POSIX flock, POSIX >> >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all >> >> supported in cluster operations, such that in theory, SQLite3 should >> >> be able to atomically lock the file (or a subset of page), modify >> >> pages, flush the pages to gluster, then release the lock, and thus >> >> satisfy the ACID property that SQLite3 appears to try to accomplish on >> >> a local filesystem. >> >> >> >> In a test we wrote that fires off 10 simple concurrernt SQL insert, >> >> read, update loops, we discovered that we at least need to use flock() >> >> around the SQLite3 db connection open/update/close to protect it. >> >> >> >> However, that is not enough - although from testing, it looks like >> >> flock() works as advertised across gluster mounted files, sync/fsync >> >> don't appear to, so we end up getting corruption in the SQLite3 file >> >> (pragma integrity_check generally will show a bunch of problems after >> >> a short test). >> >> >> >> Is what we're trying to do achievable? We're testing using the docker >> >> container gluster/gluster-centos as the three servers, with a php test >> >> inside of php-cli using filesystem mounts. If we mount the gluster FS >> >> via sapk/plugin-gluster into the php-cli containers using docker, we >> >> seem to have better success sometimes, but I haven't figured out why, >> >> yet. >> >> >> >> I did see that I needed to set the server volume parameter >> >> 'performance.flush-behind off', otherwise it seems that flushes won't >> >> block as would be needed by SQLite3. 
>> > >> > >> > If you are relying on fsync this shouldn't matter as fsync makes sure >> data >> > is synced to disk. >> > >> >> >> >> Does anyone have any suggestions? Any words of widsom would be much >> >> appreciated. >> > >> > >> > Can you experiment with turning on/off various performance xlators? >> Based on >> > earlier issues, its likely that there is stale metadata which might be >> > causing the issue (not necessarily improper fsync behavior). I would >> suggest >> > turning off all performance xlators. You can refer [1] for a related >> > discussion. In theory the only perf xlator relevant for fsync is >> > write-behind and I am not aware of any issues where fsync is not >> working. >> > Does glusterfs log file has any messages complaining about writes or >> fsync >> > failing? Does your application use O_DIRECT? If yes, please note that >> you >> > need to turn the option performance.strict-o-direct on for write-behind >> to >> > honour O_DIRECT >> > >> > Also, is it possible to identify nature of corruption - Data or >> metadata? >> > More detailed explanation will help to RCA the issue. >> > >> > Also, is your application running on a single mount or from multiple >> mounts? >> > Can you collect strace of you
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
+Csaba.

On Tue, Mar 6, 2018 at 2:52 AM, Paul Anderson wrote:
> Raghavendra,
>
> Thanks very much for your reply.
>
> I fixed our data corruption problem by disabling the volume
> performance.write-behind flag as you suggested, and simultaneously
> disabling caching in my client side mount command.

Good to know it worked. Can you give us the output of

# gluster volume info

We would like to debug the problem in write-behind. Some questions:

1. What version of Glusterfs are you using?
2. Were you able to figure out whether it's stale data or metadata that is causing the issue?

There have been patches merged in write-behind in the recent past and one in the works which address metadata consistency. Would like to understand whether you've run into any of the already identified issues.

regards,
Raghavendra

> In very modest testing, the flock() case appears to me to work well -
> before it would corrupt the db within a few transactions.
>
> Testing using built-in sqlite3 locks is better (fcntl range locks),
> but has some behavioral issues (probably just requires query retry
> when the file is locked). I'll research this more, although the test
> case is not critical to our use case.
>
> There are no signs of O_DIRECT use in the sqlite3 code that I can see.
>
> I intend to set up tests that run much longer than a few minutes, to
> see if there are any longer term issues. Also, I want to experiment
> with data durability by killing various gluster server nodes during
> the tests.
>
> If anyone would like our test scripts, I can either tar them up and
> email them or put them in github - either is fine with me. (they rely
> on current builds of docker and docker-compose)
>
> Thanks again!!
>
> Paul
>
> On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa wrote:
> >
> > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote:
> >>
> >> Hi,
> >>
> >> tl;dr summary of below: flock() works, but what does it take to make
> >> sync()/fsync() work in a 3 node GFS cluster?
> >> > >> I am under the impression that POSIX flock, POSIX > >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all > >> supported in cluster operations, such that in theory, SQLite3 should > >> be able to atomically lock the file (or a subset of page), modify > >> pages, flush the pages to gluster, then release the lock, and thus > >> satisfy the ACID property that SQLite3 appears to try to accomplish on > >> a local filesystem. > >> > >> In a test we wrote that fires off 10 simple concurrernt SQL insert, > >> read, update loops, we discovered that we at least need to use flock() > >> around the SQLite3 db connection open/update/close to protect it. > >> > >> However, that is not enough - although from testing, it looks like > >> flock() works as advertised across gluster mounted files, sync/fsync > >> don't appear to, so we end up getting corruption in the SQLite3 file > >> (pragma integrity_check generally will show a bunch of problems after > >> a short test). > >> > >> Is what we're trying to do achievable? We're testing using the docker > >> container gluster/gluster-centos as the three servers, with a php test > >> inside of php-cli using filesystem mounts. If we mount the gluster FS > >> via sapk/plugin-gluster into the php-cli containers using docker, we > >> seem to have better success sometimes, but I haven't figured out why, > >> yet. > >> > >> I did see that I needed to set the server volume parameter > >> 'performance.flush-behind off', otherwise it seems that flushes won't > >> block as would be needed by SQLite3. > > > > > > If you are relying on fsync this shouldn't matter as fsync makes sure > data > > is synced to disk. > > > >> > >> Does anyone have any suggestions? Any words of widsom would be much > >> appreciated. > > > > > > Can you experiment with turning on/off various performance xlators? 
> Based on > > earlier issues, its likely that there is stale metadata which might be > > causing the issue (not necessarily improper fsync behavior). I would > suggest > > turning off all performance xlators. You can refer [1] for a related > > discussion. In theory the only perf xlator relevant for fsync is > > write-behind and I am not aware of any issues where fsync is not working. > > Does glusterfs log file has any messages complaining about writes or > fsync > > failing? Does your application use O_DIRECT? If yes, please note that you > > need to turn the option performance.strict-o-direct on for write-behind > to > > honour O_DIRECT > > > > Also, is it possible to identify nature of corruption - Data or metadata? > > More detailed explanation will help to RCA the issue. > > > > Also, is your application running on a single mount or from multiple > mounts? > > Can you collect strace of your application (strace -ff -T -p -o > > )? If possible can you also collect fuse-dump using option > --dump-fuse > > while mounting glusterfs? > > > > [1] > > http://lists.gluster.org/pipermail/gluster-users/2018- > Februa
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
Tough to do. Like in my case where you would have to install and use Plex.

On March 5, 2018 4:19:23 PM PST, Amar Tumballi wrote:
>> If anyone would like our test scripts, I can either tar them up and
>> email them or put them in github - either is fine with me. (they rely
>> on current builds of docker and docker-compose)
>
> Sure, sharing the test cases makes it very easy for us to see what would be
> the issue. I would recommend a github repo for the script.
>
> Regards,
> Amar
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
> If anyone would like our test scripts, I can either tar them up and
> email them or put them in github - either is fine with me. (they rely
> on current builds of docker and docker-compose)

Sure, sharing the test cases makes it very easy for us to see what would be the issue. I would recommend a github repo for the script.

Regards,
Amar
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
Raghavendra, Thanks very much for your reply. I fixed our data corruption problem by disabling the volume performance.write-behind flag as you suggested, and simultaneously disabling caching in my client-side mount command. In very modest testing, the flock() case appears to me to work well - before, it would corrupt the db within a few transactions. Testing using built-in sqlite3 locks is better (fcntl range locks), but has some behavioral issues (probably just requires query retry when the file is locked). I'll research this more, although the test case is not critical to our use case. There are no signs of O_DIRECT use in the sqlite3 code that I can see. I intend to set up tests that run much longer than a few minutes, to see if there are any longer-term issues. Also, I want to experiment with data durability by killing various gluster server nodes during the tests. If anyone would like our test scripts, I can either tar them up and email them or put them in github - either is fine with me. (They rely on current builds of docker and docker-compose.) Thanks again!! Paul On Mon, Mar 5, 2018 at 11:26 AM, Raghavendra Gowdappa wrote: > > > On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: >> >> Hi, >> >> tl;dr summary of below: flock() works, but what does it take to make >> sync()/fsync() work in a 3 node GFS cluster? >> >> I am under the impression that POSIX flock, POSIX >> fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all >> supported in cluster operations, such that in theory, SQLite3 should >> be able to atomically lock the file (or a subset of pages), modify >> pages, flush the pages to gluster, then release the lock, and thus >> satisfy the ACID property that SQLite3 appears to try to accomplish on >> a local filesystem. >> >> In a test we wrote that fires off 10 simple concurrent SQL insert, >> read, update loops, we discovered that we at least need to use flock() >> around the SQLite3 db connection open/update/close to protect it.
>> >> However, that is not enough - although from testing, it looks like >> flock() works as advertised across gluster mounted files, sync/fsync >> don't appear to, so we end up getting corruption in the SQLite3 file >> (pragma integrity_check generally will show a bunch of problems after >> a short test). >> >> Is what we're trying to do achievable? We're testing using the docker >> container gluster/gluster-centos as the three servers, with a php test >> inside of php-cli using filesystem mounts. If we mount the gluster FS >> via sapk/plugin-gluster into the php-cli containers using docker, we >> seem to have better success sometimes, but I haven't figured out why, >> yet. >> >> I did see that I needed to set the server volume parameter >> 'performance.flush-behind off', otherwise it seems that flushes won't >> block as would be needed by SQLite3. > > If you are relying on fsync this shouldn't matter, as fsync makes sure data > is synced to disk. > >> >> Does anyone have any suggestions? Any words of wisdom would be much >> appreciated. > > Can you experiment with turning on/off various performance xlators? Based on > earlier issues, it's likely that there is stale metadata which might be > causing the issue (not necessarily improper fsync behavior). I would suggest > turning off all performance xlators. You can refer to [1] for a related > discussion. In theory the only perf xlator relevant for fsync is > write-behind, and I am not aware of any issues where fsync is not working. > Does the glusterfs log file have any messages complaining about writes or fsync > failing? Does your application use O_DIRECT? If yes, please note that you > need to turn the option performance.strict-o-direct on for write-behind to > honour O_DIRECT. > > Also, is it possible to identify the nature of the corruption - data or metadata? > A more detailed explanation will help to RCA the issue. > > Also, is your application running on a single mount or from multiple mounts?
> Can you collect an strace of your application (strace -ff -T -p <pid> -o > <file>)? If possible, can you also collect a fuse-dump using the option --dump-fuse > while mounting glusterfs? > > [1] > http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html > >> >> Thanks, >> >> Paul >> ___ >> Gluster-users mailing list >> Gluster-users@gluster.org >> http://lists.gluster.org/mailman/listinfo/gluster-users
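The flock() wrapper Paul describes - holding an exclusive lock across the whole SQLite connection open/update/close cycle - can be sketched in Python. This is a minimal illustration, not the thread's actual PHP test code; the sidecar ".lock" file name is an assumption.

```python
import fcntl
import sqlite3

def locked_update(db_path, sql, params=()):
    """Run one SQLite write while holding an exclusive flock() on a
    sidecar lock file, spanning the whole open/update/close cycle as
    the wrapper described in the thread does."""
    lock_path = db_path + ".lock"  # sidecar lock file (an assumption)
    with open(lock_path, "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)  # blocks until granted
        try:
            conn = sqlite3.connect(db_path)
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(sql, params)
            finally:
                conn.close()
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)
```

On a gluster mount, every writer would go through a wrapper like this so only one client touches the database at a time - the pattern that, per the thread, survived repeated concurrent-writer tests where SQLite's own locking did not.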
Re: [Gluster-users] SQLite3 on 3 node cluster FS?
On Mon, Mar 5, 2018 at 8:21 PM, Paul Anderson wrote: > Hi, > > tl;dr summary of below: flock() works, but what does it take to make > sync()/fsync() work in a 3 node GFS cluster? > > I am under the impression that POSIX flock, POSIX > fcntl(F_SETLK/F_GETLK,...), and POSIX read/write/sync/fsync are all > supported in cluster operations, such that in theory, SQLite3 should > be able to atomically lock the file (or a subset of pages), modify > pages, flush the pages to gluster, then release the lock, and thus > satisfy the ACID property that SQLite3 appears to try to accomplish on > a local filesystem. > > In a test we wrote that fires off 10 simple concurrent SQL insert, > read, update loops, we discovered that we at least need to use flock() > around the SQLite3 db connection open/update/close to protect it. > > However, that is not enough - although from testing, it looks like > flock() works as advertised across gluster mounted files, sync/fsync > don't appear to, so we end up getting corruption in the SQLite3 file > (pragma integrity_check generally will show a bunch of problems after > a short test). > > Is what we're trying to do achievable? We're testing using the docker > container gluster/gluster-centos as the three servers, with a php test > inside of php-cli using filesystem mounts. If we mount the gluster FS > via sapk/plugin-gluster into the php-cli containers using docker, we > seem to have better success sometimes, but I haven't figured out why, > yet. > > I did see that I needed to set the server volume parameter > 'performance.flush-behind off', otherwise it seems that flushes won't > block as would be needed by SQLite3. > If you are relying on fsync this shouldn't matter, as fsync makes sure data is synced to disk. > Does anyone have any suggestions? Any words of wisdom would be much > appreciated. > Can you experiment with turning on/off various performance xlators?
Based on earlier issues, it's likely that there is stale metadata which might be causing the issue (not necessarily improper fsync behavior). I would suggest turning off all performance xlators. You can refer to [1] for a related discussion. In theory the only perf xlator relevant for fsync is write-behind, and I am not aware of any issues where fsync is not working. Does the glusterfs log file have any messages complaining about writes or fsync failing? Does your application use O_DIRECT? If yes, please note that you need to turn the option performance.strict-o-direct on for write-behind to honour O_DIRECT. Also, is it possible to identify the nature of the corruption - data or metadata? A more detailed explanation will help to RCA the issue. Also, is your application running on a single mount or from multiple mounts? Can you collect an strace of your application (strace -ff -T -p <pid> -o <file>)? If possible, can you also collect a fuse-dump using the option --dump-fuse while mounting glusterfs? [1] http://lists.gluster.org/pipermail/gluster-users/2018-February/033503.html > Thanks, > > Paul > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://lists.gluster.org/mailman/listinfo/gluster-users
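The locking style discussed throughout the thread - SQLite's internal fcntl() byte-range locks, as opposed to the whole-file flock() workaround - can be illustrated with Python's fcntl.lockf, which issues fcntl-style range-lock requests. This is a hedged sketch of the primitive gluster must honour for SQLite to be safe on its own, not SQLite's actual lock sequence.

```python
import fcntl
import os

def with_range_lock(path, start, length, fn):
    """Run fn(fd) while holding an exclusive POSIX byte-range lock
    (fcntl-style, the kind SQLite uses internally) over the bytes
    [start, start+length) of the file at path."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the byte-range lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
        try:
            return fn(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
    finally:
        os.close(fd)
```

Unlike flock(), these locks cover only a byte range, so two clients can safely write disjoint regions concurrently - which is exactly the behavior the thread found unreliable on gluster, leading to the flock() workaround.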