Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Sun, 23 Nov 2014 14:34:23 -0500 (EST) Joseph Fernandeswrote: > Ok I get it now. Yes we are using a single db connection object, But > few questions, > 1) how would making sqlite3 single thread that improve the > performance? Is there a special advantage in this mode than the > multithread one? The locking code isn't compiled and sqlite3 don't spend time checking them. The code in single thread compiled with multithread is minimal, but it's there wasting time. > 2) Will it not block (wait or give a SQLITE_BUSY) the multiple thread > of the app (in our case glusterfs IO threads) as now they have to > wait for the single thread to complete the task? Sqlite3 can manage only one writer (INSERT/UPDATE/DELETE) and multiple readers, the whole db is lock when a writer writes. Don't matter if sqlite3 is in multithread mode or not. You must implement in your sqlite3 thread owner, the control logic between writing threads. Your threads send data to sqlite3 thread and it decides, with your rules, which ones and in what order apply to db. When I did it, I found particular cases in my data management logic that make it faster, f.ex. batch writes sended by some threads to update the same row multiple times, so only the last one was applied or apply them in one transaction, which is faster than apply them one by one. > ~Joe --- --- Eduardo Morras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
Again asking the same question : a. Does WAL during an insert/update in the log file do any internal search/sort and then insert/update to the log or b. Just appends the WAL log with the incoming insert/update entry, thus keeping the writes sequential and during a checkpoint (manual or automatic) does the merging with the actual tables, which as you pointed out will have to search/sort causing alot of random disk access? c. Also we don't read the db in file IO path. The read are for scheduled data maintenance scanners, thus making the random read of disk a occasional event. Just trying to assess if my understanding of sqlite WAL is correct. ~Joe - Original Message - From: "Valentin Davydov" <sqlite-u...@soi.spb.ru> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Monday, November 24, 2014 4:37:12 PM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote: > We wanted to known the following > 1) How could we improve the performance on the write side so that we have > minimal latency? > 2) Will ther be any write performance hit when the number of records in the > DB increase? Generally speaking, you have to do some work to arrange your data (modification times) in some ordered way. This work can be done eihter in advance, as you suggest, or on demand, as some people have already told you. But anyway this will eat up necessary resources, regardless of whose code would do it, either yours or SQLite. In practice (given magnetic disks as underlying storage), most scarce of the mentioned resources is rotational/seek latency, which detrimentally affects all disk operations of any scheduled priority. SQLite performs extensive random disk access (mostly reads) on most operation scenarios - selects, inserts, indexing etc. with possible exception of small updates of non-indexed data (which are accessed in a similar fashion by later selects). The only way to cope with the slow disk is keeping all necessary data somwhere else, for example, into the RAM cache. Of course, cache itself should be populated in advance to give this benefit, and, given current RAM prices, it seems not very feasible to steal available memory from smart applications in favour of dumb cache. Hope, this considerations will help you in tuning your code. Valentin Davydov. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
- Original Message - From: "Nico Williams" <n...@cryptonector.com> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Monday, November 24, 2014 4:35:59 PM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden <jklow...@schemamania.org> wrote: > Metadata updates to Posix filesystems are seen as so costly that > fsync(2) on the datafile descriptor doesn't update them. A separate > sync on the directory is required. Compared to an in-memory update > (of metadata, in kernel space) and a single fsync call, the price of a > SQLite transaction is enormous, at a guess an order of magnitude more. This. Updates of mtime and atime in particular are expensive. Another problem (for Lustre-style clusters) is stat(), since it returns information that only a metadata service might know (e.g., file type, file ID (st_dev and st_ino), and link count) but also information that it might not (file size, various timestamps), which then presents enormous headaches for implementors. There are also write visibility rules as to stat() that some applications depend on... This is why "ls -l" can be slow on some such clusters. JOE>> Glusterfs is not replacing POSIX atime,mtime and ctime with this db (Sqlite is not a metadata db). i.e stat will always read from actual atime/mtime/ctime of the inode. Therefore as mention in [1] we don't do any db read operations in the file IO path. The read consumer of the db are data maintaining scanners that are scheduled, and which want a smarter way to find the "HOT" and "COLD" files in the cluster. Infact there will be a one db per storage unit(brick) of glusterfs therefore no centralized db. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
- Original Message - From: "James K. Lowden" <jklow...@schemamania.org> To: sqlite-users@sqlite.org Sent: Monday, November 24, 2014 3:56:14 AM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Sat, 22 Nov 2014 23:25:16 -0500 (EST) Joseph Fernandes <josfe...@redhat.com> wrote: > 2) Using the changelog to feed the db has another issue i.e freshness > of data in the DB w.r.t the IO. Few of our data maintainer scanners > would require the freshness of the feed to be close to real. [...] > Your thoughts on this. If your in-memory LRU structure suffices to describe all "hot" files, you're in good shape. Rather than dumping periodically, I would consider placing it in shared memory and write a virtual table function for it in SQLite, such that it can be queried directly as needed. To me based on your description your choice isn't how best to use SQLite in line with I/O, but how best to capture the data such that they can be aggregated with SQLite at time of update. That choice is one of two: 1) capture each I/O event in a sequential file, always appending, or 2) maintain per-file counts in a hash or map. Which is better depends on how much you're willing to pay for each I/O. By paying the lookup cost of #2 each time, the total space is smaller and the maintenance-time computation less. > 3) Now that we would use Sqlite3(with WAL) to be direcly feed by the > IO path(in the absence of changelog) we are looking to get the best > performance from it. Metadata updates to Posix filesystems are seen as so costly that fsync(2) on the datafile descriptor doesn't update them. A separate sync on the directory is required. Compared to an in-memory update (of metadata, in kernel space) and a single fsync call, the price of a SQLite transaction is enormous, at a guess an order of magnitude more. Bear in mind that WAL buys you not efficiency but consistency, the very thing you don't really need. The data are written sequentially to the log and then inserted into the table. You can expect no better than O(n log n) performance. Filesystems generally would never tolerate that, but for your application you'd be the judge. >> Ok few questions on the WAL journal mechanism, 1) As far as I understand(I may be wrong), During a insert or update WAL just records it sequentially in WAL file. And during a checkpoint (manual or automatic) a. Flush the in-memory appends in the WAL file b. and then Merges with the actually tables. And therefore check pointing takes a toll on the performance. But if I don't do check point often i.e set auto checkpoint to say 1 GB or 2 GB. Following would be the implications of doing so a. Read will be slow because now I will have a large amount of data in the log and it would take time to read and collate data from the log. We are fine with this in our usage case as we DON'T read (select queries) from database in the file IO path ever. These are done by data maintaining scanner which are scheduled. When tested with 1 million records and joining two tables and having the WAL log file to grow to 10 gb , it takes almost 1 min to retrieve 1 million records. Which is fine for data scanners as they the waited for hour-hours. b. we will occupy huge space for the WAL Log file. Here is the question does WAL during an insert/update in the log file do any internal search/sort and then insert/update to the log or just appends the WAL log with the incoming insert/update entry ? 2) Glusterfs is not replacing POSIX atime,mtime and ctime with this db (Sqlite is not a metadata db). i.e stat will always read from actual atime/mtime/ctime of the inode. Therefore as mention in [1] we don't do any db read operations in the file IO path. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
BTW, the experience with dedup is that doing things off-line means never catching up, while doing them online means going slow. You might cache as much as you can in memory then go slow when you miss the cache... In practice I think it's best to separate data and metadata devices so that you can make metadata as fast as you can, writing COW-style, like ZFS does, and caching in memory on clients and servers as much as possible for all data writes during the writes until they are committed. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
My advice is to borrow from other clustered filesystems' experience. If you want to adhere to POSIX semantics then st_mtime and st_size visibility will be a particular headache, especially since you don't know when it's OK to lie (i.e., which callers of stat() are using st_mtime/st_size for synchronization). Ideally we'd split stat() into metastat() and datastat()... Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote: > We wanted to known the following > 1) How could we improve the performance on the write side so that we have > minimal latency? > 2) Will ther be any write performance hit when the number of records in the > DB increase? Generally speaking, you have to do some work to arrange your data (modification times) in some ordered way. This work can be done eihter in advance, as you suggest, or on demand, as some people have already told you. But anyway this will eat up necessary resources, regardless of whose code would do it, either yours or SQLite. In practice (given magnetic disks as underlying storage), most scarce of the mentioned resources is rotational/seek latency, which detrimentally affects all disk operations of any scheduled priority. SQLite performs extensive random disk access (mostly reads) on most operation scenarios - selects, inserts, indexing etc. with possible exception of small updates of non-indexed data (which are accessed in a similar fashion by later selects). The only way to cope with the slow disk is keeping all necessary data somwhere else, for example, into the RAM cache. Of course, cache itself should be populated in advance to give this benefit, and, given current RAM prices, it seems not very feasible to steal available memory from smart applications in favour of dumb cache. Hope, this considerations will help you in tuning your code. Valentin Davydov. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowdenwrote: > Metadata updates to Posix filesystems are seen as so costly that > fsync(2) on the datafile descriptor doesn't update them. A separate > sync on the directory is required. Compared to an in-memory update > (of metadata, in kernel space) and a single fsync call, the price of a > SQLite transaction is enormous, at a guess an order of magnitude more. This. Updates of mtime and atime in particular are expensive. Another problem (for Lustre-style clusters) is stat(), since it returns information that only a metadata service might know (e.g., file type, file ID (st_dev and st_ino), and link count) but also information that it might not (file size, various timestamps), which then presents enormous headaches for implementors. There are also write visibility rules as to stat() that some applications depend on... This is why "ls -l" can be slow on some such clusters. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Sat, 22 Nov 2014 23:25:16 -0500 (EST) Joseph Fernandeswrote: > 2) Using the changelog to feed the db has another issue i.e freshness > of data in the DB w.r.t the IO. Few of our data maintainer scanners > would require the freshness of the feed to be close to real. [...] > Your thoughts on this. If your in-memory LRU structure suffices to describe all "hot" files, you're in good shape. Rather than dumping periodically, I would consider placing it in shared memory and write a virtual table function for it in SQLite, such that it can be queried directly as needed. To me based on your description your choice isn't how best to use SQLite in line with I/O, but how best to capture the data such that they can be aggregated with SQLite at time of update. That choice is one of two: 1) capture each I/O event in a sequential file, always appending, or 2) maintain per-file counts in a hash or map. Which is better depends on how much you're willing to pay for each I/O. By paying the lookup cost of #2 each time, the total space is smaller and the maintenance-time computation less. > 3) Now that we would use Sqlite3(with WAL) to be direcly feed by the > IO path(in the absence of changelog) we are looking to get the best > performance from it. Metadata updates to Posix filesystems are seen as so costly that fsync(2) on the datafile descriptor doesn't update them. A separate sync on the directory is required. Compared to an in-memory update (of metadata, in kernel space) and a single fsync call, the price of a SQLite transaction is enormous, at a guess an order of magnitude more. Bear in mind that WAL buys you not efficiency but consistency, the very thing you don't really need. The data are written sequentially to the log and then inserted into the table. You can expect no better than O(n log n) performance. Filesystems generally would never tolerate that, but for your application you'd be the judge. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
Ok I get it now. Yes we are using a single db connection object, But few questions, 1) how would making sqlite3 single thread that improve the performance? Is there a special advantage in this mode than the multithread one? 2) Will it not block (wait or give a SQLITE_BUSY) the multiple thread of the app (in our case glusterfs IO threads) as now they have to wait for the single thread to complete the task? ~Joe - Original Message - From: "Eduardo Morras" <emorr...@yahoo.es> To: sqlite-users@sqlite.org Sent: Monday, November 24, 2014 12:29:33 AM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Sat, 22 Nov 2014 11:01:45 -0500 (EST) Joseph Fernandes <josfe...@redhat.com> wrote: > Thanks Eduardo. > > Answers inline JOE>> > > a)If you use Sqlite in single thread, compile it without thread > support. > JOE>> We are running it in multithread mode, as the database will be > JOE>> fed by multiple File IO thread. Yes and no, your app can be multithread, but if only one thread has the sqlite3 pointer, you can use single thread sqlite3. This way, you have multiple working threads and one sqlite3 dedicated thread that recives the data to write and the data queries. --- --- Eduardo Morras <emorr...@yahoo.es> ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Sat, 22 Nov 2014 11:01:45 -0500 (EST) Joseph Fernandeswrote: > Thanks Eduardo. > > Answers inline JOE>> > > a)If you use Sqlite in single thread, compile it without thread > support. > JOE>> We are running it in multithread mode, as the database will be > JOE>> fed by multiple File IO thread. Yes and no, your app can be multithread, but if only one thread has the sqlite3 pointer, you can use single thread sqlite3. This way, you have multiple working threads and one sqlite3 dedicated thread that recives the data to write and the data queries. --- --- Eduardo Morras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
Thanks James Please find the answers inline JOE>> - Original Message - From: "James K. Lowden" <jklow...@schemamania.org> To: sqlite-users@sqlite.org Sent: Saturday, November 22, 2014 10:29:01 PM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Fri, 21 Nov 2014 14:01:39 -0500 (EST) Joseph Fernandes <josfe...@redhat.com> wrote: > 4) Therefore, we are looking at a datastore that can give us a very > quick write(almost zero latency, as the recording is done inline > w.r.t file IO) and that as good data querying facilities(Slight > latency in the read is fine but the fresh of that record data should > be spot on). This strikes me as a classic case for "record, then analyze". I would capture the data more cheaply and use SQLite to decide what to do. You don't really care if the recorded times are exactly right; a few missed updates wouldn't affect the cold/hot status very much. You should be willing to lose a few if you improve write latency. OTOH the maintenance operation isn't *very* time critical; you just can't afford to walk the whole tree first. JOE>> Agree! We can afford missing some updates w.r.t subsequent IO i.e If a file is been written with data then we don't update the db for every write that comes in. We have a timer and counter based approach that guides the when the db should be updated for data IO. For Metadata IO we don't want to miss the updates, as we also keep the record of hardlinks for a inode in the database (gluster internal requirement for data maintainer scanners). That suggests two possibilities for capture: 1. Keep a sequential file of {name,time} or {inode,time} pairs (whichever is more convenient to use). By using O_APPEND you get atomic writes and perfect captures across threads. fsync(2) as desired. 2. If in practice the above file grows too large, use a primitive hashing store such as BerkeleyDB to capture counts by name/inode. It's not even obvious you need an external store; you might be able to get away with std::hash_map in C++ and periodically serialize that. ISTM you don't need to worry about concurrency because a few missed updates here and there won't change much. At maintenance time, scoop the file into a SQLite table, and you're back where you started, except you already have zero write-time latency. JOE>> 1) Well we already have a journal log called "changelog" that does that for us(records each modification as a log entry). We have plans of using it to feed the db. The major challenge with changelog is, its built to be crash consistent, as its used for geo-replication feature in glusterfs i.e should not miss any update, as it needs to relay it later to the geo-replica, this brings in performance issues in the IO path. When compared with changelog, sqlite3(with WAL{Write-ahead-logging} which is similar to the logging which you suggest, but provided by sqlite itself) shows better performance in the IO path. There will be setups where geo-rep is not required and hence we need not enable the changelog, hence we are planning to provide two options here a) Feed the DB(with WAL) directly via IO path (if geo-rep is off) b) Feed the DB via changelog(if geo-rep is on), because we don't want two kind of latency hit the IO path, one from the DB update another from changelog. 2) Using the changelog to feed the db has another issue i.e freshness of data in the DB w.r.t the IO. Few of our data maintainer scanners would require the freshness of the feed to be close to real. To have that we are planning to uses a in-memory view data-structure(like a LRU) that get updated through the IO path(in parallel to changelog which is updated independently), and then using a seperate scheduled(frequently) notification-thread that would update the DB (the notifier is not blocked while broadcasting updates). Your thoughts on this. 3) Now that we would use Sqlite3(with WAL) to be direcly feed by the IO path(in the absence of changelog) we are looking to get the best performance from it. Crash consistency wouldn't be a major requirement for now. But performance and freshness of data in the DB should be a spot-on. HTH. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Fri, 21 Nov 2014 14:01:39 -0500 (EST) Joseph Fernandeswrote: > 4) Therefore, we are looking at a datastore that can give us a very > quick write(almost zero latency, as the recording is done inline > w.r.t file IO) and that as good data querying facilities(Slight > latency in the read is fine but the fresh of that record data should > be spot on). This strikes me as a classic case for "record, then analyze". I would capture the data more cheaply and use SQLite to decide what to do. You don't really care if the recorded times are exactly right; a few missed updates wouldn't affect the cold/hot status very much. You should be willing to lose a few if you improve write latency. OTOH the maintenance operation isn't *very* time critical; you just can't afford to walk the whole tree first. That suggests two possibilities for capture: 1. Keep a sequential file of {name,time} or {inode,time} pairs (whichever is more convenient to use). By using O_APPEND you get atomic writes and perfect captures across threads. fsync(2) as desired. 2. If in practice the above file grows too large, use a primitive hashing store such as BerkeleyDB to capture counts by name/inode. It's not even obvious you need an external store; you might be able to get away with std::hash_map in C++ and periodically serialize that. ISTM you don't need to worry about concurrency because a few missed updates here and there won't change much. At maintenance time, scoop the file into a SQLite table, and you're back where you started, except you already have zero write-time latency. HTH. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
Thanks Eduardo. Answers inline JOE>> - Original Message - From: "Eduardo Morras" <emorr...@yahoo.es> To: sqlite-users@sqlite.org Sent: Saturday, November 22, 2014 9:20:46 PM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs On Fri, 21 Nov 2014 14:01:39 -0500 (EST) Joseph Fernandes <josfe...@redhat.com> wrote: > Hi There, > > 1) We are trying to use sqlite3 in Glusterfs as a Change Time > Recording Data Store i.e using sqlite3 db to record any modification > attempts that happens on the glusterfs per file inode, So that it can > be used to indicate the hotness of the file in the cluster. > > 2) We have developed a prototype that inserts/updates inode records > in sqlite3 db(one instance of the db file per glusterfs storage unit > or as we call it a brick) to record the modification that are > happening on the inode. These insert/updates are in the IO path of > the file i.e any data or metadata change time will be record inline, > w.r.t the IO, in the DB. > > 3) We only save the modification time on the to the db, so that > we can easily query for "What files have change during a specific > period of time" or "What files have not change during a specific > period of time". The query is done by a scanner that runner on each > storage unit or brick on a regular interval and chooses HOT or COLD > files as suggested by the DB for data maintenance operations. > > 4) Therefore, we are looking at a datastore that can give us a very > quick write(almost zero latency, as the recording is done inline > w.r.t file IO) and that as good data querying facilities(Slight > latency in the read is fine but the fresh of that record data should > be spot on). > > 5) Please find our DB setting, > Journal mode : WAL > SYNC MODE : NORMAL > Cache Size : 1000 - 4096 Pages (Default Page Size) > AutoCheck-Pointing: 1000 - 100 Pages > > We wanted to known the following > 1) How could we improve the performance on the write side so that we > have minimal latency? I'll wait for the ota extension Mr. Hipps is developing. It will (If I Understood Correctly) reorder the commit operations to get faster IO throughput. Some tricks: a)If you use Sqlite in single thread, compile it without thread support. JOE>> We are running it in multithread mode, as the database will be fed by multiple File IO thread. b)Adjust the page size to fit in a hard disk cluster size (8KB IIRC) or the multiply of it that allows integer number of rows fits in. c)Disable autovacuum JOE>> Sure will try this out in my tests > 2) Will ther be any write performance hit when the number of records > in the DB increase? > > Thanks in advance > > ~Joe --- --- Eduardo Morras <emorr...@yahoo.es> ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
On Fri, 21 Nov 2014 14:01:39 -0500 (EST) Joseph Fernandeswrote: > Hi There, > > 1) We are trying to use sqlite3 in Glusterfs as a Change Time > Recording Data Store i.e using sqlite3 db to record any modification > attempts that happens on the glusterfs per file inode, So that it can > be used to indicate the hotness of the file in the cluster. > > 2) We have developed a prototype that inserts/updates inode records > in sqlite3 db(one instance of the db file per glusterfs storage unit > or as we call it a brick) to record the modification that are > happening on the inode. These insert/updates are in the IO path of > the file i.e any data or metadata change time will be record inline, > w.r.t the IO, in the DB. > > 3) We only save the modification time on the to the db, so that > we can easily query for "What files have change during a specific > period of time" or "What files have not change during a specific > period of time". The query is done by a scanner that runner on each > storage unit or brick on a regular interval and chooses HOT or COLD > files as suggested by the DB for data maintenance operations. > > 4) Therefore, we are looking at a datastore that can give us a very > quick write(almost zero latency, as the recording is done inline > w.r.t file IO) and that as good data querying facilities(Slight > latency in the read is fine but the fresh of that record data should > be spot on). > > 5) Please find our DB setting, > Journal mode : WAL > SYNC MODE : NORMAL > Cache Size : 1000 - 4096 Pages (Default Page Size) > AutoCheck-Pointing: 1000 - 100 Pages > > We wanted to known the following > 1) How could we improve the performance on the write side so that we > have minimal latency? I'll wait for the ota extension Mr. Hipps is developing. It will (If I Understood Correctly) reorder the commit operations to get faster IO throughput. Some tricks: a)If you use Sqlite in single thread, compile it without thread support. b)Adjust the page size to fit in a hard disk cluster size (8KB IIRC) or the multiply of it that allows integer number of rows fits in. c)Disable autovacuum > 2) Will ther be any write performance hit when the number of records > in the DB increase? > > Thanks in advance > > ~Joe --- --- Eduardo Morras ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
As any Filesystem glusterfs also has a atime,mtime and ctime. But we are trying to optimize the search of file that are "HOT" or "COLD" using sqlite3, As a filesystem walk would kill the scanners(which are doing data maintenance) ~Joe - Original Message - From: "Keith Medcalf" <kmedc...@dessus.com> To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Sent: Saturday, November 22, 2014 9:02:30 PM Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs Your Glusterfs does not store modification times? Have you considered adding the modification time attribute to the inode directly? --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why. >-Original Message- >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- >boun...@sqlite.org] On Behalf Of Joseph Fernandes >Sent: Friday, 21 November, 2014 12:02 >To: sqlite-users@sqlite.org >Subject: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in >Glusterfs > >Hi There, > >1) We are trying to use sqlite3 in Glusterfs as a Change Time Recording >Data Store >i.e using sqlite3 db to record any modification attempts that happens on >the glusterfs >per file inode, So that it can be used to indicate the hotness of the >file in the cluster. > >2) We have developed a prototype that inserts/updates inode records in >sqlite3 db(one instance of the >db file per glusterfs storage unit or as we call it a brick) to record >the modification >that are happening on the inode. These insert/updates are in the IO path >of the file i.e any data or >metadata change time will be record inline, w.r.t the IO, in the DB. > >3) We only save the modification time on the to the db, so that >we can easily query for "What files have change during a specific period >of time" or >"What files have not change during a specific period of time". The query >is done by a scanner >that runner on each storage unit or brick on a regular interval and >chooses HOT or COLD files as suggested >by the DB for data maintenance operations. > >4) Therefore, we are looking at a datastore that can give us a very quick >write(almost zero latency, >as the recording is done inline w.r.t file IO) and that as good data >querying >facilities(Slight latency in the read is fine but the fresh of that >record data should be spot on). > >5) Please find our DB setting, >Journal mode : WAL >SYNC MODE : NORMAL >Cache Size : 1000 - 4096 Pages (Default Page Size) >AutoCheck-Pointing: 1000 - 100 Pages > >We wanted to known the following >1) How could we improve the performance on the write side so that we have >minimal latency? >2) Will ther be any write performance hit when the number of records in >the DB increase? > >Thanks in advance > >~Joe >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs
Your Glusterfs does not store modification times? Have you considered adding the modification time attribute to the inode directly? --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why. >-Original Message- >From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- >boun...@sqlite.org] On Behalf Of Joseph Fernandes >Sent: Friday, 21 November, 2014 12:02 >To: sqlite-users@sqlite.org >Subject: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in >Glusterfs > >Hi There, > >1) We are trying to use sqlite3 in Glusterfs as a Change Time Recording >Data Store >i.e using sqlite3 db to record any modification attempts that happens on >the glusterfs >per file inode, So that it can be used to indicate the hotness of the >file in the cluster. > >2) We have developed a prototype that inserts/updates inode records in >sqlite3 db(one instance of the >db file per glusterfs storage unit or as we call it a brick) to record >the modification >that are happening on the inode. These insert/updates are in the IO path >of the file i.e any data or >metadata change time will be record inline, w.r.t the IO, in the DB. > >3) We only save the modification time on the to the db, so that >we can easily query for "What files have change during a specific period >of time" or >"What files have not change during a specific period of time". The query >is done by a scanner >that runner on each storage unit or brick on a regular interval and >chooses HOT or COLD files as suggested >by the DB for data maintenance operations. > >4) Therefore, we are looking at a datastore that can give us a very quick >write(almost zero latency, >as the recording is done inline w.r.t file IO) and that as good data >querying >facilities(Slight latency in the read is fine but the fresh of that >record data should be spot on). > >5) Please find our DB setting, >Journal mode : WAL >SYNC MODE : NORMAL >Cache Size : 1000 - 4096 Pages (Default Page Size) >AutoCheck-Pointing: 1000 - 100 Pages > >We wanted to known the following >1) How could we improve the performance on the write side so that we have >minimal latency? >2) Will ther be any write performance hit when the number of records in >the DB increase? > >Thanks in advance > >~Joe >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users