Thanks, James. Please find the answers inline, marked JOE>>.
----- Original Message -----
From: "James K. Lowden" <jklow...@schemamania.org>
To: sqlite-users@sqlite.org
Sent: Saturday, November 22, 2014 10:29:01 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t. file IO) and that has good data-querying facilities (slight
> latency in the read is fine, but the freshness of the record data
> should be spot on).

This strikes me as a classic case for "record, then analyze". I would
capture the data more cheaply and use SQLite to decide what to do. You
don't really care whether the recorded times are exactly right; a few
missed updates wouldn't affect the cold/hot status very much, so you
should be willing to lose a few if that improves write latency. OTOH,
the maintenance operation isn't *very* time-critical; you just can't
afford to walk the whole tree first.

JOE>> Agree! We can afford to miss some updates from subsequent IO,
i.e. while a file is being written we don't update the DB for every
incoming write; a timer- and counter-based approach decides when the
DB should be updated for data IO. For metadata IO, though, we don't
want to miss any updates, because we also record the hardlinks of each
inode in the database (a gluster-internal requirement for the
data-maintainer scanners).

That suggests two possibilities for capture:

1. Keep a sequential file of {name,time} or {inode,time} pairs
(whichever is more convenient to use). By using O_APPEND you get
atomic writes and perfect captures across threads. fsync(2) as
desired.

2. If in practice that file grows too large, use a primitive hashing
store such as BerkeleyDB to capture counts by name/inode. It's not
even obvious you need an external store; you might be able to get away
with std::hash_map in C++ and periodically serialize it.

ISTM you don't need to worry about concurrency, because a few missed
updates here and there won't change much.

At maintenance time, scoop the file into a SQLite table, and you're
back where you started, except you already have zero write-time
latency.

HTH.

--jkl
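JOE>> To make sure we read capture idea 1 correctly, here is a minimal
sketch in C of what we understand by it (the record layout and names
are illustrative only, not an actual glusterfs interface):

    /* Append one fixed-size {inode, time} record per update.
     * O_APPEND makes each write(2) land whole at the end of the file,
     * so concurrent writers need no locking of their own. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    struct cap_rec {              /* fixed size: trivial to scan later */
        uint64_t inode;
        uint64_t mtime;           /* seconds since the epoch */
    };

    int cap_open(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    }

    int cap_record(int fd, uint64_t inode)
    {
        struct cap_rec r = { .inode = inode,
                             .mtime = (uint64_t)time(NULL) };
        /* One write(2) per record; fsync(fd) here only if desired. */
        return write(fd, &r, sizeof r) == (ssize_t)sizeof r ? 0 : -1;
    }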
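JOE>> The maintenance-time "scoop" would then be one transaction over
that log, something like the sketch below ("heat" is a made-up table
name, and error handling is omitted for brevity):

    /* Load capture records into SQLite for analysis. Wrapping all
     * the inserts in a single transaction keeps the import fast. */
    #include <sqlite3.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cap_rec { uint64_t inode, mtime; }; /* as in the capture sketch */

    int import_log(sqlite3 *db, FILE *log)
    {
        struct cap_rec r;
        sqlite3_stmt *ins;

        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS heat"
                         " (inode INTEGER PRIMARY KEY, mtime INTEGER)",
                     0, 0, 0);
        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        sqlite3_prepare_v2(db, "INSERT OR REPLACE INTO heat"
                               " (inode, mtime) VALUES (?1, ?2)",
                           -1, &ins, 0);
        while (fread(&r, sizeof r, 1, log) == 1) {
            sqlite3_bind_int64(ins, 1, (sqlite3_int64)r.inode);
            sqlite3_bind_int64(ins, 2, (sqlite3_int64)r.mtime);
            sqlite3_step(ins);
            sqlite3_reset(ins);
        }
        sqlite3_finalize(ins);
        return sqlite3_exec(db, "COMMIT", 0, 0, 0);
    }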
JOE>> 1) Well, we already have a journal log, called the "changelog",
which does exactly that (it records each modification as a log entry),
and we plan to use it to feed the DB. The major challenge with the
changelog is that it is built to be crash-consistent, because the
geo-replication feature in glusterfs uses it: it must not miss any
update, since every update has to be replayed later on the
geo-replica. That brings performance issues into the IO path. Compared
with the changelog, sqlite3 with WAL (write-ahead logging, which is
similar to the logging you suggest but provided by sqlite itself)
shows better performance in the IO path. There will be setups where
geo-replication is not required and the changelog therefore need not
be enabled, so we plan to provide two options:

a) feed the DB (with WAL) directly from the IO path (if geo-rep is
off);

b) feed the DB via the changelog (if geo-rep is on), because we don't
want two kinds of latency hitting the IO path, one from the DB update
and another from the changelog.

2) Feeding the DB from the changelog has another issue: the freshness
of the data in the DB w.r.t. the IO. A few of our data-maintainer
scanners require the freshness of the feed to be close to real time.
To get that, we plan to use an in-memory view data structure (like an
LRU) that is updated in the IO path (in parallel to the changelog,
which is updated independently), plus a separate, frequently scheduled
notification thread that updates the DB; the notifier is not blocked
while broadcasting updates. A sketch of that flush loop is below,
after 3). Your thoughts on this?

3) Given that the IO path would feed Sqlite3 (with WAL) directly (in
the absence of the changelog), we are looking to get the best
performance out of it. Crash consistency is not a major requirement
for now, but performance and freshness of the data in the DB must be
spot on. The sort of setup we have in mind is sketched below.
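JOE>> For 3), the setup we have in mind is roughly the following (a
sketch; relaxing synchronous is our own trade-off, acceptable only
because we can afford to lose a few updates across a crash):

    /* Open the DB tuned for low write latency on the IO path.
     * WAL lets readers proceed while the IO path writes;
     * synchronous=OFF drops fsyncs from the hot path at the cost of
     * durability across a crash, which we can live with. */
    #include <sqlite3.h>

    int open_heat_db(const char *path, sqlite3 **db)
    {
        if (sqlite3_open(path, db) != SQLITE_OK)
            return -1;
        sqlite3_exec(*db, "PRAGMA journal_mode=WAL", 0, 0, 0);
        sqlite3_exec(*db, "PRAGMA synchronous=OFF", 0, 0, 0); /* or NORMAL */
        return 0;
    }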
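JOE>> And to make 2) concrete, the in-memory view plus notifier thread
could look like the sketch below. It is a simplification: a bounded
batch rather than a true LRU, with the capacity, flush period, and
"heat" table all illustrative, and error handling omitted:

    /* The IO path only touches a mutex-protected array, which is
     * cheap; the notifier thread periodically steals the batch and
     * does the SQLite work outside the lock, so the IO path never
     * waits on the DB. */
    #include <pthread.h>
    #include <sqlite3.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define VIEW_MAX 8192                /* illustrative capacity */

    struct ent { uint64_t inode, mtime; };

    struct view {
        pthread_mutex_t lock;            /* PTHREAD_MUTEX_INITIALIZER */
        size_t n;
        struct ent e[VIEW_MAX];
    };

    /* IO path: record an update. If the view is full we drop it; a
     * missed update only delays a hotness bump. */
    void view_touch(struct view *v, uint64_t inode, uint64_t mtime)
    {
        pthread_mutex_lock(&v->lock);
        if (v->n < VIEW_MAX) {
            v->e[v->n].inode = inode;
            v->e[v->n].mtime = mtime;
            v->n++;
        }
        pthread_mutex_unlock(&v->lock);
    }

    struct flusher_arg { struct view *v; sqlite3 *db; };

    /* Notifier thread: steal the batch under the lock, then flush it
     * to SQLite in one transaction with the lock released. */
    void *flusher(void *p)
    {
        struct flusher_arg *a = p;
        static struct ent batch[VIEW_MAX];   /* one flusher assumed */
        size_t i, n;
        sqlite3_stmt *ins;

        sqlite3_prepare_v2(a->db, "INSERT OR REPLACE INTO heat"
                                  " (inode, mtime) VALUES (?1, ?2)",
                           -1, &ins, 0);
        for (;;) {
            sleep(1);                        /* illustrative period */
            pthread_mutex_lock(&a->v->lock);
            n = a->v->n;
            memcpy(batch, a->v->e, n * sizeof batch[0]);
            a->v->n = 0;
            pthread_mutex_unlock(&a->v->lock);

            sqlite3_exec(a->db, "BEGIN", 0, 0, 0);
            for (i = 0; i < n; i++) {
                sqlite3_bind_int64(ins, 1, (sqlite3_int64)batch[i].inode);
                sqlite3_bind_int64(ins, 2, (sqlite3_int64)batch[i].mtime);
                sqlite3_step(ins);
                sqlite3_reset(ins);
            }
            sqlite3_exec(a->db, "COMMIT", 0, 0, 0);
        }
        return NULL;                         /* not reached */
    }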