Thanks, James. Please find the answers inline, marked JOE>>.
----- Original Message -----
From: "James K. Lowden" <jklow...@schemamania.org>
To: sqlite-users@sqlite.org
Sent: Saturday, November 22, 2014 10:29:01 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t. file IO) and that has good data-querying facilities (slight
> latency in the read is fine, but the freshness of the record data
> should be spot on).

This strikes me as a classic case for "record, then analyze". I would
capture the data more cheaply and use SQLite to decide what to do. You
don't really care whether the recorded times are exactly right; a few
missed updates wouldn't affect the cold/hot status very much, so you
should be willing to lose a few if that improves write latency. OTOH,
the maintenance operation isn't *very* time-critical; you just can't
afford to walk the whole tree first.

JOE>> Agree! We can afford to miss some updates from subsequent IO,
i.e. while a file is being written we don't update the DB for every
incoming write; a timer- and counter-based approach decides when the
DB should be updated for data IO. For metadata IO, though, we don't
want to miss any updates, because we also record the hardlinks of each
inode in the database (a gluster-internal requirement for the
data-maintainer scanners).

That suggests two possibilities for capture:

1. Keep a sequential file of {name,time} or {inode,time} pairs
(whichever is more convenient to use). By using O_APPEND you get
atomic writes and perfect captures across threads. fsync(2) as
desired.

2. If in practice that file grows too large, use a primitive hashing
store such as BerkeleyDB to capture counts by name/inode. It's not
even obvious you need an external store; you might be able to get away
with std::hash_map in C++ and periodically serialize it.

ISTM you don't need to worry about concurrency, because a few missed
updates here and there won't change much.

At maintenance time, scoop the file into a SQLite table, and you're
back where you started, except you already have zero write-time
latency.

HTH.

--jkl
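JOE>> To make sure we read capture idea 1 correctly, here is a minimal
sketch in C of what we understand by it (the record layout and names
are illustrative only, not an actual glusterfs interface):

    /* Append one fixed-size {inode, time} record per update.
     * O_APPEND makes each write(2) land whole at the end of the file,
     * so concurrent writers need no locking of their own. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    struct cap_rec {              /* fixed size: trivial to scan later */
        uint64_t inode;
        uint64_t mtime;           /* seconds since the epoch */
    };

    int cap_open(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    }

    int cap_record(int fd, uint64_t inode)
    {
        struct cap_rec r = { .inode = inode,
                             .mtime = (uint64_t)time(NULL) };
        /* One write(2) per record; fsync(fd) here only if desired. */
        return write(fd, &r, sizeof r) == (ssize_t)sizeof r ? 0 : -1;
    }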
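JOE>> The maintenance-time "scoop" would then be one transaction over
that log, something like the sketch below ("heat" is a made-up table
name, and error handling is omitted for brevity):

    /* Load capture records into SQLite for analysis. Wrapping all
     * the inserts in a single transaction keeps the import fast. */
    #include <sqlite3.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cap_rec { uint64_t inode, mtime; }; /* as in the capture sketch */

    int import_log(sqlite3 *db, FILE *log)
    {
        struct cap_rec r;
        sqlite3_stmt *ins;

        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS heat"
                         " (inode INTEGER PRIMARY KEY, mtime INTEGER)",
                     0, 0, 0);
        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        sqlite3_prepare_v2(db, "INSERT OR REPLACE INTO heat"
                               " (inode, mtime) VALUES (?1, ?2)",
                           -1, &ins, 0);
        while (fread(&r, sizeof r, 1, log) == 1) {
            sqlite3_bind_int64(ins, 1, (sqlite3_int64)r.inode);
            sqlite3_bind_int64(ins, 2, (sqlite3_int64)r.mtime);
            sqlite3_step(ins);
            sqlite3_reset(ins);
        }
        sqlite3_finalize(ins);
        return sqlite3_exec(db, "COMMIT", 0, 0, 0);
    }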
JOE>> 1) Well, we already have a journal log, called the "changelog",
which does exactly that (it records each modification as a log entry),
and we plan to use it to feed the DB. The major challenge with the
changelog is that it is built to be crash-consistent, because the
geo-replication feature in glusterfs uses it: it must not miss any
update, since every update has to be replayed later on the
geo-replica. That brings performance issues into the IO path. Compared
with the changelog, sqlite3 with WAL (write-ahead logging, which is
similar to the logging you suggest but provided by sqlite itself)
shows better performance in the IO path. There will be setups where
geo-replication is not required and the changelog therefore need not
be enabled, so we plan to provide two options:

a) feed the DB (with WAL) directly from the IO path (if geo-rep is
off);

b) feed the DB via the changelog (if geo-rep is on), because we don't
want two kinds of latency hitting the IO path, one from the DB update
and another from the changelog.

2) Feeding the DB from the changelog has another issue: the freshness
of the data in the DB w.r.t. the IO. A few of our data-maintainer
scanners require the freshness of the feed to be close to real time.
To get that, we plan to use an in-memory view data structure (like an
LRU) that is updated in the IO path (in parallel to the changelog,
which is updated independently), plus a separate, frequently scheduled
notification thread that updates the DB; the notifier is not blocked
while broadcasting updates. A sketch of that flush loop is below,
after 3). Your thoughts on this?

3) Given that the IO path would feed Sqlite3 (with WAL) directly (in
the absence of the changelog), we are looking to get the best
performance out of it. Crash consistency is not a major requirement
for now, but performance and freshness of the data in the DB must be
spot on. The sort of setup we have in mind is sketched below.
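JOE>> For 3), the setup we have in mind is roughly the following (a
sketch; relaxing synchronous is our own trade-off, acceptable only
because we can afford to lose a few updates across a crash):

    /* Open the DB tuned for low write latency on the IO path.
     * WAL lets readers proceed while the IO path writes;
     * synchronous=OFF drops fsyncs from the hot path at the cost of
     * durability across a crash, which we can live with. */
    #include <sqlite3.h>

    int open_heat_db(const char *path, sqlite3 **db)
    {
        if (sqlite3_open(path, db) != SQLITE_OK)
            return -1;
        sqlite3_exec(*db, "PRAGMA journal_mode=WAL", 0, 0, 0);
        sqlite3_exec(*db, "PRAGMA synchronous=OFF", 0, 0, 0); /* or NORMAL */
        return 0;
    }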
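JOE>> And to make 2) concrete, the in-memory view plus notifier thread
could look like the sketch below. It is a simplification: a bounded
batch rather than a true LRU, with the capacity, flush period, and
"heat" table all illustrative, and error handling omitted:

    /* The IO path only touches a mutex-protected array, which is
     * cheap; the notifier thread periodically steals the batch and
     * does the SQLite work outside the lock, so the IO path never
     * waits on the DB. */
    #include <pthread.h>
    #include <sqlite3.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>

    #define VIEW_MAX 8192                /* illustrative capacity */

    struct ent { uint64_t inode, mtime; };

    struct view {
        pthread_mutex_t lock;            /* PTHREAD_MUTEX_INITIALIZER */
        size_t n;
        struct ent e[VIEW_MAX];
    };

    /* IO path: record an update. If the view is full we drop it; a
     * missed update only delays a hotness bump. */
    void view_touch(struct view *v, uint64_t inode, uint64_t mtime)
    {
        pthread_mutex_lock(&v->lock);
        if (v->n < VIEW_MAX) {
            v->e[v->n].inode = inode;
            v->e[v->n].mtime = mtime;
            v->n++;
        }
        pthread_mutex_unlock(&v->lock);
    }

    struct flusher_arg { struct view *v; sqlite3 *db; };

    /* Notifier thread: steal the batch under the lock, then flush it
     * to SQLite in one transaction with the lock released. */
    void *flusher(void *p)
    {
        struct flusher_arg *a = p;
        static struct ent batch[VIEW_MAX];   /* one flusher assumed */
        size_t i, n;
        sqlite3_stmt *ins;

        sqlite3_prepare_v2(a->db, "INSERT OR REPLACE INTO heat"
                                  " (inode, mtime) VALUES (?1, ?2)",
                           -1, &ins, 0);
        for (;;) {
            sleep(1);                        /* illustrative period */
            pthread_mutex_lock(&a->v->lock);
            n = a->v->n;
            memcpy(batch, a->v->e, n * sizeof batch[0]);
            a->v->n = 0;
            pthread_mutex_unlock(&a->v->lock);

            sqlite3_exec(a->db, "BEGIN", 0, 0, 0);
            for (i = 0; i < n; i++) {
                sqlite3_bind_int64(ins, 1, (sqlite3_int64)batch[i].inode);
                sqlite3_bind_int64(ins, 2, (sqlite3_int64)batch[i].mtime);
                sqlite3_step(ins);
                sqlite3_reset(ins);
            }
            sqlite3_exec(a->db, "COMMIT", 0, 0, 0);
        }
        return NULL;                         /* not reached */
    }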