Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-25 Thread Eduardo Morras
On Sun, 23 Nov 2014 14:34:23 -0500 (EST)
Joseph Fernandes  wrote:

> Ok, I get it now. Yes, we are using a single db connection object, but
> a few questions: 
> 1) How would making sqlite3 single-threaded improve the
> performance? Is there a special advantage to this mode over the
> multithread one?

The locking code isn't compiled in, so sqlite3 doesn't spend time taking and 
checking mutexes. The mutex overhead in a multithread build is minimal, but 
it's there, wasting time.
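
Both knobs are standard SQLite, if you want to try them (a minimal sketch; 
error handling omitted):

  /* At build time: compile the amalgamation with the mutexes omitted.    */
  /*   cc -DSQLITE_THREADSAFE=0 -c sqlite3.c                               */

  /* Or at run time, in a threadsafe build, before sqlite3_initialize():  */
  #include <sqlite3.h>

  void go_single_threaded(void){
    /* After this, only ONE thread may ever touch any sqlite3 object.     */
    sqlite3_config(SQLITE_CONFIG_SINGLETHREAD);
    sqlite3_initialize();
  }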

> 2) Will it not block (wait or give SQLITE_BUSY) the multiple threads
> of the app (in our case glusterfs IO threads), as now they have to
> wait for the single thread to complete the task?

Sqlite3 can manage only one writer (INSERT/UPDATE/DELETE) and multiple readers; 
the whole db is locked while a writer writes. It doesn't matter whether sqlite3 
is in multithread mode or not.

You must implement the control logic between writing threads in the thread 
that owns the sqlite3 connection. Your threads send data to the sqlite3 
thread, and it decides, following your rules, which operations to apply to 
the db and in what order.

When I did this, I found particular cases in my data management logic that 
could be made faster, f.ex. batched writes sent by several threads that 
updated the same row multiple times: only the last one needed to be applied, 
or they could all be applied in one transaction, which is faster than 
applying them one by one.
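
The shape of that pattern, as a rough sketch (not glusterfs code; the queue, 
the table and all names here are invented for illustration):

  /* One thread owns the db; IO threads only enqueue {inode, mtime}.      */
  #include <sqlite3.h>
  #include <pthread.h>
  #include <stdlib.h>

  typedef struct Upd { long long ino, mtime; struct Upd *next; } Upd;
  static Upd *q_head = 0;                      /* newest-first LIFO queue */
  static pthread_mutex_t q_mu = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  q_cv = PTHREAD_COND_INITIALIZER;

  void record_update(long long ino, long long mtime){   /* IO threads    */
    Upd *u = malloc(sizeof *u);
    u->ino = ino; u->mtime = mtime;
    pthread_mutex_lock(&q_mu);
    u->next = q_head; q_head = u;
    pthread_cond_signal(&q_cv);
    pthread_mutex_unlock(&q_mu);
  }

  void *writer_thread(void *arg){   /* the only thread that sees the db  */
    sqlite3 *db = arg;
    sqlite3_stmt *st;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO hot(ino, mtime) VALUES(?1, ?2)", -1, &st, 0);
    for(;;){
      pthread_mutex_lock(&q_mu);
      while( q_head==0 ) pthread_cond_wait(&q_cv, &q_mu);
      Upd *batch = q_head; q_head = 0;         /* drain the whole queue  */
      pthread_mutex_unlock(&q_mu);

      /* Reverse the drained list so the newest update for an inode is   */
      /* applied last and therefore wins under INSERT OR REPLACE.        */
      Upd *rev = 0;
      while( batch ){
        Upd *n = batch->next; batch->next = rev; rev = batch; batch = n;
      }

      sqlite3_exec(db, "BEGIN", 0, 0, 0);      /* one commit per batch   */
      while( rev ){
        sqlite3_bind_int64(st, 1, rev->ino);
        sqlite3_bind_int64(st, 2, rev->mtime);
        sqlite3_step(st);
        sqlite3_reset(st);
        Upd *dead = rev; rev = rev->next; free(dead);
      }
      sqlite3_exec(db, "COMMIT", 0, 0, 0);
    }
    return 0;
  }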

> ~Joe


---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes
Again asking the same question:
 a. Does WAL, during an insert/update, do any internal search/sort in the log 
file and then insert/update the log, or 
 b. does it just append the incoming insert/update entry to the WAL log, thus 
keeping the writes sequential, and during a checkpoint (manual or automatic) 
do the merging with the actual tables, which as you pointed out will have to 
search/sort, causing a lot of random disk access? 
 c. Also, we don't read the db in the file IO path. The reads are for 
scheduled data maintenance scanners, thus making random disk reads an 
occasional event.
 Just trying to assess whether my understanding of sqlite WAL is correct.

~Joe 

- Original Message -
From: "Valentin Davydov" <sqlite-u...@soi.spb.ru>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Monday, November 24, 2014 4:37:12 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
Glusterfs

On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote:

> We wanted to know the following
> 1) How could we improve the performance on the write side so that we have 
> minimal latency?
> 2) Will there be any write performance hit when the number of records in 
> the DB increases?

Generally speaking, you have to do some work to arrange your data 
(modification times) in some ordered way. This work can be done either in 
advance, as you suggest, or on demand, as some people have already told you. 
But either way it will eat up the necessary resources, regardless of whose 
code does it, yours or SQLite's. In practice (given magnetic disks as the 
underlying storage), the most scarce of those resources is rotational/seek 
latency, which detrimentally affects all disk operations of any scheduled 
priority. SQLite performs extensive random disk access (mostly reads) in most 
operation scenarios - selects, inserts, indexing etc. - with the possible 
exception of small updates of non-indexed data (which are accessed in a 
similar fashion by later selects). The only way to cope with a slow disk is 
to keep all necessary data somewhere else, for example in a RAM cache. Of 
course, the cache itself has to be populated in advance to give this benefit, 
and, given current RAM prices, it seems not very feasible to steal available 
memory from smart applications in favour of a dumb cache.

Hope these considerations will help you in tuning your code.

Valentin Davydov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes


- Original Message -
From: "Nico Williams" <n...@cryptonector.com>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Monday, November 24, 2014 4:35:59 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
Glusterfs

On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden
<jklow...@schemamania.org> wrote:
> Metadata updates to Posix filesystems are seen as so costly that
> fsync(2) on the datafile descriptor doesn't update them.  A separate
> sync on the directory is required.  Compared to an in-memory update
> (of metadata, in kernel space) and a single fsync call, the price of a
> SQLite transaction is enormous, at a guess an order of magnitude more.

This.  Updates of mtime and atime in particular are expensive.

Another problem (for Lustre-style clusters) is stat(), since it
returns information that only a metadata service might know (e.g.,
file type, file ID (st_dev and st_ino), and link count) but also
information that it might not (file size, various timestamps), which
then presents enormous headaches for implementors.  There are also
write visibility rules as to stat() that some applications depend
on...  This is why "ls -l" can be slow on some such clusters.

JOE>>
Glusterfs is not replacing POSIX atime, mtime and ctime with this db (Sqlite 
is not a metadata db), i.e. stat will always read the actual 
atime/mtime/ctime of the inode. Therefore, as mentioned in [1], we don't do 
any db read operations in the file IO path. The read consumers of the db are 
the data maintenance scanners, which are scheduled and which want a smarter 
way to find the "HOT" and "COLD" files in the cluster. In fact there will be 
one db per storage unit (brick) of glusterfs, therefore no centralized db.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes


- Original Message -
From: "James K. Lowden" <jklow...@schemamania.org>
To: sqlite-users@sqlite.org
Sent: Monday, November 24, 2014 3:56:14 AM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Sat, 22 Nov 2014 23:25:16 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> 2) Using the changelog to feed the db has another issue, i.e. freshness
> of data in the DB w.r.t the IO. A few of our data maintenance scanners
> would require the freshness of the feed to be close to real time. [...]
> Your thoughts on this. 

If your in-memory LRU structure suffices to describe all "hot" files,
you're in good shape.  Rather than dumping periodically, I would
consider placing it in shared memory and writing a virtual table
function for it in SQLite, such that it can be queried directly as
needed.  

To me, based on your description, your choice isn't how best to use
SQLite in line with I/O, but how best to capture the data such that
they can be aggregated with SQLite at time of update.  That choice is
one of two: 1) capture each I/O event in a sequential file, always
appending, or 2) maintain per-file counts in a hash or map.  Which is
better depends on how much you're willing to pay for each I/O.  By
paying the lookup cost of #2 each time, the total space is smaller and
the maintenance-time computation less.  

> 3) Now that we would have Sqlite3 (with WAL) directly fed by the
> IO path (in the absence of changelog) we are looking to get the best
> performance from it. 

Metadata updates to Posix filesystems are seen as so costly that 
fsync(2) on the datafile descriptor doesn't update them.  A separate
sync on the directory is required.  Compared to an in-memory update
(of metadata, in kernel space) and a single fsync call, the price of a
SQLite transaction is enormous, at a guess an order of magnitude more.
Bear in mind that WAL buys you not efficiency but consistency, the very
thing you don't really need.  The data are written sequentially to the
log and then inserted into the table.  You can expect no better than
O(n log n) performance.  Filesystems generally would never tolerate
that, but for your application you'd be the judge. 

>> Ok, a few questions on the WAL journal mechanism, 
1) As far as I understand (I may be wrong), during an insert or update WAL
 just records it sequentially in the WAL file, and during a checkpoint (manual 
or automatic)
 a. flushes the in-memory appends to the WAL file
 b. and then merges them with the actual tables. 
 Checkpointing therefore takes a toll on the performance. But if I don't 
 checkpoint often, i.e. set auto-checkpoint to say 1 GB or 2 GB,
 the implications of doing so would be the following:
 a. Reads will be slow, because now I will have a large amount of data in the 
log and it would take time to read and collate data from the log. We are fine 
with this in our usage case, as we DON'T ever read (select queries) from the 
database in the file IO path. The reads are done by data maintenance scanners 
which are scheduled. When tested with 1 million records, joining two tables 
with the WAL log file grown to 10 GB, it takes almost 1 min to retrieve 
1 million records, which is fine for the data scanners, as they can wait 
for hours.
 b. We will occupy huge space for the WAL log file.

 Here is the question: does WAL, during an insert/update, do any internal 
search/sort in the log file and then insert/update the log, or does it just 
append the incoming insert/update entry to the WAL log?

2) Glusterfs is not replacing POSIX atime, mtime and ctime with this db 
(Sqlite is not a metadata db), i.e.
   stat will always read the actual atime/mtime/ctime of the inode. Therefore, 
as mentioned in [1],
   we don't do any db read operations in the file IO path.
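
For reference, the knobs involved would look something like this (a sketch; 
the values are illustrative, not settings we have validated):

  #include <sqlite3.h>

  void setup_wal(sqlite3 *db){
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
    sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", 0, 0, 0);
    /* Raise the auto-checkpoint threshold from the default 1000 pages   */
    /* (~2 GB of 4 KB pages here, purely illustrative).                  */
    sqlite3_exec(db, "PRAGMA wal_autocheckpoint=500000;", 0, 0, 0);
  }

  /* Called from the scheduled scanner, after its SELECTs are done:      */
  void checkpoint_now(sqlite3 *db){
    int nLog = 0, nCkpt = 0;
    /* RESTART waits for readers and checkpoints the whole log, so the   */
    /* next writer starts again from the beginning of the -wal file.     */
    sqlite3_wal_checkpoint_v2(db, "main", SQLITE_CHECKPOINT_RESTART,
                              &nLog, &nCkpt);
  }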

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
BTW, the experience with dedup is that doing things off-line means
never catching up, while doing them online means going slow.

You might cache as much as you can in memory then go slow when you
miss the cache...

In practice I think it's best to separate data and metadata devices so
that you can make metadata as fast as you can, writing COW-style, like
ZFS does, and caching in memory on clients and servers as much as
possible for all data writes during the writes until they are
committed.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
My advice is to borrow from other clustered filesystems' experience.

If you want to adhere to POSIX semantics then st_mtime and st_size
visibility will be a particular headache, especially since you don't
know when it's OK to lie (i.e., which callers of stat() are using
st_mtime/st_size for synchronization).

Ideally we'd split stat() into metastat() and datastat()...

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Valentin Davydov
On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote:

> We wanted to know the following
> 1) How could we improve the performance on the write side so that we have 
> minimal latency?
> 2) Will there be any write performance hit when the number of records in 
> the DB increases?

Generally speaking, you have to do some work to arrange your data 
(modification times) in some ordered way. This work can be done either in 
advance, as you suggest, or on demand, as some people have already told you. 
But either way it will eat up the necessary resources, regardless of whose 
code does it, yours or SQLite's. In practice (given magnetic disks as the 
underlying storage), the most scarce of those resources is rotational/seek 
latency, which detrimentally affects all disk operations of any scheduled 
priority. SQLite performs extensive random disk access (mostly reads) in most 
operation scenarios - selects, inserts, indexing etc. - with the possible 
exception of small updates of non-indexed data (which are accessed in a 
similar fashion by later selects). The only way to cope with a slow disk is 
to keep all necessary data somewhere else, for example in a RAM cache. Of 
course, the cache itself has to be populated in advance to give this benefit, 
and, given current RAM prices, it seems not very feasible to steal available 
memory from smart applications in favour of a dumb cache.
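
In SQLite this cache is a per-connection knob; something like the following 
(values purely illustrative, not a recommendation):

  #include <sqlite3.h>

  void enlarge_cache(sqlite3 *db){
    /* Negative cache_size means "this many KB", here ~512 MB of pages.  */
    sqlite3_exec(db, "PRAGMA cache_size=-524288;", 0, 0, 0);
    /* Memory-map up to 1 GB of the database file; reads that hit the OS */
    /* page cache then avoid read() syscalls entirely.                   */
    sqlite3_exec(db, "PRAGMA mmap_size=1073741824;", 0, 0, 0);
  }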

Hope these considerations will help you in tuning your code.

Valentin Davydov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden
 wrote:
> Metadata updates to Posix filesystems are seen as so costly that
> fsync(2) on the datafile descriptor doesn't update them.  A separate
> sync on the directory is required.  Compared to an in-memory update
> (of metadata, in kernel space) and a single fsync call, the price of a
> SQLite transaction is enormous, at a guess an order of magnitude more.

This.  Updates of mtime and atime in particular are expensive.

Another problem (for Lustre-style clusters) is stat(), since it
returns information that only a metadata service might know (e.g.,
file type, file ID (st_dev and st_ino), and link count) but also
information that it might not (file size, various timestamps), which
then presents enormous headaches for implementors.  There are also
write visibility rules as to stat() that some applications depend
on...  This is why "ls -l" can be slow on some such clusters.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-23 Thread James K. Lowden
On Sat, 22 Nov 2014 23:25:16 -0500 (EST)
Joseph Fernandes  wrote:

> 2) Using the changelog to feed the db has another issue, i.e. freshness
> of data in the DB w.r.t the IO. A few of our data maintenance scanners
> would require the freshness of the feed to be close to real time. [...]
> Your thoughts on this. 

If your in-memory LRU structure suffices to describe all "hot" files,
you're in good shape.  Rather than dumping periodically, I would
consider placing it in shared memory and writing a virtual table
function for it in SQLite, such that it can be queried directly as
needed.  
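
A skeleton of such a module, to give the idea (rough and untested; the flat 
array stands in for your shared-memory LRU, and every name is invented):

  #include <sqlite3.h>
  #include <string.h>

  typedef struct HotRec { sqlite3_int64 ino, mtime; } HotRec;
  static HotRec g_hot[1024];     /* stand-in for the shared-memory LRU   */
  static int    g_nhot = 0;

  typedef struct HotCur { sqlite3_vtab_cursor base; int i; } HotCur;

  static int hotConnect(sqlite3 *db, void *pAux, int argc,
                        const char *const *argv, sqlite3_vtab **ppVt,
                        char **pzErr){
    int rc = sqlite3_declare_vtab(db,
                 "CREATE TABLE x(ino INTEGER, mtime INTEGER)");
    if( rc!=SQLITE_OK ) return rc;
    *ppVt = sqlite3_malloc(sizeof(**ppVt));
    if( !*ppVt ) return SQLITE_NOMEM;
    memset(*ppVt, 0, sizeof(**ppVt));
    return SQLITE_OK;
  }
  static int hotDisconnect(sqlite3_vtab *p){
    sqlite3_free(p); return SQLITE_OK;
  }
  static int hotBestIndex(sqlite3_vtab *p, sqlite3_index_info *ii){
    return SQLITE_OK;            /* full scan is fine for a small LRU    */
  }
  static int hotOpen(sqlite3_vtab *p, sqlite3_vtab_cursor **ppCur){
    HotCur *c = sqlite3_malloc(sizeof(*c));
    if( !c ) return SQLITE_NOMEM;
    memset(c, 0, sizeof(*c));
    *ppCur = &c->base;
    return SQLITE_OK;
  }
  static int hotClose(sqlite3_vtab_cursor *cur){
    sqlite3_free(cur); return SQLITE_OK;
  }
  static int hotFilter(sqlite3_vtab_cursor *cur, int idxNum,
                       const char *idxStr, int argc, sqlite3_value **argv){
    ((HotCur*)cur)->i = 0; return SQLITE_OK;
  }
  static int hotNext(sqlite3_vtab_cursor *cur){
    ((HotCur*)cur)->i++; return SQLITE_OK;
  }
  static int hotEof(sqlite3_vtab_cursor *cur){
    return ((HotCur*)cur)->i >= g_nhot;
  }
  static int hotColumn(sqlite3_vtab_cursor *cur, sqlite3_context *ctx,
                       int col){
    HotRec *r = &g_hot[((HotCur*)cur)->i];
    sqlite3_result_int64(ctx, col==0 ? r->ino : r->mtime);
    return SQLITE_OK;
  }
  static int hotRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid){
    *pRowid = ((HotCur*)cur)->i; return SQLITE_OK;
  }

  static sqlite3_module hotModule = {
    0,                            /* iVersion                            */
    hotConnect, hotConnect,       /* xCreate, xConnect                   */
    hotBestIndex, hotDisconnect, hotDisconnect,
    hotOpen, hotClose, hotFilter, hotNext, hotEof,
    hotColumn, hotRowid,
    0, 0, 0, 0, 0, 0, 0           /* xUpdate..xRename: read-only table   */
  };
  /* Register once per connection, then query it like any table:         */
  /*   sqlite3_create_module(db, "hotlru", &hotModule, 0);               */
  /*   CREATE VIRTUAL TABLE temp.hotfiles USING hotlru;                  */
  /*   SELECT ino FROM hotfiles WHERE mtime > strftime('%s','now')-3600; */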

To me, based on your description, your choice isn't how best to use
SQLite in line with I/O, but how best to capture the data such that
they can be aggregated with SQLite at time of update.  That choice is
one of two: 1) capture each I/O event in a sequential file, always
appending, or 2) maintain per-file counts in a hash or map.  Which is
better depends on how much you're willing to pay for each I/O.  By
paying the lookup cost of #2 each time, the total space is smaller and
the maintenance-time computation less.  

> 3) Now that we would have Sqlite3 (with WAL) directly fed by the
> IO path (in the absence of changelog) we are looking to get the best
> performance from it. 

Metadata updates to Posix filesystems are seen as so costly that 
fsync(2) on the datafile descriptor doesn't update them.  A separate
sync on the directory is required.  Compared to an in-memory update
(of metadata, in kernel space) and a single fsync call, the price of a
SQLite transaction is enormous, at a guess an order of magnitude more.
Bear in mind that WAL buys you not efficiency but consistency, the very
thing you don't really need.  The data are written sequentially to the
log and then inserted into the table.  You can expect no better than
O(n log n) performance.  Filesystems generally would never tolerate
that, but for your application you'd be the judge. 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-23 Thread Joseph Fernandes
Ok, I get it now. Yes, we are using a single db connection object, but a few 
questions: 
1) How would making sqlite3 single-threaded improve the performance? Is 
there a special advantage to this mode over the multithread one?
2) Will it not block (wait or give SQLITE_BUSY) the multiple threads of the 
app (in our case glusterfs IO threads),
as now they have to wait for the single thread to complete the task?

~Joe


- Original Message -
From: "Eduardo Morras" <emorr...@yahoo.es>
To: sqlite-users@sqlite.org
Sent: Monday, November 24, 2014 12:29:33 AM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Sat, 22 Nov 2014 11:01:45 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> Thanks Eduardo.
> 
> Answers inline JOE>>
> 

> a)If you use Sqlite in single thread, compile it without thread
> support.
> JOE>> We are running it in multithread mode, as the database will be
> JOE>> fed by multiple File IO thread. 

Yes and no: your app can be multithreaded, but if only one thread holds the 
sqlite3 pointer, you can use single-thread sqlite3. This way you have multiple 
working threads and one dedicated sqlite3 thread that receives the data to 
write and the data queries.

---   ---
Eduardo Morras <emorr...@yahoo.es>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-23 Thread Eduardo Morras
On Sat, 22 Nov 2014 11:01:45 -0500 (EST)
Joseph Fernandes  wrote:

> Thanks Eduardo.
> 
> Answers inline JOE>>
> 

> a)If you use Sqlite in single thread, compile it without thread
> support.
> JOE>> We are running it in multithread mode, as the database will be
> JOE>> fed by multiple File IO thread. 

Yes and no: your app can be multithreaded, but if only one thread holds the 
sqlite3 pointer, you can use single-thread sqlite3. This way you have multiple 
working threads and one dedicated sqlite3 thread that receives the data to 
write and the data queries.

---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread Joseph Fernandes
Thanks James

Please find the answers inline JOE>>

- Original Message -
From: "James K. Lowden" <jklow...@schemamania.org>
To: sqlite-users@sqlite.org
Sent: Saturday, November 22, 2014 10:29:01 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t file IO) and that has good data querying facilities (slight
> latency in the read is fine but the freshness of that record data
> should be spot on).

This strikes me as a classic case for "record, then analyze".  I would
capture the data more cheaply and use SQLite to decide what to do.  

You don't really care if the recorded times are exactly right; a few
missed updates wouldn't affect the cold/hot status very much.  You
should be willing to lose a few if you improve write latency.  OTOH
the maintenance operation isn't *very* time critical; you just can't
afford to walk the whole tree first.  

JOE>> Agree! We can afford to miss some updates w.r.t subsequent IO, i.e.
if a file is being written with data, we don't update the db for every
write that comes in. We have a timer- and counter-based approach that decides
when the db should be updated for data IO. For metadata IO we don't want
to miss any updates, as we also keep a record of the hardlinks of an inode in 
the database (a gluster internal requirement for the data maintenance 
scanners). 
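Roughly, the gate looks like this (a simplified sketch; the thresholds and 
names here are made up, not our real tunables):

  /* Sketch: decide whether this IO should trigger a db update.          */
  #include <time.h>

  typedef struct InodeStat {
    int    writes_since_flush;   /* counter part of the gate             */
    time_t last_flush;           /* timer part of the gate               */
  } InodeStat;

  #define MAX_WRITES  64         /* flush after this many writes...      */
  #define MAX_AGE_SEC 5          /* ...or when the record is this stale  */

  int should_update_db(InodeStat *s){
    time_t now = time(0);
    s->writes_since_flush++;
    if( s->writes_since_flush >= MAX_WRITES
     || now - s->last_flush >= MAX_AGE_SEC ){
      s->writes_since_flush = 0;
      s->last_flush = now;
      return 1;                  /* caller enqueues one db update        */
    }
    return 0;                    /* coalesce: skip this one              */
  }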

That suggests two possibilities for capture: 

1.  Keep a sequential file of {name,time} or {inode,time} pairs
(whichever is more convenient to use).  By using O_APPEND you get
atomic writes and perfect captures across threads.  fsync(2) as
desired.  

2.  If in practice the above file grows too large, use a primitive
hashing store such as BerkeleyDB to capture counts by name/inode.  It's
not even obvious you need an external store; you might be able to get
away with std::hash_map in C++ and periodically serialize that.  ISTM
you don't need to worry about concurrency because a few missed updates
here and there won't change much.  

At maintenance time, scoop the file into a SQLite table, and you're
back where you started, except you already have zero
write-time latency.  

JOE>>
1) Well, we already have a journal log called "changelog" that does that for
us (records each modification as a log entry), and we have plans to use it to
feed the db. The major challenge with changelog is that it's built to be
crash consistent, as it's used by the geo-replication feature in glusterfs,
i.e. it must not miss any update, since it needs to relay each one later to
the geo-replica; this brings performance issues into the IO path.
Compared with changelog, sqlite3 (with WAL {Write-ahead logging}, which is
similar to the logging you suggest, but provided by sqlite itself) shows
better performance in the IO path.
There will be setups where geo-rep is not required, and hence we need not
enable the changelog, so we are planning to provide two options here:
  a) Feed the DB (with WAL) directly via the IO path (if geo-rep is off)
  b) Feed the DB via changelog (if geo-rep is on), because we don't want two
 kinds of latency hitting the IO path, one from the DB update and another
 from changelog.
2) Using the changelog to feed the db has another issue, i.e. freshness of
data in the DB w.r.t the IO. A few of our data maintenance scanners would
require the freshness of the feed to be close to real time. For that we are
planning to use an in-memory view data structure (like an LRU) that gets
updated through the IO path (in parallel to changelog, which is updated
independently), and then a separate, frequently scheduled notification thread
that would update the DB (the notifier is not blocked while broadcasting
updates). Your thoughts on this. 
3) Now that we would have Sqlite3 (with WAL) fed directly by the IO path (in
the absence of changelog), we are looking to get the best performance from
it. Crash consistency wouldn't be a major requirement for now, but
performance and freshness of data in the DB should be spot on. 


HTH.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread James K. Lowden
On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes  wrote:

> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t file IO) and that has good data querying facilities (slight
> latency in the read is fine but the freshness of that record data
> should be spot on).

This strikes me as a classic case for "record, then analyze".  I would
capture the data more cheaply and use SQLite to decide what to do.  

You don't really care if the recorded times are exactly right; a few
missed updates wouldn't affect the cold/hot status very much.  You
should be willing to lose a few if you improve write latency.  OTOH
the maintenance operation isn't *very* time critical; you just can't
afford to walk the whole tree first.  

That suggests two possibilities for capture: 

1.  Keep a sequential file of {name,time} or {inode,time} pairs
(whichever is more convenient to use).  By using O_APPEND you get
atomic writes and perfect captures across threads.  fsync(2) as
desired.  

2.  If in practice the above file grows too large, use a primitive
hashing store such as BerkeleyDB to capture counts by name/inode.  It's
not even obvious you need an external store; you might be able to get
away with std::hash_map in C++ and periodically serialize that.  ISTM
you don't need to worry about concurrency because a few missed updates
here and there won't change much.  
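
Option 1, for instance, is only a few lines (a sketch, untested; the record 
layout is invented):

  /* Fixed-size {inode, time} records appended to a capture file. With   */
  /* O_APPEND, each write() of a record this small lands at the end      */
  /* atomically, so concurrent IO threads need no locking here.          */
  #include <fcntl.h>
  #include <unistd.h>
  #include <stdint.h>
  #include <time.h>

  struct cap_rec { uint64_t ino; int64_t mtime; };

  int open_capture(const char *path){
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
  }

  int capture(int fd, uint64_t ino){
    struct cap_rec r = { ino, (int64_t)time(0) };
    /* One atomic append per IO event; fsync(fd) as desired.             */
    return write(fd, &r, sizeof r) == sizeof r ? 0 : -1;
  }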

At maintenance time, scoop the file into a SQLite table, and you're
back where you started, except you already have zero
write-time latency.  
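
And the maintenance-time scoop is a single transaction around one prepared 
INSERT (again a sketch; the table name and schema are assumptions):

  #include <sqlite3.h>
  #include <stdint.h>
  #include <stdio.h>

  int load_capture(sqlite3 *db, const char *path){
    /* Layout must match the struct written by the capture side above.   */
    struct { uint64_t ino; int64_t mtime; } r;
    FILE *f = fopen(path, "rb");
    if( !f ) return -1;
    sqlite3_stmt *st;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO hot(ino, mtime) VALUES(?1, ?2)", -1, &st, 0);
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    while( fread(&r, sizeof r, 1, f)==1 ){
      sqlite3_bind_int64(st, 1, (sqlite3_int64)r.ino);
      sqlite3_bind_int64(st, 2, (sqlite3_int64)r.mtime);
      sqlite3_step(st);
      sqlite3_reset(st);
    }
    sqlite3_exec(db, "COMMIT", 0, 0, 0);  /* one commit for the batch    */
    sqlite3_finalize(st);
    fclose(f);
    return 0;
  }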

HTH.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread Joseph Fernandes
Thanks Eduardo.

Answers inline JOE>>

- Original Message -
From: "Eduardo Morras" <emorr...@yahoo.es>
To: sqlite-users@sqlite.org
Sent: Saturday, November 22, 2014 9:20:46 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes <josfe...@redhat.com> wrote:

> Hi There,
> 
> 1) We are trying to use sqlite3 in Glusterfs as a Change Time
> Recording Data Store, i.e. using a sqlite3 db to record any modification
> attempt that happens on the glusterfs per-file inode, so that it can
> be used to indicate the hotness of the file in the cluster.
> 
> 2) We have developed a prototype that inserts/updates inode records
> in a sqlite3 db (one instance of the db file per glusterfs storage unit,
> or as we call it a brick) to record the modifications that are
> happening on the inode. These inserts/updates are in the IO path of
> the file, i.e. any data or metadata change time will be recorded inline,
> w.r.t the IO, in the DB.
> 
> 3) We only save the modification time to the db, so that 
> we can easily query for "What files have changed during a specific
> period of time" or "What files have not changed during a specific
> period of time". The query is done by a scanner that runs on each
> storage unit or brick at a regular interval and chooses HOT or COLD
> files as suggested by the DB for data maintenance operations.
> 
> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t file IO) and that has good data querying facilities (slight
> latency in the read is fine but the freshness of that record data
> should be spot on).
> 
> 5) Please find our DB settings,
> Journal mode : WAL
> SYNC MODE : NORMAL 
> Cache Size : 1000 - 4096 Pages (Default Page Size)
> AutoCheck-Pointing: 1000 - 100 Pages 
> 
> We wanted to know the following
> 1) How could we improve the performance on the write side so that we
> have minimal latency?

I'll wait for the ota extension Mr. Hipp is developing. It will (If I 
Understood Correctly) reorder the commit operations to get faster IO 
throughput.

Some tricks:
a) If you use Sqlite in a single thread, compile it without thread support.
JOE>> We are running it in multithread mode, as the database will be fed by 
multiple File IO threads. 
b) Adjust the page size to fit the hard disk cluster size (8KB IIRC), or a 
multiple of it that lets an integer number of rows fit in a page.
c) Disable autovacuum
JOE>> Sure, will try these out in my tests



> 2) Will there be any write performance hit when the number of records
> in the DB increases?
> 
> Thanks in advance
> 
> ~Joe


---   ---
Eduardo Morras <emorr...@yahoo.es>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread Eduardo Morras
On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes  wrote:

> Hi There,
> 
> 1) We are trying to use sqlite3 in Glusterfs as a Change Time
> Recording Data Store, i.e. using a sqlite3 db to record any modification
> attempt that happens on the glusterfs per-file inode, so that it can
> be used to indicate the hotness of the file in the cluster.
> 
> 2) We have developed a prototype that inserts/updates inode records
> in a sqlite3 db (one instance of the db file per glusterfs storage unit,
> or as we call it a brick) to record the modifications that are
> happening on the inode. These inserts/updates are in the IO path of
> the file, i.e. any data or metadata change time will be recorded inline,
> w.r.t the IO, in the DB.
> 
> 3) We only save the modification time to the db, so that 
> we can easily query for "What files have changed during a specific
> period of time" or "What files have not changed during a specific
> period of time". The query is done by a scanner that runs on each
> storage unit or brick at a regular interval and chooses HOT or COLD
> files as suggested by the DB for data maintenance operations.
> 
> 4) Therefore, we are looking at a datastore that can give us a very
> quick write (almost zero latency, as the recording is done inline
> w.r.t file IO) and that has good data querying facilities (slight
> latency in the read is fine but the freshness of that record data
> should be spot on).
> 
> 5) Please find our DB settings,
> Journal mode : WAL
> SYNC MODE : NORMAL 
> Cache Size : 1000 - 4096 Pages (Default Page Size)
> AutoCheck-Pointing: 1000 - 100 Pages 
> 
> We wanted to know the following
> 1) How could we improve the performance on the write side so that we
> have minimal latency?

I'll wait for the ota extension Mr. Hipp is developing. It will (If I 
Understood Correctly) reorder the commit operations to get faster IO 
throughput.

Some tricks:
a) If you use Sqlite in a single thread, compile it without thread support.
b) Adjust the page size to fit the hard disk cluster size (8KB IIRC), or a 
multiple of it that lets an integer number of rows fit in a page.
c) Disable autovacuum
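
A sketch of (b) and (c) as pragmas (8 KB page assumed per the IIRC above; 
both must be set before the db has content to take effect):

  #include <sqlite3.h>

  void apply_tricks(sqlite3 *db){
    /* (b) 8 KB pages to match the assumed disk cluster size; effective  */
    /*     for a new db (or after VACUUM).                               */
    sqlite3_exec(db, "PRAGMA page_size=8192;", 0, 0, 0);
    /* (c) no automatic vacuuming; set before any table is created.      */
    sqlite3_exec(db, "PRAGMA auto_vacuum=NONE;", 0, 0, 0);
  }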



> 2) Will there be any write performance hit when the number of records
> in the DB increases?
> 
> Thanks in advance
> 
> ~Joe


---   ---
Eduardo Morras 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread Joseph Fernandes
Like any filesystem, glusterfs has atime, mtime and ctime. But we are trying 
to optimize the search for files that are "HOT" or "COLD" using sqlite3, as a 
filesystem walk would kill the scanners (which are doing data maintenance).

~Joe

- Original Message -
From: "Keith Medcalf" <kmedc...@dessus.com>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Saturday, November 22, 2014 9:02:30 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
Glusterfs


Your Glusterfs does not store modification times?  Have you considered adding 
the modification time attribute to the inode directly?

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.

>-Original Message-
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Joseph Fernandes
>Sent: Friday, 21 November, 2014 12:02
>To: sqlite-users@sqlite.org
>Subject: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
>Glusterfs
>
>Hi There,
>
>1) We are trying to use sqlite3 in Glusterfs as a Change Time Recording
>Data Store,
>i.e. using a sqlite3 db to record any modification attempt that happens on
>the glusterfs
>per-file inode, so that it can be used to indicate the hotness of the
>file in the cluster.
>
>2) We have developed a prototype that inserts/updates inode records in a
>sqlite3 db (one instance of the
>db file per glusterfs storage unit, or as we call it a brick) to record
>the modifications
>that are happening on the inode. These inserts/updates are in the IO path
>of the file, i.e. any data or
>metadata change time will be recorded inline, w.r.t the IO, in the DB.
>
>3) We only save the modification time to the db, so that
>we can easily query for "What files have changed during a specific period
>of time" or
>"What files have not changed during a specific period of time". The query
>is done by a scanner
>that runs on each storage unit or brick at a regular interval and
>chooses HOT or COLD files as suggested
>by the DB for data maintenance operations.
>
>4) Therefore, we are looking at a datastore that can give us a very quick
>write (almost zero latency,
>as the recording is done inline w.r.t file IO) and that has good data
>querying
>facilities (slight latency in the read is fine but the freshness of that
>record data should be spot on).
>
>5) Please find our DB settings,
>Journal mode : WAL
>SYNC MODE : NORMAL
>Cache Size : 1000 - 4096 Pages (Default Page Size)
>AutoCheck-Pointing: 1000 - 100 Pages
>
>We wanted to know the following
>1) How could we improve the performance on the write side so that we have
>minimal latency?
>2) Will there be any write performance hit when the number of records in
>the DB increases?
>
>Thanks in advance
>
>~Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread Keith Medcalf

Your Glusterfs does not store modification times?  Have you considered adding 
the modification time attribute to the inode directly?
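
For instance, on Linux it could live in a user xattr on the brick's backing 
file (a sketch; the attribute name is invented, and glusterfs already keeps 
xattrs of its own):

  #include <sys/xattr.h>
  #include <stdint.h>
  #include <time.h>

  /* Stamp a modification time directly on the file as an xattr.         */
  int stamp_mtime(const char *path){
    int64_t now = (int64_t)time(0);
    return setxattr(path, "user.glusterfs.hot.mtime", &now, sizeof now, 0);
  }

  int read_stamp(const char *path, int64_t *out){
    return getxattr(path, "user.glusterfs.hot.mtime", out, sizeof *out) < 0
           ? -1 : 0;
  }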

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.

>-Original Message-
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Joseph Fernandes
>Sent: Friday, 21 November, 2014 12:02
>To: sqlite-users@sqlite.org
>Subject: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
>Glusterfs
>
>Hi There,
>
>1) We are trying to use sqlite3 in Glusterfs as a Change Time Recording
>Data Store,
>i.e. using a sqlite3 db to record any modification attempt that happens on
>the glusterfs
>per-file inode, so that it can be used to indicate the hotness of the
>file in the cluster.
>
>2) We have developed a prototype that inserts/updates inode records in a
>sqlite3 db (one instance of the
>db file per glusterfs storage unit, or as we call it a brick) to record
>the modifications
>that are happening on the inode. These inserts/updates are in the IO path
>of the file, i.e. any data or
>metadata change time will be recorded inline, w.r.t the IO, in the DB.
>
>3) We only save the modification time to the db, so that
>we can easily query for "What files have changed during a specific period
>of time" or
>"What files have not changed during a specific period of time". The query
>is done by a scanner
>that runs on each storage unit or brick at a regular interval and
>chooses HOT or COLD files as suggested
>by the DB for data maintenance operations.
>
>4) Therefore, we are looking at a datastore that can give us a very quick
>write (almost zero latency,
>as the recording is done inline w.r.t file IO) and that has good data
>querying
>facilities (slight latency in the read is fine but the freshness of that
>record data should be spot on).
>
>5) Please find our DB settings,
>Journal mode : WAL
>SYNC MODE : NORMAL
>Cache Size : 1000 - 4096 Pages (Default Page Size)
>AutoCheck-Pointing: 1000 - 100 Pages
>
>We wanted to know the following
>1) How could we improve the performance on the write side so that we have
>minimal latency?
>2) Will there be any write performance hit when the number of records in
>the DB increases?
>
>Thanks in advance
>
>~Joe
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users