Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Simon Slavin

On 25 Nov 2014, at 1:34am, Joseph Fernandes  wrote:

> a. Does WAL during an insert/update in the log file do any internal
> search/sort and then insert/update to the log, or
> b. Just appends the WAL log with the incoming insert/update entry, thus
> keeping the writes sequential, and during a checkpoint (manual or
> automatic) does the merging with the actual tables, which as you pointed
> out will have to search/sort, causing a lot of random disk access?

An INSERT in WAL mode does far more than just add something to the end of the
journal file.  It performs numerous operations including, for example,
allocating pages and updating the primary index.  These will slow operations
down quite a bit.

I'm with some other posters on this: when logging modifications, don't use
SQLite; just append to a text/octet file in a simple short format.  Only when
it comes time to do your maintenance should you read that data from the text
files into SQLite.
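
A minimal sketch of that approach, assuming a fixed-size binary record and
the SQLite C API (the table and record layout here are illustrative, not
anything Glusterfs actually uses):

#include <stdio.h>
#include <stdint.h>
#include <sqlite3.h>

struct change_rec {
    uint64_t inode;     /* which file changed */
    uint64_t mtime;     /* when it changed */
};

/* Hot path: append one fixed-size record; purely sequential I/O. */
int log_change(FILE *log, uint64_t inode, uint64_t mtime) {
    struct change_rec r = { inode, mtime };
    return fwrite(&r, sizeof r, 1, log) == 1 ? 0 : -1;
}

/* Maintenance path: bulk-load the log into SQLite in one transaction. */
int load_log(sqlite3 *db, FILE *log) {
    sqlite3_stmt *ins;
    struct change_rec r;
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO changes(inode, mtime) VALUES (?1, ?2)",
        -1, &ins, 0);
    rewind(log);
    while (fread(&r, sizeof r, 1, log) == 1) {
        sqlite3_bind_int64(ins, 1, (sqlite3_int64)r.inode);
        sqlite3_bind_int64(ins, 2, (sqlite3_int64)r.mtime);
        sqlite3_step(ins);
        sqlite3_reset(ins);
    }
    sqlite3_finalize(ins);
    return sqlite3_exec(db, "COMMIT", 0, 0, 0);
}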

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes
Again asking the same question:
 a. Does WAL during an insert/update in the log file do any internal
search/sort and then insert/update to the log, or
 b. Just appends the WAL log with the incoming insert/update entry, thus
keeping the writes sequential, and during a checkpoint (manual or automatic)
does the merging with the actual tables, which as you pointed out will have
to search/sort, causing a lot of random disk access?
 c. Also, we don't read the db in the file IO path.  The reads are for
scheduled data-maintenance scanners, thus making random disk reads an
occasional event.
 Just trying to assess if my understanding of sqlite WAL is correct.

~Joe 

- Original Message -
From: "Valentin Davydov" 
To: "General Discussion of SQLite Database" 
Sent: Monday, November 24, 2014 4:37:12 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
Glusterfs

On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote:

> We wanted to know the following:
> 1) How could we improve the performance on the write side so that we have
> minimal latency?
> 2) Will there be any write performance hit when the number of records in
> the DB increases?

Generally speaking, you have to do some work to arrange your data (modification
times) in some ordered way.  This work can be done either in advance, as you
suggest, or on demand, as some people have already told you.  But either way it
will eat up resources, regardless of whose code does it, yours or SQLite's.  In
practice (given magnetic disks as the underlying storage), the scarcest of
these resources is rotational/seek latency, which detrimentally affects all
disk operations regardless of their scheduling priority.  SQLite performs
extensive random disk access (mostly reads) in most operational scenarios -
selects, inserts, indexing etc. - with the possible exception of small updates
of non-indexed data (which are accessed in a similar fashion by later selects).
The only way to cope with a slow disk is to keep all the necessary data
somewhere else, for example in a RAM cache.  Of course, the cache itself must
be populated in advance to give this benefit, and, given current RAM prices,
it seems not very feasible to steal available memory from smart applications
in favour of a dumb cache.

Hope these considerations will help you in tuning your code.

Valentin Davydov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes


- Original Message -
From: "Nico Williams" 
To: "General Discussion of SQLite Database" 
Sent: Monday, November 24, 2014 4:35:59 PM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in
Glusterfs

On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden
 wrote:
> Metadata updates to Posix filesystems are seen as so costly that
> fsync(2) on the datafile descriptor doesn't update them.  A separate
> sync on the directory is required.  Compared to an in-memory update
> (of metadata, in kernel space) and a single fsync call, the price of a
> SQLite transaction is enormous, at a guess an order of magnitude more.

This.  Updates of mtime and atime in particular are expensive.

Another problem (for Lustre-style clusters) is stat(), since it
returns information that only a metadata service might know (e.g.,
file type, file ID (st_dev and st_ino), and link count) but also
information that it might not (file size, various timestamps), which
then presents enormous headaches for implementors.  There are also
write visibility rules as to stat() that some applications depend
on...  This is why "ls -l" can be slow on some such clusters.

JOE>>
Glusterfs is not replacing POSIX atime, mtime and ctime with this db (Sqlite
is not a metadata db), i.e. stat will always read the actual
atime/mtime/ctime of the inode.  Therefore, as mentioned in [1], we don't do
any db read operations in the file IO path.  The read consumers of the db are
scheduled data-maintenance scanners, which want a smarter way to find the
"HOT" and "COLD" files in the cluster.  In fact there will be one db per
storage unit (brick) of glusterfs, therefore no centralized db.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Joseph Fernandes


- Original Message -
From: "James K. Lowden" 
To: sqlite-users@sqlite.org
Sent: Monday, November 24, 2014 3:56:14 AM
Subject: Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in 
Glusterfs

On Sat, 22 Nov 2014 23:25:16 -0500 (EST)
Joseph Fernandes  wrote:

> 2) Using the changelog to feed the db has another issue, i.e. freshness
> of data in the DB w.r.t. the IO.  A few of our data-maintainer scanners
> would require the freshness of the feed to be close to real time. [...]
> Your thoughts on this.

If your in-memory LRU structure suffices to describe all "hot" files,
you're in good shape.  Rather than dumping periodically, I would
consider placing it in shared memory and writing a virtual table
module for it in SQLite, such that it can be queried directly as
needed.
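
A minimal sketch of that idea, exposing an in-memory array as a read-only
virtual table (the shared-memory plumbing is omitted and every name here is
illustrative, not an existing module):

#include <string.h>
#include <sqlite3.h>

/* Rows exposed to SQL: (path TEXT, hits INTEGER).  In real use this
   array would live in the shared-memory segment holding the LRU. */
typedef struct { const char *path; int hits; } HotRec;
static HotRec g_recs[] = { { "/brick/a", 12 }, { "/brick/b", 3 } };
#define NREC (int)(sizeof(g_recs) / sizeof(g_recs[0]))

typedef struct { sqlite3_vtab base; } HotTab;
typedef struct { sqlite3_vtab_cursor base; int row; } HotCur;

static int hotConnect(sqlite3 *db, void *aux, int argc,
                      const char *const *argv, sqlite3_vtab **pp, char **err) {
    int rc = sqlite3_declare_vtab(db, "CREATE TABLE x(path TEXT, hits INTEGER)");
    if (rc != SQLITE_OK) return rc;
    *pp = sqlite3_malloc(sizeof(HotTab));
    if (*pp == 0) return SQLITE_NOMEM;
    memset(*pp, 0, sizeof(HotTab));
    return SQLITE_OK;
}
static int hotDisconnect(sqlite3_vtab *p) { sqlite3_free(p); return SQLITE_OK; }
static int hotBestIndex(sqlite3_vtab *p, sqlite3_index_info *info) {
    return SQLITE_OK;               /* a full scan is fine for a scanner */
}
static int hotOpen(sqlite3_vtab *p, sqlite3_vtab_cursor **pp) {
    HotCur *c = sqlite3_malloc(sizeof(HotCur));
    if (c == 0) return SQLITE_NOMEM;
    memset(c, 0, sizeof(HotCur));
    *pp = &c->base;
    return SQLITE_OK;
}
static int hotClose(sqlite3_vtab_cursor *cur) { sqlite3_free(cur); return SQLITE_OK; }
static int hotFilter(sqlite3_vtab_cursor *cur, int idxNum, const char *idxStr,
                     int argc, sqlite3_value **argv) {
    ((HotCur *)cur)->row = 0;       /* rewind to the first row */
    return SQLITE_OK;
}
static int hotNext(sqlite3_vtab_cursor *cur) { ((HotCur *)cur)->row++; return SQLITE_OK; }
static int hotEof(sqlite3_vtab_cursor *cur) { return ((HotCur *)cur)->row >= NREC; }
static int hotColumn(sqlite3_vtab_cursor *cur, sqlite3_context *ctx, int i) {
    HotRec *r = &g_recs[((HotCur *)cur)->row];
    if (i == 0) sqlite3_result_text(ctx, r->path, -1, SQLITE_STATIC);
    else        sqlite3_result_int(ctx, r->hits);
    return SQLITE_OK;
}
static int hotRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid) {
    *pRowid = ((HotCur *)cur)->row;
    return SQLITE_OK;
}
static sqlite3_module hotModule = {
    0, hotConnect, hotConnect, hotBestIndex, hotDisconnect, hotDisconnect,
    hotOpen, hotClose, hotFilter, hotNext, hotEof, hotColumn, hotRowid,
    0, 0, 0, 0, 0, 0, 0
};

After sqlite3_create_module(db, "hotfiles", &hotModule, 0) and
CREATE VIRTUAL TABLE hot USING hotfiles; the scanners could run e.g.
SELECT path FROM hot WHERE hits > 100; against live data, with no dump step
in between.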

To me, based on your description, your choice isn't how best to use
SQLite in line with I/O, but how best to capture the data such that
they can be aggregated with SQLite at time of update.  That choice is
one of two: 1) capture each I/O event in a sequential file, always
appending, or 2) maintain per-file counts in a hash or map (see the
sketch below).  Which is better depends on how much you're willing to
pay for each I/O.  By paying the lookup cost of #2 each time, the total
space is smaller and the maintenance-time computation less.
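
A sketch of option #2, assuming per-file counters keyed by inode in a small
open-addressing hash (fixed size, no eviction, purely illustrative):

#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4096                       /* power of two */

struct counter { uint64_t inode; uint64_t hits; };
static struct counter table[NBUCKETS];

/* Called once per I/O event: one multiply, a probe or two, one add. */
void count_io(uint64_t inode) {
    size_t i = (size_t)(inode * 0x9E3779B97F4A7C15ULL) & (NBUCKETS - 1);
    while (table[i].inode != 0 && table[i].inode != inode)
        i = (i + 1) & (NBUCKETS - 1);       /* linear probing */
    table[i].inode = inode;
    table[i].hits++;
}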

> 3) Now that we would use Sqlite3 (with WAL) to be directly fed by the
> IO path (in the absence of the changelog), we are looking to get the
> best performance from it.

Metadata updates to Posix filesystems are seen as so costly that 
fsync(2) on the datafile descriptor doesn't update them.  A separate
sync on the directory is required.  Compared to an in-memory update
(of metadata, in kernel space) and a single fsync call, the price of a
SQLite transaction is enormous, at a guess an order of magnitude more.
Bear in mind that WAL buys you not efficiency but consistency, the very
thing you don't really need.  The data are written sequentially to the
log and then inserted into the table.  You can expect no better than
O(n log n) performance.  Filesystems generally would never tolerate
that, but for your application you'd be the judge. 
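
To make the directory-sync point concrete, a sketch of what a durable file
creation looks like on a Posix filesystem (names illustrative):

#define _GNU_SOURCE                 /* for O_DIRECTORY on older systems */
#include <fcntl.h>
#include <unistd.h>

int durable_create(const char *dirpath, const char *filepath) {
    int fd = open(filepath, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    /* ... write data ... */
    fsync(fd);                      /* flushes the file itself */
    close(fd);

    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);            /* flushes the new directory entry */
    close(dfd);
    return rc;
}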

>> Ok, a few questions on the WAL journal mechanism (see the sketch after
these questions):
1) As far as I understand (I may be wrong), during an insert or update WAL
just records it sequentially in the WAL file, and during a checkpoint
(manual or automatic) it
 a. flushes the in-memory appends to the WAL file,
 b. and then merges them with the actual tables.
 Checkpointing therefore takes a toll on performance.  But if I don't
 checkpoint often, i.e. set the auto-checkpoint to say 1 GB or 2 GB, the
 implications of doing so would be:
 a. Reads will be slow, because now I will have a large amount of data in
the log and it will take time to read and collate data from the log.  We
are fine with this in our use case, as we DON'T read (select queries) from
the database in the file IO path ever.  Those reads are done by scheduled
data-maintenance scanners.  When tested with 1 million records, joining
two tables, and letting the WAL log file grow to 10 GB, it takes almost
1 min to retrieve 1 million records, which is fine for the data scanners,
as they run only every few hours.
 b. We will occupy huge space for the WAL log file.

 Here is the question: does WAL, during an insert/update, do any internal
search/sort in the log file and then insert/update the log, or does it
just append the incoming insert/update entry to the WAL log?

2) Glusterfs is not replacing POSIX atime, mtime and ctime with this db
(Sqlite is not a metadata db), i.e. stat will always read the actual
atime/mtime/ctime of the inode.  Therefore, as mentioned in [1], we don't
do any db read operations in the file IO path.
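
For reference, a minimal sketch of the checkpoint tuning described in 1),
using the SQLite C API (the threshold value is illustrative):

#include <sqlite3.h>

/* Run once at startup: WAL mode plus a high auto-checkpoint threshold,
   so the hot path mostly just appends to the -wal file. */
int setup_wal(sqlite3 *db) {
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
    /* 250000 pages; at a 4 KiB page size that is roughly 1 GB of WAL */
    sqlite3_exec(db, "PRAGMA wal_autocheckpoint=250000;", 0, 0, 0);
    return SQLITE_OK;
}

/* Called from the scheduled scanner, off the file-IO path. */
int maintenance_checkpoint(sqlite3 *db) {
    return sqlite3_wal_checkpoint_v2(db, 0, SQLITE_CHECKPOINT_PASSIVE, 0, 0);
}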

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Missing function in NativeDB.c : Java_org_sqlite_NativeDB_result_1error()

2014-11-24 Thread pelz
Hi Everyone!

I wanted to use user-defined functions in Java with sqlite-jdbc-3.8.7
(on Ubuntu Linux 14.04 LTS, 64-bit).
I wanted to check the number of arguments in my custom function, and I
thought I would send a message to the user when they pass too many
arguments to the function.
I wanted to use the "error()" function for this:
http://priede.bf.lu.lv/ftp/pub/DatuBazes/SQLite/SqliteJDBC/api/org/sqlite/Function.html#error%28java.lang.String%29

I call this function in my Java program when the user passes an
incorrect number of arguments.
But I got the following exception:


java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database
(java.lang.UnsatisfiedLinkError:
org.sqlite.core.NativeDB.result_error(JLjava/lang/String;)V)
I checked NativeDB.java and found the "result_error()" function:
https://bitbucket.org/xerial/sqlite-jdbc/src/cb3185b148726e00530779014e69d1475697084c/src/main/java/org/sqlite/NativeDB.java?at=default#cl-299

But I did not find any implementation of "result_error()" in the NativeDB.c
file.  I think there has to be a "Java_org_sqlite_NativeDB_result_1error()"
function, but there isn't any:
https://bitbucket.org/xerial/sqlite-jdbc/src/cb3185b148726e00530779014e69d1475697084c/src/main/java/org/sqlite/NativeDB.c?at=default

Is it a bug?
Or am I using the "error()" function in the wrong way?


Here are the two Java files that I wrote for testing the user-defined
functions:

// UdfTester
import java.sql.*;

public class UdfTester {

    static String dbPath = "/home/user/mypath/test.sqlite";

    public static void main(String args[]) {

        String sql = "SELECT * "
                + ",jReverseString(name) as rev "
                + "FROM names "
                + "WHERE "
                + "jLikeRegExp(name,'.*i.*a.*',3) "
                + ";"
                + "";
        try {
            SqliteConn db = new SqliteConn(dbPath, false);
            if (db.isConnected()) {
                System.out.println("Opened database successfully");
                ResultSet rs = db.getStatement().executeQuery(sql);
                while (rs.next()) {
                    int id = rs.getInt("id");
                    String name = rs.getString("name");
                    String rev = rs.getString("rev");
                    System.out.print("ID= " + id);
                    System.out.print("\tNAME= " + name);
                    System.out.print("\tREV= " + rev);
                    System.out.println();
                }
                rs.close();
                db.close();
                System.out.println("Operation done successfully");
            } else {
                System.err.println("The database could not open!");
            }
        } catch (Exception ex) {
            System.err.println("Exception in "
                    + "UdfTester.main() "
                    + "using query: " + sql + "\n"
                    + ex.getClass().getName() + ": " + ex.getMessage()
            );
            System.exit(0);
        }
    }
}


// SqliteConn
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.sqlite.Function;

public class SqliteConn {

    private Connection conn = null;
    private Statement stmt = null;
    private String[] funcNames = null;
    private int funcIndex;

    public SqliteConn(String dbPath, boolean isAutoCommit)
            throws SQLException, ClassNotFoundException {
        this.funcNames = new String[]{
            "jReverseString",
            "jLikeRegExp"
        };
        this.funcIndex = 0;
        try {
            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + dbPath);
            conn.setAutoCommit(isAutoCommit);
            stmt = conn.createStatement();
            addFunctions();
        } catch (SQLException ex) {
            destroyFunctions();
            stmt = null;
            conn = null;
            throw new SQLException(ex.getMessage()
                    + " In SqliteConn() constructor");
        } catch (ClassNotFoundException ex) {
            destroyFunctions();
            stmt = null;
            conn = null;
            throw new ClassNotFoundException(ex.getMessage()
                    + " In SqliteConn() constructor");
        }
    }

    public boolean isConnected() {
        return (conn != null && stmt != null);
    }

    private void addFunctions() throws SQLException {
        funcIndex = 0;

        // jReverseString(String arg0)
        Function.create(conn, funcNames[funcIndex], new Function() {
            @Override
            protected void xFunc() throws SQLException {
                int num = args();
                String ret = "";
                if (num == 1) {
                    String s = value_text(0);
                    ret = new StringBuffer(s).reverse().toString();
                }
                result(ret);
            }
        });
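        // [The original message is truncated here.  What follows is a
        // hedged reconstruction of the missing remainder: the exact
        // jLikeRegExp body is a guess, but the error() call on a bad
        // argument count is precisely what raises the
        // UnsatisfiedLinkError reported above.]
        funcIndex++;
        Function.create(conn, funcNames[funcIndex], new Function() {
            @Override
            protected void xFunc() throws SQLException {
                if (args() != 2) {
                    // This call is what fails: NativeDB.c has no
                    // Java_org_sqlite_NativeDB_result_1error()
                    error("jLikeRegExp: wrong number of arguments");
                    return;
                }
                Matcher m = Pattern.compile(value_text(1))
                        .matcher(value_text(0));
                result(m.matches() ? 1 : 0);
            }
        });
    }

    public Statement getStatement() {
        return stmt;
    }

    private void destroyFunctions() {
        if (conn == null) return;
        for (String name : funcNames) {
            try {
                Function.destroy(conn, name);   // best-effort cleanup
            } catch (SQLException ignore) {
            }
        }
    }

    public void close() throws SQLException {
        if (stmt != null) stmt.close();
        if (conn != null) conn.close();
        stmt = null;
        conn = null;
    }
}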

Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
BTW, the experience with dedup is that doing things off-line means
never catching up, while doing them online means going slow.

You might cache as much as you can in memory, then go slow when you
miss the cache...

In practice I think it's best to separate the data and metadata devices
so that you can make metadata as fast as you can, writing COW-style,
like ZFS does, and to cache in memory on clients and servers as much as
possible for all data writes until they are committed.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
My advice is to borrow from other clustered filesystems' experience.

If you want to adhere to POSIX semantics then st_mtime and st_size
visibility will be a particular headache, especially since you don't
know when it's OK to lie (i.e., which callers of stat() are using
st_mtime/st_size for synchronization).

Ideally we'd split stat() into metastat() and datastat()...

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Valentin Davydov
On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote:

> We wanted to know the following:
> 1) How could we improve the performance on the write side so that we have
> minimal latency?
> 2) Will there be any write performance hit when the number of records in
> the DB increases?

Generally speaking, you have to do some work to arrange your data (modification
times) in some ordered way.  This work can be done either in advance, as you
suggest, or on demand, as some people have already told you.  But either way it
will eat up resources, regardless of whose code does it, yours or SQLite's.  In
practice (given magnetic disks as the underlying storage), the scarcest of
these resources is rotational/seek latency, which detrimentally affects all
disk operations regardless of their scheduling priority.  SQLite performs
extensive random disk access (mostly reads) in most operational scenarios -
selects, inserts, indexing etc. - with the possible exception of small updates
of non-indexed data (which are accessed in a similar fashion by later selects).
The only way to cope with a slow disk is to keep all the necessary data
somewhere else, for example in a RAM cache.  Of course, the cache itself must
be populated in advance to give this benefit, and, given current RAM prices,
it seems not very feasible to steal available memory from smart applications
in favour of a dumb cache.
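
For instance, a minimal sketch of the RAM-cache option using SQLite's own
knobs (the sizes are illustrative):

#include <sqlite3.h>

/* Give SQLite a large page cache and, where available, memory-mapped
   I/O, so that hot pages are served from RAM instead of the disk. */
int favour_ram(sqlite3 *db) {
    /* a negative cache_size means "this many KiB": here ~256 MB */
    sqlite3_exec(db, "PRAGMA cache_size=-262144;", 0, 0, 0);
    /* map up to 256 MB of the database file into the address space */
    sqlite3_exec(db, "PRAGMA mmap_size=268435456;", 0, 0, 0);
    return SQLITE_OK;
}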

Hope these considerations will help you in tuning your code.

Valentin Davydov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden
 wrote:
> Metadata updates to Posix filesystems are seen as so costly that
> fsync(2) on the datafile descriptor doesn't update them.  A separate
> sync on the directory is required.  Compared to an in-memory update
> (of metadata, in kernel space) and a single fsync call, the price of a
> SQLite transaction is enormous, at a guess an order of magnitude more.

This.  Updates of mtime and atime in particular are expensive.

Another problem (for Lustre-style clusters) is stat(), since it
returns information that only a metadata service might know (e.g.,
file type, file ID (st_dev and st_ino), and link count) but also
information that it might not (file size, various timestamps), which
then presents enormous headaches for implementors.  There are also
write visibility rules as to stat() that some applications depend
on...  This is why "ls -l" can be slow on some such clusters.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users