Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2013-06-21 Thread LMHmedchem
I have more or less completed this project, so I thought I would post the
final script. The script is in ruby and accepts the name of an input file
and the name of the database,

ruby  loader.rb  inputFilename  databaseName

The input file uses metadata rows to identify the database table that the
column will go to, the data type, and the column name. The unique fields
iK1, iK2, iK3, iK4 are how the data for the primary table is registered for
each input record. If a record is inserted and the database already contains
a record with the same value for these 4 ints, there will be an exception.
The script will create the database if it doesn't exist, but can also be
used to insert records to an existing database. The "structure" table is
required with all 9 fields not null. The other satellite tables are
optional. Additional tables can be added by adding new columns with the
table name in the first row, the data type for the column in the second row,
and the column name in the third row. All fields in the satellite tables are
not null by default.

The script is below, but I have also attached it with a sample input file.

sample_files_sqlite_ruby_loader.zip

  

Thanks for the assistance. I am now working on a script to add/change data
in the existing records in a database, as well as a simple query script to
export data to a delimited text file.

*LMHmedchem*


# script loader.rb
require 'rubygems'
require 'sqlite3'
require 'logger'

class DataLoader

  attr_accessor :db_name, :current_row, :table_names, :field_names,
:data_types, :line_number, :db, :ready, :structure_id

  UNIQUE_FIELDS = %w(iK1 iK2 iK3 iK4)

  def initialize(db_path)
@db_name = db_path
@table_names = []
@field_names = {}
@data_types = {}
@ready = false
@log = Logger.new('sqlite_data_loader.log', 'weekly')
db_connect
  end

  def db_connect
@db = SQLite3::Database.open @db_name
# Enabling foreign_keys features
@db.execute('PRAGMA foreign_keys = ON')
@ready = true
  end

  def load_data(tsv_file)
@line_number = 1
File.open(tsv_file, 'r') do |f1|
  while line = f1.gets
@current_row = line.strip.split("\t")
next if @current_row.empty?
if @line_number <= 3
  collect_table_info
  # After processing third line, we can create tables with the
information collected.
  create_tables if @line_number == 3
else
  collect_table_data
end
@line_number += 1
  end
end
  end

  def collect_table_info
if @line_number == 1
  @current_row.uniq.each do |name|
@table_names << [name, @current_row.count(name)]
  end
else
  offset = 0
  @table_names.each do |table_name, fields_count|
@data_types[table_name] = @current_row[offset..(offset +
fields_count - 1)] if @line_number == 2
@field_names[table_name] = @current_row[offset..(offset +
fields_count - 1)] if @line_number == 3
offset += fields_count
  end
end
  end

  def create_tables
@table_names.each do |table_name, arity|
  create_table table_name
end
  end

  def create_table(table_name)
query = "CREATE TABLE IF NOT EXISTS #{table_name}(#{fields_for
table_name})"
@log.debug query
@db.execute query
unless 'structure' == table_name
  @log.debug "Creating index on #{table_name}"
  @db.execute "CREATE INDEX IF NOT EXISTS idx_#{table_name}_structure_id
ON #{table_name}(structure_id)"
end
  rescue SQLite3::Exception => e
@log.debug "Exception occured #{e.class}: #{e.message}"
  end

  def fields_for(table_name)
list = []
if 'structure' == table_name
  list << 'id INTEGER PRIMARY KEY AUTOINCREMENT'
else
  list << 'structure_id INTEGER NOT NULL'
end
fields = []
@field_names[table_name].each_with_index do |field, index|
  next if 'primary_key' == field
  next if fields.member?(field)
  list << "'#{field}' #{field_data_type(table_name, index)}"
  fields << field
end
list << add_unique_fields(table_name)
list << add_foreign_key_constraint(table_name)
list.compact.join(', ')
  end

  def field_data_type(table_name, index)
kind = @data_types[table_name][index].upcase
kind = 'TEXT' if kind.start_with?('STR')
"#{kind}#{' NOT NULL' if 'structure' == table_name}"
  end

  def add_unique_fields(table_name)
available_fields = (@field_names[table_name] & UNIQUE_FIELDS)
unless available_fields.empty?
  "UNIQUE(#{available_fields.join(', ')}) ON CONFLICT ROLLBACK"
end
  end

  def add_foreign_key_constraint(table_name)
unless 'structure' == table_name
  'FOREIGN KEY(structure_id) REFERENCES Structure(id)'
end
  end

  def insert_into_fields(table_name, values)
field_names = @field_names[table_name]
values_list, attr_names = [], []
if 'structure' == table_name
  attr_names << 'id'
  

Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-16 Thread LMHmedchem
This looks like a more or less complete solution for creating the tables and
doing inserts.

Primary table:

CREATE TABLE Structure(
   'id' INTEGER PRIMARY KEY,
   'name' TEXT NOT NULL,
   'filePath' TEXT NOT NULL,
   'iH1' INTEGER NOT NULL,
   'iH2' INTEGER NOT NULL,
   'iH3' INTEGER NOT NULL,
   'iH4' INTEGER NOT NULL,
   'formula' TEXT NOT NULL,
   'fw' FLOAT NOT NULL,
   UNIQUE(iH1, iH2, iH3, iH4)
   ON CONFLICT FAIL
)

Satellite table:

CREATE TABLE Project1(
 'Structure_id' INTEGER NOT NULL,
 'class' STRING,
 'status' STRING,
 'RI17-1' FLOAT,
 FOREIGN KEY(Strucutre_id) REFERENCES Structure(id)
) 

There are three cases for doing inserts of the data for phosphoserine, which
is distributed over both tables.

record phosphoserine, data for Structure table:
name = phosphoserine
filePath = phosphoserine.mol
iH1 = 185073
iH2 = 856147
iH3 = 73543
iH4 = 25338
formula= C3H8NO6P
fw = 185.073 

record phosphoserine, data for Project1 table:
class = C0248
status = M
RI17-1 = 15.0

these statements include some ruby pseudocode

1. Insert data to Structure table only, do not insert Project1 data

@db.execute "INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4,
formula, fw)
VALUES(null,'phosphoserine', 'phosphoserine.mol', 185073, 856147, 73543,
25338, 'C3H8NO6P', 185.073)"


2. Sequentially insert data to Structure and then Project1 

@db.execute "INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4,
formula, fw)
VALUES(null,'phosphoserine', 'phosphoserine.mol', 185073, 856147, 73543,
25338, 'C3H8NO6P', 185.073)"

# capture the row number where phosphoserine was inserted to Structure
@Structure_id = @db.last_insert_row_id

# use the value of Structure_id to link phosphoserine data in Project1 to
phosphoserine data in Structure
# the implicit rowid will be the primary key for Project1 and so is not
handled explicitly
@db.execute "INSERT INTO Project1(Structure_id, class, status, RI17-1)
VALUES(Structure_id, C0248, M, 15.0)"


3. Insert phosphoserine data to Project1 at some later time, meaning in a
situation where there is already a record in Structure for phosphoserine

# lookup the row number in Structure where the phosphoserine record is
stored using the 4 int key values
@Structure_id = @db.execute "SELECT id FROM Structure WHERE iH1 = 185073 AND
iH2 = 856147 AND iH3 = 73543 AND iH4 = 25338;"

# use the value of Structure_id to link phosphoserine data in Project1 to
phosphoserine data in Structure
# the implicit rowid will be the primary key for Project1 and so is not
handled explicitly
@db.execute "INSERT INTO Project1(Structure_id, class, status, RI17-1)
VALUES(Structure_id, C0248, M, 15.0)"

Excepting that the ruby is probably not quite right here and that the SQL
instructions are using literals and not parameters, does this look like a
reasonable setup?

The Structure table will have more records than the satellite tables and
data will be added to the satellite tables more frequently. I think this
allows for addition of new tables of data related to previously stored
structures (Project2, Project3, etc) as needed, and also allows for a quick
method of looking up structures that might have been registered under a
different name.

Is ON CONFLICT FAIL what I want for the Structure table, or would
ABORT/ROLLBACK make more sense?

*LMHmedchem*





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-data-into-mutliple-tables-with-a-composite-primary-key-tp64874p65567.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-15 Thread Simon Slavin

On 15 Nov 2012, at 5:24am, LMHmedchem  wrote:

> On 13 Nov 2012, at 5:41pm, Simon Slavin-3 wrote: 
>> By the way, can I ask what iH2, iH3, iH4 are ?  I think I've figured out
>> iH1.
> 
> Yes iH1 is an intergerized form of the molecular weight rounded to two [snip]

Hey, thanks for that.  I read and understand the whole thing.  You piqued my 
curiosity.  My Chemistry 'O' Level is now about 30 years in the past.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-15 Thread Igor Tandetnik
LMHmedchem  wrote:
> Igor,
> It seem as if you are saying that string values like a name should be passed
> in SQL commands as variables, is that right?

Personally, I tend to do this for all values, whether strings or numbers or 
otherwise.

> At the moment, I am accessing
> SQLite through the ruby-sqlite3 interface

For the record, I'm completely unfamiliar with that.

> so the inserts look like,
> 
> @db.execute "INSERT INTO #{table_name.capitalize}
> VALUES(#{insert_into_fields(table_name, values_array)})"
> 
> (I haven't got to queries yet)
> 
> the data is contained in the values_array, but I guessing that will still
> create a problem if there are single quotes in one of the values?

Well, it's possible that insert_into_fields function does, or can be made to 
do, the necessary escaping.

> I have
> looked at the link you sent (and to the very funny cartoon, thanks), but it
> will take a bit to decipher. Since I am using the ruby interface and not the
> C interface, it may be that I need to look at the ruby-sqlite3 doc, but I am
> amusing/hoping that there is something similar.

A quick Google search reveals

http://sqlite-ruby.rubyforge.org/sqlite3/faq.html#538670816

> It looks like you set the
> value of a parameter to your string value and then pass the parameter
> instead of the literal string.

You embed a placeholder in your statement where a literal would be. Each 
placeholder corresponds to a parameter (multiple placeholders may correspond to 
the same parameter). Then you bind actual values to parameters, and finally you 
execute the statement.

> Is that more or less the idea? I guess it
> would be easy enough to double the quotes in the ruby code before inserts or
> queries, as Simon suggested, and then un-double them on the way out if data
> was being retrieved from the database to be written somewhere else.

You don't need this last part. The text you get from SELECT is the original, 
without escaping.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-15 Thread LMHmedchem
Keith,

Thank you for the thoughtful and informative response. It is very helpful to
have a better background of how the underlying technology works. I gather
that I probably don't need the AUTOINCREMENT keyword. Since I am inserting
the value of "id" from Structure in to other tables (as Structure_id) for
the purposes of keeping track of data from the same record spread across
multiple tables, it would seem as if using an explicitly defined id primary
key (instead of the implicit rowid) makes sense for the Structure table. I
don't think that the FOREIGN KEY reference to Structure(id) in the satellite
tables provides an enormous amount of protection, but there are some
situations where I can see it as being helpful and it looks like that could
not be done with rowid.

> or have caught a nasty case of the object-oriented disease...
It's funny that you should say that. I had a conversation with a friend the
other day who is a database programmer, and he advised having an explicit
primary key in each of my satellite tables that is independent of the
Structure_id column. I as planning on just using Structure_id as the primary
key, but he advised me to keep an different AUTOINCREMENT key because it may
be useful to access rows independent of the Structure_id value. This
programmer frequently uses an OO database system, so I thought your comment
was interesting. What is it about OO that would lead to always explicitly
defining the id?


Igor,
It seem as if you are saying that string values like a name should be passed
in SQL commands as variables, is that right? At the moment, I am accessing
SQLite through the ruby-sqlite3 interface, so the inserts look like,

@db.execute "INSERT INTO #{table_name.capitalize}
VALUES(#{insert_into_fields(table_name, values_array)})"

(I haven't got to queries yet)

the data is contained in the values_array, but I guessing that will still
create a problem if there are single quotes in one of the values? I have
looked at the link you sent (and to the very funny cartoon, thanks), but it
will take a bit to decipher. Since I am using the ruby interface and not the
C interface, it may be that I need to look at the ruby-sqlite3 doc, but I am
amusing/hoping that there is something similar. It looks like you set the
value of a parameter to your string value and then pass the parameter
instead of the literal string. Is that more or less the idea? I guess it
would be easy enough to double the quotes in the ruby code before inserts or
queries, as Simon suggested, and then un-double them on the way out if data
was being retrieved from the database to be written somewhere else. The
single quotes in the names have specific chemical meaning.


On 13 Nov 2012, at 5:41pm, Simon Slavin-3 wrote: 
> By the way, can I ask what iH2, iH3, iH4 are ?  I think I've figured out
> iH1.

Yes iH1 is an intergerized form of the molecular weight rounded to two
decimal places (float*100 with a round off correction on the resulting int).
Each of the other three is the sum of a class of molecular structure
descriptors.

The iH2 sums a group of descriptors based on an atom level quantification of
the valence electron density at each atom (similar to partial charge). These
atom level indices have been grouped in various ways, such as by functional
group, atom-type, and bond type. There are ~1000 of these different indices
that are summed and intergerized.  This sum varies a great deal with the
size and structure features of the molecule. There will, of course, be
instances where two different compounds have the same value for the iH2 sum,
as is true with the iH1 molecular weight key.

The iH3 sum is also a sum of structure indices, but these indices relate to
the connections of the molecular graph, such as the number and size of
rings, fused rings, branch points, paths of various lengths, and the
location of heteroatoms in these subgraphs. The iH3 indices are relatively
independent of the iH2 and serve to discriminate between compounds that may
have identical iH2 sums.

The iH4 key is a sum of indices that describe the overall shape and
connection complexity of the molecule (elongated/globular, highly
interconnected, etc.).

As far as I have been able to determine, this set of 4 ints (including the
mw int) is unique for any compound. Since names are problematic for
identifying a compound (there can be dozens of legal names and they have
characters that are difficult to deal with, especially where multiple
software applications are used in a tool chain), I am looking into how this
set of ints can be used to register a compound into a database instead of
some other identifier. These ints are produced by software that processed
chemical structure files (molecular structures stored in text files). I
receive such files from vendors, and not only can the compound have a
variety of names, it often can be drawn in a variety of forms, so the
process of determining if you already have a record for this compound or not
is 

Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-14 Thread Igor Tandetnik
LMHmedchem  wrote:
> As it happens, some of the text strings that will be added to the database
> have single quotes,
> 
> N,N'-dimethylethylenediamine
> 
> do I need to do anything different for these

Normally, your program shouldn't use string literals directly, but instead uses 
parameterized queries: http://sqlite.org/c3ref/bind_blob.html . Then you don't 
need to worry about single quotes. Obligatory: http://xkcd.com/327/

If for some reason you insist on embedding them directly into queries as 
literals, then, as Simon explained, you should double up each apostrophe, as in 
'N,N''-dimethylethylenediamine'. Doesn't matter how many of them in a row, or 
whether they are in the beginning, end or middle of the string. E.g. if you 
want to insert a three-character string 'x' (including apostrophes), the 
correct string literal is '''x''' (three apostrophes on either side).

SQLite provides a helper function sqlite3_mprintf 
(http://sqlite.org/c3ref/mprintf.html) , which is similar to regular sprintf() 
except that a) it allocates memory for the result, so you don't need to worry 
about sufficient buffer size, and b) it recognizes %q and %Q format specifiers, 
which are like %s but escape single quotes appropriately.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-14 Thread Simon Slavin

On 13 Nov 2012, at 5:41pm, LMHmedchem  wrote:

> Thanks for the clarification. So my proper syntax for inserts with
> AUTOINCREMENT is one of,
> 
> INSERT INTO Structure
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);
> 
> or
> 
> INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4, formula, fw)
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);
> 
> or
> 
> INSERT INTO Structure(name, filePath, iH1, iH2, iH3, iH4, formula, fw)
> VALUES('phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

Yes, all three will work (unless my old eyes missed something) as will several 
other formulations.  See the diagram on this page:



> As it happens, some of the text strings that will be added to the database
> have single quotes,
> 
> N,N'-dimethylethylenediamine
> 
> do I need to do anything different for these, such as to escape the single
> single-quote in some way? There will never be double quotes, but there could
> be any number of single quotes.
> 
> N,N',N''-trimethylbis(hexamethylene)triamine

Yes, if you are passing entire SQL commands in as text, you absolutely do need 
to worry about this.  You can write your own code which will double the quote:



As an example, the following is a valid way to insert the above formulation:

INSERT INTO Structure(name) 
VALUES('N,N'',N-trimethylbis(hexamethylene)triamine');

Doubling single quotes is the only thing you should need to do to a 
conventional C string.  You could write a standard routine to do it and call it 
for all string values you are concatenating into a SQL command.  However, if 
instead of passing entire commands you are using binding to bind a string to a 
parameter of the INSERT command, then you do not need to escape the string 
you're passing.

By the way, can I ask what iH2, iH3, iH4 are ?  I think I've figured out iH1.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-14 Thread LMHmedchem
Thanks for the clarification. So my proper syntax for inserts with
AUTOINCREMENT is one of,

INSERT INTO Structure
VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

or

INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4, formula, fw)
VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

or

INSERT INTO Structure(name, filePath, iH1, iH2, iH3, iH4, formula, fw)
VALUES('phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

As it happens, some of the text strings that will be added to the database
have single quotes,

N,N'-dimethylethylenediamine

do I need to do anything different for these, such as to escape the single
single-quote in some way? There will never be double quotes, but there could
be any number of single quotes.

N,N',N''-trimethylbis(hexamethylene)triamine

These single-quotes will never appear as the first or last character, so
possibly nothing needs to be done?

*LMHmedchem*




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-data-into-mutliple-tables-with-a-composite-primary-key-tp64874p65524.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-13 Thread Keith Medcalf
> It looks like my insert syntax should be,
> INSERT INTO Structure
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3
> H8NO6P',185.073);
> 
> where using null for id invokes AUTOINCREMENT. Is this the right syntax for
> sqlite? I have also seen versions of insert where the Table fields are
> defined and look like,
> 
> INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4, formula, fw)
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3
> H8NO6P',185.073);
> 
> Would I still use null here for the id, or do I not have this right? Is
> there some reason for preferring one version over the other if both are
> valid?

INSERT INTO tablename (a, b, c, ... z) VALUES ( ... )

The field list may be omitted in which case it is assumed to contain, in order, 
all the explicitly defined fields from the table declaration.  If this is the 
field list you want to use, you may either explicitly list the fields, or omit 
the list entirely.  If you want to insert values into a subset of the fields in 
the table, or use a different binding order than the order the fields were 
declared in the schema, then you need to specify the field list.  Any field not 
listed in the field list will be given the default value specified in the table 
declaration, or null if no default was specified (in other words the default 
default is default null).  If null is not permitted, the statement will fail.

It is good practice to specify the field names for two reasons:  (1) as Igor 
said, it is self documentation; and, (2) because you can then change the table 
declaration and add new fields, or re-order them, without changing the SQL -- 
this becomes more important as feeping creaturitis overtakes your database and 
you find you are adding new fields or references that do not impact already 
working code.

> I have made some progress. This is what my Structure table looks like now,
> 
> CREATE TABLE Structure(
>'id' INTEGER PRIMARY KEY AUTOINCREMENT,
>'name' TEXT NOT NULL,
>'filePath' TEXT NOT NULL,
>'iH1' INTEGER NOT NULL,
>'iH2' INTEGER NOT NULL,
>'iH3' INTEGER NOT NULL,
>'iH4' INTEGER NOT NULL,
>'formula' TEXT NOT NULL,
>'fw' FLOAT NOT NULL,
>UNIQUE(iH1, iH2, iH3, iH4)
>ON CONFLICT FAIL
> )

The column "id" (INTEGER PRIMARY KEY) is the row number of the row in the 
table.   If you do not declare an INTEGER PRIMARY KEY explicitly, then the row 
number is available in an implicit field called "rowid" (since every row must 
obviously have a unique row number).  An implicit rowid can be retrieved by 
name in a SELECT or listed as a field in an insert to insert a specific row.  
You cannot use an implicit rowid as a parent key in a foreign key constraint.  
The implicit "rowid" is not part of the returned data in a SELECT *, nor is it 
in the default field list in an INSERT INTO.  Explicitly named INTEGER PRIMARY 
KEY are included in SELECT * and in the field list of INSERT INTO.  In all 
other respects the explicit "id" INTEGER PRIMARY KEY is the same as the 
implicit "rowid" INTEGER PRIMARY KEY.

In other words, the INTEGER PRIMARY KEY column always exists and you either 
give it an explicit name or it is implicitly declared and called "rowid".  If 
it is explicitly declared then you are expected to always deal with that column 
somehow in INSERT and SELECT operations because it is an explicit field.  If 
the row number is explicitly made a column, then since you are expected to have 
to deal with it, you can reference it as a parent key in foreign key 
definitions.

The INTEGER PRIMARY KEY (whether implicit or explicitly named) is always 
magically incrementing.  You cannot store a row without a row number.  
Therefore, not specifying a row number on an insert operation (by inserting 
with a null explicit INTEGER PRIMARY KEY or not explicitly specifying a value 
for the implicit rowid) means "store the row and give it a new row number".  
For both implicit and explicit INTEGER PRIMARY KEY, the new "row number" will 
be one greater than the largest row number currently in the table (and 1 if 
there are no rows currently in the table).  If this overflows then it will be 
some other unused row number (you cannot store two different rows at the same 
location).  If there is no unused row number, then the operation fails.

AUTOINCREMENT modifies this by adding the word "ever" to the above process and 
causes the database to store the "largest ever used" row number for the table, 
and to only generate row numbers one larger than the largest row number ever in 
the table.  If that overflows, the operation fails.  This slightly increases 
overhead for every insert because now instead of max(rowid)+1 being the new 
rowid, the max(rowid ever)+1 is the new row number, and that "rowid ever" must 
be stored persistently somewhere and must be maintained.  Sometimes you need a 
guarantee that rowid's will never be re-used (AUTOINCREMENT), and 

Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-13 Thread Igor Tandetnik
LMHmedchem  wrote:
> It looks like my insert syntax should be,
> INSERT INTO Structure
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);
> 
> where using null for id invokes AUTOINCREMENT. Is this the right syntax for
> sqlite?

Yes, this would work. Another possibility is omitting the value for ID entirely:

insert into Structure(name, filePath, ...) values 
('phosphoserine','phosphoserine.mol', ...);

 I have also seen versions of insert where the Table fields are
> defined and look like,
> 
> INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4, formula, fw)
> VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

This works, too.

> Would I still use null here for the id, or do I not have this right? Is
> there some reason for preferring one version over the other if both are
> valid?

It's all the same to SQLite. For a programmer, a query that explicitly lists 
the columns might be easier to read - you don't need to consult the table 
definition to figure out which value goes into which field.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-11-13 Thread LMHmedchem
Hello,

I have made some progress. This is what my Structure table looks like now,

CREATE TABLE Structure(
   'id' INTEGER PRIMARY KEY AUTOINCREMENT,
   'name' TEXT NOT NULL,
   'filePath' TEXT NOT NULL,
   'iH1' INTEGER NOT NULL,
   'iH2' INTEGER NOT NULL,
   'iH3' INTEGER NOT NULL,
   'iH4' INTEGER NOT NULL,
   'formula' TEXT NOT NULL,
   'fw' FLOAT NOT NULL,
   UNIQUE(iH1, iH2, iH3, iH4)
   ON CONFLICT FAIL
)

This is the main table and I am linking all other tables using the id value
from Structure (Structure_id),

I have a question about the syntax for inserts. If I have the following data
to be inserted to Structure,

Structure Data for phosphoserine:
name = phosphoserine
filePath = phosphoserine.mol
iH1 = 185073
iH2 = 856147
iH3 = 73543
iH4 = 25338
formula= C3H8NO6P
fw = 185.073

It looks like my insert syntax should be,
INSERT INTO Structure
VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

where using null for id invokes AUTOINCREMENT. Is this the right syntax for
sqlite? I have also seen versions of insert where the Table fields are
defined and look like,

INSERT INTO Structure(id, name, filePath, iH1, iH2, iH3, iH4, formula, fw)
VALUES(null,'phosphoserine','phosphoserine.mol',185073,856147,73543,25338,'C3H8NO6P',185.073);

Would I still use null here for the id, or do I not have this right? Is
there some reason for preferring one version over the other if both are
valid?

Satellite table would look like the following where Structure_id is the
common field linking data in Structure and Project1.

CREATE TABLE Project1(
 id INTEGER PRIMARY KEY AUTOINCREMENT,
 'Structure_id' INTEGER NOT NULL,
 'class' STRING,
 'status' STRING,
 'RI17-1' FLOAT,
 FOREIGN KEY(Strucutre_id) REFERENCES Structure(id)
)

I will post a bit about how to do the inserts to satellite tables when I
have the inserts for the structure table setup up correctly.

If Simon is reading this, I have not ignored your comment about having 500
columns in a table being an issue, I just am trying to get the basic syntax
working first. None of those fields will ever have a null value. There are
many zero's but those are not nulls and have meaning, like a family having 0
children.

*LMHmedchem*



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-data-into-mutliple-tables-with-a-composite-primary-key-tp64874p65496.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-10-17 Thread Keith Medcalf

On Tuesday, 16 October, 2012, 11:51, LMHmedchem said:

You could always create a table JUST for for the compsite key which gives you 
JUST the single-key which you then use to look up data in the other tables.  If 
you specifically name the rowid columns, then you can use simple and efficient 
joins to retrieve the data from the subtables:

Create table masterkey(
rowkey integer primary key autoincrement, 
k1 integer not null, 
k2 integer not null, 
k3 integer not null, 
k4 integer not null,
unique (k1, k2, k3, k4));

create table sub1(
rowkey integer primary key references masterkey(rowkey),
...);
...

You then join the masterkey table to whatever subs you want to do the 
retrieval, and when storing data you store the masterkey first and use the 
rowkey (last_insert_rowid) for the inserts in the subtables.

Select * from masterkey, sub1 where k1=? And k2=? And k3=? And k4=? And 
masterkey.rowkey=sub1.rowkey;

If you are careful about your column names you could also just use a NATURAL 
JOIN to join the rowkey fields and not need the explicit equijoins on the 
rowkey in the select.

On all inserts you would first do the same operation against the masterkey 
table to get the appropriate rowkey -- first a lookup then an insert of the key 
if required.

> Hello Igor, thank you for the information, it is a big help.
> 
> > If you have a one-to-one relationship between two tables, is there a
> > reason why you don't simply combine the two into a single, wider table?
> 
> They way I think about a database is that you subdivide the data based
> on how you may want to retrieve it later. The main table is structure,
> and all of it's fields are mandatory not null (each record is a chemical
> structure). The other tables contain other data (about the chemical
> structure), such as available vendors, prices, experimentally measured
> values, and computer generated data. These fields may be null. Some of
> the other tables are fairly large (500-2500 cols), so I thought it would
> help make the query process more efficient if you could just search on
> the tables with the data you need and ignore others. If I am incorrect
> in thinking about the setup in this way, I would appreciate knowing
> about that.
> 
> > but the notion of having multiple primary keys doesn't seem quite right.
> 
> > Why is that? Basically, every table needs a primary key, whether
> > composite or otherwise. If this tuple of integers is the natural key for
> > your data, I don't see a problem.
> 
> I guess what I was thinking was that tables should not have independent
> primary keys if there is a 1:1 relationship in the data between the
> tables. The way I was thinking about this is that the primary key value
> assigned to a record when it was inserted to the first table would be
> copied and used to insert into the second table, etc. In my spreadsheet
> way of thinking, that is having a single primary key that is used in
> multiple tables. I'm trying to learn to think "database" and not
> "spreadsheet".
> 
> > There is - see http://sqlite.org/autoinc.html . Change your table to
> >
> > create table Structure (
> > id integer primary key,
> > i1 integer not null,
> > i2 integer not null,
> > i3 integer not null,
> > i4 integer not null,
> > ...
> > unique (i1, i2, i3, i4)
> > );
> >
> > Now, you can insert a record while leaving 'id' column out, and it will
> > be automatically assigned a unique integer value, which you can retrieve
> > with sqlite3_last_insert_rowid. You can then use that ID when inserting
> > records into your "satellite" tables.
> 
> Later on, when I need to lookup data from a record using the 4 key
> values, there would have to be a way to retrieve the unique integer
> value ROWID that corresponds to the 4 keys (was assigned by
> AUTOINCREMENT). If I use unique like above, how would the lookup on the
> 4 keys work?
> 
> LMHmedchem
> 
---
()  ascii ribbon campaign against html e-mail
/\  www.asciiribbon.org




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-10-17 Thread Simon Slavin

On 16 Oct 2012, at 6:50pm, LMHmedchem  wrote:

> Some of 
> the other tables are fairly large (500-2500 cols)

A table with 500 columns is itself a bad sign.  You should be able to think 
about the entire table makeup in your head without needing to refer to written 
documentation.  Having hundreds of numbered columns, especially, is a sign that 
you didn't think your schema through.

> In my spreadsheet 
> way of thinking, that is having a single primary key that is used in 
> multiple tables. I'm trying to learn to think "database" and not 
> "spreadsheet".


Right.  Your database is not a spreadsheet.  You don't have to lay out 
everything into one rectangular grid, you can have multiple tables.

Refactor your database to make this table into a narrower set of data, either 
by splitting it into attributes or by implementing the hierarchy that it 
probably represents.  Once you've done this your problem with zeroblobs will 
vanish.  You will probably find that you don't need those rows at all, or that 
all the zeroblobs are all inherently at the end of rows.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-10-17 Thread LMHmedchem
Hello Igor, thank you for the information, it is a big help.

> If you have a one-to-one relationship between two tables, is there a
> reason why you don't simply combine the two into a single, wider table?

They way I think about a database is that you subdivide the data based 
on how you may want to retrieve it later. The main table is structure, 
and all of it's fields are mandatory not null (each record is a chemical 
structure). The other tables contain other data (about the chemical 
structure), such as available vendors, prices, experimentally measured 
values, and computer generated data. These fields may be null. Some of 
the other tables are fairly large (500-2500 cols), so I thought it would 
help make the query process more efficient if you could just search on 
the tables with the data you need and ignore others. If I am incorrect 
in thinking about the setup in this way, I would appreciate knowing 
about that.

> but the notion of having multiple primary keys doesn't seem quite right.

> Why is that? Basically, every table needs a primary key, whether
> composite or otherwise. If this tuple of integers is the natural key for
> your data, I don't see a problem.

I guess what I was thinking was that tables should not have independent 
primary keys if there is a 1:1 relationship in the data between the 
tables. The way I was thinking about this is that the primary key value 
assigned to a record when it was inserted to the first table would be 
copied and used to insert into the second table, etc. In my spreadsheet 
way of thinking, that is having a single primary key that is used in 
multiple tables. I'm trying to learn to think "database" and not 
"spreadsheet".

> There is - see http://sqlite.org/autoinc.html . Change your table to
>
> create table Structure (
> id integer primary key,
> i1 integer not null,
> i2 integer not null,
> i3 integer not null,
> i4 integer not null,
> ...
> unique (i1, i2, i3, i4)
> );
>
> Now, you can insert a record while leaving 'id' column out, and it will
> be automatically assigned a unique integer value, which you can retrieve
> with sqlite3_last_insert_rowid. You can then use that ID when inserting
> records into your "satellite" tables.

Later on, when I need to lookup data from a record using the 4 key 
values, there would have to be a way to retrieve the unique integer 
value ROWID that corresponds to the 4 keys (was assigned by 
AUTOINCREMENT). If I use unique like above, how would the lookup on the 
4 keys work?

LMHmedchem




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-data-into-mutliple-tables-with-a-composite-primary-key-tp64874p64902.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inserting record datinto mutliple tables with a composite primary key

2012-10-16 Thread Igor Tandetnik
LMHmedchem  wrote:
> I have data that I am loading into a sqlite database from a text file. I am
> using a composite primary key for four ints for the main table.
> 
> create table Structure (
>   i1 integer not null,
>   i2 integer not null,
>   i3 integer not null,
>   i4 integer not null,
>   name string not null,
>   filePath string not null,
>   SMILES string not null,
>   formula string not null,
>   fw float not null,
>   primary key (i1, i2, i3, i4)
> )
> 
> For this table, there should be a reasonable syntax to insert and select
> using the composite primary key values to find things. My understanding is
> that the composite key values will be hashed, leading to a fast look up. I
> am not entirely sure of the syntax, but I believe that this is a sound
> structure.

No special syntax, just plain

select * from Structure where i1=:value1 and i2=:value2 and i3=:value3 and 
i4=:value4;

SQLite is smart enough to use the key for such a query.

Note that "string" doesn't have any special meaning in SQLite; you are creating 
columns with no affinity. It's better to use "text" instead. For details, see 
http://sqlite.org/datatype3.html

> There is more data for each record that will go into other tables. The
> question I have is how to best keep the data in the different tables in
> registration, meaning to make sure that I can retrieve all of the data from
> the record from all tables using the same 4 primary key values.

If you have a one-to-one relationship between two tables, is there a reason why 
you don't simply combine the two into a single, wider table?

> I could create the same primary key in every table,
> 
> create table Identifier(
>   i1 integer not null,
>   i2 integer not null,
>   i3 integer not null,
>   i4 integer not null,
>   CSpider string,
>   KEGG string,
>   CAS string,
>   primary key (i1, i2, i3, i4)
> )
> 
> but the notion of having multiple primary keys doesn't seem quite right.

Why is that? Basically, every table needs a primary key, whether composite or 
otherwise. If this tuple of integers is the natural key for your data, I don't 
see a problem.

> It also seems as if there should be a way to record the rowid of where a
> record went in the first table and I should be able to use that to insert
> data from the same record to the same rowid of other tables.

There is - see http://sqlite.org/autoinc.html . Change your table to

create table Structure (
   id integer primary key,
   i1 integer not null,
   i2 integer not null,
   i3 integer not null,
   i4 integer not null,
   ...
   unique (i1, i2, i3, i4)
);

Now, you can insert a record while leaving 'id' column out, and it will be 
automatically assigned a unique integer value, which you can retrieve with 
sqlite3_last_insert_rowid. You can then use that ID when inserting records into 
your "satellite" tables.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] inserting record datinto mutliple tables with a composite primary key

2012-10-16 Thread LMHmedchem
Hello,

I have data that I am loading into a sqlite database from a text file. I am
using a composite primary key for four ints for the main table.

create table Structure (
   i1 integer not null,
   i2 integer not null,
   i3 integer not null,
   i4 integer not null,
   name string not null,
   filePath string not null,
   SMILES string not null,
   formula string not null,
   fw float not null,
   primary key (i1, i2, i3, i4)
)

For this table, there should be a reasonable syntax to insert and select
using the composite primary key values to find things. My understanding is
that the composite key values will be hashed, leading to a fast look up. I
am not entirely sure of the syntax, but I believe that this is a sound
structure.

There is more data for each record that will go into other tables. The
question I have is how to best keep the data in the different tables in
registration, meaning to make sure that I can retrieve all of the data from
the record from all tables using the same 4 primary key values.

I could create the same primary key in every table,

create table Identifier(
   i1 integer not null,
   i2 integer not null,
   i3 integer not null,
   i4 integer not null,
   CSpider string,
   KEGG string,
   CAS string,
   primary key (i1, i2, i3, i4)
)

but the notion of having multiple primary keys doesn't seem quite right.

It also seems as if there should be a way to record the rowid of where a
record went in the first table and I should be able to use that to insert
data from the same record to the same rowid of other tables. That assumes
that there would be a way to look up the rowid associated with a set of 4
key values and use that to retrieve data from any table where that rowid was
used to insert data.

Am I going about this in the best way, or even in a reasonable way?
Suggestions and criticisms would be appreciated and a link to some examples
or a tutorial would be fantastic.

LMHmedchem




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/inserting-record-datinto-mutliple-tables-with-a-composite-primary-key-tp64874.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users