Hi All,

Following up my own post:

> Is there any way to parse a create statement (eg create table, create view, 
> create trigger) into its components?

At the moment I am resorting to developing regular expressions to do the 
parsing. They work, but it seems to be re-inventing the wheel.

For instance, I can parse a create table statement into two strings: table 
name, and combined column definitions and table constraints using:

(?xis)                                                  # Parse create table 
statement into:
^\s*
create\s+table\s+
(\w+)\s*                                                # 1 table name
\([\r\n]*
(.*)\s*                                                 # 2 columns and 
constraints
\)
\s*
;?\s*
$                                                       # end of text

Then I can separate the column definitions and table constraints using:

(?xis)                                                  # Parse columns and 
constraints into:
^
(.+?)                                                   # 1 column definitions
(?:     ,
        (                                               # 2 table constraint 
definitions
                (?<=,)                                  # preceding comma, not 
captured
                \s*
                (?:     constraint\s+\w+\s+)?
                (?:     primary\s+key
                |       unique
                |       check
                |       foreign\s+key
                )\b
                .*
        )*
)?
$

Then parse the column definitions into column name, type, column constraints 
using:

(?xis)                                          # Parse table column 
definitions into:
(?:^|(?<=,))                                    # start or leading comma
\s*
(\w+?)                                          # 1 column name
\b\s*
([^,]*?)?                                       # 2 column type
\b\s*
(       \s*
        (?:constraint\s+\w+\s+)?                # 3 column constraints
        \b
        (?:     primary\s+key
        |       not\s+null
        |       unique
        |       check
        |       default
        |       collate
        |       references
        )
        \b[^,]*
)*
(?:,|$)                                         # trailing comma or end of text

So, something like this:

create table People
(       ID integer primary key
,       Name text collate nocase
,       Family_ID integer
                references Family(ID)
)

Becomes:

table name: People
table columns & constraints:

        ID integer primary key
,       Name text collate nocase
,       Family_ID integer
                references Family(ID)

My second parser would separate the column definitions and table constraints, 
but since there are no table constraints in this sample, I'll skip it.

My third parser divides up the column definitions to give:

Column 1:
  name:        ID
  type:        integer
  constraints: primary key

Column 2:
  name:        Name
  type:        text
  constraints: collate nocase

Column 3:
  name:        Family_ID
  type:        integer
  constraints: references Family(ID)

For more complex column constraints I could then write another regex parser to 
separate each constraint.

I have to allow for quoted identifiers (eg "Family ID"), comments and nested 
brackets by tokenizing the string and substituting quoted/bracketed/commented 
sections with word placeholders before applying the regex.

But, as I said, I'm thinking I'm re-inventing the wheel here. Isn't there a 
better way? Or anyone care to comment on my regex, such as contingencies I may 
have missed?

Thanks,
Tom
BareFeet

 --
Comparison of SQLite GUI tools:
http://www.tandb.com.au/sqlite/compare/?ml



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to