Re: [sqlite] Lazy JSON data to sqlite database

2019-01-19 Thread Andy Goth
If you have an SQL schema that works for you and also sample data, I might
be able to assist writing the conversion program. Dunno if you're
interested since it sounds like you won't need to do this again for another
year.

On Sat, Jan 19, 2019, 23:17 David Bicking wrote:
> >> Is there a tool out there that will more or less automate the task for
> >> me? Hopefully free, as no one is paying me to do this. (The other
> >> volunteers have maybe a dozen records in total and are doing their
> >> reports by hand.)
> >
> > The automation is at a lower level than you seem to realize.  JSON I/O is
> > a solved problem, but actually doing anything with that data, such as
> > transforming it into a relational database form, is up to you.
>
> I thought I got lucky. I found a web site called SQLify that lets you
> upload a JSON file, and it will examine it, figure out a schema, and
> create CREATE TABLE and INSERT lines with the data. Kind of neat.
> Unfortunately, it flattened the data into one table, which didn't matter
> for the venue and group data, but it flattened the hosts data by only
> using the first one and discarding the rest.
>
> >> A cursory look at the data: there is a top-level "event" table, "group"
> >> and "venue" tables that are one-to-one with the event table, and a
> >> "hosts" table with many hosts to the event table.
> >
> > JSON doesn't have tables, nor does JSON pre-declare a schema.  JSON
> > intermixes the schema with the data, and the schema is allowed to vary
> > over the course of the document.
>
> Yeah, sloppy writing on my part. JSON has objects, which can be stored in
> SQL tables.
>
> >> Oh, to complicate things, the source data is not valid JSON, as the
> >> upstream source did not escape quote marks within the text.  So is there
> >> a tool that can clean up the quotes? Hopefully there won't be many bad
> >> quotes... maybe three or four in the 600 records.
> >
> > Then it's not really JSON and can't be processed with true-blue JSON
> > tools.  Everything you're saying is calling for a custom tool.  Custom
> > tools do not have to be hard or expensive though.
> >
> > But in the case of quotes not being escaped, that all by itself could be
> > difficult to automate, since heuristics will need to be applied to figure
> > out whether any given quote mark is a string delimiter.  Human
> > intervention might be required.
>
> I kind of solved this by running the data through a parser. I would then
> take the error message, find the text in the source file, and escape the
> quotes. Then parse again and fix the next error. Tedious, but it works.
>
> David


Re: [sqlite] Lazy JSON data to sqlite database

2019-01-19 Thread David Bicking
>> Is there a tool out there that will more or less automate the task for
>> me? Hopefully free, as no one is paying me to do this. (The other
>> volunteers have maybe a dozen records in total and are doing their
>> reports by hand.)

> The automation is at a lower level than you seem to realize.  JSON I/O is a
> solved problem, but actually doing anything with that data, such as
> transforming it into a relational database form, is up to you.

I thought I got lucky. I found a web site called SQLify that lets you upload 
a JSON file, and it will examine it, figure out a schema, and create CREATE 
TABLE and INSERT lines with the data. Kind of neat. Unfortunately, it flattened 
the data into one table, which didn't matter for the venue and group data, but 
it flattened the hosts data by only using the first one and discarding the 
rest.
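
For reference, the part SQLify got wrong is exactly the part that needs a
second table: a one-to-many hosts relationship can't live in a single
flattened row. A rough sketch of a layout that keeps every host (Python plus
the standard sqlite3 module here; the table and column names are only
placeholders, not anything SQLify produces):

import sqlite3

con = sqlite3.connect("events.db")
con.executescript("""
    -- One row per event; one-to-one data such as group and venue can live
    -- here or in their own tables keyed by event_id.
    CREATE TABLE IF NOT EXISTS event (
        event_id INTEGER PRIMARY KEY,
        name     TEXT
    );
    -- One row per host per event, instead of keeping only the first host.
    CREATE TABLE IF NOT EXISTS host (
        event_id INTEGER REFERENCES event(event_id),
        name     TEXT
    );
""")
con.commit()
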
>> A cursory look at the data: there is a top-level "event" table, "group"
>> and "venue" tables that are one-to-one with the event table, and a
>> "hosts" table with many hosts to the event table.
>
> JSON doesn't have tables, nor does JSON pre-declare a schema.  JSON
> intermixes the schema with the data, and the schema is allowed to vary over
> the course of the document.

Yeah, sloppy writing on my part. JSON has objects, which can be stored in SQL 
tables.

>> Oh, to complicate things, the source data is not valid JSON, as the
>> upstream source did not escape quote marks within the text.  So is there
>> a tool that can clean up the quotes? Hopefully there won't be many bad
>> quotes... maybe three or four in the 600 records.
>
> Then it's not really JSON and can't be processed with true-blue JSON tools.
> Everything you're saying is calling for a custom tool.  Custom tools do not
> have to be hard or expensive though.
>
> But in the case of quotes not being escaped, that all by itself could be
> difficult to automate, since heuristics will need to be applied to figure
> out whether any given quote mark is a string delimiter.  Human intervention
> might be required.

I kind of solved this by running the data through a parser. I would then take 
the error message, find the text in the source file, and escape the quotes. 
Then parse again and fix the next error. Tedious, but it works.
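
In case it helps anyone else, a bare-bones version of that loop in Python
(the file name is made up); it just reports where the parser choked so the
offending quote can be escaped by hand before the next run:

import json

# Parse the not-quite-JSON file; on failure, print where the parser gave up
# so that quote can be escaped by hand, then run it again.
with open("events.json", encoding="utf-8") as f:
    text = f.read()

try:
    json.loads(text)
    print("parsed OK")
except json.JSONDecodeError as e:
    print(f"error at line {e.lineno}, column {e.colno}: {e.msg}")
    print(text.splitlines()[e.lineno - 1])
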
David


   


Re: [sqlite] Lazy JSON data to sqlite database

2019-01-19 Thread Andrew.Goth
David Bicking wrote:
> I have a data file with JSON data, and I want to get the data into an
> sqlite database so I can run queries against it. It is not a large file,
> only about 600 records in the main table. I've never worked with JSON
> before, and really don't want to write my own parser to extract the data.

JSON and SQL have two fundamentally different data models.  It is trivial to 
map from SQL to JSON, but going the other way may take careful thought.

> Is there a tool out there that will more or less automate the task for
> me? Hopefully free, as no one is paying me to do this. (The other
> volunteers have maybe a dozen records in total and are doing their
> reports by hand.)

The automation is at a lower level than you seem to realize.  JSON I/O is a 
solved problem, but actually doing anything with that data, such as 
transforming it into a relational database form, is up to you.

> A cursory look at the data: there is a top-level "event" table, "group"
> and "venue" tables that are one-to-one with the event table, and a
> "hosts" table with many hosts to the event table.

JSON doesn't have tables, nor does JSON pre-declare a schema.  JSON intermixes 
the schema with the data, and the schema is allowed to vary over the course of 
the document.

Here's some code I wrote to process JSON:

https://wiki.tcl-lang.org/page/Alternative+JSON

JSON has arrays, objects, strings, numbers, and literals.

An array is a list of values of any type.

An object is a mapping from strings (keys) to values of any type.  The keys 
really ought to be unique, or results will be unpredictable.

Strings and numbers are pretty much what you expect.

Literals are false, true, or null.

For example, here's JSON that will drop into SQL fairly easily:

[{"name": "Andy", id: 42}, {"name": "David", id: 999}]

Just do:

CREATE TABLE TableNameGoesHere(name, id);
INSERT INTO TableNameGoesHere VALUES ('Andy', 42), ('David', 999);

But JSON allows anything, so there is no way to automatically extract a useful 
schema:

["Andy", 42, "David", 999, {"event": [1, 2, "345"]}, {"group": {}}]

Here's what you need to do.  Look through your data in detail and devise an SQL 
schema that captures it, or at least the subset you care about.  Then take any 
JSON library (I recommend the one I wrote, since it's in Tcl and the Tcl 
binding for SQLite is the easiest binding) and use it to write a program that 
reads your JSON and inserts it into your SQL database.
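
To make that concrete, here is roughly the shape such a program takes (shown
with Python's standard json and sqlite3 modules for brevity; the field names
"name", "group", "venue", and "hosts" are guesses from your description, so
adjust them to the real feed):

import json
import sqlite3

# Field names below are assumptions based on the description of the data;
# substitute the real ones after inspecting the file.
con = sqlite3.connect("events.db")
con.executescript("""
    CREATE TABLE IF NOT EXISTS event (
        event_id INTEGER PRIMARY KEY,
        name     TEXT,
        grp      TEXT,   -- "group" is an SQL keyword, hence the rename
        venue    TEXT
    );
    CREATE TABLE IF NOT EXISTS host (event_id INTEGER, name TEXT);
""")

with open("events.json", encoding="utf-8") as f:
    events = json.load(f)               # expected: a list of event objects

for ev in events:
    cur = con.execute(
        "INSERT INTO event (name, grp, venue) VALUES (?, ?, ?)",
        (ev.get("name"),
         (ev.get("group") or {}).get("name"),
         (ev.get("venue") or {}).get("name")))
    for h in ev.get("hosts") or []:     # one-to-many: one row per host
        con.execute("INSERT INTO host (event_id, name) VALUES (?, ?)",
                    (cur.lastrowid, h.get("name")))

con.commit()

The Tcl equivalent is the same shape: parse the file once, loop over the
records, and INSERT.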

You will also have to think about whether you blow away the SQL database every 
time or you instead try to merge in new data, and if the old data needs to be 
kept, deleted, or archived.
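
In SQLite terms those choices mostly come down to a DELETE before reloading
versus INSERT OR REPLACE keyed on a stable upstream id (assuming the feed has
one); a toy illustration with made-up values:

import sqlite3

con = sqlite3.connect("events.db")
con.execute("""CREATE TABLE IF NOT EXISTS event
               (event_id INTEGER PRIMARY KEY, name TEXT)""")

# Blow-away: wipe last year's rows, then reload the whole feed.
con.execute("DELETE FROM event")

# Merge: key rows on a stable upstream id so reruns overwrite rather than
# duplicate.  The id and name here are made up.
con.execute("INSERT OR REPLACE INTO event (event_id, name) VALUES (?, ?)",
            (42, "Monthly meetup"))
con.commit()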

I don't think this is necessarily a difficult task, but it is definitely not 
one that can be performed without serious analysis, and that analysis cannot 
be automated.

> Oh, to complicate things, the source data is not valid JSON, as the
> upstream source did not escape quote marks within the text.  So is there
> a tool that can clean up the quotes? Hopefully there won't be many bad
> quotes... maybe three or four in the 600 records.

Then it's not really JSON and can't be processed with true-blue JSON tools.  
Everything you're saying is calling for a custom tool.  Custom tools do not 
have to be hard or expensive though.

But in the case of quotes not being escaped, that all by itself could be 
difficult to automate, since heuristics will need to be applied to figure out 
whether any given quote mark is a string delimiter.  Human intervention might 
be required.
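
If the bad quotes are rare, one passable heuristic is: a quote inside a
string only counts as the closing delimiter when the next non-space character
is one JSON allows there (a comma, colon, bracket, or brace); otherwise
escape it. A rough sketch, with the caveat that a stray quote sitting right
before a comma will still fool it, which is where the human comes in:

def escape_stray_quotes(text):
    """Escape double quotes that appear to be inside string values.

    Heuristic: a quote legitimately closes a JSON string only when the next
    non-space character is ',', ':', ']', '}' or end of input; anything else
    gets a backslash.  Meant for a handful of bad quotes, not for real JSON.
    """
    out = []
    in_string = False
    i = 0
    while i < len(text):
        c = text[i]
        if in_string and c == "\\":
            out.append(text[i:i + 2])      # keep existing escapes as-is
            i += 2
            continue
        if c == '"':
            if not in_string:
                in_string = True           # opening delimiter
            elif text[i + 1:].lstrip()[:1] in (",", ":", "]", "}", ""):
                in_string = False          # looks like a real closing quote
            else:
                out.append("\\")           # stray quote inside the string
        out.append(c)
        i += 1
    return "".join(out)

Run the result through a strict parser afterwards; anything the heuristic
gets wrong still surfaces as a parse error to fix by hand.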


[sqlite] Lazy JSON data to sqlite database

2019-01-19 Thread David Bicking
Okay, I know this is just me being lazy, but I have a data file with JSON data, 
and I want to get the data into an sqlite database so I can run queries against 
it. It is not a large file, only about 600 records in the main table. I've 
never worked with JSON before, and really don't want to write my own parser to 
extract the data.
Is there a tool out there that will more or less automate the task for me? 
Hopefully free, as no one is paying me to do this. (The other volunteers have 
maybe a dozen records in total and are doing their reports by hand.)

A cursory look at the data: there is a top-level "event" table, "group" and 
"venue" tables that are one-to-one with the event table, and a "hosts" table 
with many hosts to the event table. 

This is something that I will need to do once a year, so like I said, I really 
don't want to write a custom program.
Any ideas?
(Oh, to complicate things, the source data is not valid JSON, as the upstream 
source did not escape quote marks within the text.  So is there a tool that can 
clean up the quotes? Hopefully there won't be many bad quotes... maybe three or 
four in the 600 records.)
Thanks,
David
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users