Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Keith Medcalf

On Wednesday, 10 April, 2019 08:12, Peng Yu  wrote:

>On 4/10/19, Keith Medcalf  wrote:

>> The first column is of strings ...

>> Do you mean a single string as in "KerfufledAllaHasbalah"
>> Or a "bunch of strings with some implied delimiter" such as
>> "Kerfufled/Alla/Hasballah" where "/" is the separator between
>> strings?
>>
>> If the latter, the data needs to be normalized.

>There is only one string in each row, not many strings separated by
>some separator. A string can include white space characters. In other
>words, a string can be a word or multiple words separated by white
>spaces.

>Would it be possible to create indexes at a subcolumn level? I think
>if indexes can be created at the word level (or prefix of words, as I
>frequently search for things like a noun and its plural form
>together), then the search can be faster.

There would be a number of ways that you can do this, including using the 
built-in FTS module.  The "string" column is your free-form text.  Be aware, 
however, that this will create and maintain a number of additional indexes on 
the data and will consume *more* disk space to hold the data and indexes, 
however search time will likely decrease from your current 23 seconds to about 
say 23 milliseconds for an appropriately phrased query yeilding similar 
results.  (ie, the search will be 1000x faster but the space consumed may be 
only two or three times as much).

You could also build your own "word" list by parsing the strings and linking 
them to the record to which they pertain, then use appropriately phrased 
queries to perform lightening fast searches.  This is basically what FTS does, 
only you are doing it yourself.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.





___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Hick Gunter
"a string can be ... multiple words separated by white spaces"
"...indexes at subcolumn level..."
"... search for a noun and ist plural form together..."

Yes, you do need normalization. You need to divulge "what you are trying to do" 
instead of asking "how to emulate a non-relational implementation".

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Peng Yu
Gesendet: Mittwoch, 10. April 2019 16:12
An: SQLite mailing list 
Betreff: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

On 4/10/19, Keith Medcalf  wrote:
>
> The first column is of strings ...
>
> Do you mean a single string as in "KerfufledAllaHasbalah"
> Or a "bunch of strings with some implied delimiter" such as
> "Kerfufled/Alla/Hasballah" where "/" is the separator between strings?
>
> If the latter, the data needs to be normalized.

There is only one string in each row, not many strings separated by some 
separator. A string can include white space characters. In other words, a 
string can be a word or multiple words separated by white spaces.

Would it be possible to create indexes at a subcolumn level? I think if indexes 
can be created at the word level (or prefix of words, as I frequently search 
for things like a noun and its plural form together), then the search can be 
faster.

--
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Graham Holden
Wednesday, April 10, 2019, 2:03:24 PM, Peng Yu  wrote:

> Given the much larger disk space required, for an occasional search of
> the data, it seems that it makes no sense to use sqlite3 if disk space
> is a major concern.

Whether it "makes sense" to use SQLite or not, probably only you can
decide. If the "zgrep" solution works, and is "fast enough", then
there may not be any point in using SQLite. However, possible
advantages (depending on your "use cases") include being able to
ENFORCE the uniqueness of the string column, and -- depending on the
types of query involved -- benefiting from an index which in many
cases _should_ speed up the search.

And, in response to a couple of your latter emails (where you give
_some_ details about your data): you will have a much better chance
of people on this list being able to help you if you give a more
complete description of the problem: include some sample data from
your TSV and the types of searches you want to do. People can then
give much more informed help.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Peng Yu
On 4/10/19, Keith Medcalf  wrote:
>
> The first column is of strings ...
>
> Do you mean a single string as in "KerfufledAllaHasbalah"
> Or a "bunch of strings with some implied delimiter" such as
> "Kerfufled/Alla/Hasballah" where "/" is the separator between strings?
>
> If the latter, the data needs to be normalized.

There is only one string in each row, not many strings separated by
some separator. A string can include white space characters. In other
words, a string can be a word or multiple words separated by white
spaces.

Would it be possible to create indexes at a subcolumn level? I think
if indexes can be created at the word level (or prefix of words, as I
frequently search for things like a noun and its plural form
together), then the search can be faster.

-- 
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Keith Medcalf

the second column is of integers ...

Do you mean the second column in AN integer or that it is a bunch-o-integers 
separated by some separator?

If the latter, normalization is required.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Peng Yu
>Sent: Wednesday, 10 April, 2019 08:01
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?
>
>I don't know specifically what you refer to as data normalization. My
>guess is something like this. But it is irrelevant to my case.
>
>https://www.studytonight.com/dbms/database-normalization.php
>
>For my specific TSV file, it has about 50 million rows and just two
>columns. The first column is of strings and the second column is of
>integers. All the strings in the first column are unique (some
>strings
>may be substrings of other strings though).
>
>On 4/10/19, Hick Gunter  wrote:
>> I have the distinct impression that you are attempting to convert a
>flat
>> file into a naked table and pretending that the result is a
>(relational)
>> database.
>>
>> Please rethink your approach. There is a design process called
>> "normalization" that needs to be done first. This will identify
>"entities"
>> (with "attributes") and "relations" that will greatly reduce data
>> duplication found in flat files.
>
>--
>Regards,
>Peng
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Keith Medcalf

The first column is of strings ...

Do you mean a single string as in "KerfufledAllaHasbalah"
Or a "bunch of strings with some implied delimiter" such as 
"Kerfufled/Alla/Hasballah" where "/" is the separator between strings?

If the latter, the data needs to be normalized.

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Peng Yu
>Sent: Wednesday, 10 April, 2019 08:01
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?
>
>I don't know specifically what you refer to as data normalization. My
>guess is something like this. But it is irrelevant to my case.
>
>https://www.studytonight.com/dbms/database-normalization.php
>
>For my specific TSV file, it has about 50 million rows and just two
>columns. The first column is of strings and the second column is of
>integers. All the strings in the first column are unique (some
>strings
>may be substrings of other strings though).
>
>On 4/10/19, Hick Gunter  wrote:
>> I have the distinct impression that you are attempting to convert a
>flat
>> file into a naked table and pretending that the result is a
>(relational)
>> database.
>>
>> Please rethink your approach. There is a design process called
>> "normalization" that needs to be done first. This will identify
>"entities"
>> (with "attributes") and "relations" that will greatly reduce data
>> duplication found in flat files.
>
>--
>Regards,
>Peng
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Peng Yu
I don't know specifically what you refer to as data normalization. My
guess is something like this. But it is irrelevant to my case.

https://www.studytonight.com/dbms/database-normalization.php

For my specific TSV file, it has about 50 million rows and just two
columns. The first column is of strings and the second column is of
integers. All the strings in the first column are unique (some strings
may be substrings of other strings though).

On 4/10/19, Hick Gunter  wrote:
> I have the distinct impression that you are attempting to convert a flat
> file into a naked table and pretending that the result is a (relational)
> database.
>
> Please rethink your approach. There is a design process called
> "normalization" that needs to be done first. This will identify "entities"
> (with "attributes") and "relations" that will greatly reduce data
> duplication found in flat files.

-- 
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Hick Gunter
I have the distinct impression that you are attempting to convert a flat file 
into a naked table and pretending that the result is a (relational) database.

Please rethink your approach. There is a design process called "normalization" 
that needs to be done first. This will identify "entities" (with "attributes") 
and "relations" that will greatly reduce data duplication found in flat files.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Peng Yu
Gesendet: Mittwoch, 10. April 2019 15:03
An: SQLite mailing list 
Betreff: Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

Here is the runtime of using "select where like" (with %) on a .sq3 file.

real0m23.105s
user0m12.765s
sys 0m2.882s
data from a
Here is the runtime of zgrep (roughly equivalent, except that zgrep search for 
the whole line).

real0m33.814s
user0m40.927s
sys 0m0.660s

Given the much larger disk space required, for an occasional search of the 
data, it seems that it makes no sense to use sqlite3 if disk space is a major 
concern.

On 4/10/19, Peng Yu  wrote:
>> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not
>> > in any way human readable without uncompressing it first.
>
> But to store the file (and occasionally search the data), I would
> prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
> and zgrep.
>
>> How big is the result if you compress the sqlite file through gzip?
>
> 620MB
>
>> Or how big is the TSV file you get if you expand your .gz file ?
>
> 1.17GB
>


--
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Peng Yu
Here is the runtime of using "select where like" (with %) on a .sq3 file.

real0m23.105s
user0m12.765s
sys 0m2.882s

Here is the runtime of zgrep (roughly equivalent, except that zgrep
search for the whole line).

real0m33.814s
user0m40.927s
sys 0m0.660s

Given the much larger disk space required, for an occasional search of
the data, it seems that it makes no sense to use sqlite3 if disk space
is a major concern.

On 4/10/19, Peng Yu  wrote:
>> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in
>> > any way human readable without uncompressing it first.
>
> But to store the file (and occasionally search the data), I would
> prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
> and zgrep.
>
>> How big is the result if you compress the sqlite file through gzip?
>
> 620MB
>
>> Or how big is the TSV file you get if you expand your .gz file ?
>
> 1.17GB
>


-- 
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Peng Yu
> > Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any 
> > way human readable without uncompressing it first.

But to store the file (and occasionally search the data), I would
prefer 278MB instead of 1.5GB. With a .gz file, at least I can zcat
and zgrep.

> How big is the result if you compress the sqlite file through gzip?

620MB

> Or how big is the TSV file you get if you expand your .gz file ?

1.17GB
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Simon Slavin
On 10 Apr 2019, at 7:33am, Hick Gunter  wrote:

> Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any 
> way human readable without uncompressing it first. How big is the result if 
> you compress the sqlite file through gzip?

Or how big is the TSV file you get if you expand your .gz file ?

The development team for SQLite make a 'Compressed and Encrypted Read-Only 
Database (CEROD) Extension' which is not free and requires a US$2000 license.  
It can be used only if the data you want compressed is going to be read-only.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] compressed sqlite3 database file?

2019-04-10 Thread Hick Gunter
Your comparison is unfair. A .gz file is COMPRESSED by gzip and not in any way 
human readable without uncompressing it first. How big is the result if you 
compress the sqlite file through gzip?

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Peng Yu
Gesendet: Mittwoch, 10. April 2019 07:39
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] compressed sqlite3 database file?

I have some TSV table in .gz format of only 278MB. But the corresponding 
sqlite3 database exceeds 1.58GB (without any index). Is there a way to make the 
database file of a size comparable (at least not over 5 times) to the original 
TSV table in the .gz file? Thanks.

--
Regards,
Peng
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users