[sqlite] comma-separated string data

2014-04-04 Thread peter korinis
A data column in a link table contains comma-separated string data, where
each value is a key linking to a row in another table (a many-to-many
relationship).

 

How do you 'parse' a table entry like "4,66,51,3009,2,678,..." to extract
these values and use them in an SQL statement, perhaps in a WHERE id='66'?
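
A common workaround, as a sketch (the names link_table and id_list are
hypothetical): pad the stored list with delimiters and test membership with
LIKE, so that '66' does not also match '166' or '661':

SELECT *
FROM link_table
WHERE ',' || id_list || ',' LIKE '%,66,%';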

 

Thanks,

peter



[sqlite] Performance HELP

2013-04-30 Thread peter korinis
This question regards SQLite (3.6) performance. (Lengthy because I want to
describe the environment.)

. Win-7 (64-bit, though I don't know whether my SQLite build is 64-bit).

. 3-year-old HP laptop with Intel Core Duo CPU P8400 @ 2.27GHz and
only 4GB memory

. 286GB HD (50% full) + 1TB 7200rpm external eSATA HD (90% free), which
contains the target SQLite DB.

. Target DB is single user, read-only, static ... as follows

o   DB size = 24GB in 8 tables. The 2 most-used tables contain 43 million
rows and 86 million rows; the others are small look-up tables with
50-10,000 rows. 1-3 indices per table.

o   imported from a 44GB CSV file with 45 million rows of 600 columns

o   Used for SQL query only (CLI or GUI): no updates, no deletes, no
inserts, no transactions, no apps.

. Future: double the size ... I intend to clone the existing DB and populate
it with another year's data. Additional years will likely be added at a
later time.

Now to the questions:

1.   Is my current DB too large for SQLite to handle efficiently? I just
read in the O'Reilly book Using SQLite: "If you need to store and process
several gigabytes or more of data, it might be wise to consider a more
performance-oriented product."

2.   Which added resources would most improve performance? (Many queries,
like a SELECT COUNT with GROUP BY on an indexed column over a JOIN of the
2 large tables, may take 30-60 minutes or more.)

a.   add 4GB or 12GB more memory?

b.  get a faster HD for the target DB ... an external eSATA SSD (say 256GB)?

c.   get a faster external eSATA 10,000rpm HD for the DB?

d.  make performance changes to current DB settings (see the sketch after
this list)? (or learn to write better SQL?)

e.  convert from SQLite to MySQL or something else? I like the
simplicity and low admin of SQLite, so I prefer not to switch DBMS.

f.   Something else?
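
For (d), a minimal sketch of the kind of settings changes meant, assuming a
reasonably current SQLite build (the negative cache_size form needs 3.7.10
or later; values are illustrative, not tuned):

PRAGMA cache_size = -1000000;   -- ~1 GB of page cache (negative = size in KiB)
PRAGMA temp_store = MEMORY;     -- keep sorts and temporary tables in RAM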

Thank you so much for your help.

peterK

 



[sqlite] Performance HELP

2013-05-02 Thread peter korinis
Thank you for your responses.

I was hoping for an easy hardware solution, like more memory or a faster HD
... but it looks like indices and table design are where I need to focus.

peterK



[sqlite] SQLite Use of Indexes

2013-07-08 Thread peter korinis
I have a large DB and want to improve query performance.

 

Context: the DB contains 2 tables representing nationwide claim data: 

. a CLAIMS table = 43M rows with indices on claim_no and stateCounty
code; and 

. a LINE table = 85M rows with indices on claim_no and HCPCS (a 5
char text code)

. Have run ANALYZE

. Using Win7

 

Want to count occurrences of given HCPCS codes for given states.

Here is the SQL query I use:

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
FROM claims, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472',
                 '78473','78481','78483','78494','78496','78499')
  AND state_cd IN ('21', '45')
GROUP BY STATE_CD;

Query Plan results:

order  from   detail
0      1      TABLE LINE WITH INDEX idx_line_hcpscd
1      0      TABLE claims WITH INDEX idx_claims_claim_no

 

QUESTIONS:

1.   Can SQLite use more than 1 index per table per SELECT? It seems like
using both indices on each table would be better than the single index per
table chosen by the plan.

2.   Is something like "SELECT ... FROM LINE INDEXED BY claim_no
AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2
INDEXED BY clauses this way?

3.   Is there a better way to write this query? For example, would
rearranging the order of the AND criteria in the WHERE clause improve
performance?

4.   How do I interpret the 'order' and 'from' columns in the query plan
results?

 

Thanks much for your help,

peter 



Re: [sqlite] SQLite Use of Indexes

2013-07-09 Thread peter korinis
Thank you Simon for responding to my questions. Your phonebook (FName/LName)
analogy clearly explained why 2 indices per table per select won't work.

Let me provide a bit more info and a possible attempt to implement your
suggestions for better indices.

 

My 'bread and butter' query counts the frequency of HCPSCD occurrences for
each county of interest in each state of interest. [Occasionally I want
counts for an entire state.]

I generally have 6 groups of 1~15 related HCPSCD codes and want counts from
1-12 counties of a state.

 

CLAIMS table columns: claim_no, state_cd, cnty_cd are the columns of
interest; the 40 other columns are rarely used.

LINE table columns: claim_no, hcpscd, plus 25 more columns (these are
repeating fields per claim).

 

[FYI: DB contains 44M claim numbers with 1~12 HCPSCD codes per claim,
several thousand HCPSCD codes of which I am interested in 44 in 6 groups, 50
states with 9~125 counties per state of which I am usually interested in
~10.]

 

I currently do a query like this ...

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD, CNTY_CD
FROM CLAIMS, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472',
                 '78473','78481','78483','78494','78496','78499')
      -- this is the first group of hcpscd codes, always queried
  AND STATE_CD = '21'
  AND CNTY_CD IN ('220', '345', '570')
GROUP BY CNTY_CD
UNION {repeat the above SELECT with the second group of hcpscd codes
for the same state and counties ... and so on 4 more times}
 

So, to implement your suggestion of crafting better indices, here's my
approach:

1.   First action is joining the 2 tables on claim_no. {Therefore
claim_no should be the first column in the index for both tables}

2.   Find state and county in the claims table

3.   Find HCPSCD in the line table

So my 2 combined indices would be:

CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, STATE_CD, CNTY_CD)
CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD)

 

Is this what you were suggesting? Will these indices give improved
performance over the single-column indices I was using?

Thanks so much.

Peter

===

> 1.   Can SQLite use > 1 index per table per SELECT? Seems like using
> both indices for each table would be better than the single index per table
> as chosen by the plan.

 

No, you're right: one index per table, per SELECT.  Pretend you have a phone
book with two indexes: surname and first name.  You have to look up someone
called Miri Kallas.  You can use either of the indexes to look up one name,
but once you've done that the other index is useless to you.

 

You have made up some indexes which are useful but not the most useful.
Drop those indexes and try to figure out one compound index on each table
which would be best for the SELECT you asked about.  Remember that this

 

CREATE INDEX it1 ON t1 (c1)

CREATE INDEX it2 ON t1 (c2)

 

does not do the same thing as

 

CREATE INDEX it12 ON t1 (c1, c2)

 

Can't do it for you because I can't tell which of your columns are from
which table.

 

> 2.   Is using something like "SELECT ... FROM LINE INDEXED BY claim_no
> AND INDEXED BY hcpscd" possible? What is the correct syntax to include 2
> INDEXED BY this way?

 

If you have to tell SQLite which index to use, you're doing it wrong.  Make
up a good index and SQLite will decide to use it.  Especially if you've done
ANALYZE.

 

> 3.   Is there a better way to write this query, for example, would
> rearranging the order of AND criteria in WHERE clause improve performance?

 

The query optimizer is meant to do all that for you.  However, I suspect
that you may understand your query better if you get rid of some of your
'IN' clauses.  If you imagine doing 24 (= 2 * 12) different SELECTs, one for
each State and HCPSCD, what would your SELECT look like then?  With good
indexes it should be possible to make each of those SELECTs execute
ridiculously fast.
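
A sketch of one of those 24 single-pair SELECTs, using the table and column
names from the thread:

SELECT COUNT(*) AS CNT
FROM CLAIMS JOIN LINE ON CLAIMS.CLAIM_NO = LINE.CLAIM_NO
WHERE CLAIMS.STATE_CD = '21'
  AND LINE.HCPSCD = '78451';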

 

Once you've figured out how to do that and make it run fast, /then/ you
might want to recombine the query for each State, though perhaps not put
both States in the same query.

 

> 4.   How do I interpret the 'order' and 'from' in the query plan
> results?

 

It's showing you what each of your indexes is being used for.  And what it
shows is that neither index is being used for both selecting records and
arranging the order of results.

 

Simon.



[sqlite] SQLite Use of Indices

2013-07-10 Thread peter korinis
Please help me understand how the query planner chooses an index, given the
sqlite_stat1 table.

 

I originally created 4 single-column indices (L3, L4, C3, C4) on this
large 2-table DB.

Following Simon's suggestion to create better/combined/reversed indices, I
created L2 and C2, but the query plan still selected the original indices -
L3 and C4.

So I tried creating index C1 (the 'reverse' of C2) and hit the jackpot -
the 3 min query runtime was cut to 1 min using C1 and L3.

{This is counter-intuitive to me - does this mean the join is done after
filtering?}

Trying to repeat the success, I created L1 (the 'reverse' of L2), but no joy;
the query plan still selects the C1 and L3 indices.

Are there any other possible indices I should try?

 

Testing with this query:

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD, CNTY_CD
FROM claims, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
  AND HCPSCD IN ('78451','78452','78453','78454','78469','78472',
                 '78473','78481','78483','78494','78496','78499')
  AND STATE_CD = '21'
  AND CNTY_CD IN ('060','100','210')
GROUP BY CNTY_CD;

 

Query Plan result:

order  from   detail
0      0      TABLE claims WITH INDEX idx_Cnty_State_ClaimNo
1      1      TABLE LINE WITH INDEX idx_line_claim_no

 

Here is the sqlite_stat1 table (** marks the indices the query plan chose):

       table    index                       detail
L1     LINE     idx_Line_hcpscd_claimNo     86378354 7807 2
L2     LINE     idx_Line_ClaimNo_HCPSCD     86378354 2 2
L3 **  LINE     idx_line_claim_no           86378354 2
L4     LINE     idx_line_hcpscd             86378354 7807
C1 **  CLAIMS   idx_Cnty_State_ClaimNo      43428892 112511 123841
C2     CLAIMS   idx_Claim_State_Cnty        43428892 1 1 1
C3     CLAIMS   idx_claims_claim_no         43428892 1
C4     CLAIMS   idx_claims_stateCounty      43428892 620413 12384

 

How do I read the detail above for each index?

What is the logic here?
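
For reference, the SQLite documentation for ANALYZE explains the stat
column: the first integer is the number of rows in the index, and each
subsequent integer is the average number of rows that share a value in a
prefix of the index columns. Reading the L1 row above:

-- stat = "86378354 7807 2" for idx_Line_hcpscd_claimNo (HCPSCD, CLAIM_NO):
--   86378354  rows in the index
--   7807      average rows sharing one HCPSCD value
--   2         average rows sharing an (HCPSCD, CLAIM_NO) pair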

 

Thanks much,

peter


From: Simon Slavin <slav...@bigfraud.org>

To: General Discussion of SQLite Database <sqlite-users@sqlite.org>

Subject: Re: [sqlite] SQLite Use of Indexes

Message-ID: <1b21a0a2-dc54-4afe-83e2-0832a4154...@bigfraud.org>

Content-Type: text/plain; charset=windows-1252

 

On 9 Jul 2013, at 6:06pm, peter korinis <kori...@earthlink.net> wrote:

 

> So, to implement your suggestion of crafting better indices, here's my
> approach:
> 1.   First action is joining the 2 tables on claim_no. {Therefore
> claim_no should be the first column in the index for both tables}
> 2.   Find state and county in claims table
> 3.   Find HCPSCD in line table
> So my 2 combined indices would be:
>    CREATE INDEX idx_Claim_State_Cnty ON CLAIMS (CLAIM_NO, STATE_CD, CNTY_CD)
>    CREATE INDEX idx_Line_hcpscd ON LINE (CLAIM_NO, HCPSCD)

 

Those would be good indexes.  You can see how good by using EXPLAIN QUERY
PLAN for your SELECT.  Or better still, actually try them out and time the
results.

 

Might be interesting to then reverse the order of the columns in each index
and try those.  See if they're better or worse.  Or create lots of indexes,
then use EXPLAIN QUERY PLAN and find out which indexes SQLite decided to
use, then delete the others.

 

I am deliberately not giving you an absolute answer because you have a good
large set of data for testing and you obviously understand the idea now.
You'll learn more by trying out several alternatives yourself.

 

Simon.

 



Re: [sqlite] SQLite Use of Indexes

2013-07-14 Thread peter korinis
From: "James K. Lowden" <jklow...@schemamania.org>

To: sqlite-users@sqlite.org

Subject: Re: [sqlite] SQLite Use of Indexes

Message-ID: <20130712161038.b8b4df84.jklow...@schemamania.org>

Content-Type: text/plain; charset=US-ASCII

 

On Mon, 8 Jul 2013 15:32:21 -0400

"peter korinis" <kori...@earthlink.net> wrote:

 

/ thanks James, peter 

 

> . a CLAIMS table = 43M rows with indices on claim_no and

> stateCounty code; and 

> . a LINE table = 85M rows with indices on claim_no and HCPCS

> (a 5 char text code)

> . Have run ANALYZE

> . Using Win7

> Want to count occurrences of given HCPCS codes for given states.

 

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
FROM claims, LINE
WHERE CLAIMS.CLAIM_NO == LINE.CLAIM_NO
--                    ^^ == is not SQL
/ right, a little python slipped in /
  AND HCPSCD IN ( '78451'
                , '78452'
                , '78453'
                , '78454'
                , '78469'
                , '78472'
                , '78473'
                , '78481'
                , '78483'
                , '78494'
                , '78496'
                , '78499' )
  AND state_cd IN ('21', '45')
GROUP BY STATE_CD

 

From a design perspective, you might want to think about what the IN
clause is all about.  Whatever those codes have in common probably belongs
in a table, so you'd be able to say

 

SELECT COUNT(HCPSCD) CNT, HCPSCD, STATE_CD
FROM claims, LINE, HPs
WHERE CLAIMS.CLAIM_NO = LINE.CLAIM_NO
  AND HCPSCD = HPs.code
  AND HPs.category = 'foo'
  AND state_cd IN ('21', '45')
GROUP BY STATE_CD

 

That suggests HCPSCD as the column to index; I suppose you have a lot
more different HCPSCDs / there are >7000 HCPSCDs; I'm interested in ~100
in 12 groups / than STATE_CDs.  If the HPs table is indexed by
(category, code), then the join is on two ordered sets instead of an
extended OR.
/// would joining 3 tables give better response ?? ///

If HCPSCD is actually an integer, you may save space and time by
declaring it as such.  /// HCPSCD is NOT an integer ///
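
For concreteness, a sketch of the HPs lookup table described above (the
category value 'cardiac' is made up; the codes are from the thread):

CREATE TABLE HPs (category TEXT, code TEXT);
CREATE INDEX idx_HPs_category_code ON HPs (category, code);
INSERT INTO HPs (category, code) VALUES ('cardiac', '78451');
INSERT INTO HPs (category, code) VALUES ('cardiac', '78452');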

 

HTH.

 

--jkl



[sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-01 Thread peter korinis
I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user
with no dev support for a pilot project (single user, no updates, just
queries).

 

I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data. 

 

I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver. 3.6.23.1. I tried to use the Firefox Data Manager
add-on, but it would not load the csv files ('csv worker failed'). So I
tried Database Master from Nucleon, but it failed after loading ~57,000 rows
(it took 100 minutes) with the error message 'database or disk is full'. I
tried to create another table in the same db but could not, with the same
error message. The DB size shows as 10,000KB (that looks suspiciously like a
size setting?).

 

From what I've read, SQLite can handle this size DB. So it seems that either
I do not have enough RAM, or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file ... or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit Intel, Win server) and an iMac available.

 

1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load ... or is there a better tool for this
project?

 

Thanks much for helping a newbie!

 

peterK

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread peter korinis
Thank you all. 
Looks like I'm stuck with the CLI, though I have contacted Nucleon software
support ... I tried the CLI yesterday but need more practice.
Is there a good reference book you would recommend for SQLite?

peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Black, Michael (IS)
Sent: Tuesday, May 01, 2012 4:22 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

You need to try and do an import from the shell.  GUIs seem to have way too
many limits.

http://sqlite.org/download.html



Don't do any indexes up front ... do them afterwards if they'll help your
queries.  Indexes will slow down your import notably.



I don't think you're anywhere near the limits of sqlite since it talks about
terabytes.

http://sqlite.org/limits.html



Somebody else can answer for sure but wrapping your .import inside a
transaction may be a good thing.

I don't know if that's done by default.



Your queries are liable to be pretty slow depending on what you have to do.


Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
behalf of peter korinis [kori...@earthlink.net]
Sent: Tuesday, May 01, 2012 3:06 PM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] is SQLite the right tool to analyze a 44GB file

I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
dev support for a pilot project (single user, no updates, just queries).



I want to analyze the data contained in a 44GB csv file with 44M rows x 600
columns (fields all <15 char). Seems like a DBMS will allow me to query it
in a variety of ways to analyze the data.



I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
with 4GB RAM + 200GB free disk space.

End-user tools like Excel & Access failed due to lack of memory. I
downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
but it would not load the csv files - 'csv worker failed'. So I tried
Database Master from Nucleon but it failed after loading (it took 100
minutes) ~57,000 rows with error message = 'database or disk is full". I
tried to create another table in the same db but could not with same error
message. The DB size shows as 10,000KB (that looks suspiciously like a size
setting?).



>From what I've read SQLite can handle this size DB. So it seems that either
I do not have enough RAM or there are memory/storage (default) limits or
maybe time-out issues that prevent loading this large file . or the 2 GUI
tools I tried have size limits. I do have a fast server (16GB, 12 procs,
64-bit intel, Win server) and an iMAC available.



1.   Is SQLite the wrong tool for this project? (I don't want the
overkill and admin overhead of a large MySQL or SQL Server, etc.)

2.   If SQLite will work, are there configuration settings in SQLite or
Win7 that will permit the load . or is there a better tool for this project?



Thanks much for helping a newbie!



peterK





Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-02 Thread peter korinis
Not sure yet ... but I'm working on it (between interruptions).
Thanks


-Original Message-
>From: "Black, Michael (IS)" <michael.bla...@ngc.com>
>Sent: May 2, 2012 10:15 AM
>To: General Discussion of SQLite Database <sqlite-users@sqlite.org>
>Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>Does that mean using the CLI worked for you?
>
>
>
>If so, you may be able to access the database with the other programs AFTER 
>you create it.
>
>Seems that creating a db from csv is challenging to some and not thoroughly 
>tested for large data sources.
>
>
>
>Michael D. Black
>
>Senior Scientist
>
>Advanced Analytics Directorate
>
>Advanced GEOINT Solutions Operating Unit
>
>Northrop Grumman Information Systems
>
>________
>From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
>behalf of peter korinis [kori...@earthlink.net]
>Sent: Wednesday, May 02, 2012 9:06 AM
>To: 'General Discussion of SQLite Database'
>Subject: EXT :Re: [sqlite] is SQLite the right tool to analyze a 44GB file
>
>Thank you all.
>Looks like I'm stuck with the CLI, though I have contacted Nucleon software
>support ... I tried the CLI yesterday but need more practice.
>Is there a good reference book you would recommend for SQLite?
>
>peter

Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-03 Thread peter korinis
I have R but really haven't used it much. I know it's a great stats package
and great for data reduction ... but I want to perform queries against my
44GB of data, filtering records by a variety of attributes, comparing those
subsets in a variety of ad hoc ways, perhaps summing/counting other fields,
etc.
This is the kind of job Excel is good at ... but the data is too big!
Seems like a database plus a good query GUI or some BI app would work. Is R
a good query tool?

Thanks,
peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Warren Young
Sent: Thursday, May 03, 2012 9:36 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On 5/1/2012 2:06 PM, peter korinis wrote:
> Is SQLite the wrong tool for this project?

Probably.

SQLite is a data storage tool.  With enough SQL cleverness, you can turn it
into a data *reduction* tool.  But a data analysis tool?  No, not without
marrying it to a real programming language.

Granted, that's what almost everyone does do with SQLite, but if you're
going to learn a programming language, I'd recommend you learn R, a language
and environment made for the sort of problem you find yourself stuck with.
http://r-project.org/

There are several R GUIs out there.  I like R Studio best: 
http://www.rstudio.org/

You'll still find R Studio a sharp shock compared to Excel.  And yes, it
will require some programming, and yes, I know you said you aren't a
programmer.  But in the rest of the thread, it looks like people have
convinced you to use SQLite from the command line, typing in raw SQL
commands; guess what, that's programming.  Not on the level of R code, but R
isn't far down the same slippery slope.

It may help you to know that R is most popular in the statistics community,
which of course is populated by statisticians, not programmers.

R isn't the easiest programming language to pick up, but it's far from the
hardest.  It's very similar to JavaScript, though a bit tougher to learn,
mostly due to having accumulated some strange syntax over its 36 years.
(That's counting R's predecessor, S.)


Re: [sqlite] Data Import Techniques

2012-05-03 Thread peter korinis
Forgive me ... but what is SQLiteman?
Will its "Import Table Data" function help me load a csv file into an SQLite
table?

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon
Sent: Wednesday, May 02, 2012 3:03 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Data Import Techniques

> Using the "Import Table Data" function in SQLiteman, the data loads very
quickly. However in my application, using either an SQL "insert" command or
a resultset, the import is very much slower. Is there another technique I
can use to speed things up?

This FAQ entry might interest you...

(19) INSERT is really slow - I can only do few dozen INSERTs per second
  ->  http://sqlite.org/faq.html#q19
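
The fix that FAQ entry describes, as a minimal sketch (the table name
my_table is hypothetical):

BEGIN;
INSERT INTO my_table VALUES ('a', 1);
INSERT INTO my_table VALUES ('b', 2);
-- ... many more inserts ...
COMMIT;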


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
SQLiteSpy looks good ... I will try it.
The website says the download contains SQLite itself, which I already have -
will there be a problem using SQLiteSpy with my existing SQLite?

I was trying to use the sqlite3 CLI command ".import testfile.txt test_tbl;"
to load a 999x46 comma-delimited file into a previously created empty table
with 46 columns. (If this works I will load two 22M-row x 46-col csv files
into that table.) Does this command work this way, or must I create INSERT
statements to do 999 inserts (later 44M inserts)?

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Oliver Peters
Sent: Thursday, May 03, 2012 7:23 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

Am 03.05.2012 19:59, schrieb peter korinis:
> I have R but really haven't used it much. I know it's a great stats 
> package and great for data reduction ... but I want to perform queries 
> against my 44GB of data, filtering records by a variety of attributes, 
> comparing those subsets in a variety of ad hoc ways, perhaps 
> summing/counting other fields, etc.

I prefer

http://www.yunqa.de/delphi/doku.php/products/sqlitespy/index

for creating the database (tables, views) and doing the queries cause it's
fast and reliable but I prefer to write(!) SQL code and not to create it
through a generator (as it is done in Access).

sqlitespy can't do the import job; I always do this with the CLI by creating
INSERT statements with my scripting language in a separate file

since sqlite 3.7.11 you don't need a statement like

INSERT INTO table(col01,col02,col03) VALUES(1,2,3);
INSERT INTO table(col01,col02,col03) VALUES(4,5,6);

you can make it shorter:

INSERT INTO table(col01,col02,col03) VALUES(1,2,3),(4,5,6);

this is a great advantage if you need to do many INSERTs cause your file
won't become so large


> This is the kind of job Excel is good at ... but the data is too big!
> Seems like a database plus a good query GUI or some BI app would work. Is R
> a good query tool?

afaik there is no other way than to write (!) SQL Code - depending on 
the problem this can be done in an R script or directly in the database 
(i.e. as a VIEW) or as a combination

[...]

Oliver


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread peter korinis
I have scaled down the attributes of interest to 46 columns (discarding the
other 550). No columns are calculated. No updates to this file ... one user
... only query, sort, etc. type transactions.
So I want to load two 22GB csv files into an empty 46-column table. (I
intend to test the load with a 999-record by 46-col file.) Initially I have
only 1 index, on a record #. I am not positive that several other fields I
want to index aren't missing data in some records (I assume that will error
out if I make those an index).
After I get the data loaded and inspect for nulls in prospective index
attributes, can I add indices?

I was planning to load using the sqlite3 CLI ".import" command. Is there a
way I can monitor the progress of the load, with only minimal impact on
performance? I've started several loads only to find out hours later that
nothing had been loaded.

Thanks for helpful info.

peter


-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Valentin Davydov
Sent: Friday, May 04, 2012 9:43 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file

On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user
> with no dev support for a pilot project (single user, no updates, just
queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows 
> x 600 columns (fields all <15 char). Seems like a DBMS will allow me 
> to query it in a variety of ways to analyze the data.

Yes, SQLite is quite capable of doing simple analysis of such amounts of
data, especially selecting small subsets based on simple criteria. However,
before trying to do some real work you have to understand the structure of
your data, realize your possible queries, and carefully design the database
schema (tables and, equally important, indices). Perhaps putting all data
in a single 600-column table is not a good idea (though allowed
technically), especially if your columns are alike in their physical nature:
it is not so easy to select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel 
> dual-proc with 4GB RAM + 200GB free disk space.

A well-indexed database of small data pieces usually takes up several times
more disk space than the raw data. Probably 200GB would not be enough,
depending mostly on the number of indices. Consider dedicating a separate
disk (or even a RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I 
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager 
> add-on but it would not load the csv files - 'csv worker failed'. So I 
> tried Database Master from Nucleon but it failed after loading (it 
> took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". 
> I tried to create another table in the same db but could not with same 
> error message. The DB size shows as 10,000KB (that looks suspiciously 
> like a size setting?).

Try the bare sqlite shell instead of those external tools. It should take at
least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3
terabytes of disk space, contains more than 10^10 records and still provides
small selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find
there. SQLite isn't very good for calculation of complex aggregate
functions, but works fine in simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite
or
> Win7 that will permit the load . or is there a better tool for this
project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and
perhaps CACHE_SIZE to fill most of the available RAM would help a bit.
Also, don't forget to turn off journaling and wrap it all in a single
transaction when creating the database for the first time.
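
A sketch of those settings in the sqlite3 shell (file and table names are
placeholders; journal_mode OFF is safe here only because the database can be
rebuilt from the CSV if the load fails):

PRAGMA page_size = 4096;      -- must be set before the first table is created
PRAGMA cache_size = 100000;   -- in pages; ~400 MB at 4 KiB pages
PRAGMA journal_mode = OFF;
BEGIN;
.separator ,
.import data.csv big_table
COMMIT;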

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Regarding the SQLITE3.exe statement ".import FILE TABLE":
I created a table.
My input file is a comma-delimited text file.
When I run .import I get the following: "Error: FILE line 1: expected 46
columns of data but found 1".
It seems .import is not recognizing the comma delimiter.
I suspect this is a simple syntax error, but I can't find any
document/explanation.

Thanks,
peter

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
Sent: Friday, May 04, 2012 11:16 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file


On 4 May 2012, at 4:02pm, peter korinis <kori...@earthlink.net> wrote:

> SQLiteSpy looks good ... I will try it.
> The website says the download contains SQLite itself, which I already have -
> will there be a problem using SQLiteSpy with my existing SQLite?

SQLite is not a single library which has to live somewhere on your computer.
It is C code which each programmer includes in their program.  You can have
twenty programs on your disk, each using a different version of SQLite,
without problems.

> I was trying to use sqlite3 CLI cmd ".import testfile.txt test_tbl;" 
> to load a 999x46 comma-delimited file into a previously created empty 
> table with 46 col. (if this works I will load two 22M row x 46 col csv 
> files into that
> table.) does this cmd work this way or must I create INSERT statements 
> to do
> 999 inserts (later 44M inserts)?

Semicolons are needed at the end of SQL commands.  You don't want them at
the end of commands which start with a dot.  Apart from that, you have
something worth trying.  Why not make a tiny test case with three lines of
two columns before you get started with the 46-column monster?
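
That is, as a sketch of the distinction, using the names from the thread
(the dot-command takes no semicolon; the SQL statement does):

.import testfile.txt test_tbl
SELECT COUNT(*) FROM test_tbl;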

Simon.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
No, I had not used ".mode" - the help text describes ".mode" as "set output
mode where ..." and the word "output" made me think it was for the .output
statement.

I just tried ".mode csv". What do your * mean? Do I put in the file
and/or table name, or nothing more?

I tried several different ways but still get the same error when I try to
import.

 

Without knowing the syntax, I'm forced to ask these dumb questions or give
up (not!) ... since I haven't found good documentation - sorry.

 

Thanks,

 

Have you run *.mode csv*?

Jonas Malaco Filho

 

2012/5/7 peter korinis 

> Regarding SQLITE3.exe statement ".import FILE TABLE"

> I created a table.

> My input file is a comma-delimited text file

> When I run .import I get the following "Error: FILE line 1: expected 46

> columns of data but found 1"

> It seems .import is not recognizing comma delimiter.

> I suspect this is a simple syntax error, but I don't find any

> document/explanation.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
On 7 May 2012, at 4:41pm, peter korinis  wrote:

 

> My input file is a comma-delimited text file

> When I run .import I get the following "Error: FILE line 1: expected 46

> columns of data but found 1"

> It seems .import is not recognizing comma delimiter.

 

<http://www.sqlite.org/sqlite.html>

 

"The default separator is a pipe symbol ("|")."

 

Simon.

 

So, if I use gawk to change my comma-separated file to a pipe ("|")
delimiter, will it work?

Do I have to use ".mode csv" as Jonas just advised?

 

Thanks,

peter



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Thanks for the advice.

 

http://www.pantz.org/software/sqlite/sqlite_commands_and_general_usage.html

 

had the explanation/example I needed to get the import done successfully. 

Using ".separator ," was what I was missing.

 

peter

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
I can't find a table I just created and imported data into.

With .databases I can see 'main', but with no location on disk; and with
.tables I can see the table.

The instructions say they are saved ... but where? I can't find them with
Windows search.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
Simon

I searched the entire disk for the table name and found no matches.

 

pk

 



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-07 Thread peter korinis
I did NOT specify a file name when I started sqlite3 - so I guess all is
lost. I'll have to redo it.

So the first thing is to specify the db (file) name - what syntax? I
tried typing a file name (test.db) but got an error.
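
For reference, the file name is given as an argument when launching the
shell; the file is created if it does not already exist (a sketch):

C:\> sqlite3 test.db
sqlite> .tables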

 

pk

 
