See below.

On Sun, Dec 28, 2008 at 11:56 PM, Chris Wedgwood <c...@f00f.org> wrote:
> On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:
>
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out....
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?
>

It is a download of a huge longitudinal survey
(www.bls.gov/nls/nlsy79.htm) that has been converted out of the
proprietary format into SAS, and now I want to convert it into a
single SQLite database per wave.  I will wind up connecting people by
ID across the waves to show patterns of moving, etc.
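The cross-wave linking should be straightforward in SQL.  A sketch of
what I have in mind (the file, table, and column names here are made
up; CASEID stands in for whatever the respondent ID is called):

  ATTACH DATABASE 'wave1979.db' AS w1979;
  ATTACH DATABASE 'wave1980.db' AS w1980;

  -- respondents whose region code changed between 1979 and 1980
  SELECT a.CASEID, a.REGION AS region_1979, b.REGION AS region_1980
  FROM w1979.data_stuff a
  JOIN w1980.data_stuff b ON a.CASEID = b.CASEID
  WHERE a.REGION <> b.REGION;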

For each wave's table, each row contains integers that encode
information about a single respondent, such as age, whether employed
in June (either zero or one), whether employed in July, etc.  Since
the NLSY doesn't break the data into multiple tables, this is very
much NOT normalized.  What the codes mean is described in a separate
codebook (-5 = missing data, 1 = living at home, etc.).
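Since the codebook is just code -> label pairs per variable, I may
load it into its own table so queries can decode themselves.  Roughly
(table and column names are again made up):

  CREATE TABLE codebook (varname TEXT, code INTEGER, label TEXT);
  INSERT INTO codebook VALUES ('W0072400', -5, 'missing data');
  INSERT INTO codebook VALUES ('W0072400', 1, 'living at home');

  -- counts with human-readable labels instead of raw codes
  SELECT c.label, count(*) AS n
  FROM data_stuff d
  JOIN codebook c ON c.varname = 'W0072400' AND c.code = d.W0072400
  GROUP BY c.label
  ORDER BY n DESC;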

There is a separate table for each wave (1979, 1980, ... 2006).

I have managed (just now) to get it working with a hacked version of
SQLite.  Here is a meaningless query, just to confirm that it works:

sqlite> select W0072400, count(*) as c from data_stuff
   ...> group by W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962
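
For anyone who wants to reproduce this: stock SQLite refuses to
create a table wider than 2000 columns (the compile-time
SQLITE_MAX_COLUMN limit, which can be raised as high as 32767), so a
2500-column table needs a rebuild of the shell along these lines:

  gcc -DSQLITE_MAX_COLUMN=4000 shell.c sqlite3.c -o sqlite3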

Like I say, I may be going about it all wrong, but I can't run the
proprietary software on my Mac, and I am comfortable with SQL.  I
hope to pull out the data I want via SQL (a processed 1% of the
total), then run statistical analyses and graphics with R.
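
The extract itself should be a one-liner.  Something like this, for a
deterministic 1% sample keyed on the respondent ID (column name made
up again):

  -- every hundredth respondent; swap in abs(random()) % 100 = 0
  -- for a different random 1% on each run
  SELECT * FROM data_stuff WHERE CASEID % 100 = 0;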

I am describing all this in the hope that there is another
quantitative sociologist out there using SQLite!

TIA