William, got to tell ya, we at #uphpu have been debating your post for the
last few minutes.

> For data harvesting, and for reporting, it's best to not normalize your data
> but to store it all in a single table.  After, you can normalize it (or for
> data reporting, leave it as is).
>
> Next time, a single table for harvesting, a merge of all data, then
> normalize into other tables :)
>
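
To make the quoted workflow concrete, here is a minimal sketch of "harvest into a single table, then normalize into other tables". It uses Python's built-in sqlite3 module only so that it runs anywhere. The table and column names (harvest, person, skill, person_skill) are invented for illustration and are not taken from William's post.

```python
import sqlite3


def harvest_then_normalize(rows):
    """Load raw (name, address, skill) rows into one flat table,
    then split them into normalized tables. Hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE harvest (name TEXT, address TEXT, skill TEXT);
        CREATE TABLE person (
            id INTEGER PRIMARY KEY, name TEXT, address TEXT,
            UNIQUE (name, address));
        CREATE TABLE skill (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
        CREATE TABLE person_skill (
            person_id INTEGER REFERENCES person(id),
            skill_id  INTEGER REFERENCES skill(id),
            PRIMARY KEY (person_id, skill_id));
    """)

    # Step 1: harvest. Every raw record goes into the flat table as-is,
    # duplicates and all.
    db.executemany("INSERT INTO harvest VALUES (?, ?, ?)", rows)

    # Step 2: normalize. Distinct people and skills get their own tables,
    # and a link table records who has which skill.
    db.executescript("""
        INSERT OR IGNORE INTO person (name, address)
            SELECT DISTINCT name, address FROM harvest;
        INSERT OR IGNORE INTO skill (label)
            SELECT DISTINCT skill FROM harvest;
        INSERT OR IGNORE INTO person_skill (person_id, skill_id)
            SELECT p.id, s.id
            FROM harvest h
            JOIN person p ON p.name = h.name AND p.address = h.address
            JOIN skill  s ON s.label = h.skill;
    """)
    return db
```

The flat harvest table stays around after the split, so it can still be used directly for reporting, which is the "leave it as is" option in the quote.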

After having had normalization pounded into my head for the first several
years of my programming life, I have to admit, it's the only way I think.

I wonder if you or anybody else would like to comment on the discussion.

[12:55] <@mindjuju> not normalize data?
[12:55] <@mindjuju> scary
[13:02] <@mindjuju> that's curious, i've reread will like 3x, and I still
think it's a little off.  I've always thought that if you're collecting
data from the web that's going to be used for the web, it should be stored
relationally, and that if it is going to a data warehouse directly, it
should be stored suited for reporting, and if it is both, for efficiency
purposes, store as relational for web and convert to non-normalized for
dbwarehouse
[13:02] <@mindjuju> though truth of it, i've never had to prep data for a
specific data warehouse structure
[13:03] <@mindjuju> i do have large sums of data i collect, but i don't
think enough to constitute a warehouse
[13:03] <+josephscott> I think in most cases the largest factor is the
amount of data you are storing
[13:04] <+josephscott> a reasonably normalized DB can do pretty much
anything you need when the size of the data is relatively small
[13:06] <@mindjuju> so things break down with larger dataset josephscott?
[13:06] <+josephscott> and given the increase in today's computing power,
large can often mean millions of rows
[13:07] <+josephscott> there are different pain points that come into play
as the amount of data gets huge
[13:07] <@mindjuju> so you're saying in some cases this is better?
|name|rank|serial|address|skill1|skill2|skill3|
[13:07] <+josephscott> for instance, you end up with different
backup/restore methods when data size is huge
[13:08] <+josephscott> and queries too, when things don't fit into memory
any more, things get slow/ugly
[13:08] <+josephscott> mindjuju: I'm saying in some cases it could be
[13:08] <@mindjuju> curious
[13:08] <+josephscott> and hopefully you've got a smart person to figure out
if your particular situation is one of those
[13:09] <+josephscott> tech isn't about finding one neat technique and
applying it to everything, it's about figuring out what your
needs/issues/pain points are and designing to deal with them
[13:10] <+josephscott> and the beauty of all this is that figuring those
things out is in a constant state of flux as well
[13:10] <@mindjuju> i've got to admit, i'm addicted to uniformity
[13:10] <@mindjuju> i like all my tables and databases lined up all neat and
organized
[13:11] <+josephscott> given the same problem today and 3 years from now,
you may solve it 3 years from now in a different way
[13:12] <xtrementl> yeah given new tech to deal with existing issues and new
issues emerging
[13:14] <+josephscott> and increase in computing power/capability and
reduction in cost for existing hardware
[13:14] <+josephscott> and changes in experience, hopefully everyone
continues to learn over the span of something like 3 years
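
For anyone who wants to poke at the other direction mindjuju describes (store relationally for the web, convert to non-normalized for the warehouse), here is a rough sketch that flattens normalized tables into the |name|rank|serial|address|skill1|skill2|skill3| layout from the log. It uses Python's built-in sqlite3 module; the schema, the sample rows, and the function names are all made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Create a small normalized database (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person (
            id INTEGER PRIMARY KEY, name TEXT, rank TEXT,
            serial TEXT, address TEXT);
        CREATE TABLE skill (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
        CREATE TABLE person_skill (
            person_id INTEGER REFERENCES person(id),
            skill_id  INTEGER REFERENCES skill(id),
            PRIMARY KEY (person_id, skill_id));
        INSERT INTO person VALUES (1, 'Ann', 'Sgt', 'A1', '1 Main St');
        INSERT INTO person VALUES (2, 'Bob', 'Cpl', 'B2', '2 Oak Ave');
        INSERT INTO skill VALUES (1, 'php');
        INSERT INTO skill VALUES (2, 'sql');
        INSERT INTO person_skill VALUES (1, 1);
        INSERT INTO person_skill VALUES (1, 2);
        INSERT INTO person_skill VALUES (2, 1);
    """)
    return db


def flatten_for_reporting(db, max_skills=3):
    """Return one wide tuple per person:
    (name, rank, serial, address, skill1, ..., skillN)."""
    cur = db.execute("""
        SELECT p.id, p.name, p.rank, p.serial, p.address, s.label
        FROM person p
        LEFT JOIN person_skill ps ON ps.person_id = p.id
        LEFT JOIN skill s ON s.id = ps.skill_id
        ORDER BY p.id, s.label
    """)
    wide = {}
    for pid, name, rank, serial, address, label in cur:
        row = wide.setdefault(pid, [name, rank, serial, address])
        # Skills beyond max_skills have no column to go into and are dropped.
        if label is not None and len(row) < 4 + max_skills:
            row.append(label)
    width = 4 + max_skills
    # Pad people with fewer skills so every row has the same width.
    return [tuple(row + [None] * (width - len(row))) for row in wide.values()]
```

Calling flatten_for_reporting(make_demo_db()) gives one row per person with unused skill columns left empty. The fixed number of skill columns is the trade-off of the flat layout: a fourth skill has nowhere to go, which the normalized link table handles without a schema change.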

_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
