You saved me a lot of typing. :) Comments inline...

--- On Mon, 1/10/11, Bob Bownes <bow...@gmail.com> wrote:

> From: Bob Bownes <bow...@gmail.com>
> Subject: Re: [time-nuts] Archiving Timing Data
> To: scmcgr...@gmail.com, "Discussion of precise time and frequency measurement" <time-nuts@febo.com>
> Date: Monday, January 10, 2011, 10:08 PM
> There is a difference between archival format and database format. If you
> are looking for an archival format that is portable, then a CSV (or other
> delimiter of your choice) is ideal. They are easy to import into a real
> database and compress well. If, on the other hand, you are looking for a
> working database, you're better off putting it into some kind of schema in a
> Real Database(TM) of your choice and then tuning for transactional or data
> warehouse performance.
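For the portable-archive side Bob describes, a minimal sketch using only Python's standard `csv` and `gzip` modules: write a compressed CSV, then read it back. The two-column layout, the column names `t_s`/`phase_s` and the file name `phase.csv.gz` are all hypothetical choices for illustration.

```python
import csv
import gzip
import os
import tempfile


def write_archive(path, rows):
    """Write (t, value) pairs to a gzip-compressed CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with gzip.open(path, "wt", newline="", encoding="ascii") as f:
        writer = csv.writer(f)
        writer.writerow(["t_s", "phase_s"])  # hypothetical column names
        # str() of a Python 3 float is its shortest repr, so values round-trip.
        writer.writerows(rows)


def read_archive(path):
    """Read the archive back as a list of (float, float) tuples."""
    with gzip.open(path, "rt", newline="", encoding="ascii") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [(float(t), float(v)) for t, v in reader]


if __name__ == "__main__":
    samples = [(float(i), 1e-9 * i) for i in range(10000)]
    path = os.path.join(tempfile.mkdtemp(), "phase.csv.gz")  # hypothetical name
    write_archive(path, samples)
    assert read_archive(path) == samples
    print(os.path.getsize(path), "bytes after compression")
```

The archive stays readable with nothing more than `gunzip` and a text editor, which is the portability argument in a nutshell.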

Indeed. I was thinking that at the very least you ALWAYS want the metadata of
your time series in a database, indexed so you can easily query it to find
the time series of interest. By metadata I mean things like a general
description, the instruments used, date, length of the time series, useful
statistical measures of the overall time series, stuff like that. You can
query this to find the time series of choice, and then retrieve it.
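One way to read the metadata idea is a single catalog table with an index on the fields you search by. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for MySQL or PostgreSQL; the table and column names (`series`, `instrument`, `start_utc`, `adev_1s`, `data_uri`) are hypothetical.

```python
import sqlite3

# Hypothetical catalog schema: one row of metadata per archived time series.
SCHEMA = """
CREATE TABLE IF NOT EXISTS series (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    instrument  TEXT NOT NULL,
    start_utc   TEXT NOT NULL,  -- ISO 8601 UTC, so text order matches time order
    n_samples   INTEGER NOT NULL,
    adev_1s     REAL,           -- example summary statistic, may be NULL
    data_uri    TEXT            -- where the full series lives
);
CREATE INDEX IF NOT EXISTS series_by_instrument ON series (instrument, start_utc);
"""


def open_catalog(path=":memory:"):
    """Open the catalog database, creating the schema if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_series(conn, instrument, since):
    """Return (id, description, data_uri) for matching series, oldest first."""
    return conn.execute(
        "SELECT id, description, data_uri FROM series"
        " WHERE instrument = ? AND start_utc >= ? ORDER BY start_utc",
        (instrument, since),
    ).fetchall()


if __name__ == "__main__":
    conn = open_catalog()
    conn.executemany(
        "INSERT INTO series (description, instrument, start_utc, n_samples, data_uri)"
        " VALUES (?, ?, ?, ?, ?)",
        [  # made-up example rows
            ("run 1", "counter-A", "2011-01-08T00:00:00Z", 86400, "file:///data/run1.csv"),
            ("run 2", "counter-A", "2011-01-09T12:00:00Z", 3600, "file:///data/run2.csv"),
            ("run 3", "counter-B", "2011-01-09T00:00:00Z", 86400, "file:///data/run3.csv"),
        ],
    )
    print(find_series(conn, "counter-A", "2011-01-09"))
    # [(2, 'run 2', 'file:///data/run2.csv')]
```

The catalog stays small no matter how large the raw series get, which is why it is worth indexing even if the samples themselves stay in flat files.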

Retrieval would depend a lot on your use cases. Do you just find the right
measurements and feed them to your file-based utility? Do you want to perform
more complicated database queries? Do you /neeeeed/ a web-based frontend?

IMO the two most obvious solutions for something like this would be either
A) return a URI with the file location (CSV with the delimiter du jour), or
B) store the full time series in the database as well and use your API of
choice to retrieve the data.
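Option A boils down to: the catalog hands back a `file://` URI, and the client opens the CSV itself. A minimal sketch, assuming local `file://` URIs and a header row followed by `timestamp,value` lines (both assumptions, not something the thread specifies); the helper name `load_series_from_uri` is hypothetical.

```python
import csv
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def load_series_from_uri(uri):
    """Resolve a file:// URI returned by the catalog and read the CSV it names."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        # Remote schemes (http, ftp, ...) would need a download step first.
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    # url2pathname undoes percent-encoding and handles OS-specific separators.
    path = Path(url2pathname(parsed.path))
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [(float(t), float(v)) for t, v in reader]
```

`Path(...).as_uri()` produces URIs in the same form, so whatever writes the catalog rows can use it to fill in the location column.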

Again, the choice depends on the type and frequency of the operations you
want to perform on your data.

> In this case, a simple one- or two-table schema, with indexes on the things
> you want to sort on, should take care of most of the storage problem. Once
> you have that, use the API for the DB of choice to store/retrieve the data.
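Bob's one- or two-table suggestion could look like this when the full series goes into the database (Fred's option B). Again `sqlite3` stands in for whatever server you pick, and the schema and function names are hypothetical. The composite primary key on `(series_id, t)` doubles as the index for pulling a time window out of one series.

```python
import sqlite3

# Hypothetical two-table schema: series metadata plus one row per sample.
SCHEMA = """
CREATE TABLE series (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE sample (
    series_id INTEGER NOT NULL REFERENCES series(id),
    t         REAL    NOT NULL,  -- seconds since the start of the run
    value     REAL    NOT NULL,  -- e.g. phase difference in seconds
    PRIMARY KEY (series_id, t)
);
"""


def store_series(conn, description, samples):
    """Insert one series and its (t, value) samples in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO series (description) VALUES (?)", (description,))
        series_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO sample (series_id, t, value) VALUES (?, ?, ?)",
            ((series_id, t, v) for t, v in samples),
        )
    return series_id


def fetch_window(conn, series_id, t0, t1):
    """Return (t, value) samples with t0 <= t < t1, in time order."""
    return conn.execute(
        "SELECT t, value FROM sample"
        " WHERE series_id = ? AND t >= ? AND t < ? ORDER BY t",
        (series_id, t0, t1),
    ).fetchall()
```

Usage: `conn = sqlite3.connect("timing.db"); conn.executescript(SCHEMA)` once, then `store_series(...)` per run.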
> 
> MySQL is free and runs on pretty much everything nowadays. That plus
> phpMyAdmin would make it easy enough for most of those bright enough to
> understand the content of this list to come up with a schema.

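Agreed. I noticed RRD being recommended. Don't do it. Why not? Because RRD
resamples your data onto fixed time steps and keeps it in fixed-size
round-robin archives, where old samples are overwritten or consolidated into
coarser averages, so your raw timing data does not survive. RRD is pretty good
for what it was made to do; I have plenty of installs around together with
MRTG or RRDtool, but I would not recommend using it for something like this.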

As for PostgreSQL, I would use it myself, out of personal preference. That
said, for regular desktop users without this bias I would recommend MySQL.
Things like myadmin and phpMyAdmin make life easy... Plus, should one decide
to make the obligatory web-based frontend, MySQL is slightly friendlier for
the PHP novice. This is not to put everyone in the "beginning user" category,
but rather to keep the gate open to as many people as possible, as it were.


> Bob (whose day job is with a big red database company)

Fred (whose day job involved staring in disbelief at poorly conceived schemas)


> 
> 
> On Mon, Jan 10, 2011 at 4:57 PM, <scmcgr...@gmail.com> wrote:
> 
> > The counterargument is that with a heavyweight database, the size of the
> > datastore increases dramatically, and there is no guarantee that the tool
> > will be around in 10 years to read the data.
> >
> > All SQL databases use ASCII CSV to load data into, and dump data out of,
> > their internal data representation.
> >
> > Transactional systems still use a hierarchical database (think IBM IMS or
> > RAIMA) to store and access large datasets such as credit-card
> > authorizations. These databases are one step away from ASCII or EBCDIC.
> >
> > Scott
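Scott's point that CSV is the common denominator for getting data into and out of a database can be checked with a short round trip. This sketch dumps a table to CSV and loads it into a fresh database using only Python's `csv` and `sqlite3` modules; server databases have their own bulk tools (for example PostgreSQL's `COPY` or MySQL's `LOAD DATA INFILE`), which this does not try to reproduce. The `sample` table in the usage is hypothetical.

```python
import csv
import io
import sqlite3


def dump_csv(conn, table, out):
    """Write every row of `table` to the text stream `out` as CSV, header first."""
    # The table name is interpolated, so it must come from trusted code, not users.
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


def load_csv(conn, table, src):
    """Append CSV rows from the text stream `src` to an existing `table`."""
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(header)
    marks = ", ".join("?" for _ in header)
    with conn:  # one transaction for the whole load
        # Values arrive as text; SQLite's REAL column affinity converts them back.
        conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", reader)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sample (t REAL, value REAL)")
    src.executemany("INSERT INTO sample VALUES (?, ?)", [(0.0, 0.5), (1.0, -0.25)])
    buf = io.StringIO()
    dump_csv(src, "sample", buf)
    print(buf.getvalue(), end="")
```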
> >
> > -----Original Message-----
> > From: Chris Albertson <albertson.ch...@gmail.com>
> > Sender: time-nuts-boun...@febo.com
> > Date: Mon, 10 Jan 2011 12:42:03
> > To: Discussion of precise time and frequency measurement <time-nuts@febo.com>
> > Reply-To: Discussion of precise time and frequency measurement <time-nuts@febo.com>
> > Subject: Re: [time-nuts] Archiving Timing Data
> >
> > We have mountains of data here too. The best way to store it is in a
> > "real" database of some kind. There are several that are free, open
> > source and multi-platform. The best for this use is "Postgres". As
> > this is free and open source, there is no reason not to use it.
> >
> > In the past I've kept snapshots of simulations that ran for
> > hours/days/weeks, and we got many hundreds of millions of data points.
> > We were then able to query for almost any condition and expression, for
> > example "Give me A, B where A-B is less than 4 from July 5th 1998".
> >
> > I can tell you first hand that having a billion lines of tab-separated
> > data is worse than useless. You need it cataloged such that you can
> > very quickly (seconds) find useful subsets of the data, and you can
> > NEVER know in advance which subset you might need.
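Chris's example query is easy to express once the data sits in a table with one row per timestamp. A minimal sketch, assuming a hypothetical table `snapshot(ts TEXT, a REAL, b REAL)` with ISO 8601 UTC timestamps and an index on `ts`; `sqlite3` again stands in for PostgreSQL, and the function name is made up.

```python
import sqlite3


def open_snapshots(path=":memory:"):
    """Create the hypothetical snapshot table with an index on the timestamp."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS snapshot (ts TEXT NOT NULL, a REAL, b REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS snapshot_ts ON snapshot (ts)")
    return conn


def a_b_where_diff_below(conn, day, limit):
    """'Give me A, B where A - B < limit' for one UTC day given as 'YYYY-MM-DD'."""
    # Half-open range [day, day + 1 day) so the index on ts can serve the date test;
    # ISO 8601 strings compare in time order.
    return conn.execute(
        "SELECT ts, a, b FROM snapshot"
        " WHERE ts >= ? AND ts < date(?, '+1 day') AND a - b < ?"
        " ORDER BY ts",
        (day, day, limit),
    ).fetchall()
```

Usage: `a_b_where_diff_below(conn, "1998-07-05", 4)`. The point of the index is exactly Chris's "seconds, not hours": the date restriction narrows the scan before the arithmetic condition is evaluated.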
> >
> >
> >
> >
> > On Mon, Jan 10, 2011 at 12:22 PM, Peter Vince <pvi...@theiet.org> wrote:
> > > Would a TSV (Tab Separated Value) format be preferable? Full stops
> > > and commas are used in numbers as decimal and thousands separators (or
> > > vice versa), so using a tab character would avoid any problems with
> > > commas in the actual data (and make it a bit easier to quickly
> > > eyeball when viewed in a text editor).
> > >
> > > Peter  (G8ZZR, London, England)
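Peter's point about separators can be shown with Python's `csv` module, which handles both layouts: with `delimiter="\t"` a value written with a decimal comma (as many European locales do) needs no quoting, whereas a comma-delimited file has to quote it. The helper name and the sample row are made up for illustration.

```python
import csv
import io


def to_delimited(rows, delimiter):
    """Serialize rows of strings with the given delimiter and return the text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = [["2011-01-09T17:15:00Z", "1,25e-9"]]  # made-up row with a decimal comma
print(to_delimited(rows, ","), end="")   # 2011-01-09T17:15:00Z,"1,25e-9"
print(to_delimited(rows, "\t"), end="")  # 2011-01-09T17:15:00Z<TAB>1,25e-9
```

Both files round-trip through `csv.reader` with the matching delimiter; the tab version simply avoids the quoting and stays easier to eyeball.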
> > >
> > >
> > > On 9 January 2011 17:15, Bob Camp <li...@rtty.us> wrote:
> > > ...
> > >> I doubt very much I'm the only one taking a mountain of timing data and
> > >> not properly cataloging it. My guess is that maybe 90% of the list
> > >> members are in the same boat. How about:
> > >>
> > >> 1) A set of not-too-restrictive data format standards (CSV with a few
> > >> restrictions ...)
> > > ...
> > >
> > >_______________________________________________
> > > time-nuts mailing list -- time-nuts@febo.com
> > > To unsubscribe, go to https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts
> > > and follow the instructions there.
> > >
> >
> >
> >
> > --
> > =====
> > Chris Albertson
> > Redondo Beach, California
> >