Something I didn't see mentioned: does your data need to be made
available in real time?  Just because you're sampling at 20/s doesn't
mean the data has to be made available at 20 samples per second, or
even one sample every 1/20th of a second.  You might find that it's a
little more resource friendly to stuff many samples into the same
transaction and commit them together.  As an example, take 60 seconds'
(1200 samples) worth of data and send it to the database in one batch
while continuing to sample.  This may allow for more flexibility in
your implementation.
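To make that concrete, here's a rough sketch of the batching idea in
Python.  The table and column names, and the psycopg2 flush shown in the
trailing comment, are assumptions for illustration, not a tested
implementation -- only the batching logic itself is generic:

```python
class SampleBatcher:
    """Accumulate samples and hand them off in fixed-size batches,
    so many inserts can share one transaction/commit."""

    def __init__(self, batch_size, flush_fn):
        self.batch_size = batch_size   # e.g. 1200 = 60 s at 20 samples/s
        self.flush_fn = flush_fn       # called with a list of samples
        self.pending = []

    def add(self, sample):
        self.pending.append(sample)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flush_fn(self.pending)
            self.pending = []

# With psycopg2 the flush callback might look like this (hypothetical
# table/column names):
#
#   def flush_to_db(samples):
#       with conn:                      # one transaction per batch
#           with conn.cursor() as cur:
#               cur.executemany(
#                   "INSERT INTO packets (ts, header, payload) "
#                   "VALUES (%s, %s, %s)", samples)
```

The sampling side just calls `add()` for each packet; one commit per
minute instead of twenty per second.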

At any rate, I agree with the other comments.  Maintenance issues are
probably going to be the gotcha if you're not forward-looking.


Regards,

        Greg Copeland


On Sun, 2003-03-02 at 19:33, Mark Jones wrote:
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
> 
> The database has to have the last 24 hours of data online and be accessible 24 
> hours a day, 7 days a week, 365 days a year; basically no downtime. My 
> application is a weather-related system, which must be available at any moment.
> 
> > Will the data purge be spread over time or come in large lumps?
> 
>  The database that is online would only contain the last 24 hours of data.
> Once 24 hours of data has been collected, the data would begin to be purged,
>  say one hour's worth of data every hour.
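A rolling hourly purge can be as simple as a DELETE against a timestamp
cutoff.  A sketch follows -- the `packets` table and `ts` column are made-up
names, and in real code you'd pass the cutoff as a query parameter rather
than formatting it into the string.  Keep in mind that in PostgreSQL the
deleted rows are only reclaimed by VACUUM:

```python
from datetime import datetime, timedelta

def purge_statement(now, retention=timedelta(hours=24)):
    """Build the DELETE that drops everything older than the retention
    window.  'packets' and 'ts' are hypothetical names."""
    cutoff = now - retention
    return "DELETE FROM packets WHERE ts < '%s'" % cutoff.isoformat(sep=" ")
```

Run hourly from cron or from the collector itself, and follow with a
VACUUM so the steady-state table size stays roughly constant.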
> 
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> 
> No, there will be only one table, with each packet having its own record.
> Each packet contains a few header fields that I would like to be able to 
> search on. The rest of the data is purely binary data that goes along with 
> the header. I have no interest in searching through it, only in being able to 
> retrieve it based on its header. One important way of retrieving data would be 
> based on time: since every packet has a timestamp, I would like to select a 
> range of records out of the database. There may be cases where I want to 
> stream the data back out just the way it was recorded.
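For that header-plus-blob layout, a minimal schema sketch might look like
the following.  All names are invented, and `bytea` is just one way to
store the opaque payload; the point is the index on the timestamp, which
is what makes the range retrieval cheap:

```python
# Hypothetical DDL/query strings, as they might be shipped from a client.
SCHEMA = """
CREATE TABLE packets (
    ts      timestamptz NOT NULL,   -- packet timestamp
    header  integer     NOT NULL,   -- searchable header field(s)
    payload bytea       NOT NULL    -- opaque binary blob
);
CREATE INDEX packets_ts_idx ON packets (ts);
"""

RANGE_QUERY = """
SELECT ts, header, payload
  FROM packets
 WHERE ts >= %s AND ts < %s
 ORDER BY ts
"""
```

The `ORDER BY ts` gives the records back in recorded order, which is what
streaming them back out unchanged would need.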
> 
> 
> > Can you use a parallel loader or does it need to be loaded sequentially?
> 
> I am not exactly sure what you mean by parallel loader, but the data would be 
> loaded sequentially into the database just as I receive it from my external 
> device. As mentioned above, I may want to be able to sequentially read records 
> from the database to be able to recreate the data stream that I received and 
> recorded. I would need to be able to read the records back fast enough to 
> replay them at the same rate at which I received them.
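Replaying at the recorded rate is mostly a matter of sleeping for the gap
between consecutive timestamps.  A sketch of that pacing logic (the
database fetch is omitted; `records` stands in for whatever the range
query returns, in recorded order):

```python
import time

def replay_delays(timestamps):
    """Given recorded packet timestamps (seconds), return how long to
    wait before emitting each packet so playback runs in real time."""
    delays = [0.0]                       # first packet goes out immediately
    for prev, cur in zip(timestamps, timestamps[1:]):
        delays.append(cur - prev)
    return delays

def replay(records, emit):
    """records: iterable of (timestamp_seconds, payload), recorded order."""
    records = list(records)
    timestamps = [t for t, _ in records]
    for delay, (_, payload) in zip(replay_delays(timestamps), records):
        time.sleep(delay)
        emit(payload)
```

At 20 packets/s the delays will all be about 0.05 s; as long as the reads
keep up, the stream comes back out at the rate it went in.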
> 
> 
> Thanks Again
> Mark
> 
> 
> On Sunday 02 March 2003 06:35 pm, Rod Taylor wrote:
> > On Sun, 2003-03-02 at 18:52, Mark Jones wrote:
> > > Hello
> > >
> > > I am working on a project that acquires real-time data from an external
> > > device that I need to store and be able to search through and retrieve
> > > quickly. My application receives packets of data ranging in size from 300
> > > to 5000 bytes every 50 milliseconds for a minimum duration of 24 hours
> > > before the data is purged or archived off disk. There are several fields
> > > in the data that I would like to be able to search on to retrieve the
> > > data at a later time. Using a SQL database such as PostgreSQL or MySQL
> > > seems like it would make this task much easier. My questions are: is a
> > > SQL database such as PostgreSQL able to handle this kind of activity,
> > > saving a record of 5000 bytes at a rate of 20 times a second, and how
> > > well will it perform at searching through a database which contains
> > > nearly two million records at a size of about 8-9 gigabytes of data,
> > > assuming that I have adequate computing hardware? I am trying to
> > > determine if a SQL database would work well for this or if I need to
> > > write my own custom database for this project. If anyone has any
> > > experience doing anything similar with PostgreSQL, I would love to know
> > > about your findings.
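As a sanity check, the figures quoted above do hold together: 20 packets/s
for 24 hours at up to 5000 bytes each, ignoring per-row overhead and
index size:

```python
# Back-of-the-envelope check of the numbers in the question.
packets_per_day = 20 * 60 * 60 * 24        # 1,728,000 records
worst_case_bytes = packets_per_day * 5000  # 8,640,000,000 bytes, ~8.6 GB
```

That lands right in the "nearly two million records, 8-9 gigabytes"
range, so the question is internally consistent.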
> >
> > Not such a big load depending on the complexity of the queries being
> > performed.  From the sounds of it, they're pretty simple.
> >
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
> >
> > Can you use a parallel loader or does it need to be loaded sequentially?
> >
> > Will the data purge be spread over time or come in large lumps?
> >
> >
> > Simple queries on 2 million records happen in low double digit
> > milliseconds on relatively low end hardware with a decent index -- but
> > complex queries can take minutes on 1/10000th the dataset.
> >
> > 20 inserts / second shouldn't be difficult to achieve even on the lowest
> > end hardware -- my laptop can do that -- giving each packet its own
> > transaction.
> >
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> >
> > The most important question is the maintenance window.
-- 
Greg Copeland <[EMAIL PROTECTED]>
Copeland Computer Consulting

