Re: [MTT devel] questions about MTT database from HDF

2010-11-07 Thread Jeff Squyres
Yep; I mentioned your GDS-backend work to the HDF folks.  But your email is 
much more detailed than what I mentioned -- thanks!


Re: [MTT devel] questions about MTT database from HDF

2010-11-07 Thread Mike Dubman
Hi,
Also, there is an MTT option to select Google Datastore as the storage backend
for MTT results.


Pros:
 - Your data is stored in Google's cloud
 - You can access your data from scripts (see the sketch after these lists)
 - You can create a custom UI for your data visualization
 - You can use Google's default Datastore querying tools
 - Seamless integration with MTT
 - No need for DBA services
 - There are some simple report scripts to query data and generate Excel
   files
 - You can define custom dynamic DB fields and associate them with your data
 - You can define security policies/permissions for your data

Cons:
 - No UI (the default MTT UI works with the SQL backend only)
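
As a rough illustration of the script access and dynamic fields, here is a
minimal sketch assuming the App Engine Python datastore API
(google.appengine.ext.db); the entity kind and property names are made-up
examples, not MTT's actual schema:

    # Minimal sketch, assuming the App Engine Python SDK is available.
    # Kind and property names below are hypothetical examples.
    from google.appengine.ext import db

    class MttResult(db.Expando):
        """Expando models accept properties that were never declared,
        which is what makes custom dynamic DB fields possible."""
        pass

    # Store a result, attaching one custom field on the fly.
    r = MttResult(suite_name='trivial', test_result='pass')
    r.my_custom_field = 'anything'  # dynamic field, no schema change needed
    r.put()

    # Query it back from a script with GQL.
    q = db.GqlQuery("SELECT * FROM MttResult WHERE suite_name = :1", 'trivial')
    for row in q:
        print row.suite_name, row.test_result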

regards
Mike

On Thu, Nov 4, 2010 at 11:08 PM, Quincey Koziol  wrote:

> Hi Josh!
>
> On Nov 4, 2010, at 8:30 AM, Joshua Hursey wrote:
>
> >
> > On Nov 3, 2010, at 9:10 PM, Jeff Squyres wrote:
> >
> >> Ethan / Josh --
> >>
> >> The HDF guys are interested in potentially using MTT.
> >
> > I just forwarded a message to the mtt-devel list about some work at IU
> > to use MTT to test the CIFTS FTB project. So maybe development between
> > these two efforts can be mutually beneficial.
> >
> >> They have some questions about the database.  Can you guys take a whack
> >> at answering them?  (be sure to keep the CC, as Elena/Quincey aren't on
> >> the list)
> >>
> >>
> >> On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:
> >>
> >>> Lots of interest here about MTT, thanks again for taking time to
> >>> demo it and talk to us!
> >>
> >> Glad to help.
> >>
> >>> One lasting concern was the slowness of the report queries - what's
> >>> the controlling parameter there?  Is it the number of tests, the size
> >>> of the output, the number of configurations of each test, etc?
> >>
> >> All of the above.  On a good night, Cisco dumps 250k test runs into the
> >> database.  That's just a boatload of data.  End result: the database is
> >> *HUGE*.  Running queries just takes time.
> >>
> >> If the database wasn't so huge, the queries wouldn't take nearly as
> >> long.  The size of the database is basically how much data you put into
> >> it -- so it's really a function of everything you mentioned.  I.e.,
> >> increasing any one of those items increases the size of the database.
> >> Our database is *huge* -- the DB guys tell me that it's lots and lots
> >> of little data (with blobs of stdout/stderr here and there) that make
> >> it "huge", in SQL terms.
> >>
> >> Josh did some great work a few summers back that basically "fixed" the
> >> speed of the queries to a set speed by effectively dividing up all the
> >> data into month-long chunks in the database.  The back-end of the web
> >> reporter only queries the relevant month chunks in the database (I
> >> think this is a postgres-specific SQL feature).
> >>
> >> Additionally, we have the DB server on a fairly underpowered machine
> >> that is shared with a whole pile of other server duties
> >> (www.open-mpi.org, mailman, ...etc.).  This also contributes to the
> >> slowness.
> >
> > Yeah, this pretty much sums it up. The current Open MPI MTT database is
> > 141 GB, and contains data as far back as Nov. 2006. The MTT Reporter
> > spends part of its time just converting the raw database output into
> > pretty HTML (the Reporter is currently written in PHP). At the bottom
> > of the MTT Reporter you will see some stats on where the Reporter spent
> > most of its time.
> >
> > The total time the Reporter took to return the result is reported as:
> >  Total script execution time: 24 second(s)
> > and the time for just the database query as:
> >  Total SQL execution time: 19 second(s)
> >
> > We also generate an overall contribution graph, linked at the bottom,
> > to give you a feel for the amount of data coming in every
> > day/week/month.
> >
> > Jeff mentioned the partition tables work that I did a couple summers
> > ago. The partition tables help quite a lot by dividing the data into
> > week-long chunks, so shorter date ranges are faster than longer ones:
> > a query only has to touch a small slice of the overall data. The
> > database interface that the MTT Reporter uses is abstracted away from
> > the partition tables; it is really just the DBA (I guess that is me
> > these days) who has to worry about their setup (which is usually just
> > a 5 minute task once a year). Most of the queries to MTT ask for date
> > ranges like 'past 24 hours' or 'past 3 days', so breaking up the
> > results by week saves some time.
> >
> > One thing to also notice is that usually the first query through the
> > MTT Reporter is the slowest. After that first query, the MTT database
> > (PostgreSQL in this case) is able to cache some of the query
> > information, which should make subsequent queries a little faster.
> >
> > But the performance is certainly not where I would like it, and there
> > are still a few ways to make it better. I think if we moved to a newer
> > server that is not quite as heavily shared we would see a performance
> > boost. Certainly if we added more RAM to the system, and potentially a 

[MTT devel] questions about MTT database from HDF

2010-11-03 Thread Jeff Squyres
Ethan / Josh --

The HDF guys are interested in potentially using MTT.  They have some questions 
about the database.  Can you guys take a whack at answering them?  (be sure to 
keep the CC, as Elena/Quincey aren't on the list)


On Nov 3, 2010, at 1:29 PM, Quincey Koziol wrote:

>   Lots of interest here about MTT, thanks again for taking time to demo 
> it and talk to us!

Glad to help.

>   One lasting concern was the slowness of the report queries - what's the 
> controlling parameter there?  Is it the number of tests, the size of the 
> output, the number of configurations of each test, etc?  

All of the above.  On a good night, Cisco dumps 250k test runs into the
database.  That's just a boatload of data.  End result: the database is
*HUGE*.  Running queries just takes time.

If the database wasn't so huge, the queries wouldn't take nearly as long.  The
size of the database is basically how much data you put into it -- so it's
really a function of everything you mentioned.  I.e., increasing any one of
those items increases the size of the database.  Our database is *huge* -- the
DB guys tell me that it's lots and lots of little data (with blobs of
stdout/stderr here and there) that make it "huge", in SQL terms.

Josh did some great work a few summers back that basically "fixed" the speed of 
the queries to a set speed by effectively dividing up all the data into 
month-long chunks in the database.  The back-end of the web reporter only 
queries the relevant month chunks in the database (I think this is a 
postgres-specific SQL feature).
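
Roughly, the mechanism looks like this -- a minimal sketch of PostgreSQL's
inheritance-based partitioning (the 8.x-era approach), with made-up table and
column names rather than MTT's actual schema:

    # Sketch only: assumes a parent table named test_run with a
    # start_timestamp column; names are hypothetical, not MTT's schema.
    import datetime
    import psycopg2

    conn = psycopg2.connect("dbname=mtt user=mtt")
    cur = conn.cursor()

    # One child table per month; its CHECK constraint tells the planner
    # exactly which date range this child can hold.
    cur.execute("""
        CREATE TABLE test_run_2010_11 (
            CHECK (start_timestamp >= '2010-11-01'
               AND start_timestamp <  '2010-12-01')
        ) INHERITS (test_run)
    """)
    conn.commit()

    # With constraint exclusion on, a query whose date range is a plan-time
    # constant only scans the children whose CHECK constraints overlap it,
    # so a 'past 3 days' query touches one or two months, not four years.
    cur.execute("SET constraint_exclusion = on")
    since = datetime.datetime.now() - datetime.timedelta(days=3)
    cur.execute("SELECT count(*) FROM test_run"
                " WHERE start_timestamp >= %s", (since,))
    print cur.fetchone()[0]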

Additionally, we have the DB server on a fairly underpowered machine that is 
shared with a whole pile of other server duties (www.open-mpi.org, mailman, 
...etc.).  This also contributes to the slowness.

> For example, each HDF5 build includes on the order of 100 test executables, 
> and we run 50 or so configurations each night.  How would that compare with 
> the Open MPI test results database?

Good question.  I'm CC'ing the mtt-devel list to see if Josh or Ethan could 
comment on this more intelligently than me -- they did almost all of the 
database work, not me.

I'm *guessing* that it won't come anywhere close to the size of the Open MPI
database: 100 test executables x 50 configurations is on the order of 5k test
runs a night, versus the 250k that Cisco alone submits on a good night.  (And
we haven't trimmed the data in the OMPI database since we started gathering
data several years ago.)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/