Hey Karl,

Thanks for the feedback.


> It might be very tricky to get correct behavior for pausing and resuming a
> benchmark. As soon as a thread is suspended, it is essentially unable to
> take correct timings. You'd have to do some message-passing or work with
> callbacks to halt a thread at controlled interruption points. Thus, what
> about just providing the ability to stop/cancel a benchmark?

Yeah I figured it would be too complicated and risky.


> Better to be less flexible at this point, but always provide accurate
> results.

I'll skip the pause functionality then.

> Consider using the median rather than the average. In some cases one can
> observe infrequent outliers, which would distort the average.

Would the median value really be a good choice to represent all benchmark
sub-tests?  I mean, surely those outliers should have some impact on the
final result. Performance bottlenecks can sometimes be that thin line that
separates high-end from low-end products. I'm not so sure we should just
ignore them.

> Rather than a single score, what about providing multiple scores for the
> main quantities of interest? One score for GFLOPs, one score for memory
> bandwidth, one score for latency? Is this too technical already?

Well, a multi-score is fine by me. However, memory bandwidth in terms of
GB/s is only measured by the copy benchmark as far as I've seen. We should
consider adding some more bandwidth benchmarks in that case.

As for latency, I don't think our average users would care that much about
it. Whenever I overclocked my computer, I would primarily focus on
achieving higher memory bandwidth rather than lower latency, so I'd rather
see memory bandwidth highlighted. Latency would certainly be a useful
addition, but not *that* important to have a dedicated score.

> Live updates would certainly be cool. However, you need to make sure that
> the plots are only drawn while in between different benchmark tests,
> otherwise the plotting might induce a certain load which will interfere
> with the benchmark. I think this can be accomplished without too much
> effort.

Well, each benchmark is run in its own separate thread. I don't think the
main GUI thread can interfere with a benchmark's thread. But if you're
referring to data transfer between the CPU and GPU, then I can't say for
sure. To the best of my knowledge, Qt widgets only use the CPU and RAM;
no GPU and no modern OpenGL are involved. As for communication between
running benchmarks and the GUI, all messages coming from benchmarks are
emitted between sub-tests. The sub-tests themselves run uninterrupted, so
there shouldn't be any problems with message emitting.

> At this point we should certainly define a reasonable grouping of the
> results. For example, our current LU factorization is fairly slow because
> of the way it is implemented, not because the hardware is poor. Are you
> available for an IRC session on this tomorrow, e.g. 16:00 UTC?

Sure, I'm available at 16:00 UTC tomorrow (I guess that's today now :D )

> I quickly adjusted the one I designed some time ago, see attachment. I can
> commit the GIMP-File if you like it. Of course we can alter it further as
> needed and appropriate.

Thanks, I implemented it right away and pushed to GitHub. Looks good. Yeah
I'd like the GIMP file. I'll play around with it and see if anything can be
improved.

> From what I can tell from the screenshots: This is awesome! :-)

> If there's a 'download and run/benchmark now' button on the right
> somewhere, that's already all we need.

Well that's just the matrix market web page rendered in a C++ widget :)
I could alter the web page with JavaScript. Dynamically add a download and
run button that will get the matrix file in matrix market format. The only
complication is that matrix files in mtx format can only be downloaded as
compressed .tar.gz files. That means I would need *another* third-party
library to
unpack it. At least there are libraries that can be shipped together with
the project, so no additional environment variables need to be set.
Speaking of which, can you recommend a good, cross-platform, open-source
C++ library for unpacking .tar.gz files?

> The annoying thing with parsing webpages is that they change too often. We
> can get in touch with the Florida Matrix Market people, maybe they offer
> some more stable way of getting the data.

They are hosting their collection as a public data set using Amazon Web
Services. Unfortunately, I couldn't find a way to get to it using C++. I'll
take a better look at it tomorrow.

> To be on the safe side it is also worth to consider setting up a local
> mirror for selected matrices, so we can ensure that there are always
> matrices available even if the matrix market webpage changes.

Yeah we can always do that.

> One thing worth considering is dropping any use of Boost to simplify the
> installation procedure. The other thing I'm not yet excited about is the
> need for editing the .pro-file, which will always be user-specific. Is it
> possible to read environment variables so one does not have to adjust
> VIENNACLROOT or OPENCLROOT? We should reach a stage where a single
> ViennaCL_Benchmark.pro can be shared among multiple developers.

Yes, of course. The OPENCLROOT variable is actually read from the
environment by QMake:

#Find OpenCL root folder
OPENCLROOT = $$(OPENCLROOT)

Here OPENCLROOT on the left is a user-defined QMake variable (a plain
name declares it), while $$(OPENCLROOT) expands a system environment
variable (two dollar signs in front of a bracketed name). What's very
interesting is the way a user-defined QMake variable is then used:
$$OPENCLROOT (two dollar signs in front of a non-bracketed name). Very
logical and user-friendly approach to handling variables. -.-

QMake attempts to get OpenCL's root folder from the OPENCLROOT environment
variable, similar to how it's done with CMake. In case the OPENCLROOT
variable isn't set, it falls back to a hard-coded path:

isEmpty(OPENCLROOT) {
    OPENCLROOT = "C:\AMDAPPSDK\2.9"
    message("OpenCL not found in environment, using hard-coded path: "$$OPENCLROOT)
}


It's only set up like this for OpenCL, though. I didn't do it like this
for Boost and ViennaCL because I didn't know which environment variables
should be used for them. Extracting the Boost and ViennaCL root folder
paths from the system PATH variable in QMake is a little out of my
league. We should agree on which variables to use for the root paths.
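Assuming we settle on VIENNACLROOT (which you already mentioned) and, say, BOOSTROOT (that name and the fallback paths below are just my suggestion), the same pattern would look like:

```qmake
# Hypothetical: read Boost and ViennaCL roots from the environment,
# falling back to hard-coded defaults if unset.
BOOSTROOT = $$(BOOSTROOT)
isEmpty(BOOSTROOT) {
    BOOSTROOT = "C:\boost\boost_1_55_0"
    message("Boost not found in environment, using hard-coded path: "$$BOOSTROOT)
}

VIENNACLROOT = $$(VIENNACLROOT)
isEmpty(VIENNACLROOT) {
    VIENNACLROOT = "C:\ViennaCL"
    message("ViennaCL not found in environment, using hard-coded path: "$$VIENNACLROOT)
}
```

With that in place, a single ViennaCL_Benchmark.pro could be shared and each developer would only set the environment variables once.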


> This was very good progress. Good job, Namik! :-)

Thanks, glad you like it! :)

Regards, Namik



On Thu, Jul 10, 2014 at 11:52 PM, Karl Rupp <[email protected]> wrote:

> Hi Namik,
>
>
> > initial benchmark UI (basic view) is ready. You can find some info about
>
>> it in my features overview blog post
>> <http://zalomiga.ba/blog/ui-first-look-viennacl-benchmark/>.
>>
>
> Fantastic, apparently you made very good progress recently!
>
>
>  Some
>> features are still missing and there's some design polishing to be done.
>> But the core benchmarking functionality is there.
>>
>> Your feedback is welcome.
>>
>> There are some specific questions I have:
>> ***Basic View***
>> -start/pause/stop functionality: Benchmarks are run in threads (a blog
>> post with detailed description is here
>> <http://zalomiga.ba/blog/implementing-multithreading-
>> in-viennacl-benchmark/>).
>>
>> Stopping and pausing them equals stopping and pausing the thread in
>> which they run (as far as I can tell). Implementing this is fairly easy.
>> What interests me is if there are any
>> requirements/obstacles/peculiarities I should know about before
>> attempting to implement stop/pause. I am dealing with code that runs on
>> a GPU, after all.
>>
>
> It might be very tricky to get correct behavior for pausing and resuming a
> benchmark. As soon as a thread is suspended, it is essentially unable to
> take correct timings. You'd have to do some message-passing or work with
> callbacks to halt a thread at controlled interruption points. Thus, what
> about just providing the ability to stop/cancel a benchmark?
>
>
>
>  Also, I'm starting to think a pause functionality isn't the brightest
>> idea. Would pausing a benchmark affect resulting performance in any way?
>>
>
> I'd very much expect so. Better to be less flexible at this point, but
> always provide accurate results.
>
>
>
>  -final result: my take on the final result is to take the averages of
>> each benchmark and add them together. Should some benchmarks carry more
>> 'weight' than others and have more impact on the final score? To get a
>> final result mark, all benchmarks should be executed in a single
>> session. Any suggestions?
>>
>
> Consider using the median rather than the average. In some cases one can
> observe infrequent outliers, which would distort the average. Rather than a
> single score, what about providing multiple scores for the main quantities
> of interest? One score for GFLOPs, one score for memory bandwidth, one
> score for latency? Is this too technical already?
>
>
>
>
>  -detailed individual results: while some may argue that seeing these is
>> unnecessary in the basic view, I would beg to differ. Here are my
>> arguments:
>> 1. realtime plotting of the sub-test results looks interesting.
>> Even when I add the "benchmark is running" animation, it still won't be
>> enough to catch the users attention. Showing all these small results
>> makes the benchmark process feel "alive".
>>
>
> Live updates would certainly be cool. However, you need to make sure that
> the plots are only drawn while in between different benchmark tests,
> otherwise the plotting might induce a certain load which will interfere
> with the benchmark. I think this can be accomplished without too much
> effort.
>
>
>
>  2. users might want to see the highest result achieved in a benchmark
>> sub-test, since the final result of each benchmark is shown as the
>> average of all sub-tests.
>> I wasn't sure how to show the results of a benchmark in a single number,
>> so I took the average of all sub-tests. If this solution is acceptable,
>> then being able to see the highest result of sub-tests would be very
>> valuable.
>> For example: blas3 benchmark gives me an average of 570 GFLOPs, but a
>> quick look at sub-tests shows me that the maximum achieved result was
>> 1400 GFLOPs. Knowing this makes me feel damn good, even if I don't know
>> what it means. =)
>>
>
> At this point we should certainly define a reasonable grouping of the
> results. For example, our current LU factorization is fairly slow because
> of the way it is implemented, not because the hardware is poor. Are you
> available for an IRC session on this tomorrow, e.g. 16:00 UTC?
>
>
>
>  Speaking of which, why does blas3 give such drastically better results
>> than any other benchmark? It's 1400 GFLOPs(blas3) vs 11 GFLOPs(vector)
>> and makes the result plot look ridiculous (even when average results are
>> used).
>>
>
> This is because of the memory wall: The theoretical FLOPS keep increasing
> at the pace of Moore's law, but memory bandwidth only increases very
> slowly. Today's machines are able to process ~1000 GFLOP/sec, but only have
> a memory bandwidth of ~100 GB/sec. This means that we have 10 FLOPs per
> byte in processing power available. For a vector addition we only have 1
> FLOP for 24 bytes of data in double precision, hence we can only achieve a
> fraction of the theoretical peak. This is why so many people complain about
> the TOP500 list for supercomputers, which only measures FLOPs, but not the
> memory links. Many practical algorithms are limited not by FLOPs, but
> memory bandwidth.
>
>
>
>
>  3. there's really nothing else to put in its place. Without detailed
>> results, the basic view is plain and boring.
>>
>
> I'll have to check this out in more detail and think about how to fill
> this.
>
>
>
>
>> ***Home***
>> -Karl mentioned a splash screen. It would be great to have one. Where
>> can I get it? :)
>>
>
> I quickly adjusted the one I designed some time ago, see attachment. I can
> commit the GIMP-File if you like it. Of course we can alter it further as
> needed and appropriate.
>
>
>
>  -Need some ideas on what else to put on it and feedback on whats
>> currently there(about, options, quickstart, system info)
>>
>
> I'll check this out and report.
>
>
>
>  ***MatrixMarket***
>> -I implemented a simple browsing functionality of the Florida Matrix
>> Market. It's a basic rendering of matrix market's web page using
>> QWebView. Its only good to get a 'feel' for it. Any additional
>> functionality will have to be done from scratch.
>>
>
> From what I can tell from the screenshots: This is awesome! :-)
> If there's a 'download and run/benchmark now' button on the right
> somewhere, that's already all we need.
>
>
>
>  -I'm thinking of making a custom matrix browser in the MatrixMarket
>> screen. It would parse the matrix market web page and create a
>> customized matrix table. Users could then browse through this table and
>> download matrices they like. Downloaded matrices would be saved locally
>> for future use. Users could then select a matrix from the local
>> repository and use it in the benchmark. Custom matrices could also be
>> used in this way.
>>
>
> The annoying thing with parsing webpages is that they change too often. We
> can get in touch with the Florida Matrix Market people, maybe they offer
> some more stable way of getting the data. To be on the safe side it is also
> worth to consider setting up a local mirror for selected matrices, so we
> can ensure that there are always matrices available even if the matrix
> market webpage changes.
>
>
>
>  That's all I can think of. Hope this email isn't too long.
>> Once again, your feedback is welcome.
>>
>
> This was very good progress. Good job, Namik! :-)
>
> Finally, another note on the installation process:
> One thing worth considering is dropping any use of Boost to simplify the
> installation procedure. The other thing I'm not yet excited about is the
> need for editing the .pro-file, which will always be user-specific. Is it
> possible to read environment variables so one does not have to adjust
> VIENNACLROOT or OPENCLROOT? We should reach a stage where a single
> ViennaCL_Benchmark.pro can be shared among multiple developers.
>
> Best regards,
> Karli
>
>
_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
