Re: time expression of clojure produces different results on cascalog queries

2014-06-14 Thread Linus Ericsson
It says Criterium ran four batches of 60 samples in each (it tries to make the JVM do garbage collection etc. between each such batch). In total, 240 samples (i.e. timed test runs). The statistics are better looked up on Wikipedia, but lower quantile here means the 2.5% of the 240 samples (0.025 *

Re: time expression of clojure produces different results on cascalog queries

2014-06-13 Thread sindhu hosamane
Thanks a ton for your replies, Andy and Thomas. I used Criterium and got results like below: Evaluation count : 240 in 60 samples of 4 calls. Execution time mean : 265.359848 ms Execution time std-deviation : 25.544031 ms Execution time lower quantile : 229.851248 ms ( 2.5%)

time expression of clojure produces different results on cascalog queries

2014-06-11 Thread sindhu hosamane
I have set up a single-node Hadoop and am running my Cascalog queries on it. That works and I get results too. Now I am using clojure.core/time to measure how long the Cascalog queries take to execute. The very strange thing is: each time I run the Cascalog query, I get a different elapsed time
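A single call to clojure.core/time gives one sample, so some run-to-run difference is expected. As a rough illustration only, the sketch below times a function several times after a few warm-up runs and summarizes the spread. The names time-ms, timing-summary and run-query are hypothetical, and run-query is just a stand-in workload, not a real Cascalog query.

```clojure
(defn time-ms
  "Runs the zero-argument function f once and returns the elapsed
  wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn timing-summary
  "Calls f warmup-runs times without measuring, then measured-runs times
  with measuring, and returns min, max, mean and sample standard
  deviation of the measured times in milliseconds."
  [f warmup-runs measured-runs]
  (dotimes [_ warmup-runs] (f))
  (let [samples  (vec (repeatedly measured-runs #(time-ms f)))
        n        (count samples)
        mean     (/ (reduce + samples) n)
        sq-diffs (map (fn [x] (let [d (- x mean)] (* d d))) samples)
        variance (/ (reduce + sq-diffs) (max 1 (dec n)))]
    {:samples    n
     :min-ms     (apply min samples)
     :max-ms     (apply max samples)
     :mean-ms    mean
     :std-dev-ms (Math/sqrt variance)}))

;; Hypothetical stand-in workload; replace the body with the query to measure.
(defn run-query []
  (reduce + (map #(* % %) (range 100000))))

(println (timing-summary run-query 5 20))
```

Comparing the mean and standard deviation over many runs is usually more informative than comparing two individual elapsed times.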

Re: time expression of clojure produces different results on cascalog queries

2014-06-11 Thread Andy Fingerhut
I have not used Cascalog, so I do not know how much variation from one run to the next is completely normal, but there are many factors that can cause variations in run time between runs in most computations. For example: + the state of L1, L2, etc. caches in the CPU memory systems + If files

Re: time expression of clojure produces different results on cascalog queries

2014-06-11 Thread sindhu hosamane
Yes, right. Thanks for this info. If that is the case, how can one do performance tests? I really have to make some performance comparisons on single-node and multi-node Hadoop. Are there any other workarounds? I want results to be at least somewhat close to accurate. Or can you suggest any

Re: time expression of clojure produces different results on cascalog queries

2014-06-11 Thread Andy Fingerhut
There are some simple things like: try to ensure that no one else is using the systems being measured besides you, and even that you yourself are doing nothing with those systems other than the runs you are trying to measure. Measure what the load on the machines is before you start your

Re: time expression of clojure produces different results on cascalog queries

2014-06-11 Thread Thomas Heller
https://github.com/hugoduncan/criterium Does most of what you'd need for some benchmarks. It should be noted that neither Hadoop nor Cascalog was built for jobs that finish in milliseconds. Since you are most likely just measuring the setup/teardown, once you push some real data through the system
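For reference, a minimal sketch of calling Criterium from a REPL might look like the following. It assumes Criterium is already on the classpath; sum-squares is a hypothetical stand-in workload, not a Cascalog query.

```clojure
;; Assumes the Criterium library is available on the classpath.
(require '[criterium.core :as crit])

;; Hypothetical stand-in workload; replace with the expression to measure.
(defn sum-squares [n]
  (reduce + (map #(* % %) (range n))))

;; quick-bench runs a shorter benchmark and prints a report with the
;; evaluation count, mean, standard deviation and quantiles.
(crit/quick-bench (sum-squares 10000))

;; bench takes longer and collects more samples.
;; (crit/bench (sum-squares 10000))
```

The printed report has the same shape as the output quoted elsewhere in this thread (evaluation count, execution time mean, std-deviation, lower quantile).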