Re: Amazonica performance: options?

2014-03-28 Thread Michael Cohen
time ec2-describe-images -a > ec2-cli-images.txt

real  1m26.401s
user  0m6.551s
sys 0m1.159s

and writes a 7.5MB file to disk. Note the -a flag, which lists all of the
available public images.

In a REPL:

(time (spit "clj-awz-images.txt" (describe-images)))

Elapsed time: 90258.47 msecs

and writes an 18MB file to disk containing all the available public images. 

Am I missing something? 

You can also pass a list of filters to the call to narrow the result.
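A minimal sketch of a narrowed call, assuming amazonica's usual convention of turning keyword arguments into fields of the underlying DescribeImagesRequest (here :owners and :filters, with each filter given as a {:name ... :values [...]} map). The function name is hypothetical, and the describe function is passed in so the sketch loads without AWS credentials; in real use you would pass amazonica.aws.ec2/describe-images.

```clojure
(defn describe-own-x86-images
  "Hypothetical helper: asks only for images owned by this account
  (owner self) with x86_64 architecture, instead of every visible image."
  [describe-fn]
  ;; describe-fn is expected to be amazonica.aws.ec2/describe-images, assumed
  ;; to accept :owners and :filters keyword args; any fn works for testing.
  (describe-fn :owners ["self"]
               :filters [{:name "architecture" :values ["x86_64"]}]))

;; Usage, assuming amazonica is on the classpath:
;; (require '[amazonica.aws.ec2 :as ec2])
;; (count (:images (describe-own-x86-images ec2/describe-images)))
```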



On Friday, March 28, 2014 7:59:48 AM UTC-7, Dave Tenny wrote:

 I'm trying to code some amazonica-based solutions in a nontrivial AWS
 environment.
 I work with many AWS accounts and it isn't unusual to see a thousand
 instances running on one account, and similar excesses in other types of
 AWS resources.  So if you're doing an ec2-describe-instances (or the
 amazonica equivalent), it must not choke in this environment.

 I like the way amazonica does all the bean marshalling for me so I can 
 express queries simply.  But the returned datasets need to be more 
 pragmatic/performant.

 The problem for me is that Amazonica doesn't seem up to the task of
 dealing with queries that return large volumes of data.
 I suspect it has nothing to do with reflection, and more to do with the
 unwieldy amount of duplicate information produced by the result
 unmarshalling process.
 The Clojure-all-the-way-down philosophy results in a lot of duplicated
 information, and just printing the result to a file takes a long time.
 If I accidentally let the output go to an emacs cider repl buffer, then
 things get so wedged that I may as well kill -9 emacs.
 (Known cider repl issues here, it isn't all amazonica.)
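One way to keep a huge result from flooding the REPL buffer is to bound what the printer emits. Below is a sketch using the standard *print-length* and *print-level* vars; the helper name preview and the limits of 3 are arbitrary choices. Another habit that helps is (def images (ec2/describe-images)), since the REPL then echoes only the var name rather than the data.

```clojure
(defn preview
  "Returns a printed form of x limited to 3 items per collection and
  3 levels of nesting, instead of printing the whole structure."
  [x]
  (binding [*print-length* 3   ; items shown per collection before "..."
            *print-level*  3]  ; nesting depth shown before "#"
    (pr-str x)))

;; e.g. (println (preview (ec2/describe-images))) rather than letting the
;; REPL echo the full result into the CIDER buffer.
```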

 For example, here's how long it takes to run the Java-based EC2 CLI to
 describe images on an account:

 $ time ec2-describe-images > /tmp/ec2-cli-images.out

 real  0m11.484s
 user  0m2.564s
 sys   0m0.129s


 And here's how long it takes from a 'lein repl' to run the same query on 
 the same account:

 (time (with-output ["/tmp/clj-awz-images.out"]
         (println (ec2/describe-images))))
 Elapsed time: 194685.552683 msecs
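with-output isn't part of clojure.core, so it is presumably a local helper. If it does what it appears to (send everything printed by its body to the named file), a minimal equivalent could look like this; the name with-out-file and its plain path argument (rather than a vector) are hypothetical. Note that println prints strings inside collections without quotes, which matches the {:hypervisor xen, ...} excerpt in this post.

```clojure
(require '[clojure.java.io :as io])

(defmacro with-out-file
  "Hypothetical stand-in for with-output: evaluates body with *out* bound
  to a writer on path (a string or java.io.File), closing it afterwards."
  [path & body]
  `(with-open [w# (io/writer ~path)]
     (binding [*out* w#]
       ~@body)))

;; time is outside the binding, so its message still goes to the REPL:
;; (time (with-out-file "/tmp/clj-awz-images.out"
;;         (println (ec2/describe-images))))
```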

 Of course, the amount of data printed by the EC2 CLI is very different
 from the output of Amazonica: amazonica returns everything in gory,
 duplicated map detail and the CLI does not, as the relative output sizes
 show:

 -rw-rw-r--.  1 dave dave 17201290 Mar 28 10:35 clj-awz-images.out
 -rw-rw-r--.  1 dave dave    99342 Mar 28 10:26 ec2-cli-images.out.11.5s

 The amazonica output starts with:
 {:images [{:hypervisor xen, :state available, :virtualization-type 
 paravirtual, :root-device-type instance-store,
 ... and goes on like that with duplicate keywords all the way down.

 Anyway, my goal isn't to turn amazonica into the EC2 CLI.  But even the
 most trivial operations in amazonica (especially the most trivial ones,
 i.e. those lacking filters against large data sets) pretty much whack me
 left and right with CPU-wedged tools and (completely unacceptable) long
 waits for results.

 Any suggestions on how to use amazonica in a way where the output is ... 
 different, and minimal/workable?

 Or am I left with switching to another package or calling the Java SDK
 APIs directly myself?

 I'm pretty sure the results need to be structures whose relationship to
 data values is implicit (and not explicit in map keys). I don't see any
 options in amazonica to change this, however.
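For the implicit-structure idea, a sketch in plain Clojure: pick the columns once and turn each result map into a vector of values in that column order, so each key name appears once in the output rather than once per image. The function name and example keys (:image-id, :state) are illustrative; the keys assume amazonica's conversion of SDK getter names like getImageId into dashed keywords.

```clojure
(defn maps->table
  "Given a non-empty seq of column keys and a seq of maps, returns
  {:columns [...] :rows [...]}, each row holding values in column order.
  Keys not listed in ks are dropped; missing keys become nil."
  [ks maps]
  {:columns (vec ks)
   :rows    (mapv (apply juxt ks) maps)})   ; juxt of keywords -> row vector

;; (maps->table [:image-id :state :name]
;;              (:images (ec2/describe-images :owners ["self"])))
```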

 Thanks for any suggestions, and forgive me if I've missed something
 obvious.  I'm just trying to see what's out there and at the same time
 move along quickly enough that I can get some usable tools for work (so
 I can lose all my python and bash scripts for various interfaces; I want
 clojure!).

 - Dave




-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Amazonica performance: options?

2014-03-28 Thread Dave Tenny
Actually, let me withdraw the question for now.  If I call an unfiltered
(describe-images) on my account I get ~27,900 images.  It takes 70
seconds to retrieve them using the Java API (called from Clojure).

If I then print (str image) for all those images to a file, that adds
another 153 seconds, for a total of 223 seconds.  Presumably that's the
normal Java toString() method invocation.
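Separating the fetch from the printing, as in these numbers, is easy to do with a small timing helper. A sketch: timed is a hypothetical macro, not something amazonica provides, that returns the result together with the elapsed milliseconds so each phase can be measured on its own. If any part of a result is lazy, realize it (e.g. with doall) inside the first timed call, or that cost will show up in the printing phase instead.

```clojure
(defmacro timed
  "Hypothetical helper: evaluates expr once and returns
  [result elapsed-milliseconds]."
  [expr]
  `(let [start#  (System/nanoTime)
         result# ~expr]
     [result# (/ (- (System/nanoTime) start#) 1e6)]))

;; (let [[resp fetch-ms] (timed (ec2/describe-images))
;;       [_ print-ms]    (timed (spit "/tmp/images.txt" resp))]
;;   {:fetch-ms fetch-ms :print-ms print-ms})
```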

If I print out the Amazonica version of it, it takes 195 seconds,
presumably because we're sharing keyword references internally and so
abusing memory less overall (just a wild guess).
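The keyword-sharing part of that guess does hold: Clojure interns keywords, so every :state in a parsed result refers to the same object, while equal strings built separately are distinct objects. A quick REPL check is below. It says nothing about why printing was faster, and the printed text still spells out every key in full, which is likely a large part of why the Amazonica file is so much bigger than the CLI's.

```clojure
;; Keywords are interned: building :state twice yields one shared object.
(def kw-a (keyword "state"))
(def kw-b (keyword "state"))

;; Strings constructed separately are equal but not the same object.
(def s-a (String. "state"))
(def s-b (String. "state"))

(println (identical? kw-a kw-b) (= s-a s-b) (identical? s-a s-b))
;; prints: true true false
```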

So if I do the native calls and cherry pick the information I want (like
the java EC2 CLI does), then I can get the time down significantly.
Otherwise Amazonica is probably doing a reasonable job given what I'm
asking of it.
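A sketch of that cherry-picking approach that still goes through amazonica's maps: keep a handful of fields and write one tab-separated line per image, roughly the shape of the CLI's terse output. The field list is an illustrative assumption (dashed keywords as amazonica derives them from the SDK getters), and write-image-summary! is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(def summary-keys
  "Illustrative choice of fields to keep from each image map."
  [:image-id :state :architecture :name])

(defn write-image-summary!
  "Writes one tab-separated line per image to the java.io.Writer w,
  containing only summary-keys; missing or nil values become empty fields."
  [^java.io.Writer w images]
  (doseq [img images]
    (.write w (str (str/join "\t" (map #(str (get img %)) summary-keys))
                   "\n"))))

;; (with-open [w (clojure.java.io/writer "/tmp/images.tsv")]
;;   (write-image-summary! w (:images (ec2/describe-images :owners ["self"]))))
```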

And, in the wisdom gained department, never do unfiltered (describe-images)
requests if you can help it :-)




On Fri, Mar 28, 2014 at 12:52 PM, Michael Cohen mcohe...@gmail.com wrote:
