Look, it all boils down to (CPU) time, and time is money. Generating a
billion depictions in the cloud means paying for the machines' time, so
increasing the depiction speed by a factor of 10 decreases the cost by a
factor of 10, to a pretty good approximation. Storage also costs money, so
if N is large it doesn't always make sense to generate and store the
depictions of all N structures up front. In some contexts it makes more
sense to generate the 2D representations as needed rather than store them
all in advance. One size doesn't fit all.
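To make the linear cost argument concrete, here is a back-of-the-envelope calculator. It is a minimal sketch assuming perfectly linear scaling and a flat, hypothetical per-CPU-hour price (`price_per_cpu_hour` is not a real cloud rate). With the 92877507-structure figure from the thread it reproduces the quoted timings to within rounding: about 1075 days at 1 s per structure and about 25.8 hours at 1 ms.

```python
SECONDS_PER_DAY = 86_400

def wall_clock_days(n_structures: int, seconds_per_structure: float,
                    workers: int = 1) -> float:
    """Ideal wall-clock time in days.

    Assumes perfectly linear scaling: no scheduling, I/O, or start-up overhead.
    """
    return n_structures * seconds_per_structure / workers / SECONDS_PER_DAY

def compute_cost(n_structures: int, seconds_per_structure: float,
                 price_per_cpu_hour: float) -> float:
    """Total CPU cost at a hypothetical flat hourly rate.

    The worker count does not appear: spreading the same CPU-seconds over
    more machines shortens the wait but not the bill.
    """
    cpu_hours = n_structures * seconds_per_structure / 3600
    return cpu_hours * price_per_cpu_hour

if __name__ == "__main__":
    n = 92_877_507  # PubChem Compound size quoted in the thread
    for t in (1.0, 0.1, 0.001):
        print(f"{t * 1000:>6.0f} ms/structure: "
              f"{wall_clock_days(n, t):8.1f} days on 1 core, "
              f"{wall_clock_days(n, t, workers=100):6.2f} days on 100 cores")
```

In this model a 10x faster depiction cuts `compute_cost` by exactly 10x, while adding workers only changes `wall_clock_days`.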
An intermediate strategy would be to generate the depictions on the fly and
memoize them for some time, or up to some maximum storage limit. Of the
billion structures, only a fraction will ever be visualized, so memoization
sounds reasonable; that in turn means you want a rapid response whenever a
structure that isn't already stored has to be generated.
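The memoize-on-the-fly idea can be sketched as a small cache bounded by both age and size. This is a minimal sketch, not RDKit code: `DepictionCache`, `render`, `max_bytes` and `ttl_seconds` are hypothetical names, and the storage budget is approximated by the length of each stored depiction string. Eviction is least-recently-used, which is one reasonable policy among several.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional

class DepictionCache:
    """Generate depictions on demand and memoize them, bounded by age and total size."""

    def __init__(self, render: Callable[[str], str], max_bytes: int = 50_000_000,
                 ttl_seconds: Optional[float] = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render          # hypothetical depiction function: key -> SVG/text
        self._max_bytes = max_bytes    # storage budget, measured as total string length
        self._ttl = ttl_seconds        # None means entries never expire by age
        self._clock = clock            # injectable so expiry can be tested
        self._entries = OrderedDict()  # key -> (created_at, depiction), least recent first
        self._size = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None:
            created, depiction = hit
            if self._ttl is None or now - created < self._ttl:
                self._entries.move_to_end(key)  # mark as most recently used
                return depiction
            self._evict(key)  # expired: drop it and regenerate below
        # Miss: the one path where raw depiction speed still matters.
        self.misses += 1
        depiction = self._render(key)
        self._entries[key] = (now, depiction)  # new keys are appended at the end
        self._size += len(depiction)
        # Evict least recently used entries until back under budget,
        # always keeping the entry just generated.
        while self._size > self._max_bytes and len(self._entries) > 1:
            self._evict(next(iter(self._entries)))
        return depiction

    def _evict(self, key: str) -> None:
        _, depiction = self._entries.pop(key)
        self._size -= len(depiction)
```

In an RDKit setting, `render` might wrap `Chem.MolFromSmiles` plus an SVG drawer such as `rdMolDraw2D.MolDraw2DSVG`; that wiring is left out so the sketch runs with the standard library alone. Every miss still pays the full depiction cost, which is why fast generation matters even with caching.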
-P.
On Thu, Dec 29, 2016 at 12:04 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
wrote:
> On 2016-12-29 07:19, John M wrote:
>
> > To see why you need sub-second depiction, consider these times for
> > 92877507 structures (the current size of PubChem Compound):
> >
> > 1s per structure = 1074 days (~3 years)
> > 100 ms per structure = 107 days
> > 1ms per structure = 25 hours
>
> The Dilbert answer is: buy a better computer. The serious answer is that if
> you run millions of jobs sequentially on a single core, your problem is
> not how long a single job takes: no matter how fast you can make it, it
> will only scale linearly. There will be 1B compounds in PubChem two
> years from now and your painstakingly crafted 1ms/structure code will
> still take 3 years; the only difference is that you get garbage depictions.
>
> Condor can be persuaded to fire up 92877507 EC2 VMs and run all of those in
> parallel -- provided you're willing to pay Amazon for it, of course. If
> you can code the algorithm into a GPGPU/SIMD parallel flow, you can
> probably push it into an FPGA and then get that baked into ASICs in
> China -- they'll give you a discount if you order more than ten thousand.
> That gets you a $20 USB dongle that will run them at umpteen K/second.
> And so on.
>
> If you don't want quality depictions because bad ones will work just
> fine for your needs, that's a perfectly good argument. If you don't want
> them because generating 10M sequentially on a single core will take a
> long time, that's a BS argument.
>
> Dima
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
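Dima's parallelism point relies on each depiction being independent of the others. The following is a minimal sketch of fanning the work out across local cores with Python's standard `concurrent.futures`; `fake_depict` and `depict_all` are hypothetical names, and the same split would apply across cluster nodes or cloud VMs.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List

def fake_depict(smiles: str) -> str:
    """Hypothetical stand-in for a real 2D depiction call."""
    return "<svg><!-- " + smiles + " --></svg>"

def depict_all(smiles_list: List[str],
               render: Callable[[str], str] = fake_depict,
               workers: int = 1, chunksize: int = 1000) -> List[str]:
    """Depict every structure, optionally fanned out over worker processes.

    Each depiction is independent of the others, so the job is embarrassingly
    parallel: workers == 1 is the sequential single-core loop, and workers > 1
    runs the same calls in a process pool. `render` must be a module-level
    function so it can be pickled and sent to the workers.
    """
    if workers <= 1:
        return [render(s) for s in smiles_list]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order; chunksize batches tasks to
        # cut per-item inter-process overhead.
        return list(pool.map(render, smiles_list, chunksize=chunksize))
```

Calling something like `depict_all(smiles, workers=os.cpu_count())` (from under an `if __name__ == "__main__":` guard on platforms that spawn processes) should cut wall-clock time roughly by the number of cores for CPU-bound depiction, while the total CPU time, and hence the bill, stays about the same.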