Re: Past and future of data.generators
For the record, after doing some simple speed comparisons of sampling functions from Incanter, data.generators, and bigml/sampling (using Criterium, and making sure to doall lazy sequences), it appears that data.generators performs very well in some situations.
-- 
You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Past and future of data.generators
For the record, I just did some simple speed comparisons of sampling functions from Incanter, data.generators, and bigml/sampling, and data.generators performs very well. It was fastest in some tests.
Re: Past and future of data.generators
I do agree that the name data.generators is not where one would look for a controllable random source. A more specific name for these functions should be considered.

java.util.Random has been an issue for me when stress-testing random reads and writes to a huge memory area from several threads. If I were to do it again, I would use java.util.concurrent.ThreadLocalRandom to generate random numbers in parallel. (j.u.c.TLR is only available in JDK >= 1.7.0, and clojure.core aims to support 1.6.0 as well.) The core.async library does use ThreadLocalRandom. The reducers functionality has a conditional import, which I think should only be used as a very last resort in clojure.core.

A surprise and caveat is that performance was really bad when generating random memory addresses on the fly, likely because of cache thrashing. Performance was much higher when using a pre-realized (very long) sequence of random data. A function for generating random memory addresses would likely benefit from having a buffer that helps the hardware prefetch memory (which is often a realistic scenario in stream processing).

Summary:
- a better namespace for random object/number generation
- ThreadLocalRandom is only available in JDK 1.7.0 and later
- stress tests do benefit from buffering incoming random data, which is more realistic as well.

I will dig deeper into Criterium to see if this is already implemented there.

/Linus

On Thursday, June 5, 2014, Mars0i marsh...@logical.net wrote:
[...]
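A rough sketch of the two ideas in Linus's message, in Java since the classes involved are java.util.Random and java.util.concurrent.ThreadLocalRandom: each thread gets its own random source (ThreadLocalRandom when the running JDK has it, otherwise one java.util.Random per thread), and the random addresses are pre-generated into a buffer before the write loop. The names BufferedRandomIndices, currentRandom, prefill and stress are hypothetical; the reflective lookup is only one assumed way to approximate a conditional import at runtime, and an AtomicLongArray stands in for the "huge memory area". No timings are claimed; whether the buffer helps depends on the hardware, as the message notes.

```java
import java.lang.reflect.Method;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical helper class, not part of clojure.core, core.async or Criterium.
final class BufferedRandomIndices {

    // Fallback for JDKs without ThreadLocalRandom: one java.util.Random per thread,
    // so threads never contend on a single shared seed.
    private static final ThreadLocal<Random> FALLBACK = new ThreadLocal<Random>() {
        @Override
        protected Random initialValue() {
            return new Random();
        }
    };

    // Looked up reflectively, so this class still loads on a JDK that lacks
    // ThreadLocalRandom (a runtime stand-in for a conditional import).
    private static final Method TLR_CURRENT = findTlrCurrent();

    private static Method findTlrCurrent() {
        try {
            return Class.forName("java.util.concurrent.ThreadLocalRandom").getMethod("current");
        } catch (Exception e) { // ClassNotFoundException, NoSuchMethodException, ...
            return null;
        }
    }

    // Random source for the calling thread; ThreadLocalRandom extends java.util.Random.
    static Random currentRandom() {
        if (TLR_CURRENT != null) {
            try {
                return (Random) TLR_CURRENT.invoke(null);
            } catch (Exception e) {
                // fall through to the per-thread fallback
            }
        }
        return FALLBACK.get();
    }

    // Pre-realize n random indices in [0, bound) before any timed memory access.
    static int[] prefill(int n, int bound) {
        Random rnd = currentRandom();
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = rnd.nextInt(bound);
        }
        return idx;
    }

    // Each worker fills its own index buffer first (using its own thread's RNG),
    // then only performs the writes. Atomic increments keep the total count exact.
    static void stress(int threads, final int writesPerThread, final AtomicLongArray memory)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    int[] idx = prefill(writesPerThread, memory.length());
                    for (int i = 0; i < idx.length; i++) {
                        memory.incrementAndGet(idx[i]);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}
```

Generating the buffer inside each worker matters: ThreadLocalRandom.current() must be called from the thread that uses the result.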
Re: Past and future of data.generators
One of the challenges with random number generation is that there are quite a few specialised requirements. I don't believe a generic approach can meet all needs. I think we actually need a few things:

1. A better implementation for clojure.core/rand etc. I think conditional usage of j.u.c.ThreadLocalRandom on Java 1.7+ would be great if we can make it work; there are plenty of concurrent workloads where a shared regular java.util.Random isn't a good solution.
2. A library of generic random number generation tools (e.g. data.random). It should be general purpose, able to generate a wide range of useful distributions, allow arbitrary java.util.Random instances (e.g. seeded ones) to be passed in, etc.
3. More specialised solutions can live in specific libraries (e.g. core.matrix will be getting support for generation of random matrices). Often specialised implementations will offer much better performance for specific use cases, so we need to keep this option open. An example would be generating large random boolean matrices: generating and storing individual bits in bulk is *much* more efficient than going through generic random number functions for each bit.

I think we should clearly separate random number generation from sample data construction. The latter certainly depends upon the former, but random numbers have a lot of other independent use cases. Hence I'm in favour of something like data.random being separate from data.generators.

On Thursday, 5 June 2014 05:53:10 UTC+1, Mars0i wrote:
[...]
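To make points 2 and 3 of the reply above concrete, here is a minimal Java sketch of a random boolean matrix stored as packed bits. The caller passes in the java.util.Random (point 2), and each nextLong() call fills 64 cells at once instead of one nextBoolean() call per cell (point 3). RandomBitMatrix and its fields are hypothetical names, not core.matrix's API, and no performance measurement is implied.

```java
import java.util.Random;

// Hypothetical sketch: a rows x cols boolean matrix packed 64 cells per long.
final class RandomBitMatrix {
    final int rows;
    final int cols;
    final long[] words; // row-major: cell i is bit (i % 64) of words[i / 64]

    RandomBitMatrix(Random rng, int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        long cells = (long) rows * cols;
        int nWords = (int) ((cells + 63) / 64);
        words = new long[nWords];
        for (int w = 0; w < nWords; w++) {
            // One RNG call supplies 64 cells; bit quality is that of the supplied Random.
            words[w] = rng.nextLong();
        }
        // Clear the unused high bits of the last word so bitCount() sees only real cells.
        int tail = (int) (cells % 64);
        if (tail != 0) {
            words[nWords - 1] &= (1L << tail) - 1;
        }
    }

    boolean get(int r, int c) {
        long i = (long) r * cols + c;
        return ((words[(int) (i >>> 6)] >>> (i & 63)) & 1L) != 0;
    }

    // Number of true cells.
    long bitCount() {
        long n = 0;
        for (long w : words) {
            n += Long.bitCount(w);
        }
        return n;
    }
}
```

Because the generator is a parameter, two matrices of the same shape built from `new Random(42)` are identical, which also speaks to the reproducibility concern in the opening post.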
Re: Past and future of data.generators
Hi,

I have used http://maths.uncommons.org/ in a few of my projects, so that could be used in data.random. I have also played with the random.org API in the past as a source of random numbers.

Thomas

P.S. In one of my use cases I also care about the performance of the random generator, as I potentially need to create loads (millions) of random numbers, and then performance can have an impact.
Past and future of data.generators
clojure.core provides a minimal set of functions for random effects: rand, rand-int, and rand-nth, currently with no simple ability to base these on a resettable random number generator or on different RNGs in different threads. (But see this ticket http://dev.clojure.org/jira/browse/CLJ-1420 pointed out by Andy Fingerhut in another thread.)

data.generators includes additional useful general-purpose functions involving random numbers and random choices, but this is not at all obvious when you read the docstrings. (Some of the docstrings are pretty mysterious.) It's also not necessarily what one would guess from the name of the library. (None of this is a criticism of anyone or anything about the project. data.generators is at a 0.n.m release stage. I'm very grateful for the work that people have put into it.)

As I understand it, data.generators was split off from test.generative, which sounds like a good idea. So data.generators was intended to provide functions that generate random data for testing. (I imagine that the existing documentation makes more sense in the context of test.generative, too.) However, what's in data.generators has more general applications, for people who want random numbers, samples, etc. outside of software testing. (In my case, that would be for random effects in scientific simulations.)

Off the top of my head, it seems to me that these other applications might have slightly different needs from the use of data.generators by test.generative. For one thing, efficiency might matter a lot in some simulations, but not in software testing. (At least, *I* wouldn't care if my test functions were slow.) I'm not saying that functions in data.generators are slow, but I don't think there's a good reason to worry about making them efficient if they're only intended for software testing. Further, there are needs beyond what data.generators currently provides.
See the sampling functions in bigml/sampling https://github.com/bigmlcom/sampling or Incanter http://incanter.org/, for example, and lots of other random functions that Incanter provides. Some of those should remain in Incanter, of course, but I wonder whether Clojure would benefit from a contributed library that satisfied a set of core needs for random effects. (Incanter partly builds on clojure.core's rand at this point.) Maybe data.generators is/will be that library. Or maybe parts of data.generators would make more sense as part of a separate library (math.random? data.random? math.probability?) that could be split out of data.generators. (If it doesn't make sense to split data.generators, then would a new name for the library be more appropriate?)

Just some things I was wondering about. Curious to see what others say.

(Fun tip: Check out data.generators' anything function, which is like Emacs' Zippy the Pinhead functions for people who prefer industrial atonal music composed by randomly filtered Jackson Pollock paintings to speech. Or: for people who want to thoroughly test their functions by throwing random, randomly-typed data at them.)
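For comparison with the opening post, a minimal Java sketch of rand / rand-int / rand-nth style helpers that take an explicit java.util.Random rather than a hidden global one. Resetting is then just a call to setSeed (or constructing a new Random with the same seed), and each thread can hold its own instance. ExplicitRandom and its methods are hypothetical illustrations, not clojure.core or data.generators APIs. One caveat: ThreadLocalRandom cannot be reseeded (its setSeed throws UnsupportedOperationException), so reproducible runs need an ordinary seeded instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helpers mirroring rand, rand-int and rand-nth, with the RNG passed in.
final class ExplicitRandom {

    // Like (rand): a double in [0.0, 1.0).
    static double rand(Random rng) {
        return rng.nextDouble();
    }

    // Like (rand-int n): an int in [0, n).
    static int randInt(Random rng, int n) {
        return rng.nextInt(n);
    }

    // Like (rand-nth xs); throws IllegalArgumentException on an empty list (nextInt(0)).
    static <T> T randNth(Random rng, List<T> xs) {
        return xs.get(rng.nextInt(xs.size()));
    }

    // k independent draws; the same seed always yields the same list.
    static <T> List<T> draws(Random rng, List<T> xs, int k) {
        List<T> out = new ArrayList<T>(k);
        for (int i = 0; i < k; i++) {
            out.add(randNth(rng, xs));
        }
        return out;
    }
}
```

Usage: create `new Random(42)`, call `draws`, then `rng.setSeed(42)` and call it again to get the identical sequence.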