Re: Simple Neural Network DSL -- request for feedback
You might also consider using your DSL as a frontend to the Nengo neural simulator (http://nengo.ca). Nengo (which is written in Java) has recently added a Python scripting interface (http://www.frontiersin.org/neuroinformatics/10.3389/neuro.11/007.2009/abstract). Nengo has a lot to recommend it and is pretty mature, so you may save yourself a lot of effort under the covers - also the way Nengo conceptualises the networks might be useful feedback to your DSL design.

Ross

On Nov 14, 5:18 am, Eric Schulte schulte.e...@gmail.com wrote: Hi Ross,

#+begin_src clojure
(let [n {:phi identity
         :accum (comp (partial reduce +) (partial map *))
         :weights [2 2 2]}]
  [(repeat 3 n) (repeat 5 n) (assoc n :weights (vec (repeat 5 1)))])
#+end_src

would result in the following connection pattern [[file:/tmp/layers.png]]

However, for other NNs you may care about the topological organisation of the neurons in a 1-D, 2-D, or 3-D space in order to do things like connecting corresponding neurons in different layers or having the probability of a connection be a function of the separation of the neurons. In this case, you might use a data structure representing the coordinates of each neuron as its key.

Fully agreed, I'm partway through implementing what you've just described (at least as I understand it), in that the library now declares a new Graph data type which consists of a list of key-to-Neural mappings as well as a directed edge set. Using this new data type it is possible to construct, run and train arbitrarily connected graphs of Neural elements. See the fourth example at http://repo.or.cz/w/neural-net.git

Best -- Eric

Ross Gayler r.gay...@gmail.com writes: On Nov 13, 9:12 am, Eric Schulte schulte.e...@gmail.com wrote: Albert Cardona sapri...@gmail.com writes: Your neural network DSL looks great. One minor comment: why use lists instead of sets? ...
I used lists because I want to be able to specify a network in which (at least initially) all neurons in a hidden layer are identical, e.g. the list example at http://cs.unm.edu/~eschulte/src/neural-net/.

You might want to consider maps. Currently I'm using maps to specify a single neuron, and I fear it would add complexity to have two different meanings for maps. For some NN models all you care about is that each neuron has a unique identity (in which case using an index value as a key is as good a solution as any). I'm currently using lists only for fully connected layers in a neural network, e.g. the following code

-- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
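Ross's coordinate-keyed suggestion above could be sketched as follows. This is a hypothetical illustration, not part of the neural-net library; the names `layer`, `distance`, and `edges-within` are invented:

```clojure
;; A neuron in the same shape as the library's examples.
(def neuron {:phi identity
             :accum (comp (partial reduce +) (partial map *))
             :weights []})

;; A 3x3 layer: a map from [x y] coordinates to neurons, so each
;; neuron's identity carries its position in 2-D space.
(def layer
  (into {} (for [x (range 3) y (range 3)] [[x y] neuron])))

;; Euclidean separation between two coordinate keys.
(defn distance [[x1 y1] [x2 y2]]
  (Math/sqrt (+ (Math/pow (- x1 x2) 2) (Math/pow (- y1 y2) 2))))

;; Connect neurons across two layers when they are within radius r,
;; e.g. r = 0 connects only corresponding neurons.
(defn edges-within [r layer-a layer-b]
  (for [ka (keys layer-a), kb (keys layer-b)
        :when (<= (distance ka kb) r)]
    [ka kb]))
```

A connection-probability variant would replace the `:when` test with a random draw against a function of `(distance ka kb)`.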
Re: Simple Neural Network DSL -- request for feedback
On Nov 13, 9:12 am, Eric Schulte schulte.e...@gmail.com wrote: Albert Cardona sapri...@gmail.com writes: Your neural network DSL looks great. One minor comment: why use lists instead of sets? ...

I used lists because I want to be able to specify a network in which (at least initially) all neurons in a hidden layer are identical, e.g. the list example at http://cs.unm.edu/~eschulte/src/neural-net/.

You might want to consider maps. For some NN models all you care about is that each neuron has a unique identity (in which case using an index value as a key is as good a solution as any). However, for other NNs you may care about the topological organisation of the neurons in a 1-D, 2-D, or 3-D space in order to do things like connecting corresponding neurons in different layers or having the probability of a connection be a function of the separation of the neurons. In this case, you might use a data structure representing the coordinates of each neuron as its key.

Ross
Re: relational data aggregation language
Thanks Shantanu. (Sorry for the slow reply.)

What does a single case consist of? Is it just a result-set (as a consequence of running an SQL query)? Maybe an example will help.

I can't be too specific, but a single case can be thought of as a tiny relational database with maybe 20 tables. One table will have one row with some admin and unique identifier information. Another few tables will have one to a few rows each with identifying information like names and addresses. The remaining tables will each correspond to a different type of event that may have occurred, and there will be zero to 100 rows (but typically only up to 10) in each event table with 3 or 4 columns describing the event.

I know that Apache Derby[1], HSQLDB[2] (now called HyperSQL) and H2[3] can be readily used in Clojure as in-memory databases without needing to start a server.

Thanks for the links to the in-memory databases. I think my brain is full now. It'll probably take me a couple of months to try out some ideas (this isn't on anybody's to-do list but mine). Thanks to all for your help.

Ross

On Oct 5, 12:53 am, Shantanu Kumar kumar.shant...@gmail.com wrote: Thanks Ross, that gives me a better insight into your environment. In the online environment single cases are fetched from a database with no aggregation capability and fired at the service that contains the aggregation functionality. What does a single case consist of? Is it just a result-set (as a consequence of running an SQL query)? Maybe an example will help. This would make sense if the IMDB query language supported the aggregations we want in a way our statisticians can use, and the IMDB is sufficiently lightweight that we can link it into our service as a function library (rather than a separate server connected by some communication protocol). I know that Apache Derby[1], HSQLDB[2] (now called HyperSQL) and H2[3] can be readily used in Clojure as in-memory databases without needing to start a server.
You can find the examples here: http://bitbucket.org/kumarshantanu/clj-dbcp/src (though Clj-DBCP 0.1 is not actually released yet, expected soon with SQLRat 0.2)

[1] http://db.apache.org/derby/
[2] http://hsqldb.org/
[3] http://www.h2database.com/html/main.html

There are likely more in-memory databases I am not aware of at the moment. As long as they have JDBC drivers, using them should be easy. Since your main criterion is SQL features/functions, I guess you would need to find out which IMDB suits better. I will be happy to add any missing bits to Clj-DBCP and SQLRat (I am the author) if you can let me know. Please feel free to ping me off the list. Regards, Shantanu

On Oct 4, 4:57 pm, Ross Gayler r.gay...@gmail.com wrote: Thanks for the two posts Shantanu. The rql and Clause examples are useful, both as potential parts of a solution and as examples of how query/aggregation stuff may be done in Clojure style. It is conceivable that I may end up deciding all I need is a DSL that covers the kinds of aggregations of interest to us and translates them to SQL via SQLrat. With respect to your three suggestions in your second post - things get a bit more interesting. A major part of the problem (which I failed to emphasize/mention in my first post) is that I really want this aggregation stuff to work in two deployment environments: a batch oriented statistical development environment that we control and an online, realtime transactional environment that corporate IT controls. In the online environment single cases are fetched from a database with no aggregation capability and fired at the service that contains the aggregation functionality. We control what happens inside that service but have close to zero chance of changing anything outside that service - so in that online environment we have no possibility of putting aggregation into the datasource DB that feeds our service.
However, it *might* be reasonable to put an in-memory database inside our service, purely to take advantage of the aggregation facilities provided by that IMDB. A single case would get loaded into the IMDB, the aggregation would be carried out in that IMDB, the results exported, and the IMDB cleared ready for the next case. This would make sense if the IMDB query language supported the aggregations we want in a way our statisticians can use, and the IMDB is sufficiently lightweight that we can link it into our service as a function library (rather than a separate server connected by some communication protocol). In our statistical development environment things are different. The source data happens to live in a database, and we query that to get the subset of cases we are interested in (say, 1M of them). In that subset, each case can be treated completely in isolation and our aggregations will use 100% of the component data in each case. An individual aggregation might touch
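The per-case load/aggregate/export/clear cycle Ross describes could also be done in plain Clojure without embedding an IMDB at all; a minimal sketch, with all table, column, and function names invented for illustration:

```clojure
;; A single case as a tiny in-memory "database": a map from table
;; name to a vector of row maps (all names here are hypothetical).
(def case-db
  {:admin  [{:case-id 42}]
   :events [{:type :loan-application :amount 50000}
            {:type :loan-application :amount 150000}
            {:type :address-change   :amount nil}]})

;; One aggregation: count events of a given type with amount below a
;; threshold. `some->` drops rows whose :amount is nil.
(defn count-events [db type max-amount]
  (count (filter #(and (= (:type %) type)
                       (some-> (:amount %) (< max-amount)))
                 (:events db))))

;; "Run all aggregations, export the results, clear": with immutable
;; data, clearing is just letting the case-db go out of scope.
(def results
  {:small-loan-apps (count-events case-db :loan-application 100000)})
```

The trade-off against the IMDB approach is that statisticians would write these predicates in Clojure rather than in a query language they already know.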
Re: relational data aggregation language
Thanks for the two posts Shantanu. The rql and Clause examples are useful, both as potential parts of a solution and as examples of how query/aggregation stuff may be done in Clojure style. It is conceivable that I may end up deciding all I need is a DSL that covers the kinds of aggregations of interest to us and translates them to SQL via SQLrat. With respect to your three suggestions in your second post - things get a bit more interesting. A major part of the problem (which I failed to emphasize/mention in my first post) is that I really want this aggregation stuff to work in two deployment environments: a batch oriented statistical development environment that we control and an online, realtime transactional environment that corporate IT controls. In the online environment single cases are fetched from a database with no aggregation capability and fired at the service that contains the aggregation functionality. We control what happens inside that service but have close to zero chance of changing anything outside that service - so in that online environment we have no possibility of putting aggregation into the datasource DB that feeds our service. However, it *might* be reasonable to put an in-memory database inside our service, purely to take advantage of the aggregation facilities provided by that IMDB. A single case would get loaded into the IMDB, the aggregation would be carried out in that IMDB, the results exported, and the IMDB cleared ready for the next case. This would make sense if the IMDB query language supported the aggregations we want in a way our statisticians can use, and the IMDB is sufficiently lightweight that we can link it into our service as a function library (rather than a separate server connected by some communication protocol). In our statistical development environment things are different. The source data happens to live in a database, and we query that to get the subset of cases we are interested in (say, 1M of them). 
In that subset, each case can be treated completely in isolation and our aggregations will use 100% of the component data in each case. An individual aggregation might touch 20% of the data in one case, but we might have ~500 different aggregations from the same case, so every value gets used in lots of different aggregations. So although I am interested in DB query languages as a way of specifying aggregations, I am not so convinced that I would actually use a full-blown DB to implement those aggregations.

Cheers Ross

On Oct 4, 3:24 am, Shantanu Kumar kumar.shant...@gmail.com wrote: I looked at Tutorial D - it's pretty interesting. Here are a few top-of-my-head observations:

* Which RDBMS do you use? If you are free to choose a new RDBMS, probably you can pick one that provides most of the computational functionality (as SQL constructs/functions) out of the box. For example Oracle, MS SQL Server, PostgreSQL etc. The reason is performance - the more you can compute within the database, the less data you will need to fetch in order to process.

* The kinds of computations you need look like a superset of what SQL can provide. So, I think you will have to re-state the problem in terms of computations/iterations over SQL result-sets, which is probably what you are currently doing using the imperative language. If you can split every problem in terms of (a) the computation you need versus (b) the SQL queries you need to fire, then you can easily do it using Clojure itself without needing any DSL.

* If you want a DSL for this, I suppose it should make maximum use of the database's inbuilt query functions/constructs to maximize performance. This also means the DSL implementation needs to be database-aware. Secondly, it is possible to write functions in Clojure that would emit appropriate SQL clauses (as long as it is doable) to compute certain pieces of information. Looking at multiple use cases (covering various aspects - fuzzy vs deterministic) will be helpful.
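Shantanu's suggestion of Clojure functions that emit SQL clauses could look something like this (a hypothetical sketch; `agg-clause` and the table/column names are invented, not part of SQLRat):

```clojure
;; Build a simple aggregate query string from Clojure data. The
;; predicate is passed through as raw SQL for brevity; a real DSL
;; would build it from data too, and would parameterise values.
(defn agg-clause [agg-fn column table pred-sql]
  (format "SELECT %s(%s) FROM %s WHERE %s"
          (name agg-fn) (name column) (name table) pred-sql))

(def q (agg-clause :sum :amount :loan_events "purpose = 'restructuring'"))
;; => "SELECT sum(amount) FROM loan_events WHERE purpose = 'restructuring'"
```

Because the query is assembled from keywords and strings, the same data could in principle be targeted at different databases' SQL dialects, which is the database-awareness Shantanu mentions.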
Regards, Shantanu

On Oct 3, 5:10 pm, Shantanu Kumar kumar.shant...@gmail.com wrote: On Oct 3, 1:16 pm, Ross Gayler r.gay...@gmail.com wrote: Thanks Michael. This sounds very similar to NoSQL and Map/Reduce? I'm not so sure about that (which may be mostly due to my ignorance of NoSQL and Map/Reduce). The amount of data involved in my problem is quite small and any infrastructure aimed at massive scaling may bring a load of conceptual and implementation baggage that is unnecessary/unhelpful. Let me restate my problem: I have a bunch of statistician colleagues with minimal programming skills. (I am also a statistician, but with slightly better programming skills.) As part of our analytical workflow we take data sets and preprocess them by adding new variables that are typically aggregate functions of other values. We source the data from a database/file, add the new variables, and store the augmented
Re: relational data aggregation language
Thanks Michael. This sounds very similar to NoSQL and Map/Reduce? I'm not so sure about that (which may be mostly due to my ignorance of NoSQL and Map/Reduce). The amount of data involved in my problem is quite small and any infrastructure aimed at massive scaling may bring a load of conceptual and implementation baggage that is unnecessary/unhelpful.

Let me restate my problem: I have a bunch of statistician colleagues with minimal programming skills. (I am also a statistician, but with slightly better programming skills.) As part of our analytical workflow we take data sets and preprocess them by adding new variables that are typically aggregate functions of other values. We source the data from a database/file, add the new variables, and store the augmented data in a database/file for subsequent, extensive and extended (a couple of months) analysis with other tools (off-the-shelf statistical packages such as SAS and R). After the analyses are complete, some subset of the preprocessing calculations need to be implemented in an operational environment. This is currently done by completely re-implementing them in yet another fairly basic imperative language.

The preprocessing in our analytical environment is usually written in a combination of SQL and the SAS data manipulation language (think of it as a very basic imperative language with macros but no user-defined functions). The statisticians take a long time to get their preprocessing right (they're not good at nested queries in SQL and make all the usual errors iterating over arrays of values with imperative code). So my primary goal is to find/build a query language that minimises the cognitive impedance mismatch with the statisticians and minimises their opportunity for error. Another goal is that the same mechanism should be applicable in our statistical analytical environment and the corporate deployment environment(s). The most different operational environment is online and realtime.
The data describing one case gets thrown at some code that (among other things) implements the preprocessing with some embedded imperative code. So, linking in some Java bytecode to do the preprocessing on a single case sounds feasible, whereas replacing/augmenting the current corporate infrastructure with NoSQL and a CPU farm is more aggravation with corporate IT than I am paid for. The final goal is that the preprocessing mechanism should be no slower than the current methods in each of the deployment environments. The hardest one is probably in our statistical analysis environment, but there we do have the option of farming the work across multiple CPUs if needed.

Let me describe the computational scale of the problem - it is really quite small. Data is organised as completely independent cases. One case might contain 500 primitive values for a total size of ~1kb. Preprocessing might calculate another 500 values, each of those being an aggregate function of some subset (say, 20 values) of the original 500 values. Currently, all these new values are calculated independently of each other, but there is a lot of overlap of intermediate results and, therefore, potential for optimisation of the computational effort required to calculate the entire set of results within a single case. In our statistical analytical environment the preprocessing is carried out in batch mode. A large dataset might contain 1M cases (~1GB of data). We can churn through the preprocessing at ~300 cases/second on a modest PC. Higher throughput in our analytical environment would be a bonus, but not essential.

So I see the problem as primarily about the conceptual design of the query language, with some side constraints about implementation compatibility across a range of deployment environments and adequate throughput performance.
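The overlap of intermediate results across the ~500 derived values is the natural target for the optimisation Ross mentions. One minimal sketch of caching a shared sub-aggregate with `memoize` (the data and names are invented for illustration):

```clojure
;; A case: 500 primitive values, indexed 0..499.
(def case-values (vec (range 500)))

;; A shared intermediate: the sum over a subset of indices. memoize
;; caches it so every derived variable that needs it pays once.
(def subset-sum
  (memoize (fn [idxs] (reduce + (map case-values idxs)))))

;; Two derived variables reusing the same intermediate result; the
;; second call to (subset-sum [0 1 2]) hits the cache.
(def derived
  {:a (+ (subset-sum [0 1 2]) 10)
   :b (* (subset-sum [0 1 2]) 2)})
```

In a batch run over independent cases, the memoization table would need to be scoped per case (e.g. rebuilt inside the per-case function) so the cache does not leak values across cases.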
As I mentioned in an earlier post, I'll probably assemble a collection of representative queries, express them in a variety of query languages, and try to assess how compatible the different query languages are with the way my colleagues want to think about the problem.

Ross

On Oct 3, 11:31 am, Michael Ossareh ossa...@gmail.com wrote: On Fri, Oct 1, 2010 at 17:55, Ross Gayler r.gay...@gmail.com wrote: Hi, This is probably an abuse of the Clojure forum, but it is a bit Clojure-related and strikes me as the sort of thing that a bright, eclectic bunch of Clojure users might know about. (Plus I'm not really a software person, so I need all the help I can get.) I am looking at the possibility of finding/building a declarative data aggregation language operating on a small relational representation. Each query identifies a set of rows satisfying some relational predicate and calculates some aggregate function of a set of values (e.g. min, max, sum). There might be ~20 input tables of up to ~1k rows. The data is immutable - it gets loaded and never changed. The results of the queries get loaded as new rows in other tables
Re: relational data aggregation language
Thanks Saul. That's all useful stuff. At this stage I am trying to simultaneously work out what my requirements should be and map out the space of implementation possibilities - so it's a matter of casting the net as widely as possible and kicking the tyres on everything in kicking range.

Ross

On Oct 3, 1:08 am, Saul Hazledine shaz...@gmail.com wrote: On Oct 2, 2:55 am, Ross Gayler r.gay...@gmail.com wrote: I am looking at the possibility of finding/building a declarative data aggregation language operating on a small relational representation. Each query identifies a set of rows satisfying some relational predicate and calculates some aggregate function of a set of values (e.g. min, max, sum). There might be ~20 input tables of up to ~1k rows. The data is immutable - it gets loaded and never changed. The results of the queries get loaded as new rows in other tables and are eventually used as input to other computations. There might be ~1k queries. There is no requirement for transaction management or any inherent concurrency (there is only one consumer of the results). There is no requirement for persistent storage - the aggregation is the only thing of interest. I would like the query language to map as directly as possible to the task (SQL is powerful enough, but can get very contorted and opaque for some of the queries).

Two things probably worth mentioning in case you weren't aware of them. With most Clojure build tools you can pull in a full relational database system such as H2, HSQLDB or Apache Derby and run an in-memory database. Incanter (an R-like platform for Clojure) supports select and group-by on its datasets. With Incanter you can also plot pretty graphs etc.

Saul
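For readers unfamiliar with the select/group-by style Saul mentions, a plain-Clojure analogue (no Incanter dependency; the data and the `aggregate-by` helper are invented for illustration):

```clojure
;; Rows as a vector of maps, the same relation shape used elsewhere
;; in this thread.
(def rows
  [{:company :a :amount 10}
   {:company :a :amount 20}
   {:company :b :amount 5}])

;; Group rows by a key function, then aggregate one column per group.
(defn aggregate-by [key-fn agg-fn val-fn rows]
  (into {} (for [[k grp] (group-by key-fn rows)]
             [k (agg-fn (map val-fn grp))])))

(def totals (aggregate-by :company #(reduce + %) :amount rows))
;; => {:a 30, :b 5}
```

Incanter's dataset operations wrap this kind of core-function pipeline, and add plotting on top.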
relational data aggregation language
Hi, This is probably an abuse of the Clojure forum, but it is a bit Clojure-related and strikes me as the sort of thing that a bright, eclectic bunch of Clojure users might know about. (Plus I'm not really a software person, so I need all the help I can get.)

I am looking at the possibility of finding/building a declarative data aggregation language operating on a small relational representation. Each query identifies a set of rows satisfying some relational predicate and calculates some aggregate function of a set of values (e.g. min, max, sum). There might be ~20 input tables of up to ~1k rows. The data is immutable - it gets loaded and never changed. The results of the queries get loaded as new rows in other tables and are eventually used as input to other computations. There might be ~1k queries. There is no requirement for transaction management or any inherent concurrency (there is only one consumer of the results). There is no requirement for persistent storage - the aggregation is the only thing of interest. I would like the query language to map as directly as possible to the task (SQL is powerful enough, but can get very contorted and opaque for some of the queries). There is considerable scope for optimisation of the calculations over the total set of queries as partial results are common across many of the queries. I would like to be able to do this in Clojure (which I have not yet used), partly for some very practical reasons to do with Java interop and partly because Clojure looks very cool.

* Is there any existing Clojure functionality which looks like a good fit to this problem? I have looked at Clojure-Datalog. It looks like a pretty good fit except that it lacks the aggregation operators. Apart from that the deductive power is probably greater than I need (although that doesn't necessarily cost me anything). I know that there are other (non-Clojure) Datalog implementations that have been extended with aggregation operators (e.g.
DLV http://www.mat.unical.it/dlv-complex/dlv-complex). Tutorial D (what SQL should have been: http://en.wikipedia.org/wiki/D_%28data_language_specification%29#Tutorial_D) might be a good fit, although once again, there is probably a lot of conceptual and implementation baggage (e.g. Rel http://dbappbuilder.sourceforge.net/Rel.php) that I don't need.

* Is there a Clojure implementation of something like Tutorial D? If there is no implementation of anything that meets my requirements then I would be willing to look at the possibility of creating a domain-specific language. However, I am wary of launching straight into that because of the probability that anything I dreamed up would be an ad hoc kludge rather than a semantically complete and consistent language. Optimised execution would be a whole other can of worms.

* Does anyone know of any DSLs/formalisms for declaratively specifying relational data aggregations?

Thanks Ross
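As a starting point for such a DSL, queries could be plain Clojure data interpreted over in-memory relations. Everything below is a hypothetical sketch, not an existing library; a declarative query-as-data form also leaves room for the cross-query optimisation Ross wants, since shared sub-expressions are visible as data:

```clojure
;; Tables: a map from table name to a vector of row maps.
(def tables
  {:loans [{:purpose :restructuring :amount 50}
           {:purpose :car           :amount 20}
           {:purpose :restructuring :amount 120}]})

;; The aggregate vocabulary the DSL exposes.
(def agg-fns {:sum   #(reduce + %)
              :min   #(apply min %)
              :max   #(apply max %)
              :count count})

;; A query is data: which table, which rows, which aggregate, of what.
(defn run-query [tables {:keys [from where agg of]}]
  ((agg-fns agg) (map of (filter where (tables from)))))

(def total
  (run-query tables {:from  :loans
                     :where #(= (:purpose %) :restructuring)
                     :agg   :sum
                     :of    :amount}))
;; => 170
```

The `:where` predicate is still an opaque function here; a fully declarative version would express it as data too (e.g. `[:= :purpose :restructuring]`) so queries could be analysed and optimised.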
Re: relational data aggregation language
Thanks Daniel, That sounds like it's in the right direction (although that is probably true of anything that gives database-like functionality). I would need some filtered-join-like functionality between tables in order to select some of the rows of interest.

As to the declarative part: leaving aside the terminology (which I am probably abusing), I currently have a bunch of colleagues who are even less computer literate than me doing these queries in an imperative language and it is a wonderfully fertile breeding ground for errors. (If you really want to know, they are statisticians using SAS to calculate things like "Over all the companies of which this person was a director at the time a loan application was made, count the number of loan applications made in the last year where the purpose was debt restructuring and the amount sought was less than $100k".) I guess I was trying to achieve two things by asking for a declarative query language: 1) reduce the accidental complexity of the specification, thereby reducing the opportunity for specification error; 2) I presumed that optimizing the calculation would be easier from a declarative specification (because there is no baked-in ordering of operations).

I think my next steps are to collect a small zoo (maybe only a menagerie) of different query languages; assemble a collection of test queries; code the queries in each of the query languages as a pencil and paper exercise (or in the real software if it is trivial enough to set up); try to rank the different query languages by usability for my colleagues, potential for optimization, and compatibility with our software environment and my minimal computing skills.

Does this sound like something you could use? I'm happy to share it.

Thanks. That would be great. At this stage (not knowing Clojure more than a tiny bit) I'm likely to make the most use of examples of how it would be used.
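Ross's SAS example reads naturally as a filtered join plus a count. A hypothetical Clojure rendering (relation and field names invented, and the "last year" date filter omitted for brevity):

```clojure
;; Two toy relations: who directs which company, and loan applications.
(def directorships
  [{:person 1 :company :acme} {:person 1 :company :zenith}])

(def applications
  [{:company :acme   :purpose :restructuring :amount 90}
   {:company :acme   :purpose :car           :amount 30}
   {:company :zenith :purpose :restructuring :amount 80}])

;; "Over all the companies this person directs, count restructuring
;; applications under 100 (thousand)." The set of companies acts as
;; the join predicate.
(defn restructuring-count [person]
  (let [cos (set (map :company
                      (filter #(= (:person %) person) directorships)))]
    (count (filter #(and (cos (:company %))
                         (= (:purpose %) :restructuring)
                         (< (:amount %) 100))
                   applications))))
```

This is the kind of filtered-join-plus-aggregate shape that the eventual DSL would need to express without the nested anonymous functions.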
Cheers Ross Gayler

On Oct 2, 2:26 pm, Daniel Phelps phelps...@gmail.com wrote: Hi Ross, I am working on something that may be of help to you, but it's very early in development. Basically I wanted to see if I could write a database server in Clojure and what I have now sounds (kinda) like what you're after. It was really simple. Imagine a list of maps as a database table. To query the structure we simply filter that list based on some predicate, and now you have a new structure. My select function takes a database (a map of maps), a table name, and a predicate and does exactly that. Then, arbitrary functions can be applied to the result. For example, suppose the result is as follows:

({:name "Daniel" :salary 1000} {:name "Ross" :salary 1500})

The total of salaries can be calculated like this:

(defn sum [attr relation] (reduce + (map attr relation)))
(sum :salary results) ;; => 2500

You can see how min, max and friends could just as easily be implemented. As far as building the database, I use classes as keys for the tables and instances of those classes are the tuples (see defrecord). In that sense what I have is declarative; the database is built by successively applying Clojure functions to Clojure data structures. Does this sound like something you could use? I'm happy to share it.

Daniel Phelps

On Oct 1, 8:55 pm, Ross Gayler r.gay...@gmail.com wrote: Hi, This is probably an abuse of the Clojure forum, but it is a bit Clojure-related and strikes me as the sort of thing that a bright, eclectic bunch of Clojure users might know about. (Plus I'm not really a software person, so I need all the help I can get.) I am looking at the possibility of finding/building a declarative data aggregation language operating on a small relational representation. Each query identifies a set of rows satisfying some relational predicate and calculates some aggregate function of a set of values (e.g. min, max, sum). There might be ~20 input tables of up to ~1k rows.
The data is immutable - it gets loaded and never changed. The results of the queries get loaded as new rows in other tables and are eventually used as input to other computations. There might be ~1k queries. There is no requirement for transaction management or any inherent concurrency (there is only one consumer of the results). There is no requirement for persistent storage - the aggregation is the only thing of interest. I would like the query language to map as directly as possible to the task (SQL is powerful enough, but can get very contorted and opaque for some of the queries). There is considerable scope for optimisation of the calculations over the total set of queries as partial results are common across many of the queries. I would like to be able to do this in Clojure (which I have not yet used), partly for some very practical reasons to do with Java interop and partly because Clojure looks very cool. * Is there any existing Clojure functionality which looks like
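Daniel's `sum` generalises directly to the "min, max and friends" he mentions; a sketch using his list-of-maps relation shape (the `agg` helper is an invented name):

```clojure
;; Daniel's example relation.
(def results [{:name "Daniel" :salary 1000} {:name "Ross" :salary 1500}])

;; Generalise sum: any aggregate function over one attribute of a relation.
(defn agg [f attr relation] (f (map attr relation)))

(def sum-salary (agg #(reduce + %) :salary results))  ; => 2500
(def min-salary (agg #(apply min %) :salary results)) ; => 1000
(def max-salary (agg #(apply max %) :salary results)) ; => 1500
```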