Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Hi.

I'm experimenting with Renjin interop; in particular, I'm trying to make Renjin objects implement core.matrix protocols (as mikera suggested). I hope to be able to share a draft soon, and then ask for your opinions about it.

On Monday, June 6, 2016 at 6:28:59 PM UTC+3, arthur.ma...@gmail.com wrote:
> Chaoya,
>
> I haven't been working on this, and I don't really intend to anytime soon, there's other work that I must attend to in the immediate time-frame.
>
> - Arthur
>
> On Saturday, June 4, 2016 at 11:51:49 PM UTC-4, Chaoya Li wrote:
>> Hi I'm interested in Clojure DataFrame implementation. How is this going now? Are you coding for core.matrix or are you writing a new library from scratch? How can I join in this project?
>>
>> On Thursday, March 10, 2016 at 4:57:31 AM UTC+8, arthur.ma...@gmail.com wrote:
>>> Is there any desire or need for a Clojure DataFrame?
>>>
>>> By DataFrame, I mean a structure similar to R's data.frame, and Python's pandas.DataFrame.
>>>
>>> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd like to know if and how people are using it.
>>>
>>> From quickly researching, I see that some prior work has been done in this space, such as:
>>>
>>> * https://github.com/cardillo/joinery
>>> * https://github.com/mattrepl/data-frame
>>> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>>>
>>> Rather than going off and creating a competing implementation (https://xkcd.com/927/), I'd like to know if anyone here is actively working on, or would like to work on a DataFrame and related utilities for Clojure (and by extension Java)? Is it something that's sorely needed, or is everybody happy with using Incanter or some other library that I'm not aware of? If there's already a de facto standard out there, would anyone care to please point it out?
>>>
>>> As background information:
>>>
>>> My specific use-case is in NLP and ML, where I often explore and prototype in Python, but I'm then left to deal with a smattering of libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with their own ad-hoc implementations of algorithms, matrices, and utilities for reading data. It would be great to have a unified way to explore my data in the Clojure REPL, and then serve the same code and models in production.
>>>
>>> I would love for Clojure to have a broadly compatible ecosystem similar to Python's Numpy/Pandas/Scikit-*/Scipy/matplotlib/GenSim, etc. Core.Matrix and Incanter appear to fulfill a large chunk of those roles, but I am not aware if they've yet become the de facto standards in the community.
>>>
>>> Any feedback is greatly appreciated.

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Chaoya,

I haven't been working on this, and I don't really intend to anytime soon; there's other work that I must attend to in the immediate time-frame.

- Arthur
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Hi, I'm interested in a Clojure DataFrame implementation. How is this going now? Are you coding for core.matrix, or are you writing a new library from scratch? How can I join this project?
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
On Friday, 11 March 2016 09:21:09 UTC+8, arthur.ma...@gmail.com wrote:
> Renjin and Spark's dataframes are not going to be easily removed from their respective codebases, as far as my brief perusal of the source can tell. I agree that N-D DataFrames would be a good addition to the ecosystem, similar to the goals of Python's xarray (xarray.pydata.org). However, it is not a priority for myself as of this time. Thanks for pointing out the DataSet proposal. I'll take a look at that later.
>
> On a slightly related note, where is the best place to ask core.matrix questions? I have some small questions about sparse matrix support in core.matrix, and what sparse formats are implemented.

There is the Numerical Clojure group: https://groups.google.com/forum/#!forum/numerical-clojure

For quick questions / discussion, many people are on the #data-science channel in the Clojure Slack.

Or you can just file a core.matrix issue with a question. I'm usually quite responsive with these, and they may serve as a reference for future people who run into similar questions: https://github.com/mikera/core.matrix/issues
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
On Friday, 11 March 2016 09:09:14 UTC+8, Dragan Djuric wrote:
>> This is already working well for the array programming APIs (it's easy to mix and match Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in computations).
>
> While we could agree to some extent on the other parts of your post, the GPU part is *NOT* true: I would like you to point me to a single implementation anywhere (Clojure or other) that (easily or not) mixes and matches arrays in RAM and arrays on the GPU backend. It simply does not work that way.

You misunderstand my point. Obviously, there may need to be some copying when you move between managed and unmanaged memory. But I'm not talking about that: the point is that this can happen "under the hood", without the user needing to do explicit conversions etc. All thanks to the protocol implementations, you can mix and match GPU, native and Java backed instances with the same API.

core.matrix can trivially do stuff like (add! native-array java-array), for example. What's not to like about that?
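To make the "under the hood" point concrete, here is a minimal sketch in plain Clojure. It is not how core.matrix is implemented: the PDoubles protocol, to-doubles, and this toy add! are hypothetical names made up for illustration, and a real GPU or native backend would involve device or off-heap transfers that a vector-to-array copy only loosely stands in for.

```clojure
;; Hypothetical coercion protocol (NOT a core.matrix protocol): each backing
;; type says how to expose its elements as a Java double array.
(defprotocol PDoubles
  (to-doubles [x] "Returns the elements of x as a double array."))

;; A Clojure persistent vector has to be copied into a fresh array.
(extend-protocol PDoubles
  clojure.lang.IPersistentVector
  (to-doubles [v] (double-array v)))

;; A double array is already in the target format, so no copy is needed.
(extend (Class/forName "[D")
  PDoubles
  {:to-doubles (fn [arr] arr)})

(defn add!
  "Adds src element-wise into the mutable double array dest and returns dest.
  The caller never converts src explicitly; the protocol does it."
  [^doubles dest src]
  (let [^doubles s (to-doubles src)]
    (dotimes [i (alength dest)]
      (aset dest i (+ (aget dest i) (aget s i))))
    dest))

(def java-array (double-array [1.0 2.0 3.0]))

(add! java-array [10 20 30])             ; vector argument, copied internally
(add! java-array (double-array [1 1 1])) ; array argument, used directly

(vec java-array) ;; => [12.0 23.0 34.0]
```

The copy still happens for the vector argument, which is the cost being debated above; the sketch only shows that the conversion can be hidden behind a single call.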
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Renjin and Spark's dataframes are not going to be easy to extract from their respective codebases, as far as my brief perusal of the source can tell. I agree that N-D DataFrames would be a good addition to the ecosystem, similar to the goals of Python's xarray (xarray.pydata.org). However, it is not a priority for me at this time. Thanks for pointing out the DataSet proposal. I'll take a look at that later.

On a slightly related note, where is the best place to ask core.matrix questions? I have some small questions about sparse matrix support in core.matrix, and what sparse formats are implemented.
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
> This is already working well for the array programming APIs (it's easy to mix and match Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in computations).

While we could agree to some extent on the other parts of your post, the GPU part is *NOT* true: I would like you to point me to a single implementation anywhere (Clojure or other) that (easily or not) mixes and matches arrays in RAM and arrays on the GPU backend. It simply does not work that way.
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
core.matrix maintainer here.

I think it would be great to have more work on dataframe-type support. I think the right strategy is as follows:

a) Make use of the core.matrix Dataset protocols where possible (or add new ones)
b) Create implementation(s) for these protocols for whatever back-end data frame implementation is being used

The beauty of core.matrix is that we *can* support multiple implementations without fragmentation, because the protocol based approach means that every implementation can use the same API. This is already working well for the array programming APIs (it's easy to mix and match Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in computations). We just need to do the same for DataFrames.

Now: the current core.matrix Dataset API is a bit focused on 2D data tables, but I think it can be extended to general N-dimensional dataframe capability. Would be a great project for someone to take on, happy to give guidance and help merge in changes as needed.

I don't have a particularly strong opinion on which Dataframe implementations are best, but it looks like Spark and Renjin are both great candidates and would be very useful additions to the Clojure numerical ecosystem. If we do things right, they should interoperate easily with the core.matrix APIs, making Clojure ideal for "glue" code across such implementations.
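As a rough illustration of strategy (a) and (b) above, the sketch below defines one small protocol and two unrelated back ends that implement it. Everything here is an assumption made for the example: PTable, column-names, column, ColumnTable and RowTable are hypothetical names and are not the actual core.matrix Dataset protocols or types.

```clojure
;; Hypothetical protocol; illustrative only, NOT the core.matrix Dataset API.
(defprotocol PTable
  (column-names [t] "Ordered seq of column names.")
  (column [t col-name] "Values of one column, as a seq."))

;; Back end 1: column-oriented storage (map of column name -> vector).
(defrecord ColumnTable [names cols]
  PTable
  (column-names [_] names)
  (column [_ col-name] (get cols col-name)))

;; Back end 2: row-oriented storage (vector of row maps).
(defrecord RowTable [names rows]
  PTable
  (column-names [_] names)
  (column [_ col-name] (map #(get % col-name) rows)))

;; Written once against the protocol, so it works with either back end.
(defn column-sum [t col-name]
  (reduce + (column t col-name)))

(def by-col (->ColumnTable [:a :b] {:a [1 2 3] :b [10 20 30]}))
(def by-row (->RowTable [:a :b] [{:a 1 :b 10} {:a 2 :b 20} {:a 3 :b 30}]))

(column-sum by-col :b) ;; => 60
(column-sum by-row :b) ;; => 60
```

A wrapper around Spark or Renjin dataframes would, under this assumption, be one more type extending the same protocol, and column-sum would not need to change.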
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Dan,

That's a huge number of stats packages that would become available, assuming we achieve interop with Renjin's dataframes. I'll look into it as well. My priorities are to first get something working for $DAYJOB, and then to build a more generally useful package, and finally add extras such as interop.

- Arthur

On Wednesday, March 9, 2016 at 7:04:17 PM UTC-5, Daniel Slutsky wrote:
> Thank you for raising this question.
>
> By the way, one desired feature for a Clojure dataframe abstraction would be good interop with Renjin's dataframes. Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number of JVM-based statistical libraries. Most of them rely on the dataframe abstraction for their data. R is also very Lisp-like in its data representation, so wrapping all this with Clojure would be a delight.
>
> On Thursday, March 10, 2016 at 1:47:44 AM UTC+2, Christopher Small wrote:
>> If you're going to do any work in this area, I would highly encourage you to do it as part of the core.matrix library. That is what Incanter is or will be using for its dataset implementation. But it's nice that those abstractions and implementations be separate from Incanter itself, since Incanter is a rather large dependency.
>>
>> Core.matrix is certainly (in my eyes) becoming the de facto matrix computation library in the Clojure ecosystem, and I think in the level of interop between different implementations there, and extent of utilization by the clojure community, we rival the python offerings. However, while core.matrix has some dataset protocols, api functions and basic implementations, there's still some work to get the full expressiveness of the data.frame pattern as seen in R and Pandas. Specifically, there is no support for setting rownames (or arbitrary "name" assignments beyond that of a single dimension (columns...)). This is something I started working on a while back, but wasn't able to finish. I could potentially push what I came up with to a fork, but unfortunately, I don't have any more time to work on the problem at the moment.
>>
>> Mike Anderson is a great project maintainer, and will probably be happy to help guide you in stitching together a solution.
>>
>> Best
>>
>> Chris
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Sounds great; and sure thing, will do :-)

The basic idea I had was to implement a bidirectional index mapping names <-> indices. This requires making sure you keep the index up to date any time you change the data, but seemed the easiest way forward.

My fork is here: https://github.com/metasoarous/core.matrix/commits/develop

Here are a couple of related issues:

https://github.com/mikera/core.matrix/issues/193
https://github.com/mikera/core.matrix/issues/220

Hope you can come up with something nice! I would focus first on coming up with what seems like a nice set of protocols, so that we can be flexible with implementations. Ideally, we'd be able to just apply some wrapper to any core.matrix array, vector, matrix, etc. that provided named/labeled access to the data, and would be fairly seamless with the rest of the library. But you should also be able to wrap something like Renjin's dataframes (as Daniel Slutsky mentioned; just implement the protocols using their classes, I imagine).

There might have to be some iteration here. Like: initial protocol design -> initial implementation -> redraft protocols -> try new implementation -> redraft protocols, etc. I've noticed that it can be difficult to properly abstract implementation details away from the protocol/API on the first go (though you might have mastered this more than I :-)).

My 2c

Good luck!

Chris

On Wed, Mar 9, 2016 at 4:29 PM, wrote:
> Chris, thanks for the reply.
>
> It's good to know that I'm not the only one who misses this functionality!
> My goal is definitely to be compatible with Incanter and core.matrix, as
> they both seem mature, and I will never have the time to implement that
> functionality from scratch myself. I'll be studying the source of Pandas
> over the next few days, as I want to have a good idea of how they implement
> their dataframes before starting on the Clojure version. My long-term goal
> is for future authors to look to this set of core tools for data analysis
> as the basis for any packages they build.
>
> If you'd like to publish whatever you've written (hacked-up code is OK),
> I'll take a look at that as a starting point, or at least as one possible
> design.
>
> - Arthur
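A minimal sketch of the bidirectional names <-> indices mapping Chris describes, in plain Clojure with no core.matrix dependency. The function names (`make-index`, `append-label`, `drop-label`) are hypothetical and are not taken from the fork; the only point illustrated is that every change to the labels rebuilds both directions together, so the two lookups cannot drift apart.

```clojure
;; Hypothetical sketch: none of these names come from core.matrix or the fork.
(defn make-index
  "Builds a bidirectional index from a sequence of distinct labels."
  [labels]
  (let [labels (vec labels)]
    {:idx->name labels                        ; position -> label
     :name->idx (zipmap labels (range))}))    ; label -> position

(defn append-label
  "Returns a new index with `label` added at the end."
  [index label]
  (make-index (conj (:idx->name index) label)))

(defn drop-label
  "Returns a new index without `label`; later labels shift down by one."
  [index label]
  (make-index (remove #(= label %) (:idx->name index))))

(def idx (make-index [:id :age :height]))

(get-in idx [:name->idx :age])                    ; label -> position
(get-in idx [:idx->name 2])                       ; position -> label
(get-in (drop-label idx :id) [:name->idx :age])   ; position after a removal
```

Rebuilding the whole index on every change is linear in the number of labels. That is probably acceptable for column names, but it is an assumption worth revisiting for large row indices.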
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Chris, thanks for the reply.

It's good to know that I'm not the only one who misses this functionality! My goal is definitely to be compatible with Incanter and core.matrix, as they both seem mature, and I will never have the time to implement that functionality from scratch myself. I'll be studying the source of Pandas over the next few days, as I want to have a good idea of how they implement their dataframes before starting on the Clojure version. My long-term goal is for future authors to look to this set of core tools for data analysis as the basis for any packages they build.

If you'd like to publish whatever you've written (hacked-up code is OK), I'll take a look at that as a starting point, or at least as one possible design.

- Arthur

On Wednesday, March 9, 2016 at 6:47:44 PM UTC-5, Christopher Small wrote:
> If you're going to do any work in this area, I would highly encourage you
> to do it as part of the core.matrix library. That is what Incanter is or
> will be using for its dataset implementation. But it's nice that those
> abstractions and implementations be separate from Incanter itself, since
> Incanter is a rather large dependency.
>
> Core.matrix is certainly (in my eyes) becoming the de facto matrix
> computation library in the Clojure ecosystem, and I think that in the level
> of interop between different implementations, and the extent of use by the
> Clojure community, we rival the Python offerings. However, while
> core.matrix has some dataset protocols, API functions and basic
> implementations, there's still some work to get the full expressiveness of
> the data.frame pattern as seen in R and Pandas. Specifically, there is no
> support for setting rownames (or arbitrary "name" assignments beyond that
> of a single dimension (columns...)). This is something I started working on
> a while back, but wasn't able to finish. I could potentially push what I
> came up with to a fork, but unfortunately, I don't have any more time to
> work on the problem at the moment.
>
> Mike Anderson is a great project maintainer, and will probably be happy to
> help guide you in stitching together a solution.
>
> Best
>
> Chris

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Thank you for raising this question.

By the way, one desired feature for a Clojure dataframe abstraction would be good interop with Renjin's dataframes. Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number of JVM-based statistical libraries. Most of them rely on the dataframe abstraction for their data.

R is also very Lisp-like in its data representation, so wrapping all this with Clojure would be a delight.

On Thursday, March 10, 2016 at 1:47:44 AM UTC+2, Christopher Small wrote:
> If you're going to do any work in this area, I would highly encourage you
> to do it as part of the core.matrix library. That is what Incanter is or
> will be using for its dataset implementation. But it's nice that those
> abstractions and implementations be separate from Incanter itself, since
> Incanter is a rather large dependency.
>
> Core.matrix is certainly (in my eyes) becoming the de facto matrix
> computation library in the Clojure ecosystem, and I think that in the level
> of interop between different implementations, and the extent of use by the
> Clojure community, we rival the Python offerings. However, while
> core.matrix has some dataset protocols, API functions and basic
> implementations, there's still some work to get the full expressiveness of
> the data.frame pattern as seen in R and Pandas. Specifically, there is no
> support for setting rownames (or arbitrary "name" assignments beyond that
> of a single dimension (columns...)). This is something I started working on
> a while back, but wasn't able to finish. I could potentially push what I
> came up with to a fork, but unfortunately, I don't have any more time to
> work on the problem at the moment.
>
> Mike Anderson is a great project maintainer, and will probably be happy to
> help guide you in stitching together a solution.
>
> Best
>
> Chris
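For readers unfamiliar with the representation Daniel alludes to: an R data.frame is essentially a list of equal-length column vectors that carries column names and row names, which maps naturally onto Clojure maps and vectors. The sketch below is plain Clojure and does not call Renjin; the `frame` value and the `column` and `rows` helpers are hypothetical names chosen only for illustration.

```clojure
;; Hypothetical column-oriented frame, loosely mirroring R's data.frame layout.
(def frame
  {:names     [:word :count]              ; column order
   :row-names ["r1" "r2" "r3"]
   :columns   {:word  ["the" "cat" "sat"]
               :count [10 3 2]}})

(defn column
  "Returns the values of one column as a vector."
  [frame col]
  (get-in frame [:columns col]))

(defn rows
  "Returns the frame as a sequence of row maps, in row order.
  Assumes the frame has at least one column."
  [{:keys [names columns]}]
  (apply map
         (fn [& vs] (zipmap names vs))
         (map columns names)))

(column frame :count)   ; one column, as stored
(first (rows frame))    ; first row, rebuilt from the columns
```

Storing columns rather than rows keeps each column homogeneous, which is the property a wrapper around Renjin's dataframes would most likely want to preserve.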
Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
If you're going to do any work in this area, I would highly encourage you to do it as part of the core.matrix library. That is what Incanter is or will be using for its dataset implementation. But it's nice that those abstractions and implementations be separate from Incanter itself, since Incanter is a rather large dependency.

Core.matrix is certainly (in my eyes) becoming the de facto matrix computation library in the Clojure ecosystem, and I think that in the level of interop between different implementations, and the extent of use by the Clojure community, we rival the Python offerings. However, while core.matrix has some dataset protocols, API functions and basic implementations, there's still some work to get the full expressiveness of the data.frame pattern as seen in R and Pandas. Specifically, there is no support for setting rownames (or arbitrary "name" assignments beyond that of a single dimension (columns...)). This is something I started working on a while back, but wasn't able to finish. I could potentially push what I came up with to a fork, but unfortunately, I don't have any more time to work on the problem at the moment.

Mike Anderson is a great project maintainer, and will probably be happy to help guide you in stitching together a solution.

Best

Chris

On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, arthur.ma...@gmail.com wrote:
> Is there any desire or need for a Clojure DataFrame?
>
> By DataFrame, I mean a structure similar to R's data.frame, and Python's
> pandas.DataFrame.
>
> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd
> like to know if and how people are using it.
>
> From quickly researching, I see that some prior work has been done in this
> space, such as:
>
> * https://github.com/cardillo/joinery
> * https://github.com/mattrepl/data-frame
> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>
> Rather than going off and creating a competing implementation
> (https://xkcd.com/927/), I'd like to know if anyone here is actively
> working on, or would like to work on, a DataFrame and related utilities for
> Clojure (and by extension Java)? Is it something that's sorely needed, or
> is everybody happy with using Incanter or some other library that I'm not
> aware of? If there's already a de facto standard out there, would anyone
> care to please point it out?
>
> As background information:
>
> My specific use-case is in NLP and ML, where I often explore and prototype
> in Python, but I'm then left to deal with a smattering of libraries on the
> JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with
> their own ad-hoc implementations of algorithms, matrices, and utilities for
> reading data. It would be great to have a unified way to explore my data in
> the Clojure REPL, and then serve the same code and models in production.
>
> I would love for Clojure to have a broadly compatible ecosystem similar to
> Python's Numpy/Pandas/Scikit-*/Scipy/matplotlib/GenSim, etc. Core.Matrix and
> Incanter appear to fulfill a large chunk of those roles, but I am not aware
> if they've yet become the de facto standards in the community.
>
> Any feedback is greatly appreciated.
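To make the gap Chris points at concrete (labels on rows as well as columns, or on any dimension), here is a rough sketch in plain Clojure. The `PLabeled` protocol, the `LabeledArray` record and the `labeled` constructor are made-up names for illustration only. They are not part of core.matrix, and a real design would presumably extend core.matrix's own protocols rather than wrap nested vectors.

```clojure
;; Hypothetical names throughout; this is not the core.matrix API.
(defprotocol PLabeled
  (dimension-labels [a dim] "Labels along dimension `dim`, in index order.")
  (select-labeled [a names] "Element addressed by one label per dimension."))

(defrecord LabeledArray [data labels lookups]
  PLabeled
  (dimension-labels [_ dim] (nth labels dim))
  (select-labeled [_ names]
    ;; Translate each label to its position, then walk the nested vectors.
    (let [idxs (map (fn [lookup n]
                      (or (lookup n)
                          (throw (ex-info "Unknown label" {:label n}))))
                    lookups names)]
      (get-in data idxs))))

(defn labeled
  "Wraps nested vectors with one vector of labels per dimension."
  [data labels]
  (->LabeledArray data
                  (vec labels)
                  (mapv #(zipmap % (range)) labels)))

(def m (labeled [[1 2 3]
                 [4 5 6]]
                [[:row-a :row-b] [:x :y :z]]))

(dimension-labels m 0)           ; labels along the row dimension
(select-labeled m [:row-b :z])   ; element at row 1, column 2
```

Because the labels are kept per dimension rather than only for columns, the same protocol would cover row names and higher-dimensional arrays without special cases.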
Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)
Is there any desire or need for a Clojure DataFrame?

By DataFrame, I mean a structure similar to R's data.frame, and Python's pandas.DataFrame.

Incanter's DataSet may already be fulfilling this purpose, and if so, I'd like to know if and how people are using it.

From quickly researching, I see that some prior work has been done in this space, such as:

* https://github.com/cardillo/joinery
* https://github.com/mattrepl/data-frame
* http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes

Rather than going off and creating a competing implementation (https://xkcd.com/927/), I'd like to know if anyone here is actively working on, or would like to work on, a DataFrame and related utilities for Clojure (and by extension Java)? Is it something that's sorely needed, or is everybody happy with using Incanter or some other library that I'm not aware of? If there's already a de facto standard out there, would anyone care to please point it out?

As background information:

My specific use-case is in NLP and ML, where I often explore and prototype in Python, but I'm then left to deal with a smattering of libraries on the JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with their own ad-hoc implementations of algorithms, matrices, and utilities for reading data. It would be great to have a unified way to explore my data in the Clojure REPL, and then serve the same code and models in production.

I would love for Clojure to have a broadly compatible ecosystem similar to Python's Numpy/Pandas/Scikit-*/Scipy/matplotlib/GenSim, etc. Core.Matrix and Incanter appear to fulfill a large chunk of those roles, but I am not aware if they've yet become the de facto standards in the community.

Any feedback is greatly appreciated.