Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-06-07 Thread Daniel Slutsky
Hi.

I'm experimenting with Renjin interop; in particular, I'm trying to make 
Renjin objects implement core.matrix protocols (as mikera suggested).

I hope to be able to share a draft soon, and then ask for your opinions 
about it.
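For readers unfamiliar with the technique: Clojure protocols can be extended to existing Java classes without modifying or subclassing them, which is what making third-party objects "implement core.matrix protocols" relies on. Below is a minimal sketch under stated assumptions: `PShapeSketch` and `shape-of` are hypothetical names (not core.matrix's real protocols), and plain `java.util.Map` / `java.util.List` objects stand in for Renjin's data-frame classes, whose actual API is not shown here.

```clojure
;; Hypothetical protocol for illustration only; it is NOT one of core.matrix's
;; real protocols. The Java collections below merely stand in for Renjin's
;; data-frame classes, whose API is not shown here.
(defprotocol PShapeSketch
  (shape-of [x] "Returns [rows cols] for 2-D tabular data."))

;; extend-protocol attaches implementations to existing Java types without
;; modifying or subclassing them.
(extend-protocol PShapeSketch
  ;; Column-oriented stand-in: column name -> java.util.List of values.
  java.util.Map
  (shape-of [m]
    (let [cols (vals m)]
      [(if (seq cols) (count (first cols)) 0) (count m)]))

  ;; Row-oriented stand-in: a java.util.List of row lists.
  java.util.List
  (shape-of [rows]
    [(count rows) (if (seq rows) (count (first rows)) 0)]))

;; A plain Java object, built through interop, now answers the protocol.
(def java-frame
  (doto (java.util.LinkedHashMap.)
    (.put "x" (java.util.ArrayList. [1 2 3]))
    (.put "y" (java.util.ArrayList. [4 5 6]))))

(shape-of java-frame)                                  ;; => [3 2]
(shape-of (java.util.ArrayList. [[1 2] [3 4] [5 6]]))  ;; => [3 2]
```

A real wrapper would implement the actual core.matrix protocols against the Renjin classes; the shape of the work (one `extend-protocol` per foreign type) would be the same.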


On Monday, June 6, 2016 at 6:28:59 PM UTC+3, arthur.ma...@gmail.com wrote:
>
> Chaoya,
>
>  I haven't been working on this, and I don't really intend to anytime 
> soon; there's other work that I must attend to in the immediate time-frame.
>
> - Arthur
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-06-06 Thread arthur . maciejewicz
Chaoya,

I haven't been working on this, and I don't really intend to anytime 
soon; there's other work that I must attend to in the immediate time-frame.

- Arthur

On Saturday, June 4, 2016 at 11:51:49 PM UTC-4, Chaoya Li wrote:
>
> Hi, I'm interested in a Clojure DataFrame implementation. How is this 
> going now? Are you coding for core.matrix, or are you writing a new library 
> from scratch? How can I join this project?
>



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-06-04 Thread Chaoya Li
Hi, I'm interested in a Clojure DataFrame implementation. How is this going 
now? Are you coding for core.matrix, or are you writing a new library from 
scratch? How can I join this project?

On Thursday, March 10, 2016 at 4:57:31 AM UTC+8, arthur.ma...@gmail.com wrote:
>
> Is there any desire or need for a Clojure DataFrame?
>
>
> By DataFrame, I mean a structure similar to R's data.frame and Python's 
> pandas.DataFrame.
>
> Incanter's DataSet may already be fulfilling this purpose, and if so, I'd 
> like to know if and how people are using it.
>
> From some quick research, I see that prior work has been done in this 
> space, such as:
>
> * https://github.com/cardillo/joinery
> * https://github.com/mattrepl/data-frame
> * http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>
> Rather than going off and creating a competing implementation 
> (https://xkcd.com/927/), I'd like to know if anyone here is actively 
> working on, or would like to work on, a DataFrame and related utilities for 
> Clojure (and by extension Java). Is it something that's sorely needed, or 
> is everybody happy using Incanter or some other library that I'm not 
> aware of? If there's already a de facto standard out there, would anyone 
> care to point it out?
>
> As background information:
>
> My specific use case is in NLP and ML, where I often explore and prototype 
> in Python, but I'm then left to deal with a smattering of libraries on the 
> JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with 
> its own ad hoc implementations of algorithms, matrices, and utilities for 
> reading data. It would be great to have a unified way to explore my data in 
> the Clojure REPL, and then serve the same code and models in production.
>
> I would love for Clojure to have a broadly compatible ecosystem similar to 
> Python's NumPy/pandas/scikit-*/SciPy/matplotlib/Gensim, etc. core.matrix and 
> Incanter appear to fulfill a large chunk of those roles, but I am not aware 
> whether they've yet become the de facto standards in the community.
>
> Any feedback is greatly appreciated.
>



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-13 Thread Mikera


On Friday, 11 March 2016 09:21:09 UTC+8, arthur.ma...@gmail.com wrote:
>
> Renjin's and Spark's dataframes are not going to be easily separated from 
> their respective codebases, as far as my brief perusal of the source can 
> tell. I agree that N-D DataFrames would be a good addition to the 
> ecosystem, similar to the goals of Python's xarray (xarray.pydata.org). 
> However, it is not a priority for me at this time. Thanks for 
> pointing out the DataSet proposal. I'll take a look at that later.
>
> On a slightly related note, where is the best place to ask core.matrix 
> questions? I have some small questions about sparse matrix support in 
> core.matrix, and what sparse formats are implemented.
>

There is the Numerical Clojure group: 
https://groups.google.com/forum/#!forum/numerical-clojure

For quick questions / discussion, many people are on the #data-science 
channel in the Clojure Slack.

Or you can just file a core.matrix issue with a question: I'm usually quite 
responsive with these and they may serve as a reference for future people 
who run into similar questions: 
https://github.com/mikera/core.matrix/issues



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-10 Thread Mikera
On Friday, 11 March 2016 09:09:14 UTC+8, Dragan Djuric wrote:
>
>> This is already working well for the array programming APIs (it's easy to 
>> mix and match Clojure data structures, Vectorz Java-based arrays, GPU 
>> backed arrays in computations). 
>>
>
> While we could agree to some extent on the other parts of your post, the 
> GPU part is *NOT* true: I would like you to point me to a single 
> implementation anywhere (Clojure or other) that (easily or not) mixes and 
> matches arrays in RAM and arrays on the GPU backend. It simply does not 
> work that way.
>

You misunderstand my point. Obviously, there may need to be some copying 
when you move between managed and unmanaged memory. 

But I'm not talking about that: the point is that this can happen "under 
the hood", without the user needing to do explicit conversions etc. Thanks 
to the protocol implementations, you can mix and match GPU, native 
and Java-backed instances with the same API. 

core.matrix can trivially do stuff like (add! native-array java-array), for 
example.

What's not to like about that?
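A minimal sketch of the "under the hood" conversion described above, using only clojure.core. The protocol `PAddInPlace` and the function `add-in-place!` are hypothetical names for illustration; this is not core.matrix's actual `add!` implementation. A Java `double[]` plays the mutable destination, and the source may be a Clojure vector or another `double[]`; the caller never converts anything explicitly.

```clojure
;; Hypothetical protocol and names for illustration only; this is NOT how
;; core.matrix implements add! internally.
(defprotocol PAddInPlace
  (add-in-place! [dest src]
    "Adds src element-wise into dest, mutating dest, and returns dest."))

;; `extend` accepts any Class object, which lets us target Java's double[]
;; class (looked up by its JVM name "[D") directly.
(extend (Class/forName "[D")
  PAddInPlace
  {:add-in-place!
   (fn [^doubles dest src]
     ;; The "under the hood" conversion: any seqable of numbers (Clojure
     ;; vector, another double[], ...) is copied into a fresh double[].
     (let [^doubles s (double-array src)]
       (when (not= (alength dest) (alength s))
         (throw (IllegalArgumentException. "shape mismatch")))
       (dotimes [i (alength dest)]
         (aset dest i (+ (aget dest i) (aget s i))))
       dest))})

(def dest (double-array [1.0 2.0 3.0]))
(add-in-place! dest [10 20 30])              ; Clojure vector as the source
(add-in-place! dest (double-array [1 1 1]))  ; Java double[] as the source
(vec dest) ;; => [12.0 23.0 34.0]
```

As noted in the message, the copy still happens; the protocol just hides it from the caller. A GPU-backed type would need its own implementation that performs the host/device transfer.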



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-10 Thread arthur . maciejewicz
Renjin's and Spark's dataframes are not going to be easily separated from 
their respective codebases, as far as my brief perusal of the source can 
tell. I agree that N-D DataFrames would be a good addition to the 
ecosystem, similar to the goals of Python's xarray (xarray.pydata.org). 
However, it is not a priority for me at this time. Thanks for pointing 
out the DataSet proposal. I'll take a look at that later.

On a slightly related note, where is the best place to ask core.matrix 
questions? I have some small questions about sparse matrix support in 
core.matrix, and what sparse formats are implemented.

On Thursday, March 10, 2016 at 7:45:44 PM UTC-5, Mikera wrote:
>
> core.matrix maintainer here.
>
> I think it would be great to have more work on dataframe-type support. I 
> think the right strategy is as follows:
> a) Make use of the core.matrix Dataset protocols where possible (or add 
> new ones)
> b) Create implementation(s) for these protocols for whatever back-end data 
> frame implementation is being used
>
> The beauty of core.matrix is that we *can* support multiple 
> implementations without fragmentation, because the protocol-based approach 
> means that every implementation can use the same API. This is already 
> working well for the array programming APIs (it's easy to mix and match 
> Clojure data structures, Vectorz Java-based arrays, GPU backed arrays in 
> computations). We just need to do the same for DataFrames.
>
> Now: the current core.matrix Dataset API is somewhat focused on 2D data 
> tables, but I think it can be extended to general N-dimensional dataframe 
> capability. This would be a great project for someone to take on; I'm 
> happy to give guidance and help merge in changes as needed.
>
> I don't have a particularly strong opinion on which DataFrame 
> implementations are best, but it looks like Spark and Renjin are both great 
> candidates and would be very useful additions to the Clojure numerical 
> ecosystem. If we do things right, they should interoperate easily with the 
> core.matrix APIs, making Clojure ideal for "glue" code across such 
> implementations.
>



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-10 Thread Dragan Djuric

>
> This is already working well for the array programming APIs (it's easy to 
> mix and match Clojure data structures, Vectorz Java-based arrays, GPU 
> backed arrays in computations). 
>

While we could agree to some extent on the other parts of your post, the 
GPU part is *NOT* true: I would like you to point me to a single 
implementation anywhere (Clojure or other) that (easily or not) mixes and 
matches arrays in RAM and arrays on the GPU backend. It simply does not 
work that way.



Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-10 Thread Mikera
core.matrix maintainer here.

I think it would be great to have more work on dataframe-type support. I 
think the right strategy is as follows:
a) Make use of the core.matrix Dataset protocols where possible (or add new 
ones)
b) Create implementation(s) for these protocols for whatever back-end data 
frame implementation is being used

The beauty of core.matrix is that we *can* support multiple implementations 
without fragmentation, because the protocol-based approach means that every 
implementation can use the same API. This is already working well for the 
array programming APIs (it's easy to mix and match Clojure data structures, 
Vectorz Java-based arrays, GPU backed arrays in computations). We just need 
to do the same for DataFrames.
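To illustrate the protocol-based idea, here is a sketch of two storage back ends sharing one API, using only clojure.core. `PDatasetSketch`, `column-names` and `column` are hypothetical names and are not the actual core.matrix Dataset protocols; the point is only that generic code written against the protocol does not care which back end it receives.

```clojure
;; Hypothetical protocol; the names are illustrative and are NOT core.matrix's
;; actual Dataset protocols.
(defprotocol PDatasetSketch
  (column-names [ds] "Vector of column names, in order.")
  (column [ds col-name] "The values of one column, as a vector."))

;; Back end 1: column-oriented storage (map of name -> vector of values).
(defrecord ColumnarDataset [names cols]
  PDatasetSketch
  (column-names [_] names)
  (column [_ col-name] (get cols col-name)))

;; Back end 2: row-oriented storage (vector of maps, one map per row).
(defrecord RowDataset [names rows]
  PDatasetSketch
  (column-names [_] names)
  (column [_ col-name] (mapv #(get % col-name) rows)))

;; Generic code written only against the protocol works with either back end.
(defn column-mean [ds col-name]
  (let [xs (column ds col-name)]
    (/ (reduce + xs) (count xs))))

(def columnar (->ColumnarDataset [:x :y] {:x [1 2 3] :y [10 20 30]}))
(def row-wise (->RowDataset [:x :y] [{:x 1 :y 10} {:x 2 :y 20} {:x 3 :y 30}]))

(column-mean columnar :y) ;; => 20
(column-mean row-wise :y) ;; => 20
```

Each back end keeps its native layout; a Spark- or Renjin-backed type would be a third `defrecord` (or `extend-protocol` on the foreign class) implementing the same two methods.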

Now: the current core.matrix Dataset API is somewhat focused on 2D data 
tables, but I think it can be extended to general N-dimensional dataframe 
capability. This would be a great project for someone to take on; I'm happy 
to give guidance and help merge in changes as needed.

I don't have a particularly strong opinion on which DataFrame 
implementations are best, but it looks like Spark and Renjin are both great 
candidates and would be very useful additions to the Clojure numerical 
ecosystem. If we do things right, they should interoperate easily with the 
core.matrix APIs, making Clojure ideal for "glue" code across such 
implementations.




Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread arthur . maciejewicz
Dan, 

That's a huge number of stats packages that would become available if we 
achieve interop with Renjin's dataframes. I'll look into it as well. My 
priorities are first to get something working for $DAYJOB, then to build 
a more generally useful package, and finally to add extras such as 
interop.

- Arthur

On Wednesday, March 9, 2016 at 7:04:17 PM UTC-5, Daniel Slutsky wrote:
>
> Thank you for raising this question.
>
> By the way, one desired feature for a Clojure dataframe abstraction would 
> be good interop with Renjin's dataframes.
> Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number 
> of JVM-based statistical libraries. Most of them rely on the dataframe 
> abstraction for their data. R is also very Lisp-like in its data 
> representation, so wrapping all this with Clojure would be a delight.
>
>
>
> On Thursday, March 10, 2016 at 1:47:44 AM UTC+2, Christopher Small wrote:
>>
>>
>> If you're going to do any work in this area, I would highly encourage you 
>> to do it as part of the core.matrix library. That is what Incanter is or 
>> will be using for its dataset implementation. But it's nice that those 
>> abstractions and implementations are separate from Incanter itself, since 
>> Incanter is a rather large dependency.
>>
>> core.matrix is certainly (in my eyes) becoming the de facto matrix 
>> computation library in the Clojure ecosystem, and I think in the level of 
>> interop between different implementations there, and extent of utilization 
>> by the Clojure community, we rival the Python offerings. However, while 
>> core.matrix has some dataset protocols, API functions and basic 
>> implementations, there's still some work to do to get the full 
>> expressiveness of the data.frame pattern as seen in R and Pandas. 
>> Specifically, there is no support for setting row names (or arbitrary 
>> "name" assignments beyond those of a single dimension, i.e. columns). This 
>> is something I started working on a while back, but wasn't able to 
>> finish. I could potentially push what I came up with to a fork, but 
>> unfortunately, I don't have any more time to work on the problem at the 
>> moment.
>>
>> Mike Anderson is a great project maintainer, and will probably be happy 
>> to help guide you in stitching together a solution.
>>
>> Best
>>
>> Chris
>>


Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread Christopher Small
Sounds great; and sure thing, will do :-)

The basic idea I had was to implement a bidirectional index mapping names
<-> indices. This requires making sure you keep the index up to date any
time you change the data, but it seemed the easiest way forward.
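A minimal sketch of such a bidirectional index, assuming the labels are distinct. The helper names (`label-index`, `index-of`, `name-of`, `rename-label`, `drop-label`) are hypothetical and are not taken from the fork mentioned in this message. It also makes the stated cost concrete: every edit has to update both directions, and a deletion shifts later positions, so this sketch simply rebuilds the index in that case.

```clojure
;; Hypothetical helper names; a minimal sketch of a bidirectional
;; name <-> index mapping, not code from the core.matrix fork.
(defn label-index
  "Builds a bidirectional index from a vector of distinct labels."
  [labels]
  {:idx->name (vec labels)
   :name->idx (zipmap labels (range))})

(defn index-of [li label] (get-in li [:name->idx label]))
(defn name-of  [li i]     (get-in li [:idx->name i]))

(defn rename-label
  "Renames one label, updating both directions so they stay in sync."
  [li old-label new-label]
  (if-let [i (index-of li old-label)]
    (-> li
        (assoc-in [:idx->name i] new-label)
        (update :name->idx dissoc old-label)
        (assoc-in [:name->idx new-label] i))
    (throw (ex-info "Unknown label" {:label old-label}))))

(defn drop-label
  "Removes a label (e.g. when its row is deleted). Every later position
  shifts down by one, so both directions are rebuilt from scratch."
  [li label]
  (label-index (filterv #(not= % label) (:idx->name li))))

(def rows (label-index [:alice :bob :carol]))
(index-of rows :bob)                          ;; => 1
(name-of (rename-label rows :bob :robert) 1)  ;; => :robert
(index-of (drop-label rows :alice) :carol)    ;; => 1
```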

My fork is here: https://github.com/metasoarous/core.matrix/commits/develop

Here are a couple of related issues:

https://github.com/mikera/core.matrix/issues/193
https://github.com/mikera/core.matrix/issues/220

Hope you can come up with something nice!

I would focus first on coming up with what seems like a nice set of
protocols, so that we can be flexible with implementations. Ideally, we'd
be able to just apply some wrapper to any core.matrix array, vector,
matrix, etc. that provides named/labeled access to the data and is
fairly seamless with the rest of the library. But you should also be able
to wrap something like Renjin's dataframes (as Daniel Slutsky mentioned;
just implement the protocols using their classes, I imagine). There might
have to be some iteration here, like: initial protocol design -> initial
implementation -> redraft protocols -> try new implementation -> redraft
protocols, etc. I've noticed that it can be difficult to properly abstract
implementation details away from the protocol/API on the first go (though
you might have mastered this more than I :-)).
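One way the "wrapper around any array" idea could look, as a sketch only: `PLabeled`, `LabeledArray` and `labeled` are hypothetical names, a nested Clojure vector stands in for a core.matrix array, and labels are found by linear search rather than through an index. Keeping one label vector per dimension means the same wrapper also covers N-dimensional data, not just 2D tables.

```clojure
;; Hypothetical protocol and record; NOT part of core.matrix. A nested Clojure
;; vector stands in for "any core.matrix array".
(defprotocol PLabeled
  (dim-labels [m dim] "Vector of labels along dimension dim.")
  (select-labeled [m label-path]
    "Element addressed by one label per dimension, or nil if any is unknown."))

(defrecord LabeledArray [data labels] ; labels: one label vector per dimension
  PLabeled
  (dim-labels [_ dim] (nth labels dim))
  (select-labeled [_ label-path]
    ;; Translate each label to a position (linear search), then delegate
    ;; the actual element lookup to the wrapped data.
    (let [idxs (map-indexed (fn [dim label]
                              (.indexOf ^java.util.List (nth labels dim) label))
                            label-path)]
      (when (every? #(>= % 0) idxs)
        (get-in data idxs)))))

(defn labeled
  "Wraps data with one vector of labels per dimension."
  [data & label-vecs]
  (->LabeledArray data (mapv vec label-vecs)))

(def m (labeled [[1 2] [3 4]] [:r1 :r2] [:a :b]))
(select-labeled m [:r2 :a])  ;; => 3
(dim-labels m 1)             ;; => [:a :b]
(select-labeled m [:r3 :a])  ;; => nil

;; The same wrapper works for N-D data, e.g. shape 1 x 1 x 2:
(select-labeled (labeled [[[1 2]]] [:x] [:y] [:p :q]) [:x :y :q]) ;; => 2
```

Swapping the linear search for a per-dimension name-to-index map would be the obvious next step; either way the wrapped data never needs to know about labels.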

My 2c

Good luck!

Chris



On Wed, Mar 9, 2016 at 4:29 PM,  wrote:

> Chris, thanks for the reply.
>
> It's good to know that I'm not the only one who misses this functionality!
> My goal is definitely to be compatible with Incanter and core.matrix, as
> they both seem mature, and I will never have the time to implement that
> functionality from scratch myself. I'll be studying the source of Pandas
> over the next few days, as I want to have a good idea of how they implement
> their dataframes before starting on the Clojure version. My long-term goal
> is for future authors to look to this set of core tools for data analysis
> as the basis for any packages they build.
>
> If you'd like to publish whatever you've written (hacked up code is ok),
> I'll take a look at that as a starting point, or at least as one possible
> design.
>
> - Arthur
>
>
>
> On Wednesday, March 9, 2016 at 6:47:44 PM UTC-5, Christopher Small wrote:
>>
>>
>> If you're going to do any work in this area, I would highly encourage you
>> to do it as part of the core.matrix library. That is what Incanter is or
>> will be using for its dataset implementation. But it's nice that those
>> abstractions and implementations are separate from Incanter itself, since
>> Incanter is a rather large dependency.
>>
>> core.matrix is certainly (in my eyes) becoming the de facto matrix
>> computation library in the Clojure ecosystem, and I think in the level of
>> interop between different implementations there, and extent of utilization
>> by the Clojure community, we rival the Python offerings. However, while
>> core.matrix has some dataset protocols, API functions and basic
>> implementations, there's still some work to do to get the full
>> expressiveness of the data.frame pattern as seen in R and Pandas.
>> Specifically, there is no support for setting row names (or arbitrary
>> "name" assignments beyond those of a single dimension, i.e. columns). This
>> is something I started working on a while back, but wasn't able to
>> finish. I could potentially push what I came up with to a fork, but
>> unfortunately, I don't have any more time to work on the problem at the
>> moment.
>>
>> Mike Anderson is a great project maintainer, and will probably be happy
>> to help guide you in stitching together a solution.
>>
>> Best
>>
>> Chris
>>
>>
>>
>>
>>
>> On Wednesday, March 9, 2016 at 12:57:31 PM UTC-8, arthur.ma...@gmail.com
>> wrote:
>>>
>>> Is there any desire or need for a Clojure DataFrame?
>>>
>>>
>>> By DataFrame, I mean a structure similar to R's data.frame, and Python's
>>> pandas.DataFrame.
>>>
>>> Incanter's DataSet may already be fulfilling this purpose, and if so,
>>> I'd like to know if and how people are using it.
>>>
>>> From quickly researching, I see that some prior work has been done in
>>> this space, such as:
>>>
>>> * https://github.com/cardillo/joinery
>>> * https://github.com/mattrepl/data-frame
>>> *
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes
>>>
>>> Rather than going off and creating a competing implementation (
>>> https://xkcd.com/927/), I'd like to know if anyone here is actively
>>> working on, or would like to work on a DataFrame and related utilities for
>>> Clojure (and by extension Java)? Is it something that's sorely needed, or
>>> is everybody happy with using Incanter or some other library that I'm not
>>> aware of? If there's already a defacto standard out there, would anyone
>>> care to please point it out?
>>>
>>> As background information:
>>>
>>> My specific use-case is in NLP and ML, where I often explore and
>>> prototype in 

Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread arthur . maciejewicz
Chris, thanks for the reply. 

It's good to know that I'm not the only one who misses this functionality! 
My goal is definitely to be compatible with Incanter and core.matrix, as 
they both seem mature, and I will never have the time to implement that 
functionality from scratch myself. I'll be studying the source of Pandas 
over the next few days, as I want to have a good idea of how they implement 
their dataframes before starting on the Clojure version. My long-term goal 
is for future authors to look to this set of core tools for data analysis 
as the basis for any packages they build.

If you'd like to publish whatever you've written (hacked up code is ok), 
I'll take a look at that as a starting point, or at least as one possible 
design.

- Arthur



-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread Daniel Slutsky
Thank you for raising this question.

By the way, one desired feature for a Clojure dataframe abstraction would 
be good interop with Renjin's dataframes.
Renjin is a JVM-based rewrite of (a subset of) R. It offers a large number 
of JVM-based statistical libraries. Most of them rely on the dataframe 
abstraction for their data. R is also very Lisp-like in its data 
representation, so wrapping all this with Clojure would be a delight.






Re: Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread Christopher Small

If you're going to do any work in this area, I would highly encourage you 
to do it as part of the core.matrix library. That is what Incanter is, or 
will be, using for its dataset implementation. But it's nice for those 
abstractions and implementations to be separate from Incanter itself, since 
Incanter is a rather large dependency.

Core.matrix is certainly (in my eyes) becoming the de facto matrix 
computation library in the Clojure ecosystem, and I think that in the level 
of interop between its different implementations, and in the extent of its 
use by the Clojure community, we rival the Python offerings. However, while 
core.matrix has some dataset protocols, API functions, and basic 
implementations, there's still some work needed to get the full 
expressiveness of the data.frame pattern as seen in R and Pandas. 
Specifically, there is no support for setting row names (or for arbitrary 
"name" assignments beyond those of a single dimension, i.e. columns). This 
is something I started working on a while back, but wasn't able to finish. 
I could potentially push what I came up with to a fork, but unfortunately, 
I don't have any more time to work on the problem at the moment.
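
Row names are one case of a more general feature: optional labels on any 
dimension, not just columns. The following Java sketch illustrates the idea 
under stated assumptions: the class `DimLabels` is hypothetical (it is not 
core.matrix's API), it requires Java 16+ for `instanceof` pattern matching, 
and it assumes the data lives in flat row-major storage. Each dimension may 
carry a list of labels; a lookup mixing labels and integer positions is 
resolved to per-dimension indices and then to a flat offset, so unlabeled 
dimensions keep working by position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: optional labels for each dimension of an N-d array stored flat in row-major order.
public final class DimLabels {
    private final int[] shape;
    private final List<Map<String, Integer>> labels = new ArrayList<>();

    public DimLabels(int... shape) {
        this.shape = shape.clone();
        for (int d = 0; d < shape.length; d++) labels.add(null); // null = dimension is unlabeled
    }

    // Attach names to one dimension (rows = 0, columns = 1, ...).
    public DimLabels label(int dim, String... names) {
        if (names.length != shape[dim]) {
            throw new IllegalArgumentException("need " + shape[dim] + " labels for dimension " + dim);
        }
        Map<String, Integer> idx = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (idx.put(names[i], i) != null) throw new IllegalArgumentException("duplicate label " + names[i]);
        }
        labels.set(dim, idx);
        return this;
    }

    // One key per dimension: a String label (if that dimension is labeled) or an Integer position.
    public int offset(Object... keys) {
        if (keys.length != shape.length) throw new IllegalArgumentException("expected " + shape.length + " keys");
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            int i;
            if (keys[d] instanceof String s) {
                Map<String, Integer> idx = labels.get(d);
                Integer found = (idx == null) ? null : idx.get(s);
                if (found == null) throw new IllegalArgumentException("unknown label " + s + " in dimension " + d);
                i = found;
            } else {
                i = (Integer) keys[d];
            }
            if (i < 0 || i >= shape[d]) throw new IndexOutOfBoundsException("index " + i + " in dimension " + d);
            off = off * shape[d] + i; // row-major: ((i0 * n1) + i1) * n2 + i2 ...
        }
        return off;
    }

    public static void main(String[] args) {
        DimLabels dl = new DimLabels(2, 3).label(0, "r1", "r2").label(1, "a", "b", "c");
        double[] flat = {1, 2, 3, 4, 5, 6}; // a 2 x 3 matrix, row-major
        System.out.println(flat[dl.offset("r2", "b")]); // prints 5.0
        System.out.println(dl.offset("r1", 2));         // prints 2 (positions still work)
    }
}
```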

Mike Anderson is a great project maintainer, and will probably be happy to 
help guide you in stitching together a solution.

Best

Chris








Is there any desire or need for a Clojure DataFrame? (X-POST from Numerical Clojure mailing list)

2016-03-09 Thread arthur . maciejewicz
Is there any desire or need for a Clojure DataFrame?


By DataFrame, I mean a structure similar to R's data.frame, and Python's 
pandas.DataFrame.

Incanter's DataSet may already be fulfilling this purpose, and if so, I'd 
like to know if and how people are using it.

From quickly researching, I see that some prior work has been done in this 
space, such as:

* https://github.com/cardillo/joinery
* https://github.com/mattrepl/data-frame
* http://spark.apache.org/docs/latest/sql-programming-guide.html#dataframes

Rather than going off and creating a competing implementation 
(https://xkcd.com/927/), I'd like to know: is anyone here actively 
working on, or interested in working on, a DataFrame and related utilities 
for Clojure (and by extension Java)? Is it something that's sorely needed, 
or is everybody happy with using Incanter or some other library that I'm 
not aware of? If there's already a de facto standard out there, would 
anyone care to point it out?

As background information:

My specific use-case is in NLP and ML, where I often explore and prototype 
in Python, but I'm then left to deal with a smattering of libraries on the 
JVM (Mallet, Weka, Mahout, ND4J, DeepLearning4j, CoreNLP, etc.), each with 
their own ad-hoc implementations of algorithms, matrices, and utilities for 
reading data. It would be great to have a unified way to explore my data in 
the Clojure REPL, and then serve the same code and models in production.

I would love for Clojure to have a broadly compatible ecosystem similar to 
Python's NumPy/Pandas/Scikit-*/SciPy/matplotlib/Gensim, etc. core.matrix and 
Incanter appear to fulfill a large chunk of those roles, but I don't know 
whether they've yet become the de facto standards in the community.

Any feedback is greatly appreciated.
