[jira] [Commented] (ARROW-2358) API for Writing to Multiple Feather Files

2018-06-13 Thread Dhruv Madeka (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511574#comment-16511574
 ] 

Dhruv Madeka commented on ARROW-2358:
-

Got it! I think I can handle that

> API for Writing to Multiple Feather Files
> -
>
> Key: ARROW-2358
> URL: https://issues.apache.org/jira/browse/ARROW-2358
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C, C++, Python
>Affects Versions: 0.9.0
>Reporter: Dhruv Madeka
>Priority: Minor
>
> It would be really great to have an API which can write a Table to a 
> `FeatherDataset`. Essentially, given a file name, it would split the 
> table into N equal parts (where N could be determined by the user or the code) 
> and then write the data to N files with a suffix (which is `_part` by default 
> but could be user-specified).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2358) API for Writing to Multiple Feather Files

2018-06-13 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511556#comment-16511556
 ] 

Wes McKinney commented on ARROW-2358:
-

There is a slice function on Column, so implementing {{Table::Slice}} should be 
reasonably straightforward if you're OK adding some C++ code and adding it to 
the Python bindings: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.h#L125. I just 
opened ARROW-2707 about this.
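
The idea of building a table slice from the existing column slice can be sketched as follows. This is a plain-Python stand-in, not Arrow's C++ code: the table is modelled as a dict of equal-length lists, and `slice_table` is a hypothetical name used only for illustration.

```python
from typing import Dict, List


def slice_table(columns: Dict[str, List], offset: int, length: int) -> Dict[str, List]:
    """Slice every column with the same offset and length and return a new table.

    Assumption: a "table" here is just a mapping of column name to a list of
    values; the real Arrow Table and Column types are not used.
    """
    return {
        name: values[offset:offset + length]
        for name, values in columns.items()
    }


table = {"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]}
middle = slice_table(table, 1, 2)
```

Because every column receives the same offset and length, the result keeps rows aligned across columns, which is why a per-column slice is enough to implement the table-level one.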






[jira] [Commented] (ARROW-2358) API for Writing to Multiple Feather Files

2018-06-12 Thread Dhruv Madeka (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510611#comment-16510611
 ] 

Dhruv Madeka commented on ARROW-2358:
-

[~wesmckinn] So I'm good to submit a PR. It's just not obvious how to do this 
without a `slice` function for a table. Would you advise that I implement that 
first and then the FeatherDataset writer?






[jira] [Commented] (ARROW-2358) API for Writing to Multiple Feather Files

2018-06-12 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510193#comment-16510193
 ] 

Wes McKinney commented on ARROW-2358:
-

This sounds useful to me. I removed this from the 0.10 milestone for now; 
please feel free to submit a PR.



