[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-09-04 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922562#comment-16922562
 ] 

Wes McKinney commented on ARROW-6300:
-

I think that continuing to call {{Close}} on non-closed files in the dtor makes 
sense. This logic could also be moved up into the binding layer, too

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-09-04 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922465#comment-16922465
 ] 

Antoine Pitrou commented on ARROW-6300:
---

Hmm, so if we call Abort() from the destructor, this will be a behaviour 
change, as we currently call Close() (flushing any pending data). Perhaps some 
code paths in third-party code rely on that.

Also, this means that on the Python side, files created using e.g. 
{{pa.output_stream()}} wouldn't flush properly when destroyed. This would break 
expectations of Python users.

So perhaps we should keep calling Close() from the destructor. What do you 
think?

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-22 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913120#comment-16913120
 ] 

Antoine Pitrou commented on ARROW-6300:
---

> generally, I thought it was frowned upon to work that could fail in the 
> destructor anyways

Probably, but I think it's unavoidable to handle cases where the user didn't 
clean up explicitly.

> Are we writing temporary files and then moving them (i.e. letting the OS 
> cleanup the temporary file)?

No, we aren't doing that in any of the IO classes currently.

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-21 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912950#comment-16912950
 ] 

Micah Kornfield commented on ARROW-6300:


It sounds like it makes sense, I think not just hiding it in the destructor 
make sense (generally, I thought it was frowned upon to work that could fail in 
the destructor anyways).

 

>> 2. For regulard file systems, would it make sense to delete the underlying 
>> file (they both seem to be about clearing underlying resources?)

>I don't think so. The aim is to clear any temporary runtime resources 
>associated to the stream. We don't claim that Abort() will rollback any 
>persistent changes.

Are we writing temporary files and then moving them (i.e. letting the OS 
cleanup the temporary file)?

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912119#comment-16912119
 ] 

Antoine Pitrou commented on ARROW-6300:
---

> 1.  What is the use-case for such a method?  Are we expected to abort midway 
> through?

In some cases yes. Suppose you're writing a file as part of a larger pipeline. 
But another part of the pipeline fails, and you want to signal the stream that 
it can abort its operations.

> 2.  For regulard file systems, would it make sense to delete the underlying 
> file (they both seem to be about clearing underlying resources?)

I don't think so. The aim is to clear any temporary runtime resources 
associated to the stream. We don't claim that Abort() will rollback any 
persistent changes.

Perhaps this is not necessary and we can just rely on the destructor, since C++ 
destruction is deterministic.


> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-20 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911928#comment-16911928
 ] 

Micah Kornfield commented on ARROW-6300:


Two questions:

1.  What is the use-case for such a method?  Are we expected to abort midway 
through?

2.  For regulard file systems, would it make sense to delete the underlying 
file (they both seem to be about clearing underlying resources?)

 

 

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-20 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911427#comment-16911427
 ] 

Wes McKinney commented on ARROW-6300:
-

Seems reasonable to have the default implementation be a no-op

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-20 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911406#comment-16911406
 ] 

Antoine Pitrou commented on ARROW-6300:
---

[~emkornfi...@gmail.com] opinions?

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6300) [C++] Add io::OutputStream::Abort()

2019-08-20 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911402#comment-16911402
 ] 

Antoine Pitrou commented on ARROW-6300:
---

Context: S3 multipart uploads must be either completed or aborted:
https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html

Aborting can be done implicitly in the destructor but it may be nice to have an 
explicit method call for that (especially to catch any errors).

> [C++] Add io::OutputStream::Abort()
> ---
>
> Key: ARROW-6300
> URL: https://issues.apache.org/jira/browse/ARROW-6300
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> This method would abort the current output stream without trying to flush or 
> commit any pending internal data. This makes sense mostly for buffered 
> streams. For other streams it could simply synonymous to Close().



--
This message was sent by Atlassian Jira
(v8.3.2#803003)