Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-767771210
I rebased this PR @jorgecarleitao @alamb
This is an automated message from the Apache Git Service.
To respond
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-765952965
@jorgecarleitao
Thanks, that it a great summary of the situation! I think removing the `Vec`
sounds like it might improve the "metadata overhead", maybe there are some
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-765564334
This PR itself is ready for review.
I think the performance for slicing for small slices would be something to
look at later.
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-765392170
@nevi-me
I don't think indeed it is very expensive on large Arrays compared to the
size / operations on the array, but it turns out to be expensive on very small
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-765223781
@jorgecarleitao
Maybe, reasonably, yes (as long the underlying vecs have a few items).
I think the clone on the `ArrayData` structure itself is expensive when
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-765208218
@jorgecarleitao I found the "offending" code is this function in
`array/data.rs` which does a `self.clone()`. Any idea how we could write a non
copying version?
I think it
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-763646185
This PR in itself is ready for review now. The change in approach will help
us in the future, even more with `Array.slice` being improved (will create a
ticket later),
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-763553182
Thanks @jorgecarleitao makes sense.
I'm planning a blog post soon again, might put some details there.
In short its `cargo profiler` installed from github
Dandandan commented on pull request #9271:
URL: https://github.com/apache/arrow/pull/9271#issuecomment-763517408
![image](https://user-images.githubusercontent.com/163737/105164296-42515780-5b15-11eb-87f0-a042c4287514.png)
@jorgecarleitao while the code is quite a bit faster