Thanks Joris for clearing that up! It's correct that pyspark will allow the
user to do operations on the resulting DataFrame, so it doesn't sound like
I should set `split_blocks=True` in the conversion. You're right that the
unnecessary assignments can easily be avoided for columns that are not timestamps, so that
w
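
A minimal sketch of how the unnecessary assignments mentioned above could be skipped, assuming a plain pandas.DataFrame. This is not the actual pyspark code: the helper name `localize_timestamp_columns`, the sample columns, and the choice of UTC are hypothetical and only illustrate assigning back the columns that really change.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype


def localize_timestamp_columns(pdf, tz="UTC"):
    """Localize tz-naive timestamp columns in place; leave all others alone.

    Returns the names of the columns that were reassigned. (Hypothetical
    helper, not part of pyspark or pandas.)
    """
    modified = []
    for name in pdf.columns:
        series = pdf[name]
        if not is_datetime64_dtype(series.dtype):
            # Not a tz-naive timestamp column: skip the assignment entirely,
            # so no data is copied for it.
            continue
        pdf[name] = series.dt.tz_localize(tz)
        modified.append(name)
    return modified


# Made-up sample data: one integer column and one timestamp column.
pdf = pd.DataFrame(
    {"a": [1, 2], "t": pd.to_datetime(["2020-01-15", "2020-01-16"])}
)
modified = localize_timestamp_columns(pdf)
```

Only the column `t` is assigned back here; column `a` is never written to.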
Hi Bryan,
For the case where the column is not a timestamp and was not modified: I don't
think it will take copies of the full dataframe by assigning columns in a
loop like that. But it is still doing work (it will copy data for that
column into the array holding those data for 2D blocks), and which c
Thanks for investigating this and for the quick fix, Joris and Wes! I just have
a couple of questions about the behavior observed here. The pyspark code
assigns either the same series back to the pandas.DataFrame or makes some
modifications if it is a timestamp. In the case there are no timestamps, is
th
That sounds like a good solution. Having the zero-copy behavior depend
on whether you have only one column of a certain type might lead to
surprising results. To avoid yet another keyword, only doing it when
split_blocks=True sounds good to me (in practice, that's also when it will
happen
I created https://issues.apache.org/jira/browse/ARROW-7596 and made it
a blocker for 0.16.0 so this does not get lost in the shuffle
On Thu, Jan 16, 2020 at 3:43 PM Wes McKinney wrote:
>
> hi Joris,
>
> Thanks for investigating this. It seems there were some unintended
> consequences of the zero-
hi Joris,
Thanks for investigating this. It seems there were some unintended
consequences of the zero-copy optimizations from ARROW-3789. Another
way forward might be to "opt in" to this behavior, or to only do the
zero copy optimizations when split_blocks=True. What do you think?
- Wes
On Thu,
So the Spark integration build started to fail with the following test
error:
==
ERROR: test_toPandas_batch_order
(pyspark.sql.tests.test_arrow.EncryptionArrowTests)
---
Arrow Build Report for Job nightly-2020-01-15-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0
Failed Tasks:
- gandiva-jar-osx:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
- test-conda-py