Robert Nishihara created ARROW-2308: ---------------------------------------
Summary: Serialized tensor data should be 64-byte aligned.
Key: ARROW-2308
URL: https://issues.apache.org/jira/browse/ARROW-2308
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Robert Nishihara

See [https://github.com/ray-project/ray/issues/1658] for an example of this issue. Non-aligned data can trigger a copy when it is passed to TensorFlow and similar libraries.

{code}
import pyarrow as pa
import numpy as np

x = np.zeros(10)
y = pa.deserialize(pa.serialize(x).to_buffer())

x.ctypes.data % 64  # 0 (it starts out aligned)
y.ctypes.data % 64  # 48 (it is no longer aligned)
{code}

It should be possible to fix this by calling something like {{RETURN_NOT_OK(AlignStreamPosition(dst));}} before writing the array data. Note that we already do this before writing the tensor header, but the size of the tensor header is not necessarily a multiple of 64 bytes, so the data that follows it can be unaligned.
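
To illustrate why aligning only before the header is not enough, here is a minimal pure-Python sketch of the padding arithmetic. It is not Arrow's implementation: {{padding_needed}} and {{write_tensor}} are hypothetical names, and the 80-byte header size is made up for the example. It also assumes the buffer holding the stream is itself 64-byte aligned, so that stream offsets translate directly to memory alignment.

```python
ALIGNMENT = 64  # target alignment in bytes, as proposed in this issue


def padding_needed(position, alignment=ALIGNMENT):
    """Return how many bytes advance `position` to the next multiple of `alignment`."""
    return (-position) % alignment


def write_tensor(header, data, align_data):
    """Hypothetical writer: returns (stream bytes, offset at which the data starts)."""
    out = bytearray()
    # Align before the header (the step the issue says already happens).
    out += b"\x00" * padding_needed(len(out))
    out += header
    if align_data:
        # Proposed extra step: align again before the array data.
        out += b"\x00" * padding_needed(len(out))
    data_offset = len(out)
    out += data
    return bytes(out), data_offset


header = b"h" * 80  # made-up header size that is not a multiple of 64
data = b"d" * 80    # same byte count as ten float64 values

_, unaligned = write_tensor(header, data, align_data=False)
_, aligned = write_tensor(header, data, align_data=True)
print(unaligned % ALIGNMENT)  # 16
print(aligned % ALIGNMENT)    # 0
```

The cost of the second alignment step is at most 63 bytes of padding per tensor.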