[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-22 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617744347 As a side note, I think you may want to start with the Arrow unit tests before trying to make the Parquet unit tests pass. Parquet relies on many Arrow facilities.

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617126711 Perhaps. If the reader is compatible with those files, and roundtripping works, then the writer is probably compliant as well.

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112444 (also look for the "PARQUET_TEST_DATA" environment variable)
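
For illustration only, a test helper could resolve a data file from that environment variable roughly as follows. This is a minimal sketch, not the code used in the Arrow test suite: `GetParquetTestDataDir` and `TestDataFile` are hypothetical names, and the only thing taken from the comment above is the variable name `PARQUET_TEST_DATA`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the directory named by the
// PARQUET_TEST_DATA environment variable, or an empty string
// when the variable is not set.
inline std::string GetParquetTestDataDir() {
  const char* dir = std::getenv("PARQUET_TEST_DATA");
  return dir != nullptr ? std::string(dir) : std::string();
}

// Hypothetical helper: joins a directory and a file name with a
// single '/' separator. An empty directory yields the bare name.
inline std::string TestDataFile(const std::string& dir,
                                const std::string& name) {
  assert(!name.empty());
  if (dir.empty()) {
    return name;
  }
  return dir.back() == '/' ? dir + name : dir + "/" + name;
}
```

A test would then call `TestDataFile(GetParquetTestDataDir(), "some_file.parquet")`, where the file name is a placeholder, and skip or fail early if the directory comes back empty.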

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-21 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-617112291 @kiszk The preferred way to do that would be to add a file to the https://github.com/apache/parquet-testing repository. It's checked in as a submodule in `cpp/submodules` and used in the

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616729359 I've started a discussion on the [mailing list](https://mail-archives.apache.org/mod_mbox/arrow-dev/) to make other people aware of your efforts. I wonder if creating a

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616530174 Can you add the explanation you gave above (about the memory layout) somewhere in `parquet/types.h`? Thank you.

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616456984 I think data should be kept in native endianness in memory (that is what the user would expect). What we must be careful about is that Parquet data is encoded (and decoded) as little endian.
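
To make the distinction concrete, here is a minimal sketch of decoding little-endian encoded bytes into native-order words. It is an illustration under stated assumptions, not the Arrow or Parquet implementation: `Int96Words`, `LoadLittleEndian32`, `StoreLittleEndian32` and `DecodeInt96` are hypothetical names, and the layout of three 32-bit words with the least significant word first is assumed for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 96-bit value held as three 32-bit
// words. After decoding, each word is in the host's native order.
struct Int96Words {
  uint32_t value[3];
};

// Decode one little-endian 32-bit word from a byte buffer.
// The shifts operate on values rather than on memory, so the
// result is identical on little-endian and big-endian hosts.
inline uint32_t LoadLittleEndian32(const uint8_t* bytes) {
  assert(bytes != nullptr);
  return static_cast<uint32_t>(bytes[0]) |
         (static_cast<uint32_t>(bytes[1]) << 8) |
         (static_cast<uint32_t>(bytes[2]) << 16) |
         (static_cast<uint32_t>(bytes[3]) << 24);
}

// Encode a native 32-bit word as four little-endian bytes.
inline void StoreLittleEndian32(uint32_t word, uint8_t* out) {
  assert(out != nullptr);
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<uint8_t>((word >> (8 * i)) & 0xFF);
  }
}

// Decode 12 encoded bytes into three native-order words
// (assumed layout: least significant word first).
inline Int96Words DecodeInt96(const uint8_t* bytes) {
  Int96Words result;
  for (size_t i = 0; i < 3; ++i) {
    result.value[i] = LoadLittleEndian32(bytes + 4 * i);
  }
  return result;
}
```

Because the conversion happens only at the encode and decode boundary, code that works with the in-memory words never needs to know the host's byte order.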

[GitHub] [arrow] pitrou commented on issue #6981: PARQUET-1845: [C++] Add expected results of Int96 in big-endian

2020-04-20 Thread GitBox
pitrou commented on issue #6981: URL: https://github.com/apache/arrow/pull/6981#issuecomment-616424899 Hmm, I don't think that's right. `Int96` is the physical representation of 96-bit integers in Parquet files, and it's entirely little-endian. This means it should always have the same