[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-08-21 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-678400124 Hey @pitrou, super sorry for the delay. I was caught up with other work and am now job searching. I'll try to address comments as soon as I can.

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-29 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-665793225 I triggered a new build, but it's now failing one check. I think this is the relevant error, but it seems unrelated to my changes. I'd appreciate any thoughts on what the

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-20 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-661292801 Does anyone know if I'm supposed to be able to see logs for failing checks? When I view the details for a failing check, I can't see any error messages to help figure out what

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-17 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660341488 @pitrou Feel free to take a look! Note that in the most recent commit, if we try to decompress a Parquet file written using Hadoop Lz4Codec but the file is corrupted (past the

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-17 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660294978 Ah ok - thanks for clarifying! I think I get what you mean now. I'll make the change.

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-17 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660280975 @pitrou I can implement your heuristic. What I can do is, in the `SerializedPageReader` constructor, start reading the first page, detect if we need to use Hadoop LZ4 or
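For context, a minimal sketch of what such a detection check could look like, written in Python for brevity rather than as Arrow C++. It assumes the Hadoop Lz4Codec framing: each block is prefixed by two big-endian 32-bit integers (decompressed size, then compressed size) followed by the raw LZ4 block bytes, with one compressed chunk per block. It also assumes the expected total comes from the Parquet page header's uncompressed size. The names `split_hadoop_lz4_frames` and `looks_like_hadoop_lz4` are hypothetical, not Arrow APIs, and no actual LZ4 decompression is performed.

```python
import struct

# Assumed Hadoop Lz4Codec framing: [BE u32 decompressed size][BE u32 compressed size][LZ4 block]
HADOOP_PREFIX_LEN = 8


def split_hadoop_lz4_frames(buf: bytes):
    """Hypothetical helper: return a list of (decompressed_size, compressed_payload)
    tuples if buf parses cleanly as Hadoop-framed blocks, else None."""
    frames = []
    pos = 0
    while pos < len(buf):
        # Not enough bytes left for a full size prefix: not Hadoop framing.
        if len(buf) - pos < HADOOP_PREFIX_LEN:
            return None
        decompressed_size, compressed_size = struct.unpack_from(">II", buf, pos)
        pos += HADOOP_PREFIX_LEN
        # Declared compressed size runs past the end of the buffer.
        if compressed_size > len(buf) - pos:
            return None
        frames.append((decompressed_size, buf[pos:pos + compressed_size]))
        pos += compressed_size
    return frames if frames else None


def looks_like_hadoop_lz4(buf: bytes, expected_total: int) -> bool:
    """Hypothetical heuristic: the frames must parse and their declared
    decompressed sizes must add up to the page's uncompressed size."""
    frames = split_hadoop_lz4_frames(buf)
    if frames is None:
        return False
    return sum(size for size, _ in frames) == expected_total
```

If the check fails, a reader could fall back to treating the page as a plain LZ4 block, which is one way to accept both kinds of input. A raw LZ4 block can occasionally pass such a check by coincidence, so this remains a heuristic rather than a guarantee.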

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-17 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-660257971 > However, on the read side, you must ideally be able to ingest both kinds of input (Hadoop and non-Hadoop LZ4), so as to be maximally compatible with existing files.

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-16 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-659770909 @github-actions autotune Run clang-format on cpp

[GitHub] [arrow] patrickpai commented on pull request #7789: PARQUET-1878: [C++] lz4 codec is not compatible with Hadoop Lz4Codec

2020-07-16 Thread GitBox
patrickpai commented on pull request #7789: URL: https://github.com/apache/arrow/pull/7789#issuecomment-659656433 Run clang-format on cpp