Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-30 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2085566886 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit e4f31462dbd668c3bcb6ce96442f3c1632c4d8c8. There were

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-29 Thread via GitHub
mapleFU merged PR #41366: URL: https://github.com/apache/arrow/pull/41366 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-29 Thread via GitHub
mapleFU commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2083298276 Will merge if no negative comment in one day -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-28 Thread via GitHub
mapleFU commented on code in PR #41366: URL: https://github.com/apache/arrow/pull/41366#discussion_r1582157678 ## cpp/src/arrow/table.h: ## @@ -241,6 +241,9 @@ class ARROW_EXPORT Table { /// /// The conversion is zero-copy: each record batch is a view over a slice /// of the

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-28 Thread via GitHub
rouault commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2081429596 This PR should be ready for merge (as far as I can see the test failures are also found in other pull requests) -- This is an automated message from the Apache Git Service. To respond

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on code in PR #41366: URL: https://github.com/apache/arrow/pull/41366#discussion_r1578584711 ## cpp/src/arrow/table.cc: ## @@ -619,6 +619,7 @@ TableBatchReader::TableBatchReader(const Table& table) for (int i = 0; i < table.num_columns(); ++i) {

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on code in PR #41366: URL: https://github.com/apache/arrow/pull/41366#discussion_r1578571911 ## cpp/src/arrow/table.cc: ## @@ -632,6 +633,7 @@ TableBatchReader::TableBatchReader(std::shared_ptr table) for (int i = 0; i < owned_table_->num_columns(); ++i)

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075867780 It seems the added ValidateFull() breaks one test ``` [--] 1 test from TestTableSortIndicesForTemporal/1, where TypeParam = class arrow::Date64Type [ RUN ]

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075828859 > is expected to be valid prior to using it with the batch reader done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075450368 > I would argue that libarrow/libparquet should be robust against hostile/corrupted datasets even on Release builds, as those kind of crashes are undesirable, and may potentially have

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41366: URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075282535 Will merge in 2days if no negative comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault closed pull request #41320: GH-41317: [C++] Fix crash on invalid Parquet file URL: https://github.com/apache/arrow/pull/41320 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault opened a new pull request, #41366: URL: https://github.com/apache/arrow/pull/41366 ### Rationale for this change Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a corrupted Parquet file ### What changes are included in this PR? Add a

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075270692 > I mean we can check it here: [#41320 (review)](https://github.com/apache/arrow/pull/41320#pullrequestreview-2019926584) . ok, closing that PR, and opening

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075193327 > not sure in which direction: your above proposal " // Check all columns has same row-size" or the alternative proposal I made in

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075149363 > Would you mind edit this? not sure in which direction: your above proposal " // Check all columns has same row-size" or the alternative proposal I made in

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075086943 Thanks! @rouault Would you mind edit this? Or let me handle this with a new patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075081456 > Just curious should we add DCHECK in TableReader helps debugging here Add a `DCHECK` in `TableBatchReader`, document the precondition (with `/// \pre ...` in the constructor,

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075064489 > Is the Parquet reader producing an invalid Table instance and passing it to TableReader? Yes. User is running fuzzing on parquet file, when parsing a corrupt parquet file, we

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
felipecrv commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075058403 > Also cc @felipecrv , do you think TableReader would check the input is valid? I think the generate side should checks it, and the consumer would better dcheck that? Unless the

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075003778 Also cc @felipecrv , do you think TableReader would check the input is valid? I think the generate side should checks it, and the consumer would better dcheck that? -- This is an

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-24 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2074951956 See https://github.com/apache/arrow/issues/41317#issuecomment-2074947638 The root cause is that we didn't check the length. This can be detect by add the

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-23 Thread via GitHub
mapleFU commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2073059628 I'll take a carefully round tomorrow, the fix is ok but I'm not sure that's rootcause -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-22 Thread via GitHub
wgtmac commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2069225126 cc @mapleFU -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
rouault commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068064220 Digging further, seeing that FileReaderImpl::DecodeRowGroups() already calls Table::Validate(), but that GetRecordBatchReader() didn't, I've also tested successfully the following

[PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
rouault opened a new pull request, #41320: URL: https://github.com/apache/arrow/pull/41320 ### Rationale for this change Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a corrupted Parquet file ### What changes are included in this PR? Add a

Re: [PR] GH-41317: [C++] Fix crash on invalid Parquet file [arrow]

2024-04-21 Thread via GitHub
github-actions[bot] commented on PR #41320: URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068042898 :warning: GitHub issue #41317 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the