conbench-apache-arrow[bot] commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2085566886
After merging your PR, Conbench analyzed the 7 benchmarking runs that have
been run so far on merge-commit e4f31462dbd668c3bcb6ce96442f3c1632c4d8c8.
There were
mapleFU merged PR #41366:
URL: https://github.com/apache/arrow/pull/41366
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail:
mapleFU commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2083298276
Will merge if no negative comment in one day
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go
mapleFU commented on code in PR #41366:
URL: https://github.com/apache/arrow/pull/41366#discussion_r1582157678
##
cpp/src/arrow/table.h:
##
@@ -241,6 +241,9 @@ class ARROW_EXPORT Table {
///
/// The conversion is zero-copy: each record batch is a view over a slice
/// of the
rouault commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2081429596
This PR should be ready for merge (as far as I can see the test failures are
also found in other pull requests)
--
This is an automated message from the Apache Git Service.
To respond
rouault commented on code in PR #41366:
URL: https://github.com/apache/arrow/pull/41366#discussion_r1578584711
##
cpp/src/arrow/table.cc:
##
@@ -619,6 +619,7 @@ TableBatchReader::TableBatchReader(const Table& table)
for (int i = 0; i < table.num_columns(); ++i) {
felipecrv commented on code in PR #41366:
URL: https://github.com/apache/arrow/pull/41366#discussion_r1578571911
##
cpp/src/arrow/table.cc:
##
@@ -632,6 +633,7 @@ TableBatchReader::TableBatchReader(std::shared_ptr
table)
for (int i = 0; i < owned_table_->num_columns(); ++i)
rouault commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075867780
It seems the added ValidateFull() breaks one test
```
[--] 1 test from TestTableSortIndicesForTemporal/1, where TypeParam
= class arrow::Date64Type
[ RUN ]
rouault commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075828859
> is expected to be valid prior to using it with the batch reader
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
felipecrv commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075450368
> I would argue that libarrow/libparquet should be robust against
hostile/corrupted datasets even on Release builds, as those kind of crashes are
undesirable, and may potentially have
mapleFU commented on PR #41366:
URL: https://github.com/apache/arrow/pull/41366#issuecomment-2075282535
Will merge in 2days if no negative comments
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go
rouault closed pull request #41320: GH-41317: [C++] Fix crash on invalid
Parquet file
URL: https://github.com/apache/arrow/pull/41320
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
rouault opened a new pull request, #41366:
URL: https://github.com/apache/arrow/pull/41366
### Rationale for this change
Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a
corrupted Parquet file
### What changes are included in this PR?
Add a
rouault commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075270692
> I mean we can check it here: [#41320
(review)](https://github.com/apache/arrow/pull/41320#pullrequestreview-2019926584)
.
ok, closing that PR, and opening
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075193327
> not sure in which direction: your above proposal " // Check all columns
has same row-size" or the alternative proposal I made in
rouault commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075149363
> Would you mind edit this?
not sure in which direction: your above proposal " // Check all columns has
same row-size" or the alternative proposal I made in
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075086943
Thanks! @rouault Would you mind edit this? Or let me handle this with a new
patch?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
felipecrv commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075081456
> Just curious should we add DCHECK in TableReader helps debugging here
Add a `DCHECK` in `TableBatchReader`, document the precondition (with `///
\pre ...` in the constructor,
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075064489
> Is the Parquet reader producing an invalid Table instance and passing it
to TableReader?
Yes. User is running fuzzing on parquet file, when parsing a corrupt parquet
file, we
felipecrv commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075058403
> Also cc @felipecrv , do you think TableReader would check the input is
valid? I think the generate side should checks it, and the consumer would
better dcheck that?
Unless the
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2075003778
Also cc @felipecrv , do you think TableReader would check the input is
valid? I think the generate side should checks it, and the consumer would
better dcheck that?
--
This is an
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2074951956
See https://github.com/apache/arrow/issues/41317#issuecomment-2074947638
The root cause is that we didn't check the length. This can be detect by add
the
mapleFU commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2073059628
I'll take a carefully round tomorrow, the fix is ok but I'm not sure that's
rootcause
--
This is an automated message from the Apache Git Service.
To respond to the message, please log
wgtmac commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2069225126
cc @mapleFU
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
rouault commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068064220
Digging further, seeing that FileReaderImpl::DecodeRowGroups() already
calls Table::Validate(), but that GetRecordBatchReader() didn't, I've also
tested successfully the following
rouault opened a new pull request, #41320:
URL: https://github.com/apache/arrow/pull/41320
### Rationale for this change
Fixes the crash detailed in #41317 in TableBatchReader::ReadNext() on a
corrupted Parquet file
### What changes are included in this PR?
Add a
github-actions[bot] commented on PR #41320:
URL: https://github.com/apache/arrow/pull/41320#issuecomment-2068042898
:warning: GitHub issue #41317 **has been automatically assigned in GitHub**
to PR creator.
--
This is an automated message from the Apache Git Service.
To respond to the
27 matches
Mail list logo