GitHub user thisisnic added a comment to the discussion: how to debug arrow/dplyr to consider a bug report?
Hi @jameshowison, it's totally fine to post an issue whether it turns out to be a bug or not, but I can help you look into it and walk through some debugging steps. Generally, what I'd do is try a few different things to rule out some possible causes, so I'll post my experiments below.

The difference in behaviour between `read_parquet()` and `open_dataset()` is likely caused by where the dplyr chain runs. When you call `read_parquet()`, you pull the data into R session memory and then run the dplyr chain on the resulting data frame. With `open_dataset()`, arrow converts the dplyr chain to Acero (the Arrow C++ compute engine) commands and runs them before pulling the results back into R. So whatever is happening is happening in Acero, or in the R bindings to it.

GitHub link: https://github.com/apache/arrow/discussions/46383#discussioncomment-13119165
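
To make the two code paths described above concrete, here is a minimal sketch. The temporary file, the column names `g` and `x`, and the `summarise()` step are all made up for illustration and are not taken from the original report; the idea is just to run the same dplyr chain both ways and compare the results.

```r
library(arrow)
library(dplyr)

# Hypothetical example data written to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
write_parquet(data.frame(g = c("a", "a", "b"), x = c(1, 2, 3)), path)

# Path 1: read_parquet() loads the file into R memory,
# so dplyr itself runs the chain on a data frame
in_memory <- read_parquet(path) %>%
  group_by(g) %>%
  summarise(total = sum(x)) %>%
  arrange(g)

# Path 2: open_dataset() builds a lazy query that Acero evaluates;
# nothing is pulled into R until collect()
via_acero <- open_dataset(path) %>%
  group_by(g) %>%
  summarise(total = sum(x)) %>%
  arrange(g) %>%
  collect()

# If the two paths disagree, the difference points at Acero or the R bindings
all.equal(as.data.frame(in_memory), as.data.frame(via_acero))
```

Swapping your own file and dplyr chain into both halves, then removing verbs one at a time until the two results agree, is one way to narrow down which step behaves differently.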
