GitHub user thisisnic added a comment to the discussion: how to debug 
arrow/dplyr to consider a bug report?

Hi @jameshowison, it's totally fine to open an issue whether or not it turns out
to be a bug, but I can also help you look into it and walk through some
debugging steps. Generally, I'd try a few different things to rule out possible
causes, so I'll post my experiments below.

The difference in behaviour between `read_parquet()` and `open_dataset()` is
likely caused by where the dplyr chain is evaluated. When you call
`read_parquet()`, the data is pulled into R session memory first and the dplyr
chain then runs on that data frame. With `open_dataset()`, the dplyr chain is
translated into a query that Acero (the Arrow C++ compute engine) executes, and
only the results are pulled back into R. So whatever is happening is happening
in Acero, or in the R bindings to it.
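A minimal sketch of the two code paths, assuming the arrow and dplyr packages are installed. The built-in `mtcars` data, the temporary file, and the particular filter and summary are illustrative stand-ins, not taken from the original report. Running the same dplyr chain both ways and comparing the results is one way to confirm that a discrepancy comes from the Acero path rather than from the data itself.

```r
library(arrow)
library(dplyr)

# Illustrative data only: write mtcars to a temporary Parquet file
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Path 1: read_parquet() loads the whole file into R memory as a tibble,
# so the dplyr verbs below are evaluated by dplyr itself, in R.
in_r <- read_parquet(path) %>%
  filter(cyl == 6) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(gear)

# Path 2: open_dataset() returns an Arrow Dataset; the dplyr verbs are only
# recorded as a query, which Acero executes when collect() is called.
in_acero <- open_dataset(path) %>%
  filter(cyl == 6) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(gear) %>%
  collect()

# Compare as plain data frames; all.equal() tolerates tiny floating-point
# differences, so anything it reports is a real mismatch.
all.equal(as.data.frame(in_r), as.data.frame(in_acero))
```

If the two results differ on real data, removing verbs from the chain one at a time is a reasonable way to narrow down which step behaves differently under Acero.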

GitHub link: 
https://github.com/apache/arrow/discussions/46383#discussioncomment-13119165
