GitHub user thisisnic added a comment to the discussion: how to debug arrow/dplyr to consider a bug report?
First thing I'm gonna try is writing the dataset to a temporary directory - this is all done at the Arrow level without bringing the data into R. Then I'll read it in again and see if the filter works.

```
library(arrow)
library(dplyr)

# Write the dataset to a temporary directory without pulling it into R
tf <- tempfile()
dir.create(tf)
open_dataset('data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet') |>
  write_dataset(tf)

# Read the rewritten copy back in and retry the filter
open_dataset(tf) |>
  filter(published_year < 1990) |>
  collect() |>
  nrow()
```

I got `1720` here, so it feels like there's something wrong either with the file or with how it's being read. The next step is comparing the new file with the old one and seeing if there are any differences.

GitHub link: https://github.com/apache/arrow/discussions/46383#discussioncomment-13119481

----
This is an automatically sent email for user@arrow.apache.org.
To unsubscribe, please send an email to: user-unsubscr...@arrow.apache.org