GitHub user thisisnic added a comment to the discussion: how to debug 
arrow/dplyr to consider a bug report?

First thing I'm gonna try is writing the dataset out to a temporary directory - 
this is all done at the arrow level without bringing it into R.  Then I'll read 
it in again and see if the filter works.  

```
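library(arrow)
library(dplyr)

# Temporary directory to hold the rewritten dataset
tf <- tempfile()
dir.create(tf)

# Rewrite the original file with arrow; the pipe has to end the line,
# otherwise R treats the open_dataset() call as a complete expression
open_dataset('data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet') |>
  write_dataset(tf)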

open_dataset(tf) |>
  filter(published_year < 1990) |>
  collect() |>
  nrow()
```

I get `1720` here, so it feels like there's something wrong either with the file 
or how it's being read.  The next step is comparing the new file with the old 
one and seeing if there are any differences.
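
A rough sketch of what that comparison could look like, assuming `tf` from the snippet above still exists. The names `old_ds` and `new_ds` are just placeholders for illustration, and this only checks the schemas and row counts, so it may well miss differences in the file metadata or encoding.

```r
library(arrow)

# Original file (same path as above) and the rewritten copy in `tf`
old_ds <- open_dataset('data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet')
new_ds <- open_dataset(tf)

# Print both schemas to eyeball column names and types
old_ds$schema
new_ds$schema

# TRUE if the field names and types match (metadata is ignored by default)
old_ds$schema$Equals(new_ds$schema)

# Look specifically at the column used in the filter
old_ds$schema$GetFieldByName("published_year")
new_ds$schema$GetFieldByName("published_year")

# Row counts, computed without pulling the data into R
nrow(old_ds)
nrow(new_ds)
```

If the type of `published_year` differs between the two, that would be a good place to start looking.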

GitHub link: 
https://github.com/apache/arrow/discussions/46383#discussioncomment-13119481
