Hi Arrow community,

I was working on a class project benchmarking the Apache Arrow Dataset API in different programming languages, and I found that, for some reason, the C++ API example is slower than the Python API example. I ran my benchmarks on a 5 GB dataset consisting of 300 Parquet files of 16 MB each, and I tried my best to cross-verify that all the parameters are the same in the Python and C++ examples. It would be great to know if anyone has observed something similar in the past and whether the reason is known; I would really like to understand this behavior. You can find the code and the results here [1]. I found a similar issue here [2], but I couldn't understand the exact reason.

Thanks a lot for your help.
[1] https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/dataset_bench
[2] https://stackoverflow.com/questions/67856457/reading-parquet-file-is-slower-in-c-than-in-python

Best Regards,
*Jayjeet Chakraborty*
Ph.D. Student
Department of Computer Science and Engineering
University of California, Santa Cruz