Thanks for your reply, David.

1) I used version 6.0.1 for both Arrow C++ and PyArrow.
2) The dataset was deployed using this script [1].
3) For C++, Arrow was built from source in release mode. You can see the CMake config here [2].

I think I need to test once with Arrow C++ installed from packages instead of me building it. That might be the issue.

[1] https://github.com/JayjeetAtGithub/arrow-flight-benchmark/blob/main/common/deploy_data.sh
[2] https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/cpp

Best,
Jayjeet

On Tue, Mar 1, 2022 at 5:04 AM David Li <lidav...@apache.org> wrote:

> Hi Jayjeet,
>
> That's odd since the Python API is just wrapping the C++ API, so they
> should be identical if everything is configured the same. (So is the Java
> API, incidentally.) That's effectively what the SO question is saying.
>
> What versions of PyArrow and Arrow are you using? Just to check the
> obvious things, was Arrow compiled with optimizations? And if we want to
> replicate this, is it possible to get the dataset?
>
> -David
>
> On Tue, Mar 1, 2022, at 01:52, Jayjeet Chakraborty wrote:
>
> Hi Arrow community,
>
> I was working on a class project for benchmarking Apache Arrow Dataset API
> in different programming languages. I found out that for some reason the
> C++ API example is slower than the Python API example. I ran my benchmarks
> on a 5 GB dataset consisting of 300 16MB parquet files. I tried my best to
> cross verify if all the parameters are similar in the Python and C++
> examples. It would be great to know if someone had similar observations in
> the past and if the reason for this is known. I would really like to know
> more about this phenomenon. You can find the code and the results here [1].
> I found a similar issue here [2] but I couldn't understand the exact
> reason. Thanks a lot for your help.
>
> [1] https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/dataset_bench
> [2] https://stackoverflow.com/questions/67856457/reading-parquet-file-is-slower-in-c-than-in-python
>
> Best Regards,
> *Jayjeet Chakraborty*
> Ph.D. Student
> Department of Computer Science and Engineering
> University of California, Santa Cruz
>
> --
> *Jayjeet Chakraborty*
> B.Tech in Computer Sc. and Engineering
> National Institute Of Technology, Durgapur
> West Bengal, India
> M: (+91) 8436500886