Hi Arrow community,

I am working on a class project that benchmarks the Apache Arrow Dataset API
across different programming languages. For some reason, the C++ example
turns out to be slower than the Python example. I ran my benchmarks on a
5 GB dataset consisting of 300 Parquet files of 16 MB each. I tried my best
to cross-verify that all the parameters are the same in the Python and C++
examples. It would be great to know whether anyone has made similar
observations in the past and whether the reason for this is known, as I
would really like to understand this behavior. You can find the code and
the results here [1]. I found a similar issue here [2], but I couldn't work
out the exact reason from it. Thanks a lot for your help.


[1]
https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/dataset_bench

[2]
https://stackoverflow.com/questions/67856457/reading-parquet-file-is-slower-in-c-than-in-python

Best Regards,
*Jayjeet Chakraborty*
Ph.D. Student
Department of Computer Science and Engineering
University of California, Santa Cruz