Thanks for your reply, David.

1) I used Arrow 6.0.1 for both C++ and Python (PyArrow).
2) The dataset was deployed using this [1] script.
3) For C++, Arrow was built from source in release mode. You can see the
CMake config here [2].

I think I should also test once with Arrow C++ installed from packages
instead of my own build, since the build itself might be the issue.
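
To rule out configuration differences before retesting, the Python side
could be timed and described with something like the sketch below. This is
only an illustration, not the code from the benchmark repository: the helper
names (time_runs, summarize, scan_dataset, describe_pyarrow) and the dataset
path are hypothetical, and the pyarrow calls assume PyArrow is installed.

```python
import statistics
import time


def time_runs(fn, repeats=3):
    """Call fn() `repeats` times and return per-run wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (min, median) of the measured durations in seconds."""
    return min(durations), statistics.median(durations)


def scan_dataset(path):
    """Read a whole Parquet dataset into memory (assumes pyarrow is installed)."""
    # Imported lazily so the timing helpers above work without pyarrow.
    import pyarrow.dataset as ds

    return ds.dataset(path, format="parquet").to_table()


def describe_pyarrow():
    """Report version and thread-pool sizes, for comparison with the C++ run."""
    import pyarrow as pa

    return {
        "version": pa.__version__,
        "cpu_threads": pa.cpu_count(),
        "io_threads": pa.io_thread_count(),
    }
```

A run would then look like
summarize(time_runs(lambda: scan_dataset("/path/to/dataset"))), with the
path replaced by the real dataset location, and the output of
describe_pyarrow() compared against the thread-pool settings of the C++
binary.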

[1]
https://github.com/JayjeetAtGithub/arrow-flight-benchmark/blob/main/common/deploy_data.sh
[2] https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/cpp

Best,
Jayjeet




On Tue, Mar 1, 2022 at 5:04 AM David Li <lidav...@apache.org> wrote:

> Hi Jayjeet,
>
> That's odd since the Python API is just wrapping the C++ API, so they
> should be identical if everything is configured the same. (So is the Java
> API, incidentally.) That's effectively what the SO question is saying.
>
> What versions of PyArrow and Arrow are you using? Just to check the
> obvious things, was Arrow compiled with optimizations? And if we want to
> replicate this, is it possible to get the dataset?
>
> -David
>
> On Tue, Mar 1, 2022, at 01:52, Jayjeet Chakraborty wrote:
>
> Hi Arrow community,
>
> I was working on a class project benchmarking the Apache Arrow Dataset API
> in different programming languages. For some reason, the C++ API example
> is slower than the Python API example. I ran my benchmarks on a 5 GB
> dataset consisting of 300 16 MB Parquet files, and I tried my best to
> verify that all the parameters are the same in the Python and C++
> examples. It would be great to know whether anyone has made similar
> observations in the past and whether the reason is known. You can find
> the code and the results here [1]. I found a similar issue here [2], but
> I couldn't understand the exact reason. Thanks a lot for your help.
>
>
> [1]
> https://github.com/JayjeetAtGithub/arrow-flight-benchmark/tree/main/dataset_bench
>
> [2]
> https://stackoverflow.com/questions/67856457/reading-parquet-file-is-slower-in-c-than-in-python
>
> Best Regards,
> *Jayjeet Chakraborty*
> Ph.D. Student
> Department of Computer Science and Engineering
> University of California, Santa Cruz
>
> --
> *Jayjeet Chakraborty*
> B.Tech in Computer Sc. and Engineering
> National Institute Of Technology, Durgapur
> West Bengal, India
> M: (+91) 8436500886
>
>
>

