[jira] [Created] (ARROW-8861) Memory not released until Plasma process is killed
Chengxin Ma created ARROW-8861:
--

Summary: Memory not released until Plasma process is killed
Key: ARROW-8861
URL: https://issues.apache.org/jira/browse/ARROW-8861
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Plasma
Affects Versions: 0.16.0
Environment: Singularity container (Ubuntu 18.04)
Reporter: Chengxin Ma

Invoking the {{Delete(const ObjectID& object_id)}} method of a Plasma client does not seem to free the memory used by the object.

To reproduce:
 1. use {{htop}} (or a similar tool) to monitor memory usage;
 2. start the Plasma Object Store with {{plasma_store -m 10 -s /tmp/plasma}};
 3. use {{put.py}} to put an object into Plasma;
 4. compile and run {{delete.cc}} ({{g++ delete.cc `pkg-config --cflags --libs arrow plasma` --std=c++11 -o delete}});
 5. kill the {{plasma_store}} process.

Memory usage drops at Step 5 rather than at Step 4.

How can the memory be freed while keeping the Plasma Object Store running?

{{put.py}}:
{code:java}
from pyarrow import plasma

if __name__ == "__main__":
    client = plasma.connect("/tmp/plasma")
    object_id = plasma.ObjectID(20 * b"a")
    object_size = 5
    # Create the object, fill it byte by byte, then seal it.
    buffer = memoryview(client.create(object_id, object_size))
    for i in range(5):
        buffer[i] = i % 128
    client.seal(object_id)
    client.disconnect()
{code}

{{delete.cc}}:
{code:java}
#include "arrow/util/logging.h"
#include <plasma/client.h>

using namespace plasma;

int main(int argc, char **argv) {
  PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/plasma"));
  ObjectID object_id = ObjectID::from_binary("");
  // Ask the store to delete the object, then disconnect.
  client.Delete(object_id);
  ARROW_CHECK_OK(client.Disconnect());
}
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
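Step 1 of the reproduction relies on watching {{htop}} by eye. As a rough sketch of a more repeatable measurement, the resident set size of the {{plasma_store}} process could be read from {{/proc/<pid>/status}} before and after Step 4. The helper names and the sample text below are hypothetical and Linux-specific; they are not part of Arrow or Plasma.

```python
import re
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) found in the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def rss_kib(pid: int) -> int:
    """Read the resident set size of a running process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


# Made-up sample in the /proc status layout, used only to exercise the parser.
SAMPLE_STATUS = "Name:\tplasma_store\nVmPeak:\t   20480 kB\nVmRSS:\t    5120 kB\n"
sample_rss = parse_vm_rss_kib(SAMPLE_STATUS)
```

Calling {{rss_kib(pid)}} with the store's process ID before and after running {{delete}} would give two numbers to compare instead of a visual impression.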
[jira] [Created] (ARROW-8587) Compilation error when linking arrow-flight-perf-server
Chengxin Ma created ARROW-8587:
--

Summary: Compilation error when linking arrow-flight-perf-server
Key: ARROW-8587
URL: https://issues.apache.org/jira/browse/ARROW-8587
Project: Apache Arrow
Issue Type: Bug
Components: Benchmarking, C++, FlightRPC
Affects Versions: 1.0.0
Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Chengxin Ma

I wanted to play around with the Flight benchmark after seeing today's discussion of Flight's throughput on the Arrow dev mailing list.

I hit the following error when trying to build the benchmark from the latest source code:
{code:java}
[ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::detail::canonical(boost::filesystem::path const&, boost::filesystem::path const&, boost::system::error_code*)'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::system::system_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::path::parent_path() const'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::system::generic_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::detail::current_path(boost::system::error_code*)'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::filesystem::path::operator/=(boost::filesystem::path const&)'
collect2: error: ld returned 1 exit status
src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: recipe for target 'debug/arrow-flight-perf-server' failed
make[2]: *** [debug/arrow-flight-perf-server] Error 1
CMakeFiles/Makefile2:2609: recipe for target 'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
{code}

I configured the build with {{cmake .. -DCMAKE_BUILD_TYPE=Debug -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}}.

I noticed {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the output, but the Boost library that I installed from the package manager is version {{1.65.1.0ubuntu1}}. Could this mismatch be the cause of the problem?

PS: I was able to build the benchmark [before|https://issues.apache.org/jira/browse/ARROW-7200]. That was on AWS with ubuntu-bionic-18.04-amd64-server-20191002 as the OS, which should be very similar to the one I'm using on my laptop.
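The linker log above mixes two families of unresolved symbols: Boost ({{boost::filesystem}}, {{boost::system}}) and zlib ({{deflate}}, {{inflate}} and related). A small sketch like the following, with a hypothetical helper name and a shortened copy of the log, groups the undefined references by the library that needs them, which makes it easier to see that Boost alone would not explain every error.

```python
import re
from collections import defaultdict

# Matches lines of the form: <library>: undefined reference to `<symbol>'
UNDEFINED_REF = re.compile(r"(?P<lib>\S+): undefined reference to `(?P<symbol>[^']+)'")


def group_undefined_references(linker_output: str) -> dict:
    """Map each library named by the linker to the symbols it could not resolve."""
    missing = defaultdict(set)
    for match in UNDEFINED_REF.finditer(linker_output):
        missing[match["lib"]].add(match["symbol"])
    return dict(missing)


# Three lines copied from the log in this report.
LOG = """\
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to `boost::system::system_category()'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
"""

groups = group_undefined_references(LOG)
for library, symbols in sorted(groups.items()):
    print(library, sorted(symbols))
```

On the full log this would list the Boost symbols under {{libarrow_flight_testing.so}} and the zlib symbols under {{libarrow_flight.so}}.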
[jira] [Created] (ARROW-7522) Broken Record Batch returned from a function call
Chengxin Ma created ARROW-7522:
--

Summary: Broken Record Batch returned from a function call
Key: ARROW-7522
URL: https://issues.apache.org/jira/browse/ARROW-7522
Project: Apache Arrow
Issue Type: Bug
Components: C++, C++ - Plasma
Affects Versions: 0.15.1
Environment: macOS
Reporter: Chengxin Ma

Scenario: retrieving a Record Batch from Plasma with a known Object ID.

The following code snippet works well:
{code:java}
int main(int argc, char **argv) {
  plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

  // Start up and connect a Plasma client.
  plasma::PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/store"));

  plasma::ObjectBuffer object_buffer;
  ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

  // Retrieve object data.
  auto buffer = object_buffer.data;
  arrow::io::BufferReader buffer_reader(buffer);
  std::shared_ptr<arrow::RecordBatchReader> record_batch_stream_reader;
  ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

  std::shared_ptr<arrow::RecordBatch> record_batch;
  arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);

  std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
  std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
  std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
  std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
  std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
}
{code}

{{record_batch->column(0)->ToString()}} causes a segmentation fault if retrieving the Record Batch is wrapped in a function:
{code:java}
std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(plasma::ObjectID object_id) {
  // Start up and connect a Plasma client.
  plasma::PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/store"));

  plasma::ObjectBuffer object_buffer;
  ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

  // Retrieve object data.
  auto buffer = object_buffer.data;
  arrow::io::BufferReader buffer_reader(buffer);
  std::shared_ptr<arrow::RecordBatchReader> record_batch_stream_reader;
  ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(&buffer_reader, &record_batch_stream_reader));

  std::shared_ptr<arrow::RecordBatch> record_batch;
  arrow::Status status = record_batch_stream_reader->ReadNext(&record_batch);

  // Disconnect the client.
  ARROW_CHECK_OK(client.Disconnect());

  return record_batch;
}

int main(int argc, char **argv) {
  plasma::ObjectID object_id = plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");
  std::shared_ptr<arrow::RecordBatch> record_batch = GetRecordBatchFromPlasma(object_id);

  std::cout << "record_batch->column_name(0): " << record_batch->column_name(0) << std::endl;
  std::cout << "record_batch->num_columns(): " << record_batch->num_columns() << std::endl;
  std::cout << "record_batch->num_rows(): " << record_batch->num_rows() << std::endl;
  std::cout << "record_batch->column(0)->length(): " << record_batch->column(0)->length() << std::endl;
  std::cout << "record_batch->column(0)->ToString(): " << record_batch->column(0)->ToString() << std::endl;
}
{code}

The meta info of the Record Batch, such as the number of columns and rows, is still available, but I can't see the content of the columns.

{{lldb}} reports {{EXC_BAD_ACCESS}} as the stop reason, so I think the Record Batch is destroyed after {{GetRecordBatchFromPlasma}} finishes. But why can I still see the meta info of this Record Batch?

What is the proper way to get the Record Batch if we insist on using a function?
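One plausible reading of the symptom, not confirmed anywhere in this report, is that the column data is a zero-copy view into memory owned by the Plasma client, while the row and column counts are ordinary values stored in the batch itself. The Python sketch below is only an analogy for that lifetime problem: {{FakeStoreClient}}, {{load_view}} and {{load_copy}} are hypothetical names, and no Arrow or Plasma API is used.

```python
class FakeStoreClient:
    """Hypothetical stand-in for a client that owns the memory behind its objects."""

    def __init__(self, payload: bytes):
        self._memory = bytearray(payload)
        self._views = []

    def get(self) -> memoryview:
        # Hand out a zero-copy view into client-owned memory.
        view = memoryview(self._memory)
        self._views.append(view)
        return view

    def disconnect(self) -> None:
        # Invalidate every view that was handed out.
        for view in self._views:
            view.release()
        self._views.clear()


def load_view(client: FakeStoreClient):
    """Return metadata by value and data as a view, then disconnect (the failing pattern)."""
    view = client.get()
    num_rows = len(view)  # metadata: copied into a plain int
    client.disconnect()
    return num_rows, view


def load_copy(client: FakeStoreClient):
    """Copy the data out before disconnecting, so the result owns its bytes."""
    view = client.get()
    data = bytes(view)
    client.disconnect()
    return len(data), data


num_rows, column = load_view(FakeStoreClient(b"\x01\x02\x03"))
try:
    column[0]
    data_readable = True
except ValueError:  # the view was released when the client disconnected
    data_readable = False

copied_rows, copied = load_copy(FakeStoreClient(b"\x01\x02\x03"))
```

In this analogy the count survives while the data does not, which matches the behaviour described above. Under the same assumption, the C++ equivalents would be to keep the client connected for as long as the batch is used, or to copy the data out before disconnecting.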
[jira] [Created] (ARROW-7434) [GLib] Homebrew packages seem not working
Chengxin Ma created ARROW-7434:
--

Summary: [GLib] Homebrew packages seem not working
Key: ARROW-7434
URL: https://issues.apache.org/jira/browse/ARROW-7434
Project: Apache Arrow
Issue Type: Bug
Components: GLib
Affects Versions: 0.15.1
Environment: macOS 10.15.2
Reporter: Chengxin Ma

After installing {{apache-arrow}} and {{apache-arrow-glib}} via {{Homebrew}} according to the [Installation Guide|https://arrow.apache.org/install/], I wrote a very simple program to test whether they were installed successfully.
{code}
$ cat hello_world.c
#include <stdio.h>

#include <arrow-glib/arrow-glib.h>

int main(int argc, char **argv) {
  printf("Hello, World! \n");
}
{code}

{{gcc}} gave the following error:
{code}
$ gcc -o hello_world hello_world.c
In file included from hello_world.c:3:
In file included from /usr/local/include/arrow-glib/arrow-glib.h:22:
/usr/local/include/arrow-glib/gobject-type.h:22:10: fatal error: 'glib-object.h' file not found
#include <glib-object.h>
         ^~~
1 error generated.
{code}

Is there any step that I didn’t follow here?
[jira] [Created] (ARROW-7411) [C++][Flight] Incorrect Arrow Flight benchmark output
Chengxin Ma created ARROW-7411:
--

Summary: [C++][Flight] Incorrect Arrow Flight benchmark output
Key: ARROW-7411
URL: https://issues.apache.org/jira/browse/ARROW-7411
Project: Apache Arrow
Issue Type: Improvement
Components: Benchmarking, C++, FlightRPC
Affects Versions: 0.15.1
Environment: macOS
Reporter: Chengxin Ma
Assignee: Chengxin Ma
Fix For: 1.0.0

When running the Arrow Flight benchmark in the following scenario, the output is incorrect.
{code}
$ ./arrow-flight-perf-server &
[1] 12986
Server host: localhost
Server port: 31337
$ ./arrow-flight-benchmark -server_host localhost -test_put
Using remote server: true
Testing method: DoPut
Server host: localhost
Server port: 31337
Bytes read: 128000
Nanos: 496372147
Speed: 2459.25 MB/s
{code}

{{Using remote server}} should be {{false}} and {{Bytes read}} should be {{Bytes write}}.

To correct the result of {{Using remote server}}, we can:
* Change {{if (FLAGS_server_host == "")}} to a condition that checks whether an {{arrow-flight-perf-server}} is already running. This is a bit complicated to do and might add unnecessary complexity (e.g. we would need to make sure it supports all OSes).
* Delete {{Using remote server}}, since we already have {{Server host}} in the output.

I personally prefer the second option and will make a PR.
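The figures in the sample output can be cross-checked with a little arithmetic. The sketch below assumes that "MB" means 2^20 bytes, which the report does not state, and uses a hypothetical helper name. It computes throughput from a byte count and a duration in nanoseconds.

```python
def throughput_mib_per_s(num_bytes: int, nanos: int) -> float:
    """Bytes transferred per second, expressed in units of 2**20 bytes."""
    seconds = nanos / 1e9
    return num_bytes / seconds / 2**20


# Values exactly as they appear in the pasted output.
as_printed = throughput_mib_per_s(128000, 496372147)

# The same duration with a byte count of 1,280,000,000 (an assumed value).
scaled = throughput_mib_per_s(1_280_000_000, 496372147)

print(f"{as_printed:.4f} {scaled:.2f}")
```

With the byte count as printed, the result is well below 1, whereas the assumed count of 1,280,000,000 bytes gives the reported 2459.25. This may mean that some digits of the byte count were lost when the output was pasted or archived, but the report itself does not say so.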
[jira] [Created] (ARROW-7320) Target arrow-type-benchmark failed to be built on bullx Linux
Chengxin Ma created ARROW-7320:
--

Summary: Target arrow-type-benchmark failed to be built on bullx Linux
Key: ARROW-7320
URL: https://issues.apache.org/jira/browse/ARROW-7320
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 1.0.0
Environment: bullx Linux
Reporter: Chengxin Ma

I was building Arrow on bullx Linux (a Linux distribution compatible with Red Hat Enterprise Linux).

CMake options:
{code}
-DCMAKE_BUILD_TYPE=Debug -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON
{code}

{{make}} failed with the following error message:
{code}
Scanning dependencies of target arrow-type-benchmark
[ 72%] Building CXX object src/arrow/CMakeFiles/arrow-type-benchmark.dir/type_benchmark.cc.o
make[2]: *** No rule to make target `gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark_main.a', needed by `debug/arrow-type-benchmark'. Stop.
make[1]: *** [src/arrow/CMakeFiles/arrow-type-benchmark.dir/all] Error 2
make: *** [all] Error 2
{code}

This has the same cause as described in [this commit|https://github.com/apache/arrow/pull/4246/commits/f6b0bc7f8dc56f02e2778752235e728b7623a9ee]: if {{-DCMAKE_INSTALL_LIBDIR=lib}} is not explicitly set, {{libbenchmark_main.a}} is put in {{lib64}} instead of {{lib}}.
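To confirm the {{lib}} versus {{lib64}} explanation on a given machine, one could look for the archive under both directories of the install prefix ({{gbenchmark_ep/src/gbenchmark_ep-install}} in the error above). The following is a minimal sketch with a hypothetical helper name. The demonstration at the end runs against a temporary directory that mimics the {{lib64}} layout rather than a real build tree.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_static_lib(install_prefix: Path, name: str = "libbenchmark_main.a") -> Optional[Path]:
    """Return the path of the archive under lib/ or lib64/, or None if it is in neither."""
    for libdir in ("lib", "lib64"):
        candidate = install_prefix / libdir / name
        if candidate.is_file():
            return candidate
    return None


# Demonstration on a throwaway directory that imitates the lib64 layout.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp)
    (prefix / "lib64").mkdir()
    (prefix / "lib64" / "libbenchmark_main.a").touch()
    found = find_static_lib(prefix)
    found_dir = found.parent.name
    missing = find_static_lib(prefix, "libdoes_not_exist.a")
```

If the real install prefix reports {{lib64}}, that would match the explanation above.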
[jira] [Created] (ARROW-7200) Running Arrow Flight benchmark on two hosts doesn't work
Chengxin Ma created ARROW-7200:
--

Summary: Running Arrow Flight benchmark on two hosts doesn't work
Key: ARROW-7200
URL: https://issues.apache.org/jira/browse/ARROW-7200
Project: Apache Arrow
Issue Type: Bug
Components: Benchmarking, C++, FlightRPC
Affects Versions: 0.15.1, 0.15.0
Environment: AWS EC2
Instance type: t3a.xlarge
AMI: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002
Number of instances: 2
They are capable of pinging each other.
Reporter: Chengxin Ma
Attachments: Screen Shot 2019-11-18 at 16.00.38.png

I was trying to evaluate the performance of Apache Arrow Flight on two hosts (one as the client and the other as the server), using [the official benchmark|https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc].

The flags I used to build the project were:
{code:java}
-DARROW_FLIGHT=ON -DCMAKE_BUILD_TYPE=Debug -DARROW_BUILD_BENCHMARKS=ON
{code}

The branch I used was maint-0.15.x, since there was a build error on the master branch.

_(The build error on master only occurred in the environment where I set up the two hosts: AWS. In my local environment (macOS) the build succeeded on the master branch. I don't think this build error is relevant to the issue, since there is no difference in the cpp source code.)_

On the host acting as the server, I ran
{code:java}
./arrow-flight-perf-server{code}

On the host acting as the client, I ran
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-18{code}

It gives the following error:
{code:java}
Failed with error: << IOError: gRPC returned unavailable error, with message: Connect Failed. Detail: Unavailable{code}

If I run
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-17{code}
the error is different:
{code:java}
IOError: Server was not available after 10 attempts{code}
This is understandable, since this host doesn't exist at all.

This indicates that Flight is able to find the existing host (ip-172-31-11-18), but the communication somehow doesn't succeed.

The benchmark works fine if I run it against localhost, either by not specifying the server_host flag or by running the server in another process on the same host.

I am not sure whether the problem is in the environment or in the code itself. Could someone please give me a hint on how to resolve it?
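A successful {{ping}} only shows that ICMP traffic gets through. It does not show that the server's TCP port accepts connections from the other host. As a rough way to separate environment problems from Flight problems, a plain TCP connection test could be run from the client host. The sketch below uses only the Python standard library and a hypothetical helper name. Port 31337 is taken from the server output quoted in ARROW-7411 and is assumed to apply here as well.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and name-resolution errors
        return False


# Self-check against a throwaway local listener instead of a real server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]
reachable = can_connect("127.0.0.1", open_port)
listener.close()
unreachable = can_connect("127.0.0.1", open_port, timeout=1.0)
```

On the client host, {{can_connect("ip-172-31-11-18", 31337)}} returning {{False}} while {{ping}} succeeds would point towards the environment (for example firewall rules or the address the server listens on) rather than the benchmark code.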