[jira] [Created] (ARROW-8861) Memory not released until Plasma process is killed

2020-05-19 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-8861:
--

 Summary: Memory not released until Plasma process is killed
 Key: ARROW-8861
 URL: https://issues.apache.org/jira/browse/ARROW-8861
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Plasma
Affects Versions: 0.16.0
 Environment: Singularity container (Ubuntu 18.04)
Reporter: Chengxin Ma


Invoking the {{Delete(const ObjectID& object_id)}} method of a Plasma client does not
seem to free the memory used by the object.

To reproduce:
 1. use {{htop}} (or a similar tool) to monitor memory usage;
 2. start the Plasma Object Store with {{plasma_store -m 10 -s /tmp/plasma}};
 3. use {{put.py}} to put an object into Plasma;
 4. compile and run {{delete.cc}} ({{g++ delete.cc `pkg-config --cflags --libs arrow plasma` --std=c++11 -o delete}});
 5. kill the {{plasma_store}} process.

Memory usage drops at Step 5, rather than Step 4.

How can the memory be freed while keeping the Plasma Object Store running?
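
To make the comparison between Step 4 and Step 5 less dependent on reading {{htop}} by eye, the resident memory of the {{plasma_store}} process could be sampled after each step. The following is only a minimal sketch, assuming a Linux host where {{/proc/<pid>/status}} is readable; the helper names {{parse_vm_rss_kib}} and {{rss_kib}} are made up for this illustration and are not part of Arrow or Plasma.

```python
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      1234 kB".
            return int(line.split()[1])
    return None


def rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())
```

Calling {{rss_kib(pid)}} with the PID of {{plasma_store}} after Steps 3, 4 and 5 would give three numbers that can be compared directly.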

{{put.py}}:
{code:java}
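from pyarrow import plasma

if __name__ == "__main__":
    # Connect to the store started in Step 2.
    client = plasma.connect("/tmp/plasma")
    object_id = plasma.ObjectID(20 * b"a")
    object_size = 5
    # Create the object and fill its buffer byte by byte.
    buffer = memoryview(client.create(object_id, object_size))
    for i in range(5):
        buffer[i] = i % 128
    # Seal the object so that other clients can see it.
    client.seal(object_id)
    client.disconnect()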
{code}
{{delete.cc}}:
{code:java}
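#include "arrow/util/logging.h"
#include <plasma/client.h>
#include <string>

using namespace plasma;

int main(int argc, char **argv)
{
    PlasmaClient client;
    ARROW_CHECK_OK(client.Connect("/tmp/plasma"));
    // Assumed to be the same 20-byte ID that put.py builds with 20 * b"a".
    ObjectID object_id = ObjectID::from_binary(std::string(20, 'a'));

    // Check the returned Status instead of silently dropping it.
    ARROW_CHECK_OK(client.Delete(object_id));

    ARROW_CHECK_OK(client.Disconnect());
}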
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8587) Compilation error when linking arrow-flight-perf-server

2020-04-24 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-8587:
--

 Summary: Compilation error when linking arrow-flight-perf-server
 Key: ARROW-8587
 URL: https://issues.apache.org/jira/browse/ARROW-8587
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++, FlightRPC
Affects Versions: 1.0.0
 Environment: Linux HP 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 
31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Chengxin Ma


I wanted to try the Flight benchmark after seeing today's discussion of
Flight's throughput on the Arrow dev mailing list.

I got the following error when trying to build the benchmark from the latest
source code:
{code:java}
[ 95%] Linking CXX executable ../../../debug/arrow-flight-perf-server
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::detail::canonical(boost::filesystem::path const&, 
boost::filesystem::path const&, boost::system::error_code*)'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::system::system_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::path::parent_path() const'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::system::generic_category()'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::detail::current_path(boost::system::error_code*)'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflate'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `deflateInit2_'
../../../debug/libarrow_flight.so.18.0.0: undefined reference to `inflateEnd'
../../../debug/libarrow_flight_testing.so.18.0.0: undefined reference to 
`boost::filesystem::path::operator/=(boost::filesystem::path const&)'
collect2: error: ld returned 1 exit status
src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/build.make:154: recipe 
for target 'debug/arrow-flight-perf-server' failed
make[2]: *** [debug/arrow-flight-perf-server] Error 1
CMakeFiles/Makefile2:2609: recipe for target 
'src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all' failed
make[1]: *** [src/arrow/flight/CMakeFiles/arrow-flight-perf-server.dir/all] 
Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

{code}
I configured the build with {{cmake .. -DCMAKE_BUILD_TYPE=Debug -DARROW_DEPENDENCY_SOURCE=AUTO -DARROW_FLIGHT=ON -DARROW_BUILD_BENCHMARKS=ON -DARROW_CXXFLAGS="-lboost_filesystem -lboost_system"}}.
 I noticed {{ARROW_BOOST_BUILD_VERSION: 1.71.0}} in the output, but the Boost
library that I installed from the package manager is version
{{1.65.1.0ubuntu1}}. Could this be the cause of the problem?

PS:
I was able to build the benchmark 
[before|https://issues.apache.org/jira/browse/ARROW-7200]. It was on AWS with 
the OS being ubuntu-bionic-18.04-amd64-server-20191002, which should be very 
similar to the one I'm using on my laptop.





[jira] [Created] (ARROW-7522) Broken Record Batch returned from a function call

2020-01-08 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-7522:
--

 Summary: Broken Record Batch returned from a function call
 Key: ARROW-7522
 URL: https://issues.apache.org/jira/browse/ARROW-7522
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Plasma
Affects Versions: 0.15.1
 Environment: macOS
Reporter: Chengxin Ma


Scenario: retrieving Record Batch from Plasma with known Object ID.

The following code snippet works well:
{code:java}
#include <iostream>
#include <memory>

#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/ipc/reader.h>
#include <arrow/util/logging.h>
#include <plasma/client.h>

int main(int argc, char **argv)
{
    plasma::ObjectID object_id =
        plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

    // Start up and connect a Plasma client.
    plasma::PlasmaClient client;
    ARROW_CHECK_OK(client.Connect("/tmp/store"));

    plasma::ObjectBuffer object_buffer;
    ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

    // Retrieve object data.
    auto buffer = object_buffer.data;

    arrow::io::BufferReader buffer_reader(buffer);
    std::shared_ptr<arrow::RecordBatchReader> record_batch_stream_reader;
    ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(
        &buffer_reader, &record_batch_stream_reader));

    std::shared_ptr<arrow::RecordBatch> record_batch;
    ARROW_CHECK_OK(record_batch_stream_reader->ReadNext(&record_batch));

    std::cout << "record_batch->column_name(0): "
              << record_batch->column_name(0) << std::endl;
    std::cout << "record_batch->num_columns(): "
              << record_batch->num_columns() << std::endl;
    std::cout << "record_batch->num_rows(): "
              << record_batch->num_rows() << std::endl;
    std::cout << "record_batch->column(0)->length(): "
              << record_batch->column(0)->length() << std::endl;
    std::cout << "record_batch->column(0)->ToString(): "
              << record_batch->column(0)->ToString() << std::endl;
}
{code}
{{record_batch->column(0)->ToString()}} causes a segmentation fault if
retrieving the Record Batch is wrapped in a function:
{code:java}
#include <iostream>
#include <memory>

#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/ipc/reader.h>
#include <arrow/util/logging.h>
#include <plasma/client.h>

std::shared_ptr<arrow::RecordBatch> GetRecordBatchFromPlasma(
    plasma::ObjectID object_id)
{
    // Start up and connect a Plasma client.
    plasma::PlasmaClient client;
    ARROW_CHECK_OK(client.Connect("/tmp/store"));

    plasma::ObjectBuffer object_buffer;
    ARROW_CHECK_OK(client.Get(&object_id, 1, -1, &object_buffer));

    // Retrieve object data.
    auto buffer = object_buffer.data;

    arrow::io::BufferReader buffer_reader(buffer);
    std::shared_ptr<arrow::RecordBatchReader> record_batch_stream_reader;
    ARROW_CHECK_OK(arrow::ipc::RecordBatchStreamReader::Open(
        &buffer_reader, &record_batch_stream_reader));

    std::shared_ptr<arrow::RecordBatch> record_batch;
    ARROW_CHECK_OK(record_batch_stream_reader->ReadNext(&record_batch));

    // Disconnect the client.
    ARROW_CHECK_OK(client.Disconnect());

    return record_batch;
}

int main(int argc, char **argv)
{
    plasma::ObjectID object_id =
        plasma::ObjectID::from_binary("0FF1CE00C0FFEE00BEEF");

    std::shared_ptr<arrow::RecordBatch> record_batch =
        GetRecordBatchFromPlasma(object_id);

    std::cout << "record_batch->column_name(0): "
              << record_batch->column_name(0) << std::endl;
    std::cout << "record_batch->num_columns(): "
              << record_batch->num_columns() << std::endl;
    std::cout << "record_batch->num_rows(): "
              << record_batch->num_rows() << std::endl;
    std::cout << "record_batch->column(0)->length(): "
              << record_batch->column(0)->length() << std::endl;
    std::cout << "record_batch->column(0)->ToString(): "
              << record_batch->column(0)->ToString() << std::endl;
}
{code}
The meta info of the Record Batch such as number of columns and rows is still 
available, but I can't see the content of the columns.

{{lldb}} reports {{EXC_BAD_ACCESS}} as the stop reason, so I think the Record
Batch is destroyed after {{GetRecordBatchFromPlasma}} returns. But why can I
still see the meta info of this Record Batch?
 What is the proper way to get the Record Batch if we insist on using a function?





[jira] [Created] (ARROW-7434) [GLib] Homebrew packages seem not working

2019-12-18 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-7434:
--

 Summary: [GLib] Homebrew packages seem not working
 Key: ARROW-7434
 URL: https://issues.apache.org/jira/browse/ARROW-7434
 Project: Apache Arrow
  Issue Type: Bug
  Components: GLib
Affects Versions: 0.15.1
 Environment: macOS 10.15.2
Reporter: Chengxin Ma


After installing {{apache-arrow}} and {{apache-arrow-glib}} via {{Homebrew}} 
according to the [Installation Guide|https://arrow.apache.org/install/], I 
wrote a very simple program to test if they were successfully installed.

{code}
$ cat hello_world.c
#include <stdio.h>

#include <arrow-glib/arrow-glib.h>

int main(int argc, char **argv) {
    printf("Hello, World! \n");
}
{code}

{{gcc}} gave the following error:

{code}
$ gcc -o hello_world hello_world.c
In file included from hello_world.c:3:
In file included from /usr/local/include/arrow-glib/arrow-glib.h:22:
/usr/local/include/arrow-glib/gobject-type.h:22:10: fatal error: 'glib-object.h' file not found
#include <glib-object.h>
         ^~~~~~~~~~~~~~~
1 error generated.
{code}

Is there any step that I didn’t follow here?






[jira] [Created] (ARROW-7411) [C++][Flight] Incorrect Arrow Flight benchmark output

2019-12-17 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-7411:
--

 Summary: [C++][Flight] Incorrect Arrow Flight benchmark output
 Key: ARROW-7411
 URL: https://issues.apache.org/jira/browse/ARROW-7411
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Benchmarking, C++, FlightRPC
Affects Versions: 0.15.1
 Environment: macOS
Reporter: Chengxin Ma
Assignee: Chengxin Ma
 Fix For: 1.0.0


When running the Arrow Flight benchmark in the following scenario, the output
is incorrect.
{code}
$ ./arrow-flight-perf-server &
[1] 12986
Server host: localhost
Server port: 31337
$ ./arrow-flight-benchmark -server_host localhost -test_put 
Using remote server: true
Testing method: DoPut
Server host: localhost
Server port: 31337
Bytes read: 1280000000
Nanos: 496372147
Speed: 2459.25 MB/s
{code}

{{Using remote server}} should be {{false}} and {{Bytes read}} should be
{{Bytes written}}.

To correct the result of {{Using remote server}}, we can:

* Change {{if (FLAGS_server_host == "")}} to another condition which checks
whether an {{arrow-flight-perf-server}} is already running. This is a bit
complicated and might add unnecessary complexity (e.g. we would need to make
sure it supports all OSes).

* Delete {{Using remote server}}, since we already have {{Server host}} in the 
output.

I personally prefer the second option and will make a PR.
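
The second option, together with a label that follows the method under test, can be sketched as follows. This is an illustration in Python rather than the benchmark's C++; the function name {{format_report}} and its parameters are hypothetical, the {{DoGet}} name for the non-put case is an assumption, and the speed formula (bytes per second divided by 2^20) is an assumption chosen because it reproduces the reported figure.

```python
def format_report(total_bytes, elapsed_nanos, test_put, host, port):
    """Build the benchmark summary lines without the 'Using remote server' line."""
    # Pick the labels from the method under test instead of hard-coding "read".
    # "DoGet" as the name of the non-put method is an assumption.
    method = "DoPut" if test_put else "DoGet"
    label = "Bytes written" if test_put else "Bytes read"
    # Assumed formula: bytes per second, expressed in units of 2**20 bytes.
    speed = total_bytes * 1e9 / elapsed_nanos / (1 << 20)
    return [
        f"Testing method: {method}",
        f"Server host: {host}",
        f"Server port: {port}",
        f"{label}: {total_bytes}",
        f"Nanos: {elapsed_nanos}",
        f"Speed: {speed:.2f} MB/s",
    ]
```

With an assumed byte count of 1280000000 and the reported 496372147 nanoseconds, {{format_report(1280000000, 496372147, True, "localhost", 31337)}} ends with {{Speed: 2459.25 MB/s}}, which agrees with the speed printed above.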





[jira] [Created] (ARROW-7320) Target arrow-type-benchmark failed to be built on bullx Linux

2019-12-04 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-7320:
--

 Summary: Target arrow-type-benchmark failed to be built on bullx 
Linux
 Key: ARROW-7320
 URL: https://issues.apache.org/jira/browse/ARROW-7320
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 1.0.0
 Environment: bullx Linux
Reporter: Chengxin Ma


I was building Arrow on bullx Linux (a Linux distribution compatible with Red 
Hat Enterprise Linux).

CMake options:
{code}
-DCMAKE_BUILD_TYPE=Debug
-DARROW_FLIGHT=ON
-DARROW_BUILD_BENCHMARKS=ON
{code}

{{make}} failed with the following error message:
{code}
Scanning dependencies of target arrow-type-benchmark
[ 72%] Building CXX object 
src/arrow/CMakeFiles/arrow-type-benchmark.dir/type_benchmark.cc.o
make[2]: *** No rule to make target 
`gbenchmark_ep/src/gbenchmark_ep-install/lib/libbenchmark_main.a', needed by 
`debug/arrow-type-benchmark'.  Stop.
make[1]: *** [src/arrow/CMakeFiles/arrow-type-benchmark.dir/all] Error 2
make: *** [all] Error 2
{code}

This is due to the same reason as mentioned in [this 
commit|https://github.com/apache/arrow/pull/4246/commits/f6b0bc7f8dc56f02e2778752235e728b7623a9ee]:

If {{-DCMAKE_INSTALL_LIBDIR=lib}} is not explicitly set, 
{{libbenchmark_main.a}} will be put in {{lib64}} instead of {{lib}}.





[jira] [Created] (ARROW-7200) Running Arrow Flight benchmark on two hosts doesn't work

2019-11-18 Thread Chengxin Ma (Jira)
Chengxin Ma created ARROW-7200:
--

 Summary: Running Arrow Flight benchmark on two hosts doesn't work
 Key: ARROW-7200
 URL: https://issues.apache.org/jira/browse/ARROW-7200
 Project: Apache Arrow
  Issue Type: Bug
  Components: Benchmarking, C++, FlightRPC
Affects Versions: 0.15.1, 0.15.0
 Environment: AWS EC2
Instance type: t3a.xlarge
AMI: ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002
Number of instances: 2
They are capable of pinging each other.
Reporter: Chengxin Ma
 Attachments: Screen Shot 2019-11-18 at 16.00.38.png

I was trying to evaluate the performance of Apache Arrow Flight on two hosts
(one as the client and the other as the server), using [the official
benchmark|https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc].

Flags I used to build the project were:

 
{code:java}
-DARROW_FLIGHT=ON
-DCMAKE_BUILD_TYPE=Debug
-DARROW_BUILD_BENCHMARKS=ON
{code}
 

The branch I used was maint-0.15.x since there was a build error on the master 
branch. _(The build error on master only existed in the environment where I set 
up two hosts: AWS. On my local environment (macOS) the build was successful on 
the master branch. I don't think this build error is relevant to the issue 
since there is no difference in the cpp source code.)_

On the host acting as the server, I ran 
{code:java}
./arrow-flight-perf-server{code}
On the host acting as the client, I ran 
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-18{code}
It gave the following error:
{code:java}
Failed with error: << IOError: gRPC returned unavailable error, with message: 
Connect Failed. Detail: Unavailable{code}
 

 If I ran 
{code:java}
./arrow-flight-benchmark --server_host ip-172-31-11-17{code}
the error was different:
{code:java}
IOError: Server was not available after 10 attempts{code}
This is understandable since this host doesn't exist at all.

This indicates that Flight is able to find the existing host (ip-172-31-11-18), 
but the communication somehow didn't succeed.

The benchmark works fine against localhost, either when the server_host flag is
not specified or when the server runs in another process on the same host.

I am not sure whether the problem is in the environment or in the code itself.
Could someone please give me a hint on how to resolve it?
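
Since a successful ping only shows that ICMP packets get through, one thing that may be worth ruling out is whether the server's TCP port can be reached from the client host at all (for example, a firewall rule could block it, or the server could be listening on the loopback interface only). Below is a minimal sketch of such a check; {{can_connect}} is a hypothetical helper, and the port number to test is an assumption, since this report does not state it (the perf server output quoted in ARROW-7411 shows 31337).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Running {{can_connect("ip-172-31-11-18", 31337)}} on the client host while the perf server is up would separate a plain connectivity problem from a problem inside Flight or gRPC.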


