RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Melik-Adamyan, Areg
> The question is whether you want to spend at least a month or more of intense development on something else (a basic query engine, as we've been discussing in [1]) before we are able to develop consensus about the approach to threading. Personally, I would not make this choice given that

RE: How about inet4/inet6/macaddr data types?

2019-04-29 Thread Melik-Adamyan, Areg
If you want to store and manipulate it, the best format is integers (or binary): this allows all the fast operations such as masking and subnet querying, but a text representation will require conversion. It depends heavily on the use case, but conversion to pgSQL's inet or cidr from integer is
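As a rough illustration of the integer representation described above, here is a minimal Python sketch of masking and subnet membership on IPv4 addresses stored as 32-bit integers. The helper names `to_int`, `prefix_mask`, and `in_subnet` are hypothetical and are not part of Arrow or PostgreSQL; only the standard-library `ipaddress` module is used for the text conversion.

```python
import ipaddress


def to_int(text):
    """Convert dotted-quad IPv4 text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))


def prefix_mask(prefix_len):
    """Build the 32-bit netmask for a prefix length (hypothetical helper)."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_subnet(addr, network, prefix_len):
    """Subnet query on integers: compare the masked network bits."""
    mask = prefix_mask(prefix_len)
    return (addr & mask) == (network & mask)


addr = to_int("192.168.1.10")
net = to_int("192.168.1.0")
print(in_subnet(addr, net, 24))
# Converting back to text is only needed for display.
print(ipaddress.IPv4Address(addr & prefix_mask(24)))
```

The masking and comparison are plain integer operations, which is why this form suits bulk filtering; the text form is produced only at the edges.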

RE: [Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Melik-Adamyan, Areg
onsibility of the Committers and PMC members to steward IP in the project, and one of the parts of the release process is to verify that the software has complied with the ASF's licensing policies [1] Thanks Wes [1]: https://apache.org/legal/resolved.html

[Contribution][Proposal] Use Contributors file and Signed-Off-By Process for Arrow

2019-04-29 Thread Melik-Adamyan, Areg
To avoid contaminating the Arrow code with incorrectly licensed code (including GPL code) that could accidentally be included in Arrow, and to track contributions, maintainers need to check whether the committer has signed the ICLA or CCLA and is listed in the contributors file - which we do

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-25 Thread Melik-Adamyan, Areg
Hi, We are actually talking about the same thing, but you do not want to use 3rd party tools. For 3 and 4: you run the first version and store the output in 1.out, then run the second version and store the output in 2.out, and run the compare tool. Your tool does the two steps automatically, which is fine. > Various reason why I think
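The two-run workflow above (store one version's results in 1.out, the other's in 2.out, then compare) can be sketched in Python as follows. This is not the compare tool referred to in the thread; it is a stand-in that assumes both files hold Google Benchmark JSON reports (as produced with `--benchmark_format=json`), that both runs use the same time unit, and that a 5% threshold is a reasonable cut-off. The function names and the threshold are assumptions made for illustration.

```python
import json


def load_times(report, field="real_time"):
    """Map benchmark name -> time from a parsed Google Benchmark JSON report."""
    return {b["name"]: b[field] for b in report["benchmarks"]}


def compare(old_report, new_report, threshold=0.05):
    """Return (name, relative_change, status) for benchmarks present in both runs."""
    old = load_times(old_report)
    new = load_times(new_report)
    rows = []
    for name in sorted(old.keys() & new.keys()):
        change = (new[name] - old[name]) / old[name]
        if change > threshold:
            status = "regression"    # new version is slower
        elif change < -threshold:
            status = "improvement"   # new version is faster
        else:
            status = "unchanged"
        rows.append((name, change, status))
    return rows


def compare_files(old_path, new_path, threshold=0.05):
    """Convenience wrapper for two report files, e.g. 1.out and 2.out."""
    with open(old_path) as f_old, open(new_path) as f_new:
        return compare(json.load(f_old), json.load(f_new), threshold)
```

A call such as `compare_files("1.out", "2.out")` would return one row per benchmark name found in both reports; benchmarks present in only one run are skipped in this sketch.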

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Melik-Adamyan, Areg
mckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/ [3]: https://github.com/airspeed-velocity/asv On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet wrote: > On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou wrote:

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-24 Thread Melik-Adamyan, Areg
mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure] On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou wrote: > Hi Areg, > On 23/04/2019 at 23:43, Melik-Adamyan, Areg wrote: > > Because we are using Google Benchmark, which has specific format

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

2019-04-23 Thread Melik-Adamyan, Areg
oses a usable CLI interface (with documentation). [1] https://github.com/apache/arrow/pull/4141 [2] https://jira.apache.org/jira/browse/ARROW-4827 [3] https://github.com/apache/arrow/blob/512ae64bc074a0b620966131f9338d4a1eed2356/docs/source/developers/benchmarks.rst [4] https://github.com/apache/arrow/

RE: [Rust] [DataFusion] Parallel query execution PoC

2019-04-22 Thread Melik-Adamyan, Areg
I would encourage you to familiarize yourself with the proposal https://cwiki.apache.org/confluence/display/ARROW/Parallel+Execution+Engine and join forces for more rapid development of the engine. -----Original Message----- From: Andy Grove [mailto:andygrov...@gmail.com] Sent: Saturday,

RE: [Discuss] Benchmarking infrastructure

2019-03-29 Thread Melik-Adamyan, Areg
> When you say "output is parsed", how is that exactly? We don't have any scripts in the repository to do this yet (I have some comments on this below). We also have to collect machine information and insert that into the database. From my perspective we have quite a bit of engineering work

[Discuss] Benchmarking infrastructure

2019-03-29 Thread Melik-Adamyan, Areg
Back to per-commit benchmarking. I have currently fired up a community TeamCity Edition here http://arrow-publi-1wwtu5dnaytn9-2060566241.us-east-1.elb.amazonaws.com and a dedicated pool of two Skylake bare-metal machines (Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz). This can go up to 4 if

RE: FPGA support for Apache Arrow

2019-03-28 Thread Melik-Adamyan, Areg
Hi Chris, Do you have plans to contribute the infrastructure part back to the community so that others can build hybrid pipelines? -----Original Message----- From: Wes McKinney [mailto:wesmck...@gmail.com] Sent: Thursday, March 28, 2019 10:51 AM To: dev@arrow.apache.org Cc: ch...@inaccel.com

RE: Benchmarking dashboard proposal

2019-02-20 Thread Melik-Adamyan, Areg
> * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available,

RE: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Melik-Adamyan, Areg
+1 (non-binding) Is there a plan for C++ API? -Original Message- From: Renjie Liu [mailto:liurenjie2...@gmail.com] Sent: Wednesday, January 23, 2019 7:44 PM To: dev@arrow.apache.org Subject: Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow +1 (non-binding) I also

Benchmarking dashboard proposal

2019-01-17 Thread Melik-Adamyan, Areg
Hello, I want to restart/join the discussions on creating an Arrow benchmarking dashboard. I want to propose a performance benchmark run per commit to track the changes. The proposal includes building infrastructure for per-commit tracking comprising the following parts: - Hosted JetBrains