[jira] [Created] (ARROW-4296) [Plasma] Starting Plasma store with use_one_memory_mapped_file enabled crashes due to improper memory alignment

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4296:


 Summary: [Plasma] Starting Plasma store with 
use_one_memory_mapped_file enabled crashes due to improper memory alignment
 Key: ARROW-4296
 URL: https://issues.apache.org/jira/browse/ARROW-4296
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


Starting Plasma with use_one_memory_mapped_file (the -f flag) causes a crash, 
most likely due to improper memory alignment. This can be resolved by changing 
the dlmemalign call during initialization to request slightly less memory (by 
~8KB).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4295) [Plasma] Incorrect log message when evicting objects

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4295:


 Summary: [Plasma] Incorrect log message when evicting objects
 Key: ARROW-4295
 URL: https://issues.apache.org/jira/browse/ARROW-4295
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Plasma (C++)
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


When Plasma evicts objects because it has run out of memory, it prints log 
messages of the form:

{quote}There is not enough space to create this object, so evicting x objects 
to free up y bytes. The number of bytes in use (before this eviction) is 
z.{quote}

However, the reported number of bytes in use (before this eviction) is 
actually the number of bytes in use *after* the eviction. A straightforward 
fix is to replace z with (y + z).
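
A sketch of the corrected message, assuming Arrow's ARROW_LOG macro and 
hypothetical names for the eviction bookkeeping:

{code}
#include <cstdint>

#include "arrow/util/logging.h"

// num_bytes_in_use (z) already reflects usage *after* the eviction, so the
// freed bytes (y) are added back to report the pre-eviction figure (y + z).
void LogEviction(int64_t num_objects_to_evict, int64_t num_bytes_evicted,
                 int64_t num_bytes_in_use) {
  ARROW_LOG(INFO) << "There is not enough space to create this object, so"
                  << " evicting " << num_objects_to_evict << " objects to"
                  << " free up " << num_bytes_evicted << " bytes. The number"
                  << " of bytes in use (before this eviction) is "
                  << (num_bytes_in_use + num_bytes_evicted) << ".";
}
{code}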





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-18 Thread Andrew Palumbo
My mistake: I missed the note about using the verify script from master.


Re: [VOTE] Release Apache Arrow 0.12.0 RC4

2019-01-18 Thread Andrew Palumbo
I seem to have been failing here:

../../../release/libgandiva.so.12.0.0: undefined reference to 
`llvm::sys::DynamicLibrary::getPermanentLibrary(char const*, std::string*)'
collect2: error: ld returned 1 exit status
src/gandiva/tests/CMakeFiles/gandiva-date_time_test.dir/build.make:94: recipe 
for target 'release/gandiva-date_time_test' failed
make[2]: *** [release/gandiva-date_time_test] Error 1
CMakeFiles/Makefile2:8836: recipe for target 
'src/gandiva/tests/CMakeFiles/gandiva-date_time_test.dir/all' failed
make[1]: *** [src/gandiva/tests/CMakeFiles/gandiva-date_time_test.dir/all] 
Error 2
[ 63%] Linking CXX executable ../../../release/gandiva-projector_test
../../../release/libgandiva.so.12.0.0: undefined reference to 
`llvm::sys::DynamicLibrary::getPermanentLibrary(char const*, std::string*)'
collect2: error: ld returned 1 exit status
src/gandiva/tests/CMakeFiles/gandiva-projector_test.dir/build.make:94: recipe 
for target 'release/gandiva-projector_test' failed
make[2]: *** [release/gandiva-projector_test] Error 1
CMakeFiles/Makefile2:8627: recipe for target 
'src/gandiva/tests/CMakeFiles/gandiva-projector_test.dir/all' failed
make[1]: *** [src/gandiva/tests/CMakeFiles/gandiva-projector_test.dir/all] 
Error 2
[ 63%] Linking CXX executable ../../../release/gandiva-projector_test_static
../../../release/libgandiva.a(engine.cc.o): In function 
`gandiva::Engine::InitOnce()':
engine.cc:(.text+0x537): undefined reference to 
`llvm::sys::DynamicLibrary::getPermanentLibrary(char const*, std::string*)'
collect2: error: ld returned 1 exit status
src/gandiva/tests/CMakeFiles/gandiva-projector_test_static.dir/build.make:147: 
recipe for target 'release/gandiva-projector_test_static' failed
make[2]: *** [release/gandiva-projector_test_static] Error 1
CMakeFiles/Makefile2:8756: recipe for target 
'src/gandiva/tests/CMakeFiles/gandiva-projector_test_static.dir/all' failed
make[1]: *** 
[src/gandiva/tests/CMakeFiles/gandiva-projector_test_static.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
+ cleanup
+ rm -fr /tmp/arrow-0.12.0.mC5MI

running:


dev/release/verify-release-candidate.sh source 0.12.0 4



You might want to take this with a grain of salt, since my system may not be 
configured correctly. Are there any new dependencies necessary since RC2?

--andy

From: Wes McKinney 
Sent: Thursday, January 17, 2019 5:51 PM
To: dev@arrow.apache.org
Subject: Re: [VOTE] Release Apache Arrow 0.12.0 RC4

+1 (binding)

* Ran verification script with Ubuntu 14.04, JDK8, Node 11.6
(installed with nvm, which I guess sets up npx), gcc 4.8
* Ran verification script on Windows 10 / MSVC 2015 with a minor
change to turn on the unit tests
https://github.com/apache/arrow/commit/0f8bd747468dd28c909ef823bed77d8082a5b373

Thanks to Krisztian and everyone else for the help on this release

On Thu, Jan 17, 2019 at 11:59 AM Antoine Pitrou  wrote:
>
>
> +1 (binding)
>
> I ran "./dev/release/verify-release-candidate.sh source 0.12.0 4".
> Everything succeeded before JS, which failed
> ("./dev/release/verify-release-candidate.sh: ligne 258: npx : commande
> introuvable").
>
> - Ubuntu 18.04.1 (x86-64)
> - CUDA enabled
> - gcc 7.3.0
> - java version "1.8.0_201"
> Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
> Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
>
> Regards
>
> Antoine.
>
>
>
> On 17/01/2019 at 18:03, Bryan Cutler wrote:
> > +1
> >
> > I ran ARROW_HAVE_CUDA=NO dev/release/verify-release-candidate.sh source
> > 0.12.0 4
> >
> >- 4.15.0-43-generic #46~16.04.1-Ubuntu
> >- openjdk version "1.8.0_191"
> >- gcc version 5.4.0
> >
> > I also ran Spark integration tests and was able to get all tests passing
> > after some minor modifications.
> >
> > Bryan
> >
> > On Thu, Jan 17, 2019 at 12:48 AM Kouhei Sutou  wrote:
> >
> >> +1 (binding)
> >>
> >> I ran the following on Debian GNU/Linux sid:
> >>
> >>   * CC=gcc-7 CXX=g++-7 dev/release/verify-release-candidate.sh source
> >> 0.12.0 4
> >>   * dev/release/verify-release-candidate.sh binaries 0.12.0 4
> >>
> >> with:
> >>
> >>   * gcc-7 (Debian 7.4.0-2) 7.4.0
> >>   * openjdk version "1.8.0_191"
> >>   * ruby 2.7.0dev (2019-01-17 trunk 66841) [x86_64-linux]
> >>   * nvidia-cuda-dev 9.2.148-5
> >>
> >> I couldn't run the JavaScript tests and integration tests because I
> >> couldn't get a working Node.js 11. (I can get a working Node.js 10.15.)
> >>
> >> RC4 includes a JavaScript update
> >> (https://github.com/apache/arrow/commit/5598d2f42573ed19e7db4aae7adb02af2cd4ccd0)
> >> that requires Node.js 11.
> >>
> >> I think the JavaScript implementation isn't critical for this release,
> >> because we still release the JavaScript implementation separately.
> >>
> >> I hope that the 0.13.0 release will include the JavaScript
> >> implementation.
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "[VOTE] Release Apache Arrow 0.12.0 RC4" on Wed, 16 Jan 2019 12:59:35
> >> +0100,
> 

[jira] [Created] (ARROW-4294) [Plasma] Add support for evicting objects to external store

2019-01-18 Thread Anurag Khandelwal (JIRA)
Anurag Khandelwal created ARROW-4294:


 Summary: [Plasma] Add support for evicting objects to external 
store
 Key: ARROW-4294
 URL: https://issues.apache.org/jira/browse/ARROW-4294
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Affects Versions: 0.11.1
Reporter: Anurag Khandelwal
 Fix For: 0.13.0


Currently, when Plasma needs storage space for additional objects, it evicts 
objects by deleting them from the Plasma store. This is a problem when it 
isn't possible to reconstruct the object, or when reconstructing it is 
expensive. Adding support for a pluggable external store that Plasma can evict 
objects to would address this issue.

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being 
removed altogether
 * Communication with the external storage service should go through a very 
thin shim interface. At the same time, the interface should be general enough 
to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
 * Should be pluggable (e.g., it should be simple to add or remove the 
external storage service for eviction, switch between different remote 
services, etc.) and easy to implement

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is 
needed to ensure there are no race conditions due to multiple concurrent 
evictions of the same object.

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an 
ExternalStoreHandle, which exposes Put and Get calls. Any external store that 
needs to be supported must implement this interface (see the sketch after this 
list).
 * In order to read or write data in the external store in a thread-safe 
manner, one ExternalStoreHandle should be created per thread. While the 
ExternalStoreHandle itself is not required to be thread-safe, multiple 
ExternalStoreHandles across multiple threads should be able to modify the 
external store safely.
 * Replace the DeleteObjects method in the Plasma store with an EvictObjects 
method. If an external store is specified for the Plasma store, EvictObjects 
would mark the object state as PLASMA_EVICTED, write the object data to the 
external store (via the ExternalStoreHandle), and reclaim the memory 
associated with the object data/metadata rather than remove the entry from the 
Object Table altogether. If there is no valid external store, the eviction 
path remains the same (i.e., the object entry is still deleted from the Object 
Table).
 * The Get method in the Plasma store now tries to fetch the object from the 
external store if it is not found locally and an external store is associated 
with the Plasma store. The method tries to offload this to an external worker 
thread pool in a fire-and-forget model, but may need to do it synchronously if 
too many requests are already enqueued.
 * *The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, to 
which implementations of the ExternalStore and ExternalStoreHandle interfaces 
can be appended; these are then compiled into the plasma_store_server 
executable.*
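
A minimal sketch of the proposed interface (illustrative only; the use of 
arrow::Status for error handling and std::string payloads are assumptions, not 
a final API):

{code}
#include <memory>
#include <string>
#include <vector>

#include "arrow/status.h"

namespace plasma {

// Per-thread handle used to read and write evicted objects.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  // Writes object payloads under the given IDs; assumed idempotent and
  // atomic so that concurrent evictions of the same object are safe.
  virtual arrow::Status Put(const std::vector<std::string>& ids,
                            const std::vector<std::string>& data) = 0;
  // Reads the payloads back for objects that were previously evicted.
  virtual arrow::Status Get(const std::vector<std::string>& ids,
                            std::vector<std::string>* data) = 0;
};

// Implemented once per supported external store (e.g., S3, DynamoDB, Redis).
class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  // Connects to the store named by the endpoint and returns a handle;
  // callers create one handle per thread.
  virtual arrow::Status Connect(
      const std::string& endpoint,
      std::shared_ptr<ExternalStoreHandle>* handle) = 0;
};

}  // namespace plasma
{code}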

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4293) [C++] Can't access parquet statistics on binary columns

2019-01-18 Thread Ildar (JIRA)
Ildar created ARROW-4293:


 Summary: [C++] Can't access parquet statistics on binary columns
 Key: ARROW-4293
 URL: https://issues.apache.org/jira/browse/ARROW-4293
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Ildar


Hi,

I'm trying to use per-column statistics (min/max values) to filter out row 
groups while reading a Parquet file, but I don't see statistics built for 
binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} 
discards statistics that have sort order {{UNSIGNED}} and weren't created by 
{{parquet-cpp}}. As I understand it, there used to be some issues in 
{{parquet-mr}}. Do they still persist?

For example, I have a Parquet file created with {{parquet-mr}} version 1.10 
that seems to have correct min/max values for binary columns, and 
{{parquet-cpp}} works fine for me if I remove this code from the 
{{HasCorrectStatistics()}} function:

{code}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
  return false;
}
{code}
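
For reference, a rough sketch of how one might check which row groups carry 
usable statistics with {{parquet-cpp}} (exact type and method names vary 
between releases, so treat this as illustrative):

{code}
#include <iostream>
#include <memory>

#include "parquet/api/reader.h"

int main() {
  // Hypothetical file path; stats on its binary columns may be discarded by
  // ApplicationVersion::HasCorrectStatistics() as described above.
  std::unique_ptr<parquet::ParquetFileReader> reader =
      parquet::ParquetFileReader::OpenFile("example.parquet");
  auto file_metadata = reader->metadata();
  for (int rg = 0; rg < file_metadata->num_row_groups(); ++rg) {
    auto row_group = file_metadata->RowGroup(rg);
    for (int col = 0; col < row_group->num_columns(); ++col) {
      auto column_chunk = row_group->ColumnChunk(col);
      // is_stats_set() is false when the statistics were dropped.
      std::cout << "row group " << rg << ", column " << col
                << ", stats present: " << column_chunk->is_stats_set() << "\n";
    }
  }
  return 0;
}
{code}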



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Benchmarking dashboard proposal

2019-01-18 Thread Tom Augspurger
I'll see if I can figure out why the benchmarks at
https://pandas.pydata.org/speed/arrow/ aren't being updated this weekend.

On Fri, Jan 18, 2019 at 2:34 AM Uwe L. Korn  wrote:

> Hello,
>
> note that we have (had?) the Python benchmarks continuously running and
> reported at https://pandas.pydata.org/speed/arrow/. It seems this stopped
> in July 2018.
>
> Uwe
>


Re: Export symbol guidelines

2019-01-18 Thread Wes McKinney
It is probably safer to namespace those symbols
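
A toy illustration of the trade-off, assuming GCC/Clang on an ELF platform 
(the function names below are hypothetical stand-ins, not the actual vendored 
symbols):

#include <string>

// Exported by default: this lands in the dynamic symbol table and can clash
// with another library's copy of the same vendored code.
std::string locate_zone_demo() { return "UTC"; }

// Hiding it keeps it out of the dynamic symbol table entirely.
__attribute__((visibility("hidden")))
std::string hidden_zone_demo() { return "UTC"; }

// Namespacing it changes the mangled name instead, so it cannot collide even
// if it stays exported.
namespace arrow_vendored {
std::string locate_zone_demo() { return "UTC"; }
}  // namespace arrow_vendored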

On Thu, Jan 17, 2019 at 11:51 AM shyam narayan singh wrote:
>
> Vendored code is part of Arrow. Gandiva is just the caller of the
> timezone APIs.
>
> Regards
> Shyam
>
> On Thu, Jan 17, 2019 at 11:06 PM Antoine Pitrou  wrote:
>
> >
> > Side note: why do those symbols appear in libarrow.so and not
> > libgandiva.so?
> >
> > Regards
> >
> > Antoine.
> >
> >
> > > On 17/01/2019 at 18:35, Antoine Pitrou wrote:
> > >
> > > Le 17/01/2019 à 18:29, shyam narayan singh a écrit :
> > >> Hi
> > >>
> > >> 1. The symbols are "locate_zone" and "to_sys", which are part of
> > >> cast_time.cc. These are invoked when casting a timestamp with a timezone
> > >> present.
> > >> 2. I am trying different things. I just made the symbols hidden to see
> > >> the effect. "manylinux" passed while the others failed.
> > >> 3. A couple of approaches:
> > >>   a. relax the constraints
> > >>   b. move the vendored code to an arrow/vendored namespace.
> > >
> > > Ideally we don't want to change the vendored code at all (except at the
> > > very beginning and end of the file).  I'm not sure it's possible to
> > > implement b) under that constraint.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >


[jira] [Created] (ARROW-4291) [Dev] Support selecting features in release scripts

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4291:
--

 Summary: [Dev] Support selecting features in release scripts
 Key: ARROW-4291
 URL: https://issues.apache.org/jira/browse/ARROW-4291
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Developer Tools, Packaging
Reporter: Uwe L. Korn


Sometimes not all components can be verified on a system. We should provide 
environment variables to exclude them so that verification can proceed to the 
next step.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4290) [C++/Gandiva] Support detecting correct LLVM version in Homebrew

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4290:
--

 Summary: [C++/Gandiva] Support detecting correct LLVM version in 
Homebrew
 Key: ARROW-4290
 URL: https://issues.apache.org/jira/browse/ARROW-4290
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Gandiva
Reporter: Uwe L. Korn


We should also search Homebrew for the matching LLVM version for Gandiva on 
OSX. It can be installed via {{brew install llvm@6}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4289) [C++] Forward AR and RANLIB to thirdparty builds

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4289:
--

 Summary: [C++] Forward AR and RANLIB to thirdparty builds
 Key: ARROW-4289
 URL: https://issues.apache.org/jira/browse/ARROW-4289
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


On OSX Mojave, it seems that there are many versions of AR present. CMake 
detects the right one, whereas some thirdparty tooling picks up the wrong one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Benchmarking dashboard proposal

2019-01-18 Thread Brian Hulette
We also have some JS benchmarks [1]. Currently they're only run on an ad-hoc
basis to manually test major changes, but it would be great to include them in
this.

[1] https://github.com/apache/arrow/tree/master/js/perf

On Fri, Jan 18, 2019 at 12:34 AM Uwe L. Korn  wrote:

> Hello,
>
> note that we have (had?) the Python benchmarks continuously running and
> reported at https://pandas.pydata.org/speed/arrow/. It seems this stopped
> in July 2018.
>
> Uwe
>


Re: 0.12.0-rc2 not building on win32

2019-01-18 Thread Wes McKinney
Can we open a JIRA issue about the 32-bit MSYS2 build?

Thanks

On Sun, Jan 13, 2019 at 2:03 PM Wes McKinney  wrote:
>
> Understood. My sense from the other email thread was that R is definitely
> going to miss 0.12. That will give you all around 8-10 weeks to sort out the
> build and packaging questions before the 0.13 time frame.
>
> On Sun, Jan 13, 2019, 3:52 PM Jeroen Ooms wrote:
>> On Sun, Jan 13, 2019 at 5:41 PM Wes McKinney  wrote:
>> >
>> > Thanks Jeroen (you can reply on the vote thread about this next time
>> > there's an issue with an RC).
>> >
>> > Unfortunately I don't think it would be a good idea to block the release
>> > over this. We don't have CI for this build configuration, so we should try
>> > to fill this gap for the 0.13 release
>>
>> OK. FYI: 32-bit Windows systems are still quite common, and win32
>> support is a prerequisite for CRAN packages.


[jira] [Created] (ARROW-4288) Installation instructions don't work on Ubuntu 18.04

2019-01-18 Thread Kirill Müller (JIRA)
Kirill Müller created ARROW-4288:


 Summary: Installation instructions don't work on Ubuntu 18.04
 Key: ARROW-4288
 URL: https://issues.apache.org/jira/browse/ARROW-4288
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Kirill Müller


The R package seems to require linking statically to Boost. One way to achieve 
this on Ubuntu is to use the vendored Boost.

See also ARROW-4286 which discusses namespacing Boost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4287) [C++] Ensure minimal bison version on OSX for Thrift

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4287:
--

 Summary: [C++] Ensure minimal bison version on OSX for Thrift
 Key: ARROW-4287
 URL: https://issues.apache.org/jira/browse/ARROW-4287
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0


Thrift currently just uses the first bison it finds, but it actually needs a 
newer one. We should look for the minimal required version and fall back 
explicitly to Homebrew, using the newer version if it is available there.

Note: I'll add a fix in our CMake toolchain but will also try to upstream this 
to Thrift.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4286) [C++/R] Namespace vendored Boost

2019-01-18 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4286:
--

 Summary: [C++/R] Namespace vendored Boost
 Key: ARROW-4286
 URL: https://issues.apache.org/jira/browse/ARROW-4286
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Packaging, R
Reporter: Uwe L. Korn
 Fix For: 0.13.0


For R, we vendor Boost and thus also include its symbols privately in our 
modules. While they are private, some things like virtual destructors can 
still interfere with other packages that vendor Boost. We should namespace the 
vendored Boost as we do in the manylinux1 packaging: 
https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Benchmarking dashboard proposal

2019-01-18 Thread Uwe L. Korn
Hello,

note that we have (had?) the Python benchmarks continuously running and 
reported at https://pandas.pydata.org/speed/arrow/. It seems this stopped in 
July 2018.

Uwe



Re: Benchmarking dashboard proposal

2019-01-18 Thread Antoine Pitrou


Hi Areg,

That sounds like a good idea to me. Note our benchmarks are currently
scattered across the various implementations. The two that I know of:

- the C++ benchmarks are standalone executables created using the Google
Benchmark library, aptly named "*-benchmark" (or "*-benchmark.exe" on
Windows)
- the Python benchmarks use the ASV utility:
https://github.com/apache/arrow/blob/master/docs/source/python/benchmarks.rst

There may be more in the other implementations.
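
A minimal sketch of what one of those standalone C++ benchmark executables
looks like with the Google Benchmark library (illustrative only, not an
actual Arrow benchmark):

#include <cstdint>
#include <numeric>
#include <vector>

#include <benchmark/benchmark.h>

// Times summing a large int64 vector; the input size comes from ->Arg().
static void BM_SumInt64(benchmark::State& state) {
  std::vector<int64_t> values(state.range(0), 1);
  for (auto _ : state) {
    int64_t sum = std::accumulate(values.begin(), values.end(), int64_t(0));
    benchmark::DoNotOptimize(sum);
  }
  state.SetBytesProcessed(state.iterations() * values.size() * sizeof(int64_t));
}
BENCHMARK(BM_SumInt64)->Arg(1 << 20);

BENCHMARK_MAIN();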

Regards

Antoine.


On 18/01/2019 at 07:13, Melik-Adamyan, Areg wrote:
> Hello,
> 
> I want to restart/attach to the discussions for creating an Arrow 
> benchmarking dashboard. I propose a per-commit performance benchmark run to 
> track the changes.
> The proposal includes building infrastructure for per-commit tracking 
> comprising of the following parts:
> - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build system 
> - Agents running in cloud both VM/container (DigitalOcean, or others) and 
> bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> - JFrog artifactory storage and management for OSS projects 
> https://jfrog.com/open-source/#artifactory2 
> - Codespeed as a frontend https://github.com/tobami/codespeed 
> 
> I am volunteering to build such a system (if needed, more Intel folks will 
> be involved) so we can start tracking performance on various platforms and 
> understand how changes affect it.
> 
> Please, let me know your thoughts!
> 
> Thanks,
> -Areg.
> 
> 
>