arrow git commit: ARROW-60: [C++] Struct type builder API

2016-06-07 Thread wesm
rimitive.h" +#include "arrow/types/struct.h" +#include "arrow/types/test-common.h" +#include "arrow/util/status.h" using std::shared_ptr; using std::string; @@ -52,4 +61,327 @@ TEST(TestStructType, Basics) { // TODO(wesm): out of bounds for field(...) } +void Va

arrow git commit: ARROW-211: [Format] Fixed typos in layout examples

2016-06-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 65740950c -> ce2fe7a78 ARROW-211: [Format] Fixed typos in layout examples Just a few typo fixes according to the ticket. Author: Smyatkin Maxim Closes #86 from Smyatkin-Maxim/ARROW-211 and squashes the following

arrow git commit: ARROW-223: Do not link against libpython

2016-06-21 Thread wesm
Repository: arrow Updated Branches: refs/heads/master a3e3849cd -> f7ade7bfe ARROW-223: Do not link against libpython Author: Uwe L. Korn Closes #95 from xhochy/arrow-223 and squashes the following commits: 4fdf1e7 [Uwe L. Korn] ARROW-223: Do not link against libpython

arrow git commit: ARROW-42: Add Python tests to Travis CI build

2016-03-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master e822ea758 -> 83675273b ARROW-42: Add Python tests to Travis CI build Author: Wes McKinney <w...@apache.org> Closes #22 from wesm/ARROW-42 and squashes the following commits: 3b056a1 [Wes McKinney] Modularize Travis CI buil

[1/2] arrow git commit: ARROW-54: [Python] Rename package to "pyarrow"

2016-03-09 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 83675273b -> 6fdcd4943 http://git-wip-us.apache.org/repos/asf/arrow/blob/6fdcd494/python/pyarrow/includes/libarrow.pxd -- diff --git a/python/pyarrow/includes/libarrow.pxd

arrow git commit: ARROW-68: Better error handling for not fully setup systems

2016-03-19 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 5881aacef -> c99661069 ARROW-68: Better error handling for not fully setup systems Author: Micah Kornfield Closes #27 from emkornfield/emk_add_nice_errors_PR and squashes the following commits: c0b9d78 [Micah

arrow git commit: ARROW-55: [Python] Fix unit tests in 2.7

2016-03-19 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 6fdcd4943 -> 883c62bdd ARROW-55: [Python] Fix unit tests in 2.7 Fixing the #define check for Python 2 makes all unit tests pass in Python 2.7. Author: Dan Robinson Closes #25 from danrobinson/ARROW-55 and

arrow git commit: ARROW-70: Add adapt 'lite' DCHECK macros from Kudu as also used in Parquet

2016-03-23 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 65db0da80 -> a4002c6e2 ARROW-70: Add adapt 'lite' DCHECK macros from Kudu as also used in Parquet Also added a null pointer DCHECK to show that it works. cc @emkornfield Author: Wes McKinney <w...@apache.org> Closes #33 from w

[3/3] arrow git commit: ARROW-67: C++ metadata flatbuffer serialization and data movement to memory maps

2016-03-22 Thread wesm
and consolidation as part of this. For example, List types are now internally equivalent to a nested type with 1 named child field (versus a struct, which can have any number of child fields). Associated JIRAs: ARROW-48, ARROW-57, ARROW-58 Author: Wes McKinney <w...@apache.org> Closes #28 from we

[2/3] arrow git commit: ARROW-67: C++ metadata flatbuffer serialization and data movement to memory maps

2016-03-22 Thread wesm
// The buffer is prefixed by its size as int32_t + const uint8_t* fb_head = buffer->data() + sizeof(int32_t); + const flatbuf::Message* message = flatbuf::GetMessage(fb_head); + + // TODO(wesm): verify message + result->impl_.reset(new Impl(buffer, message)); + *out = result; +

arrow git commit: ARROW-22: [C++] Convert flat Parquet schemas to Arrow schemas

2016-03-26 Thread wesm
dence between repetition and definition levels so that the right null bits can be set easily during reassembly. Closes #37. Closes #38. Closes #39 Author: Wes McKinney <w...@apache.org> Author: Uwe L. Korn <uw...@xhochy.com> Closes #41 from wesm/ARROW-22 and squashes the followi

arrow git commit: ARROW-44: Python: prototype object model for array slot values ("scalars")

2016-03-07 Thread wesm
rr[2]) Out[10]: 0 In [11]: arr.type Out[11]: DataType(list) ``` Author: Wes McKinney <w...@apache.org> Closes #20 from wesm/ARROW-44 and squashes the following commits: df06ba1 [Wes McKinney] Add tests for scalars proxying implemented Python list type conversions, fix associated bugs 20

arrow git commit: ARROW-20: Add null_count_ member to array containers, remove nullable_ member

2016-03-03 Thread wesm
gorithms code. If it is deemed useful we can validate (cheaply) that physical data meets the metadata requirements (e.g. non-nullable type metadata cannot be associated with data containers having nulls). Author: Wes McKinney <w...@apache.org> Closes #9 from wesm/ARROW-20 and squashes th

arrow git commit: ARROW-36: Remove fixVersions from JIRA resolve code path

2016-03-03 Thread wesm
<w...@apache.org> Closes #11 from wesm/ARROW-36 and squashes the following commits: 432c17c [Wes McKinney] Remove fixVersions from JIRA resolve code path Project: http://git-wip-us.apache.org/repos/asf/arrow/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/1000d110 Tree: http:

arrow git commit: ARROW-9: Rename some unchanged "Drill" to "Arrow" (follow-up)

2016-03-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 8caa28726 -> 571343bbe ARROW-9: Rename some unchanged "Drill" to "Arrow" (follow-up) https://issues.apache.org/jira/browse/ARROW-9 One remaining occurrence of "Drill" at `ValueVector` is renamed to "Arrow", and minor typos are fixed. Author:

arrow git commit: ARROW-35: Add a short call-to-action in the top level README.md

2016-03-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 572cdf22e -> 8caa28726 ARROW-35: Add a short call-to-action in the top level README.md Author: Wes McKinney <w...@apache.org> Closes #13 from wesm/ARROW-35 and squashes the following commits: e10bfc3 [Wes McKinney] Add a prop

arrow git commit: ARROW-23: Add a logical Column data structure

2016-03-04 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 3b777c7f4 -> 9c2b95446 ARROW-23: Add a logical Column data structure I also added global const instances of common primitive types Author: Wes McKinney <w...@apache.org> Closes #15 from wesm/ARROW-23 and squashes the followin

arrow git commit: ARROW-24: C++: Implement a logical Table container type

2016-03-04 Thread wesm
n the road. Author: Wes McKinney <w...@apache.org> Closes #16 from wesm/ARROW-24 and squashes the following commits: b701c76 [Wes McKinney] Test case for wrong number of columns passed 5faa5ac [Wes McKinney] cpplint 9a651cb [Wes McKinney] Basic table prototype. Move Schema code unde

arrow git commit: ARROW-43: Python: format array values to in __repr__ for interactive computing

2016-03-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master ae95dbd18 -> 45cd9fd8d ARROW-43: Python: format array values to in __repr__ for interactive computing Author: Wes McKinney <w...@apache.org> Closes #21 from wesm/ARROW-43 and squashes the following commits: dee6ba2 [Wes McKinn

arrow git commit: ARROW-90: [C++] Check for SIMD instruction set support

2016-03-31 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 6d31d5928 -> 79fddd113 ARROW-90: [C++] Check for SIMD instruction set support This also adds an option to disable the usage of a specific instruction set, e.g. you compile on a machine that supports SSE3 but you want to use the binary also

arrow git commit: ARROW-88: [C++] Refactor usages of parquet_cpp namespace

2016-03-28 Thread wesm
loses #49 from wesm/ARROW-88 and squashes the following commits: c4d81dc [Wes McKinney] Refactor usages of parquet_cpp namespace Project: http://git-wip-us.apache.org/repos/asf/arrow/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/df7726d4 Tree: http://git-wip-us.apache.org/repos/

arrow git commit: ARROW-193: typos "int his" fix to "in this"

2016-05-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master c9ffe546b -> 1f04f7ff9 ARROW-193: typos "int his" fix to "in this" Project: http://git-wip-us.apache.org/repos/asf/arrow/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/1f04f7ff Tree:

arrow git commit: ARROW-199: [C++] Refine third party dependency

2016-05-14 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 68b80a838 -> 6968ec01d ARROW-199: [C++] Refine third party dependency To generate the makefile, running download_thirdparty.sh and build_thirdparty.sh is not enough; sourcing setup_build_env.sh is also necessary, since FLATBUFFERS_HOME must be set.

arrow git commit: ARROW-204: Add Travis CI builds that post conda artifacts for Linux and OS X

2016-05-18 Thread wesm
ing issues won't fail the build. Author: Wes McKinney <w...@apache.org> Closes #79 from wesm/ARROW-204 and squashes the following commits: afd0582 [Wes McKinney] Change encrypted token to apache/arrow, only upload on commits to master 58955e5 [Wes McKinney] Draft of automated conda builds for

arrow git commit: ARROW-201: [C++] Initial ParquetWriter implementation

2016-05-18 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 978de1a94 -> e0fb3698e ARROW-201: [C++] Initial ParquetWriter implementation Author: Uwe L. Korn Closes #78 from xhochy/arrow-201 and squashes the following commits: 5d95099 [Uwe L. Korn] Add check for flat column

arrow git commit: ARROW-103: Add files to gitignore

2016-04-17 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 37f727168 -> 5843e6872 ARROW-103: Add files to gitignore Patches [ARROW-103](https://issues.apache.org/jira/browse/ARROW-103), though perhaps it would make sense to leave that issue open to cover any future .gitignore-related pull

arrow git commit: ARROW-523: Python: Account for changes in PARQUET-834

2017-02-02 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 0ae4d86e5 -> c05292faf ARROW-523: Python: Account for changes in PARQUET-834 Author: Uwe L. Korn Closes #313 from xhochy/ARROW-523 and squashes the following commits: ff699ea [Uwe L. Korn] Use relative import e36dcc8

arrow git commit: ARROW-477: [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer

2017-02-03 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 720d422fa -> 08f38d979 ARROW-477: [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer Changes include: - add support for TimeStamp data type with second/microsecond/nanosecond time units - add

arrow git commit: ARROW-410: [C++] Add virtual Writeable::Flush

2017-01-31 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 7ac320bde -> be5d73f2c ARROW-410: [C++] Add virtual Writeable::Flush Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #310 from wesm/ARROW-410 and squashes the following commits: 7352f0a [Wes McKinney] Add virtual Writeab

arrow git commit: ARROW-381: [C++] Simplify primitive array type builders to use a default type singleton

2017-02-04 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 5b35d6bda -> 84f16624b ARROW-381: [C++] Simplify primitive array type builders to use a default type singleton Author: Uwe L. Korn Closes #316 from xhochy/ARROW-381 and squashes the following commits: 7061d9a [Uwe L.

arrow git commit: ARROW-527: Remove drill-module.conf file

2017-02-04 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 84f16624b -> c45c3b3e1 ARROW-527: Remove drill-module.conf file Remove drill-module.conf file as it is not used by the project. Author: Laurent Goujon Closes #318 from laurentgo/laurent/ARROW-527 and squashes the

arrow git commit: ARROW-531: Python: Document jemalloc, extend Pandas section, add Getting Involved

2017-02-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 4c3481ea5 -> e97fbe640 ARROW-531: Python: Document jemalloc, extend Pandas section, add Getting Involved Author: Uwe L. Korn Closes #321 from xhochy/ARROW-531 and squashes the following commits: 55da9dc [Uwe L. Korn]

arrow git commit: ARROW-535: [Python] Add type mapping for NPY_LONGLONG

2017-02-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master f268e927a -> 4c3481ea5 ARROW-535: [Python] Add type mapping for NPY_LONGLONG Based on https://github.com/wesm/feather/pull/107 Author: Uwe L. Korn <uw...@xhochy.com> Closes #323 from xhochy/ARROW-535 and squashes the followin

arrow git commit: ARROW-351: Time type has no unit

2017-02-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 1407abfc9 -> b99d049c3 ARROW-351: Time type has no unit Author: Julien Le Dem Closes #328 from julienledem/arrow_351 and squashes the following commits: 2497ee3 [Julien Le Dem] ARROW-351: Time type has no unit

arrow git commit: ARROW-366 Java Dictionary Vector

2017-02-07 Thread wesm
Repository: arrow Updated Branches: refs/heads/master e97fbe640 -> c322cbf22 ARROW-366 Java Dictionary Vector I've added a dictionary type, and a partial implementation of a dictionary vector that just wraps an index vector and has a reference to a lookup vector. The spec seems to indicate

arrow git commit: ARROW-538: [C++] Set up AddressSanitizer (ASAN) builds

2017-02-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 4440e4011 -> 0bdfd5efb ARROW-538: [C++] Set up AddressSanitizer (ASAN) builds Most of the infrastructure was already in place, only needed to fix the gtest build. We will now build with AddressSanitizer activated on OSX. Author: Uwe L.

arrow git commit: ARROW-543: C++: Lazily computed null_counts counts number of non-null entries

2017-02-08 Thread wesm
Repository: arrow Updated Branches: refs/heads/master b99d049c3 -> 4440e4011 ARROW-543: C++: Lazily computed null_counts counts number of non-null entries Author: Uwe L. Korn Closes #329 from xhochy/ARROW-543 and squashes the following commits: 191792b [Uwe L. Korn]

arrow git commit: ARROW-529: Python: Add jemalloc and Python 3.6 to manylinux1 build

2017-02-05 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 70c05be21 -> 5bee596ca ARROW-529: Python: Add jemalloc and Python 3.6 to manylinux1 build Author: Uwe L. Korn Closes #319 from xhochy/ARROW-529 and squashes the following commits: 48893a2 [Uwe L. Korn] ARROW-529:

[1/3] arrow git commit: ARROW-33: [C++] Implement zero-copy array slicing, integrate with IPC code paths

2017-02-06 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 74bc4dd48 -> 5439b7158 http://git-wip-us.apache.org/repos/asf/arrow/blob/5439b715/python/src/pyarrow/adapters/pandas.cc -- diff --git

[2/3] arrow git commit: ARROW-33: [C++] Implement zero-copy array slicing, integrate with IPC code paths

2017-02-06 Thread wesm
); + const int right_abs_index = o_i + right.offset(); + // TODO(wesm): really we should be comparing stretches of non-null data // rather than looking at one value at a time. if (union_mode == UnionMode::SPARSE) { -if (!left.child(child_num)->RangeEq

[3/3] arrow git commit: ARROW-33: [C++] Implement zero-copy array slicing, integrate with IPC code paths

2017-02-06 Thread wesm
to do to polish things up Closes #56. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #322 from wesm/ARROW-33 and squashes the following commits: 61afe42 [Wes McKinney] Some API cleaning in builder.h 86511a3 [Wes McKinney] Python fixes, clang warning fixes 9a00870 [Wes McKinney

arrow git commit: ARROW-525: Python: Add more documentation to the package

2017-02-04 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 08f38d979 -> e881f1155 ARROW-525: Python: Add more documentation to the package Author: Uwe L. Korn Closes #317 from xhochy/ARROW-525 and squashes the following commits: d213e63 [Uwe L. Korn] ARROW-525: Python: Add

arrow git commit: ARROW-457: Python: Better control over memory pool

2017-02-04 Thread wesm
Repository: arrow Updated Branches: refs/heads/master e881f1155 -> 5b35d6bda ARROW-457: Python: Better control over memory pool Author: Uwe L. Korn Closes #315 from xhochy/ARROW-457 and squashes the following commits: dc5abdb [Uwe L. Korn] Use aligned deallocator 20c8505

arrow git commit: ARROW-505: [C++] Fix compiler warning in gcc in release mode

2017-01-22 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 5888e10cf -> 5a161ebc1 ARROW-505: [C++] Fix compiler warning in gcc in release mode Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #294 from wesm/fix-release-compile-warning and squashes the following commits: 418

arrow git commit: ARROW-495: [C++] Implement streaming binary format, refactoring

2017-01-21 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 8ca7033fc -> 5888e10cf ARROW-495: [C++] Implement streaming binary format, refactoring cc @nongli Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #293 from wesm/ARROW-495 and squashes the following commits: 279583b [Wes

arrow git commit: ARROW-494: [C++] Extend lifetime of memory mapped data if any buffers reference it

2017-01-23 Thread wesm
ory was being unmapped even if there are `arrow::Buffer` object referencing it. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #298 from wesm/ARROW-494 and squashes the following commits: 60222e3 [Wes McKinney] clang-format 2960d17 [Wes McKinney] Add C++ unit test d7d776a [Wes McKi

arrow git commit: ARROW-506: Java: Implement echo server for integration testing.

2017-01-23 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 69cdbd8ce -> c327b5fd2 ARROW-506: Java: Implement echo server for integration testing. While implementing this, it became clear it made sense for the stream writer to have an API to indicate EOS without closing the stream. The current

arrow git commit: ARROW-508: [C++] Add basic threadsafety to normal files and memory maps

2017-01-23 Thread wesm
ion in esoteric circumstances. I'm going to report a bug to change these to `ReadAt` which can be more easily made threadsafe Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #300 from wesm/ARROW-508 and squashes the following commits: e57156c [Wes McKinney] Make base ReadableFile

arrow git commit: ARROW-81: [Format] Augment dictionary encoding metadata to accommodate additional use cases

2017-01-23 Thread wesm
ort, and in general for statistical computing applications. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #297 from wesm/ARROW-81 and squashes the following commits: c960bac [Wes McKinney] Augment dictionary encoding metadata to accommodate additional use cases Project: http:

arrow git commit: ARROW-378: Python: Respect timezone on conversion of Pandas datetime columns

2017-01-23 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 085c8754b -> c90ca60c1 ARROW-378: Python: Respect timezone on conversion of Pandas datetime columns Arrow is now aware of pandas datetime timezones. Author: ahnj Closes #287 from ahnj/timestamp-aware and squashes the

arrow git commit: ARROW-512: C++: Add method to check for primitive types

2017-01-26 Thread wesm
Repository: arrow Updated Branches: refs/heads/master a68af9d16 -> a90b5f363 ARROW-512: C++: Add method to check for primitive types Also includes some documentation updates. Author: Uwe L. Korn Closes #304 from xhochy/ARROW-512 and squashes the following commits:

arrow git commit: ARROW-514: [Python] Automatically wrap pyarrow.io.Buffer in BufferReader

2017-01-26 Thread wesm
Repository: arrow Updated Branches: refs/heads/master aac2e70c1 -> 30bb0d97d ARROW-514: [Python] Automatically wrap pyarrow.io.Buffer in BufferReader Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #306 from wesm/ARROW-514 and squashes the following commits: d5e3235 [Wes

arrow git commit: ARROW-519: [C++] Refactor array comparison code into a compare.h / compare.cc in part to resolve Xcode 6.1 linker issue

2017-01-29 Thread wesm
arrays not equal" per ARROW-517 Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #308 from wesm/ARROW-519 and squashes the following commits: 85b0bf8 [Wes McKinney] Fix invalid memory access when doing RangeEquals on BinaryArray with all empty strings f5f4593 [Wes McKinney] Rem

arrow git commit: ARROW-498 [C++] Add command line utilities that convert between stream and file.

2017-01-25 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 61a54f8a6 -> a68af9d16 ARROW-498 [C++] Add command line utilities that convert between stream and file. These are in the style of unix utilities using stdin/stdout for argument passing. This makes it easy to chain them together and I

arrow git commit: ARROW-563: Support non-standard gcc version strings

2017-02-20 Thread wesm
Repository: arrow Updated Branches: refs/heads/master ab15e01c7 -> ef6b46557 ARROW-563: Support non-standard gcc version strings Author: Uwe L. Korn Closes #343 from xhochy/ARROW-563 and squashes the following commits: 64d1c93 [Uwe L. Korn] ARROW-563: Support non-standard

[2/2] arrow git commit: ARROW-459: [C++] Dictionary IPC support in file and stream formats

2017-02-24 Thread wesm
ARROW-459: [C++] Dictionary IPC support in file and stream formats Also fixes ARROW-565 Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #347 from wesm/ARROW-459 and squashes the following commits: 6a987b7 [Wes McKinney] Fix clang warning with forward declaration 8e0e6fb [Wes Mc

arrow git commit: ARROW-580: C++: Also provide jemalloc_X targets if only a static or shared version is found

2017-02-25 Thread wesm
Repository: arrow Updated Branches: refs/heads/master d28f1c1e0 -> 89dc55789 ARROW-580: C++: Also provide jemalloc_X targets if only a static or shared version is found Author: Uwe L. Korn Closes #349 from xhochy/ARROW-580 and squashes the following commits: 6cdeef2 [Uwe

arrow git commit: ARROW-578: [C++] Add -DARROW_CXXFLAGS=... option to make CMake more consistent

2017-02-25 Thread wesm
ork properly in our Travis CI setup, so go figure. Some Google searches seem to confirm this is a known issue, and having a specific "user flags" option is a way around it. We just did the same thing in parquet-cpp. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #348

arrow git commit: ARROW-558: Add KEYS files

2017-02-14 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 69cf69238 -> d50f1525a ARROW-558: Add KEYS files Author: Uwe L. Korn Closes #341 from xhochy/ARROW-558 and squashes the following commits: ea5327b [Uwe L. Korn] ARROW-558: Add KEYS files Project:

arrow git commit: ARROW-544: [C++] Test writing zero-length record batches, zero-length BinaryArray fixes

2017-02-10 Thread wesm
ges to verify. cc @BryanCutler Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #333 from wesm/ARROW-544 and squashes the following commits: f80d58f [Wes McKinney] Protect zero-length record batches from incomplete buffer metadata f876dce [Wes McKinney] Test with null value_offsets to

arrow git commit: ARROW-561:[JAVA][PYTHON] Update java & python dependencies to improve downstream packaging experience

2017-02-15 Thread wesm
Repository: arrow Updated Branches: refs/heads/master d50f1525a -> fa8d27f31 ARROW-561:[JAVA][PYTHON] Update java & python dependencies to improve downstream packaging experience The current build for arrow uses an interesting workaround for the hamcrest conflict between JUnit and Mockito which

arrow git commit: ARROW-553: C++: Faster valid bitmap building

2017-02-13 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 1f26040f5 -> ad0157547 ARROW-553: C++: Faster valid bitmap building Author: Uwe L. Korn Closes #338 from xhochy/ARROW-553 and squashes the following commits: 1c1ee3d [Uwe L. Korn] ARROW-553: C++: Faster valid bitmap

arrow git commit: ARROW-547: [Python] Add zero-copy slice methods to Array, RecordBatch

2017-02-13 Thread wesm
Repository: arrow Updated Branches: refs/heads/master ad0157547 -> 66f650cd3 ARROW-547: [Python] Add zero-copy slice methods to Array, RecordBatch Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #336 from wesm/ARROW-547 and squashes the following commits: 42037c2 [Wes

arrow git commit: ARROW-521: [C++] Track peak allocations in default memory pool

2017-02-09 Thread wesm
; Closes #330 from wesm/ARROW-521 and squashes the following commits: 10531c4 [Wes McKinney] Move max_memory_ member to DefaultMemoryPool, add default virtual max_memory() to MemoryPool a0d134d [Wes McKinney] Add max_memory() method to MemoryPool, leave implementation to subclasses Project: h

arrow git commit: ARROW-509: [Python] Add support for multithreaded Parquet reads

2017-01-24 Thread wesm
ble = pq.read_table('/home/wesm/data/airlines_parquet/4345e5eef217aa1b-c8f16177f35fd983_1150363067_data.0.parq') CPU times: user 8.21 s, sys: 468 ms, total: 8.68 s Wall time: 8.68 s In [3]: %time table = pq.read_table('/home/wesm/data/airlines_parquet/4345e5eef217aa1b-c8f16177f35fd983_1150363067_data.0.p

arrow git commit: ARROW-468: Python: Conversion of nested data in pd.DataFrames

2017-01-18 Thread wesm
eck(obj); +#endif +} + +static inline bool PyObject_is_bool(const PyObject* obj) { +#if PY_MAJOR_VERSION >= 3 + return PyString_Check(obj) || PyBytes_Check(obj); +#else + return PyString_Check(obj) || PyUnicode_Check(obj); +#endif +} + +template +static int64_t ValuesToBitmap(const void* data, int64_t l

[2/3] arrow git commit: ARROW-461: [Python] Add Python interfaces to DictionaryArray data, pandas interop

2017-01-19 Thread wesm
lock; + for (int c = 0; c < data.num_chunks(); c++) { +auto arr = static_cast<ArrayType*>(data.chunk(c).get()); -// Returns null count -static int64_t MaskToBitmap(PyArrayObject* mask, int64_t length, uint8_t* bitmap) { - int64_t null_count = 0; - const uint8_t* mask_values =

[3/3] arrow git commit: ARROW-461: [Python] Add Python interfaces to DictionaryArray data, pandas interop

2017-01-19 Thread wesm
ARROW-461: [Python] Add Python interfaces to DictionaryArray data, pandas interop Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #291 from wesm/ARROW-461 and squashes the following commits: b3efe96 [Wes McKinney] Fix cpp unit test, code review comments 285f863 [Wes McKinney]

[1/3] arrow git commit: ARROW-461: [Python] Add Python interfaces to DictionaryArray data, pandas interop

2017-01-19 Thread wesm
State_STATE state_; - DISALLOW_COPY_AND_ASSIGN(PyGILGuard); -}; - // TODO(wesm): We can just let errors pass through. To be explored later #define RETURN_IF_PYERROR() \ if (PyErr_Occurred()) { \ @@ -88,8 +91,9 @@ class PyGILGuard { PyObjectStringify str

arrow git commit: ARROW-490: Python: Update manylinux1 build scripts

2017-01-17 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 850774efe -> b1472305c ARROW-490: Python: Update manylinux1 build scripts Through the usage of the ExternalProject command, a lot has become much simpler. Author: Uwe L. Korn Closes #290 from xhochy/ARROW-490 and

arrow git commit: ARROW-484: Revise README to include more detail about software components

2017-01-16 Thread wesm
Repository: arrow Updated Branches: refs/heads/master a098fd04f -> 850774efe ARROW-484: Revise README to include more detail about software components Also closes #14. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #286 from wesm/ARROW-484 and squashes the followin

arrow git commit: ARROW-486: [C++] Use virtual inheritance for diamond inheritance

2017-01-15 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 47115aa3e -> a098fd04f ARROW-486: [C++] Use virtual inheritance for diamond inheritance arrow::io::ReadWriteFileInterface inherits from arrow::io::FileInterface in diamond style via: * ReadableFileInterface -> InputStream ->

arrow git commit: ARROW-451: [C++] Implement DataType::Equals as TypeVisitor. Add default implementations for TypeVisitor, ArrayVisitor methods

2017-02-26 Thread wesm
ere not having their `unit` metadata compared due to an oversight. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #350 from wesm/ARROW-451 and squashes the following commits: 97e75d8 [Wes McKinney] Export ArrayVisitor, TypeVisitor symbols a3332be [Wes McKinney] Typo 635e74d [Wes

arrow git commit: ARROW-577: [C++] Use private implementation pattern in ipc::StreamWriter and ipc::FileWriter

2017-02-26 Thread wesm
ers/compilation units. I also moved the stream-to-file and file-to-stream executables to arrow/ipc Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #351 from wesm/ARROW-577 and squashes the following commits: 98c32d2 [Wes McKinney] Only build file/stream utils if ARROW_BUILD_UTILITIES is o

arrow git commit: ARROW-284: Disable arrow_parquet module in Travis CI to triage builds

2016-09-06 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 2d8ec7893 -> 637584bec ARROW-284: Disable arrow_parquet module in Travis CI to triage builds Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #132 from wesm/ARROW-284 and squashes the following commits: e3410cf [Wes

arrow git commit: ARROW-283: [C++] Account for upstream changes in parquet-cpp

2016-09-06 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 637584bec -> 214b861ae ARROW-283: [C++] Account for upstream changes in parquet-cpp Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #131 from wesm/ARROW-283 and squashes the following commits: 52dfb28 [Wes McKinne

arrow git commit: ARROW-361: Python: Support reading a column-selection from Parquet files

2016-11-06 Thread wesm
Repository: arrow Updated Branches: refs/heads/master e8bc1fe3b -> 121e82682 ARROW-361: Python: Support reading a column-selection from Parquet files Author: Uwe L. Korn Closes #197 from xhochy/ARROW-361 and squashes the following commits: c1fb939 [Uwe L. Korn] Cache

arrow git commit: ARROW-327: [Python] Remove conda builds from Travis CI setup

2016-10-17 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 676c32cce -> e2c0a1831 ARROW-327: [Python] Remove conda builds from Travis CI setup We'll do these builds in conda-forge Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #178 from wesm/ARROW-327 and squashes the followin

arrow git commit: ARROW-356: Add documentation about reading Parquet

2016-11-11 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 4fa7ac4f6 -> 7f048a4b8 ARROW-356: Add documentation about reading Parquet Assumes #192. Author: Uwe L. Korn Closes #193 from xhochy/ARROW-356 and squashes the following commits: 530484f [Uwe L. Korn] Mention new setup

arrow git commit: ARROW-383: [C++] Integration testing CLI tool

2016-11-21 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 997f502ce -> f082b1732 ARROW-383: [C++] Integration testing CLI tool Modeled after Java version in ARROW-367 Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #209 from wesm/ARROW-383 and squashes the following commits:

[2/2] arrow git commit: ARROW-363: [Java/C++] integration testing harness, initial integration tests

2016-11-28 Thread wesm
ARROW-363: [Java/C++] integration testing harness, initial integration tests This also includes format reconciliation as discussed in ARROW-384. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #211 from wesm/ARROW-363 and squashes the following commits: 6982c3c [Wes McKinney]

arrow git commit: ARROW-371: Handle pandas-nullable types correctly

2016-11-16 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 48f9780a8 -> 78288b5fc ARROW-371: Handle pandas-nullable types correctly Author: Uwe L. Korn Closes #205 from xhochy/ARROW-371 and squashes the following commits: 1f73e8b [Uwe L. Korn] ARROW-371: Handle pandas-nullable

arrow git commit: ARROW-367: converter json <=> Arrow file format for Integration tests

2016-11-18 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 78288b5fc -> 841709627 ARROW-367: converter json <=> Arrow file format for Integration tests Author: Julien Le Dem Closes #203 from julienledem/integration and squashes the following commits: b3cd326 [Julien Le Dem]

arrow git commit: ARROW-323: [Python] Opt-in to pyarrow.parquet extension rather than attempting and failing silently

2016-11-03 Thread wesm
ing through an option to CMake Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #194 from wesm/ARROW-323 and squashes the following commits: 07c05cc [Wes McKinney] Update readme to illustrate proper use of with build_ext 3bd9a8d [Wes McKinney] Add --with-parquet option to setup.py 374

arrow git commit: ARROW-357: Use a single RowGroup for Parquet files as default.

2016-11-02 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 2a059bd27 -> 17c9ae7c4 ARROW-357: Use a single RowGroup for Parquet files as default. This is not the optimal choice, we should rather have an option to optimise for the underlying block size of the filesystem but without the

arrow git commit: ARROW-359: Document ARROW_LIBHDFS_DIR

2016-11-02 Thread wesm
Repository: arrow Updated Branches: refs/heads/master e70d97dbc -> 2a059bd27 ARROW-359: Document ARROW_LIBHDFS_DIR Author: Christopher C. Aycock Closes #196 from chrisaycock/ARROW-359 and squashes the following commits: 52ec78e [Christopher C. Aycock]

arrow git commit: ARROW-354: Fix comparison of arrays of empty strings

2016-10-29 Thread wesm
Repository: arrow Updated Branches: refs/heads/master da24c1a0a -> d946e7917 ARROW-354: Fix comparison of arrays of empty strings Author: Uwe L. Korn Closes #189 from xhochy/ARROW-354 and squashes the following commits: 8f75d78 [Uwe L. Korn] ARROW-354: Fix comparison of

arrow git commit: ARROW-348: [Python] Add build-type command line option to setup.py, build CMake extensions in a build type subdirectory

2016-11-01 Thread wesm
oses #187 from wesm/ARROW-348 and squashes the following commits: 3cdaeaf [Wes McKinney] Cast build_type to lowercase in case env variable is uppercase 74bfa71 [Wes McKinney] Pull default build type from environment variable d0b3154 [Wes McKinney] Tweak readme 6017948 [Wes McKinney] Add built-typ

arrow git commit: ARROW-355: Add tests for serialising arrays of empty strings to Parquet

2016-11-01 Thread wesm
Repository: arrow Updated Branches: refs/heads/master d4148759a -> c7db80e72 ARROW-355: Add tests for serialising arrays of empty strings to Parquet Depends on https://issues.apache.org/jira/browse/PARQUET-759 Author: Uwe L. Korn Closes #190 from xhochy/ARROW-355 and

arrow git commit: ARROW-332: Add RecordBatch.to_pandas method

2016-10-11 Thread wesm
Repository: arrow Updated Branches: refs/heads/master caa843bda -> 3919a2778 ARROW-332: Add RecordBatch.to_pandas method This makes testing and IPC data wrangling a little easier. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #165 from wesm/ARROW-332 and squashes the

arrow git commit: ARROW-312: Read and write Arrow IPC file format from Python

2016-10-10 Thread wesm
Wes McKinney <wes.mckin...@twosigma.com> Closes #164 from wesm/ARROW-312 and squashes the following commits: 7df3e5f [Wes McKinney] Set BUILD_WITH_INSTALL_RPATH on arrow_ipc be8cee0 [Wes McKinney] Link Cython modules to libarrow* libraries 5716601 [Wes McKinney] Fix accidental deletion 77f

[1/2] arrow git commit: ARROW-261: Refactor String/Binary code paths to reflect unnested (non-list-based) structure

2016-10-17 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 8e8b17f99 -> 732a2059d http://git-wip-us.apache.org/repos/asf/arrow/blob/732a2059/python/src/pyarrow/adapters/pandas.h -- diff --git a/python/src/pyarrow/adapters/pandas.h

arrow git commit: ARROW-394: [Integration] Generate tests cases for numeric types, strings, lists, structs

2016-12-09 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 8995c9230 -> 45ed7e7a3 ARROW-394: [Integration] Generate tests cases for numeric types, strings, lists, structs Automatically generating testing files from Python. Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #219

arrow git commit: ARROW-400: set struct length on load

2016-12-09 Thread wesm
Repository: arrow Updated Branches: refs/heads/master d06c49144 -> 14ed1be2d ARROW-400: set struct length on load Adds unit test, closes #233 Author: Julien Le Dem <jul...@dremio.com> Author: Wes McKinney <wes.mckin...@twosigma.com> Closes #234 from wesm/ARROW-400 and squashe

[2/5] arrow git commit: ARROW-418: [C++] Array / Builder class code reorganization, flattening

2016-12-12 Thread wesm
_bitmap) -: BinaryArray(kBinary, length, offsets, data, null_count, null_bitmap) {} - -BinaryArray::BinaryArray(const TypePtr& type, int32_t length, -const std::shared_ptr& offsets, const std::shared_ptr& data, -int32_t null_count, const std::shared_ptr& null_bitma

[1/5] arrow git commit: ARROW-418: [C++] Array / Builder class code reorganization, flattening

2016-12-12 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 73fe55683 -> 2c10d7cce http://git-wip-us.apache.org/repos/asf/arrow/blob/2c10d7cc/python/src/pyarrow/common.h -- diff --git a/python/src/pyarrow/common.h

[5/5] arrow git commit: ARROW-418: [C++] Array / Builder class code reorganization, flattening

2016-12-12 Thread wesm
com> Closes #236 from wesm/ARROW-418 and squashes the following commits: 6f556ea [Wes McKinney] Add missing math.h include for clang 9dc2e22 [Wes McKinney] Fix remaining old includes 6f7ae77 [Wes McKinney] Fixes, cpplint 66ac3f7 [Wes McKinney] Promote buffer.h/status.h/memory-pool.h to top

[3/5] arrow git commit: ARROW-418: [C++] Array / Builder class code reorganization, flattening

2016-12-12 Thread wesm
/type.cc -- diff --git a/cpp/src/arrow/type.cc b/cpp/src/arrow/type.cc index 75f5086..5b172e4 100644 --- a/cpp/src/arrow/type.cc +++ b/cpp/src/arrow/type.cc @@ -20,7 +20,7 @@ #include #includ

[4/5] arrow git commit: ARROW-418: [C++] Array / Builder class code reorganization, flattening

2016-12-12 Thread wesm
ets(), values->data(), + list->null_count(), list->null_bitmap()); + return Status::OK(); +} + +// -- +// Struct + +Status StructBuilder::Finish(std::shared_ptr* out) { + std::vector<std::shared_ptr> fields(fie

arrow git commit: ARROW-423: Define BUILD_BYPRODUCTS for CMake 3.2+

2016-12-15 Thread wesm
Repository: arrow Updated Branches: refs/heads/master 935279091 -> 063c190a5 ARROW-423: Define BUILD_BYPRODUCTS for CMake 3.2+ Author: Uwe L. Korn Closes #240 from xhochy/ARROW-423 and squashes the following commits: 4c99ba2 [Uwe L. Korn] ARROW-423: Define
