[jira] [Updated] (ARROW-3729) [C++] Support for writing TIMESTAMP_NANOS Parquet metadata
[ https://issues.apache.org/jira/browse/ARROW-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3729: -- Labels: parquet pull-request-available (was: parquet) > [C++] Support for writing TIMESTAMP_NANOS Parquet metadata > -- > > Key: ARROW-3729 > URL: https://issues.apache.org/jira/browse/ARROW-3729 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: TP Boudreau >Priority: Major > Labels: parquet, pull-request-available > Fix For: 0.14.0 > > > This was brought up on the mailing list. > We also will need to do corresponding work in the parquet-cpp library to opt > in to writing nanosecond timestamps instead of casting to micro- or > millisecond. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
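For context on what this unlocks, here is a minimal Python sketch of the round trip the issue describes: writing timestamp[ns] data to Parquet without the implicit cast down to micro- or milliseconds. It assumes a pyarrow build that includes this change, and it assumes the existing `version` knob on `write_table` is the opt-in for the newer logical type; neither is confirmed by the issue itself.
{code:python}
# Sketch only: assumes TIMESTAMP_NANOS support and that Parquet format
# version '2.0' is the opt-in for writing nanosecond timestamps.
import pyarrow as pa
import pyarrow.parquet as pq

arr = pa.array([0, 1, 2], type=pa.timestamp('ns'))
table = pa.Table.from_arrays([arr], names=['ts'])
pq.write_table(table, '/tmp/nanos.parquet', version='2.0')

# With NANOS metadata written, the nanosecond unit should survive readback
# instead of coming back as timestamp[us].
print(pq.read_table('/tmp/nanos.parquet').schema)
{code}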
[jira] [Updated] (ARROW-3104) [Python] Python bindings for HiveServer2 client interface
[ https://issues.apache.org/jira/browse/ARROW-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3104: Fix Version/s: (was: 0.14.0) > [Python] Python bindings for HiveServer2 client interface > - > > Key: ARROW-3104 > URL: https://issues.apache.org/jira/browse/ARROW-3104 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: HiveServer2 > > These will be a 1-1 mapping to the current C++ classes, with support for > yielding Arrow record batches or tables -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3134) [C++] Implement n-ary iterator for a collection of chunked arrays with possibly different chunking layouts
[ https://issues.apache.org/jira/browse/ARROW-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3134: Labels: dataframe (was: ) > [C++] Implement n-ary iterator for a collection of chunked arrays with > possibly different chunking layouts > -- > > Key: ARROW-3134 > URL: https://issues.apache.org/jira/browse/ARROW-3134 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: dataframe > Fix For: 0.14.0 > > > This is a common pattern that will result in kernel invocation on chunked > arrays -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3097) [Format] Interval type is not documented
[ https://issues.apache.org/jira/browse/ARROW-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3097: Fix Version/s: (was: 0.14.0) 1.0.0 > [Format] Interval type is not documented > > > Key: ARROW-3097 > URL: https://issues.apache.org/jira/browse/ARROW-3097 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Konstantin Shaposhnikov >Priority: Major > Labels: columnar-format-1.0 > Fix For: 1.0.0 > > > All types except Interval are documented in Metadata.md. Information about > Interval is missing, in particular what is its size (64 bit?) and what is the > meaning of IntervalUnit values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3103) [C++] Conversion to Arrow record batch for HiveServer2 ColumnarRowSet
[ https://issues.apache.org/jira/browse/ARROW-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3103: Fix Version/s: (was: 0.14.0) > [C++] Conversion to Arrow record batch for HiveServer2 ColumnarRowSet > - > > Key: ARROW-3103 > URL: https://issues.apache.org/jira/browse/ARROW-3103 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: HiveServer2, database > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4515) [C++, lint] Use clang-format more efficiently in `check-format` target
[ https://issues.apache.org/jira/browse/ARROW-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4515: Fix Version/s: (was: 0.14.0)
> [C++, lint] Use clang-format more efficiently in `check-format` target
> Key: ARROW-4515
> URL: https://issues.apache.org/jira/browse/ARROW-4515
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Benjamin Kietzman
> Assignee: Benjamin Kietzman
> Priority: Minor
>
> `clang-format` supports the command line option `-output-replacements-xml`, which (in the case of no required changes) outputs:
> ```
> <?xml version='1.0'?>
> <replacements xml:space='preserve' incomplete_format='false'>
> </replacements>
> ```
> Using this option during `check-format` instead of using python to compute a diff between the formatted and on-disk files should speed up that target significantly
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
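A sketch of the faster check in Python (the language the `check-format` target already uses): run clang-format per file with `-output-replacements-xml` and flag the file iff any `<replacement>` elements come back, with no diffing of whole formatted files. The binary name and the `-style=file` choice are illustrative.
{code:python}
import subprocess
import sys

def needs_formatting(path, clang_format='clang-format'):
    # An already-formatted file yields an empty <replacements> element, so
    # the presence of any <replacement ...> entry means changes are needed.
    out = subprocess.check_output(
        [clang_format, '-style=file', '-output-replacements-xml', path])
    return b'<replacement ' in out

if __name__ == '__main__':
    bad = [p for p in sys.argv[1:] if needs_formatting(p)]
    for p in bad:
        print('needs formatting:', p)
    sys.exit(1 if bad else 0)
{code}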
[jira] [Updated] (ARROW-4534) [Rust] Build JSON reader for reading record batches from line-delimited JSON files
[ https://issues.apache.org/jira/browse/ARROW-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4534: Fix Version/s: (was: 0.14.0) > [Rust] Build JSON reader for reading record batches from line-delimited JSON > files > -- > > Key: ARROW-4534 > URL: https://issues.apache.org/jira/browse/ARROW-4534 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.12.0 >Reporter: Neville Dipale >Priority: Major > > Similar to ARROW-694, this is an umbrella issue for supporting reading JSON > line-delimited files in Arrow. > I have a reference implementation at > https://github.com/nevi-me/rust-dataframe/blob/io/json/src/io/json.rs where > I'm building a Rust-based dataframe library using Arrow. > I'd like us to have feature parity with CPP at some point. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4838) [C++] Implement safe Make constructor
[ https://issues.apache.org/jira/browse/ARROW-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4838: Fix Version/s: (was: 0.14.0) > [C++] Implement safe Make constructor > - > > Key: ARROW-4838 > URL: https://issues.apache.org/jira/browse/ARROW-4838 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Francois Saint-Jacques >Priority: Major > > The following classes need validating constructors: > * ArrayData > * ChunkedArray > * RecordBatch > * Column > * Table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4845) [C++] Compiler warnings on Windows MingW64
[ https://issues.apache.org/jira/browse/ARROW-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4845: Summary: [C++] Compiler warnings on Windows MingW64 (was: Compiler warnings on Windows)
> [C++] Compiler warnings on Windows MingW64
> Key: ARROW-4845
> URL: https://issues.apache.org/jira/browse/ARROW-4845
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 0.12.1
> Reporter: Jeroen
> Priority: Major
> Fix For: 0.14.0
>
> I am seeing the warnings below when compiling the R bindings on Windows. Most of these seem easy to fix (comparing int with size_t or int32 with int64).
> {code}
> array.cpp: In function 'Rcpp::LogicalVector Array__Mask(const std::shared_ptr&)':
> array.cpp:102:24: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'int64_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < array->length(); i++, bitmap_reader.Next()) {
>                        ~~^
> /mingw64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-testing/include" -DNDEBUG -DARROW_STATIC -I"C:/R/library/Rcpp/include" -O2 -Wall -mtune=generic -c array__to_vector.cpp -o array__to_vector.o
> array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Boolean::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const':
> array__to_vector.cpp:254:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, data_reader.Next(), null_reader.Next(), ++p_data) {
>                        ~~^~~
> array__to_vector.cpp:258:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, data_reader.Next(), ++p_data) {
>                        ~~^~~
> array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Decimal::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const':
> array__to_vector.cpp:473:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data) {
>                        ~~^~~
> array__to_vector.cpp:478:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, ++p_data) {
>                        ~~^~~
> array__to_vector.cpp: In member function 'virtual arrow::Status arrow::r::Converter_Int64::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const':
> array__to_vector.cpp:515:28: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data) {
>                        ~~^~~
> array__to_vector.cpp: In instantiation of 'arrow::Status arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, const std::shared_ptr&, Lambda) [with int RTYPE = 14; array_value_type = long long int; Lambda = arrow::r::Converter_Date64::Ingest_some_nulls(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const::; SEXP = SEXPREC*; R_xlen_t = long long int]':
> array__to_vector.cpp:366:77: required from here
> array__to_vector.cpp:116:26: warning: comparison of integer expressions of different signedness: 'size_t' {aka 'long long unsigned int'} and 'R_xlen_t' {aka 'long long int'} [-Wsign-compare]
>     for (size_t i = 0; i < n; i++, bitmap_reader.Next(), ++p_data, ++p_values) {
>                        ~~^~~
> array__to_vector.cpp: In instantiation of 'arrow::Status arrow::r::SomeNull_Ingest(SEXP, R_xlen_t, R_xlen_t, const array_value_type*, const std::shared_ptr&, Lambda) [with int RTYPE = 13; array_value_type = unsigned char; Lambda = arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = arrow::UInt8Type; SEXP = SEXPREC*; R_xlen_t = long long int]::; SEXP = SEXPREC*; R_xlen_t = long long int]':
> array__to_vector.cpp:341:47: required from 'arrow::Status arrow::r::Converter_Dictionary::Ingest_some_nulls_Impl(SEXP, const std::shared_ptr&, R_xlen_t, R_xlen_t) const [with Type = arrow::UInt8Type;
[jira] [Commented] (ARROW-1957) [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852680#comment-16852680 ] TP Boudreau commented on ARROW-1957: Yes, thanks for assigning it.
> [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
> Key: ARROW-1957
> URL: https://issues.apache.org/jira/browse/ARROW-1957
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.8.0
> Environment: Python 3.6.4. Mac OSX and CentOS Linux release 7.3.1611. Pandas 0.21.1.
> Reporter: Jordan Samuels
> Assignee: TP Boudreau
> Priority: Minor
> Labels: parquet
> Fix For: 0.14.0
>
> The following code
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> n = 3
> df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', freq='1n', periods=n))
> pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')
> {code}
> results in:
> {{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 14832288001}}
> The desired effect is that we can save nanosecond resolution without losing precision (e.g. conversion to ms). Note that if {{freq='1u'}} is used, the code runs properly.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
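Until nanosecond writes are supported end to end, one workaround is to make the precision loss explicit rather than an error, using two existing `write_table` options. A sketch (the `freq='1N'` alias is the pandas spelling for nanoseconds):
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

n = 3
df = pd.DataFrame({'x': range(n)},
                  index=pd.date_range('2017-01-01', freq='1N', periods=n))

# Coerce to microseconds and explicitly accept the truncation, instead of
# letting write_table raise ArrowInvalid on sub-microsecond values.
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet',
               coerce_timestamps='us', allow_truncated_timestamps=True)
{code}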
[jira] [Updated] (ARROW-5448) [CI] MinGW build failures on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5448: Labels: pull-request-available (was: )
> [CI] MinGW build failures on AppVeyor
> Key: ARROW-5448
> URL: https://issues.apache.org/jira/browse/ARROW-5448
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Continuous Integration
> Reporter: Antoine Pitrou
> Assignee: Kouhei Sutou
> Priority: Blocker
> Labels: pull-request-available
>
> Apparently the Numpy package is broken. See https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24922425/job/9yoq08uepk5p6dwb
> {code}
> -- Found PythonLibs: C:/msys64/mingw32/lib/libpython3.7m.dll.a
> CMake Error at cmake_modules/FindNumPy.cmake:62 (message):
>   NumPy import failure:
>   Traceback (most recent call last):
>     File "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\__init__.py", line 40, in
>       from . import multiarray
>     File "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\multiarray.py", line 12, in
>       from . import overrides
>     File "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\overrides.py", line 6, in
>       from numpy.core._multiarray_umath import (
>   ImportError: DLL load failed: The specified module could not be found.
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5453) [C++] Just-released cmake-format 0.5.2 breaks the build
Wes McKinney created ARROW-5453: --- Summary: [C++] Just-released cmake-format 0.5.2 breaks the build Key: ARROW-5453 URL: https://issues.apache.org/jira/browse/ARROW-5453 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 0.14.0 It seems we should always pin the cmake-format version until the developers stop changing the formatting algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2164) [C++] Clean up unnecessary decimal module refs
[ https://issues.apache.org/jira/browse/ARROW-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2164: Fix Version/s: (was: 0.14.0) > [C++] Clean up unnecessary decimal module refs > -- > > Key: ARROW-2164 > URL: https://issues.apache.org/jira/browse/ARROW-2164 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.8.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > See this comment: > https://github.com/apache/arrow/pull/1610#discussion_r168533239 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2248) [Python] Nightly or on-demand HDFS test builds
[ https://issues.apache.org/jira/browse/ARROW-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852498#comment-16852498 ] Wes McKinney commented on ARROW-2248: - cc [~npr] > [Python] Nightly or on-demand HDFS test builds > -- > > Key: ARROW-2248 > URL: https://issues.apache.org/jira/browse/ARROW-2248 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > We continue to acquire more functionality related to HDFS and Parquet. > Testing this, including tests that involve interoperability with other > systems, like Spark, will require some work outside of our normal CI > infrastructure. > I suggest we start with testing the C++/Python HDFS integration, which will > help with validating patches like ARROW-1643 > https://github.com/apache/arrow/pull/1668 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2249) [Java/Python] in-process vector sharing from Java to Python
[ https://issues.apache.org/jira/browse/ARROW-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2249: Fix Version/s: (was: 0.14.0)
> [Java/Python] in-process vector sharing from Java to Python
> Key: ARROW-2249
> URL: https://issues.apache.org/jira/browse/ARROW-2249
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java, Python
> Reporter: Uwe L. Korn
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: beginner
>
> Currently we seem to use the IPC capabilities in all applications of Arrow to move data between a Java process and a Python process. While this involves zero serialization, it is not zero-copy. By taking the address and offset, we can already create Python buffers from Java buffers: https://github.com/apache/arrow/pull/1693. This is still a very low-level interface and we should provide the user with:
> * A guide on how to load the Apache Arrow Java libraries in Python (either through a fat jar shipped with Arrow or instructions for integrating them into their own Java packaging)
> * {{pyarrow.Array.from_jvm}}, {{pyarrow.RecordBatch.from_jvm}}, … functions that take the respective Java objects and emit Python objects. These Python objects should also ensure that the underlying memory regions are kept alive as long as the Python objects exist.
> This issue can also be used as a tracker for the various sub-tasks that will need to be done to complete this rather large milestone.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
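The low-level building block mentioned above is already exposed in Python as `pyarrow.foreign_buffer`; a minimal sketch of wrapping a JVM-owned memory region follows. The address and size are placeholders for values obtained from the Java side (e.g. `ArrowBuf#memoryAddress()` through some JVM bridge):
{code:python}
import pyarrow as pa

# Placeholders: in practice these come from the Java ArrowBuf.
address = 0x7f0000000000
size = 4096

buf = pa.foreign_buffer(address, size)
# `buf` is a zero-copy pyarrow.Buffer view over the JVM memory. The caller
# must keep the owning Java object alive for as long as `buf` is in use,
# which is exactly what the proposed from_jvm helpers should automate.
{code}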
[jira] [Updated] (ARROW-2260) [C++][Plasma] plasma_store should show usage
[ https://issues.apache.org/jira/browse/ARROW-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2260: Fix Version/s: (was: 0.14.0)
> [C++][Plasma] plasma_store should show usage
> Key: ARROW-2260
> URL: https://issues.apache.org/jira/browse/ARROW-2260
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++ - Plasma
> Affects Versions: 0.8.0
> Reporter: Antoine Pitrou
> Priority: Minor
>
> Currently the options exposed by the {{plasma_store}} executable aren't very discoverable:
> {code:bash}
> $ plasma_store -h
> please specify socket for incoming connections with -s switch
> Abandon
> (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting *)$ plasma_store
> please specify socket for incoming connections with -s switch
> Abandon
> (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting *)$ plasma_store --help
> plasma_store: invalid option -- '-'
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2888) [Plasma] Several GPU-related APIs are used in places where errors cannot be appropriately handled
[ https://issues.apache.org/jira/browse/ARROW-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2888: Fix Version/s: (was: 0.14.0) > [Plasma] Several GPU-related APIs are used in places where errors cannot be > appropriately handled > - > > Key: ARROW-2888 > URL: https://issues.apache.org/jira/browse/ARROW-2888 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Wes McKinney >Priority: Major > > I'm adding {{DCHECK_OK}} statements for ARROW-2883 to fix the unchecked > Status warnings, but this code should be refactored so that these errors can > bubble up properly -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2879) [Python] Arrow plasma can only use a small part of specified shared memory
[ https://issues.apache.org/jira/browse/ARROW-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852513#comment-16852513 ] Wes McKinney commented on ARROW-2879: Want to submit a pull request?
> [Python] Arrow plasma can only use a small part of specified shared memory
> Key: ARROW-2879
> URL: https://issues.apache.org/jira/browse/ARROW-2879
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: chineking
> Priority: Major
> Fix For: 0.14.0
>
> Hi, thanks for the great job on arrow, it helps us a lot. However, we encountered a problem when using plasma. The sample code:
> {code:python}
> import numpy as np
> import pyarrow as pa
> import pyarrow.plasma as plasma
>
> client = plasma.connect("/tmp/plasma", "", 0)
> puts = []
> nbytes = 0
> while True:
>     a = np.ones((1000, 1000))
>     try:
>         oid = client.put(a)
>         puts.append(client.get(oid))
>         nbytes += a.nbytes
>     except pa.lib.PlasmaStoreFull:
>         print('use nbytes', nbytes)
>         break
> {code}
> We started a plasma store with 1G memory, but the nbytes output above is only 49600, which cannot even reach half of the memory we specified. I cannot figure out why plasma can only use such a small part of shared memory. Could anybody help me? Thanks a lot.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2912) [Website] Build more detailed Community landing page a la Apache Spark
[ https://issues.apache.org/jira/browse/ARROW-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852514#comment-16852514 ] Wes McKinney commented on ARROW-2912: - cc [~npr] > [Website] Build more detailed Community landing page a la Apache Spark > --- > > Key: ARROW-2912 > URL: https://issues.apache.org/jira/browse/ARROW-2912 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > It would be useful to have some prose descriptions of where to get help and > where to direct questions. See example: > http://spark.apache.org/community.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2910) [Packaging] Build from official apache archive
[ https://issues.apache.org/jira/browse/ARROW-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2910: Fix Version/s: (was: 0.14.0) > [Packaging] Build from official apache archive > -- > > Key: ARROW-2910 > URL: https://issues.apache.org/jira/browse/ARROW-2910 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 7.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2887) [Plasma] Methods in plasma/store.h returning PlasmaError should return Status instead
[ https://issues.apache.org/jira/browse/ARROW-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2887: Fix Version/s: (was: 0.14.0) > [Plasma] Methods in plasma/store.h returning PlasmaError should return Status > instead > - > > Key: ARROW-2887 > URL: https://issues.apache.org/jira/browse/ARROW-2887 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Wes McKinney >Priority: Major > > These functions are not able to return other kinds of errors (e.g. > CUDA-related errors) as a result of this. I encountered this while working on > ARROW-2883 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2939) [Python] API documentation version doesn't match latest on PyPI
[ https://issues.apache.org/jira/browse/ARROW-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852516#comment-16852516 ] Wes McKinney commented on ARROW-2939: - The published docs are now for the latest released version. I think it would be useful to have an archive of old documentation versions, though. Removing this from the 0.14 Fix Version > [Python] API documentation version doesn't match latest on PyPI > --- > > Key: ARROW-2939 > URL: https://issues.apache.org/jira/browse/ARROW-2939 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Ian Robertson >Priority: Minor > Labels: documentation > Fix For: 0.14.0 > > > Hey folks, apologies if this isn't the right place to raise this. In poking > around the web documentation (for pyarrow specifically), it looks like the > auto-generated API docs contain commits past the release of 0.9.0. For > example: > * > [https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.column] > * Contains differences merged here: > [https://github.com/apache/arrow/pull/1923] > * But latest pypi/conda versions of pyarrow are 0.9.0, which don't include > that change. > Not sure if the docs are auto-built off master somewhere, I couldn't find > anything about building docs in the docs itself. I would guess that you may > want some of the usage docs to be published in between releases if they're > not about new functionality, but the API reference being out of date can be > confusing. Is it possible to anchor the API docs to the latest released > version? Or even something like how Pandas has a whole bunch of old versions > still available? (e.g. [https://pandas.pydata.org/pandas-docs/stable/] vs. > old versions like [http://pandas.pydata.org/pandas-docs/version/0.17.0/]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2853) [Python] Implementing support for zero copy NumPy arrays in libarrow_python
[ https://issues.apache.org/jira/browse/ARROW-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2853: Fix Version/s: (was: 0.14.0) > [Python] Implementing support for zero copy NumPy arrays in libarrow_python > --- > > Key: ARROW-2853 > URL: https://issues.apache.org/jira/browse/ARROW-2853 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Florian Rathgeber >Priority: Major > > Implementing support for zero copy NumPy arrays in libarrow_python (i.e. in > C++). We can utilize common code paths with {{to_pandas}} and toggle > between NumPy-for-pandas and NumPy-for-NumPy behavior (and use the > {{zero_copy_only}} flag where needed). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2858) [Packaging] Add unit tests for crossbow
[ https://issues.apache.org/jira/browse/ARROW-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2858: Fix Version/s: (was: 0.14.0) > [Packaging] Add unit tests for crossbow > --- > > Key: ARROW-2858 > URL: https://issues.apache.org/jira/browse/ARROW-2858 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Phillip Cloud >Priority: Major > > As this code grows we should start adding unit tests to make sure we can make > changes safely. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2818) [Python] Better error message when passing SparseDataFrame into Table.from_pandas
[ https://issues.apache.org/jira/browse/ARROW-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852510#comment-16852510 ] Wes McKinney commented on ARROW-2818: - [~jorisvandenbossche] > [Python] Better error message when passing SparseDataFrame into > Table.from_pandas > - > > Key: ARROW-2818 > URL: https://issues.apache.org/jira/browse/ARROW-2818 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > This can be a rough edge for users. Note that pandas sparse support is being > considered for deprecation > original issue https://github.com/apache/arrow/issues/1894 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2870) [Python] Define API for handling null markers from Array.to_numpy
[ https://issues.apache.org/jira/browse/ARROW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852511#comment-16852511 ] Wes McKinney commented on ARROW-2870: - [~jorisvandenbossche] > [Python] Define API for handling null markers from Array.to_numpy > - > > Key: ARROW-2870 > URL: https://issues.apache.org/jira/browse/ARROW-2870 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > This is follow-up work for {{Array.to_numpy}} started in ARROW-564 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2446) [C++] SliceBuffer on CudaBuffer should return CudaBuffer
[ https://issues.apache.org/jira/browse/ARROW-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2446: Fix Version/s: (was: 0.14.0) > [C++] SliceBuffer on CudaBuffer should return CudaBuffer > > > Key: ARROW-2446 > URL: https://issues.apache.org/jira/browse/ARROW-2446 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, GPU >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Priority: Major > > Currently {{SliceBuffer}} on a {{CudaBuffer}} returns a plain {{Buffer}} > instance, which is dangerous for unsuspecting consumers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5344) [C++] Use ArrayDataVisitor in implementation of dictionary unpacking in compute/kernels/cast.cc
[ https://issues.apache.org/jira/browse/ARROW-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5344: Fix Version/s: (was: 0.14.0) > [C++] Use ArrayDataVisitor in implementation of dictionary unpacking in > compute/kernels/cast.cc > --- > > Key: ARROW-5344 > URL: https://issues.apache.org/jira/browse/ARROW-5344 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Follow-up to code review from ARROW-3144 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5334) [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for consistency
[ https://issues.apache.org/jira/browse/ARROW-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5334: --- Assignee: Wes McKinney > [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for > consistency > - > > Key: ARROW-5334 > URL: https://issues.apache.org/jira/browse/ARROW-5334 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > These intermediate classes used for template metaprogramming (in particular, > {{std::is_base_of}}) have inconsistent names with the rest of data types. For > clarity, I think we should add "Type" to these class names and others like > them > Please do after ARROW-3144 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5073) [C++] Build toolchain support for libcurl
[ https://issues.apache.org/jira/browse/ARROW-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5073: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Build toolchain support for libcurl > - > > Key: ARROW-5073 > URL: https://issues.apache.org/jira/browse/ARROW-5073 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: filesystem > Fix For: 0.15.0 > > > libcurl can be used in a number of different situations (e.g. TensorFlow uses > it for GCS interactions > https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cloud/gcs_file_system.cc) > so this will likely be required once we begin to tackle that problem -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5420) [Java] Implement or remove getCurrentSizeInBytes in VariableWidthVector
[ https://issues.apache.org/jira/browse/ARROW-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-5420. Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4390 [https://github.com/apache/arrow/pull/4390] > [Java] Implement or remove getCurrentSizeInBytes in VariableWidthVector > --- > > Key: ARROW-5420 > URL: https://issues.apache.org/jira/browse/ARROW-5420 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ji Liu >Assignee: Ji Liu >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > Now VariableWidthVector#getCurrentSizeInBytes doesn't seem to have been > implemented. We should implement it or just remove it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2671) [Python] Run ASV suite in nightly build, only run in Travis CI on demand
[ https://issues.apache.org/jira/browse/ARROW-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2671: Fix Version/s: (was: 0.14.0) > [Python] Run ASV suite in nightly build, only run in Travis CI on demand > > > Key: ARROW-2671 > URL: https://issues.apache.org/jira/browse/ARROW-2671 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > > Lately the main Travis CI build is running nearly 40 minutes long, e.g. here > is the latest commit on master > https://travis-ci.org/apache/arrow/builds/387326546 > A fair chunk of the long runtime is spent running the Python benchmarks at > the end of the test suite. We should absolutely keep these running smoothly. > However: > * It may be just as valuable to run them on master nightly, and report in if > they are broken > * We could add a check to look at the commit message and run them in Travis > CI if requested > If others agree, I suggest that as soon as the packaging bot / nightly build > tool is working properly, that we make these changes in the interest of > improving CI build times -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2667) [C++/Python] Add pandas-like take method to Array
[ https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2667: Summary: [C++/Python] Add pandas-like take method to Array (was: [C++/Python] Add pandas-like take method to Array/Column/ChunkedArray) > [C++/Python] Add pandas-like take method to Array > - > > Key: ARROW-2667 > URL: https://issues.apache.org/jira/browse/ARROW-2667 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > We should add a {{take}} method to {{Array/ChunkedArray/Column}} that takes a > list of indices and returns a reordered array. > For reference, see Pandas' interface: > https://github.com/pandas-dev/pandas/blob/2cbdd9a2cd19501c98582490e35c5402ae6de941/pandas/core/arrays/base.py#L466 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
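A sketch of the intended semantics, mirroring pandas' `take` (assuming the method lands on `pyarrow.Array` with this shape):
{code:python}
import pyarrow as pa

arr = pa.array(['a', 'b', 'c', 'd'])
indices = pa.array([3, 0, 0, 2], type=pa.int64())

# Select elements by position; indices may repeat or reorder freely.
result = arr.take(indices)
print(result)  # expected: ["d", "a", "a", "c"]
{code}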
[jira] [Updated] (ARROW-2801) [Python] Implement split_row_groups for ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2801: Labels: datasets parquet pull-request-available (was: parquet pull-request-available) > [Python] Implement split_row_groups for ParquetDataset > - > > Key: ARROW-2801 > URL: https://issues.apache.org/jira/browse/ARROW-2801 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Robbie Gruener >Assignee: Robbie Gruener >Priority: Minor > Labels: datasets, parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently the split_row_groups argument in ParquetDataset yields a not > implemented error. An easy and efficient way to implement this is by using > the summary metadata file instead of opening every footer file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
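A sketch of what the flag would enable once implemented (the per-piece behavior shown follows the existing `ParquetDatasetPiece` API, but is not a confirmed design): each dataset piece maps to a single row group rather than a whole file, giving a natural unit of parallelism.
{code:python}
import pyarrow.parquet as pq

# Today this raises NotImplementedError; after the change, pieces would be
# one per row group, discovered cheaply via the summary _metadata file.
dataset = pq.ParquetDataset('/data/events', split_row_groups=True)
for piece in dataset.pieces:
    table = piece.read()  # reads just that piece's row group
{code}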
[jira] [Commented] (ARROW-2671) [Python] Run ASV suite in nightly build, only run in Travis CI on demand
[ https://issues.apache.org/jira/browse/ARROW-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852509#comment-16852509 ] Wes McKinney commented on ARROW-2671: - Well 6 months later we still aren't running these. I hope we are running them by the end of 2019 cc [~npr] > [Python] Run ASV suite in nightly build, only run in Travis CI on demand > > > Key: ARROW-2671 > URL: https://issues.apache.org/jira/browse/ARROW-2671 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: nightly > > Lately the main Travis CI build is running nearly 40 minutes long, e.g. here > is the latest commit on master > https://travis-ci.org/apache/arrow/builds/387326546 > A fair chunk of the long runtime is spent running the Python benchmarks at > the end of the test suite. We should absolutely keep these running smoothly. > However: > * It may be just as valuable to run them on master nightly, and report in if > they are broken > * We could add a check to look at the commit message and run them in Travis > CI if requested > If others agree, I suggest that as soon as the packaging bot / nightly build > tool is working properly, that we make these changes in the interest of > improving CI build times -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2702) [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc to see if we are using the right error type in each instance
[ https://issues.apache.org/jira/browse/ARROW-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2702: Fix Version/s: (was: 0.14.0) > [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc > to see if we are using the right error type in each instance > - > > Key: ARROW-2702 > URL: https://issues.apache.org/jira/browse/ARROW-2702 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > > See discussion in [https://github.com/apache/arrow/pull/2075] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5454) [C++] Implement Take on ChunkedArray for DataFrame use
Wes McKinney created ARROW-5454: --- Summary: [C++] Implement Take on ChunkedArray for DataFrame use Key: ARROW-5454 URL: https://issues.apache.org/jira/browse/ARROW-5454 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 0.15.0 Follow up to ARROW-2667 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3246) [Python][Parquet] direct reading/writing of pandas categoricals in parquet
[ https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3246: --- Assignee: Wes McKinney > [Python][Parquet] direct reading/writing of pandas categoricals in parquet > -- > > Key: ARROW-3246 > URL: https://issues.apache.org/jira/browse/ARROW-3246 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Martin Durant >Assignee: Wes McKinney >Priority: Minor > Labels: parquet > Fix For: 0.14.0 > > > Parquet supports "dictionary encoding" of column data in a manner very > similar to the concept of Categoricals in pandas. It is natural to use this > encoding for a column which originated as a categorical. Conversely, when > loading, if the file metadata says that a given column came from a pandas (or > arrow) categorical, then we can trust that the whole of the column is > dictionary-encoded and load the data directly into a categorical column, > rather than expanding the labels upon load and recategorising later. > If the data does not have the pandas metadata, then the guarantee cannot > hold, and we cannot assume either that the whole column is dictionary encoded > or that the labels are the same throughout. In this case, the current > behaviour is fine. > > (please forgive that some of this has already been mentioned elsewhere; this > is one of the entries in the list at > [https://github.com/dask/fastparquet/issues/374] as a feature that is useful > in fastparquet) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
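A sketch of the round trip this enables; treating `read_dictionary` as the opt-in for dictionary-encoded reads is an assumption based on pyarrow's `read_table` options:
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'cat': pd.Categorical(['x', 'y', 'x', 'z'])})
pq.write_table(pa.Table.from_pandas(df), '/tmp/cats.parquet')

# Read the column back still dictionary-encoded, so to_pandas() can rebuild
# the categorical directly instead of expanding labels and re-categorizing.
table = pq.read_table('/tmp/cats.parquet', read_dictionary=['cat'])
print(table.to_pandas()['cat'].dtype)  # expected: category
{code}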
[jira] [Updated] (ARROW-3232) [Python] Return an ndarray from Column.to_pandas
[ https://issues.apache.org/jira/browse/ARROW-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3232: Fix Version/s: (was: 0.14.0) > [Python] Return an ndarray from Column.to_pandas > > > Key: ARROW-3232 > URL: https://issues.apache.org/jira/browse/ARROW-3232 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Priority: Major > > See discussion: > https://github.com/apache/arrow/pull/2535#discussion_r216299243 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this
[ https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3192: Fix Version/s: (was: 0.14.0) > [Java] Implement "ArrowBufReadChannel" abstraction and alternate > MessageSerializer that uses this > - > > Key: ARROW-3192 > URL: https://issues.apache.org/jira/browse/ARROW-3192 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Wes McKinney >Priority: Major > > The current MessageSerializer implementation is wasteful when used to read an > IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, > reads out of a {{ReadChannel}} require memory allocation > * > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569 > * > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290 > In C++, we have abstracted memory allocation out of the IPC read path so that > zero-copy is possible. I suggest that a similar mechanism can be developed > for Java to improve deserialization performance for in-memory messages. The > new interface would return {{ArrowBuf}} when performing reads, which could be > zero-copy when possible, but when not the current strategy of allocate-copy > could be used -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3185) [C++] Address libparquet SO version convention in unified build
[ https://issues.apache.org/jira/browse/ARROW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3185: Fix Version/s: (was: 0.14.0) > [C++] Address libparquet SO version convention in unified build > --- > > Key: ARROW-3185 > URL: https://issues.apache.org/jira/browse/ARROW-3185 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Follow up work to ARROW-3075 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3543) [R] Time zone adjustment issue when reading Feather file written by Python
[ https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852543#comment-16852543 ] Wes McKinney commented on ARROW-3543: cc [~npr]
> [R] Time zone adjustment issue when reading Feather file written by Python
> Key: ARROW-3543
> URL: https://issues.apache.org/jira/browse/ARROW-3543
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Olaf
> Priority: Critical
> Fix For: 0.14.0
>
> Hello the dream team,
> Pasting from https://github.com/wesm/feather/issues/351
> Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug. Consider this:
> {code:java}
> import pandas as pd
> import feather
> import numpy as np
> df = pd.DataFrame(
>     {'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
>                          pd.to_datetime('2018-02-01 14:01:00.456'),
>                          pd.to_datetime('2018-03-05 14:01:02.200')]}
> )
> df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
> df
> Out[17]:
>          string_time_utc           timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
> Here I create the corresponding `EST` timestamp of my original timestamps (in `UTC` time). Now saving the dataframe to `csv` or to `feather` will generate two completely different results.
> {code:java}
> df.to_csv('P://testing.csv')
> df.to_feather('P://testing.feather')
> {code}
> Switching to R. Using the good old `csv` gives me something a bit annoying, but expected. R thinks my timezone is `UTC` by default, and wrongly attached this timezone to `timestamp_est`. No big deal, I can always use `with_tz` or even better: import as character and process as timestamp while in R.
> {code:java}
> > dataframe <- read_csv('P://testing.csv')
> Parsed with column specification:
> cols(
>   X1 = col_integer(),
>   string_time_utc = col_datetime(format = ""),
>   timestamp_est = col_datetime(format = "")
> )
> Warning message: Missing column names filled in: 'X1' [1]
> > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 4
>      X1         string_time_utc           timestamp_est
> 1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
> 2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
>   mytimezone
> 1        UTC
> 2        UTC
> 3        UTC
> {code}
> {code:java}
> # Now look at what happens with feather:
> > dataframe <- read_feather('P://testing.feather')
> > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 3
>          string_time_utc           timestamp_est mytimezone
> 1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531         ""
> 2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456         ""
> 3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200         ""
> {code}
> My timestamps have been converted!!! pure insanity. Am I missing something here? Thanks!!
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3579) [Crossbow] Unintuitive error message when remote branch has not been pushed
[ https://issues.apache.org/jira/browse/ARROW-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3579: Fix Version/s: (was: 0.14.0)
> [Crossbow] Unintuitive error message when remote branch has not been pushed
> Key: ARROW-3579
> URL: https://issues.apache.org/jira/browse/ARROW-3579
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Developer Tools
> Reporter: Wes McKinney
> Assignee: Krisztian Szucs
> Priority: Major
>
> {code}
> $ python dev/tasks/crossbow.py submit -g linux --arrow-version 0.11.1-rc0
> Traceback (most recent call last):
>   File "dev/tasks/crossbow.py", line 796, in
>     crossbow(obj={}, auto_envvar_prefix='CROSSBOW')
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 764, in __call__
>     return self.main(*args, **kwargs)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 717, in main
>     rv = self.invoke(ctx)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
>     return _process_result(sub_ctx.command.invoke(sub_ctx))
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 956, in invoke
>     return ctx.invoke(self.callback, **ctx.params)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/core.py", line 555, in invoke
>     return callback(*args, **kwargs)
>   File "/home/wesm/miniconda/envs/arrow-release/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
>     return f(get_current_context(), *args, **kwargs)
>   File "dev/tasks/crossbow.py", line 596, in submit
>     target = Target.from_repo(arrow)
>   File "dev/tasks/crossbow.py", line 407, in from_repo
>     remote=repo.remote_url,
>   File "dev/tasks/crossbow.py", line 235, in remote_url
>     return self.remote.url.replace(
>   File "dev/tasks/crossbow.py", line 225, in remote
>     return self.repo.remotes[self.branch.upstream.remote_name]
> AttributeError: 'NoneType' object has no attribute 'remote_name'
> {code}
> The fix was to make sure the local branch and the reference branch for the build in my fork wesm/arrow were the same
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3571) [Wiki] Release management guide does not explain how to set up Crossbow or where to find instructions
[ https://issues.apache.org/jira/browse/ARROW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852544#comment-16852544 ] Wes McKinney commented on ARROW-3571: - cc [~npr] > [Wiki] Release management guide does not explain how to set up Crossbow or > where to find instructions > - > > Key: ARROW-3571 > URL: https://issues.apache.org/jira/browse/ARROW-3571 > Project: Apache Arrow > Issue Type: Improvement > Components: Wiki >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > If you follow the guide, at one point it says "Launch a Crossbow build" but > provides no link to the setup instructions for this -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3650) [Python] Mixed column indexes are read back as strings
[ https://issues.apache.org/jira/browse/ARROW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3650: Fix Version/s: (was: 0.14.0)
> [Python] Mixed column indexes are read back as strings
> Key: ARROW-3650
> URL: https://issues.apache.org/jira/browse/ARROW-3650
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.11.1
> Reporter: Armin Berres
> Priority: Major
> Labels: parquet, pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Consider the following example:
> {code:java}
> df = pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['a string', pd.to_datetime('2018/01/02')])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> ref_df = pq.read_pandas('test.parquet').to_pandas()
> print(df.columns)
> # Index(['a string', 2018-01-02 00:00:00], dtype='object')
> print(ref_df.columns)
> # Index(['a string', '2018-01-02 00:00:00'], dtype='object')
> {code}
> The serialized data frame has an index with a string and a datetime field (this happened when resetting the index of a formerly datetime-only column). When reading the data back, the datetime is converted into a string.
> Looking at the schema, I find {{"pandas_type": "mixed", "numpy_type": "object"}} before serializing and {{"pandas_type": "unicode", "numpy_type": "object"}} after reading back. So the schema was aware of the mixed type but did not store the actual types.
> The same happens with other types like numbers as well. One can produce interesting situations: {{pd.DataFrame(1, index=[pd.to_datetime('2018/01/01')], columns=['1', 1])}} can be written but fails to be read back, as the index is no longer unique with '1' showing up two times.
> If this is not a bug but expected, maybe the user should be somehow warned that information is lost? Like a {{NotImplemented}} exception.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3538: Fix Version/s: (was: 0.14.0) 0.15.0
> [Python] ability to override the automated assignment of uuid for filenames when writing datasets
> Key: ARROW-3538
> URL: https://issues.apache.org/jira/browse/ARROW-3538
> Project: Apache Arrow
> Issue Type: Wish
> Components: Python
> Affects Versions: 0.10.0
> Reporter: Ji Xu
> Priority: Major
> Labels: datasets, features, parquet
> Fix For: 0.15.0
>
> Say I have a pandas DataFrame {{df}} that I would like to store on disk as a dataset using pyarrow parquet. I would do this:
> {code:java}
> table = pyarrow.Table.from_pandas(df)
> pyarrow.parquet.write_to_dataset(table, root_path=some_path, partition_cols=['a',])
> {code}
> On disk the dataset would look something like this:
> some_path
> ├── a=1
> │   └── 4498704937d84fe5abebb3f06515ab2d.parquet
> ├── a=2
> │   └── 8bcfaed8986c4bdba587aaaee532370c.parquet
> *Wished Feature:* It'd be great if I could override the auto-assignment of the long UUID as filename somehow during the *dataset* writing. My purpose is to be able to overwrite the dataset on disk when I have a new version of {{df}}. Currently if I try to write the dataset again, another new uniquely named [UUID].parquet file will be placed next to the old one, with the same, redundant data.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
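For reference, newer pyarrow releases expose a callback on `write_to_dataset` for exactly this; assuming `partition_filename_cb` is the mechanism meant here, a sketch of deterministic, overwritable filenames:
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'a': [1, 1, 2], 'x': [0.1, 0.2, 0.3]})
pq.write_to_dataset(
    pa.Table.from_pandas(df),
    root_path='/tmp/some_path',
    partition_cols=['a'],
    # `keys` holds the partition values for a chunk, e.g. (1,). Returning a
    # fixed name per key means rewriting the dataset overwrites in place
    # instead of accumulating new [UUID].parquet files.
    partition_filename_cb=lambda keys: 'part-{}.parquet'.format(
        '-'.join(str(k) for k in keys)),
)
{code}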
[jira] [Resolved] (ARROW-3435) [C++] Add option to use dynamic linking with re2
[ https://issues.apache.org/jira/browse/ARROW-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3435. Resolution: Fixed Fix Version/s: (was: 0.14.0) 0.13.0
If libre2.so is available, it is now used instead of static linking:
{code}
$ ldd ~/local/lib/libgandiva.so
	linux-vdso.so.1 (0x7ffe46d76000)
	libarrow.so.14 => /home/wesm/local/lib/libarrow.so.14 (0x7f59ce1ac000)
	libre2.so.0 => /home/wesm/cpp-runtime-toolchain/lib/libre2.so.0 (0x7f59ce13a000)
	libglog.so.0 => /home/wesm/cpp-runtime-toolchain/lib/libglog.so.0 (0x7f59ce106000)
	libz.so.1 => /home/wesm/cpp-runtime-toolchain/lib/libz.so.1 (0x7f59ce0ec000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f59ce0c4000)
	libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x7f59ce096000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f59ce073000)
	libstdc++.so.6 => /home/wesm/cpp-runtime-toolchain/lib/libstdc++.so.6 (0x7f59cdf31000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7f59cdde3000)
	libgcc_s.so.1 => /home/wesm/cpp-runtime-toolchain/lib/libgcc_s.so.1 (0x7f59cddcf000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f59cdbe4000)
	/lib64/ld-linux-x86-64.so.2 (0x7f59d1291000)
	libbrotlienc.so.1 => /usr/lib/x86_64-linux-gnu/libbrotlienc.so.1 (0x7f59cdb56000)
	libbrotlidec.so.1 => /usr/lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x7f59cdb45000)
	libbz2.so.1.0 => /home/wesm/cpp-runtime-toolchain/lib/libbz2.so.1.0 (0x7f59cdb31000)
	liblz4.so.1 => /home/wesm/cpp-runtime-toolchain/lib/liblz4.so.1 (0x7f59cd921000)
	libsnappy.so.1 => /home/wesm/cpp-runtime-toolchain/lib/libsnappy.so.1 (0x7f59cd916000)
	libzstd.so.1.3.8 => /home/wesm/cpp-runtime-toolchain/lib/libzstd.so.1.3.8 (0x7f59cd868000)
	libboost_system.so.1.68.0 => /home/wesm/cpp-runtime-toolchain/lib/libboost_system.so.1.68.0 (0x7f59cd861000)
	libboost_filesystem.so.1.68.0 => /home/wesm/cpp-runtime-toolchain/lib/libboost_filesystem.so.1.68.0 (0x7f59cd841000)
	libboost_regex.so.1.68.0 => /home/wesm/cpp-runtime-toolchain/lib/libboost_regex.so.1.68.0 (0x7f59cd738000)
	libbrotlicommon.so.1 => /usr/lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x7f59cd715000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7f59cd70a000)
	libicudata.so.58 => /home/wesm/cpp-runtime-toolchain/lib/./libicudata.so.58 (0x7f59cbe06000)
	libicui18n.so.58 => /home/wesm/cpp-runtime-toolchain/lib/./libicui18n.so.58 (0x7f59cbb87000)
	libicuuc.so.58 => /home/wesm/cpp-runtime-toolchain/lib/./libicuuc.so.58 (0x7f59cb9d4000)
{code}
> [C++] Add option to use dynamic linking with re2
> Key: ARROW-3435
> URL: https://issues.apache.org/jira/browse/ARROW-3435
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Uwe L. Korn
> Priority: Major
> Fix For: 0.13.0
>
> Initial support for re2 uses static linking -- some applications may wish to use dynamic linking
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3538: Labels: datasets features parquet (was: features parquet)
> [Python] ability to override the automated assignment of uuid for filenames when writing datasets
> Key: ARROW-3538
> URL: https://issues.apache.org/jira/browse/ARROW-3538
> Project: Apache Arrow
> Issue Type: Wish
> Components: Python
> Affects Versions: 0.10.0
> Reporter: Ji Xu
> Priority: Major
> Labels: datasets, features, parquet
> Fix For: 0.14.0
>
> Say I have a pandas DataFrame {{df}} that I would like to store on disk as a dataset using pyarrow parquet. I would do this:
> {code:java}
> table = pyarrow.Table.from_pandas(df)
> pyarrow.parquet.write_to_dataset(table, root_path=some_path, partition_cols=['a',])
> {code}
> On disk the dataset would look something like this:
> some_path
> ├── a=1
> │   └── 4498704937d84fe5abebb3f06515ab2d.parquet
> ├── a=2
> │   └── 8bcfaed8986c4bdba587aaaee532370c.parquet
> *Wished Feature:* It'd be great if I could override the auto-assignment of the long UUID as filename somehow during the *dataset* writing. My purpose is to be able to overwrite the dataset on disk when I have a new version of {{df}}. Currently if I try to write the dataset again, another new uniquely named [UUID].parquet file will be placed next to the old one, with the same, redundant data.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3495) [Java] Optimize bit operations performance
[ https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3495: Fix Version/s: (was: 0.14.0) > [Java] Optimize bit operations performance > -- > > Key: ARROW-3495 > URL: https://issues.apache.org/jira/browse/ARROW-3495 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Affects Versions: 0.11.0 >Reporter: Li Jin >Assignee: Animesh Trivedi >Priority: Major > > From [~atrivedi]'s benchmark finding: > 2) Materialize values from Validity and Value direct buffers instead of > calling getInt() function on the IntVector. This is implemented as a new > Unsafe reader type ( > [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31] > ) > 3) Optimize bitmap operation to check if a bit is set or not ( > [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23] > ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
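For context on item (3): Arrow validity bitmaps pack one bit per value, least-significant bit first within each byte, so the null check reduces to a shift and a mask. A language-agnostic sketch of that bit test (shown in Python; the Java fast path performs the same arithmetic directly on the validity buffer):
{code:python}
def bit_is_set(validity_bitmap, index):
    # Value i lives in byte i // 8, at bit position i % 8 (LSB-first).
    return (validity_bitmap[index >> 3] >> (index & 7)) & 1 == 1

# Bitmap 0b00000101: values 0 and 2 are valid, value 1 is null.
bitmap = bytes([0b00000101])
assert bit_is_set(bitmap, 0)
assert not bit_is_set(bitmap, 1)
assert bit_is_set(bitmap, 2)
{code}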
[jira] [Updated] (ARROW-3496) [Java] Add microbenchmark code to Java
[ https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3496: Fix Version/s: (was: 0.14.0) > [Java] Add microbenchmark code to Java > -- > > Key: ARROW-3496 > URL: https://issues.apache.org/jira/browse/ARROW-3496 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Affects Versions: 0.11.0 >Reporter: Li Jin >Assignee: Animesh Trivedi >Priority: Major > > [~atrivedi] has done some microbenchmarking with the Java API. Let's consider > adding them to the codebase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3471) [C++][Gandiva] Investigate caching isomorphic expressions
[ https://issues.apache.org/jira/browse/ARROW-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3471: Fix Version/s: (was: 0.14.0) > [C++][Gandiva] Investigate caching isomorphic expressions > - > > Key: ARROW-3471 > URL: https://issues.apache.org/jira/browse/ARROW-3471 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Praveen Kumar Desabandu >Priority: Major > Labels: gandiva > > Two expressions say add(a+b) and add(c+d), could potentially be reused if the > only thing differing are the names. > Test E2E. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3503) [Python] Allow config hadoop_bin in pyarrow hdfs.py
[ https://issues.apache.org/jira/browse/ARROW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3503: Fix Version/s: (was: 0.14.0) > [Python] Allow config hadoop_bin in pyarrow hdfs.py > > > Key: ARROW-3503 > URL: https://issues.apache.org/jira/browse/ARROW-3503 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wenbo Zhao >Priority: Major > Labels: filesystem, pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, hadoop_bin is either derived from `HADOOP_HOME` or falls back to the `hadoop` > command. > [https://github.com/apache/arrow/blob/master/python/pyarrow/hdfs.py#L130] > However, in some environment setups the hadoop binary lives in some other > location. Can we do something like > > {code:python} > if 'HADOOP_BIN' in os.environ: > hadoop_bin = os.environ['HADOOP_BIN'] > elif 'HADOOP_HOME' in os.environ: > hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME']) > else: > hadoop_bin = 'hadoop' > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3444) [Python] Table.nbytes attribute
[ https://issues.apache.org/jira/browse/ARROW-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3444: Fix Version/s: (was: 0.14.0) > [Python] Table.nbytes attribute > --- > > Key: ARROW-3444 > URL: https://issues.apache.org/jira/browse/ARROW-3444 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Dave Hirschfeld >Priority: Minor > > As it says in the title, I think this would be a very handy attribute to have > available in Python. You can get it by converting to pandas and using > `DataFrame.nbytes` but this is wasteful of both time and memory so it would > be good to have this information on the `pyarrow.Table` object itself. > This could be implemented using the > [__sizeof__|https://docs.python.org/3/library/sys.html#sys.getsizeof] protocol -- This message was sent by Atlassian JIRA (v7.6.3#76005)
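Until such an attribute exists, a rough equivalent can be had by summing the sizes of the underlying Arrow buffers. A minimal sketch, assuming a pyarrow version where Table.column() returns a chunked array and arrays expose buffers(); note that buffers shared between arrays (e.g. after zero-copy slicing) would be counted once per reference, so this can overestimate:
{code:python}
import pyarrow as pa

def table_nbytes(table):
    # Sum validity/offset/data buffer sizes across all chunks.
    total = 0
    for i in range(table.num_columns):
        for chunk in table.column(i).chunks:
            for buf in chunk.buffers():
                if buf is not None:
                    total += buf.size
    return total

table = pa.Table.from_arrays([pa.array([1, 2, 3])], names=['x'])
print(table_nbytes(table))
{code}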
[jira] [Updated] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly
[ https://issues.apache.org/jira/browse/ARROW-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-5455: Priority: Blocker (was: Major) > [Rust] Build broken by 2019-05-30 Rust nightly > -- > > Key: ARROW-5455 > URL: https://issues.apache.org/jira/browse/ARROW-5455 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Wes McKinney >Priority: Blocker > Fix For: 0.14.0 > > > See the failed build for an example: > https://travis-ci.org/apache/arrow/jobs/539477452 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4398) [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and write)
[ https://issues.apache.org/jira/browse/ARROW-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4398: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and > write) > > > Key: ARROW-4398 > URL: https://issues.apache.org/jira/browse/ARROW-4398 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.15.0 > > > This is follow-on work to PARQUET-1508, so we can monitor the performance of > this operation over time -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4369) [Packaging] Release verification script should test linux packages via docker
[ https://issues.apache.org/jira/browse/ARROW-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852579#comment-16852579 ] Wes McKinney commented on ARROW-4369: - [~kszucs] any thoughts about this for 0.14? We can also postpone > [Packaging] Release verification script should test linux packages via docker > - > > Key: ARROW-4369 > URL: https://issues.apache.org/jira/browse/ARROW-4369 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Priority: Major > Fix For: 0.14.0 > > > It shouldn't be too hard to create a verification script which checks the > linux packages. This could prevent issues like [ARROW-4368] / > [https://github.com/apache/arrow/issues/3476] > I suggest separating the current verification script into one which verifies > the source release artifact and another which verifies the binaries: > * checksums and signatures, as is done right now > * installing the linux packages on multiple distros via docker > We could test wheels and conda packages as well, but in follow-up PRs. > > cc [~kou] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4419) [Flight] Deal with body buffers in FlightData
[ https://issues.apache.org/jira/browse/ARROW-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852580#comment-16852580 ] Wes McKinney commented on ARROW-4419: - [~lidavidm] where does this issue stand? > [Flight] Deal with body buffers in FlightData > - > > Key: ARROW-4419 > URL: https://issues.apache.org/jira/browse/ARROW-4419 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC >Reporter: David Li >Priority: Minor > Labels: flight > Fix For: 0.14.0 > > > The Java implementation will fail to decode a schema message if the message > also contains (empty) body buffers (see ArrowMessage.asSchema's precondition > checks). However, clients using default Protobuf serialization will likely > write an empty body buffer by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4409) [C++] Enable arrow::ipc internal JSON reader to read from a file path
[ https://issues.apache.org/jira/browse/ARROW-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4409: Fix Version/s: (was: 0.14.0) > [C++] Enable arrow::ipc internal JSON reader to read from a file path > - > > Key: ARROW-4409 > URL: https://issues.apache.org/jira/browse/ARROW-4409 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Minor > > This may make tests easier to write. Currently an input buffer is required, > so reading from a file requires some boilerplate -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5448) [CI] MinGW build failures on AppVeyor
[ https://issues.apache.org/jira/browse/ARROW-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-5448: --- Assignee: Kouhei Sutou > [CI] MinGW build failures on AppVeyor > - > > Key: ARROW-5448 > URL: https://issues.apache.org/jira/browse/ARROW-5448 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Assignee: Kouhei Sutou >Priority: Blocker > > Apparently the Numpy package is broken. See > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/24922425/job/9yoq08uepk5p6dwb > {code} > -- Found PythonLibs: C:/msys64/mingw32/lib/libpython3.7m.dll.a > CMake Error at cmake_modules/FindNumPy.cmake:62 (message): > NumPy import failure: > Traceback (most recent call last): > File > "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\__init__.py", line > 40, in <module> > from . import multiarray > File > "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\multiarray.py", > line 12, in <module> > from . import overrides > File > "C:/msys64/mingw32/lib/python3.7/site-packages\numpy\core\overrides.py", line > 6, in <module> > from numpy.core._multiarray_umath import ( > ImportError: DLL load failed: The specified module could not be found. > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5453) [C++] Just-released cmake-format 0.5.2 breaks the build
[ https://issues.apache.org/jira/browse/ARROW-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5453: --- Assignee: Wes McKinney > [C++] Just-released cmake-format 0.5.2 breaks the build > --- > > Key: ARROW-5453 > URL: https://issues.apache.org/jira/browse/ARROW-5453 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Fix For: 0.14.0 > > > It seems we should always pin the cmake-format version until the developers > stop changing the formatting algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2365) [Plasma] Return status codes instead of crashing
[ https://issues.apache.org/jira/browse/ARROW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2365: Fix Version/s: (was: 0.14.0) > [Plasma] Return status codes instead of crashing > > > Key: ARROW-2365 > URL: https://issues.apache.org/jira/browse/ARROW-2365 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Antoine Pitrou >Priority: Major > > When certain {{PlasmaClient}} methods are called with bad arguments, > PlasmaClient crashes instead of returning an error Status. For example, try > calling {{Seal()}} with a non-existent object id. > This is hostile towards users of high-level languages such as Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
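What the fixed behavior should look like from Python, per the Seal() example above: a sketch assuming a plasma store is already running at /tmp/plasma and using the pyarrow.plasma client API of this era.
{code:python}
import pyarrow.plasma as plasma

client = plasma.connect('/tmp/plasma')
bogus_id = plasma.ObjectID(20 * b'x')  # never created in the store

try:
    client.seal(bogus_id)
except Exception as exc:  # ideally a specific error surfaced from a Status
    print('got a catchable error instead of a crash:', exc)
{code}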
[jira] [Updated] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2339: Fix Version/s: (was: 0.14.0) > [Python] Add a fast path for int hashing > > > Key: ARROW-2339 > URL: https://issues.apache.org/jira/browse/ARROW-2339 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alex Hagerman >Priority: Minor > > Create a __hash__ fast path for Int scalars that avoids using as_py(). > > https://issues.apache.org/jira/browse/ARROW-640 > [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
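A toy sketch of what such a fast path could look like; the IntScalar class and its value slot are illustrative stand-ins, not pyarrow's actual internals. The key constraint is that the fast path must agree with hash(scalar.as_py()) so scalars still hash consistently with their Python equivalents:
{code:python}
class IntScalar:
    # Hypothetical stand-in for pyarrow's integer scalar wrapper.
    def __init__(self, value):
        self.value = value

    def as_py(self):
        # Generic (slower) path: materialize a Python object first.
        return self.value

    def __hash__(self):
        # Fast path: hash the raw integer directly, skipping as_py().
        return hash(self.value)

assert hash(IntScalar(42)) == hash(42)
{code}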
[jira] [Commented] (ARROW-2366) [Python] Support reading Parquet files having a permutation of column order
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852499#comment-16852499 ] Wes McKinney commented on ARROW-2366: - This will need to be addressed as part of general schema conformance in the C++ Datasets API cc [~pitrou] [~npr] > [Python] Support reading Parquet files having a permutation of column order > --- > > Key: ARROW-2366 > URL: https://issues.apache.org/jira/browse/ARROW-2366 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe L. Korn >Priority: Major > Labels: datasets, parquet > Fix For: 0.14.0 > > > See discussion in https://github.com/dask/fastparquet/issues/320 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2367) [Python] ListArray has trouble with sizes greater than kMaximumCapacity
[ https://issues.apache.org/jira/browse/ARROW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2367: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] ListArray has trouble with sizes greater than kMaximumCapacity > --- > > Key: ARROW-2367 > URL: https://issues.apache.org/jira/browse/ARROW-2367 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 >Reporter: Bryant Menn >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > When creating a pandas DataFrame with a column whose elements are lists, the > following error occurs when converting to a {{pyarrow.Table}} object. > {code} > Traceback (most recent call last): > File "arrow-2227.py", line 16, in <module> > arr = pa.array(df['strings'], from_pandas=True) > File "array.pxi", line 177, in pyarrow.lib.array > File "error.pxi", line 77, in pyarrow.lib.check_status > File "error.pxi", line 77, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: BinaryArray cannot contain more than 2147483646 > bytes, have 2147483647 > {code} > The following code was used to generate the error (adapted from ARROW-2227): > {code} > import pandas as pd > import pyarrow as pa > # Commented lines were used to test non-binary data types, both cause the > same error > v1 = b'x' * 1 > v2 = b'x' * 147483646 > # v1 = 'x' * 1 > # v2 = 'x' * 147483646 > df = pd.DataFrame({ > 'strings': [[v1]] * 20 + [[v2]] + [[b'x']] > # 'strings': [[v1]] * 20 + [[v2]] + [['x']] > }) > arr = pa.array(df['strings'], from_pandas=True) > assert isinstance(arr, pa.ChunkedArray), type(arr) > {code} > Code was run using Python 3.6 with PyArrow installed from conda-forge on > macOS High Sierra. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2410) [JS] Add DataFrame.scanAsync
[ https://issues.apache.org/jira/browse/ARROW-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852500#comment-16852500 ] Wes McKinney commented on ARROW-2410: - [~bhulette] [~paul.e.taylor] of interest for 0.14? > [JS] Add DataFrame.scanAsync > > > Key: ARROW-2410 > URL: https://issues.apache.org/jira/browse/ARROW-2410 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Brian Hulette >Priority: Major > Fix For: 0.14.0 > > > Add a version of `DataFrame.scan`, `scanAsync` that yields periodically. The > yield frequency could be specified either as a number of record batches, or a > number of records. > This scan should also be cancellable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2379) [Plasma] PlasmaClient::Info() should return whether an object is in use
[ https://issues.apache.org/jira/browse/ARROW-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2379: Fix Version/s: (was: 0.14.0) > [Plasma] PlasmaClient::Info() should return whether an object is in use > --- > > Key: ARROW-2379 > URL: https://issues.apache.org/jira/browse/ARROW-2379 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Antoine Pitrou >Priority: Major > > It can be useful to know whether a given object is already in use by the > local client. > See https://github.com/apache/arrow/pull/1807#discussion_r178611472 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2366) [Python] Support reading Parquet files having a permutation of column order
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2366: Labels: datasets parquet (was: parquet) > [Python] Support reading Parquet files having a permutation of column order > --- > > Key: ARROW-2366 > URL: https://issues.apache.org/jira/browse/ARROW-2366 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe L. Korn >Priority: Major > Labels: datasets, parquet > Fix For: 0.14.0 > > > See discussion in https://github.com/dask/fastparquet/issues/320 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4343) [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852576#comment-16852576 ] Wes McKinney commented on ARROW-4343: - What does it mean now that Ubuntu Trusty is no longer an LTS release? > [C++] Add as complete as possible Ubuntu Trusty / 14.04 build to > docker-compose setup > - > > Key: ARROW-4343 > URL: https://issues.apache.org/jira/browse/ARROW-4343 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Until we formally stop supporting Trusty it would be useful to be able to > verify in Docker that builds work there. I still have an Ubuntu 14.04 machine > that I use (and I've been filing bugs that I find on it) but not sure for how > much longer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4350) [Python] nested numpy arrays
[ https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852577#comment-16852577 ] Wes McKinney commented on ARROW-4350: - [~jorisvandenbossche] could you take a look and maybe clarify the issue title etc.? > [Python] nested numpy arrays > > > Key: ARROW-4350 > URL: https://issues.apache.org/jira/browse/ARROW-4350 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1, 0.12.0 >Reporter: yu peng >Priority: Major > Fix For: 0.14.0 > > > {code:java} > In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]}) > In [20]: df.iloc[0].to_dict() > Out[20]: {'a': [[1], [2]], 'b': 1} > In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict() > Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1} > In [24]: np.array(df.iloc[0].to_dict()['a']).shape > Out[24]: (2, 1) > In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape > Out[25]: (2,) > {code} > Adding extra array type is not functioning as expected. > > More importantly, this would fail > > {code:java} > In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': > [[1, 2],[2, 3]]}) > In [109]: df > Out[109]: > a b > 0 [[1, 2], [2, 3]] [1, 2] > 1 [[1, 2], [2, 3]] [2, 3] > In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas()) > --- > ArrowTypeError Traceback (most recent call last) > in () > > 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas()) > /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi > in pyarrow.lib.Table.from_pandas() > 1215 > 1216 """ > -> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays( > 1218 df, > 1219 schema=schema, > /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc > in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe) > 379 arrays = [convert_column(c, t) > 380 for c, t in zip(columns_to_convert, > --> 381 convert_types)] > 382 else: > 383 from concurrent import futures > /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc > in convert_column(col, ty) > 374 e.args += ("Conversion failed for column {0!s} with type {1!s}" > 375 .format(col.name, col.dtype),) > --> 376 raise e > 377 > 378 if nthreads == 1: > ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', > 'Conversion failed for column a with type object') > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4333) [C++] Sketch out design for kernels and "query" execution in compute layer
[ https://issues.apache.org/jira/browse/ARROW-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4333: Fix Version/s: (was: 0.14.0) > [C++] Sketch out design for kernels and "query" execution in compute layer > -- > > Key: ARROW-4333 > URL: https://issues.apache.org/jira/browse/ARROW-4333 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Micah Kornfield >Priority: Major > Labels: analytics > > It would be good to formalize the design of kernels and the controlling query > execution layer (e.g. volcano batch model?) to understand the following: > Contracts for kernels: > * Thread safety of kernels? > * When should kernels allocate memory vs. expect preallocated memory? How to > communicate a kernel's memory allocation requirements? > * How to communicate whether a kernel's execution is parallelizable > across a ChunkedArray? How to determine if the order of execution across a > ChunkedArray is important? > * How to communicate when it is safe to re-use the same buffers as input > and output to the same kernel? > What does the threading model look like for the higher level of control? > Where should synchronization happen? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
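For context on the "volcano batch model" mentioned above: each operator pulls batches from its child and applies its kernel one batch at a time. A toy sketch, with all names hypothetical and no relation to whatever design the C++ compute layer eventually adopts:
{code:python}
class ProjectNode:
    # Toy volcano-style operator: pull a batch, apply a kernel, yield.
    def __init__(self, child, kernel):
        self.child = child    # upstream operator yielding record batches
        self.kernel = kernel  # function mapping a batch to a batch

    def batches(self):
        for batch in self.child.batches():
            yield self.kernel(batch)
{code}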
[jira] [Updated] (ARROW-4337) [C#] Array / RecordBatch Builder Fluent API
[ https://issues.apache.org/jira/browse/ARROW-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4337: Fix Version/s: (was: 0.14.0) > [C#] Array / RecordBatch Builder Fluent API > --- > > Key: ARROW-4337 > URL: https://issues.apache.org/jira/browse/ARROW-4337 > Project: Apache Arrow > Issue Type: Improvement > Components: C# >Reporter: Chris Hutchinson >Assignee: Chris Hutchinson >Priority: Major > Labels: c#, pull-request-available > Original Estimate: 12h > Time Spent: 5h 10m > Remaining Estimate: 6h 50m > > Implement a fluent API for building arrays and record batches from Arrow > buffers, flat arrays, spans, enumerables, etc. > A future implementation could extend this API with support for ADO.NET > DataTables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4324) [Python] Array dtype inference incorrect when created from list of mixed numpy scalars
[ https://issues.apache.org/jira/browse/ARROW-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852574#comment-16852574 ] Wes McKinney commented on ARROW-4324: - [~jorisvandenbossche] could you take a look? > [Python] Array dtype inference incorrect when created from list of mixed > numpy scalars > -- > > Key: ARROW-4324 > URL: https://issues.apache.org/jira/browse/ARROW-4324 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1 >Reporter: Keith Kraus >Priority: Minor > Fix For: 0.14.0 > > > Minimal reproducer: > {code:python} > import pyarrow as pa > import numpy as np > test_list = [np.dtype('int32').type(10), np.dtype('float32').type(0.5)] > test_array = pa.array(test_list) > # Expected > # test_array > # > # [ > # 10, > # 0.5 > # ] > # Got > # test_array > # > # [ > # 10, > # 0 > # ] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5458) Apache Arrow parallel CRC32c computation optimization
Yuqi Gu created ARROW-5458: -- Summary: Apache Arrow parallel CRC32c computation optimization Key: ARROW-5458 URL: https://issues.apache.org/jira/browse/ARROW-5458 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Yuqi Gu ARMv8 defines the VMULL/PMULL crypto instructions. This patch optimizes the crc32c calculation by using these instructions when available, rather than the original linear crc instructions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5453) [C++] Just-released cmake-format 0.5.2 breaks the build
[ https://issues.apache.org/jira/browse/ARROW-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5453: -- Labels: pull-request-available (was: ) > [C++] Just-released cmake-format 0.5.2 breaks the build > --- > > Key: ARROW-5453 > URL: https://issues.apache.org/jira/browse/ARROW-5453 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > > It seems we should always pin the cmake-format version until the developers > stop changing the formatting algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2652) [C++/Python] Document how to provide information on segfaults
[ https://issues.apache.org/jira/browse/ARROW-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2652: Fix Version/s: (was: 0.14.0) > [C++/Python] Document how to provide information on segfaults > - > > Key: ARROW-2652 > URL: https://issues.apache.org/jira/browse/ARROW-2652 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation, Python >Reporter: Uwe L. Korn >Priority: Major > > We often have users that report segmentation faults in {{pyarrow}}. This will > sadly keep reappearing as we also don't have the magical ability to write > 100%-bug-free code. Thus we should have a small section in our documentation > on how people can give us the relevant information in the case of a > segmentation fault. Preferably the documentation covers {{gdb}} and {{lldb}}. > They both have similar commands but differ in some minor flags. > For an example of the kind of comment I have given a user in a ticket, see > https://github.com/apache/arrow/issues/2089#issuecomment-393477116 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
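For reference, the kind of recipe such a section might include (standard gdb/lldb usage, nothing Arrow-specific; script.py is a placeholder for whatever reproduces the crash):
{code}
# gdb
$ gdb --args python script.py
(gdb) run
...crash...
(gdb) bt

# lldb
$ lldb -- python script.py
(lldb) run
...crash...
(lldb) bt
{code}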
[jira] [Assigned] (ARROW-2606) [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2606: --- Assignee: (was: Uwe L. Korn) > [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm > - > > Key: ARROW-2606 > URL: https://issues.apache.org/jira/browse/ARROW-2606 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to > find the correct code to construct Java decimals and fill them into a > {{DecimalVector}}. Afterwards, we should activate the decimal128 type on > {{test_jvm_array}} and ensure that we load them correctly from Java into > Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2610: --- Assignee: (was: Uwe L. Korn) > [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm > --- > > Key: ARROW-2610 > URL: https://issues.apache.org/jira/browse/ARROW-2610 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Priority: Major > > The DictionaryType is a bit more complex as it also references the dictionary > values itself. This also needs to be integrated into > {{pyarrow.Field.from_jvm}} but the work to make DictionaryType working maybe > also depends on that {{pyarrow.Array.from_jvm}} first supports non-primitive > arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2619) [Rust] Move JSON serde code to separate file/module
[ https://issues.apache.org/jira/browse/ARROW-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852507#comment-16852507 ] Wes McKinney commented on ARROW-2619: - Still of interest for 0.14? > [Rust] Move JSON serde code to separate file/module > --- > > Key: ARROW-2619 > URL: https://issues.apache.org/jira/browse/ARROW-2619 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2587) [Python] Unable to write StructArrays with multiple children to parquet
[ https://issues.apache.org/jira/browse/ARROW-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852504#comment-16852504 ] Wes McKinney commented on ARROW-2587: - Nested Parquet is not yet on my immediate critical path, but it will be eventually (hopefully in 2019) > [Python] Unable to write StructArrays with multiple children to parquet > --- > > Key: ARROW-2587 > URL: https://issues.apache.org/jira/browse/ARROW-2587 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 >Reporter: jacques >Priority: Major > Labels: parquet > Fix For: 0.15.0 > > Attachments: Screen Shot 2018-05-16 at 12.24.39.png > > > Although I am able to read StructArray from parquet, I am still unable to > write it back from pa.Table to parquet. > I get an "ArrowInvalid: Nested column branch had multiple children" > Here is a quick example: > {noformat} > In [2]: import pyarrow.parquet as pq > In [3]: table = pq.read_table('test.parquet') > In [4]: table > Out[4]: > pyarrow.Table > weight: double > animal_type: string > animal_interpretation: struct > child 0, is_large_animal: bool > child 1, is_mammal: bool > metadata > > \{'org.apache.spark.sql.parquet.row.metadata': > '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},\{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[\\{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},\\\{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'} > In [5]: table.schema > Out[5]: > weight: double > animal_type: string > animal_interpretation: struct > child 0, is_large_animal: bool > child 1, is_mammal: bool > metadata > > \{'org.apache.spark.sql.parquet.row.metadata': > '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},\{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[\\{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},\\\{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'} > In [6]: pq.write_table(table,"test_write.parquet") > --- > ArrowInvalid Traceback (most recent call last) > in () > > 1 pq.write_table(table,"test_write.parquet") > /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in > write_table(table, where, row_group_size, version, use_dictionary, > compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor, > **kwargs) > 982 use_deprecated_int96_timestamps=use_int96, > 983 **kwargs) as writer: > --> 984 writer.write_table(table, row_group_size=row_group_size) > 985 except Exception: > 986 if is_path(where): > /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in > write_table(self, table, row_group_size) > 325 table = _sanitize_table(table, self.schema, self.flavor) > 326 assert self.is_open > --> 327 self.writer.write_table(table, row_group_size=row_group_size) > 328 > 329 def close(self): > /usr/local/lib/python2.7/dist-packages/pyarrow/_parquet.so in > pyarrow._parquet.ParquetWriter.write_table() > /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in > pyarrow.lib.check_status() > ArrowInvalid: Nested column branch had multiple children > {noformat} > > I would really appreciate a fix on this. > Best, > Jacques -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852506#comment-16852506 ] Wes McKinney commented on ARROW-2610: - This should be less complex now after the recent DictionaryType changes > [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm > --- > > Key: ARROW-2610 > URL: https://issues.apache.org/jira/browse/ARROW-2610 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > The DictionaryType is a bit more complex as it also references the dictionary > values itself. This also needs to be integrated into > {{pyarrow.Field.from_jvm}} but the work to make DictionaryType working maybe > also depends on that {{pyarrow.Array.from_jvm}} first supports non-primitive > arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2609: Fix Version/s: (was: 0.14.0) > [Java/Python] Complex type conversion in pyarrow.Field.from_jvm > --- > > Key: ARROW-2609 > URL: https://issues.apache.org/jira/browse/ARROW-2609 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Priority: Major > > The converter {{pyarrow.Field.from_jvm}} currently only works for primitive > types. Types like List, Struct or Union that have children in their > definition are not supported. We should add the needed recursion for these > types and enable the respective tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2606) [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2606: Fix Version/s: (was: 0.14.0) > [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm > - > > Key: ARROW-2606 > URL: https://issues.apache.org/jira/browse/ARROW-2606 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to > find the correct code to construct Java decimals and fill them into a > {{DecimalVector}}. Afterwards, we should activate the decimal128 type on > {{test_jvm_array}} and ensure that we load them correctly from Java into > Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
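The Python half of such a test is easy to state; the missing piece the ticket asks for is the Java DecimalVector construction. A sketch of the expected Python-side result:
{code:python}
from decimal import Decimal

import pyarrow as pa

# What the round-trip should produce once the Java side can fill a
# DecimalVector: decimal128 values converting back to decimal.Decimal.
expected = pa.array([Decimal('1.23'), None, Decimal('-4.56')],
                    type=pa.decimal128(5, 2))
assert expected.to_pylist() == [Decimal('1.23'), None, Decimal('-4.56')]
{code}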
[jira] [Commented] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852505#comment-16852505 ] Wes McKinney commented on ARROW-2605: - cc [~jorisvandenbossche] > [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm > - > > Key: ARROW-2605 > URL: https://issues.apache.org/jira/browse/ARROW-2605 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are > missing the necessary methods to construct these arrays conveniently on the > Python side. > Once there is a path to construct {{pyarrow.Array}} instances from a Python > list of {{datetime.time}} for the various time types, we should activate the > time types on {{test_jvm_array}} and ensure that we load them correctly from > Java into Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2587) [Python] Unable to write StructArrays with multiple children to parquet
[ https://issues.apache.org/jira/browse/ARROW-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2587: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Unable to write StructArrays with multiple children to parquet > --- > > Key: ARROW-2587 > URL: https://issues.apache.org/jira/browse/ARROW-2587 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 >Reporter: jacques >Priority: Major > Labels: parquet > Fix For: 0.15.0 > > Attachments: Screen Shot 2018-05-16 at 12.24.39.png > > > Although I am able to read StructArray from parquet, I am still unable to > write it back from pa.Table to parquet. > I get an "ArrowInvalid: Nested column branch had multiple children" > Here is a quick example: > {noformat} > In [2]: import pyarrow.parquet as pq > In [3]: table = pq.read_table('test.parquet') > In [4]: table > Out[4]: > pyarrow.Table > weight: double > animal_type: string > animal_interpretation: struct > child 0, is_large_animal: bool > child 1, is_mammal: bool > metadata > > \{'org.apache.spark.sql.parquet.row.metadata': > '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},\{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[\\{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},\\\{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'} > In [5]: table.schema > Out[5]: > weight: double > animal_type: string > animal_interpretation: struct > child 0, is_large_animal: bool > child 1, is_mammal: bool > metadata > > \{'org.apache.spark.sql.parquet.row.metadata': > '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},\{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[\\{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},\\\{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'} > In [6]: pq.write_table(table,"test_write.parquet") > --- > ArrowInvalid Traceback (most recent call last) > in () > > 1 pq.write_table(table,"test_write.parquet") > /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in > write_table(table, where, row_group_size, version, use_dictionary, > compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor, > **kwargs) > 982 use_deprecated_int96_timestamps=use_int96, > 983 **kwargs) as writer: > --> 984 writer.write_table(table, row_group_size=row_group_size) > 985 except Exception: > 986 if is_path(where): > /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in > write_table(self, table, row_group_size) > 325 table = _sanitize_table(table, self.schema, self.flavor) > 326 assert self.is_open > --> 327 self.writer.write_table(table, row_group_size=row_group_size) > 328 > 329 def close(self): > /usr/local/lib/python2.7/dist-packages/pyarrow/_parquet.so in > pyarrow._parquet.ParquetWriter.write_table() > /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in > pyarrow.lib.check_status() > ArrowInvalid: Nested column branch had multiple children > {noformat} > > I would really appreciate a fix on this. > Best, > Jacques -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2501) [Java] Remove Jackson from compile-time dependencies for arrow-vector
[ https://issues.apache.org/jira/browse/ARROW-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852501#comment-16852501 ] Wes McKinney commented on ARROW-2501: - Could this be done in 0.14? cc [~pravindra] [~siddteotia] > [Java] Remove Jackson from compile-time dependencies for arrow-vector > - > > Key: ARROW-2501 > URL: https://issues.apache.org/jira/browse/ARROW-2501 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Affects Versions: 0.9.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I would like to upgrade Jackson to the latest version (2.9.5). If there are > no objections I will create a PR (it is literally just changing the version > number in the pom - no code changes required). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2512) [Python] Enable direct interaction of GPU Objects in Python
[ https://issues.apache.org/jira/browse/ARROW-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2512: Fix Version/s: (was: 0.14.0) > [Python] Enable direct interaction of GPU Objects in Python > --- > > Key: ARROW-2512 > URL: https://issues.apache.org/jira/browse/ARROW-2512 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma, GPU, Python >Reporter: William Paul >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Plasma can now manage objects on the GPU, but in order to use this > functionality in Python, there needs to be some way to represent these GPU > objects in Python that allows computation on the GPU. > The easiest way to enable this is to rely on a third party library, such as > Pytorch, which will allow us to use all of its existing functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2532) [C++] Add chunked builder classes
[ https://issues.apache.org/jira/browse/ARROW-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2532: Fix Version/s: (was: 0.14.0) > [C++] Add chunked builder classes > - > > Key: ARROW-2532 > URL: https://issues.apache.org/jira/browse/ARROW-2532 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.9.0 >Reporter: Antoine Pitrou >Priority: Major > > I think it would be useful to have chunked builders for list, string and > binary types. A chunked builder would produce a chunked array as output, > circumventing the 32-bit offset limit of those types. There's some > special-casing scattered around our Numpy conversion routines right now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2607: --- Assignee: (was: Uwe L. Korn) > [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm > --- > > Key: ARROW-2607 > URL: https://issues.apache.org/jira/browse/ARROW-2607 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently > only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses > {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two > functions to be able to deal with string arrays. There is a currently failing > unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to > verify the implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2609: --- Assignee: (was: Uwe L. Korn) > [Java/Python] Complex type conversion in pyarrow.Field.from_jvm > --- > > Key: ARROW-2609 > URL: https://issues.apache.org/jira/browse/ARROW-2609 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > The converter {{pyarrow.Field.from_jvm}} currently only works for primitive > types. Types like List, Struct or Union that have children in their > definition are not supported. We should add the needed recursion for these > types and enable the respective tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2605: Fix Version/s: (was: 0.14.0) > [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm > - > > Key: ARROW-2605 > URL: https://issues.apache.org/jira/browse/ARROW-2605 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are > missing the necessary methods to construct these arrays conveniently on the > Python side. > Once there is a path to construct {{pyarrow.Array}} instances from a Python > list of {{datetime.time}} for the various time types, we should activate the > time types on {{test_jvm_array}} and ensure that we load them correctly from > Java into Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2610: Fix Version/s: (was: 0.14.0) > [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm > --- > > Key: ARROW-2610 > URL: https://issues.apache.org/jira/browse/ARROW-2610 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > > The DictionaryType is a bit more complex as it also references the dictionary > values itself. This also needs to be integrated into > {{pyarrow.Field.from_jvm}} but the work to make DictionaryType working maybe > also depends on that {{pyarrow.Array.from_jvm}} first supports non-primitive > arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2605: --- Assignee: (was: Uwe L. Korn) > [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm > - > > Key: ARROW-2605 > URL: https://issues.apache.org/jira/browse/ARROW-2605 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are > missing the necessary methods to construct these arrays conveniently on the > Python side. > Once there is a path to construct {{pyarrow.Array}} instances from a Python > list of {{datetime.time}} for the various time types, we should activate the > time types on {{test_jvm_array}} and ensure that we load them correctly from > Java into Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
[ https://issues.apache.org/jira/browse/ARROW-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2607: Fix Version/s: (was: 0.14.0) > [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm > --- > > Key: ARROW-2607 > URL: https://issues.apache.org/jira/browse/ARROW-2607 > Project: Apache Arrow > Issue Type: New Feature > Components: Java, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > > Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently > only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses > {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two > functions to be able to deal with string arrays. There is a currently failing > unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to > verify the implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
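For reference, a utf8 array consists of three buffers: a validity bitmap, int32 offsets, and the concatenated value bytes. A sketch of assembling one via Array.from_buffers, assuming a pyarrow version where from_buffers accepts binary-like types (earlier versions supported only primitives, which is exactly the limitation described here):
{code:python}
import pyarrow as pa

# Offsets [0, 5, 10] as little-endian int32: 'hello' then 'world'.
offsets = pa.py_buffer(b'\x00\x00\x00\x00'
                       b'\x05\x00\x00\x00'
                       b'\x0a\x00\x00\x00')
values = pa.py_buffer(b'helloworld')

# None for the validity buffer means all values are non-null.
arr = pa.Array.from_buffers(pa.utf8(), 2, [None, offsets, values])
assert arr.to_pylist() == ['hello', 'world']
{code}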
[jira] [Updated] (ARROW-2600) [Python] Add additional LocalFileSystem filesystem methods
[ https://issues.apache.org/jira/browse/ARROW-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2600: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Add additional LocalFileSystem filesystem methods > -- > > Key: ARROW-2600 > URL: https://issues.apache.org/jira/browse/ARROW-2600 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alex Hagerman >Priority: Minor > Labels: filesystem, pull-request-available > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Related to https://issues.apache.org/jira/browse/ARROW-1319 I noticed the > methods Martin listed are also not part of the LocalFileSystem class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3277) [Python] Validate manylinux1 builds with crossbow instead of each Travis CI build
[ https://issues.apache.org/jira/browse/ARROW-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3277: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Validate manylinux1 builds with crossbow instead of each Travis CI > build > - > > Key: ARROW-3277 > URL: https://issues.apache.org/jira/browse/ARROW-3277 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > The recent manylinux1 timeouts bring up a bigger question, which is > centralizing the validation of packaging builds. We definitely want the > project to be notified in a timely way when there is some problem with a > packaging build -- since manylinux1 can be run locally in Docker, it is > easier to debug and need not necessarily be run on every commit -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3378) [C++] Implement whitespace CSV tokenizer
[ https://issues.apache.org/jira/browse/ARROW-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3378: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Implement whitespace CSV tokenizer > > > Key: ARROW-3378 > URL: https://issues.apache.org/jira/browse/ARROW-3378 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3332) [Gandiva] Remove usages of mutable reference out arguments
[ https://issues.apache.org/jira/browse/ARROW-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3332: Fix Version/s: (was: 0.14.0) > [Gandiva] Remove usages of mutable reference out arguments > -- > > Key: ARROW-3332 > URL: https://issues.apache.org/jira/browse/ARROW-3332 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, C++ - Gandiva >Reporter: Wes McKinney >Priority: Major > > I have noticed several usages of mutable reference out arguments, e.g. > gandiva/regex_util.h. We should change these to conform to the style guide > (out arguments as pointers) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3359) [Gandiva][C++] Nest gandiva inside arrow namespace?
[ https://issues.apache.org/jira/browse/ARROW-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3359: Fix Version/s: (was: 0.14.0) > [Gandiva][C++] Nest gandiva inside arrow namespace? > --- > > Key: ARROW-3359 > URL: https://issues.apache.org/jira/browse/ARROW-3359 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, C++ - Gandiva >Reporter: Wes McKinney >Priority: Major > > This would make for more readable code by making symbols from the outer scope > visible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer
[ https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3379: Labels: csv datasets (was: csv) > [C++] Implement regex/multichar delimiter tokenizer > --- > > Key: ARROW-3379 > URL: https://issues.apache.org/jira/browse/ARROW-3379 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: csv, datasets > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-5445) [Website] Remove language that encourages pinning a version
[ https://issues.apache.org/jira/browse/ARROW-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou closed ARROW-5445. --- Resolution: Won't Fix https://github.com/apache/arrow/pull/4411#discussion_r288957237 {quote} Version pinning is commonplace in the Python world -- I don't think API stability has much to do with it (we will still have some API changes or deprecations after 1.0 I would guess) {quote} > [Website] Remove language that encourages pinning a version > --- > > Key: ARROW-5445 > URL: https://issues.apache.org/jira/browse/ARROW-5445 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Neal Richardson >Priority: Minor > Fix For: 1.0.0 > > > See [https://github.com/apache/arrow/pull/4411#discussion_r288804415]. > Whenever we decide to stop threatening to break APIs (1.0 release or > otherwise), purge any recommendations like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)