[jira] [Updated] (ARROW-6309) [C++] Parquet tests and executables are linked statically

2019-08-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6309: -- Labels: pull-request-available (was: ) > [C++] Parquet tests and executables are linked

[jira] [Commented] (ARROW-6309) [C++] Parquet tests and executables are linked statically

2019-08-21 Thread Sutou Kouhei (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912724#comment-16912724 ] Sutou Kouhei commented on ARROW-6309: - We need to use static linking only on Windows, right? I've

[jira] [Commented] (ARROW-3263) [R] Use R sentinel values for missingness in addition to bitmask

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912749#comment-16912749 ] Wes McKinney commented on ARROW-3263: - Circling back on this discussion from a year ago. Now that we

[jira] [Resolved] (ARROW-6011) [Python] Data incomplete when using pyarrow in pyspark in python 3.x

2019-08-21 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved ARROW-6011. - Resolution: Cannot Reproduce I could not reproduce. We can continue the discussion in

[jira] [Closed] (ARROW-3685) [Python] Use fixed size binary for NumPy fixed-size string dtypes

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3685. --- Resolution: Won't Fix Per [~pitrou]'s comments, if an explicit data type is not passed, the safest

[jira] [Updated] (ARROW-6299) [C++] Simplify FileFormat classes to singletons

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6299: Labels: dataset (was: dataset datasets) > [C++] Simplify FileFormat classes to singletons >

[jira] [Updated] (ARROW-6244) [C++] Implement Partition DataSource

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6244: Labels: dataset (was: dataset datasets) > [C++] Implement Partition DataSource >

[jira] [Updated] (ARROW-6238) [C++] Implement SimpleDataSource/SimpleDataFragment

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6238: Labels: dataset pull-request-available (was: dataset datasets pull-request-available) > [C++]

[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3379: Labels: csv dataset (was: csv dataset datasets) > [C++] Implement regex/multichar delimiter

[jira] [Updated] (ARROW-6243) [C++] Implement basic Filter expression classes

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6243: Labels: dataset pull-request-available (was: dataset datasets pull-request-available) > [C++]

[jira] [Updated] (ARROW-4076) [Python] schema validation and filters

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4076: Labels: dataset easyfix parquet pull-request-available (was: dataset datasets easyfix parquet

[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3538: Labels: dataset features parquet pull-request-available (was: dataset datasets features parquet

[jira] [Updated] (ARROW-6242) [C++] Implements basic Dataset/Scanner/ScannerBuilder

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6242: Labels: dataset (was: dataset datasets) > [C++] Implements basic Dataset/Scanner/ScannerBuilder >

[jira] [Updated] (ARROW-6161) [C++] Implements dataset::ParquetFile and associated Scan structures

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6161: Labels: dataset pull-request-available (was: dataset datasets pull-request-available) > [C++]

[jira] [Updated] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3408: Labels: csv dataset (was: csv dataset datasets) > [C++] Add option to CSV reader to dictionary

[jira] [Updated] (ARROW-2366) [Python][C++][Parquet] Support reading Parquet files having a permutation of column order

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2366: Labels: dataset parquet (was: dataset datasets parquet) > [Python][C++][Parquet] Support reading

[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2882: Labels: dataset parquet (was: dataset datasets parquet) > [C++][Python] Support AWS Firehose

[jira] [Updated] (ARROW-1089) [C++/Python] Add API to write an Arrow stream into either the stream or file formats on disk

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1089: Labels: dataset (was: dataset datasets) > [C++/Python] Add API to write an Arrow stream into

[jira] [Commented] (ARROW-4967) [C++] Parquet: Object type and stats lost when using 96-bit timestamps

2019-08-21 Thread Deepak Majeti (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912811#comment-16912811 ] Deepak Majeti commented on ARROW-4967: -- The comments above are correct! INT96 type is deprecated and

[jira] [Commented] (ARROW-5454) [C++] Implement Take on ChunkedArray for DataFrame use

2019-08-21 Thread Artem KOZHEVNIKOV (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912675#comment-16912675 ] Artem KOZHEVNIKOV commented on ARROW-5454: -- if it were in pure python, we could do something

[jira] [Commented] (ARROW-3786) Enable merge_arrow_pr.py script to run in non-English JIRA accounts.

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912769#comment-16912769 ] Wes McKinney commented on ARROW-3786: - [~srowen] do you have any advice from Apache Spark since you

[jira] [Commented] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912771#comment-16912771 ] Wes McKinney commented on ARROW-3919: - Now that we have Large* types this can be implemented more

[jira] [Closed] (ARROW-4439) [C++] Improve FindBrotli.cmake

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4439. --- Resolution: Invalid Please reopen with a description of the issue or create a new PR > [C++]

[jira] [Commented] (ARROW-4427) Move Confluence Wiki pages to the Sphinx docs

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912779#comment-16912779 ] Wes McKinney commented on ARROW-4427: - Seems like we've made a little progress here, but there's more

[jira] [Commented] (ARROW-1036) [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912465#comment-16912465 ] Wes McKinney commented on ARROW-1036: - cc [~bkietz] as you think about application of filters to

[jira] [Commented] (ARROW-1636) [Format] Integration tests for null type

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912477#comment-16912477 ] Wes McKinney commented on ARROW-1636: - [~emkornfi...@gmail.com] [~liyafan] [~tianchen92] popping this

[jira] [Commented] (ARROW-1636) [Format] Integration tests for null type

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912479#comment-16912479 ] Wes McKinney commented on ARROW-1636: - See also ARROW-1638 for the Java impl of this > [Format]

[jira] [Commented] (ARROW-2296) [C++] Add num_rows to file footer

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912507#comment-16912507 ] Wes McKinney commented on ARROW-2296: - At minimum having a method in C++ to provide this information

[jira] [Updated] (ARROW-2296) [C++] Add num_rows to file footer

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2296: Fix Version/s: 0.15.0 > [C++] Add num_rows to file footer > - > >

[jira] [Commented] (ARROW-1636) [Format] Integration tests for null type

2019-08-21 Thread Ji Liu (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912526#comment-16912526 ] Ji Liu commented on ARROW-1636: --- ok, we'll take a close watch at this issue:). > [Format] Integration

[jira] [Resolved] (ARROW-6291) [C++] CMake ignores ARROW_PARQUET

2019-08-21 Thread Sutou Kouhei (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sutou Kouhei resolved ARROW-6291. - Fix Version/s: 0.15.0 Resolution: Fixed Issue resolved by pull request 5154

[jira] [Updated] (ARROW-6291) [C++] CMake ignores ARROW_PARQUET

2019-08-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6291: -- Labels: pull-request-available (was: ) > [C++] CMake ignores ARROW_PARQUET >

[jira] [Assigned] (ARROW-6291) [C++] CMake ignores ARROW_PARQUET

2019-08-21 Thread Sutou Kouhei (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sutou Kouhei reassigned ARROW-6291: --- Assignee: Wes McKinney > [C++] CMake ignores ARROW_PARQUET >

[jira] [Assigned] (ARROW-6312) Declare required Libs.private in arrow.pc package config

2019-08-21 Thread Sutou Kouhei (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sutou Kouhei reassigned ARROW-6312: --- Assignee: Michael Maguire > Declare required Libs.private in arrow.pc package config >

[jira] [Commented] (ARROW-4848) [C++] Static libparquet not compiled with -DARROW_STATIC on Windows

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912796#comment-16912796 ] Wes McKinney commented on ARROW-4848: - This sounds right to me. [~jeroenooms] do you have some

[jira] [Updated] (ARROW-4848) [C++] Static libparquet not compiled with -DARROW_STATIC on Windows

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4848: Summary: [C++] Static libparquet not compiled with -DARROW_STATIC on Windows (was: Static

[jira] [Commented] (ARROW-4836) "Cannot tell() a compressed stream" when using RecordBatchStreamWriter

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912794#comment-16912794 ] Wes McKinney commented on ARROW-4836: - Indeed it seems like technically this should work. We would

[jira] [Updated] (ARROW-4848) Static libparquet not compiled with -DARROW_STATIC on Windows

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4848: Fix Version/s: 0.15.0 > Static libparquet not compiled with -DARROW_STATIC on Windows >

[jira] [Commented] (ARROW-3786) Enable merge_arrow_pr.py script to run in non-English JIRA accounts.

2019-08-21 Thread Sean Owen (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912795#comment-16912795 ] Sean Owen commented on ARROW-3786: -- Hm, I don't know much about the merge script, but of course the

[jira] [Updated] (ARROW-4836) "Cannot tell() a compressed stream" when using RecordBatchStreamWriter

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4836: Fix Version/s: 0.15.0 > "Cannot tell() a compressed stream" when using RecordBatchStreamWriter >

[jira] [Commented] (ARROW-4120) [Python] Define process for testing procedures that check for no macro-level memory leaks

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912776#comment-16912776 ] Wes McKinney commented on ARROW-4120: - In the context of ARROW-6060, we had a case with runaway peak

[jira] [Commented] (ARROW-4279) [C++] Rebase https://github.com/apache/parquet-cpp/pull/462# onto arrow repo

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912777#comment-16912777 ] Wes McKinney commented on ARROW-4279: - I think this task should be abandoned > [C++] Rebase

[jira] [Updated] (ARROW-4220) [Python] Add buffered input and output stream ASV benchmarks with simulated high latency IO

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4220: Fix Version/s: 0.15.0 > [Python] Add buffered input and output stream ASV benchmarks with

[jira] [Closed] (ARROW-4757) [C++] Nested chunked array support

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4757. --- Resolution: Won't Fix Now that we have the Large* types this seems less needed. > [C++] Nested

[jira] [Closed] (ARROW-4779) [CI] AppVeyor link failure

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4779. --- Resolution: Cannot Reproduce > [CI] AppVeyor link failure > -- > >

[jira] [Updated] (ARROW-4770) [C++][ORC] Enable copy free conversion for primitive types

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4770: Summary: [C++][ORC] Enable copy free conversion for primitive types (was: Enable copy free

[jira] [Updated] (ARROW-4930) [Python] Remove LIBDIR assumptions in Python build

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4930: Summary: [Python] Remove LIBDIR assumptions in Python build (was: Remove LIBDIR assumptions in

[jira] [Updated] (ARROW-4966) [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4966: Summary: [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00 (was:

[jira] [Closed] (ARROW-4860) [C++] Build AWS C++ SDK for Windows in conda-forge

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4860. --- Fix Version/s: 0.15.0 (was: 1.0.0) Resolution: Fixed Appears this was

[jira] [Updated] (ARROW-4880) [Python] python/asv-build.sh is probably broken after CMake refactor

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4880: Fix Version/s: 0.15.0 (was: 1.0.0) > [Python] python/asv-build.sh is

[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster

2019-08-21 Thread Sutou Kouhei (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912822#comment-16912822 ] Sutou Kouhei commented on ARROW-3203: - Yes. This is outdated. > [C++] Build error on Debian Buster >

[jira] [Comment Edited] (ARROW-5454) [C++] Implement Take on ChunkedArray for DataFrame use

2019-08-21 Thread Artem KOZHEVNIKOV (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912675#comment-16912675 ] Artem KOZHEVNIKOV edited comment on ARROW-5454 at 8/21/19 9:03 PM: --- if

[jira] [Updated] (ARROW-3221) [C++][Python] Add a virtual Slice method to buffers

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3221: Fix Version/s: 0.15.0 > [C++][Python] Add a virtual Slice method to buffers >

[jira] [Closed] (ARROW-3232) [Python] Return an ndarray from Column.to_pandas

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3232. --- Resolution: Won't Fix Column is no more > [Python] Return an ndarray from Column.to_pandas >

[jira] [Closed] (ARROW-3203) [C++] Build error on Debian Buster

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3203. --- Resolution: Fixed We're successfully building for Debian Buster now so i think this is outdated cc

[jira] [Updated] (ARROW-3590) [Python] Expose Python API for start and end offset of row group in parquet file

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3590: Labels: dataset parquet (was: parquet) > [Python] Expose Python API for start and end offset of

[jira] [Updated] (ARROW-3543) [R] Time zone adjustment issue when reading Feather file written by Python

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3543: Priority: Major (was: Critical) > [R] Time zone adjustment issue when reading Feather file

[jira] [Updated] (ARROW-3604) [R] Support to collect int64 as ints

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3604: Fix Version/s: 1.0.0 > [R] Support to collect int64 as ints >

[jira] [Closed] (ARROW-3599) [C++] "infer" reports errors

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3599. --- Resolution: Won't Fix > [C++] "infer" reports errors > > >

[jira] [Updated] (ARROW-3651) [Python] Datetimes from non-DateTimeIndex cannot be deserialized

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3651: Fix Version/s: 0.15.0 > [Python] Datetimes from non-DateTimeIndex cannot be deserialized >

[jira] [Assigned] (ARROW-4111) [Python] Create time types from Python sequences of integers

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-4111: --- Assignee: Wes McKinney > [Python] Create time types from Python sequences of integers >

[jira] [Updated] (ARROW-4111) [Python] Create time types from Python sequences of integers

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4111: Fix Version/s: 0.15.0 (was: 1.0.0) > [Python] Create time types from Python

[jira] [Closed] (ARROW-4083) [C++] Allowing ChunkedArrays to contain a mix of DictionaryArray and dense Array (of the dictionary type)

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4083. --- Resolution: Won't Fix I will take care of this elsewhere when it is actually needed > [C++]

[jira] [Updated] (ARROW-4095) [C++] Implement optimizations for dictionary unification where dictionaries are prefixes of the unified dictionary

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4095: Fix Version/s: 0.15.0 (was: 1.0.0) > [C++] Implement optimizations for

[jira] [Closed] (ARROW-4470) [Python] Pyarrow using considerable more memory when reading partitioned Parquet file

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4470. --- Resolution: Cannot Reproduce If you can provide a reproduction of the issue we can provide more

[jira] [Closed] (ARROW-4967) [C++] Parquet: Object type and stats lost when using 96-bit timestamps

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4967. --- Resolution: Won't Fix Computation of statistics is disabled for INT96. We don't intend to do

[jira] [Commented] (ARROW-5932) [C++] undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912808#comment-16912808 ] Wes McKinney commented on ARROW-5932: - This usually suggests you have multiple libstdc++ versions on

[jira] [Updated] (ARROW-3154) [Python] Document how to write _metadata, _common_metadata files with Parquet datasets

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3154: Labels: dataset parquet (was: parquet) > [Python] Document how to write _metadata,

[jira] [Updated] (ARROW-3154) [Python][C++] Document how to write _metadata, _common_metadata files with Parquet datasets

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3154: Summary: [Python][C++] Document how to write _metadata, _common_metadata files with Parquet

[jira] [Commented] (ARROW-3154) [Python][C++] Document how to write _metadata, _common_metadata files with Parquet datasets

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912746#comment-16912746 ] Wes McKinney commented on ARROW-3154: - This issue may be subsumed with the broader migration to a

[jira] [Updated] (ARROW-6260) [Website] Use deploy key on Travis to build and push to asf-site

2019-08-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6260: -- Labels: pull-request-available (was: ) > [Website] Use deploy key on Travis to build and push

[jira] [Commented] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912773#comment-16912773 ] Wes McKinney commented on ARROW-3933: - Added to 0.15.0 milestone so I can take a quick look to assess

[jira] [Updated] (ARROW-3933) [Python] Segfault reading Parquet files from GNOMAD

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3933: Fix Version/s: 0.15.0 > [Python] Segfault reading Parquet files from GNOMAD >

[jira] [Updated] (ARROW-4726) [C++] IntToFloatingPoint tests disabled under 32bit builds

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4726: Fix Version/s: 1.0.0 > [C++] IntToFloatingPoint tests disabled under 32bit builds >

[jira] [Updated] (ARROW-4746) [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4746: Fix Version/s: 0.15.0 > [C++/Python] PyDataTime_Date wrongly casted to PyDataTime_DateTime >

[jira] [Updated] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3424: Labels: dataset parquet (was: dataset datasets parquet) > [Python] Improved workflow for loading

[jira] [Updated] (ARROW-3410) [C++] Streaming CSV reader interface for memory-constrainted environments

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3410: Labels: dataset (was: ) > [C++] Streaming CSV reader interface for memory-constrainted

[jira] [Updated] (ARROW-3410) [C++][Dataset] Streaming CSV reader interface for memory-constrainted environments

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3410: Summary: [C++][Dataset] Streaming CSV reader interface for memory-constrainted environments (was:

[jira] [Updated] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3424: Component/s: C++ > [Python] Improved workflow for loading an arbitrary collection of Parquet >

[jira] [Updated] (ARROW-3777) [Python] Implement a mock "high latency" filesystem

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3777: Fix Version/s: 0.15.0 > [Python] Implement a mock "high latency" filesystem >

[jira] [Updated] (ARROW-3764) [C++] Port Python "ParquetDataset" business logic to C++

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3764: Labels: dataset parquet (was: dataset datasets parquet) > [C++] Port Python "ParquetDataset"

[jira] [Updated] (ARROW-3705) [Python] Add "nrows" argument to parquet.read_table read indicated number of rows from file instead of whole file

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3705: Labels: dataset parquet (was: dataset datasets parquet) > [Python] Add "nrows" argument to

[jira] [Updated] (ARROW-3263) [R] Use R sentinel values for missingness in addition to bitmask

2019-08-21 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neal Richardson updated ARROW-3263: --- Component/s: R > [R] Use R sentinel values for missingness in addition to bitmask >

[jira] [Commented] (ARROW-4359) [Python] Column metadata is not saved or loaded in parquet

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912778#comment-16912778 ] Wes McKinney commented on ARROW-4359: - This looks kinda buggy, maybe it's fixed now. I added to

[jira] [Updated] (ARROW-4359) [Python] Column metadata is not saved or loaded in parquet

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4359: Fix Version/s: 0.15.0 > [Python] Column metadata is not saved or loaded in parquet >

[jira] [Updated] (ARROW-4771) [C++][ORC] Enable copy free conversion for Composite type

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4771: Summary: [C++][ORC] Enable copy free conversion for Composite type (was: Enable copy free

[jira] [Closed] (ARROW-4809) [Python] import error with undefined symbol _ZNK5arrow6Status8ToStringB5xcc11Ev

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4809. --- Resolution: Cannot Reproduce Please reopen if you have a reproduction for us to look at > [Python]

[jira] [Commented] (ARROW-2681) [C++] Use source releases when building ORC instead of using GitHub tag snapshots

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912805#comment-16912805 ] Wes McKinney commented on ARROW-2681: - I guess it's not so weird since we remove our old releases

[jira] [Updated] (ARROW-976) [C++][Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes

2019-08-21 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-976: --- Summary: [C++][Python] Provide API for defining and reading Parquet datasets with more ad hoc

[jira] [Updated] (ARROW-3705) [Python] Add "nrows" argument to parquet.read_table read indicated number of rows from file instead of whole file

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-3705: -- Labels: dataset datasets parquet (was: datasets parquet) > [Python] Add

[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-3379: -- Labels: csv dataset datasets (was: csv datasets) > [C++] Implement

[jira] [Resolved] (ARROW-6289) [Java] Add empty() in UnionVector to create instance

2019-08-21 Thread Pindikura Ravindra (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pindikura Ravindra resolved ARROW-6289. --- Fix Version/s: 0.15.0 Resolution: Fixed Issue resolved by pull request 5115

[jira] [Updated] (ARROW-2801) [Python] Implement splt_row_groups for ParquetDataset

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-2801: -- Labels: dataset datasets parquet pull-request-available (was: datasets

[jira] [Updated] (ARROW-3538) [Python] ability to override the automated assignment of uuid for filenames when writing datasets

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-3538: -- Labels: dataset datasets features parquet pull-request-available (was:

[jira] [Updated] (ARROW-6238) [C++] Implement SimpleDataSource/SimpleDataFragment

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-6238: -- Labels: dataset datasets pull-request-available (was: datasets

[jira] [Updated] (ARROW-6242) [C++] Implements basic Dataset/Scanner/ScannerBuilder

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-6242: -- Labels: dataset datasets (was: datasets) > [C++] Implements basic

[jira] [Updated] (ARROW-3764) [C++] Port Python "ParquetDataset" business logic to C++

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-3764: -- Labels: dataset datasets parquet (was: datasets parquet) > [C++] Port Python

[jira] [Updated] (ARROW-6161) [C++] Implements dataset::ParquetFile and associated Scan structures

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-6161: -- Labels: dataset datasets pull-request-available (was: datasets

[jira] [Updated] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns

2019-08-21 Thread Francois Saint-Jacques (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques updated ARROW-3408: -- Labels: csv dataset datasets (was: csv datasets) > [C++] Add option to CSV

[jira] [Updated] (ARROW-5830) [C++] Stop using memcmp in TensorEquals

2019-08-21 Thread lidavidm (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lidavidm updated ARROW-5830: Labels: beginner (was: ) > [C++] Stop using memcmp in TensorEquals >
