[jira] [Updated] (ARROW-8906) [Rust] Support reading multiple CSV files for schema inference
[ https://issues.apache.org/jira/browse/ARROW-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8906: -- Labels: pull-request-available (was: ) > [Rust] Support reading multiple CSV files for schema inference > -- > > Key: ARROW-8906 > URL: https://issues.apache.org/jira/browse/ARROW-8906 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8906) [Rust] Support reading multiple CSV files for schema inference
QP Hou created ARROW-8906: - Summary: [Rust] Support reading multiple CSV files for schema inference Key: ARROW-8906 URL: https://issues.apache.org/jira/browse/ARROW-8906 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: QP Hou Assignee: QP Hou -- This message was sent by Atlassian Jira (v8.3.4#803005)
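The multi-file schema inference proposed in ARROW-8906 can be sketched in pure Python (function names here are illustrative, not the Rust reader's actual API): infer a column type from a bounded sample of each file, then widen each column to the narrowest type that fits every file.

```python
import csv
import io

def infer_type(value):
    """Tiny type lattice: int < float < str; empty string -> unknown."""
    if value == "":
        return None
    for typ in (int, float):
        try:
            typ(value)
            return typ.__name__
        except ValueError:
            pass
    return "str"

def unify(a, b):
    """Widen two inferred types to the narrowest common type."""
    if a is None:
        return b
    if b is None:
        return a
    order = {"int": 0, "float": 1, "str": 2}
    return a if order[a] >= order[b] else b

def infer_schema(files, max_records=100):
    """Merge per-file inferences so one schema covers all files."""
    schema = {}
    for f in files:
        for i, row in enumerate(csv.DictReader(f)):
            if i >= max_records:
                break
            for name, value in row.items():
                schema[name] = unify(schema.get(name), infer_type(value))
    return schema

files = [io.StringIO("a,b\n1,x\n2,y\n"), io.StringIO("a,b\n3.5,z\n")]
print(infer_schema(files))  # {'a': 'float', 'b': 'str'}
```

Column `a` is integral in the first file but fractional in the second, so the merged schema widens it to float; inferring from only one file would have picked int.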
[jira] [Commented] (ARROW-8901) [C++] Reduce number of take kernels
[ https://issues.apache.org/jira/browse/ARROW-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114518#comment-17114518 ] Wes McKinney commented on ARROW-8901: - We probably need at least int8 through int64 (so we can use take to unpack dictionaries). A different code path will probably be used for running "take" in a selection vector context (per ARROW-8903) > [C++] Reduce number of take kernels > --- > > Key: ARROW-8901 > URL: https://issues.apache.org/jira/browse/ARROW-8901 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > After ARROW-8792 we can observe that we are generating 312 take kernels > {code} > In [1]: import pyarrow.compute as pc > > In [2]: reg = pc.function_registry() > > In [3]: reg.get_function('take') > > Out[3]: > arrow.compute.Function > kind: vector > num_kernels: 312 > {code} > You can see them all here: > https://gist.github.com/wesm/c3085bf40fa2ee5e555204f8c65b4ad5 > It's probably going to be sufficient to only support int16, int32, and int64 > index types for almost all types and insert implicit casts (once we implement > implicit-cast-insertion into the execution code) for other index types. If we > determine that there is some performance hot path where we need to specialize > for other index types, then we can always do that. > Additionally, we should be able to collapse the date/time kernels since we're > just moving memory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8905) [C++] Collapse Take APIs from 8 to 1 or 2
[ https://issues.apache.org/jira/browse/ARROW-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8905: Description: There are currently 8 {{arrow::compute::Take}} functions with different function signatures. Fewer functions would make life easier for binding developers (was: There are currently 8 {{Take}} functions with different function signatures. Fewer functions would make life easier for binding developers) > [C++] Collapse Take APIs from 8 to 1 or 2 > - > > Key: ARROW-8905 > URL: https://issues.apache.org/jira/browse/ARROW-8905 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > There are currently 8 {{arrow::compute::Take}} functions with different > function signatures. Fewer functions would make life easier for binding > developers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8905) [C++] Collapse Take APIs from 8 to 1 or 2
Wes McKinney created ARROW-8905: --- Summary: [C++] Collapse Take APIs from 8 to 1 or 2 Key: ARROW-8905 URL: https://issues.apache.org/jira/browse/ARROW-8905 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 There are currently 8 {{Take}} functions with different function signatures. Fewer functions would make life easier for binding developers -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6775) [C++] [Python] Proposal for several Array utility functions
[ https://issues.apache.org/jira/browse/ARROW-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114510#comment-17114510 ] Wes McKinney commented on ARROW-6775: - I think these can all be implemented as kernels with the new compute framework after ARROW-8792. I have linked the issue > [C++] [Python] Proposal for several Array utility functions > --- > > Key: ARROW-6775 > URL: https://issues.apache.org/jira/browse/ARROW-6775 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Python >Reporter: Zhuo Peng >Priority: Minor > > Hi, > We developed several utilities that compute / access certain properties of > Arrays and wonder if it makes sense to get them into upstream (into both > the C++ API and pyarrow) and, assuming yes, where is the best place to put > them? > Maybe I have overlooked existing APIs that already do the same; in that case > please point them out. > > 1/ ListLengthFromListArray(ListArray&) > Returns lengths of lists in a ListArray, as an Int32Array (or Int64Array for > large lists). For example: > [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned > array can be converted to numpy) > > 2/ GetBinaryArrayTotalByteSize(BinaryArray&) > Returns the total byte size of a BinaryArray (basically offset[len - 1] - > offset[0]). > Alternatively, a BinaryArray::Flatten() -> Uint8Array would work. > > 3/ GetArrayNullBitmapAsByteArray(Array&) > Returns the array's null bitmap as a UInt8Array (which can be efficiently > converted to a bool numpy array) > > 4/ GetFlattenedArrayParentIndices(ListArray&) > Makes an int32 array of the same length as the flattened ListArray. > returned_array[i] == j means the i-th element in the flattened ListArray came > from the j-th list in the ListArray. > For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
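Utilities 1/ and 4/ above fall out directly from a ListArray's offsets buffer. A minimal pure-Python sketch over a plain offsets list (function names mirror the proposal but are illustrative only); the example reproduces the issue's `[[1,2,3], [], None, [4,5]]` case, where the null third list shares its neighbor's offset:

```python
def list_lengths(offsets):
    """1/ ListLengthFromListArray: length of each sub-list, computed
    as the difference of consecutive offsets."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]

def parent_indices(offsets):
    """4/ GetFlattenedArrayParentIndices: for each element of the
    flattened values, the index of the list it came from."""
    out = []
    for i in range(len(offsets) - 1):
        out.extend([i] * (offsets[i + 1] - offsets[i]))
    return out

# [[1, 2, 3], [], None, [4, 5]] -> offsets [0, 3, 3, 3, 5]
offsets = [0, 3, 3, 3, 5]
print(list_lengths(offsets))    # [3, 0, 0, 2]
print(parent_indices(offsets))  # [0, 0, 0, 3, 3]
```

Note the null slot comes out as length 0, matching the "[3, 0, 0]"-style output the proposal prefers for numpy conversion.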
[jira] [Commented] (ARROW-3520) [C++] Implement List Flatten kernel
[ https://issues.apache.org/jira/browse/ARROW-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114509#comment-17114509 ] Wes McKinney commented on ARROW-3520: - This would be fine as a {{VectorFunction}} > [C++] Implement List Flatten kernel > --- > > Key: ARROW-3520 > URL: https://issues.apache.org/jira/browse/ARROW-3520 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 2.0.0 > > > see also ARROW-45 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8904) [Python] Fix usages of deprecated C++ APIs related to child/field
Wes McKinney created ARROW-8904: --- Summary: [Python] Fix usages of deprecated C++ APIs related to child/field Key: ARROW-8904 URL: https://issues.apache.org/jira/browse/ARROW-8904 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Fix For: 1.0.0 {code} -- Running cmake --build for pyarrow cmake --build . --config debug -- -j16 [19/20] Building CXX object CMakeFiles/lib.dir/lib.cpp.o lib.cpp:20265:85: warning: 'num_children' is deprecated: Use num_fields() [-Wdeprecated-declarations] __pyx_t_1 = __pyx_f_7pyarrow_3lib__normalize_index(__pyx_v_i, __pyx_v_self->type->num_children()); if (unlikely(__pyx_t_1 == ((Py_ssize_t)-1L))) __PYX_ERR(1, 119, __pyx_L1_error) ^ /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been explicitly marked deprecated here ARROW_DEPRECATED("Use num_fields()") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:20276:76: warning: 'child' is deprecated: Use field(i) [-Wdeprecated-declarations] __pyx_t_2 = __pyx_f_7pyarrow_3lib_pyarrow_wrap_field(__pyx_v_self->type->child(__pyx_v_index)); if (unlikely(!__pyx_t_2)) __PYX_ERR(1, 120, __pyx_L1_error) ^ /home/wesm/local/include/arrow/type.h:251:3: note: 'child' has been explicitly marked deprecated here ARROW_DEPRECATED("Use field(i)") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) 
__attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:20507:56: warning: 'num_children' is deprecated: Use num_fields() [-Wdeprecated-declarations] __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_self->type->num_children()); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 139, __pyx_L1_error) ^ /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been explicitly marked deprecated here ARROW_DEPRECATED("Use num_fields()") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:23361:44: warning: 'num_children' is deprecated: Use num_fields() [-Wdeprecated-declarations] __pyx_r = __pyx_v_self->__pyx_base.type->num_children(); ^ /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been explicitly marked deprecated here ARROW_DEPRECATED("Use num_fields()") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:24039:44: warning: 'num_children' is deprecated: Use num_fields() [-Wdeprecated-declarations] __pyx_r = __pyx_v_self->__pyx_base.type->num_children(); ^ /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been explicitly marked deprecated here ARROW_DEPRECATED("Use num_fields()") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:58220:37: warning: 'child' is deprecated: Use field(pos) [-Wdeprecated-declarations] __pyx_v_child = __pyx_v_self->ap->child(__pyx_v_child_id); ^ /home/wesm/local/include/arrow/array.h:1281:3: note: 'child' has been explicitly marked deprecated here ARROW_DEPRECATED("Use field(pos)") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) 
__attribute__((deprecated(__VA_ARGS__))) ^ lib.cpp:58956:74: warning: 'children' is deprecated: Use fields() [-Wdeprecated-declarations] __pyx_v_child_fields = __pyx_v_self->__pyx_base.__pyx_base.type->type->children(); ^ /home/wesm/local/include/arrow/type.h:257:3: note: 'children' has been explicitly marked deprecated here ARROW_DEPRECATED("Use fields()") ^ /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from macro 'ARROW_DEPRECATED' # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) {code}
[jira] [Created] (ARROW-8903) [C++] Implement optimized "unsafe take" for use with selection vectors for kernel execution
Wes McKinney created ARROW-8903: --- Summary: [C++] Implement optimized "unsafe take" for use with selection vectors for kernel execution Key: ARROW-8903 URL: https://issues.apache.org/jira/browse/ARROW-8903 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Selection vectors constructed from filters do not need to be subjected to bounds-checking and the other safety checks that apply to a usual invocation of {{take}}. So, based on the type width of a selection vector (uint16?), we should implement highly streamlined take implementations that additionally take into consideration that selection vectors are monotonic by construction -- This message was sent by Atlassian Jira (v8.3.4#803005)
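The safe/unsafe distinction can be sketched in pure Python (illustrative names; the real implementation would be templated C++ specialized on the selection vector's index width):

```python
def take_safe(values, indices):
    """Ordinary take: bounds-check every index before gathering."""
    n = len(values)
    for i in indices:
        if not 0 <= i < n:
            raise IndexError(f"take index {i} out of bounds")
    return [values[i] for i in indices]

def take_unsafe_monotonic(values, selection):
    """'Unsafe take' for a selection vector: the indices came from a
    filter over these same values, so they are in-bounds and
    monotonically increasing by construction; no per-index check
    (and, in C++, a sequential-friendly memory access pattern)."""
    return [values[i] for i in selection]

values = list("abcdef")
selection = [1, 3, 4]  # e.g. row positions that passed a filter
print(take_unsafe_monotonic(values, selection))  # ['b', 'd', 'e']
```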
[jira] [Resolved] (ARROW-8815) [Dev][Release] Binary upload script should retry on unexpected bintray request error
[ https://issues.apache.org/jira/browse/ARROW-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-8815. - Resolution: Fixed Issue resolved by pull request 7192 [https://github.com/apache/arrow/pull/7192] > [Dev][Release] Binary upload script should retry on unexpected bintray > request error > > > Key: ARROW-8815 > URL: https://issues.apache.org/jira/browse/ARROW-8815 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > During uploading the binaries to bintray the script exited multiple times > because of unhandled HTTP errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8902) [rust][datafusion] optimize count(*) queries on parquet sources
Alex Gaynor created ARROW-8902: -- Summary: [rust][datafusion] optimize count(*) queries on parquet sources Key: ARROW-8902 URL: https://issues.apache.org/jira/browse/ARROW-8902 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: Alex Gaynor Currently, as far as I can tell, when you perform a `select count(*) from dataset` in datafusion against a parquet dataset, this is implemented by doing a scan on column 0 and counting up all of the rows (specifically, I think it counts the # of rows in each batch). However, for the specific case of just counting _everything_ in a parquet file, you can just read the row count from the footer metadata, so it's O(1) instead of O(n) -- This message was sent by Atlassian Jira (v8.3.4#803005)
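A pure-Python sketch of the two strategies, using a hypothetical stand-in class rather than DataFusion's actual source API (Parquet writers record the total row count in the file footer, which is what the O(1) path reads):

```python
class ParquetSourceStub:
    """Hypothetical stand-in for a Parquet source: data lives in
    row-group batches, and the footer already records the total
    row count (written once, at file-write time)."""
    def __init__(self, batches):
        self.batches = batches
        self.footer_num_rows = sum(len(b) for b in batches)

    def scan_column(self, idx):
        for batch in self.batches:
            yield [row[idx] for row in batch]

def count_star_scan(source):
    """Current approach: O(n) scan of column 0, summing batch sizes."""
    return sum(len(chunk) for chunk in source.scan_column(0))

def count_star_metadata(source):
    """Proposed approach: O(1) read of the footer row count."""
    return source.footer_num_rows

src = ParquetSourceStub([[(1, "a"), (2, "b")], [(3, "c")]])
assert count_star_scan(src) == count_star_metadata(src) == 3
```

Both paths must agree on the answer; the metadata path simply skips decoding any column data at all.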
[jira] [Created] (ARROW-8901) [C++] Reduce number of take kernels
Wes McKinney created ARROW-8901: --- Summary: [C++] Reduce number of take kernels Key: ARROW-8901 URL: https://issues.apache.org/jira/browse/ARROW-8901 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney After ARROW-8792 we can observe that we are generating 312 take kernels {code} In [1]: import pyarrow.compute as pc In [2]: reg = pc.function_registry() In [3]: reg.get_function('take') Out[3]: arrow.compute.Function kind: vector num_kernels: 312 {code} You can see them all here: https://gist.github.com/wesm/c3085bf40fa2ee5e555204f8c65b4ad5 It's probably going to be sufficient to only support int16, int32, and int64 index types for almost all types and insert implicit casts (once we implement implicit-cast-insertion into the execution code) for other index types. If we determine that there is some performance hot path where we need to specialize for other index types, then we can always do that. Additionally, we should be able to collapse the date/time kernels since we're just moving memory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
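The implicit-cast idea in ARROW-8901 can be sketched in pure Python (illustrative names): keep kernel instantiations only for int16/int32/int64 indices, and map every other index type to the narrowest supported type that can hold all of its values, noting that unsigned types need one extra bit of headroom.

```python
SUPPORTED = ["int16", "int32", "int64"]
WIDTH = {"int8": 8, "uint8": 8, "int16": 16, "uint16": 16,
         "int32": 32, "uint32": 32, "int64": 64, "uint64": 64}

def cast_target(index_type):
    """Narrowest supported signed index type that can represent every
    value of `index_type` (unsigned types need one extra bit)."""
    bits = WIDTH[index_type] + (1 if index_type.startswith("u") else 0)
    for t in SUPPORTED:
        if WIDTH[t] >= bits:
            return t
    raise TypeError(f"no supported take index type for {index_type}")

# int8 indices get cast up to int16, uint16 to int32, uint32 to int64;
# only three kernel instantiations per value type remain instead of
# one per (value type, index type) pair.
print(cast_target("int8"), cast_target("uint16"), cast_target("uint32"))
# int16 int32 int64
```

uint64 deliberately has no target here, mirroring the fact that a 64-bit unsigned index cannot be widened losslessly into a signed kernel index.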
[jira] [Created] (ARROW-8900) Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as parameters
Daniel Nugent created ARROW-8900: Summary: Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as parameters Key: ARROW-8900 URL: https://issues.apache.org/jira/browse/ARROW-8900 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.17.0 Reporter: Daniel Nugent HTTP_PROXY and HTTPS_PROXY are not automatically respected by the Aws::Client::ClientConfiguration (see: https://github.com/aws/aws-sdk-cpp/issues/1049) Either Arrow should respect them or make them available as parameters when connecting to S3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4390) [R] Serialize "labeled" metadata in Feather files, IPC messages
[ https://issues.apache.org/jira/browse/ARROW-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114428#comment-17114428 ] Neal Richardson commented on ARROW-4390: After exploring more, I don't think this requires an extension type, we just need to collect R attributes and store them as schema metadata. > [R] Serialize "labeled" metadata in Feather files, IPC messages > --- > > Key: ARROW-4390 > URL: https://issues.apache.org/jira/browse/ARROW-4390 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > > see https://github.com/apache/arrow/issues/3480 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8899) [R] Add R metadata like pandas metadata for round-trip fidelity
Neal Richardson created ARROW-8899: -- Summary: [R] Add R metadata like pandas metadata for round-trip fidelity Key: ARROW-8899 URL: https://issues.apache.org/jira/browse/ARROW-8899 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Neal Richardson Fix For: 1.0.0 Arrow Schema and Field objects have custom_metadata fields to store arbitrary strings in a key-value store. Pandas stores JSON in a "pandas" key and uses that to improve the fidelity of round-tripping data to Arrow/Parquet/Feather and back. https://pandas.pydata.org/docs/dev/development/developer.html#storing-pandas-dataframe-objects-in-apache-parquet-format describes this a bit. You can see this pandas metadata in the sample Parquet file: {code:r} tab <- read_parquet(system.file("v0.7.1.parquet", package="arrow"), as_data_frame = FALSE) tab # Table # 10 rows x 11 columns # $carat # $cut # $color # $clarity # $depth # $table # $price # $x # $y # $z # $__index_level_0__ tab$metadata # $pandas # [1] "{\"index_columns\": [\"__index_level_0__\"], \"column_indexes\": [{\"name\": null, \"pandas_type\": \"string\", \"numpy_type\": \"object\", \"metadata\": null}], \"columns\": [{\"name\": \"carat\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"cut\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"color\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"clarity\", \"pandas_type\": \"unicode\", \"numpy_type\": \"object\", \"metadata\": null}, {\"name\": \"depth\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"table\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"price\", \"pandas_type\": \"int64\", \"numpy_type\": \"int64\", \"metadata\": null}, {\"name\": \"x\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"y\", 
\"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"z\", \"pandas_type\": \"float64\", \"numpy_type\": \"float64\", \"metadata\": null}, {\"name\": \"__index_level_0__\", \"pandas_type\": \"int64\", \"numpy_type\": \"int64\", \"metadata\": null}], \"pandas_version\": \"0.20.1\"}" {code} We should do something similar in R: store the "attributes" for each column in a data.frame when we convert to Arrow, and restore those attributes when we read from Arrow. Since ARROW-8703, you could naively do this all in R, something like: {code:r} tab$metadata$r <- lapply(df, attributes) {code} on the conversion to Arrow, and in as.data.frame(), do {code:r} if (!is.null(tab$metadata$r)) { df[] <- mapply(function(col, meta) { attributes(col) <- meta; col }, col = df, meta = tab$metadata$r, SIMPLIFY = FALSE) } {code} However, it's trickier than this because: * {{tab$metadata$r}} needs to be serialized to string and deserialized on the way back. Pandas uses JSON but arrow doesn't currently have a JSON R dependency. The C++ build does include rapidjson, maybe we could tap into that? Alternatively, we could {{dput()}} to dump the R attributes, which might have higher fidelity in addition to zero dependencies, but there are tradeoffs. * We'll need to do the same for all places where Tables and RecordBatches are created/converted * We'll need to make sure that nested types (structs) get the same coverage * This metadata is only attached to Schemas, meaning that Arrays/ChunkedArrays don't have a place to store extra metadata. So we probably want to attach to the R6 (Chunked)Array objects a metadata/attributes field so that if we convert an R vector to array, or if we extract an array out of a record batch, we don't lose the attributes. Doing this should resolve ARROW-4390 and make ARROW-8867 trivial as well. Finally, a note about this custom metadata vs. extension types. 
Extension types can be defined by [adding metadata to a Field|https://arrow.apache.org/docs/format/Columnar.html#extension-types] (in a Schema). I think this is out of scope here because we're only concerned with R roundtrip fidelity. If there were a type that (for example) R and Pandas both had that Arrow did not, we could define an extension type so that we could share that across the implementations. But unless/until there is value in establishing that extension type standard, let's not worry with it. (In other words, in R we should ignore pandas metadata; if there's anything that pandas wants to share with R, it will define it somewhere else.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
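The serialize/restore round trip ARROW-8899 describes is language-neutral; a minimal Python sketch, assuming JSON as the wire format (the issue leaves JSON vs. {{dput()}} open), with a plain dict standing in for the schema's key-value metadata:

```python
import json

# toy "table": columns plus schema-level key/value metadata, the
# analogue of Arrow's custom_metadata (a string -> string store)
columns = {"x": [1, 2, 3]}
attrs = {"x": {"class": "Date"}}  # per-column R-style attributes

# on conversion to Arrow: serialize attributes under an "r" key,
# just as pandas serializes under a "pandas" key
metadata = {"r": json.dumps(attrs)}

# on conversion back to a data.frame: deserialize and re-attach
restored = json.loads(metadata["r"])
assert restored == attrs
print(metadata["r"])  # {"r": ...} payload is an ordinary JSON string
```

The fidelity caveats in the issue (nested structs, Arrays extracted from a RecordBatch losing their Schema) are exactly the places where this simple key-value round trip is not enough on its own.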
[jira] [Created] (ARROW-8898) [C++] Determine desirable maximum length for ExecBatch in pipelined and parallel execution of kernels
Wes McKinney created ARROW-8898: --- Summary: [C++] Determine desirable maximum length for ExecBatch in pipelined and parallel execution of kernels Key: ARROW-8898 URL: https://issues.apache.org/jira/browse/ARROW-8898 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Maximum lengths like 16K or 64K seem to be popular, but we should write our own benchmarks so that we can justify the choice of default chunksize -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8897) [C++] Determine strategy for propagating failures in initializing built-in function registry in arrow/compute
Wes McKinney created ARROW-8897: --- Summary: [C++] Determine strategy for propagating failures in initializing built-in function registry in arrow/compute Key: ARROW-8897 URL: https://issues.apache.org/jira/browse/ARROW-8897 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney As discussed on https://github.com/apache/arrow/pull/7240, we are using {{DCHECK_OK}} to check statuses when initializing the built-in registry. We could propagate failures by changing {{arrow::compute::GetFunctionRegistry}} to return Result, but there may be other ways -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8896) [C++] Reimplement dictionary unpacking in Cast kernels using Take
Wes McKinney created ARROW-8896: --- Summary: [C++] Reimplement dictionary unpacking in Cast kernels using Take Key: ARROW-8896 URL: https://issues.apache.org/jira/browse/ARROW-8896 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 As suggested by [~apitrou] this should yield less code to maintain -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8895) [C++] Add C++ unit tests for filter and take functions on temporal type inputs, including timestamps
[ https://issues.apache.org/jira/browse/ARROW-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8895: Summary: [C++] Add C++ unit tests for filter and take functions on temporal type inputs, including timestamps (was: [C++] Add C++ unit tests for filter function on temporal type inputs, including timestamps) > [C++] Add C++ unit tests for filter and take functions on temporal type > inputs, including timestamps > > > Key: ARROW-8895 > URL: https://issues.apache.org/jira/browse/ARROW-8895 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > These are used in R but not tested in C++, so I only found out that I had > missed adding the kernels to the Filter VectorFunction when running the R > test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8895) [C++] Add C++ unit tests for filter function on temporal type inputs, including timestamps
Wes McKinney created ARROW-8895: --- Summary: [C++] Add C++ unit tests for filter function on temporal type inputs, including timestamps Key: ARROW-8895 URL: https://issues.apache.org/jira/browse/ARROW-8895 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 These are used in R but not tested in C++, so I only found out that I had missed adding the kernels to the Filter VectorFunction when running the R test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8894) [C++] C++ array kernels framework and execution buildout (umbrella issue)
[ https://issues.apache.org/jira/browse/ARROW-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-8894: Description: In the wake of ARROW-8792, this issue is to serve as an umbrella issue for follow up work and associated "buildout" which includes things like: * Implementation of many new function types and adding new kernel cases to existing functions * Adding implicit casting functionality to function execution * Creation of "bound" physical array expressions and execution thereof * Pipeline execution (executing multiple kernels while eliminating temporary allocation) * Parallel execution of scalar and aggregate kernels (including parallel execution of pipelined kernels) There's quite a few existing JIRAs in the project that I'll attach to this issue and I'll open plenty more issues as things occur to me to help organize the work. was: In the wake of ARROW-8792, this issue is to serve as an umbrella issue for follow up work and associated "buildout" which includes things like: * Implementation of many new function types and adding new kernel cases to existing functions * Adding implicit casting functionality to function execution * Creation of "bound" physical arrays expressions * Pipeline execution (executing multiple kernels while eliminating temporary allocation) * Parallel execution of scalar and aggregate kernels (including parallel execution of pipelined kernels) There's quite a few existing JIRAs in the project that I'll attach to this issue and I'll open plenty more issues as things occur to me to help organize the work. 
> [C++] C++ array kernels framework and execution buildout (umbrella issue) > - > > Key: ARROW-8894 > URL: https://issues.apache.org/jira/browse/ARROW-8894 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > In the wake of ARROW-8792, this issue is to serve as an umbrella issue for > follow up work and associated "buildout" which includes things like: > * Implementation of many new function types and adding new kernel cases to > existing functions > * Adding implicit casting functionality to function execution > * Creation of "bound" physical array expressions and execution thereof > * Pipeline execution (executing multiple kernels while eliminating temporary > allocation) > * Parallel execution of scalar and aggregate kernels (including parallel > execution of pipelined kernels) > There's quite a few existing JIRAs in the project that I'll attach to this > issue and I'll open plenty more issues as things occur to me to help organize > the work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8894) [C++] C++ array kernels framework and execution buildout (umbrella issue)
Wes McKinney created ARROW-8894: --- Summary: [C++] C++ array kernels framework and execution buildout (umbrella issue) Key: ARROW-8894 URL: https://issues.apache.org/jira/browse/ARROW-8894 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney In the wake of ARROW-8792, this issue is to serve as an umbrella issue for follow up work and associated "buildout" which includes things like: * Implementation of many new function types and adding new kernel cases to existing functions * Adding implicit casting functionality to function execution * Creation of "bound" physical arrays expressions * Pipeline execution (executing multiple kernels while eliminating temporary allocation) * Parallel execution of scalar and aggregate kernels (including parallel execution of pipelined kernels) There's quite a few existing JIRAs in the project that I'll attach to this issue and I'll open plenty more issues as things occur to me to help organize the work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8890) [R] Fix C++ lint issue
[ https://issues.apache.org/jira/browse/ARROW-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques resolved ARROW-8890. --- Resolution: Fixed Issue resolved by pull request 7251 [https://github.com/apache/arrow/pull/7251] > [R] Fix C++ lint issue > --- > > Key: ARROW-8890 > URL: https://issues.apache.org/jira/browse/ARROW-8890 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (ARROW-8893) [R] Fix cpplint issues introduced by ARROW-8885
[ https://issues.apache.org/jira/browse/ARROW-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-8893. --- Fix Version/s: (was: 1.0.0) Resolution: Duplicate dup of ARROW-8890 > [R] Fix cpplint issues introduced by ARROW-8885 > --- > > Key: ARROW-8893 > URL: https://issues.apache.org/jira/browse/ARROW-8893 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Wes McKinney >Priority: Major > > {code} > (arrow-3.7) 12:34 ~/code/arrow/r $ ./lint.sh > /home/wesm/code/arrow/r/src/arrow_types.h:20: Include the directory when > naming .h files [build/include_subdir] [4] > /home/wesm/code/arrow/r/src/arrow_types.h:66: Add #include for > forward [build/include_what_you_use] [4] > /home/wesm/code/arrow/r/src/arrow_types.h:83: Add #include for > vector<> [build/include_what_you_use] [4] > /home/wesm/code/arrow/r/src/arrow_types.h:95: Add #include for > numeric_limits<> [build/include_what_you_use] [4] > /home/wesm/code/arrow/r/src/arrow_types.h:110: Add #include for > shared_ptr<> [build/include_what_you_use] [4] > /home/wesm/code/arrow/r/src/arrow_exports.h:22: Include the directory when > naming .h files [build/include_subdir] [4] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8455) [Rust] [Parquet] Arrow column read on partially compatible files
[ https://issues.apache.org/jira/browse/ARROW-8455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved ARROW-8455. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 6935 [https://github.com/apache/arrow/pull/6935] > [Rust] [Parquet] Arrow column read on partially compatible files > > > Key: ARROW-8455 > URL: https://issues.apache.org/jira/browse/ARROW-8455 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Affects Versions: 0.16.0 >Reporter: Remi Dettai >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Seen behavior: When reading a Parquet file into Arrow with > `get_record_reader_by_columns`, it will fail if one of the columns of the file > is a list (or any other unsupported type). > Expected behavior: it should only fail if you are actually reading the column > with an unsupported type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8893) [R] Fix cpplint issues introduced by ARROW-8885
Wes McKinney created ARROW-8893: --- Summary: [R] Fix cpplint issues introduced by ARROW-8885 Key: ARROW-8893 URL: https://issues.apache.org/jira/browse/ARROW-8893 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Wes McKinney Fix For: 1.0.0 {code} (arrow-3.7) 12:34 ~/code/arrow/r $ ./lint.sh /home/wesm/code/arrow/r/src/arrow_types.h:20: Include the directory when naming .h files [build/include_subdir] [4] /home/wesm/code/arrow/r/src/arrow_types.h:66: Add #include for forward [build/include_what_you_use] [4] /home/wesm/code/arrow/r/src/arrow_types.h:83: Add #include for vector<> [build/include_what_you_use] [4] /home/wesm/code/arrow/r/src/arrow_types.h:95: Add #include for numeric_limits<> [build/include_what_you_use] [4] /home/wesm/code/arrow/r/src/arrow_types.h:110: Add #include for shared_ptr<> [build/include_what_you_use] [4] /home/wesm/code/arrow/r/src/arrow_exports.h:22: Include the directory when naming .h files [build/include_subdir] [4] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8892) [C++][CI] CI builds for MSVC do not build benchmarks
Wes McKinney created ARROW-8892: --- Summary: [C++][CI] CI builds for MSVC do not build benchmarks Key: ARROW-8892 URL: https://issues.apache.org/jira/browse/ARROW-8892 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 We must ensure that our benchmarks always build on Windows. I'm fixing these errors, for example, in ARROW-8792 {code} C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(249): error C2220: warning treated as error - no 'object' file generated C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(256): note: see reference to function template instantiation 'void parquet::BM_PlainEncodingSpaced(benchmark::State &)' being compiled C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(249): warning C4244: 'argument': conversion from 'const int64_t' to 'int', possible loss of data C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(292): warning C4244: 'argument': conversion from 'const int64_t' to 'int', possible loss of data C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(306): note: see reference to function template instantiation 'void parquet::BM_PlainDecodingSpaced(benchmark::State &)' being compiled C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(299): warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of data C:/Users/wesmc/code/arrow/cpp/src/parquet/encoding_benchmark.cc(300): warning C4244: 'argument': conversion from 'const int64_t' to 'int', possible loss of data [11/67] Linking CXX executable release\arrow-ipc-read-write-benchmark.exe {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-555) [C++] String algorithm library for StringArray/BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114217#comment-17114217 ] Wes McKinney commented on ARROW-555: Yes, that's the idea. I can try to implement {{str.split}} which would be {{String -> List}} in Arrow types. > [C++] String algorithm library for StringArray/BinaryArray > -- > > Key: ARROW-555 > URL: https://issues.apache.org/jira/browse/ARROW-555 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: Analytics > > This is a parent JIRA for starting a module for processing strings in-memory > arranged in Arrow format. This will include using the re2 C++ regular > expression library and other standard string manipulations (such as those > found on Python's string objects) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8878) [R] how to install when behind a firewall?
[ https://issues.apache.org/jira/browse/ARROW-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114186#comment-17114186 ] Olaf commented on ARROW-8878: - Hi [~npr], thanks for replying back. Please see below: * getOption("download.file.method") returns wget * sorry for the low-tech question, but can I install manually without cloning? That is, simply going to the github page [https://github.com/apache/arrow], manually downloading the zip and then installing using the "install from zip" utility in Rstudio? Would that work correctly? Thanks!! > [R] how to install when behind a firewall? > -- > > Key: ARROW-8878 > URL: https://issues.apache.org/jira/browse/ARROW-8878 > Project: Apache Arrow > Issue Type: Bug > Components: R > Environment: r >Reporter: Olaf >Priority: Major > > Hello there and thanks again for this beautiful package! > I am trying to install {{arrow}} on linux and I got a few problematic > warnings during the install. My computer is behind a firewall so not all the > connections coming from rstudio are allowed. > > {code:java} > > sessionInfo() > R version 3.6.1 (2019-07-05) > Platform: x86_64-ubuntu18-linux-gnu (64-bit) > Running under: Ubuntu 18.04.4 LTS > Matrix products: default > BLAS/LAPACK: > /apps/intel/2019.1/compilers_and_libraries_2019.1.144/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 > [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C > [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] MKLthreads_0.1 > loaded via a namespace (and not attached): > [1] compiler_3.6.1 tools_3.6.1 > {code} > > after running {{install.packages("arrow")}} I get > > {code:java} > > installing *source* package ?arrow? ... > ** package ?arrow? 
successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ source > *** Proceeding without C++ dependencies > Warning message: > In unzip(tf1, exdir = src_dir) : error 1 in extracting from zip file > ./configure: line 132: cd: libarrow/arrow-0.17.1/lib: No such file or > directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > - > {code} > > > However, the installation ends normally. > > {code:java} > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** checking absolute paths in shared objects and dynamic libraries > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (arrow) > {code} > > So I go ahead and try to run arrow::install_arrow() and get a similar warning. > > {code:java} > installing *source* package 'arrow' ... > ** package 'arrow' successfully unpacked and MD5 sums checked > ** using staged installation > *** Successfully retrieved C++ binaries for ubuntu-18.04 > Warning messages: > 1: In file(file, "rt") : > URL > 'https://raw.githubusercontent.com/ursa-labs/arrow-r-nightly/master/linux/distro-map.csv': > status was 'Couldn't connect to server' > 2: In unzip(bin_file, exdir = dst_dir) : > error 1 in extracting from zip file > ./configure: line 132: cd: libarrow/arrow-0.17.1/lib: No such file or > directory > - NOTE --- > After installation, please run arrow::install_arrow() > for help installing required runtime libraries > {code} > And unfortunately I cannot read any parquet file.
> {noformat} > Error in fetch(key) : lazy-load database > '/mydata/R/x86_64-ubuntu18-linux-gnu-library/3.6/arrow/help/arrow.rdb' is > corrupt{noformat} > > Could you please tell me how to fix this? Can I just copy the zip from github > and do a manual install in Rstudio? > > Thanks! > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-555) [C++] String algorithm library for StringArray/BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114105#comment-17114105 ] Maarten Breddels commented on ARROW-555: Sounds good. I think it would help me a lot to see str->scalar and str->str (and possibly a str->[str, str]) example. They can be trivial, like always return ["a", "b"], but with that, I can probably get up to speed very quickly, if it's not too much to ask. > [C++] String algorithm library for StringArray/BinaryArray > -- > > Key: ARROW-555 > URL: https://issues.apache.org/jira/browse/ARROW-555 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: Analytics > > This is a parent JIRA for starting a module for processing strings in-memory > arranged in Arrow format. This will include using the re2 C++ regular > expression library and other standard string manipulations (such as those > found on Python's string objects) -- This message was sent by Atlassian Jira (v8.3.4#803005)
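The trivial kernel shapes requested in the comment above — str->scalar, str->str, and str->[str, str] — can be illustrated elementwise in plain Python. These toy functions (names are made up for illustration) show only the input/output shapes under discussion, not Arrow's actual C++ kernel machinery:

```python
# Toy elementwise "kernels" over a list of strings, one per shape.
def str_to_scalar(values):
    # e.g. a length kernel: each string maps to one integer
    return [len(v) for v in values]

def str_to_str(values):
    # e.g. an upper-case kernel: each string maps to one string
    return [v.upper() for v in values]

def str_to_list(values):
    # e.g. str.split: each string maps to a variable-length list,
    # which in Arrow types is String -> List<String>
    return [v.split(",") for v in values]

data = ["a,b", "c"]
print(str_to_scalar(data))  # [3, 1]
print(str_to_str(data))     # ['A,B', 'C']
print(str_to_list(data))    # [['a', 'b'], ['c']]
```

The third shape is the interesting one for Arrow: the variable-length output means the kernel must produce a list array (offsets plus a child string array), not a flat array the same length as the input.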
[jira] [Created] (ARROW-8891) [C++] Split non-cast compute kernels into a separate shared library
Wes McKinney created ARROW-8891: --- Summary: [C++] Split non-cast compute kernels into a separate shared library Key: ARROW-8891 URL: https://issues.apache.org/jira/browse/ARROW-8891 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Since we are going to implement a lot more precompiled kernels, I am not sure it makes sense to require all of them to be compiled unconditionally just to get access to {{compute::Cast}}, which is needed in many different contexts. After ARROW-8792 is merged, I would suggest creating a plugin hook for adding a bundle of kernels from a shared library outside of libarrow.so, and then moving all the object code outside of Cast to something like libarrow_compute.so. Then we can change the CMake flags to compile Cast kernels always (?) and then opt in to building the additional kernels package separately -- This message was sent by Atlassian Jira (v8.3.4#803005)
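The plugin-hook idea above can be sketched as a minimal function registry: a core registry that always contains Cast, plus bundles of kernels that can be registered at runtime from a separately built library. This Python sketch only illustrates the registration pattern; the class and function names are hypothetical, and the real design would live in C++ against libarrow_compute.so:

```python
# Minimal sketch of a kernel-registry plugin hook.
class FunctionRegistry:
    def __init__(self):
        # Cast is always built in, since it is needed in many contexts.
        self.functions = {"cast": lambda x, ty: ty(x)}

    def register_bundle(self, bundle):
        # 'bundle' plays the role of kernels loaded from an optional
        # shared library (e.g. libarrow_compute.so in the proposal).
        self.functions.update(bundle)

reg = FunctionRegistry()
reg.register_bundle({
    "add": lambda a, b: a + b,
    "take": lambda xs, indices: [xs[i] for i in indices],
})
print(reg.functions["cast"]("3", int))              # 3
print(reg.functions["take"]([10, 20, 30], [2, 0]))  # [30, 10]
```

The corresponding CMake change would then make the core registry unconditional while the bundle becomes an opt-in build target.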
[jira] [Assigned] (ARROW-8510) [C++] arrow/dataset/file_base.cc fails to compile with internal compiler error with "Visual Studio 15 2017 Win64" generator
[ https://issues.apache.org/jira/browse/ARROW-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques reassigned ARROW-8510: - Assignee: Francois Saint-Jacques > [C++] arrow/dataset/file_base.cc fails to compile with internal compiler > error with "Visual Studio 15 2017 Win64" generator > --- > > Key: ARROW-8510 > URL: https://issues.apache.org/jira/browse/ARROW-8510 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Developer Tools >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Blocker > Fix For: 1.0.0 > > > I discovered this while running the release verification on Windows. There > was an obscuring issue which is that if the build fails, the verification > script continues. I will fix that -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8890) [R] Fix C++ lint issue
[ https://issues.apache.org/jira/browse/ARROW-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8890: -- Labels: pull-request-available (was: ) > [R] Fix C++ lint issue > --- > > Key: ARROW-8890 > URL: https://issues.apache.org/jira/browse/ARROW-8890 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Trivial > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques reassigned ARROW-8889: - Assignee: David Li > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.17.1 >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 
0x556c06af in _PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 
pymain_run_python (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 > #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 > #19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103 > {noformat} -- This
[jira] [Resolved] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques resolved ARROW-8889. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7249 [https://github.com/apache/arrow/pull/7249] > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.17.1 >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > 
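The crash above comes from dispatching `equals(None)` straight into the native `RecordBatch::Equals`, which dereferences a null operand. The defensive pattern presumably applied by the fix can be sketched in pure Python — a hypothetical `Batch` class standing in for pyarrow's Cython binding, which is where the actual fix lives:

```python
# Sketch of the defensive-equality pattern: reject non-Batch operands
# before touching any native state, instead of crashing on a null.
class Batch:
    def __init__(self, columns):
        self.columns = columns

    def equals(self, other):
        # Guard: comparing against None (or any non-Batch) is simply
        # False; native code is never called with a null pointer.
        if not isinstance(other, Batch):
            return False
        return self.columns == other.columns

b = Batch({"a": [1, 2, 3]})
print(b.equals(None))                      # False, no crash
print(b.equals(Batch({"a": [1, 2, 3]})))   # True
```

The same guard explains why only some Python versions crashed: whether the null slipped through depended on how the extension method was dispatched, not on the comparison logic itself being version-specific.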
self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 0x556c06af in _PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 
pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 pymain_run_python (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 > #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in
[jira] [Created] (ARROW-8890) [R] Fix C++ lint issue
Francois Saint-Jacques created ARROW-8890: - Summary: [R] Fix C++ lint issue Key: ARROW-8890 URL: https://issues.apache.org/jira/browse/ARROW-8890 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Francois Saint-Jacques Assignee: Francois Saint-Jacques Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8889: -- Labels: pull-request-available (was: ) > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.17.1 >Reporter: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 0x556c06af 
in _PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 pymain_run_python 
(pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 > #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 > #19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103 > {noformat} -- This message was sent by Atlassian Jira
[jira] [Commented] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113981#comment-17113981 ] David Li commented on ARROW-8889: - I tried with a wheel for 0.15.1 and it happens as well. (It doesn't happen with 0.15.1 built from source.) So it seems this has been around a while. > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.17.1 >Reporter: David Li >Priority: Major > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at 
/tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 0x556c06af in _PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > 
/tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 pymain_run_python (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 > #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 > #19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103 > {noformat} -- This
[jira] [Updated] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Li updated ARROW-8889: Affects Version/s: 0.15.1 > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.15.1, 0.17.1 >Reporter: David Li >Priority: Major > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 0x556c06af in _PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, 
kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 pymain_run_python (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 
> #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 > #19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
[ https://issues.apache.org/jira/browse/ARROW-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113977#comment-17113977 ] David Li commented on ARROW-8889: - I have a core dump but it's too large. Let me upload it somewhere else. > [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None > -- > > Key: ARROW-8889 > URL: https://issues.apache.org/jira/browse/ARROW-8889 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.17.1 >Reporter: David Li >Priority: Major > > This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. > It seems to happen even when built from source, but I used the wheels for > this reproduction. > {noformat} > > uname -a > Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + > x86_64 GNU/Linux > > python --version > Python 3.7.7 > > pip freeze > numpy==1.18.4 > pyarrow==0.17.1{noformat} > Reproduction: > {code:python} > import pyarrow as pa > table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) > batches = table.to_batches() > batches[0].equals(None) > {code} > {noformat} > #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch > const&, bool) const () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 > #2 0x7fffe084a6e0 in > __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) > () from > /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so > #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords > (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, > self=0x7fffdefd7110, args=0x7786f5c8, nargs=, > kwnames=) > at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 > #4 0x556c06af in 
_PyMethodDescr_FastCallKeywords > (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 > #5 0x55724add in call_function (kwnames=0x0, oparg=2, > pp_stack=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 > #6 _PyEval_EvalFrameDefault (f=, throwflag=) > at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 > #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, > globals=, locals=, args=, > argcount=, kwnames=0x0, kwargs=0x0, kwcount=, > kwstep=2, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 > #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, > globals=, locals=, args=, > argcount=, kws=, kwcount=0, defs=0x0, > defcount=0, kwdefs=0x0, > closure=0x0) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 > #9 0x5566a1ec in PyEval_EvalCode (co=, > globals=, locals=) at > /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 > #10 0x55780cb4 in run_mod (mod=, filename= out>, globals=0x778d7c30, locals=0x778d7c30, flags=, > arena=) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 > #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, > filename_str=, start=, globals=0x778d7c30, > locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) > at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 > #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, > filename=, closeit=1, flags=0x7fffe1b0) at > /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 > #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, > filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 > #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 > #15 pymain_run_python 
(pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 > #16 pymain_main (pymain=0x7fffe2c0) at > /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 > #17 0x5578c51c in _Py_UnixMain (argc=, argv= out>) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 > #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 > #19 0x5572fac0 in _start () at ../sysdeps/x86_64/elf/start.S:103 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8889) [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None
David Li created ARROW-8889: --- Summary: [Python] Python 3.7 SIGSEGV when comparing RecordBatch to None Key: ARROW-8889 URL: https://issues.apache.org/jira/browse/ARROW-8889 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.17.1 Reporter: David Li This seems to only happen for Python 3.6 and 3.7. It doesn't happen with 3.8. It seems to happen even when built from source, but I used the wheels for this reproduction. {noformat} > uname -a Linux chaconne 5.6.13-arch1-1 #1 SMP PREEMPT Thu, 14 May 2020 06:52:53 + x86_64 GNU/Linux > python --version Python 3.7.7 > pip freeze numpy==1.18.4 pyarrow==0.17.1{noformat} Reproduction: {code:python} import pyarrow as pa table = pa.Table.from_arrays([pa.array([1,2,3])], names=["a"]) batches = table.to_batches() batches[0].equals(None) {code} {noformat} #0 0x7fffdf9d34f0 in arrow::RecordBatch::num_columns() const () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 #1 0x7fffdf9d69e9 in arrow::RecordBatch::Equals(arrow::RecordBatch const&, bool) const () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/libarrow.so.17 #2 0x7fffe084a6e0 in __pyx_pw_7pyarrow_3lib_11RecordBatch_31equals(_object*, _object*, _object*) () from /home/lidavidm/Code/twosigma/arrow/venv2/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so #3 0x556b97e4 in _PyMethodDef_RawFastCallKeywords (method=0x7fffe0c1b760 <__pyx_methods_7pyarrow_3lib_RecordBatch+288>, self=0x7fffdefd7110, args=0x7786f5c8, nargs=, kwnames=) at /tmp/build/80754af9/python_1585000375785/work/Objects/call.c:694 #4 0x556c06af in _PyMethodDescr_FastCallKeywords (descrobj=0x7fffdefa4050, args=0x7786f5c0, nargs=2, kwnames=0x0) at /tmp/build/80754af9/python_1585000375785/work/Objects/descrobject.c:288 #5 0x55724add in call_function (kwnames=0x0, oparg=2, pp_stack=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:4593 #6 _PyEval_EvalFrameDefault (f=, 
throwflag=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3110 #7 0x55669289 in _PyEval_EvalCodeWithName (_co=0x778a68a0, globals=, locals=, args=, argcount=, kwnames=0x0, kwargs=0x0, kwcount=, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3930 #8 0x5566a1c4 in PyEval_EvalCodeEx (_co=, globals=, locals=, args=, argcount=, kws=, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:3959 #9 0x5566a1ec in PyEval_EvalCode (co=, globals=, locals=) at /tmp/build/80754af9/python_1585000375785/work/Python/ceval.c:524 #10 0x55780cb4 in run_mod (mod=, filename=, globals=0x778d7c30, locals=0x778d7c30, flags=, arena=) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:1035 #11 0x5578b0d1 in PyRun_FileExFlags (fp=0x558c24d0, filename_str=, start=, globals=0x778d7c30, locals=0x778d7c30, closeit=1, flags=0x7fffe1b0) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:988 #12 0x5578b2c3 in PyRun_SimpleFileExFlags (fp=0x558c24d0, filename=, closeit=1, flags=0x7fffe1b0) at /tmp/build/80754af9/python_1585000375785/work/Python/pythonrun.c:429 #13 0x5578c3f5 in pymain_run_file (p_cf=0x7fffe1b0, filename=0x558e51f0 L"repro.py", fp=0x558c24d0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:462 #14 pymain_run_filename (cf=0x7fffe1b0, pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:1641 #15 pymain_run_python (pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:2902 #16 pymain_main (pymain=0x7fffe2c0) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3442 #17 0x5578c51c in _Py_UnixMain (argc=, argv=) at /tmp/build/80754af9/python_1585000375785/work/Modules/main.c:3477 #18 0x77dcd002 in __libc_start_main () from /usr/lib/libc.so.6 #19 0x5572fac0 in _start () at 
../sysdeps/x86_64/elf/start.S:103 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8885) [R] Don't include everything everywhere
[ https://issues.apache.org/jira/browse/ARROW-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques resolved ARROW-8885. --- Resolution: Fixed Issue resolved by pull request 7245 [https://github.com/apache/arrow/pull/7245] > [R] Don't include everything everywhere > --- > > Key: ARROW-8885 > URL: https://issues.apache.org/jira/browse/ARROW-8885 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I noticed that we were jamming all of our arrow #includes in one header file > in the R bindings and then including that everywhere. Seemed like that was > wasteful and probably causing compilation to be slower. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8696) [Java] Convert tests to integration tests
[ https://issues.apache.org/jira/browse/ARROW-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Murray resolved ARROW-8696. Resolution: Fixed > [Java] Convert tests to integration tests > - > > Key: ARROW-8696 > URL: https://issues.apache.org/jira/browse/ARROW-8696 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Ryan Murray >Assignee: Ryan Murray >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Some tests under arrow-memory and arrow-vector are integration tests but run > via main(). We should convert them to proper integration tests under maven > failsafe -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-8696) [Java] Convert tests to integration tests
[ https://issues.apache.org/jira/browse/ARROW-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113870#comment-17113870 ] Ryan Murray commented on ARROW-8696: Closed in https://github.com/apache/arrow/pull/7100 via 93ba086 > [Java] Convert tests to integration tests > - > > Key: ARROW-8696 > URL: https://issues.apache.org/jira/browse/ARROW-8696 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Ryan Murray >Assignee: Ryan Murray >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Some tests under arrow-memory and arrow-vector are integration tests but run > via main(). We should convert them to proper integration tests under maven > failsafe -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8888) [Python] Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions
[ https://issues.apache.org/jira/browse/ARROW-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Glasson updated ARROW-: - Description: When calling pa.Table.from_pandas() the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the conversion is much much slower. I have a simple example - but the time difference is much worse with a real table. {code:java} Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) Type 'copyright', 'credits' or 'license' for more information IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help. In [1]: import pyarrow as pa In [2]: import pandas as pd In [3]: df = pd.DataFrame({"A": [0] * 1000}) In [4]: %timeit table = pa.Table.from_pandas(df) 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) 106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1 loops each) {code} was: When calling pa.Table.from_pandas() the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the conversion is much much slower. I have a simple example - but the time difference is much worse with a real table. {code:java} Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) Type 'copyright', 'credits' or 'license' for more information IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help. In [1]: import pyarrow as pa In [2]: import pandas as pd In [3]: df = pd.DataFrame({"A": [0] * 1000}) In [4]: %timeit table = pa.Table.from_pandas(df) 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) 106 µs ± 1.65 µs per loop (mean ± std. dev. 
of 7 runs, 1 loops each) {code} Summary: [Python] Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions (was: Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions) > [Python] Heuristic in dataframe_to_arrays that decides to multithread convert > cause slow conversions > > > Key: ARROW- > URL: https://issues.apache.org/jira/browse/ARROW- > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.16.0 > Environment: MacOS: 10.15.4 (Also happening on windows 10) > Python: 3.7.3 > Pyarrow: 0.16.0 > Pandas: 0.25.3 >Reporter: Kevin Glasson >Priority: Minor > > When calling pa.Table.from_pandas() the code path that uses the > ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the > conversion is much much slower. > > I have a simple example - but the time difference is much worse with a real > table. > > {code:java} > Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) > Type 'copyright', 'credits' or 'license' for more information > IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help. > In [1]: import pyarrow as pa > In [2]: import pandas as pd > In [3]: df = pd.DataFrame({"A": [0] * 1000}) > In [4]: %timeit table = pa.Table.from_pandas(df) > 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) > In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) > 106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1 loops each) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8888) Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions
[ https://issues.apache.org/jira/browse/ARROW-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Glasson updated ARROW-: - Description: When calling pa.Table.from_pandas() the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the conversion is much much slower. I have a simple example - but the time difference is much worse with a real table. {code:java} Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) Type 'copyright', 'credits' or 'license' for more information IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help. In [1]: import pyarrow as pa In [2]: import pandas as pd In [3]: df = pd.DataFrame({"A": [0] * 1000}) In [4]: %timeit table = pa.Table.from_pandas(df) 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) 106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1 loops each) {code} was: When calling pa.Table.from_pandas() the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the conversion is much much slower. I have a simple example - but the time difference is much worse with a real table. Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) Type 'copyright', 'credits' or 'license' for more information IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import pyarrow as pa In [2]: import pandas as pd In [3]: df = pd.DataFrame(\{"A": [0] * 1000}) In [4]: %timeit table = pa.Table.from_pandas(df) 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) 106 µs ± 1.65 µs per loop (mean ± std. dev. 
of 7 runs, 1 loops each) > Heuristic in dataframe_to_arrays that decides to multithread convert cause > slow conversions > --- > > Key: ARROW- > URL: https://issues.apache.org/jira/browse/ARROW- > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.16.0 > Environment: MacOS: 10.15.4 (Also happening on windows 10) > Python: 3.7.3 > Pyarrow: 0.16.0 > Pandas: 0.25.3 >Reporter: Kevin Glasson >Priority: Minor > > When calling pa.Table.from_pandas() the code path that uses the > ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the > conversion is much much slower. > > I have a simple example - but the time difference is much worse with a real > table. > > > {code:java} > Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18) > Type 'copyright', 'credits' or 'license' for more information > IPython 7.13.0 – An enhanced Interactive Python. Type '?' for help. > In [1]: import pyarrow as pa > In [2]: import pandas as pd > In [3]: df = pd.DataFrame({"A": [0] * 1000}) > In [4]: %timeit table = pa.Table.from_pandas(df) > 577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) > In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1) > 106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1 loops each) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8888) Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions
Kevin Glasson created ARROW-:
Summary: Heuristic in dataframe_to_arrays that decides to multithread convert cause slow conversions
Key: ARROW-
URL: https://issues.apache.org/jira/browse/ARROW-
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.16.0
Environment: MacOS: 10.15.4 (Also happening on windows 10) Python: 3.7.3 Pyarrow: 0.16.0 Pandas: 0.25.3
Reporter: Kevin Glasson

When calling pa.Table.from_pandas() the code path that uses the ThreadPoolExecutor in dataframe_to_arrays (called by Table.from_pandas) the conversion is much much slower. I have a simple example - but the time difference is much worse with a real table.

Python 3.7.3 | packaged by conda-forge | (default, Dec 6 2019, 08:54:18)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow as pa
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"A": [0] * 1000})
In [4]: %timeit table = pa.Table.from_pandas(df)
577 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit table = pa.Table.from_pandas(df, nthreads=1)
106 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1 loops each)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8402) [Java] Support ValidateFull methods in Java
[ https://issues.apache.org/jira/browse/ARROW-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8402: -- Labels: pull-request-available (was: ) > [Java] Support ValidateFull methods in Java > --- > > Key: ARROW-8402 > URL: https://issues.apache.org/jira/browse/ARROW-8402 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We need to support ValidateFull methods in Java, just like we do in C++. > This is required by ARROW-5926. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8887) [Java] Buffer size for complex vectors increases rapidly in case of clear/write loop
[ https://issues.apache.org/jira/browse/ARROW-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pindikura Ravindra resolved ARROW-8887. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 7247 [https://github.com/apache/arrow/pull/7247] > [Java] Buffer size for complex vectors increases rapidly in case of > clear/write loop > > > Key: ARROW-8887 > URL: https://issues.apache.org/jira/browse/ARROW-8887 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Projjal Chanda >Assignee: Projjal Chanda >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Similar to https://issues.apache.org/jira/browse/ARROW-5232 -- This message was sent by Atlassian Jira (v8.3.4#803005)