[jira] [Commented] (ARROW-13546) [Python] Breaking API change in FSSpecHandler, requires metadata argument

2021-08-04 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393032#comment-17393032 ] Maarten Breddels commented on ARROW-13546: -- My current workaround, to make it backward

[jira] [Created] (ARROW-13546) [Python] Breaking API change in FSSpecHandler, requires metadata argument

2021-08-04 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-13546: Summary: [Python] Breaking API change in FSSpecHandler, requires metadata argument Key: ARROW-13546 URL: https://issues.apache.org/jira/browse/ARROW-13546

[jira] [Commented] (ARROW-13259) [C++] Enable slicing to end of string using "utf8_slice_codeunits" when string length unknown or different lengths

2021-07-05 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374890#comment-17374890 ] Maarten Breddels commented on ARROW-13259: -- Does my comment

[jira] [Commented] (ARROW-12608) [C++] Add option to split_pattern compute kernel to split by regex

2021-04-30 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337317#comment-17337317 ] Maarten Breddels commented on ARROW-12608: -- I agree a split_pattern_regex might make sense, you

[jira] [Commented] (ARROW-12547) Sigbus when using mmap in multiprocessing env over netapp

2021-04-26 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332653#comment-17332653 ] Maarten Breddels commented on ARROW-12547: -- I recommend trying without memory mapping. If IO

[jira] [Commented] (ARROW-3016) [C++] Add ability to enable call stack logging for each memory allocation

2021-02-18 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286400#comment-17286400 ] Maarten Breddels commented on ARROW-3016: - I'd also recommend using perf with uprobes for this,

[jira] [Commented] (ARROW-11000) [Python] Enable random access reading for Python file objects (if supported)

2020-12-21 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253047#comment-17253047 ] Maarten Breddels commented on ARROW-11000: -- did you check with passing

[jira] [Created] (ARROW-10959) [C++] Add scalar string join kernel

2020-12-18 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10959: Summary: [C++] Add scalar string join kernel Key: ARROW-10959 URL: https://issues.apache.org/jira/browse/ARROW-10959 Project: Apache Arrow Issue

[jira] [Commented] (ARROW-10557) [C++] Add scalar string slicing/substring kernel

2020-12-18 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251793#comment-17251793 ] Maarten Breddels commented on ARROW-10557: -- This would be easier to implement using the tools

[jira] [Commented] (ARROW-10799) [C++] Take on string chunked arrays slow and fails

2020-12-03 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243437#comment-17243437 ] Maarten Breddels commented on ARROW-10799: -- Would you mind opening a draft PR for that, in case

[jira] [Commented] (ARROW-10799) [C++] Take on string chunked arrays slow and fails

2020-12-03 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243422#comment-17243422 ] Maarten Breddels commented on ARROW-10799: -- Ah yes, that implementation makes sense. I saw the

[jira] [Commented] (ARROW-10799) [C++] Take on string chunked arrays slow and fails

2020-12-03 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243417#comment-17243417 ] Maarten Breddels commented on ARROW-10799: -- {code:java} import pyarrow as pa a = pa.array(['a']

[jira] [Created] (ARROW-10799) [C++] Take on string chunked arrays slow and fails

2020-12-03 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10799: Summary: [C++] Take on string chunked arrays slow and fails Key: ARROW-10799 URL: https://issues.apache.org/jira/browse/ARROW-10799 Project: Apache Arrow

[jira] [Commented] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers

2020-11-26 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239312#comment-17239312 ] Maarten Breddels commented on ARROW-10739: -- Ok, good to know. Two workarounds I came up with

[jira] [Commented] (ARROW-10736) [Python] feather/arrow row splitting and counting (Dataset API)

2020-11-26 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239305#comment-17239305 ] Maarten Breddels commented on ARROW-10736: -- Thanks, I tried scan with an empty schema on the

[jira] [Created] (ARROW-10739) [Python] Pickling a sliced array serializes all the buffers

2020-11-25 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10739: Summary: [Python] Pickling a sliced array serializes all the buffers Key: ARROW-10739 URL: https://issues.apache.org/jira/browse/ARROW-10739 Project: Apache

[jira] [Created] (ARROW-10736) [Python] feather/arrow row splitting and counting (Dataset API)

2020-11-25 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10736: Summary: [Python] feather/arrow row splitting and counting (Dataset API) Key: ARROW-10736 URL: https://issues.apache.org/jira/browse/ARROW-10736 Project:

[jira] [Commented] (ARROW-10709) [Python] Difficult to make an efficient zero-copy file reader in Python

2020-11-24 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238106#comment-17238106 ] Maarten Breddels commented on ARROW-10709: -- Pandas also does not like it when .read returns a

[jira] [Created] (ARROW-10709) [Python] Difficult to make an efficient zero-copy file reader in Python

2020-11-24 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10709: Summary: [Python] Difficult to make an efficient zero-copy file reader in Python Key: ARROW-10709 URL: https://issues.apache.org/jira/browse/ARROW-10709

[jira] [Commented] (ARROW-10640) [C++] A "where" kernel to combine two arrays based on a mask

2020-11-19 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235563#comment-17235563 ] Maarten Breddels commented on ARROW-10640: -- Yes, that would maybe be the 'ultimate' variant,

[jira] [Commented] (ARROW-10640) [C++] A "where" kernel to combine two arrays based on a mask

2020-11-18 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234481#comment-17234481 ] Maarten Breddels commented on ARROW-10640: -- Another idea would be to have a 'choose' like

[jira] [Commented] (ARROW-9489) [C++] Add fill_null kernel implementation for (array[string], scalar[string])

2020-11-11 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229989#comment-17229989 ] Maarten Breddels commented on ARROW-9489: - Yes, I thought about that too. Although I think a

[jira] [Created] (ARROW-10557) [C++] Add scalar string slicing/substring kernel

2020-11-11 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10557: Summary: [C++] Add scalar string slicing/substring kernel Key: ARROW-10557 URL: https://issues.apache.org/jira/browse/ARROW-10557 Project: Apache Arrow

[jira] [Created] (ARROW-10556) [C++] Caching pre computed data based on FunctionOptions in the kernel state

2020-11-11 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10556: Summary: [C++] Caching pre computed data based on FunctionOptions in the kernel state Key: ARROW-10556 URL: https://issues.apache.org/jira/browse/ARROW-10556

[jira] [Created] (ARROW-10541) [C++] Add re2 library to core arrow / ARROW_WITH_RE2

2020-11-10 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10541: Summary: [C++] Add re2 library to core arrow / ARROW_WITH_RE2 Key: ARROW-10541 URL: https://issues.apache.org/jira/browse/ARROW-10541 Project: Apache Arrow

[jira] [Commented] (ARROW-9489) [C++] Add fill_null kernel implementation for (array[string], scalar[string])

2020-11-05 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226572#comment-17226572 ] Maarten Breddels commented on ARROW-9489: - Yes, happy to take this on, since it's an ugly code

[jira] [Assigned] (ARROW-9128) [C++] Implement string space trimming kernels: trim, ltrim, and rtrim

2020-10-15 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels reassigned ARROW-9128: --- Assignee: Maarten Breddels > [C++] Implement string space trimming kernels: trim,

[jira] [Created] (ARROW-10306) [C++] Add string replacement kernel

2020-10-14 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10306: Summary: [C++] Add string replacement kernel Key: ARROW-10306 URL: https://issues.apache.org/jira/browse/ARROW-10306 Project: Apache Arrow Issue

[jira] [Commented] (ARROW-9128) [C++] Implement string space trimming kernels: trim, ltrim, and rtrim

2020-10-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213857#comment-17213857 ] Maarten Breddels commented on ARROW-9128: - Shall I implement this? > [C++] Implement string

[jira] [Created] (ARROW-10209) [Python] support positional arguments for options in compute wrapper

2020-10-07 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10209: Summary: [Python] support positional arguments for options in compute wrapper Key: ARROW-10209 URL: https://issues.apache.org/jira/browse/ARROW-10209

[jira] [Created] (ARROW-10208) [C++] comparing list arrays with nulls fails in test framework

2020-10-07 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10208: Summary: [C++] comparing list arrays with nulls fails in test framework Key: ARROW-10208 URL: https://issues.apache.org/jira/browse/ARROW-10208 Project:

[jira] [Created] (ARROW-10207) [C++] Unary kernels that results in a list have no preallocated offset buffer

2020-10-07 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10207: Summary: [C++] Unary kernels that results in a list have no preallocated offset buffer Key: ARROW-10207 URL: https://issues.apache.org/jira/browse/ARROW-10207

[jira] [Created] (ARROW-10195) [C++] Add string struct extract kernel using re2

2020-10-06 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10195: Summary: [C++] Add string struct extract kernel using re2 Key: ARROW-10195 URL: https://issues.apache.org/jira/browse/ARROW-10195 Project: Apache Arrow

[jira] [Commented] (ARROW-10023) [Gandiva][C++] Implementing Split part function in gandiva

2020-09-17 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197892#comment-17197892 ] Maarten Breddels commented on ARROW-10023: -- It's gonna be in C++, I can push an initial version

[jira] [Commented] (ARROW-9991) [C++] split kernels for strings/binary

2020-09-16 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196786#comment-17196786 ] Maarten Breddels commented on ARROW-9991: - Indeed, and whatever Unicode specifies as 'whitespace'

[jira] [Commented] (ARROW-10023) Implementing Split part function in gandiva

2020-09-16 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196785#comment-17196785 ] Maarten Breddels commented on ARROW-10023: -- Probably related to

[jira] [Updated] (ARROW-9991) [C++] split kernels for strings/binary

2020-09-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels updated ARROW-9991: Summary: [C++] split kernels for strings/binary (was: [C++] split kernsl for

[jira] [Updated] (ARROW-9991) [C++] split kernsl for strings/binary

2020-09-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels updated ARROW-9991: Description: Similar to Python str.split and bytes.split, we'd like to have a way to

[jira] [Created] (ARROW-9991) [C++] split kernsl for strings/binary

2020-09-14 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9991: --- Summary: [C++] split kernsl for strings/binary Key: ARROW-9991 URL: https://issues.apache.org/jira/browse/ARROW-9991 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-9471) [C++] Scan Dataset in reverse

2020-07-14 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9471: --- Summary: [C++] Scan Dataset in reverse Key: ARROW-9471 URL: https://issues.apache.org/jira/browse/ARROW-9471 Project: Apache Arrow Issue Type:

[jira] [Commented] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157387#comment-17157387 ] Maarten Breddels commented on ARROW-9458: - let me know if you want to do the honors yourself,

[jira] [Commented] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157374#comment-17157374 ] Maarten Breddels commented on ARROW-9458: - Indeed, seeing a massive speedup. Too bad py-spy

[jira] [Commented] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157340#comment-17157340 ] Maarten Breddels commented on ARROW-9458: - Did you set batch_size=1_000_000? > [Python] Dataset

[jira] [Commented] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157338#comment-17157338 ] Maarten Breddels commented on ARROW-9458: - Running this (now with all columns) {code:java}

[jira] [Updated] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels updated ARROW-9458: Attachment: image-2020-07-14-14-38-16-767.png > [Python] Dataset singlethreaded only >

[jira] [Updated] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels updated ARROW-9458: Attachment: image-2020-07-14-14-31-29-943.png > [Python] Dataset singlethreaded only >

[jira] [Closed] (ARROW-9456) [Python] Dataset segfault when not importing pyarrow.parquet

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels closed ARROW-9456. --- Resolution: Not A Bug > [Python] Dataset segfault when not importing pyarrow.parquet >

[jira] [Commented] (ARROW-9456) [Python] Dataset segfault when not importing pyarrow.parquet

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157293#comment-17157293 ] Maarten Breddels commented on ARROW-9456: - Note that you should not run the vaex parquet example

[jira] [Commented] (ARROW-9444) [C++][Doc] Undocumented compute functions (string_isalpha, etc.)

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157235#comment-17157235 ] Maarten Breddels commented on ARROW-9444: - Feel free to assign to me, I didn't know there was a

[jira] [Commented] (ARROW-9456) [Python] Dataset segfault when not importing pyarrow.parquet

2020-07-14 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157231#comment-17157231 ] Maarten Breddels commented on ARROW-9456: - This file gives me the same problem {code:java} import

[jira] [Created] (ARROW-9458) [Python] Dataset singlethreaded only

2020-07-14 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9458: --- Summary: [Python] Dataset singlethreaded only Key: ARROW-9458 URL: https://issues.apache.org/jira/browse/ARROW-9458 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-9456) [Python] Dataset segfault when not importing pyarrow.parquet

2020-07-14 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9456: --- Summary: [Python] Dataset segfault when not importing pyarrow.parquet Key: ARROW-9456 URL: https://issues.apache.org/jira/browse/ARROW-9456 Project: Apache

[jira] [Created] (ARROW-9403) [Python] add .tolist as alias of to_pylist

2020-07-10 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9403: --- Summary: [Python] add .tolist as alias of to_pylist Key: ARROW-9403 URL: https://issues.apache.org/jira/browse/ARROW-9403 Project: Apache Arrow Issue

[jira] [Updated] (ARROW-9403) [Python] add .tolist as alias of .to_pylist

2020-07-10 Thread Maarten Breddels (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maarten Breddels updated ARROW-9403: Summary: [Python] add .tolist as alias of .to_pylist (was: [Python] add .tolist as alias

[jira] [Created] (ARROW-9268) [C++] Add is{alnum,alpha,...} kernels for strings

2020-06-29 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9268: --- Summary: [C++] Add is{alnum,alpha,...} kernels for strings Key: ARROW-9268 URL: https://issues.apache.org/jira/browse/ARROW-9268 Project: Apache Arrow

[jira] [Created] (ARROW-9133) [C++] Add utf8_upper and utf_lower

2020-06-15 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9133: --- Summary: [C++] Add utf8_upper and utf_lower Key: ARROW-9133 URL: https://issues.apache.org/jira/browse/ARROW-9133 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-9131) [C++] Faster ascii_lower and ascii_upper

2020-06-15 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9131: --- Summary: [C++] Faster ascii_lower and ascii_upper Key: ARROW-9131 URL: https://issues.apache.org/jira/browse/ARROW-9131 Project: Apache Arrow Issue