[jira] [Created] (ARROW-11564) Cython/ReadTheDocs broken?
Leo Meyerovich created ARROW-11564:
--

Summary: Cython/ReadTheDocs broken?
Key: ARROW-11564
URL: https://issues.apache.org/jira/browse/ARROW-11564
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Leo Meyerovich

Our ReadTheDocs build was regenerating some docs and threw a surprising error on PyArrow's Cython dependency:

* CPython 3.0 fails on Arrow 3.0.0's Cython: https://readthedocs.org/projects/pygraphistry/builds/12971110/
* Previous build passes on Arrow 2.0.0: https://readthedocs.org/api/v2/build/12767582.txt
* Boring setup.py: https://github.com/graphistry/pygraphistry/blob/master/setup.py#L49

I'm guessing there may be something to file w/ ReadTheDocs on env, but... surprising regression.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Comment Edited] (ARROW-8053) [JS] Improve performance of filtering
[ https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567 ]

Leo Meyerovich edited comment on ARROW-8053 at 3/11/20, 1:26 AM:
-

Sorry, we never got support for continuing our arrow js plans, namely, moving on to the GPU bindings & compute stack. That would have included this case:

* deforested typed native code via wasm
* multicore & simd via workers & wasm
* GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real compute (not the weird predicate stuff). My limited cycles need to be on that or figuring out GPU IO bindings.

To be clear, you can manually do the ^^^ on this use case. It'd take real experimentation, e.g., not even clear here if row-based, columnar, or a tiled hybrid is the right choice. And probably not the filter predicates - we definitely avoid them in prod ;)

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Will Strimling
> Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow
> doesn't compete with other libraries or JavaScript when it comes to filtering
> performance. Has there been any discussion or roadmaps established for
> improving it?
> Most convincing Observables:
> * [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
> * [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]
[jira] [Commented] (ARROW-8053) [JS] Improve performance of filtering
[ https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567 ]

Leo Meyerovich commented on ARROW-8053:
---

Sorry, we never got support for continuing our arrow js plans, namely, moving on to the GPU bindings & compute stack. That would have included this case:

* deforested typed native code via wasm
* multicore & simd via workers & wasm
* GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real compute (not the weird predicate stuff)
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023149#comment-17023149 ]

Leo Meyerovich commented on ARROW-7513:
---

Thanks Brian! I'm slammed for another ~2w, and then will work on Part II. The other thing I realized that is missing here is charts showing [] vs Arrow perf (speed + mem use).

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
> Issue Type: Task
> Components: JavaScript
> Reporter: Leo Meyerovich
> Assignee: Leo Meyerovich
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.16.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common
> basic data types such as turning JS arrays into ints, dicts, etc. There is no
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] .
> This has made use for myself difficult, and I bet for others.
>
> As with prev tutorials, I started sketching on
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
> . When we're happy can make sense to export as an html or something to the
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
> * Common user data types: Ints, Dicts, Struct, Time
> * Common column types: Data, Vector, Column
> * Going from individual & arrays & buffers of JS values to Arrow-wrapped
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other
> tutorial.
>
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff
> here.
>
> cc [~wesm] [~bhulette] [~paul.e.taylor]
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017539#comment-17017539 ]

Leo Meyerovich commented on ARROW-7513:
---

_nudge:_ [~paul.e.taylor] / [~bhulette] - need to do anything to get the PR link merged?

Need to take care of some things, and then will see about extracting Part II out from the earlier versions on Data+Builders.
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013248#comment-17013248 ]

Leo Meyerovich commented on ARROW-7513:
---

... OK PR is up: [https://github.com/apache/arrow/pull/6163]
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013225#comment-17013225 ]

Leo Meyerovich commented on ARROW-7513:
---

Worked w/ paul a bit, and part 1 is done afaict. See updated live link.

Next step:
-- any review is good
-- I can update the main readme to point to it

Later:
-- bring back lower-level data.new and builders as part ii
-- paul points out the helper structVector.getChildByName (slice-by-name) should really be built-in, and can be a nice first pr for someone
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013192#comment-17013192 ]

Leo Meyerovich commented on ARROW-7513:
---

Great, thanks Paul, I cleaned up the int64vector stuff + initial struct stuff. Two last q's and we're done w/ Part I!

* Is there a clean way to create a utf8 dict? I could do VectorUtf8.from(["str1", ...]), but no VectorDictionary.from, even with opts.
* For a table col w/ a nested struct, any way to slice out a subcol, e.g., nick_col = tbl.get('richName').get('nick')
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011451#comment-17011451 ]

Leo Meyerovich commented on ARROW-7513:
---

* Good: Updated the numerics section to use `VectorT.from(Array | Buffer)`
** Oddly, arrow.Int64Vector.from((new Uint32Array([2,3, 555,0, 1,0])).buffer) returns length 6, not 3 (0.15.0)
* Bad: `VectorDictionary.from(['hello', 'hello', null, 'carrot'])` did not seem to work, so kept as lower-level for now
* Bad: Still not sure how to do structs
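The expectation of length 3 in the comment above follows from how 64-bit integers are laid out: each value occupies two 32-bit words. As a rough illustration only (this does not call the Arrow JS API, and it assumes little-endian low-word-first ordering), the same six words can be reinterpreted with Python's standard `struct` module:

```python
import struct

# The six 32-bit words from the comment, read as three (low, high) pairs.
# Assumption: little-endian layout with the low word first.
words = [2, 3, 555, 0, 1, 0]

# Pack as little-endian unsigned 32-bit integers: 6 words * 4 bytes = 24 bytes.
raw = struct.pack("<6I", *words)

# Reinterpret the same 24 bytes as little-endian signed 64-bit integers.
int64_values = list(struct.unpack("<3q", raw))

# Equivalent arithmetic for these non-negative values: low + high * 2**32.
by_hand = [lo + (hi << 32) for lo, hi in zip(words[0::2], words[1::2])]

print(len(raw), len(int64_values))
print(int64_values == by_hand)
```

Under that reading, 24 bytes hold three logical values, so a reported length of 6 counts 32-bit words rather than 64-bit elements.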
[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types
[ https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010870#comment-17010870 ]

Leo Meyerovich commented on ARROW-7513:
---

Agreed, I'll see about forking this into Part I & Part II, where Part I is high-level api and move the Data stuff to Part II.

I'm stumped on `structs` and `nested structs` though, any recs/examples?
[jira] [Created] (ARROW-7513) [JS] Arrow Tutorial: Common data types
Leo Meyerovich created ARROW-7513:
-

Summary: [JS] Arrow Tutorial: Common data types
Key: ARROW-7513
URL: https://issues.apache.org/jira/browse/ARROW-7513
Project: Apache Arrow
Issue Type: Task
Components: JavaScript
Reporter: Leo Meyerovich
Assignee: Leo Meyerovich

The JS client lacks basic introductory material around creating the common basic data types such as turning JS arrays into ints, dicts, etc. There is no equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . This has made use for myself difficult, and I bet for others.

As with prev tutorials, I started sketching on [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit] . When we're happy can make sense to export as an html or something to the repo, or just link from the main readme.

I believe the target topics worth covering are:

* Common user data types: Ints, Dicts, Struct, Time
* Common column types: Data, Vector, Column
* Going from individual & arrays & buffers of JS values to Arrow-wrapped forms, and basic inspection of the result

Not worth going into here is Tables vs. RecordBatches, which is the other tutorial.

1. Ideas of what to add/edit/remove?
2. And anyone up for helping with discussion of Data vs. Vector, and ingest of Time & Struct?
3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff here.

cc [~wesm] [~bhulette] [~paul.e.taylor]
[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays
[ https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973604#comment-16973604 ]

Leo Meyerovich commented on ARROW-7109:
---

Awesome!

How to stream record batches in/out seems like the most meaningful. At least here, a lot of our use ends up being about this. So it may help to share some of the arrow APIs used and maybe how/why.

1. Maybe a start is revisiting the early microbatch tutorial I had done before the API stabilized: [https://observablehq.com/d/6e565d7662d984ea]

^^^ Aimed to show microbatch API and prove some numbers

2. As a followup, demoing common fast IO needs: fast browser<>node, node process <> node process, and node process <> python process. A lot of our code ends up looking like:

```
export function asTable(source: Table | AsyncIterable, fields: string[] = []) {
    return AsyncIterableX.defer(async () => {
        if (source) {
            const batches = source instanceof Table ? source.chunks : await toArray(source);
            if (batches.length > 0) {
                if (!fields || !fields.length) {
                    return AsyncIterableX.of(new Table(batches));
                }
                const table = new Table(batches).select(...fields);
                if (table.schema.fields.length === fields.length) {
                    return AsyncIterableX.of(table);
                }
            }
        }
        return AsyncIterableX.empty();
    });
}

export function asBatches(fn: () => DeferFnReturn) {
    return asReaders(fn).concatAll() as BatchesReturn;
}

export function asReaders(fn: () => DeferFnReturn) {
    return AsyncIterableX.defer(async () => {
        const x = await fn();
        return RecordBatchStreamReader.readAll(x);
    }) as ReadersReturn;
}
```

and

```
asReaders(() => {
    const nodeEncodingsStream = asTable(Array.isArray(nodeEncodings)
        ? this.node.table.encodings(nodeEncodings) : nodeEncodings);
    const edgeEncodingsStream = asTable(Array.isArray(edgeEncodings)
        ? this.edge.table.encodings(edgeEncodings) : edgeEncodings);
    const recordBatchesStream = AsyncIterable.concat(
        // `preshaped` is an AsyncIterable>, so flatten it here
        AsyncIterable.as(preshaped).flatMap((x) => asTable(x)),
        // Calling memoize() on the node/edge encodings streams ensures they start downloading
        // immediately, instead of serially after the incoming RecordBatch streams have completed
        nodeEncodingsStream.memoize().flatMap(({ chunks }) => AsyncIterable.as(chunks)),
        edgeEncodingsStream.memoize().flatMap(({ chunks }) => AsyncIterable.as(chunks)),
    );
    return recordBatchesStream
        .pipe(RecordBatchStreamWriter.throughNode({ autoDestroy: false }))
        .pipe(this.got.stream.post('/preshaped/shaped', { headers: octetstream }));
});
```

> [JS] Create table from Arrays
> -
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
> Issue Type: Wish
> Reporter: Sascha Hofmann
> Assignee: Brian Hulette
> Priority: Minor
> Attachments: image-2019-11-12-09-11-39-751.png
>
> I am trying to generate an arrow table from JS arrays and followed the
> example from [here|https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style]
> but I am struggling to generate different schemas, most importantly how to
> provide a type different than 'floatingpoint'. Right now, I have:
> {code:java}
> const data = Table.from({
>   schema: {
>     fields: [{name: 'a', nullable: false, children: Array(0),
>               type: {name: 'floatingpoint', precision: 'SINGLE'}}]
>   },
>   batches: [{
>     count: 10,
>     columns: [{ name: 'a', count: 10, VALIDITY: [],
>                 DATA: Array.from({ length: 10 }, () => 'a') }]}]
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the
> type (I tried type: Utf8 among others). In general, wouldn't it be nice to
> have a create-Table-from-object function?
> On another note, are there any plans to make the docs a little bit more
> descriptive? Happy to contribute!
[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays
[ https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972634#comment-16972634 ]

Leo Meyerovich commented on ARROW-7109:
---

Thanks – we use Arrow heavily internally for webgl->nodejs->nodeopencl+pydata, while our external efforts here are more of a labor of love. So while very much production quality, the docs are still generally "grep arrow + our codebase" :) Helping update the tutorials or docs based on your experiences would be awesome if you're up for it!
[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays
[ https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971996#comment-16971996 ]

Leo Meyerovich commented on ARROW-7109:
---

Thanks, merged! I see [https://observablehq.com/d/6e565d7662d984ea] I & II are also out of date, so should update too.
[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays
[ https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971935#comment-16971935 ]

Leo Meyerovich commented on ARROW-7109:
---------------------------------------

Thanks [~bhulette]! Happy to update the tutorial if there are specific sections you can help point to. In the interim, I put a forward pointer to yours.
[jira] [Created] (ARROW-4131) [Python] Coerce mixed columns to String
Leo Meyerovich created ARROW-4131:
----------------------------------

Summary: [Python] Coerce mixed columns to String
Key: ARROW-4131
URL: https://issues.apache.org/jira/browse/ARROW-4131
Project: Apache Arrow
Issue Type: Improvement
Reporter: Leo Meyerovich

Continuing [https://github.com/apache/arrow/issues/3280]

===

I'm seeing variants of this elsewhere (e.g., [wesm/feather#349|https://github.com/wesm/feather/issues/349]).

--

Not all Pandas tables coerce to Arrow tables, and when they fail, they do not fail in a way that is conducive to automation.

Sample:

{{mixed_df = pd.DataFrame({'mixed': [1, 'b']})
pa.Table.from_pandas(mixed_df)
=> ArrowInvalid: ('Could not convert b with type str: tried to convert to double', 'Conversion failed for column mixed with type object') }}

I would have expected behaviors more like the following:
* Coerce {{toString}} by default, with a default-off option to disallow toString coercions
* Provide a default-off option to {{from_pandas}} to auto-coerce
* Name the exception so it is clear that this is a column coercion failure, and include the column name(s), making this predictable and clearly handleable by both library writers & users

I lean towards:
* Default to auto-coercion, improving life for early users: `coerce_mixed_columns_to_strings=True`
* For less frequent yet more advanced library implementors, allow them to override to `False`
* In their case, create a predictable & machine-readable exception, `MixedColumnException(mixed_columns=['a', 'b', ...], msg="")`

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
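The behavior proposed in ARROW-4131 can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a plain dict of lists standing in for a DataFrame, so it does not touch pandas or pyarrow at all. `MixedColumnException` and `coerce_mixed_columns_to_strings` are the names suggested in the issue, not existing pyarrow API, and `find_mixed_columns` and `prepare_columns` are hypothetical helpers.

```python
from typing import Any, Dict, List


class MixedColumnException(Exception):
    """Hypothetical exception from the proposal; not part of pyarrow."""

    def __init__(self, mixed_columns: List[str], msg: str = "") -> None:
        self.mixed_columns = list(mixed_columns)
        super().__init__(msg or f"Mixed-type columns: {self.mixed_columns}")


def find_mixed_columns(columns: Dict[str, List[Any]]) -> List[str]:
    """Return names of columns whose non-null values have more than one Python type.

    Note: in this sketch an int/float mix also counts as "mixed".
    """
    mixed = []
    for name, values in columns.items():
        kinds = {type(v) for v in values if v is not None}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


def prepare_columns(
    columns: Dict[str, List[Any]],
    coerce_mixed_columns_to_strings: bool = True,
) -> Dict[str, List[Any]]:
    """Coerce mixed columns to strings, or raise a machine-readable error."""
    mixed = find_mixed_columns(columns)
    if not mixed:
        return dict(columns)
    if not coerce_mixed_columns_to_strings:
        # Library implementors opt out and get the column names back.
        raise MixedColumnException(mixed_columns=mixed)
    out = dict(columns)
    for name in mixed:
        # Nulls are preserved; everything else is stringified.
        out[name] = [None if v is None else str(v) for v in columns[name]]
    return out


print(prepare_columns({"mixed": [1, "b"], "ok": [1, 2]}))
```

With the default flag, the mixed column is stringified and the clean column is left alone; with the flag set to `False`, a caller can catch `MixedColumnException` and read `mixed_columns` instead of parsing an error message.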
[jira] [Commented] (ARROW-2777) [JS] Friendlier onboarding readme
[ https://issues.apache.org/jira/browse/ARROW-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529327#comment-16529327 ]

Leo Meyerovich commented on ARROW-2777:
---------------------------------------

Great. I don't have a strong feeling over which synchronous comms channel is preferred.

I added Slack b/c I've been having synchronous education comms w/ various expert JS framework devs who are curious about Arrow but have questions. Likewise, as basic compute fills out over the next few months, I expect more beginner q's to start, and an interim period before SO, tutorials, etc. catch up. Slack appeared to be the most "in-the-open" sync comms channel being actively used.

Maybe keep it for now, and update if/when a decision is made + a new path is established? Or is the decision clear enough that I should delete it now, and if so, what should I put instead?

Thanks!

> [JS] Friendlier onboarding readme
> ---------------------------------
>
> Key: ARROW-2777
> URL: https://issues.apache.org/jira/browse/ARROW-2777
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Leo Meyerovich
> Assignee: Leo Meyerovich
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Based on some recent community feedback, add to JS onboarding:
> -- Example of loading native JS values
> -- Pointer to Slack
> -- Links to tutorials and docs
> -- Ideally.. tutorial of loading -> map/filter/reduce -> emitting... but more
> core methods seem needed first.
[jira] [Created] (ARROW-2777) [JS] Friendlier onboarding readme
Leo Meyerovich created ARROW-2777:
----------------------------------

Summary: [JS] Friendlier onboarding readme
Key: ARROW-2777
URL: https://issues.apache.org/jira/browse/ARROW-2777
Project: Apache Arrow
Issue Type: Improvement
Reporter: Leo Meyerovich
Assignee: Leo Meyerovich

Based on some recent community feedback, add to JS onboarding:
-- Example of loading native JS values
-- Pointer to Slack
-- Links to tutorials and docs
-- Ideally.. tutorial of loading -> map/filter/reduce -> emitting... but more core methods seem needed first.
[jira] [Updated] (ARROW-2206) [JS] Add Perspective as a community project
[ https://issues.apache.org/jira/browse/ARROW-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leo Meyerovich updated ARROW-2206: -- Description: JS lib is used by [https://github.com/jpmorganchase/perspective] . We have permission from Deepank to reference. Tracking in [https://github.com/apache/arrow/pull/1652] . was:JS lib is used by [https://github.com/jpmorganchase/perspective] . We have permission from Deepank to reference. > [JS] Add Perspective as a community project > --- > > Key: ARROW-2206 > URL: https://issues.apache.org/jira/browse/ARROW-2206 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation, JavaScript >Reporter: Leo Meyerovich >Assignee: Leo Meyerovich >Priority: Major > Labels: pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > JS lib is used by [https://github.com/jpmorganchase/perspective] . We have > permission from Deepank to reference. > > Tracking in [https://github.com/apache/arrow/pull/1652] . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2206) [JS] Add Perspective as a community project
Leo Meyerovich created ARROW-2206: - Summary: [JS] Add Perspective as a community project Key: ARROW-2206 URL: https://issues.apache.org/jira/browse/ARROW-2206 Project: Apache Arrow Issue Type: Improvement Components: Documentation, JavaScript Reporter: Leo Meyerovich Assignee: Leo Meyerovich JS lib is used by [https://github.com/jpmorganchase/perspective] . We have permission from Deepank to reference. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1952) [JS] 32b dense vector coercion
[ https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305023#comment-16305023 ]

Leo Meyerovich commented on ARROW-1952:
---------------------------------------

(Discussion happening w/ Paul around this, so more for documentation)

> [JS] 32b dense vector coercion
> ------------------------------
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
> Issue Type: New Feature
> Reporter: Leo Meyerovich
> Priority: Minor
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does
> a good job of information-preserving flattening, e.g., a 64i vector into an
> array of [hi, lo] int32s. Something similar happens for timestamps. However,
> in getting some Arrow code to load into a legacy system, I'm finding myself
> writing a _lot_ of lossy flatteners in userland. Doing it there seems
> brittle and error-prone and adds friction for adoption; putting it in the
> core lib would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive,
> simple thing.
> (2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array
> logic will be available anyway. This helps stay in the symbolic abstraction
> longer, so may be smarter.
> Thoughts?

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
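A minimal sketch of interface (1) from ARROW-1952, written in Python rather than JS and independent of any Arrow library. It assumes 64-bit values arrive as [hi, lo] int32 pairs in the order the issue describes (an actual buffer layout may differ) and lossily flattens each pair to one 32-bit integer. The names `combine_hi_lo` and `flatten_to_int32` and the `clamp` flag are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def combine_hi_lo(hi: int, lo: int) -> int:
    """Rebuild a signed 64-bit value from a signed high word and a low word."""
    # The low word contributes its raw 32 bits; the high word carries the sign.
    return (hi << 32) | (lo & 0xFFFFFFFF)


def flatten_to_int32(pairs: Iterable[Tuple[int, int]], clamp: bool = True) -> List[int]:
    """Lossily map [hi, lo] pairs to one signed 32-bit integer each.

    clamp=True saturates out-of-range values at the int32 limits;
    clamp=False keeps only the low 32 bits (wraparound).
    """
    out = []
    for hi, lo in pairs:
        value = combine_hi_lo(hi, lo)
        if clamp:
            value = max(INT32_MIN, min(INT32_MAX, value))
        else:
            # Reinterpret the low 32 bits as a signed integer.
            value = ((value & 0xFFFFFFFF) ^ 0x80000000) - 0x80000000
        out.append(value)
    return out


pairs = [(0, 5), (-1, -1), (1, 0)]  # the values 5, -1, and 2**32
print(flatten_to_int32(pairs))               # [5, -1, 2147483647]
print(flatten_to_int32(pairs, clamp=False))  # [5, -1, 0]
```

The two modes give different answers for the out-of-range value, which is the kind of per-caller decision that makes ad hoc userland flatteners easy to get wrong.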
[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion
[ https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leo Meyerovich updated ARROW-1952:
----------------------------------

Description:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a good job of information-preserving flattening, e.g., a 64i vector into an array of [hi, lo] int32s. Something similar happens for timestamps. However, in getting some Arrow code to load into a legacy system, I'm finding myself writing a _lot_ of lossy flatteners in userland. Doing it there seems brittle and error-prone and adds friction for adoption; putting it in the core lib would enable reuse across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, simple thing.
(2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array logic will be available anyway. This helps stay in the symbolic abstraction longer, so may be smarter.

Thoughts?

was:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a good job of information-preserving flattening, e.g., a 64i vector into an array of [hi, lo] int32s. Something similar happens for timestamps. However, in getting some Arrow code to load into a legacy system, I'm finding myself writing a _lot_ of lossy flatteners. This seems brittle and error-prone and adds friction for adoption; putting it in the core lib would enable reuse across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, simple thing.
(2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array logic will be available anyway. This helps stay in the symbolic abstraction longer, so may be smarter.

Thoughts?
[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion
[ https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leo Meyerovich updated ARROW-1952:
----------------------------------

Description:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a good job of information-preserving flattening, e.g., a 64i vector into an array of [hi, lo] int32s. Something similar happens for timestamps. However, in getting some Arrow code to load into a legacy system, I'm finding myself writing a _lot_ of lossy flatteners. This seems brittle and error-prone and adds friction for adoption; putting it in the core lib would enable reuse across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, simple thing.
(2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array logic will be available anyway. This helps stay in the symbolic abstraction longer, so may be smarter.

Thoughts?

was:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a good job of information-preserving flattening, e.g., a 64i vector into an array of [hi, lo] int32s. Something similarly annoying happens for timestamps. However, in getting some Arrow code to load into a legacy system, I'm finding myself writing a _lot_ of lossy flatteners. This seems brittle and error-prone and adds friction for adoption; putting it in the core lib would enable reuse across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, simple thing.
(2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array logic will be available anyway. This helps stay in the symbolic abstraction longer, so may be smarter.

Thoughts?
[jira] [Created] (ARROW-1911) Add Graphistry to Arrow JS proof points
Leo Meyerovich created ARROW-1911: - Summary: Add Graphistry to Arrow JS proof points Key: ARROW-1911 URL: https://issues.apache.org/jira/browse/ARROW-1911 Project: Apache Arrow Issue Type: Improvement Reporter: Leo Meyerovich As part of upcoming publicity to the JS project, we wanted to add the Graphistry enterprise-grade use case to the homepage. -- This message was sent by Atlassian JIRA (v6.4.14#64029)