
[jira] [Created] (ARROW-11564) Cython/ReadTheDocs broken?

2021-02-08 Thread Leo Meyerovich (Jira)

Leo Meyerovich created ARROW-11564:
--

 Summary: Cython/ReadTheDocs broken?
 Key: ARROW-11564
 URL: https://issues.apache.org/jira/browse/ARROW-11564
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Leo Meyerovich


Our ReadTheDocs build was regenerating some docs and threw a surprising error on 
PyArrow's Cython dependency:

 * CPython 3.0 fails on Arrow 3.0.0's Cython: 
[https://readthedocs.org/projects/pygraphistry/builds/12971110/]
 * Previous build passes on Arrow 2.0.0: 
https://readthedocs.org/api/v2/build/12767582.txt
 * Boring setup.py: 
[https://github.com/graphistry/pygraphistry/blob/master/setup.py#L49]

 

I'm guessing there may be something to file with ReadTheDocs about the build 
environment, but it is a surprising regression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (ARROW-8053) [JS] Improve performance of filtering

2020-03-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567
 ] 

Leo Meyerovich edited comment on ARROW-8053 at 3/11/20, 1:26 AM:
-

Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff). My limited cycles need to be on that 
or figuring out GPU IO bindings.

 

To be clear, you can manually do the ^^^ on this use case. It'd take real 
experimentation, e.g., not even clear here if row-based, columnar, or a tiled 
hybrid is the right choice. And probably not the filter predicates - we 
definitely avoid them in prod ;)
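
For readers weighing the layouts mentioned above, here is a minimal sketch, not Arrow JS code, contrasting a row-based filter with a columnar one over typed arrays. The `Row` shape, the function names, and the threshold predicate are all hypothetical; which layout wins would still need the experimentation described above.

```typescript
// Hypothetical record shape for illustration; not an Arrow JS type.
interface Row {
  id: number;
  score: number;
}

// Row-based: one object per record, filtered with Array.prototype.filter.
function filterRows(rows: Row[], minScore: number): number[] {
  return rows.filter((r) => r.score >= minScore).map((r) => r.id);
}

// Columnar: one typed array per field; scan only the column being tested.
function filterColumns(ids: Int32Array, scores: Float64Array, minScore: number): Int32Array {
  const out = new Int32Array(ids.length);
  let n = 0;
  for (let i = 0; i < scores.length; i++) {
    if (scores[i] >= minScore) {
      out[n++] = ids[i];
    }
  }
  // Copy out just the matched prefix.
  return out.slice(0, n);
}

const rows: Row[] = [
  { id: 1, score: 0.2 },
  { id: 2, score: 0.9 },
  { id: 3, score: 0.5 },
];
const ids = Int32Array.from(rows.map((r) => r.id));
const scores = Float64Array.from(rows.map((r) => r.score));

const fromRows = filterRows(rows, 0.5);
const fromColumns = filterColumns(ids, scores, 0.5);
```

Both calls select the same ids; the columnar version touches only the `scores` array during the scan, which is the property a tiled or columnar design would try to exploit.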


was (Author: lmeyerov):
Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff). My limited cycles need to be on that 
or figuring out GPU IO bindings.

 

To be clear, you can manually do the ^^^ on this use case. It'd take real 
experimentation, e.g., not even clear here if row-based, columnar, or a tiled 
hybrid is the right choice.

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Will Strimling
>Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow 
> doesn't compete with other libraries or JavaScript when it comes to filtering 
> performance. Has there been any discussion or roadmaps established for 
> improving it?
> Most convincing Observables:
>  * 
> [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * 
> [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]




[jira] [Comment Edited] (ARROW-8053) [JS] Improve performance of filtering

2020-03-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567
 ] 

Leo Meyerovich edited comment on ARROW-8053 at 3/11/20, 1:25 AM:
-

Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff). My limited cycles need to be on that 
or figuring out GPU IO bindings.

 

To be clear, you can manually do the ^^^ on this use case. It'd take real 
experimentation, e.g., not even clear here if row-based, columnar, or a tiled 
hybrid is the right choice.


was (Author: lmeyerov):
Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff).

 

To be clear, you can manually do the ^^^ on this use case. It'd take real 
experimentation, e.g., not even clear here if row-based, columnar, or a tiled 
hybrid is the right choice.

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Will Strimling
>Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow 
> doesn't compete with other libraries or JavaScript when it comes to filtering 
> performance. Has there been any discussion or roadmaps established for 
> improving it?
> Most convincing Observables:
>  * 
> [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * 
> [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]




[jira] [Comment Edited] (ARROW-8053) [JS] Improve performance of filtering

2020-03-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567
 ] 

Leo Meyerovich edited comment on ARROW-8053 at 3/11/20, 1:23 AM:
-

Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff).

 

To be clear, you can manually do the ^^^ on this use case. It'd take real 
experimentation, e.g., not even clear here if row-based, columnar, or a tiled 
hybrid is the right choice.


was (Author: lmeyerov):
Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff)

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Will Strimling
>Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow 
> doesn't compete with other libraries or JavaScript when it comes to filtering 
> performance. Has there been any discussion or roadmaps established for 
> improving it?
> Most convincing Observables:
>  * 
> [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * 
> [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]




[jira] [Commented] (ARROW-8053) [JS] Improve performance of filtering

2020-03-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056567#comment-17056567
 ] 

Leo Meyerovich commented on ARROW-8053:
---

Sorry, we never got support for continuing our arrow js plans, namely, moving 
on to the GPU bindings & compute stack. That would have included this case:
 * deforested typed native code via wasm
 * multicore & simd via workers & wasm
 * GPU interp via RAPIDS

IMO we need the follow-up tutorial on real-world IO and follow-up on real 
compute (not the weird predicate stuff)

> [JS] Improve performance of filtering
> -
>
> Key: ARROW-8053
> URL: https://issues.apache.org/jira/browse/ARROW-8053
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Will Strimling
>Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow 
> doesn't compete with other libraries or JavaScript when it comes to filtering 
> performance. Has there been any discussion or roadmaps established for 
> improving it?
> Most convincing Observables:
>  * 
> [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * 
> [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-24 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023149#comment-17023149
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Thanks Brian! I'm slammed for another ~2w, and then will work on Part II.

 

The other thing I realized is missing here is charts comparing plain JS arrays 
([]) with Arrow on perf (speed + mem use).

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-16 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017539#comment-17017539
 ] 

Leo Meyerovich commented on ARROW-7513:
---

_nudge:_  [~paul.e.taylor]  / [~bhulette]  - need to do anything to get the PR 
link merged?

 

Need to take care of some things, and then will see about extracting Part II 
out from the earlier versions on Data+Builders.

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013248#comment-17013248
 ] 

Leo Meyerovich commented on ARROW-7513:
---

... OK PR is up: [https://github.com/apache/arrow/pull/6163]

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013225#comment-17013225
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Worked w/ paul a bit, and part 1 is done afaict. See updated live link.

 

Next step:

-- any review is good

-- I can update the main readme to point to it

 

Later:

-- bring back lower-level data.new and builders as part ii

-- paul points out the helper structVector.getChildByName (slice-by-name) 
should really be built-in, and can be a nice first pr for someone
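
A slice-by-name helper of the kind Paul describes could look roughly like the sketch below. It is written against a toy struct column, not the Arrow JS classes: `ToyStructColumn`, its `fieldNames`/`children` layout, and this `getChildByName` function are all hypothetical and only illustrate the lookup. The `richName`/`nick` names are borrowed from the question elsewhere in this thread; the values are made up.

```typescript
// Toy stand-in for a struct column: parallel child columns, one per field.
// This is NOT the Arrow JS StructVector; names and layout are assumptions.
interface ToyStructColumn {
  fieldNames: string[];
  children: unknown[][];
}

// Look up a child column by field name; returns null when the name is absent.
function getChildByName(col: ToyStructColumn, name: string): unknown[] | null {
  const index = col.fieldNames.indexOf(name);
  return index === -1 ? null : col.children[index];
}

const richName: ToyStructColumn = {
  fieldNames: ["first", "nick"],
  children: [
    ["Alice", "Robert"],
    ["Al", "Bob"],
  ],
};

const nickCol = getChildByName(richName, "nick");
const missing = getChildByName(richName, "last");
```

Returning `null` for an unknown field is one design choice; a built-in version might prefer to throw.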

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013192#comment-17013192
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Great, thanks Paul, I cleaned up the int64vector stuff + initial struct stuff. 
Two last q's and we're done w/ Part I!

 
 * Is there a clean way to create a utf8 dict? I could do 
VectorUtf8.from(["str1", ...]), but no VectorDictionary.from, even with opts. 
 * For a table col w/ a nested struct, any way to slice out a subcol, e.g., 
nick_col = tbl.get('richName').get('nick')

 

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011451#comment-17011451
 ] 

Leo Meyerovich commented on ARROW-7513:
---

 * Good: Updated the numerics section to use `VectorT.from(Array | Buffer)`
 ** Oddly, `arrow.Int64Vector.from((new Uint32Array([2,3, 555,0, 1,0])).buffer)` 
returns length 6, not 3 (0.15.0)
 * Bad: `VectorDictionary.from(['hello', 'hello', null, 'carrot'])` did not seem 
to work, so kept as lower-level for now
 * Bad: Still not sure how to do structs
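
One possible reading of the length discrepancy is that the six 32-bit words are being counted individually rather than as three 64-bit values. The sketch below is plain TypeScript, not the Arrow JS implementation; `wordsToInt64s` is a hypothetical helper that pairs low and high words the way a 64-bit integer column would be laid out, assuming low word first and values small enough to fit a JS number.

```typescript
// Hypothetical helper: combine (low, high) 32-bit word pairs into numbers.
// Exact only while each value is non-negative and stays below 2^53.
function wordsToInt64s(words: Uint32Array): number[] {
  if (words.length % 2 !== 0) {
    throw new Error("expected an even number of 32-bit words");
  }
  const TWO_POW_32 = 4294967296;
  const values: number[] = [];
  for (let i = 0; i < words.length; i += 2) {
    // words[i] is the low word, words[i + 1] the high word.
    values.push(words[i + 1] * TWO_POW_32 + words[i]);
  }
  return values;
}

const words = new Uint32Array([2, 3, 555, 0, 1, 0]);
const values = wordsToInt64s(words);
// words.length is 6, while values.length is 3
```

Under these assumptions the same bytes describe three logical values, which is the length the comment above expected.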

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010870#comment-17010870
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Agreed, I'll see about forking this into Part I & Part II, where Part I is 
high-level api and move the Data stuff to Part II. 

 

I'm stumped on `structs` and `nested structs` though, any recs/examples?

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made use for myself difficult, and I bet for others.
>  
> As with prev tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy can make sense to export as an html or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  




[jira] [Created] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-07 Thread Leo Meyerovich (Jira)

Leo Meyerovich created ARROW-7513:
-

Summary: [JS] Arrow Tutorial: Common data types
Key: ARROW-7513
URL: https://issues.apache.org/jira/browse/ARROW-7513
Project: Apache Arrow
Issue Type: Task
Components: JavaScript
Reporter: Leo Meyerovich
Assignee: Leo Meyerovich

The JS client lacks basic introductory material around creating the common
basic data types such as turning JS arrays into ints, dicts, etc. There is no
equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . This
has made the client difficult for me to use, and I bet for others too.

As with prev tutorials, I started sketching on
[https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
. When we're happy, it can make sense to export it as HTML or something to the
repo, or just link from the main readme.

I believe the target topics worth covering are:
* Common user data types: Ints, Dicts, Struct, Time
* Common column types: Data, Vector, Column
* Going from individual & arrays & buffers of JS values to Arrow-wrapped
forms, and basic inspection of the result

Not worth going into here is Tables vs. RecordBatches, which is the other
tutorial.

1. Ideas of what to add/edit/remove?

2. And anyone up for helping with discussion of Data vs. Vector, and ingest of
Time & Struct?

3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff
here.

cc [~wesm] [~bhulette] [~paul.e.taylor]


[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-13 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973604#comment-16973604
 ] 

Leo Meyerovich commented on ARROW-7109:
---

Awesome! 

 

How to stream record batches in and out seems like the most meaningful topic. At 
least here, a lot of our use ends up being about this, so it may help to share 
some of the Arrow APIs we use, and maybe how and why.

 1. Maybe a start is revisiting the early microbatch tutorial I had done before 
the API stabilized: [https://observablehq.com/d/6e565d7662d984ea]

It aimed to show the microbatch API and prove some numbers.

 2. As a follow-up, demoing common fast IO needs: fast browser <> node, node 
process <> node process, and node process <> python process. A lot of our code 
ends up looking like:

 

```
export function asTable(source: Table | AsyncIterable, fields: string[] = []) {
    return AsyncIterableX.defer(async () => {
        if (source) {
            const batches = source instanceof Table ? source.chunks : await toArray(source);
            if (batches.length > 0) {
                if (!fields || !fields.length) {
                    return AsyncIterableX.of(new Table(batches));
                }
                const table = new Table(batches).select(...fields);
                if (table.schema.fields.length === fields.length) {
                    return AsyncIterableX.of(table);
                }
            }
        }
        return AsyncIterableX.empty();
    });
}

export function asBatches(fn: () => DeferFnReturn) {
    return asReaders(fn).concatAll() as BatchesReturn;
}

export function asReaders(fn: () => DeferFnReturn) {
    return AsyncIterableX.defer(async () => {
        const x = await fn();
        return RecordBatchStreamReader.readAll(x);
    }) as ReadersReturn;
}
```
 
and
 
```
asReaders(() => {
    const nodeEncodingsStream = asTable(Array.isArray(nodeEncodings) ? this.node.table.encodings(nodeEncodings) : nodeEncodings);
    const edgeEncodingsStream = asTable(Array.isArray(edgeEncodings) ? this.edge.table.encodings(edgeEncodings) : edgeEncodings);
    const recordBatchesStream = AsyncIterable.concat(
        // `preshaped` is a nested AsyncIterable, so flatten it here
        AsyncIterable.as(preshaped).flatMap((x) => asTable(x)),
        // Calling memoize() on the node/edge encodings streams ensures they start downloading
        // immediately, instead of serially after the incoming RecordBatch streams have completed
        nodeEncodingsStream.memoize().flatMap(({ chunks }) => AsyncIterable.as(chunks)),
        edgeEncodingsStream.memoize().flatMap(({ chunks }) => AsyncIterable.as(chunks)),
    );
    return recordBatchesStream
        .pipe(RecordBatchStreamWriter.throughNode({ autoDestroy: false }))
        .pipe(this.got.stream.post('/preshaped/shaped', { headers: octetstream }));
});
```
 

> [JS] Create table from Arrays
> -
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Sascha Hofmann
>Assignee: Brian Hulette
>Priority: Minor
> Attachments: image-2019-11-12-09-11-39-751.png
>
>
> I am trying to generate an arrow table from JS arrays and followed the 
> example from [here | 
> [https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style]] 
> but I am struggling to generate different schemas, most importantly how to 
> provide a type other than 'floatingpoint'. Right now, I have:
> {code:java}
> const data = Table.from({
>   schema: {
> fields: [{name: 'a', nullable: false, children: Array(0), 
>  type: {name: 'floatingpoint', precision: 'SINGLE'}}]
> },
>   batches: [{
>count: 10,
>columns: [{ name: 'a', count: 10, VALIDITY: [], 
> DATA: Array.from({ length: 10 }, () => 'a') }]}]  
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the 
> type (I tried type: Utf8 among others). In general, wouldn't it be nice to 
> have a function that creates a Table from an object?
> On another note, are there any plans to make the docs a little bit more 
> descriptive? Happy to contribute!




[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-12 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972634#comment-16972634
 ] 

Leo Meyerovich commented on ARROW-7109:
---

Thanks – we use Arrow heavily internally for webgl->nodejs->nodeopencl+pydata, 
while our external efforts here are more of a labor of love. So while the code is 
very much production quality, the docs are still generally "grep arrow + our 
codebase" :) Helping update the tutorials or docs based on your experiences would 
be awesome if you're up for it!

 

 

> [JS] Create table from Arrays
> -
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Sascha Hofmann
>Assignee: Brian Hulette
>Priority: Minor
> Attachments: image-2019-11-12-09-11-39-751.png
>
>
> I am trying to generate an arrow table from JS arrays and followed the 
> example from [here | 
> [https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style]] 
> but I am struggling to generate different schemas, most importantly how to 
> provide a type other than 'floatingpoint'. Right now, I have:
> {code:java}
> const data = Table.from({
>   schema: {
> fields: [{name: 'a', nullable: false, children: Array(0), 
>  type: {name: 'floatingpoint', precision: 'SINGLE'}}]
> },
>   batches: [{
>count: 10,
>columns: [{ name: 'a', count: 10, VALIDITY: [], 
> DATA: Array.from({ length: 10 }, () => 'a') }]}]  
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the 
> type (I tried type: Utf8 among others). In general, wouldn't it be nice to 
> have a function that creates a Table from an object?
> On another note, are there any plans to make the docs a little bit more 
> descriptive? Happy to contribute!




[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-11 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971996#comment-16971996
 ] 

Leo Meyerovich commented on ARROW-7109:
---

Thanks, merged! I see [https://observablehq.com/d/6e565d7662d984ea] I & II are 
also out of date, so should update too.

> [JS] Create table from Arrays
> -
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Sascha Hofmann
>Assignee: Brian Hulette
>Priority: Minor
>
> I am trying to generate an Arrow table from JS arrays and followed the 
> example from 
> [here|https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style], 
> but I am struggling to generate different schemas, most importantly how to 
> provide a type different than 'floatingpoint'. Right now, I have:
> {code:javascript}
> const data = Table.from({
>   schema: {
>     fields: [{name: 'a', nullable: false, children: Array(0),
>               type: {name: 'floatingpoint', precision: 'SINGLE'}}]
>   },
>   batches: [{
>     count: 10,
>     columns: [{name: 'a', count: 10, VALIDITY: [],
>                DATA: Array.from({length: 10}, () => 'a')}]
>   }]
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the 
> type (I tried type: Utf8, among others). In general, wouldn't it be nice to 
> have a function that creates a Table from an object?
> On another note, are there any plans to make the docs a little bit more 
> descriptive? Happy to contribute!




[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-11 Thread Leo Meyerovich (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971935#comment-16971935
 ] 

Leo Meyerovich commented on ARROW-7109:
---

Thanks [~bhulette]! Happy to update the tutorial if there are specific sections 
you can help point to – for the interim, I put a forward pointer to yours.

> [JS] Create table from Arrays
> -
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Sascha Hofmann
>Assignee: Brian Hulette
>Priority: Minor
>
> I am trying to generate an Arrow table from JS arrays and followed the 
> example from 
> [here|https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style], 
> but I am struggling to generate different schemas, most importantly how to 
> provide a type different than 'floatingpoint'. Right now, I have:
> {code:javascript}
> const data = Table.from({
>   schema: {
>     fields: [{name: 'a', nullable: false, children: Array(0),
>               type: {name: 'floatingpoint', precision: 'SINGLE'}}]
>   },
>   batches: [{
>     count: 10,
>     columns: [{name: 'a', count: 10, VALIDITY: [],
>                DATA: Array.from({length: 10}, () => 'a')}]
>   }]
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the 
> type (I tried type: Utf8, among others). In general, wouldn't it be nice to 
> have a function that creates a Table from an object?
> On another note, are there any plans to make the docs a little bit more 
> descriptive? Happy to contribute!




[jira] [Created] (ARROW-4131) [Python] Coerce mixed columns to String

2018-12-28 Thread Leo Meyerovich (JIRA)

Leo Meyerovich created ARROW-4131:
-

 Summary: [Python] Coerce mixed columns to String
 Key: ARROW-4131
 URL: https://issues.apache.org/jira/browse/ARROW-4131
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Leo Meyerovich


Continuing [https://github.com/apache/arrow/issues/3280] 

 

===

 

I'm seeing variants of this elsewhere (e.g., 
[wesm/feather#349|https://github.com/wesm/feather/issues/349] ) --

Not all pandas DataFrames convert to Arrow tables, and when conversion fails, 
it does not fail in a way that is conducive to automation:

Sample:

{code:python}
import pandas as pd
import pyarrow as pa
mixed_df = pd.DataFrame({'mixed': [1, 'b']})
pa.Table.from_pandas(mixed_df)
# => ArrowInvalid: ('Could not convert b with type str: tried to convert to
# double', 'Conversion failed for column mixed with type object')
{code}

I would have expected behaviors more like the following:
 * Coerce {{toString}} by default, with a default-off option to disallow 
toString coercions

 * Provide a default-off option to {{from_pandas}} to auto-coerce

 * Name the exception so it is clear that this is a column coercion failure, 
and include the column name(s), making this predictable and clearly handleable 
by both library writers & users

I lean towards:
 * Default to auto-coercion, improving life for early users: 
{{coerce_mixed_columns_to_strings=True}}
 * For less frequent yet more advanced library implementors, allow overriding 
it to {{False}}
 * In their case, raise a predictable & machine-readable exception, 
{{MixedColumnException(mixed_columns=['a', 'b', ...], msg="")}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (ARROW-2777) [JS] Friendlier onboarding readme

2018-07-01 Thread Leo Meyerovich (JIRA)



[ 
https://issues.apache.org/jira/browse/ARROW-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529327#comment-16529327
 ] 

Leo Meyerovich commented on ARROW-2777:
---

Great. I don't have a strong feeling about which synchronous comms channel is 
preferred. I added Slack b/c I've been having synchronous education comms w/ 
various expert JS framework devs who are curious about Arrow but have 
questions. Likewise, as basic compute fills out over the next few months, I 
expect more beginner q's to start, with an interim period before SO, tutorials, 
etc. catch up. Slack appeared to be the most "in-the-open" sync comms channel 
being actively used.

Maybe keep it for now, and if/when a decision is made + a new path is 
established, update?

Or is the decision clear enough that I should delete it now, and if so, what 
should I put instead?

 

Thanks!

 

 

> [JS] Friendlier onboarding readme
> -
>
> Key: ARROW-2777
> URL: https://issues.apache.org/jira/browse/ARROW-2777
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Based on some recent community feedback, add to JS onboarding:
> -- Example of loading native JS values
> -- Pointer to Slack
> -- Links to tutorials and docs
> -- Ideally.. tutorial of loading -> map/filter/reduce -> emitting... but more 
> core methods seem needed first.




[jira] [Created] (ARROW-2777) [JS] Friendlier onboarding readme

2018-07-01 Thread Leo Meyerovich (JIRA)

Leo Meyerovich created ARROW-2777:
-

 Summary: [JS] Friendlier onboarding readme
 Key: ARROW-2777
 URL: https://issues.apache.org/jira/browse/ARROW-2777
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Leo Meyerovich
Assignee: Leo Meyerovich


Based on some recent community feedback, add to JS onboarding:

-- Example of loading native JS values

-- Pointer to Slack

-- Links to tutorials and docs

-- Ideally.. tutorial of loading -> map/filter/reduce -> emitting... but more 
core methods seem needed first.




[jira] [Updated] (ARROW-2206) [JS] Add Perspective as a community project

2018-02-23 Thread Leo Meyerovich (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Meyerovich updated ARROW-2206:
--
Description: 
JS lib is used by [https://github.com/jpmorganchase/perspective] . We have 
permission from Deepank to reference.

 

Tracking in [https://github.com/apache/arrow/pull/1652] .

  was:JS lib is used by [https://github.com/jpmorganchase/perspective] . We 
have permission from Deepank to reference.


> [JS] Add Perspective as a community project
> ---
>
> Key: ARROW-2206
> URL: https://issues.apache.org/jira/browse/ARROW-2206
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation, JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> JS lib is used by [https://github.com/jpmorganchase/perspective] . We have 
> permission from Deepank to reference.
>  
> Tracking in [https://github.com/apache/arrow/pull/1652] .




[jira] [Created] (ARROW-2206) [JS] Add Perspective as a community project

2018-02-23 Thread Leo Meyerovich (JIRA)

Leo Meyerovich created ARROW-2206:
-

 Summary: [JS] Add Perspective as a community project
 Key: ARROW-2206
 URL: https://issues.apache.org/jira/browse/ARROW-2206
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, JavaScript
Reporter: Leo Meyerovich
Assignee: Leo Meyerovich


JS lib is used by [https://github.com/jpmorganchase/perspective] . We have 
permission from Deepank to reference.




[jira] [Commented] (ARROW-1952) [JS] 32b dense vector coercion

2017-12-27 Thread Leo Meyerovich (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305023#comment-16305023
 ] 

Leo Meyerovich commented on ARROW-1952:
---

(Discussion happening w/ Paul around this, so more for documentation)

> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Leo Meyerovich
>Priority: Minor
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does 
> a good job of information-preserving flattening, e.g., a 64i vector into an 
> array of [hi, lo] int32s, and something similar for timestamps. However, 
> in getting some Arrow code to load into a legacy system, I'm finding myself 
> writing a _lot_ of lossy flatteners in userland. Doing it there seems brittle 
> and error-prone and adds friction for adoption; putting it in the core lib 
> would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?
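
A minimal sketch of interface (1), assuming the 64-bit values arrive as an Int32Array of [hi, lo] word pairs as described above. The function names and the saturate-on-overflow policy are assumptions made for illustration; they are not part of Arrow JS, and other lossy policies (plain truncation, for instance) are equally plausible.

```javascript
// Hypothetical lossy flatteners (not an Arrow JS API). Input layout is
// assumed to be [hi0, lo0, hi1, lo1, ...] as signed 32-bit words.
const INT32_MAX = 2147483647;
const INT32_MIN = -2147483648;

// 64-bit pairs -> Int32Array, saturating values that do not fit in 32 bits.
function int64PairsToInt32(pairs) {
  if (pairs.length % 2 !== 0) {
    throw new RangeError('expected an even number of 32-bit words');
  }
  const out = new Int32Array(pairs.length / 2);
  for (let i = 0; i < out.length; i++) {
    const hi = pairs[2 * i];
    const lo = pairs[2 * i + 1];
    // The value fits in int32 exactly when hi is the sign extension of lo.
    if (hi === (lo >> 31)) {
      out[i] = lo;
    } else {
      out[i] = hi < 0 ? INT32_MIN : INT32_MAX;
    }
  }
  return out;
}

// 64-bit pairs -> Float64Array; exact only up to 2^53 in magnitude.
function int64PairsToFloat64(pairs) {
  if (pairs.length % 2 !== 0) {
    throw new RangeError('expected an even number of 32-bit words');
  }
  const out = new Float64Array(pairs.length / 2);
  for (let i = 0; i < out.length; i++) {
    // lo is reinterpreted as unsigned before being added to the high word.
    out[i] = pairs[2 * i] * 4294967296 + (pairs[2 * i + 1] >>> 0);
  }
  return out;
}

const pairs = Int32Array.of(0, 5, -1, -5, 1, 0, -2, 0);
console.log(Array.from(int64PairsToInt32(pairs)));
// [ 5, -5, 2147483647, -2147483648 ]
console.log(Array.from(int64PairsToFloat64(pairs)));
// [ 5, -5, 4294967296, -8589934592 ]
```

Interface (2) would wrap the same per-element logic but return a 32b Vector instead of a flat array, so the existing vector -> flat array path could be reused.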



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion

2017-12-27 Thread Leo Meyerovich (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Meyerovich updated ARROW-1952:
--
Description: 
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a 
good job of information-preserving flattening, e.g., a 64i vector into an array 
of [hi, lo] int32s, and something similar for timestamps. However, in 
getting some Arrow code to load into a legacy system, I'm finding myself 
writing a _lot_ of lossy flatteners in userland. Doing it there seems brittle 
and error-prone and adds friction for adoption; putting it in the core lib 
would enable reuse across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
simple thing.
(2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
logic will be available anyway. This helps stay in the symbolic abstraction 
longer, so may be smarter.

Thoughts?


  was:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a 
good job of information-preserving flattening, e.g., a 64i vector into an array 
of [hi, lo] int32s, and something similar for timestamps. However, in 
getting some Arrow code to load into a legacy system, I'm finding myself 
writing a _lot_ of lossy flatteners. This seems brittle and error-prone and 
adds friction for adoption; putting it in the core lib would enable reuse 
across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
simple thing.
(2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
logic will be available anyway. This helps stay in the symbolic abstraction 
longer, so may be smarter.

Thoughts?



> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Leo Meyerovich
>Priority: Minor
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does 
> a good job of information-preserving flattening, e.g., a 64i vector into an 
> array of [hi, lo] int32s, and something similar for timestamps. However, 
> in getting some Arrow code to load into a legacy system, I'm finding myself 
> writing a _lot_ of lossy flatteners in userland. Doing it there seems brittle 
> and error-prone and adds friction for adoption; putting it in the core lib 
> would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?




[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion

2017-12-27 Thread Leo Meyerovich (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Meyerovich updated ARROW-1952:
--
Description: 
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a 
good job of information-preserving flattening, e.g., a 64i vector into an array 
of [hi, lo] int32s, and something similar for timestamps. However, in 
getting some Arrow code to load into a legacy system, I'm finding myself 
writing a _lot_ of lossy flatteners. This seems brittle and error-prone and 
adds friction for adoption; putting it in the core lib would enable reuse 
across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
simple thing.
(2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
logic will be available anyway. This helps stay in the symbolic abstraction 
longer, so may be smarter.

Thoughts?


  was:
JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does a 
good job of information-preserving flattening, e.g., a 64i vector into an array 
of [hi, lo] int32s, and something similarly annoying for timestamps. However, 
in getting some Arrow code to load into a legacy system, I'm finding myself 
writing a _lot_ of lossy flatteners. This seems brittle and error-prone and 
adds friction for adoption; putting it in the core lib would enable reuse 
across libs.

I can imagine at least 2 reasonable interfaces for this:
(1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
simple thing.
(2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
logic will be available anyway. This helps stay in the symbolic abstraction 
longer, so may be smarter.

Thoughts?



> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Leo Meyerovich
>Priority: Minor
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does 
> a good job of information-preserving flattening, e.g., a 64i vector into an 
> array of [hi, lo] int32s, and something similar for timestamps. However, 
> in getting some Arrow code to load into a legacy system, I'm finding myself 
> writing a _lot_ of lossy flatteners. This seems brittle and error-prone and 
> adds friction for adoption; putting it in the core lib would enable reuse 
> across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, reusing whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?




[jira] [Created] (ARROW-1911) Add Graphistry to Arrow JS proof points

2017-12-10 Thread Leo Meyerovich (JIRA)

Leo Meyerovich created ARROW-1911:
-

 Summary: Add Graphistry to Arrow JS proof points
 Key: ARROW-1911
 URL: https://issues.apache.org/jira/browse/ARROW-1911
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Leo Meyerovich


As part of upcoming publicity for the JS project, we want to add the 
Graphistry enterprise-grade use case to the homepage. 




