[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-24 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023149#comment-17023149
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Thanks, Brian! I'm slammed for another ~2 weeks, and then I'll work on Part II.

 

The other thing I realized is missing here is charts comparing plain JS arrays 
([]) vs. Arrow performance (speed + memory use).

> [JS] Arrow Tutorial: Common data types
> --
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common 
> basic data types such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html] . 
> This has made Arrow JS difficult for me to use, and I bet for others too.
>  
> As with previous tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit]
>   . When we're happy with it, it may make sense to export it as HTML or 
> something to the repo, or just link to it from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-16 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017539#comment-17017539
 ] 

Leo Meyerovich commented on ARROW-7513:
---

_nudge:_ [~paul.e.taylor] / [~bhulette] - do I need to do anything to get the PR 
link merged?

 

I need to take care of some things first, and then I'll see about extracting 
Part II from the earlier versions on Data+Builders.



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013248#comment-17013248
 ] 

Leo Meyerovich commented on ARROW-7513:
---

... OK PR is up: [https://github.com/apache/arrow/pull/6163]



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013225#comment-17013225
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Worked with Paul a bit, and Part I is done AFAICT. See the updated live link.

 

Next step:

-- any review is good

-- I can update the main readme to point to it

 

Later:

-- bring back lower-level data.new and builders as Part II

-- Paul points out that the helper structVector.getChildByName (slice-by-name) 
should really be built in, and could be a nice first PR for someone
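
To illustrate what a slice-by-name helper does, here is a minimal, library-free sketch. It is not the Arrow JS implementation: `makeStructColumn`, the field name `first`, and the sample rows are hypothetical, and the struct column is modeled simply as one plain JS array per field.

```javascript
// Hypothetical model of a struct column: one child array per field.
// makeStructColumn and its methods are illustrative names, not Arrow JS APIs.
function makeStructColumn(fields, rows) {
  // Store the data column-wise: children[f] holds field f for every row.
  const children = fields.map((name) =>
    rows.map((row) => (row == null || row[name] === undefined ? null : row[name]))
  );
  return {
    fields,
    length: rows.length,
    // Row-wise access: rebuild one struct value from the child columns.
    get(i) {
      return Object.fromEntries(fields.map((name, f) => [name, children[f][i]]));
    },
    // Slice-by-name: return the whole child column for one field, or null.
    getChildByName(name) {
      const f = fields.indexOf(name);
      return f === -1 ? null : children[f];
    },
  };
}

const richName = makeStructColumn(
  ["first", "nick"],
  [
    { first: "Leonard", nick: "Leo" },
    { first: "Paul", nick: null },
  ]
);
const nickCol = richName.getChildByName("nick"); // ["Leo", null]
```

Because the children are already stored column-wise, slicing by name is a lookup rather than a per-row copy, which is why it is a natural candidate for a built-in.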



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013192#comment-17013192
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Great, thanks Paul, I cleaned up the int64vector stuff + initial struct stuff. 
Two last q's and we're done w/ Part I!

 
 * Is there a clean way to create a utf8 dict? I could do 
VectorUtf8.from(["str1", ...]), but there is no VectorDictionary.from, even 
with opts. 
 * For a table col w/ a nested struct, is there any way to slice out a subcol, 
e.g., nick_col = tbl.get('richName').get('nick')?
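
For readers unfamiliar with what a "utf8 dict" holds, the following is a library-free sketch of dictionary encoding. It does not use or stand in for any Arrow JS call: `dictionaryEncode` and `dictionaryDecode` are hypothetical helpers, and nulls are kept as `null` indices here for simplicity rather than in a separate validity bitmap.

```javascript
// Hypothetical helpers showing the idea behind a dictionary-encoded string column.
function dictionaryEncode(values) {
  const dictionary = [];        // unique non-null values, in first-seen order
  const positions = new Map();  // value -> index into `dictionary`
  const indices = values.map((v) => {
    if (v == null) return null; // nulls stay null and get no dictionary slot
    if (!positions.has(v)) {
      positions.set(v, dictionary.length);
      dictionary.push(v);
    }
    return positions.get(v);
  });
  return { dictionary, indices };
}

function dictionaryDecode({ dictionary, indices }) {
  return indices.map((i) => (i == null ? null : dictionary[i]));
}

const encoded = dictionaryEncode(["str1", "str2", "str1", null]);
// encoded.dictionary -> ["str1", "str2"]
// encoded.indices    -> [0, 1, 0, null]
const roundTrip = dictionaryDecode(encoded);
```

Each distinct string is stored once and rows refer to it by a small integer, which is where the memory savings for repetitive string columns come from.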

 



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-10 Thread Paul Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013120#comment-17013120
 ] 

Paul Taylor commented on ARROW-7513:


[~lmeyerov] The Int64Vector.from and Uint64Vector.from methods require that you 
either pass JS BigInt values or pass a second "is64bit" boolean argument: 
https://github.com/apache/arrow/blob/master/js/src/vector/int.ts#L63-L64. All 
the IntVectors share the same from implementation, IIRC because of a limitation 
in the TypeScript compiler that may no longer exist.
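
The two input shapes described above (BigInt values versus raw 32-bit words) can be related with a small, library-free sketch. It does not call Arrow JS; `bigIntsToWords` is a hypothetical helper, and the little-endian, low-word-first layout is an assumption made for illustration.

```javascript
// Hypothetical helper: split signed 64-bit BigInt values into 32-bit words,
// low word first (little-endian). This is not an Arrow JS API.
function bigIntsToWords(values) {
  const bytes = new ArrayBuffer(values.length * 8);
  const view = new DataView(bytes);
  // Write each value as 8 bytes; `true` selects little-endian byte order.
  values.forEach((v, i) => view.setBigInt64(i * 8, v, true));
  // Read the same bytes back as unsigned 32-bit words.
  const words = new Uint32Array(values.length * 2);
  for (let w = 0; w < words.length; w++) {
    words[w] = view.getUint32(w * 4, true);
  }
  return words;
}

const words = bigIntsToWords([1n, -1n, 2n ** 32n]);
// Array.from(words) -> [1, 0, 4294967295, 4294967295, 0, 1]
```

Three 64-bit values become six 32-bit words, so a function that receives only the words needs some extra signal, such as a flag, to know they should be read in pairs.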



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011451#comment-17011451
 ] 

Leo Meyerovich commented on ARROW-7513:
---

 * Good: Updated the numerics section to use `VectorT.from(Array | Buffer)`
 ** Oddly, `arrow.Int64Vector.from((new Uint32Array([2,3, 555,0, 1,0])).buffer)` 
returns length 6, not 3 (0.15.0)
 * Bad: `VectorDictionary.from(['hello', 'hello', null, 'carrot'])` did not seem 
to work, so I kept that part lower-level for now
 * Bad: Still not sure how to do structs
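
One possible reading of the length-6 result is that the six 32-bit words were counted individually instead of as three (low, high) pairs. The library-free sketch below shows the pairing that would give length 3. `wordsToBigInts` is a hypothetical helper, the low-word-first unsigned layout is an assumption, and this is not a confirmed explanation of the 0.15.0 behavior.

```javascript
// Hypothetical helper, not an Arrow JS API.
// Assumes each 64-bit value is stored as (low word, high word), read as unsigned.
function wordsToBigInts(words) {
  if (words.length % 2 !== 0) {
    throw new Error("expected an even number of 32-bit words");
  }
  const out = [];
  for (let i = 0; i < words.length; i += 2) {
    const lo = BigInt(words[i]);
    const hi = BigInt(words[i + 1]);
    out.push((hi << 32n) | lo); // value = hi * 2^32 + lo
  }
  return out;
}

const raw = new Uint32Array([2, 3, 555, 0, 1, 0]);
const asInt64 = wordsToBigInts(raw);
// raw.length     -> 6 (32-bit words)
// asInt64.length -> 3 (64-bit values): 12884901890n, 555n, 1n
```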



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Leo Meyerovich (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010870#comment-17010870
 ] 

Leo Meyerovich commented on ARROW-7513:
---

Agreed, I'll see about forking this into Part I & Part II, where Part I covers 
the high-level API and the Data stuff moves to Part II. 

 

I'm stumped on `structs` and `nested structs` though, any recs/examples?



[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010761#comment-17010761
 ] 

Brian Hulette commented on ARROW-7513:
--

Thanks for doing this Leo!
I just have one suggestion after a brief look this morning. I think Data should 
be considered a low-level API (and maybe even a private one?), and we should 
direct users to create Vectors directly with the builders, or with the {{from}} 
static initializers (which defer to the builders).
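
As a rough picture of the layering suggested here, the sketch below shows a `from`-style convenience function that simply defers to a builder. It is library-free and hypothetical throughout: `TinyInt32Builder` and `int32VectorFrom` are illustrative names, not the Arrow JS Builder or Vector APIs, and validity is tracked with a plain boolean array rather than a bitmap.

```javascript
// Hypothetical, minimal builder; not the Arrow JS Builder API.
class TinyInt32Builder {
  constructor() {
    this.values = []; // raw values; null slots hold a placeholder 0
    this.valid = [];  // one boolean per slot: true = non-null
  }
  append(value) {
    const isValid = value != null;
    this.values.push(isValid ? value : 0);
    this.valid.push(isValid);
    return this; // allow chaining
  }
  finish() {
    // Freeze the accumulated state into an immutable-style result object.
    return {
      length: this.values.length,
      nullCount: this.valid.filter((v) => !v).length,
      data: Int32Array.from(this.values),
      valid: this.valid.slice(),
      get(i) {
        return this.valid[i] ? this.data[i] : null;
      },
    };
  }
}

// A `from`-style initializer that only loops over the input and defers to the builder.
function int32VectorFrom(iterable) {
  const builder = new TinyInt32Builder();
  for (const v of iterable) builder.append(v);
  return builder.finish();
}

const vec = int32VectorFrom([1, null, 3]);
// vec.length -> 3, vec.nullCount -> 1, vec.get(1) -> null
```

With this split, users who have all values up front call the `from`-style function, while users who receive values incrementally use the builder directly; neither needs to touch the underlying buffers.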
