[jira] [Created] (ARROW-12808) [JS] Document browser support

2021-05-17 Thread Brian Hulette (Jira)
Brian Hulette created ARROW-12808:
----------------------------------

 Summary: [JS] Document browser support
 Key: ARROW-12808
 URL: https://issues.apache.org/jira/browse/ARROW-12808
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette


For example, in https://github.com/apache/arrow/pull/10340 we're explicitly 
removing support for IE. We should at least document that IE support is an 
explicit non-goal. Even better would be to identify supported version ranges 
for major browsers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-7616) [Java] Support comparing value ranges for dense union vector

2020-03-16 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-7616.
--------------------------------------
Fix Version/s: 0.17.0
   Resolution: Fixed

Issue resolved by pull request 6355
[https://github.com/apache/arrow/pull/6355]

> [Java] Support comparing value ranges for dense union vector
> ------------------------------------------------------------
>
> Key: ARROW-7616
> URL: https://issues.apache.org/jira/browse/ARROW-7616
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After we support dense union vectors, we should support range value 
> comparisons for them.





[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-27 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046482#comment-17046482
 ] 

Brian Hulette commented on ARROW-3247:
--------------------------------------

Thanks Micah, could you link those here? This is all I could find when 
searching for parquet and maps.

> [Python] Support spark parquet array and map types
> --------------------------------------------------
>
> Key: ARROW-3247
> URL: https://issues.apache.org/jira/browse/ARROW-3247
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
>
> As far as I understand, there is already some support for nested 
> array/dict/structs in Arrow. However, Spark Map and List types are structured 
> one level deeper (I believe to allow for both NULL and empty entries). 
> Surprisingly, fastparquet can load these. I do not know the plan for 
> arbitrary nested object support, but it should be made clear.
> Schema of spark-generated file from the fastparquet test suite:
> {code:java}
>  - spark_schema:
> | - map_op_op: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_op_req: MAP, OPTIONAL
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - map_req_op: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, OPTIONAL
> | - map_req_req: MAP, REQUIRED
> |   - key_value: REPEATED
> |   | - key: BYTE_ARRAY, UTF8, REQUIRED
> | - value: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_op_op: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
> | - arr_op_req: LIST, OPTIONAL
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, REQUIRED
> | - arr_req_op: LIST, REQUIRED
> |   - list: REPEATED
> | - element: BYTE_ARRAY, UTF8, OPTIONAL
>   - arr_req_req: LIST, REQUIRED
> - list: REPEATED
>   - element: BYTE_ARRAY, UTF8, REQUIRED
> {code}
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)





[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045913#comment-17045913
 ] 

Brian Hulette commented on ARROW-3247:
--------------------------------------

I think the Arrow parquet reader just doesn't handle maps of any sort. There 
don't seem to be any tests for it. It looks like we will run into a 
list-of-structs and crash here: 
[https://github.com/apache/arrow/blob/b557587f4f7c8b547fea45dc98b9182f3f5e9bf7/cpp/src/parquet/arrow/reader.cc#L717-L722]






[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045908#comment-17045908
 ] 

Brian Hulette commented on ARROW-3247:
--------------------------------------

[~mdurant] are these types different from the spec defined at 
[https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps]?
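For reference, the canonical MAP layout from the parquet-format spec looks like this (shape paraphrased from LogicalTypes.md; the angle-bracket placeholders stand for the concrete repetition and types):

```
<map-repetition> group <name> (MAP) {
  repeated group key_value {
    required <key-type> key;
    <value-repetition> <value-type> value;
  }
}
```

The Spark-generated schema in this issue appears to follow the same three-level shape (MAP group, repeated key_value, required key).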






[jira] [Commented] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045896#comment-17045896
 ] 

Brian Hulette commented on ARROW-3247:
--------------------------------------

I added a code block around the spec so we don't need to switch to text mode.






[jira] [Updated] (ARROW-3247) [Python] Support spark parquet array and map types

2020-02-26 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-3247:
----------------------------------
Description: 
As far as I understand, there is already some support for nested 
array/dict/structs in Arrow. However, Spark Map and List types are structured 
one level deeper (I believe to allow for both NULL and empty entries). 
Surprisingly, fastparquet can load these. I do not know the plan for arbitrary 
nested object support, but it should be made clear.

Schema of spark-generated file from the fastparquet test suite:
{code:java}
 - spark_schema:
| - map_op_op: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_op_req: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - map_req_op: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_req_req: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - arr_op_op: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
| - arr_op_req: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, REQUIRED
| - arr_req_op: LIST, REQUIRED
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
  - arr_req_req: LIST, REQUIRED
- list: REPEATED
  - element: BYTE_ARRAY, UTF8, REQUIRED
{code}
(please forgive that some of this has already been mentioned elsewhere; this is 
one of the entries in the list at 
[https://github.com/dask/fastparquet/issues/374] as a feature that is useful in 
fastparquet)

  was:
As far I understand, there is already some support for nested 
array/dict/structs in arrow. However, spark Map and List types are structured 
one level deeper (I believe to allow for both NULL and empty entries). 
Surprisingly, fastparquet can load these. I do not know the plan for arbitrary 
nested object support, but it should be made clear.

Schema of spark-generated file from the fastparquet test suite (please see in 
text mode):

 - spark_schema:
| - map_op_op: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_op_req: MAP, OPTIONAL
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - map_req_op: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, OPTIONAL
| - map_req_req: MAP, REQUIRED
|   - key_value: REPEATED
|   | - key: BYTE_ARRAY, UTF8, REQUIRED
| - value: BYTE_ARRAY, UTF8, REQUIRED
| - arr_op_op: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
| - arr_op_req: LIST, OPTIONAL
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, REQUIRED
| - arr_req_op: LIST, REQUIRED
|   - list: REPEATED
| - element: BYTE_ARRAY, UTF8, OPTIONAL
  - arr_req_req: LIST, REQUIRED
- list: REPEATED
  - element: BYTE_ARRAY, UTF8, REQUIRED

(please forgive that some of this has already been mentioned elsewhere; this is 
one of the entries in the list at 
https://github.com/dask/fastparquet/issues/374 as a feature that is useful in 
fastparquet)



[jira] [Created] (ARROW-7674) Add helpful message for captcha challenge in merge_arrow_pr.py

2020-01-24 Thread Brian Hulette (Jira)
Brian Hulette created ARROW-7674:


 Summary: Add helpful message for captcha challenge in 
merge_arrow_pr.py
 Key: ARROW-7674
 URL: https://issues.apache.org/jira/browse/ARROW-7674
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Brian Hulette
Assignee: Brian Hulette


After an incorrect password, Jira starts requiring a captcha challenge. When 
this happens with merge_arrow_pr.py it's difficult to distinguish from any 
other failed login attempt. We should print a helpful message when this happens.
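A sketch of what that message could look like (hypothetical helper, not the script's actual code; it assumes the failed login exposes the response headers, where Jira reports a captcha lock-out via the X-Authentication-Denied-Reason header):

```python
def explain_login_failure(headers):
    """Map a failed Jira login to an actionable message (illustrative sketch)."""
    # Jira adds this header once repeated failures trigger a captcha challenge.
    reason = headers.get("X-Authentication-Denied-Reason", "")
    if "CAPTCHA_CHALLENGE" in reason:
        return ("Jira is requiring a captcha after failed logins. "
                "Log in via the web UI to clear the challenge, then "
                "re-run merge_arrow_pr.py.")
    return "Login failed. Check your username and password."

print(explain_login_failure({"X-Authentication-Denied-Reason": "CAPTCHA_CHALLENGE"}))
```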





[jira] [Resolved] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-24 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-7513.
--------------------------------------
Fix Version/s: 0.16.0
   Resolution: Fixed

Issue resolved by pull request 6163
[https://github.com/apache/arrow/pull/6163]

> [JS] Arrow Tutorial: Common data types
> --------------------------------------------------
>
> Key: ARROW-7513
> URL: https://issues.apache.org/jira/browse/ARROW-7513
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Leo Meyerovich
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The JS client lacks basic introductory material around creating the common 
> basic data types, such as turning JS arrays into ints, dicts, etc. There is no 
> equivalent of Python's [https://arrow.apache.org/docs/python/data.html]. 
> This has made it difficult for me to use, and I bet for others too.
>  
> As with previous tutorials, I started sketching on 
> [https://observablehq.com/@lmeyerov/rich-data-types-in-apache-arrow-js-efficient-data-tables-wit].
> When we're happy, it can make sense to export it as HTML or something to the 
> repo, or just link from the main readme.
> I believe the target topics worth covering are:
>  * Common user data types: Ints, Dicts, Struct, Time
>  * Common column types: Data, Vector, Column
>  * Going from individual & arrays & buffers of JS values to Arrow-wrapped 
> forms, and basic inspection of the result
> Not worth going into here is Tables vs. RecordBatches, which is the other 
> tutorial.
>  
> 1. Ideas of what to add/edit/remove?
> 2. And anyone up for helping with discussion of Data vs. Vector, and ingest 
> of Time & Struct?
> 3. ... Should we be encouraging Struct or Map? I saw some PRs changing stuff 
> here.
>  
> cc [~wesm] [~bhulette] [~paul.e.taylor]
>  
>  
>  





[jira] [Commented] (ARROW-7513) [JS] Arrow Tutorial: Common data types

2020-01-08 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010761#comment-17010761
 ] 

Brian Hulette commented on ARROW-7513:
--------------------------------------

Thanks for doing this Leo!
I just have one suggestion after a brief look this morning. I think Data should 
be considered a low-level API (and maybe even a private one?), and we should 
direct users to create Vectors directly with the builders, or with the {{from}} 
static initializers (which defer to the builders).






[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-11 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971970#comment-16971970
 ] 

Brian Hulette commented on ARROW-7109:
--------------------------------------

I think the table creation is the only thing that needs updating. I think I 
made the Observable version of a pull request: I "suggested" my changes: 
https://observablehq.com/compare/5cff242aeff80264@685...c095c1810b0369f1@689

> [JS] Create table from Arrays
> -----------------------------
>
> Key: ARROW-7109
> URL: https://issues.apache.org/jira/browse/ARROW-7109
> Project: Apache Arrow
>  Issue Type: Wish
>Reporter: Sascha Hofmann
>Assignee: Brian Hulette
>Priority: Minor
>
> I am trying to generate an Arrow table from JS arrays and followed the 
> example from 
> [https://observablehq.com/@lmeyerov/manipulating-flat-arrays-arrow-style], 
> but I am struggling to generate different schemas, most importantly how to 
> provide a type different from 'floatingpoint'. Right now, I have:
> {code:java}
> const data = Table.from({
>   schema: {
> fields: [{name: 'a', nullable: false, children: Array(0), 
>  type: {name: 'floatingpoint', precision: 'SINGLE'}}]
> },
>   batches: [{
>count: 10,
>columns: [{ name: 'a', count: 10, VALIDITY: [], 
> DATA: Array.from({ length: 10 }, () => 'a') }]}]  
> })
> {code}
> Which, of course, is nonsense, but I couldn't figure out how to provide the 
> type (I tried type: Utf8 among others). In general, wouldn't it be nice to 
> have a create-Table-from-object function?
> On another note, are there any plans to make the docs a little bit more 
> descriptive? Happy to contribute!





[jira] [Commented] (ARROW-7109) [JS] Create table from Arrays

2019-11-11 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971900#comment-16971900
 ] 

Brian Hulette commented on ARROW-7109:
--------------------------------------

Hey Sascha, sorry about the confusion; that notebook is based on a pretty old 
version of Arrow JS. Since then, [~paul.e.taylor] has added a lot of syntax to 
make it much easier to build tables from JS arrays. There's an example in the 
[README|https://github.com/apache/arrow/tree/master/js#create-a-table-from-javascript-arrays].
I also suggested a change to [~lmeyerov]'s notebook to use the new syntax: 
https://observablehq.com/@theneuralbit/manipulating-flat-arrays-arrow-style

Does that help?






[jira] [Assigned] (ARROW-7109) [JS] Create table from Arrays

2019-11-11 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-7109:


Assignee: Brian Hulette






[jira] [Commented] (ARROW-6908) Add support for Bazel

2019-10-16 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953119#comment-16953119
 ] 

Brian Hulette commented on ARROW-6908:
--------------------------------------

Could https://github.com/bazelbuild/rules_foreign_cc#building-cmake-projects 
work?
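If that route works, a consumer's build file might look roughly like this (untested sketch; the cmake rule and out_static_libs attribute come from rules_foreign_cc, while the @arrow_src repository name and its filegroup are assumptions):

```
# Illustrative only: building Arrow's C++ library with rules_foreign_cc.
load("@rules_foreign_cc//foreign_cc:defs.bzl", "cmake")

cmake(
    name = "arrow",
    lib_source = "@arrow_src//:all_srcs",  # hypothetical filegroup over the Arrow sources
    out_static_libs = ["libarrow.a"],      # assumed artifact name
)
```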

> Add support for Bazel
> ---------------------
>
> Key: ARROW-6908
> URL: https://issues.apache.org/jira/browse/ARROW-6908
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Aryan Naraghi
>Priority: Major
>
> I would like to use Arrow in a C++ project that uses Bazel.
>  
> Would it be possible to add support for building Arrow using Bazel?





[jira] [Commented] (ARROW-6370) [JS] Table.from adds 0 on int columns

2019-08-28 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917857#comment-16917857
 ] 

Brian Hulette commented on ARROW-6370:
--------------------------------------

Yeah, I suspect converting to int32 will solve your problem. But this is still 
a bug, so I'll see if I can reproduce it :)
What version of Node are you using?

> [JS] Table.from adds 0 on int columns
> -------------------------------------
>
> Key: ARROW-6370
> URL: https://issues.apache.org/jira/browse/ARROW-6370
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.14.1
>Reporter: Sascha Hofmann
>Priority: Major
>
> I am generating an arrow table in pyarrow and sending it via gRPC like this:
> {code:java}
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> writer.close()
> yield ds.Response(
> status=200,
> loading=False,
> response=[sink.getvalue().to_pybytes()]   
> )
> {code}
> On the JavaScript end, I parse it like this:
> {code:java}
>  Table.from(response.getResponseList()[0])
> {code}
> That works, but when I look at the actual table, int columns have a 0 for 
> every other row. String columns seem to be parsed just fine. 
> The Python byte array created from to_pybytes() has the same length as 
> received in JavaScript. I am also able to recreate the original table from the 
> byte array in Python.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (ARROW-6370) [JS] Table.from adds 0 on int columns

2019-08-28 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917848#comment-16917848
 ] 

Brian Hulette commented on ARROW-6370:
--------------------------------------

What is the type of the int column, int64? int64s behave a little strangely in 
JS. On a platform with BigInt, calls to Int64Array.get _should_ return an 
instance of it; otherwise they will return a two-element slice of an 
Int32Array holding the high and low 32-bit words.
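For illustration, the low/high 32-bit split of an int64 can be reproduced with the standard library (the value is just an example):

```python
import struct

# Pack 2**32 + 2 as a little-endian signed 64-bit integer, then view it as
# two 32-bit words, which is what a pre-BigInt JS runtime hands back.
raw = struct.pack("<q", 2**32 + 2)
low, high = struct.unpack("<ii", raw)
print(low, high)  # 2 1

# Reassembling the original value from the two words:
value = (high << 32) | (low & 0xFFFFFFFF)
print(value)  # 4294967298
```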

Could you provide a little more detail on how you're generating the record 
batches? and maybe how you're observing the ints?






[jira] [Commented] (ARROW-6317) [JS] Implement changes to ensure flatbuffer alignment

2019-08-23 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914459#comment-16914459
 ] 

Brian Hulette commented on ARROW-6317:
--------------------------------------

Oh! So he did, thanks for letting me know. I'll assign this to [~paul.e.taylor] 
for now, then. I'm happy to pick this up if you don't have time though.


> [JS] Implement changes to ensure flatbuffer alignment
> -----------------------------------------------------
>
> Key: ARROW-6317
> URL: https://issues.apache.org/jira/browse/ARROW-6317
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Micah Kornfield
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 0.15.0
>
>
> See description in parent bug on requirements.
> [~bhulette] or [~paul.e.taylor], do you think one of you would be able to pick 
> this up for 0.15.0?





[jira] [Assigned] (ARROW-6317) [JS] Implement changes to ensure flatbuffer alignment

2019-08-23 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-6317:


Assignee: Paul Taylor  (was: Brian Hulette)

> [JS] Implement changes to ensure flatbuffer alignment
> -
>
> Key: ARROW-6317
> URL: https://issues.apache.org/jira/browse/ARROW-6317
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Micah Kornfield
>Assignee: Paul Taylor
>Priority: Blocker
> Fix For: 0.15.0
>
>
> See description in parent bug on requirements.
> [~bhulette] or [~paul.e.taylor] do you think one of you would be able to pick 
> this up for 0.15.0





[jira] [Assigned] (ARROW-6317) [JS] Implement changes to ensure flatbuffer alignment

2019-08-23 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-6317:


Assignee: Brian Hulette

> [JS] Implement changes to ensure flatbuffer alignment
> -
>
> Key: ARROW-6317
> URL: https://issues.apache.org/jira/browse/ARROW-6317
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Micah Kornfield
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 0.15.0
>
>
> See description in parent bug on requirements.
> [~bhulette] or [~paul.e.taylor] do you think one of you would be able to pick 
> this up for 0.15.0





[jira] [Commented] (ARROW-6317) [JS] Implement changes to ensure flatbuffer alignment

2019-08-23 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914451#comment-16914451
 ] 

Brian Hulette commented on ARROW-6317:
--

I can take a look tonight.

> [JS] Implement changes to ensure flatbuffer alignment
> -
>
> Key: ARROW-6317
> URL: https://issues.apache.org/jira/browse/ARROW-6317
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Micah Kornfield
>Priority: Blocker
> Fix For: 0.15.0
>
>
> See description in parent bug on requirements.
> [~bhulette] or [~paul.e.taylor] do you think one of you would be able to pick 
> this up for 0.15.0





[jira] [Commented] (ARROW-6282) Support lossy compression

2019-08-17 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909851#comment-16909851
 ] 

Brian Hulette commented on ARROW-6282:
--

Great idea! I think right now we only support compressing entire record 
batches; to make this work we would need buffer-level compression so that we 
could compress just the floating-point buffers. [~emkornfi...@gmail.com] did write up 
a proposal that included buffer-level compression, among other things: 
[strawman PR|https://github.com/apache/arrow/pull/4815], [ML 
discussion|https://lists.apache.org/thread.html/a99124e57c14c3c9ef9d98f3c80cfe1dd25496bf3ff7046778add937@%3Cdev.arrow.apache.org%3E]

> Support lossy compression
> -
>
> Key: ARROW-6282
> URL: https://issues.apache.org/jira/browse/ARROW-6282
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Dominik Moritz
>Priority: Major
>
> Arrow dataframes with large columns of integers or floats can be compressed 
> using gzip or brotli. However, in some cases it will be okay to compress the 
> data lossy to achieve even higher compression ratios. The main use case for 
> this is visualization where small inaccuracies matter less. 





[jira] [Commented] (ARROW-5791) pyarrow.csv.read_csv hangs + eats all RAM

2019-06-29 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875623#comment-16875623
 ] 

Brian Hulette commented on ARROW-5791:
--

Thanks for the concise bug report! I haven't had a chance to dig into this very 
far, but I'm sure it's not a coincidence that 32768 == 2^15. 32767 is the max 
of a signed 16-bit integer, so if we're assigning a signed int16 index to each 
column somewhere it would overflow once you get beyond 32768 columns (since one 
column gets index 0).

I'm not sure where exactly that would be happening though. My first inclination 
was that it would be in the element count for the [vector of 
fields|https://github.com/apache/arrow/blob/master/format/Schema.fbs#L321], but 
according to the [flatbuffers 
page|https://google.github.io/flatbuffers/flatbuffers_internals.html] vectors 
are prefixed by a 32-bit element count.
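The wraparound is easy to demonstrate with a 16-bit typed array (a sketch of the suspected effect; the actual overflow, if this theory holds, is in the C++ reader, not JS):

```javascript
// A signed 16-bit slot cannot hold a column index of 32768: it wraps.
const indices = new Int16Array(3);
indices[0] = 32767; // largest value a signed int16 can hold
indices[1] = 32768; // wraps around to -32768
indices[2] = 32769; // wraps around to -32767
console.log(Array.from(indices)); // [ 32767, -32768, -32767 ]
```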

> pyarrow.csv.read_csv hangs + eats all RAM
> -
>
> Key: ARROW-5791
> URL: https://issues.apache.org/jira/browse/ARROW-5791
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Ubuntu Xenial, python 2.7
>Reporter: Bogdan Klichuk
>Priority: Major
> Attachments: csvtest.py, graph.svg, sample_32768_cols.csv, 
> sample_32769_cols.csv
>
>
> I have quite a sparse dataset in CSV format. A wide table that has several 
> rows but many (32k) columns. Total size ~540K.
> When I read the dataset using `pyarrow.csv.read_csv` it hangs, gradually eats 
> all memory and gets killed.
> More details on the conditions further. Script to run and all mentioned files 
> are under attachments.
> 1) `sample_32769_cols.csv` is the dataset that suffers the problem.
> 2) `sample_32768_cols.csv` is the dataset that DOES NOT suffer and is read in 
> under 400ms on my machine. It's the same dataset without ONE last column. 
> That last column is no different than others and has empty values.
> Why exactly this one column makes the difference between proper execution 
> and a hanging failure that looks like a memory leak, I have no idea.
> I have created flame graph for the case (1) to support this issue resolution 
> (`graph.svg`).
>  





[jira] [Commented] (ARROW-5741) [JS] Make numeric vector from functions consistent with TypedArray.from

2019-06-26 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873413#comment-16873413
 ] 

Brian Hulette commented on ARROW-5741:
--

[~paul.e.taylor] are you going to take this? Do you think it can be done for 
0.14?

> [JS] Make numeric vector from functions consistent with TypedArray.from
> ---
>
> Key: ARROW-5741
> URL: https://issues.apache.org/jira/browse/ARROW-5741
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>
> Described in 
> https://lists.apache.org/thread.html/b648a781cba7f10d5a6072ff2e7dab6c03e2d1f12e359d9261891486@%3Cdev.arrow.apache.org%3E





[jira] [Created] (ARROW-5741) [JS] Make numeric vector from functions consistent with TypedArray.from

2019-06-26 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5741:


 Summary: [JS] Make numeric vector from functions consistent with 
TypedArray.from
 Key: ARROW-5741
 URL: https://issues.apache.org/jira/browse/ARROW-5741
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Brian Hulette


Described in 
https://lists.apache.org/thread.html/b648a781cba7f10d5a6072ff2e7dab6c03e2d1f12e359d9261891486@%3Cdev.arrow.apache.org%3E





[jira] [Updated] (ARROW-5356) [JS] Implement Duration type, integration test support for Interval and Duration types

2019-06-26 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-5356:
-
Fix Version/s: 0.14.0

> [JS] Implement Duration type, integration test support for Interval and 
> Duration types
> --
>
> Key: ARROW-5356
> URL: https://issues.apache.org/jira/browse/ARROW-5356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 0.14.0
>
>
> Follow on work to ARROW-835





[jira] [Created] (ARROW-5740) [JS] Add ability to run tests in headless browsers

2019-06-26 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5740:


 Summary: [JS] Add ability to run tests in headless browsers
 Key: ARROW-5740
 URL: https://issues.apache.org/jira/browse/ARROW-5740
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette


Now that we have a compatibility check that modifies behavior based on the 
features available in the browser, we should really be running our tests in 
various browsers to exercise the different code paths.

For example right now we don't actually run tests on the non-BigNum code.







[jira] [Commented] (ARROW-5714) [JS] Inconsistent behavior in Int64Builder with/without BigNum

2019-06-24 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871698#comment-16871698
 ] 

Brian Hulette commented on ARROW-5714:
--

I'm pretty sure the issue is that we don't account for stride here: 
https://github.com/apache/arrow/blob/master/js/src/builder/buffer.ts#L108
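A rough sketch of what "not accounting for stride" would mean here (hypothetical buffer logic for illustration, not the real builder code):

```javascript
// An Int64 occupies two 32-bit slots (stride 2) in the backing Int32Array,
// but a naive append only advances by one slot per value.
const stride = 2;
const data = new Int32Array(2 * stride); // room for two Int64 values
let length = 0;

// Buggy append: writes the low word at `length` instead of `length * stride`
function buggyAppend(lo) { data[length++] = lo; }

buggyAppend(1);
buggyAppend(2);
// Both low words landed in the FIRST value's two slots, so reading value 0
// as [low, high] yields [1, 2] -- the behavior reported in this issue.
console.log([data[0], data[1]]); // [ 1, 2 ]
```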

> [JS] Inconsistent behavior in Int64Builder with/without BigNum
> --
>
> Key: ARROW-5714
> URL: https://issues.apache.org/jira/browse/ARROW-5714
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 0.14.0
>
>
> When the Int64Builder is used in a context without BigNum, appending two 
> numbers combines them into a single Int64:
> {code}
> > v = Arrow.Builder.new({type: new 
> > Arrow.Int64()}).append(1).append(2).finish().toVector()
> > v.get(0)
> Int32Array [ 1, 2 ]
> {code}
> Whereas the same process with BigNum creates two new Int64s.





[jira] [Updated] (ARROW-5714) [JS] Inconsistent behavior in Int64Builder with/without BigNum

2019-06-24 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-5714:
-
Description: 
When the Int64Builder is used in a context without BigNum, appending two 
numbers combines them into a single Int64:

{code}
> v = Arrow.Builder.new({type: new 
> Arrow.Int64()}).append(1).append(2).finish().toVector()
> v.get(0)
Int32Array [ 1, 2 ]
{code}

Whereas the same process with BigNum creates two new Int64s.

  was:
When the Int64Builder is used in a context without BigNum, appending two 
numbers combines them into a single Int64:

{{
> v = Arrow.Builder.new({type: new 
> Arrow.Int64()}).append(1).append(2).finish().toVector()
> v.get(0)
Int32Array [ 1, 2 ]
}}

Whereas the same process with BigNum creates two new Int64s.


> [JS] Inconsistent behavior in Int64Builder with/without BigNum
> --
>
> Key: ARROW-5714
> URL: https://issues.apache.org/jira/browse/ARROW-5714
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 0.14.0
>
>
> When the Int64Builder is used in a context without BigNum, appending two 
> numbers combines them into a single Int64:
> {code}
> > v = Arrow.Builder.new({type: new 
> > Arrow.Int64()}).append(1).append(2).finish().toVector()
> > v.get(0)
> Int32Array [ 1, 2 ]
> {code}
> Whereas the same process with BigNum creates two new Int64s.





[jira] [Updated] (ARROW-5537) [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder

2019-06-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-5537:
-
Component/s: JavaScript

> [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
> --
>
> Key: ARROW-5537
> URL: https://issues.apache.org/jira/browse/ARROW-5537
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.13.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The new JS DictionaryBuilder and RecordBatchWriter should support 
> building and writing delta dictionary batches to enable creating 
> DictionaryVectors while streaming.





[jira] [Created] (ARROW-5689) [JS] Remove hard-coded Field.nullable

2019-06-21 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5689:


 Summary: [JS] Remove hard-coded Field.nullable
 Key: ARROW-5689
 URL: https://issues.apache.org/jira/browse/ARROW-5689
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette


Context: https://github.com/apache/arrow/pull/4502#discussion_r296390833

This isn't a huge issue since we can just elide validity buffers when null 
count is zero, but sometimes it's desirable to be able to assert a Field is 
_never_ null.





[jira] [Resolved] (ARROW-5537) [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder

2019-06-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5537.
--
Resolution: Fixed

Issue resolved by pull request 4502
[https://github.com/apache/arrow/pull/4502]

> [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
> --
>
> Key: ARROW-5537
> URL: https://issues.apache.org/jira/browse/ARROW-5537
> Project: Apache Arrow
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The new JS DictionaryBuilder and RecordBatchWriter should support 
> building and writing delta dictionary batches to enable creating 
> DictionaryVectors while streaming.





[jira] [Commented] (ARROW-5688) [JS] Add test for EOS in File Format

2019-06-21 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869969#comment-16869969
 ] 

Brian Hulette commented on ARROW-5688:
--

[~jgm-ktg] - here's the JIRA to add a test if you're still working on adding 
one :)

> [JS] Add test for EOS in File Format
> 
>
> Key: ARROW-5688
> URL: https://issues.apache.org/jira/browse/ARROW-5688
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Brian Hulette
>Priority: Major
>
> Either in a unit test, or in the integration tests





[jira] [Created] (ARROW-5688) [JS] Add test for EOS in File Format

2019-06-21 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5688:


 Summary: [JS] Add test for EOS in File Format
 Key: ARROW-5688
 URL: https://issues.apache.org/jira/browse/ARROW-5688
 Project: Apache Arrow
  Issue Type: Task
Reporter: Brian Hulette


Either in a unit test, or in the integration tests





[jira] [Resolved] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-06-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5438.
--
   Resolution: Fixed
 Assignee: John Muehlhausen
Fix Version/s: 0.14.0

> [JS] Utilize stream EOS in File format
> --
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: John Muehlhausen
>Assignee: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Resolved] (ARROW-5115) [JS] Implement the Vector Builders

2019-06-19 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5115.
--
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4476
[https://github.com/apache/arrow/pull/4476]

> [JS] Implement the Vector Builders
> --
>
> Key: ARROW-5115
> URL: https://issues.apache.org/jira/browse/ARROW-5115
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.13.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We should implement the streaming Vector Builders in JS.





[jira] [Commented] (ARROW-2797) [JS] comparison predicates don't work on 64-bit integers

2019-06-19 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867816#comment-16867816
 ] 

Brian Hulette commented on ARROW-2797:
--

Maybe if we only support this when we have BigInt it wouldn't be that big of a 
change?

> [JS] comparison predicates don't work on 64-bit integers
> 
>
> Key: ARROW-2797
> URL: https://issues.apache.org/jira/browse/ARROW-2797
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> The 64-bit integer vector {{get}} function returns a 2-element array, which 
> doesn't compare properly in the comparison predicates. We should special case 
> the comparisons for 64-bit integers and timestamps.
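A sketch of why the two-element result defeats a naive comparison (the BigInt reassembly here is illustrative, not the library's API):

```javascript
// A 64-bit value surfaced as [low, high] 32-bit words cannot be compared
// to a plain number: the typed array coerces to the string "1,0".
const value = new Int32Array([1, 0]); // the 64-bit value 1 as [low, high]
console.log(value == 1); // false

// Reassembling the words with BigInt (where available) compares correctly:
const asBigInt = (BigInt(value[1]) << 32n) | BigInt(value[0] >>> 0);
console.log(asBigInt === 1n); // true
```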





[jira] [Resolved] (ARROW-5491) [C++] Remove unecessary semicolons following MACRO definitions

2019-06-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5491.
--
Resolution: Fixed

> [C++] Remove unecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-5491) [C++] Remove unecessary semicolons following MACRO definitions

2019-06-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-5491:
-
Summary: [C++] Remove unecessary semicolons following MACRO definitions  
(was: Remove unecessary semicolons following MACRO definitions)

> [C++] Remove unecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (ARROW-5491) Remove unecessary semicolons following MACRO definitions

2019-06-03 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5491:


 Summary: Remove unecessary semicolons following MACRO definitions
 Key: ARROW-5491
 URL: https://issues.apache.org/jira/browse/ARROW-5491
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.13.0
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 0.14.0








[jira] [Assigned] (ARROW-5356) [JS] Implement Duration type, integration test support for Interval and Duration types

2019-05-17 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-5356:


Assignee: Brian Hulette

> [JS] Implement Duration type, integration test support for Interval and 
> Duration types
> --
>
> Key: ARROW-5356
> URL: https://issues.apache.org/jira/browse/ARROW-5356
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>
> Follow on work to ARROW-835





[jira] [Resolved] (ARROW-5274) [JavaScript] Wrong array type for countBy

2019-05-15 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5274.
--
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4265
[https://github.com/apache/arrow/pull/4265]

> [JavaScript] Wrong array type for countBy
> -
>
> Key: ARROW-5274
> URL: https://issues.apache.org/jira/browse/ARROW-5274
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Yngve Kristiansen
>Assignee: Yngve Kristiansen
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 5m
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The {{countBy}} function is not returning correct histograms, as it seems to 
> select the wrong array type for the indexing.
> The following line in countBy seems to be causing the problems:
> {{const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / 
> Math.log(256));}}
> For example, if the dictionary length is 3, yet the indices length is 1 
> million, the result of this expression will be 1, which will lead to a 
> Uint8Array being used, again resulting in overflows.
> Codepen example
>  [https://codepen.io/Yngve92/pen/mYdWrr]
> If I switch the expression to: {{const countByteLength = 
> Math.ceil(Math.log(vector.length) / Math.log(256));}} it seems to be working 
> all right, but I am not sure if this is correct.
> The expression is on L63, L189 in src/compute/dataframe.ts.
>  
> PR submitted: [https://github.com/apache/arrow/pull/4265] 
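The suspect expression can be checked with plain arithmetic (a sketch reproducing just the width calculation quoted above, not the dataframe code itself):

```javascript
// Counter byte width derived from the DICTIONARY length (3) vs. the
// VECTOR length (1,000,000), as in the expressions quoted above.
const dictionaryLength = 3;
const vectorLength = 1000000;
const widthFromDict = Math.ceil(Math.log(dictionaryLength) / Math.log(256));
const widthFromVector = Math.ceil(Math.log(vectorLength) / Math.log(256));
console.log(widthFromDict, widthFromVector); // 1 3

// With width 1 a Uint8Array holds the counts, so any bin that reaches 256
// occurrences wraps around:
const counts = new Uint8Array(dictionaryLength);
for (let i = 0; i < 300; i++) counts[0]++;
console.log(counts[0]); // 44, not 300
```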





[jira] [Created] (ARROW-5313) [Format] Comments on Field table are a bit confusing

2019-05-13 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5313:


 Summary: [Format] Comments on Field table are a bit confusing
 Key: ARROW-5313
 URL: https://issues.apache.org/jira/browse/ARROW-5313
 Project: Apache Arrow
  Issue Type: Task
  Components: Format
Affects Versions: 0.13.0
Reporter: Brian Hulette
Assignee: Brian Hulette


Currently Schema.fbs has two different explanations of {{Field.children}}

One says "children is only for nested Arrow arrays" and the other says 
"children apply only to nested data types like Struct, List and Union". I think 
both are technically correct but the latter is much more explicit, we should 
remove the former.





[jira] [Commented] (ARROW-2412) [Integration] Add nested dictionary integration test

2019-04-27 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827780#comment-16827780
 ] 

Brian Hulette commented on ARROW-2412:
--

[~wesmckinn] yes there is a (very outdated) patch: 
https://github.com/apache/arrow/pull/1848

I never merged it for fear of making the tests fail, but I should probably just 
add it with the generate call commented out.

> [Integration] Add nested dictionary integration test
> 
>
> Key: ARROW-2412
> URL: https://issues.apache.org/jira/browse/ARROW-2412
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Integration
>Reporter: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add nested dictionary generator to the integration test. The tests will 
> probably fail at first but can serve as a starting point for developing this 
> capability.





[jira] [Created] (ARROW-4991) [CI] Bump travis node version to 11.12

2019-03-21 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4991:


 Summary: [CI] Bump travis node version to 11.12
 Key: ARROW-4991
 URL: https://issues.apache.org/jira/browse/ARROW-4991
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: JS-0.4.1








[jira] [Resolved] (ARROW-4988) [JS] Bump required node version to 11.12

2019-03-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4988.
--
   Resolution: Fixed
Fix Version/s: JS-0.4.1

Issue resolved by pull request 4006
[https://github.com/apache/arrow/pull/4006]

> [JS] Bump required node version to 11.12
> 
>
> Key: ARROW-4988
> URL: https://issues.apache.org/jira/browse/ARROW-4988
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cause of ARROW-4948 and 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C5ce620e0-0063-4bee-8ad6-a41301ac08c4%40www.fastmail.com%3E
> was actually a regression in node v11.11, resolved in v11.12 see 
> https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md#2019-03-15-version-11120-current-bridgear
>  and https://github.com/nodejs/node/pull/26488
> Bump requirement up to 11.12





[jira] [Updated] (ARROW-4988) [JS] Bump required node version to 11.12

2019-03-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-4988:
-
Summary: [JS] Bump required node version to 11.12  (was: Bump required node 
version to 11.12)

> [JS] Bump required node version to 11.12
> 
>
> Key: ARROW-4988
> URL: https://issues.apache.org/jira/browse/ARROW-4988
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> The cause of ARROW-4948 and 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C5ce620e0-0063-4bee-8ad6-a41301ac08c4%40www.fastmail.com%3E
> was actually a regression in node v11.11, resolved in v11.12 see 
> https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md#2019-03-15-version-11120-current-bridgear
>  and https://github.com/nodejs/node/pull/26488
> Bump requirement up to 11.12





[jira] [Created] (ARROW-4988) Bump required node version to 11.12

2019-03-21 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4988:


 Summary: Bump required node version to 11.12
 Key: ARROW-4988
 URL: https://issues.apache.org/jira/browse/ARROW-4988
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Brian Hulette
Assignee: Brian Hulette


The cause of ARROW-4948 and 
http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3C5ce620e0-0063-4bee-8ad6-a41301ac08c4%40www.fastmail.com%3E

was actually a regression in node v11.11, resolved in v11.12 see 
https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V11.md#2019-03-15-version-11120-current-bridgear
 and https://github.com/nodejs/node/pull/26488

Bump requirement up to 11.12





[jira] [Commented] (ARROW-4738) [JS] NullVector should include a null data buffer

2019-03-01 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782207#comment-16782207
 ] 

Brian Hulette commented on ARROW-4738:
--

Is this the cause of https://issues.apache.org/jira/browse/ARROW-3667?

> [JS] NullVector should include a null data buffer
> -
>
> Key: ARROW-4738
> URL: https://issues.apache.org/jira/browse/ARROW-4738
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Arrow C++ and pyarrow expect NullVectors to include a null data buffer, so 
> ArrowJS should write one into the buffer layout.





[jira] [Resolved] (ARROW-4682) [JS] Writer should be able to write empty tables

2019-02-27 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4682.
--
Resolution: Fixed

Issue resolved by pull request 3759
[https://github.com/apache/arrow/pull/3759]

> [JS] Writer should be able to write empty tables
> 
>
> Key: ARROW-4682
> URL: https://issues.apache.org/jira/browse/ARROW-4682
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The writer should be able to write empty tables.





[jira] [Commented] (ARROW-4695) [JS] Tests timing out on Travis

2019-02-27 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779461#comment-16779461
 ] 

Brian Hulette commented on ARROW-4695:
--

I was able to repro this locally (I think?) on commit 
1d0b3697efee154f72b96df20155eb7e68ce6569

{noformat}
-> % npm run test

> apache-arrow@ test /home/hulettbh/working_dir/arrow/js
> NODE_NO_WARNINGS=1 gulp test

[07:48:18] Using gulpfile ~/working_dir/arrow/js/gulpfile.js
[07:48:18] Starting 'test'...
[07:48:18] Starting 'test:ts'...
[07:48:18] Starting 'test:src'...
[07:48:18] Starting 'test:apache-arrow'...
[07:48:53] Finished 'test:apache-arrow' after 35 s
[07:48:53] Starting 'test:es5:cjs'...
[07:48:54] Finished 'test:ts' after 36 s
[07:48:54] Starting 'test:es2015:cjs'...
[07:49:04] Finished 'test:src' after 46 s
[07:49:04] Starting 'test:esnext:cjs'...
[07:49:42] Finished 'test:es2015:cjs' after 48 s
[07:49:42] Starting 'test:es5:esm'...
[07:50:21] Finished 'test:es5:esm' after 39 s
[07:50:21] Starting 'test:es2015:esm'...
[07:50:58] Finished 'test:es2015:esm' after 37 s
[07:50:58] Starting 'test:esnext:esm'...
[07:51:32] Finished 'test:esnext:esm' after 34 s
[07:51:32] Starting 'test:es5:umd'...
[07:52:08] Finished 'test:es5:umd' after 36 s
[07:52:08] Starting 'test:es2015:umd'...
[07:52:46] Finished 'test:es2015:umd' after 38 s
[07:52:46] Starting 'test:esnext:umd'...
[07:53:24] Finished 'test:esnext:umd' after 38 s
{noformat}

I cancelled the tests at 8:03:25 after getting no additional output. In this 
case it looks like es5:cjs and esnext:cjs never finished.


> [JS] Tests timing out on Travis
> ---
>
> Key: ARROW-4695
> URL: https://issues.apache.org/jira/browse/ARROW-4695
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Brian Hulette
>Priority: Major
>  Labels: travis-ci
>
> Example build: https://travis-ci.org/apache/arrow/jobs/498967250
> JS tests sometimes fail with the following message:
> {noformat}
> > apache-arrow@ test /home/travis/build/apache/arrow/js
> > NODE_NO_WARNINGS=1 gulp test
> [22:14:01] Using gulpfile ~/build/apache/arrow/js/gulpfile.js
> [22:14:01] Starting 'test'...
> [22:14:01] Starting 'test:ts'...
> [22:14:49] Finished 'test:ts' after 47 s
> [22:14:49] Starting 'test:src'...
> [22:15:27] Finished 'test:src' after 38 s
> [22:15:27] Starting 'test:apache-arrow'...
> No output has been received in the last 10m0s, this potentially indicates a 
> stalled build or something wrong with the build itself.
> Check the details on how to adjust your build configuration on: 
> https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
> The build has been terminated
> {noformat}
> I thought maybe we were just running up against some time limit, but that 
> particular build was terminated at 22:25:27, exactly ten minutes after the 
> last output, at 22:15:27. So it does seem like the build is somehow stalling.





[jira] [Created] (ARROW-4695) [JS] Tests timing out on Travis

2019-02-27 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4695:


 Summary: [JS] Tests timing out on Travis
 Key: ARROW-4695
 URL: https://issues.apache.org/jira/browse/ARROW-4695
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Affects Versions: JS-0.4.0
Reporter: Brian Hulette


Example build: https://travis-ci.org/apache/arrow/jobs/498967250

JS tests sometimes fail with the following message:

{noformat}
> apache-arrow@ test /home/travis/build/apache/arrow/js
> NODE_NO_WARNINGS=1 gulp test
[22:14:01] Using gulpfile ~/build/apache/arrow/js/gulpfile.js
[22:14:01] Starting 'test'...
[22:14:01] Starting 'test:ts'...
[22:14:49] Finished 'test:ts' after 47 s
[22:14:49] Starting 'test:src'...
[22:15:27] Finished 'test:src' after 38 s
[22:15:27] Starting 'test:apache-arrow'...
No output has been received in the last 10m0s, this potentially indicates a 
stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: 
https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated
{noformat}

I thought maybe we were just running up against some time limit, but that 
particular build was terminated at 22:25:27, exactly ten minutes after the last 
output, at 22:15:27. So it does seem like the build is somehow stalling.






[jira] [Created] (ARROW-4686) Only accept 'y' or 'n' in merge_arrow_pr.py prompts

2019-02-26 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4686:


 Summary: Only accept 'y' or 'n' in merge_arrow_pr.py prompts
 Key: ARROW-4686
 URL: https://issues.apache.org/jira/browse/ARROW-4686
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Brian Hulette
Assignee: Brian Hulette


The current prompt syntax ("y/n" with neither capitalized) implies there's no 
default, which I think is the right behavior, but it's not implemented that 
way. The script should retry until either 'y' or 'n' is received.





[jira] [Resolved] (ARROW-4674) [JS] Update arrow2csv to new Row API

2019-02-26 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4674.
--
Resolution: Fixed

Issue resolved by pull request 3747
[https://github.com/apache/arrow/pull/3747]

> [JS] Update arrow2csv to new Row API
> 
>
> Key: ARROW-4674
> URL: https://issues.apache.org/jira/browse/ARROW-4674
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The {{arrow2csv}} utility uses {{row.length}} to measure cells, but now that 
> we've made Rows use Symbols for their internal properties, it should 
> enumerate the values with the iterator.
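A minimal plain-JavaScript sketch of the iterator-based measurement (an ordinary array stands in for a Row here, and {{maxCellWidth}} is a hypothetical helper, not part of arrow2csv):

```javascript
// Measure the widest cell by iterating the row rather than indexing
// 0..row.length (Row internals are now keyed off Symbols).
function maxCellWidth(row) {
  let max = 0;
  for (const value of row) {        // rely on the row's iterator
    max = Math.max(max, String(value).length);
  }
  return max;
}

maxCellWidth(['abc', 12345, null]); // → 5 ('12345')
```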





[jira] [Resolved] (ARROW-4652) [JS] RecordBatchReader throughNode should respect autoDestroy

2019-02-22 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4652.
--
Resolution: Fixed

Issue resolved by pull request 3727
[https://github.com/apache/arrow/pull/3727]

> [JS] RecordBatchReader throughNode should respect autoDestroy
> -
>
> Key: ARROW-4652
> URL: https://issues.apache.org/jira/browse/ARROW-4652
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Reader transform stream closes after reading one set of tables even when 
> autoDestroy is false. Instead it should reset/reopen the reader, like 
> {{RecordBatchReader.readAll()}} does.





[jira] [Resolved] (ARROW-4579) [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array

2019-02-22 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4579.
--
Resolution: Fixed

Closed in https://github.com/apache/arrow/pull/3653

> [JS] Add more interop with BigInt/BigInt64Array/BigUint64Array
> --
>
> Key: ARROW-4579
> URL: https://issues.apache.org/jira/browse/ARROW-4579
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> We should use or return the new native [BigInt 
> types|https://developers.google.com/web/updates/2018/05/bigint] whenever 
> they're available.
> * Use the native {{BigInt}} to convert/stringify i64s/u64s
> * Support the {{BigInt}} type in element comparator and {{indexOf()}}
> * Add zero-copy {{toBigInt64Array()}} and {{toBigUint64Array()}} methods to 
> {{Int64Vector}} and {{Uint64Vector}}, respectively
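For context, {{BigInt64Array}} views are zero-copy by construction: they reinterpret an existing buffer rather than converting element by element. A standalone sketch in plain JS (no Arrow types involved):

```javascript
// One 16-byte buffer, two zero-copy views over the same memory.
const buf = new ArrayBuffer(16);
const i64 = new BigInt64Array(buf);   // 2 x 64-bit elements
const u32 = new Uint32Array(buf);     // 4 x 32-bit elements

i64[0] = 123456789012345678n;         // write through the BigInt view

// The 32-bit view sees the same bytes: no copy was made.
// (On a little-endian machine, u32[0] holds the low half, u32[1] the high.)
```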





[jira] [Resolved] (ARROW-4578) [JS] Float16Vector toArray should be zero-copy

2019-02-22 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4578.
--
Resolution: Fixed

Issue resolved by pull request 3653
[https://github.com/apache/arrow/pull/3653]

> [JS] Float16Vector toArray should be zero-copy
> --
>
> Key: ARROW-4578
> URL: https://issues.apache.org/jira/browse/ARROW-4578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The {{Float16Vector#toArray()}} implementation currently transforms each half 
> float into a single float, and returns a Float32Array. All the other 
> {{toArray()}} implementations are zero-copy, and this deviation would break 
> anyone expecting to give two-byte half floats to native APIs like WebGL. We 
> should instead include {{Float16Vector#toFloat32Array()}} and 
> {{Float16Vector#toFloat64Array()}} convenience methods that do rely on 
> copying.
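The copy is unavoidable because widening a half float changes its byte width. A self-contained decoder sketch (a hypothetical helper, not the library's implementation) shows the per-element work a copying {{toFloat32Array()}} would have to do:

```javascript
// Decode one IEEE 754 half-precision (binary16) value stored as a uint16.
function float16ToNumber(h) {
  const sign = (h & 0x8000) ? -1 : 1;
  const exp  = (h >> 10) & 0x1f;
  const frac = h & 0x03ff;
  if (exp === 0x00) return sign * 2 ** -14 * (frac / 1024); // subnormal
  if (exp === 0x1f) return frac ? NaN : sign * Infinity;    // NaN / Infinity
  return sign * 2 ** (exp - 15) * (1 + frac / 1024);        // normal
}

float16ToNumber(0x3c00); // → 1
float16ToNumber(0xc000); // → -2
```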





[jira] [Resolved] (ARROW-4555) [JS] Add high-level Table and Column creation methods

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4555.
--
   Resolution: Fixed
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

Closed in https://github.com/apache/arrow/pull/3634

> [JS] Add high-level Table and Column creation methods
> -
>
> Key: ARROW-4555
> URL: https://issues.apache.org/jira/browse/ARROW-4555
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> It'd be great to have a few high-level functions that implicitly create the 
> Schema, RecordBatches, etc. from a Table and a list of Columns. For example:
> {code:actionscript}
> const table = Table.new(
>   Column.new('foo', ...),
>   Column.new('bar', ...)
> );
> {code}





[jira] [Resolved] (ARROW-4557) [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4557.
--
   Resolution: Fixed
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

Closed in https://github.com/apache/arrow/pull/3634

> [JS] Add Table/Schema/RecordBatch `selectAt(...indices)` method
> ---
>
> Key: ARROW-4557
> URL: https://issues.apache.org/jira/browse/ARROW-4557
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Presently Table, Schema, and RecordBatch have basic {{select(...colNames)}} 
> implementations. Having an easy {{selectAt(...colIndices)}} impl would be a 
> nice complement, especially when there are duplicate column names.
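A toy sketch of the distinction (plain JS; {{fields}}, {{select}}, and {{selectAt}} here are illustrative stand-ins, not the Arrow API):

```javascript
const fields = [{ name: 'id' }, { name: 'value' }, { name: 'value' }];

// select by name: both 'value' columns match, so the result is ambiguous
const select   = (...names)   => fields.filter(f => names.includes(f.name));
// selectAt by position: exactly one field per index, duplicates or not
const selectAt = (...indices) => indices.map(i => fields[i]);

select('value').length; // → 2
selectAt(2).length;     // → 1
```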





[jira] [Commented] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes

2019-02-21 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774245#comment-16774245
 ] 

Brian Hulette commented on ARROW-4554:
--

Closed in https://github.com/apache/arrow/pull/3634

> [JS] Implement logic for combining Vectors with different lengths/chunksizes
> 
>
> Key: ARROW-4554
> URL: https://issues.apache.org/jira/browse/ARROW-4554
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> We should add logic to combine and possibly slice/re-chunk and uniformly 
> partition chunks into separate RecordBatches. This will make it easier to 
> create Tables or RecordBatches from Vectors of different lengths. This is 
> also necessary for {{Table#assign()}}. PR incoming.
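The core of that re-chunking logic is computing a common set of chunk boundaries. A hedged sketch (the {{uniformBoundaries}} helper is hypothetical, not the PR's code):

```javascript
// Merge the cumulative chunk offsets of several columns so every column
// can be sliced at the same boundaries into aligned RecordBatches.
function uniformBoundaries(...chunkLengths) {
  const offsets = new Set();
  for (const lengths of chunkLengths) {
    let off = 0;
    for (const len of lengths) offsets.add(off += len);
  }
  return [...offsets].sort((a, b) => a - b);
}

// Column A chunked [3, 3], column B chunked [2, 4]:
uniformBoundaries([3, 3], [2, 4]); // → [2, 3, 6]
```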





[jira] [Resolved] (ARROW-4553) [JS] Implement Schema/Field/DataType comparators

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4553.
--
Resolution: Fixed

Closed in https://github.com/apache/arrow/pull/3634

> [JS] Implement Schema/Field/DataType comparators
> 
>
> Key: ARROW-4553
> URL: https://issues.apache.org/jira/browse/ARROW-4553
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Some basic type comparison logic is necessary for {{Table#assign()}}. PR 
> incoming.





[jira] [Resolved] (ARROW-4554) [JS] Implement logic for combining Vectors with different lengths/chunksizes

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4554.
--
   Resolution: Fixed
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

> [JS] Implement logic for combining Vectors with different lengths/chunksizes
> 
>
> Key: ARROW-4554
> URL: https://issues.apache.org/jira/browse/ARROW-4554
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> We should add logic to combine and possibly slice/re-chunk and uniformly 
> partition chunks into separate RecordBatches. This will make it easier to 
> create Tables or RecordBatches from Vectors of different lengths. This is 
> also necessary for {{Table#assign()}}. PR incoming.





[jira] [Updated] (ARROW-4553) [JS] Implement Schema/Field/DataType comparators

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-4553:
-
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

> [JS] Implement Schema/Field/DataType comparators
> 
>
> Key: ARROW-4553
> URL: https://issues.apache.org/jira/browse/ARROW-4553
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> Some basic type comparison logic is necessary for {{Table#assign()}}. PR 
> incoming.





[jira] [Updated] (ARROW-2764) [JS] Easy way to create a new Table with an additional column

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2764:
-
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

> [JS] Easy way to create a new Table with an additional column
> -
>
> Key: ARROW-2764
> URL: https://issues.apache.org/jira/browse/ARROW-2764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.1
>
>
> It should be easier to add a new column to a table. The API could be either 
> `table.addColumn(vector)` or `table.merge(...tables or vectors)`





[jira] [Resolved] (ARROW-2764) [JS] Easy way to create a new Table with an additional column

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2764.
--
Resolution: Fixed
  Assignee: Paul Taylor

Closed in https://github.com/apache/arrow/pull/3634

> [JS] Easy way to create a new Table with an additional column
> -
>
> Key: ARROW-2764
> URL: https://issues.apache.org/jira/browse/ARROW-2764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.1
>
>
> It should be easier to add a new column to a table. The API could be either 
> `table.addColumn(vector)` or `table.merge(...tables or vectors)`





[jira] [Resolved] (ARROW-4552) [JS] Table and Schema assign implementations

2019-02-21 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4552.
--
   Resolution: Fixed
Fix Version/s: (was: JS-0.5.0)
   JS-0.4.1

Issue resolved by pull request 3634
[https://github.com/apache/arrow/pull/3634]

> [JS] Table and Schema assign implementations
> 
>
> Key: ARROW-4552
> URL: https://issues.apache.org/jira/browse/ARROW-4552
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: JS-0.4.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.1
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> It'd be really handy to have a basic {{assign}} methods on the Table and 
> Schema. I've extracted and cleaned up some internal helper methods I have 
> that does this. PR incoming.





[jira] [Resolved] (ARROW-4550) [JS] Fix AMD pattern

2019-02-12 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4550.
--
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3630
[https://github.com/apache/arrow/pull/3630]

> [JS] Fix AMD pattern
> 
>
> Key: ARROW-4550
> URL: https://issues.apache.org/jira/browse/ARROW-4550
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4524) [JS] Improve Row proxy generation performance

2019-02-12 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-4524:
-
Summary: [JS] Improve Row proxy generation performance  (was: [JS] Improve 
Row proxy generation improvement)

> [JS] Improve Row proxy generation performance
> -
>
> Key: ARROW-4524
> URL: https://issues.apache.org/jira/browse/ARROW-4524
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> See 
> https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784





[jira] [Commented] (ARROW-4524) [JS] Improve Row proxy generation improvement

2019-02-12 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766751#comment-16766751
 ] 

Brian Hulette commented on ARROW-4524:
--

Fixed in https://github.com/apache/arrow/pull/3601

> [JS] Improve Row proxy generation improvement
> -
>
> Key: ARROW-4524
> URL: https://issues.apache.org/jira/browse/ARROW-4524
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> See 
> https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784





[jira] [Updated] (ARROW-4524) [JS] Improve Row proxy generation improvement

2019-02-12 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-4524:
-
Fix Version/s: (was: 0.4.1)
  Summary: [JS] Improve Row proxy generation improvement  (was: [JS] 
only invoke `Object.defineProperty` once per table)

> [JS] Improve Row proxy generation improvement
> -
>
> Key: ARROW-4524
> URL: https://issues.apache.org/jira/browse/ARROW-4524
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> See 
> https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784





[jira] [Resolved] (ARROW-4523) [JS] Add row proxy generation benchmark

2019-02-12 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-4523.
--
Resolution: Fixed

> [JS] Add row proxy generation benchmark
> ---
>
> Key: ARROW-4523
> URL: https://issues.apache.org/jira/browse/ARROW-4523
> Project: Apache Arrow
>  Issue Type: Test
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-4551) [JS] Investigate using Symbols to access Row columns by index

2019-02-12 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4551:


 Summary: [JS] Investigate using Symbols to access Row columns by 
index
 Key: ARROW-4551
 URL: https://issues.apache.org/jira/browse/ARROW-4551
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette


Can we use row[Symbol.for(0)] instead of row[0] in order to avoid collisions? 
What would the performance impact be?
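A quick plain-JS sketch of why the Symbol key avoids the collision (illustrative only; the performance impact is the open question):

```javascript
const row = {};
row[0] = 'column named "0"';        // numeric index coerces to string key "0"
row[Symbol.for(0)] = 'position 0';  // Symbol key, registered under "0" globally

// The two properties never collide: string keys and Symbol keys live in
// separate namespaces, and Symbol.for(0) always returns the same Symbol.
row[0];             // → 'column named "0"'
row[Symbol.for(0)]; // → 'position 0'
```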





[jira] [Created] (ARROW-4523) [JS] Add row proxy generation benchmark

2019-02-09 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4523:


 Summary: [JS] Add row proxy generation benchmark
 Key: ARROW-4523
 URL: https://issues.apache.org/jira/browse/ARROW-4523
 Project: Apache Arrow
  Issue Type: Test
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Created] (ARROW-4524) [JS] only invoke `Object.defineProperty` once per table

2019-02-09 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4524:


 Summary: [JS] only invoke `Object.defineProperty` once per table
 Key: ARROW-4524
 URL: https://issues.apache.org/jira/browse/ARROW-4524
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 0.4.1


See 
https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784





[jira] [Created] (ARROW-4519) Publish JS API Docs for v0.4.0

2019-02-08 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4519:


 Summary: Publish JS API Docs for v0.4.0
 Key: ARROW-4519
 URL: https://issues.apache.org/jira/browse/ARROW-4519
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Resolved] (ARROW-2323) [JS] Document JavaScript release management

2019-02-05 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2323.
--
Resolution: Fixed

Documented at 
https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-JavaScriptReleases

> [JS] Document JavaScript release management
> ---
>
> Key: ARROW-2323
> URL: https://issues.apache.org/jira/browse/ARROW-2323
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The JavaScript post-vote release management process is not documented. For 
> example, there are certain NPM-related steps required to be able to publish 
> artifacts after the release vote has taken place.





[jira] [Created] (ARROW-3993) [JS] CI Jobs Failing

2018-12-10 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3993:


 Summary: [JS] CI Jobs Failing
 Key: ARROW-3993
 URL: https://issues.apache.org/jira/browse/ARROW-3993
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Affects Versions: JS-0.3.1
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: JS-0.4.0


JS Jobs failing with:
npm ERR! code ETARGET
npm ERR! notarget No matching version found for gulp@next
npm ERR! notarget In most cases you or one of your dependencies are requesting
npm ERR! notarget a package version that doesn't exist.
npm ERR! notarget 
npm ERR! notarget It was specified as a dependency of 'apache-arrow'
npm ERR! notarget 
npm ERR! A complete log of this run can be found in:
npm ERR! /home/travis/.npm/_logs/2018-12-10T22_33_26_272Z-debug.log
The command "$TRAVIS_BUILD_DIR/ci/travis_before_script_js.sh" failed and exited 
with 1 during .

Reported by [~wesmckinn] in 
https://github.com/apache/arrow/pull/3152#issuecomment-446020105





[jira] [Updated] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-1918:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Integration portion of verify-release-candidate.sh fails
> -
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> I'm going to temporarily disable this in my fixes in ARROW-1917



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1918) [JS] Integration portion of verify-release-candidate.sh fails

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-1918:


Assignee: Brian Hulette

> [JS] Integration portion of verify-release-candidate.sh fails
> -
>
> Key: ARROW-1918
> URL: https://issues.apache.org/jira/browse/ARROW-1918
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: 0.8.0
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> I'm going to temporarily disable this in my fixes in ARROW-1917





[jira] [Resolved] (ARROW-2993) [JS] Document minimum supported NodeJS version

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2993.
--
Resolution: Fixed

Issue resolved by pull request 3087
[https://github.com/apache/arrow/pull/3087]

> [JS] Document minimum supported NodeJS version
> --
>
> Key: ARROW-2993
> URL: https://issues.apache.org/jira/browse/ARROW-2993
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and 
> higher. It would be useful to document the minimum supported NodeJS version





[jira] [Assigned] (ARROW-2993) [JS] Document minimum supported NodeJS version

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-2993:


Assignee: Brian Hulette

> [JS] Document minimum supported NodeJS version
> --
>
> Key: ARROW-2993
> URL: https://issues.apache.org/jira/browse/ARROW-2993
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The integration tests fail with NodeJS 8.11.3 LTS, but pass with 10.1 and 
> higher. It would be useful to document the minimum supported NodeJS version





[jira] [Resolved] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package

2018-12-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-3892.
--
Resolution: Fixed

Issue resolved by pull request 3083
[https://github.com/apache/arrow/pull/3083]

> [JS] Remove any dependency on compromised NPM flatmap-stream package
> 
>
> Key: ARROW-3892
> URL: https://issues.apache.org/jira/browse/ARROW-3892
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are erroring out as the result of 
> https://github.com/dominictarr/event-stream/issues/116
> {code}
>  npm ERR! code ENOVERSIONS
>  npm ERR! No valid versions available for flatmap-stream
> {code}





[jira] [Assigned] (ARROW-2323) [JS] Document JavaScript release management

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-2323:


Assignee: Brian Hulette

> [JS] Document JavaScript release management
> ---
>
> Key: ARROW-2323
> URL: https://issues.apache.org/jira/browse/ARROW-2323
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The JavaScript post-vote release management process is not documented. For 
> example, there are certain NPM-related steps required to be able to publish 
> artifacts after the release vote has taken place.





[jira] [Updated] (ARROW-2984) [JS] Refactor release verification script to share code with main source release verification script

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2984:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Refactor release verification script to share code with main source 
> release verification script
> 
>
> Key: ARROW-2984
> URL: https://issues.apache.org/jira/browse/ARROW-2984
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Wes McKinney
>Priority: Major
> Fix For: JS-0.5.0
>
>
> There is some possible code duplication. See discussion in ARROW-2977 
> https://github.com/apache/arrow/pull/2369



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3892) [JS] Remove any dependency on compromised NPM flatmap-stream package

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-3892:


Assignee: Brian Hulette

> [JS] Remove any dependency on compromised NPM flatmap-stream package
> 
>
> Key: ARROW-3892
> URL: https://issues.apache.org/jira/browse/ARROW-3892
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Wes McKinney
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> We are erroring out as the result of 
> https://github.com/dominictarr/event-stream/issues/116
> {code}
>  npm ERR! code ENOVERSIONS
>  npm ERR! No valid versions available for flatmap-stream
> {code}





[jira] [Commented] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2018-12-03 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708149#comment-16708149
 ] 

Brian Hulette commented on ARROW-3667:
--

Makes sense, thanks for the context.
Maybe I'll start a discussion on the mailing list to define how we represent 
the null datatype in JSON.

> [JS] Incorrectly reads record batches with an all null column
> -
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The JS library seems to incorrectly read any columns that come after an 
> all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers, one with an all-null 
> column followed by a utf-8 column, and a second with those two reversed
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd, compress=True):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None],
>                             'not nulls': ['abc', 'def', 'ghi']},
>                       columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers 
> associated with the all-null column, but the JS IPC reader is still looking 
> for them, causing the buffer count to get out of sync.





[jira] [Updated] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-3667:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Incorrectly reads record batches with an all null column
> -
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> The JS library seems to incorrectly read any columns that come after an 
> all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers, one with an all-null 
> column followed by a utf-8 column, and a second with those two reversed
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd, compress=True):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None],
>                             'not nulls': ['abc', 'def', 'ghi']},
>                       columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers 
> associated with the all-null column, but the JS IPC reader is still looking 
> for them, causing the buffer count to get out of sync.





[jira] [Updated] (ARROW-951) [JS] Fix generated API documentation

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-951:

Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Fix generated API documentation
> 
>
> Key: ARROW-951
> URL: https://issues.apache.org/jira/browse/ARROW-951
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Minor
>  Labels: documentation
> Fix For: JS-0.5.0
>
>
> The current generated API documentation doesn't respect the project's 
> namespaces, it simply lists all exported objects. We should see if we can 
> make typedoc display the project's structure (even if it means re-structuring 
> the code a bit), or find another approach for doc generation.





[jira] [Updated] (ARROW-3337) [JS] IPC writer doesn't serialize the dictionary of nested Vectors

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-3337:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] IPC writer doesn't serialize the dictionary of nested Vectors
> --
>
> Key: ARROW-3337
> URL: https://issues.apache.org/jira/browse/ARROW-3337
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.5.0
>
>
> The JS writer only serializes dictionaries for [top-level 
> children|https://github.com/apache/arrow/blob/ee9b1ba426e2f1f117cde8d8f4ba6fbe3be5674c/js/src/ipc/writer/binary.ts#L40]
>  of a Table. This is wrong, and an oversight on my part. The fix here is to 
> put the actual Dictionary vectors in the `schema.dictionaries` map instead of 
> the dictionary fields, like I understand the C++ does.
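
The shape of that fix can be sketched as follows. This is a hypothetical illustration, not the actual Arrow JS writer code; the field layout ({ name, dictionaryId, children }) is assumed for the sketch. The point is that the writer must walk the schema recursively and register every dictionary-encoded field, not only the top-level children.

```javascript
// Hypothetical sketch: collect dictionary-encoded fields from a schema
// recursively, so dictionaries of nested vectors get serialized too.
function collectDictionaries(field, dictionaries = new Map()) {
  if (field.dictionaryId !== undefined) {
    dictionaries.set(field.dictionaryId, field);
  }
  for (const child of field.children || []) {
    collectDictionaries(child, dictionaries); // recurse into nested vectors
  }
  return dictionaries;
}

// A list field whose child is dictionary-encoded: a top-level-only walk
// would miss it; the recursive walk does not.
const schema = {
  name: 'list',
  children: [{ name: 'item', dictionaryId: 0, children: [] }],
};
console.log(collectDictionaries(schema).size); // 1
```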





[jira] [Resolved] (ARROW-2909) [JS] Add convenience function for creating a table from a list of vectors

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2909.
--
Resolution: Fixed

Issue resolved by pull request 2322
[https://github.com/apache/arrow/pull/2322]

> [JS] Add convenience function for creating a table from a list of vectors
> -
>
> Key: ARROW-2909
> URL: https://issues.apache.org/jira/browse/ARROW-2909
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Similar to ARROW-2766, but requires users to first turn their arrays into 
> vectors, so we don't have to deduce type.





[jira] [Updated] (ARROW-2839) [JS] Support whatwg/streams in IPC reader/writer

2018-12-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2839:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Support whatwg/streams in IPC reader/writer
> 
>
> Key: ARROW-2839
> URL: https://issues.apache.org/jira/browse/ARROW-2839
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.5.0
>
>
> We should make it easy to stream Arrow in the browser via 
> [whatwg/streams|https://github.com/whatwg/streams]. I already have this 
> working at Graphistry, but I had to use some of the IPC internal methods. 
> Creating this issue to track back-porting that work and the few minor 
> refactors to the IPC internals that we'll need to do.





[jira] [Resolved] (ARROW-3691) [JS] Update dependencies, switch to terser

2018-11-02 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-3691.
--
Resolution: Fixed

Issue resolved by pull request 2611
[https://github.com/apache/arrow/pull/2611]

> [JS] Update dependencies, switch to terser
> --
>
> Key: ARROW-3691
> URL: https://issues.apache.org/jira/browse/ARROW-3691
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Many dependencies are out of date, give them a bump.
> The uglifyjs-webpack-plugin [no longer 
> supports|https://github.com/webpack-contrib/uglifyjs-webpack-plugin/releases/tag/v2.0.0]
>  ES6 minification, switch to terser-webpack-plugin





[jira] [Created] (ARROW-3691) [JS] Update dependencies, switch to terser

2018-11-02 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3691:


 Summary: [JS] Update dependencies, switch to terser
 Key: ARROW-3691
 URL: https://issues.apache.org/jira/browse/ARROW-3691
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette
 Fix For: JS-0.4.0


Many dependencies are out of date, give them a bump.

The uglifyjs-webpack-plugin [no longer 
supports|https://github.com/webpack-contrib/uglifyjs-webpack-plugin/releases/tag/v2.0.0]
 ES6 minification, switch to terser-webpack-plugin
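
The swap described above looks roughly like this. This is a sketch against a webpack 4-era config, not the project's actual webpack setup; option names can differ between plugin versions.

```javascript
// webpack.config.js sketch: uglifyjs-webpack-plugin v2 dropped ES6
// minification, so minify with terser-webpack-plugin instead.
const TerserPlugin = require('terser-webpack-plugin');

module.exports = {
  mode: 'production',
  optimization: {
    minimize: true,
    // Terser understands ES6+ output, unlike uglify-es based minifiers.
    minimizer: [new TerserPlugin()],
  },
};
```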





[jira] [Created] (ARROW-3689) [JS] Upgrade to TS 3.1

2018-11-01 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3689:


 Summary: [JS] Upgrade to TS 3.1
 Key: ARROW-3689
 URL: https://issues.apache.org/jira/browse/ARROW-3689
 Project: Apache Arrow
  Issue Type: Task
  Components: JavaScript
Reporter: Brian Hulette
 Fix For: JS-0.5.0


Attempted 
[here|https://github.com/apache/arrow/pull/2611#issuecomment-431318129], but 
ran into issues.

Should upgrade typedoc to 0.13 at the same time.





[jira] [Commented] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2018-11-01 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671808#comment-16671808
 ] 

Brian Hulette commented on ARROW-3667:
--

My branch that adds a null column case to the integration test is at 
https://github.com/TheNeuralBit/arrow/tree/all_null_column

> [JS] Incorrectly reads record batches with an all null column
> -
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The JS library seems to incorrectly read any columns that come after an 
> all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers, one with an all-null 
> column followed by a utf-8 column, and a second with those two reversed
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd, compress=True):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None],
>                             'not nulls': ['abc', 'def', 'ghi']},
>                       columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers 
> associated with the all-null column, but the JS IPC reader is still looking 
> for them, causing the buffer count to get out of sync.





[jira] [Commented] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2018-11-01 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671795#comment-16671795
 ] 

Brian Hulette commented on ARROW-3667:
--

I'm looking at adding a null column case to the integration tests, but it's not 
clear what the JSON format should look like for a null type column.

I tried generating a json file using the C++ implementation to use as a guide, 
but it turns out C++ actually fails to read the JSON it generates based on 
{{bad.arrow}}
{code}
-> % ./cpp/build/debug/json-integration-test --integration --mode ARROW_TO_JSON 
--arrow /tmp/bad.arrow --json /tmp/bad.json
Found schema:
nulls: null
not nulls: string
__index_level_0__: int64
-- metadata --
pandas: {"index_columns": ["__index_level_0__"], "column_indexes": [{"name": 
null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", 
"metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "nulls", 
"field_name": "nulls", "pandas_type": "empty", "numpy_type": "object", 
"metadata": null}, {"name": "not nulls", "field_name": "not nulls", 
"pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": 
null, "field_name": "__index_level_0__", "pandas_type": "int64", "numpy_type": 
"int64", "metadata": null}], "pandas_version": "0.23.4"}
-> % ./cpp/build/debug/json-integration-test --integration --mode JSON_TO_ARROW 
--arrow /tmp/bad.arrow --json /tmp/bad.json
Found schema:
nulls: null
not nulls: string
__index_level_0__: int64
Error message: Invalid: field VALIDITY not found
{code}

Could someone familiar with the C++ implementation weigh in here? cc 
[~wesmckinn] [~pitrou]
Here's what {{/tmp/bad.json}} looks like:

{code:json}
{
  "schema": {
    "fields": [
      {
        "name": "nulls",
        "nullable": true,
        "type": {
          "name": "null"
        },
        "children": []
      },
      {
        "name": "not nulls",
        "nullable": true,
        "type": {
          "name": "utf8"
        },
        "children": []
      },
      {
        "name": "__index_level_0__",
        "nullable": true,
        "type": {
          "name": "int",
          "bitWidth": 64,
          "isSigned": true
        },
        "children": []
      }
    ]
  },
  "batches": [
    {
      "count": 3,
      "columns": [
        {
          "name": "nulls",
          "count": 3,
          "children": []
        },
        {
          "name": "not nulls",
          "count": 3,
          "VALIDITY": [1, 1, 1],
          "OFFSET": [0, 3, 6, 9],
          "DATA": ["abc", "def", "ghi"],
          "children": []
        },
        {
          "name": "__index_level_0__",
          "count": 3,
          "VALIDITY": [1, 1, 1],
          "DATA": [0, 1, 2],
          "children": []
        }
      ]
    }
  ]
}
{code}



> [JS] Incorrectly reads record batches with an all null column
> -
>
> Key: ARROW-3667
> URL: https://issues.apache.org/jira/browse/ARROW-3667
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> The JS library seems to incorrectly read any columns that come after an 
> all-null column in IPC buffers produced by pyarrow.
> Here's a python script that generates two arrow buffers, one with an all-null 
> column followed by a utf-8 column, and a second with those two reversed
> {code:python}
> import pyarrow as pa
> import pandas as pd
>
> def serialize_to_arrow(df, fd, compress=True):
>     batch = pa.RecordBatch.from_pandas(df)
>     writer = pa.RecordBatchFileWriter(fd, batch.schema)
>     writer.write_batch(batch)
>     writer.close()
>
> if __name__ == "__main__":
>     df = pd.DataFrame(data={'nulls': [None, None, None],
>                             'not nulls': ['abc', 'def', 'ghi']},
>                       columns=['nulls', 'not nulls'])
>     with open('bad.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
>     df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
>     with open('good.arrow', 'wb') as fd:
>         serialize_to_arrow(df, fd)
> {code}
> JS incorrectly interprets the [null, not null] case:
> {code:javascript}
> > var arrow = require('apache-arrow')
> undefined
> > var fs = require('fs')
> undefined
> > arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
> 'abc'
> > arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
> '\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
> {code}
> Presumably this is because pyarrow is omitting some (or all) of the buffers 
> associated with the all-null column, but the JS IPC reader is still looking 
> for them, causing the buffer count to get out of sync.

[jira] [Created] (ARROW-3667) [JS] Incorrectly reads record batches with an all null column

2018-10-31 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3667:


 Summary: [JS] Incorrectly reads record batches with an all null 
column
 Key: ARROW-3667
 URL: https://issues.apache.org/jira/browse/ARROW-3667
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: JS-0.3.1
Reporter: Brian Hulette
 Fix For: JS-0.4.0


The JS library seems to incorrectly read any columns that come after an 
all-null column in IPC buffers produced by pyarrow.

Here's a python script that generates two arrow buffers, one with an all-null 
column followed by a utf-8 column, and a second with those two reversed

{code:python}
import pyarrow as pa
import pandas as pd

def serialize_to_arrow(df, fd, compress=True):
    batch = pa.RecordBatch.from_pandas(df)
    writer = pa.RecordBatchFileWriter(fd, batch.schema)

    writer.write_batch(batch)
    writer.close()

if __name__ == "__main__":
    df = pd.DataFrame(data={'nulls': [None, None, None],
                            'not nulls': ['abc', 'def', 'ghi']},
                      columns=['nulls', 'not nulls'])
    with open('bad.arrow', 'wb') as fd:
        serialize_to_arrow(df, fd)
    df = pd.DataFrame(df, columns=['not nulls', 'nulls'])
    with open('good.arrow', 'wb') as fd:
        serialize_to_arrow(df, fd)
{code}

JS incorrectly interprets the [null, not null] case:

{code:javascript}
> var arrow = require('apache-arrow')
undefined
> var fs = require('fs')
undefined
> arrow.Table.from(fs.readFileSync('good.arrow')).getColumn('not nulls').get(0)
'abc'
> arrow.Table.from(fs.readFileSync('bad.arrow')).getColumn('not nulls').get(0)
'\u0000\u0000\u0000\u0000\u0000\u0003\u0000\u0000\u0000\u0000\u0006\u0000\u0000\u0000\t\u0000\u0000\u0000'
{code}

Presumably this is because pyarrow is omitting some (or all) of the buffers 
associated with the all-null column, but the JS IPC reader is still looking for 
them, causing the buffer count to get out of sync.
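
The garbage string above is consistent with exactly that desynchronization: reading the utf-8 column's int32 offsets buffer ([0, 3, 6, 9] for 'abc', 'def', 'ghi') as character data reproduces it. A hypothetical illustration, not the actual reader code:

```javascript
// Illustration: treat a Utf8 column's little-endian int32 offsets buffer
// [0, 3, 6, 9] as if it were character data, as a reader whose buffer
// index has drifted by one would.
const offsets = new Int32Array([0, 3, 6, 9]);  // 16 bytes total
const bytes = new Uint8Array(offsets.buffer);
const misread = String.fromCharCode(...bytes); // one char per byte
console.log(misread.length);                   // 16
console.log(misread.charCodeAt(4));            // 3: the offset of 'def'
```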






[jira] [Created] (ARROW-3523) [JS] Assign dictionary IDs in IPC writer rather than on creation

2018-10-15 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3523:


 Summary: [JS] Assign dictionary IDs in IPC writer rather than on 
creation
 Key: ARROW-3523
 URL: https://issues.apache.org/jira/browse/ARROW-3523
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Brian Hulette
 Fix For: JS-0.5.0


 Currently the JS implementation relies on the user assigning IDs for 
dictionaries that they create; we should do something like the C++ 
implementation, which uses a dictionary id memo to assign and retrieve 
dictionary ids in the IPC writer 
(https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata-internal.cc#L495).
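
The memo idea can be sketched like this. Names here are illustrative; the real C++ DictionaryMemo differs. The writer asks the memo for an id the first time it sees a dictionary vector and gets the same id back afterwards, so users never assign ids themselves.

```javascript
// Sketch of a dictionary-id memo for an IPC writer: assign an id on first
// sight of a dictionary vector, return the same id on later lookups.
class DictionaryMemo {
  constructor() {
    this.ids = new Map(); // dictionary vector -> assigned id
    this.nextId = 0;
  }
  getOrAssignId(dictionary) {
    if (!this.ids.has(dictionary)) {
      this.ids.set(dictionary, this.nextId++);
    }
    return this.ids.get(dictionary);
  }
}

const memo = new DictionaryMemo();
const colors = ['red', 'green', 'blue'];
console.log(memo.getOrAssignId(colors));     // 0: first sight, new id
console.log(memo.getOrAssignId(colors));     // 0: same vector, same id
console.log(memo.getOrAssignId(['a', 'b'])); // 1: different vector, next id
```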





[jira] [Closed] (ARROW-3304) JS stream reader should yield all messages

2018-10-15 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed ARROW-3304.

Resolution: Fixed

> JS stream reader should yield all messages
> --
>
> Key: ARROW-3304
> URL: https://issues.apache.org/jira/browse/ARROW-3304
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The JS stream reader should yield all parsed messages from the source stream 
> so an external consumer of the iterator can read multiple tables from one 
> combined source stream.
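
The consumer-side shape this enables is roughly the following sketch; message types and end-of-stream handling are simplified (null stands in for the real EOS message).

```javascript
// Sketch: a generator that yields every message group, letting the caller
// assemble multiple tables from one concatenated source stream instead of
// stopping at the first end-of-stream marker.
function* tables(messages) {
  let current = [];
  for (const message of messages) {
    if (message === null) {        // end of one logical stream
      yield current;
      current = [];
    } else {
      current.push(message);
    }
  }
  if (current.length > 0) yield current; // trailing table without EOS
}

// Two tables concatenated into one source stream:
const stream = ['schemaA', 'batchA1', null, 'schemaB', 'batchB1', 'batchB2', null];
console.log([...tables(stream)].length); // 2
```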





[jira] [Created] (ARROW-3425) [JS] Programmatically created dictionary vectors don't get dictionary IDs

2018-10-03 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-3425:


 Summary: [JS] Programmatically created dictionary vectors don't 
get dictionary IDs
 Key: ARROW-3425
 URL: https://issues.apache.org/jira/browse/ARROW-3425
 Project: Apache Arrow
  Issue Type: Bug
  Components: JavaScript
Reporter: Brian Hulette
 Fix For: JS-0.4.0


This seems to be the cause of the test failures in 
https://github.com/apache/arrow/pull/2322

Modifying {{getSingleRecordBatchTable}} to [generate its vectors 
programmatically|https://github.com/apache/arrow/pull/2322/files#diff-eb6e5955a00e92f7bebb15a03f8437d1R359]
 (rather than deserializing hard-coded JSON), causes the new round-trip tests 
added in https://github.com/apache/arrow/pull/2638 to fail. The root cause 
seems to be that an ID is never allocated for the generated dictionary.





[jira] [Resolved] (ARROW-3074) [JS] Date.indexOf generates an error

2018-09-27 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-3074.
--
Resolution: Fixed

> [JS] Date.indexOf generates an error
> 
>
> Key: ARROW-3074
> URL: https://issues.apache.org/jira/browse/ARROW-3074
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.4.0
>
>
> https://github.com/apache/arrow/blob/master/js/src/vector/flat.ts#L150
> {{every}} doesn't exist on {{Date}}
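
A hedged sketch of the shape of a fix, not the actual vector code: compare Dates by their epoch-millisecond value rather than calling array methods on a Date.

```javascript
// Sketch: indexOf over an array of Dates. Date has no every/indexOf
// helpers, so compare by valueOf() (epoch milliseconds) instead.
function indexOfDate(dates, search) {
  const target = search.valueOf();
  return dates.findIndex((d) => d.valueOf() === target);
}

const dates = [new Date(0), new Date(1000), new Date(2000)];
console.log(indexOfDate(dates, new Date(1000))); // 1
console.log(indexOfDate(dates, new Date(9999))); // -1
```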




