[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259723#comment-16259723
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Expand JavaScript 
implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345803902
 
 
   Ah cool, I missed that. I think we are good then, so I suggest we cut a JS 
release ASAP to make sure we've got the process down; then we can release 
again after 0.8.0 final goes out. I'm available this week to help out with this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Paul Taylor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with the Java and C++ implementations:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}
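The reported failure has the classic shape of a metadata/loader mismatch: the reader's metadata points at a buffer slot that was never populated, and the next property access throws. A minimal, self-contained sketch of that failure mode (hypothetical code, not the actual Arrow JS reader):

```javascript
// Hypothetical sketch of the failure mode, NOT the actual Arrow JS reader:
// a metadata-driven reader indexes into an array of loaded buffers. If
// dictionary batches shift the expected layout, a lookup can come back
// undefined, and reading `.buffer` off that slot throws the TypeError
// reported in this issue.
function loadVector(fieldNode, buffers) {
  const validity = buffers[fieldNode.validityIndex];
  const data = buffers[fieldNode.dataIndex]; // may be undefined
  return { validity: validity.buffer, data: data.buffer };
}

const buffers = [{ buffer: new ArrayBuffer(8) }]; // only one slot populated
let message = null;
try {
  loadVector({ validityIndex: 0, dataIndex: 1 }, buffers);
} catch (e) {
  message = `${e.constructor.name}: ${e.message}`;
}
console.log(message);
```

The exact message text varies across JS engines, but the error class and the offending property are stable.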



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259600#comment-16259600
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Expand JavaScript 
implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345783422
 
 
   @wesm I added 
https://github.com/apache/arrow/commit/48111290c2c8169ccefcf04c92afa684f0e8d56d 
to support reading <= 0.7.1 buffers. I tested on the previous arrow files in 
the tests, plus a few I generated in pyarrow locally. Is there anything else we 
need to do on that?
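Reading pre-0.8 files generally means branching on the file's metadata version before interpreting buffer layouts. A hypothetical sketch of that kind of shim (the names and the alignment rule here are invented for illustration, not Arrow's actual API):

```javascript
// Hypothetical version-dispatch sketch (invented names, not Arrow's API):
// older writers recorded an explicit offset per buffer, so a reader that
// supports both eras picks a decoding strategy from the metadata version.
function decodeBuffers(version, descriptors, bytes) {
  if (version <= 3) {
    // legacy (<= 0.7.1-era) path: trust the offsets written in the metadata
    return descriptors.map((d) => bytes.subarray(d.offset, d.offset + d.length));
  }
  // newer path: derive offsets by accumulating 8-byte-aligned lengths
  let offset = 0;
  return descriptors.map((d) => {
    const buf = bytes.subarray(offset, offset + d.length);
    offset += Math.ceil(d.length / 8) * 8;
    return buf;
  });
}

const bytes = Uint8Array.from({ length: 32 }, (_, i) => i);
const legacy = decodeBuffers(3, [{ offset: 8, length: 4 }], bytes);
console.log(Array.from(legacy[0])); // [8, 9, 10, 11]
```

Keeping the two eras in separate branches means later layout changes only touch the modern path.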






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259278#comment-16259278
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Expand JavaScript 
implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345708786
 
 
   Sweet, thanks! So after we get the release scripts set up we can release 
this, but one problem to be aware of is that the library in its current state 
cannot read 0.7.1 binary data, and possibly not 0.8.0 (in its final form) binary 
data either. Hopefully we can get ARROW-1785 sorted out soon.






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258887#comment-16258887
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345610464
 
 
   @wesm alright! now that the integration tests are passing, I added 
backwards-compatibility for Arrow files written before 0.8, re-enabled the 
datetime tests, and removed the generated arrow files from the performance 
tests. Should be good to go pending this last CI build.








[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258799#comment-16258799
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345587677
 
 
   Totally fine to move the integration tests to jdk8






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258790#comment-16258790
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345585644
 
 
   @wesm actually this might be tough -- the JS version of closure-compiler is 
a bit outdated and broken, and the Java version [hasn't supported Java 7 since 
October](https://github.com/google/closure-compiler/issues/2672).
   
   I don't want to skip running the tests on the ES5 UMD bundle, as that's the 
lowest common denominator for anyone wanting to experiment with Arrow in the 
browser, and the integration tests validate that public methods and properties 
don't get minified away.
   
   Is it possible to update the integration job to openjdk8 (like the java 
version)? If not, I can create a sibling `integration-java8` job that includes 
the JS tests.
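Gating the closure-compiler step on the JDK version comes down to parsing Java's two version schemes. A small sketch of that check (hypothetical helper, not the actual CI config):

```javascript
// Hypothetical CI guard, not the actual Travis config: decide whether the
// Java-based closure-compiler step can run, given that it dropped Java 7
// support. Handles the old ("1.7.0_80") and new ("9.0.1") version schemes.
function jdkMajor(versionString) {
  const parts = versionString.split('.');
  // Pre-9 JDKs report "1.<major>.<minor>"; JDK 9+ report "<major>..."
  return Number(parts[0] === '1' ? parts[1] : parts[0]);
}

const canRunClosure = (v) => jdkMajor(v) >= 8;
console.log(canRunClosure('1.7.0_80'));  // false
console.log(canRunClosure('1.8.0_151')); // true
```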








[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258703#comment-16258703
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345567116
 
 
   @wesm yeah, it looks like Closure Compiler threw an exception building the ES5 
UMD target: https://travis-ci.org/apache/arrow/jobs/304499884#L4649. I'm not 
certain, but it could be related to the integration tests running with JDK 7 
instead of 8. I'll switch the job to use the JS version of the Closure Compiler, 
which, while slower, won't be affected by Java externalities.








[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258701#comment-16258701
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345566713
 
 
   There was a non-deterministic Plasma failure in the C++/Python entry, but 
the integration test entry also failed:
   
   ```
   [00:23:46] Starting 'test:es5:umd'...
FAIL  test/table-tests.ts
 ● Test suite failed to run
   Cannot find module 
'/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
 
 at Resolver.resolveModule 
(node_modules/jest-resolve/build/index.js:191:17)
 at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL  test/reader-tests.ts
 ● Test suite failed to run
   Cannot find module 
'/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
 
 at Resolver.resolveModule 
(node_modules/jest-resolve/build/index.js:191:17)
 at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL  test/integration-tests.ts
 ● Test suite failed to run
   Cannot find module 
'/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
 
 at Resolver.resolveModule 
(node_modules/jest-resolve/build/index.js:191:17)
 at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL  test/vector-tests.ts
 ● Test suite failed to run
   Cannot find module 
'/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
 
 at Resolver.resolveModule 
(node_modules/jest-resolve/build/index.js:191:17)
 at Object.<anonymous> (test/Arrow.ts:50:17)
   Test Suites: 4 failed, 4 total
   Tests:   0 total
   Snapshots:   0 total
   Time: 2.409s, estimated 24s
   Ran all test suites.
   [00:23:49] 'test:es5:umd' errored after 2.64 s
   [00:23:49] Error: exited with error code: 1
   at ChildProcess.onexit 
(/home/travis/build/apache/arrow/js/node_modules/end-of-stream/index.js:39:36)
   at ChildProcess.emit (events.js:159:13)
   at Process.ChildProcess._handle.onexit (internal/child_process.js:209:12)
   [00:23:49] 'test' errored after 3.42 min
   npm ERR! Test failed.  See above for more details.
   ```






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258669#comment-16258669
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345557709
 
 
   The PR tool should squash out those commits, so I don't think it's a 
problem. I'll let you know if I run into any issues after the build runs.






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258665#comment-16258665
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345557542
 
 
   @wesm awesome, thanks! After my latest commit, the integration tests all 
pass now for me locally. Are you fine with this PR as-is, or should I close it 
and do one from a new branch w/o the test data commits?




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258642#comment-16258642
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345554173
 
 
   OK, integration tests should pass now, fingers crossed. I will merge once 
the build is green, thanks @trxcllnt and @TheNeuralBit for the patience, it is 
appreciated




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258640#comment-16258640
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345553066
 
 
   I found the problem -- one of the primitive integration test files was being 
clobbered and not run, which was suppressing a failure that should have been 
raised a long time ago.
   
   In the meantime, there was also a regression from the Java refactor: we can 
no longer fully read unsigned integer types. I will hack around this in the 
integration tests for now and open a JIRA about fixing it properly.
   
   Here's an example of trying to read a `uint16` vector:
   
   ```
   16:49:51.051 [main] DEBUG io.netty.util.Recycler - 
-Dio.netty.recycler.ratio: 8
   Error accessing files
   Numeric value (65350) out of range of Java short
at [Source: /tmp/tmpwgopllpl/generated_primitive.json; line: , column: 
18]
   16:49:51.065 [main] ERROR org.apache.arrow.tools.Integration - Error 
accessing files
   com.fasterxml.jackson.core.JsonParseException: Numeric value (65350) out of 
range of Java short
   ```
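   The `Numeric value (65350) out of range of Java short` error follows from the 
type mismatch itself: a `uint16` spans 0..65535, while Java's `short` is signed 
16-bit and only covers -32768..32767, so any value above 32767 overflows the 
parser's accessor. A minimal sketch of the signed/unsigned reinterpretation 
involved, using Python's `struct` rather than the actual Java code path:
   
   ```python
   import struct
   
   def uint16_as_java_short(v: int) -> int:
       """Reinterpret a uint16 bit pattern as a signed Java-style short."""
       (s,) = struct.unpack('<h', struct.pack('<H', v))
       return s
   
   def java_short_as_uint16(s: int) -> int:
       """Recover the original uint16 from the signed reinterpretation."""
       (u,) = struct.unpack('<H', struct.pack('<h', s))
       return u
   
   # 65350 does not fit in a signed short...
   print(uint16_as_java_short(65350))   # -186
   # ...but the bit pattern round-trips losslessly.
   print(java_short_as_uint16(-186))    # 65350
   ```
   
   Values at or below 32767 pass through unchanged, which is why only some of 
the generated `uint16` data triggered the failure.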




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258602#comment-16258602
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345546104
 
 
   Super weird. I am traveling today, so I hope to find some downtime in a 
little while to look at this; before EOD is the goal.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258588#comment-16258588
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

TheNeuralBit commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345543675
 
 
   I'm seeing this same error locally:
   ```
   com.fasterxml.jackson.core.JsonParseException: Numeric value (50261) out of 
range of Java short
   ```
   
   Strangely, `python integration_test.py` runs just fine. I only run into this 
issue when I use the java integration test directly to generate a test file. My 
process:
   
   - ran `generate_primitive_case(..).write('primitive.json')` from a Python 
shell to get a JSON file
   - ran `java -cp 
${ARROW_HOME}/java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar
 org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a primitive.java.arrow -j 
primitive.json`
   
   Not sure what is causing this discrepancy, but it seems like the same thing 
that's affecting @trxcllnt's generator.
   
   EDIT: Note I haven't had any issues generating C++ files yet, I'm only 
seeing the java issue.






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258558#comment-16258558
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345539857
 
 
   See 
   
   ```
   Error message: Invalid: 
/home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-integration-test.cc:89 
code: reader->ReadRecordBatch(i, )
   /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1435 
code: ReadArray(pool, json_columns[i], type, [i])
   /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1287 
code: ParseTypeValues(*type_)
   /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1055 
code: ParseHexValue(hex_data + j * 2, _buffer_data[j])
   Encountered non-hex digit
   Command failed: 
/home/travis/build/apache/arrow/cpp-build/debug/json-integration-test 
--integration --mode=JSON_TO_ARROW 
--json=/home/travis/build/apache/arrow/js/test/data/json/primitive.json 
--arrow=/home/travis/build/apache/arrow/js/test/data/cpp/file/primitive.arrow
   ```
   
   It looks like there is something wrong with the JSON files that have been 
written to that directory. I will take a closer look
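   For context on what the C++ reader is rejecting: the JSON integration files 
encode binary column values as hex strings, and `ParseHexValue` fails on any 
character outside `[0-9a-fA-F]`. As a standalone way to spot malformed values 
when debugging such a file (a sketch, not part of the Arrow tooling), Python's 
`bytes.fromhex` applies a comparable validation:
   
   ```python
   def is_valid_hex_payload(s: str) -> bool:
       """Return True if s decodes as an even-length hex string, i.e. the
       kind of value the C++ ParseHexValue check would accept."""
       try:
           bytes.fromhex(s)
           return True
       except ValueError:
           return False
   
   print(is_valid_hex_payload("0aff"))  # True
   print(is_valid_hex_payload("0axf"))  # False: 'x' is not a hex digit
   print(is_valid_hex_payload("0af"))   # False: odd length, no full bytes
   ```
   
   Running a check like this over the binary columns of the offending JSON file 
should point at which writer emitted the bad value.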




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258511#comment-16258511
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345525031
 
 
   I’ll take a look to see if I can figure it out




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258489#comment-16258489
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345518069
 
 
   @wesm after rebasing on master, I removed the test data and added some lines 
to the integration runner to auto-generate the files and run the snapshot tests. 
I'm getting these errors converting the JSON to Arrow files both locally and on 
Travis: https://travis-ci.org/apache/arrow/jobs/304302317#L4476. It's strange 
that the normal integration tests all run and seem to pass. 




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258227#comment-16258227
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345472682
 
 
   Here's what I'm seeing in the diff in the test directory:
   
   ```
js/test/Arrow.ts |57 +-
js/test/__snapshots__/reader-tests.ts.snap   |   497 -
js/test/__snapshots__/table-tests.ts.snap|  1815 ---
js/test/arrows/cpp/file/datetime.arrow   |   Bin 0 -> 6490 bytes
js/test/arrows/cpp/file/decimal.arrow|   Bin 0 -> 259090 bytes
js/test/arrows/cpp/file/dictionary.arrow |   Bin 0 -> 2562 bytes
js/test/arrows/cpp/file/nested.arrow |   Bin 0 -> 2218 bytes
js/test/arrows/cpp/file/primitive-empty.arrow|   Bin 0 -> 9498 bytes
js/test/arrows/cpp/file/primitive.arrow  |   Bin 0 -> 9442 bytes
js/test/arrows/cpp/file/simple.arrow |   Bin 0 -> 1154 bytes
js/test/arrows/cpp/file/struct_example.arrow |   Bin 0 -> 1538 bytes
js/test/arrows/cpp/stream/datetime.arrow |   Bin 0 -> 5076 bytes
js/test/arrows/cpp/stream/decimal.arrow  |   Bin 0 -> 255228 bytes
js/test/arrows/cpp/stream/dictionary.arrow   |   Bin 0 -> 2004 bytes
js/test/arrows/cpp/stream/nested.arrow   |   Bin 0 -> 1636 bytes
js/test/arrows/cpp/stream/primitive-empty.arrow  |   Bin 0 -> 6852 bytes
js/test/arrows/cpp/stream/primitive.arrow|   Bin 0 -> 7020 bytes
js/test/arrows/cpp/stream/simple.arrow   |   Bin 0 -> 748 bytes
js/test/arrows/cpp/stream/struct_example.arrow   |   Bin 0 -> 1124 bytes
js/test/arrows/file/dictionary.arrow |   Bin 2522 -> 0 bytes
js/test/arrows/file/dictionary2.arrow|   Bin 2762 -> 0 bytes
js/test/arrows/file/multi_dictionary.arrow   |   Bin 3482 -> 0 bytes
js/test/arrows/file/simple.arrow |   Bin 1642 -> 0 bytes
js/test/arrows/file/struct.arrow |   Bin 2354 -> 0 bytes
js/test/arrows/java/file/datetime.arrow  |   Bin 0 -> 6746 bytes
js/test/arrows/java/file/decimal.arrow   |   Bin 0 -> 259730 bytes
js/test/arrows/java/file/dictionary.arrow|   Bin 0 -> 2666 bytes
js/test/arrows/java/file/nested.arrow|   Bin 0 -> 2314 bytes
js/test/arrows/java/file/primitive-empty.arrow   |   Bin 0 -> 9778 bytes
js/test/arrows/java/file/primitive.arrow |   Bin 0 -> 10034 bytes
js/test/arrows/java/file/simple.arrow|   Bin 0 -> 1210 bytes
js/test/arrows/java/file/struct_example.arrow|   Bin 0 -> 1602 bytes
js/test/arrows/java/stream/datetime.arrow|   Bin 0 -> 5196 bytes
js/test/arrows/java/stream/decimal.arrow |   Bin 0 -> 255564 bytes
js/test/arrows/java/stream/dictionary.arrow  |   Bin 0 -> 2036 bytes
js/test/arrows/java/stream/nested.arrow  |   Bin 0 -> 1676 bytes
js/test/arrows/java/stream/primitive-empty.arrow |   Bin 0 -> 6916 bytes
js/test/arrows/java/stream/primitive.arrow   |   Bin 0 -> 7404 bytes
js/test/arrows/java/stream/simple.arrow  |   Bin 0 -> 772 bytes
js/test/arrows/java/stream/struct_example.arrow  |   Bin 0 -> 1148 bytes
js/test/arrows/json/datetime.json|  1091 ++
js/test/arrows/json/decimal.json | 33380 +++
js/test/arrows/json/dictionary.json  |   424 +
js/test/arrows/json/nested.json  |   384 +
js/test/arrows/json/primitive-empty.json |  1099 ++
js/test/arrows/json/primitive.json   |  1788 +++
js/test/arrows/json/simple.json  |66 +
js/test/arrows/json/struct_example.json  |   237 +
js/test/arrows/multi/count/records.arrow |   Bin 224 -> 0 bytes
js/test/arrows/multi/count/schema.arrow  |   Bin 184 -> 0 bytes
js/test/arrows/multi/latlong/records.arrow   |   Bin 352 -> 0 bytes
js/test/arrows/multi/latlong/schema.arrow|   Bin 264 -> 0 bytes
js/test/arrows/multi/origins/records.arrow   |   Bin 224 -> 0 bytes
js/test/arrows/multi/origins/schema.arrow|   Bin 1604 -> 0 bytes
js/test/arrows/stream/dictionary.arrow   |   Bin 1776 -> 0 bytes
js/test/arrows/stream/simple.arrow   |   Bin 1188 -> 0 bytes
js/test/arrows/stream/struct.arrow   |   Bin 1884 -> 0 bytes
js/test/integration-tests.ts |   114 +
js/test/reader-tests.ts  |69 +-
js/test/table-tests.ts   |   175 +-
js/test/test-config.ts  
   ```

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257649#comment-16257649
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345378644
 
 
   This doesn't fail for me locally:
   
   ```
   $ ../cpp/build/debug/json-integration-test --integration 
--json=/tmp/tmp0jga4tt5/generated_primitive.json --arrow=foo.arrow 
--mode=JSON_TO_ARROW
   Found schema: bool_nullable: bool
   bool_nonnullable: bool not null
   int8_nullable: int8
   int8_nonnullable: int8 not null
   int16_nullable: int16
   int16_nonnullable: int16 not null
   int32_nullable: int32
   int32_nonnullable: int32 not null
   int64_nullable: int64
   int64_nonnullable: int64 not null
   uint8_nullable: uint8
   uint8_nonnullable: uint8 not null
   uint16_nullable: uint16
   uint16_nonnullable: uint16 not null
   uint32_nullable: uint32
   uint32_nonnullable: uint32 not null
   uint64_nullable: uint64
   uint64_nonnullable: uint64 not null
   float32_nullable: float
   float32_nonnullable: float not null
   float64_nullable: double
   float64_nonnullable: double not null
   binary_nullable: binary
   binary_nonnullable: binary not null
   utf8_nullable: string
   utf8_nonnullable: string not null
   ```
   




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257644#comment-16257644
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345377333
 
 
   Sorry that I missed that. I will figure out what's going on here:
   
   > Then I had to manually edit the "primitive.json" file to remove the "binary_nullable" and "binary_nonnullable" columns, because the C++ command fails if they're present
   
   ```
   $ ../cpp/build/release/json-integration-test \
 --integration --mode=JSON_TO_ARROW \
 --json=./test/arrows/json/primitive.json \
 --arrow=./test/arrows/cpp/file/primitive.arrow
   Found schema: bool_nullable: bool
   bool_nonnullable: bool not null
   int8_nullable: int8
   int8_nonnullable: int8 not null
   int16_nullable: int16
   int16_nonnullable: int16 not null
   int32_nullable: int32
   int32_nonnullable: int32 not null
   int64_nullable: int64
   int64_nonnullable: int64 not null
   uint8_nullable: uint8
   uint8_nonnullable: uint8 not null
   uint16_nullable: uint16
   uint16_nonnullable: uint16 not null
   uint32_nullable: uint32
   uint32_nonnullable: uint32 not null
   uint64_nullable: uint64
   uint64_nonnullable: uint64 not null
   float32_nullable: float
   float32_nonnullable: float not null
   float64_nullable: double
   float64_nonnullable: double not null
   binary_nullable: binary
   binary_nonnullable: binary not null
   utf8_nullable: string
   utf8_nonnullable: string not null
   Error message: Invalid: Encountered non-hex digit
   ```
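The manual edit described above (dropping the two binary columns so the C++ tool stops hitting the "non-hex digit" error) could also be scripted. A rough sketch, assuming the integration-test JSON layout of `schema.fields` plus a parallel per-batch `columns` list; the function name is made up for illustration:

```python
import json

def drop_binary_columns(src_path, dst_path):
    """Remove the binary columns from an Arrow integration JSON file.

    Assumes the integration format: the schema lives under
    "schema" -> "fields", and each entry in "batches" carries a
    "columns" list in the same order as the schema fields, so both
    must be filtered with the same indices.
    """
    drop = {"binary_nullable", "binary_nonnullable"}
    with open(src_path) as f:
        doc = json.load(f)
    keep = [i for i, fld in enumerate(doc["schema"]["fields"])
            if fld["name"] not in drop]
    doc["schema"]["fields"] = [doc["schema"]["fields"][i] for i in keep]
    for batch in doc.get("batches", []):
        batch["columns"] = [batch["columns"][i] for i in keep]
    with open(dst_path, "w") as f:
        json.dump(doc, f)
```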




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257587#comment-16257587
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345368042
 
 
   Sorry I have been dragging my feet because I’m not really on board with 
checking in data files that can be generated as part of CI. Per Slack 
conversation, it seems there are some roadblocks, so I'm available as needed today and tomorrow to get this sorted out.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257568#comment-16257568
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345364755
 
 
   @wesm I understand you may be busy, so do you mind if I go ahead and merge 
this?




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256022#comment-16256022
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345075467
 
 
   @wesm nope, only thing left to do is the ASF release scripts I think




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256019#comment-16256019
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345074775
 
 
   Does anything still need to be done on this branch? If we are still not close to being able to cut a JS release by early next week, I will rearrange my priorities to help out.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252920#comment-16252920
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344481293
 
 
   That's good for me. As soon as the release scripts are good to go I can 
conduct the release vote on the mailing list. We can close the vote in less 
than the usual 72 hours so long as we get 3 PMC votes. So we'll need a quick 
"here's how to verify the release candidate" blurb to direct people to when we 
start the release vote




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252903#comment-16252903
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344477823
 
 
   @wesm that sounds good to me. In the meantime can we get this PR merged + 
finish ASF release scripts, and push a new version to npm? I'm at the point 
where not having the latest on npm is going to be a problem for projects at work soon; @TheNeuralBit may be feeling this too.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252868#comment-16252868
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344472045
 
 
   Yes, definitely, we are in agreement. We should push for a JSON reader ASAP 
-- following the C++ reader as a guide, I do not think it is that big of a 
project, to be honest, when you consider all the hardship of dealing with 
parsing JSON in C++, which is a complete non-issue in JavaScript. 
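As a rough illustration of why the JSON side is cheap outside C++: the integration files are plain JSON, so a reader can begin as a direct walk of the document. A sketch in Python for brevity (a JS reader would be analogous), assuming the integration format's `schema`/`fields`/`batches`/`count` keys; this is not the Arrow JS implementation:

```python
import json

def read_integration_schema(path):
    """Walk the top level of an Arrow integration JSON file.

    Returns (field_names, batch_row_counts). The key names follow the
    integration-test JSON format; treat this as an illustrative sketch,
    not a full reader.
    """
    with open(path) as f:
        doc = json.load(f)
    names = [field["name"] for field in doc["schema"]["fields"]]
    counts = [batch["count"] for batch in doc.get("batches", [])]
    return names, counts
```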




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252659#comment-16252659
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344431051
 
 
   > @wesm This "validate the test results once" part is where I'm getting 
lost. How do you know whether anything is correct if you don't write down what 
you expect to be true?
   
   Ah right, in this case the JSON files are the initial source of truth. I 
compared the snapshots against the Arrow files read via pandas/pyarrow, and it 
looked correct. After this (assuming stable test data), the snapshots are the 
source of truth. If we decide to change the test data, then we have to 
re-validate that the snapshots are what we expect them to be.
   
   But I want to stress, I'm not against doing it differently. I'm also 
bandwidth constrained, and snapshots get high coverage with minimal effort. It 
sounds like the JSON reader should provide all the same benefits as snapshot 
testing. From that perspective, I see snapshots as a stop-gap until the JS JSON 
reader is done (unless there's a way we can validate columns with the C++ or 
Java JSON readers from the JS tests?)
   
   With that in mind, I agree it's best not to commit the snapshots to the git 
history, if we're just going to remove them once the JSON reader is ready. In 
the interim, I don't mind validating any new JS PRs against my local snapshots, as the volume of JS PRs isn't that high yet.
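The snapshot workflow described here (record once, validate by eye, then compare diffs on every later run) can be illustrated independently of any framework. A minimal sketch in Python, not the actual snapshot machinery of the JS test suite; names and paths are illustrative:

```python
import json
import os

def check_snapshot(name, value, snapshot_dir="snapshots", update=False):
    """Minimal snapshot assertion in the style described above.

    On the first run (or with update=True) the serialized value is
    written out and becomes the source of truth; subsequent runs fail
    if the value drifts from the recorded snapshot.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    path = os.path.join(snapshot_dir, name + ".json")
    serialized = json.dumps(value, sort_keys=True, indent=2)
    if update or not os.path.exists(path):
        # Record the snapshot; a human validates this file once.
        with open(path, "w") as f:
            f.write(serialized)
        return True
    with open(path) as f:
        expected = f.read()
    if expected != serialized:
        raise AssertionError(f"snapshot {name!r} changed; see {path}")
    return True
```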




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252557#comment-16252557
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344420875
 
 
   > Snapshots are just a different, data-centric way of writing assertions. 
Give it a lot of test data, validate the test results once, then compare diffs 
after that. If you can eyeball test results and know whether it works, then the 
computer codegens all the dreadful bits about comparing types, values, etc. 
(even when it's late and you might otherwise forget to test an edge case).
   
   This "validate the test results once" part is where I'm getting lost. How 
do you know whether anything is correct if you don't write down what you expect 
to be true? I can help with rallying the troops to write more tests. I am a bit 
bandwidth constrained at the moment with all the 0.8.0 stuff in progress, but I 
am hopeful that some others can get involved and this will also help with 
beating on the API and finding rough edges. cc @leifwalsh @scottdraves 




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252166#comment-16252166
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150960731
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
   @wesm done




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251733#comment-16251733
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150897358
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
   Yes, what's in `js/LICENSE` should go in the top-level `LICENSE.txt` at the 
bottom. Then we can copy that one license into the JS tarball.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250894#comment-16250894
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344149960
 
 
   Maybe another way to phrase it is: disk and network are cheap; my/our time 
is not. ;-)
   
   edit: shit, I misread the snapshot count; we have _113,940_ snapshots, not 
11,394



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250874#comment-16250874
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344148302
 
 
   Snapshots are just a different, data-centric way of writing assertions: give 
the suite a lot of test data, validate the test results once, then compare 
diffs after that. If you can eyeball the test results and know whether they're 
correct, the computer codegens all the dreadful bits about comparing types, 
values, etc. (even when it's late and you might otherwise forget to test an 
edge case).
   
   > I'm no expert, so there may be things I'm missing -- are some of the test 
assertions dependent on the flavor of the deployment target?
   
   No, the assertions should be identical regardless of the compilation target 
-- they're generated [once at the 
beginning](https://travis-ci.org/apache/arrow/jobs/301012418#L1177), then all 
the targets are compared against the same snapshots.
   
   I may have mentioned this before, but they've also helped catch minification 
bugs: back when we did return Long instances, Closure Compiler minified the 
class name down to something like "zw", so the snapshot tests failed for just 
the ES5/UMD target.
   
   But on the whole I can't argue with your position. All I can say is I'm 
probably pretty lazy by normal standards, so I try to make my computer do as 
much of my homework as possible.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250841#comment-16250841
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150736649
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
   The build also copies extra files from the `js` folder into each of the 
packages, so we just need to change [this 
line](https://github.com/apache/arrow/blob/master/js/gulp/util.js#L30) to 
`['../LICENSE.txt', '../NOTICE.txt', 'README.md']`. Do we also need to add the 
info in [`js/LICENSE`](https://github.com/apache/arrow/blob/master/js/LICENSE) 
to the top-level notice.txt? 




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250810#comment-16250810
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150733769
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
   Seems reasonable. The other side of this is creating the signed tarball for 
voting, and making sure the tarball is sufficient for the post-release upload 
to NPM. We'll need to copy some files from the root directory (like the license 
and notice files). I can help with this




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250811#comment-16250811
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344140344
 
 
   @wesm the Jest docs on snapshot testing highlight its utility for testing 
React components, but it's really just a form of test code generation. The tests 
evaluate [all 
combinations](https://github.com/trxcllnt/arrow/blob/generate-js-test-files/js/test/table-tests.ts#L22)
 of `source lib x arrow format` (in reality: `[c++, java] x [file, stream]`) 
for each of the generated files (nested, simple, decimal, datetime, primitive, 
primitive-empty, dictionary, and struct_example), so there are quite a few 
assertions.
   
   
   Snapshots capture a bit of runtime type info that would otherwise have to be 
asserted explicitly, for example that calling `uint64Vector.get(i)` returns a 
`Uint32Array` of two elements:
   
   ```
   exports[`readBuffers cpp stream primitive reads each batch as an Array of 
Vectors 167`] = `
   Uint32Array [
 12840890,
 0,
   ]
   `;
   ```
   
   
   
   They're also helpful for catching regressions (or comparing against pandas) in 
`Table.toString()`:
   
   
   ```
   exports[`Table cpp file nested toString({ index: true }) prints a pretty 
Table with an Index column 1`] = `
   "Index,list_nullable,struct_nullable
   0, null,   [null,\\"tmo7qBM\\"]
   1, [1685103474],  [-583988484,null]
   2, [1981297353], [-749108100,\\"yGRfkmw\\"]
   3, [-2032422645,-2111456179,-895490422],   [820115077,null]
   4, null,   null
   5, [null,-434891054,-864560986],   null
   6, null,  [986507083,\\"U6xvhr7\\"]
   7, null,   null
   8, null,[null,null]
   9, null,   null
  10, [-498865952],   null
  11, null,   [null,\\"ctyWPJf\\"]
  12, null,[null,null]
  13, [-1076160763,-792439045,-656549144,null],   null
  14, null,  [1234093448,null]
  15,   [null,null,1882910932],   null
  16, null,  [934007407,\\"9QUyEm5\\"]"
   `;
   ```
   
   
   
   It also gives reviewers a chance to see what the tests produce, so if `get` 
on a Uint64Array starts returning a `Long` object instead of a `Uint32Array`, 
we can flag that in a code review. That said, it sounds like the JSON reader 
should be able to do most of this validation.


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250761#comment-16250761
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150728634
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
   @wesm That makes sense. The way I have things set up, we compile and publish 
multiple modules to npm:
   - one [large-ish module](https://www.npmjs.com/package/apache-arrow) that 
you can get via `npm install apache-arrow`
   - the rest as smaller/specialized modules under the 
[`@apache-arrow`](https://www.npmjs.com/org/apache-arrow) [npm 
organization](https://www.npmjs.com/docs/orgs/), which can be installed via the 
formula `npm install @apache-arrow/`. For example, `npm install 
@apache-arrow/es5-cjs` installs the slimmed down ES5/CommonJS target
   
   The `npm run build` command compiles all the output targets to the 
(gitignored) `targets` directory. The `lerna publish --yes --skip-git 
--cd-version $bump --force-publish=*` command publishes all the targets to npm. 
So from the sound of it, all we need to do is tar up the `targets` directory 
with a shell script that installs and runs `lerna publish`, and we're good to 
go? If so, I can do that tonight.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250732#comment-16250732
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150726475
 
 

 ##
 File path: js/gulp/test-task.js
 ##
 @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function
 
 module.exports = testTask;
 module.exports.testTask = testTask;
+module.exports.cleanTestData = cleanTestData;
+module.exports.createTestData = createTestData;
+
+async function cleanTestData() {
+return await del([
+`${path.resolve('./test/arrows/cpp')}/**`,
+`${path.resolve('./test/arrows/java')}/**`,
+]);
+}
+
+async function createTestData() {
+const base = path.resolve('./test/arrows');
+await mkdirp(path.join(base, 'cpp/file'));
+await mkdirp(path.join(base, 'java/file'));
+await mkdirp(path.join(base, 'cpp/stream'));
+await mkdirp(path.join(base, 'java/stream'));
+const errors = [];
+const names = await glob(path.join(base, 'json/*.json'));
+for (let jsonPath of names) {
+const name = path.parse(path.basename(jsonPath)).name;
+const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`);
+const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`);
+const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`);
+const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`);
+try {
+await generateCPPFile(jsonPath, arrowCppFilePath);
+await generateCPPStream(arrowCppFilePath, arrowCppStreamPath);
+} catch (e) { errors.push(e.message); }
+try {
+await generateJavaFile(jsonPath, arrowJavaFilePath);
+await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath);
+} catch (e) { errors.push(e.message); }
+}
+if (errors.length) {
+console.error(errors.join(`\n`));
+process.exit(1);
+}
+}
+
+async function generateCPPFile(jsonPath, filePath) {
+await rimraf(filePath);
+return await exec(
+`../cpp/build/release/json-integration-test ${
+`--integration --mode=JSON_TO_ARROW`} ${
+`--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`,
+{ maxBuffer: Math.pow(2, 53) - 1 }
+);
+}
+
+async function generateCPPStream(filePath, streamPath) {
+await rimraf(streamPath);
+return await exec(
+`../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`,
+{ maxBuffer: Math.pow(2, 53) - 1 }
+);
+}
+
+async function generateJavaFile(jsonPath, filePath) {
+await rimraf(filePath);
+return await exec(
+`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${
+`org.apache.arrow.tools.Integration -c JSON_TO_ARROW`} ${
+`-j ${path.resolve(jsonPath)} -a ${filePath}`}`,
+{ maxBuffer: Math.pow(2, 53) - 1 }
+);
+}
+
+async function generateJavaStream(filePath, streamPath) {
+await rimraf(streamPath);
+return await exec(
+`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${
 
 Review comment:
   I included this in my [response 
below](https://github.com/apache/arrow/pull/1294#discussion_r150721453)



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250733#comment-16250733
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150726479
 
 

 ##
 File path: js/gulp/test-task.js
 ##
 @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function
 
 module.exports = testTask;
 module.exports.testTask = testTask;
+module.exports.cleanTestData = cleanTestData;
+module.exports.createTestData = createTestData;
+
+async function cleanTestData() {
+    return await del([
+        `${path.resolve('./test/arrows/cpp')}/**`,
+        `${path.resolve('./test/arrows/java')}/**`,
+    ]);
+}
+
+async function createTestData() {
+    const base = path.resolve('./test/arrows');
+    await mkdirp(path.join(base, 'cpp/file'));
+    await mkdirp(path.join(base, 'java/file'));
+    await mkdirp(path.join(base, 'cpp/stream'));
+    await mkdirp(path.join(base, 'java/stream'));
+    const errors = [];
+    const names = await glob(path.join(base, 'json/*.json'));
+    for (let jsonPath of names) {
+        const name = path.parse(path.basename(jsonPath)).name;
+        const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`);
+        const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`);
+        const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`);
+        const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`);
+        try {
+            await generateCPPFile(jsonPath, arrowCppFilePath);
+            await generateCPPStream(arrowCppFilePath, arrowCppStreamPath);
+        } catch (e) { errors.push(e.message); }
+        try {
+            await generateJavaFile(jsonPath, arrowJavaFilePath);
+            await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath);
+        } catch (e) { errors.push(e.message); }
+    }
+    if (errors.length) {
+        console.error(errors.join(`\n`));
+        process.exit(1);
+    }
+}
+
+async function generateCPPFile(jsonPath, filePath) {
+    await rimraf(filePath);
+    return await exec(
+        `../cpp/build/release/json-integration-test ${
+            `--integration --mode=JSON_TO_ARROW`} ${
+            `--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`,
+        { maxBuffer: Math.pow(2, 53) - 1 }
+    );
+}
+
+async function generateCPPStream(filePath, streamPath) {
+    await rimraf(streamPath);
+    return await exec(
+        `../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`,
 
 Review comment:
   I included this in my [response 
below](https://github.com/apache/arrow/pull/1294#discussion_r150721453)







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250730#comment-16250730
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-344130863
 
 
   > I'm a bit torn here. On the one hand, I don't want to check in 21mb worth 
of tests to source control. On the other hand, I don't want to hand-write the 
11k assertions that the snapshot tests represent (and would also presumably be 
many-MBs worth of tests anyway).
   
   > I believe git compresses files across the network? And if space-on-disk is 
an issue, I could add a post-clone script to automatically compress the 
snapshot files after checkout (about 3mb gzipped). Jest doesn't work with 
compressed snapshot files out of the box, but I could add some steps to the 
test runner to decompress the snapshots before running.
   
   I guess I'm not quite understanding what snapshot tests accomplish here that 
normal array comparisons would not. In Java and C++ we have functions that 
compare the contents of arrays. So when you say hand-writing the snapshot test 
assertions, what's being tested and why is that the only way to test that 
behavior? Is there a concern that a programmatic comparison like
   
   
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/json-integration-test.cc#L180
   
   might not be as strong of an assertion as a UI-based test (what the values 
from the arrays would actually appear as in the DOM)?
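A programmatic check of the kind described here could be sketched roughly as below; the minimal `{ length, get }` vector shape and the helper names are assumptions for illustration, not the actual arrow JS API:

```javascript
// Hypothetical element-wise vector comparison, in the spirit of the C++
// json-integration-test's array equality check. The `{ length, get }`
// vector shape is assumed here for illustration only.
function vectorsEqual(a, b) {
    if (a.length !== b.length) { return false; }
    for (let i = 0; i < a.length; ++i) {
        const x = a.get(i), y = b.get(i);
        // two nulls compare equal; otherwise require strict value equality
        if (!(x === null && y === null) && x !== y) { return false; }
    }
    return true;
}

// a tiny stand-in vector for demonstration
const fromArray = (values) => ({ length: values.length, get: (i) => values[i] });
```

A comparison like this asserts the decoded values directly, without snapshot files.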
   
   Having the possibility of a single PR bloating the git history by whatever 
the snap files gzip down to doesn't seem like a good idea. Even having large 
diffs as the result of automatically generated files on commit isn't ideal









[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250723#comment-16250723
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150725447
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 @@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import * as Schema_ from '../format/Schema_generated';
+import * as Message_ from '../format/Message_generated';
+import Field = Schema_.org.apache.arrow.flatbuf.Field;
+import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+
+import { Vector } from './vector';
+import { Utf8Vector as Utf8VectorBase } from './utf8';
+import { StructVector as StructVectorBase } from './struct';
+import { DictionaryVector as DictionaryVectorBase } from './dictionary';
+import {
+    ListVector as ListVectorBase,
+    BinaryVector as BinaryVectorBase,
+    FixedSizeListVector as FixedSizeListVectorBase
+} from './list';
+
+import {
+    BoolVector as BoolVectorBase,
+    Int8Vector as Int8VectorBase,
+    Int16Vector as Int16VectorBase,
+    Int32Vector as Int32VectorBase,
+    Int64Vector as Int64VectorBase,
+    Uint8Vector as Uint8VectorBase,
+    Uint16Vector as Uint16VectorBase,
+    Uint32Vector as Uint32VectorBase,
+    Uint64Vector as Uint64VectorBase,
+    Float16Vector as Float16VectorBase,
+    Float32Vector as Float32VectorBase,
+    Float64Vector as Float64VectorBase,
+    Date32Vector as Date32VectorBase,
+    Date64Vector as Date64VectorBase,
+    Time32Vector as Time32VectorBase,
+    Time64Vector as Time64VectorBase,
+    DecimalVector as DecimalVectorBase,
+    TimestampVector as TimestampVectorBase,
+} from './numeric';
+
+import { nullableMixin, fieldMixin } from './traits';
+
+function MixinArrowTraits<T extends Vector<any>, TArgv>(
+    Base: new (argv: TArgv) => T,
+    Field: new (argv: TArgv & { field: Field, fieldNode: FieldNode }) => T,
+    Nullable: new (argv: TArgv & { validity: Uint8Array }) => T,
+    NullableField: new (argv: TArgv & { validity: Uint8Array, field: Field, fieldNode: FieldNode }) => T,
+) {
 
 Review comment:
   @TheNeuralBit but if we do want to do more compilation steps beyond what the 
TS compiler does, it'd be neat to also run [prepack on the flatbuffers 
generated 
code](https://gist.github.com/trxcllnt/84bb4893b6db957925ed7625fd0f34e5)





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250718#comment-16250718
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150725447
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 
 Review comment:
   @TheNeuralBit but if we do want to do more compilation steps beyond what the 
TS compiler does, it'd be neat to also run [preval on the flatbuffers generated 
code](https://gist.github.com/trxcllnt/84bb4893b6db957925ed7625fd0f34e5)





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250695#comment-16250695
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150723322
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 
 Review comment:
   @TheNeuralBit yeah we could use 
[babel-codegen](https://github.com/kentcdodds/babel-plugin-codegen), 
[babel-preval](https://github.com/kentcdodds/babel-plugin-preval), or 
[babel-macros](https://github.com/kentcdodds/babel-macros) if we want. I was 
hoping to avoid babel if possible, but since we're webpacking the es2015+ UMD 
bundles anyway, it wouldn't be too much of a headache.





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250682#comment-16250682
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150721453
 
 

 ##
 File path: js/test/arrows/json/datetime.json
 ##
 @@ -0,0 +1,1091 @@
+{
 
 Review comment:
   @wesm yes, that would be ideal.
   
   I generated the JSON from the python integration tests like this:
   
   ```python
   from integration_test import generate_nested_case
   from integration_test import generate_decimal_case
   from integration_test import generate_datetime_case
   from integration_test import generate_primitive_case
   from integration_test import generate_dictionary_case
   
   generate_nested_case().write("../js/test/arrows/json/nested.json")
   generate_decimal_case().write("../js/test/arrows/json/decimal.json")
   generate_datetime_case().write("../js/test/arrows/json/datetime.json")
   generate_dictionary_case().write("../js/test/arrows/json/dictionary.json")
   generate_primitive_case([7, 10]).write("../js/test/arrows/json/primitive.json")
   generate_primitive_case([0, 0, 0]).write("../js/test/arrows/json/primitive-empty.json")
   ```
   
   
   Then I had to manually edit the "primitive.json" file to remove the "binary_nullable" and "binary_nonnullable" columns, because the C++ command fails if they're present:
   
   
   ```sh
   $ ../cpp/build/release/json-integration-test \
 --integration --mode=JSON_TO_ARROW \
 --json=./test/arrows/json/primitive.json \
 --arrow=./test/arrows/cpp/file/primitive.arrow
   Found schema: bool_nullable: bool
   bool_nonnullable: bool not null
   int8_nullable: int8
   int8_nonnullable: int8 not null
   int16_nullable: int16
   int16_nonnullable: int16 not null
   int32_nullable: int32
   int32_nonnullable: int32 not null
   int64_nullable: int64
   int64_nonnullable: int64 not null
   uint8_nullable: uint8
   uint8_nonnullable: uint8 not null
   uint16_nullable: uint16
   uint16_nonnullable: uint16 not null
   uint32_nullable: uint32
   uint32_nonnullable: uint32 not null
   uint64_nullable: uint64
   uint64_nonnullable: uint64 not null
   float32_nullable: float
   float32_nonnullable: float not null
   float64_nullable: double
   float64_nonnullable: double not null
   binary_nullable: binary
   binary_nonnullable: binary not null
   utf8_nullable: string
   utf8_nonnullable: string not null
   Error message: Invalid: Encountered non-hex digit
   ```
   
   
   The unit tests rely heavily on [snapshot 
testing](https://facebook.github.io/jest/docs/en/snapshot-testing.html) to 
validate the actual values in the vectors. I manually validated the data in the 
snapshots against the buffers using pyarrow and pandas, but that approach won't 
scale. Typically the snapshot files get checked into version control, but now 
that we have 11k snapshots, the snapshot files are around 21mb. I removed them 
from the repo b/c we don't want huge files. Now the CI server generates the 
snapshots once up front, then validates the compilation targets against those.
   
   This will catch any cases where compiling the JS to different targets leads 
to failures (e.g. if the minifiers mangle names they weren't supposed to), but 
since we're not checking in the snapshot files, the CI server won't be able to 
tell us if a new PR causes a snapshot test to break. We _can_ know that if we 
run the tests locally, but we can't rely on us running the tests for each PR locally before merging.
   
   I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of 
tests to source control. On the other hand, I don't want to hand-write the 11k 
assertions that the snapshot tests represent (and would also presumably be 
many-MBs worth of tests anyway).
   
   I believe git compresses files across the network? And if space-on-disk is 
an issue, I could add a post-clone script to automatically compress the 
snapshot files after checkout (about 3mb gzipped). Jest doesn't work with 
compressed snapshot files out of the box, but I could add some steps to the 
test runner to decompress the snapshots before running.
   
   To your point about using the C++/Java writers to convert the JSON to Arrow 
buffers on the fly, we should 100% do that. This PR is marginally better since 
we can at least regenerate the arrow files easily enough, but ideally we don't 
have them at all and we can pipe them to the node process on the fly, or at a 
minimum, write to files then clean up after. We'll want a mode for local dev 
that skips this step, as incurring the JVM overhead to convert JSON to Arrow 
files is painful for debugging.
   
   I left the code in there (commented out) to draw attention to this 

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250225#comment-16250225
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150661882
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 
 Review comment:
   Ah makes sense, I figured there must be a good reason. This seems like a 
great application for some kind of preprocessor with macros or a code 
generator... but I don't know of any that would integrate well with JS 
build/dev tools





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250171#comment-16250171
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150653618
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 @@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import * as Schema_ from '../format/Schema_generated';
+import * as Message_ from '../format/Message_generated';
+import Field = Schema_.org.apache.arrow.flatbuf.Field;
+import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+
+import { Vector } from './vector';
+import { Utf8Vector as Utf8VectorBase } from './utf8';
+import { StructVector as StructVectorBase } from './struct';
+import { DictionaryVector as DictionaryVectorBase } from './dictionary';
+import {
+ListVector as ListVectorBase,
+BinaryVector as BinaryVectorBase,
+FixedSizeListVector as FixedSizeListVectorBase
+} from './list';
+
+import {
+BoolVector as BoolVectorBase,
+Int8Vector as Int8VectorBase,
+Int16Vector as Int16VectorBase,
+Int32Vector as Int32VectorBase,
+Int64Vector as Int64VectorBase,
+Uint8Vector as Uint8VectorBase,
+Uint16Vector as Uint16VectorBase,
+Uint32Vector as Uint32VectorBase,
+Uint64Vector as Uint64VectorBase,
+Float16Vector as Float16VectorBase,
+Float32Vector as Float32VectorBase,
+Float64Vector as Float64VectorBase,
+Date32Vector as Date32VectorBase,
+Date64Vector as Date64VectorBase,
+Time32Vector as Time32VectorBase,
+Time64Vector as Time64VectorBase,
+DecimalVector as DecimalVectorBase,
+TimestampVector as TimestampVectorBase,
+} from './numeric';
+
+import { nullableMixin, fieldMixin } from './traits';
+
+function MixinArrowTraits, TArgv>(
+Base: new (argv: TArgv) => T,
+Field: new (argv: TArgv & { field: Field, fieldNode: FieldNode }) => T,
+Nullable: new (argv: TArgv & { validity: Uint8Array }) => T,
+NullableField: new (argv: TArgv & { validity: Uint8Array, field: Field, fieldNode: FieldNode }) => T,
+) {
 
 Review comment:
   @TheNeuralBit yeah so the ES6 class spec states that the `name` property of a class constructor is immutable (and class names also can't be computed properties, like `let x = 'myClass'; class [x] extends Foo {}`). Anonymous classes don't get a useful `name`, so their instances read as `Object { data: Int32Array }` instead of `Int32Vector` when debugging. While this is ugly and hard to scale if we want to add more mixin behaviors, I figure it's a win for anyone using the library in the real world to see descriptive class names.
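To make that trade-off concrete, here is a small hedged sketch (the `withValidity` factory and the `Vec`/`Int32Vector` names are hypothetical, not the actual `nullableMixin`/`fieldMixin` code) of why explicit class declarations read better in a debugger than anonymous mixin results:

```typescript
// Hypothetical mixin factory, similar in shape to the traits mixins above.
type Ctor<T = {}> = new (...args: any[]) => T;

function withValidity<TBase extends Ctor>(Base: TBase) {
    // The returned class expression is anonymous, so its `name` is not descriptive.
    return class extends Base {
        validity: Uint8Array = new Uint8Array(0);
    };
}

class Vec {}

// Assigning the mixin result directly yields an anonymous class...
const AnonVec = withValidity(Vec);

// ...while an explicit declaration keeps a readable `name`.
class Int32Vector extends withValidity(Vec) {}

console.log(AnonVec.name);     // not "Int32Vector"; typically "" or a compiler-generated name
console.log(Int32Vector.name); // "Int32Vector"
```

This is why declaring one named class per vector type, repetitive as it is, pays off when users inspect objects in a console.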





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250159#comment-16250159
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150652141
 
 

 ##
 File path: js/src/reader/arrow.ts
 ##
 @@ -15,64 +15,135 @@
 // specific language governing permissions and limitations
 // under the License.
 
+import { Vector } from '../vector/vector';
 import { flatbuffers } from 'flatbuffers';
+import { readVector, readValueVector } from './vector';
+import {
+readFileFooter, readFileMessages,
+readStreamSchema, readStreamMessages
+} from './format';
+
+import * as File_ from '../format/File_generated';
 import * as Schema_ from '../format/Schema_generated';
 import * as Message_ from '../format/Message_generated';
-export import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
-export import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
-
-import { readFile } from './file';
-import { readStream } from './stream';
-import { readVector } from './vector';
-import { readDictionary } from './dictionary';
-import { Vector, Column } from '../types/types';
 
 import ByteBuffer = flatbuffers.ByteBuffer;
+import Footer = File_.org.apache.arrow.flatbuf.Footer;
 import Field = Schema_.org.apache.arrow.flatbuf.Field;
-export type Dictionaries = { [k: string]: Vector } | null;
-export type IteratorState = { nodeIndex: number; bufferIndex: number };
-
-export function* readRecords(...bytes: ByteBuffer[]) {
-try {
-yield* readFile(...bytes);
-} catch (e) {
-try {
-yield* readStream(...bytes);
-} catch (e) {
-throw new Error('Invalid Arrow buffer');
-}
+import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
+import Message = Message_.org.apache.arrow.flatbuf.Message;
+import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
+import MessageHeader = Message_.org.apache.arrow.flatbuf.MessageHeader;
+import DictionaryBatch = Message_.org.apache.arrow.flatbuf.DictionaryBatch;
+import DictionaryEncoding = Schema_.org.apache.arrow.flatbuf.DictionaryEncoding;
+
+export type ArrowReaderContext = {
+schema?: Schema;
+footer?: Footer | null;
+dictionaries: Map;
+dictionaryEncodedFields: Map;
+readMessages: (bb: ByteBuffer, footer: Footer) => Iterable;
+};
+
+export type VectorReaderContext = {
+node: number;
+buffer: number;
+offset: number;
+bytes: Uint8Array;
+batch: RecordBatch;
+dictionaries: Map;
+};
+
+export function* readVectors(buffers: Iterable, context?: ArrowReaderContext) {
+const context_ = context || {} as ArrowReaderContext;
+for (const buffer of buffers) {
+yield* readBuffer(toByteBuffer(buffer), context_);
 }
 }
 
-export function* readBuffers(...bytes: Array) {
-const dictionaries: Dictionaries = {};
-const byteBuffers = bytes.map(toByteBuffer);
-for (let { schema, batch } of readRecords(...byteBuffers)) {
-let vectors: Column[] = [];
-let state = { nodeIndex: 0, bufferIndex: 0 };
-let fieldsLength = schema.fieldsLength();
-let index = -1, field: Field, vector: Vector;
-if (batch.id) {
-// A dictionary batch only contains a single vector. Traverse each
-// field and its children until we find one that uses this dictionary
-while (++index < fieldsLength) {
-if (field = schema.fields(index)!) {
-if (vector = readDictionary(field, batch, state, dictionaries)!) {
-dictionaries[batch.id] = dictionaries[batch.id] && dictionaries[batch.id].concat(vector) || vector;
-break;
-}
+export async function* readVectorsAsync(buffers: AsyncIterable, context?: ArrowReaderContext) {
+const context_ = context || {} as ArrowReaderContext;
+for await (const buffer of buffers) {
+yield* readBuffer(toByteBuffer(buffer), context_);
+}
+}
+
+function* readBuffer(bb: ByteBuffer, readerContext: ArrowReaderContext) {
 
 Review comment:
   @wesm anything type-related (type annotations, interfaces, generics, and declarations like `type = { foo: string }`) is TypeScript; the rest is ES. If you're curious about an individual feature, you can run the build (`npm run build`) and compare the transpiled output (in the `targets` directory) with the TS source. We transpile to multiple JS versions and module formats, but it's probably easiest to compare against the `targets/es2015/esm` or `targets/esnext/esm` output.
   
   TS code-gens/polyfills missing features depending on the target environment. For example, ES5 doesn't have generators, so TS 
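The comment above is cut off in this digest. As a hedged illustration of the downleveling it describes: a generator like the one below passes through essentially unchanged for `es2015`/`esnext` targets, but for an `es5` target tsc rewrites it into a switch-based state machine via its `__generator` helper, since ES5 has no `function*` syntax.

```typescript
// A small generator of the kind TS must downlevel for ES5 targets.
function* take(n: number, xs: Iterable<number>): IterableIterator<number> {
    let i = 0;
    for (const x of xs) {
        if (i >= n) { return; } // stop after n values
        i++;
        yield x;
    }
}

const firstThree = [...take(3, [1, 2, 3, 4, 5])];
console.log(firstThree); // [1, 2, 3]
```

Diffing the ES5 and ESNext outputs under `targets` for a function like this shows the rewrite directly.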

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250148#comment-16250148
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150649634
 
 

 ##
 File path: js/gulpfile.js
 ##
 @@ -86,9 +86,9 @@ const buildConcurrent = (tasks) => () =>
 .merge(...knownTargets.map((target) =>
 del(`${targetDir(target, `cls`)}/**`);
   
-gulp.task( `test`, gulp.series(getTasks(`test`)));
-gulp.task(`debug`, gulp.series(getTasks(`debug`)));
-gulp.task(`clean`, gulp.parallel(getTasks(`clean`)));
+gulp.task( `test`, gulp.series(/*createTestData,*/ getTasks(`test`)/*, 
cleanTestData*/));
+gulp.task(`debug`, gulp.series(/*createTestData,*/ getTasks(`debug`)/*, 
cleanTestData*/));
+gulp.task(`clean`, gulp.parallel(/*cleanTestData,*/ getTasks(`clean`)));
 
 Review comment:
   @TheNeuralBit yes, definitely. I put these in to remind us to generate test 
data on the fly (and how to do it) for the tests, and remove the arrow files 
from the test directory. Making sure the CI environment had the C++ and Java 
libs built before the JS tests run was a bit more than I could bite off 
yesterday on my flight home :)




> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250143#comment-16250143
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150648586
 
 

 ##
 File path: js/test/integration-tests.ts
 ##
 @@ -0,0 +1,114 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import Arrow from './Arrow';
+import { zip } from 'ix/iterable/zip';
+import { config, formats } from './test-config';
+
+const { Table, readVectors } = Arrow;
+
+expect.extend({
+toEqualVector(v1: any, v2: any) {
+
+const format = (x: any, y: any, msg= ' ') => `${
+this.utils.printExpected(x)}${
+msg}${
+this.utils.printReceived(y)
+}`;
+
+let getFailures = new Array();
+let propsFailures = new Array();
+let iteratorFailures = new Array();
+let allFailures = [
+{ title: 'get', failures: getFailures },
+{ title: 'props', failures: propsFailures },
+{ title: 'iterator', failures: iteratorFailures }
+];
+
+let props = ['name', 'type', 'length', 'nullable', 'nullCount', 
'metadata'];
+for (let i = -1, n = props.length; ++i < n;) {
+const prop = props[i];
+if (this.utils.stringify(v1[prop]) !== 
this.utils.stringify(v2[prop])) {
+propsFailures.push(`${prop}: ${format(v1[prop], v2[prop], ' 
!== ')}`);
+}
+}
+
+for (let i = -1, n = v1.length; ++i < n;) {
+let x1 = v1.get(i), x2 = v2.get(i);
+if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+getFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+}
+}
+
+let i = -1;
+for (let [x1, x2] of zip(v1, v2)) {
+++i;
+if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+iteratorFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+}
+}
+
+return {
+pass: allFailures.every(({ failures }) => failures.length === 0),
+message: () => [
+`${v1.name}: (${format('cpp', 'java', ' !== ')})\n`,
+...allFailures.map(({ failures, title }) =>
+!failures.length ? `` : [`${title}:`, 
...failures].join(`\n`))
+].join('\n')
+};
+}
+});
+
+describe(`Integration`, () => {
+for (const format of formats) {
+describe(format, () => {
+for (const [cppArrow, javaArrow] of zip(config.cpp[format], 
config.java[format])) {
+describe(`${cppArrow.name}`, () => {
+testReaderIntegration(cppArrow.buffers, javaArrow.buffers);
+testTableFromBuffersIntegration(cppArrow.buffers, 
javaArrow.buffers);
+});
+}
+});
+}
+});
+
+function testReaderIntegration(cppBuffers: Uint8Array[], javaBuffers: 
Uint8Array[]) {
+test(`cpp and java vectors report the same values`, () => {
+expect.hasAssertions();
+for (const [cppVectors, javaVectors] of zip(readVectors(cppBuffers), 
readVectors(javaBuffers))) {
+expect(cppVectors.length).toEqual(javaVectors.length);
+for (let i = -1, n = cppVectors.length; ++i < n;) {
+(expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]);
+}
+}
+});
+}
+
+function testTableFromBuffersIntegration(cppBuffers: Uint8Array[], 
javaBuffers: Uint8Array[]) {
+test(`cpp and java tables report the same values`, () => {
+expect.hasAssertions();
+const cppTable = Table.from(cppBuffers);
+const javaTable = Table.from(javaBuffers);
+const cppVectors = cppTable.columns;
+const javaVectors = javaTable.columns;
+expect(cppTable.length).toEqual(javaTable.length);
+expect(cppVectors.length).toEqual(javaVectors.length);
+for (let i = -1, n = cppVectors.length; 

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250051#comment-16250051
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150642270
 
 

 ##
 File path: js/test/integration-tests.ts
 ##
 @@ -0,0 +1,114 @@

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249782#comment-16249782
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150583457
 
 

 ##
 File path: js/gulpfile.js
 ##
 @@ -86,9 +86,9 @@ const buildConcurrent = (tasks) => () =>
 .merge(...knownTargets.map((target) =>
 del(`${targetDir(target, `cls`)}/**`);
   
-gulp.task( `test`, gulp.series(getTasks(`test`)));
-gulp.task(`debug`, gulp.series(getTasks(`debug`)));
-gulp.task(`clean`, gulp.parallel(getTasks(`clean`)));
+gulp.task( `test`, gulp.series(/*createTestData,*/ getTasks(`test`)/*, cleanTestData*/));
+gulp.task(`debug`, gulp.series(/*createTestData,*/ getTasks(`debug`)/*, cleanTestData*/));
+gulp.task(`clean`, gulp.parallel(/*cleanTestData,*/ getTasks(`clean`)));
 
 Review comment:
   Should `createTestData` and `cleanTestData` be uncommented so we can remove 
the arrow files from the repo? I'm thinking these are probably commented now so 
that other contributors will be able to run the tests without building the Java 
and C++ impls - if that's the case, maybe we should separate out integration 
tests, which require the other libraries, and unit tests, which can be run 
stand-alone?
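For illustration, the split could be as simple as selecting test globs per suite (the file names and suite flag below are assumptions, not the actual layout):

```javascript
// Hypothetical sketch: choose test globs per suite so the stand-alone unit
// tests run by default, and the integration suite (which needs the arrow
// files generated by the Java and C++ impls) is opted into explicitly.
function testGlobs(suite) {
    return suite === 'integration'
        ? ['test/integration-tests.ts']
        : ['test/reader-tests.ts', 'test/table-tests.ts', 'test/vector-tests.ts'];
}

// e.g. wire this up as `npm test` vs. `npm run test:integration`
console.log(testGlobs(process.env.TEST_SUITE || 'unit'));
```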


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249783#comment-16249783
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150566841
 
 

 ##
 File path: js/test/integration-tests.ts
 ##
 @@ -0,0 +1,114 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import Arrow from './Arrow';
+import { zip } from 'ix/iterable/zip';
+import { config, formats } from './test-config';
+
+const { Table, readVectors } = Arrow;
+
+expect.extend({
+toEqualVector(v1: any, v2: any) {
+
+const format = (x: any, y: any, msg= ' ') => `${
+this.utils.printExpected(x)}${
+msg}${
+this.utils.printReceived(y)
+}`;
+
+let getFailures = new Array();
+let propsFailures = new Array();
+let iteratorFailures = new Array();
+let allFailures = [
+{ title: 'get', failures: getFailures },
+{ title: 'props', failures: propsFailures },
+{ title: 'iterator', failures: iteratorFailures }
+];
+
+let props = ['name', 'type', 'length', 'nullable', 'nullCount', 'metadata'];
+for (let i = -1, n = props.length; ++i < n;) {
+const prop = props[i];
+if (this.utils.stringify(v1[prop]) !== this.utils.stringify(v2[prop])) {
+propsFailures.push(`${prop}: ${format(v1[prop], v2[prop], ' !== ')}`);
+}
+}
+
+for (let i = -1, n = v1.length; ++i < n;) {
+let x1 = v1.get(i), x2 = v2.get(i);
+if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+getFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+}
+}
+
+let i = -1;
+for (let [x1, x2] of zip(v1, v2)) {
+++i;
+if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+iteratorFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+}
+}
+
+return {
+pass: allFailures.every(({ failures }) => failures.length === 0),
+message: () => [
+`${v1.name}: (${format('cpp', 'java', ' !== ')})\n`,
+...allFailures.map(({ failures, title }) =>
+!failures.length ? `` : [`${title}:`, ...failures].join(`\n`))
+].join('\n')
+};
+}
+});
+
+describe(`Integration`, () => {
+for (const format of formats) {
+describe(format, () => {
+for (const [cppArrow, javaArrow] of zip(config.cpp[format], config.java[format])) {
+describe(`${cppArrow.name}`, () => {
+testReaderIntegration(cppArrow.buffers, javaArrow.buffers);
+testTableFromBuffersIntegration(cppArrow.buffers, javaArrow.buffers);
+});
+}
+});
+}
+});
+
+function testReaderIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+test(`cpp and java vectors report the same values`, () => {
+expect.hasAssertions();
+for (const [cppVectors, javaVectors] of zip(readVectors(cppBuffers), readVectors(javaBuffers))) {
+expect(cppVectors.length).toEqual(javaVectors.length);
+for (let i = -1, n = cppVectors.length; ++i < n;) {
+(expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]);
+}
+}
+});
+}
+
+function testTableFromBuffersIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+test(`cpp and java tables report the same values`, () => {
+expect.hasAssertions();
+const cppTable = Table.from(cppBuffers);
+const javaTable = Table.from(javaBuffers);
+const cppVectors = cppTable.columns;
+const javaVectors = javaTable.columns;
+expect(cppTable.length).toEqual(javaTable.length);
+expect(cppVectors.length).toEqual(javaVectors.length);
+for (let i = -1, n = 

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249781#comment-16249781
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix 
reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150558328
 
 

 ##
 File path: js/src/vector/arrow.ts
 ##
 @@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import * as Schema_ from '../format/Schema_generated';
+import * as Message_ from '../format/Message_generated';
+import Field = Schema_.org.apache.arrow.flatbuf.Field;
+import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+
+import { Vector } from './vector';
+import { Utf8Vector as Utf8VectorBase } from './utf8';
+import { StructVector as StructVectorBase } from './struct';
+import { DictionaryVector as DictionaryVectorBase } from './dictionary';
+import {
+ListVector as ListVectorBase,
+BinaryVector as BinaryVectorBase,
+FixedSizeListVector as FixedSizeListVectorBase
+} from './list';
+
+import {
+BoolVector as BoolVectorBase,
+Int8Vector as Int8VectorBase,
+Int16Vector as Int16VectorBase,
+Int32Vector as Int32VectorBase,
+Int64Vector as Int64VectorBase,
+Uint8Vector as Uint8VectorBase,
+Uint16Vector as Uint16VectorBase,
+Uint32Vector as Uint32VectorBase,
+Uint64Vector as Uint64VectorBase,
+Float16Vector as Float16VectorBase,
+Float32Vector as Float32VectorBase,
+Float64Vector as Float64VectorBase,
+Date32Vector as Date32VectorBase,
+Date64Vector as Date64VectorBase,
+Time32Vector as Time32VectorBase,
+Time64Vector as Time64VectorBase,
+DecimalVector as DecimalVectorBase,
+TimestampVector as TimestampVectorBase,
+} from './numeric';
+
+import { nullableMixin, fieldMixin } from './traits';
+
+function MixinArrowTraits<T extends Vector, TArgv>(
+Base: new (argv: TArgv) => T,
+Field: new (argv: TArgv & { field: Field, fieldNode: FieldNode }) => T,
+Nullable: new (argv: TArgv & { validity: Uint8Array }) => T,
+NullableField: new (argv: TArgv & { validity: Uint8Array, field: Field, fieldNode: FieldNode }) => T,
+) {
 
 Review comment:
  Why move the calls to `nullableMixin` and `fieldMixin` out of here and into each individual call site? Are there subtle differences between some vectors that I'm missing?





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249124#comment-16249124
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150451049
 
 

 ##
 File path: js/test/integration-tests.ts
 ##

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249120#comment-16249120
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150420450
 
 

 ##
 File path: js/npm-release.sh
 ##
 @@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
 
 Review comment:
  Aside: for ASF release purposes, we're going to want one script that produces a tarball of the JS project sufficient for publishing to NPM afterwards, plus a script inside the tarball that performs the NPM publish. That way the ASF-signed artifact we upload to SVN has everything needed to publish the project to NPM
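A sketch of that two-step flow (the script names, tarball name, and command sequence here are assumptions, not the actual release tooling):

```javascript
// Step 1 runs in the repo and produces the artifact for the ASF vote;
// step 2 runs later, from inside the unpacked, signed tarball.
function releaseCommands(step, version = '0.0.0') {
    if (step === 'package') {
        // verify, then emit a tarball (e.g. apache-arrow-<version>.tgz)
        // that gets uploaded to SVN and signed
        return ['run-s --silent lint build test', 'npm pack'];
    }
    // publishing needs nothing but the tarball contents themselves
    return [`npm publish apache-arrow-${version}.tgz`];
}
```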






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249119#comment-16249119
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150448976
 
 

 ##
 File path: js/src/reader/arrow.ts
 ##
 @@ -15,64 +15,135 @@
 // specific language governing permissions and limitations
 // under the License.
 
+import { Vector } from '../vector/vector';
 import { flatbuffers } from 'flatbuffers';
+import { readVector, readValueVector } from './vector';
+import {
+readFileFooter, readFileMessages,
+readStreamSchema, readStreamMessages
+} from './format';
+
+import * as File_ from '../format/File_generated';
 import * as Schema_ from '../format/Schema_generated';
 import * as Message_ from '../format/Message_generated';
-export import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
-export import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
-
-import { readFile } from './file';
-import { readStream } from './stream';
-import { readVector } from './vector';
-import { readDictionary } from './dictionary';
-import { Vector, Column } from '../types/types';
 
 import ByteBuffer = flatbuffers.ByteBuffer;
+import Footer = File_.org.apache.arrow.flatbuf.Footer;
 import Field = Schema_.org.apache.arrow.flatbuf.Field;
-export type Dictionaries = { [k: string]: Vector } | null;
-export type IteratorState = { nodeIndex: number; bufferIndex: number };
-
-export function* readRecords(...bytes: ByteBuffer[]) {
-try {
-yield* readFile(...bytes);
-} catch (e) {
-try {
-yield* readStream(...bytes);
-} catch (e) {
-throw new Error('Invalid Arrow buffer');
-}
+import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
+import Message = Message_.org.apache.arrow.flatbuf.Message;
+import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
+import MessageHeader = Message_.org.apache.arrow.flatbuf.MessageHeader;
+import DictionaryBatch = Message_.org.apache.arrow.flatbuf.DictionaryBatch;
+import DictionaryEncoding = Schema_.org.apache.arrow.flatbuf.DictionaryEncoding;
+
+export type ArrowReaderContext = {
+schema?: Schema;
+footer?: Footer | null;
+dictionaries: Map;
+dictionaryEncodedFields: Map;
+readMessages: (bb: ByteBuffer, footer: Footer) => Iterable;
+};
+
+export type VectorReaderContext = {
+node: number;
+buffer: number;
+offset: number;
+bytes: Uint8Array;
+batch: RecordBatch;
+dictionaries: Map;
+};
+
+export function* readVectors(buffers: Iterable, context?: ArrowReaderContext) {
+const context_ = context || {} as ArrowReaderContext;
+for (const buffer of buffers) {
+yield* readBuffer(toByteBuffer(buffer), context_);
 }
 }
 
-export function* readBuffers(...bytes: Array) {
-const dictionaries: Dictionaries = {};
-const byteBuffers = bytes.map(toByteBuffer);
-for (let { schema, batch } of readRecords(...byteBuffers)) {
-let vectors: Column[] = [];
-let state = { nodeIndex: 0, bufferIndex: 0 };
-let fieldsLength = schema.fieldsLength();
-let index = -1, field: Field, vector: Vector;
-if (batch.id) {
+// A dictionary batch only contains a single vector. Traverse each
+// field and its children until we find one that uses this dictionary
+while (++index < fieldsLength) {
+if (field = schema.fields(index)!) {
+if (vector = readDictionary(field, batch, state, dictionaries)!) {
+dictionaries[batch.id] = dictionaries[batch.id] && dictionaries[batch.id].concat(vector) || vector;
-break;
-}
+export async function* readVectorsAsync(buffers: AsyncIterable, context?: ArrowReaderContext) {
+const context_ = context || {} as ArrowReaderContext;
+for await (const buffer of buffers) {
+yield* readBuffer(toByteBuffer(buffer), context_);
+}
+}
+
+function* readBuffer(bb: ByteBuffer, readerContext: ArrowReaderContext) {
 
 Review comment:
   What do you recommend as a resource for getting up to speed on TypeScript? 
Where is the line between TypeScript and ES6? 





[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249122#comment-16249122
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150420366
 
 

 ##
 File path: js/gulp/test-task.js
 ##
 @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function
 
 module.exports = testTask;
 module.exports.testTask = testTask;
+module.exports.cleanTestData = cleanTestData;
+module.exports.createTestData = createTestData;
+
+async function cleanTestData() {
+return await del([
+`${path.resolve('./test/arrows/cpp')}/**`,
+`${path.resolve('./test/arrows/java')}/**`,
+]);
+}
+
+async function createTestData() {
+const base = path.resolve('./test/arrows');
+await mkdirp(path.join(base, 'cpp/file'));
+await mkdirp(path.join(base, 'java/file'));
+await mkdirp(path.join(base, 'cpp/stream'));
+await mkdirp(path.join(base, 'java/stream'));
+const errors = [];
+const names = await glob(path.join(base, 'json/*.json'));
+for (let jsonPath of names) {
+const name = path.parse(path.basename(jsonPath)).name;
+const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`);
+const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`);
+const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`);
+const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`);
+try {
+await generateCPPFile(jsonPath, arrowCppFilePath);
+await generateCPPStream(arrowCppFilePath, arrowCppStreamPath);
+} catch (e) { errors.push(e.message); }
+try {
+await generateJavaFile(jsonPath, arrowJavaFilePath);
+await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath);
+} catch (e) { errors.push(e.message); }
+}
+if (errors.length) {
+console.error(errors.join(`\n`));
+process.exit(1);
+}
+}
+
+async function generateCPPFile(jsonPath, filePath) {
+await rimraf(filePath);
+return await exec(
+`../cpp/build/release/json-integration-test ${
+`--integration --mode=JSON_TO_ARROW`} ${
+`--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`,
+{ maxBuffer: Math.pow(2, 53) - 1 }
+);
+}
+
+async function generateCPPStream(filePath, streamPath) {
+await rimraf(streamPath);
+return await exec(
+`../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`,
 
 Review comment:
   Simple way to make this file path more easily configurable / less 
hard-coded? 






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249123#comment-16249123
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150451049
 
 

 ##
 File path: js/test/integration-tests.ts
 ##
 @@ -0,0 +1,114 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import Arrow from './Arrow';
+import { zip } from 'ix/iterable/zip';
+import { config, formats } from './test-config';
+
+const { Table, readVectors } = Arrow;
+
+expect.extend({
+    toEqualVector(v1: any, v2: any) {
+
+        const format = (x: any, y: any, msg = ' ') => `${
+            this.utils.printExpected(x)}${
+            msg}${
+            this.utils.printReceived(y)
+        }`;
+
+        let getFailures = new Array();
+        let propsFailures = new Array();
+        let iteratorFailures = new Array();
+        let allFailures = [
+            { title: 'get', failures: getFailures },
+            { title: 'props', failures: propsFailures },
+            { title: 'iterator', failures: iteratorFailures }
+        ];
+
+        let props = ['name', 'type', 'length', 'nullable', 'nullCount', 'metadata'];
+        for (let i = -1, n = props.length; ++i < n;) {
+            const prop = props[i];
+            if (this.utils.stringify(v1[prop]) !== this.utils.stringify(v2[prop])) {
+                propsFailures.push(`${prop}: ${format(v1[prop], v2[prop], ' !== ')}`);
+            }
+        }
+
+        for (let i = -1, n = v1.length; ++i < n;) {
+            let x1 = v1.get(i), x2 = v2.get(i);
+            if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+                getFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+            }
+        }
+
+        let i = -1;
+        for (let [x1, x2] of zip(v1, v2)) {
+            ++i;
+            if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+                iteratorFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+            }
+        }
+
+        return {
+            pass: allFailures.every(({ failures }) => failures.length === 0),
+            message: () => [
+                `${v1.name}: (${format('cpp', 'java', ' !== ')})\n`,
+                ...allFailures.map(({ failures, title }) =>
+                    !failures.length ? `` : [`${title}:`, ...failures].join(`\n`))
+            ].join('\n')
+        };
+    }
+});
+
+describe(`Integration`, () => {
+    for (const format of formats) {
+        describe(format, () => {
+            for (const [cppArrow, javaArrow] of zip(config.cpp[format], config.java[format])) {
+                describe(`${cppArrow.name}`, () => {
+                    testReaderIntegration(cppArrow.buffers, javaArrow.buffers);
+                    testTableFromBuffersIntegration(cppArrow.buffers, javaArrow.buffers);
+                });
+            }
+        });
+    }
+});
+
+function testReaderIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+    test(`cpp and java vectors report the same values`, () => {
+        expect.hasAssertions();
+        for (const [cppVectors, javaVectors] of zip(readVectors(cppBuffers), readVectors(javaBuffers))) {
+            expect(cppVectors.length).toEqual(javaVectors.length);
+            for (let i = -1, n = cppVectors.length; ++i < n;) {
+                (expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]);
+            }
+        }
+    });
+}
+
+function testTableFromBuffersIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+    test(`cpp and java tables report the same values`, () => {
+        expect.hasAssertions();
+        const cppTable = Table.from(cppBuffers);
+        const javaTable = Table.from(javaBuffers);
+        const cppVectors = cppTable.columns;
+        const javaVectors = javaTable.columns;
+        expect(cppTable.length).toEqual(javaTable.length);
+        expect(cppVectors.length).toEqual(javaVectors.length);
+        for (let i = -1, n = cppVectors.length; ++i < n;) {
+            (expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]);
+        }
+    });
+}
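The tests above iterate pairs of vectors with a `zip` helper that is imported elsewhere in the suite and not shown in this hunk. A minimal stand-in sketch (names and behavior assumed, stopping at the shorter input) looks like this:

```typescript
// Hypothetical stand-in for the test suite's `zip` helper: walks two
// iterables in lockstep and stops when the shorter one is exhausted.
function* zip<A, B>(xs: Iterable<A>, ys: Iterable<B>): IterableIterator<[A, B]> {
    const ix = xs[Symbol.iterator]();
    const iy = ys[Symbol.iterator]();
    while (true) {
        const rx = ix.next();
        const ry = iy.next();
        if (rx.done || ry.done) { break; }
        yield [rx.value, ry.value];
    }
}

const pairs = [...zip([1, 2, 3], ['a', 'b'])];
// pairs is [[1, 'a'], [2, 'b']] -- the trailing 3 is dropped
```

Because Arrow vectors are iterable, the same helper works for both the per-index `get` comparison and the iterator comparison in the matcher.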

[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249121#comment-16249121
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150450626
 
 

 ##
 File path: js/test/arrows/json/datetime.json
 ##
 @@ -0,0 +1,1091 @@
+{
 
 Review comment:
   I'm not sure about checking in all these .json and .arrow files. Is there some way we can automate their generation as part of the integration testing? Then they wouldn't have to be modified when we expand the integration test suite.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JS] Error reading dictionary-encoded integration test files
> 
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>  Labels: pull-request-available
> Fix For: 0.8.0
>
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, 
> dictionary.json
>
>
> The JS implementation crashes when reading the dictionary test case from the 
> integration tests.
> To replicate, first generate the test files with java and cpp impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; 
> generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration 
> --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp 
> ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow 
> -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on 
> [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249118#comment-16249118
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading 
C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#discussion_r150420344
 
 

 ##
 File path: js/gulp/test-task.js
 ##
 @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function
 
 module.exports = testTask;
 module.exports.testTask = testTask;
+module.exports.cleanTestData = cleanTestData;
+module.exports.createTestData = createTestData;
+
+async function cleanTestData() {
+    return await del([
+        `${path.resolve('./test/arrows/cpp')}/**`,
+        `${path.resolve('./test/arrows/java')}/**`,
+    ]);
+}
+
+async function createTestData() {
+    const base = path.resolve('./test/arrows');
+    await mkdirp(path.join(base, 'cpp/file'));
+    await mkdirp(path.join(base, 'java/file'));
+    await mkdirp(path.join(base, 'cpp/stream'));
+    await mkdirp(path.join(base, 'java/stream'));
+    const errors = [];
+    const names = await glob(path.join(base, 'json/*.json'));
+    for (let jsonPath of names) {
+        const name = path.parse(path.basename(jsonPath)).name;
+        const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`);
+        const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`);
+        const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`);
+        const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`);
+        try {
+            await generateCPPFile(jsonPath, arrowCppFilePath);
+            await generateCPPStream(arrowCppFilePath, arrowCppStreamPath);
+        } catch (e) { errors.push(e.message); }
+        try {
+            await generateJavaFile(jsonPath, arrowJavaFilePath);
+            await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath);
+        } catch (e) { errors.push(e.message); }
+    }
+    if (errors.length) {
+        console.error(errors.join(`\n`));
+        process.exit(1);
+    }
+}
+
+async function generateCPPFile(jsonPath, filePath) {
+    await rimraf(filePath);
+    return await exec(
+        `../cpp/build/release/json-integration-test ${
+            `--integration --mode=JSON_TO_ARROW`} ${
+            `--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`,
+        { maxBuffer: Math.pow(2, 53) - 1 }
+    );
+}
+
+async function generateCPPStream(filePath, streamPath) {
+    await rimraf(streamPath);
+    return await exec(
+        `../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`,
+        { maxBuffer: Math.pow(2, 53) - 1 }
+    );
+}
+
+async function generateJavaFile(jsonPath, filePath) {
+    await rimraf(filePath);
+    return await exec(
+        `java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${
+            `org.apache.arrow.tools.Integration -c JSON_TO_ARROW`} ${
+            `-j ${path.resolve(jsonPath)} -a ${filePath}`}`,
+        { maxBuffer: Math.pow(2, 53) - 1 }
+    );
+}
+
+async function generateJavaStream(filePath, streamPath) {
+    await rimraf(streamPath);
+    return await exec(
+        `java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${
 
 Review comment:
   Can this version number here be gotten from the environment / pom file per 
chance?
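
One way to avoid the hard-coded version, sketched here as an illustration (the env var name and pom path are hypothetical, and this is not the PR's actual solution): extract the `<version>` value from the Java pom with a small helper, falling back to an environment variable.

```typescript
// Illustrative sketch: derive the arrow-tools jar version from pom XML text
// instead of hard-coding "0.8.0-SNAPSHOT" in the command string.
function pomVersion(pomXml: string): string | null {
    // Grabs the first <version> tag; note that in a pom with a <parent>
    // section the first match may be the parent's version, so a real
    // implementation would want a proper XML parse.
    const match = /<version>([^<]+)<\/version>/.exec(pomXml);
    return match ? match[1].trim() : null;
}

// In the gulp task one could read '../java/pom.xml' with fs.readFileSync and
// let an env var (e.g. ARROW_JAVA_VERSION, hypothetical) override it.
const sample = `<project><artifactId>arrow-tools</artifactId><version>0.8.0-SNAPSHOT</version></project>`;
const version = pomVersion(sample) || '0.8.0-SNAPSHOT';
const jar = `../java/tools/target/arrow-tools-${version}-jar-with-dependencies.jar`;
```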



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248969#comment-16248969
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-343760757
 
 
   awesome, thanks @trxcllnt! I'm going to work through the patch to leave any 
comments that jump out but this is really exciting




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248921#comment-16248921
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors (WIP)
URL: https://github.com/apache/arrow/pull/1294#issuecomment-343752026
 
 
   @wesm I believe this branch is ready. We should revalidate datetime and 
dictionary once we can get Pandas to generate a CSV from the test data. The 
[integration 
tests](https://github.com/trxcllnt/arrow/blob/64e318cec96345d71c4c1f08e028a14b5dd3dd3d/js/test/integration-tests.ts#L81)
 are finally passing, so I feel good about this one:
   ```
   Test Suites: 4 passed, 4 total
   Tests:   404 passed, 404 total
   Snapshots:   113940 passed, 113940 total
   ```
   
   The last commit re-enables the node_js job in Travis so we can verify the above.






[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246518#comment-16246518
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors (WIP)
URL: https://github.com/apache/arrow/pull/1294#issuecomment-343290645
 
 
   no problem, feel free to keep adding here. I will take a little time to 
review also




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246481#comment-16246481
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors (WIP)
URL: https://github.com/apache/arrow/pull/1294#issuecomment-343284914
 
 
   @wesm yep, working on this tonight. do you mind if I add to this PR vs 
starting a new branch?




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246435#comment-16246435
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ 
dictionary-encoded vectors (WIP)
URL: https://github.com/apache/arrow/pull/1294#issuecomment-343277677
 
 
   Want to go ahead and ignore the vector layout metadata, per ARROW-1785? I 
will wait a day or so for further feedback to circulate, then proceed with a 
removal of this metadata. We'll need to update the Flatbuffers files in JS 
again as part of this. I will give you write access on my fork so you can push 
directly to the PR branch as needed.




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245180#comment-16245180
 ] 

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt opened a new pull request #1294: WIP ARROW-1693: Fix reading C++ 
dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294
 
 
   This PR adds a workaround for reading the metadata layout for C++ 
dictionary-encoded vectors.
   
   I added tests that validate against the C++/Java integration suite. In order 
to make the new tests pass, I had to update the generated flatbuffers format 
and add a few types the JS version didn't have yet (Bool, Date32, and 
Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to 
determine whether the DictionaryBatch vector should replace or append to the 
existing dictionary.
   
   I also added a script for generating test arrow files from the C++ and Java 
implementations, so we don't break the tests updating the format in the future. 
I saved the generated Arrow files in with the tests because I didn't see a way 
to pipe the JSON test data through the C++/Java json-to-arrow commands without 
writing to a file. If I missed something and we can do it all in-memory, I'd be 
happy to make that change!
   
   This PR is marked WIP because I added an [integration 
test](https://github.com/apache/arrow/commit/6e98874d9f4bfae7758f8f731212ae7ceb3f1321#diff-18c6be12406c482092d4b1f7bd70a8e1R22)
 that validates the JS reader reads C++ and Java files the same way, but 
unfortunately it doesn't. Debugging, I noticed a number of other differences 
in the buffer layout metadata between the C++ and Java versions. If we go 
ahead with @jacques-n's [comment in 
ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812)
 and remove/ignore the metadata, this test should pass too.
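
The `isDelta` behavior described above can be sketched as follows (illustrative types and function names, not the actual reader internals): a delta dictionary batch appends its values to the dictionary already registered under that id, while a non-delta batch replaces it.

```typescript
// Sketch of isDelta semantics for DictionaryBatches: append vs replace.
type Dictionaries = Map<number, any[]>;

function applyDictionaryBatch(
    dicts: Dictionaries, id: number, values: any[], isDelta: boolean
): void {
    if (isDelta && dicts.has(id)) {
        // Delta batch: extend the existing dictionary for this id.
        dicts.set(id, dicts.get(id)!.concat(values));
    } else {
        // Non-delta (or first) batch: replace the dictionary outright.
        dicts.set(id, values);
    }
}

const dicts: Dictionaries = new Map();
applyDictionaryBatch(dicts, 0, ['a', 'b'], false);
applyDictionaryBatch(dicts, 0, ['c'], true);
// dicts.get(0) is now ['a', 'b', 'c']
```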




[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-08 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244812#comment-16244812
 ] 

Jacques Nadeau commented on ARROW-1693:
---

I actually think having the vector layout was a mistake, and we should remove 
it. It is a constant that is defined by the spec. We actually implemented an 
alternative representation internally where we skip inclusion of it, because we 
don't want to send around information that is useless (and can be fairly 
substantial when talking about five-record, several-thousand-field datasets).



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-02 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236908#comment-16236908
 ] 

Wes McKinney commented on ARROW-1693:
-

See ARROW-1362



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-02 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236902#comment-16236902
 ] 

Wes McKinney commented on ARROW-1693:
-

I think the idea of including the buffer layouts was hypothetically to permit 
an implementation to, say, omit the validity bitmap buffer without 
consequences. In practice, both the Java and C++ implementations presume they 
are sent the same buffer layout that they emit -- i.e. with the buffers in the 
same order (so in the case of strings, you would have validity, offsets, then 
data). But validating our presumptions is useful. So what we probably need to 
do is implement buffer layout validation in both Java and C++ so that we can 
assert that a sender has prepared the buffers in a supported way. I was wrong 
to call the JS implementation "brittle" in this regard; really, the more 
rigorous checking exposed bugs in the other implementations.
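As a rough sketch of the buffer-layout validation described above (not the actual Java/C++ code; {{EXPECTED_LAYOUT}} and {{validate_buffer_layout}} are hypothetical names for illustration), a receiver could check that each field's buffers arrive in the order the spec prescribes:

```python
# Hypothetical sketch: assert a sender laid out each field's buffers in the
# expected order. The layout table below is an assumption for illustration.
EXPECTED_LAYOUT = {
    "utf8": ["validity", "offsets", "data"],  # strings: validity, offsets, data
    "int":  ["validity", "data"],
    "bool": ["validity", "data"],
}

def validate_buffer_layout(field_type, buffer_names):
    """Return True if buffer_names matches the expected order for field_type."""
    expected = EXPECTED_LAYOUT.get(field_type)
    if expected is None:
        raise ValueError(f"unknown field type: {field_type}")
    return buffer_names == expected

print(validate_buffer_layout("utf8", ["validity", "offsets", "data"]))  # True
print(validate_buffer_layout("utf8", ["offsets", "validity", "data"]))  # False
```

A check like this would let a reader reject a malformed stream up front instead of failing later with an opaque error such as the {{TypeError}} reported here.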



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-11-02 Thread Paul Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236855#comment-16236855
 ] 

Paul Taylor commented on ARROW-1693:


[~wesmckinn] digging into this now -- yeah, it looks like the DictionaryBatch 
UTF8Vector fieldNodes don't include the offsets buffer. Sounds like I should 
get those integration tests up and running.

I want to offer some pushback on your comment about brittleness, though. Maybe 
I'm alone on this, but it seems like a cross-platform IPC format should 
strictly enforce its own spec -- anything less and you end up with a bunch of 
maybe-compatible implementations, right?



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-10-20 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212893#comment-16212893
 ] 

Wes McKinney commented on ARROW-1693:
-

This was fixed in the C++ implementation in ARROW-1363 
https://github.com/apache/arrow/commit/0ced74e1e39587c0ee10ac5979fefbaac97446f5#diff-3ea143b7ffb13757e558952ab1a4e60b



[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files

2017-10-20 Thread Brian Hulette (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212697#comment-16212697
 ] 

Brian Hulette commented on ARROW-1693:
--

Pretty sure this is related to the vector layout representing the index vs. 
the dictionary data.
