[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259723#comment-16259723 ]

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345803902

Ah cool, I missed that. I think we are good then, so I suggest we cut a JS release ASAP to make sure we've got the process down, and then we can release again after 0.8.0 final goes out. I'm available this week to help out with this.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [JS] Error reading dictionary-encoded integration test files
>
> Key: ARROW-1693
> URL: https://issues.apache.org/jira/browse/ARROW-1693
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Brian Hulette
> Assignee: Paul Taylor
> Labels: pull-request-available
> Fix For: 0.8.0
> Attachments: dictionary-cpp.arrow, dictionary-java.arrow, dictionary.json
>
> The JS implementation crashes when reading the dictionary test case from the integration tests.
> To replicate, first generate the test files with the Java and C++ impls:
> {code}
> $ cd ${ARROW_HOME}/integration/
> $ python -c 'from integration_test import generate_dictionary_case; generate_dictionary_case().write("dictionary.json")'
> $ ../cpp/debug/debug/json-integration-test --integration --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW
> $ java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow -j dictionary.json
> {code}
> Attempt to read the files with the JS impl:
> {code}
> $ cd ${ARROW_HOME}/js/
> $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow
> {code}
> Both files result in an error for me on [a8f51858|https://github.com/apache/arrow/commit/a8f518588fda471b2e3cc8e0f0064e7c4bb99899]:
> {{TypeError: Cannot read property 'buffer' of undefined}}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
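The reported `TypeError: Cannot read property 'buffer' of undefined` is the classic shape of an unguarded two-level lookup: a dictionary-encoded field references a dictionary batch that was never associated with it, so the vector lookup returns `undefined` and the `.buffer` access throws. The sketch below is purely illustrative (the function and the `vectors`/`dictionaryId` names are hypothetical, not the actual Arrow JS internals); it shows the guarded form that fails with a diagnosable message instead:

```javascript
// Hypothetical sketch: guard the dictionary-vector lookup so a missing
// dictionary batch produces a descriptive error rather than a TypeError
// on the subsequent `.buffer` access.
function dictionaryBuffer(vectors, dictionaryId) {
    const vector = vectors[dictionaryId];
    if (vector === undefined) {
        throw new Error(`no dictionary batch loaded for dictionary id ${dictionaryId}`);
    }
    return vector.buffer;
}
```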
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259600#comment-16259600 ]

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345783422

@wesm I added https://github.com/apache/arrow/commit/48111290c2c8169ccefcf04c92afa684f0e8d56d to support reading <= 0.7.1 buffers. I tested on the previous arrow files in the tests, plus a few I generated in pyarrow locally. Is there anything else we need to do on that?
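Backwards compatibility of the kind described above is typically a single version check at read time: pre-0.8 Arrow files carry an older metadata version, so the reader can branch to the legacy buffer layout once and keep the rest of the code version-agnostic. A minimal sketch, assuming hypothetical version constants (this is not the actual commit, and the constant values are illustrative only):

```javascript
// Hypothetical metadata-version constants; real values depend on the
// Arrow format's Schema.fbs MetadataVersion enum.
const LEGACY_VERSION = 3; // assumed pre-0.8 writers
const CURRENT_VERSION = 4; // assumed 0.8+ writers

// Dispatch once on the file's metadata version so downstream buffer
// reading does not need to re-check it.
function selectBufferLayout(metadataVersion) {
    return metadataVersion < CURRENT_VERSION ? 'legacy-0.7.1' : 'current';
}
```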
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259278#comment-16259278 ]

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Expand JavaScript implementation, build system, fix integration tests
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345708786

Sweet, thanks! So after we get the release scripts set up, we can release this, but one problem to be aware of is that the library in its current state cannot read 0.7.1 binary data, and possibly not 0.8.0 (in its final form) binary data. Hopefully we can get ARROW-1785 sorted out soon.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258887#comment-16258887 ]

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345610464

@wesm Alright! Now that the integration tests are passing, I added backwards compatibility for Arrow files written before 0.8, re-enabled the datetime tests, and removed the generated arrow files from the performance tests. Should be good to go pending this last CI build.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258799#comment-16258799 ]

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345587677

Totally fine to move the integration tests to jdk8.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258790#comment-16258790 ]

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345585644

@wesm actually this might be tough -- the JS version of closure-compiler is a bit outdated and broken, and the Java version [hasn't supported Java 7 since October](https://github.com/google/closure-compiler/issues/2672). I don't want to skip running the tests on the ES5 UMD bundle, as that's the lowest common denominator for anyone wanting to experiment with Arrow in the browser, and the integration tests validate that public methods and properties don't get minified away. Is it possible to update the integration job to openjdk8 (like the java version)? If not, I can create a sibling `integration-java8` job that includes the JS tests.
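The point about minification above is worth making concrete: an aggressive minifier (e.g. Closure's advanced optimizations) renames any property it cannot prove is part of the public surface, so a public method can silently vanish from the bundle. A minimal sketch, assuming hypothetical export names, of the kind of smoke check the integration tests provide:

```javascript
// Return the expected public names that are absent from a built bundle's
// exports object -- an empty result means minification preserved the API.
function missingPublicNames(bundleExports, expectedNames) {
    return expectedNames.filter((name) => !(name in bundleExports));
}
```

Running this against the ES5 UMD bundle after the build step catches a renamed-away API before any functional test even loads.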
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258703#comment-16258703 ]

ASF GitHub Bot commented on ARROW-1693:
---

trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345567116

@wesm yeah, looks like Closure Compiler threw an exception building the ES5 UMD target: https://travis-ci.org/apache/arrow/jobs/304499884#L4649. I'm not certain, but it could be related to the integration tests running with JDK7 instead of 8. I'll switch the job to use the JS version of the Closure Compiler which, while slower, won't be affected by Java externalities.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258701#comment-16258701 ]

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345566713

There was a non-deterministic Plasma failure in the C++/Python entry, but the integration test entry also failed:

```
[00:23:46] Starting 'test:es5:umd'...
FAIL test/table-tests.ts
  ● Test suite failed to run
    Cannot find module '/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
      at Resolver.resolveModule (node_modules/jest-resolve/build/index.js:191:17)
      at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL test/reader-tests.ts
  ● Test suite failed to run
    Cannot find module '/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
      at Resolver.resolveModule (node_modules/jest-resolve/build/index.js:191:17)
      at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL test/integration-tests.ts
  ● Test suite failed to run
    Cannot find module '/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
      at Resolver.resolveModule (node_modules/jest-resolve/build/index.js:191:17)
      at Object.<anonymous> (test/Arrow.ts:50:17)
FAIL test/vector-tests.ts
  ● Test suite failed to run
    Cannot find module '/home/travis/build/apache/arrow/js/targets/es5/umd/Arrow' from 'Arrow.ts'
      at Resolver.resolveModule (node_modules/jest-resolve/build/index.js:191:17)
      at Object.<anonymous> (test/Arrow.ts:50:17)

Test Suites: 4 failed, 4 total
Tests:       0 total
Snapshots:   0 total
Time:        2.409s, estimated 24s
Ran all test suites.
[00:23:49] 'test:es5:umd' errored after 2.64 s
[00:23:49] Error: exited with error code: 1
    at ChildProcess.onexit (/home/travis/build/apache/arrow/js/node_modules/end-of-stream/index.js:39:36)
    at ChildProcess.emit (events.js:159:13)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:209:12)
[00:23:49] 'test' errored after 3.42 min
npm ERR! Test failed.  See above for more details.
```
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258669#comment-16258669 ]

ASF GitHub Bot commented on ARROW-1693:
---

wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345557709

The PR tool should squash out those commits, so I don't think it's a problem. I'll let you know if I run into any issues after the build runs.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258665#comment-16258665 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345557542 @wesm awesome, thanks! After my latest commit, the integration tests all pass now for me locally. Are you fine with this PR as-is, or should I close it and do one from a new branch w/o the test data commits? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JS] Error reading dictionary-encoded integration test files > > > Key: ARROW-1693 > URL: https://issues.apache.org/jira/browse/ARROW-1693 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Brian Hulette >Assignee: Paul Taylor > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: dictionary-cpp.arrow, dictionary-java.arrow, > dictionary.json > > > The JS implementation crashes when reading the dictionary test case from the > integration tests. 
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258642#comment-16258642 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345554173

OK, integration tests should pass now, fingers crossed. I will merge once the build is green. Thanks @trxcllnt and @TheNeuralBit for the patience; it is appreciated.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258640#comment-16258640 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345553066

I found the problem -- one of the primitive integration test files was being clobbered and not run, which was suppressing a failure that should have been raised a long time ago. In the meantime, there was also a regression from the Java refactor: we can no longer fully read unsigned integer types. I will hack the integration tests for now and open a JIRA about fixing it. Here's an example of trying to read a `uint16` vector:

```
16:49:51.051 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
Error accessing files
Numeric value (65350) out of range of Java short
 at [Source: /tmp/tmpwgopllpl/generated_primitive.json; line: , column: 18]
16:49:51.065 [main] ERROR org.apache.arrow.tools.Integration - Error accessing files
com.fasterxml.jackson.core.JsonParseException: Numeric value (65350) out of range of Java short
```
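For context, the "out of range of Java short" failure discussed above is a signed/unsigned mismatch: a `uint16` value such as 65350 fits in 16 bits but exceeds the range of Java's signed 16-bit `short` (-32768..32767). This small Python sketch (illustrative only, not part of the Arrow tooling; function names are made up) shows the two's-complement wraparound involved and how masking recovers the unsigned value:

```python
# Java's short is a signed 16-bit two's-complement integer.
SHORT_MIN, SHORT_MAX = -(2 ** 15), 2 ** 15 - 1


def fits_java_short(value: int) -> bool:
    """True if `value` can be parsed directly as a Java short."""
    return SHORT_MIN <= value <= SHORT_MAX


def to_java_short(value: int) -> int:
    """Reinterpret the low 16 bits of a uint16 as a signed short."""
    value &= 0xFFFF  # keep the low 16 bits
    return value - 0x10000 if value > SHORT_MAX else value


def from_java_short(value: int) -> int:
    """Recover the unsigned uint16 from a signed short (mask trick)."""
    return value & 0xFFFF


# 65350 is a valid uint16 but overflows a signed short, hence the
# JsonParseException raised by the Java reader.
assert not fits_java_short(65350)
assert to_java_short(65350) == -186
assert from_java_short(-186) == 65350
```

This is why reading unsigned integer types through a reader that parses into signed Java primitives fails for values above the signed maximum.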
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258602#comment-16258602 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345546104

Super weird. I am traveling today, so I hope to find some downtime in a little while to look at this; before EOD is the goal.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258588#comment-16258588 ]

ASF GitHub Bot commented on ARROW-1693:
---
TheNeuralBit commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345543675

I'm seeing this same error locally:

```
com.fasterxml.jackson.core.JsonParseException: Numeric value (50261) out of range of Java short
```

Strangely, `python integration_test.py` runs just fine. I only run into this issue when I use the Java integration tool directly to generate a test file. My process:

- ran `generate_primitive_case(..).write('primitive.json')` from a Python shell to get a JSON file
- ran `java -cp ${ARROW_HOME}/java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a primitive.java.arrow -j primitive.json`

Not sure what is causing this discrepancy, but it seems like the same thing that's affecting @trxcllnt's generator.

EDIT: Note I haven't had any issues generating C++ files yet; I'm only seeing the Java issue.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258558#comment-16258558 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345539857

See

```
Error message: Invalid: /home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-integration-test.cc:89 code: reader->ReadRecordBatch(i, )
/home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1435 code: ReadArray(pool, json_columns[i], type, [i])
/home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1287 code: ParseTypeValues(*type_)
/home/travis/build/apache/arrow/cpp/src/arrow/ipc/json-internal.cc:1055 code: ParseHexValue(hex_data + j * 2, _buffer_data[j])
Encountered non-hex digit
Command failed: /home/travis/build/apache/arrow/cpp-build/debug/json-integration-test --integration --mode=JSON_TO_ARROW --json=/home/travis/build/apache/arrow/js/test/data/json/primitive.json --arrow=/home/travis/build/apache/arrow/js/test/data/cpp/file/primitive.arrow
```

It looks like there is something wrong with the JSON files that have been written to that directory. I will take a closer look.
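The "Encountered non-hex digit" failure above comes from the C++ JSON reader's `ParseHexValue`, which expects binary-column values in the integration JSON to be hex-encoded strings; any other character in a cell aborts parsing. A minimal sketch of that format assumption (the function names here are illustrative, not the actual Arrow API):

```python
def encode_binary_cell(data: bytes) -> str:
    # The integration-test JSON stores a binary value as an
    # uppercase hex string, two digits per byte.
    return data.hex().upper()


def decode_binary_cell(cell: str) -> bytes:
    # Raises ValueError on any non-hex character, analogous to the
    # C++ reader's "Encountered non-hex digit" error.
    return bytes.fromhex(cell)


# A well-formed cell round-trips cleanly:
assert encode_binary_cell(b"\x01\xab") == "01AB"
assert decode_binary_cell("01AB") == b"\x01\xab"
```

Under this assumption, a generator that writes raw (non-hex) strings into a binary column would produce exactly the failure shown in the Travis log.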
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258511#comment-16258511 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345525031

I'll take a look to see if I can figure it out.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258489#comment-16258489 ]

ASF GitHub Bot commented on ARROW-1693:
---
trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345518069

@wesm after rebasing master, I removed the test data and added some lines to the integration runner to auto-generate the files and run the snapshot tests. I'm getting these errors converting the JSON to Arrow files, both locally and in Travis: https://travis-ci.org/apache/arrow/jobs/304302317#L4476. It's strange that the normal integration tests run and all seem to pass.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258227#comment-16258227 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345472682

Here's what I'm seeing in the diff in the test directory:

```
 js/test/Arrow.ts                                 |    57 +-
 js/test/__snapshots__/reader-tests.ts.snap       |   497 -
 js/test/__snapshots__/table-tests.ts.snap        |  1815 ---
 js/test/arrows/cpp/file/datetime.arrow           | Bin 0 -> 6490 bytes
 js/test/arrows/cpp/file/decimal.arrow            | Bin 0 -> 259090 bytes
 js/test/arrows/cpp/file/dictionary.arrow         | Bin 0 -> 2562 bytes
 js/test/arrows/cpp/file/nested.arrow             | Bin 0 -> 2218 bytes
 js/test/arrows/cpp/file/primitive-empty.arrow    | Bin 0 -> 9498 bytes
 js/test/arrows/cpp/file/primitive.arrow          | Bin 0 -> 9442 bytes
 js/test/arrows/cpp/file/simple.arrow             | Bin 0 -> 1154 bytes
 js/test/arrows/cpp/file/struct_example.arrow     | Bin 0 -> 1538 bytes
 js/test/arrows/cpp/stream/datetime.arrow         | Bin 0 -> 5076 bytes
 js/test/arrows/cpp/stream/decimal.arrow          | Bin 0 -> 255228 bytes
 js/test/arrows/cpp/stream/dictionary.arrow       | Bin 0 -> 2004 bytes
 js/test/arrows/cpp/stream/nested.arrow           | Bin 0 -> 1636 bytes
 js/test/arrows/cpp/stream/primitive-empty.arrow  | Bin 0 -> 6852 bytes
 js/test/arrows/cpp/stream/primitive.arrow        | Bin 0 -> 7020 bytes
 js/test/arrows/cpp/stream/simple.arrow           | Bin 0 -> 748 bytes
 js/test/arrows/cpp/stream/struct_example.arrow   | Bin 0 -> 1124 bytes
 js/test/arrows/file/dictionary.arrow             | Bin 2522 -> 0 bytes
 js/test/arrows/file/dictionary2.arrow            | Bin 2762 -> 0 bytes
 js/test/arrows/file/multi_dictionary.arrow       | Bin 3482 -> 0 bytes
 js/test/arrows/file/simple.arrow                 | Bin 1642 -> 0 bytes
 js/test/arrows/file/struct.arrow                 | Bin 2354 -> 0 bytes
 js/test/arrows/java/file/datetime.arrow          | Bin 0 -> 6746 bytes
 js/test/arrows/java/file/decimal.arrow           | Bin 0 -> 259730 bytes
 js/test/arrows/java/file/dictionary.arrow        | Bin 0 -> 2666 bytes
 js/test/arrows/java/file/nested.arrow            | Bin 0 -> 2314 bytes
 js/test/arrows/java/file/primitive-empty.arrow   | Bin 0 -> 9778 bytes
 js/test/arrows/java/file/primitive.arrow         | Bin 0 -> 10034 bytes
 js/test/arrows/java/file/simple.arrow            | Bin 0 -> 1210 bytes
 js/test/arrows/java/file/struct_example.arrow    | Bin 0 -> 1602 bytes
 js/test/arrows/java/stream/datetime.arrow        | Bin 0 -> 5196 bytes
 js/test/arrows/java/stream/decimal.arrow         | Bin 0 -> 255564 bytes
 js/test/arrows/java/stream/dictionary.arrow      | Bin 0 -> 2036 bytes
 js/test/arrows/java/stream/nested.arrow          | Bin 0 -> 1676 bytes
 js/test/arrows/java/stream/primitive-empty.arrow | Bin 0 -> 6916 bytes
 js/test/arrows/java/stream/primitive.arrow       | Bin 0 -> 7404 bytes
 js/test/arrows/java/stream/simple.arrow          | Bin 0 -> 772 bytes
 js/test/arrows/java/stream/struct_example.arrow  | Bin 0 -> 1148 bytes
 js/test/arrows/json/datetime.json                |  1091 ++
 js/test/arrows/json/decimal.json                 | 33380 +++
 js/test/arrows/json/dictionary.json              |   424 +
 js/test/arrows/json/nested.json                  |   384 +
 js/test/arrows/json/primitive-empty.json         |  1099 ++
 js/test/arrows/json/primitive.json               |  1788 +++
 js/test/arrows/json/simple.json                  |    66 +
 js/test/arrows/json/struct_example.json          |   237 +
 js/test/arrows/multi/count/records.arrow         | Bin 224 -> 0 bytes
 js/test/arrows/multi/count/schema.arrow          | Bin 184 -> 0 bytes
 js/test/arrows/multi/latlong/records.arrow       | Bin 352 -> 0 bytes
 js/test/arrows/multi/latlong/schema.arrow        | Bin 264 -> 0 bytes
 js/test/arrows/multi/origins/records.arrow       | Bin 224 -> 0 bytes
 js/test/arrows/multi/origins/schema.arrow        | Bin 1604 -> 0 bytes
 js/test/arrows/stream/dictionary.arrow           | Bin 1776 -> 0 bytes
 js/test/arrows/stream/simple.arrow               | Bin 1188 -> 0 bytes
 js/test/arrows/stream/struct.arrow               | Bin 1884 -> 0 bytes
 js/test/integration-tests.ts                     |   114 +
 js/test/reader-tests.ts                          |    69 +-
 js/test/table-tests.ts                           |   175 +-
 js/test/test-config.ts
```
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257649#comment-16257649 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345378644

This doesn't fail for me locally:

```
$ ../cpp/build/debug/json-integration-test --integration --json=/tmp/tmp0jga4tt5/generated_primitive.json --arrow=foo.arrow --mode=JSON_TO_ARROW
Found schema:
bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
```
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257644#comment-16257644 ]

ASF GitHub Bot commented on ARROW-1693:
---
wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors
URL: https://github.com/apache/arrow/pull/1294#issuecomment-345377333

Sorry that I missed that. I will figure out what's going on here:

> Then I had to manually edit the "primitive.json" file to remove the "binary_nullable" and "binary_nonnullable" columns, because the C++ command fails if they're present

```
$ ../cpp/build/release/json-integration-test \
    --integration --mode=JSON_TO_ARROW \
    --json=./test/arrows/json/primitive.json \
    --arrow=./test/arrows/cpp/file/primitive.arrow
Found schema:
bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
Error message: Invalid: Encountered non-hex digit
```
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257587#comment-16257587 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345368042

Sorry I have been dragging my feet because I'm not really on board with checking in data files that can be generated as part of CI. Per Slack conversation it seems there are some roadblocks, so I'm available as needed today and tomorrow to get this sorted out.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257568#comment-16257568 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345364755

@wesm I understand you may be busy, so do you mind if I go ahead and merge this?
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256022#comment-16256022 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345075467

@wesm nope, the only thing left to do is the ASF release scripts, I think.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256019#comment-16256019 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-345074775

Does anything still need to be done on this branch? If we are still not close to being able to cut a JS release by early next week I will rearrange my priorities to help out.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252920#comment-16252920 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344481293

That's good for me. As soon as the release scripts are good to go I can conduct the release vote on the mailing list. We can close the vote in less than the usual 72 hours so long as we get 3 PMC votes. So we'll need a quick "here's how to verify the release candidate" blurb to direct people to when we start the release vote.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252903#comment-16252903 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344477823

@wesm that sounds good to me. In the meantime can we get this PR merged, finish the ASF release scripts, and push a new version to npm? I'm at the point where not having the latest on npm is going to be a problem for projects at work soon, and @TheNeuralBit may be feeling this too.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252868#comment-16252868 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344472045

Yes, definitely, we are in agreement. We should push for a JSON reader ASAP -- following the C++ reader as a guide, I do not think it is that big of a project, to be honest, when you consider all the hardship of dealing with parsing JSON in C++, which is a complete non-issue in JavaScript.
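To illustrate why a JSON reader is a modest project, here is a rough sketch of the core decoding step (in Python for brevity; the VALIDITY/DATA key names and the helper are assumptions about the integration JSON layout, not the project's code):

```python
import json

def read_primitive_column(col_json):
    """Decode one primitive column: VALIDITY is a 0/1 array, DATA the values."""
    validity = col_json["VALIDITY"]
    data = col_json["DATA"]
    # A zero validity bit means the slot is null, whatever DATA holds there.
    return [v if ok else None for ok, v in zip(validity, data)]

col = json.loads('{"name": "int32_nullable", "VALIDITY": [1, 0, 1], "DATA": [7, 0, 9]}')
print(read_primitive_column(col))  # [7, None, 9]
```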
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252659#comment-16252659 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344431051

> @wesm This "validate the tests results once" part is where I'm getting lost. How do you know whether anything is correct if you don't write down what you expect to be true?

Ah right, in this case the JSON files are the initial source of truth. I compared the snapshots against the Arrow files read via pandas/pyarrow, and it looked correct. After this (assuming stable test data), the snapshots are the source of truth. If we decide to change the test data, then we have to re-validate that the snapshots are what we expect them to be.

But I want to stress, I'm not against doing it differently. I'm also bandwidth constrained, and snapshots get high coverage with minimal effort. It sounds like the JSON reader should provide all the same benefits as snapshot testing. From that perspective, I see snapshots as a stop-gap until the JS JSON reader is done (unless there's a way we can validate columns with the C++ or Java JSON readers from the JS tests?)

With that in mind, I agree it's best not to commit the snapshots to the git history if we're just going to remove them once the JSON reader is ready. In the interim, I don't mind validating any new JS PRs against my local snapshots, as the volume of JS PRs isn't that high yet.
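The "validate once, then compare diffs" workflow described above can be sketched as a tiny snapshot checker; this is a hypothetical stand-in for the project's Jest snapshot setup, not its actual test code:

```python
import json
import os

def check_snapshot(name, value, snap_dir="__snapshots__"):
    """Record `value` on the first run; on later runs, pass only if it matches."""
    os.makedirs(snap_dir, exist_ok=True)
    path = os.path.join(snap_dir, name + ".json")
    serialized = json.dumps(value, sort_keys=True, indent=2)
    if not os.path.exists(path):
        with open(path, "w") as f:  # first run: write the snapshot
            f.write(serialized)
        return True
    with open(path) as f:           # later runs: diff against the stored copy
        return f.read() == serialized
```

Changing the test data then requires deleting the stored file and re-validating the new snapshot by eye, which matches the re-validation step described in the comment.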
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252557#comment-16252557 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344420875

> Snapshots are just a different, data-centric way of writing assertions. Give it a lot of test data, validate the tests results once, then compare diffs after that. If you can eyeball test results and know whether it works, then the computer codegens all the dreadful bits about comparing types, values, etc. (even when it's late and you might otherwise forget to test an edge case).

This "validate the tests results once" part is where I'm getting lost. How do you know whether anything is correct if you don't write down what you expect to be true?

I can help with rallying the troops to write more tests. I am a bit bandwidth constrained at the moment with all the 0.8.0 stuff in progress, but I am hopeful that some others can get involved and this will also help with beating on the API and finding rough edges. cc @leifwalsh @scottdraves
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252166#comment-16252166 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150960731

File path: js/npm-release.sh

```diff
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.

-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
```

Review comment: @wesm done
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251733#comment-16251733 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150897358

File path: js/npm-release.sh

```diff
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.

-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
```

Review comment: Yes, what's in `js/LICENSE` should go in the top level `LICENSE.txt` at the bottom. Then we can copy that one license into the JS tarball.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250894#comment-16250894 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344149960

Maybe another way to phrase it is, disk and network are cheap, my/our time is not. ;-)

edit: shit, I misread the snapshot count; we have _113,940_ snapshots, not 11,394
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250885#comment-16250885 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344149960

Maybe another way to phrase it is, disk and network are cheap, my/our time is not. ;-)
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250874#comment-16250874 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344148302 Snapshots are just a different, data-centric way of writing assertions: give them a lot of test data, validate the test results once, then compare diffs after that. If you can eyeball test results and know whether they're right, the computer codegens all the dreadful bits about comparing types, values, etc. (even when it's late and you might otherwise forget to test an edge case).
> I'm no expert, so there may be things I'm missing -- are some of the test assertions dependent on the flavor of the deployment target?
No, the assertions should be identical regardless of the compilation target -- they're generated [once at the beginning](https://travis-ci.org/apache/arrow/jobs/301012418#L1177), then all the targets are compared against the same snapshots. I may have mentioned this before, but they've also helped catch minification bugs. For example, back when we returned Long instances, Closure Compiler minified the class name down to something like "zw", so the snapshot tests failed for just the ES5/UMD target. But on the whole I can't argue with your position. All I can say is that I'm probably pretty lazy by normal standards, so I try to make my computer do as much of my homework as possible.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250841#comment-16250841 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150736649
## File path: js/npm-release.sh
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
Review comment: The build also copies extra files from the `js` folder into each of the packages, so we just need to change [this line](https://github.com/apache/arrow/blob/master/js/gulp/util.js#L30) to `['../LICENSE.txt', '../NOTICE.txt', 'README.md']`. Do we also need to add the info in [`js/LICENSE`](https://github.com/apache/arrow/blob/master/js/LICENSE) to the top-level notice.txt?
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250810#comment-16250810 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150733769
## File path: js/npm-release.sh
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
Review comment: Seems reasonable. The other side of this is creating the signed tarball for voting, and making sure the tarball is sufficient for the post-release upload to NPM. We'll need to copy some files from the root directory (like the license and notice files). I can help with this.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250811#comment-16250811 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344140344 @wesm the Jest docs on snapshot testing highlight its utility for testing React components, but it's really just a form of test code generation. The tests evaluate [all combinations](https://github.com/trxcllnt/arrow/blob/generate-js-test-files/js/test/table-tests.ts#L22) of `source lib x arrow format` (in reality: `[c++, java] x [file, stream]`) for each of the generated files (nested, simple, decimal, datetime, primitive, primitive-empty, dictionary, and struct_example), so there are quite a few assertions. Snapshots capture a bit of runtime type info that would otherwise have to be asserted explicitly, for example that calling `uint64Vector.get(i)` returns a `Uint32Array` of two elements:
```
exports[`readBuffers cpp stream primitive reads each batch as an Array of Vectors 167`] = `
Uint32Array [
  12840890,
  0,
]
`;
```
They're also helpful for catching regressions (or comparing against pandas) in `Table.toString()`:
```
exports[`Table cpp file nested toString({ index: true }) prints a pretty Table with an Index column 1`] = `
"Index,list_nullable,struct_nullable
 0, null, [null,\\"tmo7qBM\\"]
 1, [1685103474], [-583988484,null]
 2, [1981297353], [-749108100,\\"yGRfkmw\\"]
 3, [-2032422645,-2111456179,-895490422], [820115077,null]
 4, null, null
 5, [null,-434891054,-864560986], null
 6, null, [986507083,\\"U6xvhr7\\"]
 7, null, null
 8, null, [null,null]
 9, null, null
10, [-498865952], null
11, null, [null,\\"ctyWPJf\\"]
12, null, [null,null]
13, [-1076160763,-792439045,-656549144,null], null
14, null, [1234093448,null]
15, [null,null,1882910932], null
16, null, [934007407,\\"9QUyEm5\\"]"
`;
```
It also gives reviewers a chance to see what the tests produce, so if `get` on a Uint64Array starts returning a `Long` object instead of a `Uint32Array`, we can flag that in a code review. That said, it sounds like the JSON reader should be able to do most of this validation.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250761#comment-16250761 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150728634
## File path: js/npm-release.sh
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
Review comment: @wesm That makes sense. The way I have things set up, we compile and publish multiple modules to npm:
- one [large-ish module](https://www.npmjs.com/package/apache-arrow) that you can get via `npm install apache-arrow`
- the rest as smaller/specialized modules under the [`@apache-arrow`](https://www.npmjs.com/org/apache-arrow) [npm organization](https://www.npmjs.com/docs/orgs/), which can be installed via `npm install @apache-arrow/` followed by the target name. For example, `npm install @apache-arrow/es5-cjs` installs the slimmed-down ES5/CommonJS target.
The `npm run build` command compiles all the output targets to the (gitignored) `targets` directory. The `lerna publish --yes --skip-git --cd-version $bump --force-publish=*` command publishes all the targets to npm. So from the sound of it, all we need to do is tar up the `targets` directory with a shell script that installs and runs `lerna publish`, and we're good to go? If so, I can do that tonight.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250732#comment-16250732 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150726475 ## File path: js/gulp/test-task.js ## @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function module.exports = testTask; module.exports.testTask = testTask; +module.exports.cleanTestData = cleanTestData; +module.exports.createTestData = createTestData; + +async function cleanTestData() { +return await del([ +`${path.resolve('./test/arrows/cpp')}/**`, +`${path.resolve('./test/arrows/java')}/**`, +]); +} + +async function createTestData() { +const base = path.resolve('./test/arrows'); +await mkdirp(path.join(base, 'cpp/file')); +await mkdirp(path.join(base, 'java/file')); +await mkdirp(path.join(base, 'cpp/stream')); +await mkdirp(path.join(base, 'java/stream')); +const errors = []; +const names = await glob(path.join(base, 'json/*.json')); +for (let jsonPath of names) { +const name = path.parse(path.basename(jsonPath)).name; +const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`); +const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`); +const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`); +const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`); +try { +await generateCPPFile(jsonPath, arrowCppFilePath); +await generateCPPStream(arrowCppFilePath, arrowCppStreamPath); +} catch (e) { errors.push(e.message); } +try { +await generateJavaFile(jsonPath, arrowJavaFilePath); +await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath); +} catch (e) { errors.push(e.message); } +} +if (errors.length) { +console.error(errors.join(`\n`)); +process.exit(1); +} +} + +async function 
generateCPPFile(jsonPath, filePath) { +await rimraf(filePath); +return await exec( +`../cpp/build/release/json-integration-test ${ +`--integration --mode=JSON_TO_ARROW`} ${ +`--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateCPPStream(filePath, streamPath) { +await rimraf(streamPath); +return await exec( +`../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateJavaFile(jsonPath, filePath) { +await rimraf(filePath); +return await exec( +`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${ +`org.apache.arrow.tools.Integration -c JSON_TO_ARROW`} ${ +`-j ${path.resolve(jsonPath)} -a ${filePath}`}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateJavaStream(filePath, streamPath) { +await rimraf(streamPath); +return await exec( +`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${ Review comment: I included this in my [response below](https://github.com/apache/arrow/pull/1294#discussion_r150721453)
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250730#comment-16250730 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-344130863
> I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of tests to source control. On the other hand, I don't want to hand-write the 11k assertions that the snapshot tests represent (and would also presumably be many MBs worth of tests anyway).
> I believe git compresses files across the network? And if space on disk is an issue, I could add a post-clone script to automatically compress the snapshot files after checkout (about 3mb gzipped). Jest doesn't work with compressed snapshot files out of the box, but I could add some steps to the test runner to decompress the snapshots before running.
I guess I'm not quite understanding what snapshot tests accomplish here that normal array comparisons would not. In Java and C++ we have functions that compare the contents of arrays. So when you say hand-writing the snapshot test assertions: what's being tested, and why is that the only way to test that behavior? Is there a concern that a programmatic comparison like https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/json-integration-test.cc#L180 might not be as strong an assertion as a UI-based test (what the values from the arrays would actually appear as in the DOM)? Having the possibility of a single PR bloating the git history by whatever the snap files gzip down to doesn't seem like a good idea. Even having large diffs as the result of automatically generated files on commit isn't ideal.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250723#comment-16250723 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150725447

## File path: js/src/vector/arrow.ts

@@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import * as Schema_ from '../format/Schema_generated';
+import * as Message_ from '../format/Message_generated';
+import Field = Schema_.org.apache.arrow.flatbuf.Field;
+import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+
+import { Vector } from './vector';
+import { Utf8Vector as Utf8VectorBase } from './utf8';
+import { StructVector as StructVectorBase } from './struct';
+import { DictionaryVector as DictionaryVectorBase } from './dictionary';
+import {
+    ListVector as ListVectorBase,
+    BinaryVector as BinaryVectorBase,
+    FixedSizeListVector as FixedSizeListVectorBase
+} from './list';
+
+import {
+    BoolVector as BoolVectorBase,
+    Int8Vector as Int8VectorBase,
+    Int16Vector as Int16VectorBase,
+    Int32Vector as Int32VectorBase,
+    Int64Vector as Int64VectorBase,
+    Uint8Vector as Uint8VectorBase,
+    Uint16Vector as Uint16VectorBase,
+    Uint32Vector as Uint32VectorBase,
+    Uint64Vector as Uint64VectorBase,
+    Float16Vector as Float16VectorBase,
+    Float32Vector as Float32VectorBase,
+    Float64Vector as Float64VectorBase,
+    Date32Vector as Date32VectorBase,
+    Date64Vector as Date64VectorBase,
+    Time32Vector as Time32VectorBase,
+    Time64Vector as Time64VectorBase,
+    DecimalVector as DecimalVectorBase,
+    TimestampVector as TimestampVectorBase,
+} from './numeric';
+
+import { nullableMixin, fieldMixin } from './traits';
+
+function MixinArrowTraits<T extends Vector<any>, TArgv>(
+    Base: new (argv: TArgv) => T,
+    Field: new (argv: TArgv & { field: Field, fieldNode: FieldNode }) => T,
+    Nullable: new (argv: TArgv & { validity: Uint8Array }) => T,
+    NullableField: new (argv: TArgv & { validity: Uint8Array, field: Field, fieldNode: FieldNode }) => T,
+) {

Review comment: @TheNeuralBit but if we do want to do more compilation steps beyond what the TS compiler does, it'd be neat to also run [prepack on the flatbuffers generated code](https://gist.github.com/trxcllnt/84bb4893b6db957925ed7625fd0f34e5)

This is an automated message from the Apache Git Service.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250718#comment-16250718 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150725447

## File path: js/src/vector/arrow.ts (quoting the same `MixinArrowTraits` diff context shown in full above)

Review comment: @TheNeuralBit but if we do want to do more compilation steps beyond what the TS compiler does, it'd be neat to also run [preval on the flatbuffers generated code](https://gist.github.com/trxcllnt/84bb4893b6db957925ed7625fd0f34e5)

This is an automated message from the Apache Git Service.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250695#comment-16250695 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150723322

## File path: js/src/vector/arrow.ts (quoting the same `MixinArrowTraits` diff context shown in full above)

Review comment: @TheNeuralBit yeah we could use [babel-codegen](https://github.com/kentcdodds/babel-plugin-codegen), [babel-preval](https://github.com/kentcdodds/babel-plugin-preval), or [babel-macros](https://github.com/kentcdodds/babel-macros) if we want.
I was hoping to avoid babel if possible, but since we're webpacking the es2015+ UMD bundles anyway, it wouldn't be too much of a headache.

This is an automated message from the Apache Git Service.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250682#comment-16250682 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150721453

## File path: js/test/arrows/json/datetime.json

@@ -0,0 +1,1091 @@
+{

Review comment: @wesm yes, that would be ideal. I generated the JSON from the python integration tests like this:

```python
from integration_test import generate_nested_case
from integration_test import generate_decimal_case
from integration_test import generate_datetime_case
from integration_test import generate_primitive_case
from integration_test import generate_dictionary_case

generate_nested_case().write("../js/test/arrows/json/nested.json")
generate_decimal_case().write("../js/test/arrows/json/decimal.json")
generate_datetime_case().write("../js/test/arrows/json/datetime.json")
generate_dictionary_case().write("../js/test/arrows/json/dictionary.json")
generate_primitive_case([7, 10]).write("../js/test/arrows/json/primitive.json")
generate_primitive_case([0, 0, 0]).write("../js/test/arrows/json/primitive-empty.json")
```

Then I had to manually edit the "primitive.json" file to remove the "binary_nullable" and "binary_nonnullable" columns, because the C++ command fails if they're present:

```sh
$ ../cpp/build/release/json-integration-test \
    --integration --mode=JSON_TO_ARROW \
    --json=./test/arrows/json/primitive.json \
    --arrow=./test/arrows/cpp/file/primitive.arrow
Found schema: bool_nullable: bool
bool_nonnullable: bool not null
int8_nullable: int8
int8_nonnullable: int8 not null
int16_nullable: int16
int16_nonnullable: int16 not null
int32_nullable: int32
int32_nonnullable: int32 not null
int64_nullable: int64
int64_nonnullable: int64 not null
uint8_nullable: uint8
uint8_nonnullable: uint8 not null
uint16_nullable: uint16
uint16_nonnullable: uint16 not null
uint32_nullable: uint32
uint32_nonnullable: uint32 not null
uint64_nullable: uint64
uint64_nonnullable: uint64 not null
float32_nullable: float
float32_nonnullable: float not null
float64_nullable: double
float64_nonnullable: double not null
binary_nullable: binary
binary_nonnullable: binary not null
utf8_nullable: string
utf8_nonnullable: string not null
Error message: Invalid: Encountered non-hex digit
```

The unit tests rely heavily on [snapshot testing](https://facebook.github.io/jest/docs/en/snapshot-testing.html) to validate the actual values in the vectors. I manually validated the data in the snapshots against the buffers using pyarrow and pandas, but that approach won't scale. Typically the snapshot files get checked into version control, but now that we have 11k snapshots, the snapshot files are around 21mb. I removed them from the repo b/c we don't want huge files.

Now the CI server generates the snapshots once up front, then validates the compilation targets against those. This will catch any cases where compiling the JS to different targets leads to failures (e.g. if the minifiers mangle names they weren't supposed to), but since we're not checking in the snapshot files, the CI server won't be able to tell us if a new PR causes a snapshot test to break. We _can_ know that if we run the tests locally, but we can't rely on us running the tests for each PR locally before merging.

I'm a bit torn here. On the one hand, I don't want to check in 21mb worth of tests to source control. On the other hand, I don't want to hand-write the 11k assertions that the snapshot tests represent (and would also presumably be many-MBs worth of tests anyway).

I believe git compresses files across the network? And if space-on-disk is an issue, I could add a post-clone script to automatically compress the snapshot files after checkout (about 3mb gzipped). Jest doesn't work with compressed snapshot files out of the box, but I could add some steps to the test runner to decompress the snapshots before running.

To your point about using the C++/Java writers to convert the JSON to Arrow buffers on the fly, we should 100% do that. This PR is marginally better since we can at least regenerate the arrow files easily enough, but ideally we don't have them at all and we can pipe them to the node process on the fly, or at a minimum, write to files then clean up after. We'll want a mode for local dev that skips this step, as incurring the JVM overhead to convert JSON to Arrow files is painful for debugging. I left the code in there (commented out) to draw attention to this
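The compress-after-checkout idea from the thread can be sketched with Node's built-in `zlib` module. This is only an illustration of the approach under discussion, not code from the PR; the notion of a snapshot being plain text in a `.snap` file is an assumption for the example.

```typescript
import { gzipSync, gunzipSync } from 'zlib';

// Compress a snapshot file's text (a post-clone script would apply this to
// each .snap file; the thread estimates ~21mb of snapshots gzip to ~3mb).
function compressSnapshot(snapshotText: string): Buffer {
    return gzipSync(Buffer.from(snapshotText, 'utf8'));
}

// Decompress before the test runner reads the snapshots, since Jest does not
// understand gzipped snapshot files out of the box.
function decompressSnapshot(gzipped: Buffer): string {
    return gunzipSync(gzipped).toString('utf8');
}
```

A pre-test step would walk the snapshot directory, decompress each file, run Jest, and optionally re-compress afterward.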
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250225#comment-16250225 ] ASF GitHub Bot commented on ARROW-1693: --- TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150661882

## File path: js/src/vector/arrow.ts (quoting the same `MixinArrowTraits` diff context shown in full above)

Review comment: Ah makes sense, I figured there must be a good reason. This seems like a great application for some kind of preprocessor with macros or a code generator... but I don't know of any that would integrate well with JS build/dev tools

This is an automated message from the Apache Git Service.
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250171#comment-16250171 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150653618

## File path: js/src/vector/arrow.ts (quoting the same `MixinArrowTraits` diff context shown in full above)

Review comment: @TheNeuralBit yeah so the ES6 class spec states that the `name` property of a class constructor is immutable (and they're also not allowed to be computed properties, like `let x = 'myClass'; class [x] extends Foo {}`).
And anonymous class names default to "Object", reading as `Object { data: Int32Array }` instead of `Int32Vector` when debugging. While this is ugly and hard to scale if we want to add more mixin behaviors, I figure it's a win for anyone using the library in the real world to see descriptive class names.

This is an automated message from the Apache Git Service.
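The debugging concern above can be seen in a small sketch. The class names here (`Int32Vector`, `nullableMixin`) are hypothetical stand-ins, not the actual arrow JS classes: a class returned anonymously from a mixin factory picks up no useful `name`, while an explicitly declared subclass keeps a descriptive one.

```typescript
// Hypothetical minimal vector class for illustration.
class Int32Vector {
    constructor(public data: Int32Array) {}
}

// A mixin factory returning an anonymous class expression: nothing in this
// position infers a binding name, so the constructor's `name` is typically
// empty, and instances read poorly in a debugger.
function nullableMixin(Base: typeof Int32Vector) {
    return class extends Base {};
}
const Anonymous = nullableMixin(Int32Vector);

// An explicitly declared subclass keeps a descriptive (and immutable) `name`,
// which is the trade-off the MixinArrowTraits helper accepts.
class NullableInt32Vector extends nullableMixin(Int32Vector) {}

console.log(NullableInt32Vector.name); // "NullableInt32Vector"
```

This is why spelling out one named class per vector type, though verbose, gives library users readable constructor names at runtime.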
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250159#comment-16250159 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150652141

## File path: js/src/reader/arrow.ts
## @@ -15,64 +15,135 @@
 // specific language governing permissions and limitations
 // under the License.

+import { Vector } from '../vector/vector';
 import { flatbuffers } from 'flatbuffers';
+import { readVector, readValueVector } from './vector';
+import {
+    readFileFooter, readFileMessages,
+    readStreamSchema, readStreamMessages
+} from './format';
+
+import * as File_ from '../format/File_generated';
 import * as Schema_ from '../format/Schema_generated';
 import * as Message_ from '../format/Message_generated';
-export import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
-export import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
-
-import { readFile } from './file';
-import { readStream } from './stream';
-import { readVector } from './vector';
-import { readDictionary } from './dictionary';
-import { Vector, Column } from '../types/types';
 import ByteBuffer = flatbuffers.ByteBuffer;
+import Footer = File_.org.apache.arrow.flatbuf.Footer;
 import Field = Schema_.org.apache.arrow.flatbuf.Field;

-export type Dictionaries = { [k: string]: Vector } | null;
-export type IteratorState = { nodeIndex: number; bufferIndex: number };
-
-export function* readRecords(...bytes: ByteBuffer[]) {
-    try {
-        yield* readFile(...bytes);
-    } catch (e) {
-        try {
-            yield* readStream(...bytes);
-        } catch (e) {
-            throw new Error('Invalid Arrow buffer');
-        }
+import Schema = Schema_.org.apache.arrow.flatbuf.Schema;
+import Message = Message_.org.apache.arrow.flatbuf.Message;
+import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch;
+import MessageHeader = Message_.org.apache.arrow.flatbuf.MessageHeader;
+import DictionaryBatch = Message_.org.apache.arrow.flatbuf.DictionaryBatch;
+import DictionaryEncoding = Schema_.org.apache.arrow.flatbuf.DictionaryEncoding;
+
+export type ArrowReaderContext = {
+    schema?: Schema;
+    footer?: Footer | null;
+    dictionaries: Map;
+    dictionaryEncodedFields: Map;
+    readMessages: (bb: ByteBuffer, footer: Footer) => Iterable;
+};
+
+export type VectorReaderContext = {
+    node: number;
+    buffer: number;
+    offset: number;
+    bytes: Uint8Array;
+    batch: RecordBatch;
+    dictionaries: Map;
+};
+
+export function* readVectors(buffers: Iterable, context?: ArrowReaderContext) {
+    const context_ = context || {} as ArrowReaderContext;
+    for (const buffer of buffers) {
+        yield* readBuffer(toByteBuffer(buffer), context_);
     }
 }

-export function* readBuffers(...bytes: Array) {
-    const dictionaries: Dictionaries = {};
-    const byteBuffers = bytes.map(toByteBuffer);
-    for (let { schema, batch } of readRecords(...byteBuffers)) {
-        let vectors: Column[] = [];
-        let state = { nodeIndex: 0, bufferIndex: 0 };
-        let fieldsLength = schema.fieldsLength();
-        let index = -1, field: Field, vector: Vector;
-        if (batch.id) {
-            // A dictionary batch only contains a single vector. Traverse each
-            // field and its children until we find one that uses this dictionary
-            while (++index < fieldsLength) {
-                if (field = schema.fields(index)!) {
-                    if (vector = readDictionary(field, batch, state, dictionaries)!) {
-                        dictionaries[batch.id] = dictionaries[batch.id] && dictionaries[batch.id].concat(vector) || vector;
-                        break;
-                    }
+export async function* readVectorsAsync(buffers: AsyncIterable, context?: ArrowReaderContext) {
+    const context_ = context || {} as ArrowReaderContext;
+    for await (const buffer of buffers) {
+        yield* readBuffer(toByteBuffer(buffer), context_);
+    }
+}
+
+function* readBuffer(bb: ByteBuffer, readerContext: ArrowReaderContext) {

Review comment: @wesm anything type-related (type annotations, interfaces, generics, and declarations like `type = { foo: string }`) is TypeScript; the rest is plain ES. If you're curious about an individual feature, you can run the build (`npm run build`) and compare the transpiled output (in the `targets` directory) with the TS source. We transpile to multiple JS versions and module formats, but it's probably easiest to compare against `targets/es2015/esm` or `targets/esnext/esm`. TS code-gens/polyfills missing features depending on the target environment. For example, ES5 doesn't have generators, so TS
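To make the transpilation point concrete, here is a minimal generator along the lines of the readers above (illustrative only — `readChunks` is a hypothetical name, not part of the Arrow JS API). When `tsc` targets ES5, which has no native generators, it rewrites `function*` bodies into a state machine driven by its emitted helper functions; ES2015 and later targets keep the syntax as-is.

```typescript
// A generator in the style of readVectors/readBuffer above. Each yielded
// value stands in for the per-buffer decoding work the real readers do.
function* readChunks(buffers: Iterable<Uint8Array>): IterableIterator<number> {
    for (const buffer of buffers) {
        // Hypothetical stand-in for decoding: just report the buffer size.
        yield buffer.byteLength;
    }
}

const lengths = Array.from(readChunks([new Uint8Array(4), new Uint8Array(8)]));
// lengths is [4, 8]
```

Comparing the ES5 and ESNext builds of a function like this shows exactly which parts TypeScript down-levels and which pass through untouched.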
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250148#comment-16250148 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150649634

## File path: js/gulpfile.js
## @@ -86,9 +86,9 @@ const buildConcurrent = (tasks) => () =>
 .merge(...knownTargets.map((target) => del(`${targetDir(target, `cls`)}/**`);
-gulp.task( `test`, gulp.series(getTasks(`test`)));
-gulp.task(`debug`, gulp.series(getTasks(`debug`)));
-gulp.task(`clean`, gulp.parallel(getTasks(`clean`)));
+gulp.task( `test`, gulp.series(/*createTestData,*/ getTasks(`test`)/*, cleanTestData*/));
+gulp.task(`debug`, gulp.series(/*createTestData,*/ getTasks(`debug`)/*, cleanTestData*/));
+gulp.task(`clean`, gulp.parallel(/*cleanTestData,*/ getTasks(`clean`)));

Review comment: @TheNeuralBit yes, definitely. I put these in to remind us to generate test data on the fly (and how to do it) for the tests, and remove the arrow files from the test directory. Making sure the CI environment had the C++ and Java libs built before the JS tests run was a bit more than I could bite off yesterday on my flight home :)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250143#comment-16250143 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150648586

## File path: js/test/integration-tests.ts
## @@ -0,0 +1,114 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import Arrow from './Arrow';
+import { zip } from 'ix/iterable/zip';
+import { config, formats } from './test-config';
+
+const { Table, readVectors } = Arrow;
+
+expect.extend({
+    toEqualVector(v1: any, v2: any) {
+
+        const format = (x: any, y: any, msg = ' ') => `${
+            this.utils.printExpected(x)}${
+                msg}${
+            this.utils.printReceived(y)
+        }`;
+
+        let getFailures = new Array();
+        let propsFailures = new Array();
+        let iteratorFailures = new Array();
+        let allFailures = [
+            { title: 'get', failures: getFailures },
+            { title: 'props', failures: propsFailures },
+            { title: 'iterator', failures: iteratorFailures }
+        ];
+
+        let props = ['name', 'type', 'length', 'nullable', 'nullCount', 'metadata'];
+        for (let i = -1, n = props.length; ++i < n;) {
+            const prop = props[i];
+            if (this.utils.stringify(v1[prop]) !== this.utils.stringify(v2[prop])) {
+                propsFailures.push(`${prop}: ${format(v1[prop], v2[prop], ' !== ')}`);
+            }
+        }
+
+        for (let i = -1, n = v1.length; ++i < n;) {
+            let x1 = v1.get(i), x2 = v2.get(i);
+            if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+                getFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+            }
+        }
+
+        let i = -1;
+        for (let [x1, x2] of zip(v1, v2)) {
+            ++i;
+            if (this.utils.stringify(x1) !== this.utils.stringify(x2)) {
+                iteratorFailures.push(`${i}: ${format(x1, x2, ' !== ')}`);
+            }
+        }
+
+        return {
+            pass: allFailures.every(({ failures }) => failures.length === 0),
+            message: () => [
+                `${v1.name}: (${format('cpp', 'java', ' !== ')})\n`,
+                ...allFailures.map(({ failures, title }) =>
+                    !failures.length ? `` : [`${title}:`, ...failures].join(`\n`))
+            ].join('\n')
+        };
+    }
+});
+
+describe(`Integration`, () => {
+    for (const format of formats) {
+        describe(format, () => {
+            for (const [cppArrow, javaArrow] of zip(config.cpp[format], config.java[format])) {
+                describe(`${cppArrow.name}`, () => {
+                    testReaderIntegration(cppArrow.buffers, javaArrow.buffers);
+                    testTableFromBuffersIntegration(cppArrow.buffers, javaArrow.buffers);
+                });
+            }
+        });
+    }
+});
+
+function testReaderIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+    test(`cpp and java vectors report the same values`, () => {
+        expect.hasAssertions();
+        for (const [cppVectors, javaVectors] of zip(readVectors(cppBuffers), readVectors(javaBuffers))) {
+            expect(cppVectors.length).toEqual(javaVectors.length);
+            for (let i = -1, n = cppVectors.length; ++i < n;) {
+                (expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]);
+            }
+        }
+    });
+}
+
+function testTableFromBuffersIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) {
+    test(`cpp and java tables report the same values`, () => {
+        expect.hasAssertions();
+        const cppTable = Table.from(cppBuffers);
+        const javaTable = Table.from(javaBuffers);
+        const cppVectors = cppTable.columns;
+        const javaVectors = javaTable.columns;
+        expect(cppTable.length).toEqual(javaTable.length);
+        expect(cppVectors.length).toEqual(javaVectors.length);
+        for (let i = -1, n = cppVectors.length;
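The dictionary-encoded vectors these integration tests compare pair a vector of small integer codes with a shared dictionary of values. A minimal sketch of that lookup (a hypothetical, simplified class — not the actual Arrow JS `DictionaryVector` API):

```typescript
// Hypothetical model of dictionary encoding: get(i) resolves the i-th
// integer code through a shared dictionary of values, which is the basic
// operation the real DictionaryVector performs for Arrow dictionary batches.
class SimpleDictionaryVector<T> {
    constructor(private indices: Int32Array, private dictionary: T[]) {}
    get length(): number { return this.indices.length; }
    get(i: number): T { return this.dictionary[this.indices[i]]; }
}

const dict = new SimpleDictionaryVector(new Int32Array([0, 2, 1, 0]), ['a', 'b', 'c']);
// dict.get(1) === 'c'; dict.get(3) === 'a'
```

The bug under discussion was in resolving which field a dictionary batch belongs to, not in this lookup itself — but the lookup is what the tests above ultimately exercise via `get(i)`.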
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250051#comment-16250051 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150642270

## File path: js/test/integration-tests.ts
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249782#comment-16249782 ] ASF GitHub Bot commented on ARROW-1693: --- TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150583457

## File path: js/gulpfile.js

Review comment: Should `createTestData` and `cleanTestData` be uncommented so we can remove the arrow files from the repo? I'm thinking these are probably commented now so that other contributors will be able to run the tests without building the Java and C++ impls - if that's the case, maybe we should separate out integration tests, which require the other libraries, and unit tests, which can be run stand-alone?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
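The unit/integration split suggested above could be expressed in the test runner's configuration. A hypothetical sketch using Jest's `projects` option — the file layout, `displayName`s, and paths are illustrative, not the actual Arrow JS configuration:

```javascript
// jest.config.js — hypothetical split: keep stand-alone unit tests separate
// from the cross-implementation integration tests, so contributors without
// C++/Java builds can still run the unit project on its own.
module.exports = {
    projects: [
        { displayName: 'unit',        testMatch: ['<rootDir>/test/unit/**/*-tests.ts'] },
        { displayName: 'integration', testMatch: ['<rootDir>/test/integration-tests.ts'] },
    ],
};
```

With a layout like this, CI could run both projects while local development defaults to the unit suite.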
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249783#comment-16249783 ] ASF GitHub Bot commented on ARROW-1693: --- TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150566841

## File path: js/test/integration-tests.ts
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249781#comment-16249781 ] ASF GitHub Bot commented on ARROW-1693: --- TheNeuralBit commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150558328

## File path: js/src/vector/arrow.ts
## @@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+import * as Schema_ from '../format/Schema_generated';
+import * as Message_ from '../format/Message_generated';
+import Field = Schema_.org.apache.arrow.flatbuf.Field;
+import FieldNode = Message_.org.apache.arrow.flatbuf.FieldNode;
+
+import { Vector } from './vector';
+import { Utf8Vector as Utf8VectorBase } from './utf8';
+import { StructVector as StructVectorBase } from './struct';
+import { DictionaryVector as DictionaryVectorBase } from './dictionary';
+import {
+    ListVector as ListVectorBase,
+    BinaryVector as BinaryVectorBase,
+    FixedSizeListVector as FixedSizeListVectorBase
+} from './list';
+
+import {
+    BoolVector as BoolVectorBase,
+    Int8Vector as Int8VectorBase,
+    Int16Vector as Int16VectorBase,
+    Int32Vector as Int32VectorBase,
+    Int64Vector as Int64VectorBase,
+    Uint8Vector as Uint8VectorBase,
+    Uint16Vector as Uint16VectorBase,
+    Uint32Vector as Uint32VectorBase,
+    Uint64Vector as Uint64VectorBase,
+    Float16Vector as Float16VectorBase,
+    Float32Vector as Float32VectorBase,
+    Float64Vector as Float64VectorBase,
+    Date32Vector as Date32VectorBase,
+    Date64Vector as Date64VectorBase,
+    Time32Vector as Time32VectorBase,
+    Time64Vector as Time64VectorBase,
+    DecimalVector as DecimalVectorBase,
+    TimestampVector as TimestampVectorBase,
+} from './numeric';
+
+import { nullableMixin, fieldMixin } from './traits';
+
+function MixinArrowTraits, TArgv>(
+    Base: new (argv: TArgv) => T,
+    Field: new (argv: TArgv & { field: Field, fieldNode: FieldNode }) => T,
+    Nullable: new (argv: TArgv & { validity: Uint8Array }) => T,
+    NullableField: new (argv: TArgv & { validity: Uint8Array, field: Field, fieldNode: FieldNode }) => T,
+) {

Review comment: Why move the calls to `nullableMixin` and `fieldMixin` from here and out to each individual call? Are there some subtle differences in some vectors that I'm missing?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [JS] Error reading dictionary-encoded integration test files > > > Key: ARROW-1693 > URL: https://issues.apache.org/jira/browse/ARROW-1693 > Project: Apache Arrow > Issue Type: Bug > Components: JavaScript >Reporter: Brian Hulette >Assignee: Brian Hulette > Labels: pull-request-available > Fix For: 0.8.0 > > Attachments: dictionary-cpp.arrow, dictionary-java.arrow, > dictionary.json > > > The JS implementation crashes when reading the dictionary test case from the > integration tests. > To replicate, first generate the test files with java and cpp impls: > {code} > $ cd ${ARROW_HOME}/integration/ > $ python -c 'from integration_test import generate_dictionary_case; > generate_dictionary_case().write("dictionary.json")' > $ ../cpp/debug/debug/json-integration-test --integration > --json=dictionary.json --arrow=dictionary-cpp.arrow --mode=JSON_TO_ARROW > $ java -cp > ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar > org.apache.arrow.tools.Integration -c JSON_TO_ARROW -a dictionary-java.arrow > -j dictionary.json > {code} > Attempt to read the files with the JS impl: > {code} > $ cd ${ARROW_HOME}/js/ > $ ./bin/arrow2csv.js -s dict1_0 -f ../integration/dictionary-{java,cpp}.arrow > {code} >
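The `nullableMixin`/`fieldMixin` composition TheNeuralBit asks about can be sketched in isolation. This is an illustrative reconstruction only — `BaseVector` and the validity-bitmap layout below are invented for the example, not the actual Arrow vector classes:

```typescript
// Hypothetical sketch of composing vector classes from mixins, in the
// spirit of the nullableMixin/fieldMixin pattern discussed above.
type Constructor<T = {}> = new (...args: any[]) => T;

// A bare "vector" with just a backing array and a get() method.
class BaseVector {
  constructor(public data: number[]) {}
  get length() { return this.data.length; }
  get(i: number): number | null { return this.data[i]; }
}

// A mixin that layers null-checking over get() via a validity bitmap.
function nullableMixin<T extends Constructor<BaseVector>>(Base: T) {
  return class Nullable extends Base {
    validity: Uint8Array = new Uint8Array(0);
    get(i: number): number | null {
      // Bit i of the validity bitmap decides whether slot i is null.
      const byte = this.validity[i >> 3];
      return ((byte >> (i % 8)) & 1) ? super.get(i) : null;
    }
  };
}

const NullableVector = nullableMixin(BaseVector);
const v = new NullableVector([1, 2, 3]);
v.validity = new Uint8Array([0b00000101]); // slots 0 and 2 valid
console.log(v.get(0), v.get(1), v.get(2)); // 1 null 3
```

Applying the mixin at each call site (rather than once inside a shared factory) trades some repetition for the freedom to vary which traits a given vector class picks up — which appears to be the design question raised in the comment.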
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249124#comment-16249124 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150451049 ## File path: js/test/integration-tests.ts ## @@ -0,0 +1,114 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +import Arrow from './Arrow'; +import { zip } from 'ix/iterable/zip'; +import { config, formats } from './test-config'; + +const { Table, readVectors } = Arrow; + +expect.extend({ +toEqualVector(v1: any, v2: any) { + +const format = (x: any, y: any, msg= ' ') => `${ +this.utils.printExpected(x)}${ +msg}${ +this.utils.printReceived(y) +}`; + +let getFailures = new Array(); +let propsFailures = new Array(); +let iteratorFailures = new Array(); +let allFailures = [ +{ title: 'get', failures: getFailures }, +{ title: 'props', failures: propsFailures }, +{ title: 'iterator', failures: iteratorFailures } +]; + +let props = ['name', 'type', 'length', 'nullable', 'nullCount', 'metadata']; +for (let i = -1, n = props.length; ++i < n;) { +const prop = props[i]; +if (this.utils.stringify(v1[prop]) !== this.utils.stringify(v2[prop])) { +propsFailures.push(`${prop}: ${format(v1[prop], v2[prop], ' !== ')}`); +} +} + +for (let i = -1, n = v1.length; ++i < n;) { +let x1 = v1.get(i), x2 = v2.get(i); +if (this.utils.stringify(x1) !== this.utils.stringify(x2)) { +getFailures.push(`${i}: ${format(x1, x2, ' !== ')}`); +} +} + +let i = -1; +for (let [x1, x2] of zip(v1, v2)) { +++i; +if (this.utils.stringify(x1) !== this.utils.stringify(x2)) { +iteratorFailures.push(`${i}: ${format(x1, x2, ' !== ')}`); +} +} + +return { +pass: allFailures.every(({ failures }) => failures.length === 0), +message: () => [ +`${v1.name}: (${format('cpp', 'java', ' !== ')})\n`, +...allFailures.map(({ failures, title }) => +!failures.length ? 
`` : [`${title}:`, ...failures].join(`\n`)) +].join('\n') +}; +} +}); + +describe(`Integration`, () => { +for (const format of formats) { +describe(format, () => { +for (const [cppArrow, javaArrow] of zip(config.cpp[format], config.java[format])) { +describe(`${cppArrow.name}`, () => { +testReaderIntegration(cppArrow.buffers, javaArrow.buffers); +testTableFromBuffersIntegration(cppArrow.buffers, javaArrow.buffers); +}); +} +}); +} +}); + +function testReaderIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) { +test(`cpp and java vectors report the same values`, () => { +expect.hasAssertions(); +for (const [cppVectors, javaVectors] of zip(readVectors(cppBuffers), readVectors(javaBuffers))) { +expect(cppVectors.length).toEqual(javaVectors.length); +for (let i = -1, n = cppVectors.length; ++i < n;) { +(expect(cppVectors[i]) as any).toEqualVector(javaVectors[i]); +} +} +}); +} + +function testTableFromBuffersIntegration(cppBuffers: Uint8Array[], javaBuffers: Uint8Array[]) { +test(`cpp and java tables report the same values`, () => { +expect.hasAssertions(); +const cppTable = Table.from(cppBuffers); +const javaTable = Table.from(javaBuffers); +const cppVectors = cppTable.columns; +const javaVectors = javaTable.columns; +expect(cppTable.length).toEqual(javaTable.length); +expect(cppVectors.length).toEqual(javaVectors.length); +for (let i = -1, n = cppVectors.length; ++i <
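The element-wise comparison the `toEqualVector` matcher performs can be illustrated with a standalone helper. This is a simplified sketch — the real tests use `zip` from ix and Jest's `printExpected`/`printReceived`, neither of which is reproduced here:

```typescript
// A minimal zip over two iterables, similar in spirit to ix's zip()
// used in the quoted integration tests.
function* zip<A, B>(xs: Iterable<A>, ys: Iterable<B>): Generator<[A, B]> {
  const ix = xs[Symbol.iterator](), iy = ys[Symbol.iterator]();
  while (true) {
    const x = ix.next(), y = iy.next();
    if (x.done || y.done) return;
    yield [x.value, y.value];
  }
}

// Collect "index: left !== right" mismatch strings between two
// sequences, the way the matcher accumulates iteratorFailures.
function diff(a: Iterable<number>, b: Iterable<number>): string[] {
  const failures: string[] = [];
  let i = -1;
  for (const [x, y] of zip(a, b)) {
    ++i;
    if (JSON.stringify(x) !== JSON.stringify(y)) {
      failures.push(`${i}: ${x} !== ${y}`);
    }
  }
  return failures;
}

console.log(diff([1, 2, 3], [1, 9, 3])); // [ '1: 2 !== 9' ]
```

The matcher then passes only when every failure list (props, get, iterator) is empty, so a single mismatched element in either access path fails the test with its index attached.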
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249120#comment-16249120 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150420450 ## File path: js/npm-release.sh ##
{code}
@@ -17,10 +17,7 @@
 # specific language governing permissions and limitations
 # under the License.
-npm run clean
-npm run lint
-npm run build
-npm run test
-npm --no-git-tag-version version patch &>/dev/null
-npm run bundle
-npm run lerna:publish
\ No newline at end of file
+bump=${1:-patch} && echo "semantic-version bump: $bump"
+
+run-s --silent lint build test
+lerna publish --yes --skip-git --cd-version $bump --force-publish=*
{code}
Review comment: Aside: what we're going to want to do for ASF release purposes: one script to produce a tarball of the JS project that is sufficient to publish to NPM afterward, then a script in the tarball that can publish the project to NPM. So the ASF signed artifact that we upload to SVN will have everything that's needed to publish the project to NPM.
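The two-step flow wesm describes — sign an npm tarball as the ASF artifact, then publish those exact bytes later — relies on `npm pack` producing a predictably named tarball. A small sketch of that naming rule; `tarballName` is a hypothetical helper and the package names below are examples, not the actual Arrow JS package layout:

```typescript
// Compute the tarball filename `npm pack` produces for a package.
// npm flattens scoped names: @scope/pkg becomes scope-pkg-<version>.tgz.
function tarballName(pkgName: string, version: string): string {
  const flat = pkgName.replace(/^@/, '').replace(/\//g, '-');
  return `${flat}-${version}.tgz`;
}

console.log(tarballName('apache-arrow', '0.2.0'));         // apache-arrow-0.2.0.tgz
console.log(tarballName('@apache-arrow/esnext', '0.2.0')); // apache-arrow-esnext-0.2.0.tgz

// The surrounding release flow would then be roughly:
//   npm run build && npm pack            -> produce the tarball
//   gpg --armor --detach-sign <tgz>      -> sign for the ASF dist SVN
//   npm publish <tgz>                    -> later, publish the same bytes
```

Publishing the packed tarball (rather than re-running `npm publish` from a checkout) guarantees the bytes reviewers signed are the bytes that land on the registry.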
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249119#comment-16249119 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150448976 ## File path: js/src/reader/arrow.ts ## @@ -15,64 +15,135 @@ // specific language governing permissions and limitations // under the License. +import { Vector } from '../vector/vector'; import { flatbuffers } from 'flatbuffers'; +import { readVector, readValueVector } from './vector'; +import { +readFileFooter, readFileMessages, +readStreamSchema, readStreamMessages +} from './format'; + +import * as File_ from '../format/File_generated'; import * as Schema_ from '../format/Schema_generated'; import * as Message_ from '../format/Message_generated'; -export import Schema = Schema_.org.apache.arrow.flatbuf.Schema; -export import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch; - -import { readFile } from './file'; -import { readStream } from './stream'; -import { readVector } from './vector'; -import { readDictionary } from './dictionary'; -import { Vector, Column } from '../types/types'; import ByteBuffer = flatbuffers.ByteBuffer; +import Footer = File_.org.apache.arrow.flatbuf.Footer; import Field = Schema_.org.apache.arrow.flatbuf.Field; -export type Dictionaries = { [k: string]: Vector } | null; -export type IteratorState = { nodeIndex: number; bufferIndex: number }; - -export function* readRecords(...bytes: ByteBuffer[]) { -try { -yield* readFile(...bytes); -} catch (e) { -try { -yield* readStream(...bytes); -} catch (e) { -throw new Error('Invalid Arrow buffer'); -} +import Schema = Schema_.org.apache.arrow.flatbuf.Schema; +import Message = Message_.org.apache.arrow.flatbuf.Message; +import RecordBatch = Message_.org.apache.arrow.flatbuf.RecordBatch; +import MessageHeader = 
Message_.org.apache.arrow.flatbuf.MessageHeader; +import DictionaryBatch = Message_.org.apache.arrow.flatbuf.DictionaryBatch; +import DictionaryEncoding = Schema_.org.apache.arrow.flatbuf.DictionaryEncoding; + +export type ArrowReaderContext = { +schema?: Schema; +footer?: Footer | null; +dictionaries: Map; +dictionaryEncodedFields: Map ; +readMessages: (bb: ByteBuffer, footer: Footer) => Iterable; +}; + +export type VectorReaderContext = { +node: number; +buffer: number; +offset: number; +bytes: Uint8Array; +batch: RecordBatch; +dictionaries: Map ; +}; + +export function* readVectors(buffers: Iterable, context?: ArrowReaderContext) { +const context_ = context || {} as ArrowReaderContext; +for (const buffer of buffers) { +yield* readBuffer(toByteBuffer(buffer), context_); } } -export function* readBuffers(...bytes: Array) { -const dictionaries: Dictionaries = {}; -const byteBuffers = bytes.map(toByteBuffer); -for (let { schema, batch } of readRecords(...byteBuffers)) { -let vectors: Column[] = []; -let state = { nodeIndex: 0, bufferIndex: 0 }; -let fieldsLength = schema.fieldsLength(); -let index = -1, field: Field, vector: Vector; -if (batch.id) { -// A dictionary batch only contain a single vector. Traverse each -// field and its children until we find one that uses this dictionary -while (++index < fieldsLength) { -if (field = schema.fields(index)!) { -if (vector = readDictionary(field, batch, state, dictionaries)!) { -dictionaries[batch.id] = dictionaries[batch.id] && dictionaries[batch.id].concat(vector) || vector; -break; -} +export async function* readVectorsAsync(buffers: AsyncIterable, context?: ArrowReaderContext) { +const context_ = context || {} as ArrowReaderContext; +for await (const buffer of buffers) { +yield* readBuffer(toByteBuffer(buffer), context_); +} +} + +function* readBuffer(bb: ByteBuffer, readerContext: ArrowReaderContext) { Review comment: What do you recommend as a resource for getting up to speed on TypeScript? 
Where is the line between TypeScript and ES6?
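On the TypeScript-vs-ES6 question: the generator code in this diff is plain ES2015 at runtime, and TypeScript contributes only erasable type annotations. A minimal illustration — `byteLengths` is an invented example, not Arrow code:

```typescript
// The generator below is plain ES2015 at runtime; only the annotations
// (Iterable<Uint8Array>, Generator<number>) are TypeScript, and they
// disappear entirely when the code is compiled.
function* byteLengths(buffers: Iterable<Uint8Array>): Generator<number> {
  for (const b of buffers) {
    yield b.byteLength; // identical behavior in the emitted JavaScript
  }
}

const lengths = [...byteLengths([new Uint8Array(4), new Uint8Array(8)])];
console.log(lengths); // [ 4, 8 ]
```

The rough dividing line: syntax that survives compilation (generators, `for..of`, destructuring, classes) is ES2015+; anything that vanishes from the output (type annotations, interfaces, `import x = ...` aliases, generics) is TypeScript.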
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249122#comment-16249122 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150420366 ## File path: js/gulp/test-task.js ## @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function module.exports = testTask; module.exports.testTask = testTask; +module.exports.cleanTestData = cleanTestData; +module.exports.createTestData = createTestData; + +async function cleanTestData() { +return await del([ +`${path.resolve('./test/arrows/cpp')}/**`, +`${path.resolve('./test/arrows/java')}/**`, +]); +} + +async function createTestData() { +const base = path.resolve('./test/arrows'); +await mkdirp(path.join(base, 'cpp/file')); +await mkdirp(path.join(base, 'java/file')); +await mkdirp(path.join(base, 'cpp/stream')); +await mkdirp(path.join(base, 'java/stream')); +const errors = []; +const names = await glob(path.join(base, 'json/*.json')); +for (let jsonPath of names) { +const name = path.parse(path.basename(jsonPath)).name; +const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`); +const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`); +const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`); +const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`); +try { +await generateCPPFile(jsonPath, arrowCppFilePath); +await generateCPPStream(arrowCppFilePath, arrowCppStreamPath); +} catch (e) { errors.push(e.message); } +try { +await generateJavaFile(jsonPath, arrowJavaFilePath); +await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath); +} catch (e) { errors.push(e.message); } +} +if (errors.length) { +console.error(errors.join(`\n`)); +process.exit(1); +} +} + +async function 
generateCPPFile(jsonPath, filePath) { +await rimraf(filePath); +return await exec( +`../cpp/build/release/json-integration-test ${ +`--integration --mode=JSON_TO_ARROW`} ${ +`--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateCPPStream(filePath, streamPath) { +await rimraf(streamPath); +return await exec( +`../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`, Review comment: Simple way to make this file path more easily configurable / less hard-coded?
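One common answer to the configurability question above is to read the tool directory from the environment with the current path as the default. A standalone sketch — `ARROW_CPP_BIN` is an assumed variable name for illustration, not an actual Arrow build setting:

```typescript
// Hypothetical: derive the C++ tool location from an environment map
// instead of hard-coding ../cpp/build/release in each command string.
function cppTool(
  name: string,
  env: Record<string, string | undefined> = {}
): string {
  // Fall back to the path the gulp task currently hard-codes.
  const bin = env['ARROW_CPP_BIN'] ?? '../cpp/build/release';
  return `${bin}/${name}`;
}

console.log(cppTool('file-to-stream'));
// -> ../cpp/build/release/file-to-stream
console.log(cppTool('json-integration-test', { ARROW_CPP_BIN: '/opt/arrow/bin' }));
// -> /opt/arrow/bin/json-integration-test
```

In the gulp task this would let a CI job point at a debug build (or a build tree outside the repo) without editing the script.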
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249121#comment-16249121 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150450626 ## File path: js/test/arrows/json/datetime.json ## @@ -0,0 +1,1091 @@ +{ Review comment: I'm not sure about checking in all these .json and .arrow files, is there some way we can automate their generation as part of the integration testing? Then they don't have to be modified when we expand the integration test suite.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249118#comment-16249118 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on a change in pull request #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#discussion_r150420344 ## File path: js/gulp/test-task.js ## @@ -42,3 +54,78 @@ const testTask = ((cache, execArgv, testOptions) => memoizeTask(cache, function module.exports = testTask; module.exports.testTask = testTask; +module.exports.cleanTestData = cleanTestData; +module.exports.createTestData = createTestData; + +async function cleanTestData() { +return await del([ +`${path.resolve('./test/arrows/cpp')}/**`, +`${path.resolve('./test/arrows/java')}/**`, +]); +} + +async function createTestData() { +const base = path.resolve('./test/arrows'); +await mkdirp(path.join(base, 'cpp/file')); +await mkdirp(path.join(base, 'java/file')); +await mkdirp(path.join(base, 'cpp/stream')); +await mkdirp(path.join(base, 'java/stream')); +const errors = []; +const names = await glob(path.join(base, 'json/*.json')); +for (let jsonPath of names) { +const name = path.parse(path.basename(jsonPath)).name; +const arrowCppFilePath = path.join(base, 'cpp/file', `${name}.arrow`); +const arrowJavaFilePath = path.join(base, 'java/file', `${name}.arrow`); +const arrowCppStreamPath = path.join(base, 'cpp/stream', `${name}.arrow`); +const arrowJavaStreamPath = path.join(base, 'java/stream', `${name}.arrow`); +try { +await generateCPPFile(jsonPath, arrowCppFilePath); +await generateCPPStream(arrowCppFilePath, arrowCppStreamPath); +} catch (e) { errors.push(e.message); } +try { +await generateJavaFile(jsonPath, arrowJavaFilePath); +await generateJavaStream(arrowJavaFilePath, arrowJavaStreamPath); +} catch (e) { errors.push(e.message); } +} +if (errors.length) { +console.error(errors.join(`\n`)); +process.exit(1); +} +} + +async function 
generateCPPFile(jsonPath, filePath) { +await rimraf(filePath); +return await exec( +`../cpp/build/release/json-integration-test ${ +`--integration --mode=JSON_TO_ARROW`} ${ +`--json=${path.resolve(jsonPath)} --arrow=${filePath}`}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateCPPStream(filePath, streamPath) { +await rimraf(streamPath); +return await exec( +`../cpp/build/release/file-to-stream ${filePath} > ${streamPath}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateJavaFile(jsonPath, filePath) { +await rimraf(filePath); +return await exec( +`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${ +`org.apache.arrow.tools.Integration -c JSON_TO_ARROW`} ${ +`-j ${path.resolve(jsonPath)} -a ${filePath}`}`, +{ maxBuffer: Math.pow(2, 53) - 1 } +); +} + +async function generateJavaStream(filePath, streamPath) { +await rimraf(streamPath); +return await exec( +`java -cp ../java/tools/target/arrow-tools-0.8.0-SNAPSHOT-jar-with-dependencies.jar ${ Review comment: Can this version number here be gotten from the environment / pom file per chance?
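Answering wesm's question in sketch form: the jar version could come from an environment variable, with a naive fallback that scrapes the pom. Everything below is hypothetical — `ARROW_JAVA_VERSION` and `versionFromPom` are invented names, and a real build would ask Maven for the effective version rather than regex-matching XML:

```typescript
// Build the arrow-tools jar path from a configurable version instead
// of hard-coding 0.8.0-SNAPSHOT in every exec() string.
function arrowToolsJar(env: Record<string, string | undefined> = {}): string {
  const version = env['ARROW_JAVA_VERSION'] ?? '0.8.0-SNAPSHOT';
  return `../java/tools/target/arrow-tools-${version}-jar-with-dependencies.jar`;
}

// Naive pom fallback: pull the first <version> element out of pom.xml
// text. Note the first <version> in a pom is often the parent's, so
// this is only a sketch of the idea.
function versionFromPom(pomXml: string): string | undefined {
  const m = /<version>([^<]+)<\/version>/.exec(pomXml);
  return m ? m[1] : undefined;
}

console.log(arrowToolsJar({ ARROW_JAVA_VERSION: '0.9.0' }));
// -> ../java/tools/target/arrow-tools-0.9.0-jar-with-dependencies.jar
```

Either source would remove the need to bump the gulp task by hand on every Java version change.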
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248969#comment-16248969 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294#issuecomment-343760757 awesome, thanks @trxcllnt! I'm going to work through the patch to leave any comments that jump out but this is really exciting
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248921#comment-16248921 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors (WIP) URL: https://github.com/apache/arrow/pull/1294#issuecomment-343752026 @wesm I believe this branch is ready. We should revalidate datetime and dictionary once we can get Pandas to generate a CSV from the test data. The [integration tests](https://github.com/trxcllnt/arrow/blob/64e318cec96345d71c4c1f08e028a14b5dd3dd3d/js/test/integration-tests.ts#L81) are finally passing, so I feel good about this one:
```
Test Suites: 4 passed, 4 total
Tests:       404 passed, 404 total
Snapshots:   113940 passed, 113940 total
```
The last commit re-enables the node_js job in Travis so we can verify the above.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246518#comment-16246518 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors (WIP) URL: https://github.com/apache/arrow/pull/1294#issuecomment-343290645 No problem, feel free to keep adding here. I will also take a little time to review.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246481#comment-16246481 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors (WIP) URL: https://github.com/apache/arrow/pull/1294#issuecomment-343284914 @wesm Yep, working on this tonight. Do you mind if I add to this PR versus starting a new branch?
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246435#comment-16246435 ] ASF GitHub Bot commented on ARROW-1693: --- wesm commented on issue #1294: ARROW-1693: [JS] Fix reading C++ dictionary-encoded vectors (WIP) URL: https://github.com/apache/arrow/pull/1294#issuecomment-343277677 Want to go ahead and ignore the vector layout metadata, per ARROW-1785? I will wait a day or so for further feedback to circulate, then proceed with removing this metadata. We'll need to update the Flatbuffers files in JS again as part of this. I will give you write access on my fork so you can push directly to the PR branch as needed.
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245180#comment-16245180 ] ASF GitHub Bot commented on ARROW-1693: --- trxcllnt opened a new pull request #1294: WIP ARROW-1693: Fix reading C++ dictionary-encoded vectors URL: https://github.com/apache/arrow/pull/1294 This PR adds a workaround for reading the metadata layout of C++ dictionary-encoded vectors. I added tests that validate against the C++/Java integration suite. To make the new tests pass, I had to update the generated flatbuffers format and add a few types the JS version didn't have yet (Bool, Date32, and Timestamp). It also uses the new `isDelta` flag on DictionaryBatches to determine whether a DictionaryBatch vector should replace or append to the existing dictionary. I also added a script for generating test arrow files from the C++ and Java implementations, so we don't break the tests when updating the format in the future. I saved the generated Arrow files alongside the tests because I didn't see a way to pipe the JSON test data through the C++/Java json-to-arrow commands without writing to a file. If I missed something and we can do it all in-memory, I'd be happy to make that change! This PR is marked WIP because I added an [integration test](https://github.com/apache/arrow/commit/6e98874d9f4bfae7758f8f731212ae7ceb3f1321#diff-18c6be12406c482092d4b1f7bd70a8e1R22) that validates that the JS reader reads C++ and Java files the same way, but unfortunately it doesn't. Debugging, I noticed a number of other differences in the buffer layout metadata between the C++ and Java versions. If we go ahead with @jacques-n's [comment in ARROW-1693](https://issues.apache.org/jira/browse/ARROW-1693?focusedCommentId=16244812=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16244812) and remove/ignore the metadata, this test should pass too.
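The replace-versus-append behavior of the `isDelta` flag described in the PR can be sketched as follows. This is a hypothetical helper for illustration, not the actual reader code; `dictionaries`, `id`, and `values` stand in for the reader's dictionary map, the dictionary id from the message, and the decoded batch values.

```javascript
// Sketch of delta-dictionary handling: a batch with isDelta set appends
// its values to the existing dictionary for that id; a non-delta batch
// replaces it. Mirrors the behavior the PR describes, not a real API.
function applyDictionaryBatch(dictionaries, id, values, isDelta) {
  const existing = dictionaries.get(id);
  if (isDelta && existing) {
    // Delta batch: concatenate the new values onto the current dictionary
    dictionaries.set(id, existing.concat(values));
  } else {
    // Replacement batch (or the first batch seen for this id)
    dictionaries.set(id, values);
  }
  return dictionaries.get(id);
}

const dicts = new Map();
applyDictionaryBatch(dicts, 0, ['a', 'b'], false); // initial dictionary
applyDictionaryBatch(dicts, 0, ['c'], true);       // delta: ['a', 'b', 'c']
applyDictionaryBatch(dicts, 0, ['x'], false);      // replacement: ['x']
```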
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244812#comment-16244812 ] Jacques Nadeau commented on ARROW-1693: --- I actually think having the vector layout was a mistake, and we should remove it. It is a constant defined by the spec. We implemented an alternative representation internally that skips it, because we don't want to send around information that is useless (and can be fairly substantial for five-record, several-thousand-field datasets).
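Jacques's point that the layout is a constant fixed by the spec can be illustrated with a small lookup table. The layouts below follow the Arrow columnar spec (validity bitmap first, then offsets for variable-width types, then data); the table itself is an illustrative subset, not code from the repository.

```javascript
// Per the Arrow spec, each physical type has a fixed buffer layout, so a
// reader can derive it from the type instead of trusting (or requiring)
// layout metadata in the message. Illustrative subset:
const BUFFER_LAYOUTS = {
  int:    ['validity', 'data'],
  float:  ['validity', 'data'],
  bool:   ['validity', 'data'],
  utf8:   ['validity', 'offsets', 'data'], // variable-length strings
  list:   ['validity', 'offsets'],         // a child vector holds the values
  struct: ['validity'],                    // child vectors hold the fields
};

function expectedBuffers(typeName) {
  const layout = BUFFER_LAYOUTS[typeName];
  if (!layout) throw new Error(`unknown type: ${typeName}`);
  return layout;
}
```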
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236908#comment-16236908 ] Wes McKinney commented on ARROW-1693: --- See ARROW-1362
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236902#comment-16236902 ] Wes McKinney commented on ARROW-1693: --- I think the idea of including the buffer layouts was, hypothetically, to permit an implementation to, say, omit the validity bitmap buffer without consequences. In practice, both the Java and C++ implementations presume they are sent the same buffer layout that they emit -- i.e. with the buffers in the same order (so in the case of strings: validity, offsets, then data). But validating our presumptions is useful. So what we probably need to do is implement buffer layout validation in both Java and C++, so that we can assert that a sender has prepared the buffers in a way that is supported. I was wrong to call the JS implementation "brittle" in this regard; really, its more rigorous checking exposed bugs in the other implementations.
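The validation step proposed above -- asserting that a sender's declared buffers match the layout the reader supports -- might look roughly like this. The helper and its arguments are hypothetical, chosen for illustration; `supportedLayouts` is a type-to-layout table like the one fixed by the spec.

```javascript
// Sketch: compare the buffer layout a sender declared for a field against
// the layout this reader supports for that type. Failing fast here surfaces
// layout mismatches (like a missing offsets buffer on a UTF8 vector) as a
// clear error instead of a late "Cannot read property 'buffer' of undefined".
function validateLayout(typeName, declaredBuffers, supportedLayouts) {
  const expected = supportedLayouts[typeName];
  if (!expected) throw new Error(`unsupported type: ${typeName}`);
  const matches = expected.length === declaredBuffers.length &&
                  expected.every((name, i) => name === declaredBuffers[i]);
  if (!matches) {
    throw new Error(
      `layout mismatch for ${typeName}: expected [${expected}], got [${declaredBuffers}]`);
  }
}
```

For example, with `{ utf8: ['validity', 'offsets', 'data'] }` as the supported table, a declared layout of `['validity', 'data']` would throw -- which is exactly the shape of the dictionary-batch failure discussed in this issue.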
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236855#comment-16236855 ] Paul Taylor commented on ARROW-1693: --- [~wesmckinn] Digging into this now. Yeah, it looks like the DictionaryBatch UTF8Vector fieldNodes don't include the offsets buffer. Sounds like I should get those integration tests up and running. I want to offer some pushback on your comment about brittleness, though. Maybe I'm alone on this, but it seems like a cross-platform IPC format should strictly enforce its own spec -- anything less, and we end up with a bunch of maybe-compatible implementations, right?
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212893#comment-16212893 ] Wes McKinney commented on ARROW-1693: --- This was fixed in the C++ implementation in ARROW-1363: https://github.com/apache/arrow/commit/0ced74e1e39587c0ee10ac5979fefbaac97446f5#diff-3ea143b7ffb13757e558952ab1a4e60b
[jira] [Commented] (ARROW-1693) [JS] Error reading dictionary-encoded integration test files
[ https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212697#comment-16212697 ] Brian Hulette commented on ARROW-1693: --- Pretty sure this is related to the vector layout representing the index vs. the dictionary data.