[jira] [Commented] (ARROW-6370) [JS] Table.from adds 0 on int columns

2019-09-12 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928878#comment-16928878
 ] 

Sascha Hofmann commented on ARROW-6370:
---

Alright! Thank you so much for explaining that and for your patience :D

> [JS] Table.from adds 0 on int columns
> -
>
> Key: ARROW-6370
> URL: https://issues.apache.org/jira/browse/ARROW-6370
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Affects Versions: 0.14.1
> Reporter: Sascha Hofmann
> Priority: Major
>
> I am generating an Arrow table in pyarrow and sending it via gRPC like this:
> {code:python}
> import pyarrow as pa
>
> # Runs inside a gRPC streaming handler: `batch` is a pyarrow.RecordBatch and
> # `ds` is presumably the project's generated protobuf module (not shown).
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> writer.write_batch(batch)
> writer.close()
> yield ds.Response(
>     status=200,
>     loading=False,
>     response=[sink.getvalue().to_pybytes()],
> )
> {code}
> On the JavaScript end, I parse it like this:
> {code:javascript}
> const { Table } = require('apache-arrow');
> Table.from(response.getResponseList()[0])
> {code}
> That works, but when I look at the actual table, int columns have a 0 in
> every other row. String columns seem to be parsed just fine.
> The Python byte array created by to_pybytes() has the same length as the one
> received in JavaScript. I am also able to recreate the original table from the
> byte array in Python.
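A minimal sketch that reproduces the "0 in every other row" pattern without Arrow or gRPC, assuming a JS engine with BigInt64Array (Node.js 10.4 or later) on a little-endian machine: it stores three int64 values the way an Arrow int64 buffer lays them out (8 bytes each, little-endian) and then reads the same bytes as 32-bit integers.

{code:javascript}
// Three int64 values, 8 bytes each, in the platform's (little-endian) byte order.
const int64s = new BigInt64Array([1n, 2n, 3n]);

// Viewing the same bytes as 32-bit integers splits every value into a (lo, hi) pair.
const asInt32 = new Int32Array(int64s.buffer);

console.log(Array.from(asInt32)); // [ 1, 0, 2, 0, 3, 0 ]
{code}

Each zero is the high half of a small non-negative 64-bit value, not an extra row.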



--
This message was sent by Atlassian Jira
(v8.3.2#803003)



2019-09-12 Thread Paul Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928851#comment-16928851
 ] 

Paul Taylor commented on ARROW-6370:


[~saschahofmann]

bq. From my understanding of Arrow it should be a platform-independent format, 
meaning that if I am sending an arrow table from Python to JS it should turn 
out the same, right?

Yes, and that's what's happening here. But you're sending 8-byte integers to a 
platform which has historically only supported 4-byte integers, which is why 
you see each 8-byte integer as a pair of 4-byte integers.

I recommend reading [this post|https://v8.dev/features/bigint] on BigInts in 
the v8 blog.

BigInts (and their related typed arrays) are relatively new additions to JS, 
and aren't supported in all engines yet.

We have done our best to support getting and setting BigInt values when running 
in a VM that supports them, but for now we still have to support platforms 
without BigInt. That's why the values array for Int64Vector is a stride-2 
Int32Array instead of a BigInt64Array.
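To illustrate why a stride-2 Int32Array works everywhere, here is a sketch of writing a plain JS number into slot i of such an array using only number arithmetic, so no BigInt is needed. setInt64FromNumber is a hypothetical helper, not part of Arrow's API; it assumes the value is a safe integer (at most 2^53 - 1 in magnitude) and stores the low word first, as described elsewhere in this thread.

{code:javascript}
const TWO_32 = 2 ** 32;

// Hypothetical helper: store a safe integer as the (lo, hi) 32-bit words of an
// int64 in slot i of a stride-2 Int32Array. Uses only Number arithmetic.
function setInt64FromNumber(values, i, n) {
  if (!Number.isSafeInteger(n)) {
    throw new RangeError(`${n} is not a safe integer`);
  }
  const hi = Math.floor(n / TWO_32); // signed high word (floor handles negatives)
  const lo = n - hi * TWO_32;        // unsigned low word in [0, 2^32)
  values[2 * i] = lo | 0;            // reinterpret the low word as signed 32-bit
  values[2 * i + 1] = hi;
}

const values = new Int32Array(2 * 3); // room for three int64 values
setInt64FromNumber(values, 0, 42);
setInt64FromNumber(values, 1, -1);
setInt64FromNumber(values, 2, 2 ** 40 + 7);
console.log(Array.from(values)); // [ 42, 0, -1, -1, 7, 256 ]
{code}

On an engine that does have BigInt64Array, viewing the same buffer through it should read back 42n, -1n and 2^40 + 7.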


2019-09-12 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928421#comment-16928421
 ] 

Sascha Hofmann commented on ARROW-6370:
---

OK, I discovered that I can retrieve the right values via
{code:javascript}
col.chunks[0].values64
{code}
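A BigInt64Array view like this can be obtained without copying by reinterpreting the bytes behind the stride-2 Int32Array. The sketch below only illustrates that idea and is not Arrow's implementation of values64; toBigInt64View is a hypothetical name. It honours byteOffset, since a vector's values may be a view into a larger message buffer, and copies when the offset is not 8-byte aligned, because the BigInt64Array constructor rejects misaligned offsets.

{code:javascript}
// Hypothetical helper: view a stride-2 Int32Array of (lo, hi) pairs as a
// BigInt64Array. Assumes a little-endian platform, matching Arrow's byte order.
function toBigInt64View(int32Pairs) {
  if (int32Pairs.length % 2 !== 0) {
    throw new RangeError("expected an even number of 32-bit words");
  }
  if (int32Pairs.byteOffset % 8 !== 0) {
    // BigInt64Array needs 8-byte alignment; fall back to a copy.
    return new BigInt64Array(int32Pairs.slice().buffer);
  }
  // Zero-copy: same bytes, read 8 at a time.
  return new BigInt64Array(int32Pairs.buffer, int32Pairs.byteOffset, int32Pairs.length / 2);
}

// Simulate a vector whose values start 8 bytes into a larger buffer.
const message = new ArrayBuffer(8 + 3 * 8);
const pairs = new Int32Array(message, 8, 6);
pairs.set([123, 0, 456, 0, -789, -1]);

console.log(Array.from(toBigInt64View(pairs))); // [ 123n, 456n, -789n ]
{code}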


2019-09-11 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927830#comment-16927830
 ] 

Sascha Hofmann commented on ARROW-6370:
---

From my understanding of Arrow, it should be a platform-independent format, 
meaning that if I send an Arrow table from Python to JS it should come out the 
same, right?

Anyhow, I am trying to write a parser function that translates Int columns to 
Int64. I couldn't find much in the docs about casting a Vector to a different 
type. Could you maybe point me in the right direction, [~paul.e.taylor]?
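One possible approach, sketched under the assumption that you can get at the column's stride-2 Int32Array of (lo, hi) words (the Int64Vector values array is described that way elsewhere in this thread): convert each pair into a regular JS number, which is exact as long as the values stay within 2^53 - 1 in magnitude. int64PairsToFloat64 is a hypothetical helper, not an Arrow API, and it throws rather than silently losing precision.

{code:javascript}
const TWO_32 = 2 ** 32;

// Hypothetical helper: convert little-endian (lo, hi) int64 pairs to JS numbers.
// Throws if a value cannot be represented exactly as a double.
function int64PairsToFloat64(pairs) {
  const out = new Float64Array(pairs.length / 2);
  for (let i = 0; i < out.length; i++) {
    const lo = pairs[2 * i] >>> 0; // low word, treated as unsigned
    const hi = pairs[2 * i + 1];   // high word, signed
    const n = hi * TWO_32 + lo;
    if (!Number.isSafeInteger(n)) {
      throw new RangeError(`value at row ${i} exceeds 53 bits of precision`);
    }
    out[i] = n;
  }
  return out;
}

console.log(Array.from(int64PairsToFloat64(new Int32Array([1, 0, 2, 0, -3, -1]))));
// [ 1, 2, -3 ]
{code}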

 


2019-09-10 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926425#comment-16926425
 ] 

Sascha Hofmann commented on ARROW-6370:
---

OK, gotcha. Our problem is that we are sending a whole table buffer. Is there a 
way to pass the table schema to Table.from(buffer)? Or do we need to check for 
int32 columns and cast them to int64?
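Whether an int64 column could be narrowed to int32 without loss can be checked on the (lo, hi) words directly: a value fits exactly when its high word is just the sign extension of its low word (0 for non-negative values, -1 for negative ones). A sketch with the hypothetical helper fitsInInt32, operating on a stride-2 Int32Array:

{code:javascript}
// Hypothetical helper: true if every int64 (lo, hi) pair in the column can be
// represented as an int32 without loss.
function fitsInInt32(pairs) {
  for (let i = 0; i < pairs.length; i += 2) {
    const lo = pairs[i];
    const hi = pairs[i + 1];
    // lo >> 31 is 0 for a non-negative low word and -1 for a negative one,
    // so this checks that the high word carries nothing but the sign.
    if (hi !== (lo >> 31)) return false;
  }
  return true;
}

console.log(fitsInInt32(new Int32Array([7, 0, -5, -1]))); // true
console.log(fitsInInt32(new Int32Array([-5, 0])));         // false: this pair is 2^32 - 5
{code}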


2019-09-09 Thread Paul Taylor (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925928#comment-16925928
 ] 

Paul Taylor commented on ARROW-6370:


[~saschahofmann] I closed this because this is working as intended.

64-bit little-endian numbers are represented as pairs of (lo, hi) two's-complement 
32-bit integers. If your values are non-negative and fit in 32 bits, the high 
words will be zero. We're not inserting zeros; the zeros are part of the data 
Python is sending to JavaScript.
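To make the layout concrete, the sketch below (assuming an engine with BigInt and DataView.prototype.setBigInt64, e.g. Node.js 10.4 or later) writes a single int64 in little-endian order and reads back its two 32-bit words. The high word is 0 for values in [0, 2^32) but -1 for small negative values, so zeros are only expected for non-negative data. loHiWords is a hypothetical helper.

{code:javascript}
// Hypothetical helper: split an int64 into the two 32-bit words of its
// little-endian encoding (low word first).
function loHiWords(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigInt64(0, value, /* littleEndian */ true);
  return [view.getInt32(0, true), view.getInt32(4, true)];
}

console.log(loHiWords(123n));      // [ 123, 0 ]
console.log(loHiWords(-123n));     // [ -123, -1 ]
console.log(loHiWords(2n ** 32n)); // [ 0, 1 ]
{code}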

The Int64Vector and Uint64Vector support implicitly casting either to a normal 
JS 64-bit float (with 53 bits of precision) if you can afford to lose 
precision, or to JS's new 
[BigInt|https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt]
 type if you need the full 64 bits of precision and are on a platform that 
supports BigInt (V8 and the newest versions of Firefox).

{code:javascript}
const { Int64, Vector } = require('apache-arrow');
let i64s = Vector.from({ type: new Int64(), values: [123n, 456n, 789n] });
for (let x of i64s) {
  console.log(x); // will be an Int32Array of two numbers: lo, hi
  console.log(0 + x); // casts to a 53-bit integer, i.e. regular JS float64
  console.log(0n + x); // casts to a BigInt, i.e. JS's new 64-bit integer
}
{code}
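The library performs these conversions for you. For reference, the exact BigInt conversion can also be written by hand on a raw (lo, hi) pair; pairToBigInt is a hypothetical helper and requires an engine with BigInt support.

{code:javascript}
// Hypothetical helper: rebuild a signed 64-bit BigInt from its 32-bit words.
function pairToBigInt(lo, hi) {
  // The low word contributes its unsigned value and the high word is shifted
  // up 32 bits. BigInt.asIntN also normalizes a high word passed as unsigned.
  return BigInt.asIntN(64, (BigInt(hi) << 32n) + BigInt(lo >>> 0));
}

console.log(pairToBigInt(123, 0));         // 123n
console.log(pairToBigInt(-123, -1));       // -123n
console.log(pairToBigInt(-1, 0x7fffffff)); // 9223372036854775807n
{code}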



2019-09-04 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922883#comment-16922883
 ] 

Sascha Hofmann commented on ARROW-6370:
---

I still believe this is a bug. Isn't converting a column that is originally 
int64 only a temporary solution?


2019-08-28 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917896#comment-16917896
 ] 

Sascha Hofmann commented on ARROW-6370:
---

Indeed! Converting it to int32() in Python solved the issue of the added 0s.


2019-08-28 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917864#comment-16917864
 ] 

Sascha Hofmann commented on ARROW-6370:
---

Cool, thank you! I'm using the latest, so v10.16.3.


2019-08-28 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917857#comment-16917857
 ] 

Brian Hulette commented on ARROW-6370:
--

Yeah, I suspect converting to int32 will solve your problem. But this is still a 
bug, so I'll see if I can reproduce it :)
What version of Node are you using?


2019-08-28 Thread Sascha Hofmann (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917852#comment-16917852
 ] 

Sascha Hofmann commented on ARROW-6370:
---

Yes, the column is int64.

To create the RecordBatch in Python, I read a pyarrow table from a Parquet file, 
which itself was created from a CSV. I tested this with different CSVs and saw 
the same behaviour.

I assume the above issue is causing our problem. We are using Arrow in an 
Electron app (so Node.js) with a Python backend server. The bytes are sent via 
gRPC.

I will try to convert the int columns to int32 in Python and see what happens.

 


2019-08-28 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917848#comment-16917848
 ] 

Brian Hulette commented on ARROW-6370:
--

What is the type of the int column, int64? int64s behave a little weirdly in JS. 
If running on a platform with BigInt, calls to Int64Vector.get _should_ return 
a BigInt; otherwise they will return a two-element slice of an Int32Array 
holding the low and high 32-bit words.
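The behaviour described above (a BigInt where the engine supports it, otherwise a two-element slice) can be sketched as a standalone accessor over a stride-2 Int32Array. This is an illustration under that assumption rather than Arrow's actual get; getInt64 is a hypothetical name, and the slice holds the low word first. It uses BigInt(32) instead of a 32n literal so the source still parses on engines without BigInt.

{code:javascript}
const hasBigInt = typeof BigInt === "function";

// Hypothetical accessor over a stride-2 Int32Array of little-endian int64 values.
function getInt64(values, i) {
  // [lo, hi]; subarray shares memory with values rather than copying.
  const pair = values.subarray(2 * i, 2 * i + 2);
  if (!hasBigInt) {
    return pair;
  }
  // BigInt(32) instead of a 32n literal keeps this file parseable without BigInt.
  return (BigInt(pair[1]) << BigInt(32)) + BigInt(pair[0] >>> 0);
}

const values = new Int32Array([10, 0, -2, -1]);
console.log(getInt64(values, 0)); // 10n when BigInt is available
console.log(getInt64(values, 1)); // -2n when BigInt is available
{code}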

Could you provide a little more detail on how you're generating the record 
batches, and maybe how you're observing the ints?
