[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932786#comment-16932786
 ] 

Mitar commented on ARROW-2051:
--

Sounds good. I will then explore how to do that through extension types.

> [Python] Support serializing UUID objects to tables
> ---
>
> Key: ARROW-2051
> URL: https://issues.apache.org/jira/browse/ARROW-2051
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Omer Katz
>Priority: Major
>
> UUID objects can be easily supported and can be represented as 128-bit 
> integers or a stream of bytes.
> The fastest way I know to construct a UUID object is by using its 128-bit 
> (16 bytes) integer representation.
>  
> {code:python}
> %timeit uuid.UUID(int=24197857161011715162171839636988778104)
> 611 ns ± 6.27 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID(bytes=b'\x124Vx\x124Vx\x124Vx\x124Vx')
> 1.17 µs ± 7.5 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> %timeit uuid.UUID('12345678-1234-5678-1234-567812345678')
> 1.47 µs ± 6.08 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
> {code}
>  
> Right now I have to do this manually, which is pretty tedious.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932645#comment-16932645
 ] 

Antoine Pitrou commented on ARROW-2051:
---

1) We don't have 128-bit numbers in Arrow. We do have fixed-size binary data.
2) Arrow now has extension types, so you could probably implement a UUID 
extension type (though the Python API for extension types may still be in flux).
3) The answer to "why not..." questions is generally that it costs us maintenance 
time, so unless a contributor (such as you) is willing to bear the maintenance 
cost, it probably won't happen if we don't find it useful enough.




[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932642#comment-16932642
 ] 

Wes McKinney commented on ARROW-2051:
-

You are more than welcome to contribute a UUID extension type for use in 
Python. 



[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Mitar (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932618#comment-16932618
 ] 

Mitar commented on ARROW-2051:
--

I mean, you have 128-bit numbers in Arrow? So why not support converting 
UUIDs to them?
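For what it's worth, the 128-bit integer and the 16-byte forms of a UUID carry exactly the same information: `UUID.int` written out as 16 big-endian bytes equals `UUID.bytes`, so fixed-size binary storage loses nothing compared with a 128-bit integer. A stdlib-only check, with hypothetical helper names, using the value from the issue description:

```python
import uuid


def uuid_int_to_bytes(u):
    """Hypothetical helper: the 128-bit integer as 16 big-endian bytes."""
    return u.int.to_bytes(16, "big")


def bytes_to_uuid_via_int(b):
    """Hypothetical helper: rebuild a UUID from 16 bytes via its integer value."""
    return uuid.UUID(int=int.from_bytes(b, "big"))


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(u.int)                            # 24197857161011715162171839636988778104
print(uuid_int_to_bytes(u) == u.bytes)  # True
```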



[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932447#comment-16932447
 ] 

Joris Van den Bossche commented on ARROW-2051:
--

What is the exact idea here? To provide a way to construct an array of UUIDs 
from python objects? The exact proposed enhancement is not fully clear to me.

But I would say that, as long as we have no UUID type in Arrow (or an extension 
type; I'm not sure if there are any plans for that), construction methods for 
UUIDs seem out of scope for pyarrow to me.





[jira] [Commented] (ARROW-2051) [Python] Support serializing UUID objects to tables

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932371#comment-16932371
 ] 

Antoine Pitrou commented on ARROW-2051:
---

[~jorisvandenbossche] do you think it's worthwhile keeping an issue open for 
this? 
