[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-11 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904826#comment-16904826
 ] 

Micah Kornfield commented on ARROW-6179:


"The bigquery usage of this, is that open source code? (to familiarize myself 
with an application of the extension types) "

No it isn't open source. The usage can be seen it is visible when using the 
storage API (which i believe has a free tier, but I haven't used it myself).


"You mean that you use the extension type key (ARROW:extension:name) in the 
metadata without having it an actual extension type?"

Yes that is what I mean.

> [C++] ExtensionType subclass for "unknown" types?
> -
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is 
> unknown (the name is not registered), we currently fall back to returning the 
> "raw" storage array. The custom metadata (extension name and metadata) is 
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can 
> hold such "unknown" extension type (eg {{UnknowExtensionType}} or 
> {{GenericExtensionType}}), keeping the extension name and metadata in the 
> Array's type. 
> This could be a single class where several instances can be created given a 
> storage type, extension name and optionally extension metadata. It would be a 
> way to have an unregistered extension type.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-09 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903969#comment-16903969
 ] 

Joris Van den Bossche commented on ARROW-6179:
--

The bigquery usage of this, is that open source code? (to familiarize myself 
with an application of the extension types) 
You mean that you use the extension type key (ARROW:extension:name) in the 
metadata without having it an actual extension type?

For sure if we would create such a generic extension array, I think it should 
work in more places in arrow than it currently is the case (eg I opened issues 
to fallback to the storage type when converting to pandas or to parquet).

> [C++] ExtensionType subclass for "unknown" types?
> -
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is 
> unknown (the name is not registered), we currently fall back to returning the 
> "raw" storage array. The custom metadata (extension name and metadata) is 
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can 
> hold such "unknown" extension type (eg {{UnknowExtensionType}} or 
> {{GenericExtensionType}}), keeping the extension name and metadata in the 
> Array's type. 
> This could be a single class where several instances can be created given a 
> storage type, extension name and optionally extension metadata. It would be a 
> way to have an unregistered extension type.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-09 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903949#comment-16903949
 ] 

Micah Kornfield commented on ARROW-6179:


Ok, personally I would like to leave the current  behavior as at least the 
default.  One example of the usage on non registration of  extension types is 
the BQ storage read API uses it to mark fields that don't have a one to one 
correspondence with built in arrow types (geography and datetime).  In the 
future someone could choose to write custom extension types but in the meantime 
they don't require special handling and flow through without any problem when 
converting to pandas.

> [C++] ExtensionType subclass for "unknown" types?
> -
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is 
> unknown (the name is not registered), we currently fall back to returning the 
> "raw" storage array. The custom metadata (extension name and metadata) is 
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can 
> hold such "unknown" extension type (eg {{UnknowExtensionType}} or 
> {{GenericExtensionType}}), keeping the extension name and metadata in the 
> Array's type. 
> This could be a single class where several instances can be created given a 
> storage type, extension name and optionally extension metadata. It would be a 
> way to have an unregistered extension type.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-09 Thread Joris Van den Bossche (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903935#comment-16903935
 ] 

Joris Van den Bossche commented on ARROW-6179:
--

I suppose, if we go for this, it would replace the automatic fallback. And then 
a user can still get the storage array as a fallback themselves?

Although, I see that there is a PR adding {{IpcOptions}} for writing, so if 
needed, there might also be such options for reading.

To be honest, I don't know have a good enough idea of potential use cases in 
C++ of the ExtensionType mechanism to really assess if it would be generally 
useful to keep the array in a generic extension array or rather directly fall 
back to the storage array.  
I was thinking that for Python usage, this might be useful to be able to send 
an extension type defined from Python without needing to register a specific 
subclass in C++.


> [C++] ExtensionType subclass for "unknown" types?
> -
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is 
> unknown (the name is not registered), we currently fall back to returning the 
> "raw" storage array. The custom metadata (extension name and metadata) is 
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can 
> hold such "unknown" extension type (eg {{UnknowExtensionType}} or 
> {{GenericExtensionType}}), keeping the extension name and metadata in the 
> Array's type. 
> This could be a single class where several instances can be created given a 
> storage type, extension name and optionally extension metadata. It would be a 
> way to have an unregistered extension type.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-08 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903265#comment-16903265
 ] 

Micah Kornfield commented on ARROW-6179:


How would the two options be chosen?

> [C++] ExtensionType subclass for "unknown" types?
> -
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joris Van den Bossche
>Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is 
> unknown (the name is not registered), we currently fall back to returning the 
> "raw" storage array. The custom metadata (extension name and metadata) is 
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can 
> hold such "unknown" extension type (eg {{UnknowExtensionType}} or 
> {{GenericExtensionType}}), keeping the extension name and metadata in the 
> Array's type. 
> This could be a single class where several instances can be created given a 
> storage type, extension name and optionally extension metadata. It would be a 
> way to have an unregistered extension type.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)