[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904826#comment-16904826 ]

Micah Kornfield commented on ARROW-6179:
----------------------------------------

"The bigquery usage of this, is that open source code? (to familiarize myself with an application of the extension types)"

No, it isn't open source. The usage is visible when using the storage API (which I believe has a free tier, but I haven't used it myself).

"You mean that you use the extension type key (ARROW:extension:name) in the metadata without having it an actual extension type?"

Yes, that is what I mean.

> [C++] ExtensionType subclass for "unknown" types?
> -------------------------------------------------
>
> Key: ARROW-6179
> URL: https://issues.apache.org/jira/browse/ARROW-6179
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Joris Van den Bossche
> Priority: Major
>
> In C++, when receiving IPC with extension type metadata for a type that is
> unknown (the name is not registered), we currently fall back to returning the
> "raw" storage array. The custom metadata (extension name and metadata) is
> still available in the Field metadata.
> Alternatively, we could also have a generic {{ExtensionType}} class that can
> hold such "unknown" extension type (e.g. {{UnknowExtensionType}} or
> {{GenericExtensionType}}), keeping the extension name and metadata in the
> Array's type.
> This could be a single class where several instances can be created given a
> storage type, extension name and optionally extension metadata. It would be a
> way to have an unregistered extension type.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
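The two behaviors described in the issue (falling back to the storage type versus keeping the name and metadata in a generic extension type) can be contrasted with a small sketch. This is not the Arrow C++ API: `Field`, `ResolvedType`, `UnknownExtensionHandling` and `ResolveType` are hypothetical stand-ins invented for illustration, and types are represented as plain strings. Only the metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` come from Arrow itself.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for a schema field; NOT the Arrow C++ API.
struct Field {
  std::string name;
  std::string storage_type;                     // e.g. "binary"
  std::map<std::string, std::string> metadata;  // custom key/value metadata
};

// Hypothetical result of resolving a field's type on the reading side.
struct ResolvedType {
  std::string storage_type;
  std::optional<std::string> extension_name;  // empty => plain storage type
  std::string extension_metadata;
  bool registered = false;  // true only if the name was found in the registry
};

// Hypothetical switch between the two behaviors discussed in the issue.
enum class UnknownExtensionHandling { kFallBackToStorage, kWrapInGenericType };

const std::string kExtensionNameKey = "ARROW:extension:name";
const std::string kExtensionMetadataKey = "ARROW:extension:metadata";

ResolvedType ResolveType(const Field& field,
                         const std::set<std::string>& registered_names,
                         UnknownExtensionHandling handling) {
  ResolvedType out;
  out.storage_type = field.storage_type;

  auto name_it = field.metadata.find(kExtensionNameKey);
  if (name_it == field.metadata.end()) return out;  // not an extension field

  auto meta_it = field.metadata.find(kExtensionMetadataKey);
  const std::string ext_meta =
      meta_it == field.metadata.end() ? std::string() : meta_it->second;

  const bool known = registered_names.count(name_it->second) > 0;
  if (known || handling == UnknownExtensionHandling::kWrapInGenericType) {
    // Known type, or unknown type kept in a generic wrapper: the extension
    // name and metadata travel with the type itself.
    out.extension_name = name_it->second;
    out.extension_metadata = ext_meta;
    out.registered = known;
  }
  // Otherwise: fall back to the raw storage type; the name and metadata
  // remain available only through field.metadata.
  return out;
}
```

With `kFallBackToStorage`, an unregistered name resolves to the bare storage type, which matches the current behavior described above. With `kWrapInGenericType`, the same field keeps its extension name and metadata while `registered` stays false.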
[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903969#comment-16903969 ]

Joris Van den Bossche commented on ARROW-6179:
----------------------------------------------

The bigquery usage of this, is that open source code? (to familiarize myself with an application of the extension types)

You mean that you use the extension type key (ARROW:extension:name) in the metadata without having it an actual extension type?

For sure, if we were to create such a generic extension array, I think it should work in more places in Arrow than is currently the case (e.g. I opened issues to fall back to the storage type when converting to pandas or to Parquet).
[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903949#comment-16903949 ]

Micah Kornfield commented on ARROW-6179:
----------------------------------------

OK, personally I would like to keep the current behavior, at least as the default. One example of using extension type metadata without registering a type: the BQ storage read API uses it to mark fields that don't have a one-to-one correspondence with built-in Arrow types (geography and datetime). In the future someone could choose to write custom extension types, but in the meantime these fields don't require special handling and flow through without any problem when converting to pandas.
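As a rough illustration of the pattern described in this comment, a consumer can treat the extension name purely as an annotation on the field and read it from the field metadata, without any registered extension type. The sketch below is not the Arrow C++ API: `Field` and `CollectExtensionAnnotations` are hypothetical names, and only the `ARROW:extension:name` key is taken from the discussion.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema field; NOT the Arrow C++ API.
struct Field {
  std::string name;
  std::string storage_type;
  std::map<std::string, std::string> metadata;  // custom key/value metadata
};

// Returns column name -> extension name for every field that carries the
// "ARROW:extension:name" metadata key. No type registration is involved:
// the data itself is still read as the plain storage type.
std::map<std::string, std::string> CollectExtensionAnnotations(
    const std::vector<Field>& schema) {
  std::map<std::string, std::string> annotations;
  for (const Field& field : schema) {
    auto it = field.metadata.find("ARROW:extension:name");
    if (it != field.metadata.end()) {
      annotations[field.name] = it->second;
    }
  }
  return annotations;
}
```

An application that later wants special handling for, say, geography columns can look them up in the returned map, while all other code keeps working on the storage arrays.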
[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903935#comment-16903935 ]

Joris Van den Bossche commented on ARROW-6179:
----------------------------------------------

I suppose that, if we go for this, it would replace the automatic fallback. A user could then still get the storage array as a fallback themselves? That said, I see there is a PR adding {{IpcOptions}} for writing, so if needed, there could also be such options for reading.

To be honest, I don't have a good enough idea of the potential use cases of the ExtensionType mechanism in C++ to really assess whether it would be generally useful to keep the array in a generic extension array, or better to fall back directly to the storage array.

I was thinking that for Python usage, this might be useful for sending an extension type defined in Python without needing to register a specific subclass in C++.
[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903265#comment-16903265 ]

Micah Kornfield commented on ARROW-6179:
----------------------------------------

How would the two options be chosen?