[GitHub] [arrow] jorisvandenbossche commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-07-07 GitBox

jorisvandenbossche commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-654704786

> I think that any comparison involving the dict type should also work with the "effective" logical type (the value type of the dict).

Opened https://issues.apach
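A minimal sketch of the point quoted above, in plain Python rather than the Arrow API. It models a dictionary-encoded column as a list of integer indices plus a list of values, and checks that comparing it with a scalar of the value type gives the same answer as comparing the decoded ("effective") values. The helper names `decode` and `equal_to_scalar` are hypothetical, and the sketch assumes the dictionary holds no duplicate values.

```python
from typing import List, Optional


def decode(indices: List[Optional[int]], dictionary: List[str]) -> List[Optional[str]]:
    """Materialize the effective logical values; a None index stays null."""
    return [None if i is None else dictionary[i] for i in indices]


def equal_to_scalar(
    indices: List[Optional[int]], dictionary: List[str], scalar: str
) -> List[Optional[bool]]:
    """Compare a toy dictionary-encoded column with a value of its value type.

    Assumes `dictionary` has no duplicate entries, so a value maps to at
    most one index.
    """
    # Look the scalar up once instead of decoding every row.
    target = dictionary.index(scalar) if scalar in dictionary else None
    # Nulls propagate; a scalar absent from the dictionary matches nothing.
    return [None if i is None else i == target for i in indices]


if __name__ == "__main__":
    indices = [0, 1, None, 0]
    dictionary = ["2019", "2020"]
    print(equal_to_scalar(indices, dictionary, "2019"))  # [True, False, None, True]
```

Looking the scalar up once and comparing indices avoids materializing the decoded column; decoding first is the simpler fallback and, under the no-duplicates assumption, yields the same result.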

2020-07-01 GitBox

jorisvandenbossche commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-652493993

@bkietz thanks for the update ensuring that all unique values are included as dictionary values! Testing this out, I ran into an issue with HivePartitioning -> ARROW-9288 / #7608
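As a rough illustration of what using all unique values as dictionary values means for partition inference, here is a plain-Python toy, not pyarrow's implementation. It scans Hive-style `key=value` path segments, collects every distinct value of one partition key into a single shared dictionary, and records an index per path. The function name `infer_dictionary_field`, the first-seen ordering, and the example paths are assumptions made for the sketch.

```python
from typing import Dict, List, Sequence, Tuple


def infer_dictionary_field(paths: Sequence[str], key: str) -> Tuple[List[str], List[int]]:
    """Toy inference of a dictionary-typed partition field from Hive-style paths.

    Returns (dictionary, indices): `dictionary` holds every distinct value of
    `key` across all paths (first-seen order, an assumption), and
    `indices[n]` points at the value for `paths[n]`.
    """
    dictionary: List[str] = []
    position: Dict[str, int] = {}
    indices: List[int] = []
    for path in paths:
        # Keep only "name=value" segments; file names such as part-0.parquet are skipped.
        segments = dict(
            seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg
        )
        value = segments[key]  # raises KeyError if a path lacks the key
        if value not in position:
            position[value] = len(dictionary)
            dictionary.append(value)
        indices.append(position[value])
    return dictionary, indices
```

In this toy model every path indexes into the same dictionary, so each fragment's partition column carries identical dictionary contents and fragments can be concatenated without first unifying per-fragment dictionaries.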

2020-06-25 GitBox

jorisvandenbossche commented on pull request #7536:
URL: https://github.com/apache/arrow/pull/7536#issuecomment-649500017

Currently, ParquetDataset also simply uses int32 for the indices. Now, there is a more fundamental issue I had not thought of: the actual dictionary o
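The comment notes that ParquetDataset simply uses int32 for the dictionary indices. To make that trade-off concrete, the hypothetical helper below (plain Python, not part of Arrow) returns the narrowest signed integer index type able to address a dictionary of a given size. A fixed int32 covers dictionaries of up to 2**31 entries without the index type depending on the data.

```python
def smallest_signed_index_type(dictionary_size: int) -> str:
    """Hypothetical helper: narrowest signed int type that can index the dictionary."""
    if dictionary_size < 0:
        raise ValueError("dictionary_size must be non-negative")
    # Indices run from 0 to dictionary_size - 1 (an empty dictionary needs none).
    max_index = max(dictionary_size - 1, 0)
    for name, bits in (("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)):
        # A signed type with `bits` bits can hold values up to 2**(bits - 1) - 1.
        if max_index <= 2 ** (bits - 1) - 1:
            return name
    raise OverflowError("dictionary too large for int64 indices")
```

One argument for a fixed width is schema stability: under the narrowest-type rule, adding a 129th distinct partition value would change the inferred index type from int8 to int16.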