[jira] [Created] (ARROW-9404) [C++] Add support for Decimal16, Decimal32 and Decimal64
Artem Alekseev created ARROW-9404:

Summary: [C++] Add support for Decimal16, Decimal32 and Decimal64
Key: ARROW-9404
URL: https://issues.apache.org/jira/browse/ARROW-9404
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Artem Alekseev
Assignee: Artem Alekseev

It looks like Arrow lacks support for decimal16, decimal32 and decimal64 types. Are there any reasons for that?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
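For context, Arrow's existing Decimal128 type stores an unscaled 128-bit signed integer per value, with precision and scale as type parameters. A narrower decimal would presumably follow the same scheme with a smaller integer. The sketch below assumes that representation; `NarrowDecimal` and `kMaxPrecision` are hypothetical names, not Arrow APIs. The largest precision each width can hold for every value follows from `std::numeric_limits<Int>::digits10`: 4 digits for int16, 9 for int32 and 18 for int64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical narrow decimal: value = unscaled * 10^(-scale), assuming scale >= 0.
template <typename Int>
struct NarrowDecimal {
  // Largest number of decimal digits that every value of Int can represent.
  static constexpr int kMaxPrecision = std::numeric_limits<Int>::digits10;

  Int unscaled;
  int scale;

  std::string ToString() const {
    assert(scale >= 0);
    const int64_t v = static_cast<int64_t>(unscaled);
    const bool negative = v < 0;
    // Negate in unsigned arithmetic so INT64_MIN does not overflow.
    const uint64_t magnitude =
        negative ? 0 - static_cast<uint64_t>(v) : static_cast<uint64_t>(v);
    std::string digits = std::to_string(magnitude);
    const std::size_t s = static_cast<std::size_t>(scale);
    if (s > 0) {
      // Left-pad with zeros so at least one digit precedes the decimal point.
      if (digits.size() <= s) digits.insert(0, s + 1 - digits.size(), '0');
      digits.insert(digits.size() - s, ".");
    }
    return negative ? "-" + digits : digits;
  }
};
```

For example, `NarrowDecimal<int32_t>{12345, 2}` prints as `123.45`.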
[jira] [Commented] (ARROW-8532) [C++][CSV] Add support for sentinel values.
[ https://issues.apache.org/jira/browse/ARROW-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090319#comment-17090319 ]

Artem Alekseev commented on ARROW-8532:

[~wesm] please take a look.

> [C++][CSV] Add support for sentinel values.
>
> Key: ARROW-8532
> URL: https://issues.apache.org/jira/browse/ARROW-8532
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ravil Bikbulatov
> Priority: Major
>
> Some systems still use sentinel values to store nulls. It would be good if read_csv placed sentinel values in null slots, so that users wouldn't need to convert null bitmaps to sentinel values themselves.
> Adding this support doesn't contradict the Arrow specification, as the values in null slots are undefined. It also wouldn't add any overhead to read_csv. Since Arrow is a general-purpose framework, I think we can relieve users of the pain of converting bitmaps to sentinel values.
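A minimal sketch of the conversion users currently do by hand, assuming a primitive column with an Arrow-style validity bitmap (slot i is valid when bit i is set, with bits numbered from the least significant bit of each byte). `FillNullsWithSentinel` is a hypothetical helper, not part of the Arrow API.

```cpp
#include <cassert>
#include <cstdint>

// Arrow validity bitmaps use LSB bit numbering: slot i is valid when
// bit (i % 8) of byte (i / 8) is set.
inline bool IsValid(const uint8_t* validity, int64_t i) {
  return ((validity[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper: overwrite the (undefined) values behind nulls with `sentinel`.
template <typename T>
void FillNullsWithSentinel(T* values, const uint8_t* validity, int64_t length,
                           T sentinel) {
  assert(length >= 0);
  if (validity == nullptr) return;  // no bitmap means every slot is valid
  for (int64_t i = 0; i < length; ++i) {
    if (!IsValid(validity, i)) values[i] = sentinel;
  }
}
```

Because the values behind nulls are undefined anyway, writing sentinels there leaves the array a valid Arrow array.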
[jira] [Comment Edited] (ARROW-8111) [C++][CSV] Support MM/DD/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060082#comment-17060082 ]

Artem Alekseev edited comment on ARROW-8111 at 3/16/20, 10:01 AM:

Oh, I found that we actually needed US MM/DD/YYYY format, so I will rename the issue :)
Also, to disambiguate US and EU formats, we can add an explicit locale param to the parser.

was (Author: fexolm):
Oh, I found that we actually needed US MM/DD/YYYY format, so I will rename the issue :)

> [C++][CSV] Support MM/DD/YYYY date format
>
> Key: ARROW-8111
> URL: https://issues.apache.org/jira/browse/ARROW-8111
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Assignee: Artem Alekseev
> Priority: Major
>
> Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the MM/DD/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the MM/DD/YYYY format.
> Also, we may use some date parsing library, which would solve the problem for us.
> Also, we may need to somehow specify a format for every column in the CSV parser.
> If you have any implementation ideas in mind, please share, so that I can implement it.
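The locale idea can be sketched as an explicit day/month order passed to the parser. This is a standalone illustration, not Arrow's parser: `DateOrder` and `ParseSlashDate` are hypothetical names, and the result is assumed to be days since the UNIX epoch, which is how Arrow's date32 type stores dates. The epoch conversion uses Howard Hinnant's `days_from_civil` algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical explicit day/month order ("locale") for slash-separated dates.
enum class DateOrder { kMonthFirst /* US: MM/DD/YYYY */, kDayFirst /* EU: DD/MM/YYYY */ };

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
int64_t DaysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Read exactly `n` ASCII digits starting at s[pos].
bool ParseDigits(const std::string& s, std::size_t pos, std::size_t n, unsigned* out) {
  unsigned v = 0;
  for (std::size_t i = pos; i < pos + n; ++i) {
    if (s[i] < '0' || s[i] > '9') return false;
    v = v * 10 + static_cast<unsigned>(s[i] - '0');
  }
  *out = v;
  return true;
}

bool IsLeapYear(unsigned y) { return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0; }

unsigned DaysInMonth(unsigned y, unsigned m) {
  static const unsigned kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  return (m == 2 && IsLeapYear(y)) ? 29 : kDays[m - 1];
}

// Parse "NN/NN/YYYY" into days since the UNIX epoch, interpreting the first
// two fields according to `order`. Returns false on malformed or impossible dates.
bool ParseSlashDate(const std::string& s, DateOrder order, int32_t* out) {
  if (s.size() != 10 || s[2] != '/' || s[5] != '/') return false;
  unsigned first, second, year;
  if (!ParseDigits(s, 0, 2, &first) || !ParseDigits(s, 3, 2, &second) ||
      !ParseDigits(s, 6, 4, &year)) {
    return false;
  }
  const unsigned month = order == DateOrder::kMonthFirst ? first : second;
  const unsigned day = order == DateOrder::kMonthFirst ? second : first;
  if (month < 1 || month > 12 || day < 1 || day > DaysInMonth(year, month)) return false;
  *out = static_cast<int32_t>(DaysFromCivil(year, month, day));
  return true;
}
```

A string such as 03/04/2020 is valid under both orders and yields different dates, which is why the order has to be stated explicitly rather than guessed.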
[jira] [Updated] (ARROW-8111) [C++][CSV] Support MM/DD/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev updated ARROW-8111:

Description:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the MM/DD/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the MM/DD/YYYY format.
Also, we may use some date parsing library, which would solve the problem for us.
Also, we may need to somehow specify a format for every column in the CSV parser.
If you have any implementation ideas in mind, please share, so that I can implement it.

was:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
Also, we may use some date parsing library, which would solve the problem for us.
Also, we may need to somehow specify a format for every column in the CSV parser.
If you have any implementation ideas in mind, please share, so that I can implement it.
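Per-column formats could be modeled as a map from column name to a strptime-style format string. The sketch below uses only the standard library (`std::get_time`); `ColumnDateFormats` and `ParseColumnDate` are hypothetical names, and falling back to ISO 8601 (`%Y-%m-%d`) for unconfigured columns is an assumption. `std::get_time` checks individual field ranges (such as months 1 to 12) but may not reject impossible dates such as 02/30/2020, so a real implementation would still need a calendar check.

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Hypothetical per-column configuration: column name -> strptime-style format.
using ColumnDateFormats = std::map<std::string, std::string>;

// Parse `text` using the format configured for `column`; columns without an
// entry fall back to ISO 8601 dates ("%Y-%m-%d").
bool ParseColumnDate(const ColumnDateFormats& formats, const std::string& column,
                     const std::string& text, std::tm* out) {
  const auto it = formats.find(column);
  const std::string format =
      it != formats.end() ? it->second : std::string("%Y-%m-%d");
  std::tm parsed = {};
  std::istringstream in(text);
  in >> std::get_time(&parsed, format.c_str());
  if (in.fail()) return false;
  // Reject trailing characters that the format did not consume.
  if (in.peek() != std::char_traits<char>::eof()) return false;
  *out = parsed;
  return true;
}
```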
[jira] [Updated] (ARROW-8111) [C++][CSV] Support MM/DD/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev updated ARROW-8111:

Summary: [C++][CSV] Support MM/DD/YYYY date format (was: [C++][CSV] Support DD/MM/YYYY date format)

> [C++][CSV] Support MM/DD/YYYY date format
>
> Key: ARROW-8111
> URL: https://issues.apache.org/jira/browse/ARROW-8111
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Priority: Major
>
> Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
> Also, we may use some date parsing library, which would solve the problem for us.
> Also, we may need to somehow specify a format for every column in the CSV parser.
> If you have any implementation ideas in mind, please share, so that I can implement it.
[jira] [Assigned] (ARROW-8111) [C++][CSV] Support MM/DD/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev reassigned ARROW-8111:

Assignee: Artem Alekseev

> [C++][CSV] Support MM/DD/YYYY date format
>
> Key: ARROW-8111
> URL: https://issues.apache.org/jira/browse/ARROW-8111
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Assignee: Artem Alekseev
> Priority: Major
>
> Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
> Also, we may use some date parsing library, which would solve the problem for us.
> Also, we may need to somehow specify a format for every column in the CSV parser.
> If you have any implementation ideas in mind, please share, so that I can implement it.
[jira] [Commented] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060082#comment-17060082 ]

Artem Alekseev commented on ARROW-8111:

Oh, I found that we actually needed US MM/DD/YYYY format, so I will rename the issue :)

> [C++][CSV] Support DD/MM/YYYY date format
>
> Key: ARROW-8111
> URL: https://issues.apache.org/jira/browse/ARROW-8111
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Priority: Major
>
> Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
> Also, we may use some date parsing library, which would solve the problem for us.
> Also, we may need to somehow specify a format for every column in the CSV parser.
> If you have any implementation ideas in mind, please share, so that I can implement it.
[jira] [Commented] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060042#comment-17060042 ]

Artem Alekseev commented on ARROW-8111:

Ok, thanks, folks! I'll create a draft patch soon so we can discuss it in more detail.
[jira] [Comment Edited] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058769#comment-17058769 ]

Artem Alekseev edited comment on ARROW-8111 at 3/13/20, 2:21 PM:

[~apitrou], yeah, I saw it, and I was able to patch it to support DD/MM/YYYY, but I'm not sure that adding just this format would be good enough to merge it upstream. If it is ok, then I could just publish my patch; if it's not, then we need to think about a more general solution of supporting more standard formats, or even a custom time format.

was (Author: fexolm):
[~apitrou], yeah, I saw it, and I was able to patch it to support DD/MM/YYYY, but I'm not sure if adding just this format would be good enough to merge it upstream. If it is ok, then I could just publish my patch; if it's not, then we need to think about a more general solution of supporting more standard formats, or even a custom time format.
[jira] [Commented] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058769#comment-17058769 ]

Artem Alekseev commented on ARROW-8111:

[~apitrou], yeah, I saw it, and I was able to patch it to support DD/MM/YYYY, but I'm not sure if adding just this format would be good enough to merge it upstream. If it is ok, then I could just publish my patch; if it's not, then we need to think about a more general solution of supporting more standard formats, or even a custom time format.
[jira] [Updated] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev updated ARROW-8111:

Description:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
Also, we may use some date parsing library, which would solve the problem for us.
Also, we may need to somehow specify a format for every column in the CSV parser.
If you have any implementation ideas in mind, please share, so that I can implement it.

was:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
Also, we may use some date parsing library, which would solve the problem for us.
[jira] [Updated] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
[ https://issues.apache.org/jira/browse/ARROW-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev updated ARROW-8111:

Description:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
Also, we may use some date parsing library, which would solve the problem for us.

was:
Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.

> [C++][CSV] Support DD/MM/YYYY date format
>
> Key: ARROW-8111
> URL: https://issues.apache.org/jira/browse/ARROW-8111
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Priority: Major
>
> Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
> Also, we may use some date parsing library, which would solve the problem for us.
[jira] [Created] (ARROW-8111) [C++][CSV] Support DD/MM/YYYY date format
Artem Alekseev created ARROW-8111:

Summary: [C++][CSV] Support DD/MM/YYYY date format
Key: ARROW-8111
URL: https://issues.apache.org/jira/browse/ARROW-8111
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Artem Alekseev

Currently, the date parser supports only the YYYY-MM-DD format. For our workload we need the DD/MM/YYYY format. It is obvious that the CSV parser should support different date formats, so we may start by implementing the DD/MM/YYYY format.
[jira] [Commented] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
[ https://issues.apache.org/jira/browse/ARROW-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972401#comment-16972401 ]

Artem Alekseev commented on ARROW-7085:

[~apitrou] We use our own dictionary encoding.

> [C++][CSV] Add support for Extention type in csv reader
>
> Key: ARROW-7085
> URL: https://issues.apache.org/jira/browse/ARROW-7085
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Artem Alekseev
> Priority: Major
[jira] [Comment Edited] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
[ https://issues.apache.org/jira/browse/ARROW-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971401#comment-16971401 ]

Artem Alekseev edited comment on ARROW-7085 at 11/11/19 10:57 AM:

[~apitrou] We need to encode strings with a custom encoder and avoid the overhead of parsing the column as strings and then converting it to integers. I think there are 2 options for the desired API:
* add an option to define a custom Converter for ExtensionType (for now it's impossible to use extension types in the CSV reader)
* add an option to define a custom ColumnBuilder for a specific column

I think such an API could be useful, but for my problem I found that Arrow dictionary encoding can do the job. However, I'm not sure whether there is a way to, e.g., check if a specified string is in the dictionary, or to get a string's index. Does Arrow provide such an API?

was (Author: fexolm):
[~apitrou] We need to encode strings with a custom encoder and avoid the overhead of parsing the column as strings and then converting it to integers. I think there are 2 options for the desired API:
* add an option to define a custom Converter for ExtensionType (for now it's impossible to use extension types in the CSV reader)
* add an option to define a custom ColumnBuilder for a specific column
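The lookups asked about here (a membership test and the index of a given string) are what a hash-based dictionary encoder maintains internally. A standalone sketch, assuming codes are assigned in first-seen order; `StringDictionary` is a hypothetical class, not Arrow's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string dictionary (not Arrow's API): codes are assigned in
// first-seen order, and lookups never modify the dictionary.
class StringDictionary {
 public:
  // Return the code for `s`, inserting it if it has not been seen yet.
  int32_t GetOrInsert(const std::string& s) {
    const auto it = index_.find(s);
    if (it != index_.end()) return it->second;
    const int32_t code = static_cast<int32_t>(values_.size());
    index_.emplace(s, code);
    values_.push_back(s);
    return code;
  }

  // Return the code for `s`, or -1 if `s` is not in the dictionary.
  int32_t Find(const std::string& s) const {
    const auto it = index_.find(s);
    return it == index_.end() ? -1 : it->second;
  }

  bool Contains(const std::string& s) const { return Find(s) >= 0; }

  // Decode a code back into its string.
  const std::string& Value(int32_t code) const {
    return values_.at(static_cast<std::size_t>(code));
  }

 private:
  std::unordered_map<std::string, int32_t> index_;  // string -> code
  std::vector<std::string> values_;                 // code -> string
};
```

Keeping both directions (string to code and code to string) makes membership tests and decoding O(1) on average.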
[jira] [Commented] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
[ https://issues.apache.org/jira/browse/ARROW-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971401#comment-16971401 ]

Artem Alekseev commented on ARROW-7085:

[~apitrou] We need to encode strings with a custom encoder and avoid the overhead of parsing the column as strings and then converting it to integers. I think there are 2 options for the desired API:
* add an option to define a custom Converter for ExtensionType (for now it's impossible to use extension types in the CSV reader)
* add an option to define a custom ColumnBuilder for a specific column
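The two API options listed in this comment can be combined into a single dispatch step. A hypothetical sketch (none of these names exist in Arrow): converters are looked up first by column name, then by extension type name, and only then fall back to the built-in type dispatch. To stay self-contained, it converts raw cells into `int64_t` codes rather than building Arrow arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical converter: turns the raw CSV cells of one column into int64 codes.
// (A real design would build Arrow arrays; int64 codes keep the sketch self-contained.)
using Converter = std::function<std::vector<int64_t>(const std::vector<std::string>&)>;

class ConverterRegistry {
 public:
  // Option 2: override the converter for one named column.
  void SetColumnConverter(const std::string& column, Converter converter) {
    by_column_[column] = std::move(converter);
  }

  // Option 1: register a converter for an extension type name.
  void RegisterTypeConverter(const std::string& type_name, Converter converter) {
    by_type_[type_name] = std::move(converter);
  }

  // Dispatch order: per-column override, then extension type, then `fallback`
  // (standing in for the built-in type dispatch). `fallback` must outlive the
  // returned reference.
  const Converter& Resolve(const std::string& column, const std::string& type_name,
                           const Converter& fallback) const {
    const auto by_col = by_column_.find(column);
    if (by_col != by_column_.end()) return by_col->second;
    const auto by_ty = by_type_.find(type_name);
    if (by_ty != by_type_.end()) return by_ty->second;
    return fallback;
  }

 private:
  std::map<std::string, Converter> by_column_;
  std::map<std::string, Converter> by_type_;
};
```

Checking the per-column override first lets a user special-case one column without registering a new type.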
[jira] [Comment Edited] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
[ https://issues.apache.org/jira/browse/ARROW-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970050#comment-16970050 ]

Artem Alekseev edited comment on ARROW-7085 at 11/8/19 10:38 AM:

[~wesm], [~apitrou] For our purposes we need to define a custom type and use the Arrow CSV reader to parse it. Unfortunately, there is currently no way to use custom types with the CSV reader, because of the type dispatch used to choose an appropriate column builder. I'm not sure how this could be solved architecturally; maybe a custom ColumnBuilder is the way to go.

was (Author: fexolm):
[~wesm] For our purposes we need to define a custom type and use the Arrow CSV reader to parse it. Unfortunately, there is currently no way to use custom types with the CSV reader, because of the type dispatch used to choose an appropriate column builder. I'm not sure how this could be solved architecturally; maybe a custom ColumnBuilder is the way to go.
[jira] [Commented] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
[ https://issues.apache.org/jira/browse/ARROW-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970050#comment-16970050 ]

Artem Alekseev commented on ARROW-7085:

[~wesm] For our purposes we need to define a custom type and use the Arrow CSV reader to parse it. Unfortunately, there is currently no way to use custom types with the CSV reader, because of the type dispatch used to choose an appropriate column builder. I'm not sure how this could be solved architecturally; maybe a custom ColumnBuilder is the way to go.
[jira] [Assigned] (ARROW-4631) [C++] Implement serial version of sort computational kernel
[ https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev reassigned ARROW-4631:

Assignee: (was: Artem Alekseev)

> [C++] Implement serial version of sort computational kernel
>
> Key: ARROW-4631
> URL: https://issues.apache.org/jira/browse/ARROW-4631
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Affects Versions: 0.13.0
> Reporter: Areg Melik-Adamyan
> Priority: Major
> Labels: analytics
> Fix For: 1.0.0
>
> Implement serial version of sort computational kernel.
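A serial sort kernel can be expressed as computing sort indices rather than moving values. A minimal sketch, assuming ascending order, a stable sort, and nulls placed at the end; `SortIndices` is a hypothetical name, and the `std::vector<bool>` mask stands in for an Arrow validity bitmap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical serial sort kernel: returns the indices that would sort `values`
// in ascending order. The sort is stable, and null slots (valid[i] == false)
// are placed at the end in their original order.
template <typename T>
std::vector<int64_t> SortIndices(const std::vector<T>& values,
                                 const std::vector<bool>& valid) {
  assert(values.size() == valid.size());
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  // Move null slots behind the valid ones, preserving relative order.
  auto nulls_begin = std::stable_partition(indices.begin(), indices.end(),
                                           [&](int64_t i) { return valid[i]; });
  // Stable-sort only the valid prefix by value.
  std::stable_sort(indices.begin(), nulls_begin,
                   [&](int64_t a, int64_t b) { return values[a] < values[b]; });
  return indices;
}
```

Returning indices lets the same kernel reorder any other columns of a table with a single "take" step.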
[jira] [Created] (ARROW-7085) [C++][CSV] Add support for Extention type in csv reader
Artem Alekseev created ARROW-7085:

Summary: [C++][CSV] Add support for Extention type in csv reader
Key: ARROW-7085
URL: https://issues.apache.org/jira/browse/ARROW-7085
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Artem Alekseev
[jira] [Created] (ARROW-6973) [C++][ThreadPool] Use perfect forwarding in Submit
Artem Alekseev created ARROW-6973:

Summary: [C++][ThreadPool] Use perfect forwarding in Submit
Key: ARROW-6973
URL: https://issues.apache.org/jira/browse/ARROW-6973
Project: Apache Arrow
Issue Type: Improvement
Reporter: Artem Alekseev
Assignee: Artem Alekseev
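A sketch of what perfect forwarding in `Submit` buys, using a hypothetical single-threaded `SerialExecutor` rather than Arrow's `ThreadPool`: with `std::forward`, rvalue arguments are moved into the bound task instead of copied, which also makes move-only arguments such as `std::unique_ptr` possible. Running tasks on the caller's thread via `RunPending()` is an assumption made to keep the example deterministic.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical executor (not Arrow's ThreadPool): tasks are queued by Submit()
// and executed on the calling thread by RunPending().
class SerialExecutor {
 public:
  template <typename Function, typename... Args>
  auto Submit(Function&& func, Args&&... args) {
    // std::forward keeps each argument's value category: rvalues are moved
    // into the bound task and lvalues are copied.
    auto bound = std::bind(std::forward<Function>(func), std::forward<Args>(args)...);
    using Result = decltype(bound());
    // packaged_task is move-only; sharing it keeps the queued lambda copyable,
    // as std::function requires.
    auto task = std::make_shared<std::packaged_task<Result()>>(std::move(bound));
    std::future<Result> result = task->get_future();
    pending_.push([task] { (*task)(); });
    return result;
  }

  // Execute all queued tasks in submission order.
  void RunPending() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};
```

Without forwarding (that is, `std::bind(func, args...)`), every argument would be copied, and submitting a `std::unique_ptr` argument would not compile.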
[jira] [Assigned] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
[ https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev reassigned ARROW-1564:

Assignee: (was: Artem Alekseev)

> [C++] Kernel functions for computing minimum and maximum of an array in one pass
>
> Key: ARROW-1564
> URL: https://issues.apache.org/jira/browse/ARROW-1564
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Labels: Analytics
>
> This is useful for determining whether a small-range integer O(n) sort can be used in some circumstances. It can also be used simply to compute array statistics.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
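A one-pass min/max is a single loop that updates both bounds, and the resulting range determines whether an O(n + k) counting sort (k being the value range) is applicable. A minimal sketch over a null-free `int64_t` vector; `MinMax`, `CountingSortIfSmallRange` and the `max_range` threshold are hypothetical names, not Arrow kernels, and `MinMax` requires a non-empty input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical kernel: minimum and maximum of a non-empty array in one pass.
std::pair<int64_t, int64_t> MinMax(const std::vector<int64_t>& values) {
  assert(!values.empty());
  int64_t lo = values[0];
  int64_t hi = values[0];
  for (int64_t v : values) {
    lo = std::min(lo, v);
    hi = std::max(hi, v);
  }
  return {lo, hi};
}

// Sort in place with an O(n + range) counting sort when max - min < max_range;
// otherwise leave `values` untouched and return false so the caller can fall
// back to a comparison sort.
bool CountingSortIfSmallRange(std::vector<int64_t>* values, uint64_t max_range) {
  if (values->empty()) return true;
  const std::pair<int64_t, int64_t> mm = MinMax(*values);
  // Unsigned subtraction yields the exact range even for extreme int64 values.
  const uint64_t range =
      static_cast<uint64_t>(mm.second) - static_cast<uint64_t>(mm.first);
  if (range >= max_range) return false;
  std::vector<int64_t> counts(static_cast<std::size_t>(range) + 1, 0);
  for (int64_t v : *values) {
    ++counts[static_cast<std::size_t>(static_cast<uint64_t>(v) -
                                      static_cast<uint64_t>(mm.first))];
  }
  std::size_t out = 0;
  for (std::size_t k = 0; k < counts.size(); ++k) {
    for (int64_t c = 0; c < counts[k]; ++c) {
      (*values)[out++] = mm.first + static_cast<int64_t>(k);
    }
  }
  return true;
}
```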
[jira] [Assigned] (ARROW-1564) [C++] Kernel functions for computing minimum and maximum of an array in one pass
[ https://issues.apache.org/jira/browse/ARROW-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Alekseev reassigned ARROW-1564:

Assignee: Artem Alekseev

> [C++] Kernel functions for computing minimum and maximum of an array in one pass
>
> Key: ARROW-1564
> URL: https://issues.apache.org/jira/browse/ARROW-1564
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Artem Alekseev
> Priority: Major
> Labels: Analytics
>
> This is useful for determining whether a small-range integer O(n) sort can be used in some circumstances. It can also be used simply to compute array statistics.