[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433962#comment-16433962 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380465437 Fair enough. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433957#comment-16433957 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380464846 I took it from here: > isdigit and isxdigit are the only standard narrow character classification functions that are not affected by the currently installed C locale. although some implementations (e.g. Microsoft in 1252 codepage) may classify additional single-byte characters as digits. http://en.cppreference.com/w/cpp/string/byte/isdigit Not sure how authoritative that page is. That said, `static_cast(std::isdigit(c))` is not very pretty. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433952#comment-16433952 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380464015 > It's not like isdigit is more readable anyway Readability wasn't my original concern, reimplementing a builtin function was. > and apparently it risks being locale-dependent on Windows (which is a can of worms). Is there some documentation on this somewhere? I found the following lines in the [`setlocale` documentation for Visual Studio 2015](https://msdn.microsoft.com/en-us/library/x99tb11d.aspx): > LC_CTYPE The character-handling functions (except isdigit, isxdigit, mbstowcs, and mbtowc, which are unaffected). That suggests `isdigit` is *not* affected by locale. Am I reading something wrong? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433951#comment-16433951 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on a change in pull request #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180765500 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { Review comment: I mean if the parse function is taking a `string::const_iterator`, it maybe won't accept a different kind of iterator. Or we need to make it a template function, piling more layers of abstraction without any concrete advantage. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433942#comment-16433942 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380460395 @pitrou Sweet. Thanks for doing this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433941#comment-16433941 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180761749 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { Review comment: I don't follow how a `view`/`span` class is less abstracted, since that would presumably implement the c++ iterator interface, like every implementation of it usually does. However, as I said, I don't think this is really worth spending too much time on. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433888#comment-16433888 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380443938 Also adding a Decimal::FromString benchmark. That benchmark is 60% faster with the PR (1.8M items/second up from 700k items/second here). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433867#comment-16433867 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380439980 This PR reduces conversion time for decimals from Python to Arrow by ~45%: * before: ``` [100.00%] ··· Running convert_builtins.ConvertPyListToArray.time_convert ok [100.00%] = type - decimal 177±0.06ms = ``` * after: ``` [100.00%] ··· Running convert_builtins.ConvertPyListToArray.time_convert ok [100.00%] = type - decimal 101±0.5ms = ``` (irrelevant lines removed) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433683#comment-16433683 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380401339 Reverted the `std::isdigit` change because of a silly performance warning (turned into an error) on MSVC. It's not like `isdigit` is more readable anyway, and apparently it risks being locale-dependent on Windows (which is a can of worms). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433182#comment-16433182 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380279472 > I would think it's okay to remove them if they weren't there before the PR that introduced boost-regex. It seems the boost-regex references in the Python build files were introduced specifically for parquet: see cb5da9c1 and 8b1c8118. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433178#comment-16433178 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380278645 > By the way, I don't know if it's ok to remove boost-regex references from the Python build files, since parquet still requires boost-regex AFAIR. @xhochy Can you comment on this? I would think it's okay to remove them if they weren't there before the PR that introduced boost-regex. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433175#comment-16433175 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380278461 I think the current version supports that, does it not? I remember adding that. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433176#comment-16433176 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380278524 I may be misunderstand the usage of "should" here :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433174#comment-16433174 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380278313 Another note: the new implementation should support exponents without a sign, e.g. "1.23e45". The Python decimal module also does this. I need to add tests, though. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433170#comment-16433170 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on issue #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#issuecomment-380277704 By the way, I don't know if it's ok to remove boost-regex references from the Python build files, since parquet still requires boost-regex AFAIR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433160#comment-16433160 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180594466 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { Review comment: By no means should you feel obliged to do that extra work :) If this works, then that's great. I thought maybe you had chosen not to use iterators for a particular reason. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433158#comment-16433158 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180594140 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { + size_t pos = 0; -if (scale != nullptr) { - *scale = 0; + if (size == 0) { +return false; + } + // Sign of the number + if (IsSign(s[pos])) { +out->sign = std::string(s + pos, 1); +++pos; + } + // First run of digits + pos = ParseDigitsRun(s, pos, size, >whole_digits); + if (pos == size) { +return !out->whole_digits.empty(); + } + // Optional dot (if given in fractional form) + bool has_dot = IsDot(s[pos]); + if (has_dot) { +// Second run of digits +++pos; +pos = ParseDigitsRun(s, pos, size, >fractional_digits); + } + if (out->whole_digits.empty() && out->fractional_digits.empty()) { +// Need at least some digits (whole or fractional) +return false; + } + if (pos == size) { +return true; + } + // Optional exponent + if (StartsExponent(s[pos])) { +++pos; +if (pos == size) { + return false; +} +// Optional exponent sign +if (IsSign(s[pos])) { + out->exponent_sign = std::string(s + pos, 1); + ++pos; +} +pos = ParseDigitsRun(s, pos, size, >exponent_digits); +if (out->exponent_digits.empty()) { + // Need some exponent digits + return false; } - -*out = 0; -return Status::OK(); } + return pos == size; +} - boost::smatch results; - const bool matches = boost::regex_match(s, results, DECIMAL_REGEX); +} // namespace - if (!matches) { -std::stringstream ss; -ss << "The string " << s << " is not a valid decimal number"; -return Status::Invalid(ss.str()); +Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, + int32_t* scale) { + if (s.empty()) { +return Status::Invalid("Empty string cannot be converted to decimal"); } - const std::string sign = results["SIGN"]; - const std::string integer = results["INTEGER"]; - - const std::string left_digits = results["LEFT_DIGITS"]; - const std::string first_right_digits = results["FIRST_RIGHT_DIGITS"]; - - const std::string second_right_digits = results["SECOND_RIGHT_DIGITS"]; - - const std::string first_exp_value = results["FIRST_EXP_VALUE"]; - const std::string second_exp_value = results["SECOND_EXP_VALUE"]; - - std::string whole_part; - std::string fractional_part; - std::string exponent_value; - - if (!integer.empty()) { -whole_part = integer; - } else if (!left_digits.empty()) { -DCHECK(second_right_digits.empty()) << s << " " << second_right_digits; -whole_part = left_digits; -fractional_part = first_right_digits; - } else { -DCHECK(first_right_digits.empty()) << s << " " << first_right_digits; -fractional_part = second_right_digits; + DecimalComponents dec; + if (!ParseDecimalComponents(s.data(), s.size(), )) { +
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433138#comment-16433138 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180592157 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } Review comment: don't need parens here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433150#comment-16433150 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180593109 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { + size_t pos = 0; -if (scale != nullptr) { - *scale = 0; + if (size == 0) { +return false; + } + // Sign of the number + if (IsSign(s[pos])) { +out->sign = std::string(s + pos, 1); +++pos; + } + // First run of digits + pos = ParseDigitsRun(s, pos, size, >whole_digits); + if (pos == size) { +return !out->whole_digits.empty(); + } + // Optional dot (if given in fractional form) + bool has_dot = IsDot(s[pos]); + if (has_dot) { +// Second run of digits +++pos; +pos = ParseDigitsRun(s, pos, size, >fractional_digits); + } + if (out->whole_digits.empty() && out->fractional_digits.empty()) { +// Need at least some digits (whole or fractional) +return false; + } + if (pos == size) { +return true; + } + // Optional exponent + if (StartsExponent(s[pos])) { +++pos; +if (pos == size) { + return false; +} +// Optional exponent sign +if (IsSign(s[pos])) { + out->exponent_sign = std::string(s + pos, 1); + ++pos; +} +pos = ParseDigitsRun(s, pos, size, >exponent_digits); +if (out->exponent_digits.empty()) { + // Need some exponent digits + return false; } - -*out = 0; -return Status::OK(); } + return pos == size; +} - boost::smatch results; - const bool matches = boost::regex_match(s, results, DECIMAL_REGEX); +} // namespace - if (!matches) { -std::stringstream ss; -ss << "The string " << s << " is not a valid decimal number"; -return Status::Invalid(ss.str()); +Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, + int32_t* scale) { + if (s.empty()) { +return Status::Invalid("Empty string cannot be converted to decimal"); } - const std::string sign = results["SIGN"]; - const std::string integer = results["INTEGER"]; - - const std::string left_digits = results["LEFT_DIGITS"]; - const std::string first_right_digits = results["FIRST_RIGHT_DIGITS"]; - - const std::string second_right_digits = results["SECOND_RIGHT_DIGITS"]; - - const std::string first_exp_value = results["FIRST_EXP_VALUE"]; - const std::string second_exp_value = results["SECOND_EXP_VALUE"]; - - std::string whole_part; - std::string fractional_part; - std::string exponent_value; - - if (!integer.empty()) { -whole_part = integer; - } else if (!left_digits.empty()) { -DCHECK(second_right_digits.empty()) << s << " " << second_right_digits; -whole_part = left_digits; -fractional_part = first_right_digits; - } else { -DCHECK(first_right_digits.empty()) << s << " " << first_right_digits; -fractional_part = second_right_digits; + DecimalComponents dec; + if (!ParseDecimalComponents(s.data(), s.size(), )) { +
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433147#comment-16433147 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180592851 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } Review comment: I was too lazy to look it up :-) Will do. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433145#comment-16433145 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180592574 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { + size_t pos = 0; -if (scale != nullptr) { - *scale = 0; + if (size == 0) { +return false; + } + // Sign of the number + if (IsSign(s[pos])) { +out->sign = std::string(s + pos, 1); +++pos; + } + // First run of digits + pos = ParseDigitsRun(s, pos, size, >whole_digits); + if (pos == size) { +return !out->whole_digits.empty(); + } + // Optional dot (if given in fractional form) + bool has_dot = IsDot(s[pos]); + if (has_dot) { +// Second run of digits +++pos; +pos = ParseDigitsRun(s, pos, size, >fractional_digits); + } + if (out->whole_digits.empty() && out->fractional_digits.empty()) { +// Need at least some digits (whole or fractional) +return false; + } + if (pos == size) { +return true; + } + // Optional exponent + if (StartsExponent(s[pos])) { +++pos; +if (pos == size) { + return false; +} +// Optional exponent sign +if (IsSign(s[pos])) { + out->exponent_sign = std::string(s + pos, 1); + ++pos; +} +pos = ParseDigitsRun(s, pos, size, >exponent_digits); +if (out->exponent_digits.empty()) { + // Need some exponent digits + return false; } - -*out = 0; -return Status::OK(); } + return pos == size; +} - boost::smatch results; - const bool matches = boost::regex_match(s, results, DECIMAL_REGEX); +} // namespace - if (!matches) { -std::stringstream ss; -ss << "The string " << s << " is not a valid decimal number"; -return Status::Invalid(ss.str()); +Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, + int32_t* scale) { + if (s.empty()) { +return Status::Invalid("Empty string cannot be converted to decimal"); } - const std::string sign = results["SIGN"]; - const std::string integer = results["INTEGER"]; - - const std::string left_digits = results["LEFT_DIGITS"]; - const std::string first_right_digits = results["FIRST_RIGHT_DIGITS"]; - - const std::string second_right_digits = results["SECOND_RIGHT_DIGITS"]; - - const std::string first_exp_value = results["FIRST_EXP_VALUE"]; - const std::string second_exp_value = results["SECOND_EXP_VALUE"]; - - std::string whole_part; - std::string fractional_part; - std::string exponent_value; - - if (!integer.empty()) { -whole_part = integer; - } else if (!left_digits.empty()) { -DCHECK(second_right_digits.empty()) << s << " " << second_right_digits; -whole_part = left_digits; -fractional_part = first_right_digits; - } else { -DCHECK(first_right_digits.empty()) << s << " " << first_right_digits; -fractional_part = second_right_digits; + DecimalComponents dec; + if (!ParseDecimalComponents(s.data(), s.size(), )) { +
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433136#comment-16433136 ] ASF GitHub Bot commented on ARROW-2224: --- cpcloud commented on a change in pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880#discussion_r180591991 ## File path: cpp/src/arrow/util/decimal.cc ## @@ -253,117 +251,131 @@ static void StringToInteger(const std::string& str, Decimal128* out) { } } -static const boost::regex DECIMAL_REGEX( -// sign of the number -"(?[-+]?)" - -// digits around the decimal point - "(((?\\d+)\\.(?\\d*)|\\.(?\\d+)" -")" +namespace { -// optional exponent -"([eE](?[-+]?\\d+))?" +struct DecimalComponents { + std::string sign; + std::string whole_digits; + std::string fractional_digits; + std::string exponent_sign; + std::string exponent_digits; +}; -// otherwise -"|" +inline bool IsSign(char c) { return (c == '-' || c == '+'); } -// we're just an integer -"(?\\d+)" +inline bool IsDot(char c) { return c == '.'; } -// or an integer with an exponent -"(?:[eE](?[-+]?\\d+))?)"); +inline bool IsDigit(char c) { return (c >= '0' && c <= '9'); } -static inline bool is_zero_character(char c) { return c == '0'; } +inline bool StartsExponent(char c) { return (c == 'e' || c == 'E'); } -Status Decimal128::FromString(const std::string& s, Decimal128* out, int32_t* precision, - int32_t* scale) { - if (s.empty()) { -return Status::Invalid("Empty string cannot be converted to decimal"); +inline size_t ParseDigitsRun(const char* s, size_t start, size_t size, std::string* out) { + size_t pos; + for (pos = start; pos < size; ++pos) { +if (!IsDigit(s[pos])) { + break; +} } + *out = std::string(s + start, pos - start); + return pos; +} - // case of all zeros - if (std::all_of(s.cbegin(), s.cend(), is_zero_character)) { -if (precision != nullptr) { - *precision = 0; -} +bool ParseDecimalComponents(const char* s, size_t size, DecimalComponents* out) { Review comment: Why not use iterators instead of a pointer plus size? It seems like we're subverting the language. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433117#comment-16433117 ] ASF GitHub Bot commented on ARROW-2224: --- pitrou opened a new pull request #1880: [WIP] ARROW-2224: [C++] Remove boost-regex dependency URL: https://github.com/apache/arrow/pull/1880 Use a hand-written decimal parser. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432996#comment-16432996 ] Antoine Pitrou commented on ARROW-2224: --- Note that the day where we really care about decimal performance, we should probably use Stefan Krah's libmpdec, which is also used by the Python _decimal module. It's a pure C library, is fast and would probably allow us to have tighter interoperation with Python decimals. > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432995#comment-16432995 ] Phillip Cloud commented on ARROW-2224: -- The original implementation used C++ to implement a handwritten parser, but it quickly got complex once we needed to support integers with exponential notation. Maybe the complexity of dealing with boost is greater than the complexity of debugging a handwritten parser, in which case it would be worth it to reimplement it without a regular expression. > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432987#comment-16432987 ] Antoine Pitrou commented on ARROW-2224: --- How important is it to have fast parsing of decimal types? If we reimplement our own parsing, we can tweak performance *and* remove the troublesome dependency. > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432928#comment-16432928 ] Phillip Cloud commented on ARROW-2224: -- The "should" here isn't related to linking issues, rather, it's related to performance. See here for some benchmarks: http://lh3lh3.users.sourceforge.net/reb.shtml Those benchmarks are now coming up on 8 years old, so we should revisit that. Some of the linking issues might be mitigated by using {{}} on Windows and {{regcomp}} on UNIX platforms. I'm unaware of the perils of that solution (if any). > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2224) [C++] Replace boost regex usage with libre2
[ https://issues.apache.org/jira/browse/ARROW-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432890#comment-16432890 ] Antoine Pitrou commented on ARROW-2224: --- Can you explain the "should"? AFAICS libre2 isn't header-only, so it may present the same linking and ABI issues as boost-regex. > [C++] Replace boost regex usage with libre2 > --- > > Key: ARROW-2224 > URL: https://issues.apache.org/jira/browse/ARROW-2224 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > We're using {{boost::regex}} to parse decimal strings for {{decimal128}} > types. We should use {{libre2}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)