Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
......................................................................


Patch Set 6:

(4 comments)

It is great to know that Impala can achieve 926 MB/s conversion rate and very 
attempting to get the best from fast_double_parser():-)

The key is not to populate a new std::string when the original input conforms 
to the requirements of the library (well formed null-terminated string via 
string::c_str() in constant speed), which should be true in most cases.

Throughout the code base of Impala, I was able to find only the following call 
that needs the service of converting string to double which makes the above 
idea feasible.

346 static bool ParseProbability(const string& prob_str, bool* should_execute) 
{   
347   StringParser::ParseResult parse_result;                 
348   double probability = StringParser::StringToFloat<double>(
349       prob_str.c_str(), prob_str.size(), &parse_result);                    
         
350   if (parse_result != StringParser::PARSE_SUCCESS ||          
351       probability < 0.0 || probability > 1.0) {
352     return false;                                                           
               
353   }                                                                   
354   // +1L ensures probability of 0.0 and 1.0 work as expected.
355   *should_execute = rand() < probability * (RAND_MAX + 1L);
356   return true;                                                            
357 }

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h
File be/src/util/string-parser.h:

http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@463
PS6, Line 463: const char* s, int len
If we can change it to std::string(), we can take advantage of the 
fast_double_parser to the fullest.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@498
PS6, Line 498: StringToFloatPreprocess(s, i, len);
We can pass the input std::string as 's' and get rid off 'len',

On return, if processed_str.length() is 0, then the original input confirms to 
fast_double_parser() and we can make use of the original directly.


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@538
PS6, Line 538: const char* s, int offset,
             :       int len
The new signature of this function becomes

static inline std::string StringToFloatPreprocess(const string& s, int offset)


http://gerrit.cloudera.org:8080/#/c/17389/6/be/src/util/string-parser.h@550
PS6, Line 550:
Here before copying the test of the string, make the following check and return 
an empty string object to signal that the original input is good to go with 
fast_double_parser.

if (i==offset && res.length() == 0) {
  return string();
}



--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 03 Jun 2021 17:55:42 +0000
Gerrit-HasComments: Yes

Reply via email to