[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-03-04 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385353#comment-16385353
 ] 

Wes McKinney commented on ARROW-1491:
-

[~cpcloud] This would be nice to have, but given the bug backlog for 
0.9.0, we could also defer it to the next release.

> [C++] Add casting implementations from strings to numbers or boolean
> 
>
> Key: ARROW-1491
> URL: https://issues.apache.org/jira/browse/ARROW-1491
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Licht Takeuchi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367850#comment-16367850
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-366351431
 
 
   I'm taking over this PR and will put up a new one based on it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348890#comment-16348890
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r165417551
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -703,6 +706,106 @@ struct CastFunctor::value &&
+!std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  return boost::lexical_cast(s);
+}
+
+template 
+typename std::enable_if::value || std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  // Convert to int before casting to T
+  // because boost::lexical_cast does not support 8bit int/uint.
+  return boost::numeric_cast(boost::lexical_cast(s));
+}
+
+template 
+struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
+
+for (int64_t i = 0; i < input.length; ++i) {
+  if (input_array.IsNull(i)) {
+out_data++;
+continue;
+  }
+
+  std::string s = input_array.GetString(i);
+
+  try {
+*out_data++ = castStringToNumeric(s);
+  } catch (...) {
 
 Review comment:
   I'm concerned about propagating the actual error message instead of just 
saying "Cast from X to Y failed".




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348891#comment-16348891
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r165417094
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -703,6 +706,106 @@ struct CastFunctor::value &&
+!std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  return boost::lexical_cast(s);
+}
+
+template 
+typename std::enable_if::value || std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  // Convert to int before casting to T
+  // because boost::lexical_cast does not support 8bit int/uint.
+  return boost::numeric_cast(boost::lexical_cast(s));
+}
+
+template 
+struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
 
 Review comment:
   Is this variable used anywhere? It looks like you might've replaced it with 
the `castStringToNumeric` function.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348889#comment-16348889
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r165416431
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -703,6 +706,106 @@ struct CastFunctor::value &&
+!std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
 
 Review comment:
   Capitalize this function.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348892#comment-16348892
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r165416478
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -703,6 +706,106 @@ struct CastFunctor::value &&
+!std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  return boost::lexical_cast(s);
+}
+
+template 
+typename std::enable_if::value || std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
 
 Review comment:
   Capitalize.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348893#comment-16348893
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

cpcloud commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r165417268
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -703,6 +706,106 @@ struct CastFunctor::value &&
+!std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  return boost::lexical_cast(s);
+}
+
+template 
+typename std::enable_if::value || std::is_same::value,
+T>::type
+castStringToNumeric(const std::string& s) {
+  // Convert to int before casting to T
+  // because boost::lexical_cast does not support 8bit int/uint.
+  return boost::numeric_cast(boost::lexical_cast(s));
+}
+
+template 
+struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
+
+for (int64_t i = 0; i < input.length; ++i) {
+  if (input_array.IsNull(i)) {
+out_data++;
+continue;
+  }
+
+  std::string s = input_array.GetString(i);
+
+  try {
+*out_data++ = castStringToNumeric(s);
+  } catch (...) {
 
 Review comment:
   Is there a specific exception that can be caught here?
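
   One possible answer, sketched with the standard library instead of Boost 
(an assumption made only to keep the example self-contained): `std::stoi` 
reports failures through `std::invalid_argument` and `std::out_of_range`, so 
those can be caught by name rather than with `catch (...)`. With the Boost 
calls in the diff, the analogous types would be `boost::bad_lexical_cast` and 
`boost::numeric::bad_numeric_cast`. `ParseResult` and `TryParseInt` are 
hypothetical names, not part of the PR.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type standing in for a Status object.
struct ParseResult {
  bool ok;
  int value;
  std::string error;
};

// Parse a string as int, catching only the exceptions std::stoi documents.
ParseResult TryParseInt(const std::string& s) {
  try {
    return {true, std::stoi(s), ""};
  } catch (const std::invalid_argument&) {
    // No conversion could be performed (e.g. "abc").
    return {false, 0, "Cast from string to int failed: not a number: '" + s + "'"};
  } catch (const std::out_of_range&) {
    // The parsed value does not fit in int.
    return {false, 0, "Cast from string to int failed: out of range: '" + s + "'"};
  }
}
```

   Anything other than these two exception types still propagates, which 
makes unexpected failures visible instead of silently converting them into a 
cast error.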




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325563#comment-16325563
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

xhochy commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-357510384
 
 
   This PR looks good apart from the dependency on Boost. We probably need it 
to get this working, but in the long term we should get rid of it again.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325562#comment-16325562
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

xhochy commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r161396841
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -17,6 +17,9 @@
 
 #include "arrow/compute/kernels/cast.h"
 
+#include 
+#include 
+#include 
 
 Review comment:
   Wouldn't it be OK, in the case of small-size ints, just to upcast them? 
This should not affect performance, as it's only a small temporary.
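
   A minimal sketch of the upcast idea, assuming standard-library parsing 
instead of Boost: the string is parsed into a wide temporary, range-checked, 
and then narrowed to the 8-bit (or other small) type. `ParseNarrowInt` is a 
hypothetical helper, not a function from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse into a wide temporary, then narrow to the
// requested small integer type after checking that the value fits.
template <typename Narrow>
Narrow ParseNarrowInt(const std::string& s) {
  static_assert(sizeof(Narrow) < sizeof(long long),
                "intended for types narrower than the temporary");
  std::size_t pos = 0;
  // std::stoll throws std::invalid_argument or std::out_of_range on failure.
  const long long wide = std::stoll(s, &pos);
  if (pos != s.size()) {
    throw std::invalid_argument("trailing characters in '" + s + "'");
  }
  const long long lo = static_cast<long long>(std::numeric_limits<Narrow>::min());
  const long long hi = static_cast<long long>(std::numeric_limits<Narrow>::max());
  if (wide < lo || wide > hi) {
    throw std::out_of_range("value does not fit the target type: '" + s + "'");
  }
  return static_cast<Narrow>(wide);
}
```

   For example, `ParseNarrowInt<std::int8_t>("127")` yields 127, while 
`ParseNarrowInt<std::int8_t>("128")` throws `std::out_of_range`.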




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321227#comment-16321227
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-356755296
 
 
   I will review again when I can




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2018-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16320262#comment-16320262
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

Licht-T commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-356607776
 
 
   @wesm Now, all fixed.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306687#comment-16306687
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

Licht-T commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r159118929
 
 

 ##
 File path: cpp/src/arrow/compute/compute-test.cc
 ##
 @@ -769,6 +769,65 @@ TEST_F(TestCast, OffsetOutputBuffer) {
 int16(), e3);
 }
 
+TEST_F(TestCast, StringToBoolean) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
+
+  vector v1 = {"False", "true", "true", "True", "false"};
+  vector v2 = {"0", "1", "1", "1", "0"};
+  vector e = {false, true, true, true, false};
+  CheckCase(utf8(), v1, is_valid, boolean(),
+            e, options);
+  CheckCase(utf8(), v2, is_valid, boolean(),
+            e, options);
+}
+
+TEST_F(TestCast, StringToNumber) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
+
+  // string to int
+  vector v_int = {"0", "1", "127", "-1", "0"};
+  vector e_int8 = {0, 1, 127, -1, 0};
+  vector e_int16 = {0, 1, 127, -1, 0};
+  vector e_int32 = {0, 1, 127, -1, 0};
+  vector e_int64 = {0, 1, 127, -1, 0};
+  CheckCase(utf8(), v_int, is_valid, int8(),
+            e_int8, options);
+  CheckCase(utf8(), v_int, is_valid, int16(),
+            e_int16, options);
+  CheckCase(utf8(), v_int, is_valid, int32(),
+            e_int32, options);
+  CheckCase(utf8(), v_int, is_valid, int64(),
+            e_int64, options);
+
+  // string to uint
+  vector v_uint = {"0", "1", "127", "255", "0"};
+  vector e_uint8 = {0, 1, 127, 255, 0};
+  vector e_uint16 = {0, 1, 127, 255, 0};
+  vector e_uint32 = {0, 1, 127, 255, 0};
+  vector e_uint64 = {0, 1, 127, 255, 0};
+  CheckCase(utf8(), v_uint, is_valid,
+            uint8(), e_uint8, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint16(), e_uint16, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint32(), e_uint32, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint64(), e_uint64, options);
+
+  // string to float
+  vector v_float = {"0.1", "1.2", "127.3", "200.4", "0.5"};
+  vector e_float = {0.1f, 1.2f, 127.3f, 200.4f, 0.5f};
+  vector e_double = {0.1, 1.2, 127.3, 200.4, 0.5};
+  CheckCase(utf8(), v_float, is_valid,
+            float32(), e_float, options);
+  CheckCase(utf8(), v_float, is_valid,
+            float64(), e_double, options);
 
 Review comment:
   @wesm It seems that the sliced pattern is already tested in the 
`CheckCase` method.
   
https://github.com/Licht-T/arrow/blob/master/cpp/src/arrow/compute/compute-test.cc#L123




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302987#comment-16302987
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-353811452
 
 
   Sure please go ahead 




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302974#comment-16302974
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

Licht-T commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-353809606
 
 
   Thanks @wesm! I was busy but now I am okay. Would you mind if I help?




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299145#comment-16299145
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on issue #1387: ARROW-1491: [C++] Add casting implementations 
from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#issuecomment-353190250
 
 
   @Licht-T I will do a bit of work on this patch tomorrow or Friday for 
further review




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280623#comment-16280623
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155318929
 
 

 ##
 File path: cpp/src/arrow/compute/compute-test.cc
 ##
 @@ -769,6 +769,65 @@ TEST_F(TestCast, OffsetOutputBuffer) {
 int16(), e3);
 }
 
+TEST_F(TestCast, StringToBoolean) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
+
+  vector v1 = {"False", "true", "true", "True", "false"};
+  vector v2 = {"0", "1", "1", "1", "0"};
+  vector e = {false, true, true, true, false};
+  CheckCase(utf8(), v1, is_valid, boolean(),
+            e, options);
+  CheckCase(utf8(), v2, is_valid, boolean(),
+            e, options);
+}
+
+TEST_F(TestCast, StringToNumber) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
 
 Review comment:
   Can you modify the unit tests to propagate nulls? 




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280622#comment-16280622
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155319018
 
 

 ##
 File path: cpp/src/arrow/compute/compute-test.cc
 ##
 @@ -769,6 +769,65 @@ TEST_F(TestCast, OffsetOutputBuffer) {
 int16(), e3);
 }
 
+TEST_F(TestCast, StringToBoolean) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
+
+  vector v1 = {"False", "true", "true", "True", "false"};
+  vector v2 = {"0", "1", "1", "1", "0"};
+  vector e = {false, true, true, true, false};
+  CheckCase(utf8(), v1, is_valid, boolean(),
+            e, options);
+  CheckCase(utf8(), v2, is_valid, boolean(),
+            e, options);
+}
+
+TEST_F(TestCast, StringToNumber) {
+  CastOptions options;
+
+  vector is_valid = {true, true, true, true, true};
+
+  // string to int
+  vector v_int = {"0", "1", "127", "-1", "0"};
+  vector e_int8 = {0, 1, 127, -1, 0};
+  vector e_int16 = {0, 1, 127, -1, 0};
+  vector e_int32 = {0, 1, 127, -1, 0};
+  vector e_int64 = {0, 1, 127, -1, 0};
+  CheckCase(utf8(), v_int, is_valid, int8(),
+            e_int8, options);
+  CheckCase(utf8(), v_int, is_valid, int16(),
+            e_int16, options);
+  CheckCase(utf8(), v_int, is_valid, int32(),
+            e_int32, options);
+  CheckCase(utf8(), v_int, is_valid, int64(),
+            e_int64, options);
+
+  // string to uint
+  vector v_uint = {"0", "1", "127", "255", "0"};
+  vector e_uint8 = {0, 1, 127, 255, 0};
+  vector e_uint16 = {0, 1, 127, 255, 0};
+  vector e_uint32 = {0, 1, 127, 255, 0};
+  vector e_uint64 = {0, 1, 127, 255, 0};
+  CheckCase(utf8(), v_uint, is_valid,
+            uint8(), e_uint8, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint16(), e_uint16, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint32(), e_uint32, options);
+  CheckCase(utf8(), v_uint, is_valid,
+            uint64(), e_uint64, options);
+
+  // string to float
+  vector v_float = {"0.1", "1.2", "127.3", "200.4", "0.5"};
+  vector e_float = {0.1f, 1.2f, 127.3f, 200.4f, 0.5f};
+  vector e_double = {0.1, 1.2, 127.3, 200.4, 0.5};
+  CheckCase(utf8(), v_float, is_valid,
+            float32(), e_float, options);
+  CheckCase(utf8(), v_float, is_valid,
+            float64(), e_double, options);
 
 Review comment:
   Can you test with a non-zero offset (e.g. `foo->Slice(2)`)?
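
   A rough illustration of why a non-zero offset is worth testing, using plain 
`std::vector` storage as a stand-in for Arrow's buffers (an assumption; 
`StringSlice` and `CastSlice` are hypothetical names). A kernel that indexes 
the underlying storage directly, instead of adding the slice offset, would 
return the first elements rather than the sliced ones.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical view over shared storage, similar in spirit to a sliced array.
struct StringSlice {
  const std::vector<std::string>* storage;
  std::size_t offset;
  std::size_t length;

  // Logical index i maps to physical index offset + i.
  const std::string& Get(std::size_t i) const {
    assert(i < length);
    return (*storage)[offset + i];
  }
};

// Cast every logical element of the slice to int.
std::vector<int> CastSlice(const StringSlice& in) {
  std::vector<int> out;
  out.reserve(in.length);
  for (std::size_t i = 0; i < in.length; ++i) {
    out.push_back(std::stoi(in.Get(i)));
  }
  return out;
}
```

   With storage `{"10", "20", "30", "40"}` and a slice of offset 2 and length 
2, the result is `{30, 40}`; a cast that ignored the offset would produce 
`{10, 20}` and fail such a test.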




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280624#comment-16280624
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155318397
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -660,6 +663,100 @@ struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+if (input_array.null_count() > 0) {
+  std::stringstream ss;
+  ss << "Failed to cast NA into " << output->type->ToString();
+  ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
+  return;
+}
 
 Review comment:
   If the input has nulls, then the output should have nulls in the same 
locations (like the other cast functions)
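
   The requested behavior can be sketched as follows, with plain vectors 
standing in for Arrow's value and validity buffers (an assumption; 
`ParsedColumn` and `CastStringsKeepNulls` are hypothetical names). Null slots 
are skipped during parsing and the validity information is carried over 
unchanged, so the output has nulls in the same positions as the input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical output column: values plus a per-slot validity flag.
struct ParsedColumn {
  std::vector<std::int32_t> values;
  std::vector<bool> is_valid;
};

// Cast strings to int32, leaving null slots untouched instead of failing.
ParsedColumn CastStringsKeepNulls(const std::vector<std::string>& in,
                                  const std::vector<bool>& in_valid) {
  ParsedColumn out;
  out.values.assign(in.size(), 0);  // placeholder value for null slots
  out.is_valid = in_valid;          // nulls stay in the same locations
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!in_valid[i]) continue;     // never parse a null slot
    out.values[i] = static_cast<std::int32_t>(std::stoi(in[i]));
  }
  return out;
}
```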




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280621#comment-16280621
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155317738
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -660,6 +663,100 @@ struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+if (input_array.null_count() > 0) {
+  std::stringstream ss;
+  ss << "Failed to cast NA into " << output->type->ToString();
+  ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
+  return;
+}
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
+if (output->type->id() == Type::INT8 || output->type->id() == Type::UINT8) {
+  cast_func = [](const std::string& s) {
+return boost::numeric_cast(boost::lexical_cast(s));
+  };
+} else {
+  cast_func = [](const std::string& s) { return boost::lexical_cast(s); };
 
 Review comment:
   I think C++11 lambdas called through a `std::function` actually incur more 
overhead than an inlined function. We should instead introduce an auxiliary 
numeric cast functor that does this switch at compile time (resulting in an 
inlined function in the inner loop for all possible types) rather than at runtime.
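
One way the compile-time switch could look, as a rough sketch rather than the code that ended up in Arrow. `StringToNumber` and `CastAll` are hypothetical names, and stream extraction stands in for `boost::lexical_cast` so that the example depends only on the standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper. The primary template parses straight into T.
template <typename T, typename Enable = void>
struct StringToNumber {
  static bool Parse(const std::string& s, T* out) {
    std::istringstream in(s);
    in >> *out;
    return !in.fail() && in.eof();  // whole string must be consumed
  }
};

// Specialization chosen at compile time for one-byte integer types:
// parse into a wider int, then range-check before narrowing.
template <typename T>
struct StringToNumber<
    T, typename std::enable_if<std::is_integral<T>::value && sizeof(T) == 1>::type> {
  static bool Parse(const std::string& s, T* out) {
    int wide = 0;
    if (!StringToNumber<int>::Parse(s, &wide)) return false;
    if (wide < std::numeric_limits<T>::min() || wide > std::numeric_limits<T>::max()) {
      return false;
    }
    *out = static_cast<T>(wide);
    return true;
  }
};

// The inner loop calls a statically known function, so there is no
// std::function indirection and the compiler is free to inline it.
template <typename T>
bool CastAll(const std::vector<std::string>& in, std::vector<T>* out) {
  assert(out != nullptr);
  out->clear();
  out->reserve(in.size());
  for (const std::string& s : in) {
    T value{};
    if (!StringToNumber<T>::Parse(s, &value)) return false;
    out->push_back(value);
  }
  return true;
}
```

The branch on the output type disappears from the loop because each instantiation of `CastAll<T>` already knows which `Parse` it calls. Stream extraction has quirks of its own, so a real kernel would probably want a stricter parser.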




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280625#comment-16280625
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155318269
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -660,6 +663,100 @@ struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+if (input_array.null_count() > 0) {
+  std::stringstream ss;
+  ss << "Failed to cast NA into " << output->type->ToString();
+  ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
+  return;
+}
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
+if (output->type->id() == Type::INT8 || output->type->id() == Type::UINT8) {
+  cast_func = [](const std::string& s) {
+return boost::numeric_cast(boost::lexical_cast(s));
+  };
+} else {
+  cast_func = [](const std::string& s) { return boost::lexical_cast(s); };
+}
+
+for (int64_t i = 0; i < input.length; ++i) {
+  std::string s = input_array.GetString(i);
+
+  try {
+*out_data++ = cast_func(s);
+  } catch (...) {
+std::stringstream ss;
+ss << "Failed to cast String '" << s << "' into " << output->type->ToString();
+ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
+return;
+  }
+}
+  }
+};
+
+// --
+// String to Boolean
+
+template 
+struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+StringArray input_array(input.Copy());
+internal::BitmapWriter writer(output->buffers[1]->mutable_data(), output->offset,
+  input.length);
+
+if (input_array.null_count() > 0) {
+  std::stringstream ss;
+  ss << "Failed to cast NA into " << output->type->ToString();
+  ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
 
 Review comment:
   If the input has nulls, the output should have nulls in the same 
locations.
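
For the boolean case the same idea applies to a bit-packed output. The sketch below is only an illustration: `BoolColumn`, `SetBitTo`, `GetBit` and `CastStringsToBool` are hypothetical names, the bitmaps are plain byte vectors rather than Arrow buffers, and the accepted spellings ("true", "false", "1", "0") are an assumption, since the thread does not say which strings the cast should accept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sets or clears bit i in a packed, least-significant-bit-first bitmap.
inline void SetBitTo(std::vector<uint8_t>* bits, std::size_t i, bool value) {
  assert(i / 8 < bits->size());
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    (*bits)[i / 8] |= mask;
  } else {
    (*bits)[i / 8] &= static_cast<uint8_t>(~mask);
  }
}

inline bool GetBit(const std::vector<uint8_t>& bits, std::size_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical output layout: one bitmap for values, one for validity.
struct BoolColumn {
  std::vector<uint8_t> values;
  std::vector<uint8_t> validity;
};

// Casts non-null strings to booleans; null slots are copied through as null.
// Returns false on the first non-null string that is not a known spelling.
inline bool CastStringsToBool(const std::vector<std::string>& in,
                              const std::vector<uint8_t>& validity,
                              BoolColumn* out) {
  assert(validity.size() * 8 >= in.size());
  out->values.assign(validity.size(), 0);
  out->validity = validity;  // output nulls match input nulls
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (!GetBit(validity, i)) continue;  // leave the null slot untouched
    if (in[i] == "true" || in[i] == "1") {
      SetBitTo(&out->values, i, true);
    } else if (in[i] == "false" || in[i] == "0") {
      SetBitTo(&out->values, i, false);
    } else {
      return false;
    }
  }
  return true;
}
```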




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279236#comment-16279236
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

Licht-T commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155086307
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -17,6 +17,9 @@
 
 #include "arrow/compute/kernels/cast.h"
 
+#include 
+#include 
+#include 
 
 Review comment:
   It seems that `boost::numeric_cast` and `boost::lexical_cast` cannot be 
replaced by the STL.
   The STL has `std::to_string` and, for parsing, `std::stoi` and related 
functions, but none of them has overloads for the small integer types.
   http://en.cppreference.com/w/cpp/string/basic_string/to_string
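
For comparison, a minimal sketch of how the standard library alone might cover the small integer types: parse with `std::strtoll` and range-check before narrowing. It assumes base-10 signed integers only. `ParseSigned` is a hypothetical name, not something from the PR or from Arrow.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical helper: parses a base-10 signed integer of any width without
// Boost. Returns false for empty input, trailing characters, or values that
// do not fit in T.
template <typename T>
bool ParseSigned(const std::string& s, T* out) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "ParseSigned only handles signed integer types");
  assert(out != nullptr);
  if (s.empty()) return false;
  errno = 0;
  char* end = nullptr;
  const long long wide = std::strtoll(s.c_str(), &end, 10);
  if (errno == ERANGE) return false;                 // overflowed long long
  if (end != s.c_str() + s.size()) return false;     // not fully consumed
  if (wide < std::numeric_limits<T>::min() || wide > std::numeric_limits<T>::max()) {
    return false;                                    // does not fit in T
  }
  *out = static_cast<T>(wide);
  return true;
}
```

Note that `std::strtoll` skips leading whitespace and accepts a leading `+`, which may or may not be the behaviour wanted for a cast kernel.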




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279051#comment-16279051
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

wesm commented on a change in pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r155044277
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -17,6 +17,9 @@
 
 #include "arrow/compute/kernels/cast.h"
 
+#include 
+#include 
+#include 
 
 Review comment:
   Is it possible to avoid relying on Boost for this? For example, are there 
alternatives in the STL, or elsewhere, that we can use? I will review the rest 
in more detail later.




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278494#comment-16278494
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

xhochy commented on a change in pull request #1387: ARROW-1491: [C++] Add 
casting implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387#discussion_r154934826
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -660,6 +663,100 @@ struct CastFunctor::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options,
+  const ArrayData& input, ArrayData* output) {
+using out_type = typename O::c_type;
+StringArray input_array(input.Copy());
+
+if (input_array.null_count() > 0) {
+  std::stringstream ss;
+  ss << "Failed to cast NA into " << output->type->ToString();
+  ctx->SetStatus(Status(StatusCode::SerializationError, ss.str()));
+  return;
+}
+
+auto out_data = GetMutableValues(output, 1);
+
+std::function cast_func;
+if (output->type->id() == Type::INT8 || output->type->id() == Type::UINT8) {
+  cast_func = [](const std::string& s) {
+return boost::numeric_cast(boost::lexical_cast(s));
 
 Review comment:
   Can you add a comment explaining why this special case is needed?
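
The likely reason, shown here with standard streams rather than Boost: on common platforms `int8_t` and `uint8_t` are typedefs for character types, so a text-to-value conversion into them reads a single character instead of parsing digits. The Boost.LexicalCast FAQ describes the same effect for `lexical_cast<int8_t>`. `ReadAsCharacter` and `ReadAsNumber` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Streams `s` directly into an int8_t. Where int8_t is a character type,
// operator>> extracts one character rather than parsing a number.
inline int8_t ReadAsCharacter(const std::string& s) {
  assert(!s.empty());
  std::istringstream in(s);
  int8_t value = 0;
  in >> value;
  return value;
}

// Parses into a wider int first, then narrows after a range check. This is
// the shape of the INT8/UINT8 special case in the diff above.
inline bool ReadAsNumber(const std::string& s, int8_t* out) {
  assert(out != nullptr);
  std::istringstream in(s);
  int wide = 0;
  in >> wide;
  if (in.fail() || !in.eof()) return false;
  if (wide < std::numeric_limits<int8_t>::min() ||
      wide > std::numeric_limits<int8_t>::max()) {
    return false;
  }
  *out = static_cast<int8_t>(wide);
  return true;
}
```

On such platforms `ReadAsCharacter("12")` yields the character code of `'1'`, whereas `ReadAsNumber("12", &v)` stores the number 12.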




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276798#comment-16276798
 ] 

ASF GitHub Bot commented on ARROW-1491:
---

Licht-T opened a new pull request #1387: ARROW-1491: [C++] Add casting 
implementations from strings to numbers or boolean
URL: https://github.com/apache/arrow/pull/1387
 
 
   This closes [ARROW-1491](https://issues.apache.org/jira/browse/ARROW-1491).




[jira] [Commented] (ARROW-1491) [C++] Add casting implementations from strings to numbers or boolean

2017-10-25 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219398#comment-16219398
 ] 

Wes McKinney commented on ARROW-1491:
-

While this would be nice, it's not immediately urgent. Some help would be 
appreciated.

> [C++] Add casting implementations from strings to numbers or boolean
> 
>
> Key: ARROW-1491
> URL: https://issues.apache.org/jira/browse/ARROW-1491
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
> Fix For: 0.9.0
>
>



