[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-08 Thread GitBox


emkornfield commented on pull request #6985:
URL: https://github.com/apache/arrow/pull/6985#issuecomment-626100172


   @pitrou did you want to take another look?
   
   @nealrichardson should I hold off until we an revert the change to see if 
the test still fails?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626095079


   OK. I'll merge this once remaining GitHub Actions jobs are passed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kiszk commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626094759


   Yes, I think that it is a good starting point. 
   
   I have a patch to fix failures at `rle_encoding_test.cc` in my environment. 
In my local environment, I used the option `cmake -DCMAKE_BUILD_TYPE=Debug 
-DARROW_BUILD_TESTS=ON`.
   
   After merging this, I will submit a PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626093743


   > Other candidates look `ARROW_PYTHON` and `ARROW_S3`. Does it make sense?
   
   They are already disabled by default: 
https://travis-ci.org/github/apache/arrow/builds/684931696#L1496



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626093641


   Done: https://travis-ci.org/github/apache/arrow/builds/684931696
   
   There are many failure outputs from 
`cpp/src/arrow/util/rle_encoding_test.cc`. It's a core component. We can't 
disable it.
   
   Do you think that the current job is a good starting point?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kiszk commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626093493


   Thank you very much!
   Other candidates look `ARROW_PYTHON` and `ARROW_S3`. Does it make sense?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626089457


   Sure. I'll reduce enabled options.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kiszk commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626087155


   One question. Can we enable options step-by-step, too? For example, can we 
customize `ENV ...` at 
https://github.com/apache/arrow/pull/7131/commits/5d6617c4a26be2983336e1baba1a16effe412e9c#diff-cb07f7d49b556d3668355e47df3ad348R85
 for each platform?
   
   It can reduce the number of errors in a log file when we submit a PR to fix 
failures step-by-step.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kiszk commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kiszk commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626083632


   @kou Thank you. I am happy to fix them.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] scampi commented on a change in pull request #6402: ARROW-7831: [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchange

2020-05-08 Thread GitBox


scampi commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r422419677



##
File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex, 
int length, BaseVariab
 final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
 final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
 final int dataLength = end - start;
-target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
-for (int i = 0; i < length + 1; i++) {
-  final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) * 
OFFSET_WIDTH) - start;
-  target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+if (startIndex == 0) {
+  target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+  target.offsetBuffer.getReferenceManager().retain();

Review comment:
   > We should also verify that reference management and accounting is 
intact after this change
   
   @siddharthteotia  I have added a test but this is not checked. Do you have 
some pointers on how I can check this?
   
   > do you want to transfer the buffer here as well?
   
   @emkornfield What do you mean?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7132: ARROW-3509: [C++] Standardize on using Field in Type/Array

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7132:
URL: https://github.com/apache/arrow/pull/7132#issuecomment-626051893


   https://issues.apache.org/jira/browse/ARROW-3509



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield opened a new pull request #7132: ARROW-3509: [C++] Standardize on using Field in Type/Array

2020-05-08 Thread GitBox


emkornfield opened a new pull request #7132:
URL: https://github.com/apache/arrow/pull/7132


   I think there is potentially more consolidaion on the builders but this 
seems like a reasonable place to make the PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626024349


   https://issues.apache.org/jira/browse/ARROW-8743



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou commented on pull request #7131:
URL: https://github.com/apache/arrow/pull/7131#issuecomment-626022318


   @kiszk Do you want to work on fixing test failures step-by-step as follow-up 
tasks?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou opened a new pull request #7131: ARROW-8743: [CI][C++] Add a test job for s390x

2020-05-08 Thread GitBox


kou opened a new pull request #7131:
URL: https://github.com/apache/arrow/pull/7131


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


wesm commented on a change in pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#discussion_r422371276



##
File path: cpp/src/arrow/util/value_parsing.h
##
@@ -484,6 +455,14 @@ static inline bool ParseHH_MM_SS(const char* s, 
std::chrono::duration*
 
 }  // namespace detail
 
+/// \brief Attempt to convert a string to the primitive type corresponding to
+/// an Arrow data type
+template 
+inline bool ParseValue(const char* s, size_t length, typename T::c_type* out,
+   const ParseContext* ctx = NULLPTR) {
+  return detail::StringConverter::Convert(s, length, out);
+}
+

Review comment:
   It doesn't provide much value. I was trying to address @pitrou's comment 
about parsers that may require some state initialization like a locale and I 
used Timestamp as an example





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


wesm commented on pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#issuecomment-626009344


   That’s fine. In general, introducing “nuisance” requirements for simple 
cases doesn’t seem like a good habit 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kou commented on pull request #7118: ARROW-8724: [Packaging][deb][RPM] Use directory in host as build directory

2020-05-08 Thread GitBox


kou commented on pull request #7118:
URL: https://github.com/apache/arrow/pull/7118#issuecomment-626002577


   +1
   Failures on uploading artifacts are unrelated.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


bkietz commented on a change in pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#discussion_r422353994



##
File path: cpp/src/arrow/util/value_parsing.h
##
@@ -484,6 +455,14 @@ static inline bool ParseHH_MM_SS(const char* s, 
std::chrono::duration*
 
 }  // namespace detail
 
+/// \brief Attempt to convert a string to the primitive type corresponding to
+/// an Arrow data type
+template 
+inline bool ParseValue(const char* s, size_t length, typename T::c_type* out,
+   const ParseContext* ctx = NULLPTR) {
+  return detail::StringConverter::Convert(s, length, out);
+}
+

Review comment:
   It's confusing when signatures for generics like this differ 
significantly, so yes: as I wrote it the possibility of a non-empty context 
does push the requirement of a context argument onto overloads of `ParseValue` 
which are context free.
   
   If you feel `ParseValue()` has requirements which are 
sufficiently different from `ParseValue()` that you think the signatures 
*should* be different, then why should they be in the same overload family? 
What value does `ParseValue()` provide over 
`ParseTimestampISO8601()`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7130: ARROW-8742: [C++][Python] Add GRPC Mutual TLS for clients and server

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7130:
URL: https://github.com/apache/arrow/pull/7130#issuecomment-625999819


   https://issues.apache.org/jira/browse/ARROW-8742



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7130: [C++][Python] Add GRPC Mutual TLS for clients and server

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7130:
URL: https://github.com/apache/arrow/pull/7130#issuecomment-625993735


   
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
   ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
 * [Other pull requests](https://github.com/apache/arrow/pulls/)
 * [Contribution Guidelines - How to contribute 
patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] erinleeryan opened a new pull request #7130: [C++][Python] Add GRPC Mutual TLS for clients and server

2020-05-08 Thread GitBox


erinleeryan opened a new pull request #7130:
URL: https://github.com/apache/arrow/pull/7130


   Our team would like to implement mutual tls authentication for a pyarrow 
client, hence modifications to the C++ and Python implementations to at a 
GrpcMTls scheme for connections.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #6617: ARROW-5875: [FlightRPC] integration tests for Flight features

2020-05-08 Thread GitBox


wesm commented on pull request #6617:
URL: https://github.com/apache/arrow/pull/6617#issuecomment-625973437


   +1, sorry for the delay in merging this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


wesm commented on pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#issuecomment-625970866


   I left the `ParseValue` API as is and addressed the other comments. Since 
this is internal-only code would it be alright to call it a day on this? If 
someone wants to make more changes in a follow up PR that's fine with me



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


wesm commented on a change in pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#discussion_r422314767



##
File path: cpp/src/arrow/util/value_parsing.h
##
@@ -484,6 +455,14 @@ static inline bool ParseHH_MM_SS(const char* s, 
std::chrono::duration*
 
 }  // namespace detail
 
+/// \brief Attempt to convert a string to the primitive type corresponding to
+/// an Arrow data type
+template 
+inline bool ParseValue(const char* s, size_t length, typename T::c_type* out,
+   const ParseContext* ctx = NULLPTR) {
+  return detail::StringConverter::Convert(s, length, out);
+}
+

Review comment:
   I still really don't like this because you're pushing requirements from 
a special case onto the general case 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield edited a comment on pull request #7110: Proof of concept for Arrow text schema

2020-05-08 Thread GitBox


emkornfield edited a comment on pull request #7110:
URL: https://github.com/apache/arrow/pull/7110#issuecomment-625963511


   >  I'm not sure what you mean by non-official function, though. Could you 
expand on that?
   
   Sorry, I meant not something that is supported officially in the IPC format 
specification (i.e. other languages need to support reading the JSON).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] emkornfield commented on pull request #7110: Proof of concept for Arrow text schema

2020-05-08 Thread GitBox


emkornfield commented on pull request #7110:
URL: https://github.com/apache/arrow/pull/7110#issuecomment-625963511


   >  I'm not sure what you mean by non-official function, though. Could you 
expand on that?
   Sorry, I meant not something that is supported officially in the IPC format 
specification (i.e. other languages need to support reading the JSON).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] chrish42 commented on pull request #7110: Proof of concept for Arrow text schema

2020-05-08 Thread GitBox


chrish42 commented on pull request #7110:
URL: https://github.com/apache/arrow/pull/7110#issuecomment-625943409


   I'd also prefer not to add anything to the JSON, but I was raising that 
option because there were concerns about maintaining compatibility in the past 
when I initially raised this, so that would be a way to mitigate that. One way 
to add extra fields while still leaving all the parsing to the Flatbuffers 
library would be to create a new Flatbuffers schema with said additional 
fields. But the first choice would be to make sure that Flatbuffers won't be 
changing that format under Arrow's feet, and so not need those extra fields.
   
   Also, agreed about this not belonging in the ipc namespace. It should 
probably live closer to the Schema C++ Arrow class. But we can discuss all that 
more once there's agreement on the approach and I rework the pull request to be 
mergeable. I'm not sure what you mean by non-official function, though. Could 
you expand on that? For me, the goal is to have a (user-readable, non-code) way 
of expressing an Arrow schema that's supported across versions. I get that 
we'll get there progressively, however. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] bkietz commented on a change in pull request #7120: ARROW-8727: [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function

2020-05-08 Thread GitBox


bkietz commented on a change in pull request #7120:
URL: https://github.com/apache/arrow/pull/7120#discussion_r422259138



##
File path: cpp/src/arrow/util/value_parsing.cc
##
@@ -43,46 +44,43 @@ struct StringToFloatConverter::Impl {
   util::double_conversion::StringToDoubleConverter fallback_converter_;
 };
 
-constexpr int StringToFloatConverter::Impl::flags_;
-constexpr double StringToFloatConverter::Impl::main_junk_value_;
-constexpr double StringToFloatConverter::Impl::fallback_junk_value_;
+static StringToFloatConverterImpl g_string_to_float;

Review comment:
   To make immutability/thread safety clearer:
   ```suggestion
   static const StringToFloatConverterImpl g_string_to_float;
   ```

##
File path: cpp/src/arrow/util/value_parsing.h
##
@@ -55,26 +55,18 @@ class ARROW_EXPORT TimestampParser {
 
 namespace internal {
 
-/// \brief A class providing conversion from strings to some Arrow data types
-///
-/// Conversion is triggered by calling operator().  It returns true on
-/// success, false on failure.
-///
-/// The class may have a non-trivial construction cost in some cases,
-/// so it's recommended to use a single instance many times, if doing bulk
-/// conversion. Instances of this class are not guaranteed to be thread-safe.
-///
-template 
-class StringConverter;
+namespace detail {
 
-template <>
-class StringConverter {
- public:
-  explicit StringConverter(const std::shared_ptr& = NULLPTR) {}
+template 
+struct StringConverter {
+  static bool Convert(const char*, size_t, void*) { return false; }

Review comment:
   This allows construction of an always-fails StringConverter for any 
arrow type, replacing the previous behavior under which `StringConverter` 
was a compile error if conversion for `T` is not defined. This replacement 
seems extremely dubious to me, especially if not clearly documented. Please 
revert this and any instantiations of `StringConverter` or `ParseValue` 
which are undefined.

##
File path: cpp/src/arrow/util/value_parsing.h
##
@@ -484,6 +455,14 @@ static inline bool ParseHH_MM_SS(const char* s, 
std::chrono::duration*
 
 }  // namespace detail
 
+/// \brief Attempt to convert a string to the primitive type corresponding to
+/// an Arrow data type
+template 
+inline bool ParseValue(const char* s, size_t length, typename T::c_type* out,
+   const ParseContext* ctx = NULLPTR) {
+  return detail::StringConverter::Convert(s, length, out);
+}
+

Review comment:
   This declaration (and the specialization below) make the requirements on 
ParseContext extremely unclear. It's optional and may be of any type... except 
when parsing timestamps. Instead it seems better to make ParseContext required 
for all types and empty for the majority: 
   ```suggestion
   template 
   struct ParseContext {};
   
   template <>
   struct ParseContext {
 TimeUnit::type unit;
   };
   
   /// \brief Attempt to convert a string to the primitive type corresponding to
   /// an Arrow data type
   template >
   inline bool ParseValue(const ParseContext& ctx, util::string_view s, 
typename T::c_type* out) {
 return Converter{}::Convert(s.data(), s.size(), out);
   }
   
   template <>
   inline bool ParseValue(const ParseContext& ctx,
   util::string_view s, typename T::c_type* out) {
 return ParseTimestampISO8601(s.data(), s.size(), ctx.unit, out);
   }
   ```
   
   This makes the usage of a parse context clearer and does not add 
significantly to boilerplate at any call site:
   ```c++
   ParseValue({}, "42", )
   ParseValue({TimeUnit::NANO}, ts_str, _value)
   ```
   
   Bonus: the out argument is the last argument again.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: ARROW-8741: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625887076


   https://issues.apache.org/jira/browse/ARROW-8741



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels [WIP]

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625875967


   Revision: 2b729afa7900059a153769086a7fbb331be821f5
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-219](https://github.com/ursa-labs/crossbow/branches/all?query=actions-219)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-219-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on issue #7127: [R] Implementing methodes for combining arrow tabels using dplyr::bind_rows and dplyr::bind_cols

2020-05-08 Thread GitBox


wesm commented on issue #7127:
URL: https://github.com/apache/arrow/issues/7127#issuecomment-625862341


   Hi could you please open a JIRA issue?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625851764


   Revision: 13b3c2c9b54147d26bac38dfdf624d63984ab278
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-218](https://github.com/ursa-labs/crossbow/branches/all?query=actions-218)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-218-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


kszucs commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625851074


   @github-actions crossbow submit wheel-win-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625825788


   Revision: 7db00e5112ded499d0bb3c40effe0e9608d4ba8f
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-217](https://github.com/ursa-labs/crossbow/branches/all?query=actions-217)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-217-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


kszucs commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625824968


   @github-actions crossbow submit wheel-win-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7113: ARROW-8740: [CI] Fix archery option in pandas master cron test

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7113:
URL: https://github.com/apache/arrow/pull/7113#issuecomment-625817459


   https://issues.apache.org/jira/browse/ARROW-8740



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625807095


   Revision: 6109c0a9d8ada6201d616561233253ac51814ec4
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-216](https://github.com/ursa-labs/crossbow/branches/all?query=actions-216)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-216-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


kszucs commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625806490


   @github-actions crossbow submit wheel-win-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] lidavidm commented on pull request #6617: ARROW-5875: [FlightRPC] integration tests for Flight features

2020-05-08 Thread GitBox


lidavidm commented on pull request #6617:
URL: https://github.com/apache/arrow/pull/6617#issuecomment-625800808


   Rebased again to fix merge conflicts.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625789258


   Revision: baa6c77e9e2b67f99a32da27c04cb94e67b08c7a
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-215](https://github.com/ursa-labs/crossbow/branches/all?query=actions-215)
   
   |Task|Status|
   ||--|
   
|wheel-win-cp36m|[![Appveyor](https://img.shields.io/appveyor/ci/ursa-labs/crossbow/actions-215-appveyor-wheel-win-cp36m.svg)](https://ci.appveyor.com/project/ursa-labs/crossbow/history)|



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


kszucs opened a new pull request #7129:
URL: https://github.com/apache/arrow/pull/7129


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7129: [Python][Packaging] Keep VS2015 with bundled dependencies for the windows wheels

2020-05-08 Thread GitBox


kszucs commented on pull request #7129:
URL: https://github.com/apache/arrow/pull/7129#issuecomment-625787449


   @github-actions crossbow submit  wheel-win-cp36m



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on pull request #7080: ARROW-8662: [CI] Consolidate appveyor scripts

2020-05-08 Thread GitBox


kszucs commented on pull request #7080:
URL: https://github.com/apache/arrow/pull/7080#issuecomment-625771425


   @pitrou I guess we shouldn't block on the infra ticket.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs commented on a change in pull request #7080: ARROW-8662: [CI] Consolidate appveyor scripts

2020-05-08 Thread GitBox


kszucs commented on a change in pull request #7080:
URL: https://github.com/apache/arrow/pull/7080#discussion_r422091054



##
File path: ci/appveyor-cpp-build.bat
##
@@ -17,114 +17,146 @@
 
 @echo on
 
-@rem In release mode, disable optimizations (/Od) for faster compiling
-set CMAKE_CXX_FLAGS_RELEASE=/Od
-
-if "%JOB%" == "Static_Crt_Build" (
-  @rem Since we link the CRT statically, we should also disable building
-  @rem the Arrow shared library to link the tests statically, otherwise
-  @rem the Arrow DLL and the tests end up using a different instance of
-  @rem the CRT, which wreaks havoc.
-
-  @rem ARROW-5403(wesm): Since changing to using gtest DLLs we can no
-  @rem longer run the unit tests because gtest.dll and the unit test
-  @rem executables have different static copies of the CRT
-
-  mkdir cpp\build-debug
-  pushd cpp\build-debug
-
-  cmake -G "%GENERATOR%" ^
--DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^
--DARROW_USE_STATIC_CRT=ON ^
--DARROW_BOOST_USE_SHARED=OFF ^
--DARROW_BUILD_SHARED=OFF ^
--DARROW_BUILD_TESTS=ON ^
--DARROW_BUILD_EXAMPLES=ON ^
--DCMAKE_BUILD_TYPE=Debug ^
--DARROW_ENABLE_TIMING_TESTS=OFF ^
--DARROW_TEST_LINKAGE=static ^
--DARROW_CXXFLAGS="/MP" ^
-..  || exit /B
+git config core.symlinks true
+git reset --hard
 
-  cmake --build . --config Debug || exit /B
-  ctest --output-on-failure -j2 || exit /B
-  popd
-  rmdir /S /Q cpp\build-debug
+@rem Retrieve git submodules, configure env var for Parquet unit tests
+git submodule update --init || exit /B
 
-  mkdir cpp\build-release
-  pushd cpp\build-release
-
-  cmake -G "%GENERATOR%" ^
--DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^
--DARROW_USE_STATIC_CRT=ON ^
--DARROW_BOOST_USE_SHARED=OFF ^
--DARROW_BUILD_SHARED=OFF ^
--DARROW_BUILD_TESTS=ON ^
--DARROW_BUILD_EXAMPLES=ON ^
--DCMAKE_BUILD_TYPE=Release ^
--DARROW_ENABLE_TIMING_TESTS=OFF ^
--DARROW_TEST_LINKAGE=static ^
--DCMAKE_CXX_FLAGS_RELEASE="/MT %CMAKE_CXX_FLAGS_RELEASE%" ^
--DARROW_CXXFLAGS="/WX /MP" ^
-..  || exit /B
-
-  cmake --build . --config Release || exit /B
-  ctest --output-on-failure -j2 || exit /B
-  popd
-
-  @rem Finish Static_Crt_Build build successfully
-  exit /B 0
-)
+set ARROW_TEST_DATA=%CD%\testing\data
+set PARQUET_TEST_DATA=%CD%\cpp\submodules\parquet-testing\data
 
+@rem
 @rem In the configurations below we disable building the Arrow static library
 @rem to save some time.  Unfortunately this will still build the Parquet static
 @rem library because of PARQUET-1420 (Thrift-generated symbols not exported in 
DLL).
-
+@rem
 if "%JOB%" == "Build_Debug" (
   mkdir cpp\build-debug
   pushd cpp\build-debug
 
   cmake -G "%GENERATOR%" ^
--DARROW_USE_PRECOMPILED_HEADERS=OFF ^
--DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^
 -DARROW_BOOST_USE_SHARED=OFF ^
--DARROW_BUILD_TESTS=ON ^
 -DARROW_BUILD_EXAMPLES=ON ^
--DCMAKE_BUILD_TYPE=%CONFIGURATION% ^
--DCMAKE_UNITY_BUILD=ON ^
 -DARROW_BUILD_STATIC=OFF ^
+-DARROW_BUILD_TESTS=ON ^
 -DARROW_CXXFLAGS="/MP" ^
 -DARROW_ENABLE_TIMING_TESTS=OFF ^
-..  || exit /B
+-DARROW_USE_PRECOMPILED_HEADERS=OFF ^
+-DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^
+-DCMAKE_BUILD_TYPE="Debug" ^
+-DCMAKE_UNITY_BUILD=ON ^
+.. || exit /B
 
-  cmake --build . --config %CONFIGURATION% || exit /B
+  cmake --build . --config Debug || exit /B
   ctest --output-on-failure -j2 || exit /B
   popd
 
   @rem Finish Debug build successfully
   exit /B 0
 )
 
-@rem Avoid Boost 1.70 because of https://github.com/boostorg/process/issues/85
-set CONDA_PACKAGES=--file=ci\conda_env_python.yml ^
-  python=%PYTHON% numpy=1.14 "boost-cpp<1.70"
+call activate arrow
 
-if "%ARROW_BUILD_GANDIVA%" == "ON" (
-  @rem Install llvmdev in the toolchain if building gandiva.dll
-  set CONDA_PACKAGES=%CONDA_PACKAGES% --file=ci\conda_env_gandiva.yml
-)
+@rem Use Boost from Anaconda
+set BOOST_ROOT=%CONDA_PREFIX%\Library
+set BOOST_LIBRARYDIR=%CONDA_PREFIX%\Library\lib
+
+@rem The "main" C++ build script for Windows CI
+@rem (i.e. for usual configurations)
 
 if "%JOB%" == "Toolchain" (
-  @rem Install pre-built "toolchain" packages for faster builds
-  set CONDA_PACKAGES=%CONDA_PACKAGES% --file=ci\conda_env_cpp.yml
+  set CMAKE_ARGS=-DARROW_DEPENDENCY_SOURCE=CONDA -DARROW_WITH_BZ2=ON
+) else (
+  @rem We're in a conda environment but don't want to use it for the 
dependencies
+  set CMAKE_ARGS=-DARROW_DEPENDENCY_SOURCE=AUTO
 )
 
-conda create -n arrow -q -y %CONDA_PACKAGES% -c conda-forge || exit /B
+@rem Enable warnings-as-errors
+set ARROW_CXXFLAGS=/WX /MP
 
-call activate arrow
+@rem
+@rem Build and test Arrow C++ libraries (including Parquet)
+@rem
 
-@rem Use Boost from Anaconda
-set BOOST_ROOT=%CONDA_PREFIX%\Library
-set BOOST_LIBRARYDIR=%CONDA_PREFIX%\Library\lib

[GitHub] [arrow] rymurr commented on a change in pull request #7100: ARROW-8696: [Java] Convert tests to maven failsafe

2020-05-08 Thread GitBox


rymurr commented on a change in pull request #7100:
URL: https://github.com/apache/arrow/pull/7100#discussion_r422081471



##
File path: 
java/memory/src/test/java/org/apache/arrow/memory/ITTestLargeArrowBuf.java
##
@@ -27,38 +32,45 @@
  *   main method.
  * 
  */
-public class TestLargeArrowBuf {
+public class ITTestLargeArrowBuf {
+  private static final Logger logger = 
LoggerFactory.getLogger(ITTestLargeArrowBuf.class);

Review comment:
   As Oscar Wilde says:
   
   > Consistency is the hallmark of the unimaginative.
   
   I have added https://issues.apache.org/jira/browse/ARROW-8739 to track





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on pull request #7095: ARROW-8664: [Java] Add flag to skip null check

2020-05-08 Thread GitBox


rymurr commented on pull request #7095:
URL: https://github.com/apache/arrow/pull/7095#issuecomment-625760957


   > This looks good to me. +1.
   > 
   > > In the past adding a getUnsafe method was discussed. What do you think 
about that?
   > 
   > This ensures that the all methods are consistent with the current flag. I 
think the getUnsafe could make sense but seems like a separate thing.
   
   Captured this in https://issues.apache.org/jira/browse/ARROW-8738



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7100: ARROW-8696: [Java] Convert tests to maven failsafe

2020-05-08 Thread GitBox


rymurr commented on a change in pull request #7100:
URL: https://github.com/apache/arrow/pull/7100#discussion_r422079081



##
File path: 
java/vector/src/test/java/org/apache/arrow/vector/ITTestLargeVector.java
##
@@ -45,32 +47,33 @@ private static void testLargeLongVector() {
 BigIntVector largeVec = new BigIntVector("vec", allocator)) {
   largeVec.allocateNew(vecLength);
 
-  System.out.println("Successfully allocated a vector with capacity " + 
vecLength);
+  logger.info("Successfully allocated a vector with capacity {}", 
vecLength);
 
   for (int i = 0; i < vecLength; i++) {
 largeVec.set(i, i * 10L);
 
 if ((i + 1) % 1 == 0) {
-  System.out.println("Successfully written " + (i + 1) + " values");
+  logger.debug("Successfully written {} values", i + 1);
 }
   }
-  System.out.println("Successfully written " + vecLength + " values");
+  logger.info("Successfully written {} values", vecLength);
 
   for (int i = 0; i < vecLength; i++) {
 long val = largeVec.get(i);
 assertEquals(i * 10L, val);
 
 if ((i + 1) % 1 == 0) {
-  System.out.println("Successfully read " + (i + 1) + " values");
+  logger.debug("Successfully read {} values", i + 1);
 }
   }
-  System.out.println("Successfully read " + vecLength + " values");
+  logger.info("Successfully read {} values", vecLength);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on pull request #7100: ARROW-8696: [Java] Convert tests to maven failsafe

2020-05-08 Thread GitBox


rymurr commented on pull request #7100:
URL: https://github.com/apache/arrow/pull/7100#issuecomment-625759852


   > I suggest we do a separate failsafe invocation for the one test with 
specifalized system property requirements rather than applying those settings 
for all tests.
   
   done



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] github-actions[bot] commented on pull request #7128: ARROW-8722: [Dev] "archery docker run -e" doesn't work

2020-05-08 Thread GitBox


github-actions[bot] commented on pull request #7128:
URL: https://github.com/apache/arrow/pull/7128#issuecomment-625757285


   https://issues.apache.org/jira/browse/ARROW-8722



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7101: ARROW-8695: [Java] Remove references to PlatformDependent in arrow-memory

2020-05-08 Thread GitBox


rymurr commented on a change in pull request #7101:
URL: https://github.com/apache/arrow/pull/7101#discussion_r422074255



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/util/hash/SimpleHasher.java
##
@@ -58,21 +56,21 @@ public int hashCode(long address, long length) {
 int hashValue = 0;
 int index = 0;
 while (index + 8 <= length) {
-  long longValue = getLong(address + index);
+  long longValue = MemoryUtil.UNSAFE.getLong(address + index);
   int longHash = getLongHashCode(longValue);
   hashValue = combineHashCode(hashValue, longHash);
   index += 8;
 }
 
 if (index + 4 <= length) {
-  int intValue = getInt(address + index);
+  int intValue = MemoryUtil.UNSAFE.getInt(address + index);
   int intHash = intValue;
   hashValue = combineHashCode(hashValue, intHash);
   index += 4;
 }
 
 while (index < length) {
-  byte byteValue = getByte(address + index);
+  byte byteValue = MemoryUtil.UNSAFE.getByte(address + index);

Review comment:
   A cursory look at the bytecode shows the `PlatformDependent` call 
results in a few static method invocations (`INVOKESTATIC` bytecode op):
   
   `PlatformDependent.getByte` is defined as
   ``` java
   public static byte getByte(long address) {
 return PlatformDependent0.getByte(address);
   }
   ```
   and `PlatformDependent0` is
   ``` java
   public static byte getByte(long address) {
 return UNSAFE.getByte(address);
   }
   ```
   
   And the `MemoryUtil` one the same with 1 less layer of indirect static 
calls. I would anticipate the JIT would optimise the paths to be the same. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7101: ARROW-8695: [Java] Remove references to PlatformDependent in arrow-memory

2020-05-08 Thread GitBox


rymurr commented on a change in pull request #7101:
URL: https://github.com/apache/arrow/pull/7101#discussion_r422074255



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/util/hash/SimpleHasher.java
##
@@ -58,21 +56,21 @@ public int hashCode(long address, long length) {
 int hashValue = 0;
 int index = 0;
 while (index + 8 <= length) {
-  long longValue = getLong(address + index);
+  long longValue = MemoryUtil.UNSAFE.getLong(address + index);
   int longHash = getLongHashCode(longValue);
   hashValue = combineHashCode(hashValue, longHash);
   index += 8;
 }
 
 if (index + 4 <= length) {
-  int intValue = getInt(address + index);
+  int intValue = MemoryUtil.UNSAFE.getInt(address + index);
   int intHash = intValue;
   hashValue = combineHashCode(hashValue, intHash);
   index += 4;
 }
 
 while (index < length) {
-  byte byteValue = getByte(address + index);
+  byte byteValue = MemoryUtil.UNSAFE.getByte(address + index);

Review comment:
   A cursory look at the bytecode shows the `PlatformDependent` call 
results in a few static method invocations:
   
   `PlatformDependent.getByte` is defined as
   ``` java
   public static byte getByte(long address) {
 return PlatformDependent0.getByte(address);
   }
   ```
   and `PlatformDependent0` is
   ``` java
   public static byte getByte(long address) {
 return UNSAFE.getByte(address);
   }
   ```
   
   And the `MemoryUtil` one the same with 1 less layer of indirect static 
calls. I would anticipate the JIT would optimise the paths to be the same. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] rymurr commented on a change in pull request #7101: ARROW-8695: [Java] Remove references to PlatformDependent in arrow-memory

2020-05-08 Thread GitBox


rymurr commented on a change in pull request #7101:
URL: https://github.com/apache/arrow/pull/7101#discussion_r422074255



##
File path: 
java/memory/src/main/java/org/apache/arrow/memory/util/hash/SimpleHasher.java
##
@@ -58,21 +56,21 @@ public int hashCode(long address, long length) {
 int hashValue = 0;
 int index = 0;
 while (index + 8 <= length) {
-  long longValue = getLong(address + index);
+  long longValue = MemoryUtil.UNSAFE.getLong(address + index);
   int longHash = getLongHashCode(longValue);
   hashValue = combineHashCode(hashValue, longHash);
   index += 8;
 }
 
 if (index + 4 <= length) {
-  int intValue = getInt(address + index);
+  int intValue = MemoryUtil.UNSAFE.getInt(address + index);
   int intHash = intValue;
   hashValue = combineHashCode(hashValue, intHash);
   index += 4;
 }
 
 while (index < length) {
-  byte byteValue = getByte(address + index);
+  byte byteValue = MemoryUtil.UNSAFE.getByte(address + index);

Review comment:
   A cursory look at the bytecode shows the `PlatformDependent` call 
results in a few static method invocations:
   
   `PlatformDependent.getByte` is defined as
   ``` java
   public static byte getByte(long address) {
 return PlatformDependent0.getByte(address);
   }
   ```
   and `PlatformDependent0` is
   ``` java
   public static byte getByte(long address) {
 return UNSAFE.getByte(address);
   }
   ```
   
   And the MemoryUtil one the same with 1 less layer of indirect static calls. 
I would anticipate the JIT would optimise the paths to be the same. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] kszucs opened a new pull request #7128: ARROW-8722: [Dev] "archery docker run -e" doesn't work

2020-05-08 Thread GitBox


kszucs opened a new pull request #7128:
URL: https://github.com/apache/arrow/pull/7128


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] tianchen92 commented on pull request #6433: ARROW-7495: [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-05-08 Thread GitBox


tianchen92 commented on pull request #6433:
URL: https://github.com/apache/arrow/pull/6433#issuecomment-625704614


   > I think it would be good to have a set of tests ensuring that the behavior 
of the empty buffer is consistent. For example, refcnt, release/allocate, etc. 
That way, if someone changes the noop reference manager which this is depending 
on in the future, we don't break the expected behavior of an EmptyBuf as used 
by all the vector classes. Other than that, this looks good.
   
   Good suggestion, seems empty ArrowBuf was used in vectors before populating 
data. I have added a set of tests in TestValueVector.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] domiden opened a new issue #7127: [R] Implementing methodes for combine tabels using `dplyr::bind_rows` and `dplyr::bind_cols`

2020-05-08 Thread GitBox


domiden opened a new issue #7127:
URL: https://github.com/apache/arrow/issues/7127


   First at all, many thanks for your hard work! I was quite exited, when you 
guys implemented some basic function of the the `dplyr` package. Is there a why 
to combine tow or more arrow tables into one by rows or columns? At the moment 
my workaround looks like this:
   
   ``` r
   dplyr::bind_rows(
 "a" = arrow.table.1 %>% dplyr::collect(),
 "b" = arrow.table.2 %>% dplyr::collect(),
 "c" = arrow.table.3 %>% dplyr::collect(),
 "d" = arrow.table.4 %>% dplyr::collect(),
 .id = "ID"
   ) %>% 
 arrow::write_ipc_stream(sink = "file_name.arrow")
   ```
   But this is actually not really a meaningful measure because of putting the 
data back into the r environment, which might lead to an exhaust of RAM space. 
Perhaps you might have a better workaround on hand. It would be great if you 
guys could implement the `bind_rows` and `bind_cols`methods provided by 
`dplyr`. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org