[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-699729876 Merging. The check failure is irrelavent. Thanks for the PR @josiahyan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-698713708 @jacques-n Do you have any more comments? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-698713708 @jacques-n Do you have any more comments? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-697114813 Thank you all for the fruitful discussion. One small reminder for @josiahyan : to cache the buffer capacity, it is sufficient to use an `int` instead of a `long` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-695873809 @josiahyan Thank you for the additional details. I think one of your concern is that, the underlying buffers can be changed unintentionally, which lefts the vector in an inconsistent state (Please correct me if my understanding is incorrect, or if I have missed something). IMO, we encourage users to manipulate vectors through standard APIs (e.g. get/set/setSafe, etc. ). However, unintended manipulations are always possible, because the getXXXBuffer APIs are there. More fundamentally, the buffers are all based on off-heap memory, so if the user has the memory address, they can manipulate the buffer data at will. So I think what we should do is to design and implement the APIs to try making vectors in a consistent state, and if the vector falls into an inconsistent state, we should trap it with minimal cost. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-695873809 @josiahyan Thank you for the additional details. I think one of your concern is that, the underlying buffers can be changed unintentionally, which lefts the vector in an inconsistent state (Please correct me if my understanding is incorrect, or if I have missed something). IMO, we encourage users to manipulate vectors through standard APIs (e.g. get/set/setSafe, etc. ). However, unintended manipulations are always possible, because the getXXXBuffer APIs are there. More fundamentally, the buffers are all based on off-heap memory, so if the user has the memory address, they can manipulate the buffer data at will. So I think what we should do is to design and implement the APIs to try making vectors in a consistent state, and if the vector falls into an inconsistent state, we should trap it with minimal cost. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
liyafan82 commented on pull request #8214: URL: https://github.com/apache/arrow/pull/8214#issuecomment-694850013 > +1, integration test failure is unrelated. > > @liyafan82 since you're also working in this space, what do you think of the optimization here? I think it's an easy win and will help the non-nullable vector case too. @josiahyan Thanks for providing the patch. This is a problem that needs to be addressed. I am a little surprised that JIT failed to transform the integer divistion to a right shift, as it can be easily deduced that the type width for an IntVector is always 4. This change improves the peformance by saving the number of bits to shift and use a if branch to check if shift operations applies. I am thinking about another way: can we simply save the value/validity buffer capacities? so we can further improve the performance by avoiding the if branch in `getValueBufferValueCapacity` and the if branch in `capAtMaxInt` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org