[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-27 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-699729876


   Merging. The check failure is irrelavent. Thanks for the PR @josiahyan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-25 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-698713708


   @jacques-n Do you have any more comments?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-24 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-698713708


   @jacques-n Do you have any more comments?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-22 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-697114813


   Thank you all for the fruitful discussion.
   One small reminder for @josiahyan : to cache the buffer capacity, it is 
sufficient to use an `int` instead of a `long`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-21 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-695873809


   @josiahyan Thank you for the additional details. 
   
   I think one of your concern is that, the underlying buffers can be changed 
unintentionally, which lefts the vector in an inconsistent state (Please 
correct me if my understanding is incorrect, or if I have missed something). 
   
   IMO, we encourage users to manipulate vectors through standard APIs (e.g. 
get/set/setSafe, etc. ). However, unintended manipulations are always possible, 
because the getXXXBuffer APIs are there. More fundamentally, the buffers are 
all based on off-heap memory, so if the user has the memory address, they can 
manipulate the buffer data at will. 
   
   So I think what we should do is to design and implement the APIs to try 
making vectors in a consistent state, and if the vector falls into an 
inconsistent state, we should trap it with minimal cost. 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-20 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-695873809


   @josiahyan Thank you for the additional details. 
   
   I think one of your concern is that, the underlying buffers can be changed 
unintentionally, which lefts the vector in an inconsistent state (Please 
correct me if my understanding is incorrect, or if I have missed something). 
   
   IMO, we encourage users to manipulate vectors through standard APIs (e.g. 
get/set/setSafe, etc. ). However, unintended manipulations are always possible, 
because the getXXXBuffer APIs are there. More fundamentally, the buffers are 
all based on off-heap memory, so if the user has the memory address, they can 
manipulate the buffer data at will. 
   
   So I think what we should do is to design and implement the APIs to try 
making vectors in a consistent state, and if the vector falls into an 
inconsistent state, we should trap it with minimal cost. 
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #8214: ARROW-9965: [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations

2020-09-18 Thread GitBox


liyafan82 commented on pull request #8214:
URL: https://github.com/apache/arrow/pull/8214#issuecomment-694850013


   > +1, integration test failure is unrelated.
   > 
   > @liyafan82 since you're also working in this space, what do you think of 
the optimization here? I think it's an easy win and will help the non-nullable 
vector case too.
   
   @josiahyan Thanks for providing the patch. This is a problem that needs to 
be addressed. 
   
   I am a little surprised that JIT failed to transform the integer divistion 
to a right shift, as it can be easily deduced that the type width for an 
IntVector is always 4. 
   
   This change improves the peformance by saving the number of bits to shift 
and use a if branch to check if shift operations applies.
   
   I am thinking about another way: can we simply save the value/validity 
buffer capacities? so we can further improve the performance by avoiding the if 
branch in `getValueBufferValueCapacity` and the if branch in  `capAtMaxInt`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org