[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216142#comment-16216142
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

wesm commented on issue #1203: ARROW-1474: [JAVA] Java ValueVector hierarchy 
Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338836892
 
 
   Merged in 
https://github.com/apache/arrow/commit/612b9708658ece27b1ba12ee92a997652beb007b,
 thanks @siddharthteotia and everyone for reviewing!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215961#comment-16215961
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

jacques-n commented on issue #1203: ARROW-1474: [JAVA] Java ValueVector 
hierarchy Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338815078
 
 
   I'm onboard with merge. We can still continue to address comments post merge 
as necessary. Big patches and github don't mix...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215902#comment-16215902
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

wesm commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338806312
 
 
   I'm on board with merging to the refactor branch whenever you all agree


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215842#comment-16215842
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

siddharthteotia commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338797396
 
 
   @jacques-n , @BryanCutler , @icexelloss 
   
   The latest commit has good javadocs for each and every new function in all 
vector types. 
   
   How do you feel about merging this patch to java-vector-refactor branch? I 
believe merging into master will require proper formal sign off.
   
   We are going to kickstart testing these in Dremio after cherry-picking these 
two patches from java-vector-refactor branch and making the necessary code 
changes. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215800#comment-16215800
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146388114
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java
 ##
 @@ -137,10 +136,19 @@ protected ArrowBuf releaseBuffer(ArrowBuf buffer) {
 return buffer;
   }
 
+  @Override
   public int getValueCount() { return 0; }
 
+  @Override
   public void setValueCount(int valueCount) { }
 
+  @Override
   public Object getObject(int index) { return null; }
+
+  @Override
+  public int getNullCount() { return 0; }
 
 Review comment:
   This is resolved.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215273#comment-16215273
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338690930
 
 
   @siddharthteotia , thanks for the explanation.
   
   **W.r.t introduction of some static interfaces for JsonFileWriter/Reader**
   I see, so the reason is inner vectors are removed and therefore the json 
reader/writer doesn't work anymore.
   
   In that case, I am sort of OK with leaving these methods as public static 
but document clearly those methods are not part of public API and should not be 
used (IMHO "not recommended" is not strong enough, I would probably say 
"shouldn't") and refactor that later, before or after 0.8 release.
   
   What do other people think?
   
   **W.r.t introduction of some new APIs in ValueVector**
   
   > So I introduced APIs like getNullCount(), getValueCount(), 
setValueCount(), getObject() for the new nullable vectors. Once we remove 
non-nullable vectors and expose mutator/accessor functions as direct get/set in 
ValueVector, we can get rid of these APIs too.
   
   > User is free to call such methods on vectors
   
   @siddharthteotia, let me see if I understand this correctly:
   
   getNullCount(), getValueCount(), setValueCount(), getObject() are a part of 
the new vector API and we will keep them going forward.
   
   I think I saw a version of the BaseValueVector that these methods are 
returning bogus values and got confused. It seems to be correct row. I probably 
just saw a wip version. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215269#comment-16215269
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338690930
 
 
   @siddharthteotia , thanks for the explanation.
   
   **W.r.t introduction of some static interfaces for JsonFileWriter/Reader**
   I see, so the reason is inner vectors are removed and therefore the json 
reader/writer doesn't work anymore.
   
   In that case, I am sort of OK with leaving these methods as public static 
but document clearly those methods are not part of public API and should not be 
used (IMHO "not recommended" is not strong enough, I would probably say 
"shouldn't") and refactor that later, before or after 0.8 release.
   
   What do other people think?
   
   **W.r.t introduction of some new APIs in ValueVector**
   
   > So I introduced APIs like getNullCount(), getValueCount(), 
setValueCount(), getObject() for the new nullable vectors. Once we remove 
non-nullable vectors and expose mutator/accessor functions as direct get/set in 
ValueVector, we can get rid of these APIs too.
   
   > User is free to call such methods on vectors
   
   @siddharthteotia, let me see if I understand this correctly:
   
   getNullCount(), getValueCount(), setValueCount(), getObject() are a part of 
the new vector API and we will keep them going forward.
   
   I think I saw a version of the BaseValueVector that these methods are 
returning bogus values, it seems to be correct row. I probably just saw a wip 
version. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215268#comment-16215268
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338690930
 
 
   @siddharthteotia , thanks for the explanation.
   
   **W.r.t introduction of some static interfaces for JsonFileWriter/Reader**
   I see, so the reason is inner vectors are removed and therefore the json 
reader/writer doesn't work anymore.
   
   In that case, I am sort of OK with leaving these methods as public static 
but document clearly those methods are not part of public API and should not be 
used (IMHO "not recommended" is not strong enough, I would probably say 
"shouldn't") and refactor that later, before or after 0.8 release.
   
   What do other people think?
   
   **W.r.t introduction of some new APIs in ValueVector**
   
   > So I introduced APIs like getNullCount(), getValueCount(), 
setValueCount(), getObject() for the new nullable vectors. Once we remove 
non-nullable vectors and expose mutator/accessor functions as direct get/set in 
ValueVector, we can get rid of these APIs too.
   
   > User is free to call such methods on vectors
   @siddharthteotia, let me see if I understand this correctly:
   
   getNullCount(), getValueCount(), setValueCount(), getObject() are a part of 
the new vector API and we will keep them going forward.
   
   I think I saw a version of the BaseValueVector that these methods are 
returning bogus values, it seems to be correct row. I probably just saw a wip 
version. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215120#comment-16215120
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

wesm commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338656053
 
 
   > For future reference, I usually prefer not to squash commit during PR - 
this makes it hard to track incremental changes. We can squash commit when 
merging.
   
   We're really running into a weakness of GitHub code reviews. My 
understanding is that Dremio uses Gerrit for code reviews (like Kudu, Impala, 
and lots of Google projects) and so the squashing is a key part of the Gerrit 
workflow. But it works pretty poorly for GitHub, where having a string of new 
commits is better (although the GitHub UI is terrible for reviewing incremental 
diffs)
   
   I would really like to have the option of doing large Arrow code reviews on 
Gerrit. It can be a bit challenging to do (because Gerrit can fall out of sync) 
unless you have 100% of your reviews hosted on there, and Gerrit is quite a bit 
of process for some users. I hope that we find a way to do this in the future.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214901#comment-16214901
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

siddharthteotia commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338598989
 
 
   **W.r.t introduction of some static interfaces for JsonFileWriter/Reader**
   
   The introduction of couple of static interfaces is not absolutely necessary. 
They are written for better readability in JsonFileReader's gigantic switch 
block when it parses Json and writes to the vector (and its inner vectors). 
Since now we no longer have inner vectors, we obviously couldn't leverage the 
same code. The JsonFileReader had to be changed to specifically write to 
different buffers (TYPE, VALIDITY, OFFSET, DATA) for a particular vector. Also 
it has to allocate the buffer and appropriately set writer index before calling 
loadFieldBuffers. This is something that was needed for every case in switch 
block here. Once I did this, the code looked pretty messy and ugly. So I moved 
all the logic private to vectors and made them as part of static interfaces.
   
   On the other hand, in JsonFileWriter we were reading from vector (and its 
inner vectors) and writing out Json data. Again, since there are no inner 
vectors, all operation had to be transformed to work at the buffer level -- for 
writing the contents of each inner buffer. Also, the old code of JsonFileWriter 
stated a TODO that it was not handling each type. The new code handles all 
types.
   
   If the general preference is to not introduce static interfaces in vector 
APIs, I am fine with removing them and moving all logic into JSon code itself. 
The javadocs already indicate that external use of these APIs is not 
recommended.
   
   **W.r.t introduction of some new APIs in ValueVector**
   
   Note that top level interface is still ValueVector even though hierarchy 
underneath has changed. So there are still non-nullable vectors extending 
ValueVector, implementing mutator/accessor interfaces etc.
   
   So I introduced APIs like getNullCount(), getValueCount(), setValueCount(), 
getObject() for the new nullable vectors. Once we remove non-nullable vectors 
and expose mutator/accessor functions as direct get/set in ValueVector, we can 
get rid of these APIs too. 
   
   User is free to call such methods on vectors since internally they delegate 
the call to corresponding mutator/accessor operation for non-nullable vectors 
and for nullable vectors we already have the new implementation. For legacy 
vectors, it doesn't really matter since each operation is just a pass-through 
to new code.
   
   There aren't any placeholder interfaces anywhere. Each vector (nullable or 
non-nullable) has a concrete implementation of all such interfaces as 
prescribed by ValueVector. Correctness is not affected anywhere. We should be 
able to do the simple cleanup once we remove non-nullable vectors. If we are 
not planning to remove non-nullable vectors then we should just remove 
mutator/accessor from them and expose all the get/set APIs directly just like 
we have done for other nullable and complex vectors. That will also allow us to 
do simple cleanup.
   
   Whatever we decide to do with non-nullable scalar vectors, we should do soon 
to make the entire java Vector code under ValueVector hierarchy consistent.
   
   Right now the nullable scalars and complex types are consistent -- none of 
them have inner vectors and none of them support mutator/accessor based access. 
Either we should do the same thing with non nullable vectors or remove them all 
together. The latter is preferable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214859#comment-16214859
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

siddharthteotia commented on a change in pull request #1203: ARROW-1474:[WIP] 
Java Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146209120
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/BaseRepeatedValueVector.java
 ##
 @@ -187,54 +227,58 @@ protected void replaceDataVector(FieldVector v) {
 vector = v;
   }
 
-  public abstract class BaseRepeatedAccessor extends 
BaseValueVector.BaseAccessor implements RepeatedAccessor {
 
-@Override
-public int getValueCount() {
-  return Math.max(offsets.getAccessor().getValueCount() - 1, 0);
-}
+  @Override
+  public int getValueCount() {
+return valueCount;
+  }
 
-@Override
-public int getInnerValueCount() {
-  return vector.getAccessor().getValueCount();
-}
+  /* returns the value count for inner data vector for this list vector */
+  public int getInnerValueCount() {
+return vector.getValueCount();
+  }
 
-@Override
-public int getInnerValueCountAt(int index) {
-  return offsets.getAccessor().get(index + 1) - 
offsets.getAccessor().get(index);
-}
 
-@Override
-public boolean isNull(int index) {
-  return false;
-}
+  /* returns the value count for inner data vector at a particular index */
+  public int getInnerValueCountAt(int index) {
+return offsetBuffer.getInt((index + 1) * OFFSET_WIDTH) -
+offsetBuffer.getInt(index * OFFSET_WIDTH);
+  }
 
-@Override
-public boolean isEmpty(int index) {
-  return false;
-}
+  public boolean isNull(int index) {
+return false;
   }
 
-  public abstract class BaseRepeatedMutator extends 
BaseValueVector.BaseMutator implements RepeatedMutator {
+  public boolean isEmpty(int index) {
+return false;
+  }
 
-@Override
-public int startNewValue(int index) {
-  while (offsets.getValueCapacity() <= index) {
-offsets.reAlloc();
-  }
-  int offset = offsets.getAccessor().get(index);
-  offsets.getMutator().setSafe(index + 1, offset);
-  setValueCount(index + 1);
-  return offset;
+  public int startNewValue(int index) {
+while (index >= getOffsetBufferValueCapacity()) {
+  reallocOffsetBuffer();
 }
+int offset = offsetBuffer.getInt(index * OFFSET_WIDTH);
+offsetBuffer.setInt((index + 1) * OFFSET_WIDTH, offset);
+setValueCount(index + 1);
+return offset;
+  }
 
-@Override
-public void setValueCount(int valueCount) {
-  // TODO: populate offset end points
-  offsets.getMutator().setValueCount(valueCount == 0 ? 0 : valueCount + 1);
-  final int childValueCount = valueCount == 0 ? 0 : 
offsets.getAccessor().get(valueCount);
-  vector.getMutator().setValueCount(childValueCount);
+  public void setValueCount(int valueCount) {
+// TODO: populate offset end points
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214572#comment-16214572
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338536838
 
 
   @siddharthteotia For future reference, I usually prefer not to squash commit 
during PR - this makes it hard to track incremental changes. We can squash 
commit when merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214571#comment-16214571
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on issue #1203: ARROW-1474:[WIP] Java Vector Refactor 
(Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#issuecomment-338536671
 
 
   A few high level comments:
   * There are a few functions that seems to be placeholder for the interface - 
for instance `getNullCount`  that only returns 0. I suppose these functions are 
not going to be removed in 0.8? If that's the case, I think we should make it 
clear to the user that these functions will not work and shouldn't be called. 
Having bogus implementation could be confusing to user.
   
   * API added in vectors classes that are for json reader/writer. I think 
those shouldn't be a part of the public API. Also I am not sure why we need to 
make such changes, maybe @siddharthteotia can help clarify?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214563#comment-16214563
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146157662
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java
 ##
 @@ -129,14 +152,70 @@ public UnionListWriter getWriter() {
 
   @Override
   public void allocateNew() throws OutOfMemoryException {
-super.allocateNewSafe();
-bits.allocateNewSafe();
+   if (!allocateNewSafe()) {
+ throw new OutOfMemoryException("Failure while allocating memory");
+   }
+  }
+
+  public boolean allocateNewSafe() {
+boolean success = false;
+try {
+  /* allocate validity buffer */
+  allocateValidityBuffer(validityAllocationSizeInBytes);
+  /* allocate offset and data buffer */
+  success = super.allocateNewSafe();
+} finally {
+  if (!success) {
+clear();
+return false;
+  }
+}
+return true;
+  }
+
+  private void allocateValidityBuffer(final long size) {
 
 Review comment:
   Ok this is fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214561#comment-16214561
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146157560
 
 

 ##
 File path: java/vector/src/main/codegen/templates/UnionVector.java
 ##
 @@ -550,15 +532,5 @@ public void setType(int index, MinorType type) {
 }
 
 @Override
-public void reset() { }
-
-@Override
-public void generateTestData(int values) { }
-  }
-
-  public int getValueCount() { return 0; }
-
-  public void setValueCount(int valueCount) { }
-
-  public Object getObject(int index) { return null; }
+public int getNullCount() { return 0; }
 
 Review comment:
   Should this be `UnsupportedOperationException`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214562#comment-16214562
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146157591
 
 

 ##
 File path: java/vector/src/main/codegen/templates/UnionVector.java
 ##
 @@ -363,12 +361,12 @@ public void copyValueSafe(int from, int to) {
 
   @Override
   public Accessor getAccessor() {
 
 Review comment:
   Sorry the link is not valid any more, can you paste a new link?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214555#comment-16214555
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146157276
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseNullableVariableWidthVector.java
 ##
 @@ -761,4 +741,57 @@ protected final void handleSafe(int index, int 
dataLength) {
  reallocValueBuffer();
   }
}
+
+
+   /**
+**
+*helper methods currently*
+*used by JsonFileReader and  *
+*JsonFileWriter  *
+**
+**/
+
+
+   public static ArrowBuf setSafeJsonHelper(ArrowBuf data, ArrowBuf offset,
+BufferAllocator allocator, int 
index, byte[] value,
+int valueCount) {
+  if (data == null) {
+ data = allocator.buffer(INITIAL_BYTE_COUNT);
+  }
+  final int currentBufferCapacity = data.capacity();
+  final int currentStartOffset = offset.getInt(index * OFFSET_WIDTH);
+  while (currentBufferCapacity < currentStartOffset + value.length) {
+ final ArrowBuf newBuf = allocator.buffer(currentBufferCapacity * 2);
+ newBuf.setBytes(0, data, 0, currentBufferCapacity);
+ data.release();
+ data = newBuf;
+  }
+  data.setBytes(currentStartOffset, value, 0, value.length);
+  if (index == (valueCount - 1)) {
+ data.writerIndex(offset.getInt(valueCount * OFFSET_WIDTH));
+  }
+  return data;
+   }
+
+   public static byte[] get(final ArrowBuf data, final ArrowBuf offset, int 
index) {
 
 Review comment:
   Sorry, I really don't like having these functions as a part of the vector 
API just for the JSON reader. I am not sure what's the best way to do this 
though - Can you please explain why do we need these methods? The original JSON 
reader/writer use these - 
   
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L344
   
   
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileWriter.java#L240


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214546#comment-16214546
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146156364
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java
 ##
 @@ -137,10 +136,19 @@ protected ArrowBuf releaseBuffer(ArrowBuf buffer) {
 return buffer;
   }
 
+  @Override
   public int getValueCount() { return 0; }
 
+  @Override
   public void setValueCount(int valueCount) { }
 
+  @Override
   public Object getObject(int index) { return null; }
+
+  @Override
+  public int getNullCount() { return 0; }
 
 Review comment:
   I am concerned these methods would be confusing to user, so if we have to 
keep them, I think we should make it clear these function won't work and 
shouldn't be called at all. Maybe have a noisy deprecation warning or sth.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214545#comment-16214545
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146156364
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java
 ##
 @@ -137,10 +136,19 @@ protected ArrowBuf releaseBuffer(ArrowBuf buffer) {
 return buffer;
   }
 
+  @Override
   public int getValueCount() { return 0; }
 
+  @Override
   public void setValueCount(int valueCount) { }
 
+  @Override
   public Object getObject(int index) { return null; }
+
+  @Override
+  public int getNullCount() { return 0; }
 
 Review comment:
   I am concerned these methods would be confusing to user, so if we have to 
keep them, I think we should make it clear these function won't work and 
shouldn't be called at all.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214543#comment-16214543
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146156364
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java
 ##
 @@ -137,10 +136,19 @@ protected ArrowBuf releaseBuffer(ArrowBuf buffer) {
 return buffer;
   }
 
+  @Override
   public int getValueCount() { return 0; }
 
+  @Override
   public void setValueCount(int valueCount) { }
 
+  @Override
   public Object getObject(int index) { return null; }
+
+  @Override
+  public int getNullCount() { return 0; }
 
 Review comment:
   I am concerned these methods would be confusing to user, so we have to keep 
them, I think we should make it clear these function won't work and shouldn't 
be called at all.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214542#comment-16214542
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146156320
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java
 ##
 @@ -137,10 +136,19 @@ protected ArrowBuf releaseBuffer(ArrowBuf buffer) {
 return buffer;
   }
 
+  @Override
   public int getValueCount() { return 0; }
 
+  @Override
   public void setValueCount(int valueCount) { }
 
+  @Override
   public Object getObject(int index) { return null; }
+
+  @Override
+  public int getNullCount() { return 0; }
 
 Review comment:
   How about have them throw `UnsupportedOperationException`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214539#comment-16214539
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146155849
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java
 ##
 @@ -19,24 +19,25 @@
 package org.apache.arrow.vector;
 
 import io.netty.buffer.ArrowBuf;
+import org.apache.arrow.memory.BufferAllocator;
 
-class BitVectorHelper {
+public class BitVectorHelper {
 
 Review comment:
   I see. In that case, maybe add doc in this class that this is internally API?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214535#comment-16214535
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146155339
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/NullableMapVector.java
 ##
 @@ -60,9 +60,6 @@ public static NullableMapVector empty(String name, 
BufferAllocator allocator) {
 
   private final List innerVectors;
 
-  private final Accessor accessor;
-  private final Mutator mutator;
-
   // deprecated, use FieldType or static constructor instead
   @Deprecated
   public NullableMapVector(String name, BufferAllocator allocator, CallBack 
callBack) {
 
 Review comment:
   track in ARROW-1710


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214527#comment-16214527
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146154884
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java
 ##
 @@ -55,14 +58,12 @@ public static ListVector empty(String name, 
BufferAllocator allocator) {
 return new ListVector(name, allocator, 
FieldType.nullable(ArrowType.List.INSTANCE), null);
   }
 
-  final UInt4Vector offsets;
-  final BitVector bits;
-  private final List innerVectors;
-  private Mutator mutator = new Mutator();
-  private Accessor accessor = new Accessor();
+  private ArrowBuf validityBuffer;
   private UnionListReader reader;
   private CallBack callBack;
   private final FieldType fieldType;
+  private int validityAllocationSizeInBytes;
+  private int lastSet;
 
 Review comment:
   Ok this is fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214526#comment-16214526
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146154853
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java
 ##
 @@ -55,14 +58,12 @@ public static ListVector empty(String name, 
BufferAllocator allocator) {
 return new ListVector(name, allocator, 
FieldType.nullable(ArrowType.List.INSTANCE), null);
   }
 
-  final UInt4Vector offsets;
-  final BitVector bits;
-  private final List innerVectors;
-  private Mutator mutator = new Mutator();
-  private Accessor accessor = new Accessor();
+  private ArrowBuf validityBuffer;
 
 Review comment:
   Ok. This is fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214525#comment-16214525
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

icexelloss commented on a change in pull request #1203: ARROW-1474:[WIP] Java 
Vector Refactor (Implementation Phase 2)
URL: https://github.com/apache/arrow/pull/1203#discussion_r146154778
 
 

 ##
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/complex/BaseRepeatedValueVector.java
 ##
 @@ -187,54 +227,58 @@ protected void replaceDataVector(FieldVector v) {
 vector = v;
   }
 
-  public abstract class BaseRepeatedAccessor extends 
BaseValueVector.BaseAccessor implements RepeatedAccessor {
 
-@Override
-public int getValueCount() {
-  return Math.max(offsets.getAccessor().getValueCount() - 1, 0);
-}
+  @Override
+  public int getValueCount() {
+return valueCount;
+  }
 
-@Override
-public int getInnerValueCount() {
-  return vector.getAccessor().getValueCount();
-}
+  /* returns the value count for inner data vector for this list vector */
+  public int getInnerValueCount() {
+return vector.getValueCount();
+  }
 
-@Override
-public int getInnerValueCountAt(int index) {
-  return offsets.getAccessor().get(index + 1) - 
offsets.getAccessor().get(index);
-}
 
-@Override
-public boolean isNull(int index) {
-  return false;
-}
+  /* returns the value count for inner data vector at a particular index */
+  public int getInnerValueCountAt(int index) {
+return offsetBuffer.getInt((index + 1) * OFFSET_WIDTH) -
+offsetBuffer.getInt(index * OFFSET_WIDTH);
+  }
 
-@Override
-public boolean isEmpty(int index) {
-  return false;
-}
+  public boolean isNull(int index) {
+return false;
   }
 
-  public abstract class BaseRepeatedMutator extends 
BaseValueVector.BaseMutator implements RepeatedMutator {
+  public boolean isEmpty(int index) {
+return false;
+  }
 
-@Override
-public int startNewValue(int index) {
-  while (offsets.getValueCapacity() <= index) {
-offsets.reAlloc();
-  }
-  int offset = offsets.getAccessor().get(index);
-  offsets.getMutator().setSafe(index + 1, offset);
-  setValueCount(index + 1);
-  return offset;
+  public int startNewValue(int index) {
+while (index >= getOffsetBufferValueCapacity()) {
+  reallocOffsetBuffer();
 }
+int offset = offsetBuffer.getInt(index * OFFSET_WIDTH);
+offsetBuffer.setInt((index + 1) * OFFSET_WIDTH, offset);
+setValueCount(index + 1);
+return offset;
+  }
 
-@Override
-public void setValueCount(int valueCount) {
-  // TODO: populate offset end points
-  offsets.getMutator().setValueCount(valueCount == 0 ? 0 : valueCount + 1);
-  final int childValueCount = valueCount == 0 ? 0 : 
offsets.getAccessor().get(valueCount);
-  vector.getMutator().setValueCount(childValueCount);
+  public void setValueCount(int valueCount) {
+// TODO: populate offset end points
 
 Review comment:
   Can we maybe remove it then?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211987#comment-16211987
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1203
  
All tests run clean with latest commit.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211945#comment-16211945
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1203
  
The latest commit addresses some recent review comments and removes the 
inner vectors from LIST. I guess we are left with Implementing Timestamp 
vectors as suggested above -- 
https://github.com/apache/arrow/pull/1203#discussion_r145286430
I will get going with it unless there are other specific concerns (modulo 
indentation etc)



> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211394#comment-16211394
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user BryanCutler commented on the issue:

https://github.com/apache/arrow/pull/1203
  
I looked at JSONfile reader/writer and looks good from what I can tell so 
far, just some minor style comments


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210834#comment-16210834
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1203
  
@jacques-n , @BryanCutler , @icexelloss 

- Addressed issues in JsonFileReader and JsonFileWriter. Quite some work 
was needed here because getFieldInnerVectors is not applicable anymore. I have 
introduced static methods in vectors as helper routines for reading from and 
writing into JSON.

- Removed logger from all vectors.

- Addressed failures in tests. Down to one failure now.

Remaining:

- I haven't yet removed the inner vectors from List. Will do that tomorrow.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210356#comment-16210356
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1203
  
I opened https://issues.apache.org/jira/browse/ARROW-1688. We must keep our 
code clean


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210108#comment-16210108
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user icexelloss commented on the issue:

https://github.com/apache/arrow/pull/1203
  
Probably not. I fixed some checkstyles warning  but not all of them:
https://github.com/apache/arrow/pull/930#issuecomment-319492008

So we cannot failure the build for checkstyles yet.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210018#comment-16210018
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user wesm commented on the issue:

https://github.com/apache/arrow/pull/1203
  
Is this style issue being checked in Travis CI?


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209937#comment-16209937
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user icexelloss commented on the issue:

https://github.com/apache/arrow/pull/1203
  
@siddharthteotia I also found the new files are 3-space indented. The 
current files are 2-space indented. Please fix the indentation?


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208649#comment-16208649
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1203
  
Thanks for the thorough review @jacques-n , @BryanCutler , @icexelloss , 
@wesm . I am in the process of addressing comments as we are reaching 
consensus. Meanwhile, I am trying to prioritize stability of tests. 

Right now we have 2 related failures in ComplexWriter, Promotable Writer 
and lots of failures in TestJsonFile since getFieldInnerVectors is no longer 
applicable. I am addressing the former ones as of now.

The recent commit has refactored complex types -- LIST, FIXED SIZE LIST, 
MAP, UNION along with corresponding Legacy types and code changes in the 
callers to make things work.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208553#comment-16208553
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user BryanCutler commented on the issue:

https://github.com/apache/arrow/pull/1203
  
@siddharthteotia , regarding the logger can you specific vector logs by 
moving to the base class and initializing the logger with the class name as a 
string, e.g. `getLogger(this.getClass().getSimpleName());`?


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208258#comment-16208258
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user jacques-n commented on the issue:

https://github.com/apache/arrow/pull/1203
  
@BryanCutler and @siddharthteotia , the reason the vector holders a 
reference to ArrowReader is that there are several cases where you don't want 
to constantly be recreating the reader and at the time, this was the easiest 
place to maintain it. Not sure if this still the case but it is something that 
should be reviewed before removing.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208252#comment-16208252
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user icexelloss commented on the issue:

https://github.com/apache/arrow/pull/1203
  
Thanks @siddharthteotia. I went through the change and left a few comments.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207144#comment-16207144
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

Github user siddharthteotia commented on the issue:

https://github.com/apache/arrow/pull/1203
  
All scalar types refactored and new implementation is ready -- builds fine. 
Corresponding Legacy vectors are also ready.
Testing has issues w.r.t mutator/accessor in LIST, MAP, UNION. Next step is 
to refactor them.


> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205531#comment-16205531
 ] 

ASF GitHub Bot commented on ARROW-1474:
---

GitHub user siddharthteotia opened a pull request:

https://github.com/apache/arrow/pull/1203

ARROW-1474:[WIP] Java Vector Refactor (Implementation Phase 2)

Implementation of majority of scalar types:

Few timestamp types and Nullable Var Binary are in progress along with 
ListVector, MapVector, FixedSizeListVector

cc @jacques-n , @BryanCutler , @icexelloss 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/siddharthteotia/arrow ARROW-1474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1203


commit edf9acca66e9001ee2673e3973d27dc172d33713
Author: siddharth 
Date:   2017-10-16T07:39:30Z

ARROW-1474:[WIP] Java Vector Refactor (Implementation Phase 2)




> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1474) [JAVA] ValueVector hierarchy (Implementation Phase 2)

2017-10-14 Thread Siddharth Teotia (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204754#comment-16204754
 ] 

Siddharth Teotia commented on ARROW-1474:
-

A patch with implementation for other scalar types is in progress along with 
creating their legacy counterparts (mutator/accessor based)

* NullableBigIntVector
* NullableFloat4Vector
* NullableFloat8Vector
* NullableUInt1Vector
* NullableUInt2Vector
* NullableUInt4Vector
* NullableUInt8Vector
* Nullable  vector




> [JAVA] ValueVector hierarchy (Implementation Phase 2)
> -
>
> Key: ARROW-1474
> URL: https://issues.apache.org/jira/browse/ARROW-1474
> Project: Apache Arrow
>  Issue Type: Sub-task
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)