[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-11-11 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r345054344
 
 

 ##
 File path: lucene/core/src/test/org/apache/lucene/util/TestBitUtil.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util;
+
+public class TestBitUtil extends LuceneTestCase {
+
+  public void testNextBitSet() {
+for (int i = 0; i < 1; i++) {
+  long[] bits = buildRandomBits();
+  int numLong = bits.length - 1;
+
+  // Verify nextBitSet with countBitsUpTo for all bit indexes.
+  for (int bitIndex = -1; bitIndex < 64 * numLong; bitIndex++) {
+int nextIndex = BitUtil.nextBitSet(bits, numLong, bitIndex);
+if (nextIndex == -1) {
+  assertEquals("No next bit set, so expected no bit count diff (i=" + 
i + " bitIndex=" + bitIndex + ")",
+  BitUtil.countBitsUpTo(bits, numLong, bitIndex + 1), 
BitUtil.countBits(bits, numLong));
+} else {
+  assertTrue("Expected next bit set at nextIndex=" + nextIndex + " 
(i=" + i + " bitIndex=" + bitIndex + ")",
+  BitUtil.isBitSet(bits, numLong, nextIndex));
+  assertEquals("Next bit set at nextIndex=" + nextIndex + " so 
expected bit count diff of 1 (i=" + i + " bitIndex=" + bitIndex + ")",
+  BitUtil.countBitsUpTo(bits, numLong, bitIndex + 1) + 1, 
BitUtil.countBitsUpTo(bits, numLong, nextIndex + 1));
+}
+  }
+}
+  }
+
+  public void testPreviousBitSet() {
+for (int i = 0; i < 1; i++) {
+  long[] bits = buildRandomBits();
+  int numLong = bits.length - 1;
+
+  // Verify previousBitSet with countBitsUpTo for all bit indexes.
+  for (int bitIndex = 0; bitIndex <= 64 * numLong; bitIndex++) {
+int previousIndex = BitUtil.previousBitSet(bits, numLong, bitIndex);
+if (previousIndex == -1) {
+  assertEquals("No previous bit set, so expected bit count 0"
+  + " (i=" + i + " bitIndex=" + bitIndex + ")",
+  0, BitUtil.countBitsUpTo(bits, numLong, bitIndex));
+} else {
+  assertTrue("Expected previous bit set at previousIndex=" + 
previousIndex
+  + " (i=" + i + " bitIndex=" + bitIndex + ")",
+  BitUtil.isBitSet(bits, numLong, previousIndex));
+  int bitCount = BitUtil.countBitsUpTo(bits, numLong, 
Math.min(bitIndex + 1, numLong * Long.SIZE));
+  int expectedPreviousBitCount = bitIndex < numLong * Long.SIZE && 
BitUtil.isBitSet(bits, numLong, bitIndex) ?
+  bitCount - 1 : bitCount;
+  assertEquals("Previous bit set at previousIndex=" + previousIndex
+  + " with current bitCount=" + bitCount
+  + " so expected previousBitCount=" + expectedPreviousBitCount
+  + " (i=" + i + " bitIndex=" + bitIndex + ")",
+  expectedPreviousBitCount, BitUtil.countBitsUpTo(bits, numLong, 
previousIndex + 1));
+}
+  }
+}
+  }
+
+  private long[] buildRandomBits() {
+long[] bits = new long[random().nextInt(3) + 2];
+for (int j = 0; j < bits.length; j++) {
+  bits[j] = random().nextLong();
 
 Review comment:
   Good point.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-11-11 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r345054246
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -249,38 +261,127 @@ public T nextFinalOutput() {
   return nextFinalOutput;
 }
 
-long nextArc() {
+/** Address (into the byte[]) of the next arc - only for list of variable 
length arc.
+ * Or ord/address to the next node if label == {@link #END_LABEL}. */
+ long nextArc() {
   return nextArc;
 }
 
-/** Where the first arc in the array starts; only valid if
- *  bytesPerArc != 0 */
+/** Where we are in the array; only valid if bytesPerArc != 0. */
+public int arcIdx() {
+  return arcIdx;
+}
+
+/** Node header flags. Only meaningful to check if the value is either
+ * {@link #ARCS_FOR_BINARY_SEARCH} or {@link #ARCS_FOR_DIRECT_ADDRESSING}
+ * (other value when bytesPerArc == 0). */
+public byte nodeFlags() {
+  return nodeFlags;
+}
+
+/** Where the first arc in the array starts; only valid if bytesPerArc != 
0 */
 public long posArcsStart() {
   return posArcsStart;
 }
 
-/** Non-zero if this arc is part of an array, which means all
+/** Non-zero if this arc is part of a node with fixed length arcs, which 
means all
  *  arcs for the node are encoded with a fixed number of bytes so
- *  that we can random access by index.  We do when there are enough
- *  arcs leaving one node.  It wastes some bytes but gives faster
- *  lookups. */
+ *  that we binary search or direct address. We do when there are enough
+ *  arcs leaving one node. It wastes some bytes but gives faster lookups. 
*/
 public int bytesPerArc() {
   return bytesPerArc;
 }
 
-/** Where we are in the array; only valid if bytesPerArc != 0, and the 
array has no holes.
- * arcIdx = Integer.MIN_VALUE indicates that the arc is part of a direct 
array, addressed by
- * label.
- */
-public int arcIdx() {
-  return arcIdx;
-}
-
-/** How many arc, if bytesPerArc == 0. Otherwise, the size of the arc 
array. If the array is
- * direct, this may include holes. Otherwise it is also how many arcs are 
in the array */
+/** How many arcs; only valid if bytesPerArc != 0 (fixed length arcs).
+ * For a node designed for binary search this is the array size.
+ * For a node designed for direct addressing, this is the label range. */
 public int numArcs() {
   return numArcs;
 }
+
+/** Table of bits of a direct addressing node.
+ * Only valid if nodeFlags == {@link #ARCS_FOR_DIRECT_ADDRESSING};
+ * may be null otherwise. */
+BitTable bitTable() {
+  return bitTable;
+}
+
+/** The table of bits of a direct addressing node created lazily. */
+BitTable getOrCreateBitTable() {
+  if (bitTable == null) {
+bitTable = new BitTable();
+  }
+  return bitTable;
+}
+
+/** First label of a direct addressing node.
+ * Only valid if nodeFlags == {@link #ARCS_FOR_DIRECT_ADDRESSING}. */
+int firstLabel() {
+  return firstLabel;
+}
+
+/**
+ * Reusable table of bits using an array of long internally.
+ */
+static class BitTable {
+
+  private long[] bits;
+  private int numLongs;
+
+  /** Sets the number of longs in the internal long array.
+   * Enlarges it if needed. Always clears the array. */
+  BitTable setNumLongs(int numLongs) {
+assert numLongs >= 0;
+this.numLongs = numLongs;
+if (bits == null || bits.length < numLongs) {
+  bits = new long[ArrayUtil.oversize(numLongs, Long.BYTES)];
+} else {
+  // Avoid range check in Arrays.fill().
 
 Review comment:
   I don't want to pay the range check (3 conditions verified) in 
Arrays.fill(long[], int, int, long) because they are already verified. I simply 
remove the comment for clarity.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-11-08 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r344284382
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -726,6 +759,57 @@ private void writeArrayPacked(Builder builder, 
Builder.UnCompiledNode node
 }
   }
 
+  private void writeArrayDirectAddressing(Builder builder, 
Builder.UnCompiledNode nodeIn, long fixedArrayStart, int maxBytesPerArc, int 
labelRange) {
+int numPresenceBytes = getNumPresenceBytes(labelRange);
+// expand the arcs in place, backwards
+long srcPos = builder.bytes.getPosition();
+long destPos = fixedArrayStart + numPresenceBytes + nodeIn.numArcs * 
maxBytesPerArc;
+// if destPos == srcPos it means all the arcs were the same length, and 
the array of them is *already* direct
+assert destPos >= srcPos;
+if (destPos > srcPos) {
+  builder.bytes.skipBytes((int) (destPos - srcPos));
+  assert builder.bytes.getPosition() == destPos;
+  for (int arcIdx = nodeIn.numArcs - 1; arcIdx >= 0; arcIdx--) {
+destPos -= maxBytesPerArc;
+int arcLen = builder.reusedBytesPerArc[arcIdx];
+srcPos -= arcLen;
+if (srcPos != destPos) {
+  assert destPos > srcPos: "destPos=" + destPos + " srcPos=" + srcPos 
+ " arcIdx=" + arcIdx + " maxBytesPerArc=" + maxBytesPerArc + " 
reusedBytesPerArc[arcIdx]=" + builder.reusedBytesPerArc[arcIdx] + " 
nodeIn.numArcs=" + nodeIn.numArcs;
+  builder.bytes.copyBytes(srcPos, destPos, arcLen);
+}
+  }
+}
+assert destPos - numPresenceBytes == fixedArrayStart;
+writePresenceBits(builder, nodeIn, labelRange, fixedArrayStart);
+  }
+
+  private void writePresenceBits(Builder builder, Builder.UnCompiledNode 
nodeIn, int labelRange, long dest) {
+long bytePos = dest;
+byte presenceBits = 1; // The first arc is always present.
+int presenceIndex = 0;
+int previousLabel = nodeIn.arcs[0].label;
+for (int arcIdx = 1; arcIdx < nodeIn.numArcs; arcIdx++) {
+  int label = nodeIn.arcs[arcIdx].label;
+  presenceIndex += label - previousLabel;
+  while (presenceIndex >= 8) {
 
 Review comment:
   Ok


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-11-08 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r344284258
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -676,26 +697,38 @@ long addNode(Builder builder, 
Builder.UnCompiledNode nodeIn) throws IOExce
 */
 
 if (doFixedArray) {
-  final int MAX_HEADER_SIZE = 11; // header(byte) + numArcs(vint) + 
numBytes(vint)
   assert maxBytesPerArc > 0;
   // 2nd pass just "expands" all arcs to take up a fixed byte size
 
+  // If more than (1 / DIRECT_ARC_LOAD_FACTOR) of the "slots" would be 
occupied, write an arc
+  // array that may have holes in it so that we can address the arcs 
directly by label without
 
 Review comment:
   Yes


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-11-08 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r344284113
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -864,6 +958,64 @@ private long readUnpackedNodeTarget(BytesReader in) 
throws IOException {
 return readNextRealArc(arc, in);
   }
 
+  /**
+   * Reads the presence bits of a direct-addressing node, store them in the 
provided arc {@link Arc#bitTable()}
+   * and returns the number of presence bytes.
+   */
+  private int readPresenceBytes(Arc arc, BytesReader in) throws IOException 
{
+int numPresenceBytes = getNumPresenceBytes(arc.numArcs());
+long[] presenceBits = new long[(numPresenceBytes + 7) / Long.BYTES];
+for (int i = 0; i < numPresenceBytes; i++) {
+  // Read the next unsigned byte, shift it to the left, and appends it to 
the current long.
+  presenceBits[i / Long.BYTES] |= (in.readByte() & 0xFFL) << (i * 
Byte.SIZE);
+}
+arc.bitTable = presenceBits;
+assert checkPresenceBytesAreValid(arc);
+return numPresenceBytes;
+  }
+
+  static boolean checkPresenceBytesAreValid(Arc arc) {
+assert (arc.bitTable()[0] & 1L) != 0; // First bit must be set.
+assert (arc.bitTable()[arc.bitTable().length - 1] & (1L << (arc.numArcs() 
- 1))) != 0; // Last bit must be set.
+assert countBits(arc.bitTable()) <= arc.numArcs(); // Total bit set (real 
num arcs) must be <= label range (stored in arc.numArcs()).
+return true;
+  }
+
+  /**
+   * Counts all bits set in the provided longs.
+   */
+  static int countBits(long[] bits) {
 
 Review comment:
   Ok


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-10-29 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r339996219
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -864,6 +1012,68 @@ private long readUnpackedNodeTarget(BytesReader in) 
throws IOException {
 return readNextRealArc(arc, in);
   }
 
+  /**
+   * Reads the presence bits of a direct-addressing node, store them in the 
provided arc {@link Arc#bitTable()} and returns the number of presence bytes.
+   */
+  private int readPresenceBytes(Arc arc, BytesReader in) throws IOException 
{
+int numPresenceBytes = getNumPresenceBytes(arc.numArcs());
+long[] presenceBits = new long[numPresenceBytes + 7 >>> 3];
+int longIndex = 0;
+int byteIndex = 0;
+for (int i = 0; i < numPresenceBytes; i++) {
+  presenceBits[longIndex] |= (long) (in.readByte() & 0xFF) << (i << 3);
+  if (++byteIndex == 8) {
+byteIndex = 0;
+longIndex++;
+  }
+}
+arc.bitTable = presenceBits;
+assert checkPresenceBytesAreValid(arc);
+return numPresenceBytes;
+  }
+
+  static boolean checkPresenceBytesAreValid(Arc arc) {
+assert (arc.bitTable()[0] & 1L) != 0; // First bit must be set.
+assert (arc.bitTable()[arc.bitTable().length - 1] & (1L << (arc.numArcs() 
- 1))) != 0; // Last bit must be set.
+assert countBits(arc.bitTable()) <= arc.numArcs(); // Total bit set must 
be <= label range.
 
 Review comment:
   Actually arc.numArcs() stores the label range for direct addressing. The 
real number of arcs is the total number of bits set.
   I'll change the comment to:
   "Total bit set (real num arcs) must be <= label range (stored in 
arc.numArcs())."


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-10-29 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r339994642
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -864,6 +1012,68 @@ private long readUnpackedNodeTarget(BytesReader in) 
throws IOException {
 return readNextRealArc(arc, in);
   }
 
+  /**
+   * Reads the presence bits of a direct-addressing node, store them in the 
provided arc {@link Arc#bitTable()} and returns the number of presence bytes.
+   */
+  private int readPresenceBytes(Arc arc, BytesReader in) throws IOException 
{
+int numPresenceBytes = getNumPresenceBytes(arc.numArcs());
+long[] presenceBits = new long[numPresenceBytes + 7 >>> 3];
+int longIndex = 0;
+int byteIndex = 0;
+for (int i = 0; i < numPresenceBytes; i++) {
+  presenceBits[longIndex] |= (long) (in.readByte() & 0xFF) << (i << 3);
 
 Review comment:
   What about
   presenceBits[i / Byte.SIZE] |= (in.readByte() & 0xFFL) << (i * Byte.SIZE);


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-10-29 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r339989028
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -864,6 +1012,68 @@ private long readUnpackedNodeTarget(BytesReader in) 
throws IOException {
 return readNextRealArc(arc, in);
   }
 
+  /**
+   * Reads the presence bits of a direct-addressing node, store them in the 
provided arc {@link Arc#bitTable()} and returns the number of presence bytes.
+   */
+  private int readPresenceBytes(Arc arc, BytesReader in) throws IOException 
{
+int numPresenceBytes = getNumPresenceBytes(arc.numArcs());
+long[] presenceBits = new long[numPresenceBytes + 7 >>> 3];
+int longIndex = 0;
+int byteIndex = 0;
+for (int i = 0; i < numPresenceBytes; i++) {
+  presenceBits[longIndex] |= (long) (in.readByte() & 0xFF) << (i << 3);
+  if (++byteIndex == 8) {
+byteIndex = 0;
+longIndex++;
+  }
+}
+arc.bitTable = presenceBits;
+assert checkPresenceBytesAreValid(arc);
+return numPresenceBytes;
+  }
+
+  static boolean checkPresenceBytesAreValid(Arc arc) {
+assert (arc.bitTable()[0] & 1L) != 0; // First bit must be set.
+assert (arc.bitTable()[arc.bitTable().length - 1] & (1L << (arc.numArcs() 
- 1))) != 0; // Last bit must be set.
+assert countBits(arc.bitTable()) <= arc.numArcs(); // Total bit set must 
be <= label range.
+return true;
+  }
+
+  /**
+   * Counts all bits set in the provided longs.
+   */
+  static int countBits(long[] bits) {
+int bitCount = 0;
+for (long l : bits) {
+  bitCount += Long.bitCount(l);
+}
+return bitCount;
+  }
+
+  /**
+   * Returns whether the bit at given index is set.
+   */
+  static boolean isBitSet(long[] bits, int bitIndex) {
+return (bits[bitIndex >> 6] & (1L << (bitIndex & 63))) != 0;
+  }
+
+  /**
+   * Counts the bits set up to the given bit index, exclusive.
+   */
+  static int countBitsUpTo(long[] bits, int bitIndex) {
+int bitCount = 0;
+int lastLong = bitIndex >>> 6; // index / 64
+for (int i = 0; i < lastLong; i++) {
+  bitCount += Long.bitCount(bits[i]);
+}
+int indexInLastLong = bitIndex & 63; // index % 64
+if (indexInLastLong != 0) {
+  long mask = 0xL >>> (Long.SIZE - indexInLastLong);
+  bitCount += Long.bitCount(bits[lastLong] & mask);
+}
 
 Review comment:
   Nice!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce the memory used by direct addressing of arcs

2019-10-29 Thread GitBox
bruno-roustant commented on a change in pull request #980: LUCENE-8920: Reduce 
the memory used by direct addressing of arcs
URL: https://github.com/apache/lucene-solr/pull/980#discussion_r339983062
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
 ##
 @@ -864,6 +1012,68 @@ private long readUnpackedNodeTarget(BytesReader in) 
throws IOException {
 return readNextRealArc(arc, in);
   }
 
+  /**
+   * Reads the presence bits of a direct-addressing node, store them in the 
provided arc {@link Arc#bitTable()} and returns the number of presence bytes.
+   */
+  private int readPresenceBytes(Arc arc, BytesReader in) throws IOException 
{
+int numPresenceBytes = getNumPresenceBytes(arc.numArcs());
+long[] presenceBits = new long[numPresenceBytes + 7 >>> 3];
+int longIndex = 0;
+int byteIndex = 0;
+for (int i = 0; i < numPresenceBytes; i++) {
+  presenceBits[longIndex] |= (long) (in.readByte() & 0xFF) << (i << 3);
+  if (++byteIndex == 8) {
+byteIndex = 0;
+longIndex++;
+  }
+}
+arc.bitTable = presenceBits;
+assert checkPresenceBytesAreValid(arc);
+return numPresenceBytes;
+  }
+
+  static boolean checkPresenceBytesAreValid(Arc arc) {
+assert (arc.bitTable()[0] & 1L) != 0; // First bit must be set.
+assert (arc.bitTable()[arc.bitTable().length - 1] & (1L << (arc.numArcs() 
- 1))) != 0; // Last bit must be set.
+assert countBits(arc.bitTable()) <= arc.numArcs(); // Total bit set must 
be <= label range.
+return true;
+  }
+
+  /**
+   * Counts all bits set in the provided longs.
+   */
+  static int countBits(long[] bits) {
+int bitCount = 0;
+for (long l : bits) {
+  bitCount += Long.bitCount(l);
+}
+return bitCount;
+  }
+
+  /**
+   * Returns whether the bit at given index is set.
+   */
+  static boolean isBitSet(long[] bits, int bitIndex) {
+return (bits[bitIndex >> 6] & (1L << (bitIndex & 63))) != 0;
 
 Review comment:
   Thanks for the tip about the removal of the mask.
   
   About the arithmetic shift, my idea was to keep the idea of "division by 
Long.SIZE", but I agree this is not clear. For clarity I finally prefer simply 
"bits[bitIndex / Long.SIZE]".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org