[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400581886
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +173,50 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdTranslator.close();
+  }
+
+  private static class DocIdTranslator implements Closeable {
+final PinotDataBuffer _buffer;
+
+DocIdTranslator(File segmentIndexDir, String column, int numDocs, 
IndexSearcher indexSearcher)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  String desc = "Text index docId mapping buffer: " + column;
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_buffer =
+PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ true, 0, 
length, ByteOrder.LITTLE_ENDIAN, desc);
+  } else {
+_buffer =
+PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ false, 0, 
length, ByteOrder.LITTLE_ENDIAN, desc);
+for (int i = 0; i < numDocs; i++) {
+  try {
+Document document = indexSearcher.doc(i);
+int pinotDocId = 
Integer.parseInt(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+_buffer.putInt(i * Integer.BYTES, pinotDocId);
+  } catch (Exception e) {
+throw new RuntimeException("Failed to build doc id mapping during 
segment load, " + e);
 
 Review comment:
   ```suggestion
   throw new RuntimeException("Caught exception while building doc 
id mapping for text index column: " + column, e);
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400581254
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -123,8 +133,7 @@ public MutableRoaringBitmap getDocIds(Object value) {
   _indexSearcher.search(query, docIDCollector);
   return getPinotDocIds(docIDs);
 } catch (Exception e) {
-  LOGGER.error("Failed while searching the text index for column {}, 
search query {}, exception {}", _column,
-  searchQuery, e.getMessage());
+  LOGGER.error("Failed while searching the text index for column {}, 
search query {},", _column, searchQuery);
 
 Review comment:
   If you want to keep the context of the exception, put it into the 
RuntimeException


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400583323
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +173,50 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdTranslator.close();
+  }
+
+  private static class DocIdTranslator implements Closeable {
+final PinotDataBuffer _buffer;
+
+DocIdTranslator(File segmentIndexDir, String column, int numDocs, 
IndexSearcher indexSearcher)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  String desc = "Text index docId mapping buffer: " + column;
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_buffer =
+PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ true, 0, 
length, ByteOrder.LITTLE_ENDIAN, desc);
+  } else {
+_buffer =
+PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ false, 0, 
length, ByteOrder.LITTLE_ENDIAN, desc);
+for (int i = 0; i < numDocs; i++) {
+  try {
+Document document = indexSearcher.doc(i);
+int pinotDocId = 
Integer.parseInt(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+_buffer.putInt(i * Integer.BYTES, pinotDocId);
+  } catch (Exception e) {
+throw new RuntimeException("Failed to build doc id mapping during 
segment load, " + e);
+  }
+}
+  }
+}
+
+int getPinotDocId(int offset) {
 
 Review comment:
   Pass in `luceneDocId` and wrap the `_buffer.getInt(luceneDocId * 
Integer.BYTES)` logic inside


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400582841
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -123,8 +133,7 @@ public MutableRoaringBitmap getDocIds(Object value) {
   _indexSearcher.search(query, docIDCollector);
   return getPinotDocIds(docIDs);
 } catch (Exception e) {
-  LOGGER.error("Failed while searching the text index for column {}, 
search query {}, exception {}", _column,
-  searchQuery, e.getMessage());
+  LOGGER.error("Failed while searching the text index for column {}, 
search query {},", _column, searchQuery);
   throw new RuntimeException(e);
 
 Review comment:
   ```suggestion
 throw new RuntimeException("Caught exception while searching the text 
index column: " + _column + " with query: " + searchQuery, e);
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400580427
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/converter/SegmentV1V2ToV3FormatConverter.java
 ##
 @@ -241,6 +242,22 @@ public boolean accept(File dir, String name) {
 Files.copy(indexFile.toPath(), v3LuceneIndexFile.toPath());
   }
 }
+// if segment reload is issued asking for up-conversion of
 
 Review comment:
   I was referring to the usage (check metadata and copy instead of filtering 
on file names). Not critical, you can address later


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400565538
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+private final boolean _mappingExists;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_mappingExists = true;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
true, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  } else {
+_mappingExists = false;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
false, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  }
+}
+
+public void buildDocIdMapping(int numDocs) {
 
 Review comment:
   Merge this into the constructor, no need to track an extra boolean.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400564728
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
 
 Review comment:
   `private static class`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400565872
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -60,7 +66,7 @@
* @param column column name
* @param segmentIndexDir segment index directory
*/
-  public LuceneTextIndexReader(String column, File segmentIndexDir) {
+  public LuceneTextIndexReader(String column, File segmentIndexDir, int 
numDocs) {
 
 Review comment:
   Recommend making second argument `indexDir` to denote that it is top level 
dir


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400559121
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/invertedindex/TextIndexHandler.java
 ##
 @@ -61,8 +61,26 @@
 import static 
org.apache.pinot.core.segment.creator.impl.V1Constants.MetadataKeys.Column.getKeyFor;
 
 
+/**
+ * Helper class for text indexes used by {@link 
org.apache.pinot.core.segment.index.loader.SegmentPreProcessor}.
+ * to create text index for column during segment load time. Currently text 
index is always
+ * created (if enabled on a column) during segment generation
+ *
+ * (1) A new segment with text index is created/refreshed. Server loads the 
segment. The handler
+ * detects the existence of text index and returns.
+ *
+ * (2) A reload is issued on an existing segment with existing text index. The 
handler
+ * detects the existence of text index and returns.
+ *
+ * (3) A reload is issued on an existing segment after text index is enabled 
on an existing
+ * column. Read the forward index to create text index.
+ *
+ * (4) A reload is issued on an existing segment after text index is enabled 
on a newly
+ * added column. In this case, the default column handler would have taken 
care of adding
+ * forward index for the new column. Read the forward index to create text 
index.
+ */
 public class TextIndexHandler {
 
 Review comment:
   Not related to this pr, but seems this file needs reformat


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400557528
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/queries/TextSearchQueriesTest.java
 ##
 @@ -18,6 +18,7 @@
  */
 package org.apache.pinot.queries;
 
+import com.google.common.base.Stopwatch;
 
 Review comment:
   Remove?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400568072
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+private final boolean _mappingExists;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_mappingExists = true;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
true, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
 
 Review comment:
   `"Text index docId mapping buffer: " + _column`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400566118
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -121,7 +131,7 @@ public MutableRoaringBitmap getDocIds(Object value) {
 try {
   Query query = _queryParser.parse(searchQuery);
   _indexSearcher.search(query, docIDCollector);
-  return getPinotDocIds(docIDs);
+  return getPinotDocIdsFromMappingFile(docIDs);
 
 Review comment:
   I prefer the original name


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400568967
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
 
 Review comment:
   I would recommend renaming it to `DocIdTranslator`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400566758
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -142,18 +152,17 @@ public MutableRoaringBitmap getDocIds(Object value) {
*
* TODO: Explore optimizing this path to avoid building the second bitmap
*/
-  private MutableRoaringBitmap getPinotDocIds(MutableRoaringBitmap 
luceneDocIds) {
+  private MutableRoaringBitmap 
getPinotDocIdsFromMappingFile(MutableRoaringBitmap luceneDocIds) {
 IntIterator luceneDocIDIterator = luceneDocIds.getIntIterator();
 MutableRoaringBitmap actualDocIDs = new MutableRoaringBitmap();
 try {
   while (luceneDocIDIterator.hasNext()) {
 int luceneDocId = luceneDocIDIterator.next();
-Document document = _indexSearcher.doc(luceneDocId);
-int pinotDocId = 
Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+int pinotDocId = _docIdReaderWriter.getInt(luceneDocId * 
Integer.BYTES);
 actualDocIDs.add(pinotDocId);
   }
 } catch (Exception e) {
-  throw new RuntimeException("Error: failed while retrieving document from 
index: " + e);
+  throw new RuntimeException("Error: failed while retrieving pinot doc id 
from mapping file: " + e);
 
 Review comment:
   No need to catch, you can directly throw the exception


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400568405
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+private final boolean _mappingExists;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_mappingExists = true;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
true, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  } else {
+_mappingExists = false;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
false, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  }
+}
+
+public void buildDocIdMapping(int numDocs) {
+  if (!_mappingExists) {
+for (int i = 0; i < numDocs; i++) {
+  try {
+Document document = _indexSearcher.doc(i);
+int pinotDocId = 
Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+_buffer.putInt(i * Integer.BYTES, pinotDocId);
+  } catch (Exception e) {
+throw new RuntimeException("Failed to build doc id mapping during 
segment load: " + e);
+  }
+}
+  }
+}
+
+int getInt(int offset) {
 
 Review comment:
   `int getPinotDocId(int luceneDocId)` for better abstraction


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400564673
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
 
 Review comment:
   `final PinotDataBuffer _buffer` (class itself is private)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400566628
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -121,7 +131,7 @@ public MutableRoaringBitmap getDocIds(Object value) {
 try {
   Query query = _queryParser.parse(searchQuery);
   _indexSearcher.search(query, docIDCollector);
-  return getPinotDocIds(docIDs);
+  return getPinotDocIdsFromMappingFile(docIDs);
 } catch (Exception e) {
   LOGGER.error("Failed while searching the text index for column {}, 
search query {}, exception {}", _column,
 
 Review comment:
   You don't really need to log the error if you are going to throw out the 
exception. The catcher will log it with the stack trace


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400565038
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,57 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+private final boolean _mappingExists;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs)
+throws Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // The mapping is local to a segment. It is created on the server during 
segment load.
+  // Unless we are running Pinot on Solaris/SPARC, the underlying 
architecture is
+  // LITTLE_ENDIAN (Linux/x86). So use that as byte order.
+  if (docIdMappingFile.exists()) {
+// we will be here for segment reload and server restart
+// for refresh, we will not be here since segment is deleted/replaced
+// TODO: see if we can prefetch the pages
+_mappingExists = true;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
true, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  } else {
+_mappingExists = false;
+_buffer = PinotDataBuffer.mapFile(docIdMappingFile, /* readOnly */ 
false, 0, length, ByteOrder.LITTLE_ENDIAN,
+_column + getClass().getSimpleName());
+  }
+}
+
+public void buildDocIdMapping(int numDocs) {
+  if (!_mappingExists) {
+for (int i = 0; i < numDocs; i++) {
+  try {
+Document document = _indexSearcher.doc(i);
+int pinotDocId = 
Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
 
 Review comment:
   `Integer.parseInt()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400560559
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/converter/SegmentV1V2ToV3FormatConverter.java
 ##
 @@ -241,6 +242,22 @@ public boolean accept(File dir, String name) {
 Files.copy(indexFile.toPath(), v3LuceneIndexFile.toPath());
   }
 }
+// if segment reload is issued asking for up-conversion of
 
 Review comment:
   Not related to this pr, but can you make this method similar to 
`copyForwardIndex()` and call it in `copyIndexData()` based on the metadata?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400558663
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/invertedindex/TextIndexHandler.java
 ##
 @@ -154,6 +152,11 @@ private void createTextIndexForColumn(ColumnMetadata 
columnMetadata)
 int numDocs = columnMetadata.getTotalDocs();
 LOGGER.info("Creating new text index for column: {} in segment: {}", 
column, _segmentName);
 File segmentIndexDir = 
SegmentDirectoryPaths.segmentDirectoryFor(_indexDir, _segmentVersion);
+// The handlers are always invoked by the preprocessor. Before this 
ImmutableSegmentLoader would have already
 
 Review comment:
   By convention, we use `indexDir` for top level directory, and 
`segmentDirectory` for the direct directory (`indexDir` for `v1`, or `v3` for 
`v3`). If you follow the same naming convention, it will be clearer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400372543
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,49 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws 
Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // For newly added segments, this file will not exist.
+  // For segment refresh, segment reload and server restart, file will 
exist,
+  // but we don't know if we are here for refresh v/s reload v/s restart.
+  // In case of refresh, we have to build the mapping again, but in case of
+  // reload and restart, we don't. Also, reload has a sub-case where this 
text index
+  // was indeed created during reload (user enabled on existing or newly 
added column).
+  // Since there is no way to distinguish why we are here, we build the 
mapping again
+  // regardless.
+  // TODO: see if we can prefetch the pages
+  _buffer =
+  PinotDataBuffer.mapFile(docIdMappingFile, false, 0, length, 
ByteOrder.BIG_ENDIAN, getClass().getSimpleName());
+}
+
+public void buildDocIdMapping(int numDocs) {
+  for (int i = 0; i < numDocs; i++) {
+try {
+  Document document = _indexSearcher.doc(i);
+  int pinotDocId = 
Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+  _buffer.putInt(i * Integer.BYTES, pinotDocId);
+} catch (Exception e) {
+  LOGGER.error("Failed to build doc id mapping during segment load for 
column:{},docID:{},error:{}. Will continue and build mapping on the fly",
 
 Review comment:
   Throw this exception out instead of logging an ERROR. If this step fails, 
JVM will crash when reading the buffer.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400370429
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,49 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws 
Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // For newly added segments, this file will not exist.
+  // For segment refresh, segment reload and server restart, file will 
exist,
+  // but we don't know if we are here for refresh v/s reload v/s restart.
+  // In case of refresh, we have to build the mapping again, but in case of
+  // reload and restart, we don't. Also, reload has a sub-case where this 
text index
+  // was indeed created during reload (user enabled on existing or newly 
added column).
+  // Since there is no way to distinguish why we are here, we build the 
mapping again
+  // regardless.
+  // TODO: see if we can prefetch the pages
+  _buffer =
+  PinotDataBuffer.mapFile(docIdMappingFile, false, 0, length, 
ByteOrder.BIG_ENDIAN, getClass().getSimpleName());
 
 Review comment:
   For better performance, I would probably choose native order as this index 
is always local to one instance?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400359941
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/queries/TestTextSearchQueries.java
 ##
 @@ -89,7 +91,7 @@
   private RecordReader _recordReader;
   Schema _schema;
 
-  private List _indexSegments = new ArrayList<>(1);
+  private static List _indexSegments = new ArrayList<>(1);
 
 Review comment:
   Revert the change in this file?
   Also suggesting rename the test to TextSearchQueriesTest for naming 
convention.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400360304
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -51,6 +54,9 @@
   private final IndexSearcher _indexSearcher;
   private final QueryParser _queryParser;
   private final String _column;
+  private DocIdReaderWriter _docIdReaderWriter;
 
 Review comment:
   `private final`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400369668
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,49 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws 
Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // For newly added segments, this file will not exist.
+  // For segment refresh, segment reload and server restart, file will 
exist,
+  // but we don't know if we are here for refresh v/s reload v/s restart.
+  // In case of refresh, we have to build the mapping again, but in case of
 
 Review comment:
   For segment refresh, this file should not exist as we delete the old segment


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400367374
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -70,6 +76,10 @@ public LuceneTextIndexReader(String column, File 
segmentIndexDir) {
   // Disable Lucene query result cache. While it helps a lot with 
performance for
   // repeated queries, on the downside it cause heap issues.
   _indexSearcher.setQueryCache(null);
+  // TODO: consider using a threshold of num docs per segment to decide 
between building
+  // mapping file upfront on segment load v/s on-the-fly during query 
processing
+  _docIdReaderWriter = new DocIdReaderWriter(segmentIndexDir, _column, 
numDocs);
+  _docIdReaderWriter.buildDocIdMapping(numDocs);
 
 Review comment:
   We build the mapping every time we load the index? You saved the mapping 
into a file right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-30 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r400371993
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -169,5 +178,49 @@ public void close()
   throws IOException {
 _indexReader.close();
 _indexDirectory.close();
+_docIdReaderWriter.close();
+  }
+
+  private class DocIdReaderWriter implements Closeable {
+private PinotDataBuffer _buffer;
+
+DocIdReaderWriter(File segmentIndexDir, String column, int numDocs) throws 
Exception {
+  int length = Integer.BYTES * numDocs;
+  File docIdMappingFile = new 
File(SegmentDirectoryPaths.findSegmentDirectory(segmentIndexDir),
+  column + LUCENE_TEXT_INDEX_DOCID_MAPPING_FILE_EXTENSION);
+  // For newly added segments, this file will not exist.
+  // For segment refresh, segment reload and server restart, file will 
exist,
+  // but we don't know if we are here for refresh v/s reload v/s restart.
+  // In case of refresh, we have to build the mapping again, but in case of
+  // reload and restart, we don't. Also, reload has a sub-case where this 
text index
+  // was indeed created during reload (user enabled on existing or newly 
added column).
+  // Since there is no way to distinguish why we are here, we build the 
mapping again
+  // regardless.
+  // TODO: see if we can prefetch the pages
+  _buffer =
+  PinotDataBuffer.mapFile(docIdMappingFile, false, 0, length, 
ByteOrder.BIG_ENDIAN, getClass().getSimpleName());
 
 Review comment:
   Please include the column name into the description (last argument) to 
distinguish different columns


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to PinotDocId cache

2020-03-24 Thread GitBox
Jackie-Jiang commented on a change in pull request #5177: Lucene DocId to 
PinotDocId cache
URL: https://github.com/apache/incubator-pinot/pull/5177#discussion_r397376549
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java
 ##
 @@ -148,8 +153,12 @@ private MutableRoaringBitmap 
getPinotDocIds(MutableRoaringBitmap luceneDocIds) {
 try {
   while (luceneDocIDIterator.hasNext()) {
 int luceneDocId = luceneDocIDIterator.next();
-Document document = _indexSearcher.doc(luceneDocId);
-int pinotDocId = 
Integer.valueOf(document.get(LuceneTextIndexCreator.LUCENE_INDEX_DOC_ID_COLUMN_NAME));
+Integer pinotDocId = _luceneDocIDToPinotDocIDCache.get(luceneDocId);
 
 Review comment:
   Please use `computeIfAbsent` instead


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org