[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r323414056
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
 ##
 @@ -21,14 +21,8 @@
 import com.google.common.base.Preconditions;
 import it.unimi.dsi.fastutil.ints.IntArrays;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
+import java.util.*;
 
 Review comment:
   I think that it's better to import required class only instead of importing 
all classes in util? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r323415168
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/column/ColumnIndexContainer.java
 ##
 @@ -44,5 +45,15 @@
*/
   Dictionary getDictionary();
 
+  /**
+   *
+   * @return Get the bloom filter for the column if it exists else  {@code 
null}
+   */
   BloomFilterReader getBloomFilter();
+
+  /**
+   *
+   * @return Get the bloom filter for the column if it exists else  {@code 
null}
 
 Review comment:
   Change the comment


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325409640
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
 ##
 @@ -254,6 +261,13 @@ private boolean 
createDictionaryForColumn(ColumnIndexCreationInfo info, SegmentG
 
   @Override
   public void indexRow(GenericRow row) {
+
 
 Review comment:
   remove line


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325361949
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/core/realtime/impl/presence/RealtimePresenceVectorReaderWriterTest.java
 ##
 @@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.realtime.impl.presence;
+
+import org.testng.Assert;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+public class RealtimePresenceVectorReaderWriterTest {
+private static RealtimePresenceVectorReaderWriter readerWriter = null;
+
+@BeforeClass
+public void setup() {
+readerWriter = new RealtimePresenceVectorReaderWriter();
+}
+
+@Test
+public void testRealtimePresenceVectorReaderWriter() {
+for (int i=0;i<100;i++) {
+readerWriter.setNull(i);
+}
+for (int i=0;i<100;i++) {
 
 Review comment:
   same here


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325409528
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/realtime/impl/presence/RealtimePresenceVectorReaderWriter.java
 ##
 @@ -0,0 +1,41 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.realtime.impl.presence;
+
+import org.apache.pinot.core.segment.index.readers.PresenceVectorReader;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+/**
+ * Defines a real-time presence vector to be used in realtime ingestion.
+ */
+public class RealtimePresenceVectorReaderWriter implements 
PresenceVectorReader {
+private final MutableRoaringBitmap _nullBitmap;
 
 Review comment:
   We use 2 spaces. Can you reformat?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325413616
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/core/realtime/impl/presence/RealtimePresenceVectorReaderWriterTest.java
 ##
 @@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.realtime.impl.presence;
+
+import org.testng.Assert;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+public class RealtimePresenceVectorReaderWriterTest {
 
 Review comment:
   2 spaces


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325387059
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/PresenceVectorReaderImpl.java
 ##
 @@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.index.readers;
+
+import com.google.common.base.Joiner;
+
+import java.io.IOException;
+import java.util.Random;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+
+
+public class PresenceVectorReaderImpl implements PresenceVectorReader {
+
+  ImmutableRoaringBitmap _nullBitmap;
+
+  public PresenceVectorReaderImpl(PinotDataBuffer presenceVectorBuffer) throws 
IOException {
+_nullBitmap = new 
ImmutableRoaringBitmap(presenceVectorBuffer.toDirectByteBuffer(0, 
(int)presenceVectorBuffer.size()));
 
 Review comment:
   `(int) presenceVectorBuffer.size()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325360431
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/BitmapDocIdSet.java
 ##
 @@ -33,7 +36,8 @@
   public BitmapDocIdSet(ImmutableRoaringBitmap[] bitmaps, int startDocId, int 
endDocId, boolean exclusive) {
 int numBitmaps = bitmaps.length;
 if (numBitmaps > 1) {
-  MutableRoaringBitmap orBitmap = MutableRoaringBitmap.or(bitmaps);
+  Iterator iterator = Arrays.asList(bitmaps).iterator();
+  MutableRoaringBitmap orBitmap = MutableRoaringBitmap.or(iterator, 
startDocId, endDocId + 1);
 
 Review comment:
   Is this change about a bug fix or API change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r323414056
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
 ##
 @@ -21,14 +21,8 @@
 import com.google.common.base.Preconditions;
 import it.unimi.dsi.fastutil.ints.IntArrays;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
+import java.util.*;
 
 Review comment:
   I think that it's better to import required class only instead of importing 
all classes in util? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325413513
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/PresenceVectorReader.java
 ##
 @@ -0,0 +1,35 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.index.readers;
+
+/**
+ * Reader interface to read from an underlying Presence vector. This is
+ * primarily used to check if a particular column value corresponding to
+ * a document ID is null or not.
+ */
+public interface PresenceVectorReader {
+
+/**
+ * Check if the given docId is present in the corresponding column
 
 Review comment:
   2 spaces


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325361903
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/core/realtime/impl/presence/RealtimePresenceVectorReaderWriterTest.java
 ##
 @@ -0,0 +1,42 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.realtime.impl.presence;
+
+import org.testng.Assert;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+public class RealtimePresenceVectorReaderWriterTest {
+private static RealtimePresenceVectorReaderWriter readerWriter = null;
+
+@BeforeClass
+public void setup() {
+readerWriter = new RealtimePresenceVectorReaderWriter();
+}
+
+@Test
+public void testRealtimePresenceVectorReaderWriter() {
+for (int i=0;i<100;i++) {
 
 Review comment:
   format for loop (`for (int i = 0; i < 100; i++_)`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-17 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r325413576
 
 

 ##
 File path: 
pinot-core/src/test/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImplPresenceVectorTest.java
 ##
 @@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.indexsegment.mutable;
+
+import org.apache.pinot.common.data.FieldSpec;
+import org.apache.pinot.common.data.Schema;
+import org.apache.pinot.core.data.GenericRow;
+import org.apache.pinot.core.data.readers.JSONRecordReader;
+import org.apache.pinot.core.data.readers.RecordReader;
+import org.apache.pinot.core.data.recordtransformer.CompositeTransformer;
+import org.apache.pinot.core.segment.index.data.source.ColumnDataSource;
+import org.apache.pinot.core.segment.index.readers.PresenceVectorReader;
+import org.roaringbitmap.RoaringBitmap;
+import org.testng.Assert;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URL;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+
+
+import static 
org.apache.pinot.common.utils.CommonConstants.Segment.NULL_FIELDS;
+
+public class MutableSegmentImplPresenceVectorTest {
 
 Review comment:
   2spaces


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321570383
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/presence/PresenceVectorCreator.java
 ##
 @@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.creator.impl.presence;
+
+import java.io.Closeable;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteOrder;
+import java.util.Arrays;
+import org.apache.pinot.core.segment.creator.impl.V1Constants;
+import org.apache.pinot.core.segment.index.readers.PresenceVectorReader;
+import org.apache.pinot.core.segment.index.readers.PresenceVectorReaderImpl;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+import org.roaringbitmap.ImmutableBitmapDataProvider;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * Presence Vector Creator
+ *
+ */
+public class PresenceVectorCreator implements Closeable {
+
+  private MutableRoaringBitmap _nullBitmap;
+  private File _presenceVectorFile;
+
+  public PresenceVectorCreator(File indexDir, String columnName) {
+_presenceVectorFile = new File(indexDir, columnName + 
V1Constants.Indexes.PRESENCE_VECTOR_FILE_EXTENSION);
+_nullBitmap = new MutableRoaringBitmap();
+  }
+
+  @Override
+  public void close()
+  throws IOException {
+try (DataOutputStream outputStream = new DataOutputStream(new 
FileOutputStream(_presenceVectorFile))) {
+  _nullBitmap.serialize(outputStream);
+}
+  }
+
+  public void setIsNull(int docId) {
+_nullBitmap.add(docId);
+  }
+
+  public static void main(String[] args)
 
 Review comment:
   remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321571678
 
 

 ##
 File path: pinot-core/src/test/resources/data/test_presence_vector_data.json
 ##
 @@ -0,0 +1,12 @@
+{
 
 Review comment:
   should we also test the case something like
   
   `"description" : null`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321570277
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/NullValueTransformer.java
 ##
 @@ -33,17 +36,35 @@ public NullValueTransformer(Schema schema) {
 
   @Override
   public GenericRow transform(GenericRow record) {
+Set nullColumnNamesSet = null;
+
+// Clear out the 'null_fields' value in case the Generic row is reused
+record.putField(NULL_FIELDS, null);
+
 for (FieldSpec fieldSpec : _fieldSpecs) {
   String fieldName = fieldSpec.getName();
   // Do not allow default value for time column
   if (record.getValue(fieldName) == null && fieldSpec.getFieldType() != 
FieldSpec.FieldType.TIME) {
+if (nullColumnNamesSet == null) {
+  nullColumnNamesSet = new HashSet<>();
+}
+
+// Only handle null columns for non-virtual columns
+if (!fieldSpec.isVirtualColumn()) {
+  nullColumnNamesSet.add(fieldName);
+}
+
 if (fieldSpec.isSingleValueField()) {
   record.putField(fieldName, fieldSpec.getDefaultNullValue());
 } else {
   record.putField(fieldName, new 
Object[]{fieldSpec.getDefaultNullValue()});
 }
   }
 }
+
+if (nullColumnNamesSet != null) {
+  record.putField(NULL_FIELDS, String.join(",", nullColumnNamesSet));
 
 Review comment:
   I think that we can directly store `Set` since `GenericRow` gets `Object` as 
value instead of keep repeating `join & parse`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321568658
 
 

 ##
 File path: 
pinot-core/src/test/resources/data/test_presence_vector_pinot_schema.json
 ##
 @@ -0,0 +1,26 @@
+{
+  "schemaName": "test_pinot_table",
+  "dimensionFieldSpecs": [
+{
+  "name": "cityid",
+  "dataType": "INT",
+  "defaultNullValue": 0
+},
+{
+  "name": "description",
+  "dataType": "STRING"
+},
+{
+  "name": "signup_email",
+  "dataType": "STRING",
+  "defaultNullValue": ""
+}
+  ],
+  "timeFieldSpec": {
+"incomingGranularitySpec": {
+  "name": "secondsSinceEpoch",
+  "dataType": "LONG",
+  "timeType": "SECONDS"
+}
+  }
+}
 
 Review comment:
   add line


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321568965
 
 

 ##
 File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/CommonConstants.java
 ##
 @@ -373,6 +373,8 @@ public ServerType getServerType() {
 
 public static final String LOCAL_SEGMENT_SCHEME = "file";
 
+public static final String NULL_FIELDS = "null_fields";
 
 Review comment:
   `null.fields`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321568760
 
 

 ##
 File path: 
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/ClusterIntegrationTestUtils.java
 ##
 @@ -283,6 +283,7 @@ public static void buildSegmentsFromAvro(List 
avroFiles, int baseSegmentIn
   TarGzCompressionUtils
   .createTarGzOfDirectory(segmentFile.getAbsolutePath(), new 
File(tarDir, segmentName).getAbsolutePath());
 } catch (Exception e) {
+  e.printStackTrace();
 
 Review comment:
   why do we need to add this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321572493
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
 ##
 @@ -479,6 +521,20 @@ public GenericRow getRecord(int docId, GenericRow reuse) {
   .getValue(docId, fieldSpec, _indexReaderWriterMap.get(column), 
_dictionaryMap.get(column),
   _maxNumValuesMap.getOrDefault(column, 0)));
 }
+
+// Explicitly set it to null in case of reuse
+reuse.putField(NULL_FIELDS, null);
+
+// Look up bitmap for this docId and construct a comma separated
+// null column names based on that bitmap
+RoaringBitmap bitmap = _docIdToNullColumnsMap.get(docId);
 
 Review comment:
   why can't we rebuild the row using `_docIdToNullColumnsMap`? Because we need 
to call `isPresent()` for all columns?
   
   Do we know the overheads on this data structure in memory?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321568590
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/index/readers/PresenceVectorReaderImpl.java
 ##
 @@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.segment.index.readers;
+
+import com.google.common.base.Joiner;
+
+import java.io.IOException;
+import java.util.Random;
+import org.apache.pinot.core.segment.memory.PinotDataBuffer;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+
+
+public class PresenceVectorReaderImpl implements PresenceVectorReader {
+
+  ImmutableRoaringBitmap _nullBitmap;
+
+  public PresenceVectorReaderImpl(PinotDataBuffer presenceVectorBuffer) throws 
IOException {
+_nullBitmap = new 
ImmutableRoaringBitmap(presenceVectorBuffer.toDirectByteBuffer(0, 
(int)presenceVectorBuffer.size()));
+  }
+
+  public boolean isPresent(int docId) {
+return !_nullBitmap.contains(docId);
+  }
+
+  public static void main(String[] args) {
 
 Review comment:
   remove


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321570358
 
 

 ##
 File path: 
pinot-core/src/main/java/org/apache/pinot/core/realtime/impl/presence/RealtimePresenceVectorReaderWriter.java
 ##
 @@ -0,0 +1,41 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.realtime.impl.presence;
+
+import org.apache.pinot.core.segment.index.readers.PresenceVectorReader;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+/**
+ * Defines a real-time presence vector to be used in realtime ingestion.
+ */
+public class RealtimePresenceVectorReaderWriter implements 
PresenceVectorReader {
+private final MutableRoaringBitmap _nullBitmap;
+
+public RealtimePresenceVectorReaderWriter() {
+_nullBitmap = new MutableRoaringBitmap();
+}
+
+public void setIsNull(int docId) {
 
 Review comment:
   `setNull`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] snleee commented on a change in pull request #4585: Presence vector

2019-09-05 Thread GitBox
snleee commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321568650
 
 

 ##
 File path: pinot-core/src/test/resources/data/test_presence_vector_data.json
 ##
 @@ -0,0 +1,12 @@
+{
+  "description" : "6\"mlc\u0015x[9jA\u001DT",
+  "secondsSinceEpoch": 1567205394
+}
+{
+  "cityid" : -2053105565,
+  "description" : "q\u0012=cLp{-PDa\fT",
+  "signup_email" : {
+"string" : "fI\u00194u@WI\f\u001Dt"
+  },
+  "secondsSinceEpoch": 1567205394
+}
 
 Review comment:
   add line


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org