[GitHub] [incubator-pinot] fx19880617 opened a new pull request #6011: Update pinot helm to adding custom configs

2020-09-11 Thread GitBox


fx19880617 opened a new pull request #6011:
URL: https://github.com/apache/incubator-pinot/pull/6011


   ## Description
   This change:
   1. Allows users to add more configs into pinot config files just through 
`values.yaml` file.
   2. Add GC related settings into default JVM configs for reference.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch update_pinot_helm_for_custom_config_file created (now c3d3d20)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch update_pinot_helm_for_custom_config_file
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


  at c3d3d20  Update pinot helm to adding custom configs and update the jvm 
default configs

This branch includes the following new commits:

 new c3d3d20  Update pinot helm to adding custom configs and update the jvm 
default configs

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Update pinot helm to adding custom configs and update the jvm default configs

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a commit to branch update_pinot_helm_for_custom_config_file
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit c3d3d20b59f5af7e190afdbaa32a8ff121bd4c26
Author: Xiang Fu 
AuthorDate: Fri Sep 11 22:50:53 2020 -0700

Update pinot helm to adding custom configs and update the jvm default 
configs
---
 .../helm/pinot/templates/broker/configmap.yaml  |  2 +-
 .../helm/pinot/templates/controller/configmap.yaml  |  2 +-
 .../helm/pinot/templates/server/configmap.yaml  |  2 +-
 kubernetes/helm/pinot/values.yaml   | 21 ++---
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kubernetes/helm/pinot/templates/broker/configmap.yaml 
b/kubernetes/helm/pinot/templates/broker/configmap.yaml
index 9ed5dd5..525ae80 100644
--- a/kubernetes/helm/pinot/templates/broker/configmap.yaml
+++ b/kubernetes/helm/pinot/templates/broker/configmap.yaml
@@ -25,4 +25,4 @@ data:
   pinot-broker.conf: |-
 pinot.broker.client.queryPort={{ .Values.broker.port }}
 pinot.broker.routing.table.builder.class={{ 
.Values.broker.routingTable.builderClass }}
-pinot.set.instance.id.to.hostname=true
+{{ .Values.broker.extra.configs | indent 4 }}
\ No newline at end of file
diff --git a/kubernetes/helm/pinot/templates/controller/configmap.yaml 
b/kubernetes/helm/pinot/templates/controller/configmap.yaml
index 69d24a8..3d40a27 100644
--- a/kubernetes/helm/pinot/templates/controller/configmap.yaml
+++ b/kubernetes/helm/pinot/templates/controller/configmap.yaml
@@ -29,4 +29,4 @@ data:
 controller.vip.port={{ .Values.controller.service.port }}
 controller.data.dir={{ .Values.controller.data.dir }}
 controller.zk.str={{ include "zookeeper.url" . }}
-pinot.set.instance.id.to.hostname=true
+{{ .Values.controller.extra.configs | indent 4 }}
\ No newline at end of file
diff --git a/kubernetes/helm/pinot/templates/server/configmap.yaml 
b/kubernetes/helm/pinot/templates/server/configmap.yaml
index c99e173..f4320cf 100644
--- a/kubernetes/helm/pinot/templates/server/configmap.yaml
+++ b/kubernetes/helm/pinot/templates/server/configmap.yaml
@@ -27,4 +27,4 @@ data:
 pinot.server.adminapi.port={{ .Values.server.ports.admin }}
 pinot.server.instance.dataDir={{ .Values.server.dataDir }}
 pinot.server.instance.segmentTarDir={{ .Values.server.segmentTarDir }}
-pinot.set.instance.id.to.hostname=true
+{{ .Values.server.extra.configs | indent 4 }}
\ No newline at end of file
diff --git a/kubernetes/helm/pinot/values.yaml 
b/kubernetes/helm/pinot/values.yaml
index 689a8f0..4e6c347 100644
--- a/kubernetes/helm/pinot/values.yaml
+++ b/kubernetes/helm/pinot/values.yaml
@@ -47,7 +47,7 @@ controller:
 host: pinot-controller
 port: 9000
 
-  jvmOpts: "-Xms256M -Xmx1G"
+  jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime 
-Xloggc:/opt/pinot/gc-pinot-controller.log"
 
   log4j2ConfFile: /opt/pinot/conf/pinot-controller-log4j2.xml
   pluginsDir: /opt/pinot/plugins
@@ -80,6 +80,11 @@ controller:
   updateStrategy:
 type: RollingUpdate
 
+  # Extra configs will be appended to pinot-controller.conf file
+  extra:
+configs: |-
+  pinot.set.instance.id.to.hostname=true
+
 broker:
   name: broker
 
@@ -87,7 +92,7 @@ broker:
 
   replicaCount: 1
 
-  jvmOpts: "-Xms256M -Xmx1G"
+  jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-broker.log"
 
   log4j2ConfFile: /opt/pinot/conf/pinot-broker-log4j2.xml
   pluginsDir: /opt/pinot/plugins
@@ -123,6 +128,11 @@ broker:
   updateStrategy:
 type: RollingUpdate
 
+  # Extra configs will be appended to pinot-broker.conf file
+  extra:
+configs: |-
+  pinot.set.instance.id.to.hostname=true
+
 server:
   name: server
 
@@ -143,7 +153,7 @@ server:
 storageClass: ""
 #storageClass: "ssd"
 
-  jvmOpts: "-Xms512M -Xmx1G"
+  jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-server.log"
 
   log4j2ConfFile: /opt/pinot/conf/pinot-server-log4j2.xml
   pluginsDir: /opt/pinot/plugins
@@ -171,6 +181,11 @@ server:
   updateStrategy:
 type: RollingUpdate
 
+  # Extra configs will be appended to pinot-server.conf file
+  extra:
+configs: |-
+  pinot.set.instance.id.to.hostname=true
+  pinot.server.instance.realtime.alloc.offheap=true
 # 
--
 # Zookeeper:
 # 
--



[GitHub] [incubator-pinot] jackjlli commented on pull request #6009: Adjust schema validation logic in AvroIngestionSchemaValidator

2020-09-11 Thread GitBox


jackjlli commented on pull request #6009:
URL: https://github.com/apache/incubator-pinot/pull/6009#issuecomment-691397062


   > Can you reduce the size of the testing files (both 
`test_sample_data_multi_value.avro` and `test_sample_data.avro` previously 
checked in)? All we need from the file is the avro schema, and we should not 
add a 11.7 MB file into the repo.
   
   Good point  Shrink the files to 10 lines each.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fix-schema-validator updated (6d22faa -> f09ca9a)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-schema-validator
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


from 6d22faa  Adjust schema validation logic in AvroIngestionSchemaValidator
 add f09ca9a  Reduce test file sizes

No new revisions were added by this update.

Summary of changes:
 .../src/test/resources/data/test_sample_data.avro  | Bin 917973 -> 2315 bytes
 .../data/test_sample_data_multi_value.avro | Bin 1227 -> 5108 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata

2020-09-11 Thread GitBox


yupeng9 commented on a change in pull request #6008:
URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487354466



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##
@@ -78,6 +78,9 @@
 public class SegmentColumnarIndexCreator implements SegmentCreator {
   // TODO Refactor class name to match interface name
   private static final Logger LOGGER = 
LoggerFactory.getLogger(SegmentColumnarIndexCreator.class);
+  // Allow at most 512 characters for the metadata property
+  private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512;

Review comment:
   I see. Thanks





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6010: [Clean up] Separate TextIndex from InvertedIndex

2020-09-11 Thread GitBox


Jackie-Jiang opened a new pull request #6010:
URL: https://github.com/apache/incubator-pinot/pull/6010


   ## Description
   Introduce `TextIndexCreator` and `TextIndexReader` for text index



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata

2020-09-11 Thread GitBox


Jackie-Jiang commented on a change in pull request #6008:
URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487331483



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##
@@ -78,6 +78,9 @@
 public class SegmentColumnarIndexCreator implements SegmentCreator {
   // TODO Refactor class name to match interface name
   private static final Logger LOGGER = 
LoggerFactory.getLogger(SegmentColumnarIndexCreator.class);
+  // Allow at most 512 characters for the metadata property
+  private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512;

Review comment:
   They are independent, where the `maxLength` is configurable in the 
`FieldSpec` to allow ingesting long strings, but the property length limit is 
fixed. Also, the property length limit also applies to the BYTES column.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] jackjlli opened a new pull request #6009: Adjust schema validation logic in AvroIngestionSchemaValidator

2020-09-11 Thread GitBox


jackjlli opened a new pull request #6009:
URL: https://github.com/apache/incubator-pinot/pull/6009


   ## Description
   This PR adjusts schema validation logic in AvroIngestionSchemaValidator.
   
   The current logic doesn't check the actual data type for multi-value column, 
which could have returned incorrect validation results.
   
   E.g. if column3 is of array structure (like Object[]) and its base element 
is of String type in AVRO and if column3 is of String type as well in Pinot 
schema, the current code would mark it `data type mismatch`.
   
   E.g. if column3 is of array of map structure (like Map[]), the current logic 
would miss marking it `multi-value structure mismatch`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Adjust schema validation logic in AvroIngestionSchemaValidator

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch fix-schema-validator
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 6d22faa9fd916217b764ddebd8cacac54ea4a6db
Author: Jack Li(Analytics Engineering) 
AuthorDate: Fri Sep 11 14:37:24 2020 -0700

Adjust schema validation logic in AvroIngestionSchemaValidator
---
 .../hadoop/data/IngestionSchemaValidatorTest.java  |  51 ++---
 .../data/test_sample_data_multi_value.avro | Bin 0 -> 1227 bytes
 .../avro/AvroIngestionSchemaValidator.java |  51 +++--
 3 files changed, 71 insertions(+), 31 deletions(-)

diff --git 
a/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java
 
b/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java
index fec3583..8cd0912 100644
--- 
a/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java
+++ 
b/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java
@@ -29,16 +29,16 @@ import org.testng.annotations.Test;
 
 
 public class IngestionSchemaValidatorTest {
+
   @Test
-  public void testAvroIngestionSchemaValidator()
+  public void testAvroIngestionSchemaValidatorForSingleValueColumns()
   throws Exception {
-String inputFilePath = new File(
-
Preconditions.checkNotNull(IngestionSchemaValidatorTest.class.getClassLoader().getResource("data/test_sample_data.avro"))
-.getFile()).toString();
+String inputFilePath = new File(Preconditions
+
.checkNotNull(IngestionSchemaValidatorTest.class.getClassLoader().getResource("data/test_sample_data.avro"))
+.getFile()).toString();
 String recordReaderClassName = 
"org.apache.pinot.plugin.inputformat.avro.AvroRecordReader";
 
-Schema pinotSchema = new Schema.SchemaBuilder()
-.addSingleValueDimension("column1", FieldSpec.DataType.LONG)
+Schema pinotSchema = new 
Schema.SchemaBuilder().addSingleValueDimension("column1", 
FieldSpec.DataType.LONG)
 .addSingleValueDimension("column2", FieldSpec.DataType.INT)
 .addSingleValueDimension("column3", FieldSpec.DataType.STRING)
 .addSingleValueDimension("column7", FieldSpec.DataType.STRING)
@@ -53,8 +53,7 @@ public class IngestionSchemaValidatorTest {
 
Assert.assertFalse(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected());
 
 // Adding one extra column
-pinotSchema = new Schema.SchemaBuilder()
-.addSingleValueDimension("column1", FieldSpec.DataType.LONG)
+pinotSchema = new 
Schema.SchemaBuilder().addSingleValueDimension("column1", 
FieldSpec.DataType.LONG)
 .addSingleValueDimension("column2", FieldSpec.DataType.INT)
 .addSingleValueDimension("column3", FieldSpec.DataType.STRING)
 .addSingleValueDimension("extra_column", FieldSpec.DataType.STRING)
@@ -69,11 +68,9 @@ public class IngestionSchemaValidatorTest {
 
Assert.assertFalse(ingestionSchemaValidator.getMultiValueStructureMismatchResult().isMismatchDetected());
 
Assert.assertTrue(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected());
 
Assert.assertNotNull(ingestionSchemaValidator.getMissingPinotColumnResult().getMismatchReason());
-
System.out.println(ingestionSchemaValidator.getMissingPinotColumnResult().getMismatchReason());
 
 // Change the data type of column1 from LONG to STRING
-pinotSchema = new Schema.SchemaBuilder()
-.addSingleValueDimension("column1", FieldSpec.DataType.STRING)
+pinotSchema = new 
Schema.SchemaBuilder().addSingleValueDimension("column1", 
FieldSpec.DataType.STRING)
 .addSingleValueDimension("column2", FieldSpec.DataType.INT)
 .addSingleValueDimension("column3", FieldSpec.DataType.STRING)
 .addSingleValueDimension("column7", FieldSpec.DataType.STRING)
@@ -83,14 +80,12 @@ public class IngestionSchemaValidatorTest {
 Assert.assertNotNull(ingestionSchemaValidator);
 
Assert.assertTrue(ingestionSchemaValidator.getDataTypeMismatchResult().isMismatchDetected());
 
Assert.assertNotNull(ingestionSchemaValidator.getDataTypeMismatchResult().getMismatchReason());
-
System.out.println(ingestionSchemaValidator.getDataTypeMismatchResult().getMismatchReason());
 
Assert.assertFalse(ingestionSchemaValidator.getSingleValueMultiValueFieldMismatchResult().isMismatchDetected());
 
Assert.assertFalse(ingestionSchemaValidator.getMultiValueStructureMismatchResult().isMismatchDetected());
 
Assert.assertFalse(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected());
 
 // Change column2 from single-value column to multi-value column
-

[incubator-pinot] branch fix-schema-validator created (now 6d22faa)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-schema-validator
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


  at 6d22faa  Adjust schema validation logic in AvroIngestionSchemaValidator

This branch includes the following new commits:

 new 6d22faa  Adjust schema validation logic in AvroIngestionSchemaValidator

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] jackjlli commented on pull request #6005: Fix extract method in AvroRecordExtractor class

2020-09-11 Thread GitBox


jackjlli commented on pull request #6005:
URL: https://github.com/apache/incubator-pinot/pull/6005#issuecomment-691313748


   > Could you add a test such that it fails with the original code, and passes 
with the old code?
   
   Test added.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata

2020-09-11 Thread GitBox


yupeng9 commented on a change in pull request #6008:
URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487290286



##
File path: 
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##
@@ -78,6 +78,9 @@
 public class SegmentColumnarIndexCreator implements SegmentCreator {
   // TODO Refactor class name to match interface name
   private static final Logger LOGGER = 
LoggerFactory.getLogger(SegmentColumnarIndexCreator.class);
+  // Allow at most 512 characters for the metadata property
+  private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512;

Review comment:
   shall this align with `FieldSpec.DEFAULT_MAX_LENGTH`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] codecov-commenter commented on pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata

2020-09-11 Thread GitBox


codecov-commenter commented on pull request #6008:
URL: https://github.com/apache/incubator-pinot/pull/6008#issuecomment-691309424


   # 
[Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=h1) 
Report
   > Merging 
[#6008](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc)
 will **decrease** coverage by `20.10%`.
   > The diff coverage is `49.63%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-pinot/pull/6008/graphs/tree.svg?width=650=150=pr=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=tree)
   
   ```diff
   @@ Coverage Diff @@
   ##   master#6008   +/-   ##
   ===
   - Coverage   66.44%   46.34%   -20.11% 
   ===
 Files1075 1180  +105 
 Lines   5477355889 +1116 
 Branches 8168 8134   -34 
   ===
   - Hits3639625899-10497 
   - Misses  1570027869+12169 
   + Partials 2677 2121  -556 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | #integration | `46.34% <49.63%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==)
 | `100.00% <ø> (ø)` | |
   | 
[.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=)
 | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | 
[...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | 
[...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh)
 | `0.00% <0.00%> (ø)` | |
   | 
[...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=)
 | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | 
[...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=)
 | `26.66% <0.00%> (-30.48%)` | :arrow_down: |
   | 
[.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==)
 | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
   | 
[.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=)
 | `24.00% <0.00%> (-10.29%)` | :arrow_down: |
   | 
[.../apache/pinot/common/function/FunctionInvoker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vRnVuY3Rpb25JbnZva2VyLmphdmE=)
 | `61.36% <ø> (ø)` | |
   | 
[...apache/pinot/common/function/FunctionRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vRnVuY3Rpb25SZWdpc3RyeS5qYXZh)
 | `78.78% <ø> (ø)` | |
   | ... and [1130 
more](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=continue).
   > **Legend** - [Click here to learn 

[incubator-pinot] branch master updated (11fd62b -> 13a281c)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


from 11fd62b  Fix S3PinotFS List API may not return full results (#6002)
 add 13a281c  Fix/data view dev serve (#6006)

No new revisions were added by this update.

Summary of changes:
 website/README.md  |   13 +
 website/docs/components.md |2 +-
 website/docs/components/cluster.md |8 +-
 website/docs/misc/build-docker.md  |5 +-
 website/docs/user-guide/pql.md |4 +-
 website/docusaurus.config.js   |1 +
 website/package.json   |   28 +-
 website/src/components/Alert/index.js  |   61 +-
 website/src/components/Alert/styles.css|   57 +-
 website/src/components/BlogPostTags/index.js   |   38 +-
 .../src/components/BlogPostTags/styles.module.css  |2 +-
 website/src/components/Changelog/index.js  |  299 ++--
 website/src/components/CheckboxList/index.js   |   87 +-
 website/src/components/CodeHeader/index.js |   31 +-
 website/src/components/CodeHeader/styles.css   |   20 +-
 website/src/components/Field/index.js  |  294 +--
 website/src/components/Fields/index.js |  184 +-
 website/src/components/Fields/styles.css   |   28 +-
 website/src/components/Jump/index.js   |   60 +-
 website/src/components/Jump/styles.css |  108 +-
 website/src/components/Step/index.js   |   12 +-
 website/src/components/Steps/index.js  |   12 +-
 website/src/components/Steps/styles.css|   14 +-
 website/src/css/custom.css | 1885 ++--
 website/src/exports/animatedGraph.js   |  113 +-
 website/src/exports/cloudify.js|  587 +++---
 website/src/exports/newPost.js |   44 +-
 website/src/exports/newRelease.js  |   47 +-
 website/src/exports/repoUrl.js |   10 +-
 website/src/pages/download.css |   10 +-
 website/src/pages/download.js  |  441 +++--
 website/src/pages/index.css|  173 +-
 website/src/pages/index.js |  627 ---
 website/src/pages/index.module.css |  380 ++--
 34 files changed, 3088 insertions(+), 2597 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 merged pull request #6006: Fix/data view dev serve

2020-09-11 Thread GitBox


fx19880617 merged pull request #6006:
URL: https://github.com/apache/incubator-pinot/pull/6006


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata

2020-09-11 Thread GitBox


Jackie-Jiang opened a new pull request #6008:
URL: https://github.com/apache/incubator-pinot/pull/6008


   ## Description
   Prevent storing very long values into the segment metadata. This could 
happen when Pinot is used as a blob store (not recommended but supported).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] jihaozh commented on a change in pull request #6001: [TE] entity anomaly logging for ad-hoc debugging

2020-09-11 Thread GitBox


jihaozh commented on a change in pull request #6001:
URL: https://github.com/apache/incubator-pinot/pull/6001#discussion_r487263049



##
File path: 
thirdeye/thirdeye-pinot/src/test/java/org/apache/pinot/thirdeye/tools/RunAdhocDatabaseQueriesTool.java
##
@@ -723,9 +741,65 @@ private void rollbackMigrateSubscriptionWatermarks() {
 }
   }
 
-  public static void main(String[] args) throws Exception {
+  private void printEntityAnomalyDetails(MergedAnomalyResultDTO anomaly, 
String indent, int index) {
+LOG.info("");
+LOG.info("Exploring Entity Anomaly {} with id {}", index, anomaly.getId());
+LOG.info(ENTITY_STATS_TEMPLATE, anomaly.getChildren().size(), 
anomaly.getProperties());
+LOG.info(ENTITY_TIME_TEMPLATE,
+new DateTime(anomaly.getCreatedTime(), TIMEZONE),
+DATE_FORMAT.print(new DateTime(anomaly.getStartTime(), TIMEZONE)),
+DATE_FORMAT.print(new DateTime(anomaly.getEndTime(), TIMEZONE)));
+  }
 
-File persistenceFile = new File("/Users/akrai/persistence-linux.yml");
+  /**
+   * Visualizes the entity anomalies by printing them
+   *
+   * Eg: dq.printEntityAnomalyTrees(158750221, 0, System.currentTimeMillis())
+   *
+   * @param detectionId The detection id whose anomalies need to be printed
+   * @param start The start time of the anomaly slice
+   * @param end The end time of the anomaly slice
+   */
+  private void printEntityAnomalyTrees(long detectionId, long start, long end) 
{
+TimeSeriesLoader timeseriesLoader =
+new DefaultTimeSeriesLoader(metricConfigDAO, datasetConfigDAO,
+ThirdEyeCacheRegistry.getInstance().getQueryCache(),
+ThirdEyeCacheRegistry.getInstance().getTimeSeriesCache());
+AggregationLoader aggregationLoader =
+new DefaultAggregationLoader(metricConfigDAO, datasetConfigDAO,
+ThirdEyeCacheRegistry.getInstance().getQueryCache(),
+ThirdEyeCacheRegistry.getInstance().getDatasetMaxDataTimeCache());
+DefaultDataProvider provider =
+new DefaultDataProvider(metricConfigDAO, datasetConfigDAO, eventDAO, 
mergedResultDAO,
+DAORegistry.getInstance().getEvaluationManager(), 
timeseriesLoader, aggregationLoader,
+new DetectionPipelineLoader(), 
TimeSeriesCacheBuilder.getInstance(), AnomaliesCacheBuilder.getInstance());
+
+AnomalySlice anomalySlice = new AnomalySlice();
+anomalySlice = 
anomalySlice.withDetectionId(detectionId).withStart(start).withEnd(end);
+Multimap
+sliceToAnomaliesMap = 
provider.fetchAnomalies(Collections.singletonList(anomalySlice));
+
+LOG.info("Total number of entity anomalies = " + 
sliceToAnomaliesMap.values().size());
+
+int i = 1;
+for (MergedAnomalyResultDTO parentAnomaly : sliceToAnomaliesMap.values()) {
+  printEntityAnomalyDetails(parentAnomaly, "", i);
+  int j = 1;
+  for (MergedAnomalyResultDTO child : parentAnomaly.getChildren()) {
+printEntityAnomalyDetails(parentAnomaly, "\t", j);
+int k = 1;
+for (MergedAnomalyResultDTO grandchild : child.getChildren()) {
+  printEntityAnomalyDetails(grandchild, "\t\t", k);
+  k++;
+}
+j++;
+  }
+  i++;
+}
+  }
+
+  public static void main(String[] args) throws Exception {
+File persistenceFile = new File("/Users/akrai/persistence-local.yml");

Review comment:
   hide the user name here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] ChethanUK commented on pull request #6006: Fix/data view dev serve

2020-09-11 Thread GitBox


ChethanUK commented on pull request #6006:
URL: https://github.com/apache/incubator-pinot/pull/6006#issuecomment-691278842


   ✅ Preview available at 
https://incubator-pinot-git-fix-web-data-error.chethanuk.vercel.app



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] vincentchenjl opened a new pull request #6007: [TE] add labeler into yaml

2020-09-11 Thread GitBox


vincentchenjl opened a new pull request #6007:
URL: https://github.com/apache/incubator-pinot/pull/6007


   This PR is second PR for severity-based alert feature, including the logic 
of parsing labeler configuration and constructing the detection pipelines based 
on the YAML.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] ChethanUK opened a new pull request #6006: Fix/data view dev serve

2020-09-11 Thread GitBox


ChethanUK opened a new pull request #6006:
URL: https://github.com/apache/incubator-pinot/pull/6006


   Fixing a few data view and other errors which were failing the build 
   
   Updating the packages and adding dev serve test mode
   
   ```bash
   yarn run serve --build --port 3001 --host 0.0.0.0
   ```
   
   View after update:
   
   Light Mode - 
   ![Light 
Mode](https://user-images.githubusercontent.com/16241795/92965649-c888cd80-f493-11ea-95e4-d28b2b8bacec.png)
   
   Dark Mode -
   ![Dark 
Mode](https://user-images.githubusercontent.com/16241795/92965653-ca529100-f493-11ea-8b67-45c307dc10b2.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6005: Fix extract method in AvroRecordExtractor class

2020-09-11 Thread GitBox


jackjlli commented on a change in pull request #6005:
URL: https://github.com/apache/incubator-pinot/pull/6005#discussion_r487224829



##
File path: 
pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
##
@@ -46,14 +46,13 @@ public void init(Set fields, @Nullable 
RecordExtractorConfig recordExtra
   @Override
   public GenericRow extract(GenericRecord from, GenericRow to) {
 if (_extractAll) {
-  Map jsonMap = JsonUtils.genericRecordToJson(from);
-  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(value)));
+  List fields = from.getSchema().getFields();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fix-AvroRecordExtractor updated (bfb5954 -> c19fb22)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


from bfb5954  Fix extract method in AvroRecordExtractor
 add c19fb22  Address PR comments

No new revisions were added by this update.

Summary of changes:
 .../inputformat/avro/AvroRecordExtractor.java  |  8 +++--
 .../inputformat/avro/AvroRecordExtractorTest.java  | 37 +-
 .../java/org/apache/pinot/spi/utils/JsonUtils.java |  2 --
 3 files changed, 41 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6005: Fix extract method in AvroRecordExtractor class

2020-09-11 Thread GitBox


Jackie-Jiang commented on a change in pull request #6005:
URL: https://github.com/apache/incubator-pinot/pull/6005#discussion_r487213705



##
File path: 
pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
##
@@ -46,14 +46,13 @@ public void init(Set fields, @Nullable 
RecordExtractorConfig recordExtra
   @Override
   public GenericRow extract(GenericRecord from, GenericRow to) {
 if (_extractAll) {
-  Map jsonMap = JsonUtils.genericRecordToJson(from);
-  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(value)));
+  List fields = from.getSchema().getFields();

Review comment:
   Suggest using the non-functional way for performance concern





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] jackjlli opened a new pull request #6005: Fix extract method in AvroRecordExtractor class

2020-09-11 Thread GitBox


jackjlli opened a new pull request #6005:
URL: https://github.com/apache/incubator-pinot/pull/6005


   ## Description
   This PR fixes the extract method in AvroRecordExtractor class.
   
   When `_extractAll` is true, the generic record will be first converted to a 
json String and then parse to a json map, whereas json object has its precision 
limitation:
   https://developers.google.com/discovery/v1/type-format
   
   E.g. the column1 was originally of long type. But when it was converted to 
json string and then parse to a json map, the type would be changed to int. 
Thus, the actual value will be incorrect from the json map comparing to the 
actual value from generic record.
   
   This PR fixes it by fetching the actual value directly from the original 
generic record.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit bfb59545cdd13e94f34c882be579afd917fbdd80
Author: Jack Li(Analytics Engineering) 
AuthorDate: Fri Sep 11 10:33:09 2020 -0700

Fix extract method in AvroRecordExtractor
---
 .../plugin/inputformat/avro/AvroRecordExtractor.java| 17 -
 .../main/java/org/apache/pinot/spi/utils/JsonUtils.java | 13 -
 2 files changed, 8 insertions(+), 22 deletions(-)

diff --git 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
index 339ab67..ede8586 100644
--- 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
+++ 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
@@ -18,14 +18,14 @@
  */
 package org.apache.pinot.plugin.inputformat.avro;
 
-import java.util.Map;
+import java.util.List;
 import java.util.Set;
 import javax.annotation.Nullable;
+import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.pinot.spi.data.readers.GenericRow;
 import org.apache.pinot.spi.data.readers.RecordExtractor;
 import org.apache.pinot.spi.data.readers.RecordExtractorConfig;
-import org.apache.pinot.spi.utils.JsonUtils;
 
 
 /**
@@ -46,14 +46,13 @@ public class AvroRecordExtractor implements 
RecordExtractor {
   @Override
   public GenericRow extract(GenericRecord from, GenericRow to) {
 if (_extractAll) {
-  Map jsonMap = JsonUtils.genericRecordToJson(from);
-  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(value)));
+  List fields = from.getSchema().getFields();
+  fields.forEach(field -> {
+String fieldName = field.name();
+to.putValue(fieldName, AvroUtils.convert(from.get(fieldName)));
+  });
 } else {
-  for (String fieldName : _fields) {
-Object value = from.get(fieldName);
-Object convertedValue = AvroUtils.convert(value);
-to.putValue(fieldName, convertedValue);
-  }
+  _fields.forEach(fieldName -> to.putValue(fieldName, 
AvroUtils.convert(from.get(fieldName;
 }
 return to;
   }
diff --git a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java 
b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
index f5bf9d3..419b5ce 100644
--- a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
+++ b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
@@ -193,17 +193,4 @@ public class JsonUtils {
 throw new IllegalArgumentException(String.format("Unsupported data 
type %s", dataType));
 }
   }
-
-  /**
-   * Converts from a GenericRecord to a json map
-   */
-  public static Map genericRecordToJson(GenericRecord 
genericRecord) {
-try {
-  String jsonString = genericRecord.toString();
-  return DEFAULT_MAPPER.readValue(jsonString, new 
TypeReference>() {
-  });
-} catch (IOException e) {
-  throw new IllegalStateException("Caught exception when converting 
generic record " + genericRecord + " to JSON");
-}
-  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fix-AvroRecordExtractor updated (f4add59 -> bfb5954)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard f4add59  Fix extract method in AvroRecordExtractor
 new bfb5954  Fix extract method in AvroRecordExtractor

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (f4add59)
\
 N -- N -- N   refs/heads/fix-AvroRecordExtractor (bfb5954)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java   | 2 --
 1 file changed, 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fix-AvroRecordExtractor updated (fd1cb00 -> f4add59)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard fd1cb00  Fix extract method in AvroRecordExtractor
 new f4add59  Fix extract method in AvroRecordExtractor

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (fd1cb00)
\
 N -- N -- N   refs/heads/fix-AvroRecordExtractor (f4add59)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../pinot/plugin/inputformat/avro/AvroRecordExtractor.java  |  9 +++--
 .../src/main/java/org/apache/pinot/spi/utils/JsonUtils.java | 13 -
 2 files changed, 7 insertions(+), 15 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit f4add59f9518b37c638d90457a92e18f81108545
Author: Jack Li(Analytics Engineering) 
AuthorDate: Fri Sep 11 10:33:09 2020 -0700

Fix extract method in AvroRecordExtractor
---
 .../plugin/inputformat/avro/AvroRecordExtractor.java  | 15 ---
 .../main/java/org/apache/pinot/spi/utils/JsonUtils.java   | 13 -
 2 files changed, 8 insertions(+), 20 deletions(-)

diff --git 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
index 339ab67..cec3e5f 100644
--- 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
+++ 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
@@ -18,9 +18,11 @@
  */
 package org.apache.pinot.plugin.inputformat.avro;
 
+import java.util.List;
 import java.util.Map;
 import java.util.Set;
 import javax.annotation.Nullable;
+import org.apache.avro.Schema;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.pinot.spi.data.readers.GenericRow;
 import org.apache.pinot.spi.data.readers.RecordExtractor;
@@ -46,14 +48,13 @@ public class AvroRecordExtractor implements 
RecordExtractor {
   @Override
   public GenericRow extract(GenericRecord from, GenericRow to) {
 if (_extractAll) {
-  Map jsonMap = JsonUtils.genericRecordToJson(from);
-  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(value)));
+  List fields = from.getSchema().getFields();
+  fields.forEach(field -> {
+String fieldName = field.name();
+to.putValue(fieldName, AvroUtils.convert(from.get(fieldName)));
+  });
 } else {
-  for (String fieldName : _fields) {
-Object value = from.get(fieldName);
-Object convertedValue = AvroUtils.convert(value);
-to.putValue(fieldName, convertedValue);
-  }
+  _fields.forEach(fieldName -> to.putValue(fieldName, 
AvroUtils.convert(from.get(fieldName;
 }
 return to;
   }
diff --git a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java 
b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
index f5bf9d3..419b5ce 100644
--- a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
+++ b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java
@@ -193,17 +193,4 @@ public class JsonUtils {
 throw new IllegalArgumentException(String.format("Unsupported data 
type %s", dataType));
 }
   }
-
-  /**
-   * Converts from a GenericRecord to a json map
-   */
-  public static Map genericRecordToJson(GenericRecord 
genericRecord) {
-try {
-  String jsonString = genericRecord.toString();
-  return DEFAULT_MAPPER.readValue(jsonString, new 
TypeReference>() {
-  });
-} catch (IOException e) {
-  throw new IllegalStateException("Caught exception when converting 
generic record " + genericRecord + " to JSON");
-}
-  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fix-AvroRecordExtractor created (now fd1cb00)

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a change to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


  at fd1cb00  Fix extract method in AvroRecordExtractor

This branch includes the following new commits:

 new fd1cb00  Fix extract method in AvroRecordExtractor

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor

2020-09-11 Thread jlli
This is an automated email from the ASF dual-hosted git repository.

jlli pushed a commit to branch fix-AvroRecordExtractor
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit fd1cb00a4f8d0f475e114add27b81d11ef651372
Author: Jack Li(Analytics Engineering) 
AuthorDate: Fri Sep 11 10:33:09 2020 -0700

Fix extract method in AvroRecordExtractor
---
 .../apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
index 339ab67..05cab34 100644
--- 
a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
+++ 
b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java
@@ -47,13 +47,9 @@ public class AvroRecordExtractor implements 
RecordExtractor {
   public GenericRow extract(GenericRecord from, GenericRow to) {
 if (_extractAll) {
   Map jsonMap = JsonUtils.genericRecordToJson(from);
-  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(value)));
+  jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, 
AvroUtils.convert(from.get(fieldName;
 } else {
-  for (String fieldName : _fields) {
-Object value = from.get(fieldName);
-Object convertedValue = AvroUtils.convert(value);
-to.putValue(fieldName, convertedValue);
-  }
+  _fields.forEach(fieldName -> to.putValue(fieldName, 
AvroUtils.convert(from.get(fieldName;
 }
 return to;
   }


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch master updated: Fix S3PinotFS List API may not return full results (#6002)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


The following commit(s) were added to refs/heads/master by this push:
 new 11fd62b  Fix S3PinotFS List API may not return full results (#6002)
11fd62b is described below

commit 11fd62b77d84ba714828d0f85341e205f83c6c4e
Author: Xiang Fu 
AuthorDate: Fri Sep 11 02:37:45 2020 -0700

Fix S3PinotFS List API may not return full results (#6002)
---
 .../spark/SparkSegmentGenerationJobRunner.java |  1 +
 .../apache/pinot/plugin/filesystem/S3PinotFS.java  | 61 +-
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git 
a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
 
b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
index ad96e5d..c1b3f25 100644
--- 
a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
+++ 
b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
@@ -188,6 +188,7 @@ public class SparkSegmentGenerationJobRunner implements 
IngestionJobRunner, Seri
   }
 }
 
+LOGGER.info("Found {} files to create Pinot segments!", 
filteredFiles.size());
 try {
   JavaSparkContext sparkContext = 
JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
 
diff --git 
a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
 
b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
index d70eadc..08e74c9 100644
--- 
a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
+++ 
b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
@@ -29,6 +29,7 @@ import java.nio.charset.StandardCharsets;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.HashMap;
+import java.util.List;
 import java.util.Map;
 
 import org.apache.commons.io.FileUtils;
@@ -374,34 +375,42 @@ public class S3PinotFS extends PinotFS {
   throws IOException {
 try {
   ImmutableList.Builder builder = ImmutableList.builder();
+  String continuationToken = null;
+  boolean isDone = false;
   String prefix = normalizeToDirectoryPrefix(fileUri);
-
-  ListObjectsV2Response listObjectsV2Response;
-  ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
-  ListObjectsV2Request.builder().bucket(fileUri.getHost());
-
-  if (!prefix.equals(DELIMITER)) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
-  }
-
-  if (!recursive) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
-  }
-
-  ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
-  listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request);
-
-  listObjectsV2Response.contents().stream().forEach(object -> {
-//Only add files and not directories
-if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
-  String fileKey = object.key();
-  if (fileKey.startsWith(DELIMITER)) {
-fileKey = fileKey.substring(1);
-  }
-  builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  while(!isDone) {
+ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
+ListObjectsV2Request.builder().bucket(fileUri.getHost());
+if (!prefix.equals(DELIMITER)) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
+}
+if (!recursive) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
 }
-  });
-  return builder.build().toArray(new String[0]);
+if (continuationToken != null) {
+  listObjectsV2RequestBuilder.continuationToken(continuationToken);
+}
+ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
+LOGGER.debug("Trying to send ListObjectsV2Request {}", 
listObjectsV2Request);
+ListObjectsV2Response listObjectsV2Response = 
_s3Client.listObjectsV2(listObjectsV2Request);
+LOGGER.debug("Getting ListObjectsV2Response: {}", 
listObjectsV2Response);
+List filesReturned = listObjectsV2Response.contents();
+filesReturned.stream().forEach(object -> {
+  //Only add files and not directories
+  if 

[GitHub] [incubator-pinot] fx19880617 merged pull request #6002: Fixing S3PinotFS List API returned partial results

2020-09-11 Thread GitBox


fx19880617 merged pull request #6002:
URL: https://github.com/apache/incubator-pinot/pull/6002


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 commented on issue #6003:
URL: 
https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690927594


   @elonazoulay could you help check if GcsPinotFs is good when the bucket has 
many objects?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 removed a comment on issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 removed a comment on issue #6003:
URL: 
https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690920467


   @snleee 
   @elonazoulay 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 commented on issue #6003:
URL: 
https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690926043


   @snleee  seems that ADLSGen2 also has the problem of truncation:
   
https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/filesystem/list



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch fixing_s3_list_api
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard 8665918  Fix S3PinotFS List API may not return full results
 add 0226925  Fix S3PinotFS List API may not return full results

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8665918)
\
 N -- N -- N   refs/heads/fixing_s3_list_api (0226925)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java   | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch fixing_s3_list_api
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard 8665918  Fix S3PinotFS List API may not return full results
 add 0226925  Fix S3PinotFS List API may not return full results

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8665918)
\
 N -- N -- N   refs/heads/fixing_s3_list_api (0226925)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java   | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 commented on issue #6003:
URL: 
https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690920467


   @snleee 
   @elonazoulay 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch fixing_s3_list_api
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard 8665918  Fix S3PinotFS List API may not return full results
 add 0226925  Fix S3PinotFS List API may not return full results

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (8665918)
\
 N -- N -- N   refs/heads/fixing_s3_list_api (0226925)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java   | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #6002: Fixing S3PinotFS List API returned partial results

2020-09-11 Thread GitBox


fx19880617 commented on a change in pull request #6002:
URL: https://github.com/apache/incubator-pinot/pull/6002#discussion_r486817834



##
File path: 
pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
##
@@ -374,33 +375,39 @@ public long length(URI fileUri)
   throws IOException {
 try {
   ImmutableList.Builder builder = ImmutableList.builder();
+  String continuationToken = null;
+  boolean isDone = false;
   String prefix = normalizeToDirectoryPrefix(fileUri);
-
-  ListObjectsV2Response listObjectsV2Response;
-  ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
-  ListObjectsV2Request.builder().bucket(fileUri.getHost());
-
-  if (!prefix.equals(DELIMITER)) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
-  }
-
-  if (!recursive) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
-  }
-
-  ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
-  listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request);
-
-  listObjectsV2Response.contents().stream().forEach(object -> {
-//Only add files and not directories
-if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
-  String fileKey = object.key();
-  if (fileKey.startsWith(DELIMITER)) {
-fileKey = fileKey.substring(1);
-  }
-  builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  while(!isDone) {
+ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
+ListObjectsV2Request.builder().bucket(fileUri.getHost());
+if (!prefix.equals(DELIMITER)) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
+}
+if (!recursive) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
 }
-  });
+if (continuationToken != null) {
+  listObjectsV2RequestBuilder.continuationToken(continuationToken);
+}
+ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
+LOGGER.debug("Trying to send ListObjectsV2Request {}", 
listObjectsV2Request);
+ListObjectsV2Response listObjectsV2Response = 
_s3Client.listObjectsV2(listObjectsV2Request);
+LOGGER.debug("Getting ListObjectsV2Response: {}", 
listObjectsV2Response);
+List filesReturned = listObjectsV2Response.contents();
+filesReturned.stream().forEach(object -> {
+  //Only add files and not directories
+  if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
+String fileKey = object.key();
+if (fileKey.startsWith(DELIMITER)) {
+  fileKey = fileKey.substring(1);
+}
+builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  }
+});
+isDone = !listObjectsV2Response.isTruncated();
+continuationToken = listObjectsV2Response.nextContinuationToken();
+  }
   return builder.build().toArray(new String[0]);

Review comment:
   added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6004: Add ThetaSketchAggregationFunction

2020-09-11 Thread GitBox


Jackie-Jiang opened a new pull request #6004:
URL: https://github.com/apache/incubator-pinot/pull/6004


   ## Description
   Introduce `ThetaSketchAggregationFunction` as an enhanced version of 
`DistinctCountThetaSketchAggregationFunction`, and add the following supports:
   - Support aggregating on raw values of all data types (both SV and MV)
   - Support nested filter (E.g. `A = 1 AND (B = 2 OR C = 3)`)
   - Support filter on all data types (both SV and MV)
   - Support simple union without filters (E.g. `thetaSketch(col)`)
   - Support `$0` as the default sketch (sketch without filter)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 commented on issue #6003:
URL: 
https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690914135


   S3PinotFs is fixed in #6002 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 opened a new issue #6003: Review PinotFS ListFile Implementations

2020-09-11 Thread GitBox


fx19880617 opened a new issue #6003:
URL: https://github.com/apache/incubator-pinot/issues/6003


   Per https://github.com/apache/incubator-pinot/pull/6002, S3 has a limit on 
each ListObject call response. I think this policy may also be true for other 
blob store.
   
   This issue is created to review other PinotFs implementations to ensure the 
similar issues are not presenting there.
   
   * [ ] AzurePinotFs
   * [ ] ADLSGen2PinotFs
   * [ ] GcsPinotFs
   * [ ] HadoopPinotFs
   * [ ] S3PinotFs
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #6002: Fixing S3PinotFS List API returned partial results

2020-09-11 Thread GitBox


kishoreg commented on a change in pull request #6002:
URL: https://github.com/apache/incubator-pinot/pull/6002#discussion_r486810483



##
File path: 
pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
##
@@ -374,33 +375,39 @@ public long length(URI fileUri)
   throws IOException {
 try {
   ImmutableList.Builder builder = ImmutableList.builder();
+  String continuationToken = null;
+  boolean isDone = false;
   String prefix = normalizeToDirectoryPrefix(fileUri);
-
-  ListObjectsV2Response listObjectsV2Response;
-  ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
-  ListObjectsV2Request.builder().bucket(fileUri.getHost());
-
-  if (!prefix.equals(DELIMITER)) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
-  }
-
-  if (!recursive) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
-  }
-
-  ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
-  listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request);
-
-  listObjectsV2Response.contents().stream().forEach(object -> {
-//Only add files and not directories
-if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
-  String fileKey = object.key();
-  if (fileKey.startsWith(DELIMITER)) {
-fileKey = fileKey.substring(1);
-  }
-  builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  while(!isDone) {
+ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
+ListObjectsV2Request.builder().bucket(fileUri.getHost());
+if (!prefix.equals(DELIMITER)) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
+}
+if (!recursive) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
 }
-  });
+if (continuationToken != null) {
+  listObjectsV2RequestBuilder.continuationToken(continuationToken);
+}
+ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
+LOGGER.debug("Trying to send ListObjectsV2Request {}", 
listObjectsV2Request);
+ListObjectsV2Response listObjectsV2Response = 
_s3Client.listObjectsV2(listObjectsV2Request);
+LOGGER.debug("Getting ListObjectsV2Response: {}", 
listObjectsV2Response);
+List filesReturned = listObjectsV2Response.contents();
+filesReturned.stream().forEach(object -> {
+  //Only add files and not directories
+  if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
+String fileKey = object.key();
+if (fileKey.startsWith(DELIMITER)) {
+  fileKey = fileKey.substring(1);
+}
+builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  }
+});
+isDone = !listObjectsV2Response.isTruncated();
+continuationToken = listObjectsV2Response.nextContinuationToken();
+  }
   return builder.build().toArray(new String[0]);

Review comment:
   will be good to log the total number of files listed as info





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 commented on pull request #6002: Fixing S3PinotFS List API returned partial results

2020-09-11 Thread GitBox


fx19880617 commented on pull request #6002:
URL: https://github.com/apache/incubator-pinot/pull/6002#issuecomment-690912063


   We should also review other PinotFs implementations to ensure the similar 
issues are not presenting there.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [incubator-pinot] fx19880617 opened a new pull request #6002: Fixing S3PinotFS List API returned partial results

2020-09-11 Thread GitBox


fx19880617 opened a new pull request #6002:
URL: https://github.com/apache/incubator-pinot/pull/6002


   ## Description
   S3 API has a bounded limit(1000) for the objects returned in ListObject API, 
which means each call may at most returned 1000 S3 objects.
   
   This PR will check `ListObjectsV2Response` and fetch next batch when 
necessary.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[incubator-pinot] 01/01: Fix S3PinotFS List API may not return full results

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a commit to branch fixing_s3_list_api
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 86659186e8e2196b199a5c2bb011dd560a5a524f
Author: Xiang Fu 
AuthorDate: Thu Sep 10 23:38:37 2020 -0700

Fix S3PinotFS List API may not return full results
---
 .../spark/SparkSegmentGenerationJobRunner.java |  1 +
 .../apache/pinot/plugin/filesystem/S3PinotFS.java  | 57 --
 2 files changed, 33 insertions(+), 25 deletions(-)

diff --git 
a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
 
b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
index ad96e5d..c1b3f25 100644
--- 
a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
+++ 
b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
@@ -188,6 +188,7 @@ public class SparkSegmentGenerationJobRunner implements 
IngestionJobRunner, Seri
   }
 }
 
+LOGGER.info("Found {} files to create Pinot segments!", 
filteredFiles.size());
 try {
   JavaSparkContext sparkContext = 
JavaSparkContext.fromSparkContext(SparkContext.getOrCreate());
 
diff --git 
a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
 
b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
index d70eadc..9d03c47 100644
--- 
a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
+++ 
b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
@@ -29,6 +29,7 @@ import java.nio.charset.StandardCharsets;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 import java.util.HashMap;
+import java.util.List;
 import java.util.Map;
 
 import org.apache.commons.io.FileUtils;
@@ -374,33 +375,39 @@ public class S3PinotFS extends PinotFS {
   throws IOException {
 try {
   ImmutableList.Builder builder = ImmutableList.builder();
+  String continuationToken = null;
+  boolean isDone = false;
   String prefix = normalizeToDirectoryPrefix(fileUri);
-
-  ListObjectsV2Response listObjectsV2Response;
-  ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
-  ListObjectsV2Request.builder().bucket(fileUri.getHost());
-
-  if (!prefix.equals(DELIMITER)) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
-  }
-
-  if (!recursive) {
-listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
-  }
-
-  ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
-  listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request);
-
-  listObjectsV2Response.contents().stream().forEach(object -> {
-//Only add files and not directories
-if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
-  String fileKey = object.key();
-  if (fileKey.startsWith(DELIMITER)) {
-fileKey = fileKey.substring(1);
-  }
-  builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey);
+  while(!isDone) {
+ListObjectsV2Request.Builder listObjectsV2RequestBuilder =
+ListObjectsV2Request.builder().bucket(fileUri.getHost());
+if (!prefix.equals(DELIMITER)) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.prefix(prefix);
+}
+if (!recursive) {
+  listObjectsV2RequestBuilder = 
listObjectsV2RequestBuilder.delimiter(DELIMITER);
 }
-  });
+if (continuationToken != null) {
+  listObjectsV2RequestBuilder.continuationToken(continuationToken);
+}
+ListObjectsV2Request listObjectsV2Request = 
listObjectsV2RequestBuilder.build();
+LOGGER.debug("Trying to send ListObjectsV2Request {}", 
listObjectsV2Request);
+ListObjectsV2Response listObjectsV2Response = 
_s3Client.listObjectsV2(listObjectsV2Request);
+LOGGER.debug("Getting ListObjectsV2Response: {}", 
listObjectsV2Response);
+List filesReturned = listObjectsV2Response.contents();
+filesReturned.stream().forEach(object -> {
+  //Only add files and not directories
+  if (!object.key().equals(fileUri.getPath()) && 
!object.key().endsWith(DELIMITER)) {
+String fileKey = object.key();
+if (fileKey.startsWith(DELIMITER)) {
+  fileKey = fileKey.substring(1);
+ 

[incubator-pinot] branch fixing_s3_list_api created (now 8665918)

2020-09-11 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch fixing_s3_list_api
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


  at 8665918  Fix S3PinotFS List API may not return full results

This branch includes the following new commits:

 new 8665918  Fix S3PinotFS List API may not return full results

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org