[GitHub] [incubator-pinot] fx19880617 opened a new pull request #6011: Update pinot helm to adding custom configs
fx19880617 opened a new pull request #6011: URL: https://github.com/apache/incubator-pinot/pull/6011 ## Description This change: 1. Allows users to add more configs into pinot config files just through `values.yaml` file. 2. Add GC related settings into default JVM configs for reference. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch update_pinot_helm_for_custom_config_file created (now c3d3d20)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch update_pinot_helm_for_custom_config_file in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. at c3d3d20 Update pinot helm to adding custom configs and update the jvm default configs This branch includes the following new commits: new c3d3d20 Update pinot helm to adding custom configs and update the jvm default configs The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Update pinot helm to adding custom configs and update the jvm default configs
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a commit to branch update_pinot_helm_for_custom_config_file in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit c3d3d20b59f5af7e190afdbaa32a8ff121bd4c26 Author: Xiang Fu AuthorDate: Fri Sep 11 22:50:53 2020 -0700 Update pinot helm to adding custom configs and update the jvm default configs --- .../helm/pinot/templates/broker/configmap.yaml | 2 +- .../helm/pinot/templates/controller/configmap.yaml | 2 +- .../helm/pinot/templates/server/configmap.yaml | 2 +- kubernetes/helm/pinot/values.yaml | 21 ++--- 4 files changed, 21 insertions(+), 6 deletions(-) diff --git a/kubernetes/helm/pinot/templates/broker/configmap.yaml b/kubernetes/helm/pinot/templates/broker/configmap.yaml index 9ed5dd5..525ae80 100644 --- a/kubernetes/helm/pinot/templates/broker/configmap.yaml +++ b/kubernetes/helm/pinot/templates/broker/configmap.yaml @@ -25,4 +25,4 @@ data: pinot-broker.conf: |- pinot.broker.client.queryPort={{ .Values.broker.port }} pinot.broker.routing.table.builder.class={{ .Values.broker.routingTable.builderClass }} -pinot.set.instance.id.to.hostname=true +{{ .Values.broker.extra.configs | indent 4 }} \ No newline at end of file diff --git a/kubernetes/helm/pinot/templates/controller/configmap.yaml b/kubernetes/helm/pinot/templates/controller/configmap.yaml index 69d24a8..3d40a27 100644 --- a/kubernetes/helm/pinot/templates/controller/configmap.yaml +++ b/kubernetes/helm/pinot/templates/controller/configmap.yaml @@ -29,4 +29,4 @@ data: controller.vip.port={{ .Values.controller.service.port }} controller.data.dir={{ .Values.controller.data.dir }} controller.zk.str={{ include "zookeeper.url" . }} -pinot.set.instance.id.to.hostname=true +{{ .Values.controller.extra.configs | indent 4 }} \ No newline at end of file diff --git a/kubernetes/helm/pinot/templates/server/configmap.yaml b/kubernetes/helm/pinot/templates/server/configmap.yaml index c99e173..f4320cf 100644 --- a/kubernetes/helm/pinot/templates/server/configmap.yaml +++ b/kubernetes/helm/pinot/templates/server/configmap.yaml @@ -27,4 +27,4 @@ data: pinot.server.adminapi.port={{ .Values.server.ports.admin }} pinot.server.instance.dataDir={{ .Values.server.dataDir }} pinot.server.instance.segmentTarDir={{ .Values.server.segmentTarDir }} -pinot.set.instance.id.to.hostname=true +{{ .Values.server.extra.configs | indent 4 }} \ No newline at end of file diff --git a/kubernetes/helm/pinot/values.yaml b/kubernetes/helm/pinot/values.yaml index 689a8f0..4e6c347 100644 --- a/kubernetes/helm/pinot/values.yaml +++ b/kubernetes/helm/pinot/values.yaml @@ -47,7 +47,7 @@ controller: host: pinot-controller port: 9000 - jvmOpts: "-Xms256M -Xmx1G" + jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log" log4j2ConfFile: /opt/pinot/conf/pinot-controller-log4j2.xml pluginsDir: /opt/pinot/plugins @@ -80,6 +80,11 @@ controller: updateStrategy: type: RollingUpdate + # Extra configs will be appended to pinot-controller.conf file + extra: +configs: |- + pinot.set.instance.id.to.hostname=true + broker: name: broker @@ -87,7 +92,7 @@ broker: replicaCount: 1 - jvmOpts: "-Xms256M -Xmx1G" + jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-broker.log" log4j2ConfFile: /opt/pinot/conf/pinot-broker-log4j2.xml pluginsDir: /opt/pinot/plugins @@ -123,6 +128,11 @@ broker: updateStrategy: type: RollingUpdate + # Extra configs will be appended to pinot-broker.conf file + extra: +configs: |- + pinot.set.instance.id.to.hostname=true + server: name: server @@ -143,7 +153,7 @@ server: storageClass: "" #storageClass: "ssd" - jvmOpts: "-Xms512M -Xmx1G" + jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-server.log" log4j2ConfFile: /opt/pinot/conf/pinot-server-log4j2.xml pluginsDir: /opt/pinot/plugins @@ -171,6 +181,11 @@ server: updateStrategy: type: RollingUpdate + # Extra configs will be appended to pinot-server.conf file + extra: +configs: |- + pinot.set.instance.id.to.hostname=true + pinot.server.instance.realtime.alloc.offheap=true # -- # Zookeeper: # --
[GitHub] [incubator-pinot] jackjlli commented on pull request #6009: Adjust schema validation logic in AvroIngestionSchemaValidator
jackjlli commented on pull request #6009: URL: https://github.com/apache/incubator-pinot/pull/6009#issuecomment-691397062 > Can you reduce the size of the testing files (both `test_sample_data_multi_value.avro` and `test_sample_data.avro` previously checked in)? All we need from the file is the avro schema, and we should not add a 11.7 MB file into the repo. Good point Shrink the files to 10 lines each. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fix-schema-validator updated (6d22faa -> f09ca9a)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-schema-validator in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from 6d22faa Adjust schema validation logic in AvroIngestionSchemaValidator add f09ca9a Reduce test file sizes No new revisions were added by this update. Summary of changes: .../src/test/resources/data/test_sample_data.avro | Bin 917973 -> 2315 bytes .../data/test_sample_data_multi_value.avro | Bin 1227 -> 5108 bytes 2 files changed, 0 insertions(+), 0 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata
yupeng9 commented on a change in pull request #6008: URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487354466 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java ## @@ -78,6 +78,9 @@ public class SegmentColumnarIndexCreator implements SegmentCreator { // TODO Refactor class name to match interface name private static final Logger LOGGER = LoggerFactory.getLogger(SegmentColumnarIndexCreator.class); + // Allow at most 512 characters for the metadata property + private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512; Review comment: I see. Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6010: [Clean up] Separate TextIndex from InvertedIndex
Jackie-Jiang opened a new pull request #6010: URL: https://github.com/apache/incubator-pinot/pull/6010 ## Description Introduce `TextIndexCreator` and `TextIndexReader` for text index This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata
Jackie-Jiang commented on a change in pull request #6008: URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487331483 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java ## @@ -78,6 +78,9 @@ public class SegmentColumnarIndexCreator implements SegmentCreator { // TODO Refactor class name to match interface name private static final Logger LOGGER = LoggerFactory.getLogger(SegmentColumnarIndexCreator.class); + // Allow at most 512 characters for the metadata property + private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512; Review comment: They are independent, where the `maxLength` is configurable in the `FieldSpec` to allow ingesting long strings, but the property length limit is fixed. Also, the property length limit also applies to the BYTES column. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] jackjlli opened a new pull request #6009: Adjust schema validation logic in AvroIngestionSchemaValidator
jackjlli opened a new pull request #6009: URL: https://github.com/apache/incubator-pinot/pull/6009 ## Description This PR adjusts schema validation logic in AvroIngestionSchemaValidator. The current logic doesn't check the actual data type for multi-value column, which could have returned incorrect validation results. E.g. if column3 is of array structure (like Object[]) and its base element is of String type in AVRO and if column3 is of String type as well in Pinot schema, the current code would mark it `data type mismatch`. E.g. if column3 is of array of map structure (like Map[]), the current logic would miss marking it `multi-value structure mismatch`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Adjust schema validation logic in AvroIngestionSchemaValidator
This is an automated email from the ASF dual-hosted git repository. jlli pushed a commit to branch fix-schema-validator in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit 6d22faa9fd916217b764ddebd8cacac54ea4a6db Author: Jack Li(Analytics Engineering) AuthorDate: Fri Sep 11 14:37:24 2020 -0700 Adjust schema validation logic in AvroIngestionSchemaValidator --- .../hadoop/data/IngestionSchemaValidatorTest.java | 51 ++--- .../data/test_sample_data_multi_value.avro | Bin 0 -> 1227 bytes .../avro/AvroIngestionSchemaValidator.java | 51 +++-- 3 files changed, 71 insertions(+), 31 deletions(-) diff --git a/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java b/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java index fec3583..8cd0912 100644 --- a/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java +++ b/pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/test/java/org/apache/pinot/hadoop/data/IngestionSchemaValidatorTest.java @@ -29,16 +29,16 @@ import org.testng.annotations.Test; public class IngestionSchemaValidatorTest { + @Test - public void testAvroIngestionSchemaValidator() + public void testAvroIngestionSchemaValidatorForSingleValueColumns() throws Exception { -String inputFilePath = new File( - Preconditions.checkNotNull(IngestionSchemaValidatorTest.class.getClassLoader().getResource("data/test_sample_data.avro")) -.getFile()).toString(); +String inputFilePath = new File(Preconditions + .checkNotNull(IngestionSchemaValidatorTest.class.getClassLoader().getResource("data/test_sample_data.avro")) +.getFile()).toString(); String recordReaderClassName = "org.apache.pinot.plugin.inputformat.avro.AvroRecordReader"; -Schema pinotSchema = new Schema.SchemaBuilder() -.addSingleValueDimension("column1", FieldSpec.DataType.LONG) +Schema pinotSchema = new Schema.SchemaBuilder().addSingleValueDimension("column1", FieldSpec.DataType.LONG) .addSingleValueDimension("column2", FieldSpec.DataType.INT) .addSingleValueDimension("column3", FieldSpec.DataType.STRING) .addSingleValueDimension("column7", FieldSpec.DataType.STRING) @@ -53,8 +53,7 @@ public class IngestionSchemaValidatorTest { Assert.assertFalse(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected()); // Adding one extra column -pinotSchema = new Schema.SchemaBuilder() -.addSingleValueDimension("column1", FieldSpec.DataType.LONG) +pinotSchema = new Schema.SchemaBuilder().addSingleValueDimension("column1", FieldSpec.DataType.LONG) .addSingleValueDimension("column2", FieldSpec.DataType.INT) .addSingleValueDimension("column3", FieldSpec.DataType.STRING) .addSingleValueDimension("extra_column", FieldSpec.DataType.STRING) @@ -69,11 +68,9 @@ public class IngestionSchemaValidatorTest { Assert.assertFalse(ingestionSchemaValidator.getMultiValueStructureMismatchResult().isMismatchDetected()); Assert.assertTrue(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected()); Assert.assertNotNull(ingestionSchemaValidator.getMissingPinotColumnResult().getMismatchReason()); - System.out.println(ingestionSchemaValidator.getMissingPinotColumnResult().getMismatchReason()); // Change the data type of column1 from LONG to STRING -pinotSchema = new Schema.SchemaBuilder() -.addSingleValueDimension("column1", FieldSpec.DataType.STRING) +pinotSchema = new Schema.SchemaBuilder().addSingleValueDimension("column1", FieldSpec.DataType.STRING) .addSingleValueDimension("column2", FieldSpec.DataType.INT) .addSingleValueDimension("column3", FieldSpec.DataType.STRING) .addSingleValueDimension("column7", FieldSpec.DataType.STRING) @@ -83,14 +80,12 @@ public class IngestionSchemaValidatorTest { Assert.assertNotNull(ingestionSchemaValidator); Assert.assertTrue(ingestionSchemaValidator.getDataTypeMismatchResult().isMismatchDetected()); Assert.assertNotNull(ingestionSchemaValidator.getDataTypeMismatchResult().getMismatchReason()); - System.out.println(ingestionSchemaValidator.getDataTypeMismatchResult().getMismatchReason()); Assert.assertFalse(ingestionSchemaValidator.getSingleValueMultiValueFieldMismatchResult().isMismatchDetected()); Assert.assertFalse(ingestionSchemaValidator.getMultiValueStructureMismatchResult().isMismatchDetected()); Assert.assertFalse(ingestionSchemaValidator.getMissingPinotColumnResult().isMismatchDetected()); // Change column2 from single-value column to multi-value column -
[incubator-pinot] branch fix-schema-validator created (now 6d22faa)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-schema-validator in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. at 6d22faa Adjust schema validation logic in AvroIngestionSchemaValidator This branch includes the following new commits: new 6d22faa Adjust schema validation logic in AvroIngestionSchemaValidator The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] jackjlli commented on pull request #6005: Fix extract method in AvroRecordExtractor class
jackjlli commented on pull request #6005: URL: https://github.com/apache/incubator-pinot/pull/6005#issuecomment-691313748 > Could you add a test such that it fails with the original code, and passes with the old code? Test added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] yupeng9 commented on a change in pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata
yupeng9 commented on a change in pull request #6008: URL: https://github.com/apache/incubator-pinot/pull/6008#discussion_r487290286 ## File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java ## @@ -78,6 +78,9 @@ public class SegmentColumnarIndexCreator implements SegmentCreator { // TODO Refactor class name to match interface name private static final Logger LOGGER = LoggerFactory.getLogger(SegmentColumnarIndexCreator.class); + // Allow at most 512 characters for the metadata property + private static final int METADATA_PROPERTY_LENGTH_LIMIT = 512; Review comment: shall this align with `FieldSpec.DEFAULT_MAX_LENGTH`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] codecov-commenter commented on pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata
codecov-commenter commented on pull request #6008: URL: https://github.com/apache/incubator-pinot/pull/6008#issuecomment-691309424 # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=h1) Report > Merging [#6008](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) will **decrease** coverage by `20.10%`. > The diff coverage is `49.63%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6008/graphs/tree.svg?width=650=150=pr=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=tree) ```diff @@ Coverage Diff @@ ## master#6008 +/- ## === - Coverage 66.44% 46.34% -20.11% === Files1075 1180 +105 Lines 5477355889 +1116 Branches 8168 8134 -34 === - Hits3639625899-10497 - Misses 1570027869+12169 + Partials 2677 2121 -556 ``` | Flag | Coverage Δ | | |---|---|---| | #integration | `46.34% <49.63%> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=tree) | Coverage Δ | | |---|---|---| | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | | | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: | | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: | | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | | | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: | | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `26.66% <0.00%> (-30.48%)` | :arrow_down: | | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: | | [.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=) | `24.00% <0.00%> (-10.29%)` | :arrow_down: | | [.../apache/pinot/common/function/FunctionInvoker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vRnVuY3Rpb25JbnZva2VyLmphdmE=) | `61.36% <ø> (ø)` | | | [...apache/pinot/common/function/FunctionRegistry.java](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vZnVuY3Rpb24vRnVuY3Rpb25SZWdpc3RyeS5qYXZh) | `78.78% <ø> (ø)` | | | ... and [1130 more](https://codecov.io/gh/apache/incubator-pinot/pull/6008/diff?src=pr=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6008?src=pr=continue). > **Legend** - [Click here to learn
[incubator-pinot] branch master updated (11fd62b -> 13a281c)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from 11fd62b Fix S3PinotFS List API may not return full results (#6002) add 13a281c Fix/data view dev serve (#6006) No new revisions were added by this update. Summary of changes: website/README.md | 13 + website/docs/components.md |2 +- website/docs/components/cluster.md |8 +- website/docs/misc/build-docker.md |5 +- website/docs/user-guide/pql.md |4 +- website/docusaurus.config.js |1 + website/package.json | 28 +- website/src/components/Alert/index.js | 61 +- website/src/components/Alert/styles.css| 57 +- website/src/components/BlogPostTags/index.js | 38 +- .../src/components/BlogPostTags/styles.module.css |2 +- website/src/components/Changelog/index.js | 299 ++-- website/src/components/CheckboxList/index.js | 87 +- website/src/components/CodeHeader/index.js | 31 +- website/src/components/CodeHeader/styles.css | 20 +- website/src/components/Field/index.js | 294 +-- website/src/components/Fields/index.js | 184 +- website/src/components/Fields/styles.css | 28 +- website/src/components/Jump/index.js | 60 +- website/src/components/Jump/styles.css | 108 +- website/src/components/Step/index.js | 12 +- website/src/components/Steps/index.js | 12 +- website/src/components/Steps/styles.css| 14 +- website/src/css/custom.css | 1885 ++-- website/src/exports/animatedGraph.js | 113 +- website/src/exports/cloudify.js| 587 +++--- website/src/exports/newPost.js | 44 +- website/src/exports/newRelease.js | 47 +- website/src/exports/repoUrl.js | 10 +- website/src/pages/download.css | 10 +- website/src/pages/download.js | 441 +++-- website/src/pages/index.css| 173 +- website/src/pages/index.js | 627 --- website/src/pages/index.module.css | 380 ++-- 34 files changed, 3088 insertions(+), 2597 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 merged pull request #6006: Fix/data view dev serve
fx19880617 merged pull request #6006: URL: https://github.com/apache/incubator-pinot/pull/6006 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6008: Add a length limit of 512 to the properties stored in the segment metadata
Jackie-Jiang opened a new pull request #6008: URL: https://github.com/apache/incubator-pinot/pull/6008 ## Description Prevent storing very long values into the segment metadata. This could happen when Pinot is used as a blob store (not recommended but supported). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] jihaozh commented on a change in pull request #6001: [TE] entity anomaly logging for ad-hoc debugging
jihaozh commented on a change in pull request #6001: URL: https://github.com/apache/incubator-pinot/pull/6001#discussion_r487263049 ## File path: thirdeye/thirdeye-pinot/src/test/java/org/apache/pinot/thirdeye/tools/RunAdhocDatabaseQueriesTool.java ## @@ -723,9 +741,65 @@ private void rollbackMigrateSubscriptionWatermarks() { } } - public static void main(String[] args) throws Exception { + private void printEntityAnomalyDetails(MergedAnomalyResultDTO anomaly, String indent, int index) { +LOG.info(""); +LOG.info("Exploring Entity Anomaly {} with id {}", index, anomaly.getId()); +LOG.info(ENTITY_STATS_TEMPLATE, anomaly.getChildren().size(), anomaly.getProperties()); +LOG.info(ENTITY_TIME_TEMPLATE, +new DateTime(anomaly.getCreatedTime(), TIMEZONE), +DATE_FORMAT.print(new DateTime(anomaly.getStartTime(), TIMEZONE)), +DATE_FORMAT.print(new DateTime(anomaly.getEndTime(), TIMEZONE))); + } -File persistenceFile = new File("/Users/akrai/persistence-linux.yml"); + /** + * Visualizes the entity anomalies by printing them + * + * Eg: dq.printEntityAnomalyTrees(158750221, 0, System.currentTimeMillis()) + * + * @param detectionId The detection id whose anomalies need to be printed + * @param start The start time of the anomaly slice + * @param end The end time of the anomaly slice + */ + private void printEntityAnomalyTrees(long detectionId, long start, long end) { +TimeSeriesLoader timeseriesLoader = +new DefaultTimeSeriesLoader(metricConfigDAO, datasetConfigDAO, +ThirdEyeCacheRegistry.getInstance().getQueryCache(), +ThirdEyeCacheRegistry.getInstance().getTimeSeriesCache()); +AggregationLoader aggregationLoader = +new DefaultAggregationLoader(metricConfigDAO, datasetConfigDAO, +ThirdEyeCacheRegistry.getInstance().getQueryCache(), +ThirdEyeCacheRegistry.getInstance().getDatasetMaxDataTimeCache()); +DefaultDataProvider provider = +new DefaultDataProvider(metricConfigDAO, datasetConfigDAO, eventDAO, mergedResultDAO, +DAORegistry.getInstance().getEvaluationManager(), timeseriesLoader, aggregationLoader, +new DetectionPipelineLoader(), TimeSeriesCacheBuilder.getInstance(), AnomaliesCacheBuilder.getInstance()); + +AnomalySlice anomalySlice = new AnomalySlice(); +anomalySlice = anomalySlice.withDetectionId(detectionId).withStart(start).withEnd(end); +Multimap +sliceToAnomaliesMap = provider.fetchAnomalies(Collections.singletonList(anomalySlice)); + +LOG.info("Total number of entity anomalies = " + sliceToAnomaliesMap.values().size()); + +int i = 1; +for (MergedAnomalyResultDTO parentAnomaly : sliceToAnomaliesMap.values()) { + printEntityAnomalyDetails(parentAnomaly, "", i); + int j = 1; + for (MergedAnomalyResultDTO child : parentAnomaly.getChildren()) { +printEntityAnomalyDetails(parentAnomaly, "\t", j); +int k = 1; +for (MergedAnomalyResultDTO grandchild : child.getChildren()) { + printEntityAnomalyDetails(grandchild, "\t\t", k); + k++; +} +j++; + } + i++; +} + } + + public static void main(String[] args) throws Exception { +File persistenceFile = new File("/Users/akrai/persistence-local.yml"); Review comment: hide the user name here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] ChethanUK commented on pull request #6006: Fix/data view dev serve
ChethanUK commented on pull request #6006: URL: https://github.com/apache/incubator-pinot/pull/6006#issuecomment-691278842 ✅ Preview available at https://incubator-pinot-git-fix-web-data-error.chethanuk.vercel.app This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] vincentchenjl opened a new pull request #6007: [TE] add labeler into yaml
vincentchenjl opened a new pull request #6007: URL: https://github.com/apache/incubator-pinot/pull/6007 This PR is second PR for severity-based alert feature, including the logic of parsing labeler configuration and constructing the detection pipelines based on the YAML. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] ChethanUK opened a new pull request #6006: Fix/data view dev serve
ChethanUK opened a new pull request #6006: URL: https://github.com/apache/incubator-pinot/pull/6006 Fixing a few data view and other errors which were failing the build Updating the packages and adding dev serve test mode ```bash yarn run serve --build --port 3001 --host 0.0.0.0 ``` View after update: Light Mode - ![Light Mode](https://user-images.githubusercontent.com/16241795/92965649-c888cd80-f493-11ea-95e4-d28b2b8bacec.png) Dark Mode - ![Dark Mode](https://user-images.githubusercontent.com/16241795/92965653-ca529100-f493-11ea-8b67-45c307dc10b2.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6005: Fix extract method in AvroRecordExtractor class
jackjlli commented on a change in pull request #6005: URL: https://github.com/apache/incubator-pinot/pull/6005#discussion_r487224829 ## File path: pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java ## @@ -46,14 +46,13 @@ public void init(Set fields, @Nullable RecordExtractorConfig recordExtra @Override public GenericRow extract(GenericRecord from, GenericRow to) { if (_extractAll) { - Map jsonMap = JsonUtils.genericRecordToJson(from); - jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(value))); + List fields = from.getSchema().getFields(); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fix-AvroRecordExtractor updated (bfb5954 -> c19fb22)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from bfb5954 Fix extract method in AvroRecordExtractor add c19fb22 Address PR comments No new revisions were added by this update. Summary of changes: .../inputformat/avro/AvroRecordExtractor.java | 8 +++-- .../inputformat/avro/AvroRecordExtractorTest.java | 37 +- .../java/org/apache/pinot/spi/utils/JsonUtils.java | 2 -- 3 files changed, 41 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6005: Fix extract method in AvroRecordExtractor class
Jackie-Jiang commented on a change in pull request #6005: URL: https://github.com/apache/incubator-pinot/pull/6005#discussion_r487213705 ## File path: pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java ## @@ -46,14 +46,13 @@ public void init(Set fields, @Nullable RecordExtractorConfig recordExtra @Override public GenericRow extract(GenericRecord from, GenericRow to) { if (_extractAll) { - Map jsonMap = JsonUtils.genericRecordToJson(from); - jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(value))); + List fields = from.getSchema().getFields(); Review comment: Suggest using the non-functional way for performance concern This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] jackjlli opened a new pull request #6005: Fix extract method in AvroRecordExtractor class
jackjlli opened a new pull request #6005: URL: https://github.com/apache/incubator-pinot/pull/6005 ## Description This PR fixes the extract method in AvroRecordExtractor class. When `_extractAll` is true, the generic record will be first converted to a json String and then parse to a json map, whereas json object has its precision limitation: https://developers.google.com/discovery/v1/type-format E.g. the column1 was originally of long type. But when it was converted to json string and then parse to a json map, the type would be changed to int. Thus, the actual value will be incorrect from the json map comparing to the actual value from generic record. This PR fixes it by fetching the actual value directly from the original generic record. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor
This is an automated email from the ASF dual-hosted git repository. jlli pushed a commit to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit bfb59545cdd13e94f34c882be579afd917fbdd80 Author: Jack Li(Analytics Engineering) AuthorDate: Fri Sep 11 10:33:09 2020 -0700 Fix extract method in AvroRecordExtractor --- .../plugin/inputformat/avro/AvroRecordExtractor.java| 17 - .../main/java/org/apache/pinot/spi/utils/JsonUtils.java | 13 - 2 files changed, 8 insertions(+), 22 deletions(-) diff --git a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java index 339ab67..ede8586 100644 --- a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java +++ b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java @@ -18,14 +18,14 @@ */ package org.apache.pinot.plugin.inputformat.avro; -import java.util.Map; +import java.util.List; import java.util.Set; import javax.annotation.Nullable; +import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; import org.apache.pinot.spi.data.readers.GenericRow; import org.apache.pinot.spi.data.readers.RecordExtractor; import org.apache.pinot.spi.data.readers.RecordExtractorConfig; -import org.apache.pinot.spi.utils.JsonUtils; /** @@ -46,14 +46,13 @@ public class AvroRecordExtractor implements RecordExtractor { @Override public GenericRow extract(GenericRecord from, GenericRow to) { if (_extractAll) { - Map jsonMap = JsonUtils.genericRecordToJson(from); - jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(value))); + List fields = from.getSchema().getFields(); + fields.forEach(field -> { +String fieldName = field.name(); +to.putValue(fieldName, AvroUtils.convert(from.get(fieldName))); + }); } else { - for (String fieldName : _fields) { -Object value = from.get(fieldName); -Object convertedValue = AvroUtils.convert(value); -to.putValue(fieldName, convertedValue); - } + _fields.forEach(fieldName -> to.putValue(fieldName, AvroUtils.convert(from.get(fieldName; } return to; } diff --git a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java index f5bf9d3..419b5ce 100644 --- a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java +++ b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java @@ -193,17 +193,4 @@ public class JsonUtils { throw new IllegalArgumentException(String.format("Unsupported data type %s", dataType)); } } - - /** - * Converts from a GenericRecord to a json map - */ - public static Map genericRecordToJson(GenericRecord genericRecord) { -try { - String jsonString = genericRecord.toString(); - return DEFAULT_MAPPER.readValue(jsonString, new TypeReference>() { - }); -} catch (IOException e) { - throw new IllegalStateException("Caught exception when converting generic record " + genericRecord + " to JSON"); -} - } } - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fix-AvroRecordExtractor updated (f4add59 -> bfb5954)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. discard f4add59 Fix extract method in AvroRecordExtractor new bfb5954 Fix extract method in AvroRecordExtractor This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (f4add59) \ N -- N -- N refs/heads/fix-AvroRecordExtractor (bfb5954) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java | 2 -- 1 file changed, 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fix-AvroRecordExtractor updated (fd1cb00 -> f4add59)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. discard fd1cb00 Fix extract method in AvroRecordExtractor new f4add59 Fix extract method in AvroRecordExtractor This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (fd1cb00) \ N -- N -- N refs/heads/fix-AvroRecordExtractor (f4add59) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../pinot/plugin/inputformat/avro/AvroRecordExtractor.java | 9 +++-- .../src/main/java/org/apache/pinot/spi/utils/JsonUtils.java | 13 - 2 files changed, 7 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor
This is an automated email from the ASF dual-hosted git repository. jlli pushed a commit to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit f4add59f9518b37c638d90457a92e18f81108545 Author: Jack Li(Analytics Engineering) AuthorDate: Fri Sep 11 10:33:09 2020 -0700 Fix extract method in AvroRecordExtractor --- .../plugin/inputformat/avro/AvroRecordExtractor.java | 15 --- .../main/java/org/apache/pinot/spi/utils/JsonUtils.java | 13 - 2 files changed, 8 insertions(+), 20 deletions(-) diff --git a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java index 339ab67..cec3e5f 100644 --- a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java +++ b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java @@ -18,9 +18,11 @@ */ package org.apache.pinot.plugin.inputformat.avro; +import java.util.List; import java.util.Map; import java.util.Set; import javax.annotation.Nullable; +import org.apache.avro.Schema; import org.apache.avro.generic.GenericRecord; import org.apache.pinot.spi.data.readers.GenericRow; import org.apache.pinot.spi.data.readers.RecordExtractor; @@ -46,14 +48,13 @@ public class AvroRecordExtractor implements RecordExtractor { @Override public GenericRow extract(GenericRecord from, GenericRow to) { if (_extractAll) { - Map jsonMap = JsonUtils.genericRecordToJson(from); - jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(value))); + List fields = from.getSchema().getFields(); + fields.forEach(field -> { +String fieldName = field.name(); +to.putValue(fieldName, AvroUtils.convert(from.get(fieldName))); + }); } else { - for (String fieldName : _fields) { -Object value = from.get(fieldName); -Object convertedValue = AvroUtils.convert(value); -to.putValue(fieldName, convertedValue); - } + _fields.forEach(fieldName -> to.putValue(fieldName, AvroUtils.convert(from.get(fieldName; } return to; } diff --git a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java index f5bf9d3..419b5ce 100644 --- a/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java +++ b/pinot-spi/src/main/java/org/apache/pinot/spi/utils/JsonUtils.java @@ -193,17 +193,4 @@ public class JsonUtils { throw new IllegalArgumentException(String.format("Unsupported data type %s", dataType)); } } - - /** - * Converts from a GenericRecord to a json map - */ - public static Map genericRecordToJson(GenericRecord genericRecord) { -try { - String jsonString = genericRecord.toString(); - return DEFAULT_MAPPER.readValue(jsonString, new TypeReference>() { - }); -} catch (IOException e) { - throw new IllegalStateException("Caught exception when converting generic record " + genericRecord + " to JSON"); -} - } } - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fix-AvroRecordExtractor created (now fd1cb00)
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. at fd1cb00 Fix extract method in AvroRecordExtractor This branch includes the following new commits: new fd1cb00 Fix extract method in AvroRecordExtractor The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Fix extract method in AvroRecordExtractor
This is an automated email from the ASF dual-hosted git repository. jlli pushed a commit to branch fix-AvroRecordExtractor in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit fd1cb00a4f8d0f475e114add27b81d11ef651372 Author: Jack Li(Analytics Engineering) AuthorDate: Fri Sep 11 10:33:09 2020 -0700 Fix extract method in AvroRecordExtractor --- .../apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java index 339ab67..05cab34 100644 --- a/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java +++ b/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java @@ -47,13 +47,9 @@ public class AvroRecordExtractor implements RecordExtractor { public GenericRow extract(GenericRecord from, GenericRow to) { if (_extractAll) { Map jsonMap = JsonUtils.genericRecordToJson(from); - jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(value))); + jsonMap.forEach((fieldName, value) -> to.putValue(fieldName, AvroUtils.convert(from.get(fieldName; } else { - for (String fieldName : _fields) { -Object value = from.get(fieldName); -Object convertedValue = AvroUtils.convert(value); -to.putValue(fieldName, convertedValue); - } + _fields.forEach(fieldName -> to.putValue(fieldName, AvroUtils.convert(from.get(fieldName; } return to; } - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch master updated: Fix S3PinotFS List API may not return full results (#6002)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git The following commit(s) were added to refs/heads/master by this push: new 11fd62b Fix S3PinotFS List API may not return full results (#6002) 11fd62b is described below commit 11fd62b77d84ba714828d0f85341e205f83c6c4e Author: Xiang Fu AuthorDate: Fri Sep 11 02:37:45 2020 -0700 Fix S3PinotFS List API may not return full results (#6002) --- .../spark/SparkSegmentGenerationJobRunner.java | 1 + .../apache/pinot/plugin/filesystem/S3PinotFS.java | 61 +- 2 files changed, 36 insertions(+), 26 deletions(-) diff --git a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java index ad96e5d..c1b3f25 100644 --- a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java +++ b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java @@ -188,6 +188,7 @@ public class SparkSegmentGenerationJobRunner implements IngestionJobRunner, Seri } } +LOGGER.info("Found {} files to create Pinot segments!", filteredFiles.size()); try { JavaSparkContext sparkContext = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate()); diff --git a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java index d70eadc..08e74c9 100644 --- a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java +++ b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java @@ -29,6 +29,7 @@ import java.nio.charset.StandardCharsets; import java.nio.file.Path; import java.nio.file.Paths; import java.util.HashMap; +import java.util.List; import java.util.Map; import org.apache.commons.io.FileUtils; @@ -374,34 +375,42 @@ public class S3PinotFS extends PinotFS { throws IOException { try { ImmutableList.Builder builder = ImmutableList.builder(); + String continuationToken = null; + boolean isDone = false; String prefix = normalizeToDirectoryPrefix(fileUri); - - ListObjectsV2Response listObjectsV2Response; - ListObjectsV2Request.Builder listObjectsV2RequestBuilder = - ListObjectsV2Request.builder().bucket(fileUri.getHost()); - - if (!prefix.equals(DELIMITER)) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); - } - - if (!recursive) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); - } - - ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); - listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); - - listObjectsV2Response.contents().stream().forEach(object -> { -//Only add files and not directories -if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { - String fileKey = object.key(); - if (fileKey.startsWith(DELIMITER)) { -fileKey = fileKey.substring(1); - } - builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + while(!isDone) { +ListObjectsV2Request.Builder listObjectsV2RequestBuilder = +ListObjectsV2Request.builder().bucket(fileUri.getHost()); +if (!prefix.equals(DELIMITER)) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); +} +if (!recursive) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); } - }); - return builder.build().toArray(new String[0]); +if (continuationToken != null) { + listObjectsV2RequestBuilder.continuationToken(continuationToken); +} +ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); +LOGGER.debug("Trying to send ListObjectsV2Request {}", listObjectsV2Request); +ListObjectsV2Response listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); +LOGGER.debug("Getting ListObjectsV2Response: {}", listObjectsV2Response); +List filesReturned = listObjectsV2Response.contents(); +filesReturned.stream().forEach(object -> { + //Only add files and not directories + if
[GitHub] [incubator-pinot] fx19880617 merged pull request #6002: Fixing S3PinotFS List API returned partial results
fx19880617 merged pull request #6002: URL: https://github.com/apache/incubator-pinot/pull/6002 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations
fx19880617 commented on issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690927594 @elonazoulay could you help check if GcsPinotFs is good when the bucket has many objects? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 removed a comment on issue #6003: Review PinotFS ListFile Implementations
fx19880617 removed a comment on issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690920467 @snleee @elonazoulay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations
fx19880617 commented on issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690926043 @snleee seems that ADLSGen2 also has the problem of truncation: https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/filesystem/list This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch fixing_s3_list_api in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. discard 8665918 Fix S3PinotFS List API may not return full results add 0226925 Fix S3PinotFS List API may not return full results This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (8665918) \ N -- N -- N refs/heads/fixing_s3_list_api (0226925) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch fixing_s3_list_api in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. discard 8665918 Fix S3PinotFS List API may not return full results add 0226925 Fix S3PinotFS List API may not return full results This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (8665918) \ N -- N -- N refs/heads/fixing_s3_list_api (0226925) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations
fx19880617 commented on issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690920467 @snleee @elonazoulay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] branch fixing_s3_list_api updated (8665918 -> 0226925)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch fixing_s3_list_api in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. discard 8665918 Fix S3PinotFS List API may not return full results add 0226925 Fix S3PinotFS List API may not return full results This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (8665918) \ N -- N -- N refs/heads/fixing_s3_list_api (0226925) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: .../src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #6002: Fixing S3PinotFS List API returned partial results
fx19880617 commented on a change in pull request #6002: URL: https://github.com/apache/incubator-pinot/pull/6002#discussion_r486817834 ## File path: pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java ## @@ -374,33 +375,39 @@ public long length(URI fileUri) throws IOException { try { ImmutableList.Builder builder = ImmutableList.builder(); + String continuationToken = null; + boolean isDone = false; String prefix = normalizeToDirectoryPrefix(fileUri); - - ListObjectsV2Response listObjectsV2Response; - ListObjectsV2Request.Builder listObjectsV2RequestBuilder = - ListObjectsV2Request.builder().bucket(fileUri.getHost()); - - if (!prefix.equals(DELIMITER)) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); - } - - if (!recursive) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); - } - - ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); - listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); - - listObjectsV2Response.contents().stream().forEach(object -> { -//Only add files and not directories -if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { - String fileKey = object.key(); - if (fileKey.startsWith(DELIMITER)) { -fileKey = fileKey.substring(1); - } - builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + while(!isDone) { +ListObjectsV2Request.Builder listObjectsV2RequestBuilder = +ListObjectsV2Request.builder().bucket(fileUri.getHost()); +if (!prefix.equals(DELIMITER)) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); +} +if (!recursive) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); } - }); +if (continuationToken != null) { + listObjectsV2RequestBuilder.continuationToken(continuationToken); +} +ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); +LOGGER.debug("Trying to send ListObjectsV2Request {}", listObjectsV2Request); +ListObjectsV2Response listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); +LOGGER.debug("Getting ListObjectsV2Response: {}", listObjectsV2Response); +List filesReturned = listObjectsV2Response.contents(); +filesReturned.stream().forEach(object -> { + //Only add files and not directories + if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { +String fileKey = object.key(); +if (fileKey.startsWith(DELIMITER)) { + fileKey = fileKey.substring(1); +} +builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + } +}); +isDone = !listObjectsV2Response.isTruncated(); +continuationToken = listObjectsV2Response.nextContinuationToken(); + } return builder.build().toArray(new String[0]); Review comment: added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6004: Add ThetaSketchAggregationFunction
Jackie-Jiang opened a new pull request #6004: URL: https://github.com/apache/incubator-pinot/pull/6004 ## Description Introduce `ThetaSketchAggregationFunction` as an enhanced version of `DistinctCountThetaSketchAggregationFunction`, and add the following supports: - Support aggregating on raw values of all data types (both SV and MV) - Support nested filter (E.g. `A = 1 AND (B = 2 OR C = 3)`) - Support filter on all data types (both SV and MV) - Support simple union without filters (E.g. `thetaSketch(col)`) - Support `$0` as the default sketch (sketch without filter) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #6003: Review PinotFS ListFile Implementations
fx19880617 commented on issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003#issuecomment-690914135 S3PinotFs is fixed in #6002 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 opened a new issue #6003: Review PinotFS ListFile Implementations
fx19880617 opened a new issue #6003: URL: https://github.com/apache/incubator-pinot/issues/6003 Per https://github.com/apache/incubator-pinot/pull/6002, S3 has a limit on each ListObject call response. I think this policy may also be true for other blob store. This issue is created to review other PinotFs implementations to ensure the similar issues are not presenting there. * [ ] AzurePinotFs * [ ] ADLSGen2PinotFs * [ ] GcsPinotFs * [ ] HadoopPinotFs * [ ] S3PinotFs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #6002: Fixing S3PinotFS List API returned partial results
kishoreg commented on a change in pull request #6002: URL: https://github.com/apache/incubator-pinot/pull/6002#discussion_r486810483 ## File path: pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java ## @@ -374,33 +375,39 @@ public long length(URI fileUri) throws IOException { try { ImmutableList.Builder builder = ImmutableList.builder(); + String continuationToken = null; + boolean isDone = false; String prefix = normalizeToDirectoryPrefix(fileUri); - - ListObjectsV2Response listObjectsV2Response; - ListObjectsV2Request.Builder listObjectsV2RequestBuilder = - ListObjectsV2Request.builder().bucket(fileUri.getHost()); - - if (!prefix.equals(DELIMITER)) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); - } - - if (!recursive) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); - } - - ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); - listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); - - listObjectsV2Response.contents().stream().forEach(object -> { -//Only add files and not directories -if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { - String fileKey = object.key(); - if (fileKey.startsWith(DELIMITER)) { -fileKey = fileKey.substring(1); - } - builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + while(!isDone) { +ListObjectsV2Request.Builder listObjectsV2RequestBuilder = +ListObjectsV2Request.builder().bucket(fileUri.getHost()); +if (!prefix.equals(DELIMITER)) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); +} +if (!recursive) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); } - }); +if (continuationToken != null) { + listObjectsV2RequestBuilder.continuationToken(continuationToken); +} +ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); +LOGGER.debug("Trying to send ListObjectsV2Request {}", listObjectsV2Request); +ListObjectsV2Response listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); +LOGGER.debug("Getting ListObjectsV2Response: {}", listObjectsV2Response); +List filesReturned = listObjectsV2Response.contents(); +filesReturned.stream().forEach(object -> { + //Only add files and not directories + if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { +String fileKey = object.key(); +if (fileKey.startsWith(DELIMITER)) { + fileKey = fileKey.substring(1); +} +builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + } +}); +isDone = !listObjectsV2Response.isTruncated(); +continuationToken = listObjectsV2Response.nextContinuationToken(); + } return builder.build().toArray(new String[0]); Review comment: will be good to log the total number of files listed as info This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on pull request #6002: Fixing S3PinotFS List API returned partial results
fx19880617 commented on pull request #6002: URL: https://github.com/apache/incubator-pinot/pull/6002#issuecomment-690912063 We should also review other PinotFs implementations to ensure the similar issues are not presenting there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 opened a new pull request #6002: Fixing S3PinotFS List API returned partial results
fx19880617 opened a new pull request #6002: URL: https://github.com/apache/incubator-pinot/pull/6002 ## Description S3 API has a bounded limit(1000) for the objects returned in ListObject API, which means each call may at most returned 1000 S3 objects. This PR will check `ListObjectsV2Response` and fetch next batch when necessary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org
[incubator-pinot] 01/01: Fix S3PinotFS List API may not return full results
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a commit to branch fixing_s3_list_api in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit 86659186e8e2196b199a5c2bb011dd560a5a524f Author: Xiang Fu AuthorDate: Thu Sep 10 23:38:37 2020 -0700 Fix S3PinotFS List API may not return full results --- .../spark/SparkSegmentGenerationJobRunner.java | 1 + .../apache/pinot/plugin/filesystem/S3PinotFS.java | 57 -- 2 files changed, 33 insertions(+), 25 deletions(-) diff --git a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java index ad96e5d..c1b3f25 100644 --- a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java +++ b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java @@ -188,6 +188,7 @@ public class SparkSegmentGenerationJobRunner implements IngestionJobRunner, Seri } } +LOGGER.info("Found {} files to create Pinot segments!", filteredFiles.size()); try { JavaSparkContext sparkContext = JavaSparkContext.fromSparkContext(SparkContext.getOrCreate()); diff --git a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java index d70eadc..9d03c47 100644 --- a/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java +++ b/pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java @@ -29,6 +29,7 @@ import java.nio.charset.StandardCharsets; import java.nio.file.Path; import java.nio.file.Paths; import java.util.HashMap; +import java.util.List; import java.util.Map; import org.apache.commons.io.FileUtils; @@ -374,33 +375,39 @@ public class S3PinotFS extends PinotFS { throws IOException { try { ImmutableList.Builder builder = ImmutableList.builder(); + String continuationToken = null; + boolean isDone = false; String prefix = normalizeToDirectoryPrefix(fileUri); - - ListObjectsV2Response listObjectsV2Response; - ListObjectsV2Request.Builder listObjectsV2RequestBuilder = - ListObjectsV2Request.builder().bucket(fileUri.getHost()); - - if (!prefix.equals(DELIMITER)) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); - } - - if (!recursive) { -listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); - } - - ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); - listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); - - listObjectsV2Response.contents().stream().forEach(object -> { -//Only add files and not directories -if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { - String fileKey = object.key(); - if (fileKey.startsWith(DELIMITER)) { -fileKey = fileKey.substring(1); - } - builder.add(S3_SCHEME + fileUri.getHost() + DELIMITER + fileKey); + while(!isDone) { +ListObjectsV2Request.Builder listObjectsV2RequestBuilder = +ListObjectsV2Request.builder().bucket(fileUri.getHost()); +if (!prefix.equals(DELIMITER)) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.prefix(prefix); +} +if (!recursive) { + listObjectsV2RequestBuilder = listObjectsV2RequestBuilder.delimiter(DELIMITER); } - }); +if (continuationToken != null) { + listObjectsV2RequestBuilder.continuationToken(continuationToken); +} +ListObjectsV2Request listObjectsV2Request = listObjectsV2RequestBuilder.build(); +LOGGER.debug("Trying to send ListObjectsV2Request {}", listObjectsV2Request); +ListObjectsV2Response listObjectsV2Response = _s3Client.listObjectsV2(listObjectsV2Request); +LOGGER.debug("Getting ListObjectsV2Response: {}", listObjectsV2Response); +List filesReturned = listObjectsV2Response.contents(); +filesReturned.stream().forEach(object -> { + //Only add files and not directories + if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) { +String fileKey = object.key(); +if (fileKey.startsWith(DELIMITER)) { + fileKey = fileKey.substring(1); +
[incubator-pinot] branch fixing_s3_list_api created (now 8665918)
This is an automated email from the ASF dual-hosted git repository. xiangfu pushed a change to branch fixing_s3_list_api in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. at 8665918 Fix S3PinotFS List API may not return full results This branch includes the following new commits: new 8665918 Fix S3PinotFS List API may not return full results The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org