[GitHub] ratcashdev edited a comment on issue #1334: Experimental settings to achieve 5 ms latency
ratcashdev edited a comment on issue #1334: Experimental settings to achieve 5 ms latency
URL: https://github.com/apache/incubator-pulsar/issues/1334#issuecomment-408586469

Thank you for the detailed explanation, @sijie. So if I understand all of that correctly:

1. When we compare the two in their default configuration (default config being the `apple`), Kafka will probably have lower latency than Pulsar for publishing, since, at least up until now, Pulsar could not control fsync behavior. Users need to keep in mind the difference in behaviour at their default settings: `filesystem-persistence` (Kafka) vs `direct-to-disk-persistence` and forced fsync (Pulsar).
2. When we compare the two in a config that's comparable in behavior (here identical behavior being the `apple`), Kafka (with something resembling `direct-to-disk-persistence`) will likely have higher latency than Pulsar in its default setting.
3. Pulsar using `filesystem-persistence` was not possible up until today (can you please pinpoint the exact config that needs to be tweaked?), and it's likely that it will provide lower latencies, but not lower than with a `non-persistent` topic, will it?

And yet back to option No 2 above: Kafka, by default, (first?) replicates the data, then writes to the filesystem. What's the order for Pulsar when we disable forced fsync?

This makes me wonder. If up until now Pulsar did not support delayed/disabled fsync and is the default for kafka, how exactly could they

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
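To make the "forced fsync" distinction in the comment above concrete, here is a minimal Java sketch of the two write modes at the OS level. It is only an illustration under stated assumptions: the class `FsyncSketch` and its methods are hypothetical, and this is not how Kafka or Pulsar/BookKeeper implement their storage. The only real API used is `java.nio.channels.FileChannel`, whose `force(true)` call blocks until the data is reported as written to the storage device.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Pulsar, BookKeeper, or Kafka.
final class FsyncSketch {

    // Appends one entry to the file. With forceSync=false the write only reaches
    // the OS page cache before returning; with forceSync=true the call also waits
    // for the data to be flushed to the device, which is what costs latency.
    static void append(Path file, String entry, boolean forceSync) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap(entry.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            if (forceSync) {
                channel.force(true); // flush file content and metadata to the device
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one entry in each mode to a temporary file and returns the file content.
    static String demo() {
        try {
            Path file = Files.createTempFile("fsync-sketch", ".log");
            append(file, "a", false); // page-cache write only
            append(file, "b", true);  // write followed by a forced sync
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both modes leave the same bytes in the file; they differ in when the data is durable and in how long the caller waits.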
[GitHub] rdhabalia opened a new pull request #2261: Avoid creating output topic on tenant namespace if output-topic not provided
rdhabalia opened a new pull request #2261: Avoid creating output topic on tenant namespace if output-topic not provided
URL: https://github.com/apache/incubator-pulsar/pull/2261

### Motivation

Sometimes a user wants to use a Pulsar function only for processing source messages and does not want to redirect the output to any topic. Right now, if the user doesn't provide an output topic on the function CLI, the CLI derives an output topic and forces output messages to be redirected to that topic. This causes a new topic to be created under a tenant's namespace without the tenant's permission, and PulsarSink unnecessarily publishes messages to that newly created topic. Also, if the namespace is global, messages of the output topic are replicated to other clusters as well. So a function should not write messages to an output topic if no output topic is provided.

### Modifications

- function-cli doesn't compute the output topic name if it's not provided
- function worker doesn't create the output topic if it's not present
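The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the code of the PR: the class `OptionalOutputSketch`, its `emit` method, and the in-memory `published` list are hypothetical stand-ins for the sink, and treating both `null` and the empty string as "not provided" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sink; not the PulsarSink implementation.
final class OptionalOutputSketch {
    // Records what would have been published, as "topic:payload".
    final List<String> published = new ArrayList<>();

    // outputTopic is null or empty when the user did not provide one (assumption).
    void emit(String outputTopic, String result) {
        if (outputTopic == null || outputTopic.isEmpty()) {
            // No output topic: do not derive a name and do not publish anything.
            return;
        }
        published.add(outputTopic + ":" + result);
    }
}
```

With this shape, a function registered without an output topic still processes its input, and no topic is ever created on its behalf.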
[GitHub] maskit commented on issue #2251: [website] Enable Translation & Localization
maskit commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408574638 @sijie The added section looks perfect, but I was worried about the case where people suggest translations without reading the contribution guide, because there’s no reference to it on the Crowdin side. Maybe I’m too anxious, but there should be no downside to having a reference to the README or a statement on the Crowdin side.
[GitHub] rdhabalia opened a new pull request #2260: ack message for atmost processing guaranty
rdhabalia opened a new pull request #2260: ack message for atmost processing guaranty
URL: https://github.com/apache/incubator-pulsar/pull/2260

### Motivation

[at-most-once](https://doc.akka.io/docs/akka/current/general/message-delivery-reliability.html?language=scala) delivery gives no delivery guarantee. Right now, if a function has Pulsar as both source and sink, `PulsarSinkAtMostOnceProcessor` doesn't ack messages to the broker, so a broker restart can redeliver all messages, and the user might not expect this behavior for `at-most-once`.

### Modifications

Pulsar sink acks messages regardless of the publish result for the `at-most-once` use case.

### Result

Duplicate messages are no longer redelivered when the sink is consuming messages from a Pulsar source for `at-most-once`.
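The "ack regardless of the publish result" rule can be sketched as below. This is an illustration under assumptions, not the `PulsarSinkAtMostOnceProcessor` code: the class `AtMostOnceSketch` is hypothetical, a `Predicate<String>` stands in for the publish call (returning `false` or throwing on failure), and the `acked` list stands in for acknowledgements sent to the broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of at-most-once handling; not Pulsar's implementation.
final class AtMostOnceSketch {
    // Message ids that have been acknowledged to the (imaginary) broker.
    final List<String> acked = new ArrayList<>();

    // Tries to publish, then acks whether or not the publish succeeded,
    // so the message is never redelivered.
    void handle(String messageId, Predicate<String> publish) {
        try {
            publish.test(messageId);
        } catch (RuntimeException e) {
            // A failed publish is deliberately ignored: at-most-once accepts message loss.
        } finally {
            acked.add(messageId);
        }
    }
}
```

The trade-off is the usual one for at-most-once: a failed publish loses the message, but nothing is processed twice.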
[GitHub] rdhabalia opened a new pull request #2259: Fix: set subscription-type based on message ordering
rdhabalia opened a new pull request #2259: Fix: set subscription-type based on message ordering
URL: https://github.com/apache/incubator-pulsar/pull/2259

### Motivation

Right now, a user can define message processing guarantees to process/sink messages. However, a user may still want to retain message ordering with different processing types, and should be able to configure the FAILOVER subscription type with the `ATLEAST_ONCE` option. So, provide an option to retain ordering while consuming messages from a Pulsar source.

### Modifications

Add a `retainOrdering` option in functions-cli like we already have in [cli-Sink](https://github.com/apache/incubator-pulsar/blob/master/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdSinks.java#L217).

### Result

User can configure the message ordering option with different processing types.
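One way to picture the option is a small selection function. This is a sketch under assumptions rather than the PR's code: the class and the two enums are local, hypothetical stand-ins, and falling back to a shared subscription when ordering is not requested is an assumption not stated in the description above.

```java
// Hypothetical illustration; the enums are local stand-ins, not Pulsar's types.
final class SubscriptionChoiceSketch {
    enum Guarantee { ATLEAST_ONCE, ATMOST_ONCE }
    enum SubscriptionKind { SHARED, FAILOVER }

    // Ordering is decided independently of the processing guarantee:
    // FAILOVER has a single active consumer, which preserves message order.
    static SubscriptionKind choose(Guarantee guarantee, boolean retainOrdering) {
        if (retainOrdering) {
            return SubscriptionKind.FAILOVER;
        }
        return SubscriptionKind.SHARED; // assumed default when ordering is not required
    }
}
```

Under this sketch, `ATLEAST_ONCE` combined with `retainOrdering` yields a FAILOVER subscription, which is the combination the motivation asks for.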
[GitHub] rdhabalia commented on issue #2214: Add tls support to authenticate client to access function admin-api
rdhabalia commented on issue #2214: Add tls support to authenticate client to access function admin-api URL: https://github.com/apache/incubator-pulsar/pull/2214#issuecomment-408565039 @sijie added default port for tls and also added tls + authentication test.
[GitHub] rdhabalia opened a new pull request #2258: Derive source/sink arg-class name from functio-class for file-url archive
rdhabalia opened a new pull request #2258: Derive source/sink arg-class name from functio-class for file-url archive
URL: https://github.com/apache/incubator-pulsar/pull/2258

### Motivation

When a user submits a function with a file URL, the server loads the jar and performs validation. In that case, the server should also configure the source/sink arg type class if it's not configured in the function details.

### Modification

When a user submits a function with a URL using the CLI, the CLI doesn't pass the type-arg class name in the request, so the server sets it by loading the jar and the function class name.

### Result

User can successfully submit a function jar with a file URL.
[GitHub] rdhabalia opened a new pull request #2257: Fix: function-cli avoid arg-type check and class-loading for file url
rdhabalia opened a new pull request #2257: Fix: function-cli avoid arg-type check and class-loading for file url
URL: https://github.com/apache/incubator-pulsar/pull/2257

### Motivation

During function registration, validation of a function jar given by URL (http/file) is done on the server side. So the CLI can't load the jar for the file-URL protocol and validate the function's class name.

### Modifications

[Skip class-name validation](https://github.com/apache/incubator-pulsar/blob/master/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java#L447) for a jar with a file URL.

### Result

User can submit a function with a file URL using the CLI.
[GitHub] ivankelly opened a new pull request #2256: Namespace level policy for offload deletion lag
ivankelly opened a new pull request #2256: Namespace level policy for offload deletion lag URL: https://github.com/apache/incubator-pulsar/pull/2256 Add a policy parameter at the namespace level for the offload deletion lag, the amount of time to wait after offloading a ledger before we delete the ledger from BookKeeper. This namespace policy overrides the broker-configured policy. Via the REST API this is exposed at millisecond granularity, while via the CLI it is exposed at minute granularity.
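The override and the two granularities described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class `OffloadLagSketch` and its method names are hypothetical, and using `null` to mean "no namespace policy set" is an assumption.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the broker's or the CLI's actual code.
final class OffloadLagSketch {

    // The CLI takes minutes, the REST API takes milliseconds.
    static long cliMinutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // The namespace policy, when set, overrides the broker-level value.
    // A null namespace value means "not set" (assumption).
    static long effectiveLagMillis(Long namespaceLagMillis, long brokerLagMillis) {
        return namespaceLagMillis != null ? namespaceLagMillis : brokerLagMillis;
    }
}
```

For example, a CLI value of 5 minutes corresponds to 300000 ms on the REST side, and that value would take precedence over whatever the broker configuration specifies.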
[GitHub] merlimat commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs.
merlimat commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs. URL: https://github.com/apache/incubator-pulsar/pull/2254#discussion_r205902157 ## File path: site2/docs/deploy-bare-metal.md ## @@ -48,20 +48,20 @@ For machines running a bookie and a Pulsar broker, we recommend using more power To get started deploying a Pulsar cluster on bare metal, you'll need to download a binary tarball release in one of the following ways: * By clicking on the link directly below, which will automatically trigger a download: - * Pulsar pulsar:version binary release + * Pulsar {{pulsar:version}} binary release * From the Pulsar [downloads page](pulsar:download_page_url) * From the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest) on [GitHub](https://github.com) * Using [wget](https://www.gnu.org/software/wget): ```bash -$ wget http://archive.apache.org/dist/incubator/pulsar/pulsar-pulsar:version/apache-pulsar-pulsar:version-bin.tar.gz +$ wget pulsar:binary_release_url Review comment: Sure, no problem with that. In any case, I think that the notation `{{pulsar:version}}`, since it's common in many template languages, would make it clearer to someone editing this document that this will be somehow replaced on the published version.
[GitHub] sijie commented on issue #2236: how to unsubscribe pulsar mailing list.
sijie commented on issue #2236: how to unsubscribe pulsar mailing list. URL: https://github.com/apache/incubator-pulsar/issues/2236#issuecomment-408532621 @mykidong when you sent an email to users-unsubscribe, did you receive any email from the mailing list confirming that you were removed?
[GitHub] sijie commented on issue #2250: How to manage namespace policies efficiently
sijie commented on issue #2250: How to manage namespace policies efficiently URL: https://github.com/apache/incubator-pulsar/issues/2250#issuecomment-408532443 I have never tried the RocketMQ UI before. Do you have any experience that you can share with us?
[GitHub] sijie commented on issue #2251: [website] Enable Translation & Localization
sijie commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408531124 rerun java8 tests
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205888617 ## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/CustomBaseToBaseFunction.java ## @@ -0,0 +1,28 @@ +/** + * Copyright (c) 2018 Streamlio Inc. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.pulsar.functions.api.examples.test; + +import org.apache.pulsar.functions.api.Context; +import org.apache.pulsar.functions.api.Function; + +public class CustomBaseToBaseFunction implements Function { Review comment: also please document them.
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205888563 ## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/ByteBufferSerDe.java ## @@ -0,0 +1,33 @@ +/** + * Copyright (c) 2018 Streamlio Inc. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.pulsar.functions.api.examples.test;

Review comment: I can see that only one function is for negative tests. It can be under the `test` package. However, all the other classes are examples:

- examples for customizing serde
- examples for using custom objects
- examples for using context.publish
- examples for windowing
- examples for config
[GitHub] sijie opened a new pull request #2255: [website] Add contribution guide to README.md
sijie opened a new pull request #2255: [website] Add contribution guide to README.md URL: https://github.com/apache/incubator-pulsar/pull/2255
[GitHub] aahmed-se commented on a change in pull request #2253: Add Test Specific Examples
aahmed-se commented on a change in pull request #2253: Add Test Specific Examples URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205881296 ## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/ByteBufferSerDe.java ## @@ -0,0 +1,33 @@ +/** + * Copyright (c) 2018 Streamlio Inc. All Rights Reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.pulsar.functions.api.examples.test; Review comment: Not sure about that. Some of them are negative test functions that are expected to fail; these should be kept in a separate package.
[GitHub] aahmed-se commented on a change in pull request #2253: Add Test Specific Examples
aahmed-se commented on a change in pull request #2253: Add Test Specific Examples URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205880996 ## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/ByteBufferSerDe.java ## @@ -0,0 +1,33 @@ +/** Review comment: fixed all headers
[GitHub] cckellogg commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs.
cckellogg commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs. URL: https://github.com/apache/incubator-pulsar/pull/2254#discussion_r205880941 ## File path: site2/docs/deploy-bare-metal.md ## @@ -48,20 +48,20 @@ For machines running a bookie and a Pulsar broker, we recommend using more power To get started deploying a Pulsar cluster on bare metal, you'll need to download a binary tarball release in one of the following ways: * By clicking on the link directly below, which will automatically trigger a download: - * Pulsar pulsar:version binary release + * Pulsar {{pulsar:version}} binary release * From the Pulsar [downloads page](pulsar:download_page_url) * From the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest) on [GitHub](https://github.com) * Using [wget](https://www.gnu.org/software/wget): ```bash -$ wget http://archive.apache.org/dist/incubator/pulsar/pulsar-pulsar:version/apache-pulsar-pulsar:version-bin.tar.gz +$ wget pulsar:binary_release_url Review comment: We are going to change those in future patches for consistency. This one is just to make sure we got all the pulsar:version ones.
[GitHub] merlimat commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs.
merlimat commented on a change in pull request #2254: Fix pulsar:version vars in site2 docs. URL: https://github.com/apache/incubator-pulsar/pull/2254#discussion_r205880375 ## File path: site2/docs/deploy-bare-metal.md ## @@ -48,20 +48,20 @@ For machines running a bookie and a Pulsar broker, we recommend using more power To get started deploying a Pulsar cluster on bare metal, you'll need to download a binary tarball release in one of the following ways: * By clicking on the link directly below, which will automatically trigger a download: - * Pulsar pulsar:version binary release + * Pulsar {{pulsar:version}} binary release * From the Pulsar [downloads page](pulsar:download_page_url) * From the Pulsar [releases page](https://github.com/apache/incubator-pulsar/releases/latest) on [GitHub](https://github.com) * Using [wget](https://www.gnu.org/software/wget): ```bash -$ wget http://archive.apache.org/dist/incubator/pulsar/pulsar-pulsar:version/apache-pulsar-pulsar:version-bin.tar.gz +$ wget pulsar:binary_release_url Review comment: Should we use `{{...}}` as well for consistency with `{{pulsar:version}}`?
[GitHub] cckellogg opened a new pull request #2254: Fix pulsar:version vars in site2 docs.
cckellogg opened a new pull request #2254: Fix pulsar:version vars in site2 docs.
URL: https://github.com/apache/incubator-pulsar/pull/2254

### Motivation

pulsar:version shows up in the docs. http://pulsar.incubator.apache.org/staging/docs/standalone/#installing-pulsar

### Modifications

Update to the new variable format.

### Result

The correct version shows up.
[GitHub] sijie closed pull request #2248: [website] remove `reference-auth.md`
sijie closed pull request #2248: [website] remove `reference-auth.md` URL: https://github.com/apache/incubator-pulsar/pull/2248 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/site2/docs/reference-auth.md b/site2/docs/reference-auth.md deleted file mode 100644 index cc7e8fc42c..00 --- a/site2/docs/reference-auth.md +++ /dev/null @@ -1,206 +0,0 @@ -id: reference-auth -title: Extending Authentication and Authorization in Pulsar -sidebar_label: Authn & Authz plugins - -Pulsar provides a way to use custom authentication and authorization mechanisms - -## Authentication - -Pulsar supports mutual TLS and Athenz authentication plugins, and these can be used as described -[here](security-overview.md). - -It is possible to use a custom authentication mechanism by providing the implementation in the -form of two plugins one for the Client library and the other for the Pulsar Broker to validate -the credentials. - -### Client authentication plugin - -For client library, you will need to implement `org.apache.pulsar.client.api.Authentication`. 
This class can then be passed -when creating a Pulsar client: - -```java -PulsarClient client = PulsarClient.builder() -.serviceUrl("pulsar://localhost:6650") -.authentication(new MyAuthentication()) -.build(); -``` - -For reference, there are 2 interfaces to implement on the client side: - * `Authentication` -> [Authentication API](http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Authentication.html) - * `AuthenticationDataProvider` -> [AuthenticationDataProvider API](http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/AuthenticationDataProvider.html) - - -This in turn will need to provide the client credentials in the form of `org.apache.pulsar.client.api.AuthenticationDataProvider`. This will leave -the chance to return different kinds of authentication token for different -type of connection or by passing a certificate chain to use for TLS. - - -Examples for client authentication providers can be found at: - - * Mutual TLS Auth -- https://github.com/apache/incubator-pulsar/tree/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/auth - * Athenz -- https://github.com/apache/incubator-pulsar/tree/master/pulsar-client-auth-athenz/src/main/java/org/apache/pulsar/client/impl/auth - -### Broker authentication plugin - -On broker side, we need the corresponding plugin to validate the credentials -passed by the client. Broker can support multiple authentication providers -at the same time. 
- -In `conf/broker.conf` it's possible to specify a list of valid providers: - -```properties -# Autentication provider name list, which is comma separated list of class names -authenticationProviders= -``` - -There is one single interface to implement `org.apache.pulsar.broker.authentication.AuthenticationProvider`: - -```java -/** - * Provider of authentication mechanism - */ -public interface AuthenticationProvider extends Closeable { - -/** - * Perform initialization for the authentication provider - * - * @param config - *broker config object - * @throws IOException - * if the initialization fails - */ -void initialize(ServiceConfiguration config) throws IOException; - -/** - * @return the authentication method name supported by this provider - */ -String getAuthMethodName(); - -/** - * Validate the authentication for the given credentials with the specified authentication data - * - * @param authData - *provider specific authentication data - * @return the "role" string for the authenticated connection, if the authentication was successful - * @throws AuthenticationException - * if the credentials are not valid - */ -String authenticate(AuthenticationDataSource authData) throws AuthenticationException; - -} -``` - -Example for Broker authentication plugins: - - * Mutual TLS -- https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProviderTls.java - * Athenz -- https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker-auth-athenz/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProviderAthenz.java - -## Authorization - -Authorization is the operation that checks whether a particular "role" or "principal" is -allowed to perform a certain operation. - -By default, Pulsar provides an embedded authorization, though it's possible to -configure a different one through a plugin. 
- -To provide a custom provider, one needs to implement the - `org.apache.pulsar.broker.authorization.AuthorizationProvider` interface, have t
[incubator-pulsar] branch master updated: [website] remove `reference-auth.md` (#2248)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git The following commit(s) were added to refs/heads/master by this push: new 9d37897 [website] remove `reference-auth.md` (#2248) 9d37897 is described below commit 9d378978026b532386d89f7af3afc6997d3bff1c Author: Sijie Guo AuthorDate: Fri Jul 27 12:11:39 2018 -0700 [website] remove `reference-auth.md` (#2248) ### Motivation This page is a duplication of `security-extending.md` ### Changes Remove the duplicated `reference-auth.md` --- site2/docs/reference-auth.md | 206 --- site2/website/sidebars.json | 3 +- 2 files changed, 1 insertion(+), 208 deletions(-) diff --git a/site2/docs/reference-auth.md b/site2/docs/reference-auth.md deleted file mode 100644 index cc7e8fc..000 --- a/site2/docs/reference-auth.md +++ /dev/null @@ -1,206 +0,0 @@ -id: reference-auth -title: Extending Authentication and Authorization in Pulsar -sidebar_label: Authn & Authz plugins - -Pulsar provides a way to use custom authentication and authorization mechanisms - -## Authentication - -Pulsar supports mutual TLS and Athenz authentication plugins, and these can be used as described -[here](security-overview.md). - -It is possible to use a custom authentication mechanism by providing the implementation in the -form of two plugins one for the Client library and the other for the Pulsar Broker to validate -the credentials. - -### Client authentication plugin - -For client library, you will need to implement `org.apache.pulsar.client.api.Authentication`. 
This class can then be passed -when creating a Pulsar client: - -```java -PulsarClient client = PulsarClient.builder() -.serviceUrl("pulsar://localhost:6650") -.authentication(new MyAuthentication()) -.build(); -``` - -For reference, there are 2 interfaces to implement on the client side: - * `Authentication` -> [Authentication API](http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Authentication.html) - * `AuthenticationDataProvider` -> [AuthenticationDataProvider API](http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/AuthenticationDataProvider.html) - - -This in turn will need to provide the client credentials in the form of `org.apache.pulsar.client.api.AuthenticationDataProvider`. This will leave -the chance to return different kinds of authentication token for different -type of connection or by passing a certificate chain to use for TLS. - - -Examples for client authentication providers can be found at: - - * Mutual TLS Auth -- https://github.com/apache/incubator-pulsar/tree/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/auth - * Athenz -- https://github.com/apache/incubator-pulsar/tree/master/pulsar-client-auth-athenz/src/main/java/org/apache/pulsar/client/impl/auth - -### Broker authentication plugin - -On broker side, we need the corresponding plugin to validate the credentials -passed by the client. Broker can support multiple authentication providers -at the same time. 
- -In `conf/broker.conf` it's possible to specify a list of valid providers: - -```properties -# Autentication provider name list, which is comma separated list of class names -authenticationProviders= -``` - -There is one single interface to implement `org.apache.pulsar.broker.authentication.AuthenticationProvider`: - -```java -/** - * Provider of authentication mechanism - */ -public interface AuthenticationProvider extends Closeable { - -/** - * Perform initialization for the authentication provider - * - * @param config - *broker config object - * @throws IOException - * if the initialization fails - */ -void initialize(ServiceConfiguration config) throws IOException; - -/** - * @return the authentication method name supported by this provider - */ -String getAuthMethodName(); - -/** - * Validate the authentication for the given credentials with the specified authentication data - * - * @param authData - *provider specific authentication data - * @return the "role" string for the authenticated connection, if the authentication was successful - * @throws AuthenticationException - * if the credentials are not valid - */ -String authenticate(AuthenticationDataSource authData) throws AuthenticationException; - -} -``` - -Example for Broker authentication plugins: - - * Mutual TLS -- https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProviderTls.java - * Athenz -- https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker-auth-athenz/src/main/java/org/apache/pulsar/broker/authentication/AuthenticationProviderAthenz.jav
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples
URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205869900

## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/CustomBaseToBaseFunction.java
##
@@ -0,0 +1,28 @@
+/**
+ * Copyright (c) 2018 Streamlio Inc. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pulsar.functions.api.examples.test;
+
+import org.apache.pulsar.functions.api.Context;
+import org.apache.pulsar.functions.api.Function;
+
+public class CustomBaseToBaseFunction implements Function {

Review comment: For each function, can you add a comment here about:
- what this function does
- which features this example demonstrates

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples
URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205869405

## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/TestWindowDurationFunction.java
##
@@ -0,0 +1,26 @@
+/**
+ * Copyright (c) 2018 Streamlio Inc. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pulsar.functions.api.examples.test;
+
+import java.util.Collection;
+
+public class TestWindowDurationFunction implements java.util.function.Function, String> {

Review comment: WindowDurationExampleFunction
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples
URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205868967

## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/ByteBufferSerDe.java
##
@@ -0,0 +1,33 @@
+/**
+ * Copyright (c) 2018 Streamlio Inc. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pulsar.functions.api.examples.test;

Review comment: all these files should be under `org.apache.pulsar.functions.api.examples`, `test` is not needed.
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples
URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205868747

## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/ByteBufferSerDe.java
##
@@ -0,0 +1,33 @@
+/**

Review comment: Fix the license header - "removing the streamlio copyright line"
[GitHub] sijie commented on a change in pull request #2253: Add Test Specific Examples
sijie commented on a change in pull request #2253: Add Test Specific Examples
URL: https://github.com/apache/incubator-pulsar/pull/2253#discussion_r205869311

## File path: pulsar-functions/java-examples/src/main/java/org/apache/pulsar/functions/api/examples/test/TestWindowFunction.java
##
@@ -0,0 +1,25 @@
+/**
+ * Copyright (c) 2018 Streamlio Inc. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pulsar.functions.api.examples.test;
+
+import java.util.Collection;
+
+public class TestWindowFunction implements java.util.function.Function, String> {

Review comment: WindowExampleFunction
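The two window examples under review share one shape: a plain `java.util.function.Function` that receives the whole window as a `Collection` and returns a single result. A minimal sketch of that shape is below, using the class name suggested in the review. The `String` element type and the join logic are assumptions made for illustration (the generic parameter is lost in the quoted diff); this is not the code from the pull request.

```java
import java.util.Collection;
import java.util.function.Function;

/**
 * Illustrative window function: it is handed every message in the current
 * window at once and returns one aggregated value.
 * The String element type and the join logic are assumptions of this sketch.
 */
class WindowExampleFunction implements Function<Collection<String>, String> {
    @Override
    public String apply(Collection<String> window) {
        // Concatenate all messages in the window, separated by commas.
        return String.join(",", window);
    }
}
```

A class comment like the one above also covers what the earlier review asked for: what the function does and which feature it demonstrates.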
[GitHub] sijie commented on issue #1334: Experimental settings to achieve 5 ms latency
sijie commented on issue #1334: Experimental settings to achieve 5 ms latency
URL: https://github.com/apache/incubator-pulsar/issues/1334#issuecomment-408508215

@ratcashdev

> Kafka (and also RabbitMQ) can achieve sub 1 ms latency, while using persistence.

`persistence` means different things to different people. When most projects talk about disk persistence, they mean that data is written to the filesystem. However, data being written to the filesystem does not by itself make it persistent. Kafka only writes to the filesystem, without fsyncs. The "persistence" here is `filesystem` persistence, and it still carries the possibility of data loss.

In relational databases, people talk about "ACID", where "D" stands for "Durability": when data is persisted to disk, it is written to the filesystem and also *fsync*ed to disk, so that it is not lost when the machine crashes. This disk-level persistence gives much stronger data integrity than filesystem-level persistence.

The DZone blog post benchmarks Kafka with filesystem-level persistence. Most of the data sits in the filesystem page cache and does not even touch the disks, so the latency is expected to be low: it is effectively "memory" latency.

The GigaOm report benchmarks Pulsar and Kafka with disk-level persistence. (Due to Kafka's design, it cannot really achieve the disk-level persistence that Pulsar can; the benchmark used close-enough settings to simulate the same disk persistence behavior for an apples-to-apples comparison.) So the latencies measured there were real disk persistence latencies.

Keep "filesystem-persistence" and "disk-persistence" in mind when you read the blog posts; they will help you understand those numbers.

Pulsar recently added a flag to disable the `fsync` behavior, so Pulsar is able to provide both `filesystem-persistence` and `disk-persistence`, and people can choose between them based on their tradeoffs. However, I don't think there are any benchmark results for Pulsar with `filesystem-persistence`. All the benchmark results or claims in the blog posts about 5 ms latency are about `disk-persistence`.

To replicate the claims, you can try out [open-messaging-benchmark](http://openmessaging.cloud/docs/benchmarks/). OpenMessaging is a Linux Foundation project building a vendor-neutral messaging standard and benchmark.

Hope this helps.
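The difference between the two persistence levels described above comes down to whether the writer waits for an fsync before acknowledging. The sketch below shows both variants with the JDK's `FileChannel`; the class and method names are made up for illustration, and it says nothing about how Kafka or Pulsar/BookKeeper actually implement their write paths.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper contrasting the two persistence levels. */
final class PersistenceLevels {

    /** "Filesystem persistence": returns once the OS has accepted the bytes. */
    static void writeToFilesystem(Path file, String entry) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            writeFully(ch, entry);
            // No force(): the data may still live only in the page cache,
            // so a machine crash at this point can lose it.
        }
    }

    /** "Disk persistence": returns only after the data has been fsynced. */
    static void writeToDisk(Path file, String entry) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            writeFully(ch, entry);
            // force(true) asks the OS to flush file content and metadata
            // to the storage device before returning.
            ch.force(true);
        }
    }

    /** Loops because a single write() call may not consume the whole buffer. */
    private static void writeFully(FileChannel ch, String entry) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(entry.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

Both methods leave the same bytes in the file; they differ only in when the caller may treat the write as safe. The `force(true)` call is where the extra latency of disk-level persistence comes from.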
[GitHub] aahmed-se opened a new pull request #2253: Add Test Specific Examples
aahmed-se opened a new pull request #2253: Add Test Specific Examples URL: https://github.com/apache/incubator-pulsar/pull/2253 These are function examples specifically for test scenarios.
[GitHub] sijie commented on issue #2251: [website] Enable Translation & Localization
sijie commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408497589 @maskit I added a section called "contribute translations" to the README.md, with a statement there saying that all the translations are licensed under ALv2.
[incubator-pulsar] branch asf-site updated: Updated site at revision fd1b459
This is an automated email from the ASF dual-hosted git repository. mmerli pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git The following commit(s) were added to refs/heads/asf-site by this push: new 6cc7861 Updated site at revision fd1b459 6cc7861 is described below commit 6cc7861ee480ef74fe643799283c407cfe28ca09 Author: jenkins AuthorDate: Fri Jul 27 18:10:14 2018 + Updated site at revision fd1b459 --- .../docs/latest/adaptors/PulsarSpark/index.html| 10 +- .../docs/latest/adaptors/PulsarStorm/index.html| 8 content/docs/latest/admin-api/overview/index.html | 18 +- content/docs/latest/clients/Cpp/index.html | 8 content/docs/latest/clients/Java/index.html| 16 content/docs/latest/clients/Python/index.html | 10 +- content/docs/latest/clients/WebSocket/index.html | 8 content/docs/latest/clients/go/index.html | 6 +++--- .../docs/latest/cookbooks/Encryption/index.html| 6 +++--- .../latest/cookbooks/PartitionedTopics/index.html | 16 .../latest/cookbooks/RetentionExpiry/index.html| 16 .../docs/latest/cookbooks/compaction/index.html| 8 .../cookbooks/message-deduplication/index.html | 12 ++-- .../docs/latest/cookbooks/message-queue/index.html | 16 .../latest/cookbooks/tiered-storage/index.html | 6 +++--- .../docs/latest/deployment/Kubernetes/index.html | 4 ++-- .../docs/latest/deployment/aws-cluster/index.html | 6 +++--- content/docs/latest/deployment/cluster/index.html | 6 +++--- content/docs/latest/deployment/instance/index.html | 6 +++--- .../ConceptsAndArchitecture/index.html | 4 ++-- .../latest/getting-started/LocalCluster/index.html | 4 ++-- .../latest/getting-started/Pulsar-2.0/index.html | 2 +- .../docs/latest/getting-started/docker/index.html | 4 ++-- .../docs/latest/project/BinaryProtocol/index.html | 4 ++-- content/docs/latest/project/CompileCpp/index.html | 8 .../docs/latest/project/SimulationTools/index.html | 2 +- .../docs/latest/project/schema-storage/index.html | 4 ++-- 
content/docs/latest/reference/CliTools/index.html | 22 +++--- .../docs/latest/security/authorization/index.html | 14 +++--- content/docs/latest/security/encryption/index.html | 6 +++--- content/ja/adaptors/PulsarSpark/index.html | 8 content/ja/adaptors/PulsarStorm/index.html | 6 +++--- content/ja/admin/AdminInterface/index.html | 12 ++-- content/ja/admin/Authz/index.html | 12 ++-- content/ja/admin/ClustersBrokers/index.html| 6 +++--- content/ja/admin/PropertiesNamespaces/index.html | 6 +++--- content/ja/advanced/PartitionedTopics/index.html | 12 ++-- content/ja/advanced/RetentionExpiry/index.html | 12 ++-- content/ja/clients/Cpp/index.html | 6 +++--- content/ja/clients/Java/index.html | 8 content/ja/clients/Python/index.html | 8 content/ja/clients/WebSocket/index.html| 8 content/ja/deployment/InstanceSetup/index.html | 6 +++--- content/ja/deployment/Kubernetes/index.html| 4 ++-- .../ConceptsAndArchitecture/index.html | 2 +- content/ja/getting-started/LocalCluster/index.html | 4 ++-- content/ja/project/BinaryProtocol/index.html | 4 ++-- content/ja/project/SimulationTools/index.html | 2 +- content/ja/reference/CliTools/index.html | 18 +- content/staging/download.html | 4 +--- content/staging/download/index.html| 4 +--- content/staging/en/download.html | 4 +--- content/staging/en/download/index.html | 4 +--- content/staging/en/release-notes.html | 2 +- content/staging/en/release-notes/index.html| 2 +- content/staging/en/versions.html | 2 +- content/staging/en/versions/index.html | 2 +- content/staging/release-notes.html | 2 +- content/staging/release-notes/index.html | 2 +- content/staging/versions.html | 2 +- content/staging/versions/index.html| 2 +- 61 files changed, 214 insertions(+), 222 deletions(-) diff --git a/content/docs/latest/adaptors/PulsarSpark/index.html b/content/docs/latest/adaptors/PulsarSpark/index.html index 6bd0b47..afa2d4a 100644 --- a/content/docs/latest/adaptors/PulsarSpark/index.html +++ b/content/docs/latest/adaptors/PulsarSpark/index.html @@ 
-1072,9 +1072,9 @@ - Spark Streaming Pulsar receiver + Spark Streaming Pulsar receiver @@ -1316,9 +
[incubator-pulsar] branch master updated: Throw exception on non-zero command exit in integ tests (#2249)
This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git The following commit(s) were added to refs/heads/master by this push: new e2aca36 Throw exception on non-zero command exit in integ tests (#2249) e2aca36 is described below commit e2aca368910c437e4d8eefe826ffb6cf86b232da Author: Ivan Kelly AuthorDate: Fri Jul 27 19:10:05 2018 +0100 Throw exception on non-zero command exit in integ tests (#2249) This patch replaces the boolean flag to ignore failure, and instead always throws an exception if a command exits with non-zero. This means that test code doesn't need to check the exit code for each invocation. For non-zero exits, the ContainerExecResult is carried as part of the exception, so the test can also assert on the contents of stdout, stderr and the exit code. --- .../pulsar/tests/integration/cli/CLITest.java | 52 ++ .../integration/containers/ChaosContainer.java | 3 +- .../integration/docker/ContainerExecException.java | 32 + .../runtime/PulsarFunctionsRuntimeTest.java| 28 ++-- .../tests/integration/io/PulsarIOSinkTest.java | 40 ++--- .../tests/integration/io/PulsarIOSourceTest.java | 39 +--- .../tests/integration/utils/DockerUtils.java | 29 7 files changed, 129 insertions(+), 94 deletions(-) diff --git a/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java b/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java index 22cc937..acdae26 100644 --- a/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java +++ b/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java @@ -22,8 +22,10 @@ import static org.testng.Assert.assertEquals; import static org.testng.Assert.assertFalse; import static org.testng.Assert.assertNotEquals; import static org.testng.Assert.assertTrue; +import static org.testng.Assert.fail; import
org.apache.pulsar.tests.integration.containers.BrokerContainer; +import org.apache.pulsar.tests.integration.docker.ContainerExecException; import org.apache.pulsar.tests.integration.docker.ContainerExecResult; import org.apache.pulsar.tests.integration.topologies.PulsarCluster; import org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase; @@ -39,7 +41,6 @@ public class CLITest extends PulsarClusterTestBase { String tenantName = "test-deprecated-commands"; ContainerExecResult result = pulsarCluster.runAdminCommandOnAnyBroker("--help"); -assertEquals(0, result.getExitCode()); assertFalse(result.getStdout().isEmpty()); assertFalse(result.getStdout().contains("Usage: properties ")); result = pulsarCluster.runAdminCommandOnAnyBroker( @@ -73,7 +74,6 @@ public class CLITest extends PulsarClusterTestBase { "--subscription", "" + subscriptionPrefix + i ); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().isEmpty()); assertTrue(result.getStderr().isEmpty()); i++; @@ -93,7 +93,6 @@ public class CLITest extends PulsarClusterTestBase { "1", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("1 messages successfully produced")); // terminate the topic @@ -102,20 +101,21 @@ public class CLITest extends PulsarClusterTestBase { "persistent", "terminate", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("Topic succesfully terminated at")); // try to produce should fail -result = pulsarCluster.getAnyBroker().execCmd( -PulsarCluster.CLIENT_SCRIPT, -"produce", -"-m", -"\"test topic termination\"", -"-n", -"1", -topicName); -assertNotEquals(0, result.getExitCode()); -assertTrue(result.getStdout().contains("Topic was already terminated")); +try { +pulsarCluster.getAnyBroker().execCmd(PulsarCluster.CLIENT_SCRIPT, + "produce", + "-m", + "\"test topic termination\"", + "-n", + "1", + topicName); +fail("Command should have exited with non-zero"); +} catch 
(ContainerExecException e) { +assertTrue(e.getResult().getStdout().contains("Topic was already terminated")); +} }
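The pattern this commit describes, where a non-zero exit always throws and the exception carries the full result, can be sketched in isolation as follows. `ExecResult`, `ExecException` and `Exec.check` are simplified, hypothetical stand-ins written for this example; they are not the `ContainerExecResult`, `ContainerExecException` or `DockerUtils` classes from the patch.

```java
/** Simplified stand-in for the outcome of running a command in a container. */
final class ExecResult {
    final int exitCode;
    final String stdout;
    final String stderr;

    ExecResult(int exitCode, String stdout, String stderr) {
        this.exitCode = exitCode;
        this.stdout = stdout;
        this.stderr = stderr;
    }
}

/** Thrown on a non-zero exit; keeps the result so tests can inspect the output. */
final class ExecException extends Exception {
    private final ExecResult result;

    ExecException(String cmd, ExecResult result) {
        super(cmd + " failed with exit code " + result.exitCode);
        this.result = result;
    }

    ExecResult getResult() {
        return result;
    }
}

final class Exec {
    /** Returns the result when the exit code is 0 and throws otherwise. */
    static ExecResult check(String cmd, ExecResult result) throws ExecException {
        if (result.exitCode != 0) {
            throw new ExecException(cmd, result);
        }
        return result;
    }
}
```

With this shape, a test that expects success needs no exit-code assertion at all, and a test that expects failure catches the exception and asserts on `getResult()`, which is how the rewritten `CLITest` cases in the diff are structured.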
[GitHub] sijie closed pull request #2249: Throw exception on non-zero command exit in integ tests
sijie closed pull request #2249: Throw exception on non-zero command exit in integ tests URL: https://github.com/apache/incubator-pulsar/pull/2249 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java b/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java index 22cc93780a..acdae263be 100644 --- a/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java +++ b/tests/integration/src/test/java/org/apache/pulsar/tests/integration/cli/CLITest.java @@ -22,8 +22,10 @@ import static org.testng.Assert.assertFalse; import static org.testng.Assert.assertNotEquals; import static org.testng.Assert.assertTrue; +import static org.testng.Assert.fail; import org.apache.pulsar.tests.integration.containers.BrokerContainer; +import org.apache.pulsar.tests.integration.docker.ContainerExecException; import org.apache.pulsar.tests.integration.docker.ContainerExecResult; import org.apache.pulsar.tests.integration.topologies.PulsarCluster; import org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase; @@ -39,7 +41,6 @@ public void testDeprecatedCommands() throws Exception { String tenantName = "test-deprecated-commands"; ContainerExecResult result = pulsarCluster.runAdminCommandOnAnyBroker("--help"); -assertEquals(0, result.getExitCode()); assertFalse(result.getStdout().isEmpty()); assertFalse(result.getStdout().contains("Usage: properties ")); result = pulsarCluster.runAdminCommandOnAnyBroker( @@ -73,7 +74,6 @@ public void testCreateSubscriptionCommand() throws Exception { "--subscription", "" + subscriptionPrefix + i ); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().isEmpty()); 
assertTrue(result.getStderr().isEmpty()); i++; @@ -93,7 +93,6 @@ public void testTopicTerminationOnTopicsWithoutConnectedConsumers() throws Excep "1", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("1 messages successfully produced")); // terminate the topic @@ -102,20 +101,21 @@ public void testTopicTerminationOnTopicsWithoutConnectedConsumers() throws Excep "persistent", "terminate", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("Topic succesfully terminated at")); // try to produce should fail -result = pulsarCluster.getAnyBroker().execCmd( -PulsarCluster.CLIENT_SCRIPT, -"produce", -"-m", -"\"test topic termination\"", -"-n", -"1", -topicName); -assertNotEquals(0, result.getExitCode()); -assertTrue(result.getStdout().contains("Topic was already terminated")); +try { +pulsarCluster.getAnyBroker().execCmd(PulsarCluster.CLIENT_SCRIPT, + "produce", + "-m", + "\"test topic termination\"", + "-n", + "1", + topicName); +fail("Command should have exited with non-zero"); +} catch (ContainerExecException e) { +assertTrue(e.getResult().getStdout().contains("Topic was already terminated")); +} } @Test @@ -131,7 +131,6 @@ public void testSchemaCLI() throws Exception { "-n", "1", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("1 messages successfully produced")); result = container.execCmd( @@ -142,7 +141,6 @@ public void testSchemaCLI() throws Exception { "-f", "/pulsar/conf/schema_example.conf" ); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().isEmpty()); assertTrue(result.getStderr().isEmpty()); @@ -152,7 +150,6 @@ public void testSchemaCLI() throws Exception { "schemas", "get", topicName); -assertEquals(0, result.getExitCode()); assertTrue(result.getStdout().contains("\"type\" : \"STRING\"")); // delete the schema @@ -161,19 +158,20 @@ public void testSchemaCLI() throws Exception { "schemas", "delete", topicName); -
[GitHub] sijie commented on issue #2251: [website] Enable Translation & Localization
sijie commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408491528 > Can we have a statement that says translated documents will be licensed under AL2 on the project description or somewhere? We don't store the translated md files directly; however, we do store the generated HTML files in the asf-site branch. It is a good idea to put a statement in the site2 README explaining that all translated documents will be licensed under AL2. > And we also need to make sure that only committers have a privilege to approve suggested translations. Yes. I will add committers as managers to that Crowdin project once this is live on the website.
[GitHub] merlimat closed pull request #2247: [website] Add versions page
merlimat closed pull request #2247: [website] Add versions page URL: https://github.com/apache/incubator-pulsar/pull/2247 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/site2/website/oldversions.json b/site2/website/oldversions.json new file mode 100644 index 00..a83f27fb69 --- /dev/null +++ b/site2/website/oldversions.json @@ -0,0 +1,8 @@ +[ +"2.0.1-incubating", +"2.0.0-rc1-incubating", +"1.22.0-incubating", +"1.21.0-incubating", +"1.20.0-incubating", +"1.19.0-incubating" +] diff --git a/site2/website/pages/en/download.js b/site2/website/pages/en/download.js index 248487446b..6d34397cb2 100644 --- a/site2/website/pages/en/download.js +++ b/site2/website/pages/en/download.js @@ -89,15 +89,20 @@ class Download extends React.Component { Release notes - - [Release notes](/release-notes) for all Pulsar's versions - + + +Release notes for all Pulsar's versions + + Getting started - - Once you've downloaded a Pulsar release, instructions on getting up and running with a standalone cluster - that you can run on your laptop can be found in the [Run Pulsar locally](/docs/standalone) tutorial. - + + +Once you've downloaded a Pulsar release, instructions on getting up and running with a standalone cluster +that you can run on your laptop can be found in the{' '} +Run Pulsar locally tutorial. 
+ + If you need to connect to an existing Pulsar cluster or instance using an officially supported client, see the client docs for these languages: @@ -160,7 +165,7 @@ class Download extends React.Component { sha512) - Release Notes + Release Notes ) diff --git a/site2/website/pages/en/release-notes.js b/site2/website/pages/en/release-notes.js index 09643e3773..076d6a6eb7 100644 --- a/site2/website/pages/en/release-notes.js +++ b/site2/website/pages/en/release-notes.js @@ -19,7 +19,7 @@ class ReleaseNotes extends React.Component { - Apache Pulsar downloads + Apache Pulsar Release Notes diff --git a/site2/website/pages/en/versions.js b/site2/website/pages/en/versions.js index 5aabeb1824..c730e201c3 100644 --- a/site2/website/pages/en/versions.js +++ b/site2/website/pages/en/versions.js @@ -8,90 +8,99 @@ const GridBlock = CompLibrary.GridBlock; const CWD = process.cwd(); const siteConfig = require(`${CWD}/siteConfig.js`); -//const versions = require(CWD + '/versions.json'); +// versions post docusaurus +// const versions = require(`${CWD}/versions.json`); +// versions pre docusaurus +const oldversions = require(`${CWD}/oldversions.json`); -/* -class Versions extends React.Component { - render() { -const latestVersion = versions[0]; +function Versions(props) { +const latestStableVersion = oldversions[0]; +const repoUrl = `https://github.com/${siteConfig.organizationName}/${ + siteConfig.projectName +}`; return ( - - - - - {siteConfig.title + ' Versions'} - -New versions of this project are released every so often. -Current version (Stable) - - - - {latestVersion} - -Documentation - - -Release Notes - - - - - - This is the version that is configured automatically when you - first install this project. - -Pre-release versions - - - - master - -Documentation - - -Release Notes - - - - -Other text describing this section. -Past Versions - - -{versions.map( - version => -version !== latestVersion && ( - -{version} - - Documentation -
[GitHub] merlimat commented on issue #2248: [website] remove `reference-auth.md`
merlimat commented on issue #2248: [website] remove `reference-auth.md` URL: https://github.com/apache/incubator-pulsar/pull/2248#issuecomment-408486971 run java8 tests This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[incubator-pulsar] branch master updated: [website] Add versions page (#2247)
This is an automated email from the ASF dual-hosted git repository. mmerli pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git The following commit(s) were added to refs/heads/master by this push: new fd1b459 [website] Add versions page (#2247) fd1b459 is described below commit fd1b4593d9f993a3450a151e8aa7d1ce671f215e Author: Sijie Guo AuthorDate: Fri Jul 27 10:31:39 2018 -0700 [website] Add versions page (#2247) * Update release notes title * [website] Add versions page ### Motivation We need to link back to the documentation for the old versions. ### Changes Update `versions` page to include links back to old versions. ### NOTES This change doesn't include cutting a version for 2.1.0 release. This will be done after 2.1.0 release is cut. --- site2/website/oldversions.json | 8 ++ site2/website/pages/en/download.js | 21 ++-- site2/website/pages/en/release-notes.js | 2 +- site2/website/pages/en/versions.js | 171 +--- site2/website/siteConfig.js | 2 +- 5 files changed, 113 insertions(+), 91 deletions(-) diff --git a/site2/website/oldversions.json b/site2/website/oldversions.json new file mode 100644 index 000..a83f27f --- /dev/null +++ b/site2/website/oldversions.json @@ -0,0 +1,8 @@ +[ +"2.0.1-incubating", +"2.0.0-rc1-incubating", +"1.22.0-incubating", +"1.21.0-incubating", +"1.20.0-incubating", +"1.19.0-incubating" +] diff --git a/site2/website/pages/en/download.js b/site2/website/pages/en/download.js index 2484874..6d34397 100644 --- a/site2/website/pages/en/download.js +++ b/site2/website/pages/en/download.js @@ -89,15 +89,20 @@ class Download extends React.Component { Release notes - - [Release notes](/release-notes) for all Pulsar's versions - + + +Release notes for all Pulsar's versions + + Getting started - - Once you've downloaded a Pulsar release, instructions on getting up and running with a standalone cluster - that you can run on your laptop can be found in the [Run Pulsar locally](/docs/standalone) 
tutorial. - + + +Once you've downloaded a Pulsar release, instructions on getting up and running with a standalone cluster +that you can run on your laptop can be found in the{' '} +Run Pulsar locally tutorial. + + If you need to connect to an existing Pulsar cluster or instance using an officially supported client, see the client docs for these languages: @@ -160,7 +165,7 @@ class Download extends React.Component { sha512) - Release Notes + Release Notes ) diff --git a/site2/website/pages/en/release-notes.js b/site2/website/pages/en/release-notes.js index 09643e3..076d6a6 100644 --- a/site2/website/pages/en/release-notes.js +++ b/site2/website/pages/en/release-notes.js @@ -19,7 +19,7 @@ class ReleaseNotes extends React.Component { - Apache Pulsar downloads + Apache Pulsar Release Notes diff --git a/site2/website/pages/en/versions.js b/site2/website/pages/en/versions.js index 5aabeb1..c730e20 100644 --- a/site2/website/pages/en/versions.js +++ b/site2/website/pages/en/versions.js @@ -8,90 +8,99 @@ const GridBlock = CompLibrary.GridBlock; const CWD = process.cwd(); const siteConfig = require(`${CWD}/siteConfig.js`); -//const versions = require(CWD + '/versions.json'); +// versions post docusaurus +// const versions = require(`${CWD}/versions.json`); +// versions pre docusaurus +const oldversions = require(`${CWD}/oldversions.json`); -/* -class Versions extends React.Component { - render() { -const latestVersion = versions[0]; +function Versions(props) { +const latestStableVersion = oldversions[0]; +const repoUrl = `https://github.com/${siteConfig.organizationName}/${ + siteConfig.projectName +}`; return ( - - - - - {siteConfig.title + ' Versions'} - -New versions of this project are released every so often. -Current version (Stable) - - - - {latestVersion} - -Documentation - - -Release Notes - - - -
[GitHub] awilliams opened a new issue #2252: documentation: incorrect flags for bin/pulsar
awilliams opened a new issue #2252: documentation: incorrect flags for bin/pulsar
URL: https://github.com/apache/incubator-pulsar/issues/2252

Expected behavior

The documentation lists `-c` and `--conf` as possible arguments of the `bin/pulsar` command.
https://pulsar.incubator.apache.org/docs/v2.0.1-incubating/reference/CliTools/#zookeeper-90gyx

Actual behavior

In practice, the `-c` and `--conf` args to `bin/pulsar` are not respected. It seems that the proper way to specify an alternative config file is via the [`PULSAR_ZK_CONF`](https://github.com/apache/incubator-pulsar/blob/fd47532380d770e4fd78cabe71dea293fc2f0e06/bin/pulsar#L264) env variable.

```shell
$ ./bin/pulsar zookeeper -c conf/zookeeper.conf
[AppClassLoader@4dd8dc3] info AspectJ Weaver Version 1.8.9 built on Monday Mar 14, 2016 at 21:18:16 GMT
[AppClassLoader@4dd8dc3] info register classloader sun.misc.Launcher$AppClassLoader@4dd8dc3
[AppClassLoader@4dd8dc3] info using configuration file:/src/github.com/apache/incubator-pulsar-v2/pulsar-broker/target/pulsar-broker.jar!/META-INF/aop.xml
[AppClassLoader@4dd8dc3] info using configuration file:/src/github.com/apache/incubator-pulsar-v2/pulsar-zookeeper/target/pulsar-zookeeper.jar!/META-INF/aop.xml
[AppClassLoader@4dd8dc3] info register aspect org.apache.pulsar.broker.zookeeper.aspectj.ClientCnxnAspect
[AppClassLoader@4dd8dc3] info register aspect org.apache.pulsar.zookeeper.FinalRequestProcessorAspect
[AppClassLoader@4dd8dc3] info register aspect org.apache.pulsar.zookeeper.ZooKeeperServerAspect
10:49:25.346 [main] INFO org.apache.pulsar.zookeeper.ZooKeeperStarter - Starting ZK stats HTTP server at port 8000
10:49:25.367 [main] INFO org.eclipse.jetty.util.log - Logging initialized @2001ms
10:49:25.425 [main] INFO org.eclipse.jetty.server.Server - jetty-9.3.11.v20160721
10:49:25.459 [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@fcb4004{/,null,AVAILABLE}
10:49:25.486 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@3cc41abc{HTTP/1.1,[http/1.1]}{0.0.0.0:8000}
10:49:25.486 [main] INFO org.eclipse.jetty.server.Server - Started @2122ms
10:49:25.492 [main] INFO org.apache.zookeeper.server.DatadirCleanupManager - autopurge.snapRetainCount set to 3
10:49:25.492 [main] INFO org.apache.zookeeper.server.DatadirCleanupManager - autopurge.purgeInterval set to 0
10:49:25.493 [main] INFO org.apache.zookeeper.server.DatadirCleanupManager - Purge task is not scheduled.
10:49:25.493 [main] WARN org.apache.zookeeper.server.quorum.QuorumPeerMain - Either no config or no quorum defined in config, running in standalone mode
10:49:25.535 [main] ERROR org.apache.zookeeper.server.ZooKeeperServerMain - Invalid arguments, exiting abnormally
java.lang.NumberFormatException: For input string: "/src/github.com/apache/incubator-pulsar-v2/conf/zookeeper.conf"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_161]
    at java.lang.Integer.parseInt(Integer.java:569) ~[?:1.8.0_161]
    at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_161]
    at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:59) ~[pulsar-broker.jar:2.0.1-incubating]
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:84) ~[pulsar-broker.jar:2.0.1-incubating]
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53) [pulsar-broker.jar:2.0.1-incubating]
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116) [pulsar-broker.jar:2.0.1-incubating]
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) [pulsar-broker.jar:2.0.1-incubating]
    at org.apache.pulsar.zookeeper.ZooKeeperStarter.start(ZooKeeperStarter.java:68) [pulsar-zookeeper.jar:2.0.1-incubating]
    at org.apache.pulsar.zookeeper.ZooKeeperStarter.main(ZooKeeperStarter.java:38) [pulsar-zookeeper.jar:2.0.1-incubating]
10:49:25.540 [main] INFO org.apache.zookeeper.server.ZooKeeperServerMain - Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
Usage: ZooKeeperServerMain configfile | port datadir [ticktime] [maxcnxns]
```

Steps to reproduce

See above "Actual behavior"

System configuration

`v2.0.1-incubating`
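The workaround the issue suggests (passing the config through `PULSAR_ZK_CONF` rather than `-c`/`--conf`) could be sketched roughly as follows. This is only an illustration of what the report describes, not confirmed behavior: `start_zookeeper` and `PULSAR_DIR` are hypothetical names introduced here, and the sketch assumes `bin/pulsar` reads `PULSAR_ZK_CONF` as the linked line of the script suggests.

```shell
#!/usr/bin/env bash
# Sketch of the workaround from the issue: hand the ZooKeeper config to
# bin/pulsar through the PULSAR_ZK_CONF environment variable instead of -c.
# start_zookeeper and PULSAR_DIR are illustrative names, not part of Pulsar.

start_zookeeper() {
  local conf="$1"
  local pulsar_dir="${PULSAR_DIR:-.}"

  # Fail early with a clear message if the config file does not exist.
  if [ ! -f "$conf" ]; then
    echo "config file not found: $conf" >&2
    return 1
  fi

  # The variable is set only for this one invocation of bin/pulsar.
  PULSAR_ZK_CONF="$conf" "$pulsar_dir/bin/pulsar" zookeeper
}

# Usage, from the root of a Pulsar checkout or distribution:
#   start_zookeeper "$PWD/conf/zookeeper.conf"
```

An absolute path is used in the usage line because the issue does not say how `bin/pulsar` resolves relative paths.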
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205821168 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. + +Regarding GCS, buckets are default created in the `us multi-regional location`, page [Bucket Locations](https://cloud.google.com/storage/docs/bucket-locations) contains more information. 
+ +- GCS Region example: + +```conf +gcsManagedLedgerOffloadRegion=europe-west3 +``` + + Configuring the Authenticating + +The administrator need configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in `broker.conf` to get GCS service available. It is a Json file, which contains GCS credentials of service account key. +[This page](https://support.google.com/googleapi/answer/6158849) contains more information of how to create this key file for authentication. You could also get more information regarding google cloud [IAM](https://cloud.google.com/storage/docs/access-control/iam). + +Usually these are the steps to create the authentication file: +1. Open the API Console Credentials page. +2. If it's not already selected, select the project that you're creating credentials for. +3. To set up a new service account, click New credentials and then select Service account key. +4. Choose the service account to use for the key. +5. Choose whether to download the service account's public/private key as a JSON file that can be loaded by a Google API client library. + +Here is an example:

Review comment: Remove this line.
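The diff under review names the settings an administrator needs at a minimum for the "google-cloud-storage" driver: the driver, the bucket, and the service account key file. A quick sanity check of a `broker.conf` against that list might look like the sketch below. `check_gcs_offload_conf` is a hypothetical helper, not something shipped with Pulsar. It only checks that each key has a non-empty value and does not validate the values themselves. The region is left out because the diff lists it among the optional knobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pulsar): report which of the minimum GCS
# offload settings named in the diff are missing from a broker.conf file.

check_gcs_offload_conf() {
  local conf="$1" key missing=0

  for key in managedLedgerOffloadDriver \
             gcsManagedLedgerOffloadBucket \
             gcsManagedLedgerOffloadServiceAccountKeyFile; do
    # A key counts as set when a non-commented "key=value" line has a non-empty value.
    if ! grep -Eq "^[[:space:]]*${key}=[^[:space:]]+" "$conf"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done

  return "$missing"
}

# Usage:
#   check_gcs_offload_conf conf/broker.conf && echo "GCS offload settings present"
```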
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205817782 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. -Pulsar also provides some knobs to configure the size of requests sent to S3. +Regarding AWS S3, the administrator should configure `s3ManagedLedgerOffloadBucket`. 
-- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. -- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from S3. Default is 1MB. +```conf +s3ManagedLedgerOffloadBucket=pulsar-topic-offload +``` -In both cases, these should not be touched unless you know what you are doing. + Configuring the Bucket Region -{% include admonition.html type="warning" content="The broker must be rebooted for any changes in the configuration to take effect." %} +Bucket Region is the region where bucket located. -### Authenticating with S3 +Regarding AWS S3, the default region is `US East (N. Virginia)`. Page [AWS Regions and Endpoints](https://docs.aws.amazon.com/general/latest/gr/rande.html) contains more information. -To be able to access S3, you need to authenticate with S3. Pulsar does not provide any direct mean
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205820170 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. + +Regarding GCS, buckets are default created in the `us multi-regional location`, page [Bucket Locations](https://cloud.google.com/storage/docs/bucket-locations) contains more information. + +- GCS Region example: + +```conf +gcsManagedLedgerOffloadRegion=europe-west3 +``` + + Configuring the Authenticating Review comment: As above, just "Authentication" is enough. 
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205821093 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. + +Regarding GCS, buckets are default created in the `us multi-regional location`, page [Bucket Locations](https://cloud.google.com/storage/docs/bucket-locations) contains more information. 
+ +- GCS Region example: + +```conf +gcsManagedLedgerOffloadRegion=europe-west3 +``` + + Configuring the Authenticating + +The administrator need configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in `broker.conf` to get GCS service available. It is a Json file, which contains GCS credentials of service account key. +[This page](https://support.google.com/googleapi/answer/6158849) contains more information of how to create this key file for authentication. You could also get more information regarding google cloud [IAM](https://cloud.google.com/storage/docs/access-control/iam). + +Usually these are the steps to create the authentication file: +1. Open the API Console Credentials page. +2. If it's not already selected, select the project that you're creating credentials for. +3. To set up a new service account, click New credentials and then select Service account key. +4. Choose the service account to use for the key. +5. Choose whether to download the service account's public/private key as a JSON file that can be loaded by a Google API client library.

Review comment: Remove "Choose whether to".
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205817635 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. -Pulsar also provides some knobs to configure the size of requests sent to S3. +Regarding AWS S3, the administrator should configure `s3ManagedLedgerOffloadBucket`. 
-- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. -- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from S3. Default is 1MB. +```conf +s3ManagedLedgerOffloadBucket=pulsar-topic-offload +``` -In both cases, these should not be touched unless you know what you are doing. + Configuring the Bucket Region -{% include admonition.html type="warning" content="The broker must be rebooted for any changes in the configuration to take effect." %} +Bucket Region is the region where bucket located.

Review comment: Add a note about whether bucket region is a required configuration, and what happens if it is not configured.
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205820891 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. + +Regarding GCS, buckets are default created in the `us multi-regional location`, page [Bucket Locations](https://cloud.google.com/storage/docs/bucket-locations) contains more information. 
+ +- GCS Region example: + +```conf +gcsManagedLedgerOffloadRegion=europe-west3 +``` + + Configuring the Authenticating + +The administrator need configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in `broker.conf` to get GCS service available. It is a Json file, which contains GCS credentials of service account key. +[This page](https://support.google.com/googleapi/answer/6158849) contains more information of how to create this key file for authentication. You could also get more information regarding google cloud [IAM](https://cloud.google.com/storage/docs/access-control/iam).

Review comment: The [Service Accounts section of this page](...) contains more information... More information about [Google Cloud IAMs is available here](...).
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205816756 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} Review comment: "s3" and "aws-s3" ... -> There is a third driver type, "s3", which is identical to "aws-s3", though it requires that you specify an endpoint URL using `s3ManagedLedgerOffloadServiceEndpoint`. This is useful if you are using an S3-compatible data store other than AWS. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
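For illustration, the "s3" driver described in the review comment could be configured roughly as follows. This is only a sketch: the endpoint URL is a hypothetical placeholder and the bucket name is reused from the example in the diff. The region is left unset because the text removed by this PR warned that the region must not be set when the endpoint is specified directly.

```conf
# Hypothetical example: "s3" driver pointed at a non-AWS, S3-compatible store.
managedLedgerOffloadDriver=s3
# Placeholder endpoint URL (assumption, not taken from the PR).
s3ManagedLedgerOffloadServiceEndpoint=https://s3.storage.example.com
s3ManagedLedgerOffloadBucket=pulsar-topic-offload
```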
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205818346 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. Review comment: Remove this line
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205818275 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket Review comment: Again, I'd use the same "Bucket & Region" header I suggested above.
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205816884 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver Review comment: -> "aws-s3" Driver configuration
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205819601 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. Review comment: Is this a required config? What happens if it isn't configured? Is it required if the bucket has been explicitly created in a region? What are the tradeoffs between `us multi-regional location` and a single bucket?
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205819012 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. Review comment: Duplicate of text on line 22, which is in intro so it covers all drivers. I would replace this with a brief intro to what the bucket is.
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205817742 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. -Pulsar also provides some knobs to configure the size of requests sent to S3. +Regarding AWS S3, the administrator should configure `s3ManagedLedgerOffloadBucket`. 
-- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. -- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from S3. Default is 1MB. +```conf +s3ManagedLedgerOffloadBucket=pulsar-topic-offload +``` -In both cases, these should not be touched unless you know what you are doing. + Configuring the Bucket Region -{% include admonition.html type="warning" content="The broker must be rebooted for any changes in the configuration to take effect." %} +Bucket Region is the region where bucket located. -### Authenticating with S3 +Regarding AWS S3, the default region is `US East (N. Virginia)`. Page [AWS Regions and Endpoints](https://docs.aws.amazon.com/general/latest/gr/rande.html) contains more information. Review comment: With AWS S3, ... -
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205820379 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. + +Regarding driver type "google-cloud-storage", the administrator should configure `gcsManagedLedgerOffloadBucket`. + +```conf +gcsManagedLedgerOffloadBucket=pulsar-topic-offload +``` + + Configuring the Bucket Region + +Bucket Region is the region where bucket located. + +Regarding GCS, buckets are default created in the `us multi-regional location`, page [Bucket Locations](https://cloud.google.com/storage/docs/bucket-locations) contains more information. 
+ +- GCS Region example: + +```conf +gcsManagedLedgerOffloadRegion=europe-west3 +``` + + Configuring the Authenticating + +The administrator need configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in `broker.conf` to get GCS service available. It is a Json file, which contains GCS credentials of service account key. Review comment: The administrator needs to configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in `broker.conf` for the broker to be able to access the GCS service. `gcsManagedLedgerOffloadServiceAccountKeyFile` is a JSON file, containing the GCS credentials of a service account.
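Putting the GCS settings quoted in this thread together, a `broker.conf` fragment for the "google-cloud-storage" driver might look like the sketch below. The bucket and region values are the ones used in the diff; the key file path is a hypothetical placeholder, and whether the region is actually required is still an open question in the review.

```conf
managedLedgerOffloadDriver=google-cloud-storage
gcsManagedLedgerOffloadBucket=pulsar-topic-offload
gcsManagedLedgerOffloadRegion=europe-west3
# Placeholder path (assumption) to the service account's JSON key file.
gcsManagedLedgerOffloadServiceAccountKeyFile=/path/to/service-account-key.json
```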
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205819223 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver + + Configuring the Bucket + +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. Review comment: Duplicate of the text on line 22 in the intro. Replace with a brief description of what a bucket is in GCS.
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205818158 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -86,6 +93,67 @@ If you are running in EC2 you can also use instance profile credentials, provide {% include admonition.html type="warning" content="The broker must be rebooted for credentials specified in pulsar_env to take effect." %} + Configuring the size of block read/write + +Pulsar also provides some knobs to configure the size of requests sent to AWS S3. + +- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. +- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from AWS S3. Default is 1MB. + +In both cases, these should not be touched unless you know what you are doing. + + +### Configuring for "google-cloud-storage" driver Review comment: "google-cloud-storage" Driver Configuration
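For reference, the two block-size knobs quoted in the hunk above take values in bytes. A sketch of the stated defaults written out explicitly, assuming "MB" in the docs means 2^20 bytes (the quoted text does not say which definition is used):

```conf
# 64MB default part size for multipart upload (must not be below 5MB, i.e. 5242880).
s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
# 1MB default read buffer when reading offloaded data back.
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576
```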
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205817080 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. -Pulsar also provides some knobs to configure the size of requests sent to S3. +Regarding AWS S3, the administrator should configure `s3ManagedLedgerOffloadBucket`. Review comment: Remove this line. It adds little. 
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205820083 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket. +At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc. + +Currently we support driver of types: { "aws-s3", "google-cloud-storage" }, +{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %} ```conf -managedLedgerOffloadDriver=S3 -s3ManagedLedgerOffloadRegion=eu-west-3 -s3ManagedLedgerOffloadBucket=pulsar-topic-offload +managedLedgerOffloadDriver=aws-s3 ``` -It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API. +### Configuring for "aws-s3" driver -{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %} + Configuring the Bucket -{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %} +On the broker, the administrator must configure the bucket and credentials for the cloud storage service. The configured bucket and credentials must exist before attempting to offload. If it does not exist, the offload operation will fail. -Pulsar also provides some knobs to configure the size of requests sent to S3. +Regarding AWS S3, the administrator should configure `s3ManagedLedgerOffloadBucket`. 
-- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload. This cannot be smaller than 5MB. Default is 64MB. -- ```s3ManagedLedgerOffloadReadBufferSizeInBytes``` configures the block size for each individual read when reading back data from S3. Default is 1MB. +```conf +s3ManagedLedgerOffloadBucket=pulsar-topic-offload +``` -In both cases, these should not be touched unless you know what you are doing. + Configuring the Bucket Region -{% include admonition.html type="warning" content="The broker must be rebooted for any changes in the configuration to take effect." %} +Bucket Region is the region where bucket located. -### Authenticating with S3 +Regarding AWS S3, the default region is `US East (N. Virginia)`. Page [AWS Regions and Endpoints](https://docs.aws.amazon.com/general/latest/gr/rande.html) contains more information. -To be able to access S3, you need to authenticate with S3. Pulsar does not provide any direct mean
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205815813 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Review comment: -> Configuring the offload driver This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205816007 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket.
+At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc.

Review comment:
-> the bucket and authentication credentials.
-> bucket region
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205815559

## File path: site/docs/latest/cookbooks/tiered-storage.md ##
@@ -5,6 +5,8 @@ tags: [admin, tiered-storage]
 Pulsar's **Tiered Storage** feature allows older backlog data to be offloaded to long term storage, thereby freeing up space in BookKeeper and reducing storage costs. This cookbook walks you through using tiered storage in your Pulsar cluster.
+Tiered storage currently leverage [Apache Jclouds](https://jclouds.apache.org) to supports [S3](https://aws.amazon.com/s3/) and [Google Cloud Storage](https://cloud.google.com/storage/)(GCS for short) for long term storage. And by Jclouds, it is easy to add more [supported](https://jclouds.apache.org/reference/providers/#blobstore-providers) cloud storage provider in the future.

Review comment:
-> currently uses [Apache Jclouds](...) to support [Amazon S3](...).
And by Jclouds.. -> With jclouds, it is easy to add support for more cloud storage providers in the future.
jclouds always seem to write their name in all lowercase.
[GitHub] ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS
ivankelly commented on a change in pull request #2152: GCS offload support(4): add documentations for GCS URL: https://github.com/apache/incubator-pulsar/pull/2152#discussion_r205817458 ## File path: site/docs/latest/cookbooks/tiered-storage.md ## @@ -17,44 +19,50 @@ A topic in Pulsar is backed by a log, known as a managed ledger. This log is com The Tiered Storage offloading mechanism takes advantage of this segment oriented architecture. When offloading is requested, the segments of the log are copied, one-by-one, to tiered storage. All segments of the log, apart from the segment currently being written to can be offloaded. -## Amazon S3 - -Tiered storage currently supports S3 for long term storage. On the broker, the administrator must configure a S3 bucket and the AWS region where the bucket exists. Offloaded data will be placed into this bucket. +On the broker, the administrator must configure the bucket or credentials for the cloud storage service. The configured bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. -The configured S3 bucket must exist before attempting to offload. If it does not exist, the offload operation will fail. +Pulsar uses multi-part objects to upload the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a life cycle rule your bucket to expire incomplete multi-part upload after a day or two to avoid getting charged for incomplete uploads. -Pulsar users multipart objects to update the segment data. It is possible that a broker could crash while uploading the data. We recommend you add a lifecycle rule your S3 bucket to expire incomplete multipart upload after a day or two to avoid getting charged for incomplete uploads. - -### Configuring the broker +## Configuring the driver for "aws-s3" or "google-cloud-storage" in the broker Offloading is configured in ```broker.conf```. 
-At a minimum, the user must configure the driver, the region and the bucket.
+At a minimum, the administrator must configure the driver, the bucket and the authenticating. There is also some other knobs to configure, like the bucket regions, the max block size in backed storage, etc.
+
+Currently we support driver of types: { "aws-s3", "google-cloud-storage" },
+{% include admonition.html type="warning" content="Driver names are case-insensitive for driver's name. "s3" and "aws-s3" are similar, with "aws-s3" you just don't need to define the url of the endpoint because it is aligned with region, and default is `s3.amazonaws.com`; while with s3, you must provide the endpoint url by `s3ManagedLedgerOffloadServiceEndpoint`." %}
 ```conf
-managedLedgerOffloadDriver=S3
-s3ManagedLedgerOffloadRegion=eu-west-3
-s3ManagedLedgerOffloadBucket=pulsar-topic-offload
+managedLedgerOffloadDriver=aws-s3
 ```
-It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a non-AWS storage service which provides an S3 compatible API.
+### Configuring for "aws-s3" driver
-{% include admonition.html type="warning" content="If the endpoint is specified directly, then the region must _not_ be set." %}
+ Configuring the Bucket

Review comment: I would change this to Bucket & Region and then remove the region header below.
[GitHub] maskit commented on issue #2251: [website] Enable Translation & Localization
maskit commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408417999 Sounds good. Can we have a statement that says translated documents will be licensed under AL2 on the project description or somewhere? I'm not sure whether we really need it, but if the translations will be stored in our git repo or distributed under pulsar.i.a.o, we probably need it. And we also need to make sure that only committers have the privilege to approve suggested translations.
[GitHub] ratcashdev commented on issue #1334: Experimental settings to achieve 5 ms latency
ratcashdev commented on issue #1334: Experimental settings to achieve 5 ms latency URL: https://github.com/apache/incubator-pulsar/issues/1334#issuecomment-408387040 According to this blog post (https://dzone.com/articles/benchmarking-message-queue-latency), Kafka (and also RabbitMQ) can achieve sub-1 ms latency while using persistence. Materials presented by Streaml.io suggest Pulsar has 40% lower latency than Kafka (https://www.businesswire.com/news/home/20180306005633/en/Apache-Pulsar-Outperforms-Apache-Kafka-2.5x-OpenMessaging). How can we achieve or replicate these claims?
[GitHub] jiazhai commented on issue #2251: [website] Enable Translation & Localization
jiazhai commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408375728 👍
[GitHub] sijie commented on issue #2251: [website] Enable Translation & Localization
sijie commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408369735 For reviewers: this change is based on #2247, so gitsha 6d08fa3 is the change to be reviewed in this PR.
[GitHub] sijie commented on issue #2251: [website] Enable Translation & Localization
sijie commented on issue #2251: [website] Enable Translation & Localization URL: https://github.com/apache/incubator-pulsar/pull/2251#issuecomment-408369760 @cckellogg ^^
[GitHub] sijie opened a new pull request #2251: [website] Enable Translation & Localization
sijie opened a new pull request #2251: [website] Enable Translation & Localization
URL: https://github.com/apache/incubator-pulsar/pull/2251

*Motivation*

Docusaurus allows for easy translation functionality using [Crowdin](https://crowdin.com/). Documentation files written in English are uploaded to Crowdin for translation by users within a community.

*Changes*

- Enable the integration with Crowdin.
- Adjust hyperlinks to include `language`.

Crowdin project: https://crowdin.com/project/apache-pulsar
[GitHub] dbsd11 opened a new issue #2250: How to manage namespace policies efficiently
dbsd11 opened a new issue #2250: How to manage namespace policies efficiently URL: https://github.com/apache/incubator-pulsar/issues/2250 Expected behavior: I have doubts about how namespaces and their policies are meant to be used; it is not very intuitive. I think we need an admin UI project to manage namespaces and topics, like the one RocketMQ has.
[GitHub] ivankelly opened a new pull request #2249: Throw exception on non-zero command exit in integ tests
ivankelly opened a new pull request #2249: Throw exception on non-zero command exit in integ tests URL: https://github.com/apache/incubator-pulsar/pull/2249 This patch replaces the boolean flag to ignore failure, and instead always throws an exception if a command exits with a non-zero code. This means that test code doesn't need to check the exit code for each invocation. For non-zero exits, the ContainerExecResult is carried as part of the exception, so the test can also assert on the contents of stdout, stderr and the exit code.
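The pattern described in this pull request can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the `ContainerExecResult` class here is a simplified stand-in for the type named in the description, and `CommandFailedException` and `CommandRunner.checkExit` are hypothetical names chosen for the example. The exception is unchecked on the assumption that tests should not need to declare it.

```java
// Simplified stand-in for the ContainerExecResult named in the PR description.
final class ContainerExecResult {
    final int exitCode;
    final String stdout;
    final String stderr;

    ContainerExecResult(int exitCode, String stdout, String stderr) {
        this.exitCode = exitCode;
        this.stdout = stdout;
        this.stderr = stderr;
    }
}

// Hypothetical exception type: it carries the whole result of the failed
// command so that a test can still inspect stdout, stderr and the exit code.
final class CommandFailedException extends RuntimeException {
    private final ContainerExecResult result;

    CommandFailedException(String command, ContainerExecResult result) {
        super("Command '" + command + "' exited with code " + result.exitCode);
        this.result = result;
    }

    ContainerExecResult getResult() {
        return result;
    }
}

final class CommandRunner {
    // Hypothetical helper: returns the result when the exit code is zero and
    // throws otherwise, so callers never have to check the exit code themselves.
    static ContainerExecResult checkExit(String command, ContainerExecResult result) {
        if (result.exitCode != 0) {
            throw new CommandFailedException(command, result);
        }
        return result;
    }
}
```

With a helper like this, a test that expects success simply uses the returned result, while a test that expects failure catches the exception and asserts on `getResult()`.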
[GitHub] sijie commented on issue #2247: [website] Add versions page
sijie commented on issue #2247: [website] Add versions page URL: https://github.com/apache/incubator-pulsar/pull/2247#issuecomment-408351505 rerun java8 tests