[GitHub] [samza] cameronlee314 commented on issue #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on issue #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#issuecomment-602931582

FYI, in case you didn't know, you can test what your changes look like by following `docs/README.md`.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards,
Apache Git Services
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396830826

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully qualified class name of the `org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the system producer.The default writer creates blobs that are of type AVRO and require the messages sent to a blob to be AVRO records. The blobs created by the default writer are of type [Block Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All the following configs are relevant to this default writer.|
+|systems.**_system-name_**.azureblob.compression.type|"none"|type of compression to be used before uploading blocks. Can be "none" or "gzip".|
+|systems.**_system-name_**.azureblob.maxFlushThresholdSize|10485760 (10 MB)|max size of the uncompressed block to be uploaded in bytes. Maximum size allowed by Azure is 100MB.|
+|systems.**_system-name_**.azureblob.maxBlobSize|Long.MAX_VALUE (unlimited)|max size of the uncompressed blob in bytes.If default value then size is unlimited capped only by Azure BlockBlob size of 4.75 TB (100 MB per block X 50,000 blocks).|

Review comment:
Minor: extra space before `4.75TB`
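You can check the size figures in the draft table with a few lines of Java. This is a sketch: the class `AzureBlockMath` is hypothetical and not part of Samza. It assumes the table's "MB" means MiB (2^20 bytes), which is what the 10485760-byte default implies. Under that assumption, 100 MiB per block times 50,000 blocks is 5,242,880,000,000 bytes, about 4.77 TiB. The "4.75 TB" in the maxBlobSize description is therefore an approximation.

```java
// Hypothetical helper (not part of Samza) that checks the size figures
// quoted in the draft Azure Blob SystemProducer config table.
final class AzureBlockMath {
  // Assumption: "MB" in the table means MiB (2^20 bytes).
  static final long MIB = 1024L * 1024L;

  // Default quoted for maxFlushThresholdSize: 10485760 bytes (10 MB).
  static final long DEFAULT_FLUSH_THRESHOLD_BYTES = 10L * MIB;

  // Limits quoted in the maxBlobSize description.
  static final long MAX_BLOCK_BYTES = 100L * MIB;
  static final long MAX_BLOCKS_PER_BLOB = 50_000L;

  // Largest block blob reachable with those limits, in bytes.
  static long maxBlockBlobBytes() {
    return MAX_BLOCK_BYTES * MAX_BLOCKS_PER_BLOB;
  }

  // Converts bytes to TiB (2^40 bytes).
  static double toTiB(long bytes) {
    return bytes / (double) (1L << 40);
  }

  public static void main(String[] args) {
    System.out.println(DEFAULT_FLUSH_THRESHOLD_BYTES);
    System.out.printf("%.2f TiB%n", toTiB(maxBlockBlobBytes()));
  }
}
```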
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396833814

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully qualified class name of the `org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the system producer.The default writer creates blobs that are of type AVRO and require the messages sent to a blob to be AVRO records. The blobs created by the default writer are of type [Block Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All the following configs are relevant to this default writer.|

Review comment:
Regarding "All the following configs are relevant to this default writer.": The following configs apply to other writers too, right? The wording kind of makes it sound like the following configs won't apply to a non-default writer. Can you please clarify that a little bit (or maybe you can just remove that sentence)?
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396829354

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|

Review comment:
Minor: It looks like other parts of this documentation use `false` instead of `"false"`.
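The quoting can matter in practice if a user copies the default literally into a properties file. `java.util.Properties` keeps the quotes as part of the value, and `Boolean.parseBoolean` returns true only for the bare word `true` (ignoring case). The sketch below assumes the flag is read this way, which the review thread does not confirm. The class name and the system name `my-container` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: assumes the proxy flag is read with java.util.Properties
// and Boolean.parseBoolean, which is not confirmed for Samza.
final class ProxyFlagParsing {
  static boolean readFlag(String propertiesText, String key) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      // Not expected for an in-memory reader.
      throw new UncheckedIOException(e);
    }
    // parseBoolean is true only for "true" (ignoring case); a quoted
    // "\"true\"" value, or a missing key defaulting to "false", yields false.
    return Boolean.parseBoolean(props.getProperty(key, "false"));
  }

  public static void main(String[] args) {
    String key = "systems.my-container.azureblob.proxy.use"; // hypothetical system name
    System.out.println(readFlag(key + "=true", key));     // true
    System.out.println(readFlag(key + "=\"true\"", key)); // false: the quotes stay in the value
    System.out.println(readFlag("", key));                // false: the default applies
  }
}
```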
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396830013

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure request will be logged as a warning.|

Review comment:
Minor: For consistency, maybe put the actual milliseconds number. You can put `30s` in parentheses or as a note in the Description part.
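For reference, `java.util.concurrent.TimeUnit` gives the millisecond values behind the human-readable defaults in the draft: 30 secs for slowRequestMs, 3 mins for flushTimeoutMs and 5 mins for closeTimeoutMs. This assumes those draft defaults are accurate. The class is illustrative only.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: converts the draft's human-readable defaults into the
// millisecond values that the *Ms configs take.
final class AzureBlobTimeoutDefaults {
  static final long SLOW_REQUEST_MS = TimeUnit.SECONDS.toMillis(30);
  static final long FLUSH_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(3);
  static final long CLOSE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(5);

  public static void main(String[] args) {
    System.out.println("slowRequestMs  = " + SLOW_REQUEST_MS);  // 30000
    System.out.println("flushTimeoutMs = " + FLUSH_TIMEOUT_MS); // 180000
    System.out.println("closeTimeoutMs = " + CLOSE_TIMEOUT_MS); // 300000
  }
}
```

Following the reviewer's suggestion, the Default cell could then read `30000`, with "(30 seconds)" in the Description.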
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396832610

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure request will be logged as a warning.|

Review comment:
Can you please clarify the description? I think the usage of the term "duration" might be overloaded. Do you mean that if the Azure request takes 30s to complete, then it will be logged?
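If the reviewer's reading is correct, the behavior amounts to the wrapper sketched below. The rule is that a request is logged when its start-to-completion time reaches slowRequestMs. The class `SlowRequestLogger`, the `>=` comparison and the use of `java.util.logging` are all assumptions made to keep the sketch self-contained. None of them describes how the Samza producer actually measures or logs requests.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

// Hypothetical wrapper, not Samza code: assumes "slow" means the request took
// at least slowRequestMs from start to completion.
final class SlowRequestLogger {
  private static final Logger LOG = Logger.getLogger(SlowRequestLogger.class.getName());

  private final long slowRequestMs;

  SlowRequestLogger(long slowRequestMs) {
    this.slowRequestMs = slowRequestMs;
  }

  // The >= comparison is an assumption about how the threshold is applied.
  boolean isSlow(long elapsedMs) {
    return elapsedMs >= slowRequestMs;
  }

  // Runs the request, measures its wall-clock duration and logs a warning if it
  // was slow. The request's result (or exception) is passed through unchanged.
  <T> T timed(String requestName, Callable<T> request) throws Exception {
    long startNanos = System.nanoTime();
    try {
      return request.call();
    } finally {
      long elapsedMs = Duration.ofNanos(System.nanoTime() - startNanos).toMillis();
      if (isSlow(elapsedMs)) {
        LOG.warning(requestName + " took " + elapsedMs + " ms (threshold: " + slowRequestMs + " ms)");
      }
    }
  }
}
```

With this reading, a description such as "An Azure request that takes at least this many milliseconds to complete is logged as a warning" would avoid the ambiguity the reviewer points out.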
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396828747

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.

Review comment:
Minor: The `**__system-name__**` part looks a little inconsistent with the other sections (which use `systems.*.samza.factory`).
[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer
URL: https://github.com/apache/samza/pull/1323#discussion_r396831076

## File path: docs/learn/documentation/versioned/jobs/samza-configurations.md ##
@@ -245,6 +246,34 @@ Configs for producing to [ElasticSearch](https://www.elastic.co/products/elastic
|systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate size of messages in the buffered before flushing.|
|systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered messages should be flushed.|
+ [3.7 Azure Blob Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This section applies if you have set systems.**__system-name__**.samza.factory = `org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If such a container does not exist then it is created.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key for the Azure account specified above.|
+ [Advanced Azure Blob Storage Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully qualified class name of the `org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the system producer.The default writer creates blobs that are of type AVRO and require the messages sent to a blob to be AVRO records. The blobs created by the default writer are of type [Block Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All the following configs are relevant to this default writer.|
+|systems.**_system-name_**.azureblob.compression.type|"none"|type of compression to be used before uploading blocks. Can be "none" or "gzip".|
+|systems.**_system-name_**.azureblob.maxFlushThresholdSize|10485760 (10 MB)|max size of the uncompressed block to be uploaded in bytes. Maximum size allowed by Azure is 100MB.|
+|systems.**_system-name_**.azureblob.maxBlobSize|Long.MAX_VALUE (unlimited)|max size of the uncompressed blob in bytes.If default value then size is unlimited capped only by Azure BlockBlob size of 4.75 TB (100 MB per block X 50,000 blocks).|
+|systems.**_system-name_**.azureblob.maxMessagesPerBlob|Long.MAX_VALUE (unlimited)|max number of messages per blob.|
+|systems.**_system-name_**.azureblob.threadPoolCount|2|number of threads for the asynchronous uploading of blocks.|
+|systems.**_system-name_**.azureblob.blockingQueueSize|Thread Pool Count * 2|size of the queue to hold blocks ready to be uploaded by asynchronous threads.If all threads are busy uploading then blocks are queued and if queue is full then main thread will start uploading which will block processing of incoming messages.|
+|systems.**_system-name_**.azureblob.flushTimeoutMs|3 mins|timeout to finish uploading all blocks before committing a blob.|
+|systems.**_system-name_**.azureblob.closeTimeoutMs|5 mins|timeout to finish committing all the blobs currently being written to. This does not include the flush timeout per blob.|

Review comment:
Minor: same as above regarding using the actual milliseconds value
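The draft describes threadPoolCount and blockingQueueSize as three things. There is a fixed set of uploader threads and a bounded queue of pending blocks. Once both are full, the main thread uploads the block itself. The JDK's `ThreadPoolExecutor` provides this pattern with an `ArrayBlockingQueue` and `ThreadPoolExecutor.CallerRunsPolicy`. The following is a sketch of that pattern only. It is not taken from the Samza producer, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch (not Samza's implementation) of the upload behavior described for
// threadPoolCount and blockingQueueSize.
final class BlockUploadPool {
  static ThreadPoolExecutor create(int threadPoolCount, int blockingQueueSize) {
    return new ThreadPoolExecutor(
        threadPoolCount,                                  // core uploader threads
        threadPoolCount,                                  // no extra threads beyond the core
        0L, TimeUnit.MILLISECONDS,                        // keep-alive is irrelevant for core threads
        new ArrayBlockingQueue<>(blockingQueueSize),      // bounded queue of pending blocks
        new ThreadPoolExecutor.CallerRunsPolicy());       // queue full: the submitting thread uploads
  }

  // Defaults quoted in the draft docs: 2 threads and a queue of threadPoolCount * 2.
  static ThreadPoolExecutor createWithDefaults() {
    int threadPoolCount = 2;
    return create(threadPoolCount, threadPoolCount * 2);
  }
}
```

A rejected block runs on the submitting thread, so that thread cannot accept new messages until the upload finishes. This is the blocking of incoming messages that the blockingQueueSize description warns about.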
[GitHub] [samza] cameronlee314 commented on issue #1325: SAMZA-2491: log uncaught exceptions in JC
cameronlee314 commented on issue #1325: SAMZA-2491: log uncaught exceptions in JC
URL: https://github.com/apache/samza/pull/1325#issuecomment-602913662

Looks like your checkstyle failed in CI.
[GitHub] [samza] alnzng commented on issue #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public
alnzng commented on issue #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public
URL: https://github.com/apache/samza/pull/1326#issuecomment-602847934

@MabelYC It seems you are looking for the same functionality this PR provides. Can you please help take a look at it?
[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change the scope of related classes as public
[ https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Zhang updated SAMZA-2492:
--
Description:
In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan JSON and later the config `samza.internal.execution.plan` will use this string as value.
However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan JSON string. This function is useful when you need to parse information from `samza.internal.execution.plan`.
In this ticket, two changes will be made:
# Add new deserialize function `toJobGraphJson()`
# Change the package scope of class JobGraphJsonGenerator as public

was:
In Samza core, `JobGraphJsonGenerator` provides the function to serliaze `JobGraphJson` as plan json and later the config `samza.internal.execution.plan` will use this string as value.
However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan json string. This function is useful when you need to pase information from `samza.internal.execution.plan`.
In this ticket, there are two changes will be done here:
# Add new deserialize function `toJobGraphJson()`
# Change the package scope of class JobGraphJsonGenerator as public

> Add new deserialize function for JobGraphJson and change the scope of related classes as public
> ---
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
> Issue Type: Task
> Reporter: Alan Zhang
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan JSON and later the config `samza.internal.execution.plan` will use this string as value.
> However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan JSON string. This function is useful when you need to parse information from `samza.internal.execution.plan`.
> In this ticket, two changes will be made:
> # Add new deserialize function `toJobGraphJson()`
> # Change the package scope of class JobGraphJsonGenerator as public

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change the scope of related classes as public
[ https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Zhang updated SAMZA-2492:
--
Summary: Add new deserialize function for JobGraphJson and change the scope of related classes as public (was: Add new deserialize function for JobGraphJson and change related classes' scope as public)

> Add new deserialize function for JobGraphJson and change the scope of related classes as public
> ---
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
> Issue Type: Task
> Reporter: Alan Zhang
> Priority: Major
>
> In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan json and later the config `samza.internal.execution.plan` will use this string as value.
> However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan json string. This function is useful when you need to parse information from `samza.internal.execution.plan`.
> In this ticket, two changes will be made:
> # Add new deserialize function `toJobGraphJson()`
> # Change the package scope of class JobGraphJsonGenerator as public
[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change related classes' scope as public
[ https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Zhang updated SAMZA-2492:
--
Summary: Add new deserialize function for JobGraphJson and change related classes' scope as public (was: Add new deserialize function for JobGraphJson and make JobGraphJsonGenerator as public)

> Add new deserialize function for JobGraphJson and change related classes' scope as public
> -
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
> Issue Type: Task
> Reporter: Alan Zhang
> Priority: Major
>
> In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan json and later the config `samza.internal.execution.plan` will use this string as value.
> However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan json string. This function is useful when you need to parse information from `samza.internal.execution.plan`.
> In this ticket, two changes will be made:
> # Add new deserialize function `toJobGraphJson()`
> # Change the package scope of class JobGraphJsonGenerator as public
[GitHub] [samza] alnzng opened a new pull request #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public
alnzng opened a new pull request #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public
URL: https://github.com/apache/samza/pull/1326

Symptom
In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan JSON, and the config `samza.internal.execution.plan` later uses this string as its value. However, it doesn't provide a deserialize function to generate `JobGraphJson` from the plan JSON string. This function is useful when you need to parse information from the config `samza.internal.execution.plan`.

Changes
In this PR, two changes are made to support `JobGraphJson` deserialization:
1. Add new deserialize function `toJobGraphJson()`
2. Change the package scope of related classes to `public`

Tests
- [ ] All unit tests and integration tests pass

API Changes
None

Upgrade Instructions
None

Usage Instructions
None
[GitHub] [samza] lhaiesp opened a new pull request #1325: SAMZA-2491: log uncaught exceptions in JC
lhaiesp opened a new pull request #1325: SAMZA-2491: log uncaught exceptions in JC
URL: https://github.com/apache/samza/pull/1325

AM should log uncaught exceptions and call System.exit to ensure that the process dies on errors
[jira] [Created] (SAMZA-2492) Add new deserialize function for JobGraphJson and make JobGraphJsonGenerator as public
Alan Zhang created SAMZA-2492:
-
Summary: Add new deserialize function for JobGraphJson and make JobGraphJsonGenerator as public
Key: SAMZA-2492
URL: https://issues.apache.org/jira/browse/SAMZA-2492
Project: Samza
Issue Type: Task
Reporter: Alan Zhang

In Samza core, `JobGraphJsonGenerator` provides the function to serialize `JobGraphJson` as plan json and later the config `samza.internal.execution.plan` will use this string as value.
However, it doesn't provide the deserialize function to help generate `JobGraphJson` from plan json string. This function is useful when you need to parse information from `samza.internal.execution.plan`.
In this ticket, two changes will be made:
# Add new deserialize function `toJobGraphJson()`
# Change the package scope of class JobGraphJsonGenerator as public
[jira] [Created] (SAMZA-2491) AM should log uncaught exceptions and System.exit to ensure that the process dies on errors
Hai Lu created SAMZA-2491:

Summary: AM should log uncaught exceptions and System.exit to ensure that the process dies on errors
Key: SAMZA-2491
URL: https://issues.apache.org/jira/browse/SAMZA-2491
Project: Samza
Issue Type: Improvement
Reporter: Hai Lu
Assignee: Hai Lu

From: pmaheshw

Symptom: A job deployment timed out waiting for the application attempt to transition from New to Running.

Cause: ClusterBasedJobCoordinator threw an exception during startup due to a misconfiguration, but did not kill the AM process (likely because of non-daemon threads).

Suggested fixes:

1. ClusterBasedJobCoordinator#main does not use an uncaught exception handler, and does not catch and log any exceptions thrown from the ClusterBasedJobCoordinator constructor or from run(). We should fix this. Uncaught exceptions go to stderr instead of the logs and have no timestamp, which makes debugging difficult. E.g.:

Exception in thread "main" org.apache.samza.SamzaException: Cannot get systemAdmin for system aggregate-tracking
	at org.apache.samza.system.SystemAdmins.getSystemAdmin(SystemAdmins.java:63)
	at org.apache.samza.system.StreamMetadataCache$$anonfun$3.apply(StreamMetadataCache.scala:66)
	at org.apache.samza.system.StreamMetadataCache$$anonfun$3.apply(StreamMetadataCache.scala:64)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.samza.system.StreamMetadataCache.getStreamMetadata(StreamMetadataCache.scala:64)
	at org.apache.samza.coordinator.StreamPartitionCountMonitor.getMetadata(StreamPartitionCountMonitor.java:92)
	at org.apache.samza.coordinator.StreamPartitionCountMonitor.<init>(StreamPartitionCountMonitor.java:113)
	at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.getPartitionCountMonitor(ClusterBasedJobCoordinator.java:343)
	at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.<init>(ClusterBasedJobCoordinator.java:207)
	at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:441)

2. The JC should call System.exit when main returns (cleanly or on exception) and from the uncaught exception handler, to ensure that the AM process dies on these errors and does not leave the deployment hanging. We've also seen this issue caused by client libraries (datavault, brooklin, kafka, etc.) that create non-daemon threads and do not stop them cleanly. See LocalContainerRunner for reference, which does kill the process when the main thread returns. In this case, for example, it's threads like this one:

"AsyncHttpClient-27-1" #134 prio=5 os_prio=0 tid=0x7faead675000 nid=0x4151 runnable [0x7fae9c9da000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0xfe6a2f40> (a com.linkedin.mario.shaded.io.netty.channel.nio.SelectedSelectionKeySet)
	- locked <0xfe6fe9c0> (a java.util.Collections$UnmodifiableSet)
	- locked <0xfe6a3f68> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at com.linkedin.mario.shaded.io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
	at com.linkedin.mario.shaded.io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:824)
	at com.linkedin.mario.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
	at com.linkedin.mario.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
	at com.linkedin.mario.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at com.linkedin.mario.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
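The two fixes suggested in SAMZA-2491 can be sketched in plain Java. This is a minimal illustration, not the actual patch: it assumes java.util.logging in place of Samza's own logging setup, and the class and method names (CoordinatorMainSketch, runAndGetExitCode, lingeringNonDaemonThreads) are hypothetical. The Runnable passed to runAndGetExitCode stands in for constructing ClusterBasedJobCoordinator and calling run().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of an AM entry point; not Samza's actual ClusterBasedJobCoordinator#main.
final class CoordinatorMainSketch {
  private static final Logger LOG = Logger.getLogger(CoordinatorMainSketch.class.getName());

  private CoordinatorMainSketch() {}

  /** Fix 1 and 2: log any uncaught exception (with timestamp, via the logger), then kill the process. */
  static void installExitingUncaughtExceptionHandler() {
    Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
      LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
      System.exit(1);
    });
  }

  /** Fix 1: run the coordinator body, logging any failure instead of leaving it to stderr. */
  static int runAndGetExitCode(Runnable coordinatorBody) {
    try {
      coordinatorBody.run();
      return 0;
    } catch (Throwable error) {
      LOG.log(Level.SEVERE, "Job coordinator failed", error);
      return 1;
    }
  }

  /** Names of live non-daemon threads other than the caller; any of these keeps the JVM alive. */
  static List<String> lingeringNonDaemonThreads() {
    List<String> names = new ArrayList<>();
    Thread current = Thread.currentThread();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t != current && t.isAlive() && !t.isDaemon()) {
        names.add(t.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    installExitingUncaughtExceptionHandler();
    int exitCode = runAndGetExitCode(() -> {
      // Hypothetical placeholder: construct the coordinator and call run() here.
    });
    List<String> lingering = lingeringNonDaemonThreads();
    if (!lingering.isEmpty()) {
      LOG.warning("Non-daemon threads still running at exit: " + lingering);
    }
    // Fix 2: exit explicitly (cleanly or on error) so lingering threads cannot keep the AM alive.
    System.exit(exitCode);
  }
}
```

The explicit System.exit at the end of main is the key design choice: it follows what the issue says LocalContainerRunner already does, and it terminates the JVM even when a client library has left a non-daemon thread, such as "AsyncHttpClient-27-1" in the dump above, running. Logging the lingering thread names first is an optional extra that makes such leaks easier to trace.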
[CONF] Apache Samza > Samza Enhancement Proposal
There's 1 new edit on this page: Samza Enhancement Proposal. Cameron Lee edited this page.

Here's what changed:

...
SEP | Status | Link to Discussion Thread | Related JIRA
SEP-1: Semantics of ProcessorId in Samza | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/browser | SAMZA-1126
SEP-2: ApplicationRunner Design | Discuss | | SAMZA-1130
SEP-3: Heart-beat mechanism between JobCoordinator and all running containers | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201705.mbox/%3CCANxwKLaVro6MBvUJW2RvoNLDO9-G87Y3Ox%2B5W66K_CxBqeVfgQ%40mail.gmail.com%3E | SAMZA-871
SEP-4: Adjunct Data Store for Unbounded Datasets | Discuss | http://mail-archives.apache.org/mod_mbox/samza-dev/201705.mbox/browser | SAMZA-1278
SEP-5: Enable partition expansion of input streams | Accepted | | SAMZA-1293
SEP-6: Support Control Message Across Intermediate Streams | Discuss | | SAMZA-1260
SEP-7: Samza on Azure | Discuss | | SAMZA-1373
SEP-8: Add in-memory system consumer & producer | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201708.mbox/%3ccag2vmjhyxckqn4k+pst83ocpjc1r79mapsqy-vnkbbpubue...@mail.gmail.com%3E | SAMZA-1395
SEP-9: Add a Kinesis SystemConsumer and SystemProducer | | |
SEP-10: Exactly-once Processing in Samza | | |
SEP-11: Host affinity in standalone | Accepted | | SAMZA-1554
SEP-12: Integration Test Framework | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201805.mbox/%3CDM5PR21MB02827A6FA9F47CB8EF99A339A2810%40DM5PR21MB0282.namprd21.prod.outlook.com%3E | SAMZA-1629
SEP-13: unify high- and low-level user applications in YARN and standalone | Accepted | https://lists.apache.org/thread.html/5be324d239633f7433525b0e52f0377ad2f8c25787eefba1b96a492c@%3Cdev.samza.apache.org%3E | SAMZA-1789
SEP-14: System and Stream Descriptors | Accepted | | SAMZA-1804
SEP-15: New Runtime Context API | Accepted | | SAMZA-1714
SEP-16: Extend ExecutionPlanner to Support Stream-Table Join | Accepted | | SAMZA-1889
SEP-17: Samza SQL Shell | Accepted | |
SEP-18: Startpoints - Manipulating Starting Offsets for Input Streams | Accepted | | SAMZA-1983
SEP-19: Hot standby state for Samza applications | Discuss | |
SEP-20: Samza on Kubernetes | Accepted | |
SEP-21: Samza Async API for High Level | Accepted | | SAMZA-2055
SEP-22: Container Placements in Samza | Discuss | |
SEP-23: Simplify Job Runner | Accepted | |
SEP-24: Cluster-based Job Coordinator Dependency Isolation | Accepted (changed from Discuss) | https://mail-archives.apache.org/mod_mbox/samza-dev/202003.mbox/%3CCABbqq3yEdAXiiTzXtO%2B%3DUnCBvxFR-5q_aJx3uqcQOoxF7M09vQ%40mail.gmail.com%3E |
SEP-25: PR Title And Description Guidelines | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E |
SEP-26: Azure Blob Storage Producer | Discuss | | SAMZA-2421
SEP-27: Side Inputs for Local Stores | Discuss | | SAMZA-1773
...

This message was sent by Atlassian Confluence 7.1.2
[samza] branch master updated (74261aa -> a8eee8b)
This is an automated email from the ASF dual-hosted git repository.

cameronlee pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/samza.git.

    from 74261aa  [Minor] bump version to 1.5.0 since Samza 1.4 has been released (#1322)
     add a8eee8b  SAMZA-2481: Improve memory usage in samza-core unit tests (#1324)

No new revisions were added by this update.

Summary of changes:
 build.gradle                      | 8 ++--
 gradle/dependency-versions.gradle | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)
[GitHub] [samza] cameronlee314 merged pull request #1324: SAMZA-2481: Improve memory usage in samza-core unit tests
cameronlee314 merged pull request #1324: SAMZA-2481: Improve memory usage in samza-core unit tests
URL: https://github.com/apache/samza/pull/1324