[GitHub] [samza] cameronlee314 commented on issue #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on issue #1323: Add docs for configs of Azure Blob 
SystemProducer 
URL: https://github.com/apache/samza/pull/1323#issuecomment-602931582
 
 
   FYI, in case you didn't know, you can test what your changes look like by 
following `docs/README.md`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396830826
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true 
then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then 
port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure 
request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully
 qualified class name of the 
`org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the 
system producer.The default writer creates blobs that are of type AVRO 
and require the messages sent to a blob to be AVRO records. The blobs created 
by the default writer are of type [Block 
Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All
 the following configs are relevant to this default writer.|
+|systems.**_system-name_**.azureblob.compression.type|"none"|type of 
compression to be used before uploading blocks. Can be "none" or "gzip".|
+|systems.**_system-name_**.azureblob.maxFlushThresholdSize|10485760 (10 
MB)|max size of the uncompressed block to be uploaded in bytes. Maximum size 
allowed by Azure is 100MB.|
+|systems.**_system-name_**.azureblob.maxBlobSize|Long.MAX_VALUE 
(unlimited)|max size of the uncompressed blob in bytes.If default value 
then size is unlimited capped only by Azure BlockBlob size of  4.75 TB (100 MB 
per block X 50,000 blocks).|
 
 Review comment:
   Minor: extra space before `4.75TB`
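
   The size cap quoted in that description can be checked with a little arithmetic. The sketch below is only an illustration: it assumes binary megabytes (1 MB = 1024 x 1024 bytes), which the diff does not state, and the class and method names are made up for this example. Under that assumption the product comes out at roughly 4.77, close to the 4.75 TB the description quotes.

```java
final class BlockBlobLimitSketch {
  // Assumption: "MB" means 1024 * 1024 bytes; the diff does not say which unit is meant.
  static final long BYTES_PER_MB = 1024L * 1024L;
  // 100 MB per block and 50,000 blocks per blob are the figures quoted in the description.
  static final long MAX_BLOCK_BYTES = 100L * BYTES_PER_MB;
  static final long MAX_BLOCKS = 50_000L;

  /** Largest blob size in bytes implied by the two limits above. */
  static long maxBlobBytes() {
    return MAX_BLOCK_BYTES * MAX_BLOCKS;
  }

  /** The same limit expressed in binary terabytes (1024^4 bytes). */
  static double maxBlobTerabytes() {
    return maxBlobBytes() / (double) (BYTES_PER_MB * BYTES_PER_MB);
  }

  public static void main(String[] args) {
    System.out.println(maxBlobBytes() + " bytes, about " + maxBlobTerabytes() + " TB");
  }
}
```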




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396833814
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true 
then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then 
port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure 
request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully
 qualified class name of the 
`org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the 
system producer.The default writer creates blobs that are of type AVRO 
and require the messages sent to a blob to be AVRO records. The blobs created 
by the default writer are of type [Block 
Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All
 the following configs are relevant to this default writer.|
 
 Review comment:
   Regarding "All the following configs are relevant to this default writer.": 
The following configs apply to other writers too, right? The wording kind of 
makes it sound like the following configs won't apply to a non-default writer. 
Can you please clarify that a little bit (or maybe you can just remove that 
sentence)?




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396829354
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
 
 Review comment:
   Minor: It looks like other parts of this documentation use `false` instead 
of `"false"`.




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396830013
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true 
then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then 
port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure 
request will be logged as a warning.|
 
 Review comment:
   Minor: For consistency, maybe put the actual milliseconds number. You can 
put `30s` in parentheses or as a note in the Description part.




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396832610
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true 
then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then 
port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure 
request will be logged as a warning.|
 
 Review comment:
   Can you please clarify the description? I think the usage of the term 
"duration" might be overloaded. Do you mean that if the Azure request takes 30s 
to complete, then it will be logged?




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396828747
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
 
 Review comment:
   Minor: The `**__system-name__**` part looks a little inconsistent with the 
other sections (which use `systems.*.samza.factory`).
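
   For readers unfamiliar with how the per-system keys in this section expand, here is a minimal sketch that builds the factory key and the two required keys for one system name. The key patterns and the factory class name come from the diff above; the helper class, the method name, and the placeholder values (`my-container`, `my-account`, `my-key`) are hypothetical and not part of Samza.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AzureBlobConfigSketch {
  // Factory class name as given in the documentation diff.
  static final String FACTORY = "org.apache.samza.system.azureblob.AzureBlobSystemFactory";

  /** Expands the documented key patterns for a single system name. */
  static Map<String, String> requiredConfigs(String systemName, String accountName, String accountKey) {
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("systems." + systemName + ".samza.factory", FACTORY);
    configs.put("sensitive.systems." + systemName + ".azureblob.account.name", accountName);
    configs.put("sensitive.systems." + systemName + ".azureblob.account.key", accountKey);
    return configs;
  }

  public static void main(String[] args) {
    // Placeholder values only; per the diff, the system name doubles as the Azure container name.
    requiredConfigs("my-container", "my-account", "my-key")
        .forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```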




[GitHub] [samza] cameronlee314 commented on a change in pull request #1323: Add docs for configs of Azure Blob SystemProducer

2020-03-23 Thread GitBox
cameronlee314 commented on a change in pull request #1323: Add docs for configs 
of Azure Blob SystemProducer 
URL: https://github.com/apache/samza/pull/1323#discussion_r396831076
 
 

 ##
 File path: docs/learn/documentation/versioned/jobs/samza-configurations.md
 ##
 @@ -245,6 +246,34 @@ Configs for producing to 
[ElasticSearch](https://www.elastic.co/products/elastic
 |systems.**_system-name_**.bulk.flush.max.size.mb|5|The maximum aggregate 
size of messages in the buffered before flushing.|
 |systems.**_system-name_**.bulk.flush.interval.ms|never|How often buffered 
messages should be flushed.|
 
+ [3.7 Azure Blob 
Storage](#azure-blob-storage)
+Configs for producing to [Azure Blob 
Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). This 
section applies if you have set systems.**__system-name__**.samza.factory = 
`org.apache.samza.system.azureblob.AzureBlobSystemFactory`.
+**_system-name_** is the Azure container name you want to produce blobs to. If 
such a container does not exist then it is created. 
+
+|Name|Default|Description|
+|--- |--- |--- |
+|sensitive.systems.**_system-name_**.azureblob.account.name| |__Required:__ 
The Azure account name to which the Azure container belongs to. |
+|sensitive.systems.**_system-name_**.azureblob.account.key| |__Required:__ Key 
for the Azure account specified above.|
+
+ [Advanced Azure Blob Storage 
Configurations](#advanced-azure-blob-storage)
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.azureblob.proxy.use |"false"|if true, proxy will be 
used to connect to Azure.|
+|systems.**_system-name_**.azureblob.proxy.hostname| |if proxy.use is true 
then host name of proxy.|
+|systems.**_system-name_**.azureblob.proxy.port| |if proxy.use is true then 
port of proxy.|
+|samza.azureblob.log.slowRequestMs|30 secs|The duration after which an Azure 
request will be logged as a warning.|
+|systems.**_system-name_**.azureblob.writer.factory.class|`org.apache.samza.system.``azureblob.avro.``AzureBlobAvroWriterFactory`|Fully
 qualified class name of the 
`org.apache.samza.system.azureblob.producer.AzureBlobWriter` impl for the 
system producer.The default writer creates blobs that are of type AVRO 
and require the messages sent to a blob to be AVRO records. The blobs created 
by the default writer are of type [Block 
Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs).All
 the following configs are relevant to this default writer.|
+|systems.**_system-name_**.azureblob.compression.type|"none"|type of 
compression to be used before uploading blocks. Can be "none" or "gzip".|
+|systems.**_system-name_**.azureblob.maxFlushThresholdSize|10485760 (10 
MB)|max size of the uncompressed block to be uploaded in bytes. Maximum size 
allowed by Azure is 100MB.|
+|systems.**_system-name_**.azureblob.maxBlobSize|Long.MAX_VALUE 
(unlimited)|max size of the uncompressed blob in bytes.If default value 
then size is unlimited capped only by Azure BlockBlob size of  4.75 TB (100 MB 
per block X 50,000 blocks).|
+|systems.**_system-name_**.azureblob.maxMessagesPerBlob|Long.MAX_VALUE 
(unlimited)|max number of messages per blob.|
+|systems.**_system-name_**.azureblob.threadPoolCount|2|number of threads for 
the asynchronous uploading of blocks.|
+|systems.**_system-name_**.azureblob.blockingQueueSize|Thread Pool Count * 
2|size of the queue to hold blocks ready to be uploaded by asynchronous 
threads.If all threads are busy uploading then blocks are queued and if 
queue is full then main thread will start uploading which will block processing 
of incoming messages.|
+|systems.**_system-name_**.azureblob.flushTimeoutMs|3 mins|timeout to finish 
uploading all blocks before committing a blob.|
+|systems.**_system-name_**.azureblob.closeTimeoutMs|5 mins|timeout to finish 
committing all the blobs currently being written to. This does not include the 
flush timeout per blob.|
 
 Review comment:
   Minor: same as above regarding using the actual milliseconds value
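
   For reference, the human-readable defaults in the diff convert to milliseconds as sketched below. The three durations (30 secs, 3 mins, 5 mins) are taken from the table; the class and constant names are invented for this example and are not Samza identifiers.

```java
import java.util.concurrent.TimeUnit;

final class TimeoutDefaultsSketch {
  // Defaults quoted in the table, converted to the millisecond values the configs expect.
  static final long SLOW_REQUEST_MS = TimeUnit.SECONDS.toMillis(30);  // 30000
  static final long FLUSH_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(3);  // 180000
  static final long CLOSE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(5);  // 300000

  public static void main(String[] args) {
    System.out.println("samza.azureblob.log.slowRequestMs default: " + SLOW_REQUEST_MS);
    System.out.println("azureblob.flushTimeoutMs default: " + FLUSH_TIMEOUT_MS);
    System.out.println("azureblob.closeTimeoutMs default: " + CLOSE_TIMEOUT_MS);
  }
}
```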




[GitHub] [samza] cameronlee314 commented on issue #1325: SAMZA-2491: log uncaught exceptions in JC

2020-03-23 Thread GitBox
cameronlee314 commented on issue #1325: SAMZA-2491: log uncaught exceptions in 
JC
URL: https://github.com/apache/samza/pull/1325#issuecomment-602913662
 
 
   Looks like your checkstyle failed in CI.




[GitHub] [samza] alnzng commented on issue #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public

2020-03-23 Thread GitBox
alnzng commented on issue #1326: SAMZA-2492: Add new deserialize function for 
JobGraphJson and change the scope of related classes as public
URL: https://github.com/apache/samza/pull/1326#issuecomment-602847934
 
 
   @MabelYC It seems you are looking for the same function that this PR 
provides. Could you please take a look at it?




[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change the scope of related classes as public

2020-03-23 Thread Alan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Zhang updated SAMZA-2492:
--
Description: 
In Samza core, `JobGraphJsonGenerator` provides a function to serialize 
`JobGraphJson` as plan JSON, and the config 
`samza.internal.execution.plan` later uses this string as its value.

However, it doesn't provide a deserialize function to generate 
`JobGraphJson` from a plan JSON string. Such a function is useful when you need to 
parse information from `samza.internal.execution.plan`.

This ticket makes two changes:
 # Add a new deserialize function `toJobGraphJson()`
 # Change the scope of class JobGraphJsonGenerator from package-private to public

 

  was:
In Samza core, `JobGraphJsonGenerator` provides the function to serliaze 
`JobGraphJson` as plan json and later the config 
`samza.internal.execution.plan` will use this string as value.

However, it doesn't provide the deserialize function to help generate 
`JobGraphJson` from plan json string. This function is useful when you need to 
pase information from `samza.internal.execution.plan`.

In this ticket, there are two changes will be done here:
 # Add new deserialize function `toJobGraphJson()`
 # Change the package scope of class JobGraphJsonGenerator as public

 


> Add new deserialize function for JobGraphJson and change the scope of related 
> classes as public
> ---
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
>  Issue Type: Task
>Reporter: Alan Zhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Samza core, `JobGraphJsonGenerator` provides a function to serialize 
> `JobGraphJson` as plan JSON, and the config 
> `samza.internal.execution.plan` later uses this string as its value.
> However, it doesn't provide a deserialize function to generate 
> `JobGraphJson` from a plan JSON string. Such a function is useful when you need 
> to parse information from `samza.internal.execution.plan`.
> This ticket makes two changes:
>  # Add a new deserialize function `toJobGraphJson()`
>  # Change the scope of class JobGraphJsonGenerator from package-private to public
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change the scope of related classes as public

2020-03-23 Thread Alan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Zhang updated SAMZA-2492:
--
Summary: Add new deserialize function for JobGraphJson and change the scope 
of related classes as public  (was: Add new deserialize function for 
JobGraphJson and change related classes' scope as public)

> Add new deserialize function for JobGraphJson and change the scope of related 
> classes as public
> ---
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
>  Issue Type: Task
>Reporter: Alan Zhang
>Priority: Major
>
> In Samza core, `JobGraphJsonGenerator` provides a function to serialize 
> `JobGraphJson` as plan json, and the config 
> `samza.internal.execution.plan` later uses this string as its value.
> However, it doesn't provide a deserialize function to generate 
> `JobGraphJson` from a plan json string. Such a function is useful when you need 
> to parse information from `samza.internal.execution.plan`.
> This ticket makes two changes:
>  # Add a new deserialize function `toJobGraphJson()`
>  # Change the scope of class JobGraphJsonGenerator from package-private to public
>  





[jira] [Updated] (SAMZA-2492) Add new deserialize function for JobGraphJson and change related classes' scope as public

2020-03-23 Thread Alan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SAMZA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Zhang updated SAMZA-2492:
--
Summary: Add new deserialize function for JobGraphJson and change related 
classes' scope as public  (was: Add new deserialize function for JobGraphJson 
and make JobGraphJsonGenerator as public)

> Add new deserialize function for JobGraphJson and change related classes' 
> scope as public
> -
>
> Key: SAMZA-2492
> URL: https://issues.apache.org/jira/browse/SAMZA-2492
> Project: Samza
>  Issue Type: Task
>Reporter: Alan Zhang
>Priority: Major
>
> In Samza core, `JobGraphJsonGenerator` provides a function to serialize 
> `JobGraphJson` as plan json, and the config 
> `samza.internal.execution.plan` later uses this string as its value.
> However, it doesn't provide a deserialize function to generate 
> `JobGraphJson` from a plan json string. Such a function is useful when you need 
> to parse information from `samza.internal.execution.plan`.
> This ticket makes two changes:
>  # Add a new deserialize function `toJobGraphJson()`
>  # Change the scope of class JobGraphJsonGenerator from package-private to public
>  





[GitHub] [samza] alnzng opened a new pull request #1326: SAMZA-2492: Add new deserialize function for JobGraphJson and change the scope of related classes as public

2020-03-23 Thread GitBox
alnzng opened a new pull request #1326: SAMZA-2492: Add new deserialize 
function for JobGraphJson and change the scope of related classes as public
URL: https://github.com/apache/samza/pull/1326
 
 
    Symptom
   In Samza core, `JobGraphJsonGenerator` provides the function to serialize 
`JobGraphJson` as plan JSON and later the config 
`samza.internal.execution.plan` will use this string as value.
   
   However, it doesn't provide the deserialize function to help generate 
`JobGraphJson` from plan JSON string. This function is useful when you need to 
parse information from the config `samza.internal.execution.plan`.
   
    Changes
   In this PR, there are two changes made for supporting `JobGraphJson` 
deserialization:
   
   1. Add new deserialize function `toJobGraphJson()`
   1. Change the package scope of related classes as `public`
   
    Tests
   - [ ] All unit tests and integration tests are passed
   
    API Changes
   None
   
    Upgrade Instructions
   None
   
    Usage Instructions
   None




[GitHub] [samza] lhaiesp opened a new pull request #1325: SAMZA-2491: log uncaught exceptions in JC

2020-03-23 Thread GitBox
lhaiesp opened a new pull request #1325: SAMZA-2491: log uncaught exceptions in 
JC
URL: https://github.com/apache/samza/pull/1325
 
 
   The AM should log uncaught exceptions and call `System.exit` to ensure that 
the process dies on errors.
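
   The general pattern described here can be sketched as follows. This is not the code in this PR: the class and method names are hypothetical, `java.util.logging` stands in for whatever logging the job coordinator actually uses, and the exit action is passed in as a parameter only so the sketch can be exercised without terminating the JVM.

```java
import java.util.function.IntConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

final class CoordinatorMainSketch {
  private static final Logger LOG = Logger.getLogger(CoordinatorMainSketch.class.getName());

  /** Builds a handler that logs the uncaught throwable and then invokes the exit action. */
  static Thread.UncaughtExceptionHandler loggingExitHandler(IntConsumer exit) {
    return (thread, throwable) -> {
      LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), throwable);
      exit.accept(1);
    };
  }

  /** Runs the body, logs any failure, and always invokes the exit action afterwards. */
  static void runAndExit(Runnable body, IntConsumer exit) {
    int status = 0;
    try {
      body.run();
    } catch (Throwable t) {
      LOG.log(Level.SEVERE, "Coordinator failed", t);
      status = 1;
    } finally {
      // Exiting explicitly ends the process even if non-daemon threads are still alive.
      exit.accept(status);
    }
  }

  public static void main(String[] args) {
    Thread.setDefaultUncaughtExceptionHandler(loggingExitHandler(System::exit));
    // Hypothetical placeholder for constructing and running the real coordinator.
    runAndExit(() -> LOG.info("coordinator work would run here"), System::exit);
  }
}
```

   Routing failures through the logger rather than stderr gives them timestamps, and the explicit exit keeps lingering non-daemon threads from holding the process open.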




[jira] [Created] (SAMZA-2492) Add new deserialize function for JobGraphJson and make JobGraphJsonGenerator as public

2020-03-23 Thread Alan Zhang (Jira)
Alan Zhang created SAMZA-2492:
-

 Summary: Add new deserialize function for JobGraphJson and make 
JobGraphJsonGenerator as public
 Key: SAMZA-2492
 URL: https://issues.apache.org/jira/browse/SAMZA-2492
 Project: Samza
  Issue Type: Task
Reporter: Alan Zhang


In Samza core, `JobGraphJsonGenerator` provides a function to serialize 
`JobGraphJson` as plan json, and the config 
`samza.internal.execution.plan` later uses this string as its value.

However, it doesn't provide a deserialize function to generate 
`JobGraphJson` from a plan json string. Such a function is useful when you need to 
parse information from `samza.internal.execution.plan`.

This ticket makes two changes:
 # Add a new deserialize function `toJobGraphJson()`
 # Change the scope of class JobGraphJsonGenerator from package-private to public

 





[jira] [Created] (SAMZA-2491) AM should log uncaught exceptions and System.exit to ensure that the process dies on errors

2020-03-23 Thread Hai Lu (Jira)
Hai Lu created SAMZA-2491:
-

 Summary: AM should log uncaught exceptions and System.exit to 
ensure that the process dies on errors
 Key: SAMZA-2491
 URL: https://issues.apache.org/jira/browse/SAMZA-2491
 Project: Samza
  Issue Type: Improvement
Reporter: Hai Lu
Assignee: Hai Lu


From: pmaheshw

Symptom: A job deployment timed out waiting for application attempt to 
transition from New to Running.

Cause: ClusterBasedJobCoordinator threw an exception during startup due to a 
misconfiguration, but did not kill the AM process (likely due to non-daemon 
threads).

Suggested fixes:
1. ClusterBasedJobCoordinator#main doesn't use an uncaught exception handler, 
and doesn't catch + log any exceptions thrown from ClusterBasedJobCoordinator 
constructor or from run(). We should fix this. Uncaught exceptions go to stderr 
instead of logs and do not have a timestamp, which makes debugging difficult. 
E.g.:

Exception in thread "main" org.apache.samza.SamzaException: Cannot get systemAdmin for system aggregate-tracking
    at org.apache.samza.system.SystemAdmins.getSystemAdmin(SystemAdmins.java:63)
    at org.apache.samza.system.StreamMetadataCache$$anonfun$3.apply(StreamMetadataCache.scala:66)
    at org.apache.samza.system.StreamMetadataCache$$anonfun$3.apply(StreamMetadataCache.scala:64)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.Map$Map2.foreach(Map.scala:137)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.samza.system.StreamMetadataCache.getStreamMetadata(StreamMetadataCache.scala:64)
    at org.apache.samza.coordinator.StreamPartitionCountMonitor.getMetadata(StreamPartitionCountMonitor.java:92)
    at org.apache.samza.coordinator.StreamPartitionCountMonitor.<init>(StreamPartitionCountMonitor.java:113)
    at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.getPartitionCountMonitor(ClusterBasedJobCoordinator.java:343)
    at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.<init>(ClusterBasedJobCoordinator.java:207)
    at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:441)

2. JC should call System.exit on returning from main (cleanly or on exception) 
and from the uncaught exception handler, to ensure that the AM process dies on 
these errors and does not leave the deployment hanging. We've also seen this 
issue when client libraries (datavault, brooklin, kafka etc.) create 
non-daemon threads and do not stop them cleanly. See LocalContainerRunner for 
reference, which does kill the process on returning from the main thread. In 
this case, the offending threads look like this:
"AsyncHttpClient-27-1" #134 prio=5 os_prio=0 tid=0x7faead675000 nid=0x4151 runnable [0x7fae9c9da000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0xfe6a2f40> (a com.linkedin.mario.shaded.io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0xfe6fe9c0> (a java.util.Collections$UnmodifiableSet)
    - locked <0xfe6a3f68> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at com.linkedin.mario.shaded.io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
    at com.linkedin.mario.shaded.io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:824)
    at com.linkedin.mario.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
    at com.linkedin.mario.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at com.linkedin.mario.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at com.linkedin.mario.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
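
The two suggested fixes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Samza change: `CoordinatorLauncher`, the `Coordinator` interface, and `launch` are hypothetical names standing in for `ClusterBasedJobCoordinator#main`, and the exit hook is passed in as a parameter only so the behavior can be exercised without killing the JVM.

```java
import java.util.function.IntConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CoordinatorLauncher {
    private static final Logger LOG = Logger.getLogger(CoordinatorLauncher.class.getName());

    /** Hypothetical stand-in for constructing and running the job coordinator. */
    public interface Coordinator {
        void run() throws Exception;
    }

    /**
     * Runs the coordinator and always finishes by calling the exit hook:
     * status 0 on a clean return, status 1 if anything was thrown.
     */
    public static void launch(Coordinator coordinator, IntConsumer exit) {
        // Fix 1: exceptions escaping any other thread are logged (with a
        // timestamp, via the logger) instead of only reaching stderr.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error);
            exit.accept(1);
        });

        int status = 0;
        try {
            coordinator.run();
        } catch (Throwable t) {
            // Fix 1: catch and log failures from construction or run().
            LOG.log(Level.SEVERE, "Coordinator failed", t);
            status = 1;
        }
        // Fix 2: exit explicitly so lingering non-daemon threads cannot keep
        // the process alive after main returns.
        exit.accept(status);
    }

    public static void main(String[] args) {
        // Simulates a client library that leaves a non-daemon thread running.
        Thread lingering = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "lingering-non-daemon");
        lingering.start();

        // Simulated misconfiguration; the process still terminates.
        launch(() -> {
            throw new IllegalStateException("simulated startup failure");
        }, System::exit);
    }
}
```

Without the explicit exit call, the `lingering-non-daemon` thread in `main` would keep the JVM running after the startup failure, which matches the hanging-deployment symptom described above.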





[CONF] Apache Samza > Samza Enhancement Proposal

2020-03-23 Thread Cameron Lee (Confluence)
There's 1 new edit on this page

Samza Enhancement Proposal

Cameron Lee edited this page

Here's what changed:
...

| SEP | Status | Link to Discussion Thread | Related JIRA |
|--- |--- |--- |--- |
| SEP-1: Semantics of ProcessorId in Samza | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/browser | SAMZA-1126 |
| SEP-2: ApplicationRunner Design | Discuss | | SAMZA-1130 |
| SEP-3: Heart-beat mechanism between JobCoordinator and all running containers | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201705.mbox/%3CCANxwKLaVro6MBvUJW2RvoNLDO9-G87Y3Ox%2B5W66K_CxBqeVfgQ%40mail.gmail.com%3E | SAMZA-871 |
| SEP-4: Adjunct Data Store for Unbounded Datasets | Discuss | http://mail-archives.apache.org/mod_mbox/samza-dev/201705.mbox/browser | SAMZA-1278 |
| SEP-5: Enable partition expansion of input streams | Accepted | | SAMZA-1293 |
| SEP-6: Support Control Message Across Intermediate Streams | Discuss | | SAMZA-1260 |
| SEP-7: Samza on Azure | Discuss | | SAMZA-1373 |
| SEP-8 Add in-memory system consumer & producer | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201708.mbox/%3ccag2vmjhyxckqn4k+pst83ocpjc1r79mapsqy-vnkbbpubue...@mail.gmail.com%3E | SAMZA-1395 |
| SEP-9 Add a Kinesis SystemConsumer and SystemProducer | | | |
| SEP-10 Exactly-once Processing in Samza | | | |
| SEP-11: Host affinity in standalone. | Accepted | | SAMZA-1554 |
| SEP-12: Integration Test Framework | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201805.mbox/%3CDM5PR21MB02827A6FA9F47CB8EF99A339A2810%40DM5PR21MB0282.namprd21.prod.outlook.com%3E | SAMZA-1629 |
| SEP-13: unify high- and low-level user applications in YARN and standalone | Accepted | https://lists.apache.org/thread.html/5be324d239633f7433525b0e52f0377ad2f8c25787eefba1b96a492c@%3Cdev.samza.apache.org%3E | SAMZA-1789 |
| SEP-14: System and Stream Descriptors | Accepted | | SAMZA-1804 |
| SEP-15: New Runtime Context API | Accepted | | SAMZA-1714 |
| SEP-16: Extend ExecutionPlanner to Support Stream-Table Join | Accepted | | SAMZA-1889 |
| SEP-17: Samza SQL Shell | Accepted | | |
| SEP-18: Startpoints - Manipulating Starting Offsets for Input Streams | Accepted | | SAMZA-1983 |
| SEP-19: Hot standby state for Samza applications | Discuss | | |
| SEP-20: Samza on Kubernetes | Accepted | | |
| SEP-21: Samza Async API for High Level | Accepted | | SAMZA-2055 |
| SEP-22: Container Placements in Samza | Discuss | | |
| SEP-23: Simplify Job Runner | Accepted | | |
| SEP-24: Cluster-based Job Coordinator Dependency Isolation | Discuss -> Accepted | https://mail-archives.apache.org/mod_mbox/samza-dev/202003.mbox/%3CCABbqq3yEdAXiiTzXtO%2B%3DUnCBvxFR-5q_aJx3uqcQOoxF7M09vQ%40mail.gmail.com%3E | |
| SEP-25: PR Title And Description Guidelines | Accepted | http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E | |
| SEP-26: Azure Blob Storage Producer | Discuss | | SAMZA-2421 |
| SEP-27: Side Inputs for Local Stores | Discuss | | SAMZA-1773 |

...

This message was sent by Atlassian Confluence 7.1.2




[samza] branch master updated (74261aa -> a8eee8b)

2020-03-23 Thread cameronlee
This is an automated email from the ASF dual-hosted git repository.

cameronlee pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/samza.git.


from 74261aa  [Minor] bump version to 1.5.0 since Samza 1.4 has been 
released (#1322)
 add a8eee8b  SAMZA-2481: Improve memory usage in samza-core unit tests 
(#1324)

No new revisions were added by this update.

Summary of changes:
 build.gradle  | 8 ++--
 gradle/dependency-versions.gradle | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)



[GitHub] [samza] cameronlee314 merged pull request #1324: SAMZA-2481: Improve memory usage in samza-core unit tests

2020-03-23 Thread GitBox
cameronlee314 merged pull request #1324: SAMZA-2481: Improve memory usage in 
samza-core unit tests
URL: https://github.com/apache/samza/pull/1324
 
 
   

