[jira] [Resolved] (SAMZA-1914) out of range starting offset for EH consumer
[ https://issues.apache.org/jira/browse/SAMZA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1914. Resolution: Fixed > out of range starting offset for EH consumer > > > Key: SAMZA-1914 > URL: https://issues.apache.org/jira/browse/SAMZA-1914 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > > In EventHubs today we use offset + 1 as the next offset. This would cause > problem if the consumer restarts and there is no event produced into the > server. Because then offset + 1 would be an out of range offset and eventhubs > client would fail to initialize. > The proposed fix here is to always use offset (+0) as the "next" offset. And > when we initialize the EH client, we specify to exclude the offset that we > provide. This way we won't be reprocessing the last event and we avoid out of > range offset error. > We need to be careful that this doesn't cause Brooklin to miss event; so > should also double check the EH connector in Brooklin as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1914) out of range starting offset for EH consumer
Hai created SAMZA-1914: -- Summary: out of range starting offset for EH consumer Key: SAMZA-1914 URL: https://issues.apache.org/jira/browse/SAMZA-1914 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai In EventHubs today we use offset + 1 as the next offset. This would cause problem if the consumer restarts and there is no event produced into the server. Because then offset + 1 would be an out of range offset and eventhubs client would fail to initialize. The proposed fix here is to always use offset (+0) as the "next" offset. And when we initialize the EH client, we specify to exclude the offset that we provide. This way we won't be reprocessing the last event and we avoid out of range offset error. We need to be careful that this doesn't cause Brooklin to miss event; so should also double check the EH connector in Brooklin as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1870) HDFS system admin not handling END_OF_STREAM offset
[ https://issues.apache.org/jira/browse/SAMZA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1870. Resolution: Fixed > HDFS system admin not handling END_OF_STREAM offset > --- > > Key: SAMZA-1870 > URL: https://issues.apache.org/jira/browse/SAMZA-1870 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > > This happens particularly when using HDFS as a bootstrap stream: > org.apache.samza.SamzaException: Invalid offset for MultiFileHdfsReader: > END_OF_STREAM > at > org.apache.samza.system.hdfs.reader.MultiFileHdfsReader.getCurFileIndex(MultiFileHdfsReader.java:64) > at > org.apache.samza.system.hdfs.HdfsSystemAdmin.offsetComparator(HdfsSystemAdmin.java:224) > at > org.apache.samza.system.chooser.BootstrappingChooser.org$apache$samza$system$chooser$BootstrappingChooser$$checkOffset(BootstrappingChooser.scala:274) > at > org.apache.samza.system.chooser.BootstrappingChooser.choose(BootstrappingChooser.scala:204) > at > org.apache.samza.system.chooser.DefaultChooser.choose(DefaultChooser.scala:294) > at org.apache.samza.system.SystemConsumers.choose(SystemConsumers.scala:210) > at org.apache.samza.task.AsyncRunLoop.chooseEnvelope(AsyncRunLoop.java:208) > at org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:156) > at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:787) > at > org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:101) > at > org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:148) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1870) HDFS system admin not handling END_OF_STREAM offset
Hai created SAMZA-1870: -- Summary: HDFS system admin not handling END_OF_STREAM offset Key: SAMZA-1870 URL: https://issues.apache.org/jira/browse/SAMZA-1870 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai This happens particularly when using HDFS as a bootstrap stream: org.apache.samza.SamzaException: Invalid offset for MultiFileHdfsReader: END_OF_STREAM at org.apache.samza.system.hdfs.reader.MultiFileHdfsReader.getCurFileIndex(MultiFileHdfsReader.java:64) at org.apache.samza.system.hdfs.HdfsSystemAdmin.offsetComparator(HdfsSystemAdmin.java:224) at org.apache.samza.system.chooser.BootstrappingChooser.org$apache$samza$system$chooser$BootstrappingChooser$$checkOffset(BootstrappingChooser.scala:274) at org.apache.samza.system.chooser.BootstrappingChooser.choose(BootstrappingChooser.scala:204) at org.apache.samza.system.chooser.DefaultChooser.choose(DefaultChooser.scala:294) at org.apache.samza.system.SystemConsumers.choose(SystemConsumers.scala:210) at org.apache.samza.task.AsyncRunLoop.chooseEnvelope(AsyncRunLoop.java:208) at org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:156) at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:787) at org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:101) at org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:148) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1708) investigate timeout exception in eventhubs client during shutdown
[ https://issues.apache.org/jira/browse/SAMZA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1708: --- Summary: investigate timeout exception in eventhubs client during shutdown (was: investigate shutdown timeout on eventhubs producer) > investigate timeout exception in eventhubs client during shutdown > - > > Key: SAMZA-1708 > URL: https://issues.apache.org/jira/browse/SAMZA-1708 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > > We oftentimes see timeout when trying to shutdown consumer/producer because > the EH client times out when closing. Need to investigate the root cause and > fix it or push a fix in the EH client. > java.util.concurrent.TimeoutException > java.util.concurrent.CompletableFuture.timedGet - Line 1771 > (CompletableFuture.java) java.util.concurrent.CompletableFuture.get - Line > 1915 (CompletableFuture.java) > org.apache.samza.system.eventhub.consumer.EventHubSystemConsumer$1.run - Line > 442 (EventHubSystemConsumer.java) > java.util.concurrent.Executors$RunnableAdapter.call - Line 511 > (Executors.java) java.util.concurrent.FutureTask.run - Line 266 > (FutureTask.java) java.util.concurrent.ThreadPoolExecutor.runWorker - Line > 1142 (ThreadPoolExecutor.java) > java.util.concurrent.ThreadPoolExecutor$Worker.run - Line 617 > (ThreadPoolExecutor.java) java.lang.Thread.run - Line 745 (Thread.java) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1708) investigate shutdown timeout on eventhubs producer
[ https://issues.apache.org/jira/browse/SAMZA-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1708: --- Description: We oftentimes see timeout when trying to shutdown consumer/producer because the EH client times out when closing. Need to investigate the root cause and fix it or push a fix in the EH client. java.util.concurrent.TimeoutException java.util.concurrent.CompletableFuture.timedGet - Line 1771 (CompletableFuture.java) java.util.concurrent.CompletableFuture.get - Line 1915 (CompletableFuture.java) org.apache.samza.system.eventhub.consumer.EventHubSystemConsumer$1.run - Line 442 (EventHubSystemConsumer.java) java.util.concurrent.Executors$RunnableAdapter.call - Line 511 (Executors.java) java.util.concurrent.FutureTask.run - Line 266 (FutureTask.java) java.util.concurrent.ThreadPoolExecutor.runWorker - Line 1142 (ThreadPoolExecutor.java) java.util.concurrent.ThreadPoolExecutor$Worker.run - Line 617 (ThreadPoolExecutor.java) java.lang.Thread.run - Line 745 (Thread.java) was:We oftentimes see timeout when trying to shutdown consumer/producer because the EH client times out when closing. Need to investigate the root cause and fix it or push a fix in the EH client. > investigate shutdown timeout on eventhubs producer > -- > > Key: SAMZA-1708 > URL: https://issues.apache.org/jira/browse/SAMZA-1708 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > > We oftentimes see timeout when trying to shutdown consumer/producer because > the EH client times out when closing. Need to investigate the root cause and > fix it or push a fix in the EH client. > java.util.concurrent.TimeoutException > java.util.concurrent.CompletableFuture.timedGet - Line 1771 > (CompletableFuture.java) java.util.concurrent.CompletableFuture.get - Line > 1915 (CompletableFuture.java) > org.apache.samza.system.eventhub.consumer.EventHubSystemConsumer$1.run - Line > 442 (EventHubSystemConsumer.java) > java.util.concurrent.Executors$RunnableAdapter.call - Line 511 > (Executors.java) java.util.concurrent.FutureTask.run - Line 266 > (FutureTask.java) java.util.concurrent.ThreadPoolExecutor.runWorker - Line > 1142 (ThreadPoolExecutor.java) > java.util.concurrent.ThreadPoolExecutor$Worker.run - Line 617 > (ThreadPoolExecutor.java) java.lang.Thread.run - Line 745 (Thread.java) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1794) setting application acl in launch context for secured YARN cluster
[ https://issues.apache.org/jira/browse/SAMZA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1794. Resolution: Fixed > setting application acl in launch context for secured YARN cluster > -- > > Key: SAMZA-1794 > URL: https://issues.apache.org/jira/browse/SAMZA-1794 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > > Currently we don't set application acl for container launch context. See > [https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)] > This could potentially cause problem if samza job is running on a secured > YARN cluster. Say user A submits the job, then by default only user A can > view the log and the status of the job. Even worse case is that user A > submits the job through some proxy account, then even user A herself/himself > couldn't access to logs/status of the application. > We need to make some changes for the YARN application submission to set > application acls in launch context as configured. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1794) setting application acl in launch context for secured YARN cluster
[ https://issues.apache.org/jira/browse/SAMZA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1794: --- Description: Currently we don't set application acl for container launch context. See [https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)] This could potentially cause problem if samza job is running on a secured YARN cluster. Say user A submits the job, then by default only user A can view the log and the status of the job. Even worse case is that user A submits the job through some proxy account, then even user A herself/himself couldn't access to logs/status of the application. We need to make some changes for the YARN application submission to set application acls in launch context as configured. was: Currently we don't set application acl for container launch context. See [https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)] This could potentially cause problem if samza job is running on a secured YARN cluster with proxy account. > setting application acl in launch context for secured YARN cluster > -- > > Key: SAMZA-1794 > URL: https://issues.apache.org/jira/browse/SAMZA-1794 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > > Currently we don't set application acl for container launch context. See > [https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)] > This could potentially cause problem if samza job is running on a secured > YARN cluster. Say user A submits the job, then by default only user A can > view the log and the status of the job. Even worse case is that user A > submits the job through some proxy account, then even user A herself/himself > couldn't access to logs/status of the application. > We need to make some changes for the YARN application submission to set > application acls in launch context as configured. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1794) setting application acl in launch context for secured YARN cluster
Hai created SAMZA-1794: -- Summary: setting application acl in launch context for secured YARN cluster Key: SAMZA-1794 URL: https://issues.apache.org/jira/browse/SAMZA-1794 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai Currently we don't set application acl for container launch context. See [https://hadoop.apache.org/docs/r2.6.4/api/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.html#setApplicationACLs(java.util.Map)] This could potentially cause problem if samza job is running on a secured YARN cluster with proxy account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1785) implement retry logic in eventhubs consumer
[ https://issues.apache.org/jira/browse/SAMZA-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1785. Resolution: Fixed > implement retry logic in eventhubs consumer > --- > > Key: SAMZA-1785 > URL: https://issues.apache.org/jira/browse/SAMZA-1785 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (SAMZA-1785) implement retry logic in eventhubs consumer
[ https://issues.apache.org/jira/browse/SAMZA-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558644#comment-16558644 ] Hai commented on SAMZA-1785: We initially found TimeoutException but later on saw different exception (CancelOperationException) which failed the K2 containers. So it seems like fixing the timeout exception wouldn't resolve the issue. We also don't know what else is going to happen in the future. So we decide to implement a retry logic in eventhubs, given the lack of retry logic in samza and the lack of nurse job on azure for K2. > implement retry logic in eventhubs consumer > --- > > Key: SAMZA-1785 > URL: https://issues.apache.org/jira/browse/SAMZA-1785 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1785) implement retry logic in eventhubs consumer
[ https://issues.apache.org/jira/browse/SAMZA-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1785: --- Summary: implement retry logic in eventhubs consumer (was: ignore timeout exception during receiver renew in eventhubs consumer) > implement retry logic in eventhubs consumer > --- > > Key: SAMZA-1785 > URL: https://issues.apache.org/jira/browse/SAMZA-1785 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1785) ignore timeout exception during receiver renew in eventhubs consumer
Hai created SAMZA-1785: -- Summary: ignore timeout exception during receiver renew in eventhubs consumer Key: SAMZA-1785 URL: https://issues.apache.org/jira/browse/SAMZA-1785 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1775) EH consumer renew too frequently under transient exception
[ https://issues.apache.org/jira/browse/SAMZA-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1775. Resolution: Fixed > EH consumer renew too frequently under transient exception > -- > > Key: SAMZA-1775 > URL: https://issues.apache.org/jira/browse/SAMZA-1775 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > > There is no delay at all before we renew the partition. This sometimes lead > to spam in the log for the following messages: > Received transient exception from EH client. Renew partition receiver for ssp > ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1775) EH consumer renew too frequently under transient exception
Hai created SAMZA-1775: -- Summary: EH consumer renew too frequently under transient exception Key: SAMZA-1775 URL: https://issues.apache.org/jira/browse/SAMZA-1775 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai There is no delay at all before we renew the partition. This sometimes lead to spam in the log for the following messages: Received transient exception from EH client. Renew partition receiver for ssp ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1741) EH consumer shutdown taking too long and causing rebalance to fail
[ https://issues.apache.org/jira/browse/SAMZA-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1741. Resolution: Fixed > EH consumer shutdown taking too long and causing rebalance to fail > -- > > Key: SAMZA-1741 > URL: https://issues.apache.org/jira/browse/SAMZA-1741 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1741) EH consumer shutdown taking too long and causing rebalance to fail
Hai created SAMZA-1741: -- Summary: EH consumer shutdown taking too long and causing rebalance to fail Key: SAMZA-1741 URL: https://issues.apache.org/jira/browse/SAMZA-1741 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1706) lazy initialization for eventhub system producer
[ https://issues.apache.org/jira/browse/SAMZA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1706. Resolution: Fixed > lazy initialization for eventhub system producer > > > Key: SAMZA-1706 > URL: https://issues.apache.org/jira/browse/SAMZA-1706 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1708) investigate shutdown timeout on eventhubs producer
Hai created SAMZA-1708: -- Summary: investigate shutdown timeout on eventhubs producer Key: SAMZA-1708 URL: https://issues.apache.org/jira/browse/SAMZA-1708 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai We oftentimes see timeout when trying to shutdown consumer/producer because the EH client times out when closing. Need to investigate the root cause and fix it or push a fix in the EH client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1706) lazy initialization for eventhub system producer
Hai created SAMZA-1706: -- Summary: lazy initialization for eventhub system producer Key: SAMZA-1706 URL: https://issues.apache.org/jira/browse/SAMZA-1706 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1624) EventHub system should prefix the configs with senstive for SasKey and SasToken
[ https://issues.apache.org/jira/browse/SAMZA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1624: --- Fix Version/s: 0.14.1 > EventHub system should prefix the configs with senstive for SasKey and > SasToken > --- > > Key: SAMZA-1624 > URL: https://issues.apache.org/jira/browse/SAMZA-1624 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1673) EventHub: readLatency metric returns negative value
[ https://issues.apache.org/jira/browse/SAMZA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1673: --- Fix Version/s: 0.14.1 > EventHub: readLatency metric returns negative value > --- > > Key: SAMZA-1673 > URL: https://issues.apache.org/jira/browse/SAMZA-1673 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1675) EventHub: Log the metadata that we are fetching from the event hubs.
[ https://issues.apache.org/jira/browse/SAMZA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1675: --- Fix Version/s: 0.14.1 > EventHub: Log the metadata that we are fetching from the event hubs. > > > Key: SAMZA-1675 > URL: https://issues.apache.org/jira/browse/SAMZA-1675 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > > Log the partitionRuntimeinformation that we are fetching and also any other > missing logging in the event hub system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1674) EventHub: Rename readLatency to consumptionLagMs
[ https://issues.apache.org/jira/browse/SAMZA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1674: --- Fix Version/s: 0.14.1 > EventHub: Rename readLatency to consumptionLagMs > > > Key: SAMZA-1674 > URL: https://issues.apache.org/jira/browse/SAMZA-1674 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > > Rename the metric readLatency to consumptionLagMs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1625) EventHub systemAdmin is swallowing exceptions
[ https://issues.apache.org/jira/browse/SAMZA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1625: --- Fix Version/s: 0.14.1 > EventHub systemAdmin is swallowing exceptions > - > > Key: SAMZA-1625 > URL: https://issues.apache.org/jira/browse/SAMZA-1625 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > > Right now eventhubsystem admin is swallowing exception and creating a new > exception rather than wrapping them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1676) miscellaneous fix and improvement for eventhubs system
[ https://issues.apache.org/jira/browse/SAMZA-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1676: --- Fix Version/s: 0.14.1 > miscellaneous fix and improvement for eventhubs system > -- > > Key: SAMZA-1676 > URL: https://issues.apache.org/jira/browse/SAMZA-1676 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > Fix For: 0.14.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1690) AsyncSystemProducer should enforce max pending sends
Hai created SAMZA-1690: -- Summary: AsyncSystemProducer should enforce max pending sends Key: SAMZA-1690 URL: https://issues.apache.org/jira/browse/SAMZA-1690 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Srinivasulu Punuru Right now AsyncSystemProducer accepts any number of AsyncSend regardless of the existing pending sends. Ideally, we should effectively block even the AsyncSend if the pending sends count or pending sends bytes reach a very high limit. In EventHubs we have seen TimeoutException if we don't do this kind of flow control. We don't such issues with Kafka producer, though. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1688) use per partition eventhubs client
[ https://issues.apache.org/jira/browse/SAMZA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1688. Resolution: Fixed > use per partition eventhubs client > -- > > Key: SAMZA-1688 > URL: https://issues.apache.org/jira/browse/SAMZA-1688 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1688) use per partition eventhubs client
Hai created SAMZA-1688: -- Summary: use per partition eventhubs client Key: SAMZA-1688 URL: https://issues.apache.org/jira/browse/SAMZA-1688 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1688) use per partition eventhubs client
[ https://issues.apache.org/jira/browse/SAMZA-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1688: --- Labels: EventHub (was: ) > use per partition eventhubs client > -- > > Key: SAMZA-1688 > URL: https://issues.apache.org/jira/browse/SAMZA-1688 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1625) EventHub systemAdmin is swallowing exceptions
[ https://issues.apache.org/jira/browse/SAMZA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1625. Resolution: Fixed > EventHub systemAdmin is swallowing exceptions > - > > Key: SAMZA-1625 > URL: https://issues.apache.org/jira/browse/SAMZA-1625 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Right now eventhubsystem admin is swallowing exception and creating a new > exception rather than wrapping them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1673) EventHub: readLatency metric returns negative value
[ https://issues.apache.org/jira/browse/SAMZA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1673. Resolution: Fixed > EventHub: readLatency metric returns negative value > --- > > Key: SAMZA-1673 > URL: https://issues.apache.org/jira/browse/SAMZA-1673 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1674) EventHub: Rename readLatency to consumptionLagMs
[ https://issues.apache.org/jira/browse/SAMZA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1674. Resolution: Fixed > EventHub: Rename readLatency to consumptionLagMs > > > Key: SAMZA-1674 > URL: https://issues.apache.org/jira/browse/SAMZA-1674 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Rename the metric readLatency to consumptionLagMs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1675) EventHub: Log the metadata that we are fetching from the event hubs.
[ https://issues.apache.org/jira/browse/SAMZA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1675. Resolution: Fixed > EventHub: Log the metadata that we are fetching from the event hubs. > > > Key: SAMZA-1675 > URL: https://issues.apache.org/jira/browse/SAMZA-1675 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Log the partitionRuntimeinformation that we are fetching and also any other > missing logging in the event hub system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1676) miscellaneous fix and improvement for eventhubs system
[ https://issues.apache.org/jira/browse/SAMZA-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1676. Resolution: Fixed > miscellaneous fix and improvement for eventhubs system > -- > > Key: SAMZA-1676 > URL: https://issues.apache.org/jira/browse/SAMZA-1676 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1625) EventHub systemAdmin is swallowing exceptions
[ https://issues.apache.org/jira/browse/SAMZA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1625: --- Issue Type: Sub-task (was: Bug) Parent: SAMZA-1676 > EventHub systemAdmin is swallowing exceptions > - > > Key: SAMZA-1625 > URL: https://issues.apache.org/jira/browse/SAMZA-1625 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Right now eventhubsystem admin is swallowing exception and creating a new > exception rather than wrapping them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1673) EventHub: readLatency metric returns negative value
[ https://issues.apache.org/jira/browse/SAMZA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1673: --- Issue Type: Sub-task (was: Bug) Parent: SAMZA-1676 > EventHub: readLatency metric returns negative value > --- > > Key: SAMZA-1673 > URL: https://issues.apache.org/jira/browse/SAMZA-1673 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1674) EventHub: Rename readLatency to consumptionLagMs
[ https://issues.apache.org/jira/browse/SAMZA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1674: --- Issue Type: Sub-task (was: Bug) Parent: SAMZA-1676 > EventHub: Rename readLatency to consumptionLagMs > > > Key: SAMZA-1674 > URL: https://issues.apache.org/jira/browse/SAMZA-1674 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Rename the metric readLatency to consumptionLagMs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1675) EventHub: Log the metadata that we are fetching from the event hubs.
[ https://issues.apache.org/jira/browse/SAMZA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1675: --- Issue Type: Sub-task (was: Bug) Parent: SAMZA-1676 > EventHub: Log the metadata that we are fetching from the event hubs. > > > Key: SAMZA-1675 > URL: https://issues.apache.org/jira/browse/SAMZA-1675 > Project: Samza > Issue Type: Sub-task >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Log the partitionRuntimeinformation that we are fetching and also any other > missing logging in the event hub system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1676) miscellaneous fix and improvement for eventhubs system
[ https://issues.apache.org/jira/browse/SAMZA-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1676: --- Labels: EventHub (was: ) > miscellaneous fix and improvement for eventhubs system > -- > > Key: SAMZA-1676 > URL: https://issues.apache.org/jira/browse/SAMZA-1676 > Project: Samza > Issue Type: Improvement >Reporter: Hai >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1676) miscellaneous fix and improvement for eventhubs system
Hai created SAMZA-1676: -- Summary: miscellaneous fix and improvement for eventhubs system Key: SAMZA-1676 URL: https://issues.apache.org/jira/browse/SAMZA-1676 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (SAMZA-1675) EventHub: Log the metadata that we are fetching from the event hubs.
[ https://issues.apache.org/jira/browse/SAMZA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai reassigned SAMZA-1675: -- Assignee: Hai > EventHub: Log the metadata that we are fetching from the event hubs. > > > Key: SAMZA-1675 > URL: https://issues.apache.org/jira/browse/SAMZA-1675 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Log the partitionRuntimeinformation that we are fetching and also any other > missing logging in the event hub system. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (SAMZA-1674) EventHub: Rename readLatency to consumptionLagMs
[ https://issues.apache.org/jira/browse/SAMZA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai reassigned SAMZA-1674: -- Assignee: Hai > EventHub: Rename readLatency to consumptionLagMs > > > Key: SAMZA-1674 > URL: https://issues.apache.org/jira/browse/SAMZA-1674 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Rename the metric readLatency to consumptionLagMs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (SAMZA-1673) EventHub: readLatency metric returns negative value
[ https://issues.apache.org/jira/browse/SAMZA-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai reassigned SAMZA-1673: -- Assignee: Hai > EventHub: readLatency metric returns negative value > --- > > Key: SAMZA-1673 > URL: https://issues.apache.org/jira/browse/SAMZA-1673 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1624) EventHub system should prefix the configs with senstive for SasKey and SasToken
[ https://issues.apache.org/jira/browse/SAMZA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1624. Resolution: Fixed > EventHub system should prefix the configs with senstive for SasKey and > SasToken > --- > > Key: SAMZA-1624 > URL: https://issues.apache.org/jira/browse/SAMZA-1624 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Priority: Major > Labels: EventHub > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (SAMZA-1625) EventHub systemAdmin is swallowing exceptions
[ https://issues.apache.org/jira/browse/SAMZA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai reassigned SAMZA-1625: -- Assignee: Hai > EventHub systemAdmin is swallowing exceptions > - > > Key: SAMZA-1625 > URL: https://issues.apache.org/jira/browse/SAMZA-1625 > Project: Samza > Issue Type: Bug >Reporter: Srinivasulu Punuru >Assignee: Hai >Priority: Major > Labels: EventHub > > Right now eventhubsystem admin is swallowing exception and creating a new > exception rather than wrapping them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1654) calculate the exact size of the AMQP message for skipping large message
Hai created SAMZA-1654: -- Summary: calculate the exact size of the AMQP message for skipping large message Key: SAMZA-1654 URL: https://issues.apache.org/jira/browse/SAMZA-1654 Project: Samza Issue Type: Improvement Reporter: Hai Assignee: Hai Today we only calculate the event size based on the body size which is not accurate. We are waiting for EventHubs client to expose the API that returns the actual eventData size. See more info below https://github.com/Azure/azure-event-hubs-java/issues/305 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1623) AvroDataFileHdfsWriter should include avro as the file suffix
[ https://issues.apache.org/jira/browse/SAMZA-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1623. Resolution: Fixed > AvroDataFileHdfsWriter should include avro as the file suffix > - > > Key: SAMZA-1623 > URL: https://issues.apache.org/jira/browse/SAMZA-1623 > Project: Samza > Issue Type: Task >Reporter: Hai >Assignee: Hai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1623) AvroDataFileHdfsWriter should include avro as the file suffix
Hai created SAMZA-1623: -- Summary: AvroDataFileHdfsWriter should include avro as the file suffix Key: SAMZA-1623 URL: https://issues.apache.org/jira/browse/SAMZA-1623 Project: Samza Issue Type: Task Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1622) avro writer in hdfs system producer to support generic record
Hai created SAMZA-1622: -- Summary: avro writer in hdfs system producer to support generic record Key: SAMZA-1622 URL: https://issues.apache.org/jira/browse/SAMZA-1622 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (SAMZA-1618) fix HdfsFileSystemAdapter to get files recursively
[ https://issues.apache.org/jira/browse/SAMZA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1618. Resolution: Fixed > fix HdfsFileSystemAdapter to get files recursively > -- > > Key: SAMZA-1618 > URL: https://issues.apache.org/jira/browse/SAMZA-1618 > Project: Samza > Issue Type: Bug >Reporter: Hai >Assignee: Hai >Priority: Major > > Right now HdfsFileSystemAdapter doesn't handle subfolder -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SAMZA-1618) fix HdfsFileSystemAdapter to get files recursively
Hai created SAMZA-1618: -- Summary: fix HdfsFileSystemAdapter to get files recursively Key: SAMZA-1618 URL: https://issues.apache.org/jira/browse/SAMZA-1618 Project: Samza Issue Type: Bug Reporter: Hai Assignee: Hai Right now HdfsFileSystemAdapter doesn't handle subfolder -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SAMZA-1471) No polling for end-of-stream partitions in SystemConsumers
[ https://issues.apache.org/jira/browse/SAMZA-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-1471: --- Labels: samzaHadoop (was: ) > No polling for end-of-stream partitions in SystemConsumers > -- > > Key: SAMZA-1471 > URL: https://issues.apache.org/jira/browse/SAMZA-1471 > Project: Samza > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Xinyu Liu >Assignee: Hai > Labels: samzaHadoop > Fix For: 0.14.0 > > > Currently when a stream reached the end, it's not removed from the empty > partition set and will be kept polling afterwards, which affects performance > a lot. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (SAMZA-1463) Disable flaky tests in hdfs
[ https://issues.apache.org/jira/browse/SAMZA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai resolved SAMZA-1463. Resolution: Fixed > Disable flaky tests in hdfs > --- > > Key: SAMZA-1463 > URL: https://issues.apache.org/jira/browse/SAMZA-1463 > Project: Samza > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Xinyu Liu >Assignee: Hai > > 1. testHdfsSystemConsumerE2E FAILED > java.lang.AssertionError: Did not receive all the events. Retry counter = > 0 expected:<303> but was:<302> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.samza.system.hdfs.TestHdfsSystemConsumer.testHdfsSystemConsumerE2E(TestHdfsSystemConsumer.java:125) > testHdfsSystemProducerWriteAvroBatches FAILED > java.io.FileNotFoundException: File > /user/jenkins/samza-hdfs-test-batch-job-avro/2017_10_11-19 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751) > at > org.apache.samza.system.hdfs.TestHdfsSystemProducerTestSuite.testHdfsSystemProducerWriteAvroBatches(TestHdfsSystemProducerTestSuite.scala:321) > > testHdfsSystemProducerBinaryWrite FAILED > java.io.FileNotFoundException: File > /user/jenkins/samza-hdfs-test-job/2017_10_11-19 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751) > at > org.apache.samza.system.hdfs.TestHdfsSystemProducerTestSuite.testHdfsSystemProducerBinaryWrite(TestHdfsSystemProducerTestSuite.scala:127) > > testHdfsSystemProducerTextWrite FAILED > java.io.FileNotFoundException: File > /user/jenkins/samza-hdfs-test-job-text/2017_10_11-19 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751) > at > org.apache.samza.system.hdfs.TestHdfsSystemProducerTestSuite.testHdfsSystemProducerTextWrite(TestHdfsSystemProducerTestSuite.scala:204) > > testHdfsSystemProducerWriteTextBatches FAILED > java.io.FileNotFoundException: File > /user/jenkins/samza-hdfs-test-batch-job-text/2017_10_11-19 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755) > at > org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751) > at > org.apache.samza.system.hdfs.TestHdfsSystemProducerTestSuite.testHdfsSystemProducerWriteTextBatches(TestHdfsSystemProducerTestSuite.scala:242) > > testHdfsSystemProducerAvroWrite FAILED > java.io.FileNotFoundException: File > /user/jenkins/samza-hdfs-test-job-avro/2017_10_11-19 does not exist. > at >
[jira] [Commented] (SAMZA-1034) Support LATEST path and single file as the input of HDFSSystemConsumer
[ https://issues.apache.org/jira/browse/SAMZA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563296#comment-15563296 ] Hai commented on SAMZA-1034: On second thought, this is probably not a very interesting feature on open source. > Support LATEST path and single file as the input of HDFSSystemConsumer > -- > > Key: SAMZA-1034 > URL: https://issues.apache.org/jira/browse/SAMZA-1034 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > > Right now users have to specify a directory of the HDFS to be consumed. Many > other systems support the idea of "LATEST" file/directory. For example, if we > have: > /data/database/db1/snapshot-2016-09-10 > /data/database/db1/snapshot-2016-09-11 > Then "/data/database/db1/#LATEST" will automatically point to > "/data/database/db1/snapshot-2016-09-11" > We want to implement such a feature. > Plus, we also want to support the consumption of single file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SAMZA-1034) Support LATEST path and single file as the input of HDFSSystemConsumer
[ https://issues.apache.org/jira/browse/SAMZA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556764#comment-15556764 ] Hai commented on SAMZA-1034: https://reviews.apache.org/r/52660/ > Support LATEST path and single file as the input of HDFSSystemConsumer > -- > > Key: SAMZA-1034 > URL: https://issues.apache.org/jira/browse/SAMZA-1034 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > > Right now users have to specify a directory of the HDFS to be consumed. Many > other systems support the idea of "LATEST" file/directory. For example, if we > have: > /data/database/db1/snapshot-2016-09-10 > /data/database/db1/snapshot-2016-09-11 > Then "/data/database/db1/#LATEST" will automatically point to > "/data/database/db1/snapshot-2016-09-11" > We want to implement such a feature. > Plus, we also want to support the consumption of single file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SAMZA-1025) Documentation page for HDFS System Consumer feature
[ https://issues.apache.org/jira/browse/SAMZA-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556766#comment-15556766 ] Hai commented on SAMZA-1025: https://reviews.apache.org/r/52570/ > Documentation page for HDFS System Consumer feature > --- > > Key: SAMZA-1025 > URL: https://issues.apache.org/jira/browse/SAMZA-1025 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SAMZA-1034) Support LATEST path and single file as the input of HDFSSystemConsumer
Hai created SAMZA-1034: -- Summary: Support LATEST path and single file as the input of HDFSSystemConsumer Key: SAMZA-1034 URL: https://issues.apache.org/jira/browse/SAMZA-1034 Project: Samza Issue Type: Sub-task Reporter: Hai Assignee: Hai Right now users have to specify a directory of the HDFS to be consumed. Many other systems support the idea of "LATEST" file/directory. For example, if we have: /data/database/db1/snapshot-2016-09-10 /data/database/db1/snapshot-2016-09-11 Then "/data/database/db1/#LATEST" will automatically point to "/data/database/db1/snapshot-2016-09-11" We want to implement such a feature. Plus, we also want to support the consumption of single file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SAMZA-1032) create staging directory in the job level instead of the YARN module
Hai created SAMZA-1032: -- Summary: create staging directory in the job level instead of the YARN module Key: SAMZA-1032 URL: https://issues.apache.org/jira/browse/SAMZA-1032 Project: Samza Issue Type: Bug Reporter: Hai Priority: Minor YarnConfig currently has a "YARN_JOB_STAGING_DIRECTORY". YARN creates the directory in the beginning and clean it up when the job is shutdown. Ideally this should be common to any cluster manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SAMZA-1026) HDFS System Producer should not have Kafka dependency
Hai created SAMZA-1026: -- Summary: HDFS System Producer should not have Kafka dependency Key: SAMZA-1026 URL: https://issues.apache.org/jira/browse/SAMZA-1026 Project: Samza Issue Type: Sub-task Reporter: Hai Currently HDFSSystemFactory has seemly unnecessary dependency on Kafka: def getProducer(systemName: String, config: Config, registry: MetricsRegistry) = { val clientId = KafkaUtil.getClientId("samza-producer", config) val metrics = new HdfsSystemProducerMetrics(systemName, registry) new HdfsSystemProducer(systemName, clientId, config, metrics) } Should try to get rid of the dependency -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SAMZA-1025) Documentation page for HDFS System Consumer feature
Hai created SAMZA-1025: -- Summary: Documentation page for HDFS System Consumer feature Key: SAMZA-1025 URL: https://issues.apache.org/jira/browse/SAMZA-1025 Project: Samza Issue Type: Sub-task Reporter: Hai Assignee: Hai -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SAMZA-967) Add HDFS system consumer to Samza
[ https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-967: -- Attachment: HDFSSystemConsumer.pdf > Add HDFS system consumer to Samza > - > > Key: SAMZA-967 > URL: https://issues.apache.org/jira/browse/SAMZA-967 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > Fix For: 0.12.0 > > Attachments: HDFSSystemConsumer.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SAMZA-967) Add HDFS system consumer to Samza
[ https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475223#comment-15475223 ] Hai commented on SAMZA-967: --- [~navina] Thanks so much for your valuable feedback. Please take a look at the updated RB when you are free. In regards to your comments on the design doc, I have updated the design doc as well, here is my answers to your questions: Q: Is the “End of Stream” feature a pre-requisite for HDFS consumer? If yes, link the corresponding JIRA and design document. Providing a high-level description of how that feature will be leveraged for solving this problem will layout more ground-work for readers who are not familiar about this A: Yes. Updated the doc and the jira to reflect that Samza-974 is a pre-requisite Q: One of the goals and non-goals are slightly overlapping. "(Goal) The system consumer should support a variety of folder structures and filename conventions" and "(Non-Goal) Support ALL kinds of HDFS folder structures and filename formats" . Can you specifically call out which structure and conventions you are supporting or call out which ones you are not supporting? Just to more clarity to the document. A: Updated the doc to be more specific. Q: Along with the 3rd point under Assumptions, you can call out "write-once, read-many" as the underlying usage pattern. A: Done Q: What does the whitelist and blacklist here consists of ? Why do we need both ? Can you provide example of how this config will look like? A: As pointed out in the design doc, this is to simplify the regex by having two instead of one regex. Many systems including kafka is doing this. You can always craft one regex to combine whitelist and blacklist, but that's gonna look complicated. Updated doc to give examples. Q: In case of repartitioner, multiple samza tasks cannot write to the same file. Hence, each task can write in a separate file within the partition directory -> what defines the ordering among these files when the downstream job is consuming ? is it based on timestamp? A: In this case there is no ordering among these files. Let's imaging, instead of writing to HDFS, we write to Kafka, then you also have no ordering within the samza topic partition when the events are coming from different upstream producers. Q: when does the HDFSSystemAdmin write the PartitionDescriptor to HDFS?? Is it done by the job coordinator or by each container? A: This is more of an implementation details so I didn't provide specifics on the doc. You are right, it's done by job coordinator. It happens when getSystemStreamMetadata is called given the current implementation. Q: Is the PartitionDescriptor file expected to follow any convention? Or is it simply going to contain a map? A: It's simply a map in the json format. > Add HDFS system consumer to Samza > - > > Key: SAMZA-967 > URL: https://issues.apache.org/jira/browse/SAMZA-967 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > Fix For: 0.12.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SAMZA-967) Add HDFS system consumer to Samza
[ https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hai updated SAMZA-967: -- Attachment: HDFSSystemConsumer.pdf > Add HDFS system consumer to Samza > - > > Key: SAMZA-967 > URL: https://issues.apache.org/jira/browse/SAMZA-967 > Project: Samza > Issue Type: Sub-task >Reporter: Hai >Assignee: Hai > Fix For: 0.11.0 > > Attachments: HDFSSystemConsumer.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (SAMZA-967) Add HDFS system consumer to Samza
Hai created SAMZA-967: - Summary: Add HDFS system consumer to Samza Key: SAMZA-967 URL: https://issues.apache.org/jira/browse/SAMZA-967 Project: Samza Issue Type: Sub-task Reporter: Hai -- This message was sent by Atlassian JIRA (v6.3.4#6332)