[jira] [Created] (NIFI-5853) Support ZooKeeper as a backend for DistributedMapCacheServer
Boris Tyukin created NIFI-5853:
----------------------------------

             Summary: Support ZooKeeper as a backend for DistributedMapCacheServer
                 Key: NIFI-5853
                 URL: https://issues.apache.org/jira/browse/NIFI-5853
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 1.8.0
            Reporter: Boris Tyukin

ZooKeeper is readily available on most big data clusters, is already used by NiFi clusters to store state, and is highly reliable and scalable. It would be great to support it as a backend for DistributedMapCacheServer.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5064) Fixes and improvements to PutKudu processor
[ https://issues.apache.org/jira/browse/NIFI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686955#comment-16686955 ]

Boris Tyukin commented on NIFI-5064:
------------------------------------

[~cammach] I think you created this processor, so I wanted to see if you could clarify my question above.


> Fixes and improvements to PutKudu processor
> -------------------------------------------
>
>                 Key: NIFI-5064
>                 URL: https://issues.apache.org/jira/browse/NIFI-5064
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>           Reporter: Junegunn Choi
>           Priority: Major
>             Fix For: 1.7.0
>
> 1. Currently, PutKudu fails with an NPE on null or missing values.
> 2. {{IllegalArgumentException}} on 16-bit integer columns because of [a missing {{break}} in the case clause for INT16 columns|https://github.com/apache/nifi/blob/rel/nifi-1.6.0/nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/PutKudu.java#L112-L115].
> 3. Also, {{IllegalArgumentException}} on 8-bit integer columns. We need a separate case clause for INT8 columns where {{PartialRow#addByte}} instead of {{PartialRow#addShort}} should be used.
> 4. NIFI-4384 added a batch size parameter; however, it only applies to FlowFiles with multiple records. {{KuduSession}} is created and closed for each FlowFile, so if a FlowFile contains only a single record, no batching takes place. A workaround would be to use a preprocessor to concatenate multiple FlowFiles, but since {{PutHBase}} and {{PutSQL}} use {{session.get(batchSize)}} to handle multiple FlowFiles at once, I think we can take the same approach here with PutKudu, as it simplifies the data flow.
> 5. {{PutKudu}} depends on kudu-client 1.3.0, but we can safely update to 1.7.0.
> - [https://github.com/apache/kudu/blob/1.7.0/docs/release_notes.adoc]
> - [https://github.com/apache/kudu/blob/1.7.0/docs/prior_release_notes.adoc]
> A notable change in Kudu 1.7.0 is the addition of the Decimal type.
> 6. {{PutKudu}} has a {{Skip head line}} property for ignoring the first record in a FlowFile. I suppose this was added to handle header lines in CSV files, but I really don't think it's something {{PutKudu}} should handle. {{CSVReader}} already has a {{Treat First Line as Header}} option, so we should tell users to use that instead, as we don't want to have the same option here and there. Also, the default value of {{Skip head line}} is {{true}}, which I found very confusing, as my use case was to stream-process single-record FlowFiles. We can keep this property for backward compatibility, but we should at least deprecate it and change the default value to {{false}}.
> 7. Server-side errors such as uniqueness constraint violations are not checked and are simply ignored. When the flush mode is set to {{AUTO_FLUSH_SYNC}}, we should check the return value of {{KuduSession#apply}} to see whether it has a {{RowError}}, but PutKudu currently ignores it. For example, on a uniqueness constraint violation, we get a {{RowError}} saying "_Already present: key already present (error 0)_".
> On the other hand, when the flush mode is set to {{AUTO_FLUSH_BACKGROUND}}, {{KuduSession#apply}}, understandably, returns null, and we should check the return value of {{KuduSession#getPendingErrors()}}. And when the mode is {{MANUAL_FLUSH}}, we should examine the return value of {{KuduSession#flush()}} or {{KuduSession#close()}}. In this case, we also have to make sure that we don't overflow the mutation buffer of {{KuduSession}} by calling {{flush()}} before it is too late.
>
> I'll create a pull request on GitHub. Since there are multiple issues to be addressed, I made separate commits for each issue mentioned above so that it's easier to review. You might want to squash them into one, or cherry-pick a subset of commits if you don't agree with some decisions I made. Please let me know what you think. We deployed the code to a production server last week, and it's been running steadily since, without any issues, processing 20K records/second.
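The fall-through described in items 2 and 3 is easy to reproduce in isolation. Below is a self-contained sketch of the corrected dispatch; the Type values and add* method names mirror the Kudu client API, but TypeDispatch and methodFor are hypothetical illustration code, not the actual PutKudu source:

```java
// Sketch of the corrected integer-type dispatch from items 2 and 3.
// Hypothetical illustration class, not the real PutKudu code.
public class TypeDispatch {
    enum Type { INT8, INT16, INT32 }

    static String methodFor(Type type) {
        String method;
        switch (type) {
            case INT8:
                method = "addByte";  // item 3: INT8 needs its own case using PartialRow#addByte
                break;
            case INT16:
                method = "addShort";
                break;               // item 2: the missing break here caused fall-through
            case INT32:
                method = "addInt";
                break;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
        return method;
    }
}
```

Without the break after the INT16 case, an INT16 column falls through to the INT32 branch, which is what produced the IllegalArgumentException reported above.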
[jira] [Commented] (NIFI-5064) Fixes and improvements to PutKudu processor
[ https://issues.apache.org/jira/browse/NIFI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686951#comment-16686951 ]

Boris Tyukin commented on NIFI-5064:
------------------------------------

[~junegunn] Awesome and much-needed changes, thanks a bunch! Is there any strong reason why the processor cannot support a dynamic table name? I do see that the client and table name are initialized in the onScheduled method, but technically we could move the table initialization to onTrigger and make it use the Expression Language.


> Fixes and improvements to PutKudu processor
> -------------------------------------------
>
>                 Key: NIFI-5064
>                 URL: https://issues.apache.org/jira/browse/NIFI-5064
>             Project: Apache NiFi
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>           Reporter: Junegunn Choi
>           Priority: Major
>             Fix For: 1.7.0
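The AUTO_FLUSH_SYNC check requested in item 7 boils down to inspecting every response returned by apply. The following is a minimal sketch of that pattern using stub classes in place of the kudu-client types (RowError, OperationResponse, and collectRowErrors here are stand-ins so the sketch is self-contained, not the real API objects):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the per-response error check item 7 asks for under AUTO_FLUSH_SYNC.
// In the real processor, responses would come from KuduSession#apply; the stub
// types below exist only to make the checking pattern self-contained.
public class FlushErrorCheck {

    // Stand-in for org.apache.kudu.client.RowError
    static class RowError {
        final String message;
        RowError(String message) { this.message = message; }
    }

    // Stand-in for org.apache.kudu.client.OperationResponse
    static class OperationResponse {
        final RowError rowError;
        OperationResponse(RowError rowError) { this.rowError = rowError; }
        boolean hasRowError() { return rowError != null; }
    }

    // AUTO_FLUSH_SYNC: each apply() returns a response immediately, so
    // server-side errors must be collected response by response instead
    // of being silently dropped.
    static List<String> collectRowErrors(List<OperationResponse> responses) {
        List<String> errors = new ArrayList<>();
        for (OperationResponse response : responses) {
            if (response.hasRowError()) {
                errors.add(response.rowError.message);
            }
        }
        return errors;
    }
}
```

Under AUTO_FLUSH_BACKGROUND or MANUAL_FLUSH the same idea applies, but the errors are gathered after the fact from getPendingErrors() or from the result of flush()/close(), as the comment above describes.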
[jira] [Resolved] (NIFI-4877) ExecuteStreamCommand tasks stuck
[ https://issues.apache.org/jira/browse/NIFI-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Tyukin resolved NIFI-4877.
--------------------------------
    Resolution: Duplicate

Duplicate of NIFI-528.

> ExecuteStreamCommand tasks stuck
> --------------------------------
>
>                 Key: NIFI-4877
>                 URL: https://issues.apache.org/jira/browse/NIFI-4877
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.5.0
>         Environment: Oracle Linux 6.8, NiFi 1.5.0
>            Reporter: Boris Tyukin
>            Priority: Critical
>
> There is no way to stop/kill processes executed by ExecuteStreamCommand. Steps to reproduce:
> # In the ExecuteStreamCommand processor, use sleep as the command and 100 as the argument - this will make it run for a while. Set Ignore STDIN to true.
> # Run the flow and stop the ExecuteStreamCommand processor - it will stop, but not really: the task will be stuck forever, and the only workaround I found is to restart the nifi service, which is not cool at all.
> # To make it worse, the processor will not accept new flowfiles and you cannot run it. Only a restart fixes the issue.
[jira] [Commented] (NIFI-90) Replace explicit penalization with automatic penalization/back-off
[ https://issues.apache.org/jira/browse/NIFI-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388431#comment-16388431 ]

Boris Tyukin commented on NIFI-90:
----------------------------------

I am surprised this Jira has not gotten any traction after 3 years... Having used Apache Airflow for a while, I am looking for retry capabilities in NiFi, and it comes down to building your own flows that handle retries in a loop and then sleep for some time. The best solution I found was suggested by [Alessio Palma|https://community.hortonworks.com/users/16011/ozw1z5rd.html]: [https://community.hortonworks.com/questions/56167/is-there-wait-processor-in-nifi.html]. Still, it would be nice to have retry capabilities like the Airflow folks have: you can specify global retry behavior for a flow or override it per task/processor. This helps a lot with intermittent issues, like losing a network connection or a source database system being down for maintenance.


> Replace explicit penalization with automatic penalization/back-off
> ------------------------------------------------------------------
>
>                 Key: NIFI-90
>                 URL: https://issues.apache.org/jira/browse/NIFI-90
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Joseph Witt
>            Priority: Minor
>
> Rather than having users configure explicit penalization periods and requiring developers to implement it in their processors, we can automate this. Perhaps keep a LinkedHashMap of size 5 or so in the FlowFileRecord construct. When a FlowFile is routed to a Connection, the counter is incremented. If the counter exceeds 3 visits to the same connection, the FlowFile will be automatically penalized. This protects us "5 hops out", so that if we have something like DistributeLoad -> PostHTTP with failure looping back to DistributeLoad, we will still penalize when appropriate.
> In addition, we will remove the configuration option from the UI, setting the penalization period to some default such as 5 seconds.
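The bounded-counter idea in the quoted description maps naturally onto LinkedHashMap's removeEldestEntry hook. Here is a minimal sketch under that reading; VisitTracker and recordVisit are hypothetical names, not NiFi framework code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the automatic-penalization proposal: per FlowFile, keep a bounded
// map of connection-visit counters and penalize once the same connection is
// visited more than 3 times. Hypothetical illustration, not NiFi framework code.
public class VisitTracker {
    private static final int MAX_TRACKED = 5;       // "LinkedHashMap of size 5 or so"
    private static final int PENALTY_THRESHOLD = 3; // "exceeds 3 visits to the same connection"

    private final Map<String, Integer> visits =
        new LinkedHashMap<String, Integer>(MAX_TRACKED, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > MAX_TRACKED; // evict the oldest counter beyond 5 connections
            }
        };

    /** Record a routing to the given connection; true means the FlowFile should be penalized. */
    public boolean recordVisit(String connectionId) {
        int count = visits.merge(connectionId, 1, Integer::sum);
        return count > PENALTY_THRESHOLD;
    }
}
```

In the DistributeLoad -> PostHTTP failure loop described above, the fourth routing of a FlowFile to the same failure connection would trip the threshold and penalize it automatically.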
[jira] [Created] (NIFI-4921) better support for promoting NiFi processor parameters between dev and prod environments
Boris Tyukin created NIFI-4921:
----------------------------------

             Summary: better support for promoting NiFi processor parameters between dev and prod environments
                 Key: NIFI-4921
                 URL: https://issues.apache.org/jira/browse/NIFI-4921
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Flow Versioning, SDLC
    Affects Versions: 1.6.0
            Reporter: Boris Tyukin

We need a better way to promote processor parameters, like "Concurrent tasks", from development to production environments. Bryan Bende suggested:

"I think we may want to consider making the concurrent tasks work similar to variables, in that we capture the concurrent tasks that the flow was developed with and would use it initially, but then if you have modified this value in the target environment it would not trigger a local change and would be retained across upgrades so that you don't have to reset it."

I would extend this Jira to similar parameters that currently have to be changed manually when promoting flows from dev/test environments to production and that cannot use the Expression Language or variables.
[jira] [Commented] (NIFI-3700) Create Sqoop processor
[ https://issues.apache.org/jira/browse/NIFI-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379257#comment-16379257 ]

Boris Tyukin commented on NIFI-3700:
------------------------------------

+1 on this! Sqoop is awesome! In the meantime, I ended up using ExecuteScript to call sqoop. Check my blog post: [http://boristyukin.com/how-to-run-sqoop-from-nifi/]


> Create Sqoop processor
> ----------------------
>
>                 Key: NIFI-3700
>                 URL: https://issues.apache.org/jira/browse/NIFI-3700
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Joseph Petro
>            Priority: Minor
>
> A Sqoop processor should be added. Sqoop is powerful for data ingestion and export.
[jira] [Updated] (NIFI-4877) ExecuteStreamCommand tasks stuck
[ https://issues.apache.org/jira/browse/NIFI-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Tyukin updated NIFI-4877:
-------------------------------
    Description:
There is no way to stop/kill processes executed by ExecuteStreamCommand. Steps to reproduce:
# In the ExecuteStreamCommand processor, use sleep as the command and 100 as the argument - this will make it run for a while. Set Ignore STDIN to true.
# Run the flow and stop the ExecuteStreamCommand processor - it will stop, but not really: the task will be stuck forever, and the only workaround I found is to restart the nifi service, which is not cool at all.
# To make it worse, the processor will not accept new flowfiles and you cannot run it. Only a restart fixes the issue.

  was:
there is no way to stop/kill processes, executed by ExecuteStreamCommand. Steps to reproduce: # In ExecuteStreamCommand processer, use sleep as a command, and 100 as argument - this will make it run for a while. Set ignore STDIN to true. # Run the flow and stop ExecuteStreamCommand processor - it will stop but not really. the task will be stuck forever and the only workaround I found is to restart nifi service which is not cool at all. # to make it worth, processor would not accept new flowfiles and you cannot Run it. Only restart fixes the issue.

> ExecuteStreamCommand tasks stuck
> --------------------------------
>
>                 Key: NIFI-4877
>                 URL: https://issues.apache.org/jira/browse/NIFI-4877
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.5.0
>         Environment: Oracle Linux 6.8, NiFi 1.5.0
>            Reporter: Boris Tyukin
>            Priority: Critical
[jira] [Created] (NIFI-4877) ExecuteStreamCommand tasks stuck
Boris Tyukin created NIFI-4877:
----------------------------------

             Summary: ExecuteStreamCommand tasks stuck
                 Key: NIFI-4877
                 URL: https://issues.apache.org/jira/browse/NIFI-4877
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 1.5.0
         Environment: Oracle Linux 6.8, NiFi 1.5.0
            Reporter: Boris Tyukin

There is no way to stop/kill processes executed by ExecuteStreamCommand. Steps to reproduce:
# In the ExecuteStreamCommand processor, use sleep as the command and 100 as the argument - this will make it run for a while. Set Ignore STDIN to true.
# Run the flow and stop the ExecuteStreamCommand processor - it will stop, but not really: the task will be stuck forever, and the only workaround I found is to restart the nifi service, which is not cool at all.
# To make it worse, the processor will not accept new flowfiles and you cannot run it. Only a restart fixes the issue.
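A fix along these lines would need to forcibly terminate the child process when the processor is stopped, rather than waiting on it forever. Below is a minimal, self-contained sketch of that termination pattern using only the JDK's Process API; runWithTimeout is a hypothetical helper for illustration, not ExecuteStreamCommand code:

```java
import java.util.concurrent.TimeUnit;

// Sketch of forced child-process termination: wait up to a timeout, then
// kill the process instead of leaking a stuck task. Hypothetical helper,
// not the actual ExecuteStreamCommand implementation.
public class ProcessKiller {
    /**
     * Runs the command and returns its exit code, or -1 if it had to be
     * forcibly terminated after timeoutSeconds.
     */
    public static int runWithTimeout(ProcessBuilder builder, long timeoutSeconds)
            throws Exception {
        Process process = builder.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // kill the stuck child (SIGKILL on Unix)
            process.waitFor();         // reap the killed process
            return -1;                 // signal that the command was terminated
        }
        return process.exitValue();
    }
}
```

Run against the reproduction steps above (a sleep 100 child with a short timeout), this kills the child and returns instead of leaving a task stuck until a NiFi restart.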