[jira] [Created] (NIFI-5853) Support ZooKeeper as a backend for DistributedMapCacheServer

2018-11-29 Thread Boris Tyukin (JIRA)
Boris Tyukin created NIFI-5853:
--

 Summary: Support ZooKeeper as a backend for 
DistributedMapCacheServer
 Key: NIFI-5853
 URL: https://issues.apache.org/jira/browse/NIFI-5853
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Affects Versions: 1.8.0
Reporter: Boris Tyukin


ZooKeeper is readily available on most big data clusters, is already used by NiFi 
clusters to store state, and is highly reliable and scalable. It would be great to 
support it as a backend for DistributedMapCacheServer.
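To make the idea concrete, here is a minimal, in-memory stand-in for the map-cache semantics a ZooKeeper backend would have to honor. The class and method names below mirror the DistributedMapCache operations loosely and are illustrative only; a real backend would implement the same atomic operations against znodes (e.g. one znode per key under a configurable root path, using create()/NodeExistsException for the check-and-set) rather than a local map.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified, in-memory stand-in for the cache-server contract. A ZooKeeper
// backend would implement the same operations against znodes; the map here
// only illustrates the expected atomic semantics.
class MapCacheSketch {
    private final ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();

    // Returns the existing value, or stores the new one and returns null if
    // absent -- the atomic check-and-set a ZooKeeper backend would perform
    // with create() and NodeExistsException handling.
    byte[] getAndPutIfAbsent(String key, byte[] value) {
        return store.putIfAbsent(key, value);
    }

    byte[] get(String key) {
        return store.get(key);
    }

    boolean remove(String key) {
        return store.remove(key) != null;
    }
}
```

The value of a ZooKeeper backend is that these operations become cluster-wide without running a separate cache server process.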



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5064) Fixes and improvements to PutKudu processor

2018-11-14 Thread Boris Tyukin (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686955#comment-16686955
 ] 

Boris Tyukin commented on NIFI-5064:


[~cammach] I think you created this processor, so I wanted to see if you can 
clarify my question above.

> Fixes and improvements to PutKudu processor
> ---
>
> Key: NIFI-5064
> URL: https://issues.apache.org/jira/browse/NIFI-5064
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Junegunn Choi
>Priority: Major
> Fix For: 1.7.0
>
>
> 1. Currently, PutKudu fails with NPE on null or missing values.
> 2. {{IllegalArgumentException}} on 16-bit integer columns because of [a 
> missing {{break}} in case clause for INT16 
> columns|https://github.com/apache/nifi/blob/rel/nifi-1.6.0/nifi-nar-bundles/nifi-kudu-bundle/nifi-kudu-processors/src/main/java/org/apache/nifi/processors/kudu/PutKudu.java#L112-L115].
> 3. Also, {{IllegalArgumentException}} on 8-bit integer columns. We need a 
> separate case clause for INT8 columns where {{PartialRow#addByte}} instead of 
> {{PartialRow#addShort}} should be used.
> 4. NIFI-4384 added a batch size parameter; however, it only applies to 
> FlowFiles with multiple records. {{KuduSession}} is created and closed for 
> each FlowFile, so if a FlowFile contains only a single record, no batching 
> takes place. A workaround would be to use a preprocessor to concatenate 
> multiple FlowFiles, but since {{PutHBase}} and {{PutSQL}} use 
> {{session.get(batchSize)}} to handle multiple FlowFiles at once, I think we 
> can take the same approach here with PutKudu as it simplifies the data flow.
> 5. {{PutKudu}} depends on kudu-client 1.3.0. But we can safely update to 
> 1.7.0.
>  - [https://github.com/apache/kudu/blob/1.7.0/docs/release_notes.adoc]
>  - [https://github.com/apache/kudu/blob/1.7.0/docs/prior_release_notes.adoc]
> A notable change in Kudu 1.7.0 is the addition of Decimal type.
> 6. {{PutKudu}} has {{Skip head line}} property for ignoring the first record 
> in a FlowFile. I suppose this was added to handle header lines in CSV files, 
> but I really don't think it's something {{PutKudu}} should handle. 
> {{CSVReader}} already has {{Treat First Line as Header}} option, so we should 
> tell the users to use it instead as we don't want to have the same option 
> here and there. Also, the default value of {{Skip head line}} is {{true}}, 
> and I found it very confusing as my use case was to stream-process 
> single-record FlowFiles. We can keep this property for backward 
> compatibility, but we should at least deprecate it and change the default 
> value to {{false}}.
> 7. Server-side errors such as uniqueness constraint violation are not checked 
> and simply ignored. When flush mode is set to {{AUTO_FLUSH_SYNC}}, we should 
> check the return value of {{KuduSession#apply}} to see if it has a {{RowError}}, 
> but PutKudu currently ignores it. For example, on uniqueness constraint 
> violation, we get a {{RowError}} saying "_Already present: key already 
> present (error 0)_".
> On the other hand, when flush mode is set to {{AUTO_FLUSH_BACKGROUND}}, 
> {{KuduSession#apply}}, understandably, returns null, and we should check the 
> return value of {{KuduSession#getPendingErrors()}}. And when the mode is 
> {{MANUAL_FLUSH}}, we should examine the return value of 
> {{KuduSession#flush()}} or {{KuduSession#close()}}. In this case, we also 
> have to make sure that we don't overflow the mutation buffer of 
> {{KuduSession}} by calling {{flush()}} before it is too late.
> 
> I'll create a pull request on GitHub. Since there are multiple issues to be 
> addressed, I made separate commits for each issue mentioned above so that 
> it's easier to review. You might want to squash them into one, or cherry-pick 
> a subset of commits if you don't agree with some decisions I made.
> Please let me know what you think. We deployed the code to a production 
> server last week, and it has been running since then without any issues, 
> steadily processing 20K records/second.
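Items 2 and 3 above describe a classic switch fall-through bug. The sketch below is a simplified, self-contained illustration of that bug class, not the actual PutKudu code: without a `break` (or `return`) the INT16 case falls through to the next clause, and the 8-bit type needs its own case rather than reusing the 16-bit one.

```java
// Simplified stand-in for the column-coercion switch described in items 2
// and 3; names are illustrative, not PutKudu's actual code.
enum ColType { INT8, INT16, INT32 }

class RowSketch {
    // Coerces a long into the narrowest representation for the column type.
    static Object addColumn(ColType type, long v) {
        switch (type) {
            case INT8:
                return (byte) v;   // separate case, as item 3 proposes
            case INT16:
                return (short) v;  // returning here prevents the fall-through
                                   // that caused the reported exception
            case INT32:
                return (int) v;
            default:
                throw new IllegalArgumentException("unsupported: " + type);
        }
    }
}
```

With the missing `break` in the original code, an INT16 column was handled by the INT32 clause and the wrong add method threw {{IllegalArgumentException}}.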





[jira] [Commented] (NIFI-5064) Fixes and improvements to PutKudu processor

2018-11-14 Thread Boris Tyukin (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686951#comment-16686951
 ] 

Boris Tyukin commented on NIFI-5064:


 [~junegunn] awesome and much-needed changes, thanks a bunch! 

Is there any strong reason why the processor cannot support a dynamic table name? I 
do see that the client and table name are initialized in the onScheduled method, but 
technically we could move the table initialization to onTrigger and make it use 
Expression Language.
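To sketch what that would look like: instead of fixing the table name when the processor is scheduled, the name would be resolved per FlowFile from its attributes in onTrigger. The `${...}` substitution below is a deliberately simplified stand-in for NiFi's Expression Language, and all names are hypothetical, not the real NiFi API.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of per-FlowFile table-name resolution. The ${attr}
// substitution is a toy stand-in for NiFi Expression Language evaluation.
class TableNameResolver {
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${attr} in the template with the FlowFile attribute value
    // (empty string if the attribute is missing).
    static String resolve(String template, Map<String, String> attributes) {
        Matcher m = EXPR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = attributes.getOrDefault(m.group(1), "");
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

The trade-off is a table lookup per FlowFile (or per distinct resolved name, if cached), which is why the initialization currently lives in onScheduled.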






[jira] [Resolved] (NIFI-4877) ExecuteStreamCommand tasks stuck

2018-03-07 Thread Boris Tyukin (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Tyukin resolved NIFI-4877.

Resolution: Duplicate

duplicate of NIFI-528

> ExecuteStreamCommand tasks stuck
> 
>
> Key: NIFI-4877
> URL: https://issues.apache.org/jira/browse/NIFI-4877
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.5.0
> Environment: Oracle Linux 6.8, NiFi 1.5.0
>Reporter: Boris Tyukin
>Priority: Critical
>
> There is no way to stop/kill processes executed by ExecuteStreamCommand. 
> Steps to reproduce:
>  # In the ExecuteStreamCommand processor, use sleep as the command and 100 as 
> the argument - this will make it run for a while. Set Ignore STDIN to true.
>  # Run the flow and stop the ExecuteStreamCommand processor - it appears to 
> stop, but the task will be stuck forever, and the only workaround I found is 
> to restart the nifi service, which is not cool at all.
>  # To make it worse, the processor will not accept new FlowFiles and you 
> cannot run it again. Only a restart fixes the issue.
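For reference, the JDK already provides the mechanism a processor could use to avoid this kind of stuck task: wait for the child process with a deadline and destroy it forcibly instead of blocking forever. This is a generic sketch (assuming a Unix-like host for the usage note), not ExecuteStreamCommand's actual code.

```java
import java.util.concurrent.TimeUnit;

// Generic sketch: bound the child process with a timeout and kill it
// forcibly if it overruns, so the task can never hang indefinitely.
class BoundedExec {
    // Runs the command; returns true if it exits within the timeout,
    // otherwise kills it and returns false.
    static boolean runWithTimeout(long timeoutSec, String... command) {
        try {
            Process p = new ProcessBuilder(command).start();
            if (p.waitFor(timeoutSec, TimeUnit.SECONDS)) {
                return true;
            }
            p.destroyForcibly(); // SIGKILL-style termination
            p.waitFor();         // reap the killed child
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

On a Unix-like host, `runWithTimeout(1, "sleep", "100")` returns false after about one second instead of hanging for the full 100.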





[jira] [Commented] (NIFI-90) Replace explicit penalization with automatic penalization/back-off

2018-03-06 Thread Boris Tyukin (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388431#comment-16388431
 ] 

Boris Tyukin commented on NIFI-90:
--

I am surprised this Jira has not gotten any traction after 3 years... Having used 
Apache Airflow for a while, I am looking for retry capabilities in NiFi, and it 
comes down to "building your own" flows that handle retries in a loop 
and then sleep for some time. The best solution I found was suggested by 
[Alessio Palma|https://community.hortonworks.com/users/16011/ozw1z5rd.html]: 
[https://community.hortonworks.com/questions/56167/is-there-wait-processor-in-nifi.html]
 Still, it would be nice to have retry capabilities like the Airflow folks have: you 
can specify global retry behavior for a flow or override it per task/processor. 
This helps a lot with intermittent issues, like losing a network connection or a 
source database being down for maintenance. 
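The per-task retry behavior described above can be sketched in a few lines. Everything here is illustrative (the names and the exponential back-off policy are assumptions), not an actual NiFi or Airflow API:

```java
import java.util.function.Supplier;

// Sketch of per-task retry with exponential back-off: call the action up to
// maxAttempts times, sleeping between failures, and rethrow the last
// failure if all attempts fail. Names and policy are illustrative only.
class Retry {
    static <T> T withRetries(Supplier<T> action, int maxAttempts, long baseDelayMs) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(baseDelayMs << (attempt - 1)); // 1x, 2x, 4x, ...
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

A framework-level version of this is exactly what makes intermittent network or maintenance-window failures painless in Airflow.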

> Replace explicit penalization with automatic penalization/back-off
> --
>
> Key: NIFI-90
> URL: https://issues.apache.org/jira/browse/NIFI-90
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Joseph Witt
>Priority: Minor
>
> Rather than having users configure explicit penalization periods and 
> requiring developers to implement it in their processors we can automate 
> this.  Perhaps keep a LinkedHashMap of size 5 or so 
> in the FlowFileRecord construct.  When a FlowFile is routed to a Connection, 
> the counter is incremented.  If the counter exceeds 3 visits to the same 
> connection, the FlowFile will be automatically penalized.  This protects us 
> "5 hops out" so that if we have something like DistributeLoad -> PostHTTP 
> with failure looping back to DistributeLoad, we will still penalize when 
> appropriate.
> In addition, we will remove the configuration option from the UI, setting the 
> penalization period to some default such as 5 seconds.
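The visit counter described above can be sketched with a size-bounded LinkedHashMap, as the issue suggests. The thresholds and names below are taken from or inspired by the description but are otherwise illustrative, not framework code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the per-FlowFile visit counter: a size-bounded LinkedHashMap
// counting routings per connection, flagging the FlowFile for automatic
// penalization after repeated visits to the same connection.
class VisitTracker {
    private static final int MAX_TRACKED = 5;    // "5 hops out"
    private static final int PENALIZE_AFTER = 3; // visits to the same connection

    private final Map<String, Integer> visits = new LinkedHashMap<String, Integer>() {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
            return size() > MAX_TRACKED; // evict the oldest connection id
        }
    };

    // Records a routing; returns true if the FlowFile should now be penalized.
    boolean routedTo(String connectionId) {
        int n = visits.merge(connectionId, 1, Integer::sum);
        return n > PENALIZE_AFTER;
    }
}
```

Bounding the map keeps the per-FlowFile overhead constant while still catching failure loops such as DistributeLoad -> PostHTTP -> back to DistributeLoad.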





[jira] [Created] (NIFI-4921) better support for promoting NiFi processor parameters between dev and prod environments

2018-03-02 Thread Boris Tyukin (JIRA)
Boris Tyukin created NIFI-4921:
--

 Summary: better support for promoting NiFi processor parameters 
between dev and prod environments
 Key: NIFI-4921
 URL: https://issues.apache.org/jira/browse/NIFI-4921
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Flow Versioning, SDLC
Affects Versions: 1.6.0
Reporter: Boris Tyukin


Need a better way to promote processor parameters, like "Concurrent tasks", from 
development to production environments. 

Bryan Bende suggested:

I think we may want to consider making the concurrent tasks work
similar to variables, in that we capture the concurrent tasks that the
flow was developed with and would use it initially, but then if you
have modified this value in the target environment it would not
trigger a local change and would be retained across upgrades so that
you don't have to reset it.

 

I would extend this Jira to similar parameters that currently must be changed 
manually when promoting flows to production from dev/test environments and that 
cannot use Expression Language or variables.
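The upgrade behavior Bryan Bende describes reduces to a simple resolution rule: use the value the flow was versioned with, unless the target environment has a local override, in which case the override survives the upgrade without registering as a local change. A minimal sketch, with illustrative names only:

```java
import java.util.Optional;

// Sketch of the suggested promotion rule: the versioned value is the
// default, and an environment-local override, when present, wins and is
// retained across flow upgrades. Names are illustrative.
class ParameterPromotion {
    static int effectiveConcurrentTasks(int versionedValue, Optional<Integer> localOverride) {
        return localOverride.orElse(versionedValue);
    }
}
```

This is the same model NiFi already applies to variables, which is why extending it to settings like Concurrent tasks feels natural.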





[jira] [Commented] (NIFI-3700) Create Sqoop processor

2018-02-27 Thread Boris Tyukin (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379257#comment-16379257
 ] 

Boris Tyukin commented on NIFI-3700:


+1 on this! Sqoop is awesome! 

In the meantime, I ended up using ExecuteScript to call Sqoop. Check my blog 
post: [http://boristyukin.com/how-to-run-sqoop-from-nifi/]

> Create Sqoop processor
> --
>
> Key: NIFI-3700
> URL: https://issues.apache.org/jira/browse/NIFI-3700
> Project: Apache NiFi
>  Issue Type: New Feature
>Reporter: Joseph Petro
>Priority: Minor
>
> A sqoop processor should be added. Sqoop is powerful for data ingestion and 
> export.





[jira] [Updated] (NIFI-4877) ExecuteStreamCommand tasks stuck

2018-02-14 Thread Boris Tyukin (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Tyukin updated NIFI-4877:
---
Description: 
There is no way to stop/kill processes executed by ExecuteStreamCommand. Steps 
to reproduce:
 # In the ExecuteStreamCommand processor, use sleep as the command and 100 as 
the argument - this will make it run for a while. Set Ignore STDIN to true.
 # Run the flow and stop the ExecuteStreamCommand processor - it appears to 
stop, but the task will be stuck forever, and the only workaround I found is to 
restart the nifi service, which is not cool at all.
 # To make it worse, the processor will not accept new FlowFiles and you cannot 
run it again. Only a restart fixes the issue.

  was:
there is no way to stop/kill processes, executed by ExecuteStreamCommand. Steps 
to reproduce:
 # In ExecuteStreamCommand processer, use sleep as a command, and 100 as 
argument - this will make it run for a while. Set ignore STDIN to true.
 # Run the flow and stop ExecuteStreamCommand processor - it will stop but not 
really. the task will be stuck forever and the only workaround I found is to 
restart nifi service which is not cool at all.
 # to make it worth, processor would not accept new flowfiles and you cannot 
Run it. Only restart fixes the issue.







[jira] [Created] (NIFI-4877) ExecuteStreamCommand tasks stuck

2018-02-14 Thread Boris Tyukin (JIRA)
Boris Tyukin created NIFI-4877:
--

 Summary: ExecuteStreamCommand tasks stuck
 Key: NIFI-4877
 URL: https://issues.apache.org/jira/browse/NIFI-4877
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Affects Versions: 1.5.0
 Environment: Oracle Linux 6.8, NiFi 1.5.0
Reporter: Boris Tyukin


There is no way to stop/kill processes executed by ExecuteStreamCommand. Steps 
to reproduce:
 # In the ExecuteStreamCommand processor, use sleep as the command and 100 as 
the argument - this will make it run for a while. Set Ignore STDIN to true.
 # Run the flow and stop the ExecuteStreamCommand processor - it appears to 
stop, but the task will be stuck forever, and the only workaround I found is to 
restart the nifi service, which is not cool at all.
 # To make it worse, the processor will not accept new FlowFiles and you cannot 
run it again. Only a restart fixes the issue.


