[jira] [Updated] (NIFI-10752) Update Couchbase client to 3.x

2024-04-27 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-10752:

Summary: Update Couchbase client to 3.x  (was: Update 
com.couchbase.client.java-client to 3.4.0)

> Update Couchbase client to 3.x
> --
>
> Key: NIFI-10752
> URL: https://issues.apache.org/jira/browse/NIFI-10752
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Mike R
>Assignee: Jeyassri Balachandran
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Update com.couchbase.client.java-client to 3.4.0 to remediate CVEs in the 
> program
> Here are the release notes: [SDK Release Notes | Couchbase 
> Docs|https://docs.couchbase.com/java-sdk/current/project-docs/sdk-release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-10752) Update Couchbase client to 3.x

2024-04-27 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-10752:

Description: 
Update com.couchbase.client.java-client to the latest 3.x release to remediate CVEs in the
program



Here are the release notes: [SDK Release Notes | Couchbase 
Docs|https://docs.couchbase.com/java-sdk/current/project-docs/sdk-release-notes.html]

  was:
Update com.couchbase.client.java-client to 3.4.0 to remediate CVEs in the 
program



Here are the release notes: [SDK Release Notes | Couchbase 
Docs|https://docs.couchbase.com/java-sdk/current/project-docs/sdk-release-notes.html]


> Update Couchbase client to 3.x
> --
>
> Key: NIFI-10752
> URL: https://issues.apache.org/jira/browse/NIFI-10752
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.18.0
>Reporter: Mike R
>Assignee: Jeyassri Balachandran
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Update com.couchbase.client.java-client to the latest 3.x release to remediate
> CVEs in the program
> Here are the release notes: [SDK Release Notes | Couchbase 
> Docs|https://docs.couchbase.com/java-sdk/current/project-docs/sdk-release-notes.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-5519) Allow ListDatabaseTables to accept incoming connections

2024-04-22 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-5519.

Resolution: Won't Fix

I think that's why this Jira has sat around for so long: most ListXYZ
processors are true source processors because they mostly deal with filesystems,
which are hierarchical and can be recursed for a single overall resource
(top-level folder, etc.). Although databases tend to be less hierarchical, I
think that with the Catalog, Schema, and Table Name properties being able to
match on wildcards, we can cover all tables for a single overall resource (the
top-level database). The usual convention is to have one instance of the
processor per top-level resource (i.e. database connection).

Having said that, if we want to leverage ControllerServiceLookups to allow a 
single instance to perform more diverse "work", we may want to look at another 
set of processors that better support swapping out controller services based on 
attribute values.
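
As context for the lookup approach mentioned above, here is a minimal sketch, assuming
the DBCPService contract from NIFI-5229 (getConnection accepting FlowFile attributes),
of how a processor can let a DBCPConnectionPoolLookup pick the pool per FlowFile. The
property name and class names here are illustrative, not NiFi's actual processor code:

{code:java}
// Sketch only: resolving a per-FlowFile connection through DBCPConnectionPoolLookup.
import java.sql.Connection;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.dbcp.DBCPService;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;

public class PerDatabaseConnectionSketch {
    // Illustrative property; identifiesControllerService allows selecting either a
    // concrete pool or a DBCPConnectionPoolLookup.
    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
            .name("Database Connection Pooling Service")
            .identifiesControllerService(DBCPService.class)
            .required(true)
            .build();

    Connection connectionFor(ProcessContext context, FlowFile flowFile) {
        DBCPService service = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
        // When the service is a DBCPConnectionPoolLookup, the database.name attribute
        // in the FlowFile selects which registered pool actually serves the connection.
        return service.getConnection(flowFile.getAttributes());
    }
}
{code}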

> Allow ListDatabaseTables to accept incoming connections
> ---
>
> Key: NIFI-5519
> URL: https://issues.apache.org/jira/browse/NIFI-5519
> Project: Apache NiFi
>  Issue Type: Wish
>Reporter: Matt Burgess
>Assignee: Jim Steinebrey
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As of [NIFI-5229|https://issues.apache.org/jira/browse/NIFI-5229],
> DBCPConnectionPoolLookup allows the dynamic selection of a DBCPConnectionPool
> by name. This allows processors that perform the same work on multiple
> databases to do so, driven by individual upstream flow files with the
> database.name attribute set.
> However, ListDatabaseTables does not accept incoming connections, so you
> currently need one DBCPConnectionPool per database, plus one ListDatabaseTables
> per database, each using a corresponding DBCPConnectionPool. It would be nice
> if ListDatabaseTables could accept incoming connection(s), if only to provide
> attributes for selecting the DBCPConnectionPool.
> I propose the behavior be like other processors that can generate data with 
> or without an incoming connection (such as GenerateTableFetch, see 
> [NIFI-2881|https://issues.apache.org/jira/browse/NIFI-2881] for details). In 
> general that means if there is an incoming non-loop connection, it becomes 
> more "event-driven" in the sense that it will not execute if there is no 
> FlowFile on which to work. If there is no incoming connection, then it would 
> run as it always has, on its Run Schedule and with State Management, so as 
> not to re-list the same tables every time it executes. 
> However with an incoming connection and an available FlowFile, the behavior 
> could be that all tables for that database are listed, meaning that processor 
> state would not be updated nor queried, making it fully "event-driven". If 
> the tables for a database are not to be re-listed, the onus would be on the 
> upstream flow to not send a flow file for that database. This is not a 
> requirement, just a suggestion; it could be more flexible by honoring 
> processor state if the Refresh Interval is non-zero, but I think that adds 
> too much complexity for the user, for little payoff.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-8063) Add profile to Maven POM to enable NAR exclusion

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8063:
---
Status: Open  (was: Patch Available)

> Add profile to Maven POM to enable NAR exclusion
> 
>
> Key: NIFI-8063
> URL: https://issues.apache.org/jira/browse/NIFI-8063
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Tools and Build
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Sometimes a bare-bones NiFi is all that is needed to test systems, 
> integrations, etc. It would be nice to be able to build a version of NiFi 
> without all the NARs in it (currently 1.5+ GB in total size). If this is done 
> as a profile in the assembly POM, the resulting artifacts can also be 
> dockerized (using the dockermaven module in the codebase).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-7252) Wrong table mapping when using the NiFi CaptureChangeMySQL processor combined with MySQL triggers

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-7252.

Resolution: Duplicate

> Wrong table mapping when using the NiFi CaptureChangeMySQL processor combined
> with MySQL triggers
> -
>
> Key: NIFI-7252
> URL: https://issues.apache.org/jira/browse/NIFI-7252
> Project: Apache NiFi
>  Issue Type: Bug
> Environment: Linux
>Reporter: Michel Elias
>Priority: Major
>
> Wrong table mapping when using the NiFi CaptureChangeMySQL processor combined
> with MySQL triggers:
> When using the NiFi CaptureChangeMySQL processor
> ([https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-cdc/nifi-cdc-mysql-bundle/nifi-cdc-mysql-processors/src/main/java/org/apache/nifi/cdc/mysql/processors/CaptureChangeMySQL.java])
>  combined with MySQL triggers, the table mapping does not work correctly.
> When using triggers, the order of the binlog events is slightly different than
> when using regular inserts.
> ==
>  regular insert:
> BEGIN;
>  PREPARE stmt1 FROM 'insert into changes set Header_RefID=?';
>  PREPARE stmt2 FROM 'insert into tab_tracedata set ZE140_RT_Aufnahme=?';
>  SET @a = 3;
>  SET @b = 4;
>  EXECUTE stmt1 USING @a;
>  EXECUTE stmt2 USING @b;
>  COMMIT;
> events in binlog:
> |mysql-bin.29|349004|Gtid|2|349042|BEGIN GTID 0-2-6431|
> |mysql-bin.29|349042|Table_map|2|349091|table_id: 19 (asic.changes)|
> |mysql-bin.29|349091|Write_rows_v1|2|349127|table_id: 19 flags: 
> STMT_END_F|
> |mysql-bin.29|349127|Table_map|2|349911|table_id: 24 (asic.tab_tracedata)|
> |mysql-bin.29|349911|Write_rows_v1|2|350041|table_id: 24 flags: 
> STMT_END_F|
> |mysql-bin.29|350041|Xid|2|350068|COMMIT /* xid=366 */|
> ==
>  triggered insert:
> CREATE DEFINER=`root`@`%` TRIGGER `insertChanges` AFTER INSERT ON 
> `tab_tracedata` FOR EACH ROW BEGIN
>  INSERT INTO changes (changes.Header_RefID) SELECT NEW.Header_RefID;
>  END
> events in eventlog:
> |mysql-bin.29|343|Gtid|1|381|BEGIN GTID 0-1-6289|
> |mysql-bin.29|381|Table_map|1|1165|table_id: 21 (asic.tab_tracedata)|
> |mysql-bin.29|1165|Table_map|1|1214|table_id: 19 (asic.changes)|
> |mysql-bin.29|1214|Write_rows_v1|1|3020|table_id: 21|
> |mysql-bin.29|3020|Write_rows_v1|1|3071|table_id: 19 flags: STMT_END_F|
> |mysql-bin.29|3071|Xid|1|3098|COMMIT /* xid=18 */|
>  
> showcase:
> CREATE TABLE test (id int auto_increment, data varchar(255),PRIMARY KEY (id));
>  CREATE TABLE triggered (id int);
> create trigger testtrigger AFTER insert ON test FOR EACH ROW INSERT INTO 
> triggered (triggered.id) SELECT NEW.id;
> insert into test set data="lala";
> Subscribe to the event log via the CaptureChangeMySQL processor. There are 2
> events generated. Both inserts are mapped to the "triggered" table. That's
> wrong: one should be mapped to the "test" table, the other to the "triggered"
> table.
> {"type":"insert","timestamp":1580393721000,"binlog_filename":"mysql-bin.29","binlog_position":5035809,"database":"nifi_test_case","table_name":"triggered","table_id":31,"columns":[{"id":1,"name":"id","column_type":4,"value":1},{"id":2,"value":null}]}
> {"type":"insert","timestamp":1580393721000,"binlog_filename":"mysql-bin.29","binlog_position":5035849,"database":"nifi_test_case","table_name":"triggered","table_id":31,"columns":[{"id":1,"name":"id","column_type":4,"value":1}]}
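
The binlog listings above show the root cause: with triggers, both Table_map events
arrive before any Write_rows event, so a reader that assumes "the most recent
Table_map describes the next row event" attributes both inserts to the last-mapped
table. A minimal sketch of table_id-keyed bookkeeping that avoids this (illustrative
only, not the actual CaptureChangeMySQL code):

{code:java}
// Resolve the table for each row event by its table_id instead of assuming the
// most recent Table_map event applies, which breaks when triggers interleave
// Table_map events before the row events.
import java.util.HashMap;
import java.util.Map;

class TableInfo {
    final String databaseName;
    final String tableName;
    TableInfo(String databaseName, String tableName) {
        this.databaseName = databaseName;
        this.tableName = tableName;
    }
}

class BinlogTableMapCache {
    // table_id -> table metadata, populated from every Table_map event seen
    private final Map<Long, TableInfo> tablesById = new HashMap<>();

    void onTableMapEvent(long tableId, String database, String table) {
        tablesById.put(tableId, new TableInfo(database, table));
    }

    // Write_rows/Update_rows/Delete_rows events carry a table_id of their own
    TableInfo tableForRowEvent(long tableId) {
        return tablesById.get(tableId);
    }

    void onTransactionEnd() {
        tablesById.clear();
    }
}
{code}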



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-8063) Add profile to Maven POM to enable NAR exclusion

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-8063.

Resolution: Won't Fix

We have a new convention of adding profiles for individual capabilities. If
there's a need for a bare-bones profile, this case can be reopened; closing for
now.

> Add profile to Maven POM to enable NAR exclusion
> 
>
> Key: NIFI-8063
> URL: https://issues.apache.org/jira/browse/NIFI-8063
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Tools and Build
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Sometimes a bare-bones NiFi is all that is needed to test systems, 
> integrations, etc. It would be nice to be able to build a version of NiFi 
> without all the NARs in it (currently 1.5+ GB in total size). If this is done 
> as a profile in the assembly POM, the resulting artifacts can also be 
> dockerized (using the dockermaven module in the codebase).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-13069) Remove ConvertAvroToJSON

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-13069:

Fix Version/s: 2.0.0-M3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove ConvertAvroToJSON
> 
>
> Key: NIFI-13069
> URL: https://issues.apache.org/jira/browse/NIFI-13069
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Daniel Stieglitz
>Assignee: Daniel Stieglitz
>Priority: Major
> Fix For: 2.0.0-M3, 1.26.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ConvertRecord configured with an AvroReader and a JsonRecordSetWriter
> accomplishes the same thing as this processor. Per the following [thread
> |https://lists.apache.org/thread/5cvvfn8oq0ttcxz0pggs8wr4xn3608tq], it was
> agreed that this should be deprecated in 1.x and removed in 2.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-13069) Remove ConvertAvroToJSON

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-13069:

Fix Version/s: 1.26.0

> Remove ConvertAvroToJSON
> 
>
> Key: NIFI-13069
> URL: https://issues.apache.org/jira/browse/NIFI-13069
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Daniel Stieglitz
>Assignee: Daniel Stieglitz
>Priority: Major
> Fix For: 1.26.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> ConvertRecord configured with an AvroReader and a JsonRecordSetWriter
> accomplishes the same thing as this processor. Per the following [thread
> |https://lists.apache.org/thread/5cvvfn8oq0ttcxz0pggs8wr4xn3608tq], it was
> agreed that this should be deprecated in 1.x and removed in 2.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-4238) Error in QueryDatabaseTable (NiFi CDC support): Unable to execute SQL select query due to org.apache.nifi.processor.exception.ProcessException: Error during database quer

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-4238.

Resolution: Won't Fix

Please feel free to reopen this case as necessary, but since it is a driver
limitation, hopefully it has been fixed in later versions of the driver. I'd
really rather not hack in a workaround, because non-standard JDBC types should
be handled by the Database Type value in these processors.

> Error in QueryDatabaseTable (NiFi CDC support): Unable to execute SQL select 
> query due to org.apache.nifi.processor.exception.ProcessException: Error 
> during database query or conversion of records to Avro
> 
>
> Key: NIFI-4238
> URL: https://issues.apache.org/jira/browse/NIFI-4238
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.3.0
> Environment: Centos
>Reporter: Ella
>Priority: Major
> Attachments: Error.png, QueryDatabaseTableError.png, 
> config_file_1.png, config_file_2.png, config_file_3.png, diagram.png
>
>
> Hi Guys,
> I need to retrieve only the newly added records from a DB2 database to a file
> using NiFi's CDC feature (the QueryDatabaseTable processor); however, I
> encountered the error shown while executing my dataflow. I have attached a
> snapshot of the error as well as the dataflow; I would really appreciate it if
> someone could help.
> Thanks a lot.
> Sincerely,
> Ella



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-12923) PutHDFS to support appending avro data

2024-04-19 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-12923.
-
Fix Version/s: 2.0.0-M3
   1.26.0
   Resolution: Fixed

> PutHDFS to support appending avro data
> --
>
> Key: NIFI-12923
> URL: https://issues.apache.org/jira/browse/NIFI-12923
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Balázs Gerner
>Assignee: Balázs Gerner
>Priority: Major
> Fix For: 2.0.0-M3, 1.26.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The goal of this ticket is to extend the PutHDFS processor with the ability
> to append Avro records. The processor already provides an option to set
> 'append' as the conflict resolution strategy, but that does not work correctly
> for Avro files, because the naively concatenated file is no longer a valid
> Avro container and cannot be deserialized.
> Some notes about the implementation:
>  * The user needs to explicitly select Avro as the file format and append as
> the conflict resolution mode to enable 'avro append' mode; otherwise regular
> append mode works just as before. There is no auto-detection of the MIME type
> of the incoming FlowFile.
>  * The records of the incoming FlowFile and the ones in the existing Avro
> file need to conform to the same Avro schema, otherwise the append operation
> fails with an incompatible-schema error.
>  * The 'avro append' mode should only work when the compression type is set to
> 'none'; if any other compression type is selected in 'avro append' mode, the
> user should get a validation error.
> The changes will also have to be added to the *support/nifi-1.x* branch.
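
For reference, a minimal local-file sketch of a schema-aware Avro append using
Avro's DataFileWriter.appendTo, which is the kind of operation 'avro append' mode
needs (the helper below is illustrative; the actual PutHDFS change operates on
HDFS streams):

{code:java}
// DataFileWriter.appendTo reads the existing header/schema and positions the
// writer after the last block, so the result remains a valid Avro container --
// unlike appending raw bytes, which duplicates the header and corrupts the file.
import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroAppendSketch {
    public static void appendRecords(File existingAvroFile, Iterable<GenericRecord> records) throws Exception {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>()).appendTo(existingAvroFile)) {
            for (GenericRecord record : records) {
                writer.append(record); // records must conform to the file's existing schema
            }
        }
    }
}
{code}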



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12993) PutDatabaseRecord: add auto commit property and fully implement Batch Size for sql statement type

2024-04-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12993:

Fix Version/s: 2.0.0-M3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> PutDatabaseRecord: add auto commit property and fully implement Batch Size 
> for sql statement type
> -
>
> Key: NIFI-12993
> URL: https://issues.apache.org/jira/browse/NIFI-12993
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Jim Steinebrey
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 2.0.0-M3, 1.26.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add an auto_commit property to PutDatabaseRecord.
> A Batch Size property exists in PutDatabaseRecord and is implemented for some
> statement types, but batch size is ignored for the SQL statement type.
> Implement batch size processing for the SQL statement type so that all
> statement types in PutDatabaseRecord support it equally.
> PutSQL and other SQL processors have auto-commit and batch size properties, so
> it will be beneficial for PutDatabaseRecord to implement them fully as well,
> for consistency.
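
A minimal plain-JDBC sketch of the two behaviors being requested, batched
execution and an auto-commit toggle (generic JDBC with illustrative names; not
the PutDatabaseRecord implementation):

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchSketch {
    public static void run(Connection conn, String sql, List<Object[]> rows,
                           int batchSize, boolean autoCommit) throws Exception {
        conn.setAutoCommit(autoCommit);
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]);
                }
                stmt.addBatch();
                if (++pending >= batchSize) {
                    stmt.executeBatch(); // flush a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                stmt.executeBatch(); // flush the remainder
            }
        }
        if (!autoCommit) {
            conn.commit(); // single commit at the end when auto-commit is off
        }
    }
}
{code}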



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12993) PutDatabaseRecord: add auto commit property and fully implement Batch Size for sql statement type

2024-04-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12993:

Fix Version/s: 1.26.0

> PutDatabaseRecord: add auto commit property and fully implement Batch Size 
> for sql statement type
> -
>
> Key: NIFI-12993
> URL: https://issues.apache.org/jira/browse/NIFI-12993
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Jim Steinebrey
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 1.26.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add an auto_commit property to PutDatabaseRecord.
> A Batch Size property exists in PutDatabaseRecord and is implemented for some
> statement types, but batch size is ignored for the SQL statement type.
> Implement batch size processing for the SQL statement type so that all
> statement types in PutDatabaseRecord support it equally.
> PutSQL and other SQL processors have auto-commit and batch size properties, so
> it will be beneficial for PutDatabaseRecord to implement them fully as well,
> for consistency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-11449) Investigate Iceberg insert on Object Storage

2024-04-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-11449:

Summary: Investigate Iceberg insert on Object Storage  (was: add autocommit 
property to PutDatabaseRecord processor)

> Investigate Iceberg insert on Object Storage
> 
>
> Key: NIFI-11449
> URL: https://issues.apache.org/jira/browse/NIFI-11449
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.21.0
> Environment: Any Nifi Deployment
>Reporter: Abdelrahim Ahmad
>Assignee: Jim Steinebrey
>Priority: Blocker
>  Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>
> The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When 
> using the processor with the Trino-JDBC-Driver or Dremio-JDBC-Driver to write 
> to an Iceberg catalog, it disables the autocommit feature. This leads to 
> errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".
> An autocommit property needs to be added to the processor so the feature can
> be enabled or disabled.
> Enabling auto-commit in the NiFi PutDatabaseRecord processor is important for
> Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by
> allowing atomic writes to be performed in the underlying database. This will
> allow the processor to be used with a wider range of databases.
> _Improving this processor will allow NiFi to be the main tool for ingesting
> data into these new technologies, so we don't have to deal with another tool
> to do so._
> +*_{color:#de350b}BUT:{color}_*+
> I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts
> records one by one into the database using a prepared statement, and commits
> the transaction at the end of the loop that processes each record. This
> approach can be inefficient and slow when inserting large volumes of data
> into tables that are optimized for bulk ingestion, such as Delta Lake,
> Iceberg, and Hudi tables.
> These tables use various techniques to optimize the performance of bulk
> ingestion, such as partitioning, clustering, and indexing. Inserting records
> one by one using a prepared statement can bypass these optimizations, leading
> to poor performance and potentially causing issues such as excessive disk
> usage, increased memory consumption, and decreased query performance.
> To avoid these issues, it is recommended to have a new processor, or add a
> feature to the current one, that uses a bulk insert method with auto-commit
> when inserting large volumes of data into Delta Lake, Iceberg, and Hudi
> tables.
>  
> P.S.: using PutSQL is not an alternative; even with autoCommit it has the same
> performance problem described above.
> Thanks and best regards :)
> Abdelrahim Ahmad
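
For illustration, a minimal JDBC sketch of the autocommit behavior these catalogs
require (assumed generic JDBC usage, not the processor's actual code):

{code:java}
import java.sql.Connection;
import java.sql.Statement;

public class AutoCommitSketch {
    public static void insertWithAutoCommit(Connection conn, String insertSql) throws Exception {
        conn.setAutoCommit(true); // each statement is its own transaction; no explicit commit()
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(insertSql);
        }
        // With setAutoCommit(false) the driver wraps writes in an explicit
        // transaction, which catalogs like Trino's Iceberg connector reject
        // ("Catalog only supports writes using autocommit").
    }
}
{code}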



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12993) PutDatabaseRecord: add auto commit property and fully implement Batch Size for sql statement type

2024-04-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12993:

Affects Version/s: (was: 1.25.0)
   (was: 2.0.0-M2)
   Status: Patch Available  (was: Open)

> PutDatabaseRecord: add auto commit property and fully implement Batch Size 
> for sql statement type
> -
>
> Key: NIFI-12993
> URL: https://issues.apache.org/jira/browse/NIFI-12993
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Jim Steinebrey
>Assignee: Jim Steinebrey
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add an auto_commit property to PutDatabaseRecord.
> A Batch Size property exists in PutDatabaseRecord and is implemented for some
> statement types, but batch size is ignored for the SQL statement type.
> Implement batch size processing for the SQL statement type so that all
> statement types in PutDatabaseRecord support it equally.
> PutSQL and other SQL processors have auto-commit and batch size properties, so
> it will be beneficial for PutDatabaseRecord to implement them fully as well,
> for consistency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12939) Retry Kerberos login on authentication failure in Iceberg processors

2024-03-26 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12939:

Fix Version/s: 2.0.0-M3
   1.26.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Retry Kerberos login on authentication failure in Iceberg processors
> 
>
> Key: NIFI-12939
> URL: https://issues.apache.org/jira/browse/NIFI-12939
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mark Bathori
>Assignee: Mark Bathori
>Priority: Major
> Fix For: 2.0.0-M3, 1.26.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When multiple processors try to renew the same expired Kerberos ticket, it
> can cause an authentication error that routes the current FlowFile to the
> processor's failure relationship. Since the issue is not permanent, the
> processor should retry the authentication.
>  
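
A rough sketch of the retry idea, assuming Hadoop's UserGroupInformation relogin
API is available to the Iceberg components (illustrative only; the actual fix may
differ):

{code:java}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosRetrySketch {
    interface HiveWrite { void run() throws IOException; } // placeholder for the actual work

    public static void writeWithRetry(HiveWrite write) throws IOException {
        try {
            write.run();
        } catch (IOException firstFailure) {
            // Another processor may have renewed the same expired ticket concurrently;
            // the failure is transient, so force a relogin and retry once. A real fix
            // would first check that the cause is actually an authentication failure.
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            write.run();
        }
    }
}
{code}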



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12939) Retry Kerberos login on authentication failure in Iceberg processors

2024-03-25 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12939:

Status: Patch Available  (was: Open)

> Retry Kerberos login on authentication failure in Iceberg processors
> 
>
> Key: NIFI-12939
> URL: https://issues.apache.org/jira/browse/NIFI-12939
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mark Bathori
>Assignee: Mark Bathori
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When multiple processors try to renew the same expired Kerberos ticket, it
> can cause an authentication error that routes the current FlowFile to the
> processor's failure relationship. Since the issue is not permanent, the
> processor should retry the authentication.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-1931) QueryDatabaseTable processor; setFetchSize not working for postgres driver causing out of memory

2024-03-25 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-1931:
---
Fix Version/s: 2.0.0-M3
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> QueryDatabaseTable processor; setFetchSize not working for postgres driver 
> causing out of memory
> 
>
> Key: NIFI-1931
> URL: https://issues.apache.org/jira/browse/NIFI-1931
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Reporter: Paul Bormans
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 2.0.0-M3, 1.26.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With NIFI-1691 the ability to specify the fetch size was added. However, for
> the Postgres driver (or at least for postgresql-9.4.1208.jre6.jar) this does
> not seem to work, since after some time an out-of-memory error is reported in
> the logs.
> See https://jdbc.postgresql.org/documentation/head/query.html for some
> constraints; e.g., auto-commit needs to be set to false.
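
For reference, a minimal sketch of the constraints described in the linked
Postgres documentation: the driver only streams rows according to setFetchSize
when auto-commit is off and the ResultSet is forward-only (the table name below
is illustrative):

{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class PgFetchSizeSketch {
    public static void streamRows(Connection conn) throws Exception {
        conn.setAutoCommit(false);                 // required for cursor-based fetch in the pg driver
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(1000);               // rows per round trip instead of all at once
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time without holding the full result in memory
                }
            }
        } finally {
            conn.commit();                         // close the transaction that backed the cursor
        }
    }
}
{code}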



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-1931) QueryDatabaseTable processor; setFetchSize not working for postgres driver causing out of memory

2024-03-25 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-1931:
---
Fix Version/s: 1.26.0

> QueryDatabaseTable processor; setFetchSize not working for postgres driver 
> causing out of memory
> 
>
> Key: NIFI-1931
> URL: https://issues.apache.org/jira/browse/NIFI-1931
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Reporter: Paul Bormans
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 1.26.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With NIFI-1691 the ability to specify the fetch size was added. However, for
> the Postgres driver (or at least for postgresql-9.4.1208.jre6.jar) this does
> not seem to work, since after some time an out-of-memory error is reported in
> the logs.
> See https://jdbc.postgresql.org/documentation/head/query.html for some
> constraints; e.g., auto-commit needs to be set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-1931) QueryDatabaseTable processor; setFetchSize not working for postgres driver causing out of memory

2024-03-22 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-1931:
---
Affects Version/s: (was: 0.6.1)
   Status: Patch Available  (was: Open)

> QueryDatabaseTable processor; setFetchSize not working for postgres driver 
> causing out of memory
> 
>
> Key: NIFI-1931
> URL: https://issues.apache.org/jira/browse/NIFI-1931
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Reporter: Paul Bormans
>Assignee: Jim Steinebrey
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With NIFI-1691 the ability to specify the fetch size was added. However, for
> the Postgres driver (or at least for postgresql-9.4.1208.jre6.jar) this does
> not seem to work, since after some time an out-of-memory error is reported in
> the logs.
> See https://jdbc.postgresql.org/documentation/head/query.html for some
> constraints; e.g., auto-commit needs to be set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12700) PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)

2024-03-15 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12700:

Fix Version/s: 2.0.0
   1.26.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> --
>
> Key: NIFI-12700
> URL: https://issues.apache.org/jira/browse/NIFI-12700
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Emilio Setiadarma
>Assignee: Emilio Setiadarma
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The PutKudu processor's existing implementation uses a Map of KuduOperation
> -> FlowFile to keep track of which FlowFile was being processed when the
> KuduOperation was created. This mapping is eventually used to associate
> FlowFiles with RowErrors (if any occur), which is necessary for
> transferring FlowFiles to success/failure relationships and logging failures,
> among other things.
> For very large inputs, KuduOperation objects can grow very large. There is
> no memory leak, but this could still cause OutOfMemory issues for very large
> input data. There is a possibility of not requiring the KuduOperation ->
> FlowFile map for unbatched flush modes (e.g. the AUTO_FLUSH_SYNC flush mode,
> where KuduSession.apply() will have already flushed the buffer before
> returning; see
> https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
> This Jira captures the effort to refactor the PutKudu processor to make it
> more memory-efficient.
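
A minimal sketch of the AUTO_FLUSH_SYNC behavior that makes the map unnecessary,
using the Kudu Java client (illustrative; not the PutKudu code):

{code:java}
// With AUTO_FLUSH_SYNC, apply() flushes synchronously and returns the row result
// immediately, so any error can be tied to the current FlowFile without retaining
// every Operation in a map.
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.Operation;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.SessionConfiguration;

public class AutoFlushSyncSketch {
    public static boolean applyNow(KuduClient client, Operation op) throws Exception {
        KuduSession session = client.newSession();
        session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC);
        OperationResponse response = session.apply(op); // buffer already flushed on return
        if (response.hasRowError()) {
            // handle the per-row failure for the FlowFile currently being processed
            return false;
        }
        return true;
    }
}
{code}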



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12700) PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)

2024-03-13 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12700:

Status: Patch Available  (was: Open)

> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> --
>
> Key: NIFI-12700
> URL: https://issues.apache.org/jira/browse/NIFI-12700
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Emilio Setiadarma
>Assignee: Emilio Setiadarma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PutKudu processor's existing implementation uses a Map of KuduOperation
> -> FlowFile to keep track of which FlowFile was being processed when the
> KuduOperation was created. This mapping is eventually used to associate
> FlowFiles with RowErrors (if any occur), which is necessary for
> transferring FlowFiles to success/failure relationships and logging failures,
> among other things.
> For very large inputs, KuduOperation objects can grow very large. There is
> no memory leak, but this could still cause OutOfMemory issues for very large
> input data. There is a possibility of not requiring the KuduOperation ->
> FlowFile map for unbatched flush modes (e.g. the AUTO_FLUSH_SYNC flush mode,
> where KuduSession.apply() will have already flushed the buffer before
> returning; see
> https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
> This Jira captures the effort to refactor the PutKudu processor to make it
> more memory-efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12855) Add more information to provenance events to facilitate full graph traversal

2024-03-13 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12855:

Description: 
Although NiFi has a capability in the UI to issue and display lineage queries 
for provenance events, it is not a complete graph that can be traversed if for 
example provenance events were stored in a graph database. The following 
features should be added:

- A reference in a provenance event to any parent events ("previousEventIds")
- Add methods to GraphClientService to generate queries/statements in popular 
graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
- Add ArcadeDB service as reference implementation for SQL generation

  was:
Although NiFi has a capability in the UI to issue and display lineage queries 
for provenance events, it is not a complete graph that can be traversed if for 
example provenance events were stored in a graph database. The following 
features should be added:

- A reference in a provenance event to any parent events ("previousEventIds")
- Add methods to GraphClientService to generate queries/statements in popular 
graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
- Add explicit references to the relationship to which the FlowFile was 
transferred
- Add ArcadeDB service as reference implementation for SQL generation


> Add more information to provenance events to facilitate full graph traversal
> 
>
> Key: NIFI-12855
> URL: https://issues.apache.org/jira/browse/NIFI-12855
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Although NiFi has a capability in the UI to issue and display lineage queries 
> for provenance events, it is not a complete graph that can be traversed if 
> for example provenance events were stored in a graph database. The following 
> features should be added:
> - A reference in a provenance event to any parent events ("previousEventIds")
> - Add methods to GraphClientService to generate queries/statements in popular 
> graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
> - Add ArcadeDB service as reference implementation for SQL generation
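
As a purely hypothetical illustration of the second bullet (generating graph
statements), a helper that emits Cypher linking an event to its previousEventIds
might look like this; the class, method, labels, and the PRECEDES relationship are
assumptions, not the committed GraphClientService API:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class ProvenanceCypherSketch {
    public static String linkToPreviousEvents(long eventId, List<Long> previousEventIds) {
        // One MERGE per parent event, so the provenance graph becomes fully traversable
        return previousEventIds.stream()
                .map(prevId -> String.format(
                        "MATCH (e:ProvenanceEvent {eventId: %d}), (p:ProvenanceEvent {eventId: %d}) "
                        + "MERGE (p)-[:PRECEDES]->(e)", eventId, prevId))
                .collect(Collectors.joining(";\n"));
    }
}
{code}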



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12880) Add DeleteFile processor

2024-03-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12880:

Status: Patch Available  (was: Open)

> Add DeleteFile processor
> 
>
> Key: NIFI-12880
> URL: https://issues.apache.org/jira/browse/NIFI-12880
> Project: Apache NiFi
>  Issue Type: New Feature
>Reporter: endzeit
>Assignee: endzeit
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The existing processors that retrieve a file from the file system, namely
> {{FetchFile}} and {{GetFile}}, support the removal of the file from the file
> system once the content has been copied into the FlowFile.
> However, deleting the file from the file system immediately might not be
> feasible in certain circumstances.
> In cases where the content repository of NiFi does not meet sufficient data 
> durability guarantees, it might be desired to remove the source file only 
> after it has been processed successfully and its result transferred to a 
> system that satisfies those durability constraints.
> As of now, there is no built-in solution to achieve such behavior using the 
> standard NiFi distribution.
> Current workarounds involve the usage of a scripted processor or the creation 
> of a custom processor, that provides the desired functionality.
> This issue proposes the addition of a {{DeleteFile}} processor to the NiFi 
> standard-processors bundle, that fills this gap.
> It should expect a FlowFile and delete the file at the path derived from the 
> FlowFile attributes. The default values to determine the file path should be 
> compatible with the attributes written by the existing {{ListFiles}} 
> processor.
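
A minimal sketch of the proposed behavior, assuming the ListFiles-compatible
"path" and "filename" attributes (the actual property names and error handling
are up to the implementation):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class DeleteFileSketch {
    public static boolean deleteForAttributes(Map<String, String> flowFileAttributes) throws Exception {
        Path target = Paths.get(flowFileAttributes.get("path"), flowFileAttributes.get("filename"));
        // returns false when the file was already gone, which a processor could
        // route differently from an I/O failure
        return Files.deleteIfExists(target);
    }
}
{code}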



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12889) Retry Kerberos Login on auth failures in HDFS processors

2024-03-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12889:
---

Assignee: Matt Burgess

> Retry Kerberos Login on auth failures in HDFS processors
> 
>
> Key: NIFI-12889
> URL: https://issues.apache.org/jira/browse/NIFI-12889
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently if a Kerberos authentication failure happens (e.g. during ticket
> relogin) in the HDFS processors, the controller service must be
> restarted manually in order for the processors to execute correctly. From the
> processors we should reset the HDFS resources on auth failure to simulate a
> "restart" of the controller service so relogin can occur correctly.
> At a minimum this includes the following processors:
> FetchHDFS
> GetHDFS
> GetHDFSFileInfo
> PutHDFS
> ListHDFS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12889) Retry Kerberos Login on auth failures in HDFS processors

2024-03-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12889:

Status: Patch Available  (was: In Progress)

> Retry Kerberos Login on auth failures in HDFS processors
> 
>
> Key: NIFI-12889
> URL: https://issues.apache.org/jira/browse/NIFI-12889
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently if a Kerberos authentication failure happens (e.g. during ticket
> relogin) in the HDFS processors, the controller service must be
> restarted manually in order for the processors to execute correctly. From the
> processors we should reset the HDFS resources on auth failure to simulate a
> "restart" of the controller service so relogin can occur correctly.
> At a minimum this includes the following processors:
> FetchHDFS
> GetHDFS
> GetHDFSFileInfo
> PutHDFS
> ListHDFS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12889) Retry Kerberos Login on auth failures in HDFS processors

2024-03-12 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12889:
---

 Summary: Retry Kerberos Login on auth failures in HDFS processors
 Key: NIFI-12889
 URL: https://issues.apache.org/jira/browse/NIFI-12889
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Reporter: Matt Burgess


Currently if a Kerberos authentication failure happens (e.g. during ticket
relogin) in the HDFS processors, the controller service must be restarted manually
in order for the processors to execute correctly. From the processors we should
reset the HDFS resources on auth failure to simulate a "restart" of the
controller service so relogin can occur correctly.

At a minimum this includes the following processors:

FetchHDFS
GetHDFS
GetHDFSFileInfo
PutHDFS
ListHDFS
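
A sketch of the assumed shape of the fix (illustrative only, not the actual
patch): drop the cached FileSystem on an auth failure so the next trigger
rebuilds it and re-authenticates:

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAuthResetSketch {
    private final AtomicReference<FileSystem> fsRef = new AtomicReference<>();
    private final Configuration config = new Configuration();

    public void process(Path file) throws IOException {
        FileSystem fs = fsRef.get();
        if (fs == null) {
            fs = FileSystem.get(config); // rebuilding triggers a fresh login as configured
            fsRef.set(fs);
        }
        try {
            fs.getFileStatus(file); // stand-in for the real read/write work
        } catch (IOException e) {
            // A real fix would inspect the cause for a Kerberos/GSS failure first;
            // resetting simulates a "restart" so relogin can occur on the next attempt.
            fsRef.set(null);
            throw e;
        }
    }
}
{code}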



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-3625) Add JSON support to PutHiveStreaming

2024-03-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-3625:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The PutHiveStreaming processors use a Record Reader to ingest the records from
the FlowFile. Closing this as OBE (overtaken by events).

> Add JSON support to PutHiveStreaming
> 
>
> Key: NIFI-3625
> URL: https://issues.apache.org/jira/browse/NIFI-3625
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.2.0
>Reporter: Ryan Persaud
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As noted in a Hortonworks Community Connection post 
> (https://community.hortonworks.com/questions/88424/nifi-puthivestreaming-requires-avro.html),
>  PutHiveStreaming does not currently support JSON FlowFile content. I've
> completed the code to allow JSON flow files to be streamed into Hive, and I'm
> currently working on test cases and updated documentation. I should have a
> PR to submit this week.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-10778) Add support for Kerberos Authentication in ElasticSearchClientServiceImpl

2024-03-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-10778.
-
Resolution: Duplicate

Closing as duplicate of NIFI-10830

> Add support for Kerberos Authentication in ElasticSearchClientServiceImpl
> -
>
> Key: NIFI-10778
> URL: https://issues.apache.org/jira/browse/NIFI-10778
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Nandor Soma Abonyi
>Priority: Major
>  Labels: elasticsearch
>
> Initiate discussion: 
> [https://github.com/apache/nifi/pull/6619#discussion_r1014662812]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

2024-03-11 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825426#comment-17825426
 ] 

Matt Burgess commented on NIFI-11449:
-

For AWS, what does Iceberg use for a catalog? DynamoDB?

> add autocommit property to PutDatabaseRecord processor
> --
>
> Key: NIFI-11449
> URL: https://issues.apache.org/jira/browse/NIFI-11449
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.21.0
> Environment: Any Nifi Deployment
>Reporter: Abdelrahim Ahmad
>Priority: Blocker
>  Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>
> The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When 
> using the processor with the Trino-JDBC-Driver or Dremio-JDBC-Driver to write 
> to an Iceberg catalog, it disables the autocommit feature. This leads to 
> errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".
> An autocommit property needs to be added to the processor so the feature can
> be enabled or disabled.
> Enabling auto-commit in the NiFi PutDatabaseRecord processor is important for
> Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by
> allowing atomic writes to be performed in the underlying database. This will
> allow the processor to be used with a wider range of databases.
> _Improving this processor will allow NiFi to be the main tool for ingesting
> data into these new technologies, so we don't have to deal with another tool
> to do so._
> +*_{color:#de350b}BUT:{color}_*+
> I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts
> records one by one into the database using a prepared statement, and commits
> the transaction at the end of the loop that processes each record. This
> approach can be inefficient and slow when inserting large volumes of data
> into tables that are optimized for bulk ingestion, such as Delta Lake,
> Iceberg, and Hudi tables.
> These tables use various techniques to optimize the performance of bulk
> ingestion, such as partitioning, clustering, and indexing. Inserting records
> one by one using a prepared statement can bypass these optimizations, leading
> to poor performance and potentially causing issues such as excessive disk
> usage, increased memory consumption, and decreased query performance.
> To avoid these issues, it is recommended to have a new processor, or add a
> feature to the current one, that uses a bulk insert method with auto-commit
> when inserting large volumes of data into Delta Lake, Iceberg, and Hudi
> tables.
>  
> P.S.: using PutSQL is not an alternative; even with autoCommit it has the same
> performance problem described above.
> Thanks and best regards :)
> Abdelrahim Ahmad



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

2024-03-11 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825392#comment-17825392
 ] 

Matt Burgess commented on NIFI-11449:
-

I believe there are separate processors for AWS and GCP as well. They may not 
be included with the Apache NiFi release binaries but the NARs can be added 
manually from the Apache repository. Do those solve your use case or do we need 
PutIceberg to support Object-backed storage as well?

> add autocommit property to PutDatabaseRecord processor
> --
>
> Key: NIFI-11449
> URL: https://issues.apache.org/jira/browse/NIFI-11449
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Affects Versions: 1.21.0
> Environment: Any Nifi Deployment
>Reporter: Abdelrahim Ahmad
>Priority: Blocker
>  Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>
> The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When 
> using the processor with the Trino-JDBC-Driver or Dremio-JDBC-Driver to write 
> to an Iceberg catalog, it disables the autocommit feature. This leads to 
> errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".
> An autocommit property needs to be added to the processor so the feature can
> be enabled or disabled.
> Enabling auto-commit in the NiFi PutDatabaseRecord processor is important for
> Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by
> allowing atomic writes to be performed in the underlying database. This will
> allow the processor to be used with a wider range of databases.
> _Improving this processor will allow NiFi to be the main tool for ingesting
> data into these new technologies, so we don't have to deal with another tool
> to do so._
> +*_{color:#de350b}BUT:{color}_*+
> I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts
> records one by one into the database using a prepared statement, and commits
> the transaction at the end of the loop that processes each record. This
> approach can be inefficient and slow when inserting large volumes of data
> into tables that are optimized for bulk ingestion, such as Delta Lake,
> Iceberg, and Hudi tables.
> These tables use various techniques to optimize the performance of bulk
> ingestion, such as partitioning, clustering, and indexing. Inserting records
> one by one using a prepared statement can bypass these optimizations, leading
> to poor performance and potentially causing issues such as excessive disk
> usage, increased memory consumption, and decreased query performance.
> To avoid these issues, it is recommended to have a new processor, or add a
> feature to the current one, that uses a bulk insert method with auto-commit
> when inserting large volumes of data into Delta Lake, Iceberg, and Hudi
> tables.
>  
> P.S.: using PutSQL is not an alternative; even with autoCommit it has the same
> performance problem described above.
> Thanks and best regards :)
> Abdelrahim Ahmad



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12825) Implement processor to get row key ranges for HBase regions

2024-03-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12825:

Fix Version/s: 2.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Implement processor to get row key ranges for HBase regions
> ---
>
> Key: NIFI-12825
> URL: https://issues.apache.org/jira/browse/NIFI-12825
> Project: Apache NiFi
>  Issue Type: New Feature
>Reporter: Emilio Setiadarma
>Assignee: Emilio Setiadarma
>Priority: Major
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A common way to parallelize scan operations against HBase is to scan by row key
> ranges. In the HBase architecture, HBase splits tables into regions, each
> with a range of row keys. These row key ranges are mutually exclusive, and
> together they include all the row keys.
> The current manual approach to parallelizing scans against HBase via row key
> ranges is to go to the HBase shell and run the "list_regions" command to obtain
> the row key ranges. This approach has its downsides, most importantly the
> fact that row key ranges are not static: HBase regions may split,
> creating two regions with the row key range split in the middle.
> Providing a way for NiFi to obtain these row key ranges per HBase region
> would make it easier to create a flow that performs scans against HBase
> parallelized by row key range. Once the row key ranges are known, this
> information could easily be fed into a scanning processor (e.g. ScanHBase).
>  
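
For reference, a sketch of obtaining the same ranges programmatically, assuming
the HBase 2.x client API (this is what "list_regions" prints in the shell):

{code:java}
import java.util.List;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionRangesSketch {
    public static void printRowKeyRanges(Connection connection, String table) throws Exception {
        try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf(table))) {
            List<HRegionLocation> regions = locator.getAllRegionLocations();
            for (HRegionLocation region : regions) {
                // the first region's start key and the last region's end key are empty byte arrays
                byte[] start = region.getRegion().getStartKey();
                byte[] end = region.getRegion().getEndKey();
                System.out.println(Bytes.toStringBinary(start) + " .. " + Bytes.toStringBinary(end));
            }
        }
    }
}
{code}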



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12825) Implement processor to get row key ranges for HBase regions

2024-03-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12825:

Status: Patch Available  (was: Open)

> Implement processor to get row key ranges for HBase regions
> ---
>
> Key: NIFI-12825
> URL: https://issues.apache.org/jira/browse/NIFI-12825
> Project: Apache NiFi
>  Issue Type: New Feature
>Reporter: Emilio Setiadarma
>Assignee: Emilio Setiadarma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common way to parallelize scan operations against HBase is to scan by row key
> ranges. In the HBase architecture, HBase splits tables into regions, each
> with a range of row keys. These row key ranges are mutually exclusive, and
> together they include all the row keys.
> The current manual approach to parallelizing scans against HBase via row key
> ranges is to go to the HBase shell and run the "list_regions" command to obtain
> the row key ranges. This approach has its downsides, most importantly the
> fact that row key ranges are not static: HBase regions may split,
> creating two regions with the row key range split in the middle.
> Providing a way for NiFi to obtain these row key ranges per HBase region
> would make it easier to create a flow that performs scans against HBase
> parallelized by row key range. Once the row key ranges are known, this
> information could easily be fed into a scanning processor (e.g. ScanHBase).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12855) Add more information to provenance events to facilitate full graph traversal

2024-03-05 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12855:

Status: Patch Available  (was: In Progress)

> Add more information to provenance events to facilitate full graph traversal
> 
>
> Key: NIFI-12855
> URL: https://issues.apache.org/jira/browse/NIFI-12855
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Although NiFi has a capability in the UI to issue and display lineage queries 
> for provenance events, it is not a complete graph that can be traversed if 
> for example provenance events were stored in a graph database. The following 
> features should be added:
> - A reference in a provenance event to any parent events ("previousEventIds")
> - Add methods to GraphClientService to generate queries/statements in popular 
> graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
> - Add explicit references to the relationship to which the FlowFile was 
> transferred
> - Add ArcadeDB service as reference implementation for SQL generation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12630) NPE getLogger in ConsumeSlack

2024-03-05 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12630:

Fix Version/s: 2.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> NPE getLogger in ConsumeSlack
> -
>
> Key: NIFI-12630
> URL: https://issues.apache.org/jira/browse/NIFI-12630
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Getting this NPE when hitting the Slack rate limit:
> {code:java}
> 2024-01-18 16:07:53,560 WARN [Timer-Driven Process Thread-12] 
> o.a.n.controller.tasks.ConnectableTask Processing halted: uncaught exception 
> in Component [ConsumeSlack[id=592d68c7-3fe6-3039-53e4-eae3bfbfbd57]]
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.nifi.logging.ComponentLog.debug(String, Object[])" because 
> "this.logger" is null
>   at 
> org.apache.nifi.processors.slack.util.RateLimit.isLimitReached(RateLimit.java:42)
>   at 
> org.apache.nifi.processors.slack.ConsumeSlack.onTrigger(ConsumeSlack.java:332)
>   at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>   at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1274)
>   at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:244)
>   at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
>   at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
>   at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
>   at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>   at java.base/java.lang.Thread.run(Thread.java:1583) {code}
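
An illustrative null-safe guard for the helper named in the trace (an assumption
about the fix, not the actual patch; field and method names mirror the trace but
the body is hypothetical):

{code:java}
import org.apache.nifi.logging.ComponentLog;

class RateLimitSketch {
    private final ComponentLog logger;

    RateLimitSketch(ComponentLog logger) {
        this.logger = logger;
    }

    boolean isLimitReached(long retryAfterEpochMillis) {
        boolean limited = System.currentTimeMillis() < retryAfterEpochMillis;
        if (limited && logger != null) { // null-safe: the processor logger may not be set yet
            logger.debug("Rate limit reached; yielding until {}", new Object[]{retryAfterEpochMillis});
        }
        return limited;
    }
}
{code}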



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12630) NPE getLogger in ConsumeSlack

2024-03-05 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12630:

Affects Version/s: (was: 2.0.0-M1)
   (was: 1.24.0)
   Status: Patch Available  (was: Open)

> NPE getLogger in ConsumeSlack
> -
>
> Key: NIFI-12630
> URL: https://issues.apache.org/jira/browse/NIFI-12630
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Jim Steinebrey
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Getting this NPE when hitting the Slack rate limit:
> {code:java}
> 2024-01-18 16:07:53,560 WARN [Timer-Driven Process Thread-12] 
> o.a.n.controller.tasks.ConnectableTask Processing halted: uncaught exception 
> in Component [ConsumeSlack[id=592d68c7-3fe6-3039-53e4-eae3bfbfbd57]]
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.nifi.logging.ComponentLog.debug(String, Object[])" because 
> "this.logger" is null
>   at 
> org.apache.nifi.processors.slack.util.RateLimit.isLimitReached(RateLimit.java:42)
>   at 
> org.apache.nifi.processors.slack.ConsumeSlack.onTrigger(ConsumeSlack.java:332)
>   at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>   at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1274)
>   at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:244)
>   at 
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
>   at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
>   at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
>   at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>   at java.base/java.lang.Thread.run(Thread.java:1583) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12645) InvokeScriptedProcessor: The onStopped method is never called

2024-03-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12645:

Fix Version/s: 2.0.0
   1.26.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> InvokeScriptedProcessor: The onStopped method is never called
> -
>
> Key: NIFI-12645
> URL: https://issues.apache.org/jira/browse/NIFI-12645
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: All operating systems
>Reporter: Antonio Pezzuto
>Assignee: Jim Steinebrey
>Priority: Major
>  Labels: easyfix
> Fix For: 2.0.0, 1.26.0
>
> Attachments: Test_Script_Body_InvokeScriptedProcessor.json, 
> image-2024-01-19-12-40-24-453.png, image-2024-01-19-12-41-15-173.png, 
> image-2024-01-19-12-44-08-030.png, image-2024-01-19-12-44-39-464.png
>
>   Original Estimate: 24h
>  Time Spent: 0.5h
>  Remaining Estimate: 23.5h
>
> Processor: *InvokeScriptedProcessor*
> Script Engine : groovy
> The _InvokeScriptedProcessor_ processor was used to create a custom processor 
> in groovy.
> The groovy custom processor exposes the *onStopped* method
>  
> {code:java}
>  public void onStopped(ProcessContext context) throws Exception {
>         System.out.println("Stop")
>         restServer?.shutDown()
>     }{code}
>  
> When the _InvokeScriptedProcessor_ is stopped the groovy processor's 
> *onStopped* method is not invoked.
> I ran the InvokeScriptedProcessor processor in remote debug and the cause 
> seems to be due to a bug in the management of the scriptNeedsReload variable. 
> The groovy processor's onStopped method executes only if the 
> scriptNeedsReload instance variable equals false
> !image-2024-01-19-12-41-15-173.png!
> According to my analysis _scriptNeedsReload_ is always set to true due to an 
> error in the *reloadScript* method.
> Currently the *reloadScript* method returns true if the script body is empty 
> or if there are validation errors, and returns false if there are no 
> validation errors.
> The *reloadScript* method should return true if there are no validation 
> errors and return false if the script body is empty or if there are 
> validation errors.
> !image-2024-01-19-12-44-08-030.png!
> !image-2024-01-19-12-44-39-464.png!
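
Based on the analysis above, a minimal sketch of the corrected return 
semantics, with a simplified signature assumed (the real method also compiles 
the script and collects ValidationResults):

{code:java}
import java.util.Collection;

// Sketch only: reloadScript should report success (true) exactly when the
// script loaded without validation errors, so the caller can clear
// scriptNeedsReload and later invoke lifecycle methods such as onStopped.
private boolean reloadScript(final String scriptBody) {
    if (scriptBody == null || scriptBody.isEmpty()) {
        return false; // nothing usable to load; keep scriptNeedsReload set
    }
    final Collection<String> validationErrors = validate(scriptBody); // assumed helper
    return validationErrors.isEmpty(); // success only when there are no errors
}
{code}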



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12645) InvokeScriptedProcessor: The onStopped method is never called

2024-03-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12645:

Affects Version/s: (was: 1.20.0)
   (was: 1.19.1)
   (was: 1.21.0)
   (was: 1.22.0)
   (was: 1.23.0)
   (was: 1.24.0)
   (was: 1.23.1)
   (was: 1.23.2)
   Status: Patch Available  (was: Open)

> InvokeScriptedProcessor: The onStopped method is never called
> -
>
> Key: NIFI-12645
> URL: https://issues.apache.org/jira/browse/NIFI-12645
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: All operating systems
>Reporter: Antonio Pezzuto
>Assignee: Jim Steinebrey
>Priority: Major
>  Labels: easyfix
> Attachments: Test_Script_Body_InvokeScriptedProcessor.json, 
> image-2024-01-19-12-40-24-453.png, image-2024-01-19-12-41-15-173.png, 
> image-2024-01-19-12-44-08-030.png, image-2024-01-19-12-44-39-464.png
>
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> Processor: *InvokeScriptedProcessor*
> Script Engine : groovy
> The _InvokeScriptedProcessor_ processor was used to create a custom processor 
> in groovy.
> The groovy custom processor exposes the *onStopped* method
>  
> {code:java}
>  public void onStopped(ProcessContext context) throws Exception {
>         System.out.println("Stop")
>         restServer?.shutDown()
>     }{code}
>  
> When the _InvokeScriptedProcessor_ is stopped the groovy processor's 
> *onStopped* method is not invoked.
> I ran the InvokeScriptedProcessor processor in remote debug and the cause 
> seems to be due to a bug in the management of the scriptNeedsReload variable. 
> The groovy processor's onStopped method executes only if the 
> scriptNeedsReload instance variable equals false
> !image-2024-01-19-12-41-15-173.png!
> According to my analysis _scriptNeedsReload_ is always set to true due to an 
> error in the *reloadScript* method.
> Currently the *reloadScript* method returns true if the script body is empty 
> or if there are validation errors, and returns false if there are no 
> validation errors.
> The *reloadScript* method should return true if there are no validation 
> errors and return false if the script body is empty or if there are 
> validation errors.
> !image-2024-01-19-12-44-08-030.png!
> !image-2024-01-19-12-44-39-464.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12855) Add more information to provenance events to facilitate full graph traversal

2024-03-01 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12855:
---

Assignee: Matt Burgess

> Add more information to provenance events to facilitate full graph traversal
> 
>
> Key: NIFI-12855
> URL: https://issues.apache.org/jira/browse/NIFI-12855
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>
> Although NiFi has a capability in the UI to issue and display lineage queries 
> for provenance events, it is not a complete graph that can be traversed if 
> for example provenance events were stored in a graph database. The following 
> features should be added:
> - A reference in a provenance event to any parent events ("previousEventIds")
> - Add methods to GraphClientService to generate queries/statements in popular 
> graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
> - Add explicit references to the relationship to which the FlowFile was 
> transferred
> - Add ArcadeDB service as reference implementation for SQL generation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12855) Add more information to provenance events to facilitate full graph traversal

2024-03-01 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12855:
---

 Summary: Add more information to provenance events to facilitate 
full graph traversal
 Key: NIFI-12855
 URL: https://issues.apache.org/jira/browse/NIFI-12855
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Reporter: Matt Burgess


Although NiFi has a capability in the UI to issue and display lineage queries 
for provenance events, it is not a complete graph that can be traversed if for 
example provenance events were stored in a graph database. The following 
features should be added:

- A reference in a provenance event to any parent events ("previousEventIds")
- Add methods to GraphClientService to generate queries/statements in popular 
graph languages such as Tinkerpop/Gremlin, Cypher, and SQL
- Add explicit references to the relationship to which the FlowFile was 
transferred
- Add ArcadeDB service as reference implementation for SQL generation
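
As an illustration of the proposed GraphClientService additions, a hedged 
sketch of generating a Cypher statement that links an event node to its 
parent events via the proposed "previousEventIds" field; the class, labels, 
and edge name here are hypothetical, and a real implementation would use 
parameterized statements rather than string concatenation:

{code:java}
import java.util.List;

// Hypothetical sketch: emit one Cypher statement per provenance event,
// creating the event node plus PREVIOUS edges back to its parent events.
public class ProvenanceCypherGenerator {

    public String toCypher(final String eventId, final String eventType,
            final List<String> previousEventIds) {
        final StringBuilder cypher = new StringBuilder()
                .append("MERGE (e:ProvenanceEvent {id: '").append(eventId).append("'}) ")
                .append("SET e.eventType = '").append(eventType).append("' ");
        int i = 0;
        for (final String parentId : previousEventIds) {
            final String var = "p" + i++;
            cypher.append("MERGE (").append(var)
                  .append(":ProvenanceEvent {id: '").append(parentId).append("'}) ")
                  .append("MERGE (").append(var).append(")-[:PREVIOUS]->(e) ");
        }
        return cypher.toString().trim();
    }
}
{code}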



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12850) Failure to index Provenance Events with large filename attribute

2024-02-29 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12850:

Fix Version/s: 2.0.0
   1.26.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Failure to index Provenance Events with large filename attribute
> 
>
> Key: NIFI-12850
> URL: https://issues.apache.org/jira/browse/NIFI-12850
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.25.0, 2.0.0-M2
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index 
> Provenance Events java.lang.IllegalArgumentException: Document contains at 
> least one immense term in field="filename" (whose UTF8 encoding is longer 
> than the max length 32766), all of which were skipped. Please correct the 
> analyzer to not produce such terms. The prefix of the first immense term is: 
> '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 
> 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes 
> can be at most 32766 in length; got 74483 at 
> org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
>  at 
> org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
>  at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
>  at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
>  at 
> org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
>  at 
> org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471) at 
> org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444) at 
> org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
>  at 
> org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
>  at 
> org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750) Caused by: 
> org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes 
> can be at most 32766 in length; got 74483 at 
> org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281) at 
> org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182) at 
> org.apache.lucene.index.DefaultIndexingChain$PerField. {code}
> Looking at the code, it looks like filename is the only attribute that could 
> be set with arbitrary values that is not protected against overly large 
> values right now.
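
A minimal sketch of the kind of guard the fix implies, assuming a helper 
name: cap any indexed attribute value at Lucene's 32766-byte term limit 
before building the Document (the committed patch may differ in detail):

{code:java}
import java.nio.charset.StandardCharsets;

// Sketch only: truncate oversized values before indexing. Cutting a UTF-8
// byte array mid-codepoint can leave a replacement character at the end,
// which is acceptable for an index term; the event itself is stored intact.
final class IndexTerms {
    private static final int MAX_TERM_BYTES = 32766;

    static String truncateForIndex(final String value) {
        final byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= MAX_TERM_BYTES) {
            return value;
        }
        return new String(utf8, 0, MAX_TERM_BYTES, StandardCharsets.UTF_8);
    }

    private IndexTerms() {
    }
}
{code}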



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12828) DatabaseTableSchemaRegistry improvement to handle BIT/Boolean Type

2024-02-29 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12828:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> DatabaseTableSchemaRegistry  improvement to handle BIT/Boolean Type  
> -
>
> Key: NIFI-12828
> URL: https://issues.apache.org/jira/browse/NIFI-12828
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.25.0
>Reporter: RAVINARAYAN SINGH
>Assignee: RAVINARAYAN SINGH
>Priority: Major
>  Labels: controller_services
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The issue stems from the fact that for the PostgreSQL JDBC driver, the 
> Boolean type is mapped to {{{}java.sql.Types.BIT{}}}, not {{{}BOOLEAN{}}}. 
> This causes the line {{columnResultSet.getInt("DATA_TYPE")}} to return -7, 
> which corresponds to {{{}BIT{}}}, and 
> {{DataTypeUtils.getDataTypeFromSQLTypeValue(dataType)}} to return 
> {{{}null{}}}, leading to a null pointer exception.
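
A hedged sketch of the mapping change the description implies, with a 
simplified signature (the actual fix belongs in the schema-derivation path 
around DataTypeUtils.getDataTypeFromSQLTypeValue):

{code:java}
import java.sql.Types;
import org.apache.nifi.serialization.record.DataType;
import org.apache.nifi.serialization.record.RecordFieldType;

// Sketch only: treat java.sql.Types.BIT (-7) like BOOLEAN when deriving the
// record field type, so PostgreSQL boolean columns no longer resolve to null.
static DataType getDataTypeForSqlType(final int sqlType) {
    switch (sqlType) {
        case Types.BIT:      // PostgreSQL JDBC reports boolean columns as BIT
        case Types.BOOLEAN:
            return RecordFieldType.BOOLEAN.getDataType();
        default:
            return RecordFieldType.STRING.getDataType(); // simplified fallback
    }
}
{code}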



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12828) DatabaseTableSchemaRegistry improvement to handle BIT/Boolean Type

2024-02-29 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12828:

Fix Version/s: 2.0.0
   1.26.0

> DatabaseTableSchemaRegistry  improvement to handle BIT/Boolean Type  
> -
>
> Key: NIFI-12828
> URL: https://issues.apache.org/jira/browse/NIFI-12828
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.25.0
>Reporter: RAVINARAYAN SINGH
>Assignee: RAVINARAYAN SINGH
>Priority: Major
>  Labels: controller_services
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The issue stems from the fact that for the PostgreSQL JDBC driver, the 
> Boolean type is mapped to {{{}java.sql.Types.BIT{}}}, not {{{}BOOLEAN{}}}. 
> This causes the line {{columnResultSet.getInt("DATA_TYPE")}} to return -7, 
> which corresponds to {{{}BIT{}}}, and 
> {{DataTypeUtils.getDataTypeFromSQLTypeValue(dataType)}} to return 
> {{{}null{}}}, leading to a null pointer exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12839) Maven archetype for processor bundle incorrectly sets NiFi dependency version

2024-02-23 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12839:

Status: Patch Available  (was: In Progress)

> Maven archetype for processor bundle incorrectly sets NiFi dependency version
> -
>
> Key: NIFI-12839
> URL: https://issues.apache.org/jira/browse/NIFI-12839
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Tools and Build
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The processor bundle archetype does not explicitly set the dependency version 
> for nifi-standard-services-api-nar, leading the generated POM to set it to 
> the version of the extension. It should explicitly set the version to 
> "nifiVersion".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12839) Maven archetype for processor bundle incorrectly sets NiFi dependency version

2024-02-23 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12839:
---

 Summary: Maven archetype for processor bundle incorrectly sets 
NiFi dependency version
 Key: NIFI-12839
 URL: https://issues.apache.org/jira/browse/NIFI-12839
 Project: Apache NiFi
  Issue Type: Bug
  Components: Tools and Build
Reporter: Matt Burgess
Assignee: Matt Burgess
 Fix For: 2.0.0, 1.26.0


The processor bundle archetype does not explicitly set the dependency version 
for nifi-standard-services-api-nar, leading the generated POM to set it to the 
version of the extension. It should explicitly set the version to "nifiVersion".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-4385) Adjust the QueryDatabaseTable processor for handling big tables.

2024-02-21 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819430#comment-17819430
 ] 

Matt Burgess commented on NIFI-4385:


I'm ok with changing the Fetch Size defaults but 0 leaves it up to the driver, 
and often their defaults are WAY too low. Would be nice to add more 
documentation around the choice of default as well as what it means to the 
user, as sometimes I've seen assumptions that that's how many rows will be in a 
FlowFile when there's a separate property for that.

Another thing I'm looking into is to see if I can speed everything up by doing 
the fetch in parallel (on multiple cores), fetching the specified number of 
rows and having another thread writing them to the FlowFile. It always depends 
on the use case but sometimes the Avro conversion of the ResultSet is "the long 
pole in the tent", so rather than that logic waiting on a fetch it can be 
working constantly while at least one separate thread ensures there's always 
data ready to be converted and written out. I'll write up a separate Jira for 
that once I fully characterize the issue and proposed solution. In the meantime 
I welcome all thoughts, comments, questions, and concerns right here :)
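
For context on the distinction drawn above, a small illustration with an 
arbitrary value:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustration only: Fetch Size controls how many rows come back per network
// round trip to the database; it is independent of Max Rows Per Flow File,
// which controls how many rows are written into each FlowFile.
static PreparedStatement prepareWithFetchSize(final Connection connection, final String query)
        throws SQLException {
    final PreparedStatement statement = connection.prepareStatement(query);
    statement.setFetchSize(10_000); // override often-tiny driver defaults
    return statement;
}
{code}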

> Adjust the QueryDatabaseTable processor for handling big tables.
> 
>
> Key: NIFI-4385
> URL: https://issues.apache.org/jira/browse/NIFI-4385
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.3.0
>Reporter: Tim Späth
>Priority: Major
>
> When querying large database tables, the *QueryDatabaseTable* processor does 
> not perform very well.
> The processor will always perform the full query and then transfer all 
> flowfiles as a single list instead of 
> transferring them incrementally as the *ResultSet* fetches the next 
> rows (if a fetch size is given). 
> If you want to query a billion rows from a table, 
> the processor will add all flowfiles in an ArrayList in memory 
> before transferring the whole list after the last row is fetched by the 
> ResultSet. 
> I've checked the code in 
> *org.apache.nifi.processors.standard.QueryDatabaseTable.java* 
> and in my opinion, it would be no big deal to move the session.transfer to a 
> proper position in the code (into the while loop where the flowfile is added 
> to the list) to 
> achieve real _streaming support_. There was also a bug report for this problem 
> which resulted in adding the new property *Maximum Number of Fragments*, 
> but this property will just limit the results. 
> Now you have to multiply *Max Rows Per Flow File* with *Maximum Number of 
> Fragments* to get your limit, 
> which is not really a solution for the original problem imho. 
> Also, the workaround with GenerateTableFetch and/or ExecuteSQL processors is 
> much slower than using a database cursor or a ResultSet
> and streaming the rows into flowfiles directly on the queue.
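
A hedged sketch of the streaming shape the reporter proposes, with simplified 
APIs: writeBatch(...) and REL_SUCCESS are assumed, writeBatch writing up to 
the configured number of rows and reporting whether more remain; the real 
processor would also maintain state and fragment attributes:

{code:java}
import java.io.OutputStream;
import java.sql.ResultSet;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;

// Sketch only: transfer each FlowFile as soon as its batch of rows has been
// written, instead of buffering every FlowFile in memory until the ResultSet
// is exhausted.
void streamResults(final ProcessSession session, final ResultSet resultSet,
        final int maxRowsPerFlowFile) {
    final AtomicBoolean moreRows = new AtomicBoolean(true);
    while (moreRows.get()) {
        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, (final OutputStream out) ->
                moreRows.set(writeBatch(resultSet, out, maxRowsPerFlowFile)));
        session.transfer(flowFile, REL_SUCCESS);
        session.commitAsync(); // results become available downstream right away
    }
}
{code}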



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-4385) Adjust the QueryDatabaseTable processor for handling big tables.

2024-02-13 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817192#comment-17817192
 ] 

Matt Burgess commented on NIFI-4385:


For large tables, GenerateTableFetch is the way to go. If you have a standalone 
NiFi instance and want to fetch a ton of rows, it's going to be slow. 

> Adjust the QueryDatabaseTable processor for handling big tables.
> 
>
> Key: NIFI-4385
> URL: https://issues.apache.org/jira/browse/NIFI-4385
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.3.0
>Reporter: Tim Späth
>Priority: Major
>
> When querying large database tables, the *QueryDatabaseTable* processor does 
> not perform very well.
> The processor will always perform the full query and then transfer all 
> flowfiles as a single list instead of 
> transferring them incrementally as the *ResultSet* fetches the next 
> rows (if a fetch size is given). 
> If you want to query a billion rows from a table, 
> the processor will add all flowfiles in an ArrayList in memory 
> before transferring the whole list after the last row is fetched by the 
> ResultSet. 
> I've checked the code in 
> *org.apache.nifi.processors.standard.QueryDatabaseTable.java* 
> and in my opinion, it would be no big deal to move the session.transfer to a 
> proper position in the code (into the while loop where the flowfile is added 
> to the list) to 
> achieve real _streaming support_. There was also a bug report for this problem 
> which resulted in adding the new property *Maximum Number of Fragments*, 
> but this property will just limit the results. 
> Now you have to multiply *Max Rows Per Flow File* with *Maximum Number of 
> Fragments* to get your limit, 
> which is not really a solution for the original problem imho. 
> Also, the workaround with GenerateTableFetch and/or ExecuteSQL processors is 
> much slower than using a database cursor or a ResultSet
> and streaming the rows into flowfiles directly on the queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12777) QueryRecord does not support UUID field type

2024-02-13 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12777:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> QueryRecord does not support UUID field type
> 
>
> Key: NIFI-12777
> URL: https://issues.apache.org/jira/browse/NIFI-12777
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.25.0
> Environment: windows
>Reporter: Nikolas Falco
>Priority: Major
> Fix For: 1.26.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I have a QueryRecord processor that takes a flowfile as input. This flowfile 
> contains records with some fields of type UUID.
> {noformat}
> 

[jira] [Updated] (NIFI-12777) QueryRecord does not support UUID field type

2024-02-13 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12777:

Fix Version/s: 1.26.0

> QueryRecord does not support UUID field type
> 
>
> Key: NIFI-12777
> URL: https://issues.apache.org/jira/browse/NIFI-12777
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.25.0
> Environment: windows
>Reporter: Nikolas Falco
>Priority: Major
> Fix For: 1.26.0
>
>
> I have a QueryRecord processor that takes a flowfile as input. This flowfile 
> contains records with some fields of type UUID.
> {noformat}
> 

[jira] [Updated] (NIFI-12777) QueryRecord does not support UUID field type

2024-02-13 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12777:

Status: Patch Available  (was: Open)

> QueryRecord does not support UUID field type
> 
>
> Key: NIFI-12777
> URL: https://issues.apache.org/jira/browse/NIFI-12777
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.25.0
> Environment: windows
>Reporter: Nikolas Falco
>Priority: Major
>
> I have a QueryRecord processor that takes a flowfile as input. This flowfile 
> contains records with some fields of type UUID.
> {noformat}
> 

[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2024-02-01 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8932:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Philipp Korniets
>Assignee: Matt Burgess
>Priority: Minor
>  Labels: backport-needed
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We have a lot of CSV files where the provider adds a custom header/footer to 
> valid CSV content.
>  The CSV header is actually the second row. 
> To remove the unnecessary data we can use
>  * ReplaceText 
>  * SplitText -> RouteOnAttribute -> MergeContent
> It would be great to have an option in the CSVReader controller service to 
> skip N rows from the top/bottom in order to get clean data.
>  * skip N from the top
>  * skip M from the bottom
>  A similar request was implemented in Flink: 
> https://issues.apache.org/jira/browse/FLINK-1002
>  
> Data Example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,
> distribution_id,Distribution 
> Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,EUR,TPSL_21025226   ,19-Jul-21,BRM96ST7   ,ABC 
> 14/09/24,NR,BOND  
> -1,all,20210719,Repo 21025226,qwerty                                    
> ,GBP,RPSS_21025226   ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)|  |  |  |  |  |  |  
> |  |  |  |  |  
> |distribution_id|Distribution 
> Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7   |ABC 
> 14/09/24|NR|BOND  |
> |-1|all|20210719|Repo 21025226|qwerty                                    
> |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-12731) GetHBase should save state whenever the session is committed

2024-02-01 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813436#comment-17813436
 ] 

Matt Burgess commented on NIFI-12731:
-

There are two PRs due to merge conflicts, one based on main and one based on 
support/nifi-1.x

> GetHBase should save state whenever the session is committed
> 
>
> Key: NIFI-12731
> URL: https://issues.apache.org/jira/browse/NIFI-12731
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently there is a place in the GetHBase code where the session is 
> committed after each set of 500 rows/FlowFiles (so as not to run out of 
> memory buffering millions of rows/FlowFiles) but the state is not updated. If 
> an error occurs during processing of the entire table, the state is not 
> updated but FlowFiles have already been sent downstream, so restarting the 
> processor results in duplicate data.
> GetHBase should save the current state whenever the session is committed.
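
A minimal sketch of the fix described above, with simplified names: persist 
the scan position every time the mid-scan commit succeeds, so rows that were 
already sent downstream are never re-fetched:

{code:java}
import java.io.IOException;
import java.util.Map;
import org.apache.nifi.components.state.Scope;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;

// Sketch only (inside the processor, so getLogger() is available): commit the
// session and, on success, persist the latest scan state (e.g. the max
// timestamp seen) before continuing with the next batch.
private void commitAndPersistState(final ProcessSession session, final ProcessContext context,
        final Map<String, String> latestState) {
    session.commitAsync(() -> {
        try {
            context.getStateManager().setState(latestState, Scope.CLUSTER);
        } catch (final IOException e) {
            getLogger().warn("Session committed but failed to persist state", e);
        }
    });
}
{code}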



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12731) GetHBase should save state whenever the session is committed

2024-02-01 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12731:

Status: Patch Available  (was: In Progress)

> GetHBase should save state whenever the session is committed
> 
>
> Key: NIFI-12731
> URL: https://issues.apache.org/jira/browse/NIFI-12731
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there is a place in the GetHBase code where the session is 
> committed after each set of 500 rows/FlowFiles (so as not to run out of 
> memory buffering millions of rows/FlowFiles) but the state is not updated. If 
> an error occurs during processing of the entire table, the state is not 
> updated but FlowFiles have already been sent downstream, so restarting the 
> processor results in duplicate data.
> GetHBase should save the current state whenever the session is committed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12731) GetHBase should save state whenever the session is committed

2024-02-01 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12731:
---

Assignee: Matt Burgess

> GetHBase should save state whenever the session is committed
> 
>
> Key: NIFI-12731
> URL: https://issues.apache.org/jira/browse/NIFI-12731
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
>
> Currently there is a place in the GetHBase code where the session is 
> committed after each set of 500 rows/FlowFiles (so as not to run out of 
> memory buffering millions of rows/FlowFiles) but the state is not updated. If 
> an error occurs during processing of the entire table, the state is not 
> updated but FlowFiles have already been sent downstream, so restarting the 
> processor results in duplicate data.
> GetHBase should save the current state whenever the session is committed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12731) GetHBase should save state whenever the session is committed

2024-02-01 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12731:
---

 Summary: GetHBase should save state whenever the session is 
committed
 Key: NIFI-12731
 URL: https://issues.apache.org/jira/browse/NIFI-12731
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Reporter: Matt Burgess
 Fix For: 2.0.0, 1.26.0


Currently there is a place in the GetHBase code where the session is committed 
after each set of 500 rows/FlowFiles (so as not to run out of memory buffering 
millions of rows/FlowFiles) but the state is not updated. If an error occurs 
during processing of the entire table, the state is not updated but FlowFiles 
have already been sent downstream, so restarting the processor results in 
duplicate data.

GetHBase should save the current state whenever the session is committed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-9677) LookUpRecord record path evaluation is "breaking" the next evaluation in case data is missing

2024-01-26 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-9677:
---
Fix Version/s: 2.0.0
   1.26.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> LookUpRecord record path evaluation is "breaking" the next evaluation in case 
> data is missing
> -
>
> Key: NIFI-9677
> URL: https://issues.apache.org/jira/browse/NIFI-9677
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Apache NiFi custom built from Github repo.
>Reporter: Peter Molnar
>Assignee: Jim Steinebrey
>Priority: Major
> Fix For: 2.0.0, 1.26.0
>
> Attachments: LookUpRecord_empty_array_data_issue.xml, 
> image-2022-02-11-12-30-53-134.png, image-2022-02-11-12-32-01-833.png, 
> image-2022-02-11-12-33-23-283.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Input JSON generated by GenerateFlowFile processor looks like this (actually 
> I just added a currencies array under each record in addition to the "Record 
> Update Strategy - Replace Existing Values" example here 
> [https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.15.0/org.apache.nifi.processors.standard.LookupRecord/additionalDetails.html]).
> *Note:* for the first record currencies array is empty.
>  
> {code:java}
> [
>   {
>     "locales": [
>       {
>         "region": "FR",
>         "language": "fr"
>       }, {
>         "region": "US",
>         "language": "en"
>       }
>     ],
>     "currencies": []
>   }, {
>     "locales": [
>       {
>         "region": "CA",
>         "language": "fr"
>       }, 
>       {
>         "region": "JP",
>         "language": "ja"
>       }
>     ],
>     "currencies": [
>       {
>         "currency": "CAD"
>       }, {
>         "currency": "JPY"
>       }
>     ]
>   }
> ]{code}
>  
> SimpleKeyValueLookUp service contains the following values:
> !image-2022-02-11-12-33-23-283.png!
>  
> LookUpRecord processor is configured as follows: 
> !image-2022-02-11-12-30-53-134.png!
> Once I execute the LookUpRecord processor for the flow file, the language 
> lookup works fine, but the lookups for currencies and regions do not work.
>  
> !image-2022-02-11-12-32-01-833.png!
> *Note:* in case the 1st currencies array is not empty but contains \{ 
> "currency": "EUR" }, \{ "currency": "USD" }, all lookups work fine. But 
> missing data seems to break the next evaluation of the record path.
> Please find the template for reproducing the issue enclosed as 
> "LookUpRecord_empty_array_data_issue.xml".
> Thank you.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12564) Processor 'ExecuteSQL' error, when uses unsigned mediumint in mysql

2024-01-25 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12564:

Summary: Processor 'ExecuteSQL' error, when uses unsigned mediumint in 
mysql  (was: Processor 'ExecuteSQL' error, when uses unsigned mdeiumint in 
mysql)

> Processor 'ExecuteSQL' error, when uses unsigned mediumint in mysql
> ---
>
> Key: NIFI-12564
> URL: https://issues.apache.org/jira/browse/NIFI-12564
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: duhee
>Priority: Major
>
> * Nifi Version : NiFi 1.13.2
>  * Processor : ExecuteSQL
>  * Table schema
> {code:java}
> CREATE TABLE `nifi_test` (
>   `clm1` varchar(15) NOT NULL DEFAULT '',
>   `clm2` varchar(10) NOT NULL DEFAULT '',
>   `clm3` varchar(12) NOT NULL DEFAULT '',
>   `clm4` varchar(20) NOT NULL DEFAULT '',
>   `clm5` mediumint(20) unsigned DEFAULT '1',
>   PRIMARY KEY (`clm1`,`clm2`,`clm3`,`clm4`)
> );
> insert into nifi_test values ('1','2','3','4',1), ('11','12','13','14',100), 
> ('21','22','23','24',0); {code}
>  * Error Message
> {code:java}
>         at 
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
> 2024-01-04 10:55:09,064 ERROR [Timer-Driven Process Thread-91] 
> o.a.nifi.processors.standard.ExecuteSQL 
> ExecuteSQL[id=ce977968-018c-1000-e472-2fba6455ce07] Unable to execute SQL 
> select query [SELECT * FROM nifi_test WHERE 1=1 LIMIT 10] for 
> StandardFlowFileRecord[uuid=050b1e1c-85a4-440c-9dc4-27404cf1445d,claim=StandardContentClaim
>  [resourceClaim=StandardResourceClaim[id=170406508-74869, 
> container=default, section=117], offset=0, 
> length=46],offset=0,name=72fd1391-d014-4329-927b-fd037587ddbc,size=46] 
> routing to failure
> org.apache.nifi.processor.exception.ProcessException: 
> java.lang.RuntimeException: Unable to resolve union for value 1 with type 
> java.lang.Integer while appending record {"clm1": "1", "clm2": "2", "clm3": 
> "3", "clm4": "4","clm5": 1}
>         at 
> org.apache.nifi.processors.standard.AbstractExecuteSQL.lambda$onTrigger$4(AbstractExecuteSQL.java:336)
>         at 
> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:3129)
>         at 
> org.apache.nifi.processors.standard.AbstractExecuteSQL.onTrigger(AbstractExecuteSQL.java:332)
>         at 
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>         at 
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1361)
>         at 
> org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:247)
>         at 
> org.apache.nifi.controller.scheduling.AbstractTimeBasedSchedulingAgent.lambda$doScheduleOnce$0(AbstractTimeBasedSchedulingAgent.java:59)
>         at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: Unable to resolve union for value 1 
> with type java.lang.Integer while appending record {"clm1": "1", "clm2": "2", 
> "clm3": "3", "clm4": "4", "clm5": 1}
>         at 
> org.apache.nifi.util.db.JdbcCommon.convertToAvroStream(JdbcCommon.java:456)
>         at 
> org.apache.nifi.processors.standard.sql.DefaultAvroSqlWriter.writeResultSet(DefaultAvroSqlWriter.java:49)
>         at 
> org.apache.nifi.processors.standard.AbstractExecuteSQL.lambda$onTrigger$4(AbstractExecuteSQL.java:334)
>         ... 14 common frames omitted
> Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
> org.apache.avro.UnresolvedUnionException:Not in union ["null","long"]: 1 
> (field=clm5)
>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:317)
>         at 
> org.apache.nifi.util.db.JdbcCommon.convertToAvroStream(JdbcCommon.java:448)
>         ... 16 common frames omitted
> Caused by: org.apache.avro.UnresolvedUnionException: Not in union 
> ["null","long"]: 1 (field=clm5)
>         at 
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:247)
>         at 
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:234)
>         at 
> 
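
The truncated trace above boils down to a schema/value mismatch: the Avro 
union generated for the unsigned MEDIUMINT column is ["null","long"], but the 
JDBC driver hands back an Integer. A hedged sketch of the promotion that 
would resolve the union (simplified; the real conversion lives in 
JdbcCommon.convertToAvroStream):

{code:java}
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Sketch only: promote Integer values to Long for columns whose Avro schema
// was widened to "long" because the SQL type is unsigned.
static Object readValueForAvro(final ResultSet rs, final int column) throws SQLException {
    Object value = rs.getObject(column);
    final ResultSetMetaData meta = rs.getMetaData();
    if (!meta.isSigned(column) && value instanceof Integer) { // e.g. MEDIUMINT UNSIGNED
        value = ((Integer) value).longValue(); // matches the ["null","long"] union
    }
    return value;
}
{code}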

[jira] [Updated] (NIFI-9677) LookUpRecord record path evaluation is "breaking" the next evaluation in case data is missing

2024-01-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-9677:
---
Affects Version/s: (was: 1.16.0)
   Status: Patch Available  (was: Open)

> LookUpRecord record path evaluation is "breaking" the next evaluation in case 
> data is missing
> -
>
> Key: NIFI-9677
> URL: https://issues.apache.org/jira/browse/NIFI-9677
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
> Environment: Apache NiFi custom built from Github repo.
>Reporter: Peter Molnar
>Assignee: Jim Steinebrey
>Priority: Major
> Attachments: LookUpRecord_empty_array_data_issue.xml, 
> image-2022-02-11-12-30-53-134.png, image-2022-02-11-12-32-01-833.png, 
> image-2022-02-11-12-33-23-283.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Input JSON generated by GenerateFlowFile processor looks like this (actually 
> I just added a currencies array under each record in addition to the "Record 
> Update Strategy - Replace Existing Values" example here 
> [https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.15.0/org.apache.nifi.processors.standard.LookupRecord/additionalDetails.html]).
> *Note:* for the first record currencies array is empty.
>  
> {code:java}
> [
>   {
>     "locales": [
>       {
>         "region": "FR",
>         "language": "fr"
>       }, {
>         "region": "US",
>         "language": "en"
>       }
>     ],
>     "currencies": []
>   }, {
>     "locales": [
>       {
>         "region": "CA",
>         "language": "fr"
>       }, 
>       {
>         "region": "JP",
>         "language": "ja"
>       }
>     ],
>     "currencies": [
>       {
>         "currency": "CAD"
>       }, {
>         "currency": "JPY"
>       }
>     ]
>   }
> ]{code}
>  
> SimpleKeyValueLookUp service contains the following values:
> !image-2022-02-11-12-33-23-283.png!
>  
> LookUpRecord processor is configured as follows: 
> !image-2022-02-11-12-30-53-134.png!
> Once I execute the LookUpRecord processor for the flow file, the language 
> lookup works fine, but the lookups for currencies and regions do not work.
>  
> !image-2022-02-11-12-32-01-833.png!
> *Note:* in case the 1st currencies array is not empty but contains \{ 
> "currency": "EUR" }, \{ "currency": "USD" }, all lookups work fine. But 
> missing data seems to break the next evaluation of the record path.
> Please find the template for reproducing the issue enclosed as 
> "LookUpRecord_empty_array_data_issue.xml".
> Thank you.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12420) Add Sawmill transformation processors

2024-01-17 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12420:

Resolution: Won't Do
Status: Resolved  (was: Patch Available)

> Add Sawmill transformation processors
> -
>
> Key: NIFI-12420
> URL: https://issues.apache.org/jira/browse/NIFI-12420
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Similarly to 
> JoltTransformJSON
> JoltTransformRecord
> JSLTTransformJSON
> It would be nice to have  SawmillTransformJSON and SawmillTransformRecord 
> processors that rely on the Sawmill transformation DSL
> https://github.com/logzio/sawmill
> https://github.com/logzio/sawmill/wiki
> to transform input data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12420) Add Sawmill transformation processors

2024-01-17 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12420:
---

Assignee: (was: Matt Burgess)

> Add Sawmill transformation processors
> -
>
> Key: NIFI-12420
> URL: https://issues.apache.org/jira/browse/NIFI-12420
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Similarly to 
> JoltTransformJSON
> JoltTransformRecord
> JSLTTransformJSON
> It would be nice to have  SawmillTransformJSON and SawmillTransformRecord 
> processors that rely on the Sawmill transformation DSL
> https://github.com/logzio/sawmill
> https://github.com/logzio/sawmill/wiki
> to transform input data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12596) PutIceberg is missing case-insensitive Record type handling in List and Map types

2024-01-16 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12596:

Fix Version/s: 1.25.0
   2.0.0-M2
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> PutIceberg is missing case-insensitive Record type handling in List and Map 
> types
> -
>
> Key: NIFI-12596
> URL: https://issues.apache.org/jira/browse/NIFI-12596
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Mark Bathori
>Assignee: Mark Bathori
>Priority: Major
> Fix For: 1.25.0, 2.0.0-M2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With NIFI-11263 case-insensitive and order independent field handling was 
> added to Record types but it is missing in case of List and Map types 
> containing Records.
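
A hedged sketch of the idea: the case-insensitive field resolution added by 
NIFI-11263 also has to be applied recursively to Records nested inside LIST 
and MAP types; names and signature here are simplified:

{code:java}
import java.util.List;

// Sketch only: resolve a record field name against the table schema without
// regard to case, returning the schema's canonical casing. The same lookup
// must be applied when descending into LIST elements and MAP values.
static String resolveFieldName(final List<String> schemaFieldNames, final String recordFieldName) {
    for (final String candidate : schemaFieldNames) {
        if (candidate.equalsIgnoreCase(recordFieldName)) {
            return candidate;
        }
    }
    return recordFieldName; // unmatched; let downstream validation report it
}
{code}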



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-4491) Add a CaptureChangeMySQLRecord processor

2024-01-15 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806977#comment-17806977
 ] 

Matt Burgess commented on NIFI-4491:


That would be great!

> Add a CaptureChangeMySQLRecord processor
> 
>
> Key: NIFI-4491
> URL: https://issues.apache.org/jira/browse/NIFI-4491
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>
> The main reason CaptureChangeMySQL doesn't leverage the RecordSetWriter API 
> is that those capabilities were being developed in parallel with that 
> processor. Whether a new record-aware processor is better than an improvement 
> to the original is up for discussion; however, it would be a good idea to 
> support the RecordSetWriter API for any CDC (CaptureChangeXYZ) processor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12596) PutIceberg is missing case-insensitive Record type handling in List and Map types

2024-01-15 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12596:

Status: Patch Available  (was: Open)

> PutIceberg is missing case-insensitive Record type handling in List and Map 
> types
> -
>
> Key: NIFI-12596
> URL: https://issues.apache.org/jira/browse/NIFI-12596
> Project: Apache NiFi
>  Issue Type: Bug
>Reporter: Mark Bathori
>Assignee: Mark Bathori
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> With NIFI-11263 case-insensitive and order independent field handling was 
> added to Record types but it is missing in case of List and Map types 
> containing Records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12606) Update Apache parent POM to version 31

2024-01-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12606:

Fix Version/s: 1.25.0
   2.0.0
   Status: Patch Available  (was: In Progress)

> Update Apache parent POM to version 31
> --
>
> Key: NIFI-12606
> URL: https://issues.apache.org/jira/browse/NIFI-12606
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Tools and Build
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Apache parent POM (used at the root pom.xml in NiFi) has a new release, 
> version 31:
> https://github.com/apache/maven-apache-parent/releases/tag/apache-31
> This includes a number of dependency upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12606) Update Apache parent POM to version 31

2024-01-12 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12606:
---

Assignee: Matt Burgess

> Update Apache parent POM to version 31
> --
>
> Key: NIFI-12606
> URL: https://issues.apache.org/jira/browse/NIFI-12606
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Tools and Build
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Minor
>
> The Apache parent POM (used at the root pom.xml in NiFi) has a new release, 
> version 31:
> https://github.com/apache/maven-apache-parent/releases/tag/apache-31
> This includes a number of dependency upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12606) Update Apache parent POM to version 31

2024-01-12 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12606:
---

 Summary: Update Apache parent POM to version 31
 Key: NIFI-12606
 URL: https://issues.apache.org/jira/browse/NIFI-12606
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Tools and Build
Reporter: Matt Burgess


The Apache parent POM (used at the root pom.xml in NiFi) has a new release, 
version 31:

https://github.com/apache/maven-apache-parent/releases/tag/apache-31

This includes a number of dependency upgrades.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-9032) Refactoring HDFS processors

2024-01-11 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-9032:
---
Fix Version/s: 1.15.0

> Refactoring HDFS processors
> ---
>
> Key: NIFI-9032
> URL: https://issues.apache.org/jira/browse/NIFI-9032
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Simon Bence
>Assignee: Simon Bence
>Priority: Major
> Fix For: 1.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As a continuation of 
> [NIFI-8717|https://issues.apache.org/jira/browse/NIFI-8717]. In order to make 
> extending functionality easier I propose some refactors for the DeleteHDFS, 
> ListHDFS and FetchHDFS processors. This mainly involves the decoupling of the 
> "HDFS business logic" from the "NiFi specific" controller behaviour and opens 
> the possibility to extend the NiFi specific code by providing hook points.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12587) Improve error message in ValidateCSV

2024-01-09 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12587:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve error message in ValidateCSV
> 
>
> Key: NIFI-12587
> URL: https://issues.apache.org/jira/browse/NIFI-12587
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Improve the error message in ValidateCSV to include line number, row number 
> and column number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12571) Upgrade Logback to 1.4.14

2024-01-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12571:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade Logback to 1.4.14
> -
>
> Key: NIFI-12571
> URL: https://issues.apache.org/jira/browse/NIFI-12571
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Logback 1.3 and 1.4 have the same set of features, but Logback 1.4 requires a 
> minimum of Java 11 and has optional components that target current Jakarta EE 
> versions.
> The main branch of NiFi should be upgraded to Logback 1.4 to align with 
> current dependency upgrade efforts for NiFi 2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12570) Upgrade Apache IoTDB to 1.3.0

2024-01-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12570:

Fix Version/s: 1.25.0
   2.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Upgrade Apache IoTDB to 1.3.0
> -
>
> Key: NIFI-12570
> URL: https://issues.apache.org/jira/browse/NIFI-12570
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Apache IoTDB 
> [1.3.0|https://lists.apache.org/thread/754581rl7khhjo7k5gvzc1twhojnlft8] 
> contains a number of bug fixes and improvements for both server and client 
> features.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-9464) Provenance Events files corrupted

2024-01-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-9464:
---
Fix Version/s: 1.25.0
   2.0.0
   Status: Patch Available  (was: Open)

> Provenance Events files corrupted
> -
>
> Key: NIFI-9464
> URL: https://issues.apache.org/jira/browse/NIFI-9464
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.15.0, 1.11.0
> Environment: java 11, centos 7, nifi standalone
>Reporter: Wiktor Kubicki
>Assignee: Tamas Palfy
>Priority: Minor
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In my logs I found:
> {code:java}
> SiteToSiteProvenanceReportingTask[id=b209c0ae-016e-1000-ae39-301c9dcfc544] 
> Failed to retrieve Provenance Events from repository due to: Attempted to 
> skip to byte offset 9149491 for 1125432890.prov.gz but file does not have 
> that many bytes (TOC 
> Reader=StandardTocReader[file=//provenance_repository/toc/1125432890.toc, 
> compressed=false]): java.io.EOFException: Attempted to skip to byte offset 
> 9149491 for 1125432890.prov.gz but file does not have that many bytes (TOC 
> Reader=StandardTocReader[file=/.../provenance_repository/toc/1125432890.toc, 
> compressed=false])
> {code}
> It is critically important for me to be 100% sure of my logs. It happened 
> about 100 times in the last year for 15 *.prov.gz files:
> {code:java}
> -rw-rw-rw-. 1 user user 1013923 Oct 17 21:17 1075441276.prov.gz
> -rw-rw-rw-. 1 user user 1345431 Oct 24 13:06 1083362251.prov.gz
> -rw-rw-rw-. 1 user user 1359282 Oct 25 13:07 1084546392.prov.gz
> -rw-rw-rw-. 1 user user 1155791 Nov  2 17:08 1094516954.prov.gz
> -rw-rw-r--. 1 user user  974136 Nov 18 22:07 1113402183.prov.gz
> -rw-rw-r--. 1 user user 1125608 Nov 28 22:00 1125097576.prov.gz
> -rw-rw-r--. 1 user user 1248319 Nov 29 04:30 1125432890.prov.gz
> -rw-rw-r--. 1 user user  832120 Feb  2  2021 661957813.prov.gz
> -rw-rw-r--. 1 user user 1110978 Mar 17  2021 734807613.prov.gz
> -rw-rw-r--. 1 user user 1506819 Apr 16  2021 786154249.prov.gz
> -rw-rw-r--. 1 user user 1763198 May 25  2021 852626782.prov.gz
> -rw-rw-r--. 1 user user 1580598 Jun 15 08:32 891934274.prov.gz
> -rw-rw-r--. 1 user user 2960296 Jun 28 17:07 917991812.prov.gz
> -rw-rw-r--. 1 user user 1808037 Jun 28 17:37 918051650.prov.gz
> -rw-rw-rw-. 1 user user  765924 Aug 14 13:09 991505484.prov.gz
> {code}
> BTW it's interesting why there are different chmods.
> My config for provenance (BTW if you see a possibility to tune it, please 
> tell me):
> {code:java}
> nifi.provenance.repository.directory.default=/../provenance_repository
> nifi.provenance.repository.max.storage.time=730 days
> nifi.provenance.repository.max.storage.size=512 GB
> nifi.provenance.repository.rollover.time=10 mins
> nifi.provenance.repository.rollover.size=100 MB
> nifi.provenance.repository.query.threads=2
> nifi.provenance.repository.index.threads=1
> nifi.provenance.repository.compress.on.rollover=true
> nifi.provenance.repository.always.sync=false
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, 
> ProcessorID
> nifi.provenance.repository.indexed.attributes=
> nifi.provenance.repository.index.shard.size=1 GB
> nifi.provenance.repository.max.attribute.length=65536
> nifi.provenance.repository.concurrent.merge.threads=1
> nifi.provenance.repository.buffer.size=10
> {code}
> Now my provenance repo has 140GB of data.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12567) NPE in CuratorLeaderElectionManager.getLeadershipChangeCount

2024-01-04 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12567:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> NPE in CuratorLeaderElectionManager.getLeadershipChangeCount
> 
>
> Key: NIFI-12567
> URL: https://issues.apache.org/jira/browse/NIFI-12567
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>Priority: Major
> Fix For: 1.25.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since NiFi 1.10, a NPE can happen in the code of 
> CuratorLeaderElectionManager.getLeadershipChangeCount when trying to get 
> diagnostic data for a NiFi node:
>  
> {code:java}
> ERROR org.apache.nifi.diagnostics.bootstrap.BootstrapDiagnosticsFactory: 
> Failed to obtain diagnostics information from class 
> org.apache.nifi.diagnostics.bootstrap.tasks.ClusterDiagnosticTask
> java.lang.NullPointerException: null
>   at 
> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeadershipChangeCount(CuratorLeaderElectionManager.java:241)
>   at 
> org.apache.nifi.diagnostics.bootstrap.tasks.ClusterDiagnosticTask.captureDump(ClusterDiagnosticTask.java:75)
>   at 
> org.apache.nifi.diagnostics.bootstrap.BootstrapDiagnosticsFactory.create(BootstrapDiagnosticsFactory.java:58)
>   at 
> org.apache.nifi.BootstrapListener.writeDiagnostics(BootstrapListener.java:288)
>   at 
> org.apache.nifi.BootstrapListener.access$600(BootstrapListener.java:41)
>   at 
> org.apache.nifi.BootstrapListener$Listener$1.run(BootstrapListener.java:240)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}
>  
> The problematic line is here:
> [https://github.com/apache/nifi/blob/support/nifi-1.x/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/leader/election/CuratorLeaderElectionManager.java#L241]
> This problem does not exist in NiFi 2.x as we have a proper null check at
> [https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-leader-election-shared/src/main/java/org/apache/nifi/controller/leader/election/TrackedLeaderElectionManager.java#L60]
> This JIRA is to track the addition of a similar null check in the NiFi 1.x 
> support line.
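> A minimal sketch of the kind of guard intended; class and method names are 
> illustrative rather than the actual 1.x code:
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.AtomicLong;
> 
> // Tracks leadership changes per role, guarding against roles that were
> // never registered -- the case that triggers the NPE in 1.x
> class LeadershipChangeTracker {
>     private final Map<String, AtomicLong> changeCounts = new ConcurrentHashMap<>();
> 
>     void onLeaderChange(final String roleName) {
>         changeCounts.computeIfAbsent(roleName, r -> new AtomicLong()).incrementAndGet();
>     }
> 
>     long getLeadershipChangeCount(final String roleName) {
>         final AtomicLong count = changeCounts.get(roleName);
>         // Mirrors the 2.x null check: an unknown role reports zero changes
>         // instead of throwing a NullPointerException
>         return count == null ? 0L : count.get();
>     }
> }
> {code}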



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12558) Upgrade Jagged to 0.3.0

2024-01-02 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12558:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade Jagged to 0.3.0
> ---
>
> Key: NIFI-12558
> URL: https://issues.apache.org/jira/browse/NIFI-12558
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Jagged library should be upgraded to version 
> [0.3.0|https://github.com/exceptionfactory/jagged/releases/tag/0.3.0] on both 
> main and support branches to resolve an issue with concurrent access to 
> KeyAgreement instances that presents itself when configuring the 
> DecryptContentAge Processor with more than one Concurrent Task.
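> For context, JCA KeyAgreement objects are stateful and not thread-safe. A 
> minimal sketch of one common mitigation (one instance per thread) follows; 
> this is illustrative only, not necessarily how Jagged 0.3.0 fixes it:
> {code:java}
> import javax.crypto.KeyAgreement;
> import java.security.NoSuchAlgorithmException;
> 
> // One KeyAgreement per thread: a shared instance corrupts its internal
> // state under concurrent doPhase()/generateSecret() calls
> class ThreadLocalKeyAgreement {
>     // X25519 (used by age) requires Java 11 or later
>     private static final ThreadLocal<KeyAgreement> X25519 =
>             ThreadLocal.withInitial(() -> {
>                 try {
>                     return KeyAgreement.getInstance("X25519");
>                 } catch (NoSuchAlgorithmException e) {
>                     throw new IllegalStateException(e);
>                 }
>             });
> 
>     static KeyAgreement get() {
>         return X25519.get();
>     }
> }
> {code}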



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12558) Upgrade Jagged to 0.3.0

2024-01-02 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12558:

Fix Version/s: 1.25.0
   2.0.0

> Upgrade Jagged to 0.3.0
> ---
>
> Key: NIFI-12558
> URL: https://issues.apache.org/jira/browse/NIFI-12558
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Minor
>  Labels: backport-needed
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The Jagged library should be upgraded to version 
> [0.3.0|https://github.com/exceptionfactory/jagged/releases/tag/0.3.0] on both 
> main and support branches to resolve an issue with concurrent access to 
> KeyAgreement instances that presents itself when configuring the 
> DecryptContentAge Processor with more than one Concurrent Task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12524) AuditService NullPointerException After 1.23.2 to 1.24.0 Upgrade

2024-01-02 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12524:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> AuditService NullPointerException After 1.23.2 to 1.24.0 Upgrade
> 
>
> Key: NIFI-12524
> URL: https://issues.apache.org/jira/browse/NIFI-12524
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.24.0
> Environment: RHEL 8, OpenJDK11 Latest
>Reporter: Shawn Weeks
>Assignee: David Handermann
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
> Attachments: nifi-app.log
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Seems to be an issue with the migration from H2 to Xodus. After upgrading 
> from 1.23.2 to 1.24.0 NiFi fails to start and throws a null pointer exception 
> with the auditService Bean. See attached log snippet.
> The cluster started life as NiFi 1.13.2 and has been steadily upgraded since 
> mid 2021.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12559) Upgrade SSHJ to 0.38.0

2024-01-02 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12559:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade SSHJ to 0.38.0
> --
>
> Key: NIFI-12559
> URL: https://issues.apache.org/jira/browse/NIFI-12559
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Major
>  Labels: backport-needed
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> SSHJ should be upgraded to version 
> [0.38.0|https://github.com/hierynomus/sshj#release-history] to incorporate 
> mitigations for the Terrapin vulnerability described in CVE-2023-48795, 
> applicable to SFTP Processors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-4105) support the specified Maximum value column and CSV Stream for Cassandra

2023-12-21 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-4105:
---
Affects Version/s: (was: 1.3.0)
   Status: Patch Available  (was: Open)

> support the specified Maximum value column and CSV Stream for Cassandra
> ---
>
> Key: NIFI-4105
> URL: https://issues.apache.org/jira/browse/NIFI-4105
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Yoonwon Ko
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'm trying to find a Cassandra processor that fetches rows whose values in the 
> specified Maximum Value columns are larger than the previously-seen maximum, 
> like QueryDatabaseTable does.
> But I found only QueryCassandra, which just executes the same CQL every time 
> without keeping the maximum value.
> And I think we also need a convertToCsvStream option.
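> A hypothetical sketch of the QueryDatabaseTable-style pattern being asked 
> for: remember the largest value seen and add a predicate on the next run. 
> Table, column, and state-key names are illustrative:
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> 
> // Builds the next incremental CQL statement from stored state
> class IncrementalCqlBuilder {
>     private final Map<String, String> state = new HashMap<>(); // stand-in for NiFi's StateManager
> 
>     String nextQuery(final String table, final String maxValueColumn) {
>         final String lastMax = state.get("maxvalue." + maxValueColumn);
>         if (lastMax == null) {
>             return "SELECT * FROM " + table; // first run fetches everything
>         }
>         // Non-key predicates in Cassandra generally require ALLOW FILTERING
>         return "SELECT * FROM " + table + " WHERE " + maxValueColumn
>                 + " > '" + lastMax + "' ALLOW FILTERING";
>     }
> 
>     void recordMax(final String maxValueColumn, final String observedMax) {
>         state.put("maxvalue." + maxValueColumn, observedMax);
>     }
> }
> {code}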



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12526) QueryCassandra should not output FlowFiles as soon as the "available rows without fetching" is reached

2023-12-21 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12526:

Status: Patch Available  (was: In Progress)

> QueryCassandra should not output FlowFiles as soon as the "available rows 
> without fetching" is reached
> --
>
> Key: NIFI-12526
> URL: https://issues.apache.org/jira/browse/NIFI-12526
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
> properties to QueryCassandra, but still uses the internal 
> "rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
> ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
> FlowFile. This can cause unexpected behavior, such as getting multiple 
> FlowFiles when only one is expected.
> Also since those properties were added, the fragment.* attributes should be 
> added and populated accordingly (such as ExecuteSQL does).
> NIFI-5642 also removes the Compression Type property, which might be ok for 
> 2.x but will cause all flows using this property in 1.x to become invalid. On 
> the support (1.x) branch we need to add the property back in, perhaps we can 
> keep it removed for 2.x but we'd want to remove the Cassandra Contact Points 
> property and such to force the user to use a Cassandra Connection controller 
> service. The reason to add the property back is if the Cassandra Contact 
> Points property is used instead of the Cassandra Connection controller 
> service, there is no way to set the Compression Type.
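> A sketch of the intended row loop, with the Cassandra driver reduced to a 
> plain Iterator: the FlowFile boundary should be driven only by Max Rows Per 
> Flow File, never by how many rows happen to be available without fetching:
> {code:java}
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
> import java.util.function.Consumer;
> 
> class RowBatcher {
>     static <R> void batch(final Iterator<R> rows, final long maxRowsPerFlowFile,
>                           final Consumer<List<R>> transferFlowFile) {
>         final List<R> current = new ArrayList<>();
>         while (rows.hasNext()) {
>             current.add(rows.next()); // the driver fetches further pages transparently
>             if (maxRowsPerFlowFile > 0 && current.size() >= maxRowsPerFlowFile) {
>                 transferFlowFile.accept(new ArrayList<>(current));
>                 current.clear();
>             }
>         }
>         if (!current.isEmpty()) {
>             transferFlowFile.accept(current); // the remainder becomes the final FlowFile
>         }
>     }
> }
> {code}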



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12526) QueryCassandra should not output FlowFiles as soon as the "available rows without fetching" is reached

2023-12-21 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12526:

Description: 
NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
properties to QueryCassandra, but still uses the internal 
"rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
FlowFile. This can cause unexpected behavior, such as getting multiple 
FlowFiles when only one is expected.

Also since those properties were added, the fragment.* attributes should be 
added and populated accordingly (such as ExecuteSQL does).

NIFI-5642 also removes the Compression Type property, which might be ok for 2.x 
but will cause all flows using this property in 1.x to become invalid. On the 
support (1.x) branch we need to add the property back in, perhaps we can keep 
it removed for 2.x but we'd want to remove the Cassandra Contact Points 
property and such to force the user to use a Cassandra Connection controller 
service. The reason to add the property back is if the Cassandra Contact Points 
property is used instead of the Cassandra Connection controller service, there 
is no way to set the Compression Type.

  was:
NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
properties to QueryCassandra, but still uses the internal 
"rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
FlowFile. This can cause unexpected behavior, such as getting multiple 
FlowFiles when only one is expected.

NIFI-5642 also removes the Compression Type property, which might be ok for 2.x 
but will cause all flows using this property in 1.x to become invalid. On the 
support (1.x) branch we need to add the property back in, perhaps we can keep 
it removed for 2.x but we'd want to remove the Cassandra Contact Points 
property and such to force the user to use a Cassandra Connection controller 
service. The reason to add the property back is if the Cassandra Contact Points 
property is used instead of the Cassandra Connection controller service, there 
is no way to set the Compression Type.


> QueryCassandra should not output FlowFiles as soon as the "available rows 
> without fetching" is reached
> --
>
> Key: NIFI-12526
> URL: https://issues.apache.org/jira/browse/NIFI-12526
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>
> NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
> properties to QueryCassandra, but still uses the internal 
> "rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
> ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
> FlowFile. This can cause unexpected behavior, such as getting multiple 
> FlowFiles when only one is expected.
> Also since those properties were added, the fragment.* attributes should be 
> added and populated accordingly (such as ExecuteSQL does).
> NIFI-5642 also removes the Compression Type property, which might be ok for 
> 2.x but will cause all flows using this property in 1.x to become invalid. On 
> the support (1.x) branch we need to add the property back in, perhaps we can 
> keep it removed for 2.x but we'd want to remove the Cassandra Contact Points 
> property and such to force the user to use a Cassandra Connection controller 
> service. The reason to add the property back is if the Cassandra Contact 
> Points property is used instead of the Cassandra Connection controller 
> service, there is no way to set the Compression Type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12526) QueryCassandra should not output FlowFiles as soon as the "available rows without fetching" is reached

2023-12-21 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12526:
---

Assignee: Matt Burgess

> QueryCassandra should not output FlowFiles as soon as the "available rows 
> without fetching" is reached
> --
>
> Key: NIFI-12526
> URL: https://issues.apache.org/jira/browse/NIFI-12526
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>
> NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
> properties to QueryCassandra, but still uses the internal 
> "rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
> ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
> FlowFile. This can cause unexpected behavior, such as getting multiple 
> FlowFiles when only one is expected.
> NIFI-5642 also removes the Compression Type property, which might be ok for 
> 2.x but will cause all flows using this property in 1.x to become invalid. On 
> the support (1.x) branch we need to add the property back in, perhaps we can 
> keep it removed for 2.x but we'd want to remove the Cassandra Contact Points 
> property and such to force the user to use a Cassandra Connection controller 
> service. The reason to add the property back is if the Cassandra Contact 
> Points property is used instead of the Cassandra Connection controller 
> service, there is no way to set the Compression Type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-12530) Support CREATE TABLE for Oracle databases

2023-12-20 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12530:

Status: Patch Available  (was: In Progress)

> Support CREATE TABLE for Oracle databases
> -
>
> Key: NIFI-12530
> URL: https://issues.apache.org/jira/browse/NIFI-12530
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The UpdateDatabaseTable processor has the capability to create a target table 
> if it does not exist in the target database. However, the database adapter 
> used must support it, and the Oracle database adapters do not. Even though it 
> would likely require a PL/SQL block, this is done "under the hood" and so 
> could be supported for Oracle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12530) Support CREATE TABLE for Oracle databases

2023-12-20 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12530:
---

 Summary: Support CREATE TABLE for Oracle databases
 Key: NIFI-12530
 URL: https://issues.apache.org/jira/browse/NIFI-12530
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Reporter: Matt Burgess
Assignee: Matt Burgess
 Fix For: 1.25.0, 2.0.0


The UpdateDatabaseTable processor has the capability to create a target table 
if it does not exist in the target database. However, the database adapter used 
must support it, and the Oracle database adapters do not. Even though it would 
likely require a PL/SQL block, this is done "under the hood" and so could be 
supported for Oracle.
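A hypothetical sketch of the PL/SQL approach: wrap the CREATE TABLE in a block 
that swallows ORA-00955 ("name is already used by an existing object"), giving 
an effective "create table if not exists". The DDL passed in is illustrative:

{code:java}
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class OracleCreateTableIfNotExists {
    static void createIfNotExists(final Connection connection, final String ddl) throws SQLException {
        // Escape single quotes so the DDL can be embedded in EXECUTE IMMEDIATE
        final String guarded =
                "BEGIN\n"
                + "  EXECUTE IMMEDIATE '" + ddl.replace("'", "''") + "';\n"
                + "EXCEPTION\n"
                + "  WHEN OTHERS THEN\n"
                + "    IF SQLCODE != -955 THEN RAISE; END IF;\n" // -955: object already exists
                + "END;";
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(guarded);
        }
    }
}
{code}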



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-4071) ConvertJSONToSQL does not support Hive

2023-12-20 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-4071.

Resolution: Implemented

> ConvertJSONToSQL does not support Hive
> --
>
> Key: NIFI-4071
> URL: https://issues.apache.org/jira/browse/NIFI-4071
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Minor
>
> Currently, the ConvertJSONToSQL processor does not support Hive as the target 
> database, either using a HiveConnectionPool or a DBCPConnectionPool 
> configured with a Hive driver. At the very least:
> 1) A SQLException occurs when determining the auto-increment value. This is 
> due to a Hive bug (HIVE-13528 
> [https://issues.apache.org/jira/browse/HIVE-13528]) where the column is not 
> named according to the spec 
> ([http://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getColumns-java.lang.String-java.lang.String-java.lang.String-java.lang.String-]).
> 2) Column names are returned with the table name prepended (using a dot 
> separator)
> 3) There may be other JDBC API calls that are not supported by the Hive JDBC 
> driver.
> #1 could be solved by checking for "IS_AUTOINCREMENT" then failing over to 
> "IS_AUTO_INCREMENT".  #2 could be solved with a lastIndexOf("."), as is done 
> in _org.apache.nifi.util.hive.HiveJdbcCommon#createSchema(java.sql.ResultSet, 
> java.lang.String)_. #3 would take some investigation into the other JDBC API 
> calls being made.
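> A minimal sketch of workarounds #1 and #2; class and method names are 
> illustrative:
> {code:java}
> import java.sql.ResultSet;
> import java.sql.SQLException;
> 
> class HiveMetadataCompat {
>     // #1: Hive (HIVE-13528) misnames the auto-increment column, so fall back
>     static String getAutoIncrement(final ResultSet columns) throws SQLException {
>         try {
>             return columns.getString("IS_AUTOINCREMENT");
>         } catch (SQLException e) {
>             return columns.getString("IS_AUTO_INCREMENT"); // Hive's non-spec name
>         }
>     }
> 
>     // #2: Hive prepends the table name ("mytable.mycolumn"), so strip it
>     static String stripTablePrefix(final String columnName) {
>         final int dot = columnName.lastIndexOf('.');
>         return dot < 0 ? columnName : columnName.substring(dot + 1);
>     }
> }
> {code}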



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-7086) oracle db read is slow (for me its bug)

2023-12-20 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess resolved NIFI-7086.

Resolution: Not A Problem

> oracle db read is slow (for me its bug)
> ---
>
> Key: NIFI-7086
> URL: https://issues.apache.org/jira/browse/NIFI-7086
> Project: Apache NiFi
>  Issue Type: Improvement
> Environment:  nifi 1.8.0
>Reporter: naveen kumar saharan
>Priority: Minor
>
> I am not able to fetch a 1-billion-record table from an Oracle DB. It is 
> taking too much time (17 hours).
> I tried creating queries based on dates using executesql -> 
> generatetablefetch -> executesql for parallel execution.
> Small tables also perform slowly as compared to a Python database table fetch 
> program, around 20 times slower. This is very disappointing.
> querydatabasetable runs only on the primary node; if I increase the thread 
> count it gives duplicate data.
> Then what is the use of concurrent threads? 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12526) QueryCassandra should not output FlowFiles as soon as the "available rows without fetching" is reached

2023-12-19 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12526:
---

 Summary: QueryCassandra should not output FlowFiles as soon as the 
"available rows without fetching" is reached
 Key: NIFI-12526
 URL: https://issues.apache.org/jira/browse/NIFI-12526
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Reporter: Matt Burgess
 Fix For: 1.25.0, 2.0.0


NIFI-5642 introduced the Max Rows Per Flow File and Output Batch Size 
properties to QueryCassandra, but still uses the internal 
"rowsAvailableWithoutFetching" variable (whose value comes from the Cassandra 
ResultSet and defaults to 5000) as a trigger to stop processing rows for a 
FlowFile. This can cause unexpected behavior, such as getting multiple 
FlowFiles when only one is expected.

NIFI-5642 also removes the Compression Type property, which might be ok for 2.x 
but will cause all flows using this property in 1.x to become invalid. On the 
support (1.x) branch we need to add the property back in, perhaps we can keep 
it removed for 2.x but we'd want to remove the Cassandra Contact Points 
property and such to force the user to use a Cassandra Connection controller 
service. The reason to add the property back is if the Cassandra Contact Points 
property is used instead of the Cassandra Connection controller service, there 
is no way to set the Compression Type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-5642) QueryCassandra processor : output FlowFiles as soon fetch_size is reached

2023-12-19 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798751#comment-17798751
 ] 

Matt Burgess commented on NIFI-5642:


The "maxRowsPerFlowFile" variable is changed to match the rows available 
without fetching (default is 5000), so only 5000 rows get put into the FlowFile 
even if the Max Rows Per Flow File property is set to zero. I have a test table 
with 20k rows; if I run QueryCassandra once, I get 4 output FlowFiles when I 
should only get one. There's a logic error or two in there that need to be 
fixed, but since it has been released as of 1.22.0 and 2.0.0-M1, I will open a 
new Jira to fix them.

> QueryCassandra processor : output FlowFiles as soon fetch_size is reached
> -
>
> Key: NIFI-5642
> URL: https://issues.apache.org/jira/browse/NIFI-5642
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: André Gomes Lamas Otero
>Assignee: Levi Lentz
>Priority: Major
> Fix For: 2.0.0-M1, 1.22.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When I'm using QueryCassandra alongside the fetch_size parameter, I expected 
> that as soon as my reader reaches the fetch_size, the processor would output 
> some data to be processed by the next processor, but QueryCassandra reads all 
> the data and only then outputs the flow files.
> I'll start to work on a patch for this situation; I'd appreciate any 
> suggestion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-5642) QueryCassandra processor : output FlowFiles as soon fetch_size is reached

2023-12-13 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796449#comment-17796449
 ] 

Matt Burgess commented on NIFI-5642:


That's not what Fetch Size is for; it's the number of rows the client asks for 
at a time. The Max Rows Per Flow File property is what should be used to send 
FlowFiles downstream when they're "full". Any commits should be reverted, as 
this will change the behavior other users expect.

> QueryCassandra processor : output FlowFiles as soon fetch_size is reached
> -
>
> Key: NIFI-5642
> URL: https://issues.apache.org/jira/browse/NIFI-5642
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: André Gomes Lamas Otero
>Assignee: Levi Lentz
>Priority: Major
> Fix For: 2.0.0-M1, 1.22.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When I'm using QueryCassandra alongside the fetch_size parameter, I expected 
> that as soon as my reader reaches the fetch_size, the processor would output 
> some data to be processed by the next processor, but QueryCassandra reads all 
> the data and only then outputs the flow files.
> I'll start to work on a patch for this situation; I'd appreciate any 
> suggestion.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-5291) Add stats for number object to CalculateRecordStats

2023-12-06 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793857#comment-17793857
 ] 

Matt Burgess commented on NIFI-5291:


Do we need this if we have QueryRecord? I suppose if there are memory issues 
for large files, it would still be helpful to be able to calculate a few simple 
stats incrementally while reading records.

> Add stats for number object to CalculateRecordStats
> ---
>
> Key: NIFI-5291
> URL: https://issues.apache.org/jira/browse/NIFI-5291
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Koji Kawamura
>Priority: Minor
>
> Currently, CalculateRecordStats processor counts each string value 
> occurrence. In addition to that, we can add more stats for numeric values, 
> such as min, max and sum. Time, Timestamp and Date can have min and max 
> similarly. 
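> A minimal sketch of tracking such stats incrementally while streaming 
> records, so large FlowFiles never need to be held in memory:
> {code:java}
> // Accumulates count, min, max, sum, and average one value at a time
> class NumericStats {
>     private long count;
>     private double min = Double.POSITIVE_INFINITY;
>     private double max = Double.NEGATIVE_INFINITY;
>     private double sum;
> 
>     void accept(final double value) {
>         count++;
>         min = Math.min(min, value);
>         max = Math.max(max, value);
>         sum += value;
>     }
> 
>     double average() {
>         return count == 0 ? Double.NaN : sum / count;
>     }
> }
> {code}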



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-2735) Add processor to perform simple aggregations

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-2735:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Resolving as QueryRecord and/or NIFI-5291 cover this

> Add processor to perform simple aggregations
> 
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>
> This is a proposal for a new processor (AggregateValues, for example) that 
> can perform simple aggregation operations such as count, sum, average, min, 
> max, and concatenate, over a set of "related" flow files. For example, when a 
> JSON file is split on an array (using the SplitJson processor), the total 
> count of the splits, the index of each split, and the unique identifier 
> (shared by each split) are stored as attributes in each flow file sent to the 
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for 
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from 
> the original document, and when all documents from a split have been 
> processed, a flow file could be transferred to an "aggregate" relationship 
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation 
> operations) is that you can use the "aggregate" relationship as an event 
> trigger. For example if you need to wait until all files from a group are 
> processed, you can use AggregateValues and the "aggregate" relationship to 
> indicate downstream that the entire group has been processed. If there is not 
> a Split processor upstream, then the attributes (fragment.*) would have to be 
> manipulated by the data flow designer, but this can be accomplished with 
> other processors (including the scripting processors if necessary). 
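> A minimal sketch of the proposed trigger mechanism: group incoming FlowFiles 
> by fragment.identifier and signal once fragment.count of them have been 
> seen. Class and method names are illustrative:
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.AtomicLong;
> 
> class FragmentAggregator {
>     private final Map<String, AtomicLong> seen = new ConcurrentHashMap<>();
> 
>     // Returns true when the whole fragment group has been processed,
>     // i.e. when the "aggregate" FlowFile should be emitted
>     boolean accept(final String fragmentId, final long fragmentCount) {
>         final long soFar = seen.computeIfAbsent(fragmentId, id -> new AtomicLong()).incrementAndGet();
>         if (soFar >= fragmentCount) {
>             seen.remove(fragmentId); // group complete
>             return true;
>         }
>         return false;
>     }
> }
> {code}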



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-2735) Add processor to perform simple aggregations

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-2735:
--

Assignee: (was: Matt Burgess)

> Add processor to perform simple aggregations
> 
>
> Key: NIFI-2735
> URL: https://issues.apache.org/jira/browse/NIFI-2735
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>
> This is a proposal for a new processor (AggregateValues, for example) that 
> can perform simple aggregation operations such as count, sum, average, min, 
> max, and concatenate, over a set of "related" flow files. For example, when a 
> JSON file is split on an array (using the SplitJson processor), the total 
> count of the splits, the index of each split, and the unique identifier 
> (shared by each split) are stored as attributes in each flow file sent to the 
> "splits" relationship:
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitJson/index.html
> These attributes are the "fragment.*" attributes in the documentation for 
> SplitText, SplitXml, and SplitJson, for example.
> Such a processor could perform these operations for each flow file split from 
> the original document, and when all documents from a split have been 
> processed, a flow file could be transferred to an "aggregate" relationship 
> containing attributes for the operation, aggregate value, etc.
> An interesting application of this (besides the actual aggregation 
> operations) is that you can use the "aggregate" relationship as an event 
> trigger. For example if you need to wait until all files from a group are 
> processed, you can use AggregateValues and the "aggregate" relationship to 
> indicate downstream that the entire group has been processed. If there is not 
> a Split processor upstream, then the attributes (fragment.*) would have to be 
> manipulated by the data flow designer, but this can be accomplished with 
> other processors (including the scripting processors if necessary). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-5604) Allow GenerateTableFetch to send empty flow files when no rows would be fetched

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-5604:
---
Fix Version/s: 1.9.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Allow GenerateTableFetch to send empty flow files when no rows would be 
> fetched
> ---
>
> Key: NIFI-5604
> URL: https://issues.apache.org/jira/browse/NIFI-5604
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.9.0
>
>
> Currently, GenerateTableFetch will not output a flow file if there are no 
> rows to be fetched. However, it may be desired (especially with incoming flow 
> files) that a flow file be sent out even if GTF does not generate any SQL. 
> This capability, along with the fragment attributes from NIFI-5601, would 
> allow the user to handle this downstream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-7054) Add RecordSinkServiceLookup

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-7054:
---
Fix Version/s: 1.12.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add RecordSinkServiceLookup
> ---
>
> Key: NIFI-7054
> URL: https://issues.apache.org/jira/browse/NIFI-7054
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.12.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The RecordSinkService controller service interface was added in NiFi 1.10 
> (via NIFI-6780) to decouple the destination for records in a FlowFile from 
> the format of those records. Since then there have been various 
> implementations (NIFI-6799, NIFI-6819). Other controller services have been 
> augmented with a "lookup" pattern where the actual CS can be swapped out 
> during the flow based on an attribute/variable (such as 
> DBCPConnectionPoolLookup). 
> RecordSinkService could be improved to have such a lookup as well, especially 
> with the advent of the PutRecord processor (NIFI-6947). This would allow some 
> flow files to be routed to Kafka while others are sent Site-to-Site for 
> example, all with a single configured controller service.
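> A minimal sketch of the lookup pattern, mirroring DBCPConnectionPoolLookup; 
> the interface and attribute name are illustrative:
> {code:java}
> import java.util.Map;
> 
> interface RecordSink {
>     void send(Object recordSet);
> }
> 
> // Chooses a concrete sink per FlowFile based on an attribute value
> class RecordSinkLookup {
>     private final Map<String, RecordSink> sinks; // e.g. "kafka" -> Kafka sink, "s2s" -> Site-to-Site sink
> 
>     RecordSinkLookup(final Map<String, RecordSink> sinks) {
>         this.sinks = sinks;
>     }
> 
>     RecordSink lookup(final Map<String, String> flowFileAttributes) {
>         final String key = flowFileAttributes.get("record.sink.name");
>         final RecordSink sink = sinks.get(key);
>         if (sink == null) {
>             throw new IllegalArgumentException("No RecordSink registered for " + key);
>         }
>         return sink;
>     }
> }
> {code}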



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-6670) Create a RecordReader that reads lines of text into single-field records

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-6670:
--

Assignee: (was: Matt Burgess)

> Create a RecordReader that reads lines of text into single-field records
> 
>
> Key: NIFI-6670
> URL: https://issues.apache.org/jira/browse/NIFI-6670
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> It would be nice to have a reader that can take any textual input and treat 
> each "line" as a single-field record. This is like CSVReader but there 
> wouldn't be a field delimiter; rather, a property to specify the name of the 
> field, and each line becomes a value for that field in the record.
> Additional capabilities could be added as well, such as skipping header 
> lines, grouping lines together as a single field value, ignoring empty lines, 
> etc.
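> A minimal sketch of the reader's core loop, with the RecordReader API 
> reduced to an iterator-style method; names are illustrative:
> {code:java}
> import java.io.BufferedReader;
> import java.io.IOException;
> import java.io.Reader;
> import java.util.Collections;
> import java.util.Map;
> 
> // Each line of text becomes a one-field record under a configurable field name
> class LineRecordIterator {
>     private final BufferedReader reader;
>     private final String fieldName;
> 
>     LineRecordIterator(final Reader in, final String fieldName) {
>         this.reader = new BufferedReader(in);
>         this.fieldName = fieldName;
>     }
> 
>     // Returns the next record as a single-entry map, or null at end of input
>     Map<String, Object> next() throws IOException {
>         String line;
>         do {
>             line = reader.readLine();
>             if (line == null) {
>                 return null;
>             }
>         } while (line.isEmpty()); // ignoring empty lines, as suggested above
>         return Collections.singletonMap(fieldName, line);
>     }
> }
> {code}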



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-7516) Predictions model throws intermittent SingularMatrixExceptions

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-7516:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Predictions model throws intermittent SingularMatrixExceptions
> --
>
> Key: NIFI-7516
> URL: https://issues.apache.org/jira/browse/NIFI-7516
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.12.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Under some circumstances, the Connection Status Analytics model (specifically 
> the Ordinary Least Squares model) throws a SingularMatrix exception:
> org.apache.commons.math3.linear.SingularMatrixException: matrix is singular
> This can happen (usually intermittently) when the data points used to update 
> the model form a matrix that has no inverse (i.e. singular).
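> A minimal sketch of a defensive update using commons-math3 (not necessarily 
> the fix that was applied): if the new observations produce a singular design 
> matrix, keep the previous coefficients instead of failing:
> {code:java}
> import org.apache.commons.math3.linear.SingularMatrixException;
> import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;
> 
> class SafeOlsModel {
>     private double[] coefficients = new double[0];
> 
>     void update(final double[] y, final double[][] x) {
>         final OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
>         ols.newSampleData(y, x);
>         try {
>             coefficients = ols.estimateRegressionParameters();
>         } catch (SingularMatrixException e) {
>             // The matrix has no inverse for these points; retain the last good fit
>         }
>     }
> }
> {code}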



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-4946) nifi-spark-bundle : Adding support for pyfiles, file, jars options

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-4946:
--

Assignee: (was: Matt Burgess)

> nifi-spark-bundle : Adding support for pyfiles, file, jars options
> --
>
> Key: NIFI-4946
> URL: https://issues.apache.org/jira/browse/NIFI-4946
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Extensions
> Environment: Ubuntu 16.04, IntelliJ
>Reporter: Mageswaran
>Priority: Major
> Attachments: nifi-spark-options.png, nifi-spark.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Adding support for submitting PySpark-based Spark jobs (which are normally 
> structured as modules) over Livy on the existing "ExecuteSparkInteractive" 
> processor.
> This is done by reading file paths for pyfiles and file, plus an option from 
> the user indicating whether the processor should trigger a batch job or not.
> [https://livy.incubator.apache.org/docs/latest/rest-api.html]
>  *Current Workflow Logic* ([https://github.com/apache/nifi/pull/2521])
>  * Check whether the processor has to handle code or submit a Spark job
>  * Read incoming flow file
>  ** If batch == true
>  *** If flow file matches Livy `batches` JSON response through `wait` loop
>  **** Wait for Status Check Interval
>  **** Read the state
>  **** If state is `running` route it to `wait`, or if it is `success` or 
> `dead` route it accordingly
>  *** Else
>  **** Ignore the flow file
>  **** Trigger the Spark job over Livy `batches` endpoint
>  **** Read the state of the submitted job
>  **** If state is `running` route it to `wait`, or if it is `success` or 
> `dead` route it accordingly
>  ** Else:
>  *** Existing Logic to handle `Code`
>  
> !nifi-spark-options.png!
> !nifi-spark.png!
>  
> Thanks.
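> A minimal sketch of the polling step from the workflow above against Livy's 
> REST API (GET /batches/{batchId}/state); the JSON extraction is deliberately 
> crude:
> {code:java}
> import java.net.URI;
> import java.net.http.HttpClient;
> import java.net.http.HttpRequest;
> import java.net.http.HttpResponse;
> 
> class LivyBatchPoller {
>     private final HttpClient http = HttpClient.newHttpClient();
> 
>     // Returns "success", "dead", or "running" for routing decisions
>     String pollState(final String livyUrl, final int batchId) throws Exception {
>         final HttpRequest request = HttpRequest.newBuilder()
>                 .uri(URI.create(livyUrl + "/batches/" + batchId + "/state"))
>                 .header("Accept", "application/json")
>                 .GET()
>                 .build();
>         final String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
>         // Response looks like {"id":1,"state":"running"}
>         if (body.contains("\"state\":\"success\"")) {
>             return "success";
>         }
>         return body.contains("\"state\":\"dead\"") ? "dead" : "running";
>     }
> }
> {code}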



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-9814) Add range sampling to SampleRecord

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-9814:
---
Fix Version/s: 1.19.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add range sampling to SampleRecord
> --
>
> Key: NIFI-9814
> URL: https://issues.apache.org/jira/browse/NIFI-9814
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.19.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It would be nice to be able to specify exactly which records or ranges of 
> records are sampled from a FlowFile. For example if the first 5 lines of a 
> comma-delimited file are free-text (meaning it's not technically a CSV file 
> from NiFi's perspective as the first 5 lines do not constitute a header in 
> this example), it would be handy to be able to exclude them by specifying a 
> range filter of "6-" to say the 6th and every following record should be 
> output.
> In that vein SampleRecord could have a "range sampling" strategy where the 
> user could specify something like "2, 5-7, 25-" where the second, fifth, 
> sixth, seventh, and every record from the twenty-fifth record on would be 
> included in the outgoing flow file.
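> A minimal sketch of parsing a range spec like "2, 5-7, 25-" and testing 
> whether a 1-based record index is included; open-ended ranges use 
> Long.MAX_VALUE:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> 
> class RecordRangeFilter {
>     private final List<long[]> ranges = new ArrayList<>(); // each entry is {low, high}, inclusive
> 
>     RecordRangeFilter(final String spec) {
>         for (String part : spec.split(",")) {
>             part = part.trim();
>             if (part.endsWith("-")) {
>                 ranges.add(new long[]{Long.parseLong(part.substring(0, part.length() - 1)), Long.MAX_VALUE});
>             } else if (part.contains("-")) {
>                 final String[] bounds = part.split("-");
>                 ranges.add(new long[]{Long.parseLong(bounds[0].trim()), Long.parseLong(bounds[1].trim())});
>             } else {
>                 final long single = Long.parseLong(part);
>                 ranges.add(new long[]{single, single});
>             }
>         }
>     }
> 
>     boolean includes(final long recordIndex) {
>         return ranges.stream().anyMatch(r -> recordIndex >= r[0] && recordIndex <= r[1]);
>     }
> }
> {code}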



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-4491) Add a CaptureChangeMySQLRecord processor

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-4491:
--

Assignee: (was: Matt Burgess)

> Add a CaptureChangeMySQLRecord processor
> 
>
> Key: NIFI-4491
> URL: https://issues.apache.org/jira/browse/NIFI-4491
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Priority: Major
>
> The main reason CaptureChangeMySQL doesn't leverage the RecordSetWriter API 
> is that those capabilities were being developed in parallel with that 
> processor. Whether a new record-aware processor is better than an improvement 
> to the original is up for discussion; however, it would be a good idea to 
> support the RecordSetWriter API for any CDC (CaptureChangeXYZ) processor.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-11789) ExecuteSQL doesn't set fragment.count attribute when Output Batch Size is set

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-11789:
---

Assignee: (was: Matt Burgess)

> ExecuteSQL doesn't set fragment.count attribute when Output Batch Size is set
> -
>
> Key: NIFI-11789
> URL: https://issues.apache.org/jira/browse/NIFI-11789
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Tamas Neumer
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi,
> I am working with the ExecuteSQL processor and discovered an unexpected 
> behavior. If I specify the "Output Batch Size" property, I get fragment.index 
> on the outgoing FlowFiles, but the fragment.count attribute is not set (as 
> the documentation states).
> The behavior I would expect (in line with how the merge processors work) is 
> that fragment.count is set at least on the last FlowFile of the batch. This 
> would make it possible to merge all the batches together afterward.
> So my proposal, in short, is that fragment.count should be set on the last 
> FlowFile of a batch. 
> BR Florian



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-12467) HadoopDBCPConnectionPool doesn't use KerberosUserService

2023-12-06 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-12467:
---

Assignee: Matt Burgess

> HadoopDBCPConnectionPool doesn't use KerberosUserService
> 
>
> Key: NIFI-12467
> URL: https://issues.apache.org/jira/browse/NIFI-12467
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.25.0, 2.0.0
>
>
> NIFI-8978 added some KerberosUserService integration with the 
> DBCPConnectionPool implementations, but for HadoopDBCPConnectionPool its 
> values are only used during customValidate(). They are not read in 
> onEnabled() and will fail when authenticating.
> The workaround is to use the Principal and Keytab/Password properties or a 
> Keytab Credentials Service instead of a KerberosUserService.
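> A sketch of the missing wiring, assuming NiFi's KerberosUserService API 
> (property descriptor and DataSource creation reduced to the essentials):
> {code:java}
> import org.apache.nifi.components.PropertyDescriptor;
> import org.apache.nifi.controller.ConfigurationContext;
> import org.apache.nifi.kerberos.KerberosUserService;
> import org.apache.nifi.security.krb.KerberosUser;
> 
> abstract class HadoopPoolKerberosSketch {
>     static final PropertyDescriptor KERBEROS_USER_SERVICE = new PropertyDescriptor.Builder()
>             .name("kerberos-user-service")
>             .identifiesControllerService(KerberosUserService.class)
>             .build();
> 
>     private volatile KerberosUser kerberosUser;
> 
>     void onEnabled(final ConfigurationContext context) {
>         final KerberosUserService userService =
>                 context.getProperty(KERBEROS_USER_SERVICE).asControllerService(KerberosUserService.class);
>         if (userService != null) {
>             kerberosUser = userService.createKerberosUser();
>             kerberosUser.login(); // authenticate before creating the Hadoop-aware DataSource
>         }
>         // ... existing DataSource creation continues here ...
>     }
> }
> {code}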



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-12467) HadoopDBCPConnectionPool doesn't use KerberosUserService

2023-12-04 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-12467:
---

 Summary: HadoopDBCPConnectionPool doesn't use KerberosUserService
 Key: NIFI-12467
 URL: https://issues.apache.org/jira/browse/NIFI-12467
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Reporter: Matt Burgess
 Fix For: 1.25.0, 2.0.0


NIFI-8978 added some KerberosUserService integration with the 
DBCPConnectionPool implementations, but for HadoopDBCPConnectionPool its values 
are only used during customValidate(). They are not read in onEnabled() and 
will fail when authenticating.

The workaround is to use the Principal and Keytab/Password properties or a 
Keytab Credentials Service instead of a KerberosUserService.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-6054) phoenix DBCP connection pool

2023-12-04 Thread Matt Burgess (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792944#comment-17792944
 ] 

Matt Burgess commented on NIFI-6054:


Can we close this as superseded by NIFI-7257?

> phoenix DBCP connection pool
> 
>
> Key: NIFI-6054
> URL: https://issues.apache.org/jira/browse/NIFI-6054
> Project: Apache NiFi
>  Issue Type: New Feature
>Reporter: Karthik Narayanan
>Assignee: Karthik Narayanan
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> I have seen a lot of users having trouble connecting to Phoenix and querying 
> HBase. This Jira will create a DBCP connection pool controller service for 
> Phoenix. This will be a lot easier than creating shaded jars with 
> configuration XMLs embedded in them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

