[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2024-02-01 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8932:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Labels: backport-needed
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> We have a lot of CSV files where the provider adds a custom header/footer to 
> otherwise valid CSV content.
> The CSV header is actually the second row.
> To remove the unnecessary data we can use
>  * ReplaceText
>  * SplitText -> RouteOnAttribute -> MergeContent
> It would be great to have an option in the CSVReader controller service to 
> skip N rows from the top/bottom in order to get clean data.
>  * skip N from the top
>  * skip M from the bottom
> A similar request was implemented in Flink: 
> https://issues.apache.org/jira/browse/FLINK-1002
>  
> Data Example:
> {code}
> 7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,
> distribution_id,Distribution Id,settle_date,group_code,company_name,currency_code,common_account_name,business_date,prod_code,security,class,asset_type
> -1,all,20210719,Repo 21025226,qwerty                                    ,EUR,TPSL_21025226   ,19-Jul-21,BRM96ST7   ,ABC 14/09/24,NR,BOND  
> -1,all,20210719,Repo 21025226,qwerty                                    ,GBP,RPSS_21025226   ,19-Jul-21,,Total @ -0.11,,
> {code}
> |7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X)|  |  |  |  |  |  |  |  |  |  |  |
> |distribution_id|Distribution Id|settle_date|group_code|company_name|currency_code|common_account_name|business_date|prod_code|security|class|asset_type|
> |-1|all|20210719|Repo 21025226|qwerty                                    |EUR|TPSL_21025226   |19-Jul-21|BRM96ST7   |ABC 14/09/24|NR|BOND  |
> |-1|all|20210719|Repo 21025226|qwerty                                    |GBP|RPSS_21025226   |19-Jul-21| |Total @ -0.11| | |
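
For readers who want to see what the requested behaviour amounts to, here is a
minimal sketch in Python of skipping N lines from the top and M lines from the
bottom before handing the rest to a CSV parser. It is not NiFi code and does
not reflect how CSVReader is implemented; the helper name skip_lines and its
top/bottom parameters are made up for illustration, and the sample is a
shortened version of the data above. It assumes the skipped lines are plain
single-line text, with no quoted fields spanning line breaks.

```python
import csv
import io
from collections import deque
from typing import Iterable, Iterator


def skip_lines(lines: Iterable[str], top: int = 0, bottom: int = 0) -> Iterator[str]:
    """Yield lines, dropping the first `top` and the last `bottom` of them.

    Hypothetical helper for illustration only; not part of NiFi.
    """
    held: deque = deque()
    for index, line in enumerate(lines):
        if index < top:
            continue  # still inside the leading non-CSV lines
        held.append(line)
        # Hold back `bottom` lines so the trailing lines are never emitted.
        if len(held) > bottom:
            yield held.popleft()


# Shortened sample: one provider banner line, then the real header and rows.
sample = (
    "7/20/21 2:48:47 AM GMT-04:00  ABB: Blended Rate Calc (X),,,\n"
    "distribution_id,Distribution Id,settle_date,group_code\n"
    "-1,all,20210719,Repo 21025226\n"
    "-1,all,20210719,Repo 21025226\n"
)

# Skip the banner so that the second physical line becomes the CSV header.
rows = list(csv.DictReader(skip_lines(io.StringIO(sample), top=1)))
```

Skipping from the top can be done while streaming, whereas skipping from the
bottom needs a look-ahead buffer of M lines, as the deque above shows.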



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2024-01-15 Thread David Handermann (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann updated NIFI-8932:
---
Labels: backport-needed  (was: )

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Labels: backport-needed
> Time Spent: 3h
> Remaining Estimate: 0h





[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2024-01-15 Thread David Handermann (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann updated NIFI-8932:
---
Fix Version/s: (was: 1.25.0)
   (was: 2.0.0)

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Time Spent: 3h
> Remaining Estimate: 0h





[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2023-11-29 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8932:
---
Fix Version/s: 1.25.0
   2.0.0

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Fix For: 1.25.0, 2.0.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h





[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2023-10-29 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8932:
---
Status: Patch Available  (was: In Progress)

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Fix For: 1.latest, 2.latest
>
> Time Spent: 10m
> Remaining Estimate: 0h





[jira] [Updated] (NIFI-8932) Add feature to CSVReader to skip N lines at top of the file

2023-10-28 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-8932:
---
Summary: Add feature to CSVReader to skip N lines at top of the file  (was: 
Add feature to CSVReader to skip N lines at top/bottom of the file)

> Add feature to CSVReader to skip N lines at top of the file
> ---
>
> Key: NIFI-8932
> URL: https://issues.apache.org/jira/browse/NIFI-8932
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Philipp Korniets
> Assignee: Matt Burgess
> Priority: Minor
> Fix For: 1.latest, 2.latest


